How I Learned Python

Every once in a while, people ask me for my recommendations on how to learn Python. I don’t think I’m a good source for this because I learned computer science in high school, dabbled in statistical coding (Stata and Matlab), and got back into “real” programming through Python. This is a very different path from that of someone going from nothing, or from Excel, into Python.

All that said, I’d rather not keep typing out variations of the same answer, so this post will serve as my go-to link when someone asks me the question.

But first, a PSA:

Every good coder is self-taught.

One of the bigger misconceptions about CS and CIS majors is that they spend their whole 4 years in college learning how to code constantly, so therefore anyone starting to learn code today has a 4-year deficit to make up just to catch up with the typical fresh CS major. This is not true. Never mind the gen ed courses: a great deal of every college’s core CS curriculum is dedicated to things like “how/why do networks work” and “why is quick sort on average more efficient than bubble sort, and what’s an edge case where bubble sort is better?” In other words, the actual science of computers.

A lot of CS majors leave college not knowing much about how to code outside of a very small handful of classes, either because they unfortunately didn’t land the summer internships that actually taught them how to code in the real world, or because they spent all their free time playing video games, or because they genuinely didn’t want to become a coder. Many CS majors never develop software. Many of them go into IT roles where they set up people’s work laptops and troubleshoot the company’s wifi whenever it goes down. I don’t mean this as a dig: IT is a respectable and honest line of work that pays well, and a good IT professional uses a lot of what they learned in college about how computers work. But they’re not writing code, and likely didn’t spend a whole lot of their time in college writing code.

And let’s also not forget about the broad number of people who ostensibly should know how to code pretty well based on how much exposure they have to coding at their jobs, but who also can’t pass FizzBuzz tests, i.e. basic competency tests that weed out the worst of the worst software development job candidates from people who at least stand a chance.
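
For anyone who hasn’t seen it, the usual statement of FizzBuzz is: count from 1 to 100, but say “Fizz” for multiples of 3, “Buzz” for multiples of 5, and “FizzBuzz” for multiples of both. A minimal sketch of one common way to write it in Python (the function name is my own choice, and interviewers vary the details):

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizzbuzz_lines(limit=100):
    """Collect the answers for 1..limit so they can be printed or inspected."""
    return [fizzbuzz(n) for n in range(1, limit + 1)]
```

Calling print("\n".join(fizzbuzz_lines())) writes out the whole sequence. That really is the entire test.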

Most of the really good developers spent their free time in both college and high school coding. Your typical computer science major probably has like 1-2 years equivalent of real coding by the time they graduate a 4-year college; the all-stars have up to 10 years. In other words, the all-stars are more or less self-taught and the computer science degree isn’t a significant contributor to why they’re good.

All of this is to say if you’re discouraged from learning how to code because you feel like you won’t be able to “catch up” with people who seem to have a head start on you, or feel like it requires 4 years of dedication and a college degree, fear not: your typical CS major does not have as much experience coding as you think, experience does not guarantee competency, and every good coder is self-taught to some degree. You can learn how to code pretty quickly if you are reasonably smart, have the right mindset for it, and dedicate some time to it. Don’t sweat it.

Learning resources:

Intro to computer science.

I took comp sci in high school, but I relearned comp sci with Harvard’s CS50. It’s a great class. There are lots of great intro comp sci courses out there, but this is the one I audited in my free time.

Intro to Python.

Charles Severance at the University of Michigan has an awesome 5-part track for learning Python available on Coursera. I personally skipped the first and last courses (the last one is a capstone project) and audited the videos.

Getting your hands dirty.

I’ll be honest: I don’t really know how to help people bridge the gap between “I took an intro Python course” and “I’m good at Python” using my personal experience. That’s because I relied a lot on a combination of my previous non-Python technical experience (mostly Excel, Stata, and Matlab on and off between 10th grade and college graduation) and real-world needs at my particular job function (economics damages consulting). I understand a lot of people reading this don’t have cushy desk jobs with downtime in front of a computer, or jobs with computer tasks that are very clearly ripe for automation.

In my particular case, I started writing some really ugly but effective scripts that were inspired by things that I was doing at my job. There are four things these scripts had in common:

I couldn’t do these things in Stata, which is what I primarily used at that job.
They were iterated processes (i.e. something that fits in a loop).
The process or function being executed within each iteration wasn’t that complicated.
They were related to things I genuinely hated doing at work.

Rote tasks are both obvious candidates to code away and really simple to code. A lousy coder can take a terribly slow and repetitive process and make it at least slightly more efficient, even if the code sucks. So try to identify things like that at work (or in your daily life as you learn how to code, if you don’t have the privilege of being paid to sit in front of a computer).

That final bullet (the fact that I hated doing these things) is really important. The desire to make my work life more enjoyable by automating the crappy stuff was both the carrot and the stick that motivated me through learning Python’s conventions until they became natural for me.

Here are two examples of the many things I did to get my hands dirty:

Fuzzy-matching loop between two lists of strings.

The first script I wrote took a CSV file with two columns of text and fuzzy-matched each line of text in the first column with the best matching line in the other (without replacement). I used a Python package called “fuzzywuzzy” for the fuzzy-matching algorithm. Basically it created a rectangular matrix of the fuzzy-matching scores, recorded the pair with the best score as a match, deleted the row and column that contained that score, and repeated this until the matrix was empty.

This script was inspired by someone at my company recounting a project where they had to match records from two sources that didn’t line up perfectly, e.g. “XYZ Company Incorporated” vs “XYZ Comp Inc.” I did eventually use this script on the job, so hoorah.
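
Here’s a minimal sketch of that greedy, without-replacement matching. It is not the original script: to keep it dependency-free I’m assuming the standard library’s difflib.SequenceMatcher as a stand-in for fuzzywuzzy’s scoring, and the function names and the “Acme” sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in score from 0 to 100 (the original script used fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def greedy_match(left, right):
    """Pair strings from two lists, best score first, without replacement."""
    # Score every (left, right) pair: the "rectangular matrix" of scores.
    scores = {
        (i, j): similarity(a, b)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
    }
    matches = []
    while scores:
        # Take the best remaining pair...
        (i, j), best = max(scores.items(), key=lambda item: item[1])
        matches.append((left[i], right[j], best))
        # ...then drop its row and column so neither string is reused.
        scores = {
            (r, c): s for (r, c), s in scores.items() if r != i and c != j
        }
    return matches


if __name__ == "__main__":
    source_a = ["XYZ Company Incorporated", "Acme Widgets LLC"]
    source_b = ["Acme Widget", "XYZ Comp Inc."]
    for a, b, score in greedy_match(source_a, source_b):
        print(f"{a!r} <-> {b!r} (score {score})")
```

Greedy matching like this is easy to reason about, but it does not guarantee the best overall assignment. For a first script that was a fine trade-off.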

Automated PDF file creation.

The next Python script I wrote was inspired by a project I hated due to how repetitive and slow it was. Someone wanted me to print out each footnote reference in a report and put them all together, in order, in a binder. If footnote 7 referenced Bates stamp ABC_000474, I had to print out page 474, then hole punch one of those legal binder tabs and stick it in front of that page. I did this for over 100 footnotes. I hated it. It took me over a full day of work to get it done because this wasn’t even a project I worked on, so I had no idea where the files were; I had to learn a whole file directory for this tedious crap.

My solution at first was to create an Excel spreadsheet. No Python at all, I promise. I just wanted to document where stuff was for the report.

Footnote	Text	Pages	Link
1	John Doe’s Deposition, pp. 23:15-25:2. Also see Mary Jane’s Deposition, p. 15:12-17.	23-25	C:/files/doe_deposition.pdf
1	John Doe’s Deposition, pp. 23:15-25:2. Also see Mary Jane’s Deposition, p. 15:12-17.	15	C:/files/jane_deposition.pdf
2	Summers’s Expert Report, pp. 13, 16.	13, 16	C:/files/summers_report.pdf
3	Defendant’s motion to dismiss amended complaint, para. 83.	37	C:/files/motion_to_dismiss.pdf

Where does the Python come in? Well, by putting everything into an Excel spreadsheet, I realized the process was now just a repetitive task of clicking on file links and printing pages. And things like this can be automated!

Using Python’s wonderful PyPDF2 library, the code would loop through the Excel spreadsheet like this:

Create a “tab” for the footnote, e.g. footnote #1. (The tab page had big black borders so it was easy to spot in the printed stack, rip out, and replace with a real legal tab.) Then create a blank page so that double-sided printing stays aligned.
Read the file and given page range with a PdfFileReader object, for each file that had footnote #1.
Check the number of pages added; if it’s an odd number, add another blank page.
Create a big PDF.
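
The steps above can be sketched without any PDF library at all. The block below only plans the page sequence (tabs, blanks, and source pages); the actual reading and writing of PDF pages with PyPDF2 is left out, and the function names, tuple layout, and ROWS list are my own assumptions built from the sample spreadsheet above.

```python
from itertools import groupby

# (footnote, pages, link) rows, taken from the sample spreadsheet.
ROWS = [
    (1, "23-25", "C:/files/doe_deposition.pdf"),
    (1, "15", "C:/files/jane_deposition.pdf"),
    (2, "13, 16", "C:/files/summers_report.pdf"),
    (3, "37", "C:/files/motion_to_dismiss.pdf"),
]


def parse_pages(spec):
    """Expand a spec such as "23-25" or "13, 16" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            pages.extend(range(int(first), int(last) + 1))
        else:
            pages.append(int(part))
    return pages


def plan_binder(rows):
    """Return the ordered page plan; rows must already be sorted by footnote."""
    plan = []
    for footnote, group in groupby(rows, key=lambda row: row[0]):
        # Step 1: a tab page, then a blank so the tab gets its own sheet.
        plan.append(("tab", footnote))
        plan.append(("blank", None))
        # Step 2: every cited page from every file under this footnote.
        added = 0
        for _, spec, path in group:
            for page in parse_pages(spec):
                plan.append(("page", (path, page)))
                added += 1
        # Step 3: pad odd page counts so the next tab starts a new sheet.
        if added % 2 == 1:
            plan.append(("blank", None))
    return plan
```

Step 4, creating the big PDF, would then walk plan_binder(ROWS) in order and append the corresponding real page for each entry.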

The Excel spreadsheet made the process manageable, but the code made it easy and even kind of fun. After I got this down, I actually really liked keeping track of footnotes, and the end result was a gigantic PDF, the fresh smell of toner, and a lot of documentation that made QAing reports so much easier.

Seeing other people’s code.

There are two primary ways I started looking at other people’s code.

First, I started using a website called Codewars to do practice problems in Python. Trust me on this: focus more on the easy and trivial-ish problems on Codewars, not the harder stuff, even if you’re a good problem solver. Yes, you’re not going to learn problem-solving skills by doing some FizzBuzz problem on that website, but that’s not the point. After you submit your own working answer, you get to see other people’s solutions, sorted with the highest upvoted answers at the top of the page. And what you’re going to learn is that your solution is ugly compared to other people’s solutions. That problem that took you 8 lines of code? Some guy did it in 1 line.

A lot of what makes someone good at Python (and coding in general) isn’t just knowing how to get a solution, but getting both the prettiest solution and the one that utilizes all the things that Python can do in an effective way. Prettiness isn’t the only thing that makes a code base “good,” and there are real dangers with trying to be too clever, but prettiness is important because it makes code easier to read and therefore easier to maintain in the long run. (And don’t get me started on functional programming: concise code often avoids nasty side effects that you might inadvertently create when you declare a bunch of variables in your function instead of just returning a quick one-liner.)
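
To make the “8 lines versus 1 line” point concrete, here’s a made-up practice problem (not an actual Codewars kata): sum the squares of the even numbers in a list. Both functions below give the same answer; the second is the kind of thing you’d see at the top of the upvoted solutions.

```python
def sum_even_squares_verbose(numbers):
    """First-attempt style: explicit loop and temporary variables."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            square = n * n
            total = total + square
    return total


def sum_even_squares(numbers):
    """Same result with a generator expression passed straight to sum()."""
    return sum(n * n for n in numbers if n % 2 == 0)
```

For [1, 2, 3, 4], both return 20. The short version isn’t better because it’s short; it’s better because there are no intermediate variables to get wrong.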

The other way I read other people’s code was by learning Flask, which is a way to create websites using Python. In order to figure out what the heck is going on with Flask, I had to follow the tutorial really carefully. And the tutorial is good! It taught me a lot of nice little conventions that had nothing to do with Flask and that I eventually got into the habit of following.

Additionally, once I finished up the tutorial and finally built my own website, I felt comfortable enough with all the things that Flask had to offer that I decided to look under the hood at Flask’s source code. This was my first time looking at something that qualified as real, professional, bona fide code. I learned how an __init__.py file actually works (instead of just leaving it blank), I learned about Sphinx docstrings, and I learned about unit tests.
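
If you haven’t run into those last two yet, here’s a small sketch of what they look like. This is not Flask’s code: slugify and SlugifyTest are hypothetical names I’m using for illustration. The docstring uses Sphinx’s field-list style (:param:, :returns:, :raises:), and the test class uses the standard library’s unittest.

```python
import re
import unittest


def slugify(title):
    """Turn a post title into a URL-friendly slug.

    :param title: Human-readable title, e.g. ``"How I Learned Python"``.
    :returns: Lowercase words joined by hyphens.
    :raises ValueError: If the title contains no letters or digits.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError("title has no usable characters")
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    def test_basic_title(self):
        self.assertEqual(slugify("How I Learned Python"), "how-i-learned-python")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_empty_title_raises(self):
        with self.assertRaises(ValueError):
            slugify("!!!")
```

Saved as a module, the tests can be run with python -m unittest.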

You don’t need to learn from Flask specifically; I only peeked under the hood because I felt comfortable with it. But my goodness, you really need to look at other people’s code eventually. You’ll eventually hit a point where you won’t get better until you do this. You’ll never realize how many things you’re not doing but should be doing until you see how other people code.

This video.

Last but not least, everyone who is at least somewhat familiar with Python needs to watch this video. This taught me more about coding in an hour than anything else ever has. Just trust me on it and thank me later. (Note: this video will not be as useful to you until you have 1+ year of Python, but eventually you do need to watch it.)