My old piece is getting traction thanks to a share on Hacker News, where some of the most insufferable tech guys in California try to dissect in the comments whether I have deep-seated psychological issues. Also, I was mentioned in this blog post at Win Vector LLC, which offers a fair and very good critique … Continue reading Retrospective on “On Moving From Statistics to Machine Learning”
The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of “it will rain today.” There is a slight loss/disutility of carrying an umbrella, and I want to be the one to make the tradeoff. (Dr. Frank Harrell, https://www.fharrell.com/post/classification/) This is coming from personal experience and from multiple … Continue reading Why Do So Many Practicing Data Scientists Not Understand Logistic Regression?
Coding is computer science in the same way that buying something at the store is economics, or talking to your neighbor is sociology. Buying a widget at the store is governed by dynamics described by economics. We can use economics to answer questions like “why was the widget priced the way it is?” or “why … Continue reading Coding is Not Computer Science
Does something feel off about Matplotlib’s API to you? If you think Matplotlib is harder to use than it needs to be, your intuition is correct. If you think the reason why Matplotlib has a cumbersome API is because it has so much going on under the hood that it needs to be complicated, you … Continue reading Why You Hate Matplotlib
Someone reached out to me recently to critique their homework interview problems. I thought it would be useful for them if I wrote up a general overview of how I think about the hiring process and how homework problems fit into the whole process. I should also note that absolutely nothing here is directly … Continue reading What Makes an Interview Homework Assignment Good or Bad?
It is always wrong to use the iterrows method in Pandas. If I were on the Pandas dev team, I would have no hesitation deprecating it and then deleting it out of existence. There are two problems with iterrows: Problem 1. Loops in Pandas are a sin. The first and most important problem is that, … Continue reading For the Love of God, Stop Using iterrows()
This is the 2nd article in a 2-part series on the use of AI in hiring. The first part is available here. In the previous post in this series, I discussed what the public thinks about whether AI can reduce discrimination in hiring. I briefly went over why we should expect it to have no … Continue reading Discrimination and Technical Problems
This is the 1st article in a 2-part series on the use of AI in hiring. The 2nd part will be available on Wednesday, January 8th. Arvind Narayanan somewhat recently put out a presentation called “How to recognize AI snake oil.” It’s incredible, and I highly recommend reading it in full. He also has a … Continue reading AI Will Not Reduce Discrimination in Hiring Practices. Does the Public Agree?
Let’s say you wrote a really basic data import function that finds the latest .csv file in a directory (sorted alphanumerically, and the name equals the date), then imports it into a Pandas dataframe, and then does some light processing of the data. The processing is done in two steps: First, it formats the dates. … Continue reading Making Good Code Great
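For readers skimming this excerpt, the file-selection step it describes might look roughly like the sketch below. This is not the code from the post: the function name `latest_csv` is hypothetical, and it assumes file names are ISO-style dates (for example `2020-01-08.csv`), so that an alphanumeric sort matches chronological order.

```python
from pathlib import Path
from typing import Optional


def latest_csv(directory: str) -> Optional[Path]:
    """Return the .csv file whose name sorts last, or None if there are none.

    Assumption: names are dates in a sortable format such as YYYY-MM-DD,
    so the alphanumerically last name is also the most recent file.
    """
    # Sort on the file name only, ignoring the directory part of the path.
    csv_files = sorted(Path(directory).glob("*.csv"), key=lambda p: p.name)
    return csv_files[-1] if csv_files else None
```

The returned path could then be handed to `pandas.read_csv` before any date formatting or other processing.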
Every once in a while, people ask me for my recommendations on how to learn Python. I don’t think I’m a good source for this because I learned computer science in high school, dabbled in statistical coding (Stata and Matlab), and got back into “real” programming through Python. This is a much different path than someone … Continue reading How I Learned Python