Update: The code for these animations is available here. The Frisch-Waugh-Lovell theorem states that within a multivariate regression of $y$ on $x_1$ and $x_2$, the coefficient for $x_2$, which is $\beta_2$, will be exactly the same as if you had instead regressed the residuals of $y$ on the residuals of $x_2$, after regressing each one on $x_1$ separately. The point of … Continue reading Frisch-Waugh-Lovell Theorem: Animated
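A minimal numerical check of the theorem, assuming simulated data with an intercept. The data-generating process, the variable names, and the `ols`/`residuals` helpers are made up for illustration and are not the post's animation code. The intercept is partialled out along with $x_1$, so the residuals are mean zero and the final residual-on-residual regression needs no constant.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (include a column of ones in X for an intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def residuals(X, y):
    """What is left of y after projecting it onto the columns of X."""
    return y - X @ ols(X, y)

# Hypothetical simulated data: x2 is deliberately correlated with x1
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
const = np.ones(n)

# Full multivariate regression: y on [1, x1, x2]; beta_2 is the last coefficient
beta2_full = ols(np.column_stack([const, x1, x2]), y)[2]

# FWL: regress y and x2 each on [1, x1], then regress residuals on residuals
Z = np.column_stack([const, x1])
y_res = residuals(Z, y)
x2_res = residuals(Z, x2)
beta2_fwl = ols(x2_res[:, None], y_res)[0]

print(beta2_full, beta2_fwl)  # equal up to floating-point error
```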
Oh how the tables have turned. I now interview candidates for data science jobs. I have a sense of humor and can appreciate the irony of going from complaining about job interviews to now being one of those interviewers. I recently deleted a Twitter thread discussing my interview strategy, partly because I agreed with the … Continue reading On Being An Interviewer
My old piece is getting traction thanks to a share on Hacker News, where some of the most insufferable tech guys in California try to dissect in the comments whether I have deep-seated psychological issues. Also, I was mentioned in this blog post at Win Vector LLC, which offers a fair and very good critique … Continue reading Retrospective on “On Moving From Statistics to Machine Learning”
“The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of ‘it will rain today.’ There is a slight loss/disutility of carrying an umbrella, and I want to be the one to make the tradeoff.” (Dr. Frank Harrell, https://www.fharrell.com/post/classification/) This is coming from personal experience and from multiple … Continue reading Why Do So Many Practicing Data Scientists Not Understand Logistic Regression?
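A sketch of the umbrella tradeoff in the quote, assuming a simple two-cost model; the function name and the cost numbers are hypothetical. With a probability forecast, each person can apply their own threshold, carrying the umbrella when `p_rain > cost_umbrella / cost_wet`. A hard "it will rain" classification bakes a single threshold in for everyone.

```python
def should_carry_umbrella(p_rain: float, cost_umbrella: float, cost_wet: float) -> bool:
    """Carry the umbrella when the expected loss of going without it exceeds the cost of carrying it."""
    # Assumed model: carrying costs cost_umbrella whether or not it rains;
    # going without costs cost_wet, but only if it rains (probability p_rain).
    expected_loss_without = p_rain * cost_wet
    return expected_loss_without > cost_umbrella

# The same 30% forecast leads two people with different costs to different decisions
forecast = 0.3
print(should_carry_umbrella(forecast, cost_umbrella=1.0, cost_wet=10.0))  # threshold 0.1 -> True
print(should_carry_umbrella(forecast, cost_umbrella=1.0, cost_wet=2.0))   # threshold 0.5 -> False
```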
Coding is computer science in the same way that buying something at the store is economics, or talking to your neighbor is sociology. Buying a widget at the store is governed by dynamics described by economics. We can use economics to answer questions like “why was the widget priced the way it is?” or “why … Continue reading Coding is Not Computer Science
Does something feel off about Matplotlib’s API to you? If you think Matplotlib is harder to use than it needs to be, your intuition is correct. If you think Matplotlib’s API is cumbersome because it has so much going on under the hood that it needs to be complicated, you … Continue reading Why You Hate Matplotlib
Someone recently reached out and asked me to critique their interview homework problems. I thought it would be useful for them if I wrote up a general overview of how I think about the hiring process and how homework problems fit into the whole process. I should also note that absolutely nothing here is directly … Continue reading What Makes an Interview Homework Assignment Good or Bad?
It is always wrong to use the iterrows method in Pandas. If I were on the Pandas dev team, I would have no hesitation deprecating it and then deleting it out of existence. There are two problems with iterrows: Problem 1. Loops in Pandas are a sin. The first and most important problem is that, … Continue reading For the Love of God, Stop Using iterrows()
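A toy illustration of the loop-versus-vectorized contrast, assuming a hypothetical DataFrame with made-up `price` and `qty` columns; it is not taken from the post. Both approaches give the same totals, but the loop builds a new Series for every row, while the vectorized version is a single column-wise operation. As the pandas documentation notes, `iterrows()` also does not preserve dtypes across rows: here the integer `qty` comes back as a float inside each row.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row with iterrows(): each row is materialized as a Series,
# so the int column is upcast to float to share a dtype with price
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])
df["total_loop"] = totals_loop

# Vectorized: one operation over whole columns, no Python-level loop
df["total"] = df["price"] * df["qty"]

print(df)
```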
This is the 2nd article in a 2-part series on the use of AI in hiring. The first part is available here. In the previous post in this series, I discussed what the public thinks about whether AI can reduce discrimination in hiring. I briefly went over why we should expect it to have no … Continue reading Discrimination and Technical Problems
This is the 1st article in a 2-part series on the use of AI in hiring. The 2nd part will be available on Wednesday, January 8th. Arvind Narayanan somewhat recently put out a presentation called “How to recognize AI snake oil.” It’s incredible, and I highly recommend reading it in full. He also has a … Continue reading AI Will Not Reduce Discrimination in Hiring Practices. Does the Public Agree?