So I made a mildly controversial tweet. Lots of people enjoyed it, but the LinkedIn-adjacent section of data science Twitter is not happy about it. I want to provide as much context for it as I can here, clarify a few things, and correct myself on a few things, including on some negative stuff I … Continue reading Zillow, Prophet, Time Series, & Prices
I have deliberately avoided using this blog to engage in overt political discourse, but I’ve never barred myself from political metadiscourse. Political discourse is about our beliefs on governance; political metadiscourse is more about understanding how people arrive at those beliefs and how those beliefs are expressed. I’ve been posting on the internet for over … Continue reading Proximate Cause & Theories of Agency
Update: The code for these animations is available here. The Frisch-Waugh-Lovell theorem states that within a multivariate regression on $x_1$ and $x_2$, the coefficient for $x_2$, which is $\beta_2$, will be the exact same as if you had instead run a regression on the residuals of $y$ and $x_2$ after regressing each one on $x_1$ separately. The point of … Continue reading Frisch-Waugh-Lovell Theorem: Animated
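To make the theorem in the excerpt above concrete, here is a minimal numerical sketch. It is not the animation code the post links to. The simulated data, the variable names (`y`, `x1`, `x2`), and the helper `ols` are all assumptions made for illustration. It fits the full regression, then partials one regressor (plus an intercept) out of both the outcome and the other regressor, and compares the two coefficients.

```python
import numpy as np

# Simulated data (illustrative assumption): two correlated regressors.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)


def ols(X, target):
    """Hypothetical helper: least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


ones = np.ones(n)

# Full regression of y on an intercept, x1, and x2.
beta_full = ols(np.column_stack([ones, x1, x2]), y)

# Partial out x1 (and the intercept) from both y and x2.
Z = np.column_stack([ones, x1])
y_resid = y - Z @ ols(Z, y)
x2_resid = x2 - Z @ ols(Z, x2)

# Regress the residuals of y on the residuals of x2.
beta_fwl = ols(x2_resid[:, None], y_resid)[0]

# The two estimates of the x2 coefficient agree up to floating-point error.
print(beta_full[2], beta_fwl)
```

The intercept is partialled out together with `x1` so that the residual-on-residual regression needs no constant term of its own.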
Oh how the tables have turned. I now interview candidates for data science jobs. I have a sense of humor and can appreciate the irony of going from complaining about job interviews to now being one of those interviewers. I recently deleted a Twitter thread discussing my interview strategy, partly because I agreed with the … Continue reading On Being An Interviewer
My old piece is getting traction thanks to a share on Hacker News, where some of the most insufferable tech guys in California try to dissect in the comments whether I have deep-seated psychological issues. Also, I was mentioned in this blog post at Win Vector LLC, which offers a fair and very good critique … Continue reading Retrospective on “On Moving From Statistics to Machine Learning”
The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of “it will rain today.” There is a slight loss/disutility of carrying an umbrella, and I want to be the one to make the tradeoff. Dr. Frank Harrell, https://www.fharrell.com/post/classification/ This is coming from personal experience and from multiple … Continue reading Why Do So Many Practicing Data Scientists Not Understand Logistic Regression?
Coding is computer science in the same way that buying something at the store is economics, or talking to your neighbor is sociology. Buying a widget at the store is governed by dynamics described by economics. We can use economics to answer questions like “why was the widget priced the way it is?” or “why … Continue reading Coding is Not Computer Science
Does something feel off about Matplotlib’s API to you? If you think Matplotlib is harder to use than it needs to be, your intuition is correct. If you think the reason why Matplotlib has a cumbersome API is because it has so much going on under the hood that it needs to be complicated, you … Continue reading Why You Hate Matplotlib
Someone reached out to me recently to critique their homework interview problems. I thought it would be useful for them if I wrote up a general overview of how I think about the hiring process and how homework problems fit into the whole process. I should also note that absolutely nothing here is directly … Continue reading What Makes an Interview Homework Assignment Good or Bad?
It is always wrong to use the iterrows method in Pandas. If I were on the Pandas dev team, I would have no hesitation deprecating it and then deleting it out of existence. There are two problems with iterrows: Problem 1. Loops in Pandas are a sin. The first and most important problem is that, … Continue reading For the Love of God, Stop Using iterrows()
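As a small illustration of the "loops in Pandas" point in the excerpt above, the sketch below computes the same column two ways: once with an `iterrows` loop and once with a vectorized column expression. The DataFrame, its column names (`price`, `qty`), and the result names are hypothetical and do not come from the post.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Loop version: iterrows builds a new Series for every row.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized version: one operation over whole columns.
totals_vec = df["price"] * df["qty"]

print(totals_vec.tolist())
```

Both versions give the same values here. The vectorized form is shorter and avoids constructing a Series object per row.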