It is always wrong to use the iterrows method in Pandas. If I were on the Pandas dev team, I would have no hesitation deprecating it and then deleting it out of existence. There are two problems with iterrows: Problem 1. Loops in Pandas are a sin. The first and most important problem is that, … Continue reading For the Love of God, Stop Using iterrows()
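To make the complaint about loops concrete, here is a minimal sketch comparing an iterrows loop with a whole-column operation. The data and the column names `price` and `quantity` are made up for illustration and do not come from the post.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
df = pd.DataFrame({"price": [2.0, 3.5, 1.25], "quantity": [3, 2, 8]})

# Row-by-row version: iterrows yields (index, row) pairs, one row at a time.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["quantity"])

# Vectorized version: a single operation applied to whole columns.
totals_vectorized = df["price"] * df["quantity"]
```

Both versions compute the same totals; the second avoids the Python-level loop entirely.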
This is the 2nd article in a 2-part series on the use of AI in hiring. The first part is available here. In the previous post in this series, I discussed what the public thinks about whether AI can reduce discrimination in hiring. I briefly went over why we should expect it to have no … Continue reading Discrimination and Technical Problems
This is the 1st article in a 2-part series on the use of AI in hiring. The 2nd part will be available on Wednesday, January 8th. Arvind Narayanan somewhat recently put out a presentation called “How to recognize AI snake oil.” It’s incredible, and I highly recommend reading it in full. He also has a … Continue reading AI Will Not Reduce Discrimination in Hiring Practices. Does the Public Agree?
Let’s say you wrote a really basic data import function that finds the latest .csv file in a directory (sorted alphanumerically, and the name equals the date), then imports it into a Pandas dataframe, and then does some light processing of the data. The processing is done in two steps: First, it formats the dates. … Continue reading Making Good Code Great
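A function like the one described might look roughly like the sketch below. The name `load_latest_csv` and the `date_column` parameter are hypothetical, and only the first processing step (formatting the dates) is shown because the excerpt cuts off before the second.

```python
from pathlib import Path

import pandas as pd


def load_latest_csv(directory, date_column="date"):
    """Load the alphanumerically last .csv file in `directory`.

    Assumes file names are dates that sort correctly as text,
    for example 2020-01-02.csv.
    """
    # Sort by file name so the last entry is the latest date.
    csv_files = sorted(Path(directory).glob("*.csv"), key=lambda p: p.name)
    if not csv_files:
        raise FileNotFoundError(f"No .csv files found in {directory}")

    df = pd.read_csv(csv_files[-1])

    # Step one of the processing: parse the date column into datetimes.
    df[date_column] = pd.to_datetime(df[date_column])
    return df
```

Calling `load_latest_csv("data/")` would then return a dataframe built from the most recent file in a (hypothetical) `data/` directory.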
Every once in a while, people ask me for my recommendations on how to learn Python. I don’t think I’m a good source for this because I learned computer science in high school, dabbled in statistical coding (Stata and Matlab), and got back into “real” programming through Python. This is a much different path than someone … Continue reading How I Learned Python
There are endless blog posts out there describing the basics of linear regression and penalized regressions such as ridge and lasso. These are useful resources, and I’m happy they exist to level the playing field both for people not in college and for people who don’t have the time or fortitude to trudge through mountains … Continue reading Some Things You (Maybe) Didn’t Know About Linear Regression
This recent Tweet sparked a discussion about how logistic regression in Scikit-learn uses L2 penalization with a lambda of 1 by default. If you don’t care about data science, this sounds like the most incredibly banal thing ever. If you do care about data science, especially from the statistics side of things, well, have … Continue reading Scikit-learn’s Defaults are Wrong
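The default in question can be checked directly. In Scikit-learn the penalty strength is set through `C`, the inverse of the regularization strength, so the "lambda of 1" above corresponds to `C=1.0`. The sketch below uses synthetic data (an assumption for illustration, not from the post) and compares the default fit with one where the penalty is made very weak.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, purely for illustration.
X, y = make_classification(
    n_samples=200, n_features=5, n_redundant=0, random_state=0
)

# Default settings: L2-penalized with C=1.0.
default_model = LogisticRegression(max_iter=1000).fit(X, y)

# A very large C makes the penalty close to negligible.
weak_penalty_model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

default_C = default_model.get_params()["C"]
default_norm = np.linalg.norm(default_model.coef_)
weak_norm = np.linalg.norm(weak_penalty_model.coef_)
```

The default fit shrinks the coefficients toward zero, so `default_norm` comes out smaller than `weak_norm`.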