## ChatGPT as a query engine on a giant corpus of text

In the popular imagination, ChatGPT is an intelligent robot that you can talk to. However, it is a better first-order approximation to think of ChatGPT as a query engine on a giant corpus of text scraped off the internet. It is important to state explicitly that ChatGPT is like a query engine on a

## Intuitive Explanation of Arithmetic, Geometric, & Harmonic Mean

If you Google for an explainer on the differences between the arithmetic, geometric, and harmonic means and when to use each, I feel like everything you'll find is pretty bad: it won't properly explain the intuition of what's going on or why you'd ever pick one over the other. In fact, you sometimes will
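
As a rough illustration of the three definitions (a minimal sketch, not the post's own explanation), each mean can be written as an ordinary average taken after a transformation: no transformation for the arithmetic mean, logs for the geometric mean, and reciprocals for the harmonic mean. The function names and the sample values are made up for this example, and the inputs are assumed to be positive.

```python
import math


def arithmetic_mean(xs):
    # Plain average of the values.
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Average the logs, then undo the log. Assumes every value is positive.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # Average the reciprocals, then undo the reciprocal. Assumes no zeros.
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical sample values, chosen only for illustration.
values = [1.0, 4.0, 16.0]
print(arithmetic_mean(values), geometric_mean(values), harmonic_mean(values))
```

For positive inputs the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.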

## Goodbye, Data Science

This is more of a personal post than something intended to be profound. If you are looking for a point, you will not find one here. Frankly, I am not even sure who the target audience is for this (probably "data scientists who hate themselves"?). I had been a data scientist for the past few

## Caveats and Limitations of A/B Testing at Growth Tech Companies

For non-tech industry folks, an "A/B test" is just a randomized controlled trial where you split users or other things into treatment and control groups, and then later compare key metrics across those groups to decide which one performed better. For the context
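
The mechanics described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular company runs experiments: the helper names `assign_groups` and `difference_in_means`, the 50/50 coin-flip assignment, and the example metric values are all hypothetical.

```python
import random
from statistics import mean


def assign_groups(user_ids, seed=0):
    # Randomly place each user in control or treatment (a 50/50 coin flip).
    rng = random.Random(seed)
    groups = {"control": [], "treatment": []}
    for uid in user_ids:
        groups[rng.choice(["control", "treatment"])].append(uid)
    return groups


def difference_in_means(metric_by_user, groups):
    # Compare the key metric across groups: treatment average minus control average.
    control = mean(metric_by_user[u] for u in groups["control"])
    treatment = mean(metric_by_user[u] for u in groups["treatment"])
    return treatment - control


# Hypothetical metric values for four users, with a hand-written split
# so the arithmetic is easy to follow.
example_metric = {1: 1.0, 2: 3.0, 3: 2.0, 4: 6.0}
example_groups = {"control": [1, 2], "treatment": [3, 4]}
print(difference_in_means(example_metric, example_groups))
```

A real analysis would also ask whether the difference is larger than what random assignment alone could produce; this sketch only computes the raw gap.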

## Multiple Linear Regression in SQL with Only SUM() and AVG()

This post is inspired by someone dropping this in my mentions today: The technique the authors use is cute, but it's not a true arbitrary multivariate regression. They cheat a little bit using dummy variables for the majority of their coefficients. I respect it, but it's not an arbitrary regression. Fortunately, it is possible to

You don't need to be a "coder" to solve coding problems. If you know Microsoft Excel or Google Sheets, then you can solve these problems, too. There's tons of overlap between coding and working in spreadsheets. Sign up for the 2021 Advent of Code here. Take a stab at these problems with Microsoft Excel or

## Zillow, Prophet, Time Series, & Prices

So I made a mildly controversial tweet. Lots of people enjoyed it, but the LinkedIn-adjacent section of data science Twitter is not happy about it. I want to provide as much context for it as I can here, clarify a few things, and correct myself on a few things, including on some negative stuff I

## Proximate Cause & Theories of Agency

I have deliberately avoided using this blog to engage in overt political discourse, but I've never barred myself from political metadiscourse. Political discourse is about our beliefs on governance; political metadiscourse is more about understanding how people arrive at those beliefs and how those beliefs are expressed. I've been posting on the internet for over

## Frisch-Waugh-Lovell Theorem: Animated

Update: The code for these animations is available here. Another Update: I think some of the explanations on this page may be helped with more colors. I have some updated visuals here that include colors. The Frisch-Waugh-Lovell theorem states that within a multivariate regression on two regressors, the coefficient for one of them will
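
For readers who want to check the theorem numerically, here is a small sketch in plain Python. The notation is an assumption of this example rather than the post's: `y` is the outcome, `x1` and `x2` are the two regressors, and the data are made up. It compares the coefficient on `x2` from the full regression with the slope obtained by first regressing both `y` and `x2` on `x1` and then regressing the residuals on each other.

```python
import math


def avg(v):
    return sum(v) / len(v)


def simple_ols(x, y):
    # Intercept and slope of y regressed on a single x (with an intercept).
    mx, my = avg(x), avg(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    # What is left of y after removing its linear fit on x.
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def coef_from_full_regression(x1, x2, y):
    # Coefficient on x2 when y is regressed on x1 and x2 together,
    # from the two-regressor normal equations on demeaned data.
    m1, m2, my = avg(x1), avg(x2), avg(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)


def coef_from_fwl(x1, x2, y):
    # Partial x1 out of both y and x2, then regress residual on residual.
    return simple_ols(residuals(x1, x2), residuals(x1, y))[1]


# Made-up data: y depends on both regressors plus a small perturbation.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.25, -0.05]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

print(coef_from_full_regression(x1, x2, y), coef_from_fwl(x1, x2, y))
```

The two printed numbers should agree up to floating-point error, which is the content of the theorem for this two-regressor case.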

## Bitcoin Machine Learning.

