Oh, how the tables have turned. I now interview candidates for data science jobs. I have a sense of humor and can appreciate the irony of going from complaining about job interviews to now being one of those interviewers.
I recently deleted a Twitter thread discussing my interview strategy, partly because I agreed with the critics and changed my mind on a few of the questions I was asking,* and partly because I do not think Twitter is a good place for more nuanced takes.
The first major constraint of an interview, from my perspective, is this: we have M job openings and N > M candidates. I can pass or fail candidates in my round of the interview. The other major constraint is that I only have X minutes to interview you. I cannot spend the next 10 hours digging into everything about you to get a nuanced view of you as a person. I need to make a decision based on a relatively short interaction.
If I pass everybody or fail everybody, then I am effectively delegating the decision-making process to my co-interviewers, since we ultimately have to pick M out of the N candidates. If I won’t do it, they will. So, should I delegate the decision-making to those co-interviewers? I’m personally worried about doing that. One thing I’ve noticed in reading evaluations is that a lot of interviewers rely on things that are… odd to me. A lot of the criteria interviewers rely on are things like “well, I don’t think they seemed interested in the job” or “they seemed a little nervous.”
…Not interested in the job? If they’re this far into the process, they’re probably interested! …Nervous? If they’re interested in the job, then I imagine they would be! And hey, wait: didn’t the same interviewers just complain that the candidate didn’t seem interested? Plus, some people don’t like interviews but can do the job perfectly well.
If I pass or fail everyone, then I’m delegating decision-making to the people who evaluate candidates on criteria like these. And I don’t want to do that, because such criteria are ripe for bias laundering. They seem to filter less for job competency and more for people who are “good at interviewing,” which may correlate with class and upbringing, which in turn correlate with race and gender.
This is not to say that the coworkers I interview with are dumb for evaluating candidates this way; I just think that interviewing is a tiny part of their job, and they haven’t thought about it as much as they think about other job-related problems. If you want to think about this problem more yourself, I recommend reading “How to Take the Bias Out of Interviews.” The problem is basically that people like to interview candidates in an unstructured format, but unstructured interviews have poor predictive ability because they often come down to things like the happenstance of where the conversation veered, how often the candidate smiled, or how many boilerplate questions they asked about “How’s the workplace culture at ___?”
So ultimately I need to filter candidates out, and I want to avoid bias laundering. The way I do this is to ask questions that have clearly correct and incorrect answers, and evaluate candidates based on their answers to those questions.
If We’re Being Honest, Case Studies Don’t Filter Candidates Well
People don’t like being evaluated on questions that have correct and incorrect answers because the questions can feel arbitrary: what if the candidate knows a lot and is really smart, but the particular question I asked happens to hit one of their blind spots?
The reason I rely on questions where answers can be wrong is that I think a lot of the more discretionary parts of the interview process are actually bad.
Obviously, everyone thinks they personally would make the correct discretionary decision. But if two people would tend to make two different decisions given the same circumstances, then only one of them can be right. And obviously nobody thinks of themselves as “biased” in some superficial way. But there are studies that provide strong evidence that these biases exist, so at least some of these self-proclaimed unbiased people must be wrong.
I don’t just ask “correct/incorrect” questions. I also ask case study questions that include parts with possibly a million valid answers. I’ve also interviewed with companies that rely mostly or exclusively on such case study questions.
My verdict is that I actually don’t like my case study question much, as I’ve never filtered out a candidate based on it. “But surely, some answers to the case study are better than others,” you retort. Yes, I do agree some answers seem better than others! But how exactly do I compare answers across candidates? Obviously, I am biased toward the subset of answers that I would have personally chosen: these are the ones I think are “better.” But if I pick candidates that way, I’m just hiring people who think like me. And I want candidates who, to some extent, don’t think like me, since diversity of thought is probably good to have.
When going by the case study alone, and applying the reasonable constraint that answers that differ from my preferences are fine (if not preferable to my own, by virtue of providing diversity of thought), I’d want to hire every candidate I have interviewed so far! None of the answers I’ve gotten are genuinely bad. But this just leads back to the problem outlined above: if I don’t make a decision, then someone else at my org will, and their decision might be bad.
False Wisdom of a Higher Power
I think people like both asking and answering case study questions because they believe that the unknowable process through which interviewers judge case study answers must be able to pick up on more nuance than a more knowable algorithm. I understand why people think that. I just think that the nuances human discretion picks up on are often noise or, worse, bias: both forms of false wisdom.
The obscurity and multidimensionality of the discretionary decision-making process is, in this case, not a detriment but a benefit to its perceived value. If my interview is based on questions that can be answered incorrectly, you can nitpick whether those questions are good or bad. And yes, some of them might be bad questions. But it’s not clear how you would criticize my process if it were entirely discretionary and obscure. If my hiring decisions come down to “I did not believe they provided thoughtful answers” or “Alice’s answers were fine, but Bob’s were better and more creative,” how can you criticize that?
I’ve written a few times before about how one trade-off of using machine learning is that ML techniques make a model’s parameters harder to interpret. All ML algorithms suffer from this to at least some degree (even LASSO), and some suffer from it far more than others (e.g., RNNs). I argue that interpretability is a good thing, and sacrificing it is a genuine trade-off that people should think more about.
However, if one accepts that higher powers have a great, unknowable wisdom, interpretability becomes a bad thing. This conclusion is in stark contrast to the idea I’ve presented before in discussing ML, which is that knowing how things work is good, actually. Knowing is actually bad if you believe in the false wisdom of a higher power. AI’s most uncritical proponents seem to believe that because ML models cannot be interpreted, they must be doing things so advanced that mere mortals cannot comprehend them. The ML model is the higher power; the wisdom is the model’s outputs.
False wisdom of a higher power is not just a problem that stems from reliance on human discretion or machine learning. Another example of false wisdom of a higher power is Austrian economics and laissez-faire. The normative thrust of Austrian economics is that the market is calculating subjective human preferences in a way that transcends the understanding of mere mortals; therefore, its outcomes are either “correct” or, at the very least, more correct or more objective than those of a human trying to decide how to allocate things. The market is the higher power; the wisdom is its allocation of resources.
The Double Standard of Transparency
Unknowable, opaque processes can only be evaluated based on their outcomes. Transparent processes can be evaluated based on both their outcomes and their internal workings. Comparing a transparent process to an opaque one by criticizing the transparent process’s internal logic is a double standard, because the opaque process’s internal logic never faces the same scrutiny.
Let’s say, for example, that I have a hiring process that is super transparent and also has no correlation with future performance. I show people the process’s internal logic: it turns out I’m just randomly shuffling the list of N candidates and picking the first M candidates in that list.
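The random process is simple enough to write out in full, which is exactly what makes it so easy to criticize. Below is a minimal Python sketch, assuming candidates are just a list of names; the function name `random_hire` and its `seed` parameter are hypothetical, chosen for illustration rather than taken from any real hiring system.

```python
import random
from typing import List, Optional


def random_hire(candidates: List[str], m: int, seed: Optional[int] = None) -> List[str]:
    """Randomly order the N candidates and keep the first M.

    Nothing about any candidate is consulted, so the result carries
    no information about future job performance.
    """
    if not 0 <= m <= len(candidates):
        raise ValueError("m must be between 0 and the number of candidates")
    rng = random.Random(seed)
    shuffled = list(candidates)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return shuffled[:m]


# Every step is visible, so anyone can point out that it is a coin flip.
print(random_hire(["Alice", "Bob", "Carol", "Dan", "Eve"], m=2, seed=0))
```

The transparency cuts both ways: a reader can verify in a few lines that this process ignores the candidates entirely, which is precisely the kind of scrutiny an undocumented gut decision never receives.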
Now let’s say I implement another process for choosing candidates: it’s just me making decisions based on my gut. It turns out my gut is no better than chance, and my gut choices have no correlation with future performance either.
The only way you can fairly compare these two processes is by their outcomes, because the internal logic of the gut decision-making is not exposed. In that comparison, both processes are equally bad. Despite that, if I implemented the latter process, nobody at my company would bat an eye, but if I implemented the former, I’d be fired on the spot for not being serious.
It’s probably the case that I have, in the past, asked bad interview questions with weak signals, at best only slightly better than flipping a coin. I’m working on giving my questions stronger signals. But is the signal any stronger or weaker than one’s unadulterated discretion, which is how most interviewers prefer to make hiring decisions? And is human discretion, on average, better than flipping a coin?
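Whether a given signal beats a coin flip is an empirical question, but a toy simulation makes the stakes concrete. The sketch below is a hypothetical model, not data from any real hiring process: it assumes each candidate’s future performance is standard normal, and that an interviewer’s score (from a quiz question, a case study, or plain gut feeling) correlates with that performance at some unknown strength `rho`. The function names and the choice of N = 10, M = 2 are illustrative assumptions.

```python
import math
import random
import statistics


def hire_by_signal(n: int, m: int, rho: float, rng: random.Random) -> float:
    """Simulate one hiring round; return the mean true performance of the M hires.

    Each candidate's true future performance is standard normal. The interviewer
    sees a noisy score whose correlation with performance is `rho`
    (rho = 0 models a gut feeling with no predictive power).
    """
    performance = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_weight = math.sqrt(1.0 - rho ** 2)
    signal = [rho * p + noise_weight * rng.gauss(0.0, 1.0) for p in performance]
    # Rank candidates by the interviewer's score and hire the top M.
    ranked = sorted(range(n), key=lambda i: signal[i], reverse=True)
    return statistics.fmean(performance[i] for i in ranked[:m])


def hire_at_random(n: int, m: int, rng: random.Random) -> float:
    """Shuffle-and-pick baseline; return the mean true performance of the M hires."""
    performance = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(rng.sample(performance, m))


def average_outcome(trial, trials: int = 20_000) -> float:
    """Average a single-round simulation over many independent rounds."""
    return statistics.fmean(trial() for _ in range(trials))


rng = random.Random(42)
N, M = 10, 2
print("random    ", round(average_outcome(lambda: hire_at_random(N, M, rng)), 2))
for rho in (0.0, 0.2, 0.5):
    print(f"rho = {rho:.1f} ", round(average_outcome(lambda: hire_by_signal(N, M, rho, rng)), 2))
```

With `rho = 0` (the gut process from the example above, or any question with no signal), the average hire should be statistically indistinguishable from the shuffle-and-pick baseline, both landing near zero; only a positively correlated signal pulls the average up. The model says nothing about what `rho` actually is for a given question or interviewer; that can only be estimated by comparing interview scores against real outcomes.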
* The question I was asking tested knowledge of what I considered to be a basic part of the Python API. I asked it so I could compare the candidate’s answer to their stated years of Python experience and get a signal on whether they were exaggerating that experience. The rationale was that many candidates are slightly dishonest about their experience and I did not want to penalize people who are more honest, but I did not approach it in an ideal way. I will not be asking the question from now on. Unfortunately, without asking a few Python trivia questions, I may need to scrap altogether the part of the interview where I try to assess Python experience against one’s résumé, as this is hard to do without a barrage of trivia questions about Python APIs or general OOP knowledge.