What Makes an Interview Homework Assignment Good or Bad?

Someone reached out to me recently to critique their homework interview problems. I thought it would be useful for them if I wrote up a general overview of how I think about the hiring process and how homework problems fit into it. I should also note that absolutely nothing here is directly targeted at them or their homework problems; these are just my very general thoughts from doing dozens of homework assignments and interviews.

Hiring is a process where you select (usually but not always) 1 person out of a pool of N heterogeneous candidates to work at a company. A simplified way to conceptualize the goal of your hiring process, assuming N is stable^[1] and pay is nonnegotiable, is to maximize a three-way trade-off between mean candidate quality, variance in candidate quality, and search costs, among all candidates who would take the job if offered.

A lot of employers seem to think that hiring is simply about picking the best candidate, which is clearly wrong. In fact, virtually no employer genuinely behaves this way, even if they will tell you that this is what their hiring process is doing. For example, imagine you have a choice between the following two hiring processes with equal search costs:

100% chance at hiring the average candidate in N.
50% chance at hiring the best candidate in N; 50% chance of hiring the worst candidate in N.

Most employers will pick the first process without hesitation. The best candidate might generate a lot more revenue for your company than their wage rate, but the worst candidate will bring your company down in a tidal wave of sexual harassment lawsuits and/or fraud investigations.
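
To make the trade-off concrete, here is a minimal sketch in Python. The quality scores are invented purely for illustration (they are not from any real hiring data), and `summarize` is a hypothetical helper. The only point is that two processes can have the same expected quality while differing enormously in variance.

```python
from statistics import mean

# Hypothetical quality scores for a pool of N = 5 candidates (made-up numbers).
pool = [-50, 5, 10, 15, 70]

def summarize(outcomes):
    """Return (expected value, variance) for a list of (probability, quality) pairs."""
    expected = sum(p * q for p, q in outcomes)
    variance = sum(p * (q - expected) ** 2 for p, q in outcomes)
    return expected, variance

# Process 1: always hire a candidate of average quality.
process_1 = [(1.0, mean(pool))]
# Process 2: coin flip between the best and the worst candidate.
process_2 = [(0.5, max(pool)), (0.5, min(pool))]

mean_1, var_1 = summarize(process_1)
mean_2, var_2 = summarize(process_2)
print(f"Process 1: mean={mean_1:.1f}, variance={var_1:.1f}")
print(f"Process 2: mean={mean_2:.1f}, variance={var_2:.1f}")
```

With these particular numbers both processes have the same mean, so an employer who prefers the first is revealing that they care about variance, not just expected quality.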

It might be better to think of candidate quality in earlier rounds not as a spectrum but as a binary between “good” and “bad,” where each stage in the hiring process before the final round is designed to make a best guess as to whether a candidate is “good” or “bad.” In this binary world, you might have to choose between the following two processes:

Identifies all the bad candidates and removes them, at the cost of removing some good candidates.
Identifies all the good candidates and keeps them, at the cost of keeping in some bad candidates.

Most employers implement processes like the first and not the second. Or in more technical terms, employers prioritize specificity over sensitivity. And why wouldn’t they? Most places are not hiring “all the good candidates they can,” they’re hiring for one single role.
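
For readers less familiar with the terms, here is a small sketch of how sensitivity and specificity would be computed for the two processes above, treating “good” as the positive class. The candidate pool and the pass/fail outcomes are hypothetical, as is the helper function's name.

```python
def sensitivity_specificity(candidates):
    """candidates: list of (is_good, passed_filter) booleans.

    Sensitivity = share of good candidates who pass the filter.
    Specificity = share of bad candidates who are removed by the filter.
    """
    good_passed = sum(1 for good, passed in candidates if good and passed)
    good_total = sum(1 for good, _ in candidates if good)
    bad_removed = sum(1 for good, passed in candidates if not good and not passed)
    bad_total = sum(1 for good, _ in candidates if not good)
    return good_passed / good_total, bad_removed / bad_total

# Hypothetical pool: 4 good candidates and 6 bad candidates.
# Process 1 removes every bad candidate but also loses two good ones.
process_1 = [(True, True)] * 2 + [(True, False)] * 2 + [(False, False)] * 6
# Process 2 keeps every good candidate but also lets three bad ones through.
process_2 = [(True, True)] * 4 + [(False, True)] * 3 + [(False, False)] * 3

print(sensitivity_specificity(process_1))  # (0.5, 1.0)
print(sensitivity_specificity(process_2))  # (1.0, 0.5)
```

The first process has perfect specificity and mediocre sensitivity; the second is the reverse.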

Additionally, high specificity processes often have low search costs and can be implemented with extremely simple heuristics, like asking for an SAT score or seeing whether they went to an elite college. This is not the primary reason employers want specificity, but it’s an added bonus. It’s not ordained by God that high specificity heuristics are lower cost– it’s a side-effect of how our institutions are designed. Most kids who attend Yale are smart even though it is also true that most smart kids do not go to Yale. If the Ivy Leagues expanded their enrollment twenty-fold to take in every smart kid in America plus a lot of dumb kids, then “went to an Ivy League college such as Yale” would become a high sensitivity heuristic, while still remaining low cost.

Testing for What

The preamble is important because we need to know what the hiring process is and what employers are trying to achieve from it in order to understand what makes a homework problem good or bad.

I have blogged about the canonical “FizzBuzz” problem before, which to me is (or at least, at one point, was) the gold standard of interview problems. What is interesting is that FizzBuzz is a high sensitivity test, and not a high specificity test: FizzBuzz makes no claim to get you the best candidate, but it does guarantee that you don’t hire the worst ones. No good coder fails FizzBuzz, but some bad coders pass it.

How can FizzBuzz be such a good interview problem if it’s high sensitivity when employers want specificity? There are two reasons.

First, FizzBuzz has an extraordinarily low search cost associated with it. It takes all of 5 minutes to administer, assuming the candidate hasn’t already seen the problem, in which case it takes 2 minutes.

It’s not just that the minimally acceptable solution to FizzBuzz takes 5 minutes to come up with, but that the best solution to FizzBuzz takes no more than 10 minutes:

# Minimally acceptable FizzBuzz in Python
for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz", end=" ")
    elif i % 3 == 0:
        print("Fizz", end=" ")
    elif i % 5 == 0:
        print("Buzz", end=" ")
    else:
        print(i, end=" ")

# Best possible FizzBuzz in Python
print(" ".join([
    "{0}{1}".format(
        "Fizz" if i % 3 == 0 else "",
        "Buzz" if i % 5 == 0 else ""
    )
    or str(i) for i in range(1, 101)
]))

In other words, there is a very obvious hard cap on the amount of time one can reasonably take to solve the problem, which is why solutions like the “FizzBuzz Enterprise Edition” or this recently made web-scraped FizzBuzz solution are obviously absurd jokes.

A lot of interview questions fail at this. They may have minimally acceptable solutions that take an hour or two, but it’s possible to imagine spending 50 hours or more to get a solution.^[2] Homework assignments like this are not just selecting for people who can solve the problem but for people who have way too much free time on their hands. Some employers may justify having homework problems like this because it filters for people who really really want to join your company. I’m not super convinced by this: if someone is unemployed they’ll work anywhere that pays well, and if someone is currently employed and willing to spend time on homework instead of relaxing, then that itself is a strong signal they want to work at your company.

The second reason why FizzBuzz is so good is that we know what it does: we can explain exactly how FizzBuzz filters job seekers (it is a high sensitivity process that eliminates people who can’t code) and that the filtering itself is reasonable. In particular, we can answer the following two questions without any ambiguity:

Sensitivity: If someone fails FizzBuzz, are they a bad candidate? (A: Yes.)
Specificity: If someone passes FizzBuzz, are they a good candidate? (A: No, not necessarily.)

(I do think employers should implement more sensitive filters and I am biased toward favoring sensitive processes such as FizzBuzz for personal reasons,^[3] but that’s not what makes FizzBuzz good per se.)

A lot of homework problems fail at filtering for good candidates in various ways:

Testing for highly specific subject matter knowledge. If you’re hiring for a data scientist role for a medical research job, ask yourself: if someone didn’t know what a proportional-hazards model is, would that be a good reason to disqualify them? Or could someone just learn what that is very quickly and do fine on the job? A lot of the prior knowledge that employers seem to filter on is stuff that one could easily learn in an hour, such as a specific model that’s a spin on OLS or domain knowledge with a specific dashboard.
Not allowing for different approaches. If you want a diversity of backgrounds on your team– epidemiologists, economists, computer scientists, physicists– make sure that when evaluating answers, you consider that people from different backgrounds will show various strengths and may not answer it the way you expected.
Current employees would fail the test. If you consider your team of employees to be good and you want more candidates like your current employees, you might want to make sure your current employees can actually pass the test.
Pay is not commensurate with problem difficulty. At this point, are you actually seriously searching for someone to fill a chair in your office, or are you playing the job boards like the scratch lotto?

I’m not going to tell employers what their priorities here should be. It’s possible that, yes, you really do want someone who knows proportional hazards models like the back of their hand, so you unabashedly test for this. Or that you really do need a super-genius who can answer extremely hard problems, but you don’t have the budget for a super-genius, so you want to do the “job boards as scratch lotto” thing.

But I think it is more likely that employers are not thinking very carefully about what they are actually filtering for. I think it is likely that tech people approach question writing by thinking to themselves, “my ideal candidate would be able to answer all these questions correctly,” instead of thinking more carefully about sensitivity, specificity, and about whether they should try to aim for “good enough” candidates who may have a couple small gaps in their subject matter expertise that can be filled in a day or two. In the language of data science, you might say that employers should try regularizing their processes to avoid overfitting.

Working as Intended

I think I’ve laid out a pretty good framework for thinking about interview processes as a classification problem where each step is a filter that may have false positives and false negatives. This framework should let you understand whether your process is doing what you intend it to do. I think it is clear that many homework assignments fail in a lot of ways, either by selecting for candidates with too much free time, or by selecting for candidates on criteria that don’t correspond with a general idea of what people consider competence and ability to excel at a job.

“Employers make a lot of silly unforced errors when designing hiring processes” sounds a bit cynical, I admit. But there is a much more cynical approach, which is to simply assume that the employers are designing these processes exactly as intended, but that what is intended is not to hire the best, most competent employees.

One of the odder things to happen during the job search was that someone who admires my blog reached out to me, interviewed me, rejected me from the job, and then gave me some one-on-one advice on how to interview better. I want to be clear: I am very thankful that they took the time out of their day to do this; they certainly did not need to. But what was odd is that none of the advice they gave me was on how to be a better employee, like “you answered this question wrong, here is the right answer” or “you were very rude; good employees are not rude.” I’m not even sure I got the vibe that they thought I would be incompetent or a bad employee; at the end of it all they said I’d be a good employee somewhere else. All of their advice was very clearly presented in the context that I should be playing a game with the interviewer.

In other words, if I take their advice to heart and if their advice is accurate, I would become a better interviewee but absolutely none of it would make me a better employee. Of course, if you think about job interviews as filtering for the quality of employees, it doesn’t make sense to receive this advice from someone who literally just interviewed me.

So what are employers filtering for, if not for competence at one’s role as it is typically imagined? I think the answer is that a lot of the stranger aspects of interview processes are about finding the subset of candidates who are “good enough,” and then among those candidates, finding out which one will be a combination of most fun to hang out with, most unquestioningly subservient to the company, and easiest to rip off in terms of pay. I don’t imagine that many employers actually imagine themselves as doing what I described– they use euphemisms such as “cultural fit” and “agreeableness” to describe what they are looking for instead. So although it has all the contours of a grand conspiracy, it’s all quite banal and out in the open.

This combination of three things can explain why so many interview questions are ostensibly bad at filtering for competence. Consider the 4-point list from the prior section, and how the cynical view explains all the phenomena listed above:

Testing for highly specific subject matter expertise, like whether someone knows a dashboard, implies employers deliberately do not want to pay for someone who is smart enough to figure things out quickly.
Not allowing for different approaches, like hiring only a team of people with a specific major, is about hiring people who are most likely to get along amicably (at the expense of bringing in a diversity of perspectives to build a better product).
If current employees would fail the test, the employer may be trying to hire someone to be the subservient office lackey.
If the pay is not commensurate with difficulty, the employer may be trying to rip off someone who has low expectations and low self-confidence.

I’m trying to avoid needless invective and I’m trying to avoid interpreting my personal experiences in a self-aggrandizing manner (it is always better from a self-help perspective to focus on what you can improve). But the more interviews I go through, the harder it is for me to remain credulous that many employers are in fact selecting for candidates in what we may consider to be a reasonable and fair way designed to get a competent candidate at commensurate pay.

Footnotes

Footnote 1: “N is stable” means that we are discounting intertemporal effects of long, drawn-out recruiting processes. One challenge in hiring is that if you interview someone in the process who seems good, but it’s early on in the process, you might want to reject them if you think you can do better by searching longer, i.e. by increasing the candidate pool N. I am assuming that all available candidates are in the pool and no candidate’s ability to be measured or hired is dependent on time.

Footnote 2: I have no personal qualms with this: on a personal level, as someone who is currently unemployed, I benefit from processes like this.

Footnote 3: High sensitivity processes are more “fair” than high specificity processes, even if employers really like high specificity processes. Everyone who deserves to have a chance in a high sensitivity process gets to have that chance, and you get to expand your pool to include a lot of oddball candidates who are more often denied jobs because they can’t pass a low-cost filter. High specificity processes are often signalling games, e.g. whether you hold the right degree or have the right job title or worked with a very specific tool at your last job.

To make a good argument for why employers should not be so afraid of processes that are highly sensitive, we need to move away from the binary classification of “good” or “bad” and think more about ternary classifications: “great,” “good,” and “bad.” In practice, most of the best homework assignments are highly sensitive to “great” and “not great” while also being highly specific to “bad” and “not bad.” In other words, all the great candidates are retained, all the bad candidates are eliminated, and some indeterminate number of “good” candidates is retained. A process like this would not only be perceived as reasonably fair, but it’s something most employers would probably really want to have.
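
A rough sketch of that ternary idea, with invented tiers and outcomes (the `pass_rate` helper and the numbers are hypothetical, chosen only to illustrate the shape of such a filter):

```python
def pass_rate(candidates, label):
    """Share of candidates in the given tier who pass the assignment."""
    outcomes = [passed for tier, passed in candidates if tier == label]
    return sum(outcomes) / len(outcomes)

# Hypothetical results of a homework assignment, as (tier, passed) pairs.
results = (
    [("great", True)] * 3
    + [("good", True)] * 4 + [("good", False)] * 4
    + [("bad", False)] * 5
)

# Every "great" candidate passes, no "bad" candidate passes,
# and some share of the "good" candidates passes.
for tier in ("great", "good", "bad"):
    print(tier, pass_rate(results, tier))
```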
