Running experiments is a task that often falls to data scientists. If that’s you, congrats! It can be a rewarding and high-impact area of work, but it also requires tools found outside the typical ML-heavy data science curriculum.
Even with the best tools, only a small percentage of experiments deliver meaningful business value. I’ve been lucky to design and run many experiments. Of those, I’ve had a few winners. From those, I’m sharing some stories to illustrate key concepts related to experiments.
Background: I work at a company called IntelyCare. We help connect nurses with various work opportunities (full-time, part-time, contracts, per-diem… the whole menu).
- One of our core offerings is a nursing-only job board. If you take a look in the year 2025, you’ll find two ways to sort jobs: by date and by relevance.
Why it matters: The sort-by-relevance feature is our current best lever to ensure a good experience for paying customers. It also gives us an opportunity to improve the overall efficiency of our job board by steering eyeballs away from low-quality jobs.
Unfortunately, we can’t put every job at the top of a search result. We face a tradeoff between the quantity of top-page listings and the quality of the experience, measured as increased applies.
How it works: “Relevance” doesn’t mean what it usually means. Sorry!
We give each job a score between 0 and 100. When filling a page with jobs, sorting by relevance means we sort the results by that score. That’s it! For brevity, we’ll say any job with a score higher than 0 is “boosted.”
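To make the sort concrete, here is a minimal Python sketch. The `Job` class, its field names, and the choice to break ties (for example, among all the unboosted jobs sitting at 0) by posting date are assumptions for illustration, not our production code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Job:
    job_id: str
    posted: date
    score: int  # relevance score in [0, 100]; identical for every job-seeker and search term

    @property
    def boosted(self) -> bool:
        # "Boosted" is shorthand for any score above the default of 0
        return self.score > 0


def sort_by_relevance(jobs: list[Job]) -> list[Job]:
    """Highest score first; ties fall back to newest first (an assumed tie-break)."""
    return sorted(jobs, key=lambda j: (j.score, j.posted), reverse=True)


jobs = [
    Job("a", date(2025, 1, 3), 0),
    Job("b", date(2025, 1, 1), 100),
    Job("c", date(2025, 1, 2), 0),
]
print([j.job_id for j in sort_by_relevance(jobs)])  # ['b', 'a', 'c']
```

The oldest job lands on top purely because of its score, which is the whole point: page position is something we choose, not something the job-seeker’s query decides.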
I know what you’re thinking, “This isn’t relevance!” And you’re right, at least in the normal sense of the word. The score doesn’t vary across job-seekers or search terms. A better name would be “relevant to Google.” We’re OK with that because a huge share of our job-board traffic comes from Google, as shown below.
In Math: We have N jobs. Every day we generate a vector of N integers between 0 and 100. We feed this vector into a black box named Google. If we do a good job, the black box rewards us with many job applications.
By putting the “right” jobs at the top of the page (loaded word there), we can improve upon a chronological sort. Before we can identify the right jobs, we need to know how much Google actually rewards higher-placed jobs.
Day 0: Making progress when you know nothing
Sometimes, just to justify all the simplifying assumptions I’m going to make later, I start a project by writing down the math equation I’d like to solve. I imagine ours looks something like this:
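$$S^{*} = \underset{S \in \{0, 1, \dots, 100\}^{N}}{\arg\max}\ \operatorname{applies}(S), \qquad S = (s_1, s_2, \dots, s_N)$$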
- S is our vector of relevancy scores. There are N jobs, so each s_i (an element of S) corresponds to a different job. A function called applies turns S into a scalar. Each day we’d like to find the S that makes that number as large as possible: the relevancy scores that generate the highest number of job applications for intelycare.com/jobs.
- applies is a great objective function on Day 0. Later on, our objective function might change (e.g. revenue, lifetime value). Applies are easy to count, though, and let me spend my complexity tokens elsewhere. It’s Day 0, people. We’ll come back to those questions on Day 1.
- Problem: we know nothing about the applies function until we start feeding it relevancy scores. 😱
First things first: Seeing that we know nothing about the applies function, our first question is, “How do we choose an ongoing wave of daily S vectors so we can learn what the applies function looks like?”
- We know (1) which jobs are boosted and when, and (2) how many applies each job receives each day. Note the absence of page-load data. It’s Day 0! You might not have all the data you want on Day 0, but if we’re clever, we can make do with what we have.
- Note the subtle change in our objective. Earlier, our goal was to accomplish some business objective (maximize applies), and eventually, we’ll come back to that goal. We took off our business hat for a minute and put on our science hat. Our only goal now is to learn something. If we can learn something, we can use it (later) to help achieve some business objective. 🤓
- Since our goal is to learn something, above all we want to avoid learning nothing. Remember it’s Day 0 and we have no guarantee that the Google Monster pays any attention to how we sort things. We might as well go for broke and make sure this thing works before throwing more time at improving it.
How do we choose an initial wave of daily S vectors? We’ll give every job a score of 0 (the default score), and choose a random subset of jobs to boost to 100.
- Maybe I’m stating the obvious, but it needs to be random if you want to isolate the effect of page position on job applications. We want the only difference between boosted jobs and other jobs to be their relative ordering on the page as determined by our relevance scores. [I can’t tell you how many phone screens I’ve conducted where a candidate doubled down on running an A/B test with the good customers in one group and the bad customers in the other group. In fairness, I’ve also vetted marketing-tech vendors who do the same thing 😭].
- The randomness will be useful later for other reasons, too. It’s likely that some jobs benefit from page placement more than others. We’ll have an easier time identifying those jobs with a big, randomly generated dataset.
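A minimal sketch of that Day 0 assignment, assuming jobs are indexed 0 through N-1: every entry of S starts at the default score of 0, and a uniformly random subset is set to 100. The function name `daily_scores` and the `boost_fraction` parameter are hypothetical.

```python
import random


def daily_scores(n_jobs: int, boost_fraction: float, rng: random.Random) -> list[int]:
    """Return one day's S vector: default score 0, a random subset boosted to 100."""
    n_boost = round(n_jobs * boost_fraction)
    scores = [0] * n_jobs
    for i in rng.sample(range(n_jobs), n_boost):  # uniform sample without replacement
        scores[i] = 100
    return scores


rng = random.Random(2023)  # seeded so a day's assignment can be reproduced and logged
s = daily_scores(n_jobs=20, boost_fraction=0.10, rng=rng)
print(sum(score == 100 for score in s))  # 2
```

Seeding the generator is a design choice of this sketch, not something the experiment required; it just makes it easy to reconstruct which jobs were boosted on which day when you later join boosts to applies.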
The plan: Subtle but important details
We know we can’t boost every job. Anytime I put a job at the top of the page, I bump all other jobs down the page (a classic example of a “spillover”).
The spillover gets worse as I boost more and more jobs: I impose a greater and greater penalty on all other jobs (including other boosted jobs) by pushing them down in the sort.
- With few exceptions, nursing jobs are in-person and local, so any boosting spillovers will be limited to other nearby jobs. This is important.
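The spillover is easy to see in a toy market. This sketch assumes boosted jobs all score 100, everyone else scores 0, and ties keep their chronological order; the job IDs are made up. Because nursing jobs are local, it works within a single geography.

```python
def positions_after_boost(chronological: list[str], boosted: set[str]) -> dict[str, int]:
    """Page position (0 = top) of every job in one local market after boosting.

    Python's sort is stable, so ties keep their chronological order.
    """
    ranked = sorted(chronological, key=lambda job: job not in boosted)  # boosted (False) sorts first
    return {job: pos for pos, job in enumerate(ranked)}


def spillover(chronological: list[str], boosted: set[str]) -> dict[str, int]:
    """How many slots each unboosted job drops relative to the chronological sort."""
    before = {job: pos for pos, job in enumerate(chronological)}
    after = positions_after_boost(chronological, boosted)
    return {job: after[job] - before[job] for job in chronological if job not in boosted}


market = ["j1", "j2", "j3", "j4", "j5"]  # one geography, newest first
print(spillover(market, {"j5"}))        # {'j1': 1, 'j2': 1, 'j3': 1, 'j4': 1}
print(spillover(market, {"j4", "j5"}))  # {'j1': 2, 'j2': 2, 'j3': 2}
```

Each extra boost pushes every unboosted job one more slot down, and the boosted jobs themselves now share the top block, so each one’s advantage shrinks as the block grows.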
How do we choose an initial wave of daily S vectors? (final answer) We’ll give every job a score of 0 (the default score), and choose a random subset of jobs to boost to 100. The size of the random subset will vary across geographies.
- We create 4 groups of distinct geographies with roughly the same amount of web traffic in each group. Each group is balanced along the key dimensions we think are important. We randomly boost a different percentage of jobs in each group.
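Here’s one simple way to build traffic-balanced groups and give each a different boost percentage. It’s a sketch under stated assumptions, not necessarily how we did it: it balances on traffic only (our real groups were also balanced on other key dimensions), it uses a greedy largest-first heuristic, and the geography names, traffic numbers, and boost fractions are hypothetical. Only 5%, 10%, and 25% come from the results below; 15% is a placeholder.

```python
import heapq
import random


def balanced_groups(traffic: dict[str, float], n_groups: int = 4) -> list[list[str]]:
    """Split geographies into n_groups with roughly equal total web traffic.

    Greedy heuristic: take geographies from largest to smallest and give each one
    to the group with the least traffic so far. Balances traffic only.
    """
    heap = [(0.0, g) for g in range(n_groups)]  # (traffic assigned so far, group index)
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    for geo, visits in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        total, g = heapq.heappop(heap)  # group with the least traffic so far
        groups[g].append(geo)
        heapq.heappush(heap, (total + visits, g))
    return groups


def assign_boost_fractions(
    groups: list[list[str]], fractions: list[float], rng: random.Random
) -> dict[str, float]:
    """Randomly decide which group gets which boost percentage;
    every geography inherits its group's fraction."""
    shuffled = list(fractions)
    rng.shuffle(shuffled)
    return {geo: frac for group, frac in zip(groups, shuffled) for geo in group}


# Hypothetical daily visits per geography.
traffic = {"geo_a": 900, "geo_b": 650, "geo_c": 500, "geo_d": 350,
           "geo_e": 330, "geo_f": 300, "geo_g": 250, "geo_h": 200}
groups = balanced_groups(traffic)
print([sum(traffic[geo] for geo in group) for group in groups])  # [900, 900, 800, 880]

boost_by_geo = assign_boost_fractions(groups, [0.05, 0.10, 0.15, 0.25], random.Random(7))
```

Within each geography you would then boost that fraction of jobs at random, exactly as in the single-market case.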
Here’s how it looked…

- Each black circle represents a different geography. Its height shows the difference in applies-per-job between boosted jobs and all other jobs (measured as a %).
- While the groups are balanced in aggregate, the individual geographies vary considerably. The balance is still important, though. Otherwise, what you see in the chart could be an artifact of the mix of urban/rural or large/small geographies in each group. As it is, we’re confident the results come from our relevancy scores.
- A quick-and-dirty interpretation of this chart is something like, “the 5% of jobs at the top of the page have ~26% more applies per day than the 95% of jobs placed below. The 10% of jobs at the top of the page have ~21% more applies per day than the 90% of jobs below…” and so on. I would never be so bold as to say that in real life, but in a perfect-experiment world it would be true.
- By the time we boost 25% of jobs, the boost experience is completely averaged out! We diluted the perks of premium placement to almost nothing for the median geography. “And when everyone is super, no one will be!” Can you imagine learning this the hard way?
- There are many other layers to peel back. Perhaps dilution happens more quickly for nursing specialties with many pages of listings? What about states that overlap with our long-standing per-diem staffing business? Many excellent questions; we have answers for some, but they’re more than I can cover in this post.
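The chart’s y-axis is straightforward to compute. A minimal sketch, assuming one apply count per job over the same window for each geography; the function names and every number below are made up for illustration and are not our results.

```python
from collections import defaultdict
from statistics import median


def lift_pct(boosted_applies: list[int], other_applies: list[int]) -> float:
    """Percent difference in applies-per-job, boosted jobs vs. all other jobs,
    within one geography."""
    boosted_rate = sum(boosted_applies) / len(boosted_applies)
    other_rate = sum(other_applies) / len(other_applies)
    if other_rate == 0:
        raise ValueError("no applies among unboosted jobs; lift is undefined")
    return 100.0 * (boosted_rate / other_rate - 1.0)


def median_lift_by_fraction(
    lifts: dict[str, float], boost_fraction: dict[str, float]
) -> dict[float, float]:
    """The median geography's lift for each boost percentage."""
    by_fraction: dict[float, list[float]] = defaultdict(list)
    for geo, lift in lifts.items():
        by_fraction[boost_fraction[geo]].append(lift)
    return {frac: median(vals) for frac, vals in sorted(by_fraction.items())}


print(lift_pct([5, 4, 6, 5], [4, 4, 4, 4]))  # 25.0
lifts = {"geo_a": 25.0, "geo_b": 10.0, "geo_c": 30.0, "geo_d": 0.0}
fracs = {"geo_a": 0.05, "geo_b": 0.05, "geo_c": 0.05, "geo_d": 0.25}
print(median_lift_by_fraction(lifts, fracs))  # {0.05: 25.0, 0.25: 0.0}
```

Summarizing each group by its median geography keeps one unusually large or small market from dragging the group’s number around.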
What comes next? Day 1 is when the real fun begins! 🎉
- We have guardrails against diluting our premium experience (super important), but what’s the best ~10% of jobs to boost each day? Obviously our paying customers have priority, but then what?
- Does boosting help some jobs more than others? The randomly generated data from our experiment is well suited to answer this and many other questions. We’ll save those questions for future posts.
- Once we have a strategy for boosting, is our objective really to maximize the total number of applies? Or do we only care about the applies for boosted jobs? 🤔 (Sometimes I miss the Day 0 days when all the jobs were equally relevant. Might be time to revisit those equations at the top of the post.)
Key takeaways for those who made it this far
- By being thoughtful about how we generated our initial data, we quickly found a convincing answer to our question, set ourselves up to answer many future questions, and saved ourselves a ton of time trying to build an uplift model on non-existent historical data.
- Thinking of a test? Go for it! If you execute well, you can see the results clearly in a chart and avoid all the complicated statistics (obligatory xkcd reference). [hmm, maybe *most* of the statistics. I still love a good regression table.]
- Spillovers are everywhere. Sometimes varying the treatment across aggregated groups can help, as it did here. That can quickly axe your sample size, but I find it better to have a small data set with meaning than a huge data set that’s hot garbage.
Bonus: We ran this experiment in 2023. How are things now?
At the time of our little geo-randomized experiment, you can see in the charts that our premium job openings performed ~25% better than regular jobs (meaning they had 25% more applies on average).
Why it matters: We’ve spent over a year developing and iterating on our product to make sure our premium listings deliver the best possible experience. Looking at some recent numbers… (literally running the queries as I write this)
- Boosted job openings receive 425% more applies than regular openings
- Boosted jobs are 450% more likely to receive at least one apply compared to regular openings
Not bad! This isn’t randomized, so that 425% includes all kinds of selection bias, extra product work, a crack SEO team, and a successful email operation, all in addition to the incremental effects from premium page position. Importantly, all the extra product and marketing work is concentrated on a small number of jobs, as our initial testing recommends. 🏆
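For anyone computing numbers like these, here is a sketch of the arithmetic, assuming one apply count per opening over a shared window. “X% more” means a ratio of 1 + X/100, so 425% more applies is 5.25 times as many. The function names and toy counts are hypothetical, and this arithmetic does nothing to remove the selection bias described above.

```python
def pct_more(a: float, b: float) -> float:
    """'a is X% more than b' means X = 100 * (a / b - 1)."""
    return 100.0 * (a / b - 1.0)


def mean_applies(counts: list[int]) -> float:
    return sum(counts) / len(counts)


def share_with_any_apply(counts: list[int]) -> float:
    return sum(c > 0 for c in counts) / len(counts)


def compare_openings(boosted: list[int], regular: list[int]) -> dict[str, float]:
    """Each list holds one apply count per job opening over the same window."""
    return {
        "pct_more_applies": pct_more(mean_applies(boosted), mean_applies(regular)),
        "pct_more_likely_any_apply": pct_more(
            share_with_any_apply(boosted), share_with_any_apply(regular)
        ),
    }


# Toy counts, not real data.
print(compare_openings([4, 2, 0, 6], [1, 0, 0, 0, 1, 0, 0, 2]))
# {'pct_more_applies': 500.0, 'pct_more_likely_any_apply': 100.0}
```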