Statistical inference is centered on using random samples effectively. To draw a sample at random, one has to ensure that each observation is drawn independently from the same population. Such a sample is said to be independent and identically distributed (iid), and this property is key to the Law of Large Numbers, which ensures that the mean of a sample converges to the true mean of the population as the sample size grows.
Practically, this implies that when an analyst studies a random sample and evaluates its properties, they can expect those properties to hold for the underlying population as well. This simple concept underlies the vast majority of data analysis, from surveys to AB testing.
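To make this concrete, here is a minimal sketch of the Law of Large Numbers in action. The choice of distribution, its mean of 30, and the sample sizes are arbitrary illustrations, not anything from the text: the point is simply that the sample mean of iid draws settles toward the population mean as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population with a known true mean of 30 (e.g., spend in dollars)
true_mean = 30
samples = rng.gamma(shape=2.0, scale=15.0, size=100_000)  # iid draws; gamma(2, 15) has mean 30

# The running sample mean approaches the population mean as the sample grows
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>7,}: sample mean = {samples[:n].mean():.2f}  (true mean = {true_mean})")
```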
Selection bias occurs when the sample is no longer random, which means its characteristics can be systematically better (positive selection) or worse (negative selection) than those of the overall population. Here are some examples of selection bias that one may encounter in daily life and as a professional analyst or data scientist:
There is one specific case of selection bias called adverse selection that was brilliantly formalized by Nobel Prize winner George Akerlof in his 1970 paper, The Market for "Lemons": Quality Uncertainty and the Market Mechanism. The paper describes how information asymmetry can lead to the collapse of the used-car market. The dynamic is the following:
Sellers know the true quality of their cars while buyers do not, so buyers are only willing to pay the value of the median car, 0.5. In such a market, only sellers with "lemons" (true value below 0.5) are willing to sell, as any seller with a good car worth more than 0.5 does not want to sell at the median price. Hence, the selection of cars offered in this market is not representative; it is biased towards the lemons, whose true value is even below the median value of 0.5. Such a market will collapse, as the median price keeps falling until no transactions are made. George Akerlof, Michael Spence, and Joseph Stiglitz jointly won the 2001 Nobel Memorial Prize in Economic Sciences for their analyses of markets with asymmetric information.
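As a rough illustration of this unraveling, here is a minimal simulation sketch. It assumes car quality is uniform on [0, 1] and that in each round buyers offer the average (here equal to the median) value of the cars still for sale; these are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed setup: car quality (true value) is uniform on [0, 1]; only sellers observe it
quality = rng.uniform(0, 1, size=100_000)

price = quality.mean()  # buyers initially offer the value of the average car, ~0.5
for round_ in range(1, 11):
    offered = quality[quality <= price]   # owners of better cars withdraw from the market
    if offered.size == 0:
        print(f"round {round_}: no cars left on offer -- the market has collapsed")
        break
    print(f"round {round_}: price={price:.3f}, cars offered={offered.size}, "
          f"avg quality={offered.mean():.3f}")
    price = offered.mean()                # buyers re-price to the average of what remains
```

Each round the price roughly halves and better cars keep exiting, so both price and average quality shrink toward zero.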
The laureates' work also gave rise to the economic concept of signaling, which can help attenuate information asymmetry and prevent markets from collapsing. For example, owners of good cars could offer warranties to signal that they have a good car, and the "used car with warranty" sample would be a positive selection of good cars.
Beyond the previously owned car market, both asymmetric information and signaling as a way to mitigate it are widespread in business. Some examples include:
When developing digital products, one should consider how to create incentives and costs, or friction, that encourage positive selection, and how to create mechanisms for signaling. When analyzing test results, survey data, and customer behavior, product analysts, customer insights researchers, and data scientists should evaluate whether their data could be biased by positive or adverse selection.
Selection bias is pervasive in product analytics and digital businesses, both in how we design the product and the business and in common methodologies of survey and product data analysis, including incentive programs and rollout designs with AB testing.
Incentive programs include refer-a-friend discounts or gift cards, sign-up miles, cashback, and points. These programs should be designed to stimulate positive selection and encourage new customer acquisition. However, the quality of the customers acquired will largely depend on whether the incentives attract the desired type of customer. Let's look at two different examples while considering selection bias.
A health and fitness app calculated that its average customer lifetime value (LTV) is $30, and since it is in a growth phase, it is willing to pay up to $25 in customer acquisition cost (CAC) to increase its customer base. The marketing department designs a new campaign offering a $25 cash voucher to any customer who refers a friend, paid when that friend signs up for a free trial. The program is extended to the entire customer base and advertised on the website, with no limit on the number of friends a customer can refer. Organic sign-ups average 1,000 per day, and the marketing team expects 5X that rate for one week.
The campaign is wildly successful, and 100,000 new customers sign up in a week. The company has to use its entire marketing budget for the semester to pay the $2.5M in cash vouchers versus the expected $875K, but is overall very happy with the campaign. Within three months, when the free trial ends, 90% of the new customers cancel their membership, while only the remaining 10% continue with the paid subscription.
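Using only the figures quoted above, a back-of-the-envelope calculation shows how far the realized acquisition cost drifts from the $25 CAC target; the "effective CAC per paying customer" is a derived quantity for illustration, not one stated in the example.

```python
ltv = 30               # average customer lifetime value ($)
voucher = 25           # referral payout per new sign-up ($)

expected_signups = 5 * 1_000 * 7          # 5x the 1,000 organic sign-ups/day, for one week
actual_signups = 100_000
paid_conversion = 0.10                    # share still paying once the free trial ends

expected_cost = expected_signups * voucher        # $875,000 budgeted
actual_cost = actual_signups * voucher            # $2,500,000 spent
paying_customers = actual_signups * paid_conversion
cac_per_paying = actual_cost / paying_customers   # cost per customer who actually pays

print(f"budgeted: ${expected_cost:,}, spent: ${actual_cost:,}")
print(f"effective CAC per paying customer: ${cac_per_paying:,.0f} vs. an LTV of ${ltv}")
```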
This example shows in practice how incentives can change the distribution of the customer base. The campaign was designed around the historical average LTV. For that average to hold, the new customers signing up need to have the same characteristics as the historical customer base, that is, to be drawn at random from the same population. Instead, the cash voucher created an incentive that substantially changed the customer base: it heavily attracted discount seekers, who will go above and beyond to sign up new friends to get the voucher. Their friends are also likely to be strong discount seekers and to refer friends of their own, creating a flywheel of sign-ups from that specific customer type. On the other hand, for most paying recurring customers the $25 voucher might not be enough to justify the hassle of signing up new friends; those who did go through the process referred friends similar to themselves, who also became paying customers but were unlikely to recruit further friends, hence not creating any flywheel.
This is a typical example of adverse selection in incentive programs, driven by two quantities: the probability of referring a friend, and the correlation between the LTV of the referring customer and the LTV of the referred friend. In this example, discount-seeking customers are more likely to refer new friends who also have low LTV, hence increasing the volume of sign-ups while decreasing the LTV per sign-up.
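The mechanism can be sketched with a small simulation. The LTV distribution and the negative relationship between LTV and referral propensity below are illustrative assumptions, not estimates from the example, but they reproduce the pattern described: the referred cohort's average LTV ends up well below the population average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical customer base with LTV around the $30 historical average (right-skewed)
n = 100_000
ltv = rng.gamma(shape=2.0, scale=15.0, size=n)

# Assumption: low-LTV discount seekers are far more likely to refer a friend,
# and referred friends tend to resemble the customer who referred them
refer_prob = np.clip(0.30 - 0.005 * ltv, 0.01, None)
referrers = rng.random(n) < refer_prob
friend_ltv = ltv[referrers] * rng.normal(1.0, 0.2, size=referrers.sum())

print(f"population mean LTV:      ${ltv.mean():.2f}")
print(f"referred-cohort mean LTV: ${friend_ltv.mean():.2f}")   # noticeably lower
```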
Some tips on how to design programs that attract similar customers and limit losses for the business:
Obviously, these examples also increase friction and can limit the success of the campaign; testing different approaches will help you fine-tune the best one for your audience.
A common practice in product analytics is to phase the rollout of a new product in order to learn and iterate quickly before the full launch, which may include a go-to-market campaign. This is often designed as an AB test, with the treatment group, selected at random, seeing the new product and the control group not seeing it. This is a useful practice to detect bugs, monitor execution quality, and closely evaluate customer response, for example by analyzing CX contacts or through social listening.
Beyond that, you should consider any other product or business metric to be potentially affected by selection bias, even if the rollout was designed as an AB test with random eligibility for the new product. The reason is that while eligibility was assigned at random, adoption is a form of self-selection, and customers who sign up for, buy, or use a product when eligible might not be the same as the average eligible customer. Specifically, customers who are on the lookout for new products and sign up immediately after launch differ from less responsive customers. Whether this is positive or adverse selection will depend on the product.
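A quick sketch of this effect, under purely illustrative assumptions about how engagement drives adoption: eligibility is assigned at random, yet early adopters differ systematically from the eligible group they were drawn from (here they skew toward more engaged customers; the direction of the bias depends on the product, as the examples below discuss).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical eligible population from a randomized rollout
n = 200_000
engagement = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # baseline engagement score
eligible = rng.random(n) < 0.5                            # treatment eligibility is random

# Assumption: adoption is self-selected -- more engaged customers adopt sooner
adopt_prob = np.clip(0.05 * engagement, 0.0, 1.0)
adopters = eligible & (rng.random(n) < adopt_prob)

print(f"avg engagement, eligible group: {engagement[eligible].mean():.2f}")
print(f"avg engagement, early adopters: {engagement[adopters].mean():.2f}")  # higher
```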
On average, first adopters of credit products are the most credit-hungry customers and will likely have higher risk than average, an adverse selection relative to the overall population. Customers who evaluate their credit needs and options over time, and apply for a new credit product after long consideration, will likely have better financial planning skills, which generally correlates with lower risk.
Outside of credit, most digital products see a positive selection bias in their early adopters. The customers who sign up at release are usually enthusiasts and heavy users in that industry, pioneers of new products and technology who are always looking for what is new out there. These customers tend to engage and purchase more, and churn less, than subsequent groups, which has led to early cohorts also being called the "golden cohort", especially in the gaming industry. This implies that first adopters are not only different from the eligible population, but may also differ from subsequent adopters. Overall, early performance during rollout is likely not representative of long-term product performance. This is an example of positive selection bias, in which the best customers sign up first, making the first cohorts a poor guide to average customer behavior. A rule of thumb is to continue measuring business metrics at the cohort level for weeks or months until they stabilize.
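One way to apply that rule of thumb is to track a key metric per sign-up cohort and watch for it to flatten. The data below are simulated, with an assumed "golden cohort" effect that decays over the first few weeks; the exact numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated rollout: paid-retention rate by weekly sign-up cohort,
# with an assumed "golden cohort" effect that fades after the first few weeks
weeks = np.arange(1, 13)
signups = rng.integers(800, 1_200, size=weeks.size)
true_retention = 0.45 + 0.30 * np.exp(-0.5 * (weeks - 1))   # decays toward a stable 0.45
retained = rng.binomial(signups, true_retention)

cohorts = pd.DataFrame({"signup_week": weeks, "signups": signups, "retained": retained})
cohorts["retention_rate"] = (cohorts["retained"] / cohorts["signups"]).round(3)
print(cohorts)   # retention drifts down from the early cohorts, then stabilizes
```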
Understanding how self-selection can lead to selection bias may help you analyze and model data with more rigor, avoiding the common mistake of extrapolating conclusions to a population that is different from the sample at hand. The examples above are just some of the most common cases, but in virtually every analysis it is worth checking whether the underlying data-generating process is truly random.