Selection Bias in Product Analytics and Common Pitfalls

By Claudia Pereira Johnson

Statistical inference is centered on using random samples effectively. To draw a sample at random, one has to ensure that each observation is drawn independently from the same population. Such a sample is said to be independent and identically distributed (i.i.d.), and this property is key to the Law of Large Numbers, which ensures that the sample mean converges to the true population mean as the sample grows.
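
To see this in practice, here is a minimal sketch (using NumPy; the exponential "customer spend" distribution is just an illustrative assumption) showing the sample mean converging to the population mean as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative population: customer spend drawn from an exponential
# distribution with a true mean of $30.
true_mean = 30.0

for n in [10, 100, 1_000, 10_000, 100_000]:
    sample = rng.exponential(scale=true_mean, size=n)  # i.i.d. draws
    print(f"n={n:>7,}  sample mean = {sample.mean():6.2f}  (true mean = {true_mean:.2f})")
```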

Practically, this implies that when an analyst evaluates the properties of a random sample, they can expect those properties to hold for the underlying population as well. This simple concept underpins the vast majority of data analysis, from surveys to A/B testing.

Selection bias occurs when the sample is no longer random, which implies that its characteristics can be better (positive selection) or worse (negative selection) than those of the overall population. Here are some examples of selection bias that one may encounter in daily life and as a professional analyst or data scientist:

  • Analyses of credit card usage: these are conditioned on customers being approved for a credit card, a positive selection. One cannot infer much about the overall population or about what would have happened for applicants that were not approved. In technical terms, the data is censored.
  • Surveys after a purchase: customers that were already aware of a store, considered a product and made a purchase have inherently different characteristics from customers that were not aware, or were aware but did not consider buying. Any conclusions from these surveys apply only to the population of actual buyers and cannot be extended to the total addressable market.
  • Studies on the academic achievement and income of kids that were breastfed past one year of age: kids that were breastfed for a year or more were likely born to mothers that were healthy, had more flexible work schedules or stayed at home, and were well informed about the benefits of breastfeeding. These characteristics create a positive selection that correlates with other drivers of academic achievement, such as income, parental education and access to information. In technical terms, these are called “confounding variables.” Any inference on the effect of breastfeeding past one year of age cannot be extended to kids whose parents don’t share those characteristics.
  • Friends that come to a wedding: when organizing a wedding, the couple has an overwhelmingly large set of options to consider, from an intimate courthouse ceremony on a weekday to a fancy destination wedding overseas. Each choice creates different incentives and costs for the invitees, who will self-select based on those incentives. A daytime weekend celebration in town will likely receive more guests with children than an evening reception, a weekday celebration will likely be missed by those with less job flexibility, and a black-tie destination affair will likely be attended by those with more discretionary income who value travel and fancy events. At any friend’s wedding, the attending guests are a biased selection of the couple’s true social and family ties.

Adverse Selection and the Lemons Market

There is one specific case of selection bias, called adverse selection, that was brilliantly formalized by Nobel laureate George Akerlof in his 1970 paper, "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism." The paper describes how information asymmetry can lead to the collapse of the used car market. The dynamic is the following:

  • The buyer doesn’t know whether a given car is a lemon or a good car. If the quality of cars is valued from 0 to 1, a buyer will be willing to pay at most 0.5, the expected (average) value.
  • Sellers, however, know the true value of their cars, so they are only willing to sell if the price offered by the buyer is higher than that true value.

In such a market, only sellers that have “lemons” (with true value below 0.5) are willing to sell, as sellers with good cars worth more than 0.5 do not want to sell at that price. Hence, the selection of cars offered in this market is not representative; it is biased towards the lemons, whose average value is even below 0.5. Buyers anticipate this and lower their offers, which pushes even more sellers out, so the price keeps falling until no transactions are made and the market collapses. George Akerlof, Michael Spence and Joseph Stiglitz jointly won the 2001 Nobel Memorial Prize in Economic Sciences for their analyses of markets with asymmetric information.
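
A rough simulation of this unraveling (a sketch only, assuming car quality is uniform on 0 to 1 and that buyers always offer the average quality of the cars still on the market):

```python
import numpy as np

rng = np.random.default_rng(0)
quality = rng.uniform(0, 1, size=10_000)  # true quality of each seller's car

price = quality.mean()  # buyers' first offer: the average quality of all cars
for round_number in range(1, 11):
    on_market = quality[quality <= price]  # only sellers whose car is worth <= price stay
    if on_market.size == 0:
        print(f"round {round_number}: no cars left, the market has collapsed")
        break
    price = on_market.mean()  # buyers re-price to the average of what remains
    print(f"round {round_number}: cars on market = {on_market.size:>5}, offered price = {price:.3f}")
```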

Their work expanded into the economic concept of signaling, which can help attenuate information asymmetry and prevent markets from collapsing. For example, owners of good cars could offer warranties to signal their quality, and the “used car with warranty” sample would be a positive selection of good cars.

Beyond the used car market, both asymmetric information, and signaling as a way to mitigate it, are pervasive in business. Some examples include:

  • Education, especially higher education, signals to the labor market that an otherwise unknown person might be more productive or skilled in that specific field than the average population, so employers are willing to pay more for candidates with that degree
  • Willingness to go through medical exams and health questionnaires limits the information asymmetry between the insurance company and the applicant, signaling that the applicant is in good health and allowing insurance companies to price policies based on the individual’s risk
  • The share of ownership a founder keeps before an IPO signals to the market that the company has long-term value, as the founder, who has much more information about the company’s true value than potential investors, has tied most of their net worth to that asset

Applications of Selection Bias and Signaling in Product Analytics

When developing digital products, one should consider how to create incentives and costs (friction) that encourage positive selection, and how to create mechanisms for signaling. When analyzing test results, survey data and customer behavior, product analysts, customer insights researchers and data scientists should evaluate whether their data could be biased by positive or adverse selection.

Selection bias is pervasive in product analytics and digital businesses, both in how we design the product and the business and in common methodologies of survey and product data analysis, including incentive programs and rollout designs with A/B testing.

Incentive programs

Incentive programs include refer-a-friend discounts or gift cards, sign-up miles, cashback and points. These programs are designed to stimulate positive selection and encourage new customer acquisition, but the quality of the customers acquired will largely depend on whether the incentives attract the desired type of customer. Let’s look at an example with selection bias in mind.

Refer-a-friend incentive program case

A health and fitness app calculated that its average customer lifetime value (LTV) is $30 and, being in its growth phase, it is willing to pay up to $25 in customer acquisition cost (CAC) to grow its customer base. The marketing department designs a new campaign offering a $25 cash voucher to any customer who refers a friend, paid when that friend signs up for a free trial. The program is offered to the entire customer base and advertised on the website, with no cap on the number of friends a customer can refer. Organic sign-ups average 1,000 per day, and the marketing team expects 5x that volume for one week.

  • Expected return of the campaign per customer: LTV - CAC = $30 - $25 = $5
  • Expected volume of sign-ups for the week: 1,000 * 5 * 7 = 35,000
  • Marketing budget (upfront investment): number of sign-ups * voucher = 35,000 * $25 = $875,000

The campaign is wildly successful and the app signs up 100,000 new customers in a week. The team has to use its entire marketing budget for the semester to pay the $2.5M in cash vouchers versus the expected $875K, but is overall very happy with the campaign. Within three months, when the free trial ends, 90% of the new customers cancel their membership, while the remaining 10% continue on to the paid subscription.

  • LTV of the 90% of customers that dropped: $0
  • LTV of the 10% of customers that stayed as paying members: $30
  • CAC per customer: $25
  • Marketing cost: $25 * 100,000 = $2.5M
  • Net return of the campaign per customer: 90% * $0 + 10% * $30 - $25 = -$22
  • Net return of the campaign (total): new customers acquired * net return per customer = 100,000 * (-$22) = -$2.2M
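
The same arithmetic written as a small helper (the function and its arguments are a sketch of this example, not a standard formula) makes it easy to compare the planned and realized economics side by side:

```python
def campaign_return(signups, voucher, ltv_mix):
    """Net return of a referral campaign.

    ltv_mix: list of (share_of_signups, ltv) pairs describing the acquired customers.
    """
    blended_ltv = sum(share * ltv for share, ltv in ltv_mix)
    net_per_customer = blended_ltv - voucher
    return {
        "marketing_cost": signups * voucher,
        "net_per_customer": net_per_customer,
        "net_total": signups * net_per_customer,
    }

# Plan: 35,000 sign-ups, all worth the historic average LTV of $30.
print(campaign_return(35_000, 25, [(1.0, 30)]))
# {'marketing_cost': 875000, 'net_per_customer': 5.0, 'net_total': 175000.0}

# Reality: 100,000 sign-ups, 90% churn at the end of the trial (LTV of $0).
print(campaign_return(100_000, 25, [(0.9, 0), (0.1, 30)]))
# {'marketing_cost': 2500000, 'net_per_customer': -22.0, 'net_total': -2200000.0}
```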

This example shows in practice how incentives can change the distribution of the customer base. The campaign was designed around the historic average LTV, and for that average to hold, the new customers signing up would need to have the same characteristics as the historic customer base, that is, to be drawn at random from the same population. What happened instead is that the cash voucher created an incentive that substantially changed the customer mix, heavily attracting discount seekers, who will go above and beyond to sign up new friends to get the voucher. Their friends are also likely to be strong discount seekers and to refer their own friends in turn, creating a flywheel of sign-ups from that specific customer type.

For most recurring paying customers, on the other hand, the $25 voucher might not be enough to justify the hassle of signing up new friends. The ones that did go through the process signed up friends similar to themselves, who also became paying customers but were unlikely to recruit further friends, hence not creating any flywheel.

This is a typical example of adverse selection in incentive programs, driven by two factors: the probability that a customer refers a friend, and the correlation between the LTV of the referring customer and the LTV of the referred friend. In this example, discount-seeking customers are more likely to refer friends who also have low LTV, increasing the volume of sign-ups while decreasing the LTV per sign-up.
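
A quick simulation shows how these two drivers play out (the segment shares, LTVs and referral rates below are made-up numbers, chosen only to mirror the story above):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical segments: discount seekers have low LTV but refer aggressively.
ltv = {"discount_seeker": 3, "regular": 30}
refer_prob = {"discount_seeker": 0.50, "regular": 0.02}
same_type_prob = 0.9  # referred friends usually resemble the referrer

base = rng.choice(["discount_seeker", "regular"], size=100_000, p=[0.2, 0.8])

referred = []
for segment in base:
    if rng.random() < refer_prob[segment]:
        if rng.random() < same_type_prob:
            referred.append(segment)
        else:
            referred.append("regular" if segment == "discount_seeker" else "discount_seeker")
referred = np.array(referred)

print("average LTV of the existing base:    ", np.mean([ltv[s] for s in base]))
print("average LTV of the referred cohort:  ", np.mean([ltv[s] for s in referred]))
print("discount-seeker share among referred:", (referred == "discount_seeker").mean())
```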

Some tips on how to design programs that attract customers similar to your existing base and limit losses for the business:

  • Segment your population and create different programs for different groups. In the example above, instead of advertising on its website, the app could have designed the refer-a-friend campaign with different incentives, such as offering the voucher only to paying customers, and offering one additional free month to customers still in the first three months of the free trial.
  • Cap the incentive per customer, limiting the risk of gaming the program (e.g. creating fake accounts to refer)
  • Cap the total number of vouchers to fit your marketing budget: “the first 10,000 customers that refer a friend will receive a $25 voucher”
  • Delay the payout or tie it to the behavior you want to incentivize: “once your friend becomes a paying member, you will receive a $25 voucher”
  • Limit cash incentives; offer instead rewards that encourage further use of your product, such as premium versions, loyalty status, free shipping, etc.

Of course, these measures also increase friction and can limit the reach of the campaign; testing different approaches will help fine-tune the best one for your audience.

Early adopters and the golden cohorts

A common practice in product analytics is to phase the rollout of a new product in order to learn and iterate quickly before the full launch, which may include a go-to-market campaign. The rollout is often designed as an A/B test: the treatment group, selected at random, sees the new product, and the control group does not. This is a useful practice to detect bugs, monitor execution quality, and closely evaluate customer response, for example by analyzing CX contacts or through social listening.

Beyond that, you should expect most other product and business metrics to be affected by selection bias, even when the rollout was designed as an A/B test with random eligibility for the new product. The reason is that while eligibility was random, adoption is a form of self-selection: customers who sign up, buy or use a product when eligible might not look like the average eligible customer. Specifically, customers who are on the lookout for new products and sign up immediately after launch are different from less responsive customers. Whether this is positive or adverse selection will depend on the product.
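
To make this concrete, here is a toy simulation (the engagement score and adoption rule are invented purely for illustration) in which eligibility is random but adoption skews toward more engaged customers, so the adopters' average no longer reflects the eligible group:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical eligible group: underlying engagement score varies by customer.
engagement = rng.normal(loc=50, scale=15, size=100_000)

# Adoption is self-selected: more engaged customers are more likely to try the new product.
adopt_prob = 1 / (1 + np.exp(-(engagement - 60) / 10))  # logistic in engagement
adopted = rng.random(100_000) < adopt_prob

print("average engagement, all eligible customers:", engagement.mean().round(1))
print("average engagement, adopters only:         ", engagement[adopted].mean().round(1))
```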

On average, first adopters of credit products are the most credit-hungry customers and will likely carry higher risk than average, which amounts to adverse selection relative to the overall population. Customers who evaluate their credit needs and options over time, and apply for a new credit product after long consideration, will likely have better financial planning skills, which generally correlates with lower risk.

Outside of credit, most digital products see positive selection bias in their early adopters. The customers who sign up at release are usually enthusiastic, heavy users in that category, pioneers of new products and technology who are always looking for what is new. These customers tend to engage and purchase more, and churn less, than subsequent groups, which is why early cohorts are also called the “golden cohort,” especially in the gaming industry. This implies that first adopters are not only different from the eligible population, they may also be different from subsequent adopters. Overall, early performance during rollout is likely not representative of long-term product performance: the best customers sign up first, making the first cohorts less indicative of average customer behavior. A rule of thumb is to keep measuring business metrics at the cohort level for weeks or months until they stabilize.
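
A lightweight way to follow that rule of thumb is to track key metrics by sign-up cohort rather than in a single aggregate, for example with a pandas pivot (the file name and column names here are hypothetical placeholders):

```python
import pandas as pd

# Assumed columns: customer_id, signup_week, weeks_since_signup, is_retained
events = pd.read_csv("subscription_events.csv")

# Retention by sign-up cohort and tenure: each rollout cohort gets its own row,
# so the golden early cohorts are not blended into one aggregate curve.
retention = events.pivot_table(
    index="signup_week",
    columns="weeks_since_signup",
    values="is_retained",
    aggfunc="mean",
)
print(retention.round(2))
```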

Final thoughts

Understanding how self-selection can lead to selection bias may help you analyze and model data with more rigor, avoiding the common mistake of extrapolating conclusions to a population that differs from the sample at hand. The examples above are just some of the most common cases, but in virtually every analysis it is worth checking whether the underlying data generating process is truly random.

 
