Data-Driven Decision Making: Using Causal Inference for E-commerce Growth

By Ashwin Viswanathan Kannan

In the growing landscape of e-commerce, making decisions based on correlation alone is insufficient and potentially misleading. Causal inference provides a more sound methodology that identifies the true factors influencing customer behavior and business performance.
Leveraging causal inference enables data-driven businesses to make informed strategic decisions about market positioning, resource allocation, and revenue growth. Integrating causal inference into e-commerce strategy makes measurement more accurate, which can provide a substantial competitive edge and support long-term success.

Causal inference is a statistical approach that determines whether a cause-and-effect relationship exists between variables. For businesses that sell products or services online, knowing these cause-and-effect relationships is essential to making choices that help the company succeed.

This method is more helpful than just looking at simple associations because it shows what actually causes customers to behave a certain way or what leads to more sales. Understanding this can help a business improve its marketing, where it places products, and the shopping experience to make customers happier and help the business grow. In e-commerce, where decisions are often based on numbers and trends, knowing the cause of something is key.

When businesses know what causes what, they can improve their marketing and customer service and grow in a smarter way. They can tell the difference between what really causes results and what just happens to be related, which means they can predict what will happen next more accurately and make better choices. By using causal inference properly, online businesses can adjust their strategies to match what customers want and what the market is doing, leading to better results.

Correlation is not causation

Correlation indicates a relationship where two variables move together, whereas causation implies that one variable directly affects another. In e-commerce, an example of correlation vs causation could be observing that higher website traffic correlates with increased sales. However, this does not mean higher traffic causes more sales directly; a third factor, like a marketing campaign, could be driving both. Understanding the difference helps businesses identify true growth drivers rather than coincidental patterns.
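To make the confounding scenario concrete, here is a minimal simulation sketch in Python. All numbers (traffic levels, sales levels, noise) are invented for illustration and do not come from any real store. A campaign flag drives both traffic and sales, while traffic itself has no effect on sales in this toy model.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
campaign, traffic, sales = [], [], []
for _ in range(2000):                          # 2000 simulated days
    c = 1 if rng.random() < 0.5 else 0         # is the campaign running?
    campaign.append(c)
    traffic.append(1000 + 400 * c + rng.gauss(0, 50))  # campaign raises traffic
    sales.append(200 + 80 * c + rng.gauss(0, 10))      # campaign raises sales; traffic is not an input

overall = pearson(traffic, sales)

# Hold the confounder fixed: correlation within campaign days and within non-campaign days
within = {}
for c in (0, 1):
    t_c = [t for t, k in zip(traffic, campaign) if k == c]
    s_c = [s for s, k in zip(sales, campaign) if k == c]
    within[c] = pearson(t_c, s_c)

print(f"overall correlation: {overall:.2f}")
print(f"within non-campaign days: {within[0]:.2f}, within campaign days: {within[1]:.2f}")
```

Under these assumptions the overall correlation is high, while the correlation within each campaign state stays near zero. That pattern is the signature of a common cause rather than a direct effect of traffic on sales.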

Causal Inference Techniques

Causal inference draws on a variety of methodologies, including randomized controlled trials (such as A/B tests), Difference in Differences (DiD), the potential outcomes framework, propensity score matching, and instrumental variables, among others.

Randomized Controlled Trials (RCT)

A Randomized Controlled Trial (RCT) is a gold standard experimental design used in causal inference to evaluate the effectiveness of an intervention or treatment. In an RCT, participants are randomly assigned to either a treatment group, which receives the intervention being studied, or a control group, which does not receive the intervention (or receives a placebo or standard treatment). 

RCTs are highly valued in causal inference because they allow researchers to establish a causal relationship between the intervention and the outcomes with a high degree of confidence. By randomly assigning participants to treatment groups, RCTs help ensure that any differences in outcomes between the treatment and control groups are due to the intervention itself and not to other factors. 

This randomization helps control for confounding variables and minimize bias. In e-commerce, however, running randomized trials of website features or marketing tactics can be challenging because the customer base is diverse and results are easily skewed.

Ethical concerns and the risk of damaging the customer experience or company reputation further complicate these trials. Because they also demand significant resources and time, the industry frequently turns to observational studies and quasi-experimental approaches, such as propensity score matching or instrumental variables, to analyze causal relationships while mitigating these challenges.
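When a randomized trial is feasible, the analysis itself can be quite short. The sketch below simulates an A/B test in Python: visitors are randomly assigned to an arm, and the difference in conversion rates is checked with a two-proportion z-test. The conversion rates, the sample size, and the helper name ab_test are assumptions made for this illustration only.

```python
import random
from statistics import NormalDist

def ab_test(conv_c, n_c, conv_t, n_t):
    """Difference in conversion rates with a two-sided two-proportion z-test."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = (pooled * (1 - pooled) * (1 / n_c + 1 / n_t)) ** 0.5
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value

rng = random.Random(42)
TRUE_RATES = {"control": 0.10, "treatment": 0.12}   # assumed, simulation only
counts = {"control": [0, 0], "treatment": [0, 0]}   # [conversions, visitors]

for _ in range(20000):
    arm = rng.choice(["control", "treatment"])      # random assignment is the key step
    converted = rng.random() < TRUE_RATES[arm]
    counts[arm][0] += int(converted)
    counts[arm][1] += 1

lift, z, p_value = ab_test(*counts["control"], *counts["treatment"])
print(f"estimated lift: {lift:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Because assignment is random, the two arms differ only by chance in everything except the feature being tested, so the difference in conversion rates can be read as a causal effect.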

Difference in Differences (DiD)

Difference in Differences (DiD) is a quasi-experimental design used to estimate a treatment's effect by comparing the outcome changes over time between a treatment group and a control group. It's particularly effective in observational studies where randomized controlled trials are not feasible. DiD is used when there's a clear demarcation of before and after the intervention and when there are comparable groups that did not receive the intervention, allowing for the control of unobserved variables that do not change over time. It helps in isolating the treatment effect from other external factors.

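The DiD estimator can be represented as:

$$\hat{\delta}_{DiD} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)$$

where $\bar{Y}_{T,\text{post}}$ and $\bar{Y}_{T,\text{pre}}$ are the average outcomes after and before the treatment for the treatment group, and $\bar{Y}_{C,\text{post}}$ and $\bar{Y}_{C,\text{pre}}$ are the corresponding averages for the control group.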

In an e-commerce setting, DiD could estimate the impact of a new website feature by comparing the change in conversion rates before and after its implementation between users who were exposed to the feature (treatment group) and those who were not (control group).
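As a rough illustration of the website-feature example, the following Python sketch computes a DiD estimate from group averages. The conversion rates are hypothetical, the function name did_estimate is made up for this example, and the result is only meaningful if both groups would have followed parallel trends without the feature.

```python
from statistics import fmean

def did_estimate(rows):
    """Difference in Differences from (group, period, outcome) rows.

    group is 'treatment' or 'control'; period is 'pre' or 'post'.
    """
    cells = {}
    for group, period, outcome in rows:
        cells.setdefault((group, period), []).append(outcome)
    mean = {key: fmean(values) for key, values in cells.items()}
    change_treatment = mean[("treatment", "post")] - mean[("treatment", "pre")]
    change_control = mean[("control", "post")] - mean[("control", "pre")]
    return change_treatment - change_control

# Hypothetical conversion rates (%) observed before and after a feature launch
rows = [
    ("treatment", "pre", 2.0), ("treatment", "pre", 2.2),
    ("treatment", "post", 3.0), ("treatment", "post", 3.2),
    ("control", "pre", 1.9), ("control", "pre", 2.1),
    ("control", "post", 2.3), ("control", "post", 2.5),
]

effect = did_estimate(rows)
print(f"DiD estimate: {effect:.2f} percentage points")
```

Here the treatment group improves by 1.0 point and the control group by 0.4 points, so the estimate attributes the remaining 0.6 points to the feature.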

Potential Outcomes

Potential outcomes, also known as counterfactuals, represent the hypothetical outcomes that would have been observed under different treatment conditions for each unit in a study population. In the context of e-commerce, potential outcomes can help estimate the causal effect of interventions or treatments on customer behavior and business outcomes.

Let us look at an example from e-commerce to illustrate the concept of potential outcomes:

Suppose an online retailer is interested in evaluating the effectiveness of a new email marketing campaign on customer purchase behavior. Each customer in the retailer's database can be considered as having two potential outcomes:

  • $Y_i(1)$: the amount the i-th customer would spend if they receive the email marketing campaign.
  • $Y_i(0)$: the amount the i-th customer would spend if they do not receive the email marketing campaign.

The "treatment effect" for each customer is how much their behavior changes because of the email campaign. We calculate this by finding the difference between what happened when they got the email and what would have happened if they hadn't.

But in real life, we only see one outcome for each customer, depending on whether they got the email or not. So, we use a binary indicator variable $T_i$ (1 if the customer received the email, 0 otherwise) to show which outcome was observed. To estimate the average effect of the email campaign on those who got it, we use a formula that takes into account the observed outcomes for each treated customer and the average outcome for those who didn't get the email.
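One common way to write these quantities is sketched below. The notation is an assumption chosen for this illustration: $Y_i(1)$ and $Y_i(0)$ are the two potential outcomes, $T_i$ is the binary treatment indicator, $Y_i$ is the observed spend, and $N_T$ and $N_C$ are the numbers of customers who did and did not receive the email.

```latex
% Individual treatment effect and the observed outcome
\tau_i = Y_i(1) - Y_i(0), \qquad
Y_i = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)

% Estimated average treatment effect on the treated (ATT)
\widehat{\mathrm{ATT}} = \frac{1}{N_T} \sum_{i:\, T_i = 1} \left( Y_i - \bar{Y}_C \right),
\qquad
\bar{Y}_C = \frac{1}{N_C} \sum_{j:\, T_j = 0} Y_j
```

This simple estimator is only trustworthy when the customers who did not get the email are a fair stand-in for those who did, for example when emails were sent at random.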

This approach helps us understand the impact of the email campaign by considering what could have happened under different conditions.
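A simulation is the one place where both potential outcomes can be seen at once, which makes it a useful way to build intuition. The Python sketch below invents spending figures for 5,000 customers, assigns the email at random, and compares the true average effect on the treated with the estimate available from observed data alone. Every number here is an assumption for illustration.

```python
import random
from statistics import fmean

rng = random.Random(7)
customers = []
for _ in range(5000):
    y0 = max(0.0, rng.gauss(50, 15))     # spend without the email
    y1 = y0 + rng.gauss(5, 2)            # spend with the email (assumed average lift of 5)
    treated = rng.random() < 0.5         # random assignment of the email
    observed = y1 if treated else y0     # only one potential outcome is ever observed
    customers.append((y0, y1, treated, observed))

# True ATT needs both potential outcomes, which only a simulation can provide
true_att = fmean([y1 - y0 for y0, y1, treated, _ in customers if treated])

# The estimate uses observed spend only
treated_mean = fmean([obs for _, _, treated, obs in customers if treated])
control_mean = fmean([obs for _, _, treated, obs in customers if not treated])
estimated_att = treated_mean - control_mean

print(f"true ATT: {true_att:.2f}, estimated ATT: {estimated_att:.2f}")
```

With random assignment the two numbers land close to each other; if the email had instead been targeted at high spenders, the observed comparison would overstate the effect.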

Propensity score matching (PSM)

Propensity score matching (PSM) is a statistical method used to estimate the impact of a treatment or intervention in observational studies. It aims to make the treated and control groups similar in their observed characteristics, reducing bias and approximating the conditions of a randomized controlled trial (RCT).

Steps in PSM:

  • Estimating Propensity Scores: Predicted probabilities of receiving the treatment based on observed characteristics are calculated. These scores indicate the likelihood of being assigned to the treatment group.
  • Matching: Individuals in the treatment group are paired with similar individuals in the control group based on their propensity scores. Matching methods include nearest neighbor, caliper, or kernel matching.
  • Assessing Balance: The balance of characteristics between the treated and control groups is checked after matching to ensure similarity. This can be done using statistical tests or graphical methods.
  • Estimating Treatment Effect: Finally, the impact of the treatment is estimated using the matched samples. This can involve comparing average outcomes between groups or adjusting for covariates using regression analysis.

By employing propensity score matching, analysts can account for observed confounding variables and obtain more reliable estimates of the treatment effect, enhancing the validity of findings in observational studies.
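The four steps can be sketched end to end in plain Python. This is a toy under stated assumptions, not a production implementation: the data are simulated, the covariates (loyalty, recency) and all coefficients are invented, the propensity model is a small hand-written logistic regression, and matching is nearest neighbor with replacement and a caliper of 0.05. In practice a statistics library would normally handle the model fitting.

```python
import math
import random
from statistics import fmean

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propensity(w, xi):
    """Predicted probability of treatment given covariates xi."""
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi)))

def fit_logistic(X, t, lr=1.0, epochs=200):
    """Logistic regression by batch gradient descent; returns [intercept, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            err = propensity(w, xi) - ti
            grad[0] += err
            for k, xk in enumerate(xi, start=1):
                grad[k] += err * xk
        w = [wk - lr * gk / len(X) for wk, gk in zip(w, grad)]
    return w

def nearest_neighbor_pairs(scores, t, caliper=0.05):
    """Pair each treated unit with its closest control (with replacement)."""
    controls = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i, ti in enumerate(t):
        if ti == 1:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            if abs(scores[j] - scores[i]) <= caliper:
                pairs.append((i, j))
    return pairs

# Simulated customers: treatment is more likely for loyal, recently active customers
rng = random.Random(1)
X, t, y = [], [], []
for _ in range(1500):
    loyalty = rng.gauss(0, 1)    # standardized past-purchase score (hypothetical)
    recency = rng.gauss(0, 1)    # standardized days since last order (hypothetical)
    treated = int(rng.random() < sigmoid(0.8 * loyalty - 0.5 * recency))
    spend = 50 + 8 * loyalty - 3 * recency + 5 * treated + rng.gauss(0, 3)  # true effect = 5
    X.append((loyalty, recency))
    t.append(treated)
    y.append(spend)

# Step 1: estimate propensity scores. Step 2: match.
scores = [propensity(fit_logistic(X, t), xi) for xi in X[:0]]  # placeholder, replaced below
w = fit_logistic(X, t)
scores = [propensity(w, xi) for xi in X]
pairs = nearest_neighbor_pairs(scores, t)

# Step 3: assess balance on the loyalty covariate before and after matching
treated_idx = [i for i, ti in enumerate(t) if ti == 1]
control_idx = [i for i, ti in enumerate(t) if ti == 0]
gap_before = fmean(X[i][0] for i in treated_idx) - fmean(X[j][0] for j in control_idx)
gap_after = fmean(X[i][0] for i, _ in pairs) - fmean(X[j][0] for _, j in pairs)

# Step 4: estimate the treatment effect on the matched sample
naive = fmean(y[i] for i in treated_idx) - fmean(y[j] for j in control_idx)
att = fmean(y[i] - y[j] for i, j in pairs)

print(f"loyalty gap before: {gap_before:.2f}, after: {gap_after:.2f}")
print(f"naive difference: {naive:.2f}, matched ATT estimate: {att:.2f}")
```

In this simulation the naive comparison overstates the effect because treated customers were more loyal to begin with, while the matched estimate sits much closer to the assumed true effect of 5. Matching can only correct for covariates that were observed and included in the propensity model.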

Instrumental Variables

Instrumental variables (IV) analysis is a statistical method used in causal inference to estimate the effect of a treatment on an outcome variable when there are hidden factors that could influence the results. It involves finding a variable that is linked to the treatment but not directly to the outcome, except through its impact on the treatment. This variable, known as an "instrument," helps reveal the true causal effect of the treatment.

In e-commerce, instrumental variables (IVs) can be used to estimate the causal effect of certain interventions or treatments on outcomes of interest while accounting for potential confounding factors. For example, when measuring the effect of a marketing campaign on product sales or web traffic, click-through rate (CTR) could serve as an instrument, provided it affects sales only through its influence on exposure to the campaign. That assumption has to be justified for the specific setting, because an instrument that reaches the outcome through any other path biases the estimate.
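The mechanics of an IV estimate can be shown with a small simulation. Note that the sketch below swaps in a different instrument from the CTR example: a hypothetical randomized invitation, chosen because randomization makes the instrument plausible by construction. Hidden customer engagement drives both campaign exposure and spend, and the simple ratio (Wald) estimator divides the instrument's effect on spend by its effect on exposure. All variable names and coefficients are assumptions.

```python
import random
from statistics import fmean

def diff_by(group, values):
    """Mean of values where group == 1 minus mean where group == 0."""
    ones = [v for g, v in zip(group, values) if g == 1]
    zeros = [v for g, v in zip(group, values) if g == 0]
    return fmean(ones) - fmean(zeros)

rng = random.Random(3)
TRUE_EFFECT = 5.0   # assumed effect of exposure on spend, simulation only
invited, exposed, spend = [], [], []
for _ in range(20000):
    engagement = rng.gauss(0, 1)               # unobserved confounder
    z = int(rng.random() < 0.5)                # instrument: randomized invitation
    # Exposure depends on the invitation and on hidden engagement
    d = int(0.8 * z + 0.5 * engagement + rng.gauss(0, 0.5) > 0.6)
    # The invitation affects spend only through exposure (exclusion restriction)
    y = 40 + TRUE_EFFECT * d + 6 * engagement + rng.gauss(0, 3)
    invited.append(z)
    exposed.append(d)
    spend.append(y)

naive = diff_by(exposed, spend)          # biased by hidden engagement
reduced_form = diff_by(invited, spend)   # effect of the instrument on the outcome
first_stage = diff_by(invited, exposed)  # effect of the instrument on the treatment
iv_estimate = reduced_form / first_stage

print(f"naive: {naive:.2f}, first stage: {first_stage:.2f}, IV estimate: {iv_estimate:.2f}")
```

The naive comparison of exposed and unexposed customers is inflated by engagement, while the IV ratio recovers a value close to the assumed effect. A weak first stage, meaning an instrument that barely moves exposure, would make this ratio unstable.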

Challenges in applying Causal Inference

  1. Complex Customer Behavior: E-commerce platforms often have diverse customer segments with varying preferences, behaviors, and purchasing patterns. Understanding and accurately modeling the complex interactions between these factors can pose a challenge for causal inference.
  2. Changing Markets: E-commerce markets are highly dynamic, with rapidly changing trends, competitive landscapes, and external influences such as seasonality and economic factors. Capturing and accounting for these shifting conditions in causal inference models can be challenging.
  3. Treatment Heterogeneity: Changes or promotions may affect different customer groups in different ways. It's hard to know how to tailor strategies to each group effectively.
  4. Data Challenges and Bias: E-commerce data can be messy, with missing information or errors. Making sense of this data and ensuring it is accurate can be tough.
  5. Ethical and Privacy Concerns: E-commerce data often contain sensitive customer information, raising ethical and privacy concerns.

Addressing these challenges requires a combination of advanced statistical methods and domain expertise to understand customer behavior better and make informed decisions.
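Treatment heterogeneity in particular lends itself to a quick check: estimate the effect separately for each customer segment. The Python sketch below does this for a hypothetical promotion, assuming the promotion was assigned at random within each segment. The segment names, spend values, and the function name segment_effects are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean

def segment_effects(records):
    """Per-segment difference in mean outcome, treated minus control.

    records: iterable of (segment, treated, outcome) tuples.
    Segments missing either group are skipped.
    """
    buckets = defaultdict(lambda: {True: [], False: []})
    for segment, treated, outcome in records:
        buckets[segment][bool(treated)].append(outcome)
    return {
        segment: fmean(groups[True]) - fmean(groups[False])
        for segment, groups in buckets.items()
        if groups[True] and groups[False]
    }

# Hypothetical order values with and without a promotion
records = [
    ("new", True, 30.0), ("new", True, 34.0),
    ("new", False, 20.0), ("new", False, 24.0),
    ("returning", True, 51.0), ("returning", True, 53.0),
    ("returning", False, 50.0), ("returning", False, 52.0),
]

effects = segment_effects(records)
for segment, effect in effects.items():
    print(f"{segment}: estimated effect {effect:+.1f}")
```

In this made-up data the promotion lifts spend by 10 for new customers and by only 1 for returning customers, the kind of gap that an overall average would hide. With real data, each segment needs enough observations for its estimate to be reliable.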


Causal inference is a valuable tool for e-commerce companies wanting to grasp how their strategies affect business results. By pinpointing cause-and-effect relationships, businesses can make informed decisions to improve customer experiences, streamline operations, and boost growth. In today's competitive e-commerce environment, mastering causal inference is essential for crafting effective business strategies.

