Customer Lifetime Values: Revisions for a Digital Retail Brand

By Shreya Ghosh, Data Science Manager at Restaurant Brands International

Innovations in data science have increased not only the amount of consumer data available to brands, but also the ways in which this data can be measured. Data scientists in the retail industry are no longer limited to counting coupons; they can focus on what makes individual customers return to purchase time and time again. Shreya Ghosh, Data Science Manager at Restaurant Brands International, discusses the popular types of customer segmentation and the ways in which new applications of data collection have vitalized brands’ understanding of their most valuable customers.

Hello everyone, my name is Shreya Ghosh. I have been working with Restaurant Brands International for the last year. Restaurant Brands International might not immediately ring a bell, but we manage Tim Hortons Inc. in Canada as well as the Popeyes brand of quick-service restaurants.

After 65 years in business, Restaurant Brands International is finally looking into consumer data for the first time. We’ve undergone a huge shift in that we now have apps, and you can buy products on them. So for the first time we can look at our customers and see what their buying data looks like, which leads us to ask how to determine customer value, how to use this information to drive our marketing, and how much money to invest in a customer.

First off, it’s important to recognize that we do not have the funding that Uber does. We can barely convince upper management to spend on apps, so we can’t spend $300 acquiring a customer. So how do we do it? We do it based on existing loyalty and also on our products. Knowing Customer Lifetime Value, which is the amount of value a customer can provide us within a given timeline (we’ve been working with one year), allows us to understand what products actually sell and what products drive frequency versus an increase in price. Some firms determine response time based on Customer Lifetime Value, but we definitely do not do that. If you want a burger, no matter who you are, we will get it to you.

Customer Lifetime Value is a metric that can be used for any brand, but we are a retail brand.  We are looking at the digital part of our retail brand, specifically, to pair users with values.  Customer Lifetime Value can be complicated to calculate. There are at least ten different formulas involved and each business has a different way of using the formulas.  Honestly, the “right” mathematical formula to use depends on how your business is set up. There are contractual settings that are relevant when you are using a subscription service like Netflix and then there are non-contractual settings. Cars and homes are examples of non-contractual settings because you’re not likely to be purchasing a new car or a home every month.  

As for us, we fall under the category of continuous purchases because, as a retail company, we see a lot of daily behavior. There obviously isn’t a contract, in that we will not penalize you if you don’t buy a coffee from Tim Hortons tomorrow, and switching costs are also pretty low. These conditions are wonderful for the customer, but terrible for the data scientist involved. I have no clue what the user does on the app or when they are defecting in this case. All of this is based on approximations.

 

See talks like this in person at our next Data Science Salon: Applying AI and Machine Learning to Finance, Healthcare and Hospitality, in Miami.


At Netflix, you would calculate Customer Lifetime Value by looking at the average number of days a customer stays on the service and multiplying it by the margin per customer. If you know that a customer will give you $700 by the end of the period, you know you can spend up to that much on acquiring that customer. The problem with this method is that customers behave very differently from one another, with different buying habits, and if you get caught up in all these averages, you lose a lot of desirable granularity in your users. For example, I buy toothpaste from Amazon regularly and my dad buys beautiful rare books on Amazon, but just because I buy everything off Amazon does not necessarily make me a better customer. If a customer has big gaps between their purchases, that doesn’t necessarily mean that they have churned.
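To make the averaging problem concrete, here is a minimal sketch of the average-based calculation. The lifetimes and the daily margin are made-up numbers, and `average_clv` is a hypothetical helper, not anything Netflix or we actually run.

```python
from statistics import mean

def average_clv(lifetimes_days, daily_margin):
    """Average-based CLV: mean lifetime in days times margin per customer per day."""
    return mean(lifetimes_days) * daily_margin

# Hypothetical subscribers: number of days each one stayed on the service.
lifetimes = [30, 60, 90, 700, 1220]
daily_margin = 0.25  # assumed margin per customer per day, in dollars

clv = average_clv(lifetimes, daily_margin)

# The same calculation done per customer shows what the average hides.
per_customer = [days * daily_margin for days in lifetimes]

print(clv)           # one number for everybody
print(per_customer)  # five very different customers
```

The single averaged figure sits far above three of these customers and far below the other two, which is exactly the granularity you give up.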

There are three things that the Customer Lifetime Value model does. The first is to gauge the lifetime of the customer, which is basically a churn estimation by which we can say, “If a customer has not bought by day x, they have probably churned.” It assigns a probability that the customer is still active on each day. The second is the purchase rate, by which we estimate the frequency of future purchases. The third is the monetary value, where we try to predict the average value of those purchases. The Pareto/NBD model is the most well-known of these models, but there is also a simplified model called the Gamma-Gamma model which we have come to use. It’s simplified only because the math is simplified. It’s actually a beautiful model and really works.

I first encountered Pareto in an Environmental Science class talking about animal death rates. This can actually be applied to customer lifetimes in terms of their churn rates, which follow exponential distributions. We model these relationships using both Poisson and Gamma distributions, in which the parameters within the model are based on all the previous transactions. Decay in general is modeled with a Gamma distribution, which in a way explains why monetary value is as well. One of the things we assume is that the monetary value is not dependent on the frequency of purchases. Higher frequency does not necessarily mean higher spending.
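One way to get a feel for these assumptions is to simulate a customer under them. The sketch below follows the usual Pareto/NBD-style setup: a Gamma-distributed purchase rate, a Gamma-distributed dropout rate, an exponential lifetime, and Poisson purchases while the customer is alive. The parameter values `r`, `alpha`, `s`, `beta` and the 365-day horizon are illustrative assumptions, not fitted values from our data.

```python
import random

def simulate_customer(rng, r=2.0, alpha=10.0, s=1.0, beta=100.0, horizon=365):
    """Simulate one customer under Pareto/NBD-style assumptions.

    purchase rate ~ Gamma(shape=r, rate=alpha)   purchases per day
    dropout rate  ~ Gamma(shape=s, rate=beta)    per day
    lifetime      ~ Exponential(dropout rate)    days until churn
    purchases     ~ Poisson process at the purchase rate while alive
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)
    lifetime = rng.expovariate(dropout_rate)
    active_until = min(lifetime, horizon)

    # A Poisson process has exponentially distributed gaps between events.
    purchase_days = []
    t = rng.expovariate(purchase_rate)
    while t < active_until:
        purchase_days.append(t)
        t += rng.expovariate(purchase_rate)
    return lifetime, purchase_days

rng = random.Random(42)
customers = [simulate_customer(rng) for _ in range(1000)]
counts = [len(days) for _, days in customers]
```

Looking at `counts` across many simulated customers shows a long-tailed spread of purchase frequencies, which is the heterogeneity that a single average smooths away.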

 


Shreya Ghosh, Data Science Manager at Restaurant Brands International, speaks at DSS Miami 2018

The data we need from the customer is quite simple: a unique user ID, the time of each purchase measured in days, and the amount spent on each purchase. Once we have this transaction data, we can compile it into a user profile called the Recency, Frequency and Monetary matrix. We find how recently the customer bought, how often they buy, and the average amount they spend. Once we have our model, we can develop our cost function and guard against overfitting by dividing our data into training and testing sets. As the number of purchases gets higher, the model does better because it has more data to learn from. Our errors are low, especially because we are selling coffee, not healthcare, and have relatively low risk. Our confusion matrix does not have to be 100% accurate to provide us with valuable insights.
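Compiling transactions into that matrix is mostly bookkeeping. Here is a minimal sketch that uses the plain-language definitions above; the function name `build_rfm`, the sample rows, and the `observation_end` day are assumptions for illustration. Modeling libraries often define recency and frequency slightly differently (for example, counting only repeat purchases), so check the convention your model expects.

```python
from collections import defaultdict

def build_rfm(transactions, observation_end):
    """Collapse (user_id, day, amount) rows into one RFM row per user."""
    by_user = defaultdict(list)
    for user_id, day, amount in transactions:
        by_user[user_id].append((day, amount))

    rfm = {}
    for user_id, rows in by_user.items():
        days = [day for day, _ in rows]
        amounts = [amount for _, amount in rows]
        rfm[user_id] = {
            "recency": observation_end - max(days),    # days since last purchase
            "frequency": len(rows),                    # number of purchases
            "monetary": sum(amounts) / len(amounts),   # average spend per purchase
        }
    return rfm

# Hypothetical transaction log: (user ID, day of purchase, amount spent).
transactions = [
    ("u1", 1, 2.50), ("u1", 2, 2.50), ("u1", 30, 3.00),
    ("u2", 5, 12.00),
]
rfm = build_rfm(transactions, observation_end=31)
```

Each user ends up as one row with three numbers, which is the only input the lifetime value model needs.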

So how do we decide who our best customers are? Obviously, if someone bought an item yesterday and has a high total of items bought in the past, they are among our best customers. That’s a given. But sometimes, customers who have bought a lot in the past have not bought recently, and we have to accept that those customers have probably churned. In between these extremes, we see another set of valuable customers. These are customers who might not have bought that much and do not have a high frequency, but that frequency could simply be the waiting period that works for them. We are always looking for those customers, and our model lets us look at subsets of customers who might fit this description.

Another interesting phenomenon is that if a customer is known to purchase a coffee from our app every day and then goes silent for two days, our model can assume that customer has churned, because purchasing from our app is such a daily habit. We go as far as to zero in on whether purchases fall mainly on weekends and holidays to see how this affects a customer’s probability. Customer Lifetime Value can be very seasonal, so your data needs to span multiple seasons to account for that bias.
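The intuition that silence only means something relative to a customer's own habit can be shown with a simple heuristic. To be clear, this is not the probabilistic model itself, just a rough stand-in: `silence_ratio` is a hypothetical helper that divides the days since the last purchase by that customer's average gap between purchases.

```python
def silence_ratio(purchase_days, today):
    """Days since last purchase divided by the customer's mean gap between purchases."""
    days = sorted(purchase_days)
    if len(days) < 2:
        return None  # not enough history to estimate a personal cadence
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        return None  # all purchases on the same day, no cadence to compare against
    return (today - days[-1]) / mean_gap

# Two silent days against a one-day habit versus a thirty-day habit.
daily = silence_ratio([1, 2, 3, 4, 5], today=7)
monthly = silence_ratio([1, 31, 61], today=63)
```

The daily buyer is already two full cycles overdue, while the monthly buyer has barely started a new one, even though both have been quiet for the same two days.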

Overall, the Customer Lifetime Value model allows us to better understand our customers and develop these lifetime stories, which lead us to accurate segmentation. When we segment based on frequency or amount spent, we realize that for some customers, we don’t need to sell frequency in the form of buy-one-get-one-free deals. Maybe we just need to add a new product here and there. We have started to value our promotions based on Customer Lifetime Value as opposed to the number of coupons used. Evaluating Customer Lifetime Value as a metric before and after the release of a promotion can tell us how much value the promotion has provided to our business.
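A before-and-after comparison can be sketched in a few lines. The snapshots below are invented numbers, and `clv_lift` is a hypothetical helper; in practice the estimates would come from the fitted model, and seasonality would need to be controlled for before crediting the whole difference to the promotion.

```python
from statistics import mean

def clv_lift(clv_before, clv_after):
    """Mean per-customer change in estimated CLV for customers in both snapshots."""
    shared = clv_before.keys() & clv_after.keys()
    if not shared:
        raise ValueError("no customers appear in both snapshots")
    return mean(clv_after[user] - clv_before[user] for user in shared)

# Hypothetical CLV estimates (in dollars) before and after a promotion.
before = {"u1": 120.0, "u2": 40.0, "u3": 75.0}
after = {"u1": 126.0, "u2": 55.0, "u3": 75.0}

lift = clv_lift(before, after)
```

Comparing the same customers across both snapshots keeps new sign-ups from inflating or deflating the result.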

 
