Analytical approaches to understanding customer retention at Hulu

By Jeffrey Rosenberg, Director of Software Development for Data and Analytics at Hulu

Hulu’s focus on subscriber retention and the models it has built to assess its implications make it an innovative player in the streaming industry. We were delighted to hear from Jeffrey Rosenberg, Head of Data and Analytics at Hulu, about the steps that Hulu’s data science team has taken to identify characteristics that might signify retention risks and to examine subsets of subscribers whose patterns of behavior bring such risks to light.

I am here to talk a little bit about subscriber retention, which is a core problem for those of us working in subscription businesses.

There are three key takeaways that I would like you all to get from this presentation. One is that there are a lot of different ways that we in the subscription business try to retain customers and analyze the data to support those use cases. Second, we at Hulu have evolved a bunch of these methods in our data science and analytics groups, and we’ve actually come up with something that’s new and interesting; that’s the thing that I want to talk with you about today. Finally, Hulu data science is a cool and interesting place to work and you should come and talk to us afterwards.

When we talk about retention, we at Hulu tend to call this a subscriber’s likelihood to be retained. This is basically an assessment of what is going to cause someone not to cancel. If you think of us as doctors, we are basically engaging in measurement, problem detection, and early intervention. We tend to think about cancel types as two buckets: one is involuntary, meaning users who were canceled by us without a choice, and the other is voluntary, being the ones who actually choose to cancel. Today’s presentation is really focused on the voluntary cancels and what we can do to identify why they canceled, focus in on that, and do something about it.

It’s important for us to think first through the lens of what the users tell us. That is the primary way to focus in on what they care about and why they would have left. The main point of feedback users generally tend to give when they leave a video service is that they’re missing their content. Maybe the thing that they came to watch is not there, or maybe they are not finding the thing that they want to watch next. Either way, the breadth of content isn’t there for the type of genre or cluster of content in which they are interested. They have access to a lot of video services and they’re getting more value out of those other ones, leading them to spend their money elsewhere. We want to focus in on understanding the drivers of retention risk. As data scientists, we have a lot of different methods that we use to analyze this data, anywhere from pretty simple to rather complex, such as survival curves and cluster analyses of user survey data.
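To make the survival-curve idea concrete, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. It is an illustration only, not Hulu’s implementation: the function name, the input layout (months observed plus a canceled flag), and the toy data are all assumptions.

```python
from collections import Counter

def kaplan_meier(durations, canceled):
    """Return [(month, survival_probability)] for each month with a cancel.

    durations[i] -- months subscriber i was observed (hypothetical field)
    canceled[i]  -- True if the subscriber canceled in that month,
                    False if still active when observation ended (censored)
    """
    if len(durations) != len(canceled):
        raise ValueError("durations and canceled must have the same length")
    cancels = Counter(d for d, c in zip(durations, canceled) if c)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for month in sorted(exits):
        if cancels[month]:
            # Share of subscribers still at risk who did not cancel this month.
            survival *= 1.0 - cancels[month] / at_risk
            curve.append((month, survival))
        at_risk -= exits[month]
    return curve

# Toy data for six subscribers, invented for illustration.
curve = kaplan_meier([1, 2, 2, 3, 5, 5],
                     [True, True, False, True, False, False])
```

Subscribers who are still active count toward the at-risk total until their last observed month, so the curve does not treat them as cancels.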

What we’re talking about today is possibly one of the more interesting examples that we’ve chosen to try to get a different angle on the problem, which is called subscriber health scores. I first want to give props to our data science team, who did all of the heavy lifting involved in this model and without whom it wouldn’t exist. These scores are essentially answering two key questions. First, what is the chance of retaining each subscriber? Second, as the doctors, what do we find to be the worrisome symptoms of each subscriber? What we want to do first is to gather information concerning all the traits and input data that we have on our users, collected from all over the service and even from outside sources. These are pieces of information that we filter and analyze. We look at correlations among them as well. The main purpose of this is to find the main drivers of health or retention risk.

We think about it as a set of five categories. One is related to content: the variety of things that you would watch on the service. Another is quantity, in terms of total minutes watched. Account type characterizes what type of product a user has purchased. It could be no ads or the live product. There are lots of differences in the amount of content and entitlements that one has per type. How long you’ve been a user on the service also matters. What’s interesting is that the type of product becomes a really good predictor of retention. We have also designed the algorithm to separate these scores into mutually exclusive and collectively exhaustive sub-scores.


See talks like this in person at our next Data Science Salon: Applying AI and ML to Media and Entertainment, in Los Angeles.



One of the mantras of data science attribution is that correlation does not equal causation. When you make assertions on this data, a lot of times you’ll be wrong. We have to do a lot of work to minimize overfitting, and that’s because, generally, we’re missing a lot of features. Our solution has a couple of different remedies for that problem. First, the architecture itself is designed to isolate the inputs into the separate sub-scores. Then, we have a couple of different regularizer functions that essentially force all scores to be somewhat important and explode the variance among them in a way that makes them relative to one another. You can understand that something like quantity of minutes watched and usage would have some overlap. There’s actually a way that we look at how to force them to be separate.

First, we’ve gone through this process of gathering all the subscriber and trait data, to which we are constantly adding new data. We assess it, in terms of the level of importance that we actually want to attribute to that data, either through business insight and talking to stakeholders or by doing our own correlation analysis. The first job that the algorithm does is to organize the data, essentially assigning each feature to one of five dimensions. It will then train the model to predict the retention probability, and each score becomes a prediction. We then train that model and validate it against actual retention. The product of all of these score predictions should then be relatively similar to the prediction of retention for the user.
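The multiplicative structure described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the actual Hulu model: the dimension names, feature names, logistic form of each sub-score, and all weights are hypothetical, and both the training loop and the regularizers are omitted because the talk does not specify them.

```python
import math

# Hypothetical assignment of input features to five dimensions. The names
# below are assumptions made for this sketch, not Hulu's feature list.
DIMENSIONS = {
    "content_variety": ["distinct_genres"],
    "quantity": ["minutes_watched"],
    "usage": ["active_days"],
    "account_type": ["has_no_ads", "has_live"],
    "tenure": ["months_subscribed"],
}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sub_scores(features, weights, biases):
    """One score in (0, 1) per dimension; each sees only its own features."""
    scores = {}
    for dim, names in DIMENSIONS.items():
        z = biases[dim] + sum(weights[n] * features[n] for n in names)
        scores[dim] = sigmoid(z)
    return scores

def retention_probability(scores):
    """Overall prediction is the product of the sub-scores."""
    return math.prod(scores.values())

def log_loss(p, retained):
    """Binary cross-entropy against the observed outcome (True = retained)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -math.log(p) if retained else -math.log(1.0 - p)

# Toy subscriber with standardized feature values (all numbers invented).
features = {"distinct_genres": 0.5, "minutes_watched": 1.2, "active_days": -0.3,
            "has_no_ads": 1.0, "has_live": 0.0, "months_subscribed": 0.8}
weights = {name: 1.0 for names in DIMENSIONS.values() for name in names}
biases = {dim: 2.0 for dim in DIMENSIONS}

scores = sub_scores(features, weights, biases)
p_retained = retention_probability(scores)
```

Because every sub-score is below 1, the product can never exceed the weakest sub-score, so a low score points directly at the dimension dragging the prediction down.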

Our regularizer functions are trained to remove interdependencies if they’re not considered to be significant. If you wanted to look further at that, you could look within each sub-score to try to see what else is going on. For example, if you isolate tenure, how does that look against total viewing time? You would expect tenure, meaning how long you’ve been on the service, to go along with total viewing time, but there isn’t really any correlation. Similarly, there is not really a lot of correlation between viewing time and the content variety score. The really interesting thing is that the quantity score correlates very tightly with the histogram of total viewing time across all viewers. What that gives us is a way of being able to home in on an interesting point of diminishing returns.
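A check of this kind can be sketched with a plain Pearson correlation between a sub-score and total viewing time. The helper below and the toy numbers are assumptions for illustration; they are not the data or tooling from the talk.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Invented values for five subscribers.
viewing_hours = [2, 5, 9, 14, 20]
quantity_score = [0.55, 0.70, 0.82, 0.90, 0.93]  # tracks viewing time
tenure_score = [0.90, 0.60, 0.80, 0.65, 0.85]    # does not

r_quantity = pearson(viewing_hours, quantity_score)
r_tenure = pearson(viewing_hours, tenure_score)
```

In this toy data the quantity score rises with viewing hours but flattens at the top, which is the kind of shape that suggests diminishing returns.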

So is there a sweet spot for how long or how much content people would actually watch per month, or even over their user lifetime? If so, can we extend it? Our analysis gives us the ability to zoom in on those customers and start to analyze them at a deeper level. If we’re looking for a way to isolate different groups of users who we think are exhibiting interesting retention behavior, one of the sub-scores can help us characterize that behavior, and that opens up different possibilities for cohort analysis. Once we have characterized the risk, we can conduct a secondary analysis on those cohorts, perhaps marrying them up with other segments or applying different correlation analyses with other features. We can experiment to try to move the score to a different risk bracket. We also have the ability to focus on a subscription level, which allows us to do cool things like detecting the accounts at risk. We can then do experiments with score thresholds within our experimentation platform.
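As a rough sketch of working with score thresholds, the snippet below maps a health score to a risk bracket and pulls out an at-risk cohort. The cut points, labels, and function names are hypothetical; real thresholds would come out of experimentation.

```python
from bisect import bisect_right

# Hypothetical cut points on a health score in [0, 1].
THRESHOLDS = [0.4, 0.7]
LABELS = ["high_risk", "medium_risk", "low_risk"]

def risk_bracket(score):
    """Map a health score to a risk label; a score on a cut point
    falls into the higher (healthier) bracket."""
    return LABELS[bisect_right(THRESHOLDS, score)]

def at_risk_cohort(scores_by_user, bracket="high_risk"):
    """Return the sorted user ids whose score falls in the given bracket."""
    return sorted(user for user, score in scores_by_user.items()
                  if risk_bracket(score) == bracket)

# Invented scores for three subscribers.
cohort = at_risk_cohort({"a": 0.20, "b": 0.80, "c": 0.35})
```

Keeping the thresholds in one list makes it easy to try alternative cut points in an experiment without touching the selection logic.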

This is obviously just the beginning. We’re starting with the applications that we can try and actually get data back from. To see if it’s effective, we can use a transition matrix, looking month over month at how our users are migrating between risk categories and determining if that migration is meaningful. You can output data about groups of users and transition drivers, and then do some analysis on that to see what caused a group of users to go down in retention. That’s something we’re just starting to use as an analytical tool. We also have a practice, nascent but ramping up, around cohort identification for marketing purposes. We’re now able to use the scores to surgically select groups of subscribers who are at risk for very specific marketing purposes, and that’s had some very promising results so far.
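A month-over-month transition matrix of the kind described can be sketched as follows. The category labels and the input layout (user id mapped to risk category for each month) are assumptions made for this example.

```python
from collections import Counter

CATEGORIES = ["high_risk", "medium_risk", "low_risk"]  # hypothetical labels

def transition_matrix(last_month, this_month):
    """Row-normalized matrix of moves between risk categories.

    Each argument maps user id -> category. Only users present in both
    months are counted; matrix[src][dst] is the share of users who were in
    src last month and are in dst this month.
    """
    counts = Counter((last_month[u], this_month[u])
                     for u in last_month if u in this_month)
    matrix = {}
    for src in CATEGORIES:
        total = sum(counts[(src, dst)] for dst in CATEGORIES)
        matrix[src] = {dst: (counts[(src, dst)] / total if total else 0.0)
                       for dst in CATEGORIES}
    return matrix

# Invented snapshot of four returning users and one new user.
last = {"u1": "high_risk", "u2": "high_risk",
        "u3": "medium_risk", "u4": "low_risk"}
this = {"u1": "medium_risk", "u2": "high_risk",
        "u3": "medium_risk", "u4": "low_risk", "u5": "low_risk"}

matrix = transition_matrix(last, this)
```

Users who cancel drop out of this version; adding a separate "canceled" category would be one way to keep them in view.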

In closing, I want to again give a shout-out to our team. We were able to build an interpretable model for retention risk, and it has a lot of capabilities as a diagnostic tool that I don’t think we even foresaw to begin with. This is just one example of the types of challenging, innovative work in which we’ve been engaged at Hulu. If it inspired you, please come talk to us afterwards.


For more content like this, don’t miss the next Data Science Salon in LA on November 7, 2019.
