Data Science and Entertainment Production at Netflix

It’s nearly impossible to remember what date night looked like before the coining of the term “Netflix and Chill.” With Netflix playing such an integral role in our daily lives, we were intrigued to hear from Jen Walraven, Manager of Science and Analytics at Netflix, on the ins and outs of the company’s complex system of data analytics, especially as it pertains to intricately tailoring each viewer’s profile to their unique tastes no matter where in the world they are.

Hi everybody, I’m Jen. I lead our studio and production science and analytics team at the Netflix office in LA right in the heart of Hollywood.

Netflix is pretty well known for its recommendation systems, but I’m not going to talk about any of that. I’m going to talk about something a little bit different, which is how we’re using data science as part of our studio. I’ll give a brief history on the Netflix studio and how we think about content production at Netflix. I’ll motivate our discussion a little bit with this question of why use data in such a creative space and walk through some of the challenges that we are facing.

In 2013, we launched House of Cards and Orange is the New Black as Netflix originals. This was back at a time when releasing the entire season of a show all at once on a streaming service made headline news. Stranger Things became one of our first self-managed originals. With a self-managed title, we are responsible for the whole production infrastructure. Previously, we could license that work out to another production company or another studio, but now we were in the thick of it. This is just a sample of the Netflix self-managed originals that we’ve released on the service in the first half of this year alone.

But what is production? What does it actually mean to self-manage? I like to use the analogy of a homeowner with a vision for their house. They may have a vision, but then they’ll engage an architect who designs some blueprints. They may connect with a general contractor who then reaches out to their network of experts to help bring the whole thing together. They may reach out to an interior designer, a stonemason, a landscaper, a framer and so on. Production of a title is not that different. We’ll start with a writer or a creator who has a vision. They will write a script, which is in many ways the blueprint of production. They then engage with producers who reach out to their network of experts.

If you have ever sat through a movie’s credits, you’ll know how many people are actually involved in making this happen. It’s a very analogous process in content production. At Netflix, we typically frame our content lifecycle as this high-level set of steps. New content comes in as a creative pitch. If we decide to greenlight it, we move into a long series of negotiations that truly continues throughout the whole process. Then we move into the heart of content creation, from pre-production all the way through to localization and quality control. After that content has been created, we launch it on the Netflix service with the personalized recommendations that we all know and love.

I’ll talk a little bit about the primary studio functions. Although my team is using data from this entire lifecycle, we’re really affecting decision making in the studio. Pre-production is a lot of planning and logistics. It’s been said that 80%, give or take, of the work happens in prep, where we’re answering questions like what crew we are going to work with or what the budget or schedule for this title should be. Where are we going to film it? Do we need special equipment? Do we need a stage? All of these questions are asked in this phase, and then we move into production, otherwise referred to as principal photography. This is when the actual filming happens, when we think of actors and directors on set moving around. In addition to the on-camera work, there’s also a tremendous amount of tracking: daily schedules, the vendors we interact with, reviewing daily footage, and making sure that we’re moving forward as close to on-time and on-budget as possible.

Even though “post” implies afterwards, the post-production team is truly involved throughout the whole process because they are the guardians of the media assets that are ultimately delivered from that filming. They are responsible for the audio, the visual, and any supplementary material that goes into what you see when you press play on Netflix. They’re also responsible for editing. This can be more creative editing, like arranging the story to make sure that it’s true to that original creative vision, but it also includes more technical editing like color and sound finishes: making sure that the color looks good, that the audio and the visual are aligned, and that type of work. Visual effects is also involved around this stage, bringing to life things that we can’t do on camera. Finally, localization and quality control come in after that content has been made, in terms of how we make this content accessible and understandable to our global audience.

When you turn on Netflix, you’ll see that we have a bunch of categories, and within those categories there are different cards for the titles. If you’re watching in the United States, Canada, Great Britain or Ireland, you’ll see most of these titles in English. If you’re watching in a country that does not have English as its primary language, some of these titles may actually appear in that language; that is, the subtitles and dubs that you use to watch that content will be in a different language, perhaps by default. The video montages that come up when you turn on the Netflix service may also be pre-dubbed in that other language, so a lot of effort goes into making the whole product experience true to the content in that language. Of course, quality control makes sure that the experience of watching and interacting with the product is at the standard that we want to communicate to our members.

At this point you may be wondering: if this is an inherently creative space where we’re creating and telling stories, not sharing algorithms in production, why use data? Our reason is this: we use data to build the tools that enable content creators to tell their best stories across geographic boundaries. Our members trust us to invest money wisely. If we’re going to keep sharing local stories with a global audience, we have to invest in the tools and the infrastructure that make that possible. Data is critical to enabling us to do that at scale.

Now it’s particularly important to recognize that although there’s this glamorous, inherently creative piece of telling stories, if you look behind the scenes, it’s really a very complex supply chain. We can learn from how other industries have solved those problems to make the production experience at Netflix better, but this is not without its challenges. The first challenge is visibility. In a traditional production process, there are hundreds of paper forms that are filled out, all managed by different parts of the production crew. The information is disparate. It can be siloed, and it’s almost never aggregated in a structured way. The challenge for us is capturing this information in a way that is structured. How do we draw insight across all of the unique productions that this information reflects?

Our motivating question centers around how we can make foundational questions easy to answer. I like to think of these as the everyday questions that our stakeholders are thinking about when they walk into the office: Where are we filming? Who are we working with? How much did this thing cost? Are we on time? We need to know how to address all of this, even something as simple as how much we are filming. Determining our filming volume has so many different aspects to it, from the crew to the facilities. It could also include the external folks we work with for visual effects or even our talent. This becomes especially complicated as we think about how that volume is changing over time and in what markets. We need to know what that means for resource supply and demand. This can be something at a large scale, like whether there is enough stage space, but it could also be something like the assets that are delivered from post-production. We need to know if there are bottlenecks in our supply chain for the delivery of certain audio assets. How do we see that change as we move through the year? Even further, at a macro level, how do the networks of vendors impact the way that we’re receiving those assets? Are there particularly complicated hops, where we’re delivering assets to seven different vendors at a time, where we could be more efficient? There are other macro questions that are really valuable here: what about global equipment logistics or travel requirements? If we were making a small handful of films a year, maybe this wouldn’t be useful. However, at scale, and in markets where this infrastructure doesn’t exist or hasn’t in the past, this becomes incredibly valuable.

Our second challenge is scale. Although we’re a global company, there is always room for us to improve in our ability to make local content really exciting to people who may not share the same language. As we continue to grow, we’re finding incredible stories from all around the world, things that transcend a language or a country of origin. In doing so, we get to reach an audience that may have been underserved or underrepresented in media, and that’s really powerful, but doing this for every title in every language for the entire Netflix catalog is a massive amount of work. The content localization timeline may look nice and linear for one show. When we have scripts for dubbing and subtitling, we move through the recording and translation that ultimately gets delivered to Netflix. It looks really clean, but in actuality it is messy, especially when multiplied across all the titles. Here we must ask where we can learn from what we already know. In the case of creating subtitles, we can use past per-language consumption to help us predict future per-language viewing. If I have a new show that’s a dramedy, I can use information about viewing from a comedy in 2016, a sci-fi thriller in 2017 and a drama in 2018 to get a better sense of how I need to prioritize this localization work. Some of the features we think about in this notion of similarity are things like the genre, the original language of the content and the local language of the content, as well as how it was consumed relative to its original production. We’ll also think about the dub consumption or the sub consumption and see how those things change depending on the market or the type of content.
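One way to picture the similarity idea above is a weighted average: past titles that look more like the new show count for more when estimating what share of viewing will happen in each language. The sketch below is only an illustration. The titles, the feature set, the similarity rule with its 0.8 and 0.2 weights, and the viewing shares are all invented for the example and are not Netflix’s actual features, model or data.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    genres: frozenset          # e.g. frozenset({"comedy", "drama"})
    original_language: str
    sub_share: dict            # hypothetical share of viewing per subtitle language


def similarity(new_genres, new_lang, past):
    """Toy similarity: genre overlap (Jaccard) plus a bonus for a shared original language."""
    union = new_genres | past.genres
    jaccard = len(new_genres & past.genres) / len(union) if union else 0.0
    same_lang = 1.0 if new_lang == past.original_language else 0.0
    return 0.8 * jaccard + 0.2 * same_lang   # weights are arbitrary assumptions


def predict_sub_share(new_genres, new_lang, history):
    """Similarity-weighted average of past per-language viewing shares."""
    weights = [similarity(new_genres, new_lang, t) for t in history]
    total = sum(weights)
    if total == 0:
        raise ValueError("no similar titles to learn from")
    languages = {lang for t in history for lang in t.sub_share}
    return {
        lang: sum(w * t.sub_share.get(lang, 0.0) for w, t in zip(weights, history)) / total
        for lang in languages
    }


def prioritize(shares):
    """Languages ordered from highest to lowest predicted share."""
    return sorted(shares, key=shares.get, reverse=True)


# Made-up history: a comedy, a sci-fi thriller and a drama.
HISTORY = [
    Title("comedy_2016", frozenset({"comedy"}), "en", {"es": 0.5, "pt": 0.3, "de": 0.2}),
    Title("scifi_2017", frozenset({"sci-fi", "thriller"}), "en", {"es": 0.2, "pt": 0.2, "de": 0.6}),
    Title("drama_2018", frozenset({"drama"}), "es", {"es": 0.1, "pt": 0.6, "de": 0.3}),
]

# A new English-language dramedy (comedy + drama).
shares = predict_sub_share(frozenset({"comedy", "drama"}), "en", HISTORY)
print(prioritize(shares))
```

The ordering that comes out is a suggestion for which subtitle languages to schedule first; a real system would add the other features mentioned above, such as market and dub versus sub consumption.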

Our third big challenge is efficiency as our investment in original content continues to grow. We again come back to this question of how we are empowering content creators at scale. Some of this we can accomplish with the tools we create and maintain internally to help with the data capture or with operational decisions that we make daily. An intuitive UI or UX can take you 90% of the way in a lot of cases, but there are cases where it can’t or where that last 10% really matters. Those cases require us to use data more effectively.

A major point of focus for us is how we can spend more time discussing good options instead of defining new ones. I’ll explain what I mean here with an example, not from Netflix, but from the NFL. Every season, the NFL needs to create a game schedule for what teams are going to play when and where. Historically, this was done by league executives. They would literally have a pegboard on which they would arrange the teams over the course of months, so just to come up with one option there was a tremendous amount of manual work. A few years ago, the NFL started to turn to constrained optimization, a technique that’s used in a lot of industries. It’s used in supply chains with distribution centers, in high frequency trading, and as we see now, in sports. Now, the NFL can take into account things that were never even factored in before, such as how far the team is traveling from home and how often the team goes multiple weeks without a break. On top of being able to take into account all of these constraints, they can create trillions of options and then spend their time discussing the best ones, not coming up with all of them from scratch.

Can we apply this to film schedules? It’s sort of a novel concept, but is it really that different? The way this works in practice is that, starting with a script, an assistant director will go through and do something called a breakdown. This is where they mark up the different elements of the script to ultimately fit it into a schedule. As they’re going through the script and thinking about a schedule, we start to introduce some constraints. We probably don’t want scenes that use the same stage to be filmed at a bunch of different times, so all of the scenes that use one particular soundstage should probably be filmed around the same time. That way, we will be more efficient with our transfer time. In addition, we have to know if we are able to pay people overtime or if this is a market where that’s just not possible. Like with the NFL, in the past, all of the scenes from the script would literally be represented as paper that would be moved around on a massive board to try to come up with a single option. Although there is software that digitizes some of this process, there’s not much in the way of making it efficient.

Let’s try to frame this as an optimization problem. We have some data: we have scenes, and for every scene we have a location, the performers in that scene, and the time, both in the approximate number of hours and the number of pages in the script. Constructing our variables, we have the days on which the scenes are expected to be filmed and a start time for every scene, and we know that in most cases scenes cannot overlap. We would start to work in some of our costs and constraints, such as overtime, if our work days cannot exceed a certain number of hours. Depending on the unions that we’re working with, the talent availability or any number of other factors, that’s actually quite variable. We also know that certain scenes will require that they be filmed during the day or the night. From there, we can start thinking about even more constraints, such as the cast availability or the time that it takes to switch from one sound stage to another. We can even start rethinking the problem formulation itself, if structuring this optimization in the way I just described is computationally inefficient or is not providing us with the results we want.
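To make the formulation above concrete, here is a tiny sketch that assigns scenes to shooting days by exhaustive search. It is a toy under stated assumptions: the scenes, hours, cast names, work-day limit and availability table are invented, and the objective (fewest distinct day and location pairs) is an assumed stand-in for transfer cost. Brute force only works at this size; a real schedule would need a proper constrained-optimization solver and would also choose start times within each day.

```python
from itertools import product

# Invented scene data: (scene id, location, performers, estimated hours).
SCENES = [
    ("s1", "stage_a", {"ana"}, 4),
    ("s2", "stage_b", {"ben"}, 5),
    ("s3", "stage_a", {"ana", "ben"}, 4),
    ("s4", "stage_b", {"ben"}, 3),
]
DAYS = [0, 1, 2]
MAX_HOURS_PER_DAY = 8          # assumed work-day limit (varies by market and union)
UNAVAILABLE = {"ana": {2}}     # assumed cast availability: ana cannot work on day 2


def feasible(assignment):
    """Check the hard constraints for a tuple giving each scene's shooting day."""
    hours = {d: 0 for d in DAYS}
    for (_, _, cast, scene_hours), day in zip(SCENES, assignment):
        hours[day] += scene_hours
        if any(day in UNAVAILABLE.get(person, set()) for person in cast):
            return False
    return all(total <= MAX_HOURS_PER_DAY for total in hours.values())


def cost(assignment):
    """Number of distinct (day, location) pairs; fewer means fewer moves between stages."""
    return len({(day, loc) for (_, loc, _, _), day in zip(SCENES, assignment)})


def best_schedules():
    """Enumerate every assignment, keep the feasible ones, return (lowest cost, candidates)."""
    candidates = [a for a in product(DAYS, repeat=len(SCENES)) if feasible(a)]
    if not candidates:
        return None, []
    lowest = min(cost(a) for a in candidates)
    return lowest, [a for a in candidates if cost(a) == lowest]


lowest, options = best_schedules()
for option in options:
    print(lowest, dict(zip((scene[0] for scene in SCENES), option)))
```

Returning every schedule that ties for the lowest cost, rather than a single winner, mirrors the goal of producing several good candidates for people to discuss.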

Another question that we asked in a very early prototype of this work was whether we need to refine the constraints themselves. For example, we found that the optimal solution was that for four days, we would shoot in the dead of night. Although this may be the optimal solution, we can’t separate it from human judgement. We can propose an option that saves time, but this is designed to work in concert with the creative teams, not to replace them. Ultimately, if we can develop an interactive tool, one that takes into account all of these things we’ve talked about and generates those candidate schedules for discussion, we can help our creative teams make better decisions each step of the way. This is an incredibly exciting space for us because not only is it new for us, it’s new for the industry at large. It’s exciting to think that data, in one way or another, gets to be part of that.

For more content like this, don’t miss the next Data Science Salon in LA, on November 7 2019.
