During our recent Data Science Salon Virtual conference for the media, advertising and entertainment industries, we were joined by Anne Bauer from The New York Times. She gave one of our most viewed talks, on how The New York Times experiments with personalized recommendations, and we received great questions during her live Q&A. To view Anne’s full talk and talks from all past Data Science Salon thought leaders, subscribe to our DSSInsider program.
Q: How is your team organized around recommendations?
Anne: Our recommendations team is made up of engineering, data science and data analytics. We all work together very closely, and there are blurry regions of work that can be done by different roles on the team. Typically, engineers focus on building out new capabilities in our infrastructure, where they need to make architectural design decisions. Data scientists focus on developing algorithms, which they then apply, alone or in combination, as part of experiments in various parts of the site, apps or email. Data analysts track how our experiments are going and work with data scientists on what that means for next steps in algo development. They also make sure that the necessary data for experimentation is available and of good quality. Everybody collaborates with people on other teams, such as the “For You” tab or the cooking team.
Q: How do you handle the cold-start issue in your collaborative filtering implementation?
Anne: We haven't solved the cold start problem. For that reason, if we are recommending content in a place where there's a lot of article turnover and a big cold start problem, we might use a different algorithm (like content similarity).
We have also used collaborative topic modeling as a way to take advantage of collaborative filtering info while being less sensitive to the cold start problem.
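To make the content-similarity fallback concrete, here is a minimal sketch, not the Times’ implementation: it represents articles with TF-IDF vectors and recommends the nearest items by cosine similarity, which works even for a brand-new article with no interaction history. The article IDs and texts are invented for illustration.

```python
# Illustrative cold-start fallback: recommend by article content similarity
# rather than by user interaction history. Article IDs and texts are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "climate-bill": "Senate passes new climate bill after lengthy debate",
    "climate-house": "New climate legislation heads to the House floor",
    "pasta-recipe": "Recipe: a weeknight pasta with garlic and olive oil",
    "weeknight-cooking": "How to make the most of your weeknight cooking routine",
}

ids = list(articles)
vectors = TfidfVectorizer(stop_words="english").fit_transform([articles[i] for i in ids])
similarity = cosine_similarity(vectors)  # item-item similarity matrix

def similar_articles(article_id, k=2):
    """Return the k most similar articles by text content alone.

    This works for a brand-new article with no click history, which is
    exactly the cold-start case that collaborative filtering struggles with.
    """
    idx = ids.index(article_id)
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    return [ids[j] for j in ranked if j != idx][:k]

print(similar_articles("climate-bill"))  # likely ['climate-house', ...]
```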
Q: Does the NYT use machine learning for tracking and predicting user churn/turnover or retention with various incentives?
Anne: The data science and data analytics teams at The New York Times do work on problems like churn. I'm not up to date on those projects, though! We collaborate across the company on questions like acquisition and retention, advertising, etc.
Q: How do you approach testing to maximize speed and minimize error? For example, standard p-values, sequential testing or Bayesian approaches?
Anne: Our main approach is to run experiments in production and directly measure the change in engagement metrics (for example, click-through rate).
We have a variety of engagement metrics that we look at. We also do offline testing to make sure we don't launch a test of a bad-quality algorithm, but we rely on the live testing to really evaluate our techniques.
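As a rough illustration of the “measure the change in engagement metrics” step, the sketch below compares click-through rate between a control and a test variant using a standard two-proportion z-test. The counts are invented, and this is just one of the approaches the question mentions, not necessarily how the team analyzes its experiments.

```python
# Illustrative only: comparing click-through rate (CTR) between two live
# variants with a two-proportion z-test. The counts below are made up.
from statsmodels.stats.proportion import proportions_ztest

clicks = [1450, 1610]        # clicks for control and variant
impressions = [50000, 50000]  # impressions for control and variant

ctr_control = clicks[0] / impressions[0]
ctr_variant = clicks[1] / impressions[1]

# Two-sided test of whether the two CTRs differ.
z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions)

print(f"control CTR = {ctr_control:.4f}, variant CTR = {ctr_variant:.4f}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```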
Q: Why are contextual bandits your favorite algorithm for personalization? Do you run into the pitfall of just optimizing for clicks and then losing the content context of the recommendations?
Anne: Contextual bandits can include context on the content, but ultimately you're right, they will focus on improving click-through rate, whether or not it ends up being a personalized recommendation. We do make sure that we're using appropriate approaches in each location where we're serving content. So, if a part of the product is telling the user that the content there is really personalized to them, then we wouldn't rely on an algorithm there that is essentially unpersonalized.
This is an important part of our collaboration with the product groups and the newsroom. We need to make sure that our recommendations are really staying true to the intent of the different parts of the NYT’s site and apps. We make sure to choose approaches, and draw from lists of candidate articles, that are really crafted for the place where the recs will show up.
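For readers who want a concrete picture of a contextual bandit, here is a minimal LinUCB-style sketch in the spirit of Li et al. (2010): each candidate article is an arm, the context vector would encode user and article features, and the reward is a click. The class, feature construction and usage here are hypothetical, not the Times’ implementation; see Anna Coenen’s blog post below for how the NYT actually frames the problem.

```python
# Minimal LinUCB-style contextual bandit (disjoint models), for illustration only.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        # One ridge-regression model per arm: A accumulates x x^T (plus identity),
        # b accumulates reward-weighted contexts.
        self.A = [np.eye(n_features) for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]

    def choose(self, context):
        """Pick the arm with the highest upper confidence bound for this context."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                  # estimated coefficients
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)  # exploration bonus
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Update the chosen arm's model with the observed reward (e.g. click = 1)."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Hypothetical usage: 3 candidate articles, 5-dimensional user/article context.
bandit = LinUCB(n_arms=3, n_features=5)
context = np.random.rand(5)
arm = bandit.choose(context)
bandit.update(arm, context, reward=1.0)  # the user clicked the recommended article
```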
You can read more about contextual bandits in a blog post by Anna Coenen.