DSS Blog

APPLICATIONS OF EMBEDDINGS AND DEEP LEARNING AT GROUPON

Written by Bojan Babick, Senior Software Engineer at Groupon | Apr 5, 2019 8:37:06 PM

Hey guys! Thank you all for coming today. I’ll be talking about the applications of deep learning at Groupon, focusing in particular on ranking, relevance, and search. I’m a senior software engineer and have been working at Groupon for over five years. I really like what I’m doing, and before getting into the talk, I want to say a little about the company I represent.

In case you are not familiar with Groupon, we are a dynamic marketplace connecting one million merchants with over 50 million paying customers. We have poured over 20 billion dollars into local commerce, and our mobile app is rated as one of the top 5 e-commerce apps of all time. As of last week, we reached over 200 million downloads of our mobile apps. The team I work on has the overarching goal of making Groupon a daily habit for our customers; we do that by growing the customer base and increasing customer lifetime value to increase profitability. But above all, we try to make Groupon a great place to work.

Today’s talk is split into four avenues. I will talk about applications within search and query understanding, followed by how we deploy applications in browse, then our image detection and image propensity-to-purchase features in the home feed, and the last bit will cover similarity and how we apply it across the different contexts our customers might land on. We are particularly focused on post-purchase and collaborative filtering.

Search is intent-based. It’s the place where customers come to express their intents, and those intents need to be well understood. Query understanding is the piece where we try to extract the different bits of what customers want to express. We combine that with known features about our customers, such as the platform they use or the time of day or week they visit on mobile or web. We try to turn all of that into reasonable recall, but it’s not always easy.

We have multiple ways of extracting or expressing one’s intent. But how do we make sure we always return relevant results for each particular query? We are great at head queries but not as good with tail queries. So we came up with a project to predict query similarity based on the footprints users leave behind. We consider where the similarities between users are, specifically whether they share similar terms. We start from a node in our query graph and do a random walk; in aggregate, the data shows us the most probable node to land on given a random walk from the starting node. A problem with this approach is that there is no guarantee that similar queries will yield the same results, so we decided to build another strategy.
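To make that concrete, here is a minimal sketch of the random-walk idea in plain Python. The graph, queries, and users are all made up for illustration; a production graph would be built from search logs at much larger scale.

```python
import random
from collections import Counter

# Hypothetical bipartite graph: queries <-> users who issued them.
query_to_users = {
    "ps4": ["u1", "u2"],
    "playstation 4": ["u2", "u3"],
    "spa day": ["u4"],
}
user_to_queries = {
    "u1": ["ps4"],
    "u2": ["ps4", "playstation 4"],
    "u3": ["playstation 4"],
    "u4": ["spa day"],
}

def related_queries(start, walks=1000, steps=3):
    """Run short random walks from `start` and count the query
    nodes we land on; frequent nodes are likely related queries."""
    visits = Counter()
    for _ in range(walks):
        query = start
        for _ in range(steps):
            user = random.choice(query_to_users[query])
            query = random.choice(user_to_queries[user])
            if query != start:
                visits[query] += 1
    return visits.most_common()

print(related_queries("ps4"))  # e.g. [('playstation 4', ...)]
```

In aggregate, queries that the walks visit most often from a given starting query are treated as its most probable neighbors.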

Instead of only learning word2vec embeddings, we are now also learning document embeddings, and that turns out to be very powerful. We’re able to extract very similar queries given the original query as input: if you think about “PlayStation,” “PS4,” and “PlayStation 4,” they’re all similar queries for Sony PlayStation. We have several versions of a model that recognize more and more queries, but how do we evaluate whether the next version is as good as or better than the previous one? We came up with two approaches. One uses the property of word2vec that real-world relationships are preserved in the embedding space; the drawback is that we would need to maintain a handcrafted list of analogies, and to solve our problems we need to keep scale in mind. So instead we use an extrinsic model to evaluate our embeddings: we designate a model for evaluation purposes that we will never use in production and have it predict the query category given the query embeddings. That serves as the final evaluation of our models.
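As a rough sketch of what this can look like, here is a toy version using gensim’s Doc2Vec for the query embeddings and a throwaway scikit-learn classifier for the extrinsic evaluation. The queries, category labels, and hyperparameters are all assumptions for illustration, not our production setup.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Toy corpus: each "document" is a query, tagged with its own id.
queries = ["playstation 4", "ps4 console", "sony playstation",
           "massage spa", "couples massage"]
categories = ["electronics", "electronics", "electronics",
              "beauty", "beauty"]  # hypothetical labels

docs = [TaggedDocument(q.split(), [i]) for i, q in enumerate(queries)]
model = Doc2Vec(docs, vector_size=32, min_count=1, epochs=100)

# Intrinsic check: nearest queries in the embedding space.
print(model.dv.most_similar(0, topn=2))

# Extrinsic check: a throwaway classifier predicting query category
# from the embedding; its accuracy scores this embedding version.
X = [model.dv[i] for i in range(len(queries))]
clf = LogisticRegression(max_iter=1000).fit(X, categories)
print(clf.score(X, categories))
```

The evaluation classifier never ships to production; its accuracy is only a yardstick for comparing one embedding version against the next.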

Search is different from browse in that search is high-intent, and the results we return need to be very well targeted. Browse, on the other hand, can help customers find something they haven’t seen before, and we have a huge set of features that we use for browse but not for search; some features that are important for search do not have the same importance for browse. One of the most important features for browse is the deal category, and we have fairly complex taxonomies for this: over five thousand leaf nodes in taxonomies up to six levels deep. I say taxonomies because we have not only one taxonomy but several. We have a customer-facing taxonomy and third-party taxonomies, as well as taxonomies from our partners. Whether a restaurant is a good place to work and whether it has Wi-Fi or not are pieces of information we get from our partners. All these nodes are interconnected so that deals can be easily translated from one taxonomy to another.

 

See talks like this in person at our next Data Science Salon: Applying AI and Machine Learning to Retail and E-commerce, in Seattle.

 

However, we have recently had to put restrictions on our partners’ capabilities to prevent customer dissatisfaction. Sometimes merchants may try to spam us by tagging deals with incompatible categories, so we decided to tackle this problem by not letting them do it anymore. Also, if you let merchants tag deals themselves, it is important to remember that they may not be aware of all the possibilities and all the right nodes within the taxonomy.

Our goal is to come up with an algorithm that can infer the deal category given the deal content. We take the content, the highlights, and the description of a deal and try to predict its category. We tried multiple approaches. We started with the classic k-means clustering approach, where we provide “n” clusters and expect similar deals to move closer and closer together after each iteration. We also tried shallow machine learning approaches, but very soon decided to move away from them. What worked for us are deep neural networks, in particular the ones we now use in production for predicting the deal category. Using convolutions with certain strides, we’re able to detect the features that are important for any given context, focusing on only a small region within the input sequence. But knowledge of the sequence as a whole is important too, so the output of each recurrent cell is fed into the next cell, and the last cell carries the combined knowledge of the whole sequence. So far, we’ve reached 94% accuracy on the training samples and 89% accuracy on our evaluation set.
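For illustration, here is a minimal Keras sketch of that kind of architecture: a strided convolution that focuses on small regions of the input, feeding a recurrent layer whose final state summarizes the whole sequence. The vocabulary size, sequence length, and layer sizes are assumptions, not our production settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, SEQ_LEN, NUM_CATEGORIES = 20000, 200, 50  # assumed sizes

model = tf.keras.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    # Strided 1-D convolution: each filter focuses on a small
    # region of the input sequence.
    layers.Conv1D(64, kernel_size=5, strides=2, activation="relu"),
    # Each recurrent cell feeds its output into the next; the final
    # state carries the combined knowledge of the whole sequence.
    layers.LSTM(64),
    layers.Dense(NUM_CATEGORIES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The input would be the tokenized deal content, highlights, and description; the output is a probability over category leaves.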

The next avenue I’d like to focus on is the home feed and, in particular, the images on it. Images take up a fairly big chunk of your screen whenever you visit the Groupon website, so we borrowed this idea from Netflix, who present different posters for the same movie to different customers based on their personas. We know that good images mean a deal will sell, but how do we consistently create good images? The thing is, there is no clear answer, because different people will always prefer different images. We streamline this by using neural networks to extract the features of an image and then adding a fully connected layer at the end. We replace the image with its vector representation and classify whether the customer made the purchase or not.
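A minimal sketch of that setup, assuming a pretrained ResNet50 backbone as the feature extractor (the talk does not name the actual network we use):

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50

# Pretrained CNN as a fixed feature extractor, plus a fully
# connected head that predicts purchase vs. no purchase.
backbone = ResNet50(include_top=False, pooling="avg",
                    input_shape=(224, 224, 3), weights="imagenet")
backbone.trainable = False  # keep the pretrained features frozen

model = tf.keras.Sequential([
    backbone,                               # image -> feature vector
    layers.Dense(256, activation="relu"),   # fully connected layer
    layers.Dense(1, activation="sigmoid"),  # purchased or not
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

The frozen backbone turns each image into a vector representation, and the fully connected head learns purchase propensity from it.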

There are multiple ways to build similarity beyond looking at the content. We can tackle similarity by looking at the behavior of customers who purchased deals together. Just as each word is defined by its context, we can treat the sequence of deals a customer has seen in a session before purchasing as a sentence. There is the start of the session, the customer’s first page, the second page, and all the other pages they click on. There might be some social interaction: people share the deal on social media and then purchase it at the end, and that purchase is the end of our sentence. We represent each deal as its own word and train our model to recognize it. The hidden layer of this network is what we use for recommendation purposes: the algorithm produces the “k”-nearest deals in the embedding space, and those are what we recommend to our customers in the post-purchase context.
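Here is a toy version of that idea using gensim’s word2vec, with sessions of made-up deal IDs standing in for real logged sessions:

```python
from gensim.models import Word2Vec

# Each user session is a "sentence" and each deal id is a "word".
sessions = [
    ["deal_12", "deal_7", "deal_33", "deal_7"],
    ["deal_7", "deal_33", "deal_91"],
    ["deal_5", "deal_44"],
]

# Skip-gram word2vec; the hidden layer becomes the deal embedding.
model = Word2Vec(sessions, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

# k-nearest deals in the embedding space = post-purchase recs.
print(model.wv.most_similar("deal_7", topn=2))
```

Deals that co-occur in sessions end up close together in the embedding space, so nearest neighbors make natural post-purchase recommendations.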

So as you can see, there are multiple applications of deep learning within Groupon’s strategy. What we realized from working on this project is that we don’t have to constrain ourselves to words. Words are only significant in how they represent deals, and we can create embeddings for entities in general. We can even create frameworks through which merchants can create their own entities, and we can apply the same approach to customer service, using the transcripts of customers’ calls.

We came up with the idea of working on an entity embedding framework that can produce embeddings from any relevant data set. After an embedding has been properly trained, we can use offline validation techniques to validate it. The hope is that we can then use these embeddings in predictions, so we can run inference both in real time and offline. What happens with this approach is that we are able to seamlessly deploy embeddings to production and learn in real time, replacing the current representations with relearned ones. This offloads the heavy work from our online and offline data processing, and we’re able to capture lots of user interactions that cannot be captured by merely looking at handcrafted features.
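As an illustration of the deployment side, here is a minimal sketch of an embedding store that can hot-swap relearned vectors without redeploying the serving model. The class and its interface are hypothetical, not our actual framework.

```python
import threading
import numpy as np

class EmbeddingStore:
    """Minimal sketch: serve entity embeddings and allow them to be
    replaced with relearned versions while serving continues."""

    def __init__(self, vectors):
        self._vectors = vectors  # entity id -> np.ndarray
        self._lock = threading.Lock()

    def lookup(self, entity_id):
        with self._lock:
            return self._vectors[entity_id]

    def swap(self, new_vectors):
        # Atomically replace the current representations after a
        # retrained embedding passes offline validation.
        with self._lock:
            self._vectors = new_vectors

store = EmbeddingStore({"deal_7": np.zeros(32)})
store.swap({"deal_7": np.ones(32)})  # relearned embedding deployed
print(store.lookup("deal_7")[:4])
```

The point of the design is that downstream models consume vectors by ID, so retrained embeddings can be rolled in without touching the models themselves.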

There are many latent features that our customers express by interacting with our website. We use embeddings to reduce the number of features we need in production, so we can serve smaller models and achieve higher throughput, which lowers networking costs overall. We are happy to be moving from classical machine learning and feature engineering towards embeddings.

To conclude, this is the direction the industry is going, and other organizations within our company are reaching out to us to see how they can get on board with these new applications. Even the mobile team recently asked us to help them build a new method for extracting user information from credit cards. In that case, we’re trying to achieve both automatic feature extraction and automated parameter tuning.

If you think about the journey we went on, from rule-based systems to classical machine learning models with hand-designed features to representation learning and deep learning, we’re basically replacing all classical features with embeddings, and that is the roadmap for our team for the next couple of quarters.

 

For more content like this, don’t miss the next Data Science Salon in Seattle, on October 17, 2019.