With the recent advancements in machine learning (ML), organizations can use algorithms to positively influence every stage of the business growth cycle - from customer acquisition, to activation, retention and referral.
However, an ML system applied in one stage of the cycle has an impact on the performance of another stage. According to Sayan Maity, Senior Research Data Scientist at Roku, the recipe for success with ML is to maintain harmony between all stages.
At the latest edition of DSS Virtual for Retail & e-Commerce, Sayan gave us a deeper understanding of how to apply ML across the different stages of the business growth cycle and what an ML system for each stage can look like. We sat down with him in a DSS Break session after the event and asked him what advice he has for organizations that want to grow fast using ML.
Sayan: When I first got the invitation for this talk, I realized I am not directly working in customer relationship management or the retail business anymore. However, I realized it was a good opportunity for me to look back at all the different projects I have worked on, what I have learned, and what I feel would be of interest to share with a bigger audience. I had the opportunity to sit down and think about how we can bring together each and every component where machine learning is acting on the business growth cycle. I also thought about how each component might have an intrinsic influence on the others, and how we can define that influence to make sure every component is in harmony with the bigger picture. That is what I talked about yesterday.
Sayan: Absolutely! Every industry is in a different phase of growth, right? While some industries are still maturing, some retail markets are already popular. However, it is important to have an environment where team members who design machine learning algorithms for a certain component are tightly aligned but loosely coupled.
When you're using a new algorithm or data from a new demographic, you have to make sure you are aware of how it's impacting the entire end-to-end business chain. I think that's one way most of the industry has adapted over time. You need to give teams the freedom to be creative. However, when the end product is getting incorporated into the business, you have to be mindful of the changes.
"When you're using a new algorithm or new data source, you have to make sure you are aware of how it's impacting the entire end-to-end business chain."
We use a lot of A/B test frameworks before a certain feature goes into the final product. I believe that is how we can make sure that, before a new feature or product hits a new market, we have made all the checks and balances using these controlled experiments, so that there is no negative effect on the bigger goal.
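To make the idea of an A/B check concrete, here is a minimal sketch of one common way to compare a control group against a group that sees a new feature: a two-proportion z-test on conversion rates. This is an illustration only, not the framework Sayan's team uses. The function name, the sample counts, and the 0.05 significance threshold are all assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Compare two conversion rates; return (z statistic, two-sided p-value).

    Hypothetical helper for illustration: a positive z means the treatment
    group converted at a higher rate than the control group.
    """
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment

    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))

    z = (rate_treatment - rate_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 1,000 users per group, 100 vs. 150 conversions.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
ALPHA = 0.05  # assumed significance threshold
if p < ALPHA and z < 0:
    print("New feature appears to hurt conversion; hold the rollout.")
elif p < ALPHA:
    print("New feature appears to improve conversion.")
else:
    print("No significant difference detected.")
```

In practice the same comparison would be run on the metrics of every stage the feature could touch, not only the one it was designed for, which is the kind of cross-stage check described above.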
Sayan: There is no specific part of the cycle that cannot be addressed with machine learning. ML algorithms are applicable to all problems as long as we're able to define them. The core requirement for a successful algorithm is to have enough data and to ask the right question that we are trying to answer by using that data.
But if we look at the entire cycle, the first hurdle that a startup faces is establishing brand perception among users. Users need to be aware of the brand, and it should have a positive value for them. I believe that the problem in this first stage of building awareness is very open-ended. We don't necessarily know the specific requirements of the product.
"We have to move fast and fail fast to be able to identify the right features and the correct market for a product."
This isn't a hard problem to solve, but it is an interesting one. There are a lot of open-ended questions that can be answered. And that's the area where we have to move fast and fail fast to identify the right features and the correct market for a product, so that we can identify the need and solve the problem. It is not difficult, but it is challenging and more interesting for an ML practitioner.
Sayan: As we know, the tech industry is very fast-paced and things change over time. For example, there was a time when TensorFlow was the go-to library when you wanted to build something, but now it is PyTorch. It's very hard to have favorites because needs are always growing and changing, so my list will never stay the same for too long. However, generally speaking, Python is a tool every data science practitioner uses. In day-to-day business, we don't have the liberty of running an experiment on just thousands of data points; we have to do the experiment on the fly, so you have to use billions of data points. So for me, the go-to tool is the Spark environment.
Spark allows us to work with different kinds of APIs: Python, Scala, R. So if I have to choose something, I would use the Spark environment. There are also a lot of cool libraries that allow us to read data quickly from flagged files, and based on that we apply different kinds of algorithms, not necessarily to make the final product, but for all the experimentation beforehand that will help the product itself.
Sayan: As always, there is a tradeoff between the personal information that can be used in the ad industry and each individual's choice to opt in or out. The tradeoff means that if we don't know specific information about what someone likes, it is very hard to show them the correct advertisement.
Given all the changes that have come through GDPR and, very recently, CCPA, all these norms, we are finding that things are going to be cookieless. For example, Apple's OS is going to block specific identifiers, not necessarily at the user level but at the device and browser level. Those IDs that are being used in advertising are going away pretty soon. So, how are we going to be able to attribute and target the proper demographic for a certain product?
"If we don't know specific information about what someone likes, it is very hard to show them the correct advertisement."
As I mentioned earlier, our goal is to abide by every norm that will keep personal information private, but how can we come up with identifiers that include no personal information? This is called deduplicating with a certain identifier that has nothing to do with the device, individuals or routers. So we have to find a way to remove all these personalized aspects from identifiers and then use those identifiers as a way to attribute the ads.
Check out Sayan Maity's presentation about Applying Holistic ML to Solve Industry Business Problems, in which he discusses how cutting-edge machine learning techniques can be used to solve the core business needs of an expanding customer base without impacting the brand perception, while simultaneously minimizing fraud. His talk is available on-demand here.