Machine Learning Projects: The Lab-to-Live Journey

By Data Science Salon

AI is everywhere, and its adoption has accelerated sharply since the wave of digitalization ushered in by the COVID period. But the shine wears off quickly when ambitious AI/ML projects fail to live up to their potential and end up under-performing.

According to Gartner's research, most ML projects never make it into production. PoCs often remain attempts (read: experiments) at solving a business problem and never step out of the research environment to go live. Such experiments are essential given the iterative nature of ML research, but only a few make it through the funnel.

There are many roadblocks on the journey from PoC to production. Some are inevitable given the experimental nature of the work, while others can be ironed out with smart planning at the outset.

In this article, we will focus on how better planning and thorough requirement analysis can save businesses time and money on ML projects.

Any new project should answer the following questions:

  • Does it enhance customer experience? 
  • Does it add value and justify the return on investment (ROI)?
  • Will it improve current processes, bring efficiencies, and yield cost-benefit?

Based on these criteria, the business prioritizes a project and commissions a feasibility analysis.

Understanding the proof of concept and how to take it out of the lab

A proof of concept (PoC) involves quickly building a model to assess whether the proposed solution solves the business problem and can be moved to production.

Once the PoC is successful, the hard road to productionizing the model begins, and the following questions need to be answered:

  • Fallback: What if the model does not work as expected? What is the fallback plan?
  • Robustness: Data lies at the core of ML models and is dynamic by nature. Will the deployed model be robust to changing data characteristics?
  • Data volume: “The more data, the better” is an over-generalization; it does not hold when the volume of data is large but its quality is poor. The data must carry a good-quality signal for learning statistical associations.
  • Data quality: There are no golden rules for defining good-quality, consistent data; it differs with each dataset and requires careful investigation during exploratory data analysis. The following pointers, however, are broadly applicable and should be checked at a minimum to rule out data discrepancies (a minimal sketch follows this list):
    • Missing values in critical attributes
    • Outliers or anomalous behavior
    • A certain category of entities missing altogether – the entity could be a group of people, items, regions, etc.
    • Irregularities introduced when integrating data from multiple sources
  • Model deployment: How will the model be deployed, which tech stack will be used, and does it integrate well with the organization’s current infrastructure?
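
To make the data-quality pointers concrete, here is a minimal sketch in Python, assuming a pandas DataFrame with a hypothetical numeric column "amount" and a categorical column "region"; the column names and the 1.5 × IQR outlier rule are illustrative assumptions, not universal thresholds.

```python
import pandas as pd

def basic_quality_report(df: pd.DataFrame, critical_cols: list, expected_regions: set) -> dict:
    report = {}

    # Missing values in critical attributes
    report["missing"] = df[critical_cols].isna().sum().to_dict()

    # Outliers in a numeric column, flagged with the 1.5 * IQR rule
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)
    report["outlier_rows"] = int(outliers.sum())

    # Entire categories missing altogether, e.g. a region with no records
    report["absent_regions"] = sorted(expected_regions - set(df["region"]))

    return report
```

Integration irregularities from multiple sources are harder to genericize; they typically surface as duplicate keys or conflicting records during joins and deserve their own checks.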


Guidelines for moving the model swiftly from the nascent research stage to production

    • Working in silos: Data is everywhere and often not centralized. Getting it out of silos and ready for model building means a long wait for approvals, and this wait cycle is detrimental to the “try fast, fail fast” philosophy. Organizations need to experiment in an agile way and learn from failures to keep pace in a competitive landscape and gain a first-mover advantage.
    • Simple or complex: Committing to a moon-shot, path-breaking solution for a complex business problem is tempting at the onset, but without a reality check it merely passes the pressure of delivery onto the developers and can lead to a sub-optimal model. Be very cautious about what you commit to delivering. A recommended approach is to start with simple business problems and operationalize their ML solutions to earn the client's trust. This is not to say that complex problems should never be attempted, but over-commitment given the available resources (data, infrastructure, team skills) and the problem's own stochastic nature can hamper the expected outcome.
    • One vs many: An organization working on multiple projects in parallel is in a much better place than one reliant on a single high-stakes project. In the worst case, if a project does not add the expected business value, the team can still learn what did not work before shutting it down and apply those lessons to the other projects in the pipeline. The old-school wisdom that “diversification reduces risk” is equally valid in project management.
    • DS or MLOps: The skills required to build an ML model are quite different from the skills, tools, and technology needed to take it to scale. This skill gap has given birth to a new role, the MLOps engineer, who takes the model from the data scientists and owns the responsibility of not just deploying it but also monitoring and maintaining it (a drift-monitoring sketch follows this list). The MLOps engineer collaborates with the data science, data engineering, and cloud infrastructure teams to deploy and operationalize ML models.
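
As a flavor of what post-deployment monitoring can look like, here is a minimal drift check, assuming `reference` holds a feature's values from training time and `live` the same feature's values from production (both names are hypothetical); it uses SciPy's two-sample Kolmogorov-Smirnov test on a single numeric feature.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift in one numeric feature with a two-sample KS test."""
    # A small p-value suggests the live distribution has shifted away
    # from the distribution the model was trained on.
    _statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

# Illustrative usage with simulated data: the live feature has drifted.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)
print(feature_drifted(reference, live))  # True: the mean shifted by 0.5
```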

Speed up your production journey by working on these areas

    • ML-aware stakeholders: Make sure the client is ML-aware and understands what is ML-solvable and what is not. It can happen that a probabilistic solution is simply not acceptable to the business. Do not assume they know the pros and cons of ML solutions; an honest disclaimer will set clear expectations at the start and shape the future of the project.
    • Trust: Building and shipping ML solutions is incomplete if the client cannot trust the ML outcome. The client may ask questions such as: why is the model giving a certain output, what change in the input would have flipped the prediction, what degree of confidence should I place in the outcome, and when does the model fail? Model explainability plays a crucial role in productionizing an ML model and building the trust the client needs to act on its output (an explainability sketch follows this list).
    • Data availability: If you are building a product with no client on board, chances are you will not have data to prove the concept. You will need to find the closest matching data in the public domain or, at worst, simulate your own data to demonstrate the business value before a client commits to onboarding and sharing theirs. Be ready to start from scratch by curating an appropriate dataset (a data-simulation sketch follows this list).
    • Collaboration: Data science is teamwork. It brings together product managers, data analysts, data labelers, data scientists, solution architects, engineers, and infrastructure teams to build a successful, operationalized ML model. Everyone needs to own their work and communicate effectively to iron out cross-team issues.
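
For the trust questions above, the shap library is one widely used option for explainability. Below is a minimal sketch; the toy dataset and random-forest model stand in for a real client problem, and `shap.Explainer` picks a suitable explanation algorithm for the model type.

```python
import pandas as pd
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data and model, purely illustrative.
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Local view: why did the model produce this particular output?
shap.plots.waterfall(shap_values[0])

# Global view: which features drive predictions across the dataset?
shap.plots.beeswarm(shap_values)
```

The waterfall plot answers the per-prediction "why this output" question, while the beeswarm plot shows which features matter most overall.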
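And on the data-availability point, scikit-learn's dataset generators make it quick to simulate a stand-in dataset; every parameter value below is illustrative, and the closer the simulated characteristics (imbalance, noise, feature count) are to the client data you expect, the more convincing the proof of concept.

```python
from sklearn.datasets import make_classification

# Simulate a stand-in classification dataset until real client data arrives.
X, y = make_classification(
    n_samples=10_000,    # rows to simulate
    n_features=20,       # total features
    n_informative=5,     # features carrying real signal
    n_redundant=2,       # linear combinations of the informative ones
    class_sep=0.8,       # lower values make the classes harder to separate
    weights=[0.9, 0.1],  # mimic the class imbalance expected in production
    random_state=42,
)
```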

No ML project is a doddle that can be committed to with great confidence, which is why we have shared these tips to make your ML projects' lab-to-live journey as hassle-free as possible.
