Challenges with productionizing machine learning use cases in organizations
At this point, we understand what a typical ML project life cycle looks like in an organization and the different personas involved in the ML process. The process looks intuitive, yet many enterprises still struggle to deliver business value from their data science projects.
In 2017, Gartner analyst Nick Heudecker estimated that 85% of data science projects fail. A report published by Dimensional Research (https://dimensionalresearch.com/) also found that only 4% of companies had succeeded in deploying ML use cases to production. A 2021 study by Rackspace Global Technologies found that only 20% of the 1,870 organizations it surveyed across various industries had mature AI and ML practices.
Sources
See the Further reading section for more details on these statistics.
Most enterprises face some common technical challenges in successfully delivering business value from data science projects:
- Unintended data silos and messy data: Data silos are collections of data in an organization that are governed by, and accessible only to, specific users or groups within the organization. Some silos exist for valid reasons, such as compliance with privacy regulations like the General Data Protection Regulation (GDPR) in Europe or the California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA). Such cases, however, are the exception rather than the norm. Gartner has stated that almost 87% of organizations have low analytics and business intelligence maturity, meaning that their data is not being fully utilized.
Data silos generally arise because different departments within an organization use different technology stacks to manage and process data.
The following figure highlights this challenge:
Figure 1.3 – The tools used by the different teams in an organization and the different silos
Each persona works with a different set of tools and a different work environment, shaped by their distinct roles and objectives. Data analysts rely on SQL, spreadsheets, and visualization tools for insights and reporting. Data engineers work with programming languages and platforms such as Apache Spark to build and manage data infrastructure. Data scientists use statistical programming languages, ML frameworks, and data visualization libraries to develop predictive models. ML engineers combine ML expertise with software engineering skills to deploy models into production systems. These divergent toolsets create challenges for data consistency, tool compatibility, and collaboration, and traditionally there has been little to no collaboration between these teams. As a result, a data science use case with validated business value may not be developed at the required pace, which hurts the growth and effective management of the business. Standardized processes and knowledge sharing can help mitigate these challenges and foster effective teamwork.
When data lakes emerged in the past decade, they promised a scalable, low-cost way to store both structured and unstructured data, with the goal of enabling effective, organization-wide use of and collaboration on data. In reality, most data lakes ended up as data swamps, with little to no governance over data quality.
This made ML inherently difficult, since an ML model is only as good as the data it’s trained on.
- Building and managing an effective ML production environment is challenging: The ML teams at Google have done extensive research on the technical challenges of setting up ML environments. A Google paper on hidden technical debt in ML systems, published at NeurIPS 2015 (https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf), documented that writing ML code is only a tiny piece of the whole ML development life cycle. To build an effective ML practice in an organization, many tools, configurations, and monitoring components need to be integrated into the overall architecture. One critical component is monitoring models for drift in performance, feeding that information back, and retraining when needed:
Figure 1.4 – Hidden Technical Debt in Machine Learning Systems, NeurIPS 2015
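To make the drift-monitoring component a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production monitor or a method prescribed by the sources above. It assumes you keep a baseline sample of one model input or model score from training time and collect a recent sample from production, and it compares the two with the Population Stability Index (PSI), one common drift statistic chosen here for illustration. The function names (`bin_edges`, `bin_proportions`, `psi`, `needs_retraining`) are hypothetical, and the 0.25 retraining threshold is an assumed value based on a widely used rule of thumb.

```python
import bisect
import math
import random
from typing import List, Sequence


def bin_edges(baseline: Sequence[float], n_bins: int = 10) -> List[float]:
    """Return the interior edges of equal-width bins spanning the baseline data."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1.0 if the baseline is constant
    return [lo + i * width for i in range(1, n_bins)]


def bin_proportions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Fraction of values per bin; values outside the baseline range land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor each proportion at eps so empty bins never cause log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index: sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = bin_edges(baseline, n_bins)
    expected = bin_proportions(baseline, edges)
    actual = bin_proportions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def needs_retraining(baseline: Sequence[float], current: Sequence[float],
                     threshold: float = 0.25) -> bool:
    """Hypothetical policy: flag the model for retraining when PSI exceeds the threshold."""
    return psi(baseline, current) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    training_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    stable_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
    drifted_scores = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean shifted by one std
    print(f"PSI, stable:  {psi(training_scores, stable_scores):.3f}")
    print(f"PSI, drifted: {psi(training_scores, drifted_scores):.3f}")
    print("Retrain?", needs_retraining(training_scores, drifted_scores))
```

A PSI near zero suggests that production data still resembles the training data, while a one-standard-deviation shift in the mean pushes the index well past the assumed threshold. In a real platform, the binning strategy, the threshold, and which features or scores to monitor would depend on the use case, and a flag like this would typically feed an alerting or retraining pipeline rather than trigger retraining directly.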
Let’s look at the requirements of an enterprise-grade ML platform in more detail.