Discovering the roles associated with machine learning projects in organizations
Typically, three personas are involved in developing an ML solution in an organization:
- Data engineers: Data engineers build data pipelines that ingest structured, semi-structured, and unstructured data from source systems into a data lake. Once the raw data lands in the data lake, data engineers are also responsible for storing it securely and for ensuring that it is reliable, clean, and easy for users across the organization to discover and use.
- Data scientists: Data scientists work with subject matter experts (SMEs) to understand business problems and to confirm that each project has a solid business justification. They take clean data from the data lake and perform feature engineering, selecting and transforming the relevant features. They then train multiple ML models with different sets of hyperparameters and compare them on held-out validation data to identify the best-performing model, reserving a separate test set for a final, unbiased estimate of that model's performance. Throughout this process, they work with SMEs to validate the models against business requirements, ensuring that the models align with business objectives and key performance indicators (KPIs). This iterative approach helps data scientists select a model that effectively solves the problem and meets the specified KPIs.
- Machine learning engineers: ML engineers deploy the models created by data scientists into production environments. It is crucial to establish procedures, governance, and access control early on, including defining which environments and datasets each data scientist can access. ML engineers also implement monitoring systems that track model performance and data drift. They enforce governance practices, track model lineage, and manage access control to maintain data security and compliance throughout the ML life cycle.
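The train-and-compare loop described for data scientists can be sketched in plain Python. This is an illustrative sketch rather than a prescribed method: it assumes a k-nearest-neighbors classifier, synthetic two-cluster data, and a hypothetical set of candidate `k` values (the hyperparameter being tuned). Each candidate model is scored on a validation split, and only the chosen model is scored once on a separate test split, so the final estimate is not biased by the selection step.

```python
import random
from collections import Counter

# Illustrative sketch: a k-nearest-neighbors classifier on synthetic 2-D data.
# The model, the data, and the candidate k values are assumptions for this example.


def make_data(n_per_class, seed=0):
    """Generate two shuffled Gaussian clusters labeled 0 and 1."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    order = list(range(len(points)))
    rng.shuffle(order)
    return [points[i] for i in order], [labels[i] for i in order]


def knn_predict(train_x, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - query[0]) ** 2 + (train_x[i][1] - query[1]) ** 2,
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation points the k-NN model labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Score one model per candidate k on validation data; return the best k and all scores."""
    scores = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # on ties, the first candidate listed wins
    return best_k, scores


x, y = make_data(150)                  # 300 points in total
train_x, train_y = x[:200], y[:200]    # used to fit (here: memorize) each model
val_x, val_y = x[200:250], y[200:250]  # used to compare hyperparameter settings
test_x, test_y = x[250:], y[250:]      # used once, for the chosen model only

best_k, val_scores = select_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5, 7, 9])
test_acc = accuracy(train_x, train_y, test_x, test_y, best_k)
print(f"validation accuracy by k: {val_scores}")
print(f"chosen k={best_k}, test accuracy={test_acc:.2f}")
```

The same pattern generalizes to real projects: swap in the actual model and a grid over several hyperparameters, keep the test split untouched until a single model has been chosen, and review the result against the agreed KPIs with the SMEs.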
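For the monitoring responsibility of ML engineers, the following sketch shows one common way to quantify data drift for a single numeric feature: the Population Stability Index (PSI), which compares a feature's binned distribution in production against a reference sample such as the training data. This is a swapped-in illustration, not a method the text prescribes. The ten quantile bins, the epsilon floor, and the 0.1/0.25 alert cutoffs are widely used rules of thumb assumed here, not fixed standards, and the `drift_level` helper is hypothetical.

```python
import math
import random
from bisect import bisect_right

# Illustrative sketch of one common drift metric, the Population Stability Index (PSI).
# Bin count, the epsilon floor, and the alert thresholds are assumptions, not standards.


def quantile_edges(reference, n_bins):
    """Interior bin edges that split the reference data into roughly equal-count bins."""
    ordered = sorted(reference)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def bin_fractions(values, edges, eps=1e-6):
    """Share of values per bin, floored at eps so the log term stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference, current, n_bins=10):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = quantile_edges(reference, n_bins)
    expected = bin_fractions(reference, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_level(score):
    """Hypothetical helper: map a PSI score to a label using rule-of-thumb cutoffs."""
    if score < 0.1:
        return "none"
    if score < 0.25:
        return "moderate"
    return "significant"


rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # e.g. a feature at training time
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # production data, same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]    # production data, mean has moved

for name, batch in (("stable", stable), ("shifted", shifted)):
    score = psi(reference, batch)
    print(f"{name}: PSI={score:.3f} -> {drift_level(score)}")
```

A monitoring job could run a check like this for each input feature on a schedule and raise an alert when the score crosses the chosen threshold, prompting the team to investigate or retrain.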
A typical ML project life cycle moves from data engineering to data science and, finally, to production deployment by the ML engineering team. The process is iterative rather than strictly linear: teams may return to earlier stages as a project evolves.
Now, let’s take a look at the various challenges involved in productionizing ML models.