Data management considerations for ML
Data management is a broad and complex topic. Many organizations have dedicated data management teams to manage and govern the various aspects of a data platform. Traditionally, data management has focused mainly on meeting the needs of transactional systems and analytics systems. With the growing adoption of ML solutions, data management platforms now face new business and technology considerations.
To understand where data management intersects with the ML workflow, let's revisit the ML life cycle, as shown in the following figure:
At a high level, data management intersects with the ML life cycle in three stages: data understanding and preparation, model training and evaluation, and model deployment.
During the data understanding and preparation stage, data scientists will need to identify data sources...