Data management architecture for ML
The data management architecture pattern you choose should depend on the scope of your ML initiatives.
For small-scale ML projects with limited data scope, team size, and cross-functional dependencies, consider purpose-built data pipelines that meet the project's specific needs. For example, suppose you only need structured data from an existing data warehouse and a dataset from the public domain. In that case, consider building a simple data pipeline that extracts the required data from the data warehouse and the public source into a storage location owned by the project team. The pipeline runs on an as-needed schedule, and the extracted data is then available for further analysis and processing. The following figure shows a simple data management flow to support a small-scope ML project:
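As a rough companion to the figure, the same small-scope flow can be sketched in code. This is a minimal illustration under stated assumptions, not a recommended implementation: a SQLite connection stands in for the data warehouse, a local CSV file stands in for the public-domain dataset, and the names `extract_warehouse_table`, `copy_public_dataset`, `run_pipeline`, `warehouse_extract.csv`, and `public_dataset.csv` are all hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def extract_warehouse_table(conn: sqlite3.Connection, query: str, dest: Path) -> int:
    """Run a query against the warehouse stand-in and write the result to CSV.

    Returns the number of data rows written (header excluded).
    """
    cursor = conn.execute(query)
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with dest.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def copy_public_dataset(src: Path, dest: Path) -> None:
    """Copy the public dataset, byte for byte, into project-owned storage."""
    dest.write_bytes(src.read_bytes())


def run_pipeline(
    conn: sqlite3.Connection, query: str, public_src: Path, project_dir: Path
) -> int:
    """Land both sources in a directory owned by the project team.

    Assumption: scheduling is handled outside this function, for example by
    running the script manually or from a scheduler whenever data is needed.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    row_count = extract_warehouse_table(
        conn, query, project_dir / "warehouse_extract.csv"
    )
    copy_public_dataset(public_src, project_dir / "public_dataset.csv")
    return row_count
```

In a real project, the connection would come from the warehouse's own client library and the public dataset would typically be downloaded rather than copied from a local path. The shape of the flow stays the same: two extract steps that land data in one project-owned location.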
For large, enterprise-wide ML initiatives, the data...