Overview of MLflow
The ML life cycle is complex. It starts with ingesting raw data from various batch and streaming sources into the data lake or Delta Lake. Data engineers build data pipelines using tools such as Apache Spark with Python, R, SQL, or Scala to process large volumes of data in a scalable, performant, and cost-effective manner.
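As an illustration, the following is a minimal sketch of such an ingestion pipeline in PySpark. The paths, column names, and use of Delta as the output format are assumptions made for the example (writing Delta also assumes the Delta Lake package is configured on the cluster):

```python
# A minimal sketch of a batch ingestion step with Apache Spark (PySpark).
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest-raw-events").getOrCreate()

# Read raw JSON files landed by an upstream batch source.
raw = spark.read.json("/lake/raw/events/")

# Light cleanup: drop rows missing a key and normalize a timestamp column.
curated = (
    raw.dropna(subset=["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
)

# Persist to the curated zone in Delta format for downstream consumers
# (assumes the Delta Lake package is available on the cluster).
curated.write.format("delta").mode("overwrite").save("/lake/curated/events/")
```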
Data scientists then use the curated datasets in the data lake to build feature tables for training their ML models. They typically prefer languages such as Python and R for feature engineering, and libraries such as scikit-learn, pandas, NumPy, and PyTorch, along with other popular ML and deep learning libraries, for training and tuning models.
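To make this step concrete, here is a minimal sketch of training a scikit-learn model and recording it with MLflow tracking. The dataset, model type, and hyperparameters are illustrative choices, not prescriptions from the text:

```python
# A minimal sketch: train a scikit-learn model and log it with MLflow.
# Dataset and hyperparameters are illustrative assumptions.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, max_depth=6)
    model.fit(X_train, y_train)

    # Record the parameters, a holdout metric, and the model artifact.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("r2", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")
```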
Once the models have been trained, they need to be deployed to production, either as a representational state transfer (REST) application programming interface (API) for real-time inference, or as a user-defined function (UDF) for batch and streaming inference on Apache Spark.
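As a sketch of the batch path, a logged MLflow model can be wrapped as a Spark UDF and applied to a feature table. The model URI placeholder and table paths below are hypothetical:

```python
# A minimal sketch of batch inference: apply a logged MLflow model as a
# Spark UDF. The model URI and table paths are hypothetical.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-inference").getOrCreate()

# Wrap a previously logged MLflow model as a Spark UDF.
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="runs:/<run_id>/model")

features = spark.read.format("delta").load("/lake/features/events/")
scored = features.withColumn(
    "prediction",
    predict_udf(*features.columns),  # apply the model to the feature columns
)
scored.write.format("delta").mode("overwrite").save("/lake/predictions/events/")
```

For the real-time path, the same logged model can be hosted behind a REST endpoint, for example with the mlflow models serve CLI or a managed model-serving platform.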