Docker gets us quite a bit of the way toward deploying our machine learning workflows on our company's infrastructure. However, there are still a few missing pieces, as outlined here:
- How do we string the various stages of our workflow together (see the sketch after this list)? In this simple example, we have a training stage and a prediction stage. In other pipelines, you might also have data preprocessing, data splitting, data combining, visualization, evaluation, and so on.
- How do we get the right data to the right stages of our workflow, especially as we receive new data and/or our data changes? It isn't sustainable to manually copy new attributes into a folder co-located with our prediction image every time we need to make new predictions, and we cannot log in to a server every time we need to update our training set.
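To make the first question concrete, here is a minimal sketch of what stringing these stages together by hand might look like: a small Python script that runs the training container, then the prediction container, handing data between them via mounted host folders. The image names, paths, and mount points are hypothetical, not taken from our actual workflow; the point is that every new stage, new dataset, and new server multiplies this kind of manual glue.

```python
import subprocess
from pathlib import Path

# Hypothetical image names and host directories; substitute whatever your
# training and prediction images and data locations are actually called.
TRAIN_IMAGE = "company/model-training:latest"
PREDICT_IMAGE = "company/model-prediction:latest"
DATA_DIR = Path("/data/attributes").resolve()     # input attributes
MODEL_DIR = Path("/data/model").resolve()         # trained model artifacts
OUTPUT_DIR = Path("/data/predictions").resolve()  # prediction output


def run_stage(image, mounts):
    """Run one workflow stage as a container, mounting host dirs into it."""
    cmd = ["docker", "run", "--rm"]
    for host_path, container_path in mounts.items():
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd.append(image)
    subprocess.run(cmd, check=True)


# Stage 1: train on the current attributes, writing the model to MODEL_DIR.
run_stage(TRAIN_IMAGE, {DATA_DIR: "/in", MODEL_DIR: "/model"})

# Stage 2: load the trained model and write predictions to OUTPUT_DIR.
run_stage(PREDICT_IMAGE, {MODEL_DIR: "/model", OUTPUT_DIR: "/out"})
```

Even in this two-stage case, someone has to run the script on the right machine, make sure the mounted folders hold the latest data, and rerun everything whenever the data changes, which is exactly the burden the second question describes.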