Our focus for this book is on developing deep learning systems in Go. So, naturally, now that we are talking about how to manage the data that we feed to our networks, let's take a look at a tool to do so that is also written in Go.
Pachyderm is a mature and scalable tool that offers containerized data pipelines. In these, everything you could possibly need, from data to tools, is held together in a single place where deployments can be maintained and managed and versioning for the data itself. The Pachyderm team sell their tool as Git for data, which is a useful analogy. Ideally, we want to version the entire pipeline so that we know which data was used to train, and which, in turn, gave us the specific prediction of X.
Pachyderm removes much of the complexity of managing these pipelines. Both Docker and Kubernetes run under the hood. We will explore...