Preface
Pachyderm is a distributed version control platform for building end-to-end data science workflows. Since its creation in 2016, Pachyderm has become a go-to solution for organizations both large and small. The core functionality of Pachyderm is open source and is supported by a vibrant community of engineers. This book walks you through basic and advanced examples of Pachyderm usage to help you get started quickly and integrate a reliable data science solution into your infrastructure.
Reproducible Data Science with Pachyderm provides a clear overview of Pachyderm, explains how to install and run Pachyderm in the cloud, and shows how to use Pachyderm Hub, the Software-as-a-Service (SaaS) version of Pachyderm. The book also includes practical examples of data science techniques running on a Pachyderm cluster.