What this book covers
Chapter 1, The Problem of Data Reproducibility, discusses reproducibility challenges in modern science and data science and how they relate to the Pachyderm mission.
Chapter 2, Pachyderm Basics, describes basic Pachyderm concepts and primitives.
Chapter 3, Pachyderm Pipeline Specification, provides a detailed overview of the pipeline specification, the main configuration file for Pachyderm pipelines.
Chapter 4, Installing Pachyderm Locally, walks you through installing Pachyderm on your local computer.
Chapter 5, Installing Pachyderm on a Cloud Platform, describes how to install Pachyderm on three major cloud platforms: Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Microsoft Azure Kubernetes Service (AKS).
Chapter 6, Creating Your First Pipeline, covers how to create a simple pipeline that processes images.
Chapter 7, Pachyderm Operations, covers the most frequently used Pachyderm operations.
Chapter 8, Creating an End-to-End Machine Learning Workflow, shows how to deploy an end-to-end machine learning (ML) workflow using an example Natural Language Processing (NLP) pipeline.
Chapter 9, Distributed Hyperparameter Tuning with Pachyderm, explains how to perform distributed hyperparameter tuning on a Named Entity Recognition (NER) pipeline.
Chapter 10, Pachyderm Language Clients, walks you through the most common examples of using the Pachyderm Python and Golang clients.
Chapter 11, Using Pachyderm Notebooks, discusses Pachyderm Hub, Pachyderm's Software-as-a-Service (SaaS) platform, and introduces Pachyderm Notebooks, an Integrated Development Environment (IDE) for data scientists.