Reproducible Data Science with Pachyderm: Learn how to build version-controlled, end-to-end data pipelines using Pachyderm 2.0

Product type Paperback

Published in Mar 2022

Publisher Packt

ISBN-13 9781801074483

Length 364 pages

Edition 1st Edition

Languages

Tools GitHub

Concepts Data Science

Author (1):

Svetlana Karslioglu

Table of Contents (16) Chapters

Preface

1. Section 1: Introduction to Pachyderm and Reproducible Data Science

2. Chapter 1: The Problem of Data Reproducibility

3. Chapter 2: Pachyderm Basics

4. Chapter 3: Pachyderm Pipeline Specification

5. Section 2: Getting Started with Pachyderm

6. Chapter 4: Installing Pachyderm Locally

7. Chapter 5: Installing Pachyderm on a Cloud Platform

8. Chapter 6: Creating Your First Pipeline

9. Chapter 7: Pachyderm Operations

10. Chapter 8: Creating an End-to-End Machine Learning Workflow

11. Chapter 9: Distributed Hyperparameter Tuning with Pachyderm

12. Section 3: Pachyderm Clients and Tools

13. Chapter 10: Pachyderm Language Clients

14. Chapter 11: Using Pachyderm Notebooks

15. Other Books You May Enjoy

What this book covers

Chapter 1, The Problem of Data Reproducibility, discusses the problem of reproducibility in modern science and data science and how it aligns with the Pachyderm mission.

Chapter 2, Pachyderm Basics, describes basic Pachyderm concepts and primitives.

Chapter 3, Pachyderm Pipeline Specification, provides a detailed overview of the Pachyderm specification file, the main configuration file of Pachyderm pipelines.

Chapter 4, Installing Pachyderm Locally, walks you through the process of installing Pachyderm locally on your computer.

Chapter 5, Installing Pachyderm on a Cloud Platform, describes how to install Pachyderm on three major cloud platforms: Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), and Microsoft Azure Kubernetes Service (AKS).

Chapter 6, Creating Your First Pipeline, covers how to create a simple pipeline that processes images.

Chapter 7, Pachyderm Operations, covers the most frequently used Pachyderm operations.

Chapter 8, Creating an End-to-End Machine Learning Workflow, shows how to deploy an end-to-end machine learning (ML) workflow, using a Natural Language Processing (NLP) pipeline as the example.

Chapter 9, Distributed Hyperparameter Tuning with Pachyderm, looks at performing distributed hyperparameter tuning with a Named-Entity Recognition (NER) pipeline.

Chapter 10, Pachyderm Language Clients, walks you through the most common examples of using the Pachyderm Python and Golang clients.

Chapter 11, Using Pachyderm Notebooks, discusses Pachyderm Hub, Pachyderm's Software-as-a-Service (SaaS) platform, and introduces Pachyderm Notebooks, an Integrated Development Environment (IDE) for data scientists.
