Discovering pipeline elements
This section walks you through the main Pachyderm pipeline concepts. The Pachyderm Pipeline System (PPS) is the centerpiece of Pachyderm functionality.
A Pachyderm pipeline is a sequence of computational tasks applied to data to produce a final result. For example, it could be a series of image processing tasks, such as labeling each image or applying a photo filter. It could also be a comparison between two datasets or a task that finds similarities between them.
A pipeline performs the following three steps:
- Downloads the data from a specified location.
- Applies the transformation steps specified by your code.
- Outputs the result to a specified location.
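The three steps above can be sketched in plain Python. This is only an illustration of the read, transform, write flow, not Pachyderm's implementation: the function name `run_pipeline` and its arguments are hypothetical, and local directories stand in for the input and output locations. In a real pipeline, Pachyderm performs the download and upload for you, and your code supplies only the transformation.

```python
from pathlib import Path
from typing import Callable, List


def run_pipeline(input_dir: Path, output_dir: Path,
                 transform: Callable[[bytes], bytes]) -> List[str]:
    """Hypothetical helper: process every file in input_dir into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.iterdir()):
        if not src.is_file():
            continue  # this sketch ignores subdirectories
        data = src.read_bytes()                       # step 1: get the input data
        result = transform(data)                      # step 2: apply your code
        (output_dir / src.name).write_bytes(result)   # step 3: write the result
        written.append(src.name)
    return written
```

For example, `run_pipeline(Path("in"), Path("out"), bytes.upper)` would write an upper-cased copy of each file found in `in` to `out`, assuming both paths are local directories you control.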
The following diagram shows how a Pachyderm pipeline works:
Each Pachyderm pipeline has an input repository and an output repository. An input repository is a filesystem within Pachyderm where data is placed from an outside source...