Optimizing your pipeline
This section walks you through the pipeline specification parameters that can help you improve your pipeline's performance. Because Pachyderm runs on top of Kubernetes, it is a highly scalable system that can help you use your underlying hardware resources wisely.
One of the biggest advantages of Pachyderm is that you can specify resources for each pipeline individually. You can also define how many workers a pipeline spins up for each run and how those workers behave when they are idle and waiting for new work.
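To make these per-pipeline settings concrete, the following is a minimal sketch that assembles a pipeline specification as JSON. It assumes the field names parallelism_spec, resource_requests, and standby as they appear in the Pachyderm pipeline specification; the available fields can differ between Pachyderm versions (newer releases may use a different field in place of standby), so check the reference for your version. The pipeline name, repository, image, and command are hypothetical placeholders.

```python
import json


def build_pipeline_spec(name, repo, workers, memory, cpu, standby):
    """Assemble a pipeline specification as a plain dictionary.

    Field names are assumed from the Pachyderm pipeline specification
    and may vary by version; the image and command are placeholders.
    """
    return {
        "pipeline": {"name": name},
        "transform": {
            "image": "example/image:latest",  # hypothetical image
            "cmd": ["python3", "/main.py"],   # hypothetical command
        },
        "input": {"pfs": {"repo": repo, "glob": "/*"}},
        # How many workers the pipeline starts for each run.
        "parallelism_spec": {"constant": workers},
        # Resources requested for each worker of this pipeline only.
        "resource_requests": {"memory": memory, "cpu": cpu},
        # Whether idle workers are released while waiting for new work
        # (assumed field; version-dependent).
        "standby": standby,
    }


spec = build_pipeline_spec(
    name="edges",    # hypothetical pipeline name
    repo="images",   # hypothetical input repository
    workers=4,
    memory="256M",
    cpu=0.5,
    standby=True,
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because each pipeline has its own specification, a lightweight pipeline and a resource-hungry one can sit side by side with different worker counts and resource requests.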
If you are only testing Pachyderm to see whether it fits your use case, the optimization parameters may not matter much. But if you are implementing an enterprise-level data science platform with multiple pipelines and massive amounts of data being injected into Pachyderm, knowing how to optimize your pipelines becomes a priority.
You must understand the concept of...