Summary
In this chapter, we learned how to build a complex machine learning workflow using the NER pipeline example. We learned how to clean the data with the NLTK library, how to perform POS tagging, and finally, how to retrain a spaCy model inside Pachyderm and output the results for preview. You can take this example further and improve NER accuracy by adding more training data and tuning the model training parameters.
In the next chapter, we will learn how to do hyperparameter tuning in Pachyderm, using housing price prediction as an example.