Retraining an NER model
NER pipelines often produce inaccurate results. To fix them, you can either retrain an existing model or train your own model from scratch. Training a model from scratch is difficult and time-consuming, and in our case it isn't necessary: we can retrain the existing model so that it understands the missing context. To do this, we will put training data into the data-clean repository, create a training pipeline that trains on that data, save the retrained model to an output repository, and then run the retrained model against our original text again.
In Pachyderm terms, this means that we will create two pipelines:
- The first pipeline, called retrain, will train our model and output the new model to the train output repository.
- The second pipeline, called my-model, will use the new model to analyze our text and upload the results to the my-model repository.
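To make the shape of these two pipelines more concrete, here is a minimal Python sketch that builds the two pipeline specifications as dictionaries and prints them as JSON. It is an illustration under assumptions, not the tutorial's actual specs: the container images, the commands, and the text repository name are hypothetical placeholders, and only the pipeline names and the data-clean and train repository names come from the description above. Pachyderm normally names a pipeline's output repository after the pipeline itself, so the model repository name may need adjusting to match your cluster.

```python
import json

# Repository names. "data-clean" and "train" follow the text above;
# "text" is a hypothetical placeholder for the repo holding the original text.
TRAINING_REPO = "data-clean"
MODEL_REPO = "train"  # adjust if your output repo is named after the pipeline
TEXT_REPO = "text"    # hypothetical


def pfs_input(repo, glob="/"):
    """Build a PFS input stanza that reads from `repo`."""
    return {"pfs": {"repo": repo, "glob": glob}}


def pipeline_spec(name, image, cmd, input_spec):
    """Assemble a pipeline spec from its name, transform, and input."""
    return {
        "pipeline": {"name": name},
        "transform": {"image": image, "cmd": cmd},
        "input": input_spec,
    }


# First pipeline: retrains the model on the training data.
retrain = pipeline_spec(
    name="retrain",
    image="example/ner-train:latest",  # hypothetical image
    cmd=["python3", "/train.py"],      # hypothetical command
    input_spec=pfs_input(TRAINING_REPO),
)

# Second pipeline: applies the retrained model to the original text.
# A cross input pairs the model with the text so both are visible to the job.
my_model = pipeline_spec(
    name="my-model",
    image="example/ner-apply:latest",  # hypothetical image
    cmd=["python3", "/apply.py"],      # hypothetical command
    input_spec={"cross": [pfs_input(MODEL_REPO), pfs_input(TEXT_REPO)]},
)

for spec in (retrain, my_model):
    print(json.dumps(spec, indent=2))
```

Each printed document could be saved to its own file and submitted with pachctl create pipeline -f <file>, once the placeholder images and commands are replaced with real ones.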
Now, let...