Summary
This chapter has explored some of the most basic and useful classical statistical techniques for NLP. They are especially valuable for small projects that start out with little training data, and for the exploratory work that often precedes a large-scale project.
We started by learning some basic evaluation concepts, focusing on accuracy and also looking at confusion matrices. We then applied Naïve Bayes classification to texts represented in TF-IDF format, and worked through the same classification task using a more modern technique, support vector machines (SVMs). Comparing the results, we saw that the SVMs performed better than Naïve Bayes. We then turned our attention to a related NLP task, slot-filling. We learned about different ways to represent slot-tagged data and finally illustrated conditional random fields (CRFs) with a restaurant recommendation task. These are all standard approaches that are good to have in your NLP toolkit.
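As a recap of the classification pipeline, here is a minimal sketch using scikit-learn, assuming a TF-IDF representation and the same evaluation metrics discussed in the chapter. The toy texts and labels are hypothetical stand-ins for a real labeled corpus, and the specific classifier classes (MultinomialNB, LinearSVC) are one common choice, not necessarily the chapter's exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy corpus standing in for real training/test data.
train_texts = ["great food and service", "terrible, slow service",
               "loved the pasta", "the worst meal I have had"]
train_labels = ["pos", "neg", "pos", "neg"]
test_texts = ["service was great", "slow and terrible"]
test_labels = ["pos", "neg"]

# Represent documents as TF-IDF vectors.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Train Naïve Bayes and an SVM on the same features, then compare
# their accuracy and confusion matrices on the held-out texts.
for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X_train, train_labels)
    predictions = clf.predict(X_test)
    print(type(clf).__name__,
          "accuracy:", accuracy_score(test_labels, predictions))
    print(confusion_matrix(test_labels, predictions, labels=["pos", "neg"]))
```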
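For the slot-filling task, the sketch below shows a CRF trained on one BIO-tagged utterance, assuming the sklearn-crfsuite package. The utterance, its slot tags, and the feature function are hypothetical illustrations of the restaurant-style task, not the chapter's actual dataset or features.

```python
import sklearn_crfsuite

def word_features(sentence, i):
    """Simple per-token features: the word itself plus its neighbors."""
    word = sentence[i]
    features = {"word.lower": word.lower(), "is_title": word.istitle()}
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    if i < len(sentence) - 1:
        features["next.lower"] = sentence[i + 1].lower()
    return features

# One BIO-tagged utterance: B- marks the start of a slot, O is outside.
sentence = ["find", "a", "cheap", "italian", "restaurant", "in", "Boston"]
tags = ["O", "O", "B-Price", "B-Cuisine", "O", "O", "B-Location"]

# CRFs expect one feature dict per token and one tag sequence per sentence.
X = [[word_features(sentence, i) for i in range(len(sentence))]]
y = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])  # tag sequence predicted for the utterance
```

Unlike the document classifiers above, which assign a single label per text, the CRF predicts one tag per token and models the dependencies between neighboring tags.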