Machine Learning Part 1 – Statistical Machine Learning
In this chapter, we will discuss how to apply classical statistical machine learning techniques such as Naïve Bayes, term frequency-inverse document frequency (TF-IDF), support vector machines (SVMs), and conditional random fields (CRFs) to common natural language processing (NLP) tasks such as classification (or intent recognition) and slot filling.
There are two aspects of these classical techniques that we need to consider: representations and models. Representation refers to the format of the data that we are going to analyze. You will recall from Chapter 7 that it is standard to represent NLP data in formats other than lists of words. Numeric representation formats such as vectors make it possible to apply widely available numeric processing techniques, which opens up many possibilities for processing. In Chapter 7, we also explored data representations such as the count bag of words (BoW), TF...