Introduction
In the last chapter, the discussion focused on preparing data for modeling using dimensionality reduction and autoencoding. Large feature sets can be problematic when it comes to modeling because of multicollinearity and extensive computation and can thereby hinder real-time prediction. Dimensionality reduction using principal component analysis is one antidote to that problem. Similarly, autoencoders seek to find optimal feature encodings. You can think of autoencoders as a means of identifying quality interaction terms for the dataset. Let's now move past dimensionality reduction and look at some real-world modeling techniques.
Topic modeling is one facet of Natural Language Processing (NLP), the field of computer science exploring the syntactic and semantic analysis of natural language, which has been increasing in popularity with the increased availability of textual datasets. NLP can deal with language in almost any form, including text, speech, and images...