Summary
This has been an interesting dive into natural-language processing and topic modeling, and hopefully we've learned a little US history at the same time. I know I have.
However, it seems that the larger takeaway is something that we all know but easily forget: freeform, unstructured text data is messy, messy, messy. In fact, what we have been working with here is exceptionally clean, as these things go. Topics don't often stand out clearly, and the relationships between the subjects that the documents actually discuss and the topics that LDA identifies are often complex and difficult to tease apart.
Still, we've also seen some interesting technologies and algorithms to help us deal with the messiness. Topic modeling doesn't, and possibly shouldn't, completely sweep the ambiguities and messiness of texts under the rug, but it does help us get a handle on what's inside large collections of documents.
In the next chapter, we'll head in a different direction and apply Bayesian classification to reports of UFO sightings...