Modeling tweet topics
In machine learning and natural language processing, a topic model is a type of statistical model used to discover the abstract topics that occur in a collection of documents. Twitter offers a good use case for this concept: by analyzing an individual's (or an organization's) tweets, we may be able to discover any overriding trends. Let's look at a simple example.
If you have a Twitter account, you can perform this exercise quite easily, and you can then apply the same process to any archive of tweets you want to focus on or model. First, we need to create a tweet archive file.
Under Settings, you can submit a request to receive your tweets in an archive file. Once it's ready, you'll get an email with a link to download it.
Then save the file locally.
Now that we have a data source to work with, we can move the tweets into a list object (we'll call it x) and then convert that list into an R data frame object (df1).
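As a minimal sketch of this step, assume the tweets are already in memory as a list of records. The three sample tweets and the field names (id, created, text) below are hypothetical placeholders, not the actual layout of a Twitter archive; only the names x and df1 come from the text.

```r
# Hypothetical stand-in for tweets loaded from the saved archive.
# The field names (id, created, text) are assumptions for illustration.
x <- list(
  list(id = "1", created = "2017-01-05", text = "Learning about topic models in R"),
  list(id = "2", created = "2017-01-06", text = "Data frames make text analysis easier"),
  list(id = "3", created = "2017-01-07", text = "Exploring trends in my own tweets")
)

# Turn each tweet into a one-row data frame, keeping text as character
# rather than factor, then stack the rows into a single data frame.
rows <- lapply(x, function(tweet) {
  as.data.frame(tweet, stringsAsFactors = FALSE)
})
df1 <- do.call(rbind, rows)

# Inspect the result: one row per tweet, one column per field.
str(df1)
```

With a real archive, x would instead be filled by parsing the downloaded file, and the columns of df1 would follow whatever fields that file provides.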
The tweets were first converted to a data frame...