In this chapter, we will cluster tweets from Twitter using two different clustering techniques: K-means and DBSCAN. We will rely on some of the skills we built up in Chapter 2, Linear Regression – House Price Prediction, and we will use the same libraries as in that chapter. On top of that, we will use the clusters library by mpraski.
By the end of the project, we will be able to clean up any collection of tweets from Twitter and cluster them into groups. The main body of code that fulfills this objective is very simple: only about 150 lines in total. The rest of the code is for fetching and preprocessing data.