Introduction
Computer algorithms are becoming steadily better at analyzing large datasets. As their performance improves, so does their ability to detect interesting patterns in data.
The first few algorithms in this chapter demonstrate how to look at thousands of points and identify clusters. A cluster is simply a group of points that lie close together. How we measure this "closeness" is entirely up to us. One of the most popular closeness metrics is the Euclidean distance.
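As a minimal sketch of this metric, the Euclidean distance between two points is the square root of the sum of their squared coordinate differences. The example is written in Python purely for illustration, and the function name `euclidean_distance` is our own choice rather than part of any library.

```python
import math


def euclidean_distance(p, q):
    """Return the straight-line distance between two points.

    Both points are sequences of numbers with the same length.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared difference along each coordinate, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the two points are 5 units apart.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

A smaller result means the two points are "closer", so points whose pairwise distances are small are natural candidates for the same cluster.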
We can understand clusters by looking up at the night sky and pointing at stars that appear together. Our ancestors found it convenient to name "clusters" of stars, which we refer to as constellations. We will be finding our own constellations in the "sky" of data points.
This chapter also focuses on classifying words. We will label words by their parts of speech as well as by topic.
We will implement our own decision tree to classify practical data. Lastly, we will visualize clusters and points...