Getting insights via clustering
So far, we have collected articles, analyzed their words, and measured the value of each word. But how can staring at a bunch of words and numbers possibly provide any insight?
We need to create clusters around the keywords that we are interested in. Since we know the value of each word, we can navigate through the bags of words and see which of them closely match the topic we want insight into. In other words, what we need to do is group similar articles together.
To be more specific, a keyword is used as a cluster center. The articles with the highest similarity to it (measured by the value of their words) are considered closest to the cluster center, and are therefore eligible to be grouped together. This group of articles (called a cluster) provides the insight we are looking for.
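The idea of clustering around a keyword can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used elsewhere in this text: it assumes the "value" of a word is a TF-IDF-style weight, and the names word_values, cluster_around, and min_value, as well as the sample articles, are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,!?;:()\"'"


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def word_values(articles):
    """Return one {word: weight} dict per article.

    Assumption: the weight is term frequency times (log(N / df) + 1),
    a common TF-IDF variant. Substitute your own word-value measure.
    """
    bags = [Counter(tokenize(a)) for a in articles]
    n = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # count each word once per article
    values = []
    for bag in bags:
        total = sum(bag.values())
        values.append({
            w: (c / total) * (math.log(n / doc_freq[w]) + 1)
            for w, c in bag.items()
        })
    return values


def cluster_around(keyword, values, min_value=0.0):
    """Return indices of articles whose value for the keyword exceeds
    min_value, ordered from closest to farthest from the cluster center."""
    scored = [(v.get(keyword, 0.0), i) for i, v in enumerate(values)]
    members = [(s, i) for s, i in scored if s > min_value]
    members.sort(key=lambda pair: -pair[0])
    return [i for _, i in members]


# Hypothetical sample corpus, for illustration only.
articles = [
    "Growing food on Mars requires greenhouses and water",
    "Building houses on Mars with local soil",
    "Stock markets fell sharply today",
    "Food production in Martian greenhouses: food, water, light",
]

values = word_values(articles)
cluster = cluster_around("food", values)
print(cluster)
```

Here the keyword acts as the cluster center, and an article's distance from it is simply how little weight the keyword carries in that article. Articles that never mention the keyword have a value of zero and stay outside the cluster.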
For example, if we want to get some insight into the practicality of growing food, building houses, providing transportation, and so on, on Mars, we need to look for related keywords in the corpus and find...