Summary
In this chapter, you learned how to process unstructured information and how to represent it using graphs. Starting from a well-known benchmark, the Reuters-21578 dataset, we applied standard NLP engines to tag and structure textual information. We then used these high-level features to create different types of networks: knowledge-based networks, bipartite networks, projections onto a subset of nodes, and a network relating the dataset topics. These graphs also allowed us to apply the tools presented in previous chapters to extract insights from the network representation.
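As a quick reminder of what a bipartite network and its projection look like, here is a minimal sketch in plain Python. It is not the chapter's implementation: the toy `doc_entities` data and the helper names `build_bipartite_edges` and `project_onto_documents` are assumptions made for illustration, and the projection weight is taken to be the number of entities two documents share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: document id -> entities extracted by an NLP step.
doc_entities = {
    "doc1": {"oil", "opec"},
    "doc2": {"oil", "opec", "dollar"},
    "doc3": {"dollar", "fed"},
}


def build_bipartite_edges(doc_entities):
    """Return the (document, entity) edges of the bipartite graph."""
    return sorted(
        (doc, ent) for doc, ents in doc_entities.items() for ent in ents
    )


def project_onto_documents(doc_entities):
    """Project the bipartite graph onto the document nodes.

    Two documents are linked when they share at least one entity;
    the edge weight counts how many entities they share.
    """
    # Invert the mapping: entity -> documents that mention it.
    entity_docs = defaultdict(set)
    for doc, ents in doc_entities.items():
        for ent in ents:
            entity_docs[ent].add(doc)

    # Every pair of documents under the same entity gains one unit of weight.
    weights = defaultdict(int)
    for docs in entity_docs.values():
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = build_bipartite_edges(doc_entities)
doc_graph = project_onto_documents(doc_entities)
print(len(edges), "bipartite edges")
print(doc_graph)
```

The same inversion, applied to the document side, would give the projection onto entity nodes instead. Graph libraries typically offer ready-made projection helpers that would replace these hand-written functions in practice.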
We used local and global properties to show how these quantities can represent and describe structurally different types of networks. We then used unsupervised techniques to identify semantic communities and cluster documents that belong to similar topics. Finally, we used the labeled information provided in the dataset to train supervised...