There are three types of big data: structured, semi-structured, and unstructured. The World Wide Web (WWW) is the largest source of information today, and most of it is semi-structured. We learned about various approaches to modeling unstructured data and how several open source tools can be used to generate high-quality models.
We discussed the vector space model (VSM), along with its advantages and limitations, and explored its implementation using Lucene. Then we looked at the graph data model using Gephi and at semi-structured data models based on JSON and XML files. In the next chapter, we are going to explore various structured data models.