Dealing with Semi-Structured Data
We learned about various types of data in Chapter 2, Feature Extraction Methods. Let's quickly recap what semi-structured data is. A dataset is said to be semi-structured if it is not stored in a row-column format but can, if required, be converted into a structured format with a definite number of rows and columns. Such data is often stored as key-value pairs or embedded between tags, as is the case with JSON (JavaScript Object Notation) and XML (Extensible Markup Language) files. These are the most commonly used forms of semi-structured data.
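To make the idea of converting key-value data into rows and columns concrete, here is a minimal sketch that uses only Python's standard json module. The sample records and the to_table helper are hypothetical and exist purely for illustration; they are not taken from any dataset used in this book.

```python
import json

# Hypothetical sample data: two records stored as key-value pairs.
# Note that the records do not share the same set of keys.
raw = '[{"name": "Ann", "age": 30}, {"name": "Bob", "city": "Paris"}]'


def to_table(records):
    """Convert a list of dicts into (columns, rows), filling gaps with None."""
    columns = []
    for record in records:
        for key in record:
            # Collect each key once, in order of first appearance.
            if key not in columns:
                columns.append(key)
    # Build one row per record; missing keys become None.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows


records = json.loads(raw)  # parse the JSON text into a list of dicts
columns, rows = to_table(records)

print(columns)
for row in rows:
    print(row)
```

Each record becomes one row, and every key seen in any record becomes a column. Filling missing values with None is one possible design choice; other placeholders would work equally well, depending on how the structured data is used later.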
JSON
JSON files are used for storing and exchanging data. JSON is human-readable and easy to interpret. Like text files and CSV files, JSON files are language-independent, which means that different programming languages, such as Python and Java, can work with them effectively. In Python, a built-in data structure called a dictionary is capable of...