Data formats
When we are working with data for human consumption, the easiest way to store it is in text files. In this section, we will present parsing examples for the most common formats: CSV, JSON, and XML. These examples will be very helpful in the following chapters.
Tip
The dataset used for these examples is a list of Pokemon by National Pokedex number, obtained from: http://bulbapedia.bulbagarden.net/
All the scripts and dataset files can be found in the author's GitHub repository: https://github.com/hmcuesta/PDA_Book/tree/master/Chapter3
CSV (comma-separated values) is a very simple and common open format for table-like data, which most data analysis tools can import and export. CSV is a plain text format; this means that the file is a sequence of characters and contains no data that has to be interpreted in some other way, such as binary numbers.
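To make the idea of a plain text table concrete, here is a minimal sketch that reads a small CSV sample held in memory using Python's standard csv module. The column names (id, name, type) and the two rows are assumptions chosen for illustration; they are not taken from the pokemon.csv file, and the helper parse_csv is a hypothetical name.

```python
import csv
import io

# Illustrative sample only: these column names and rows are assumptions,
# not the actual contents of the book's pokemon.csv file.
SAMPLE = "id,name,type\n1,Bulbasaur,Grass\n2,Ivysaur,Grass\n"


def parse_csv(text):
    """Return the rows of a CSV string as a list of dictionaries.

    The first line is treated as the header, so each row is keyed
    by column name. All values are returned as strings.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = parse_csv(SAMPLE)
for row in rows:
    print(row["id"], row["name"], row["type"])
```

Because CSV is plain text, every value comes back as a string; converting a field such as id to an integer is left to the caller. To read from a file instead of a string, the same reader can be given a file object opened with open(path, newline="").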
There are many ways to parse a CSV file from Python, and here we will discuss two:
The first eight records of the CSV file (pokemon.csv) look like this:
id,...