Additional text formats
Although text data can be viewed and read in a text editor, that doesn't mean it always contains plain text or simple columns of data. Two formats are encountered so often in today's projects that we need to spend some additional time studying them: JSON and HTML/XML. The JSON format is plain text but is structured much like a Python dictionary. Because it is plain text, it's easy to send and receive over internet connections, and because it has structure, it can encode complex data, including hierarchical or tree-like structures that do not fit neatly into a single table. Many APIs use JSON by default, so you will likely encounter this format at some point. If you are reading data from a website, it is likely encoded as HTML or XML. In Exercise 3.01 – reading data from web pages, you saw a simple example of scraping a web page using .read_html().
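To make the comparison with a Python dictionary concrete, here is a minimal sketch using Python's built-in json module. The JSON text and its field names (name, languages, address) are made up for illustration and do not come from any real API.

```python
import json

# Hypothetical JSON text, similar to what an API might return
# (the fields are invented for this illustration).
raw = """
{
  "name": "Ada",
  "languages": ["Python", "SQL"],
  "address": {"city": "London", "country": "UK"}
}
"""

# Parse the plain text into Python objects: JSON objects become
# dictionaries and JSON arrays become lists.
record = json.loads(raw)

print(type(record).__name__)      # the top-level object is a dict
print(record["address"]["city"])  # nested values are reached key by key
print(record["languages"][0])     # arrays are indexed like lists

# Serialize the dictionary back to plain text, ready to be sent or saved.
round_trip = json.dumps(record)
```

Note how the nested address object has no natural place in a single flat row; this is the kind of hierarchical structure that JSON can carry and simple column-based text cannot.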
Let's look at these formats in more detail.
Working with JSON
Let...