Introduction
So far in this book, we have focused on pandas DataFrame objects as the main data structure to which we apply wrangling techniques. Now, we will learn various techniques for reading data into a DataFrame from external sources. Some of those sources are text-based (CSV, HTML, JSON, and so on), whereas others are binary (Excel, PDF, and so on), that is, not stored as human-readable text. In this chapter, we will learn how to deal with data that is present in web pages or HTML documents, an important skill in the everyday work of a data practitioner.
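As a minimal sketch of reading text-based data into a DataFrame, the snippet below uses pandas' `read_csv` and `read_json` on two small in-memory strings. The toy data is made up for illustration. In practice you would pass a file path or URL, and wrapping a string in `io.StringIO` simply lets pandas treat it as a file.

```python
from io import StringIO

import pandas as pd

# Hypothetical toy data standing in for external CSV and JSON files.
csv_text = "item,quantity\napple,3\npear,5\n"
json_text = '[{"item": "apple", "quantity": 3}, {"item": "pear", "quantity": 5}]'

# Both readers accept paths, URLs, or file-like objects; StringIO wraps a string as a file.
df_csv = pd.read_csv(StringIO(csv_text))
df_json = pd.read_json(StringIO(json_text))  # a list of records becomes one row per record

print(df_csv)
print(df_json)
```

For HTML, pandas provides `pd.read_html`, which returns a list of DataFrames, one per `<table>` element it finds. It requires an HTML parser such as lxml, or BeautifulSoup4 with html5lib, to be installed.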
Note
Since we have already worked through detailed examples of basic operations with NumPy and pandas, in this chapter we will often skip trivial code snippets, such as viewing a table, selecting a column, or plotting. Instead, we will focus on code examples for the new topics we aim to learn about here.