Introduction
So far in this book, we have focused on pandas DataFrame objects as the main data structure for applying wrangling techniques. In this chapter, we will learn various techniques for reading data into a DataFrame from external sources. Some of these sources are text-based (such as CSV, HTML, and JSON), whereas others are binary (that is, not stored as plain text; for example, Excel workbooks or PDF files). We will also learn how to deal with data that is present in web pages or HTML documents.
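Here is a minimal sketch of the text-based case. It assumes that small in-memory strings stand in for external files; the sample records are made up for illustration. Both `pd.read_csv` and `pd.read_json` accept a file-like object such as `io.StringIO`, so the same calls work on real files:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of external files.
csv_text = "name,age\nAlice,30\nBob,25\n"
json_text = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'

# read_csv accepts any file-like object, so StringIO behaves like an open file.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_json with orient="records" turns a JSON array of objects into rows.
# Wrapping the string in StringIO avoids the deprecated literal-string input.
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_csv)
print(df_json)
```

Passing a file path or URL instead of a `StringIO` object works the same way for these readers. Binary formats such as Excel usually need an additional engine package (for example, openpyxl for .xlsx files).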
Being able to extract meaningful data from various sources is of paramount importance to a data practitioner. Data can, and often does, come in many forms and flavors, and it is essential to be able to get it into a form that is useful for predictive modeling or other downstream tasks.
Since we have already gone through detailed examples of basic operations with NumPy and pandas, in this chapter we will often skip trivial code snippets...