Chapter 2. Exploratory Data Analysis and Visualization in Python
Analytic pipelines are not built from raw data in a single step. Rather, development is an iterative process that involves understanding the data in greater detail and systematically refining both model and inputs to solve a problem. A key part of this cycle is interactive data analysis and visualization, which can provide initial ideas for features in our predictive modeling or clues as to why an application is not behaving as expected.
Spreadsheet programs are one kind of interactive tool for this sort of exploration: they allow the user to import tabular information, pivot and summarize data, and generate charts. However, what if the data in question is too large for such a spreadsheet application? What if the data is not tabular, or is not displayed effectively as a line or bar chart? In the former case, we could simply obtain a more powerful computer, but the latter is more problematic. Simply put, many traditional...