Summary
After finishing this chapter, you should be able to describe the core components of an analytic pipeline and the ways in which they interact. We've examined the differences between batch and streaming processes, and some of the use cases for which each type of application is well suited. We've also walked through examples using both paradigms and the design decisions needed at each step.
In the following chapters we will develop the concepts described here, and go into greater detail on some of the technical terms brought up in the case studies. In Chapter 2, Exploratory Data Analysis and Visualization in Python, we will introduce interactive data visualization and exploration using open source Python tools. Chapter 3, Finding Patterns in the Noise – Clustering and Unsupervised Learning, describes how to identify groups of related objects in a dataset using clustering methods, a form of unsupervised learning. In contrast, Chapter 4, Connecting the Dots with Models – Regression Methods, and Chapter 5, Putting Data in its Place – Classification Methods and Analysis, explore supervised learning, whether for continuous outcomes such as prices (using the regression techniques in Chapter 4) or categorical responses such as user sentiment (using the classification models in Chapter 5). Given a large number of features, or complex data such as text or images, we may benefit from performing dimensionality reduction, as described in Chapter 6, Words and Pixels – Working with Unstructured Data. Alternatively, we may fit textual or image data using more sophisticated models such as the deep neural networks covered in Chapter 7, Learning from the Bottom Up – Deep Networks and Unsupervised Features, which can capture complex interactions between input variables. In order to use these models in business applications, we will develop a web framework to deploy analytical solutions in Chapter 8, Sharing Models with Prediction Services, and describe ongoing monitoring and refinement of the system in Chapter 9, Reporting and Testing – Iterating on Analytic Systems.
Throughout, we will explain how these methods work and offer practical tips for choosing between approaches to a given problem. Working through the code examples will illustrate the components required to build and maintain an application for your own use case. With these preliminaries, let's dive into some exploratory data analysis using notebooks, a powerful way to document and share analyses.