In this chapter, we summarize many of the concepts discussed in the book in order to define a complete machine learning architecture: one that preprocesses the input data, decomposes or augments it, classifies or clusters it, and finally presents the results using graphical tools. We also show how scikit-learn manages complex pipelines, and how to fit them and search for optimal parameters in the global context of a complete architecture.
In particular, we are going to discuss the following:
- Data collection, preprocessing, and augmentation
- Normalization, regularization, and dimensionality reduction
- Vectorized computation and GPU support
- Distributed architectures
- Pipelines and feature unions
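
As a preview of the last topic, the following minimal sketch shows how a scikit-learn pipeline containing a feature union can be fitted and tuned as a single estimator. The dataset (Iris), the chosen steps (`StandardScaler`, `PCA`, `SelectKBest`, `SVC`), the step names, and the parameter grid are illustrative assumptions rather than the architecture developed in this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset (an assumption made for this sketch)
X, y = load_iris(return_X_y=True)

# A feature union runs its transformers on the same input
# and concatenates their outputs column-wise
features = FeatureUnion([
    ("pca", PCA()),
    ("kbest", SelectKBest(score_func=f_classif)),
])

# The pipeline chains preprocessing, feature extraction, and classification
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("features", features),
    ("classifier", SVC()),
])

# Nested parameters are addressed as <step>__<substep>__<parameter>
param_grid = {
    "features__pca__n_components": [1, 2],
    "features__kbest__k": [1, 2],
    "classifier__C": [0.1, 1.0, 10.0],
}

# The grid search cross-validates the whole pipeline for each combination,
# so the scaler and the feature union are refitted on every training fold
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the search wraps the entire pipeline, the preprocessing steps never see the validation folds during fitting, which avoids leaking information into the cross-validation score.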