Introducing TPOT
TPOT, or Tree-based Pipeline Optimization Tool, is an open-source Python library for automated machine learning (AutoML). Under the hood, it uses the well-known scikit-learn library for data preparation, feature transformation, and modeling. It combines these building blocks into pipelines and uses genetic programming (GP) to search for the best-performing pipeline for a given dataset. The concept of GP is covered in later sections.
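To make "best-performing pipeline" concrete, here is a minimal sketch written with plain scikit-learn (assumed to be installed). It is not TPOT and it does not use GP. It hand-picks three candidate pipelines, scores each one with 5-fold cross-validation, and keeps the winner. The candidate names and the helper functions `candidate_pipelines` and `best_pipeline` are hypothetical, chosen only for illustration. TPOT performs a comparable evaluation, but over a much larger, automatically generated space of pipelines.

```python
# Hypothetical sketch: a tiny, fixed search over scikit-learn pipelines.
# TPOT explores a far larger space with genetic programming; this only
# illustrates what "scoring candidate pipelines and keeping the best" means.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def candidate_pipelines():
    """Return hypothetical candidates: optional preprocessing steps plus a model."""
    return {
        "scaler+logreg": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "scaler+pca+knn": make_pipeline(
            StandardScaler(), PCA(n_components=2), KNeighborsClassifier()
        ),
        "tree": make_pipeline(DecisionTreeClassifier(random_state=42)),
    }


def best_pipeline(X, y, cv=5):
    """Score each candidate by mean k-fold CV accuracy and refit the best one."""
    candidates = candidate_pipelines()
    scores = {
        name: cross_val_score(pipe, X, y, cv=cv).mean()
        for name, pipe in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # Refit the winning pipeline on all the data so it can make predictions.
    best = candidates[best_name].fit(X, y)
    return best_name, best, scores


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    name, model, scores = best_pipeline(X, y)
    for candidate, score in scores.items():
        print(f"{candidate}: {score:.3f}")
    print("best:", name)
```

Each pipeline chains preprocessing and a model, so the steps are cross-validated together. This avoids leaking information from the validation folds into the scaler or PCA step.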
As a rule of thumb, consider TPOT whenever you need an automated machine learning pipeline. Data science is a broad field, and libraries such as TPOT let you spend more of your time on data gathering and cleaning, because much of the remaining work, such as feature preprocessing, model selection, and hyperparameter tuning, is automated.
The following figure shows what a typical machine learning pipeline looks like:
The preceding figure shows which parts of a machine learning process can and can't be automated...