Introducing TPOT
The Tree-based Pipeline Optimization Tool, or TPOT for short, is a product of the University of Pennsylvania's Computational Genetics Lab. TPOT is an automated ML tool written in Python that builds and optimizes ML pipelines using genetic programming. Built on top of scikit-learn, TPOT helps automate feature selection, feature preprocessing, feature construction, model selection, and parameter optimization by "exploring thousands of possible pipelines to find the best one". It is one of the few AutoML toolkits with a short learning curve.
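To make concrete what is being automated, here is a minimal sketch of one hand-built scikit-learn pipeline, with a tiny manual parameter search. This is not TPOT code, and the steps chosen here (VarianceThreshold, StandardScaler, LogisticRegression) and the grid of C values are illustrative assumptions, not what TPOT would necessarily select. It uses the small 8x8 digits dataset bundled with scikit-learn as a stand-in for MNIST so that it runs quickly without a download.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small 8x8 digits dataset bundled with scikit-learn (a stand-in, not full MNIST).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# One hand-picked pipeline: feature selection -> preprocessing -> model.
# Every choice below is an assumption made for illustration.
pipeline = Pipeline([
    ("select", VarianceThreshold()),   # drop pixels that never change
    ("scale", StandardScaler()),       # standardize the remaining features
    ("model", LogisticRegression(max_iter=1000)),
])

# A tiny manual search over a single hyperparameter of the model step.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0]}, cv=3)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Each decision in this sketch (which selector, which scaler, which model, which parameter values) was made by hand. TPOT's role is to search over combinations of such decisions automatically.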
The toolkit is available for download on GitHub: github.com/EpistasisLab/tpot.
To explain the framework, let's start with a minimal working example that uses the MNIST database of handwritten digits:
- Create a new Colab notebook and run pip install TPOT. TPOT can be used directly from the command line or via Python code: