AutoML has seen a steep rise in research and new tools over the last couple of years, but its recent mention during Google IO 2017 has piqued the interest of the entire developer community. What is AutoML all about and what makes it so interesting?
Before we try to understand AutoML, let’s look at what triggered the need for automated machine learning. Until now, building machine learning models that work in the real world has been a domain ruled by researchers, scientists, and machine learning experts. The process of manually designing a machine learning model involves several complex and time-consuming steps such as:
Add to this, the several layers of neural networks required for an efficient ML architecture -- an n-layer neural network could result in nn potential networks. This level of complexity could be overwhelming for the millions of developers who are keen on embracing machine learning.
AutoML tries to solve this problem of complexity and makes machine learning accessible to a large group of developers by automating routine but complex tasks such as the design of neural networks. Since this cuts down development time significantly and takes care of several complex tasks involved in building machine learning models, AutoML is expected to play a crucial role in bringing machine learning to the mainstream.
With a growing body of research, AutoML aims to automate the following tasks in the field of machine learning:
It does this by using a wide range of algorithms and approaches such as:
One of the fundamental approaches for automating model generation is to use Bayesian methods for hyperparameter tuning. By modeling the uncertainty of parameter performance, different variations of the model can be explored which offers an optimal solution.
To further increase AutoML efficiency, meta-learning techniques are used to find and pick optimal hyperparameter settings. These techniques can be further coupled with auto-ensemble construction techniques to create effective ensemble model from a collection of models that undergo optimization. Using these techniques, a high level of accuracy can be achieved throughout the process of automated generation of models.
Certain tools like TPOT also make use of a variation of genetic programming (tree-based pipeline optimization) to automatically design and optimize ML models that offer highly accurate results for a given set of data. This approach makes use of operators at various stages of the data pipeline which are assembled together in the form of a tree-based pipeline. These are then further optimized and newer pipelines are auto-generated using genetic programming.
If these weren’t enough, Google in its recent posts disclosed that they are using reinforcement learning approach to give a further push to develop efficient AutoML techniques.
Although it’s still early days, we can already see some frameworks emerging to automate the generation of your machine learning models.
Auto-sklearn, the tool which won the ChaLearn AutoML Challenge, provides a wrapper around the popular Python library scikit-learn to automate machine learning. This is a great addition to the ever-growing ecosystem of Python data science tools. Built on top of Bayesian optimization, it takes away the hassle of algorithm selection, parameter tuning, and ensemble construction while building machine learning pipelines. With auto-sklearn, developers can create rapid iterations and refinements to their machine learning models, thereby saving a significant amount of development time. The tool is still in its early stages of development, so expect a few hiccups while using it.
DataRobot offers a machine learning automation platform to all levels of data scientists aimed at significantly reducing the time to build and deploy predictive models. Since it’s a cloud platform it offers great power and speed throughout the process of automating the model generation process. In addition to automating the development of predictive models, it offers other useful features such as a web-based interface, compatibility with several leading tools such as Hadoop and Spark, scalability, and rapid deployment. It’s one of those few machine learning automation platforms which are ready for industry use.
TPOT is yet another Python tool meant for automated machine learning. It uses a genetic programming approach to iterate and optimize machine learning models. As in the case of auto-sklearn, TPOT is also built on top of scikit-learn. It has a growing interest level on GitHub with 2400 stars and has observed a 100% rise in the past one year alone. Its goals, however, are quite similar to those of Auto-sklearn: feature construction, feature selection, model selection, and parameter optimization. With these goals in mind, TPOT aims at building efficient machine learning systems in lesser time and with better accuracy.
AutoML as a concept is still in its infancy. But as market leaders like Google, Facebook, and others research more in this field, AutoML will keep evolving at a brisk pace. Assuming that AutoML would replace humans in the field of data science, however, is a far-fetched thought and nowhere near reality.
Here is why.
AutoML as a technique is meant to make the neural network design process efficient rather than replace humans and researchers in the field of building neural networks. The primary goal of AutoML is to help experienced data scientists be more efficient at their work i.e., enhance productivity by a huge margin and to reduce the steep learning curve for the many developers who are keen on designing ML models - i.e., make ML more accessible. With the advancements in this field, it’s exciting times for developers to embrace machine learning and start building intelligent applications.
We see automated machine learning as a game changer with the power to truly democratize the building of AI apps. With automated machine learning, you don’t have to be a data scientist to develop an elegant AI app!