Michelangelo PyML: Introducing Uber’s platform for rapid machine learning development

Transportation network giant Uber has developed Michelangelo PyML, a Python-powered platform for rapid prototyping of machine learning models. The platform aims to offer machine learning as a service, democratizing machine learning and making it possible to scale AI models efficiently to meet business needs.

Michelangelo PyML integrates with Michelangelo, the platform Uber developed for large-scale machine learning in 2017. It will allow Uber’s data scientists and engineers to build intelligent Python-based models that run at scale for both online and offline tasks.

Why Uber chose PyML for Michelangelo

Uber developed Michelangelo in September 2017 with a clear focus on high performance and scalability. It currently enables Uber’s product teams to design, build, deploy and maintain machine learning solutions at scale, and powers close to 1 million predictions per second. However, this came at the cost of flexibility. Users faced two critical issues:

Models could only be trained using algorithms that Michelangelo supported natively. Running an unsupported algorithm meant extending the platform with additional training and deployment components, which was often inconvenient.

Users could not apply any feature transformations other than those offered by Michelangelo’s DSL (domain-specific language).

Apart from these constraints, Uber also observed that data scientists usually preferred Python over other programming languages, given its rich suite of libraries and frameworks for analytics and machine learning. Many data scientists also gathered and worked with data locally using tools such as pandas, scikit-learn and TensorFlow, rather than Big Data tools such as Apache Spark and Hive, which can take hours to set up.

How PyML improves Michelangelo

Based on the challenges faced in using Michelangelo, Uber decided to revamp the platform by integrating PyML to make it more flexible. PyML provides a concrete framework for data scientists to build and train machine learning models that can be deployed quickly, safely and reliably across different environments. Because it places no restriction on the types of data they can use or the algorithms they can choose, it is well suited to integration with a platform like Michelangelo.

By integrating Python-based models that can operate at scale with Michelangelo, Uber will now be able to serve both online and offline queries and deliver predictions with far less effort. This could prove to be a masterstroke for Uber as it tries to boost business and revenue growth, which slowed over the last year.
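
To make the idea concrete, here is a minimal sketch of what a Python-defined model with a uniform prediction contract might look like. It is not Uber's actual PyML API: the class names (PythonModel, TripDurationModel), the helper functions (predict_online, predict_offline) and the toy trip data are all hypothetical, chosen only to illustrate custom training logic, a custom feature transformation, and one model serving both online and offline requests.

```python
from abc import ABC, abstractmethod
from statistics import mean


class PythonModel(ABC):
    """Hypothetical contract: arbitrary Python code behind one predict() method."""

    @abstractmethod
    def predict(self, rows):
        """Take a list of feature dicts and return one prediction per row."""


class TripDurationModel(PythonModel):
    """Toy model: predicts trip minutes from distance using an average pace."""

    def __init__(self):
        self.minutes_per_km = None

    def fit(self, rows, durations):
        # Custom "training" logic; any library or algorithm could be used here.
        paces = [d / r["distance_km"] for r, d in zip(rows, durations)]
        self.minutes_per_km = mean(paces)
        return self

    def transform(self, row):
        # Custom feature transformation written in plain Python:
        # clamp negative distances to zero.
        return max(row["distance_km"], 0.0)

    def predict(self, rows):
        return [self.transform(r) * self.minutes_per_km for r in rows]


def predict_online(model, row):
    # Online use: one request in, one prediction out.
    return model.predict([row])[0]


def predict_offline(model, rows, batch_size=2):
    # Offline use: score a larger dataset in fixed-size batches.
    results = []
    for start in range(0, len(rows), batch_size):
        results.extend(model.predict(rows[start:start + batch_size]))
    return results


# Made-up training data: two trips with their observed durations in minutes.
model = TripDurationModel().fit(
    [{"distance_km": 2.0}, {"distance_km": 5.0}],
    [6.0, 15.0],
)
```

In this sketch, the same model object answers a single live request through predict_online and scores a whole dataset through predict_offline, which is the kind of flexibility the article attributes to PyML.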
