Predictive Analytics with AWS: A quick look at Amazon ML

As artificial intelligence and big data have become a ubiquitous part of our everyday lives, cloud-based machine learning services are part of a rising billion-dollar industry. Among the several services currently available in the market, Amazon Machine Learning stands out for its simplicity. In this article, we will look at Amazon Machine Learning, MLaaS, and other related concepts.

This article is an excerpt taken from the book 'Effective Amazon Machine Learning' written by Alexis Perrier.

Machine Learning as a Service

Amazon Machine Learning is an online service by Amazon Web Services (AWS) that does supervised learning for predictive analytics.

Launched in April 2015 at the AWS Summit, Amazon ML joins a growing list of cloud-based machine learning services, such as Microsoft Azure, Google prediction, IBM Watson, Prediction IO, BigML, and many others. These online machine learning services form an offer commonly referred to as Machine Learning as a Service or MLaaS following a similar denomination pattern of other cloud-based services such as SaaS, PaaS, and IaaS respectively for Software, Platform, or Infrastructure as a Service.

Studies show that MLaaS is a potentially big business trend. ABI Research, a business intelligence consultancy, estimates machine learning-based data analytics tools and services revenues to hit nearly $20 billion in 2021 as MLaaS services take off as outlined in this business report

Eugenio Pasqua, a Research Analyst at ABI Research, said the following:

"The emergence of the Machine-Learning-as-a-Service (MLaaS) model is good news for the market, as it cuts down the complexity and time required to implement machine learning and thus opens the doors to an increase in its adoption level, especially in the small-to-medium business sector."

The increased accessibility is a direct result of using an API-based infrastructure to build machine-learning models instead of developing applications from scratch. Offering efficient predictive analytics models without the need to code, host, and maintain complex code bases lowers the bar and makes ML available to smaller businesses and institutions.

Amazon ML takes this democratization approach further than the other actors in the field by significantly simplifying the predictive analytics process and its implementation. This simplification revolves around four design decisions that are embedded in the platform:

A limited set of tasks: binary classification, multi-classification, and regression

A single linear algorithm

A limited choice of metrics to assess the quality of the prediction

A simple set of tuning parameters for the underlying predictive algorithm

That somewhat constrained environment is simple enough while addressing most predictive analytics problems relevant to business. It can be leveraged across an array of different industries and use cases. Let's see how!

Leveraging full AWS integration

The AWS data ecosystem of pipelines, storage, environments, and Artificial Intelligence (AI) is also a strong argument in favor of choosing Amazon ML as a business platform for its predictive analytics needs. Although Amazon ML is simple, the service evolves to greater complexity and more powerful features once it is integrated into a larger structure of AWS data related services.

AWS is already a major factor in cloud computing. Here's what an excerpt from The Economist, August 2016 has to say about AWS (http://www.economist.com/news/business/21705849-how-open-source-software-and-cloud-computing-have-set-up-it-industry):

AWS shows no sign of slowing its progress towards full dominance of cloud computing's wide skies. It has ten times as much computing capacity as the next 14 cloud providers combined, according to Gartner, a consulting firm. AWS's sales in the past quarter were about three times the size of its closest competitor, Microsoft's Azure.

This gives an edge to Amazon ML, as many companies that are using cloud services are likely to be already using AWS. Adding simple and efficient machine learning tools to the product offering mix anticipates the rise of predictive analytics features as a standard component of web services. Seamless integration with other AWS services is a strong argument in favor of using Amazon ML despite its apparent simplicity.

The following architecture is a case study taken from an AWS January 2016 white paper titled Big Data Analytics Options on AWS (http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf), showing a potential AWS architecture for sentiment analysis on social media. It shows how Amazon ML can be part of a more complex architecture of AWS services:

predictive-analytics-with-amazon-ml-img-0

Comparing performances in Amazon ML services

Keeping systems and applications simple is always difficult, but often worth it for the business. Examples abound with overloaded UIs bringing down the user experience, while products with simple, elegant interfaces and minimal features enjoy widespread popularity. The Keep It Simple mantra is even more difficult to adhere to in a context such as predictive analytics where performance is key. This is the challenge Amazon took on with its Amazon ML service.

A typical predictive analytics project is a sequence of complex operations: getting the data, cleaning the data, selecting, optimizing and validating a model and finally making predictions. In the scripting approach, data scientists develop codebases using machine learning libraries such as the Python scikit-learn library or R packages to handle all these steps from data gathering to predictions in production.

As a developer breaks down the necessary steps into modules for maintainability and testability, Amazon ML breaks down a predictive analytics project into different entities: datasource, model, evaluation, and predictions. It's the simplicity of each of these steps that makes AWS so powerful to implement successful predictive analytics projects.

Engineering data versus model variety

Having a large choice of algorithms for your predictions is always a good thing, but at the end of the day, domain knowledge and the ability to extract meaningful features from clean data is often what wins the game.

Kaggle is a well-known platform for predictive analytics competitions, where the best data scientists across the world compete to make predictions on complex datasets. In these predictive competitions, gaining a few decimals on your prediction score is what makes the difference between earning the prize or being just an extra line on the public leaderboard among thousands of other competitors. One thing Kagglers quickly learn is that choosing and tuning the model is only half the battle. Feature extraction or how to extract relevant predictors from the dataset is often the key to winning the competition.

In real life, when working on business-related problems, the quality of the data processing phase and the ability to extract meaningful signal out of raw data is the most important and time-consuming part of building an effective predictive model. It is well known that "data preparation accounts for about 80% of the work of data scientists" (http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/). Model selection and algorithm optimization remains an important part of the work but is often not the deciding factor when the implementation is concerned.

A solid and robust implementation that is easy to maintain and connects to your ecosystem seamlessly is often preferred to an overly complex model developed and coded in-house, especially when the scripted model only produces small gains when compared to a service-based implementation.

Amazon's expertise and the gradient descent algorithm

Amazon has been using machine learning for the retail side of its business and has built a serious expertise in predictive analytics. This expertise translates into the choice of algorithm powering the Amazon ML service.

The Stochastic Gradient Descent (SGD) algorithm is the algorithm powering Amazon ML linear models and is ultimately responsible for the accuracy of the predictions generated by the service. The SGD algorithm is one of the most robust, resilient, and optimized algorithms. It has been used in many diverse environments, from signal processing to deep learning and for a wide variety of problems, since the 1960s with great success.

The SGD has also given rise to many highly efficient variants adapted to a wide variety of data contexts. We will come back to this important algorithm in a later chapter; suffice it to say at this point that the SGD algorithm is the Swiss army knife of all possible predictive analytics algorithm.

Several benchmarks and tests of the Amazon ML service can be found across the web (Amazon, Google, and Azure: https://blog.onliquid.com/machine-learning-services-2/ and Amazon versus scikit-learn: http://lenguyenthedat.com/minimal-data-science-2-avazu/). Overall results show that the Amazon ML performance is on a par with other MLaaS platforms, but also with scripted solutions based on popular machine learning libraries such as scikit-learn.

For a given problem in a specific context and with an available dataset and a particular choice of a scoring metric, it is probably possible to code a predictive model using an adequate library and obtain better performances than the ones obtained with Amazon ML. But what Amazon ML offers is stability, an absence of coding, and a very solid benchmark record, as well as a seamless integration with the Amazon Web Services ecosystem that already powers a large portion of the Internet.

Amazon ML service pricing strategy

As with other MLaaS providers and AWS services, Amazon ML only charges for what you consume.

The cost is broken down into the following:

An hourly rate for the computing time used to build predictive models

A prediction fee per thousand prediction samples

And in the context of real-time (streaming) predictions, a fee based on the memory allocated upfront for the model

The computational time increases as a function of the following:

The complexity of the model

The size of the input data

The number of attributes

The number and types of transformations applied

At the time of writing, these charges are as follows:

$0.42 per hour for data analysis and model building fees

$0.10 per 1,000 predictions for batch predictions

$0.0001 per prediction for real-time predictions

$0.001 per hour for each 10 MB of memory provisioned for your model

These prices do not include fees related to the data storage (S3, Redshift, or RDS), which are charged separately.

During the creation of your model, Amazon ML gives you a cost estimation based on the data source that has been selected.

The Amazon ML service is not part of the AWS free tier, a 12-month offer applicable to certain AWS services for free under certain conditions.

To summarize, we presented a simple introduction to the Amazon ML service. Amazon ML is built on a solid ground, with a simple yet very efficient algorithm driving its predictions.

If you found this post useful, be sure to check out the book 'Effective Amazon Machine Learning' to learn about predictive analytics and other concepts in AWS machine learning.

Integrate applications with AWS services: Amazon DynamoDB & Amazon Kinesis [Tutorial]

AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers

AWS Elastic Load Balancing: support added for Redirects and Fixed Responses in Application Load Balancer