Chapter 1, Introduction to Machine Learning and Predictive Analytics, is a general presentation of the Amazon Machine Learning service and the types of predictive analytics problems it can address. We show how the service uses a simple linear model for regression and classification problems, and we present the context for successful predictions with Amazon Machine Learning.
Chapter 2, Machine Learning Definitions and Concepts, explains the machine learning concepts needed to use the Amazon Machine Learning service and fully understand how it works. What preparation techniques are used when dealing with raw data? How do we evaluate a predictive model? What strategies are available to remedy poor predictive performance?
Chapter 3, Overview of an Amazon Machine Learning Workflow, is an overview of a simple Amazon Machine Learning project. The reader will learn how to get started on the Amazon Machine Learning platform, how to set up an account, and how to secure it. We go through a simple numeric prediction problem based on a classic dataset. We describe how to prepare the data, train and select a model, and make predictions.
Chapter 4, Loading and Preparing the Dataset, starts from the powerful features Amazon ML offers to transform data through recipes. Working with a classic dataset, we upload data to S3, implement cross-validation, create a schema, and examine the available data statistics. We then extend the feature engineering and data cleaning capabilities of Amazon ML by using Athena, a recently launched AWS SQL service.
Chapter 5, Model Creation, explores the Amazon ML data transformations and how to apply them through recipes. We train and tune models and select the best ones by analyzing different prediction metrics. We present insights into the stochastic gradient descent algorithm and the use of different types of regularization. Finally, we analyze the training logs to better understand what goes on under the hood of Amazon ML during model training.
Chapter 6, Predictions and Performances, applies our newly trained models to make predictions on previously unseen data and makes a final assessment of their performance and robustness. We show how to make batch predictions on a given dataset and how to set up a real-time endpoint for streaming predictions.
Chapter 7, Command Line and SDK, starts from the observation that using the AWS web interface to manage and run your projects is time-consuming. We move away from the web interface and start running our projects from the command line with the AWS Command Line Interface (AWS CLI) and from Python with the Boto3 library, the AWS SDK for Python. We use our new powers to implement cross-validation and recursive feature selection.
Chapter 8, Creating Datasources from Redshift, uses the power of SQL queries to address non-linear datasets. Creating datasources from Redshift opens the way to upstream, SQL-based feature engineering before the datasource is created. We explore how to load data from S3 into Redshift, access the database, run queries, and export results.
Chapter 9, Building a Streaming Data Analysis Pipeline, is the final chapter of the book. In it, we extend the capabilities of Amazon ML by integrating it with other AWS services. We build a fully featured streaming data flow that combines AWS Kinesis, Lambda, Redshift, and Amazon Machine Learning to classify tweets in real time.