In this section, we step through the creation of a clustering model capable of grouping consumer purchase patterns into three distinct clusters. The first step will be to launch an EMR notebook along with a small cluster (a single m5.xlarge node is sufficient, as the dataset we selected is not very large). Simply follow these steps:
- Load the dataframe and inspect the dataset:
df = spark.read.csv(SRC_PATH + 'data.csv',
                    header=True,
                    inferSchema=True)
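The two options passed to spark.read.csv are worth unpacking: header=True treats the first row of the file as column names, and inferSchema=True asks Spark to sample the data and guess column types instead of reading everything as strings. The same two ideas can be sketched with the standard library alone, using toy data (the column names below are illustrative assumptions, not the actual schema of data.csv):

```python
import csv
import io

# Toy stand-in for the transactions file; the columns here are
# hypothetical, chosen only to illustrate the two read options.
raw = io.StringIO(
    "customer_id,product,quantity\n"
    "C001,apples,2\n"
    "C002,bread,1\n"
)

# header=True in spark.read.csv corresponds to treating the first
# row as column names, which csv.DictReader does by default.
rows = list(csv.DictReader(raw))
print(rows[0]["product"])  # fields are addressable by column name

# inferSchema=True makes Spark cast columns to numeric types where
# possible; with the csv module every field stays a string, so the
# equivalent cast is manual.
quantity = int(rows[0]["quantity"])
print(quantity)
```

Without inferSchema=True, Spark behaves like the csv module here: every column comes back as a string, which breaks numeric operations downstream, so it is worth enabling for datasets of this size.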
The following screenshot shows the first few lines of our df dataframe:
As you can see, the dataset involves transactions of products bought by different customers at different times and in different locations. We attempt to cluster these customer transactions using k-means by looking at three factors:
- The product (represented by the...