You're reading from Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Product type: Paperback

Published in: May 2019

Publisher: Packt

ISBN-13: 9781789349795

Length: 306 pages

Edition: 1st Edition

Languages: Python

Tools: Apache Spark

Concepts: Machine Learning

Authors (2):

Maximo Gurmendez

Dr. Saket S.R. Mengle

Table of Contents (24) Chapters

Preface

1. Section 1: Machine Learning on AWS

2. Getting Started with Machine Learning for AWS

3. Section 2: Implementing Machine Learning Algorithms at Scale on AWS

4. Classifying Twitter Feeds with Naive Bayes

5. Predicting House Value with Regression Algorithms

6. Predicting User Behavior with Tree-Based Methods

7. Customer Segmentation Using Clustering Algorithms

8. Analyzing Visitor Patterns to Make Recommendations

9. Section 3: Deep Learning

10. Implementing Deep Learning Algorithms

11. Implementing Deep Learning with TensorFlow on AWS

12. Image Classification and Detection with SageMaker

13. Section 4: Integrating Ready-Made AWS Machine Learning Services

14. Working with AWS Comprehend

15. Using AWS Rekognition

16. Building Conversational Interfaces Using AWS Lex

17. Section 5: Optimizing and Deploying Models through AWS

18. Creating Clusters on AWS

19. Optimizing Models in Spark and SageMaker

20. Tuning Clusters for Machine Learning

21. Deploying Models Built in AWS

22. Other Books You May Enjoy

Appendix: Getting Started with AWS

Introduction to the EMR architecture

In Chapter 4, Predicting User Behavior with Tree-Based Methods, we introduced EMR, an AWS service that lets us run and scale Apache Spark, Hadoop, HBase, Presto, Hive, and other big data frameworks. These frameworks typically require a cluster of machines, each running correctly configured software so that the machines can communicate with each other. Let's look at the most commonly used products within EMR.
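To make the idea of a configured cluster concrete, here is a minimal sketch of how such a cluster might be described for boto3's EMR `run_job_flow` call. It is not taken from the book: the cluster name, release label, instance types, worker count, and region are illustrative assumptions, and the two IAM role names are the EMR defaults, which must already exist in the account. The helper names `build_emr_request` and `launch_cluster` are hypothetical.

```python
def build_emr_request(name, release_label="emr-5.23.0", worker_count=2):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All values here are illustrative assumptions, not settings from the book.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumed EMR release
        # Frameworks EMR should install and configure on every node.
        "Applications": [{"Name": app} for app in ("Hadoop", "Spark", "Hive")],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": worker_count,
                },
            ],
            # Keep the cluster running after any submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR roles; assumed to exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to EMR. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so building the request works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Building the request is a pure operation and does not contact AWS.
request = build_emr_request("ml-cluster")
```

Calling `launch_cluster(request)` would actually start (and bill for) a cluster, so the sketch only builds the request. Keeping the request builder separate from the AWS call makes the configuration easy to inspect before anything is launched.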

Apache Hadoop

Many applications, such as Spark and HBase, require Hadoop. The basic installation of Hadoop comes with two main services:

Hadoop Distributed File System (HDFS): This is a service that allows us to store large...

Authors (2)

Dr. Saket S.R. Mengle

Dr. Saket S.R. Mengle holds a PhD in text mining from Illinois Institute of Technology, Chicago. He has worked in a variety of fields, including text classification, information retrieval, large-scale machine learning, and linear optimization. He currently works as senior principal data scientist at dataxu, where he is responsible for developing and maintaining the algorithms that drive dataxu's real-time advertising platform.

Maximo Gurmendez

Maximo Gurmendez holds a master's degree in computer science/AI from Northeastern University, where he attended as a Fulbright Scholar. Since 2009, he has been working with dataxu as data science engineering lead. He's also the founder of Montevideo Labs (a data science and engineering consultancy). Additionally, Maximo is a computer science professor at the University of Montevideo and is director of its data science for business program.
