Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Product type: Paperback
Published: May 2019
Publisher: Packt
ISBN-13: 9781789349795
Length: 306 pages
Edition: 1st Edition
Languages: Python
Tools: Apache Spark
Concepts: Machine Learning
Authors (2): Maximo Gurmendez, Dr. Saket S.R. Mengle

Table of Contents (24) Chapters

Preface

1. Section 1: Machine Learning on AWS

2. Getting Started with Machine Learning for AWS

3. Section 2: Implementing Machine Learning Algorithms at Scale on AWS

4. Classifying Twitter Feeds with Naive Bayes

5. Predicting House Value with Regression Algorithms

6. Predicting User Behavior with Tree-Based Methods

7. Customer Segmentation Using Clustering Algorithms

8. Analyzing Visitor Patterns to Make Recommendations

9. Section 3: Deep Learning

10. Implementing Deep Learning Algorithms

11. Implementing Deep Learning with TensorFlow on AWS

12. Image Classification and Detection with SageMaker

13. Section 4: Integrating Ready-Made AWS Machine Learning Services

14. Working with AWS Comprehend

15. Using AWS Rekognition

16. Building Conversational Interfaces Using AWS Lex

17. Section 5: Optimizing and Deploying Models through AWS

18. Creating Clusters on AWS

19. Optimizing Models in Spark and SageMaker

20. Tuning Clusters for Machine Learning

21. Deploying Models Built in AWS

22. Other Books You May Enjoy

Appendix: Getting Started with AWS

Managing data pipelines with Glue

Data scientists and data engineers run a variety of jobs that extract, transform, and load data into systems such as S3. For example, we might have a daily job that processes text data and stores a table with the bag-of-words representation that we saw in Chapter 2, Classifying Twitter Feeds with Naive Bayes. We might want to update the table each day so that it points to the latest available data. Downstream processes can then rely on the table name alone to find and process the latest version of the data. If we do not catalog this data properly, it becomes very hard to combine the different data sources, or even to know where the data is located. This is where the AWS Glue metastore comes in. Tables in Glue are grouped into databases; however, tables in different databases can still be joined and referenced.
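The daily "point the table at the latest data" step described above can be sketched in Python. This is a minimal illustration and not the book's own code: the function name `repoint_table`, the key whitelist `TABLE_INPUT_KEYS`, and every database, table, and bucket name are assumptions made for the example. It relies on the `get_table` and `update_table` operations of the boto3 Glue client, and it takes the client as a parameter so the logic can be exercised without AWS credentials.

```python
import copy

# Fields from a get_table response that update_table accepts in TableInput.
# This is a deliberately small subset chosen for the sketch; read-only fields
# such as CreateTime or DatabaseName are dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
}


def repoint_table(glue_client, database, table, new_location):
    """Point an existing Glue table at a new S3 location.

    glue_client is expected to behave like boto3.client("glue").
    Returns the TableInput that was sent to update_table.
    """
    current = glue_client.get_table(DatabaseName=database, Name=table)["Table"]

    # Copy only the writable fields so the response object is left untouched.
    table_input = {
        key: copy.deepcopy(value)
        for key, value in current.items()
        if key in TABLE_INPUT_KEYS
    }

    # The table name stays the same; only the data location changes.
    table_input["StorageDescriptor"]["Location"] = new_location

    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input
```

A daily job could then call, for instance, `repoint_table(boto3.client("glue"), "text_features", "bag_of_words", "s3://example-bucket/bow/2019-05-02/")`, where all three names are hypothetical. Consumers keep querying `text_features.bag_of_words` and pick up the new data without knowing its path.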

Authors (2)

Dr. Saket S.R. Mengle

Dr. Saket S.R. Mengle holds a PhD in text mining from Illinois Institute of Technology, Chicago. He has worked in a variety of fields, including text classification, information retrieval, large-scale machine learning, and linear optimization. He currently works as senior principal data scientist at dataxu, where he is responsible for developing and maintaining the algorithms that drive dataxu's real-time advertising platform.

Maximo Gurmendez

Maximo Gurmendez holds a master's degree in computer science/AI from Northeastern University, where he attended as a Fulbright Scholar. Since 2009, he has been working with dataxu as data science engineering lead. He's also the founder of Montevideo Labs (a data science and engineering consultancy). Additionally, Maximo is a computer science professor at the University of Montevideo and is director of its data science for business program.
