Mastering Machine Learning on AWS

You're reading from Mastering Machine Learning on AWS: Advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow

Product type Paperback
Published in May 2019
Publisher Packt
ISBN-13 9781789349795
Length 306 pages
Edition 1st Edition
Authors (2):
Maximo Gurmendez
Dr. Saket S.R. Mengle

Table of Contents (24 chapters)

Preface
1. Section 1: Machine Learning on AWS
2. Getting Started with Machine Learning for AWS
3. Section 2: Implementing Machine Learning Algorithms at Scale on AWS
4. Classifying Twitter Feeds with Naive Bayes
5. Predicting House Value with Regression Algorithms
6. Predicting User Behavior with Tree-Based Methods
7. Customer Segmentation Using Clustering Algorithms
8. Analyzing Visitor Patterns to Make Recommendations
9. Section 3: Deep Learning
10. Implementing Deep Learning Algorithms
11. Implementing Deep Learning with TensorFlow on AWS
12. Image Classification and Detection with SageMaker
13. Section 4: Integrating Ready-Made AWS Machine Learning Services
14. Working with AWS Comprehend
15. Using AWS Rekognition
16. Building Conversational Interfaces Using AWS Lex
17. Section 5: Optimizing and Deploying Models through AWS
18. Creating Clusters on AWS
19. Optimizing Models in Spark and SageMaker
20. Tuning Clusters for Machine Learning
21. Deploying Models Built in AWS
22. Other Books You May Enjoy
Appendix: Getting Started with AWS

Managing data pipelines with Glue

Data scientists and data engineers run many kinds of jobs that extract, transform, and load (ETL) data into systems such as S3. For example, we might have a daily job that processes text data and stores it as a table in the bag-of-words representation that we saw in Chapter 2, Classifying Twitter Feeds with Naive Bayes. We might want to update the table each day so that it points to the latest available data. Downstream processes can then rely on the table name alone to find and process the latest version of the data. If we do not catalog this data properly, it becomes very hard to combine different data sources, or even to know where the data is located. This is where the AWS Glue metastore comes in. Tables in Glue are grouped into databases; however, tables in different databases can still be joined and referenced.
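As a rough illustration of the daily update described above, the following sketch re-points an existing Glue table at the newest day's output so that consumers can keep using the same table name. It is only one possible approach, not the book's own code: the helper names (`daily_location`, `repointed_table_input`, `repoint_table`), the `dt=YYYY-MM-DD` folder layout, and the bucket, database, and table names are all hypothetical. The boto3 Glue client calls it uses are `get_table` and `update_table`.

```python
import copy
from datetime import date

# Fields accepted in Glue's TableInput. get_table also returns read-only fields
# (for example DatabaseName and CreateTime) that update_table rejects.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
}


def daily_location(base_path: str, day: date) -> str:
    """Build the S3 prefix for one day's output (assumed dt=YYYY-MM-DD layout)."""
    return f"{base_path.rstrip('/')}/dt={day.isoformat()}/"


def repointed_table_input(table: dict, new_location: str) -> dict:
    """Copy the writable fields of a table definition and swap its location."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    table_input.setdefault("StorageDescriptor", {})["Location"] = new_location
    return table_input


def repoint_table(database: str, table_name: str, new_location: str,
                  glue_client=None) -> None:
    """Fetch the current definition and update it to point at new_location."""
    if glue_client is None:
        import boto3  # imported lazily so the helpers above work without AWS access
        glue_client = boto3.client("glue")
    current = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue_client.update_table(
        DatabaseName=database,
        TableInput=repointed_table_input(current, new_location),
    )


if __name__ == "__main__":
    # Hypothetical names; calling repoint_table requires AWS credentials.
    print(daily_location("s3://example-bucket/bag_of_words", date(2019, 5, 1)))
```

A daily job could call `repoint_table("example_db", "bag_of_words", daily_location(base, date.today()))` as its last step. Adding each day as a partition of the table is a common alternative to swapping the location, and may suit cases where older days must stay queryable.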

...