Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
AWS Certified Machine Learning - Specialty (MLS-C01) Certification Guide

You're reading from   AWS Certified Machine Learning - Specialty (MLS-C01) Certification Guide The ultimate guide to passing the MLS-C01 exam on your first attempt

Arrow left icon
Product type Paperback
Published in Feb 2024
Publisher Packt
ISBN-13 9781835082201
Length 342 pages
Edition 2nd Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Somanath Nanda Somanath Nanda
Author Profile Icon Somanath Nanda
Somanath Nanda
Weslley Moura Weslley Moura
Author Profile Icon Weslley Moura
Weslley Moura
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Chapter 1: Machine Learning Fundamentals 2. Chapter 2: AWS Services for Data Storage FREE CHAPTER 3. Chapter 3: AWS Services for Data Migration and Processing 4. Chapter 4: Data Preparation and Transformation 5. Chapter 5: Data Understanding and Visualization 6. Chapter 6: Applying Machine Learning Algorithms 7. Chapter 7: Evaluating and Optimizing Models 8. Chapter 8: AWS Application Services for AI/ML 9. Chapter 9: Amazon SageMaker Modeling 10. Chapter 10: Model Deployment 11. Chapter 11: Accessing the Online Practice Resources 12. Other Books You May Enjoy

Introducing ML frameworks

Being aware of some ML frameworks will put you in a much better position to pass the AWS Machine Learning Specialty exam. There is no need to master these frameworks since this is not a framework-specific certification; however, knowing some common terms and solutions will help you to understand the context of the problems/questions.

scikit-learn is probably the most popular ML framework that you should be aware of. It is an open source Python package that provides implementations of ML algorithms such as decision trees, support vector machines, linear regression, and many others. It also implements classes for data preprocessing, for example, one-hot encoding, label encoders, principal component analysis, and so on. All these preprocessing methods (and many others) will be covered in later sections of this book.

The downside of scikit-learn is the fact that it needs customization to scale up through multiple machines. There is another ML library that is very popular because of the fact that it can handle multiprocessing straight away: Spark’s ML library.

As the name suggests, it is an ML library that runs on top of Apache Spark, which is a unified analytical multi-processing framework used to process data on multiple machines. AWS offers a specific service that allows developers to create Spark clusters with a few clicks, known as EMR. Additionally, SageMaker (a fully managed ML service provided by AWS, which you will cover in a separate chapter) is well integrated with Apache Spark.

The Spark ML library is in constant development. As of the time of writing, it offers support to many ML classes of algorithms, such as classification and regression, clustering, and collaborative filtering. It also offers support for basic statistics computation, such as correlations and some hypothesis tests, as well as many data transformations, such as one-hot encoding, principal component analysis, min-max scaling, and others.

Another very popular ML framework is known as TensorFlow. This ML framework was created by the Google team and it is used for numerical computation and large-scale ML model development. TensorFlow implements not only traditional ML algorithms but also DL models.

TensorFlow is considered a low-level API for model development, which means that it can be very complex to develop more sophisticated models, such as transformers (for text mining). As an attempt to facilitate model development, other ML frameworks were built on top of TensorFlow to make it easier. One of these high-level frameworks is Keras. With Keras, developers can create complex DL models with just a few lines of code. More recently, Keras was incorporated into TensorFlow and it can be now called inside the TensorFlow library.

MXNet is another open source DL library. Using MXNet, you can scale up neural network-based models using multiple GPUs running on multiple machines. It also supports different programming languages, such as Python, R, Scala, and Java.

Graphical processing unit (GPU) support is particularly important in DL libraries such as TensorFlow and MXNet. These libraries allow developers to create and deploy neural network-based models with multiple layers. The training process of neural networks relies a lot on matrix operations, which perform much better on GPUs rather than on CPUs. That’s why these DL libraries commonly offer GPU support. AWS also offers EC2 instances with GPU enabled.

These ML frameworks need a special channel to communicate with GPU units. NVIDIA, the most common supplier of GPUs nowadays, has created an API called the Compute Unified Device Architecture (CUDA). CUDA is used to configure GPU units on NVIDIA devices; for example, setting up caching memory and the number of threads needed to train a neural network model. There is no need to master CUDA or GPU architecture for the AWS Machine Learning Specialty exam, but you definitely need to know what they are and how DL models take advantage of them.

Last, but not least, you should also be aware of some development frameworks widely used by the data science community, but not necessarily to create ML models. These frameworks interoperate with ML libraries to facilitate data manipulation and calculations. For example: pandas is a Python library that provides data processing capabilities and NumPy is an open source Python library that provides numerical computing.

These terms and libraries are so incorporated into data scientists’ daily routines that they might come up during the exam to explain some problem domain for you. Being aware of what they are will help you to quickly understand the context of the question.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image