AWS Certified Machine Learning - Specialty (MLS-C01) Certification Guide

The ultimate guide to passing the MLS-C01 exam on your first attempt

Product type Paperback
Published in Feb 2024
Publisher Packt
ISBN-13 9781835082201
Length 342 pages
Edition 2nd Edition
Authors (2): Somanath Nanda, Weslley Moura

Classifying supervised, unsupervised, and reinforcement learning

ML is a broad field of study, so it is important to define its subdivisions clearly. From a high-level perspective, you can split ML algorithms into two main classes: supervised learning and unsupervised learning.

Introducing supervised learning

Supervised algorithms use a class or label (from the input data) as support to find and validate the optimal solution. Table 1.1 shows a dataset used to classify fraudulent transactions at a financial company.

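| Day of the week | Hour  | Transaction amount | Merchant type | Is fraud? |
|-----------------|-------|--------------------|---------------|-----------|
| Mon             | 09:00 | $1000              | Retail        | No        |
| Tue             | 23:00 | $5500              | E-commerce    | Yes       |
| Fri             | 14:00 | $500               | Travel        | No        |
| Mon             | 10:00 | $100               | Retail        | No        |
| Tue             | 22:00 | $100               | E-commerce    | No        |
| Tue             | 22:00 | $6000              | E-commerce    | Yes       |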

Table 1.1 – Sample dataset for supervised learning

The first four columns are known as features or independent variables, and a supervised algorithm can use them to find fraudulent patterns. For example, by combining those four features (day of the week, hour (EST), transaction amount, and merchant type) across the six observations (each row is technically one observation), you can infer that e-commerce transactions with a value greater than $5,000 that are processed at night are potentially fraudulent.

Important note

In a real scenario, you would need many more observations to have statistical support for this type of inference.

The key point is that you were able to infer a potential fraudulent pattern only because you knew, a priori, which transactions were fraud and which were not. This information is present in the last column of Table 1.1 and is commonly referred to as a target variable, label, response variable, or dependent variable. If the input dataset has a target variable, you should be able to apply supervised learning.
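As a minimal sketch, the rows of Table 1.1 can be written as plain Python records, with the features separated from the target variable. The names `transactions`, `split_features_target`, and `looks_fraudulent` are hypothetical, and treating "night" as 22:00 or later is an assumption made only for this illustration. The rule is written by hand from the six rows; it is not a trained model.

```python
# Rows of Table 1.1; each dict is one observation.
transactions = [
    {"day": "Mon", "hour": 9,  "amount": 1000, "merchant": "Retail",     "is_fraud": False},
    {"day": "Tue", "hour": 23, "amount": 5500, "merchant": "E-commerce", "is_fraud": True},
    {"day": "Fri", "hour": 14, "amount": 500,  "merchant": "Travel",     "is_fraud": False},
    {"day": "Mon", "hour": 10, "amount": 100,  "merchant": "Retail",     "is_fraud": False},
    {"day": "Tue", "hour": 22, "amount": 100,  "merchant": "E-commerce", "is_fraud": False},
    {"day": "Tue", "hour": 22, "amount": 6000, "merchant": "E-commerce", "is_fraud": True},
]

FEATURES = ("day", "hour", "amount", "merchant")  # independent variables
TARGET = "is_fraud"                               # target variable (label)


def split_features_target(rows):
    """Separate the independent variables (X) from the target variable (y)."""
    X = [{name: row[name] for name in FEATURES} for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


def looks_fraudulent(features):
    """Hand-written rule inferred from the six rows (not a trained model).

    Assumption: "night" means 22:00 or later.
    """
    return (
        features["merchant"] == "E-commerce"
        and features["amount"] > 5000
        and features["hour"] >= 22
    )


X, y = split_features_target(transactions)
predictions = [looks_fraudulent(x) for x in X]
print(predictions == y)  # True: the rule matches every label in this tiny sample
```

The rule can only be checked against `y` because the labels are known in advance, which is what makes this a supervised setting.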

In supervised learning, the target variable might store different types of data. For instance, it could be a binary column (yes or no), a multi-class column (class A, B, or C), or even a numerical column (any real number, such as a transaction amount). The data type of the target variable determines which type of supervised learning applies to your problem. Table 1.2 shows how supervised learning is divided into two main groups, classification and regression algorithms:

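| Data type of the target variable | Sub data type of the target variable | Type of supervised learning applicable |
|----------------------------------|--------------------------------------|----------------------------------------|
| Categorical                      | Binary                               | Binary classification                  |
| Categorical                      | Multi-class                          | Multi-class classification             |
| Numerical                        | N/A                                  | Regression                             |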

Table 1.2 – Choosing the right type of supervised learning given the target variable

While classification algorithms predict a class (either binary or multiple classes), regression algorithms predict a real number (either continuous or discrete).
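The mapping in Table 1.2 can be sketched as a small helper that inspects the values of a target column. The function name `supervised_learning_type` is hypothetical, and the check is deliberately naive: it assumes that categorical targets are stored as non-numeric labels, so classes encoded as numbers (for example, 0 and 1) would be reported as regression.

```python
def supervised_learning_type(target_values):
    """Suggest a type of supervised learning from the target column's values."""
    if not target_values:
        raise ValueError("the target column is empty")

    distinct = set(target_values)

    # Assumption: numeric (non-boolean) values mean a numerical target.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if is_numeric:
        return "regression"

    # Otherwise the target is treated as categorical.
    if len(distinct) < 2:
        raise ValueError("a categorical target needs at least two classes")
    if len(distinct) == 2:
        return "binary classification"
    return "multi-class classification"


print(supervised_learning_type(["yes", "no", "no"]))    # binary classification
print(supervised_learning_type(["A", "B", "C", "A"]))   # multi-class classification
print(supervised_learning_type([1000, 5500.5, 100]))    # regression
```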

Understanding data types is important to make the right decisions on ML projects. You can split data types into two main categories: numerical and categorical data. Numerical data can then be split into continuous or discrete subclasses, while categorical data might refer to ordinal or nominal data:

  • Numerical/discrete data refers to individual and countable items (for example, the number of students in a classroom or the number of items in an online shopping cart).
  • Numerical/continuous data can take any value within a range, so it has an infinite number of possible measurements, which often carry decimal points (for example, temperature).
  • Categorical/nominal data refers to labeled variables with no quantitative value (for example, name or gender).
  • Categorical/ordinal data adds a sense of order to a labeled variable (for example, education level or employee title level).
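The four subtypes above can be illustrated with ordinary Python values. This is only a sketch: the variable names, the sample values, and the specific education levels in `EducationLevel` are hypothetical, and merchant type is used as the nominal example because it appears in Table 1.1.

```python
from enum import IntEnum


class EducationLevel(IntEnum):
    """Categorical/ordinal: labels with a meaningful order (hypothetical levels)."""
    HIGH_SCHOOL = 1
    BACHELOR = 2
    MASTER = 3
    DOCTORATE = 4


students_in_class = 28       # numerical/discrete: a countable quantity
temperature_celsius = 21.7   # numerical/continuous: any value within a range
merchant_type = "Retail"     # categorical/nominal: a label with no order

# Ordinal values can be ranked ...
print(EducationLevel.MASTER > EducationLevel.BACHELOR)  # True

# ... while for nominal values only equality is meaningful.
print(merchant_type == "Travel")  # False
```

Using `IntEnum` keeps the ordinal labels readable while still allowing comparisons; a plain string would lose the ordering.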

In other words, when choosing an algorithm for your project, you should ask yourself: do I have a target variable? Does it store categorical or numerical data? Answering these questions will put you in a better position to choose a potential algorithm that will solve your problem.

However, what if you don’t have a target variable? In that case, you are facing an unsupervised learning problem. Unsupervised problems do not provide labeled data; instead, they provide all the independent variables (or features) that will allow unsupervised algorithms to find patterns in the data. The most common type of unsupervised learning is clustering, which aims to group the observations of the dataset into different clusters, purely based on their features. Observations from the same cluster are expected to be similar to each other, but very different from observations from other clusters. Clustering will be covered in more detail in future chapters of this book.
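To make the idea of clustering concrete, here is a minimal sketch that groups the transaction amounts from Table 1.1 without looking at the "Is fraud?" column. It uses a basic one-dimensional k-means loop, which is one common clustering algorithm chosen here as an assumption; the text above does not name a specific algorithm. The function name `kmeans_1d` and the choice of evenly spaced starting centroids are also assumptions.

```python
def kmeans_1d(values, k=2, max_iterations=100):
    """Cluster numbers with a basic k-means loop (no labels are used)."""
    if k < 2 or max_iterations < 1:
        raise ValueError("k must be >= 2 and max_iterations must be >= 1")

    # Assumption: start from k evenly spaced points between min and max.
    low, high = min(values), max(values)
    centroids = [low + i * (high - low) / (k - 1) for i in range(k)]

    for _ in range(max_iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)

        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the loop has converged
            break
        centroids = updated

    return clusters, centroids


amounts = [1000, 5500, 500, 100, 100, 6000]  # "Transaction amount" from Table 1.1
clusters, centroids = kmeans_1d(amounts, k=2)
print(clusters)   # [[1000, 500, 100, 100], [5500, 6000]]
print(centroids)  # [425.0, 5750.0]
```

The two groups come purely from the feature values; whether the high-value group corresponds to fraud is something the algorithm never sees.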

Semi-supervised learning is also present in the ML literature. This type of algorithm can learn from partially labeled data (some observations contain a label and others do not).

Finally, reinforcement learning is another approach, adopted by a separate class of ML algorithms. It rewards the system for the good decisions it makes autonomously; in other words, the system learns by experience.
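The idea of learning from rewards can be sketched with a deliberately simplified explore-then-exploit loop on a two-action problem. This is an illustration under strong assumptions, not a full reinforcement learning algorithm: the environment returns a fixed reward per action, and the function name `learn_by_experience` and the action names are hypothetical.

```python
def learn_by_experience(rewards, steps=20):
    """Estimate each action's value from rewards, then favor the best one.

    `rewards` maps an action name to the fixed reward the environment returns.
    """
    actions = list(rewards)
    estimates = {action: 0.0 for action in actions}
    counts = {action: 0 for action in actions}
    history = []

    for step in range(steps):
        if step < len(actions):
            action = actions[step]                    # explore: try each action once
        else:
            action = max(actions, key=estimates.get)  # exploit: best estimate so far

        reward = rewards[action]                      # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        history.append(action)

    return estimates, history


estimates, history = learn_by_experience({"left": 0.2, "right": 1.0})
print(history[-1])  # right
```

No labels are provided here; the system settles on the better action only because of the rewards it collected while acting.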

So far, you have learned about approaches and classes of algorithms at a very broad level. Now it is time to get specific and introduce the term model.
