You're reading from Python Machine Learning By Example Unlock machine learning best practices with real-world use cases

Product type Paperback

Published in Jul 2024

Publisher Packt

ISBN-13 9781835085622

Length 518 pages

Edition 4th Edition

Languages

Python

Tools

PyTorch

Concepts

Computer Vision

Author (1):

Yuxi (Hayden) Liu

Table of Contents (18) Chapters

Preface

1. Getting Started with Machine Learning and Python

2. Building a Movie Recommendation Engine with Naïve Bayes

3. Predicting Online Ad Click-Through with Tree-Based Algorithms

4. Predicting Online Ad Click-Through with Logistic Regression

5. Predicting Stock Prices with Regression Algorithms

6. Predicting Stock Prices with Artificial Neural Networks

7. Mining the 20 Newsgroups Dataset with Text Analysis Techniques

8. Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling

9. Recognizing Faces with Support Vector Machine

10. Machine Learning Best Practices

11. Categorizing Images of Clothing with Convolutional Neural Networks

12. Making Predictions with Sequences Using Recurrent Neural Networks

13. Advancing Language Understanding and Generation with the Transformer Models

14. Building an Image Search Engine Using CLIP: a Multimodal Approach

15. Making Decisions in Complex Environments with Reinforcement Learning

16. Other Books You May Enjoy

17. Index

Learning without guidance – unsupervised learning

In the previous chapter, we applied t-SNE to visualize the newsgroup text data, reduced to two dimensions. t-SNE, like dimensionality reduction in general, is a type of unsupervised learning. Instead of being guided by predefined labels, such as a class or membership (classification) or a continuous value (regression), unsupervised learning identifies inherent structures or commonalities in the input data. Since there is no such guidance, there is no single right or wrong result. Unsupervised learning is free to discover hidden information in the input data.

An easy way to understand unsupervised learning is to think of going through many practice questions for an exam. In supervised learning, you are given the answers to those practice questions. You work out the relationship between the questions and the answers and learn how to map one to the other. Hopefully, you will then do well in the actual exam by giving the correct answers. In unsupervised learning, however, you are not provided with the answers to those practice questions. What you might do in this instance includes the following:

Grouping similar practice questions so that you can later study related questions together at one time
Finding questions that are highly repetitive so that you don’t have to waste time working out the answer for each one individually
Spotting rare questions so that you can be better prepared for them
Extracting the key chunk of each question by removing boilerplate text so you can cut to the point

You will notice that the outcomes of all these tasks are open-ended. They are valid as long as they describe the commonality and structure underlying the data.

Practice questions are the features in machine learning, which are also often called attributes or predictive variables (each individual question is one observation, or sample). Answers to questions are the labels in machine learning, which are also called targets or target variables. Practice questions with answers provided are called labeled data, while practice questions without answers are called unlabeled data. Unsupervised learning works with unlabeled data and acts on that information without guidance.
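To make the terminology concrete, here is a minimal sketch in plain Python. The numbers and the "easy"/"hard" answers are made up for illustration and do not come from the book's datasets.

```python
# Hypothetical toy data: each row is one observation (one practice question),
# described by two numeric features.
features = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.7, 3.1],
    [6.3, 2.5],
]

# Hypothetical labels (the "answers"), one per observation.
labels = ["easy", "easy", "hard", "hard"]

# Labeled data pairs every observation with its label: the input to supervised learning.
labeled_data = list(zip(features, labels))

# Unlabeled data is the same features with no labels: the input to unsupervised learning.
unlabeled_data = features

print(labeled_data[0])
print(unlabeled_data[0])
```

A supervised algorithm would receive both `features` and `labels`, while an unsupervised algorithm would only ever see `features`.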

Unsupervised learning can include the following types:

Clustering: This means grouping data based on commonality, which is often used for exploratory data analysis. Grouping similar practice questions, as mentioned earlier, is an example of clustering. Clustering techniques are widely used in customer segmentation or for grouping similar online behaviors for a marketing campaign. We will learn more about the popular algorithm k-means clustering in this chapter.
Association: This explores the co-occurrence of particular values of two or more features. Outlier detection (also called anomaly detection) is a typical case, where rare observations are identified. Spotting rare questions in the preceding example can be achieved using outlier detection techniques.
Projection: This maps the original feature space to a lower-dimensional space that retains or extracts a set of principal variables. Extracting the key chunk of practice questions is an example of projection or, more specifically, of dimensionality reduction. The t-SNE we learned about previously is a good example.
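The three types listed above can be sketched with scikit-learn on synthetic data. This is an illustration under assumptions, not the book's own code: the data is randomly generated, `IsolationForest` is just one possible outlier detector, and `PCA` stands in for projection because it is quicker to run than t-SNE. Only k-means is named in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical unlabeled data: two well-separated groups in 5 dimensions,
# plus a single point placed far away from both.
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 5))
far_point = np.full((1, 5), 20.0)
X_groups = np.vstack([group_a, group_b])
X_all = np.vstack([X_groups, far_point])

# Clustering: group the samples by similarity, without any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X_groups)

# Outlier detection: fit_predict returns -1 for outliers and 1 for inliers.
detector = IsolationForest(random_state=42)
flags = detector.fit_predict(X_all)

# Projection: map the 5-dimensional samples down to 2 dimensions.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_groups)

print("Cluster sizes:", np.bincount(cluster_ids))
print("Flag for the far-away point:", flags[-1])
print("Shape after projection:", X_2d.shape)
```

None of the three calls receives a label array; each one works from the feature matrix alone.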

Unsupervised learning is extensively employed in NLP, mainly because labeled text data is difficult to obtain. Unlike numerical data (such as house prices, stock data, and online click streams), text often has to be labeled by hand, which can be subjective and tedious. Unsupervised learning algorithms, which do not require labels, are therefore effective for mining text data.
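As a small illustration of mining text without labels, the sketch below clusters four made-up sentences. The corpus is hypothetical (it is not the newsgroups data), and the choice of TF-IDF features with k-means is an assumption made for this example rather than the chapter's exact pipeline.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: no labels are provided for any document.
docs = [
    "the rocket launch reached orbit",
    "nasa plans another rocket launch to orbit",
    "the hockey team won the playoff game",
    "a late goal decided the hockey game",
]

# Turn raw text into numeric TF-IDF features, dropping common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Group the documents into two clusters based only on their word usage.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

for doc, cluster_id in zip(docs, cluster_ids):
    print(cluster_id, doc)
```

The cluster numbers themselves are arbitrary; what matters is which documents end up sharing a number.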

In Chapter 7, Mining the 20 Newsgroups Dataset with Text Analysis Techniques, you used t-SNE to reduce the dimensionality of text data. Now, let’s explore text mining with clustering algorithms and topic modeling techniques. We will start with clustering the newsgroups data.
