Python Machine Learning By Example

Unlock machine learning best practices with real-world use cases

Product type Paperback
Published in Jul 2024
Publisher Packt
ISBN-13 9781835085622
Length 518 pages
Edition 4th Edition
Author: Yuxi (Hayden) Liu

Table of Contents

Preface
1. Getting Started with Machine Learning and Python
2. Building a Movie Recommendation Engine with Naïve Bayes
3. Predicting Online Ad Click-Through with Tree-Based Algorithms
4. Predicting Online Ad Click-Through with Logistic Regression
5. Predicting Stock Prices with Regression Algorithms
6. Predicting Stock Prices with Artificial Neural Networks
7. Mining the 20 Newsgroups Dataset with Text Analysis Techniques
8. Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
9. Recognizing Faces with Support Vector Machine
10. Machine Learning Best Practices
11. Categorizing Images of Clothing with Convolutional Neural Networks
12. Making Predictions with Sequences Using Recurrent Neural Networks
13. Advancing Language Understanding and Generation with the Transformer Models
14. Building an Image Search Engine Using CLIP: a Multimodal Approach
15. Making Decisions in Complex Environments with Reinforcement Learning
16. Other Books You May Enjoy
17. Index

Learning without guidance – unsupervised learning

In the previous chapter, we applied t-SNE to visualize the newsgroup text data, reduced to two dimensions. t-SNE, or dimensionality reduction in general, is a type of unsupervised learning. Instead of being guided by predefined labels, such as a class or membership (classification) or a continuous value (regression), unsupervised learning identifies inherent structures or commonalities in the input data. Since there is no such guidance, there is no single right or wrong result. Unsupervised learning is free to discover information hidden in the input data.

An easy way to understand unsupervised learning is to think of going through many practice questions for an exam. In supervised learning, you are given the answers to those practice questions. You figure out the relationship between the questions and the answers and learn how to map one to the other. Hopefully, you will then do well in the actual exam by giving the correct answers. In unsupervised learning, however, you are not provided with the answers to those practice questions. In this situation, you might do the following:

  • Grouping similar practice questions so that you can later study related questions together at one time
  • Finding questions that are highly repetitive so that you don’t have to waste time working out the answer for each one individually
  • Spotting rare questions so that you can be better prepared for them
  • Extracting the key chunk of each question by removing boilerplate text so you can cut to the point

You will notice that the outcomes of all these tasks are open-ended. They are valid as long as they describe the commonality and the structure underlying the data.
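The four tasks in the exam analogy can be sketched on a handful of strings. This is only a toy illustration: the questions, the `Question: ` prefix, the word-overlap (Jaccard) similarity, and the 0.4 threshold are all assumptions made up for the example, not something taken from the book.

```python
from collections import Counter

# Hypothetical practice questions; note that no answers (labels) are attached.
questions = [
    "Question: what is the derivative of x^2?",
    "Question: what is the derivative of x^3?",
    "Question: what is the derivative of x^2?",
    "Question: name the capital of France.",
    "Question: what is the integral of 2x?",
]
BOILERPLATE = "Question: "  # assumed shared prefix


def strip_boilerplate(question):
    """Extract the key chunk by dropping the shared prefix."""
    if question.startswith(BOILERPLATE):
        return question[len(BOILERPLATE):]
    return question


def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


core = [strip_boilerplate(q) for q in questions]

# Highly repetitive questions: identical text seen more than once.
repeated = [text for text, count in Counter(core).items() if count > 1]

# Greedy grouping: join the first group whose first member is similar enough.
THRESHOLD = 0.4  # arbitrary cut-off chosen for this toy data
groups = []  # each group is a list of indices into `core`
for i, text in enumerate(core):
    for group in groups:
        if jaccard(text, core[group[0]]) >= THRESHOLD:
            group.append(i)
            break
    else:
        groups.append([i])

# Rare questions: the ones that ended up in a group of their own.
rare = [core[g[0]] for g in groups if len(g) == 1]

print(repeated)
print(groups)
print(rare)
```

A different threshold or similarity measure would give different groups, which is the open-endedness at work: none of these outputs is checked against an answer key.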

In machine learning terms, the practice questions play the role of features, which are also called attributes or predictive variables. The answers to the questions are the labels, which are also called targets or target variables. Practice questions with answers provided are called labeled data, while practice questions without answers are called unlabeled data. Unsupervised learning works with unlabeled data and acts on that information without guidance.
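The difference between labeled and unlabeled data can be shown with plain Python lists. The numbers, the label names, and the `is_labeled` helper below are hypothetical; the `X` and `y` naming simply follows a common convention.

```python
# Labeled data: a feature matrix X plus one target per row in y.
X_labeled = [
    [5.1, 3.5],  # features of sample 0
    [4.9, 3.0],  # features of sample 1
    [6.7, 3.1],  # features of sample 2
]
y_labeled = ["A", "A", "B"]  # labels (targets), one per sample

# Unlabeled data: the same kind of features, but no targets at all.
X_unlabeled = [
    [5.0, 3.4],
    [6.5, 3.0],
]


def is_labeled(X, y=None):
    """Treat a dataset as labeled only if every sample has a target."""
    return y is not None and len(y) == len(X)


print(is_labeled(X_labeled, y_labeled))
print(is_labeled(X_unlabeled))
```

A supervised algorithm would need both `X_labeled` and `y_labeled`; an unsupervised one only ever receives something shaped like `X_unlabeled`.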

Unsupervised learning can include the following types:

  • Clustering: This means grouping data based on commonality, which is often used for exploratory data analysis. Grouping similar practice questions, as mentioned earlier, is an example of clustering. Clustering techniques are widely used in customer segmentation or for grouping similar online behaviors for a marketing campaign. We will learn more about the popular k-means clustering algorithm in this chapter.
  • Association: This explores the co-occurrence of particular values of two or more features. Outlier detection (also called anomaly detection) is a typical case, where rare observations are identified. Spotting rare questions in the preceding example can be achieved using outlier detection techniques.
  • Projection: This maps the original feature space to a lower-dimensional space, retaining or extracting a set of principal variables. Extracting the key chunk of practice questions is an example of projection or, more specifically, of dimensionality reduction. The t-SNE we learned about previously is a good example.
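To make the first two types concrete, here is a minimal pure-Python sketch of k-means clustering followed by a simple distance-based outlier check. It is not the implementation used later in the chapter: the 2-D points, the hand-picked starting centroids, and the "2.5 times the median distance" rule are all assumptions chosen for illustration.

```python
import math
from statistics import median

# Hypothetical 2-D data: two tight groups plus one point far from both.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 0.5)]


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Starting centroids are picked by hand here to keep the run deterministic.
labels, centroids = kmeans(points, [points[0], points[3]])

# Outlier check: flag points unusually far from their own centroid.
distances = [math.dist(p, centroids[label]) for p, label in zip(points, labels)]
cutoff = 2.5 * median(distances)  # arbitrary rule of thumb for this toy data
outliers = [p for p, d in zip(points, distances) if d > cutoff]

print(labels)
print(outliers)
```

Note that k-means still assigns the far-away point to a cluster, and it pulls that cluster's centroid toward itself; the separate distance check is what singles it out as a rare observation.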

Unsupervised learning is extensively employed in the area of NLP, mainly because of the difficulty of obtaining labeled text data. Unlike labeling numerical data (such as house prices, stock data, and online click streams), labeling text can be subjective, manual, and tedious. Unsupervised learning algorithms, which do not require labels, are therefore effective when it comes to mining text data.
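As a small illustration of mining text without labels, the sketch below turns raw sentences into word-count vectors and compares them with cosine similarity. The three sentences are invented for the example, and whitespace tokenization is a deliberate simplification rather than the preprocessing used in the book.

```python
import math
from collections import Counter

# Hypothetical unlabeled documents: nobody has tagged them with a topic.
docs = [
    "the rocket launch was delayed by weather",
    "nasa delayed the rocket launch again",
    "the team won the hockey game last night",
]


def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = [bag_of_words(d) for d in docs]
sim_01 = cosine(vectors[0], vectors[1])  # the two space-related sentences
sim_02 = cosine(vectors[0], vectors[2])  # space versus hockey

print(round(sim_01, 3), round(sim_02, 3))
```

The two space-related sentences score higher than the space and hockey pair, even though no topic label was ever supplied. Similarities like these are the raw material a clustering algorithm works with.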

In Chapter 7, Mining the 20 Newsgroups Dataset with Text Analysis Techniques, you used t-SNE to reduce the dimensionality of text data. Now, let’s explore text mining with clustering algorithms and topic modeling techniques. We will start with clustering the newsgroups data.
