Packt+ | Advance your knowledge in tech

You're reading from Applied Unsupervised Learning with Python Discover hidden patterns and relationships in unstructured data with Python

Product type Paperback

Published in May 2019

Publisher

ISBN-13 9781789952292

Length 482 pages

Edition 1st Edition

Languages

Python

Tools

Scikit-learn

Concepts

Machine Learning

Authors (3):

Benjamin Johnston

Christopher Kruger

Aaron Jones

View More author details

Table of Contents (12) Chapters

Applied Unsupervised Learning with Python

Preface

1. Introduction to Clustering

2. Hierarchical Clustering FREE CHAPTER

3. Neighborhood Approaches and DBSCAN

4. Dimension Reduction and PCA

5. Autoencoders

6. t-Distributed Stochastic Neighbor Embedding (t-SNE)

7. Topic Modeling

8. Market Basket Analysis

9. Hotspot Analysis

Appendix

Unsupervised Learning versus Supervised Learning

Unsupervised learning is one of the most exciting areas of development in machine learning today. If you have explored machine learning bookwork before, you are probably familiar with the common breakout of problems in either supervised or unsupervised learning. Supervised learning encompasses the problem set of having a labeled dataset that can be used to either classify (for example, predicting smokers and non-smokers if you're looking at a lung health dataset) or fit a regression line on (for example, predicting the sale price of a home based on how many bedrooms it has). This model most closely mirrors an intuitive human approach to learning.

If you wanted to learn how to not burn your food with a basic understanding of cooking, you could build a dataset by putting your food on the burner and seeing how long it takes (input) for your food to burn (output). Eventually, as you continue to burn your food, you will build a mental model of when burning will occur and avoid it in the future. Development in supervised learning was once fast-paced and valuable, but it has since simmered down in recent years – many of the obstacles to knowing your data have already been tackled:

Figure 1.1: Differences between unsupervised and supervised learning

Conversely, unsupervised learning encompasses the problem set of having a tremendous amount of data that is unlabeled. Labeled data, in this case, would be data that has a supplied "target" outcome that you are trying to find the correlation to with supplied data (you know that you are looking for whether your food was burned in the preceding example). Unlabeled data is when you do not know what the "target" outcome is, and you only have supplied input data.

Building upon the previous example, imagine you were just dropped on planet Earth with zero knowledge of how cooking works. You are given 100 days, a stove, and a fridge full of food without any instructions on what to do. Your initial exploration of a kitchen could go in infinite directions – on day 10, you may finally learn how to open the fridge; on day 30, you may learn that food can go on the stove; and after many more days, you may unwittingly make an edible meal. As you can see, trying to find meaning in a kitchen devoid of adequate informational structure leads to very noisy data that is completely irrelevant to actually preparing a meal.

Unsupervised learning can be an answer to this problem. By looking back at your 100 days of data, clustering can be used to find patterns of similar days where a meal was produced, and you can easily review what you did on those days. However, unsupervised learning isn't a magical answer –simply finding clusters can be just as likely to help you to find pockets of similar yet ultimately useless data.

This challenge is what makes unsupervised learning so exciting. How can we find smarter techniques to speed up the process of finding clusters of information that are beneficial to our end goals?

The rest of the chapter is locked

You're reading from Applied Unsupervised Learning with Python Discover hidden patterns and relationships in unstructured data with Python

Table of Contents (12) Chapters

Unsupervised Learning versus Supervised Learning

Authors (3)

Other recommended products

Personalised recommendations for you

You're reading from Applied Unsupervised Learning with Python Discover hidden patterns and relationships in unstructured data with Python

Table of Contents (12) Chapters

Unsupervised Learning versus Supervised Learning

Unlock this book and the full library FREE for 7 days

Authors (3)

Other recommended products

Personalised recommendations for you