Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
The Machine Learning Workshop
The Machine Learning Workshop

The Machine Learning Workshop: Get ready to develop your own high-performance machine learning algorithms with scikit-learn , Second Edition

eBook
€20.99 €23.99
Paperback
€29.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Table of content icon View table of contents Preview book icon Preview Book

The Machine Learning Workshop

2. Unsupervised Learning – Real-Life Applications

Overview

This chapter explains the concept of clustering in machine learning. It explains three of the most common clustering algorithms, with a hands-on approximation to solve a real-life data problem. By the end of this chapter, you should have a firm understanding of how to create clusters out of a dataset using the k-means, mean-shift, and DBSCAN algorithms, as well as the ability to measure the accuracy of those clusters.

Introduction

In the previous chapter, we learned how to represent data in a tabular format, created features and target matrices, pre-processed data, and learned how to choose the algorithm that best suits the problem at hand. We also learned how the scikit-learn API works and why it is easy to use, as well as the difference between supervised and unsupervised learning.

This chapter focuses on the most important task in the field of unsupervised learning: clustering. Consider a situation in which you are a store owner wanting to make a targeted social media campaign to promote selected products to certain customers. Using clustering algorithms, you would be able to create subgroups of your customers, allowing you to profile those subgroups and target them accordingly. The main objective of this chapter is to solve a case study, where you will implement three different unsupervised learning solutions. These different applications serve to demonstrate the uniformity of the scikit-learn API, as well as to explain the steps taken to solve machine learning problems. By the end of this chapter, you will be able to understand the use of unsupervised learning to comprehend data in order to make informed decisions.

Clustering

Clustering is a type of unsupervised learning technique where the objective is to arrive at conclusions based on the patterns found within unlabeled input data. This technique is mainly used to segregate large data into subgroups in order to make informed decisions.

For instance, from a large list of restaurants in a city, it would be useful to segregate the data into subgroups (clusters) based on the type of food, quantity of clients, and style of experience, in order to be able to offer each cluster a service that's been configured to its specific needs.

Clustering algorithms divide the data points into n number of clusters so that the data points in the same cluster have similar features, whereas they differ significantly from the data points in other clusters.

Clustering Types

Clustering algorithms can classify data points using a methodology that is either hard or soft. The former designates data points completely to a cluster, whereas the latter method calculates the probability of each data point belonging to each cluster. For example, for a dataset containing customer's past orders that are divided into eight subgroups (clusters), hard clustering occurs when each customer is placed inside one of the eight clusters. On the other hand, soft clustering assigns each customer a probability of belonging to each of the eight clusters.

Considering that clusters are created based on the similarity between data points, clustering algorithms can be further divided into several groups, depending on the set of rules used to measure similarity. Four of the most commonly known sets of rules are explained as follows:

  • Connectivity-based models: This model's approach to similarity is based on proximity in a data space. The creation of clusters can be done by assigning all data points to a single cluster and then partitioning the data into smaller clusters as the distance between data points increases. Likewise, the algorithm can also start by assigning each data point an individual cluster, and then aggregating data points that are close by. An example of a connectivity-based model is hierarchical clustering.
  • Density-based models: As the name suggests, these models define clusters by their density in the data space. This means that areas with a high density of data points will become clusters, which are typically separated from one another by low-density areas. An example of this is the DBSCAN algorithm, which will be covered later in this chapter.
  • Distribution-based models: Models that fall into this category are based on the probability that all the data points from a cluster follow the same distribution, such as a Gaussian distribution. An example of such a model is the Gaussian Mixture algorithm, which assumes that all data points come from a mixture of a finite number of Gaussian distributions.
  • Centroid-based models: These models are based on algorithms that define a centroid for each cluster, which is updated constantly by an iterative process. The data points are assigned to the cluster where their proximity to the centroid is minimized. An example of such a model is the k-means algorithm, which will be discussed later in this chapter.

In conclusion, data points are assigned to clusters based on their similarity to each other and their difference from data points in other clusters. This classification into clusters can be either absolute or variably distributed by determining the probability of each data point belonging to each cluster.

Moreover, there is no fixed set of rules to determine similarity between data points, which is why different clustering algorithms use different rules. Some of the most commonly known sets of rules are connectivity-based, density-based, distribution-based, and centroid-based.

Applications of Clustering

As with all machine learning algorithms, clustering has many applications in different fields, some of which are as follows:

  • Search engine results: Clustering can be used to generate search engine results containing keywords that are approximate to the keywords searched by the user and ordered as per the search result with greater similarity. Consider Google as an example; it uses clustering not only for retrieving results but also for suggesting new possible searches.
  • Recommendation programs: It can also be used in recommendation programs that cluster together, for instance, people that fall into a similar profile, and then make recommendations based on the products that each member of the cluster has bought. Consider Amazon, for example, which recommends more items based on your purchase history and the purchases of similar users.
  • Image recognition: This is where clusters are used to group images that are considered to be similar. For instance, Facebook uses clustering to help suggest who is present in a picture.
  • Market segmentation: Clustering can also be used for market segmentation to divide a list of prospects or clients into subgroups in order to provide a customized experience or product. For example, Adobe uses clustering analysis to segment customers in order to target them differently by recognizing those who are more willing to spend money.

The preceding examples demonstrate that clustering algorithms can be used to solve different data problems in different industries, with the primary purpose of understanding large amounts of historical data that, in some cases, can be used to classify new instances.

Exploring a Dataset – Wholesale Customers Dataset

As part of the process of learning the behavior and applications of clustering algorithms, the following sections of this chapter will focus on solving a real-life data problem using the Wholesale Customers dataset, which is available at the UC Irvine Machine Learning Repository.

Note

Datasets in repositories may contain raw, partially pre-processed, or pre-processed data. To use any of these datasets, ensure that you read the specifications of the data that's available to understand the process that needs to be followed to model the data effectively, or whether it is the right dataset for your purpose.

For instance, the current dataset is an extract from a larger dataset, as per the following citation:

The dataset originates from a larger database referred on: Abreu, N. (2011). Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional. Mestrado em Marketing, ISCTE-IUL, Lisbon.

In the following section, we will analyze the contents of the dataset, which will then be used in Activity 2.01, Using Data Visualization to Aid the Pre-processing Process. To download a dataset from the UC Irvine Machine Learning Repository, perform the following steps:

  1. Access the following link: http://archive.ics.uci.edu/ml/datasets/Wholesale+customers.
  2. Below the dataset's title, find the download section and click on Data Folder.
  3. Click on the Wholesale Customers data.csv file to trigger the download and save the file in the same path as that of your current Jupyter Notebook.

    Note

    You can also access it by going to this book's GitHub repository: https://packt.live/3c3hfKp

Understanding the Dataset

Each step will be explained generically and will then be followed by an explanation of its application in the current case study (the Wholesale Customers dataset):

  1. First of all, it is crucial to understand the way in which data is presented by the person who's responsible for gathering and maintaining it.

    Considering that the dataset of the case study was obtained from an online repository, the format in which it is presented must be understood. The Wholesale Customers dataset consists of a snippet of historical data of clients from a wholesale distributor. It contains a total of 440 instances (each row) and eight features (each column).

  2. Next, it is important to determine the purpose of the study, which is dependent on the data that's available. Even though this might seem like a redundant statement, many data problems become problematic because the researcher does not have a clear view of the purpose of the study, and hence the pre-processing methodology, the model, and the performance metrics are chosen incorrectly.

    The purpose of using clustering algorithms on the Wholesale Customers dataset is to understand the behavior of each customer. This will allow you to group customers with similar behaviors into one cluster. The behavior of a customer will be defined by how much they spent on each category of product, as well as the channel and the region where they bought products.

  3. Subsequently explore all the features that are available. This is mainly done for two reasons: first, to rule out features that are considered to be of low relevance based on the purpose of the study or that are considered to be redundant, and second, to understand the way the values are presented to determine some of the pre-processing techniques that may be needed.

    The current case study has eight features, each one of which is considered to be relevant to the purpose of the study. Each feature is explained in the following table:

    Figure 2.1: A table explaining the features in the case study

Figure 2.1: A table explaining the features in the case study

In the preceding table, no features are to be dismissed, and nominal (categorical) features have already been handled by the author of the dataset.

As a summary, the first thing to do when choosing a dataset or being handed one is to understand the characteristics that are visible at first glance, which involves recognizing the information available, then determining the purpose of the project, and finally revising the features to select those that will be part of the study. After this, the data can be visualized so that it can be understood before it's pre-processed.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Discover how to apply the scikit-learn uniform API in all types of machine learning models
  • Understand the difference between supervised and unsupervised learning models
  • Reinforce your understanding of machine learning concepts by working on real-world examples

Description

Machine learning algorithms are an integral part of almost all modern applications. To make the learning process faster and more accurate, you need a tool flexible and powerful enough to help you build machine learning algorithms quickly and easily. With The Machine Learning Workshop, you'll master the scikit-learn library and become proficient in developing clever machine learning algorithms. The Machine Learning Workshop begins by demonstrating how unsupervised and supervised learning algorithms work by analyzing a real-world dataset of wholesale customers. Once you've got to grips with the basics, you'll develop an artificial neural network using scikit-learn and then improve its performance by fine-tuning hyperparameters. Towards the end of the workshop, you'll study the dataset of a bank's marketing activities and build machine learning models that can list clients who are likely to subscribe to a term deposit. You'll also learn how to compare these models and select the optimal one. By the end of The Machine Learning Workshop, you'll not only have learned the difference between supervised and unsupervised models and their applications in the real world, but you'll also have developed the skills required to get started with programming your very own machine learning algorithms.

Who is this book for?

The Machine Learning Workshop is perfect for machine learning beginners. You will need Python programming experience, though no prior knowledge of scikit-learn and machine learning is necessary.

What you will learn

  • Understand how to select an algorithm that best fits your dataset and desired outcome
  • Explore popular real-world algorithms such as K-means, Mean-Shift, and DBSCAN
  • Discover different approaches to solve machine learning classification problems
  • Develop neural network structures using the scikit-learn package
  • Use the NN algorithm to create models for predicting future outcomes
  • Perform error analysis to improve your model s performance
  • Write unit tests to validate your code with JUnit

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jul 22, 2020
Length: 286 pages
Edition : 2nd
Language : English
ISBN-13 : 9781838985462
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Billing Address

Product Details

Publication date : Jul 22, 2020
Length: 286 pages
Edition : 2nd
Language : English
ISBN-13 : 9781838985462
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 92.97
The Machine Learning Workshop
€29.99
The Data Wrangling Workshop
€29.99
The Deep Learning Workshop
€32.99
Total 92.97 Stars icon

Table of Contents

6 Chapters
1. Introduction to Scikit-Learn Chevron down icon Chevron up icon
2. Unsupervised Learning – Real-Life Applications Chevron down icon Chevron up icon
3. Supervised Learning – Key Steps Chevron down icon Chevron up icon
4. Supervised Learning Algorithms: Predicting Annual Income Chevron down icon Chevron up icon
5. Supervised Learning – Key Steps Chevron down icon Chevron up icon
6. Building Your Own Program Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(6 Ratings)
5 star 50%
4 star 33.3%
3 star 16.7%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Allen G. Feb 26, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is an extremely well written title, and is easy to follow and learn effectively from the content. I strongly recommend it to anyone who's looking to pursue machine learning.
Amazon Verified review Amazon
Robert John Jan 08, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
There are two approaches to learning ML: a theoretical approach, and a practical one. Theoretical approaches are useful if you have a background in a numerical field like mathematics or statistics. Practical approaches are useful for getting your hands dirty with the application without first trying to understand why things work. This book takes the latter approach and does an excellent job of showing you how to get things done using the very popular scikit-learn library and the Python programming language. The workshop format gives you lots of practice. You can't do ML without data, and data is rarely clean. There is an entire chapter dedicated to data preprocessing. The book teaches you how to work with various algorithms, both supervised and unsupervised. It also covers evaluation metrics. If you would like to get started with putting Machine Learning, this is a good book that I would recommend!
Amazon Verified review Amazon
Bert Gollnick Oct 24, 2020
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I enjoyed reading the book. It helps you to start in the field of Machine Learning with Python. You learn the main concepts based on the main Python library for this purpose: scikit-learn. You learn algorithms from different fields like unsupervised learning (e.g. different clustering techniques) and supervised learning (e.g. SVM, Naive Bayes, Decision Trees, Artificial Neural Networks). The algorithms are well explained.Overall it is a good book for Python developers who want to dive into the field of Machine Learning.
Amazon Verified review Amazon
Hakuna Matata Nov 15, 2020
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
This is a well-written book with good breadth and depth - that covers all the key concepts and modules of ML. I love all the new examples - especially real world examples. I also like a quick introduction to NN - giving a taste of what's beyond the basic ML concepts. The best part is the demo environment where I got to play around with all the code snippets by each of the concepts.
Amazon Verified review Amazon
Amazon Customer Mar 18, 2021
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
This book is a good introduction to data analysis and the basics of machine learning. The author does a good job covering the basic algorithms (kmeans, naive bayes, svm, decision trees and even neural networks), but does not go into too much depth regarding the mathematical or statistical underpinnings. There are sample problems with straight forward code applying each of these algorithms. One thing I liked about this book was that the author sticks to well-known machine learning packages such as scikit-learn, pandas, numpy, and matplotlib, and avoids bringing in a whole lot of extraneous minor packages. This has the benefit of not requiring the user to spend a lot of time installing other dependencies, as well as building some familiarity with these packages -- which are generally powerful and popular for good reason.Overall, if you're new to the field and looking for a quick, basic introduction to some of the most well-known machine learning algorithms and techniques, this book would be a good choice.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.