Applied Unsupervised Learning with Python

Discover hidden patterns and relationships in unstructured data with Python

Product type: Paperback
Published in: May 2019
ISBN-13: 9781789952292
Length: 482 pages
Edition: 1st Edition
Authors (3): Benjamin Johnston, Christopher Kruger, Aaron Jones
Table of Contents

Preface
1. Introduction to Clustering
2. Hierarchical Clustering
3. Neighborhood Approaches and DBSCAN
4. Dimension Reduction and PCA
5. Autoencoders
6. t-Distributed Stochastic Neighbor Embedding (t-SNE)
7. Topic Modeling
8. Market Basket Analysis
9. Hotspot Analysis
Appendix

Preface

Note

About

This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software required to complete all of the included activities and exercises.

About the Book

Unsupervised learning is a useful and practical solution in situations where labeled data is not available.

Applied Unsupervised Learning with Python guides you through the best practices for using unsupervised learning techniques in tandem with Python libraries to extract meaningful information from unstructured data. The book begins by explaining how basic clustering works to find similar data points in a dataset. Once you are well versed in the k-means algorithm and how it operates, you'll learn what dimensionality reduction is and where to apply it. As you progress, you'll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter. You will complete the book by challenging yourself with various interesting activities, such as performing market basket analysis and identifying relationships between different products.

By the end of this book, you will have the skills you need to confidently build your own models using Python.

About the Authors

Benjamin Johnston is a senior data scientist for one of the world's leading data-driven medtech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition, to solution research and development, through to final deployment. He is currently completing his PhD in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years' experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.

Aaron Jones is a full-time senior data scientist at one of America's biggest retailers, as well as a statistical consultant. He has built predictive and inferential models, and numerous data products while working in retail, media, and environmental science. Aaron is based in Seattle, Washington, and has a particular interest in causal modeling, clustering algorithms, natural language processing, and Bayesian statistics.

Christopher Kruger has worked as a senior data scientist in the field of advertising. He has designed scalable clustering solutions for clients in a variety of industries. Chris was recently awarded a computer science master's degree from Cornell University, and currently works in the field of computer vision.

Learning Objectives

  • Gain an understanding of the basics and importance of clustering

  • Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch with built-in packages

  • Explore dimensionality reduction and its applications

  • Use scikit-learn (sklearn) to implement and analyze principal component analysis (PCA) on the Iris dataset

  • Employ Keras to build autoencoder models for the CIFAR-10 dataset

  • Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data

Audience

Applied Unsupervised Learning with Python is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.

Approach

Applied Unsupervised Learning with Python takes a hands-on approach to using Python to reveal the hidden patterns in your unstructured data. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.

Hardware Requirements

For the optimal student experience, we recommend the following hardware configuration:

  • Processor: Intel Core i5 or equivalent

  • Memory: 4 GB RAM

  • Storage: 5 GB available space

Software Requirements

We also recommend that you have the following software installed in advance:

  • OS: Windows 7 SP1 64-bit, Windows 8.1 64-bit, or Windows 10 64-bit; Linux (Ubuntu, Debian, Red Hat, or SUSE); or the latest version of OS X

  • Python (3.6.5 or later, preferably 3.7; available through https://www.python.org/downloads/release/python-371/)

  • Anaconda (This is for the basemap module of mpl_toolkits; go to https://www.anaconda.com/distribution/, download the 3.7 version, and follow the instructions to install it.)
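
One way to confirm that an interpreter meets the requirements above is a short check like the following. This is a minimal sketch and not part of the book's code bundle: the helper names python_is_recent_enough and missing_modules are hypothetical, and the list of module names is an assumption based on the libraries named in this preface.

```python
import importlib.util
import sys

# Minimum interpreter version listed in the software requirements.
MIN_PYTHON = (3, 6, 5)

# Import names of libraries mentioned in this preface (an assumed, partial list).
MODULES = ["numpy", "matplotlib", "sklearn", "keras", "mlxtend"]


def python_is_recent_enough(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor, micro) version meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def missing_modules(names=MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_is_recent_enough())
    print("Missing modules:", missing_modules())
```

An empty list from missing_modules() suggests the listed libraries are importable; any names it returns would still need to be installed.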

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "There are no prerequisites for using the math package, and it is included in all standard installations of Python."

A block of code is set as follows:

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
import math
%matplotlib inline

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Next, click Generate file followed by Download now and name the downloaded file metro-jul18-dec18."

Installation and Setup

Each great journey begins with a humble step. Our upcoming adventure in the world of unsupervised learning is no exception. Before we can do awesome things with data, we need to be prepared with the most productive environment. Here, we shall see how to do that.

Install Anaconda on Windows

Anaconda is a Python distribution whose package manager, conda, makes it easy to install and use the libraries needed for this book. To install it on Windows, follow these steps:

  1. Anaconda installation for Windows is very user-friendly. Visit the download page here to get the installation executable: https://www.anaconda.com/distribution/#download-section.

  2. Double-click the installer on your computer.

  3. Follow the prompts on screen to complete the installation of Anaconda.

  4. After installation, you can access Anaconda Navigator, which will be available alongside the rest of your applications as normal.

Install Anaconda on Linux

Anaconda is a Python distribution whose package manager, conda, makes it easy to install and use the libraries needed for this book. To install it on Linux, follow these steps:

  1. Visit the Anaconda download page here to get the installation shell script: https://www.anaconda.com/distribution/#download-section.

  2. To download the shell script directly to your Linux instance, use a command-line retrieval tool such as curl or wget. With curl, for example, you retrieve the file located at the URL you found on the Anaconda download page; the exact URL depends on the installer version you choose.

  3. After downloading the shell script, you can run it with the following command:

    bash Anaconda3-2019.03-Linux-x86_64.sh
  4. Running the preceding command will move you to a very user-friendly installation process. You will be prompted on where you want to install things and how you wish Anaconda to work. In this case, you should just keep all the standard settings.

  5. After Anaconda is installed, you must create environments where you will install packages you wish to use. The great thing about Anaconda environments is that you can build individual environments for specific projects you're working on! To create a new environment, use the following command:

    conda create --name my_packt_env python=3.7
  6. Once the environment is created, you can activate it using the well-named activate command:

    conda activate my_packt_env

    That's it! You are now in your own customized environment, which will allow you to install packages as needed for your projects. To exit your environment, you can simply use the conda deactivate command.
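
To confirm from inside Python that the environment is active, a check like the one below can help. It is a minimal sketch under one assumption: that conda has set the CONDA_DEFAULT_ENV environment variable, which it does when an environment is activated. The function name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none.

    Reads CONDA_DEFAULT_ENV, which conda sets when an environment is activated.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Interpreter location:", sys.prefix)
```

Run after activating the environment, this should print the name given to conda create (my_packt_env in the command above); a result of None suggests no conda environment is active in that shell.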

Install Anaconda on macOS

Anaconda is a Python distribution whose package manager, conda, makes it easy to install and use the libraries needed for this book. To install it on macOS, follow these steps:

  1. Anaconda installation for macOS is very user-friendly. Visit the download page here to get the installer: https://www.anaconda.com/distribution/#download-section.

  2. Make sure macOS is selected and click the Download button for the Python 3 installer.

  3. Follow the prompts on screen to complete the installation of Anaconda.

  4. After installation, you can access Anaconda Navigator, which will be available alongside the rest of your applications as normal.

Install Python on Windows

  1. Find your desired version of Python on the official installation page here: https://www.python.org/downloads/windows/.

  2. Ensure that you install the correct "-bit" version depending on your computer system, either 32-bit or 64-bit. You can find out this information in the System Properties window of your OS.

    After you download the installer, simply double-click the file and follow the user-friendly prompts onscreen.
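
Once the installer has finished, you can confirm which build you ended up with from Python itself. The following is a minimal sketch; the function name interpreter_bits is hypothetical, and it reports the bitness of the running interpreter, which may differ from that of the operating system (for example, a 32-bit Python on 64-bit Windows).

```python
import platform
import struct


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Interpreter:", interpreter_bits(), "bit")
    print("Machine type reported by the OS:", platform.machine())
```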

Install Python on Linux

To install Python on Linux, perform the following:

  1. Open a terminal and check whether Python 3 is already installed by running python3 --version.

  2. To install Python 3 on a Debian-based distribution such as Ubuntu, run the following:

    sudo apt-get update
    sudo apt-get install python3.6
  3. If you encounter problems, there are numerous sources online that can help you troubleshoot the issue.
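
The version check from step 1 can also be scripted, which is handy when troubleshooting. The sketch below is illustrative only; find_python3 and reported_version are hypothetical helper names, and it assumes the interpreter is exposed on the PATH under the name python3.

```python
import shutil
import subprocess
import sys


def find_python3():
    """Return the full path of python3 on the PATH, or None if it is absent."""
    return shutil.which("python3")


def reported_version(executable):
    """Run `<executable> --version` and return the text it prints."""
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture the version text from either stream
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("Current interpreter:", sys.executable)
    path = find_python3()
    if path is None:
        print("python3 was not found on the PATH")
    else:
        print(path, "->", reported_version(path))
```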

Install Python on macOS

To install Python on macOS, perform the following:

  1. Open the terminal by holding CMD + Space, typing terminal in the open search box, and hitting Enter.

  2. Install Xcode through the command line by running xcode-select --install.

  3. The easiest way to install Python 3 is with Homebrew, which is installed through the command line by running ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

  4. Add the Homebrew-installed Python to your PATH environment variable. Open your profile in the command line by running sudo nano ~/.profile and inserting export PATH="/usr/local/opt/python/libexec/bin:$PATH" at the bottom.

  5. The final step is to install Python. In the command line, run brew install python.

  6. Note that if you install Anaconda, the latest version of Python will be installed automatically.

Additional Resources

The code bundle for this book is also hosted on GitHub at: https://github.com/TrainingByPackt/Applied-Unsupervised-Learning-with-Python. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789952292_ColorImages.pdf.
