Preface
Note
About
This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software you'll need to complete all of the included activities and exercises.
About the Book
Unsupervised learning is a useful and practical solution in situations where labeled data is not available.
Applied Unsupervised Learning with Python guides you through the best practices for using unsupervised learning techniques in tandem with Python libraries to extract meaningful information from unstructured data. The book begins by explaining how basic clustering works to find similar data points in a dataset. Once you are well versed in the k-means algorithm and how it operates, you'll learn what dimensionality reduction is and where to apply it. As you progress, you'll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter. You will complete the book by challenging yourself with various interesting activities, such as performing market basket analysis and identifying relationships between different products.
By the end of this book, you will have the skills you need to confidently build your own models using Python.
About the Authors
Benjamin Johnston is a senior data scientist for one of the world's leading data-driven medtech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition, to solution research and development, through to final deployment. He is currently completing his PhD in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years' experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.
Aaron Jones is a full-time senior data scientist at one of America's biggest retailers, as well as a statistical consultant. He has built predictive and inferential models, and numerous data products while working in retail, media, and environmental science. Aaron is based in Seattle, Washington, and has a particular interest in causal modeling, clustering algorithms, natural language processing, and Bayesian statistics.
Christopher Kruger has worked as a senior data scientist in the field of advertising. He has designed scalable clustering solutions for clients in a variety of industries. Chris was recently awarded a computer science master's degree from Cornell University, and currently works in the field of computer vision.
Learning Objectives
Gain an understanding of the basics and importance of clustering
Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch with built-in packages
Explore dimensionality reduction and its applications
Use scikit-learn (sklearn) to implement and analyze principal component analysis (PCA) on the Iris dataset
Employ Keras to build autoencoder models for the CIFAR-10 dataset
Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data
Audience
Applied Unsupervised Learning with Python is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.
Approach
Applied Unsupervised Learning with Python takes a hands-on approach to using Python to reveal the hidden patterns in your unstructured data. It contains multiple activities that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.
Hardware Requirements
For the optimal student experience, we recommend the following hardware configuration:
Processor: Intel Core i5 or equivalent
Memory: 4 GB RAM
Storage: 5 GB available space
Software Requirements
We also recommend that you have the following software installed in advance:
OS: Windows 7 SP1 64-bit, Windows 8.1 64-bit, or Windows 10 64-bit; Linux (Ubuntu, Debian, Red Hat, or SUSE); or the latest version of macOS
Python (3.6.5 or later, preferably 3.7; available through https://www.python.org/downloads/release/python-371/)
Anaconda (This is for the basemap module of mpl_toolkits; go to https://www.anaconda.com/distribution/, download the 3.7 version, and follow the instructions to install it.)
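If you already have a Python interpreter installed, a quick check like the following can tell you whether it meets the minimum version listed above. This is only a minimal sketch: the helper name meets_minimum is hypothetical, and the threshold is simply the 3.6.5 figure from the requirements list.

```python
import sys

# Minimum version taken from the software requirements listed above.
MIN_VERSION = (3, 6, 5)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the first three version components are at least the minimum."""
    return tuple(version_info[:3]) >= minimum


# Report the version of the interpreter running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "older than recommended"
print("Python {}: {}".format(current, status))
```

Run it with the interpreter you plan to use for the exercises; if several Python installations coexist, each one reports its own version.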
Conventions
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "There are no prerequisites for using the math package, and it is included in all standard installations of Python."
A block of code is set as follows:
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
import math
%matplotlib inline
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Next, click Generate file followed by Download now and name the downloaded file metro-jul18-dec18."
Installation and Setup
Each great journey begins with a humble step. Our upcoming adventure in the world of unsupervised learning is no exception. Before we can do awesome things with data, we need to be prepared with the most productive environment. Here, we shall see how to do that.
Install Anaconda on Windows
Anaconda is a Python distribution that includes the conda package manager, which allows you to easily install and use the libraries needed for this book. To install it on Windows, please follow these steps:
Anaconda installation for Windows is very user-friendly. Visit the download page here to get the installation executable: https://www.anaconda.com/distribution/#download-section.
Double-click the installer on your computer.
Follow the prompts on screen to complete the installation of Anaconda.
After installation, you can access Anaconda Navigator, which will be available alongside the rest of your applications as normal.
Install Anaconda on Linux
Anaconda is a Python distribution that includes the conda package manager, which allows you to easily install and use the libraries needed for this book. To install it on Linux, please follow these steps:
Visit the Anaconda download page here to get the installation shell script: https://www.anaconda.com/distribution/#download-section.
To download the shell script directly to your Linux instance, use a command-line retrieval tool such as curl or wget. For example, run curl -O followed by the URL you found on the Anaconda download page.
After downloading the shell script, you can run it with the following command:
bash Anaconda3-2019.03-Linux-x86_64.sh
Running the preceding command starts a guided installation process. You will be prompted for where to install Anaconda and how you want it to work. In this case, you should just keep all the default settings.
After Anaconda is installed, you must create environments where you will install packages you wish to use. The great thing about Anaconda environments is that you can build individual environments for specific projects you're working on! To create a new environment, use the following command:
conda create --name my_packt_env python=3.7
Once the environment is created, you can activate it using the well-named activate command:
conda activate my_packt_env
That's it! You are now in your own customized environment, which will allow you to install packages as needed for your projects. To exit your environment, you can simply use the conda deactivate command.
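To confirm from inside Python that the environment you activated is the one actually running your code, you could use a sketch like the one below. It assumes that conda sets the CONDA_DEFAULT_ENV environment variable on activation; the function name describe_environment is hypothetical and used only for illustration.

```python
import os
import sys


def describe_environment(environ=os.environ, prefix=sys.prefix):
    """Summarize which conda environment, if any, this interpreter runs in."""
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    env_name = environ.get("CONDA_DEFAULT_ENV")
    return {"conda_env": env_name, "interpreter_prefix": prefix}


info = describe_environment()
print(info["conda_env"] or "no conda environment active")
# The prefix shows where the running interpreter is installed.
print(info["interpreter_prefix"])
```

After activating the environment created earlier, the first line printed should be its name (my_packt_env in the create command above), and the prefix should point inside your Anaconda installation.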
Install Anaconda on macOS
Anaconda is a Python distribution that includes the conda package manager, which allows you to easily install and use the libraries needed for this book. To install it on macOS, please follow these steps:
Anaconda installation for macOS is very user-friendly. Visit the download page here to get the installer: https://www.anaconda.com/distribution/#download-section.
Make sure macOS is selected and click the Download button for the Python 3 installer.
Follow the prompts on screen to complete the installation of Anaconda.
After installation, you can access Anaconda Navigator, which will be available alongside the rest of your applications as normal.
Install Python on Windows
Find your desired version of Python on the official installation page here: https://www.python.org/downloads/windows/.
Ensure that you install the version that matches your computer system, either 32-bit or 64-bit. You can find this information in the System Properties window of your OS.
After you download the installer, simply double-click the file and follow the user-friendly prompts onscreen.
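Once Python is installed, you can double-check which build you ended up with. The following is a minimal sketch using only the standard library; interpreter_bits is a hypothetical helper name, and it reports the bitness of the Python interpreter itself, which may differ from that of the operating system.

```python
import platform
import struct


def interpreter_bits():
    """Return the pointer width, in bits, of the running Python interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


print("Machine type reported by the OS: {}".format(platform.machine()))
print("This Python interpreter is {}-bit".format(interpreter_bits()))
```

A 32-bit interpreter on a 64-bit system still works, but it limits how much memory your programs can use, so the 64-bit build is usually the better fit for data work.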
Install Python on Linux
To install Python on Linux, perform the following:
Open a terminal and check whether Python 3 is already installed by running python3 --version.
To install Python 3, run the following:
sudo apt-get update
sudo apt-get install python3.6
If you encounter problems, there are numerous sources online that can help you troubleshoot the issue.
Install Python on macOS
To install Python on macOS, perform the following:
Open the terminal by pressing CMD + Space, typing terminal in the open search box, and hitting Enter.
Install Xcode through the command line by running xcode-select --install.
The easiest way to install Python 3 is with homebrew, which is installed through the command line by running ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Add Homebrew to your PATH environment variable. Open your profile from the command line by running sudo nano ~/.profile, and insert export PATH="/usr/local/opt/python/libexec/bin:$PATH" at the bottom.
The final step is to install Python. In the command line, run brew install python.
Note that if you install Anaconda, the latest version of Python will be installed automatically.
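Whichever installation route you choose, it can be useful to check that the libraries named in this preface are visible to your interpreter before starting the exercises. The sketch below is one way to do that. The list of import names is an assumption based on the libraries mentioned here (it is not a complete list of the book's dependencies), and missing_libraries is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names for libraries mentioned in this preface (an assumed, partial list).
LIBRARIES = ["numpy", "matplotlib", "sklearn", "keras", "mlxtend"]


def missing_libraries(names=LIBRARIES):
    """Return the names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Anything reported as not installed can be added to your active environment with conda install or pip install before you begin the relevant chapter.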
Additional Resources
The code bundle for this book is also hosted on GitHub at: https://github.com/TrainingByPackt/Applied-Unsupervised-Learning-with-Python. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789952292_ColorImages.pdf.