About the Book
Machine learning algorithms are an integral part of almost all modern applications. To make the learning process faster and more accurate, you need a tool flexible and powerful enough to help you build machine learning algorithms quickly and easily. With The Machine Learning Workshop, Second Edition, you'll master the scikit-learn library and become proficient in developing clever machine learning algorithms.
The Machine Learning Workshop, Second Edition, begins by demonstrating how unsupervised and supervised learning algorithms work by analyzing a real-world dataset of wholesale customers. Once you've got to grips with the basics, you'll develop an artificial neural network using scikit-learn and then improve its performance by fine-tuning hyperparameters. Towards the end of the workshop, you'll study the dataset of a bank's marketing activities and build machine learning models that can list clients who are likely to subscribe to a term deposit. You'll also learn how to compare these models and select the optimal one.
By the end of The Machine Learning Workshop, Second Edition, you'll not only have learned the difference between supervised and unsupervised models and their applications in the real world, but you'll also have developed the skills required to get started with programming your very own machine learning algorithms.
Audience
The Machine Learning Workshop, Second Edition, is perfect for machine learning beginners. You will need Python programming experience, though no prior knowledge of scikit-learn or machine learning is necessary.
About the Chapters
Chapter 1, Introduction to Scikit-Learn, introduces the two main topics of the book: machine learning and scikit-learn. It explains the key steps of preprocessing your input data, separating the features from the target, dealing with messy data, and rescaling the values of data.
Chapter 2, Unsupervised Learning – Real-Life Applications, explains the concept of clustering in machine learning by covering the three most common clustering algorithms.
Chapter 3, Supervised Learning – Key Steps, describes the different tasks that can be solved through supervised learning algorithms: classification and regression.
Chapter 4, Supervised Learning Algorithms: Predicting Annual Income, teaches the different concepts and steps for solving a supervised learning data problem.
Chapter 5, Artificial Neural Networks: Predicting Annual Income, shows how to solve a supervised learning classification problem using a neural network and analyze the results by performing error analysis.
Chapter 6, Building Your Own Program, explains all the steps required to develop a comprehensive machine learning solution.
Conventions
Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows:
"Load the titanic dataset using the seaborn library."
Words that you see on the screen (for example, in menus or dialog boxes) appear in the same format.
A block of code is set as follows:
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head(10)
New terms and important words are shown like this:
"Data that is missing information or that contains outliers or noise is considered to be messy data."
Code Presentation
Lines of code that span multiple lines are split using a backslash ( \ ). When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line.
For example:
history = model.fit(X, y, epochs=100, batch_size=5, verbose=1, \
                    validation_split=0.2, shuffle=False)
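As a quick check of this convention, the following minimal sketch compares a call split with a backslash against the same call written on one line. The function fit_summary is a hypothetical stand-in (it is not part of any library) that only echoes its arguments. Note that inside parentheses the backslash is optional, because Python already joins lines within brackets.

```python
def fit_summary(epochs, batch_size, verbose, validation_split, shuffle):
    """Hypothetical stand-in for a fit-style call; it only echoes its arguments."""
    return {
        "epochs": epochs,
        "batch_size": batch_size,
        "verbose": verbose,
        "validation_split": validation_split,
        "shuffle": shuffle,
    }


# Call split across two physical lines with a backslash, as in the book's convention.
split_call = fit_summary(epochs=100, batch_size=5, verbose=1, \
                         validation_split=0.2, shuffle=False)

# The same call written on a single line.
single_line_call = fit_summary(epochs=100, batch_size=5, verbose=1, validation_split=0.2, shuffle=False)

# Both forms are parsed as the same statement, so the results are identical.
print(split_call == single_line_call)
```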
Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:
# Print the sizes of the dataset
print("Number of Examples in the Dataset = ", X.shape[0])
print("Number of Features for each example = ", X.shape[1])
Multi-line comments are enclosed by triple quotes, as shown below:
"""
Define a seed for the random number generator to ensure the
result will be reproducible
"""
seed = 1
np.random.seed(seed)
random.set_seed(seed)
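To see why a fixed seed makes results reproducible, here is a minimal sketch. It swaps in Python's built-in random module for the libraries used in the snippet above, and draw_numbers is a hypothetical helper introduced only for this illustration.

```python
import random


def draw_numbers(seed, count=3):
    """
    Seed Python's built-in generator and return `count`
    pseudo-random floats (hypothetical helper).
    """
    random.seed(seed)
    return [random.random() for _ in range(count)]


# Two runs with the same seed produce the same sequence.
first_run = draw_numbers(seed=1)
second_run = draw_numbers(seed=1)

# A different seed produces a different sequence.
other_seed_run = draw_numbers(seed=2)

print("Same seed, same numbers:", first_run == second_run)
print("Different seed, same numbers:", first_run == other_seed_run)
```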
Setting up Your Environment
Before we explore the book in detail, we need to set up specific software and tools. In the following section, we shall see how to do that.
Installing Python on Windows and macOS
Follow these steps to install Python 3.7 on Windows and macOS:
- Visit https://www.python.org/downloads/release/python-376/ to download Python 3.7.
- At the bottom of the page, locate the table under the heading Files:
  For Windows, click on Windows x86-64 executable installer for 64-bit or Windows x86 executable installer for 32-bit.
  For macOS, click on macOS 64-bit/32-bit installer for macOS 10.6 and later, or macOS 64-bit installer for macOS 10.9 and later.
- Run the installer that you have downloaded.
Installing Python on Linux
- On a Debian-based distribution such as Ubuntu, open your Terminal and type the following command:
sudo apt-get install python3.7
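Once Python is installed, you may want to confirm that the interpreter you are running meets the version used in this book. The following is a minimal sketch using only the standard library. REQUIRED_VERSION and meets_requirement are hypothetical names, and treating 3.7 as a minimum (rather than an exact match) is an assumption.

```python
import sys

# Assumption: Python 3.7 is treated as the minimum supported version.
REQUIRED_VERSION = (3, 7)


def meets_requirement(version_info=None, required=REQUIRED_VERSION):
    """Return True if the (major, minor) version is at least `required`."""
    if version_info is None:
        # Default to the interpreter that is running this script.
        version_info = sys.version_info
    return tuple(version_info[:2]) >= required


print("Running Python", sys.version.split()[0])
print("Meets the 3.7 requirement:", meets_requirement())
```

If the second line prints False, the python command is probably pointing at an older installation; try running the script with python3 instead.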
Installing pip
pip is included by default with the installation of Python 3.7. However, it may be the case that it does not get installed. To check whether it was installed, execute the following command in your Terminal or Command Prompt:
pip --version
You might need to use the pip3 command, due to previous versions of pip on your computer that are already using the pip command.
If the pip command (or pip3) is not recognized by your machine, follow these steps to install it:
- To install pip, visit https://pip.pypa.io/en/stable/installing/ and download the get-pip.py file.
- Then, on the Terminal or Command Prompt, use the following command to install it:
python get-pip.py
You might need to use the python3 get-pip.py command, due to previous versions of Python on your machine that are already using the python command.
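Because pip and pip3 (or python and python3) can point to different installations, it can help to ask a specific interpreter for its own pip. The sketch below does this by running pip as a module of the current interpreter. The helper name pip_version is hypothetical, and the script assumes Python 3.7 or later.

```python
import subprocess
import sys


def pip_version():
    """Return pip's version line for the current interpreter, or None if pip is unavailable."""
    try:
        # Equivalent to typing "python -m pip --version" for this interpreter.
        result = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The interpreter path could not be executed.
        return None
    if result.returncode != 0:
        # pip is not installed for this interpreter.
        return None
    return result.stdout.strip()


version_line = pip_version()
print(version_line if version_line else "pip is not available for this interpreter")
```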
Installing Libraries
pip comes pre-installed with Anaconda. Once Anaconda is installed on your machine, all the required libraries can be installed using pip, for example, pip install numpy. Alternatively, you can install all the required libraries using pip install -r requirements.txt. You can find the requirements.txt file at https://packt.live/2Ar1i3v.
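After installing, you can check which libraries Python is able to find. The following is a minimal sketch: the LIBRARIES list is illustrative only (the authoritative list is the requirements.txt file), and missing_modules is a hypothetical helper. Note that scikit-learn is imported under the name sklearn.

```python
from importlib.util import find_spec

# Assumption: an illustrative subset; see requirements.txt for the full list.
LIBRARIES = ["numpy", "seaborn", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(LIBRARIES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are installed.")
```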
The exercises and activities will be executed in Jupyter Notebooks. Jupyter is a Python library and can be installed in the same way as the other Python libraries, that is, with pip install jupyter, but fortunately, it comes pre-installed with Anaconda. To open a notebook, simply run the command jupyter notebook in the Terminal or Command Prompt.
Opening a Jupyter Notebook
- Open a Terminal/Command Prompt.
- In the Terminal/Command Prompt, go to the directory location where you have cloned the book's repository.
- Open a Jupyter notebook by typing in the following command:
jupyter notebook
- By executing the previous command, you will be able to use Jupyter notebooks through the default browser of your machine.
Accessing the Code Files
You can find the complete code files of this book at https://packt.live/2wkiC8d. You can also run many activities and exercises directly in your web browser by using the interactive lab environment at https://packt.live/3cYbopv.
We've tried to support interactive versions of all activities and exercises, but we recommend a local installation as well for instances where this support isn't available.
The high-quality color images used in this book can be found at https://packt.live/3exaFfJ.
If you have any issues or questions about installation, please email us at workshops@packt.com.