Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases now! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Python Machine Learning

You're reading from   Python Machine Learning Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2

Arrow left icon
Product type Paperback
Published in Dec 2019
Publisher Packt
ISBN-13 9781789955750
Length 772 pages
Edition 3rd Edition
Languages
Tools
Arrow right icon
Authors (2):
Arrow left icon
Vahid Mirjalili Vahid Mirjalili
Author Profile Icon Vahid Mirjalili
Vahid Mirjalili
Sebastian Raschka Sebastian Raschka
Author Profile Icon Sebastian Raschka
Sebastian Raschka
Arrow right icon
View More author details
Toc

Table of Contents (21) Chapters Close

Preface 1. Giving Computers the Ability to Learn from Data 2. Training Simple Machine Learning Algorithms for Classification FREE CHAPTER 3. A Tour of Machine Learning Classifiers Using scikit-learn 4. Building Good Training Datasets – Data Preprocessing 5. Compressing Data via Dimensionality Reduction 6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning 7. Combining Different Models for Ensemble Learning 8. Applying Machine Learning to Sentiment Analysis 9. Embedding a Machine Learning Model into a Web Application 10. Predicting Continuous Target Variables with Regression Analysis 11. Working with Unlabeled Data – Clustering Analysis 12. Implementing a Multilayer Artificial Neural Network from Scratch 13. Parallelizing Neural Network Training with TensorFlow 14. Going Deeper – The Mechanics of TensorFlow 15. Classifying Images with Deep Convolutional Neural Networks 16. Modeling Sequential Data Using Recurrent Neural Networks 17. Generative Adversarial Networks for Synthesizing New Data 18. Reinforcement Learning for Decision Making in Complex Environments 19. Other Books You May Enjoy 20. Index

Artificial neurons – a brief glimpse into the early history of machine learning

Before we discuss the perceptron and related algorithms in more detail, let's take a brief tour of the beginnings of machine learning. Trying to understand how the biological brain works, in order to design artificial intelligence (AI), Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell, the so-called McCulloch-Pitts (MCP) neuron, in 1943 (A Logical Calculus of the Ideas Immanent in Nervous Activity, W. S. McCulloch and W. Pitts, Bulletin of Mathematical Biophysics, 5(4): 115-133, 1943). Biological neurons are interconnected nerve cells in the brain that are involved in the processing and transmitting of chemical and electrical signals, which is illustrated in the following figure:

McCulloch and Pitts described such a nerve cell as a simple logic gate with binary outputs; multiple signals arrive at the dendrites, they are then integrated into the cell body, and, if the accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on by the axon.

Only a few years later, Frank Rosenblatt published the first concept of the perceptron learning rule based on the MCP neuron model (The Perceptron: A Perceiving and Recognizing Automaton, F. Rosenblatt, Cornell Aeronautical Laboratory, 1957). With his perceptron rule, Rosenblatt proposed an algorithm that would automatically learn the optimal weight coefficients that would then be multiplied with the input features in order to make the decision of whether a neuron fires (transmits a signal) or not. In the context of supervised learning and classification, such an algorithm could then be used to predict whether a new data point belongs to one class or the other.

The formal definition of an artificial neuron

More formally, we can put the idea behind artificial neurons into the context of a binary classification task where we refer to our two classes as 1 (positive class) and –1 (negative class) for simplicity. We can then define a decision function () that takes a linear combination of certain input values, x, and a corresponding weight vector, w, where z is the so-called net input :

Now, if the net input of a particular example, , is greater than a defined threshold, , we predict class 1, and class –1 otherwise. In the perceptron algorithm, the decision function, , is a variant of a unit step function:

For simplicity, we can bring the threshold, , to the left side of the equation and define a weight-zero as and so that we write z in a more compact form:

And:

In machine learning literature, the negative threshold, or weight, , is usually called the bias unit.

Linear algebra basics: dot product and matrix transpose

In the following sections, we will often make use of basic notations from linear algebra. For example, we will abbreviate the sum of the products of the values in x and w using a vector dot product, whereas superscript T stands for transpose, which is an operation that transforms a column vector into a row vector and vice versa:

For example:

Furthermore, the transpose operation can also be applied to matrices to reflect it over its diagonal, for example:

Please note that the transpose operation is strictly only defined for matrices; however, in the context of machine learning, we refer to or matrices when we use the term "vector."

In this book, we will only use very basic concepts from linear algebra; however, if you need a quick refresher, please take a look at Zico Kolter's excellent Linear Algebra Review and Reference, which is freely available at http://www.cs.cmu.edu/~zkolter/course/linalg/linalg_notes.pdf.

The following figure illustrates how the net input is squashed into a binary output (–1 or 1) by the decision function of the perceptron (left subfigure) and how it can be used to discriminate between two linearly separable classes (right subfigure):

The perceptron learning rule

The whole idea behind the MCP neuron and Rosenblatt's thresholded perceptron model is to use a reductionist approach to mimic how a single neuron in the brain works: it either fires or it doesn't. Thus, Rosenblatt's initial perceptron rule is fairly simple, and the perceptron algorithm can be summarized by the following steps:

  1. Initialize the weights to 0 or small random numbers.
  2. For each training example, :
    1. Compute the output value, .
    2. Update the weights.

Here, the output value is the class label predicted by the unit step function that we defined earlier, and the simultaneous update of each weight, , in the weight vector, w, can be more formally written as:

The update value for (or change in ), which we refer to as , is calculated by the perceptron learning rule as follows:

Where is the learning rate (typically a constant between 0.0 and 1.0), is the true class label of the ith training example, and is the predicted class label. It is important to note that all weights in the weight vector are being updated simultaneously, which means that we don't recompute the predicted label, , before all of the weights are updated via the respective update values, . Concretely, for a two-dimensional dataset, we would write the update as:

Before we implement the perceptron rule in Python, let's go through a simple thought experiment to illustrate how beautifully simple this learning rule really is. In the two scenarios where the perceptron predicts the class label correctly, the weights remain unchanged, since the update values are 0:

(1)

(2)

However, in the case of a wrong prediction, the weights are being pushed toward the direction of the positive or negative target class:

(3)

(4)

To get a better understanding of the multiplicative factor, , let's go through another simple example, where:

Let's assume that , and we misclassify this example as –1. In this case, we would increase the corresponding weight by 1 so that the net input, , would be more positive the next time we encounter this example, and thus be more likely to be above the threshold of the unit step function to classify the example as +1:

The weight update is proportional to the value of . For instance, if we have another example, , that is incorrectly classified as –1, we will push the decision boundary by an even larger extent to classify this example correctly the next time:

It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable and the learning rate is sufficiently small (interested readers can find the mathematical proof in my lecture notes: https://sebastianraschka.com/pdf/lecture-notes/stat479ss19/L03_perceptron_slides.pdf.). If the two classes can't be separated by a linear decision boundary, we can set a maximum number of passes over the training dataset (epochs) and/or a threshold for the number of tolerated misclassifications—the perceptron would never stop updating the weights otherwise:

Downloading the example code

If you bought this book directly from Packt, you can download the example code files from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can download all code examples and datasets directly from https://github.com/rasbt/python-machine-learning-book-3rd-edition.

Now, before we jump into the implementation in the next section, what you just learned can be summarized in a simple diagram that illustrates the general concept of the perceptron:

The preceding diagram illustrates how the perceptron receives the inputs of an example, x, and combines them with the weights, w, to compute the net input. The net input is then passed on to the threshold function, which generates a binary output of –1 or +1—the predicted class label of the example. During the learning phase, this output is used to calculate the error of the prediction and update the weights.

You have been reading a chapter from
Python Machine Learning - Third Edition
Published in: Dec 2019
Publisher: Packt
ISBN-13: 9781789955750
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime