You're reading from Python Machine Learning Learn how to build powerful Python machine learning algorithms to generate useful data insights with this data analysis tutorial

Product type Paperback

Published in Sep 2015

Publisher Packt

ISBN-13 9781783555130

Length 454 pages

Edition 1st Edition

Languages

Python

Tools

SciPy

Concepts

Machine Learning

Author (1):

Sebastian Raschka


Table of Contents (15) Chapters

Preface

1. Giving Computers the Ability to Learn from Data

2. Training Machine Learning Algorithms for Classification

3. A Tour of Machine Learning Classifiers Using Scikit-learn

4. Building Good Training Sets – Data Preprocessing

5. Compressing Data via Dimensionality Reduction

6. Learning Best Practices for Model Evaluation and Hyperparameter Tuning

7. Combining Different Models for Ensemble Learning

8. Applying Machine Learning to Sentiment Analysis

9. Embedding a Machine Learning Model into a Web Application

10. Predicting Continuous Target Variables with Regression Analysis

11. Working with Unlabeled Data – Clustering Analysis

12. Training Artificial Neural Networks for Image Recognition

13. Parallelizing Neural Network Training with Theano

Index

An introduction to the basic terminology and notations

Now that we have discussed the three broad categories of machine learning (supervised, unsupervised, and reinforcement learning), let us look at the basic terminology that we will be using in the next chapters. The following table depicts an excerpt of the Iris dataset, which is a classic example in the field of machine learning. The Iris dataset contains the measurements of 150 iris flowers from three different species: Setosa, Versicolor, and Virginica. Here, each flower sample represents one row in our dataset, and the flower measurements in centimeters are stored as columns, which we also call the features of the dataset.

To keep the notation and implementation simple yet efficient, we will make use of some of the basics of linear algebra. In the following chapters, we will use matrix and vector notation to refer to our data. We will follow the common convention of representing each sample as a separate row in a feature matrix $\boldsymbol{X}$, where each feature is stored as a separate column.

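The Iris dataset, consisting of 150 samples and 4 features, can then be written as a $150 \times 4$ matrix $\boldsymbol{X} \in \mathbb{R}^{150 \times 4}$:

$$
\boldsymbol{X} =
\begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\
x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\
\vdots & \vdots & \vdots & \vdots \\
x_1^{(150)} & x_2^{(150)} & x_3^{(150)} & x_4^{(150)}
\end{bmatrix}
$$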
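As a rough sketch of this layout, the feature matrix can be held in a two-dimensional array with one row per sample and one column per feature. The use of NumPy and the names `n_samples`, `n_features`, and `X` are assumptions made for illustration, and the array is filled with placeholder zeros rather than real Iris measurements.

```python
import numpy as np

n_samples, n_features = 150, 4

# Placeholder values only; these are not real Iris measurements.
# Rows correspond to flower samples, columns to features.
X = np.zeros((n_samples, n_features))

print(X.shape)  # (150, 4)
```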

Note

For the rest of this book, we will use the superscript (i) to refer to the ith training sample, and the subscript j to refer to the jth dimension of the training dataset.

We use lower-case, bold-face letters to refer to vectors ($\boldsymbol{x} \in \mathbb{R}^{n \times 1}$) and upper-case, bold-face letters to refer to matrices ($\boldsymbol{X} \in \mathbb{R}^{n \times m}$). To refer to single elements in a vector or matrix, we write the letters in italics ($x^{(n)}$ or $x_m^{(n)}$, respectively).

For example, $x_1^{(150)}$ refers to the first dimension of flower sample 150, the sepal length. Thus, each row in this feature matrix represents one flower instance and can be written as a four-dimensional row vector $\boldsymbol{x}^{(i)} \in \mathbb{R}^{1 \times 4}$:

$$
\boldsymbol{x}^{(i)} = \begin{bmatrix} x_1^{(i)} & x_2^{(i)} & x_3^{(i)} & x_4^{(i)} \end{bmatrix}
$$

Each feature dimension is a 150-dimensional column vector $\boldsymbol{x}_j \in \mathbb{R}^{150 \times 1}$, for example:

$$
\boldsymbol{x}_j =
\begin{bmatrix}
x_j^{(1)} \\
x_j^{(2)} \\
\vdots \\
x_j^{(150)}
\end{bmatrix}
$$
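The element, row, and column notation maps onto array indexing roughly as follows. This is a minimal sketch assuming NumPy; the matrix holds placeholder values chosen so that the entry for sample $i$ and feature $j$ equals $10i + j$, which makes the indices easy to read back. All variable names are hypothetical.

```python
import numpy as np

# Placeholder matrix, not real measurements:
# the entry for sample i and feature j is 10 * i + j.
i = np.arange(1, 151).reshape(-1, 1)  # sample numbers 1..150 as a column
j = np.arange(1, 5).reshape(1, -1)    # feature numbers 1..4 as a row
X = 10 * i + j                        # broadcasts to shape (150, 4)

# The notation counts from 1, while NumPy indices start at 0.
x_150_1 = X[149, 0]      # single element: sample 150, feature 1
row_150 = X[[149], :]    # one sample as a 1 x 4 row vector
col_1 = X[:, [0]]        # one feature as a 150 x 1 column vector

print(x_150_1)        # 1501
print(row_150.shape)  # (1, 4)
print(col_1.shape)    # (150, 1)
```

Indexing with a list (`[149]`, `[0]`) keeps the result two-dimensional, so the shapes match the row and column vectors in the text; plain integer indexing would return one-dimensional arrays instead.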

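Similarly, we store the target variables (here: class labels) as a 150-dimensional column vector:

$$
\boldsymbol{y} =
\begin{bmatrix}
y^{(1)} \\
\vdots \\
y^{(150)}
\end{bmatrix},
\quad y^{(i)} \in \{\text{Setosa}, \text{Versicolor}, \text{Virginica}\}
$$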
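One possible way to hold the label vector in code is sketched below. It assumes NumPy, encodes the three species as the integers 0, 1, and 2 (an encoding this section does not prescribe), and assumes the samples are ordered by species with 50 flowers per class, as in the commonly distributed version of the dataset.

```python
import numpy as np

class_names = np.array(["Setosa", "Versicolor", "Virginica"])

# Assumption: 50 samples per species, stored in species order.
# Each label is an integer index into class_names.
y = np.repeat(np.arange(3), 50)

print(y.shape)              # (150,)
print(class_names[y[0]])    # Setosa
print(class_names[y[149]])  # Virginica
```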
