You're reading from Machine Learning with R. R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge required: this book will take you methodically through every stage of applying machine learning.

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781782162148

Length 396 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Brett Lantz

Choosing a machine learning algorithm

The process of choosing a machine learning algorithm involves matching the characteristics of the data to be learned to the biases of the available approaches. Since the choice of a machine learning algorithm depends largely on the type of data you are analyzing and the task at hand, it is often helpful to think about this process while you are gathering, exploring, and cleaning your data.

Tip

It may be tempting to learn a couple of machine learning techniques and apply them to everything, but resist this temptation. No machine learning approach is best for every circumstance. This fact is described by the No Free Lunch theorem, introduced by David Wolpert in 1996. For more information, visit: http://www.no-free-lunch.org.

Thinking about the input data

All machine learning algorithms require input training data. The exact format may differ, but in its most basic form, input data takes the form of examples and features.

An example is literally a single exemplary instance of the underlying concept to be learned; it is one set of data describing the atomic unit of interest for the analysis. If you were building a learning algorithm to identify spam e-mail, the examples would be data from many individual electronic messages. To detect cancerous tumors, the examples might comprise biopsies from a number of patients.

The phrase unit of observation is used to describe the units in which the examples are measured. Commonly, the unit of observation is in the form of transactions, persons, time points, geographic regions, or measurements. Other possibilities include combinations of these, such as person-years, which would denote cases where the same person is tracked over multiple time points.

A feature is a characteristic or attribute of an example, which might be useful for learning the desired concept. In the previous examples, attributes in the spam detection dataset might consist of the words used in the e-mail messages. For the cancer dataset, the attributes might be genomic data from the biopsied cells, or measured characteristics of the patient such as weight, height, or blood pressure.

The following spreadsheet shows a dataset in matrix format, which means that each example has the same number of features. In matrix data, each row in the spreadsheet is an example and each column is a feature. Here, the rows indicate examples of automobiles while the columns record various features of the cars such as the price, mileage, color, and transmission. Matrix format data is by far the most common form used in machine learning, though as you will see in later chapters, other forms are used occasionally in specialized cases.
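The matrix layout described above can be sketched in a few lines of Python. This is only an illustration: the three car records and the helper names `feature_names` and `feature_column` are invented here and do not come from the book's dataset.

```python
# Hypothetical used-car examples; every value here is invented for illustration.
# Each dictionary is one example (a row); each key is one feature (a column).
cars = [
    {"price": 14500, "mileage": 38000, "color": "Silver", "transmission": "AUTO"},
    {"price": 9900, "mileage": 92000, "color": "Red", "transmission": "MANUAL"},
    {"price": 17250, "mileage": 21000, "color": "Blue", "transmission": "AUTO"},
]


def feature_names(examples):
    """Return the feature names shared by all examples.

    Raises ValueError if any example has a different set of features,
    because the data would then not be in matrix format.
    """
    names = list(examples[0])
    for example in examples:
        if list(example) != names:
            raise ValueError("examples do not all share the same features")
    return names


def feature_column(examples, feature):
    """Return the values of one feature across all examples (one column)."""
    return [example[feature] for example in examples]
```

Calling `feature_names(cars)` confirms that every row has the same columns, and `feature_column(cars, "color")` reads a single feature down the rows.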

Features come in various forms as well. If a feature represents a characteristic measured in numbers, it is unsurprisingly called numeric. Alternatively, if it measures an attribute that is represented by a set of categories, the feature is called categorical or nominal. A special case of categorical variables is called ordinal, which designates a nominal variable with categories falling in an ordered list. Some examples of ordinal variables include clothing sizes such as small, medium, and large, or a measurement of customer satisfaction on a scale from 1 to 5. It is important to consider what the features represent because the type and number of features in your dataset will assist with determining an appropriate machine learning algorithm for your task.
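As a rough sketch of these distinctions, the Python snippet below labels a feature as numeric, categorical, or ordinal. The function names and the `ordered_levels` parameter are assumptions made for this example. Note that an ordinal feature cannot be detected from its values alone (a satisfaction score from 1 to 5 looks numeric), so the caller has to declare the ordering.

```python
# Ordered categories taken from the clothing-size example in the text.
SIZE_ORDER = ["small", "medium", "large"]


def feature_kind(values, ordered_levels=None):
    """Label a feature as 'ordinal', 'numeric', or 'categorical'.

    If the caller supplies ordered_levels, the feature is treated as ordinal,
    provided every value is one of those levels.
    """
    if ordered_levels is not None:
        if not all(v in ordered_levels for v in values):
            raise ValueError("value outside the declared ordered levels")
        return "ordinal"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "categorical"


def ordinal_rank(value, ordered_levels):
    """Return the zero-based position of a value within its ordered levels."""
    return ordered_levels.index(value)
```

For instance, `feature_kind(["small", "large"], SIZE_ORDER)` is treated as ordinal, while the same values without a declared order would be labeled categorical.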

Thinking about types of machine learning algorithms

Machine learning algorithms can be divided into two main groups: supervised learners that are used to construct predictive models, and unsupervised learners that are used to build descriptive models. Which type you will need to use depends on the learning task you hope to accomplish.

A predictive model is used for tasks that involve, as the name implies, the prediction of one value using other values in the dataset. The learning algorithm attempts to discover and model the relationship between the target feature (the feature being predicted) and the other features. Despite the common use of the word "prediction" to imply forecasting, predictive models need not necessarily foresee future events. For instance, a predictive model could be used to predict past events, such as the date of a baby's conception using the mother's hormone levels; or predictive models could be used in real time to control traffic lights during rush hours.

Because predictive models are given clear instruction on what they need to learn and how they are intended to learn it, the process of training a predictive model is known as supervised learning. The supervision does not refer to human involvement, but rather to the fact that the target values provide a supervisory role, which indicates to the learner the task it needs to learn. Specifically, given a set of data, the learning algorithm attempts to optimize a function (the model) to find the combination of feature values that results in the target output.

The often-used supervised machine learning task of predicting which category an example belongs to is known as classification. It is easy to think of potential uses for a classifier. For instance, you could predict whether:

A football team will win or lose
A person will live past the age of 100
An applicant will default on a loan
An earthquake will strike next year

The target feature to be predicted is a categorical feature known as the class and is divided into categories called levels. A class can have two or more levels, and the levels need not necessarily be ordinal. Because classification is so widely used in machine learning, there are many types of classification algorithms.

Supervised learners can also be used to predict numeric data such as income, laboratory values, test scores, or counts of items. To predict such numeric values, a common form of numeric prediction fits linear regression models to the input data. Although regression models are not the only type of numeric models, they are by far the most widely used. Regression methods are widely used for forecasting, as they quantify in exact terms the association between the inputs and the target, including both the magnitude and uncertainty of the relationship.
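To make the idea of fitting a line concrete, here is a minimal Python sketch of simple (one-input) linear regression by ordinary least squares. The function names and the tiny dataset are invented for illustration. It estimates only the magnitude of the relationship (slope and intercept), not its uncertainty.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Return the numeric prediction for a new input value."""
    return intercept + slope * x


# Invented data that lies exactly on the line y = 1 + 2x.
inputs = [1, 2, 3, 4]
targets = [3, 5, 7, 9]
slope, intercept = fit_line(inputs, targets)
```

With these values the fitted slope is 2 and the intercept is 1, so `predict(5, slope, intercept)` gives 11.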

Tip

Since it is easy to convert numbers to categories (for example, ages 13 to 19 are teenagers) and categories to numbers (for example, assign 1 to all males, 0 to all females), the boundary between classification models and numeric prediction models is not necessarily firm.
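Both conversions mentioned in this tip take only a few lines of Python. The function names and the label "not teenager" are choices made for this sketch; the age range and the 1/0 coding come from the tip itself.

```python
def age_to_category(age):
    """Convert a numeric age into a category (numbers to categories)."""
    return "teenager" if 13 <= age <= 19 else "not teenager"


def sex_to_number(sex):
    """Convert a category into a number (categories to numbers).

    Uses the coding from the tip: 1 for males, 0 for females.
    """
    codes = {"male": 1, "female": 0}
    return codes[sex]
```

For example, `age_to_category(15)` returns `"teenager"` and `sex_to_number("female")` returns `0`.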

A descriptive model is used for tasks that would benefit from the insight gained from summarizing data in new and interesting ways. Unlike a predictive model, which predicts a target of interest, a descriptive model treats no single feature as more important than any other. In fact, because there is no target to learn, the process of training a descriptive model is called unsupervised learning. Although it can be more difficult to think of applications for descriptive models (after all, what good is a learner that isn't learning anything in particular?), they are used quite regularly for data mining.

For example, the descriptive modeling task called pattern discovery is used to identify frequent associations within data. Pattern discovery is often used for market basket analysis on transactional purchase data. Here, the goal is to identify items that are frequently purchased together, such that the learned information can be used to refine the marketing tactics. For instance, if a retailer learns that swimming trunks are commonly purchased at the same time as sunscreen, the retailer might reposition the items more closely in the store, or run a promotion to "up-sell" customers on associated items.
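The simplest version of this idea is counting how often pairs of items appear in the same transaction. The Python sketch below does only that; it is not the association rule algorithm the book covers later, and the baskets and function names are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions; each inner list is one basket.
baskets = [
    ["sunscreen", "swimming trunks", "towel"],
    ["sunscreen", "swimming trunks"],
    ["bread", "milk"],
    ["milk", "towel"],
]


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def pair_support(pair, transactions):
    """Return the fraction of transactions that contain both items of the pair."""
    return pair_counts(transactions)[tuple(sorted(pair))] / len(transactions)
```

In these invented baskets, sunscreen and swimming trunks appear together in two of the four transactions, more often than any other pair.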

Tip

Originally used only in retail contexts, pattern discovery is now starting to be used in quite innovative ways. For instance, it can be used to detect patterns of fraudulent behavior, screen for genetic defects, or prevent criminal activity.

The descriptive modeling task of dividing a dataset into homogeneous groups is called clustering. This is sometimes used for segmentation analysis that identifies groups of individuals with similar purchasing, donating, or demographic information so that advertising campaigns can be tailored to particular audiences. Although the machine is capable of identifying the groups, human intervention is required to interpret them. For example, given five different clusters of shoppers at a grocery store, the marketing team will need to understand the differences among the groups in order to create a promotion that best suits each group. However, this is almost certainly easier than trying to create a unique appeal for each customer.
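As a hedged illustration of clustering, the Python sketch below runs the basic k-means loop on one-dimensional data: assign each value to its nearest center, then move each center to the mean of its group. The spending figures, the hand-picked starting centers, and the function name are assumptions for this example; practical implementations handle many features and choose starting centers automatically.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Cluster one-dimensional values around the given starting centers.

    Returns the final centers and the groups of values assigned to them.
    """
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # Centers stopped moving, so the assignment is stable.
        centers = new_centers
    return centers, groups


# Hypothetical weekly grocery spending for six shoppers.
spending = [10, 12, 11, 50, 52, 49]
final_centers, final_groups = kmeans_1d(spending, centers=[10, 50])
```

The algorithm separates the low spenders from the high spenders, but, as the text notes, deciding what those two groups mean for a promotion is left to a person.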

Matching your data to an appropriate algorithm

The following table lists the general types of machine learning algorithms covered in this book, each of which may be implemented in several ways. Although this covers only a fraction of all machine learning algorithms, learning these methods will provide a sufficient foundation for making sense of other methods as you encounter them.

Model	Task	Chapter
Supervised Learning Algorithms
Nearest Neighbor	Classification	Chapter 3
naive Bayes	Classification	Chapter 4
Decision Trees	Classification	Chapter 5
Classification Rule Learners	Classification	Chapter 5
Linear Regression	Numeric prediction	Chapter 6
Regression Trees	Numeric prediction	Chapter 6
Model Trees	Numeric prediction	Chapter 6
Neural Networks	Dual use	Chapter 7
Support Vector Machines	Dual use	Chapter 7
Unsupervised Learning Algorithms
Association Rules	Pattern detection	Chapter 8
k-means Clustering	Clustering	Chapter 9

To match a learning task to a machine learning approach, you will need to begin with one of the four types of tasks: classification, numeric prediction, pattern detection, or clustering. Certain tasks make the choice of algorithm simpler. For instance, if you are undertaking pattern detection, you will likely employ association rules. Similarly, a clustering problem will likely utilize the k-means algorithm while numeric prediction will utilize regression analysis or regression trees.
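The table lookup described here can be sketched as a small Python function. The model names, tasks, and chapter numbers are copied from the table; the function name is invented, and the code assumes that "Dual use" means a model can serve for both classification and numeric prediction.

```python
# (model, task, chapter) rows copied from the table in the text.
ALGORITHMS = [
    ("Nearest Neighbor", "classification", 3),
    ("naive Bayes", "classification", 4),
    ("Decision Trees", "classification", 5),
    ("Classification Rule Learners", "classification", 5),
    ("Linear Regression", "numeric prediction", 6),
    ("Regression Trees", "numeric prediction", 6),
    ("Model Trees", "numeric prediction", 6),
    ("Neural Networks", "dual use", 7),
    ("Support Vector Machines", "dual use", 7),
    ("Association Rules", "pattern detection", 8),
    ("k-means Clustering", "clustering", 9),
]

# Assumption: "dual use" models apply to both of these supervised tasks.
DUAL_USE_TASKS = {"classification", "numeric prediction"}


def candidate_models(task):
    """Return the names of the models in the table that suit the given task."""
    return [
        model
        for model, model_task, _chapter in ALGORITHMS
        if model_task == task
        or (model_task == "dual use" and task in DUAL_USE_TASKS)
    ]
```

`candidate_models("clustering")` returns a single option, while `candidate_models("classification")` returns six, which is why classification calls for more deliberate selection.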

For classification, more thought is needed to match a learning problem to an appropriate classifier. In these cases, it is helpful to consider the various distinctions among the algorithms. For instance, within classification problems, decision trees result in models that are readily understood, while the models of neural networks are notoriously difficult to interpret. If you were designing a credit-scoring model, this could be an important distinction because the law often requires that applicants be notified of the reasons they were rejected for a loan. Even if the neural network were better at predicting loan defaults, it would be useless if its predictions could not be explained.

In each chapter, the key strengths and weaknesses of each approach will be listed. Although you will sometimes find that these characteristics exclude certain models from consideration, in most cases the choice of model is arbitrary. In these cases, feel free to use whichever algorithm you are most comfortable with. Other times, when predictive accuracy is the primary concern, you may need to test several models and choose the one that fits best. In later chapters, we will even look at methods of combining models that utilize the best properties of each.