You're reading from Machine Learning with R. R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge is required; this book will take you methodically through every stage of applying machine learning.

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781782162148

Length 396 pages

Edition 1st Edition

Concepts

Machine Learning

Author (1):

Brett Lantz

Table of Contents (19) Chapters

Machine Learning with R

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Introducing Machine Learning

2. Managing and Understanding Data

3. Lazy Learning – Classification Using Nearest Neighbors

4. Probabilistic Learning – Classification Using Naive Bayes

5. Divide and Conquer – Classification Using Decision Trees and Rules

6. Forecasting Numeric Data – Regression Methods

7. Black Box Methods – Neural Networks and Support Vector Machines

8. Finding Patterns – Market Basket Analysis Using Association Rules

9. Finding Groups of Data – Clustering with k-means

10. Evaluating Model Performance

11. Improving Model Performance

12. Specialized Machine Learning Topics

Index

A

= assignment operator / Vectors
abline() function / ROC curves
abstraction process
- about / Abstraction and knowledge representation
actionable associations / Step 4 – evaluating model performance
activation function
- about / From biological to artificial neurons, Activation functions
- threshold activation function / Activation functions
- unit step activation function / Activation functions
- sigmoid activation function / Activation functions
AdaBoost
- about / Boosting
adaptive boosting
- about / Boosting the accuracy of decision trees
aggregate() function / Step 5 – improving model performance
aggregate function / Bagging
apply() function / Data preparation – creating indicator features for frequent words
appropriate k
- selecting / Choosing an appropriate k
Apriori
- about / The Apriori algorithm for association rule learning
apriori() function / Step 3 – training a model on the data
Apriori algorithm
- for association rule learning / The Apriori algorithm for association rule learning
- strengths / The Apriori algorithm for association rule learning
- weaknesses / The Apriori algorithm for association rule learning
Apriori principle
- used, for building set of rules / Building a set of rules with the Apriori principle
Apriori property
- about / The Apriori algorithm for association rule learning
array
- about / R data structures, Matrixes and arrays
Artificial Neural Network (ANN)
- about / Understanding neural networks
- applications / Understanding neural networks
association rules
- about / Understanding association rules
- potential applications / Understanding association rules
- rule interest, measuring / Measuring rule interest – support and confidence
- set of rules, building with Apriori principle / Building a set of rules with the Apriori principle
- frequently purchased groceries, identifying with / Example – identifying frequently purchased groceries with association rules
automated parameter tuning
- caret package used / Using caret for automated parameter tuning
- requisites / Using caret for automated parameter tuning
axon
- about / From biological to artificial neurons

B

0.632 bootstrap / Bootstrap sampling
backpropagation
- neural networks, training with / Training neural networks with backpropagation
- about / Training neural networks with backpropagation
backpropagation algorithm
- strengths / Training neural networks with backpropagation
- weaknesses / Training neural networks with backpropagation
bag() function / Bagging
bag-of-words / Step 2 – exploring and preparing the data
bagging
- about / Bagging
bagging() function
- about / Bagging
bank loans example, with C5.0 decision trees
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- random training datasets, creating / Data preparation – creating random training and test datasets
- test datasets, creating / Data preparation – creating random training and test datasets
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
basic concepts, Bayesian methods
- about / Basic concepts of Bayesian methods
- probability / Probability
- joint probability / Joint probability
- conditional probability / Conditional probability with Bayes' theorem
Bayesian classifiers
- uses / Understanding naive Bayes
Bayesian methods
- about / Understanding naive Bayes
- basic concepts / Basic concepts of Bayesian methods
benefits, machine learning / Uses and abuses of machine learning
bias
- about / Generalization
bias-variance tradeoff
- about / Choosing an appropriate k
biganalytics package / Using massive matrices with bigmemory
bigkmeans() function / Using massive matrices with bigmemory
biglm() function
- about / Building bigger regression models with biglm
biglm package
- about / Building bigger regression models with biglm
- regression model, building / Building bigger regression models with biglm
bigmemory package
- about / Using massive matrices with bigmemory
- URL, for documentation / Using massive matrices with bigmemory
- massive matrices, using with / Using massive matrices with bigmemory
bigrf package
- about / Growing bigger and faster random forests with bigrf
- random forests, building / Growing bigger and faster random forests with bigrf
bimodal / Measuring the central tendency – the mode
binning
- about / Using numeric features with naive Bayes
bins
- about / Visualizing numeric variables – histograms
Bioconductor project
- about / Working with bioinformatics data
- URL / Working with bioinformatics data
bioinformatics data
- working with / Working with bioinformatics data
bivariate relationships
- about / Exploring relationships between variables
blind tasting experience example / The kNN algorithm
body mass index (BMI) / Step 1 – collecting data
boosting
- about / Boosting
bootstrap aggregating
- about / Bagging
bootstrap sampling / Bootstrap sampling
box-and-whiskers plot
- about / Visualizing numeric variables – boxplots
boxplot
- about / Visualizing numeric variables – boxplots
boxplot() function / Visualizing numeric variables – boxplots
branches
- about / Understanding decision trees
breast cancer
- diagnosing, with kNN algorithm / Diagnosing breast cancer with the kNN algorithm
breast cancer example
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance, Transformation – z-score standardization

C

c() function / Vectors
C5.0 algorithm
- about / The C5.0 decision tree algorithm
- strengths / The C5.0 decision tree algorithm
- weaknesses / The C5.0 decision tree algorithm
- split, selecting / Choosing the best split
- decision tree, pruning / Pruning the decision tree
caret character / Ordinary least squares estimation
caret package
- using, for automated parameter tuning / Using caret for automated parameter tuning
- about / Using caret for automated parameter tuning, Training and evaluating models in parallel with caret
categorical variables
- about / Exploring categorical variables
- exploring / Exploring categorical variables
- central tendency, measuring / Measuring the central tendency – the mode
cbind() function / Multiple linear regression
central tendency
- measuring / Measuring the central tendency – mean and median
centroid / Using distance to assign and update clusters
characteristics, neural networks
- activation function / From biological to artificial neurons, Activation functions
- network topology / From biological to artificial neurons, Network topology, The number of layers, The direction of information travel, The number of nodes in each layer
- training algorithm / From biological to artificial neurons, Training neural networks with backpropagation
character vectors
- about / Factors
Chi-Squared statistic
- about / Choosing the best split
classification
- about / Thinking about types of machine learning algorithms
- nearest neighbors used / Understanding classification using nearest neighbors
classification performance
- measuring / Measuring performance for classification
classification prediction data
- working with / Working with classification prediction data in R
classification rules
- about / Understanding classification rules
- separate-and-conquer / Separate and conquer
- One Rule algorithm / The One Rule algorithm
- RIPPER algorithm / The RIPPER algorithm
- obtaining, from decision trees / Rules from decision trees
cluster
- about / Understanding clustering
clustering
- about / Thinking about types of machine learning algorithms, Understanding clustering
- applications / Understanding clustering
- as machine learning task / Clustering as a machine learning task
clustering, k-means algorithm
- about / The k-means algorithm for clustering
- distance, used for assigning cluster / Using distance to assign and update clusters
- distance, used for updating cluster / Using distance to assign and update clusters
- appropriate number of clusters, selecting / Choosing the appropriate number of clusters
clusters / Learning faster with parallel computing
column-major order
- about / Matrixes and arrays
combine function / Vectors
components, machine learning
- data input / How do machines learn?
- abstraction / How do machines learn?, Abstraction and knowledge representation
- generalization / How do machines learn?, Generalization
- knowledge representation / Abstraction and knowledge representation
- success of learning, assessing / Assessing the success of learning
Comprehensive R Archive Network (CRAN)
- about / Using R for machine learning
/ Step 3 – training a model on the data
concrete strength, modeling with ANNs
- about / Modeling the strength of concrete with ANNs
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
conditional probability
- about / Conditional probability with Bayes' theorem
confusion matrix
- about / A closer look at confusion matrices
- used, for measuring performance / Using confusion matrices to measure performance
contingency table
- about / Examining relationships – two-way cross-tabulations
convex hull / The case of linearly separable data
cor() command / Exploring relationships among features – the correlation matrix
cor() function / Correlations
corpus
- about / Data preparation – processing text data for analysis
Corpus() function / Data preparation – processing text data for analysis
correlation / Visualizing relationships – scatterplots
- about / Correlations
correlation ellipse / Visualizing relationships among features – the scatterplot matrix
correlation matrix / Exploring relationships among features – the correlation matrix
cov() function / Ordinary least squares estimation, Correlations
covariance
- about / Ordinary least squares estimation
createDataPartition() function / The holdout method
cross-validation / Cross-validation
crosstab
- about / Examining relationships – two-way cross-tabulations
CrossTable() function / Examining relationships – two-way cross-tabulations, Using confusion matrices to measure performance
CSV files
- data, importing from / Importing and saving data from CSV files
- about / Importing and saving data from CSV files
- loading, into R / Importing and saving data from CSV files
CUDA
- about / GPU computing
curve() function / Choosing the best split
cut points
- about / Using numeric features with naive Bayes

D

data
- machine learning algorithm, applying to / Steps to apply machine learning to your data
- managing, with R / Managing data with R
- importing, from CSV files / Importing and saving data from CSV files
- importing, from SQL databases / Importing data from SQL databases
- about / Working with specialized data
- obtaining, from web / Getting data from the Web with the RCurl package
data.frame() function / Data frames
data.table package
- about / Making data frames faster with data.table
data dictionary
- about / Exploring the structure of data
data exploration
- about / Exploring and understanding data
data frame
- about / R data structures, Data frames
- making faster, with data.table package / Making data frames faster with data.table
data mining
- about / The origins of machine learning
data munging
- about / Working with specialized data
data preparation, breast cancer example
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
data structures, R
- about / R data structures
- vector / Vectors
- factor / Factors, Lists
- data frame / Data frames
- matrix / Matrixes and arrays
- array / Matrixes and arrays
- saving / Saving and loading R data structures
- loading / Saving and loading R data structures
- exploring / Exploring the structure of data
DBMS
- about / Importing data from SQL databases
decision nodes
- about / Understanding decision trees
decision tree
- about / Understanding decision trees, Example – identifying risky bank loans using C5.0 decision trees
- potential uses / Understanding decision trees
- divide-and-conquer / Divide and conquer
- pruning / Pruning the decision tree
- used, for identifying risky bank loans / Example – identifying risky bank loans using C5.0 decision trees, Step 1 – collecting data
- accuracy, boosting / Boosting the accuracy of decision trees
decision tree forests
- about / Random forests
decision trees
- classification rules, obtaining from / Rules from decision trees
deep learning / The direction of information travel
delimiter
- about / Importing and saving data from CSV files
dendrites
- about / From biological to artificial neurons
dependent events
- about / Joint probability
dependent variable / Visualizing relationships – scatterplots
- about / Understanding regression
descriptive model
- about / Thinking about types of machine learning algorithms
diff() function / Measuring spread – quartiles and the five-number summary
disk-based data frames
- creating, with ff package / Creating disk-based data frames with ff
distance function
- about / Calculating distance
divide-and-conquer
- about / Divide and conquer
DSN
- about / Importing data from SQL databases
dummy coding
- about / Preparing data for use with kNN
/ Step 3 – training a model on the data

E

e1071 package
- naive Bayes classification, with naiveBayes() function / Step 3 – training a model on the data
elbow method / Choosing the appropriate number of clusters
elbow point / Choosing the appropriate number of clusters
elements
- about / Vectors
ensemble methods
- bagging / Bagging
- boosting / Boosting
- random forests / Random forests
ensembles
- about / Understanding ensembles
- advantages / Understanding ensembles
entropy
- about / Choosing the best split
epoch / Training neural networks with backpropagation
ethical considerations, machine learning / Ethical considerations
Euclidean distance
- about / Calculating distance
Euclidean norm / The case of linearly separable data
events
- about / Basic concepts of Bayesian methods
example
- about / Thinking about the input data

F

10-fold cross-validation
- about / Cross-validation
F-measure
- about / The F-measure
Facebook / Finding teen market segments using k-means clustering
factor
- about / R data structures, Factors, Lists
- creating, from character vector / Factors
factor() function / Factors
feature
- about / Thinking about the input data
feedforward networks
- about / The direction of information travel
ff package
- about / Creating disk-based data frames with ff
- used, for creating disk-based data frames / Creating disk-based data frames with ff
five-number summary
- about / Measuring spread – quartiles and the five-number summary
foreach package
- about / Working in parallel with foreach, Training and evaluating models in parallel with caret
frequently purchased groceries
- identifying, with association rules / Example – identifying frequently purchased groceries with association rules
future performance
- estimating / Estimating future performance
future performance estimation
- holdout method / The holdout method
- cross-validation / Cross-validation
- bootstrap sampling / Bootstrap sampling

G

gain ratio
- about / Choosing the best split
Gaussian Radial Basis Function (RBF) kernel / Using kernels for non-linear spaces
generalization
- about / Generalization
generalized linear models (GLM)
- about / Understanding regression
Gini index
- about / Choosing the best split
GPU computing
- about / GPU computing
gputools package
- about / GPU computing
gradient descent
- about / Training neural networks with backpropagation
graph data
- working with / Working with social network data and graph data
greedy learners
- about / Separate and conquer
grid
- about / Using caret for automated parameter tuning

H

Hadoop
- parallel computing / Parallel cloud computing with MapReduce and Hadoop
header line
- about / Importing and saving data from CSV files
heuristics
- about / Generalization
hidden layers
- about / The number of layers
hist() function / Visualizing numeric variables – histograms
histogram
- about / Visualizing numeric variables – histograms
holdout method / The holdout method
human brain / Understanding neural networks
hyperplane / Understanding Support Vector Machines

I

imputation / Data preparation – imputing missing values
Incremental Reduced Error Pruning algorithm (IREP) / The RIPPER algorithm
independent events
- about / Joint probability
independent variables
- about / Understanding regression
information gain / Choosing the best split
Input Nodes / The number of layers
installation, R package / Installing an R package
instance-based learning
- about / Why is the kNN algorithm lazy?
interaction
- about / Model specification – adding interaction effects
intercept
- about / Understanding regression
interquartile range (IQR) / Measuring spread – quartiles and the five-number summary
ipred package
- about / Bagging
IQR() function / Measuring spread – quartiles and the five-number summary
itemFrequencyPlot() function / Visualizing item support – item frequency plots
itemset
- about / Understanding association rules

J

joint probability
- about / Joint probability
JRip() classifier / Step 5 – improving model performance
JSON
- about / Reading and writing JSON with the rjson package
- reading, rjson package used / Reading and writing JSON with the rjson package
- writing, rjson package used / Reading and writing JSON with the rjson package
- converting, to R / Reading and writing JSON with the rjson package
JSON format
- URL / Reading and writing JSON with the rjson package

K

k-means algorithm
- about / The k-means algorithm for clustering
- strengths / The k-means algorithm for clustering
- weaknesses / The k-means algorithm for clustering
kappa statistic
- about / The kappa statistic
kernels
- using, for non-linear spaces / Using kernels for non-linear spaces
kernel trick / Using kernels for non-linear spaces
kernlab package / Bagging
kmeans() function / Step 3 – training a model on the data
knn() function / Step 3 – training a model on the data
kNN algorithm
- about / The kNN algorithm, Step 3 – training a model on the data
- strengths / The kNN algorithm
- weaknesses / The kNN algorithm
- distance, calculating / Calculating distance
- appropriate k, selecting / Choosing an appropriate k
- data, preparing / Preparing data for use with kNN
- used, for diagnosing breast cancer / Diagnosing breast cancer with the kNN algorithm
knowledge representation
- about / Abstraction and knowledge representation
ksvm() function / Bagging

L

Laplace estimator
- about / The Laplace estimator
lapply() function / Transformation – normalizing numeric data, Transformation – z-score standardization
large datasets
- managing / Managing very large datasets
large datasets management
- about / Managing very large datasets
- data frames, making faster with data.table package / Making data frames faster with data.table
- disk-based data frames, creating with ff package / Creating disk-based data frames with ff
- massive matrices, using with bigmemory package / Using massive matrices with bigmemory
layers
- about / The number of layers
lazy learning algorithms / Why is the kNN algorithm lazy?
leaf nodes
- about / Understanding decision trees
learning, with parallel computing
- about / Learning faster with parallel computing
- execution time, measuring / Measuring execution time
- working, in parallel with foreach / Working in parallel with foreach
- multitasking operating system, using with multicore package / Using a multitasking operating system with multicore
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
- parallel cloud computing, with MapReduce / Parallel cloud computing with MapReduce and Hadoop
- parallel cloud computing, with Hadoop / Parallel cloud computing with MapReduce and Hadoop
learning rate / Training neural networks with backpropagation
left hand side (LHS) / Step 4 – evaluating model performance
levels
- about / Thinking about types of machine learning algorithms
likelihood
- about / Conditional probability with Bayes' theorem
likelihood table
- about / Conditional probability with Bayes' theorem
linear kernel / Using kernels for non-linear spaces
linearly separable / Classification with hyperplanes
linear regression
- about / Understanding regression
link function
- about / Understanding regression
list() function / Lists
lists
- about / R data structures
lm() function
- about / Building bigger regression models with biglm
load() command / Saving and loading R data structures
loess smooth / Visualizing relationships among features – the scatterplot matrix
logistic regression
- about / Understanding regression

M

M5' algorithm (M5-prime) / Step 5 – improving model performance
machine learning
- origins / The origins of machine learning
- benefits / Uses and abuses of machine learning
- ethical considerations / Ethical considerations
- about / How do machines learn?
- applying, to data / Steps to apply machine learning to your data
- R, using / Using R for machine learning
machine learning algorithm
- about / The origins of machine learning, Uses and abuses of machine learning
- selecting / Choosing a machine learning algorithm
- data, matching / Matching your data to an appropriate algorithm
machine learning algorithms
- input training data / Thinking about the input data
- types / Thinking about types of machine learning algorithms
Manhattan distance
- about / Calculating distance
MapReduce programming model
- about / Parallel cloud computing with MapReduce and Hadoop
- parallel computing / Parallel cloud computing with MapReduce and Hadoop
marginal likelihood
- about / Conditional probability with Bayes' theorem
market basket analysis example
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
- item support, visualizing / Visualizing item support – item frequency plots
- transaction data, visualizing / Visualizing transaction data – plotting the sparse matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- set of association rules, sorting / Sorting the set of association rules, Taking subsets of association rules
- association rules, saving to file / Saving association rules to a file or data frame
massive matrices
- using, with bigmemory package / Using massive matrices with bigmemory
matrix
- about / Matrixes and arrays
matrix() function / Matrixes and arrays
Maximum Margin Hyperplane (MMH)
- about / Finding the maximum margin
- case, of linearly separable data / The case of linearly separable data
- case, of non-linearly separable data / The case of non-linearly separable data
mclapply() function / Using a multitasking operating system with multicore
mean
- about / Measuring the central tendency – mean and median
mean() function / Measuring the central tendency – mean and median, Ordinary least squares estimation
mean absolute error (MAE) / Measuring performance with mean absolute error
median
- about / Measuring the central tendency – mean and median
median() function / Measuring the central tendency – mean and median
medical expenses, predicting with linear regression
- about / Example – predicting medical expenses using linear regression
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- correlation matrix / Exploring relationships among features – the correlation matrix
- relationships, exploring among features / Exploring relationships among features – the correlation matrix
- relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
- scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance, Transformation – converting a numeric variable to a binary indicator, Putting it all together – an improved regression model
meta-learning methods
- about / Improving model performance with meta-learning
- used, for improving model performance / Improving model performance with meta-learning
Microsoft Excel / Importing and saving data from CSV files
Microsoft Excel spreadsheets
- reading, xlsx package used / Reading and writing Microsoft Excel spreadsheets using xlsx
- writing, xlsx package used / Reading and writing Microsoft Excel spreadsheets using xlsx
Microsoft SQL
- about / Importing data from SQL databases
min-max normalization
- about / Preparing data for use with kNN
Mobile Phone Spam
- filtering, with naive Bayes algorithm / Example – filtering mobile phone spam with the naive Bayes algorithm
Mobile Phone Spam example
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- text data, processing for analysis / Data preparation – processing text data for analysis
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
- text data, visualizing / Visualizing text data – word clouds
- indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
mode
- about / Measuring the central tendency – the mode
mode() function / Measuring the central tendency – the mode
model
- about / Abstraction and knowledge representation
model performance
- improving, with meta-learning / Improving model performance with meta-learning
model performance, breast cancer example
- z-score standardization / Transformation – z-score standardization
- alternative values of k, testing / Testing alternative values of k
model trees
- about / Understanding regression trees and model trees
multicore package
- about / Using a multitasking operating system with multicore
multidimensional feature space / The kNN algorithm
multilayer network
- about / The number of layers
Multilayer Perceptron (MLP)
- about / The direction of information travel
multimodal / Measuring the central tendency – the mode
multiple linear regression
- about / Multiple linear regression
- strengths / Multiple linear regression
- weaknesses / Multiple linear regression
multiple workstations
- networking, with snow package / Networking multiple workstations with snow and snowfall
- networking, with snowfall package / Networking multiple workstations with snow and snowfall
multitasking operating system
- using, with multicore package / Using a multitasking operating system with multicore
multivariate relationships
- about / Exploring relationships between variables
MySpace / Finding teen market segments using k-means clustering
MySQL
- about / Importing data from SQL databases

N

naive Bayes
- numeric features, using with / Using numeric features with naive Bayes
naive Bayes algorithm
- about / Understanding naive Bayes, The naive Bayes algorithm
- strengths / The naive Bayes algorithm
- weaknesses / The naive Bayes algorithm
- naive Bayes classification / The naive Bayes classification
- Laplace estimator / The Laplace estimator
- used, for filtering Mobile Phone Spam / Example – filtering mobile phone spam with the naive Bayes algorithm
naive Bayes classification
- about / The naive Bayes classification
- naiveBayes() function, using in e1071 package / Step 3 – training a model on the data
nearest neighbor classifiers
- about / Understanding classification using nearest neighbors
network package
- about / Working with social network data and graph data
- URL, for info / Working with social network data and graph data
network topology
- about / Network topology
- number of layers / The number of layers
- direction, of information travel / The direction of information travel
- number of nodes, in each layer / The number of nodes in each layer
neural networks
- about / Understanding neural networks
- biological, to artificial neurons / From biological to artificial neurons
- characteristics / From biological to artificial neurons
- training, with backpropagation / Training neural networks with backpropagation
neurons
- about / Understanding neural networks
No Free Lunch theorem
- about / Choosing a machine learning algorithm
nominal variables
- about / Factors
non-linearly separable data / The case of non-linearly separable data
non-linear spaces
- kernels, using for / Using kernels for non-linear spaces
normal distributions / Understanding numeric data – uniform and normal distributions
normalize() function / Transformation – normalizing numeric data
numeric data
- about / Understanding numeric data – uniform and normal distributions
- normalizing / Transformation – normalizing numeric data
numeric features
- using, with naive Bayes / Using numeric features with naive Bayes
numeric prediction
- about / Thinking about types of machine learning algorithms
numeric variables
- about / Exploring numeric variables
- exploring / Exploring numeric variables
- central tendency, measuring / Measuring the central tendency – mean and median
- spread, measuring / Measuring spread – quartiles and the five-number summary
- visualizing / Visualizing numeric variables – boxplots, Visualizing numeric variables – histograms

O

<- operator / Vectors
OCR, performing with SVMs
- about / Performing OCR with SVMs
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
ODBC
- about / Importing data from SQL databases
odbcConnect() function / Importing data from SQL databases
one-way table / Exploring categorical variables
One Rule algorithm
- about / The One Rule algorithm
- strengths / The One Rule algorithm
- weaknesses / The One Rule algorithm
optimized learning algorithms
- deploying / Deploying optimized learning algorithms
optimized learning algorithms deployment
- regression models, building with biglm package / Building bigger regression models with biglm
- random forests, building with bigrf package / Growing bigger and faster random forests with bigrf
- caret package, used for evaluating models in parallel / Training and evaluating models in parallel with caret
Oracle
- about / Importing data from SQL databases
order() function / Data preparation – creating random training and test datasets
ordinary least squares (OLS) / Ordinary least squares estimation
ordinary least squares estimation
- about / Ordinary least squares estimation
out-of-bag error rate
- about / Training random forests
Output Node / The number of layers
overfitting
- about / Assessing the success of learning

P

pairs() function / Visualizing relationships among features – the scatterplot matrix
parallel computing methods
- about / Learning faster with parallel computing
parameter estimates
- about / Simple linear regression
parameter tuning
- about / Tuning stock models for better performance
pattern discovery
- about / Thinking about types of machine learning algorithms
Pearson's Chi-squared test / Examining relationships – two-way cross-tabulations
Pearson's correlation
- about / Correlations
performance
- measuring, confusion matrices used / Using confusion matrices to measure performance
- improving, of R / Improving the performance of R
performance() function / ROC curves
performance measures
- about / Beyond accuracy – other measures of performance
- kappa statistic / The kappa statistic
- sensitivity / Sensitivity and specificity
- specificity / Sensitivity and specificity
- precision / Precision and recall
- recall / Precision and recall
- F-measure / The F-measure
performance tradeoffs
- visualizing / Visualizing performance tradeoffs
plot() command / ROC curves
plot() function / Visualizing relationships – scatterplots
point-and-click interface
- used, for installing R package / Installing a package using the point-and-click interface
poisonous mushrooms
- identifying, with rule learners / Example – identifying poisonous mushrooms with rule learners
poisonous mushrooms example, with rule learners
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
Poisson regression
- about / Understanding regression
polynomial kernel / Using kernels for non-linear spaces
posPredValue() function / Precision and recall
posterior probability
- about / Conditional probability with Bayes' theorem
PostgreSQL
- about / Importing data from SQL databases
postpruning
- about / Pruning the decision tree
precision
- about / Precision and recall
pred function / Bagging
predict() function / Working with classification prediction data in R, Creating a simple tuned model
- about / Bagging
predictive model
- about / Thinking about types of machine learning algorithms
prepruning
- about / Pruning the decision tree
prior probability
- about / Conditional probability with Bayes' theorem
probability
- about / Probability

Q

quadratic optimization / The case of linearly separable data
quantile() function / Measuring spread – quartiles and the five-number summary
quartiles
- about / Measuring spread – quartiles and the five-number summary

R

68-95-99.7 rule / Measuring spread – variance and standard deviation
R
- using, for machine learning / Using R for machine learning
- data structures / R data structures
- used, for managing data / Managing data with R
- CSV file, loading into / Importing and saving data from CSV files
- working with classification prediction data / Working with classification prediction data in R
- JSON, converting to / Reading and writing JSON with the rjson package
- performance, improving / Improving the performance of R
Radial Basis Function (RBF) network
- about / Activation functions
randomForest() function / Evaluating random forest performance
- about / Training random forests
randomForest package / Training random forests
random forests
- about / Random forests
- strengths / Random forests
- weaknesses / Random forests
- training / Training random forests
- performance, evaluating / Evaluating random forest performance
range
- about / Measuring spread – quartiles and the five-number summary
range() function / Measuring spread – quartiles and the five-number summary
RCurl package
- about / Getting data from the Web with the RCurl package
- used, for obtaining data from web / Getting data from the Web with the RCurl package
- URL, for documentation / Getting data from the Web with the RCurl package
real-world data
- about / Working with specialized data
recall / Precision and recall
recurrent network
- about / The direction of information travel
recursive partitioning
- about / Divide and conquer
reg() function / Multiple linear regression
regression
- about / Understanding regression
- simple linear regression / Simple linear regression
- ordinary least squares estimation / Ordinary least squares estimation
- correlation / Correlations
- multiple linear regression / Multiple linear regression
- adding, to trees / Adding regression to trees
regression analysis
- use cases / Understanding regression
regression equations
- about / Understanding regression
regression models
- building, with biglm package / Building bigger regression models with biglm
regression trees
- about / Understanding regression trees and model trees
- strengths / Adding regression to trees
- weaknesses / Adding regression to trees
relationships
- exploring, between variables / Exploring relationships between variables
- visualizing / Visualizing relationships – scatterplots
- examining / Examining relationships – two-way cross-tabulations
residuals
- about / Ordinary least squares estimation
resubstitution error / Estimating future performance
RHIPE package / Parallel cloud computing with MapReduce and Hadoop
right hand side (RHS) / Step 4 – evaluating model performance
RIPPER algorithm
- about / The RIPPER algorithm
- strengths / The RIPPER algorithm
- weaknesses / The RIPPER algorithm
risky bank loans
- identifying, C5.0 decision trees used / Example – identifying risky bank loans using C5.0 decision trees, Step 1 – collecting data
rjson package
- about / Reading and writing JSON with the rjson package
- used, for reading JSON / Reading and writing JSON with the rjson package
- used, for writing JSON / Reading and writing JSON with the rjson package
rmr package
- about / Parallel cloud computing with MapReduce and Hadoop
ROC curve
- about / ROC curves
- creating / ROC curves
ROCR package
- about / Visualizing performance tradeoffs
RODBC package
- about / Importing data from SQL databases
rote learning
- about / Why is the kNN algorithm lazy?
round() function / Exploring categorical variables
R package
- installing / Installing an R package
- installing, point-and-click interface used / Installing a package using the point-and-click interface
- loading / Loading an R package
R performance
- large datasets, managing / Managing very large datasets
- learning, with parallel computing / Learning faster with parallel computing
- GPU computing / GPU computing
- optimized learning algorithms, deploying / Deploying optimized learning algorithms
rudimentary ANNs / Understanding neural networks
runif() function / Data preparation – creating random training and test datasets
RWeka package / The C5.0 decision tree algorithm
- using / Installing and loading R packages
- loading / Loading an R package

S

save() function / Saving and loading R data structures
scale() function / Transformation – z-score standardization
scatterplot
- about / Visualizing relationships – scatterplots
Scoville scale
- about / Preparing data for use with kNN
sd() function / Measuring spread – variance and standard deviation, Correlations
semi-supervised learning
- about / Clustering as a machine learning task
sensitivity() function / Precision and recall
sensor / The origins of machine learning
separate-and-conquer
- about / Separate and conquer
seq() function / Measuring spread – quartiles and the five-number summary
Short Message Service (SMS) / Example – filtering mobile phone spam with the naive Bayes algorithm
sigmoid activation function
- about / Activation functions
sigmoid kernel / Using kernels for non-linear spaces
simple linear regression
- about / Simple linear regression
simple tuned model
- creating / Creating a simple tuned model
single-layer network
- about / The number of layers
skew / Visualizing numeric variables – histograms
slack variable / The case of non-linearly separable data
slope
- about / Understanding regression
sna package
- URL, for info / Working with social network data and graph data
snowfall package
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
snow package
- about / Networking multiple workstations with snow and snowfall
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
social network data
- working with / Working with social network data and graph data
Social Networking Service (SNS) / Finding teen market segments using k-means clustering
sparse matrix
- about / Data preparation – processing text data for analysis, Data preparation – creating a sparse matrix for transaction data
- creating, for transaction data / Data preparation – creating a sparse matrix for transaction data
specialized data
- working with / Working with specialized data
SQL databases
- data, importing from / Importing data from SQL databases
SQLite
- about / Importing data from SQL databases
sqlQuery() function / Importing data from SQL databases
stacking
- about / Understanding ensembles
standard deviation
- about / Measuring spread – variance and standard deviation
standard deviation reduction (SDR) / Adding regression to trees
stock models
- tuning, for better performance / Tuning stock models for better performance
stop words
- about / Data preparation – processing text data for analysis
str() function / Step 2 – exploring and preparing the data
- about / Exploring the structure of data
stringsAsFactors option / Data frames
subset() function / Working with classification prediction data in R
summary() function / Exploring numeric variables
summary statistics
- about / Exploring numeric variables
Sum of Squared Errors (SSE) / Step 3 – training a model on the data
supervised learning
- about / Thinking about types of machine learning algorithms
support vector machine (SVM)
- about / Bagging
Support Vector Machine (SVM)
- about / Understanding Support Vector Machines
- applications / Understanding Support Vector Machines
- classifications, with hyperplanes / Classification with hyperplanes
- maximum margin, finding / Finding the maximum margin
- OCR, performing with / Performing OCR with SVMs
support vectors / Finding the maximum margin
synapse
- about / From biological to artificial neurons

T

Tab-Separated Value (TSV)
- about / Importing and saving data from CSV files
table() function / Exploring categorical variables, Using confusion matrices to measure performance
target feature
- about / Thinking about types of machine learning algorithms
teen market segments search, with k-means clustering
- about / Finding teen market segments using k-means clustering
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data, Data preparation – dummy coding missing values, Data preparation – imputing missing values
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
threshold activation function
- about / Activation functions
tm package / Data preparation – processing text data for analysis
token
- about / Data preparation – processing text data for analysis
tokenization
- about / Data preparation – processing text data for analysis
topology
- about / Network topology
train() function / Using caret for automated parameter tuning, Creating a simple tuned model
trainControl() function / Customizing the tuning process
training
- about / Abstraction and knowledge representation
transaction data
- sparse matrix, creating for / Data preparation – creating a sparse matrix for transaction data
transpose
- about / Multiple linear regression
trees
- regression, adding to / Adding regression to trees
tree structure
- about / Understanding decision trees
trial
- about / Basic concepts of Bayesian methods
trivial rules / Step 4 – evaluating model performance
tuning process
- customizing / Customizing the tuning process
Turing test
- about / Understanding neural networks
two-way cross-tabulation
- about / Examining relationships – two-way cross-tabulations

U

UCI Machine Learning Data Repository
- URL / Step 1 – collecting data, Step 1 – collecting data
- about / Step 1 – collecting data
uniform distribution / Understanding numeric data – uniform and normal distributions
unimodal / Measuring the central tendency – the mode
unit of observation phrase / Thinking about the input data
unit step activation function
- about / Activation functions
univariate statistics
- about / Exploring relationships between variables
universal function approximator
- about / The number of nodes in each layer
unsupervised classification
- about / Clustering as a machine learning task
unsupervised learning
- about / Thinking about types of machine learning algorithms
usedcars.csv dataset
- about / Exploring and understanding data

V

var() function / Measuring spread – variance and standard deviation, Ordinary least squares estimation
variables
- relationships, exploring between / Exploring relationships between variables
variance
- about / Measuring spread – variance and standard deviation
vector
- about / R data structures, Vectors
vector types
- integer / Vectors
- numeric / Vectors
- character / Vectors
- logical / Vectors
Venn diagram
- about / Joint probability
Voronoi diagram / Using distance to assign and update clusters

W

web
- data, obtaining from / Getting data from the Web with the RCurl package
weighted voting process
- about / Choosing an appropriate k
wine quality estimation, with regression trees
- about / Example – estimating the quality of wines with regression trees and model trees
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- decision trees, visualizing / Visualizing decision trees
- model performance, evaluating / Step 4 – evaluating model performance
- performance, measuring with mean absolute error / Measuring performance with mean absolute error
- model performance, improving / Step 5 – improving model performance
word cloud
- about / Visualizing text data – word clouds

X

xlsx package
- about / Reading and writing Microsoft Excel spreadsheets using xlsx
- used, for reading Microsoft Excel spreadsheets / Reading and writing Microsoft Excel spreadsheets using xlsx
- used, for writing Microsoft Excel spreadsheets / Reading and writing Microsoft Excel spreadsheets using xlsx
- URL / Reading and writing Microsoft Excel spreadsheets using xlsx
XML
- about / Reading and writing XML with the XML package
- reading, XML package used / Reading and writing XML with the XML package
- writing, XML package used / Reading and writing XML with the XML package
XML package
- about / Reading and writing XML with the XML package
- used, for reading XML / Reading and writing XML with the XML package
- used, for writing XML / Reading and writing XML with the XML package
- URL, for info / Reading and writing XML with the XML package

Z

z-score standardization
- about / Preparing data for use with kNN
ZeroR
- about / The One Rule algorithm
