Index
A
- = assignment operator / Vectors
- abline() function / ROC curves
- abstraction process
- about / Abstraction and knowledge representation
- actionable associations / Step 4 – evaluating model performance
- activation function
- about / From biological to artificial neurons, Activation functions
- threshold activation function / Activation functions
- unit step activation function / Activation functions
- sigmoid activation function / Activation functions
- AdaBoost
- about / Boosting
- adaptive boosting
- about / Boosting the accuracy of decision trees
- aggregate() function / Step 5 – improving model performance
- aggregate function / Bagging
- apply() function / Data preparation – creating indicator features for frequent words
- appropriate k
- selecting / Choosing an appropriate k
- Apriori
- about / The Apriori algorithm for association rule learning
- apriori() function / Step 3 – training a model on the data
- Apriori algorithm
- for association rule learning / The Apriori algorithm for association rule learning
- strengths / The Apriori algorithm for association rule learning
- weaknesses / The Apriori algorithm for association rule learning
- Apriori principle
- used, for building set of rules / Building a set of rules with the Apriori principle
- Apriori property
- about / The Apriori algorithm for association rule learning
- array
- about / R data structures, Matrixes and arrays
- Artificial Neural Network (ANN)
- about / Understanding neural networks
- applications / Understanding neural networks
- association rules
- about / Understanding association rules
- potential applications / Understanding association rules
- rule interest, measuring / Measuring rule interest – support and confidence
- set of rules, building with Apriori principle / Building a set of rules with the Apriori principle
- frequently purchased groceries, identifying with / Example – identifying frequently purchased groceries with association rules
- automated parameter tuning
- caret package used / Using caret for automated parameter tuning
- requisites / Using caret for automated parameter tuning
- axon
- about / From biological to artificial neurons
B
- 0.632 bootstrap / Bootstrap sampling
- backpropagation
- neural networks, training with / Training neural networks with backpropagation
- about / Training neural networks with backpropagation
- backpropagation algorithm
- strengths / Training neural networks with backpropagation
- weaknesses / Training neural networks with backpropagation
- bag() function / Bagging
- bag-of-words / Step 2 – exploring and preparing the data
- bagging
- about / Bagging
- bagging() function
- about / Bagging
- bank loans example, with C5.0 decision trees
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- random training datasets, creating / Data preparation – creating random training and test datasets
- test datasets, creating / Data preparation – creating random training and test datasets
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- basic concepts, Bayesian methods
- about / Basic concepts of Bayesian methods
- probability / Probability
- joint probability / Joint probability
- conditional probability / Conditional probability with Bayes' theorem
- Bayesian classifiers
- uses / Understanding naive Bayes
- Bayesian methods
- about / Understanding naive Bayes
- basic concepts / Basic concepts of Bayesian methods
- benefits, machine learning / Uses and abuses of machine learning
- bias
- about / Generalization
- bias-variance tradeoff
- about / Choosing an appropriate k
- biganalytics package / Using massive matrices with bigmemory
- bigkmeans() function / Using massive matrices with bigmemory
- biglm() function
- about / Building bigger regression models with biglm
- biglm package
- about / Building bigger regression models with biglm
- regression model, building / Building bigger regression models with biglm
- bigmemory package
- about / Using massive matrices with bigmemory
- URL, for documentation / Using massive matrices with bigmemory
- massive matrices, using with / Using massive matrices with bigmemory
- bigrf package
- about / Growing bigger and faster random forests with bigrf
- random forests, building / Growing bigger and faster random forests with bigrf
- bimodal / Measuring the central tendency – the mode
- binning
- about / Using numeric features with naive Bayes
- bins
- about / Visualizing numeric variables – histograms
- Bioconductor project
- about / Working with bioinformatics data
- URL / Working with bioinformatics data
- bioinformatics data
- working with / Working with bioinformatics data
- bivariate relationships
- about / Exploring relationships between variables
- blind tasting experience example / The kNN algorithm
- body mass index (BMI) / Step 1 – collecting data
- boosting
- about / Boosting
- bootstrap aggregating
- about / Bagging
- bootstrap sampling / Bootstrap sampling
- box-and-whiskers plot
- about / Visualizing numeric variables – boxplots
- boxplot
- about / Visualizing numeric variables – boxplots
- boxplot() function / Visualizing numeric variables – boxplots
- branches
- about / Understanding decision trees
- breast cancer
- diagnosing, with kNN algorithm / Diagnosing breast cancer with the kNN algorithm
- breast cancer example
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance, Transformation – z-score standardization
C
- c() function / Vectors
- C5.0 algorithm
- about / The C5.0 decision tree algorithm
- strengths / The C5.0 decision tree algorithm
- weaknesses / The C5.0 decision tree algorithm
- split, selecting / Choosing the best split
- decision tree, pruning / Pruning the decision tree
- caret character / Ordinary least squares estimation
- caret package
- using, for automated parameter tuning / Using caret for automated parameter tuning
- about / Using caret for automated parameter tuning, Training and evaluating models in parallel with caret
- categorical variables
- about / Exploring categorical variables
- exploring / Exploring categorical variables
- central tendency, measuring / Measuring the central tendency – the mode
- cbind() function / Multiple linear regression
- central tendency
- measuring / Measuring the central tendency – mean and median
- centroid / Using distance to assign and update clusters
- characteristics, neural networks
- activation function / From biological to artificial neurons, Activation functions
- network topology / From biological to artificial neurons, Network topology, The number of layers, The direction of information travel, The number of nodes in each layer
- training algorithm / From biological to artificial neurons, Training neural networks with backpropagation
- character vectors
- about / Factors
- Chi-Squared statistic
- about / Choosing the best split
- classification
- about / Thinking about types of machine learning algorithms
- nearest neighbors used / Understanding classification using nearest neighbors
- classification performance
- measuring / Measuring performance for classification
- classification prediction data
- working with / Working with classification prediction data in R
- classification rules
- about / Understanding classification rules
- separate-and-conquer / Separate and conquer
- One Rule algorithm / The One Rule algorithm
- RIPPER algorithm / The RIPPER algorithm
- obtaining, from decision trees / Rules from decision trees
- cluster
- about / Understanding clustering
- clustering
- about / Thinking about types of machine learning algorithms, Understanding clustering
- applications / Understanding clustering
- as machine learning task / Clustering as a machine learning task
- clustering, k-means algorithm
- about / The k-means algorithm for clustering
- distance, used for assigning cluster / Using distance to assign and update clusters
- distance, used for updating cluster / Using distance to assign and update clusters
- appropriate number of clusters, selecting / Choosing the appropriate number of clusters
- clusters / Learning faster with parallel computing
- column-major order
- about / Matrixes and arrays
- combine function / Vectors
- components, machine learning
- generalization / Generalization
- success of learning, assessing / Assessing the success of learning
- components, machine learning
- data input / How do machines learn?
- abstraction / How do machines learn?, Abstraction and knowledge representation
- generalization / How do machines learn?
- knowledge representation / Abstraction and knowledge representation
- Comprehensive R Archive Network (CRAN)
- about / Using R for machine learning
- concrete strength, modeling with ANNs
- about / Modeling the strength of concrete with ANNs
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- conditional probability
- about / Conditional probability with Bayes' theorem
- confusion matrix
- about / A closer look at confusion matrices
- used, for measuring performance / Using confusion matrices to measure performance
- contingency table
- about / Examining relationships – two-way cross-tabulations
- convex hull / The case of linearly separable data
- cor() command / Exploring relationships among features – the correlation matrix
- cor() function / Correlations
- corpus
- about / Data preparation – processing text data for analysis
- Corpus() function / Data preparation – processing text data for analysis
- correlation / Visualizing relationships – scatterplots
- about / Correlations
- correlation ellipse / Visualizing relationships among features – the scatterplot matrix
- correlation matrix / Exploring relationships among features – the correlation matrix
- cov() function / Ordinary least squares estimation, Correlations
- covariance
- about / Ordinary least squares estimation
- createDataPartition() function / The holdout method
- cross-validation / Cross-validation
- crosstab
- about / Examining relationships – two-way cross-tabulations
- CrossTable() function / Examining relationships – two-way cross-tabulations, Using confusion matrices to measure performance
- CSV files
- data, importing from / Importing and saving data from CSV files
- about / Importing and saving data from CSV files
- loading, into R / Importing and saving data from CSV files
- CUDA
- about / GPU computing
- curve() function / Choosing the best split
- cut points
- about / Using numeric features with naive Bayes
D
- data
- machine learning algorithm, applying to / Steps to apply machine learning to your data
- managing, with R / Managing data with R
- importing, from CSV files / Importing and saving data from CSV files
- importing, from SQL databases / Importing data from SQL databases
- about / Working with specialized data
- obtaining, from web / Getting data from the Web with the RCurl package
- data.frame() function / Data frames
- data.table package
- about / Making data frames faster with data.table
- data dictionary
- about / Exploring the structure of data
- data exploration
- about / Exploring and understanding data
- data frame
- about / R data structures, Data frames
- making faster, with data.table package / Making data frames faster with data.table
- data mining
- about / The origins of machine learning
- data munging
- about / Working with specialized data
- data preparation, breast cancer example
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
- data structures, R
- about / R data structures
- vector / Vectors
- factor / Factors, Lists
- data frame / Data frames
- matrix / Matrixes and arrays
- array / Matrixes and arrays
- saving / Saving and loading R data structures
- loading / Saving and loading R data structures
- exploring / Exploring the structure of data
- DBMS
- about / Importing data from SQL databases
- decision nodes
- about / Understanding decision trees
- decision tree
- about / Understanding decision trees, Example – identifying risky bank loans using C5.0 decision trees
- potential uses / Understanding decision trees
- divide-and-conquer / Divide and conquer
- pruning / Pruning the decision tree
- used, for identifying risky bank loans / Example – identifying risky bank loans using C5.0 decision trees, Step 1 – collecting data
- accuracy, boosting / Boosting the accuracy of decision trees
- decision tree forests
- about / Random forests
- decision trees
- classification rules, obtaining from / Rules from decision trees
- deep learning / The direction of information travel
- delimiter
- about / Importing and saving data from CSV files
- dendrites
- about / From biological to artificial neurons
- dependent events
- about / Joint probability
- dependent variable / Visualizing relationships – scatterplots
- about / Understanding regression
- descriptive model
- about / Thinking about types of machine learning algorithms
- diff() function / Measuring spread – quartiles and the five-number summary
- disk-based data frames
- creating, with ff package / Creating disk-based data frames with ff
- distance function
- about / Calculating distance
- divide-and-conquer
- about / Divide and conquer
- DSN
- about / Importing data from SQL databases
- dummy coding
- about / Preparing data for use with kNN
E
- e1071 package
- naive Bayes classification, with naiveBayes() function / Step 3 – training a model on the data
- elbow method / Choosing the appropriate number of clusters
- elbow point / Choosing the appropriate number of clusters
- elements
- about / Vectors
- ensemble methods
- bagging / Bagging
- boosting / Boosting
- random forests / Random forests
- ensembles
- about / Understanding ensembles
- advantages / Understanding ensembles
- entropy
- about / Choosing the best split
- epoch / Training neural networks with backpropagation
- ethical considerations, machine learning / Ethical considerations
- Euclidean distance
- about / Calculating distance
- Euclidean norm / The case of linearly separable data
- events
- about / Basic concepts of Bayesian methods
- example
- about / Thinking about the input data
F
- 10-fold cross-validation
- about / Cross-validation
- F-measure
- about / The F-measure
- Facebook / Finding teen market segments using k-means clustering
- factor
- about / R data structures, Factors, Lists
- creating, from character vector / Factors
- factor() function / Factors
- feature
- about / Thinking about the input data
- feedforward networks
- about / The direction of information travel
- ff package
- about / Creating disk-based data frames with ff
- used, for creating disk-based data frames / Creating disk-based data frames with ff
- five-number summary
- about / Measuring spread – quartiles and the five-number summary
- foreach package
- about / Working in parallel with foreach, Training and evaluating models in parallel with caret
- frequently purchased groceries
- identifying, with association rules / Example – identifying frequently purchased groceries with association rules
- future performance
- estimating / Estimating future performance
- future performance estimation
- holdout method / The holdout method
- cross-validation / Cross-validation
- bootstrap sampling / Bootstrap sampling
G
- gain ratio
- about / Choosing the best split
- Gaussian Radial Basis Function (RBF) kernel / Using kernels for non-linear spaces
- generalization
- about / Generalization
- generalized linear models (GLM)
- about / Understanding regression
- Gini index
- about / Choosing the best split
- GPU computing
- about / GPU computing
- gputools package
- about / GPU computing
- gradient descent
- about / Training neural networks with backpropagation
- graph data
- working with / Working with social network data and graph data
- greedy learners
- about / Separate and conquer
- grid
- about / Using caret for automated parameter tuning
H
- Hadoop
- parallel computing / Parallel cloud computing with MapReduce and Hadoop
- header line
- about / Importing and saving data from CSV files
- heuristics
- about / Generalization
- hidden layers
- about / The number of layers
- hist() function / Visualizing numeric variables – histograms
- histogram
- about / Visualizing numeric variables – histograms
- holdout method / The holdout method
- human brain / Understanding neural networks
- hyperplane / Understanding Support Vector Machines
I
- imputation / Data preparation – imputing missing values
- Incremental Reduced Error Pruning algorithm (IREP) / The RIPPER algorithm
- independent events
- about / Joint probability
- independent variables
- about / Understanding regression
- information gain / Choosing the best split
- Input Nodes / The number of layers
- installation, R package / Installing an R package
- instance-based learning
- about / Why is the kNN algorithm lazy?
- interaction
- about / Model specification – adding interaction effects
- intercept
- about / Understanding regression
- interquartile range (IQR) / Measuring spread – quartiles and the five-number summary
- ipred package
- about / Bagging
- IQR() function / Measuring spread – quartiles and the five-number summary
- itemFrequencyPlot() function / Visualizing item support – item frequency plots
- itemset
- about / Understanding association rules
J
- joint probability
- about / Joint probability
- JRip() classifier / Step 5 – improving model performance
- JSON
- about / Reading and writing JSON with the rjson package
- reading, rjson package used / Reading and writing JSON with the rjson package
- writing, rjson package used / Reading and writing JSON with the rjson package
- converting, to R / Reading and writing JSON with the rjson package
- JSON format
- URL / Reading and writing JSON with the rjson package
K
- k-means algorithm
- about / The k-means algorithm for clustering
- strengths / The k-means algorithm for clustering
- weaknesses / The k-means algorithm for clustering
- kappa statistic
- about / The kappa statistic
- kernels
- using, for non-linear spaces / Using kernels for non-linear spaces
- kernel trick / Using kernels for non-linear spaces
- kernlab package / Bagging
- kmeans() function / Step 3 – training a model on the data
- knn() function / Step 3 – training a model on the data
- kNN algorithm
- about / The kNN algorithm, Step 3 – training a model on the data
- strengths / The kNN algorithm
- weaknesses / The kNN algorithm
- distance, calculating / Calculating distance
- appropriate k, selecting / Choosing an appropriate k
- data, preparing / Preparing data for use with kNN
- used, for diagnosing breast cancer / Diagnosing breast cancer with the kNN algorithm
- knowledge representation
- about / Abstraction and knowledge representation
- ksvm() function / Bagging
L
- Laplace estimator
- about / The Laplace estimator
- lapply() function / Transformation – normalizing numeric data, Transformation – z-score standardization
- large datasets
- managing / Managing very large datasets
- large datasets management
- about / Managing very large datasets
- data frames, making faster with data.table package / Making data frames faster with data.table
- disk-based data frames, creating with ff package / Creating disk-based data frames with ff
- massive matrices, using with bigmemory package / Using massive matrices with bigmemory
- layers
- about / The number of layers
- lazy learning algorithms / Why is the kNN algorithm lazy?
- leaf nodes
- about / Understanding decision trees
- learning, with parallel computing
- about / Learning faster with parallel computing
- execution time, measuring / Measuring execution time
- working, in parallel with foreach / Working in parallel with foreach
- multitasking operating system, using with multicore package / Using a multitasking operating system with multicore
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
- parallel cloud computing, with MapReduce / Parallel cloud computing with MapReduce and Hadoop
- parallel cloud computing, with Hadoop / Parallel cloud computing with MapReduce and Hadoop
- learning rate / Training neural networks with backpropagation
- left hand side (LHS) / Step 4 – evaluating model performance
- levels
- about / Thinking about types of machine learning algorithms
- likelihood
- about / Conditional probability with Bayes' theorem
- likelihood table
- about / Conditional probability with Bayes' theorem
- linear kernel / Using kernels for non-linear spaces
- linearly separable / Classification with hyperplanes
- linear regression
- about / Understanding regression
- link function
- about / Understanding regression
- list() function / Lists
- lists
- about / R data structures
- lm() function
- about / Building bigger regression models with biglm
- load() command / Saving and loading R data structures
- loess smooth / Visualizing relationships among features – the scatterplot matrix
- logistic regression
- about / Understanding regression
M
- M5' algorithm (M5-prime) / Step 5 – improving model performance
- machine learning
- origins / The origins of machine learning
- benefits / Uses and abuses of machine learning
- ethical considerations / Ethical considerations
- about / How do machines learn?
- applying, to data / Steps to apply machine learning to your data
- R, using / Using R for machine learning
- machine learning algorithm
- about / The origins of machine learning, Uses and abuses of machine learning
- selecting / Choosing a machine learning algorithm
- data, matching / Matching your data to an appropriate algorithm
- machine learning algorithms
- input training data / Thinking about the input data
- types / Thinking about types of machine learning algorithms
- Manhattan distance
- about / Calculating distance
- MapReduce programming model
- about / Parallel cloud computing with MapReduce and Hadoop
- parallel computing / Parallel cloud computing with MapReduce and Hadoop
- marginal likelihood
- about / Conditional probability with Bayes' theorem
- market basket analysis example
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
- item support, visualizing / Visualizing item support – item frequency plots
- transaction data, visualizing / Visualizing transaction data – plotting the sparse matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- set of association rules, sorting / Sorting the set of association rules, Taking subsets of association rules
- association rules, saving to file / Saving association rules to a file or data frame
- massive matrices
- using, with bigmemory package / Using massive matrices with bigmemory
- matrix
- about / Matrixes and arrays
- matrix() function / Matrixes and arrays
- Maximum Margin Hyperplane (MMH)
- about / Finding the maximum margin
- case, of linearly separable data / The case of linearly separable data
- case, of non-linearly separable data / The case of non-linearly separable data
- mclapply() function / Using a multitasking operating system with multicore
- mean
- about / Measuring the central tendency – mean and median
- mean() function / Measuring the central tendency – mean and median, Ordinary least squares estimation
- mean absolute error (MAE) / Measuring performance with mean absolute error
- median
- about / Measuring the central tendency – mean and median
- median() function / Measuring the central tendency – mean and median
- medical expenses, predicting with linear regression
- about / Example – predicting medical expenses using linear regression
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- correlation matrix / Exploring relationships among features – the correlation matrix
- relationships, exploring among features / Exploring relationships among features – the correlation matrix
- relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
- scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance, Transformation – converting a numeric variable to a binary indicator, Putting it all together – an improved regression model
- meta-learning methods
- about / Improving model performance with meta-learning
- used, for improving model performance / Improving model performance with meta-learning
- Microsoft Excel / Importing and saving data from CSV files
- Microsoft Excel spreadsheets
- reading, xlsx package used / Reading and writing Microsoft Excel spreadsheets using xlsx
- writing, xlsx package used / Reading and writing Microsoft Excel spreadsheets using xlsx
- Microsoft SQL
- about / Importing data from SQL databases
- min-max normalization
- about / Preparing data for use with kNN
- Mobile Phone Spam
- filtering, with naive Bayes algorithm / Example – filtering mobile phone spam with the naive Bayes algorithm
- Mobile Phone Spam example
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- text data, processing for analysis / Data preparation – processing text data for analysis
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
- text data, visualizing / Visualizing text data – word clouds
- indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- mode
- about / Measuring the central tendency – the mode
- mode() function / Measuring the central tendency – the mode
- model
- about / Abstraction and knowledge representation
- model performance
- improving, with meta-learning / Improving model performance with meta-learning
- model performance, breast cancer example
- z-score standardization / Transformation – z-score standardization
- alternative values of k, testing / Testing alternative values of k
- model trees
- about / Understanding regression trees and model trees
- multicore package
- about / Using a multitasking operating system with multicore
- multidimensional feature space / The kNN algorithm
- multilayer network
- about / The number of layers
- Multilayer Perceptron (MLP)
- about / The direction of information travel
- multimodal / Measuring the central tendency – the mode
- multiple linear regression
- about / Multiple linear regression
- strengths / Multiple linear regression
- weaknesses / Multiple linear regression
- multiple workstations
- networking, with snow package / Networking multiple workstations with snow and snowfall
- networking, with snowfall package / Networking multiple workstations with snow and snowfall
- multitasking operating system
- using, with multicore package / Using a multitasking operating system with multicore
- multivariate relationships
- about / Exploring relationships between variables
- MySpace / Finding teen market segments using k-means clustering
- MySQL
- about / Importing data from SQL databases
N
- naive Bayes
- numeric features, using with / Using numeric features with naive Bayes
- naive Bayes algorithm
- about / Understanding naive Bayes, The naive Bayes algorithm
- strengths / The naive Bayes algorithm
- weaknesses / The naive Bayes algorithm
- naive Bayes classification / The naive Bayes classification
- Laplace estimator / The Laplace estimator
- used, for filtering Mobile Phone Spam / Example – filtering mobile phone spam with the naive Bayes algorithm
- naive Bayes classification
- about / The naive Bayes classification
- naiveBayes() function, using in e1071 package / Step 3 – training a model on the data
- nearest neighbor classifiers
- about / Understanding classification using nearest neighbors
- network package
- about / Working with social network data and graph data
- URL, for info / Working with social network data and graph data
- network topology
- about / Network topology
- number of layers / The number of layers
- direction, of information travel / The direction of information travel
- number of nodes, in each layer / The number of nodes in each layer
- neural networks
- about / Understanding neural networks
- biological, to artificial neurons / From biological to artificial neurons
- characteristics / From biological to artificial neurons
- training, with backpropagation / Training neural networks with backpropagation
- neurons
- about / Understanding neural networks
- No Free Lunch theorem
- about / Choosing a machine learning algorithm
- nominal variables
- about / Factors
- non-linearly separable data / The case of non-linearly separable data
- non-linear spaces
- kernels, using for / Using kernels for non-linear spaces
- normal distributions / Understanding numeric data – uniform and normal distributions
- normalize() function / Transformation – normalizing numeric data
- numeric data
- about / Understanding numeric data – uniform and normal distributions
- normalizing / Transformation – normalizing numeric data
- numeric features
- using, with naive Bayes / Using numeric features with naive Bayes
- numeric prediction
- about / Thinking about types of machine learning algorithms
- numeric variables
- about / Exploring numeric variables
- exploring / Exploring numeric variables
- central tendency, measuring / Measuring the central tendency – mean and median
- spread, measuring / Measuring spread – quartiles and the five-number summary
- visualizing / Visualizing numeric variables – boxplots, Visualizing numeric variables – histograms
O
- <- operator / Vectors
- OCR, performing with SVMs
- about / Performing OCR with SVMs
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- ODBC
- about / Importing data from SQL databases
- odbcConnect() function / Importing data from SQL databases
- one-way table / Exploring categorical variables
- One Rule algorithm
- about / The One Rule algorithm
- strengths / The One Rule algorithm
- weaknesses / The One Rule algorithm
- optimized learning algorithms
- deploying / Deploying optimized learning algorithms
- optimized learning algorithms deployment
- regression models, building with biglm package / Building bigger regression models with biglm
- random forests, building with bigrf package / Growing bigger and faster random forests with bigrf
- caret package, used for evaluating models in parallel / Training and evaluating models in parallel with caret
- Oracle
- about / Importing data from SQL databases
- order() function / Data preparation – creating random training and test datasets
- ordinary least squares (OLS) / Ordinary least squares estimation
- ordinary least squares estimation
- about / Ordinary least squares estimation
- out-of-bag error rate
- about / Training random forests
- Output Node / The number of layers
- overfitting
- about / Assessing the success of learning
P
- pairs() function / Visualizing relationships among features – the scatterplot matrix
- parallel computing methods
- about / Learning faster with parallel computing
- parameter estimates
- about / Simple linear regression
- parameter tuning
- about / Tuning stock models for better performance
- pattern discovery
- about / Thinking about types of machine learning algorithms
- Pearson's Chi-squared test / Examining relationships – two-way cross-tabulations
- Pearson's correlation
- about / Correlations
- performance
- measuring, confusion matrices used / Using confusion matrices to measure performance
- improving, of R / Improving the performance of R
- performance() function / ROC curves
- performance measures
- about / Beyond accuracy – other measures of performance
- kappa statistic / The kappa statistic
- sensitivity / Sensitivity and specificity
- specificity / Sensitivity and specificity
- precision / Precision and recall
- recall / Precision and recall
- F-measure / The F-measure
- performance tradeoffs
- visualizing / Visualizing performance tradeoffs
- plot() command / ROC curves
- plot() function / Visualizing relationships – scatterplots
- point-and-click interface
- used, for installing R package / Installing a package using the point-and-click interface
- poisonous mushrooms
- identifying, with rule learners / Example – identifying poisonous mushrooms with rule learners
- poisonous mushrooms example, with rule learners
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- Poisson regression
- about / Understanding regression
- polynomial kernel / Using kernels for non-linear spaces
- posPredValue() function / Precision and recall
- posterior probability
- about / Conditional probability with Bayes' theorem
- PostgreSQL
- about / Importing data from SQL databases
- postpruning
- about / Pruning the decision tree
- precision
- about / Precision and recall
- pred function / Bagging
- predict() function / Working with classification prediction data in R, Creating a simple tuned model
- about / Bagging
- predictive model
- about / Thinking about types of machine learning algorithms
- prepruning
- about / Pruning the decision tree
- prior probability
- about / Conditional probability with Bayes' theorem
- probability
- about / Probability
Q
- quadratic optimization / The case of linearly separable data
- quantile() function / Measuring spread – quartiles and the five-number summary
- quartiles
- about / Measuring spread – quartiles and the five-number summary
R
- 68-95-99.7 rule / Measuring spread – variance and standard deviation
- R
- using, for machine learning / Using R for machine learning
- data structures / R data structures
- used, for managing data / Managing data with R
- CSV file, loading into / Importing and saving data from CSV files
- working with classification prediction data / Working with classification prediction data in R
- JSON, converting to / Reading and writing JSON with the rjson package
- performance, improving / Improving the performance of R
- Radial Basis Function (RBF) network
- about / Activation functions
- randomForest() function
- about / Training random forests
- randomForest package / Training random forests
- random forests
- about / Random forests
- strengths / Random forests
- weaknesses / Random forests
- training / Training random forests
- performance, evaluating / Evaluating random forest performance
- range
- about / Measuring spread – quartiles and the five-number summary
- range() function / Measuring spread – quartiles and the five-number summary
- RCurl package
- about / Getting data from the Web with the RCurl package
- used, for obtaining data from web / Getting data from the Web with the RCurl package
- URL, for documentation / Getting data from the Web with the RCurl package
- real-world data
- about / Working with specialized data
- recall / Precision and recall
- recurrent network
- about / The direction of information travel
- recursive partitioning
- about / Divide and conquer
- reg() function / Multiple linear regression
- regression
- about / Understanding regression
- simple linear regression / Simple linear regression
- ordinary least squares estimation / Ordinary least squares estimation
- correlation / Correlations
- multiple linear regression / Multiple linear regression
- adding, to trees / Adding regression to trees
- regression analysis
- use cases / Understanding regression
- regression equations
- about / Understanding regression
- regression models
- building, with biglm package / Building bigger regression models with biglm
- regression trees
- about / Understanding regression trees and model trees
- strengths / Adding regression to trees
- weaknesses / Adding regression to trees
- relationships
- exploring, between variables / Exploring relationships between variables
- visualizing / Visualizing relationships – scatterplots
- examining / Examining relationships – two-way cross-tabulations
- residuals
- about / Ordinary least squares estimation
- resubstitution error / Estimating future performance
- RHIPE package / Parallel cloud computing with MapReduce and Hadoop
- right hand side (RHS) / Step 4 – evaluating model performance
- RIPPER algorithm
- about / The RIPPER algorithm
- strengths / The RIPPER algorithm
- weaknesses / The RIPPER algorithm
- risky bank loans
- identifying, C5.0 decision trees used / Example – identifying risky bank loans using C5.0 decision trees, Step 1 – collecting data
- rjson package
- about / Reading and writing JSON with the rjson package
- used, for reading JSON / Reading and writing JSON with the rjson package
- used, for writing JSON / Reading and writing JSON with the rjson package
- rmr package
- about / Parallel cloud computing with MapReduce and Hadoop
- ROC curve
- about / ROC curves
- creating / ROC curves
- ROCR package
- about / Visualizing performance tradeoffs
- RODBC package
- about / Importing data from SQL databases
- rote learning
- about / Why is the kNN algorithm lazy?
- round() function / Exploring categorical variables
- R package
- installing / Installing an R package
- installing, point-and-click interface used / Installing a package using the point-and-click interface
- loading / Loading an R package
- R performance
- large datasets, managing / Managing very large datasets
- learning, with parallel computing / Learning faster with parallel computing
- GPU computing / GPU computing
- optimized learning algorithms, deploying / Deploying optimized learning algorithms
- rudimentary ANNs / Understanding neural networks
- runif() function / Data preparation – creating random training and test datasets
- RWeka package
- using / Installing and loading R packages
- loading / Loading an R package
S
- save() function / Saving and loading R data structures
- scale() function / Transformation – z-score standardization
- scatterplot
- about / Visualizing relationships – scatterplots
- Scoville scale
- about / Preparing data for use with kNN
- sd() function / Measuring spread – variance and standard deviation, Correlations
- semi-supervised learning
- about / Clustering as a machine learning task
- sensitivity() function / Precision and recall
- sensor / The origins of machine learning
- separate-and-conquer
- about / Separate and conquer
- seq() function / Measuring spread – quartiles and the five-number summary
- Short Message Service (SMS) / Example – filtering mobile phone spam with the naive Bayes algorithm
- sigmoid activation function
- about / Activation functions
- sigmoid kernel / Using kernels for non-linear spaces
- simple linear regression
- about / Simple linear regression
- simple tuned model
- creating / Creating a simple tuned model
- single-layer network
- about / The number of layers
- skew / Visualizing numeric variables – histograms
- slack variable / The case of non-linearly separable data
- slope
- about / Understanding regression
- sna package
- URL, for info / Working with social network data and graph data
- snowfall package
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
- snow package
- about / Networking multiple workstations with snow and snowfall
- multiple workstations, networking / Networking multiple workstations with snow and snowfall
- social network data
- working with / Working with social network data and graph data
- Social Networking Service (SNS) / Finding teen market segments using k-means clustering
- sparse matrix
- about / Data preparation – processing text data for analysis, Data preparation – creating a sparse matrix for transaction data
- creating, for transaction data / Data preparation – creating a sparse matrix for transaction data
- specialized data
- working with / Working with specialized data
- SQL databases
- data, importing from / Importing data from SQL databases
- SQLite
- about / Importing data from SQL databases
- sqlQuery() function / Importing data from SQL databases
- stacking
- about / Understanding ensembles
- standard deviation
- about / Measuring spread – variance and standard deviation
- standard deviation reduction (SDR) / Adding regression to trees
- stock models
- tuning, for better performance / Tuning stock models for better performance
- stop words
- about / Data preparation – processing text data for analysis
- str() function
- about / Exploring the structure of data
- stringsAsFactors option / Data frames
- subset() function / Working with classification prediction data in R
- summary() function / Exploring numeric variables
- summary statistics
- about / Exploring numeric variables
- Sum of Squared Errors (SSE) / Step 3 – training a model on the data
- supervised learning
- about / Thinking about types of machine learning algorithms
- support vector machine (SVM)
- about / Bagging
- Support Vector Machine (SVM)
- about / Understanding Support Vector Machines
- applications / Understanding Support Vector Machines
- classifications, with hyperplanes / Classification with hyperplanes
- maximum margin, finding / Finding the maximum margin
- OCR, performing with / Performing OCR with SVMs
- support vectors / Finding the maximum margin
- synapse
- about / From biological to artificial neurons
T
- Tab-Separated Value (TSV)
- about / Importing and saving data from CSV files
- table() function / Exploring categorical variables, Using confusion matrices to measure performance
- target feature
- about / Thinking about types of machine learning algorithms
- teen market segments search, with k-means clustering
- about / Finding teen market segments using k-means clustering
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data, Data preparation – dummy coding missing values, Data preparation – imputing missing values
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- threshold activation function
- about / Activation functions
- tm package / Data preparation – processing text data for analysis
- token
- about / Data preparation – processing text data for analysis
- tokenization
- about / Data preparation – processing text data for analysis
- topology
- about / Network topology
- train() function / Using caret for automated parameter tuning, Creating a simple tuned model
- trainControl() function / Customizing the tuning process
- training
- about / Abstraction and knowledge representation
- transaction data
- sparse matrix, creating for / Data preparation – creating a sparse matrix for transaction data
- transpose
- about / Multiple linear regression
- trees
- regression, adding to / Adding regression to trees
- tree structure
- about / Understanding decision trees
- trial
- about / Basic concepts of Bayesian methods
- trivial rules / Step 4 – evaluating model performance
- tuning process
- customizing / Customizing the tuning process
- Turing test
- about / Understanding neural networks
- two-way cross-tabulation
- about / Examining relationships – two-way cross-tabulations
U
- UCI Machine Learning Data Repository
- URL / Step 1 – collecting data, Step 1 – collecting data
- about / Step 1 – collecting data
- uniform distribution / Understanding numeric data – uniform and normal distributions
- unimodal / Measuring the central tendency – the mode
- unit of observation phrase / Thinking about the input data
- unit step activation function
- about / Activation functions
- univariate statistics
- about / Exploring relationships between variables
- universal function approximator
- about / The number of nodes in each layer
- unsupervised classification
- about / Clustering as a machine learning task
- unsupervised learning
- about / Thinking about types of machine learning algorithms
- usedcars.csv dataset
- about / Exploring and understanding data
V
- var() function / Measuring spread – variance and standard deviation, Ordinary least squares estimation
- variables
- relationships, exploring between / Exploring relationships between variables
- variance
- about / Measuring spread – variance and standard deviation
- vector
- about / R data structures, Vectors
- vector types
- integer / Vectors
- numeric / Vectors
- character / Vectors
- logical / Vectors
- Venn diagram
- about / Joint probability
- Voronoi diagram / Using distance to assign and update clusters
W
- web
- data, obtaining from / Getting data from the Web with the RCurl package
- weighted voting process
- about / Choosing an appropriate k
- wine quality estimation, with regression trees
- about / Example – estimating the quality of wines with regression trees and model trees
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- decision trees, visualizing / Visualizing decision trees
- model performance, evaluating / Step 4 – evaluating model performance
- performance, measuring with mean absolute error / Measuring performance with mean absolute error
- model performance, improving / Step 5 – improving model performance
- word cloud
- about / Visualizing text data – word clouds
X
- xlsx package
- about / Reading and writing Microsoft Excel spreadsheets using xlsx
- used, for reading Microsoft Excel spreadsheets / Reading and writing Microsoft Excel spreadsheets using xlsx
- used, for writing Microsoft Excel spreadsheets / Reading and writing Microsoft Excel spreadsheets using xlsx
- URL / Reading and writing Microsoft Excel spreadsheets using xlsx
- XML
- about / Reading and writing XML with the XML package
- reading, XML package used / Reading and writing XML with the XML package
- writing, XML package used / Reading and writing XML with the XML package
- XML package
- about / Reading and writing XML with the XML package
- used, for reading XML / Reading and writing XML with the XML package
- used, for writing XML / Reading and writing XML with the XML package
- URL, for info / Reading and writing XML with the XML package
Z
- z-score standardization
- about / Preparing data for use with kNN
- ZeroR
- about / The One Rule algorithm