What this book covers
Chapter 1, Gearing Up for Predictive Modeling, helps you get set up before looking at individual models and case studies. It then describes the process of predictive modeling as a series of steps and introduces several fundamental distinctions.
Chapter 2, Tidying Data and Measuring Performance, covers performance metrics, learning curves, and a process for tidying data.
Chapter 3, Linear Regression, explains the classic starting point for predictive modeling. It starts from the simplest single-variable model, moves on to multiple regression and overfitting, and then describes regularized extensions of linear regression.
Chapter 4, Generalized Linear Models, follows on from linear regression. It introduces logistic regression as a form of binary classification, extends this to multinomial logistic regression, and uses these models as a platform to present the concepts of sensitivity and specificity.
Chapter 5, Neural Networks, explains that the logistic regression model can be seen as a single-layer perceptron. This chapter discusses neural networks as an extension of this idea, describes their origins, and explores their power.
Chapter 6, Support Vector Machines, covers a method that transforms data into a different space using a kernel function and attempts to find a decision boundary that maximizes the margin between the classes.
Chapter 7, Tree-Based Methods, presents several popular tree-based methods, such as decision trees and the well-known C5.0 algorithm. Regression trees are also covered, as are random forests, which link to the notion of bagging introduced in Chapter 9, Ensemble Methods. Cross-validation methods for evaluating predictors are presented in the context of these tree-based methods.
Chapter 8, Dimensionality Reduction, covers principal component analysis (PCA), independent component analysis (ICA), factor analysis, and non-negative matrix factorization.
Chapter 9, Ensemble Methods, discusses methods for combining either many different predictors or multiple trained versions of the same predictor. This chapter introduces the important notions of bagging and boosting, and shows how to use the AdaBoost algorithm to improve on the performance that a single classifier achieved on one of the previously analyzed datasets.
Chapter 10, Probabilistic Graphical Models, opens with a discussion of conditional probability and Bayes' rule, and then introduces the Naive Bayes classifier as the simplest graphical model. The Naive Bayes classifier is showcased in the context of sentiment analysis. Hidden Markov Models are also introduced and demonstrated through the task of next-word prediction.
Chapter 11, Topic Modeling, provides step-by-step instructions for making predictions with topic models. It also demonstrates methods of dimensionality reduction to summarize and simplify the data.
Chapter 12, Recommendation Systems, explores different approaches to building recommender systems in R, using nearest-neighbor approaches, clustering, and algorithms such as collaborative filtering.
Chapter 13, Scaling Up, explains how to work with very large datasets, including worked examples of training some of the models covered in earlier chapters at that scale.
Chapter 14, Deep Learning, tackles the important topic of deep learning, using examples such as word embeddings and recurrent neural networks (RNNs).