Regression Analysis with R: Design and develop statistical nodes to identify unique relationships within data at scale

Product type Paperback

Published in Jan 2018

Publisher Packt

ISBN-13 9781788627306

Length 422 pages

Edition 1st Edition

Author (1):

Giuseppe Ciaburro

R packages for regression

Previously, we mentioned R packages, which give us access to collections of functions for solving specific problems. In this section, we present some packages that contain valuable resources for regression analysis. These packages are analyzed in detail in the following chapters, where we provide practical applications.

The R stats package

R stats is a package that contains many useful functions for statistical calculations and random number generation. The following table lists some information about this package:

Package	`stats`
Date	October 3, 2017
Version	3.5.0
Title	The R stats package
Author	R core team and contributors worldwide

The package contains a large number of functions, so we mention only those most relevant to regression analysis:

lm: This function is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance, and analysis of covariance.
summary.lm: This function returns a summary for linear model fits.
coef: This function extracts coefficients from objects returned by modeling functions. coefficients is an alias for it.
fitted: This function extracts fitted values from objects returned by modeling functions. fitted.values is an alias for it.
formula: This function provides a way of extracting formulae which have been included in other objects.
predict: This function predicts values based on linear model objects.
residuals: This function extracts model residuals from objects returned by modeling functions.
confint: This function computes confidence intervals for one or more parameters in a fitted model. The stats package provides a method for objects inheriting from the lm class.
deviance: This function returns the deviance of a fitted model object.
influence.measures: This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models (GLM).
lm.influence: This function provides the basic quantities used when forming a wide variety of diagnostics for checking the quality of regression fits.
ls.diag: This function computes basic statistics, including standard errors, t-values, and p-values for the regression coefficients.
glm: This function is used to fit GLMs, specified by giving a symbolic description of the linear predictor and a description of the error distribution.
loess: This function fits a polynomial surface determined by one or more numerical predictors, using local fitting.
loess.control: This function sets control parameters for loess fits.
predict.loess: This function extracts predictions from a loess fit, optionally with standard errors.
scatter.smooth: This function plots and adds a smooth curve computed by loess to a scatter plot.
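
As a minimal sketch of how several of the functions listed above fit together, the following example fits a linear model and then applies the extractor functions to it. The built-in mtcars dataset, the choice of predictors, and the new_cars data frame are assumptions made only for illustration; they do not come from the book's own examples.

```r
# Fit a linear model on the built-in mtcars data
# (dataset and predictors are illustrative assumptions).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() dispatches to summary.lm for lm objects
fit_summary <- summary(fit)
print(fit_summary)

# Extractor functions from the list above
betas <- coef(fit)                    # estimated coefficients
yhat <- fitted(fit)                   # fitted values
res <- residuals(fit)                 # residuals (observed minus fitted)
ci <- confint(fit, level = 0.95)      # 95% confidence intervals
model_formula <- formula(fit)         # formula stored in the fit
dev <- deviance(fit)                  # residual sum of squares for an lm fit

# Predictions for two hypothetical cars
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
preds <- predict(fit, newdata = new_cars, interval = "confidence")
print(preds)

# Leave-one-out deletion diagnostics (DFBETAS, DFFITS, Cook's distance, ...)
infl <- influence.measures(fit)
```

For an lm fit, deviance returns the residual sum of squares, so it should agree with sum(res^2).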

What we have listed are just some of the many functions contained in the stats package. With the resources offered by this package, we can build a linear regression model as well as GLMs (a family that includes multiple linear regression, polynomial regression, and logistic regression). We can also run model diagnostics to verify the plausibility of the classical assumptions underlying the regression model, and we can build local regression models with a non-parametric approach that fits multiple regressions in local neighborhoods.
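
The GLM and local regression functions can be sketched in the same way. The example below is only an illustration under assumed data: it again uses mtcars, treats the 0/1 column am as a binary response, and picks span and degree values that are the loess defaults rather than tuned choices.

```r
# Logistic regression as a GLM: probability of a manual transmission (am = 1)
# as a function of weight. Dataset and variables are illustrative assumptions.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
prob <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")

# Local regression: a locally fitted quadratic surface in one predictor
lo_fit <- loess(mpg ~ wt, data = mtcars, span = 0.75, degree = 2)

# predict() dispatches to predict.loess; se = TRUE adds standard errors
lo_pred <- predict(lo_fit, newdata = data.frame(wt = 3), se = TRUE)

# Scatter plot with a loess smooth added
scatter.smooth(mtcars$wt, mtcars$mpg, span = 0.75,
               xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

With type = "response", predict returns a probability on the 0 to 1 scale rather than a value on the logit scale.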

The car package

This package includes many functions for ANOVA, matrix and vector transformations, printing readable tables of coefficients from several regression models, creating residual plots, and testing for autocorrelation of error terms, along with many other general-interest statistical and graphing functions.

The following table lists some information about this package:

Package	`car`
Date	June 25, 2017
Version	2.1-5
Title	Companion to Applied Regression
Author	John Fox, Sanford Weisberg, and many others

The following are the most useful functions used in regression analysis contained in this package:

Anova: This function returns ANOVA tables for linear models and GLMs
linearHypothesis (linear.hypothesis in older releases): This function tests a linear hypothesis; it has methods for linear models, GLMs, multivariate linear models, and linear and generalized linear mixed-effects models
cookd (deprecated in favor of cooks.distance from the stats package): This function returns Cook's distances for linear models and GLMs
outlierTest (outlier.test in older releases): This function reports the Bonferroni p-values for studentized residuals in linear models and GLMs, based on a t-test for linear models and a normal-distribution test for GLMs
durbinWatsonTest (durbin.watson in older releases): This function computes residual autocorrelations and generalized Durbin-Watson statistics and their bootstrapped p-values
leveneTest (levene.test in older releases): This function computes Levene's test for the homogeneity of variance across groups
ncvTest (ncv.test in older releases): This function computes a score test of the hypothesis of constant error variance against the alternative that the error variance changes with the level of the response (fitted values), or with a linear combination of predictors

What we have listed are just some of the many functions contained in the car package. The package also includes many functions for drawing explanatory graphs from information extracted from regression models, as well as a series of functions for transforming variables.
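
The following is a hedged sketch of how these diagnostics might be called on a fitted model. It assumes the car package is installed and uses the camel-case function names exported by recent car releases; older releases used dotted names such as linear.hypothesis. The mtcars dataset and the model formula are illustrative assumptions.

```r
library(car)

# Illustrative model on the built-in mtcars data (an assumption, not the book's example)
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Type II ANOVA table
aov_tab <- Anova(fit)
print(aov_tab)

# Test the linear hypothesis that the hp coefficient is zero
lh <- linearHypothesis(fit, "hp = 0")
print(lh)

# Durbin-Watson test; p-values are bootstrapped, so fix the seed
set.seed(123)
dw <- durbinWatsonTest(fit)
print(dw)

# Score test for non-constant error variance
ncv <- ncvTest(fit)
print(ncv)

# Bonferroni outlier test on studentized residuals
out <- outlierTest(fit)
print(out)

# Levene's test for homogeneity of variance of mpg across cylinder groups
lev <- leveneTest(mpg ~ factor(cyl), data = mtcars)
print(lev)
```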

The MASS package

This package includes many useful functions and example datasets, including functions for estimating linear models through generalized least squares (GLS), fitting negative binomial GLMs, robust fitting of linear models, and Kruskal's non-metric multidimensional scaling.

The following table lists some information about this package:

Package	`MASS`
Date	October 2, 2017
Version	7.3-47
Title	Support Functions and Datasets for Venables and Ripley's MASS
Author	Brian Ripley, Bill Venables, and many others

The following are the most useful functions used in regression analysis contained in this package:

lm.gls: This function fits linear models by GLS
lm.ridge: This function fits a linear model by ridge regression
glm.nb: This function is a modification of the system function glm(); it includes an estimation of the additional parameter, theta, to give a negative binomial GLM
polr: This function fits a logistic or probit regression model to an ordered factor response
lqs: This function fits a regression to the good points in the dataset, thereby achieving a regression estimator with a high breakdown point
rlm: This function fits a linear model by robust regression using an M-estimator
glmmPQL: This function fits a generalized linear mixed model (GLMM) with multivariate normal random effects, using penalized quasi-likelihood (PQL)
boxcox: This function computes and optionally plots profile log-likelihoods for the parameter of the Box-Cox power transformation for linear models

As we have seen, this package contains many useful functions for regression analysis. It also provides numerous datasets that we can use in the examples in the following chapters.
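
A short sketch of a few of these functions follows. It assumes the MASS package is available (it ships with standard R distributions) and uses the quine dataset from MASS together with the built-in stackloss and mtcars datasets; the formulas and the lambda grids are illustrative assumptions.

```r
library(MASS)

# Robust regression with an M-estimator (Huber weights by default)
rob_fit <- rlm(stack.loss ~ ., data = stackloss)

# Ridge regression over a grid of penalty values
ridge_fit <- lm.ridge(mpg ~ wt + hp + disp, data = mtcars,
                      lambda = seq(0, 10, by = 0.5))
# Penalty with the smallest generalized cross-validation score
best_lambda <- ridge_fit$lambda[which.min(ridge_fit$GCV)]

# Negative binomial GLM; theta is estimated alongside the coefficients
nb_fit <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)
theta_hat <- nb_fit$theta

# Box-Cox profile log-likelihood, computed without plotting
bc <- boxcox(lm(mpg ~ wt + hp, data = mtcars),
             lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
# Power-transformation parameter with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
```

Setting plotit = TRUE in boxcox would draw the profile log-likelihood curve instead of only returning it.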

The caret package

This package contains many functions to streamline the model training process for complex regression and classification problems. The package utilizes a number of R packages.

The following table lists some information about this package:

Package	`caret`
Date	September 7, 2017
Version	6.0-77
Title	Classification and Regression Training
Author	Max Kuhn and many others

The most useful functions used in regression analysis in this package are as follows:

train: This function fits predictive models over different tuning parameters. It sets up a grid of tuning parameters for a number of classification and regression routines, fits each model, and calculates a resampling-based performance measure.
trainControl: This function sets the control parameters for train, including the resampling method (such as cross-validation) used to estimate model performance.
varImp: This function calculates variable importance for objects produced by train and by method-specific methods.
defaultSummary: This function calculates performance across resamples. Given two numeric vectors of data, the root mean squared error (RMSE) and R-squared are calculated. For two factors, the overall agreement rate and Kappa are determined.
knnreg: This function performs K-Nearest Neighbor (KNN) regression, which returns the average response value of the nearest neighbors.
plotObsVsPred: This function plots observed versus predicted results in regression and classification models.
predict.knnreg: This function extracts predictions from the KNN regression model.

The caret package provides a unified interface to hundreds of machine learning algorithms (including regression methods) and offers convenient tools for data visualization, data resampling, model tuning, and model comparison, among other features.
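
The workflow described above can be sketched as follows, assuming the caret package is installed. The mtcars dataset, the predictors, the five-fold cross-validation setting, and k = 3 are all illustrative assumptions rather than recommendations.

```r
library(caret)

set.seed(42)  # resampling is random, so fix the seed for repeatability

# Resampling scheme: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Fit a linear model through the train() interface
cv_fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)
print(cv_fit$results)   # resampled performance, including RMSE and Rsquared

# Variable importance for the fitted model
imp <- varImp(cv_fit)
print(imp)

# KNN regression: prediction is the mean response of the k nearest neighbors
knn_fit <- knnreg(mpg ~ wt + hp, data = mtcars, k = 3)
knn_pred <- predict(knn_fit, newdata = mtcars[1:5, c("wt", "hp")])

# Performance summary from observed and predicted values
perf <- defaultSummary(data.frame(obs = mtcars$mpg[1:5], pred = knn_pred))
print(perf)
```

Note that the KNN predictions here are made on rows that were also used for fitting, so perf is an optimistic in-sample figure, not an estimate of out-of-sample error.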

The glmnet package

This package contains extremely efficient procedures for fitting the entire Lasso or ElasticNet regularization path for linear regression, logistic and multinomial regression models, Poisson regression, and the Cox model. Multiple-response Gaussian and grouped multinomial regression are two recent additions.

The following table lists some information about this package:

Package	`glmnet`
Date	September 21, 2017
Version	2.0-13
Title	Lasso and Elastic-Net Regularized Generalized Linear Models
Author	Jerome Friedman, Trevor Hastie, Noah Simon, Junyang Qian, and Rob Tibshirani

The following are the most useful functions used in regression analysis contained in this package:

glmnet: This function fits a GLM via penalized maximum likelihood. The regularization path is computed for the Lasso or ElasticNet penalty at a grid of values for the regularization parameter lambda. The function can handle data of many shapes, including very large sparse data matrices, and it fits linear, logistic and multinomial, Poisson, and Cox regression models.
glmnet.control: This function views and/or changes the factory default parameters in glmnet.
predict.glmnet: This function predicts fitted values, logits, coefficients, and more from a fitted glmnet object.
print.glmnet: This function prints a summary of the glmnet path at each step along the path.
plot.glmnet: This function produces a coefficient profile plot of the coefficient paths for a fitted glmnet object.
deviance.glmnet: This function computes the deviance sequence from the glmnet object.

As we have mentioned, this package fits Lasso and ElasticNet model paths for regression, logistic, and multinomial regression using coordinate descent. The algorithm is extremely fast, and exploits sparsity in the input matrix where it exists. A variety of predictions can be made from the fitted models.
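
A minimal sketch of fitting and inspecting a regularization path is shown below, assuming the glmnet package is installed. The mtcars predictors, the alpha values, and the penalty value s = 0.5 are illustrative assumptions; in practice lambda would usually be chosen by cross-validation.

```r
library(glmnet)

# glmnet expects a numeric predictor matrix and a response vector
x <- as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")])
y <- mtcars$mpg

# Lasso path (alpha = 1) and an ElasticNet path (alpha = 0.5)
lasso_fit <- glmnet(x, y, family = "gaussian", alpha = 1)
enet_fit <- glmnet(x, y, family = "gaussian", alpha = 0.5)

# One line per lambda: nonzero coefficients and deviance explained
print(lasso_fit)

# Coefficient profile plot against log(lambda)
plot(lasso_fit, xvar = "lambda", label = TRUE)

# Coefficients and predictions at a chosen penalty value (s is lambda)
lasso_coef <- coef(lasso_fit, s = 0.5)
pred <- predict(lasso_fit, newx = x[1:3, ], s = 0.5)

# Deviance at each lambda along the path
dev_seq <- deviance(lasso_fit)
```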

The sgd package

This package contains a fast and flexible set of tools for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.

The following table lists some information about this package:

Package	`sgd`
Date	January 5, 2016
Version	1.1
Title	Stochastic Gradient Descent for Scalable Estimation
Author	Dustin Tran, Panos Toulis, Tian Lian, Ye Kuang, and Edoardo Airoldi

The following are the most useful functions used in regression analysis contained in this package:

sgd: This function runs Stochastic Gradient Descent (SGD) in order to optimize the induced loss function given a model and data
print.sgd: This function prints objects of the sgd class
predict.sgd: This function forms predictions using the estimated model parameters from SGD
plot.sgd: This function plots objects of the sgd class
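
As a rough sketch of the sgd function, the example below simulates data from a known linear model and estimates its coefficients by stochastic gradient descent. It assumes the sgd package is installed; the simulated data, the sample size, and the names theta_true and theta_hat are hypothetical and chosen only for illustration.

```r
library(sgd)

set.seed(1)
n <- 10000   # number of observations
d <- 5       # number of predictors

# Simulate data from a linear model with known coefficients
x <- matrix(rnorm(n * d), ncol = d)
theta_true <- rep(5, d + 1)   # intercept plus d slopes
y <- as.numeric(cbind(1, x) %*% theta_true + rnorm(n))
dat <- data.frame(y = y, x = x)

# Estimate the linear model by stochastic gradient descent
sgd_fit <- sgd(y ~ ., data = dat, model = "lm")

# Compare the estimates with the true coefficients
theta_hat <- as.numeric(sgd_fit$coefficients)
mse <- mean((theta_true - theta_hat)^2)
print(mse)
```

Because the estimates come from a stochastic optimizer, mse will generally be small but not exactly zero, and it can change with the data and control settings.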

The BLR package

This package performs Bayesian linear regression, an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference.

The following table lists some information about this package:

Package	`BLR`
Date	December 3, 2014
Version	1.4
Title	Bayesian Linear Regression
Author	Gustavo de los Campos, Paulino Perez Rodriguez

The following are the most useful functions used in regression analysis contained in this package:

BLR: This function fits parametric regression models using different types of shrinkage methods.
sets: This is not a function but a data object supplied with the package's wheat dataset. It is a vector (599x1) that assigns observations to ten disjoint sets; the assignment was generated at random. It is used to conduct a 10-fold cross-validation (CV).

The lars package

This package contains efficient procedures for fitting an entire Lasso sequence with the cost of a single least squares fit. Least angle regression and infinitesimal forward stagewise regression are related to the Lasso.

The following table lists some information about this package:

Package	`lars`
Date	April 23, 2013
Version	1.2
Title	Least Angle Regression, Lasso and Forward Stagewise
Author	Trevor Hastie and Brad Efron

The following are the most useful functions used in regression analysis contained in this package:

lars: This function fits least angle regression, Lasso, and infinitesimal forward stagewise regression models.
summary.lars: This function produces an ANOVA-type summary for a lars object.
plot.lars: This function produces a plot of a lars fit. The default is a complete coefficient path.
predict.lars: This function makes predictions or extracts coefficients from a fitted lars model.
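
The functions above can be sketched together as follows, assuming the package is installed and loaded under its CRAN name, lars. The mtcars predictors and the value s = 0.5 (half of the maximum L1 norm of the coefficients) are illustrative assumptions.

```r
library(lars)

# lars expects a numeric predictor matrix and a response vector
x <- as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")])
y <- mtcars$mpg

# Fit the full Lasso path
lasso_path <- lars(x, y, type = "lasso")

# ANOVA-type summary: degrees of freedom, RSS, and Cp at each step
path_summary <- summary(lasso_path)
print(path_summary)

# Plot of the complete coefficient path
plot(lasso_path)

# Coefficients and predictions at a point on the path, indexed by the
# fraction of the maximum L1 norm
coefs <- coef(lasso_path, s = 0.5, mode = "fraction")
pred <- predict(lasso_path, newx = x[1:3, ], s = 0.5,
                type = "fit", mode = "fraction")$fit
```

Changing type to "lar" or "forward.stagewise" in the lars call fits the least angle regression or forward stagewise path instead of the Lasso.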