Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
R Statistics Cookbook

You're reading from   R Statistics Cookbook Over 100 recipes for performing complex statistical operations with R 3.5

Arrow left icon
Product type Paperback
Published in Mar 2019
Publisher Packt
ISBN-13 9781789802566
Length 448 pages
Edition 1st Edition
Languages
Tools
Concepts
Arrow right icon
Author (1):
Arrow left icon
Francisco Juretig Francisco Juretig
Author Profile Icon Francisco Juretig
Francisco Juretig
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Getting Started with R and Statistics FREE CHAPTER 2. Univariate and Multivariate Tests for Equality of Means 3. Linear Regression 4. Bayesian Regression 5. Nonparametric Methods 6. Robust Methods 7. Time Series Analysis 8. Mixed Effects Models 9. Predictive Models Using the Caret Package 10. Bayesian Networks and Hidden Markov Models 11. Other Books You May Enjoy

C++ in R via the Rcpp package

The Rcpp package has become one of the most important packages for R. Essentially, it allows us to integrate C++ code into our R scripts seamlessly. The main advantage of this, is that we can achieve major efficiency gains, especially if our code needs to use lots of loops. A second advantage, is that C++ has a library called the standard template library (STL) that has very efficient containers (for example, vectors and linked lists) that are extremely fast and well suited for many programming tasks. Depending on the case, you could expect to see improvements anywhere from 2x to 50x. This is why many of the most widely used packages have been rewritten to leverage Rcpp. In Rcpp, we can code in C++ using the typical C++ syntax, but we also have specific containers that are well suited to store R-specific elements. For example, we can use the Rcpp::NumericVector or the Rcpp::DataFrame, which are certainly not native C++ variable types.

The traditional way of using Rcpp involves writing the code in C++ and sourcing it in R. Since Rcpp 0.8.3, we can also use the Rcpp sugar style, which is a different way of coding in C++. Rcpp sugar brings some elements of the high-level R syntax into C++, allowing us to achieve the same things with a much more concise syntax.

We can use Rcpp in two ways, using the functions in an inline way, or coding them in a separate script and sourcing it via sourcecpp(). The latter approach is slightly preferable as it clearly separates the C++ and R code into different files.

Getting ready

In order to run this code, Rcpp needs to be installed using install.packages ("Rcpp").

How to do it...

In this case, we will compare R against Rcpp (C++) for the following task. We will load a vector and a matrix. Our function will loop through each element of the vector, through each row, and through each row and column of that matrix, counting the number of instances where the elements of the matrix are greater than the ones in the vector.

  1. Save the C++ code into a file named rcpp_example.cpp and we will source it from an R script:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int bring_element (NumericVector rand_vector, NumericMatrix rand_matrix) {
Rcout << "Process starting" << std::endl;
int mcounter = 0;
for (int q = 0; q < rand_vector.size();q++){
for (int x = 0; x < rand_matrix.rows();x++){
for (int y = 0; y < rand_matrix.cols();y++){
double v1 = rand_matrix.at(x,y);
double v2 = rand_vector[q];
if ( v1 < v2){
mcounter++;
}
}
}
}
Rcout << "Process ended" << std::endl;
return mcounter;
}
  1. In the corresponding R script, we need the following code:
library(Rcpp)
sourceCpp("./rcpp_example.cpp")
Rfunc <- function(rand__vector,rand_matrix){
mcounter = 0
for (q in 1:length(rand__vector)){
for (x in 1:dim(rand_matrix)[1]){
for (y in 1:dim(rand_matrix)[2]){
v1 = rand_matrix[x,y];
v2 = rand__vector[q];
if ( v1 < v2){
mcounter = mcounter+1
}
}
}
}
return (mcounter)
}
  1. Generate a vector and a matrix of random Gaussian numbers:
some__matrix = replicate(500, rnorm(20))
some__vector = rnorm(100)
  1. Save the starting and end times for the Rcpp function, and subtract the starting time from the end time:
start_time <- Sys.time()
bring_element(some__vector,some__matrix)
end_time <- Sys.time()
print(end_time - start_time)
  1. Do the same as we did in the previous step, but for the R function:
start_time <- Sys.time()
Rfunc(some__vector,some__matrix)
end_time <- Sys.time()
print(end_time - start_time)

The C++ function takes 0.10 seconds to complete, whereas the R one takes 0.21 seconds. This is to be expected, as R loops are generally slow, but they are extremely fast in C++:

How it works...

The sourcecpp function loads the C++ or Rcpp code from an external script and compiles it. In this C++ file, we first include the Rcpp headers that will allow us to add R functionality to our C++ script. We use using namespace Rcpp to tell the compiler that we want to work with that namespace (so we avoid the need to type Rcpp:: when using Rcpp functionality). The Rcpp::export declaration tells the compiler that we want to export this function to R.

Our bring_element function will return an integer (so that is why we have an int in its declaration). The arguments will be NumericVector and NumericMatrix, which are not C++ native variable types but Rcpp ones. These allow us to use vectors and matrices that operate with numbers without needing to declare explicitly if we will be working with integers, large integers, or float numbers. The Rcout function allows us to print output from C++ to the R console. We then loop through the columns and rows from the vector and the matrix, using standard C++ syntax. What is not standard here is the way we can get the number of columns and rows in these elements (using .rows() and .cols()), since these attributes are available through Rcpp. The R code is quite straightforward, and we then run it using a standard timer.

See also

The homepage for Rcpp is hosted at http://www.rcpp.org.

Rcpp is deeply connected to the Armadillo library (used for very high-performance algebra). This allows us to include this excellent library in our Rcpp projects.

The Rinside package (http://dirk.eddelbuettel.com/code/rinside.html), built by the creators of Rcpp, allows us to embed R code easily inside our C++ projects. These C++ projects can be compiled and run without requiring R. The implications of the Rinside package are enormous, since we can code highly performant statistical applications in C++ that we can distribute as standalone executable programs.

You have been reading a chapter from
R Statistics Cookbook
Published in: Mar 2019
Publisher: Packt
ISBN-13: 9781789802566
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image