R Statistics Cookbook

Over 100 recipes for performing complex statistical operations with R 3.5

Product type Paperback
Published in Mar 2019
Publisher Packt
ISBN-13 9781789802566
Length 448 pages
Edition 1st Edition
Author Francisco Juretig
Table of Contents (12 chapters)

Preface
1. Getting Started with R and Statistics
2. Univariate and Multivariate Tests for Equality of Means
3. Linear Regression
4. Bayesian Regression
5. Nonparametric Methods
6. Robust Methods
7. Time Series Analysis
8. Mixed Effects Models
9. Predictive Models Using the Caret Package
10. Bayesian Networks and Hidden Markov Models
11. Other Books You May Enjoy

Using R6 classes

Object-oriented programming allows us to organize our code into classes, grouping related functionality together and clearly separating internal methods from external ones. For example, we can design a class that has a method for reading data from a file, another method for removing outliers, and another one for selecting a subset of the columns. We can decide to keep all of these methods public, meaning that we can call them from outside the class definition.

R supports object-oriented programming via S3 and S4 classes. The R6 package adds a third option, R6 classes, which are created with its R6Class function. These allow us to define our own classes in R in a very easy way. They also support inheritance, meaning that we can define a parent class and several derived classes that inherit from it. This implies that the derived classes can access all the methods and attributes from the parent class.

The central advantage of inheritance is that it simplifies the code by avoiding duplicated functions. It also gives our code a structure (classes are connected via base/parent classes), which makes it easier to read.

Getting ready

In order to run this example, we need to install the R6 package. It can be installed using install.packages("R6").

How to do it...

We will load data from a .csv file containing customer records, and we will create a new class instance for each record. These instances will be added to a list.

  1. Import the R6 library:
library(R6)
  2. Load the data from a .csv file:
customers = read.csv("./Customers_data.csv")
  3. We will now begin defining the R6Class structure. Note that we have two lists, one for the public attributes or methods, and another one for the private ones (these methods or attributes can only be accessed by other methods from this class). The initialize method is called whenever we create a new instance of this class. Note that we refer to the public elements of this class using the self$ notation, and to the private ones using the private$ notation:
Customer = R6Class("Customer",
  public = list(
    Customer_id = NULL, Name = NULL, City = NULL,
    # Constructor: stores the three values supplied for each record
    initialize = function(customer_id, name, city){
      self$Customer_id <- customer_id
      self$Name <- name
      self$City <- city
    },
    # TRUE if the city appears in a short, illustrative list of American cities
    is_city_in_america = function(){
      return (private$upper_(self$City) %in% c("NEW YORK","MIAMI"))
    },
    # Prints the customer's name and city between two separator lines
    full_print = function(){
      print("------------------------------------")
      print(paste("Customer name ->",self$Name))
      print(paste("Customer city ->",self$City))
      print("------------------------------------")
    }
  ),
  private = list(
    # Helper that is only reachable from inside the class, via private$upper_
    upper_ = function(x){
      return (toupper(x))
    }
  )
)
  4. We loop through our data frame and create a new Customer instance for each row, passing three arguments. These are passed to the initialize method that we defined previously:
list_of_customers = list()
# seq_len() yields an empty sequence when the data frame has no rows
for (row in seq_len(nrow(customers))){
  row_read = customers[row,]
  customer = Customer$new(row_read$Customer_id,row_read$Name,row_read$City)
  list_of_customers[[row]] <- customer
}
  5. We call the full_print method on the first customer in the list:
list_of_customers[[1]]$full_print()

The output shows the name and city of the first customer, printed between two separator lines.
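Customers_data.csv is not reproduced in this recipe, so the steps above cannot be run as they stand. As a minimal sketch, a stand-in file can be generated with the three columns the code reads by name (Customer_id, Name, and City). The rows, the names sample_customers and csv_path, and the use of a temporary file are all assumptions made for illustration; they are not the book's data.

```r
# Hypothetical stand-in for Customers_data.csv; the rows are invented
# purely for illustration and are not the book's data.
sample_customers <- data.frame(
  Customer_id = c(1, 2, 3),
  Name = c("Alice", "Bruno", "Carla"),
  City = c("New York", "London", "Miami"),
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back the same way the recipe does.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_customers, csv_path, row.names = FALSE)
customers <- read.csv(csv_path, stringsAsFactors = FALSE)

# The recipe accesses these three columns by name.
str(customers)
```

With customers built this way, the class definition and the loop from the recipe can be run unchanged.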

How it works...

Let's assume we want to process clients' data from a CSV file. R6 classes support public and private components. Each of them is defined as a list that can contain both methods and attributes. For example, we store the customer_id, the name, and the city as public attributes. We need to initialize them to NULL. We also need an initialize method that will be called whenever the class is instantiated. This is the equivalent of a constructor in other programming languages. Inside the initializer or constructor, we typically want to store the variables provided by the user. We need to use the self keyword to refer to the class variables. We then define a method that returns TRUE if the city is in America and FALSE otherwise. Another method, called full_print(), prints the contents of the class.

The lock_objects argument of R6Class, which this recipe leaves at its default of TRUE, is not usually very important; it indicates whether we want to lock the elements in the class. If we set lock_objects=FALSE, that means that we can add more attributes to an instance later, if we want to.
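The effect of lock_objects can be seen with a small sketch. The classes Locked and Unlocked below are hypothetical and exist only for this comparison; they are identical except for that one argument.

```r
library(R6)

# Hypothetical minimal classes, identical except for lock_objects.
Locked <- R6Class("Locked", public = list(x = 1))
Unlocked <- R6Class("Unlocked", public = list(x = 1), lock_objects = FALSE)

locked <- Locked$new()
unlocked <- Unlocked$new()

# Adding a new field works only on the unlocked instance.
unlocked$y <- 2

# On the locked instance the same assignment raises an error,
# which tryCatch converts into a short message here.
result <- tryCatch(
  { locked$y <- 2; "added" },
  error = function(e) "locked: cannot add new members"
)
print(result)
print(unlocked$y)
```

Existing fields such as x can still be modified in both cases; locking only prevents new members from being added.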

Here, we only have one private method. Since it is private, it can only be called from within the class (using private$upper_), but not externally. This method, called upper_, is used to transform the text into uppercase.
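The following sketch illustrates the same public/private split in isolation. The Shouter class and its shout method are hypothetical names chosen for this example; only the upper_ helper mirrors the recipe.

```r
library(R6)

# Hypothetical class with one private helper, mirroring upper_ in the recipe.
Shouter <- R6Class("Shouter",
  public = list(
    text = NULL,
    initialize = function(text) {
      self$text <- text
    },
    shout = function() {
      # Private members are reached through private$, not self$.
      private$upper_(self$text)
    }
  ),
  private = list(
    upper_ = function(x) toupper(x)
  )
)

s <- Shouter$new("miami")
print(s$shout())          # the public method calls the private helper
print(is.null(s$upper_))  # the helper is not visible on the object itself
```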

After the class is defined, we loop through the data frame and select each row in turn. We instantiate the class for each row and add each instance to a list.

The convenience of using classes is that we now have a list containing each instance. We can access the methods or attributes of any element in this list. For example, we can get a specific element and call its is_city_in_america method, and then call its full_print method.
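As a rough illustration of working with such a list, the sketch below calls one method on every instance and then filters the list by the result. City_customer is a hypothetical, cut-down version of the recipe's class, and both the two customers and the city list are invented for this example.

```r
library(R6)

# Hypothetical cut-down class; the city list is illustrative only.
City_customer <- R6Class("City_customer",
  public = list(
    Name = NULL,
    City = NULL,
    initialize = function(name, city) {
      self$Name <- name
      self$City <- city
    },
    is_city_in_america = function() {
      toupper(self$City) %in% c("NEW YORK", "MIAMI")
    }
  )
)

people <- list(
  City_customer$new("Alice", "New York"),
  City_customer$new("Bruno", "London")
)

# Call the same method on every instance and collect one logical per customer.
in_america <- vapply(people, function(p) p$is_city_in_america(), logical(1))

# Keep only the names of customers whose city matched.
american_names <- vapply(Filter(function(p) p$is_city_in_america(), people),
                         function(p) p$Name, character(1))
print(in_america)
print(american_names)
```

Using vapply rather than a for loop is a stylistic choice here; it also checks that every call returns a value of the expected type.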

There's more...

The R6 package also supports inheritance, meaning that we can define a base class (that will act as a parent) and a derived class (that will act as a child). The derived class can access all the methods and attributes defined in the parent class, which reduces code duplication and simplifies maintenance. In this example, we will create a derived class called Customer_missprod, which will store data for clients who haven't yet received a product they were expecting. We achieve this by using the inherit parameter of R6Class.

Note that we are overriding the full_print method so that it prints some extra variables. It is important to understand the difference between super and self: super gives access to the methods of the base class (for example, super$initialize), whereas self refers to the current object, including the attributes it inherits from the base class. We need to override the constructor (already defined in the base class) because we have more variables now; the new constructor calls super$initialize to set the inherited attributes and then stores the additional ones:

library(R6)
customers = read.csv("./Customers_data_missing_products.csv")
Customer_missprod = R6Class("Customer_missprod", inherit = Customer,
  public = list(Missing_prod = NULL, Missing_since = NULL,
    initialize = function(customer_id,name,city,Missing_product,Missing_since){
      # Let the base class constructor set Customer_id, Name and City
      super$initialize(customer_id, name, city)
      self$Missing_prod <- Missing_product
      self$Missing_since <- Missing_since
    },
    full_print = function(){
      print("------------------------------------")
      # Inherited attributes live on the object itself, so we read them via self$
      print(paste("Customer name ->",self$Name))
      print(paste("Customer city ->",self$City))
      print(paste("Missing prod ->",self$Missing_prod))
      print(paste("Missing since ->",self$Missing_since))
      print("------------------------------------")
    }
  )
)

list_of_customers = list()
for (row in seq_len(nrow(customers))){
  row_read = customers[row,]
  customer = Customer_missprod$new(row_read$Customer_id,row_read$Name,row_read$City,
                                   row_read$Missing_product,row_read$Missing_since)
  list_of_customers[[row]] <- customer
}

list_of_customers[[1]]$full_print()

The output for the first customer now also includes the missing product and the date since which it has been missing.
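The relationship between a parent and a child class can also be sketched in isolation. The classes Person and Late_order below are hypothetical and are not part of the recipe. The sketch shows a child calling the parent's constructor through super$initialize, extending an overridden method through super$describe, and reading an inherited attribute through self.

```r
library(R6)

# Hypothetical base and derived classes (names invented for this sketch).
Person <- R6Class("Person",
  public = list(
    Name = NULL,
    initialize = function(name) {
      self$Name <- name
    },
    describe = function() {
      paste("Name ->", self$Name)
    }
  )
)

Late_order <- R6Class("Late_order",
  inherit = Person,
  public = list(
    Product = NULL,
    initialize = function(name, product) {
      super$initialize(name)    # reuse the parent constructor
      self$Product <- product   # then set the new field
    },
    describe = function() {
      # Extend, rather than duplicate, the parent's output.
      c(super$describe(), paste("Missing prod ->", self$Product))
    }
  )
)

order <- Late_order$new("Alice", "Laptop")
print(order$describe())
print(inherits(order, "Person"))
```

Calling super$describe() inside the override keeps the shared formatting in one place, which is the code-reuse benefit that inheritance is meant to provide.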
