Looping allows us to do repetitive tasks in a couple of lines of code, saving us much effort and time. Functions allow us to write a block of instructions whose behavior can be adjusted through the arguments it is called with. Combining looping, functions, and the apply family in R allows us to go through the elements of a data structure, such as a vector or a list, and apply a function or a block of instructions to each of them.
Looping, functions, and apply family in R
Looping in R
Suppose we want to loop through all the values of the jan_price column of all_prices4, square each of them, and print the results. We can do so in the following way:
jan = all_prices4$jan_price
for(price in jan){
  print(price^2)
}
This prints the square of each January price.
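The loop above only prints its results. As a minimal sketch, the same loop can store the squares in a vector so that they can be reused later. The data frame below is a hypothetical stand-in for all_prices4 with made-up prices, and the name squares is introduced here only for illustration.

```r
# Hypothetical stand-in for all_prices4 (the real data frame is defined elsewhere)
all_prices4 = data.frame(jan_price = c(10, 20, 30))
jan = all_prices4$jan_price

# Pre-allocate a result vector, then fill it inside the loop
squares = numeric(length(jan))
for (i in seq_along(jan)) {
  squares[i] = jan[i]^2
}
print(squares)

# Arithmetic in R is vectorized, so the same result needs no loop at all
print(jan^2)
```

For simple element-wise arithmetic such as this, the vectorized form is usually the shorter choice. The explicit loop is kept here because it mirrors the structure used in the text.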
Functions in R
We can also achieve the previous result by using a function. Let's name this function square:
square = function(data){
  for(price in data){
    print(price^2)
  }
}
Now call the function as follows:
square(all_prices4$jan_price)
The following output also shows the squared price of jan_price:
Now suppose we want to have the ability to take elements to any power, not just square. We can attain it by making a little tweak to the function:
power_function = function(data, power){
  for(price in data){
    print(price^power)
  }
}
Now suppose we want to raise the June prices to the fourth power. We can do the following:
power_function(all_prices4$june_price, 4)
This prints each value of the june_price column raised to the fourth power.
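Because power_function() only prints, its results cannot be assigned to a variable. The following sketch shows one possible variant that returns a vector and gives power a default value. The name power_function_default and the sample prices are assumptions made for this illustration, not part of the original example.

```r
# Hypothetical variant of power_function(): returns a vector and defaults to squaring
power_function_default = function(data, power = 2) {
  result = numeric(length(data))
  for (i in seq_along(data)) {
    result[i] = data[i]^power
  }
  result  # the last evaluated expression is the return value
}

june = c(20, 25, 33)  # illustrative prices, not taken from all_prices4
squared = power_function_default(june)            # power falls back to 2
fourth = power_function_default(june, power = 4)  # power given explicitly
print(squared)
print(fourth)
```

With a default in place, the function can be called with one argument for the common case and with two when a different power is needed.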
Apply family – lapply, sapply, apply, tapply
We now turn to the apply family, which often saves us from writing explicit loops and so reduces our workload. We will discuss four functions from this family: apply, lapply, sapply, and tapply.
apply
apply works on arrays or matrices and gives us an easier way to compute something row-wise or column-wise. For the apply() function, this row- or column-wise consideration is denoted by a margin. The apply() function takes the following form: apply(data, margin, function). This data has to be an array or a matrix, and the margin can be either 1 or 2, where 1 stands for a row-wise operation and 2 stands for a column-wise operation. We will work with the matrix all_prices, which has the following structure:
Here, we have a record of prices of three different items in three different months (January, March, and June), where a row represents the prices of an item in three different months and a column represents the prices of three different items in any single month. Now, if we want to know which item's price fluctuated most over these three months, we would have to compute a standard deviation row-wise for each row. We can do this very easily using margin = 1 in apply().
apply(all_prices, 1, sd)
We can see the standard deviation for these three items as follows:
Now suppose we want to know the month-wise total cost of all three items. As every column corresponds to a different month, we can use apply() with margin = 2 and the function sum to achieve this:
apply(all_prices, 2, sum)
This gives the sum for all three months in a vector:
We see that the total prices were the highest in June (the third column), totaling 78.
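To make the two margins concrete, here is a small self-contained sketch. The matrix all_prices_demo, its values, and its row and column names are assumptions chosen for illustration. They are picked so that the June total matches the 78 mentioned above, but they are not guaranteed to be the same as the all_prices matrix.

```r
# Illustrative 3 x 3 price matrix: rows are items, columns are months.
# matrix() fills column by column, so each group of three values is one month.
all_prices_demo = matrix(
  c(10, 20, 30,   # January
    11, 22, 33,   # March
    20, 25, 33),  # June
  nrow = 3,
  dimnames = list(c("potato", "rice", "oil"), c("jan", "mar", "june"))
)

# margin = 1: apply sd to every row (one value per item)
row_sd = apply(all_prices_demo, 1, sd)

# margin = 2: apply sum to every column (one value per month)
col_total = apply(all_prices_demo, 2, sum)

print(row_sd)
print(col_total)

# Name of the month with the highest total
print(names(which.max(col_total)))
```

For plain column or row totals, base R also provides colSums() and rowSums(), which give the same numbers as apply() with sum.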
lapply
In the previously mentioned power_function() function, we had to use a for loop to go through all the values of the june_price column of the all_prices4 data frame. lapply applies a function (either one we define or an existing one) to every element of a list or vector and returns a list. Let's redefine power_function() as power_function2(), which has no loop of its own, and then use lapply to go through each element of a list or vector and raise it to the given power. lapply() has the following format:
lapply(data, function, arguments_of_the_function)
power_function2 = function(data, power){
  data^power
}
lapply(all_prices4$june_price, power_function2, 4)
All the prices in june_price are raised to the fourth power and returned as a list. To get a vector instead of a list, we can wrap the call in unlist():
unlist(lapply(all_prices4$june_price, power_function2, 4))
This returns the fourth power of the june_price column as a vector.
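The difference between the list and the vector result can be checked directly. This is a minimal sketch that uses an illustrative vector june in place of all_prices4$june_price, which is an assumption made so the example runs on its own.

```r
power_function2 = function(data, power) {
  data^power
}

june = c(20, 25, 33)  # illustrative stand-in for all_prices4$june_price

# Extra arguments after the function are passed on to it for every element
as_list = lapply(june, power_function2, power = 4)
as_vector = unlist(as_list)

print(class(as_list))    # lapply always returns a list
print(class(as_vector))  # unlist flattens it to an atomic vector
print(as_vector)
```

Naming the extra argument (power = 4) is optional here, but it makes clear which parameter of power_function2() receives the value.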
Now we will again work with a combined array, which has the prices of different items in three different months each for 2017 and 2018. Do you remember the structure of it? It looked like this:
Here, the first matrix corresponds to prices for 2017 and the second matrix corresponds to 2018. We will now recreate this array as a list of matrices in the following way:
combined2 = list(matrix(c(jan_2018, mar_2018, june_2018), nrow = 3),
                 matrix(c(jan_2017, mar_2017, june_2017), nrow = 3))
combined2
This returns us the following list of matrices:
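Now, if we want the second row of each matrix, that is, the prices of the second item across the three months for both years, we can use lapply() in the following way:
lapply(combined2, "[", 2,)
Here, "[" is the subsetting operator used as a function, so this selects the second row from each matrix in the list. Because matrix() fills its values column by column, the months are the columns, and the March prices would instead be the second column, selected with lapply(combined2, "[", , 2):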
Now we can modify it further to select a column, row, or any element according to our needs.
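Row, column, and single-element selection can be compared side by side on a toy list. The matrices below contain placeholder numbers rather than prices, and demo_list is a name assumed for this sketch. The row selection is written with an anonymous function, which is equivalent to passing "[" with a row index and may be easier to read.

```r
# Two placeholder matrices; matrix() fills column by column
demo_list = list(matrix(1:9, nrow = 3), matrix(11:19, nrow = 3))

# Second row of each matrix, written as an anonymous function
second_rows = lapply(demo_list, function(m) m[2, ])

# Second column of each matrix: the empty argument leaves the row index blank
second_cols = lapply(demo_list, "[", , 2)

# A single element (row 1, column 3) from each matrix
single_cells = lapply(demo_list, "[", 1, 3)

print(second_rows)
print(second_cols)
print(single_cells)
```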
sapply
What we have got by using unlist(lapply(data, function, arguments_of_the_function)) can be obtained simply by using sapply(data, function, arguments_of_the_function).
sapply(all_prices4$june_price, power_function2, 4)
This again returns a vector rather than a list.
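The equivalence between sapply() and unlist(lapply()) for this case can be verified with a short sketch. The vector june is an illustrative stand-in for all_prices4$june_price. vapply() is included as a stricter alternative from base R that requires the expected type and length of each result to be stated.

```r
power_function2 = function(data, power) {
  data^power
}

june = c(20, 25, 33)  # illustrative stand-in for all_prices4$june_price

via_lapply = unlist(lapply(june, power_function2, 4))
via_sapply = sapply(june, power_function2, 4)

# vapply: the third argument declares that each result is one numeric value
via_vapply = vapply(june, power_function2, numeric(1), 4)

print(identical(via_lapply, via_sapply))
print(identical(via_sapply, via_vapply))
```

sapply() simplifies to a vector here because every call returns exactly one value. When results have different lengths, it falls back to returning a list, which is one reason vapply() is sometimes preferred inside functions.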
Now let's go back to the example of the all_prices3 data frame. We can see this from the screenshot that follows:
tapply
Now, suppose instead of prices for 2018 only, we have prices for these items for 2017, 2016, and 2015 as well. This new data frame is defined as follows:
all_prices = data.frame(items = rep(c("potato", "rice", "oil"), 4),
                        jan_price = c(10, 20, 30, 10, 18, 25, 9, 17, 24, 9, 19, 27),
                        mar_price = c(11, 22, 33, 13, 25, 32, 12, 21, 33, 15, 27, 39),
                        june_price = c(20, 25, 33, 21, 24, 40, 17, 22, 27, 13, 18, 23)
)
all_prices
The output for the preceding lines of code can be seen as follows:
Now suppose we want to take the mean price of each item for every March across all years. We can do this by using tapply(numerical_variable, categorical_variable, function). So, we will need to convert the items column of the all_prices data frame to a categorical variable to take the mean price.
tapply(all_prices$mar_price, factor(all_prices$items), mean)
This gives us a mean March price for oil, potato, and rice in all years, as follows:
Note the use of factor() to convert the items column to a factor variable.
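The grouped means can be reproduced with a reduced copy of the data frame that keeps only the items and mar_price columns defined above. The sketch also shows that tapply() coerces its grouping argument to a factor on its own, so the explicit factor() call mainly serves to make the intent visible. The variable names with_factor and without_factor are introduced only for this illustration.

```r
# Reduced copy of the data frame: only the columns needed for the March means
all_prices = data.frame(
  items = rep(c("potato", "rice", "oil"), 4),
  mar_price = c(11, 22, 33, 13, 25, 32, 12, 21, 33, 15, 27, 39)
)

# Explicit conversion to a factor, as in the text
with_factor = tapply(all_prices$mar_price, factor(all_prices$items), mean)

# tapply() coerces the grouping vector itself, so this gives the same groups
without_factor = tapply(all_prices$mar_price, all_prices$items, mean)

print(with_factor)
print(without_factor)
```

The groups are reported in the alphabetical order of the factor levels (oil, potato, rice), not in the order in which the items first appear in the data.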
There are other apply functions, but that's it for now, folks. We will introduce new functions as they become necessary in the later chapters on geospatial analysis.
To install a new package, we need to write install.packages("package_name"), and to use an installed package, we need to load it with library(package_name).