Functions are a fundamental building block of R. To master many of the more advanced techniques in this book, you need a solid foundation in how they work. We've already used a few functions above, since you can't really do anything interesting in R without them. They are just what you remember from your mathematics classes: a way to transform inputs into outputs. Specifically, in R a function is an object that takes other objects, called arguments, as inputs and returns an output object. Most function calls have the form f(argument_1, argument_2, ...), where f is the name of the function, and argument_1, argument_2, and so on are its arguments.
Before we continue, we should briefly mention the role of curly braces ({}) in R. Often they are used to group a set of operations in the body of a function, but they can also be used in other contexts (as we will see in the case of the web application we will build in Chapter 10, Adding Interactivity with Dashboards). Curly braces evaluate a series of expressions, separated by newlines or semicolons, and return only the value of the last expression. For example, the following line prints only the result of x + y, hiding the result of x * y, which would have been printed had we typed the expressions one by one. In this sense, curly braces encapsulate a set of behaviors and expose only the result of the last expression:
{ x <- 1; y <- 2; x * y; x + y }
#> [1] 3
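Because a braced block evaluates to its last expression, its value can also be assigned to a name. The following is a minimal sketch of that idea; the name total is introduced here only for illustration:

```r
# The value of a braced block is the value of its last expression,
# so the whole block can be assigned to a name (total is a hypothetical name).
total <- {
  x <- 1
  y <- 2
  x * y  # evaluated, but its value is discarded
  x + y  # last expression: this becomes the value of the block
}
print(total)
#> [1] 3
```

Note that braces on their own do not create a new scope, so x and y remain available after the block has been evaluated.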
We can create our own function by using the function() constructor and assigning the result to a symbol. The function() constructor takes an arbitrary number of named arguments, which can be used within the body of the function. Unnamed arguments can also be passed using the "..." argument notation, but that's an advanced technique we won't look at in this book. Feel free to read the documentation for functions to learn more about them.
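As a small sketch of what this looks like in practice, the snippet below creates a function, assigns it to a symbol, and then treats it like any other object. The name square is hypothetical and chosen only for this illustration:

```r
# Create a function with function() and assign it to the symbol `square`
# (a hypothetical name used only for this sketch).
square <- function(x) x^2

# A function is an object like any other, so we can inspect it...
class(square)
#> [1] "function"

# ...call it...
square(4)
#> [1] 16

# ...or pass it as an argument to another function.
sapply(1:3, square)
#> [1] 1 4 9
```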
When calling the function, arguments can be passed by position or by name. The positional order must correspond to the order provided in the function's signature (that is, the function() specification with the corresponding arguments), but when using named arguments, we can send them in whatever order we prefer.
In the following example, we create a function that calculates the squared Euclidean distance (https://en.wikipedia.org/wiki/Euclidean_distance) between two numeric vectors, that is, the sum of their squared element-wise differences, and we show how the order of the arguments can be changed if we use named arguments. To see this effect, we use the print() function so that the console shows what R is receiving as the x and y vectors. When developing your own programs, using the print() function in similar ways is very useful to understand what's happening.
Instead of a function name like euclidean_distance, we will use l2_norm because it's the generalized name for such an operation when working with spaces of an arbitrary number of dimensions and because it will make a follow-up example easier to understand (strictly speaking, the L2 norm of the difference is the square root of the value computed here). Note that even though our vectors are called a and b outside the function call, they are passed into the x and y arguments, so x and y are the names we need to use within the function. Had we used the names x and y in both places, it would be easy for beginners to assume they are the same objects:
l2_norm <- function(x, y) {
    print("x")
    print(x)
    print("y")
    print(y)
    element_to_element_difference <- x - y
    result <- sum(element_to_element_difference^2)
    return(result)
}
a <- c(1, 2, 3)
b <- c(4, 5, 6)
l2_norm(a, b)
#> [1] "x"
#> [1] 1 2 3
#> [1] "y"
#> [1] 4 5 6
#> [1] 27
l2_norm(b, a)
#> [1] "x"
#> [1] 4 5 6
#> [1] "y"
#> [1] 1 2 3
#> [1] 27
l2_norm(x = a, y = b)
#> [1] "x"
#> [1] 1 2 3
#> [1] "y"
#> [1] 4 5 6
#> [1] 27
l2_norm(y = b, x = a)
#> [1] "x"
#> [1] 1 2 3
#> [1] "y"
#> [1] 4 5 6
#> [1] 27
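Since l2_norm() is symmetric in its arguments, swapping them never changes the result, which can hide the effect of argument order. The sketch below uses a hypothetical, deliberately non-symmetric function, signed_difference(), to make the difference between positional and named matching visible in the returned value itself:

```r
# signed_difference() is a hypothetical helper, not part of the original
# example; unlike l2_norm(), swapping x and y changes its result.
signed_difference <- function(x, y) sum(x - y)

a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Positional matching: the order of the arguments matters.
signed_difference(a, b)
#> [1] -9
signed_difference(b, a)
#> [1] 9

# Named matching: the order in the call is irrelevant.
signed_difference(y = b, x = a)
#> [1] -9

# Mixed: named arguments are matched first, and the remaining
# unnamed argument fills the first free position (here, x).
signed_difference(y = b, a)
#> [1] -9
```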
Functions may use the return() function to specify the value returned by the function. However, R will simply return the last evaluated expression as the result of a function, so you may see code that does not make use of the return() function explicitly.
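One situation where an explicit return() is commonly seen is an early exit from a function. The following is a minimal sketch using a hypothetical function, describe_sign(), that combines explicit early returns with an implicit final value:

```r
# describe_sign() is a hypothetical function used only to illustrate
# explicit versus implicit return values.
describe_sign <- function(x) {
  if (x < 0) {
    return("negative")  # explicit early exit
  }
  if (x == 0) {
    return("zero")      # explicit early exit
  }
  "positive"            # last evaluated expression, returned implicitly
}

describe_sign(-2)
#> [1] "negative"
describe_sign(0)
#> [1] "zero"
describe_sign(5)
#> [1] "positive"
```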
Our previous l2_norm() function implementation seems to be somewhat cluttered. Since we know that it's working fine, we can remove the print() function calls and avoid creating intermediate objects without hesitation. That leaves a function with a single expression, which means we can also drop the curly braces. Furthermore, we can avoid explicitly calling the return() function to simplify our code even more. If we do so, our function looks much closer to its mathematical definition and is easier to understand, doesn't it?
l2_norm <- function(x, y) sum((x - y)^2)
Furthermore, in case you did not notice, since we use vectorized operations, we can send vectors of any length (dimension), provided that both vectors in a given call share the same length, and the function will work just as we expect it to, regardless of the dimensionality of the space we're working with. As mentioned earlier, vectorization can be quite powerful. In the following example, we show such behavior with vectors of dimension 1 (mathematically known as scalars), as well as vectors of dimension 5, created with the ":" shortcut syntax:
l2_norm(1, 2)
#> [1] 1
l2_norm(1:5, 6:10)
#> [1] 125
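One caveat is worth keeping in mind: when two vectors differ in length, R recycles the shorter one instead of raising an error, so the one-line function does not enforce the same-length condition by itself. The following is a minimal sketch of a guarded variant; the name l2_norm_checked and the fallback message are assumptions made for this illustration, not part of the original example:

```r
# l2_norm_checked() is a hypothetical variant that refuses inputs
# of different lengths instead of silently recycling the shorter one.
l2_norm_checked <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  sum((x - y)^2)
}

l2_norm_checked(1:5, 6:10)
#> [1] 125

# Taking the square root of the result gives the Euclidean distance.
sqrt(l2_norm_checked(c(1, 2, 3), c(4, 5, 6)))
#> [1] 5.196152

# Vectors of different lengths now raise an error, which we catch here
# and replace with a fixed message for demonstration purposes.
result <- tryCatch(
  l2_norm_checked(1:3, 1:2),
  error = function(e) "lengths differ"
)
print(result)
#> [1] "lengths differ"
```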
Before we move on, I just want to mention that you should always make an effort to follow the Single Responsibility principle, which states that each object (functions in this case) should focus on doing a single thing, and do it very well. Whenever you describe a function you created as doing "something" and "something else", you're probably doing it wrong: the "and" should let you know that the function is doing more than one thing, and you should split it into two or more functions that possibly call each other. To read more about good software engineering principles, take a look at Robert C. Martin's great book titled Agile Software Development, Principles, Patterns, and Practices, Pearson, 2002.
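As a rough illustration of the principle, a function that both prints its inputs and computes a result can be described as "reporting the inputs and computing the sum of squared differences", which suggests a split. The sketch below separates the two concerns; show_inputs() is a hypothetical name introduced only for this example:

```r
# Responsibility 1: report the inputs (useful while debugging).
# show_inputs() is a hypothetical helper introduced for this sketch.
show_inputs <- function(x, y) {
  print("x")
  print(x)
  print("y")
  print(y)
  invisible(NULL)  # return nothing visible; this function only reports
}

# Responsibility 2: compute the sum of squared differences.
l2_norm <- function(x, y) sum((x - y)^2)

a <- c(1, 2, 3)
b <- c(4, 5, 6)

show_inputs(a, b)
#> [1] "x"
#> [1] 1 2 3
#> [1] "y"
#> [1] 4 5 6
l2_norm(a, b)
#> [1] 27
```

With this split, the debugging output can be dropped or changed without touching the computation, and each function can be described without using the word "and".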