Like most programming languages, R lets you assign values to variables and refer to these objects by name. The names you use to refer to variables are called symbols in R. This allows you to keep some information available in case it's needed at a later point in time. These variables may contain any type of object available in R, even combinations of them when using lists, as we will see in a later section in this chapter. Furthermore, these objects are immutable, but that's a topic for Chapter 9, Implementing an Efficient Simple Moving Average.
In R, the assignment operator is <-, which is a less-than symbol (<) followed by a dash (-). If you have worked with algorithm pseudo code before, you may find it familiar. You may also use the single equals symbol (=) for assignments, similar to many other languages, but I prefer to stick to the <- operator.
An expression like x <- 1 means that the value 1 is assigned to the x symbol, which can be thought of as a variable. You can also assign the other way around, meaning that with an expression like 1 -> x we would have the same effect as we did earlier. However, the assignment from left to right is very rarely used, and is more of a convenience feature in case you forget the assignment operator at the beginning of a line in the console.
Note that the value substitution is done at the time when the value is assigned to z, not at the time when z is evaluated. If you enter the following code into the console, you can see that the second time z is printed, it still has the value that y had when it was used to assign to it, not the y value assigned afterward:
x <- 1 y <- 2 z <- c(x, y) z
#> [1] 1 2
y <- 3 z
#> [1] 1 2
It's easy to use variable names like x, y, and z, but using them has a high cost for real programs. When you use names like that, you probably have a very good idea of what values they will contain and how they will be used. In other words, their intention is clear for you. However, when you give your code to someone else or come back to it after a long period of time, those intentions may not be clear, and that's when cryptic names can be harmful. In real programs, your names should be self descriptive and instantly communicate intention.
For a deeper discussion about this and many other topics regarding high-quality code, take a look at Martin's excellent book Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall, 2008.
Standard object names in R should only contain alphanumeric characters (numbers and ASCII letters), underscores (_), and, depending on context, even periods (.). However, R will allow you to use very cryptic strings if you want. For example, in the following code, we show how the variable !A @B #C $D %E ^F name is used to contain a vector with three integers. As you can see, you are even allowed to use spaces. You can use this non-standard name provided that you wrap the string with backticks (`):
`!A @B #C $D %E ^F` <- c(1, 2, 3) `!A @B #C $D %E ^F`
#> [1] 1 2 3
It goes without saying that you should avoid those names, but you should be aware they exist because they may come in handy when using some of R's more advanced features. These types of variable names are not allowed in most languages, but R is flexible in that way. Furthermore, the example goes to show a common theme around R programming: it is so flexible that if you're not careful, you will end up shooting yourself in the foot. It's not too rare for someone to be very confused about some code because they assumed R would behave a certain way (for example, raise an error under certain conditions) but don't explicitly test for such behavior, and later find that it behaves differently.