Ensure you have completed the previous recipes by installing R on your operating system.
Understanding of basic data structures
Data types
You need a basic idea of the data types and structures in R in order to grasp the recipes in this book. This section gives you an overview of them and gets you ready to use R. R supports the basic data types found in most programming and scripting languages. In simple terms, data can be of numeric, character, date, or logical type. As the names suggest, numeric covers all kinds of numbers, while logical allows only TRUE and FALSE. To check the type of a value, use the class function, which displays the class of the data.
Perform the following tasks in the R console or RStudio:
> x=123
> class(x)
Output:
[1] "numeric"
> x="ABC"
> class(x)
Output:
[1] "character"
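The text above also mentions logical and date types. As a small sketch (the variable names and the sample date are illustrative assumptions, not part of the original recipe), class reports them as follows:

```r
# Illustrative values; the names and the sample date are assumptions.
flag  <- TRUE                    # a logical value
when  <- as.Date("2020-01-15")   # a date parsed from an ISO-formatted string
count <- 5L                      # the L suffix creates an integer

class(flag)    # "logical"
class(when)    # "Date"
class(count)   # "integer"
class(123)     # "numeric"
```

Note that a plain number such as 123 is stored as numeric unless the L suffix is used.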
Data structures
R supports different types of data structures to store and process data. The following are the basic, commonly used data structures in R:
- Vectors
- Lists
- Arrays
- Matrices
- DataFrames
Vectors
A vector is a container that stores data of the same type. It can be thought of as a traditional array in other programming languages. It is not to be confused with a mathematical vector, which is laid out as a row or a column. To create a vector, use the c() function, which combines its arguments. One of the beautiful features of vectors is that arithmetic performed on a vector is performed on each element of the vector. If a vector consists of three elements, adding two increases every element by two.
How to do it...
Perform the following steps to create and view a vector in R:
> x=c(1,2,3,4) # c(1:4) gives the same values
> x
Output:
[1] 1 2 3 4
> x=c(1,2,3,4,"ABC")
> x
Output:
[1] "1"   "2"   "3"   "4"   "ABC"
> x=c(1,2,3,4) # make x numeric again; arithmetic on a character vector fails
> x * 2
Output:
[1] 2 4 6 8
> sqrt(x)
Output:
[1] 1.000000 1.414214 1.732051 2.000000
> y = x==2
> y
Output:
[1] FALSE  TRUE FALSE FALSE
> class(y)
Output:
[1] "logical"
> t = c(1:10)
> t
Output:
 [1]  1  2  3  4  5  6  7  8  9 10
How it works...
Printed output of a vector starts with [1], the index of the first element on that line. This shows that the elements are indexed and that indexing starts at 1, not at 0 as in many other languages. Arithmetic on a vector is applied to its individual elements, which is why the multiplication doubles every element. Many built-in functions, such as sqrt, work the same way when given a vector. This is powerful because it removes the need to write loops for such operations. A vector's type depends on the data it holds and the operation applied to it: combining numbers with "ABC" turns every element into a character string. Using x==2 checks each element of the vector for equality with two and returns a vector of logical values, that is, TRUE or FALSE. There are many other ways of creating a vector; one such way is shown in creating the vector t.
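The points above can be tried in a few lines. This is a minimal sketch; the variable name mixed is an illustrative assumption, not part of the original recipe:

```r
x <- c(1, 2, 3)

x + 2        # 3 4 5: two is added to every element
x[1]         # 1: indexing starts at 1
x[x == 2]    # 2: a logical vector can be used to select elements
sum(x)       # 6: some functions reduce a vector to a single value

# Mixing numbers and strings coerces every element to character.
mixed <- c(1, 2, "ABC")
class(mixed) # "character"
```

The sum call is a reminder that not every function works element by element; some combine the whole vector into one result.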
Lists
Unlike a vector, a list can store any type of data. A list is also a container, but its elements can be of different types: a list can contain another list, a vector, or any other data structure. To create a list, use the list function.
How to do it...
Perform the following steps to create and see a list in R:
> y = list(1,2,3,"ABC")
> y
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "ABC"
> y = list(c(1,2,3),c("A","B","C"))
> y
Output:
[[1]]
[1] 1 2 3
[[2]]
[1] "A" "B" "C"
How it works...
As noted, a list can contain anything. We start with a simple example that stores some elements in a list using the list function. In the next step, we create a list whose elements are vectors. So, y is a list whose first element is the vector 1, 2, 3 and whose second element is the vector A, B, C.
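Once a list exists, its elements usually need to be read back. The following sketch shows one way to do that; the list info and its element names are hypothetical and chosen only for illustration:

```r
# A list mixing a vector, a string, and a nested list (hypothetical names).
info <- list(scores = c(1, 2, 3), label = "ABC", nested = list(TRUE, "x"))

info[[1]]              # 1 2 3: double brackets return the element itself
info$label             # "ABC": named elements can be accessed with $
info[["nested"]][[2]]  # "x": second element of the nested list
class(info[1])         # "list": single brackets return a sub-list
length(info)           # 3
```

Double brackets extract an element, while single brackets keep the result wrapped in a list.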
Array
An array is nothing but a multidimensional vector, and can store only data of the same type. To create one, pass the data to the array function and specify the size of each dimension using the dim argument.
How to do it...
Perform the following steps to create and see an array in R:
> t = array(c(1,2,3,4), dim=c(2,2)) # Create two dimensional array
> t
Output:
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> arr = array(c(1:24), dim=c(3,4,2)) # Creating three dimensional array
> arr
Output:
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
, , 2
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
How it works...
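Creating an array is straightforward. Use the array function and pass the data along with a dim vector that gives the size of each dimension. With dim=c(2,2), the result is a two-dimensional array with two rows and two columns; with dim=c(3,4,2), it is two slices of three rows and four columns each. The values are filled column by column, as the output shows.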
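To see how the dimensions map onto individual elements, the three-dimensional array from the recipe can be indexed with one subscript per dimension. This is a minimal sketch that rebuilds the same arr:

```r
arr <- array(1:24, dim = c(3, 4, 2))

dim(arr)       # 3 4 2
arr[2, 3, 1]   # 8: row 2, column 3 of the first slice
arr[3, 4, 2]   # 24: the last element of the second slice
arr[, , 2]     # the whole second slice, a 3 x 4 matrix
```

Leaving a subscript empty selects everything along that dimension.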
Matrix
A matrix is a two-dimensional array. It is like a DataFrame, with the constraint that every element must be of the same type.
How to do it...
Perform the following steps to create and see a matrix in R:
> m = matrix(c(1,2,3,4,5,6), nrow=3)
> m
Output:
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
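The output above shows that matrix fills its values column by column. As a hedged sketch, the byrow argument of matrix switches to row-by-row filling; the name m_by_row is an illustrative assumption:

```r
values <- c(1, 2, 3, 4, 5, 6)

m        <- matrix(values, nrow = 3)                # filled column by column
m_by_row <- matrix(values, nrow = 3, byrow = TRUE)  # filled row by row

dim(m)           # 3 2
m[2, 2]          # 5
m_by_row[2, 2]   # 4
m * 2            # doubles every element, as with a vector
```

Elements are addressed as [row, column], and arithmetic is applied element by element.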
DataFrame
A DataFrame can be seen as an Excel spreadsheet, with rows and columns, where each column can have a different data type. In R, each column of a DataFrame is a vector.
How to do it...
Perform the following steps:
> p = c(1,2,3)
> q = c("A","B","C")
> r = c(TRUE, FALSE, FALSE)
> d = data.frame(No=p, Name=q, Attendance=r)
> d
Output:
  No Name Attendance
1  1    A       TRUE
2  2    B      FALSE
3  3    C      FALSE
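Because each column is a vector, the columns of d can be inspected and filtered like vectors. The sketch below rebuilds the same DataFrame. One caveat: in R versions before 4.0.0, data.frame converts character columns such as Name to factors by default, so the class reported for that column depends on the R version.

```r
d <- data.frame(No = c(1, 2, 3),
                Name = c("A", "B", "C"),
                Attendance = c(TRUE, FALSE, FALSE))

d$Name              # one column, extracted by name
sapply(d, class)    # class of each column (Name varies with the R version)
d[d$Attendance, ]   # only the rows where Attendance is TRUE
nrow(d)             # 3
ncol(d)             # 3
```

The logical column works as a row filter in the same way that x==2 selected elements of a vector.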