Machine Learning with R Cookbook, Second Edition: Analyze data and build predictive models

Product type Paperback
Published in Oct 2017
Publisher Packt
ISBN-13 9781787284395
Length 572 pages
Edition 2nd Edition
Authors (2):
Ashish Bhatia
Yu-Wei, Chiu (David Chiu)

Understanding of basic data structures

Ensure you have completed the previous recipes by installing R on your operating system.

Data types

You need a brief idea of the basic data types and structures in R in order to grasp all the recipes in this book. This section gives you an overview of them and gets you ready to use R. R supports the basic data types found in most programming and scripting languages. In simple words, data can be of numeric, character, date, and logical type. As the names suggest, numeric covers all types of numbers, character covers text, and logical allows only TRUE and FALSE. To check the type of a value, use the class function, which displays the class of the data.

Perform the following task in the R Console or RStudio:

> x=123 
> class(x) 
Output: 
[1] "numeric"
> x="ABC"
> class(x)
Output:
[1] "character"
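
The logical and date types mentioned above can be checked in the same way. The following is a minimal sketch; the variable names and the sample date are illustrative and do not come from the recipe:

```r
# Logical values are written in uppercase: TRUE and FALSE
flag <- TRUE
class(flag)                  # "logical"

# A date can be created from a character string with as.Date()
d <- as.Date("2017-10-31")
class(d)                     # "Date"

# Numbers are "numeric" by default; the L suffix requests an integer
n <- 5L
class(n)                     # "integer"
```

Note that class reports the class of whatever value the variable currently holds, so reassigning a variable, as the recipe does with x, changes the result.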

Data structures

R supports different types of data structures to store and process data. The following is a list of the basic, commonly used data structures in R:

  • Vectors
  • List
  • Array
  • Matrix
  • DataFrames

Vectors

A vector is a container that stores data of the same type. It can be thought of as a traditional array in other programming languages. It is not to be confused with a mathematical vector, which has rows and columns. To create a vector, use the c() function, which combines its arguments. One of the beautiful features of vectors is that any arithmetic operation performed on a vector is performed on each of its elements. If a vector consists of three elements, adding two increases every element by two.

How to do it...

Perform the following steps to create and see a vector in R:

> x=c(1,2,3,4) # c(1:4) 
> x 
Output: 
[1] 1 2 3 4 
> x=c(1,2,3,4,"ABC") 
> x 
Output: 
[1] "1" "2" "3" "4" "ABC" 
> x=c(1,2,3,4) # reset x to numbers; arithmetic fails on a character vector
> x * 2
Output: 
[1] 2 4 6 8 
> sqrt(x) 
Output: 
[1] 1.000000 1.414214 1.732051 2.000000 
> y = x==2 
> y 
Output: 
[1] FALSE TRUE FALSE FALSE 
> class(y) 
Output: 
[1] "logical" 
> t = c(1:10) 
> t 
Output: 
[1] 1 2 3 4 5 6 7 8 9 10

How it works...

Printing a vector starts with the index [1], which shows that the elements of a vector are indexed and that indexing starts from 1, not from 0 as in many other languages. Arithmetic operations on a vector are applied to its individual elements, so the multiplication is applied to each element of the vector. Likewise, if a vector is passed as an argument to a vectorized built-in function such as sqrt, the function is applied to each element. You can see how powerful this is: it removes the need to write loops for such operations. A vector changes its type on the basis of the data it holds and the operation applied to it: adding "ABC" turns every element into a character string, and arithmetic works only on numeric vectors. Using x==2 checks each element of the vector for equality with two and returns a vector of logical values, that is, TRUE or FALSE. There are many other ways of creating a vector; one such way is shown in creating vector t.
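
The element-wise behavior can be compared with an explicit loop. This is a small sketch for illustration only; the names doubled and mixed are assumptions, not part of the recipe:

```r
x <- c(1, 2, 3, 4)

# Arithmetic is applied to every element; no loop is needed
x + 2                        # 3 4 5 6

# The equivalent explicit loop, shown only for comparison
doubled <- numeric(length(x))
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}
identical(doubled, x * 2)    # TRUE

# Indexing starts at 1
x[1]                         # 1

# Mixing numbers and text coerces every element to character
mixed <- c(1, 2, "ABC")
class(mixed)                 # "character"
```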

Lists

Unlike a vector, a list is a container that can store data of any type. A list can contain another list, a vector, or any other data structure. To create a list, the list function is used.

How to do it...

Perform the following steps to create and see a list in R:

> y = list(1,2,3,"ABC")
> y
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "ABC"
> y = list(c(1,2,3),c("A","B","C"))
> y
Output:
[[1]]
[1] 1 2 3
[[2]]
[1] "A" "B" "C"

How it works...

A list, as said, can contain anything; we start with a simple example that stores some elements in a list using the list function. In the next step, we create a list whose elements are vectors. So, y is a list with the vector 1, 2, 3 as its first element and the vector A, B, C as its second element.
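
Once a list exists, its elements are usually read back with double square brackets. The following sketch assumes a hypothetical list named items to show the difference between the two kinds of brackets:

```r
# A list mixing a number, a string, a vector, and a nested list
items <- list(1, "ABC", c(1, 2, 3), list("nested", TRUE))

length(items)                # 4

# [[ ]] extracts the element itself
items[[3]]                   # 1 2 3
class(items[[3]])            # "numeric"

# [ ] returns a shorter list that still wraps the element
class(items[3])              # "list"

# Elements of a nested list are reached by chaining [[ ]]
items[[4]][[1]]              # "nested"
```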

Array

An array is nothing but a multidimensional vector, and can store only data of the same type. The size of each dimension is specified using the dim argument.

How to do it...

Perform the following steps to create and see an array in R:

> t = array(c(1,2,3,4), dim=c(2,2)) # Create a two-dimensional array
> t
Output:
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> arr = array(c(1:24), dim=c(3,4,2)) # Create a three-dimensional array
> arr
Output:
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

How it works...

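Creating an array is straightforward. Use the array function, passing the values and a dim vector that gives the size of each dimension. The values fill the array column by column, so dim=c(2,2) creates a two-dimensional array with two rows and two columns, and dim=c(3,4,2) creates two slices of three rows and four columns each.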
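
Individual elements and whole slices of an array are selected by position along each dimension. A minimal sketch, rebuilding the three-dimensional array from the recipe:

```r
arr <- array(1:24, dim = c(3, 4, 2))

dim(arr)                     # 3 4 2

# Index order is [row, column, slice]
arr[2, 3, 1]                 # 8
arr[2, 3, 2]                 # 20

# Leaving an index empty selects everything along that dimension
second_slice <- arr[, , 2]
dim(second_slice)            # 3 4
```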

Matrix

A matrix is a two-dimensional array. It is like a DataFrame, with the constraint that every element must be of the same type.

How to do it...

Perform the following steps to create and see a matrix in R:

> m = matrix(c(1,2,3,4,5,6), nrow=3) 
> m 
Output: 
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
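
How it works...

The matrix function fills the values column by column, which is why 1, 2, 3 form the first column in the output. The sketch below shows this alongside the byrow argument of matrix; the name m_by_row is illustrative:

```r
# Default: values fill the matrix column by column
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
dim(m)                       # 3 2
m[2, 2]                      # 5

# byrow = TRUE fills row by row instead
m_by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
m_by_row[2, 2]               # 4

# Mixing types coerces the whole matrix, as with vectors
mixed <- matrix(c(1, "A", 2, "B"), nrow = 2)
class(mixed[1, 1])           # "character"
```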

DataFrame

A DataFrame can be seen as an Excel spreadsheet, with rows and columns, where every column can have a different data type. In R, each column of a DataFrame is a vector.

How to do it...

Perform the following steps:

    > p = c(1,2,3)
    > q = c("A","B","C")
    > r = c(TRUE, FALSE, FALSE)
    > d = data.frame(No=p, Name=q, Attendance=r)
    > d
    Output:
      No Name Attendance
    1  1    A       TRUE
    2  2    B      FALSE
    3  3    C      FALSE
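
Because each column is a vector, columns can be inspected and used for filtering like any other vector. The following is a sketch under one assumption that is not in the recipe: stringsAsFactors = FALSE is passed explicitly so that the Name column stays a character vector on every R version (the default changed in R 4.0.0).

```r
d <- data.frame(No = c(1, 2, 3),
                Name = c("A", "B", "C"),
                Attendance = c(TRUE, FALSE, FALSE),
                stringsAsFactors = FALSE)  # assumption: keep Name as character

# Each column is a vector with its own type
class(d$No)                  # "numeric"
class(d$Name)                # "character"
class(d$Attendance)          # "logical"

# Dimensions: 3 rows, 3 columns
dim(d)                       # 3 3

# Filter rows with a logical vector, then pick a column
d[d$Attendance, "Name"]      # "A"
```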