Machine Learning with R Cookbook, Second Edition

Analyze data and build predictive models

Product type: Paperback
Published in: Oct 2017
Publisher: Packt
ISBN-13: 9781787284395
Length: 572 pages
Edition: 2nd Edition
Authors (2): Ashish Bhatia; Yu-Wei, Chiu (David Chiu)

Reading and writing data

Before you can explore data, you must load it into the R session. This recipe introduces methods to load data from a file into memory, use the datasets predefined in R, and read data from a database.

Getting ready

First, start an R session on your machine. This recipe involves file I/O, so if you do not specify a full path, files are read from and written to the current working directory. For the database steps, it is assumed that you have a working PostgreSQL installation on your system with some data in it.

Type getwd() in the R session to obtain the current working directory. To change it, use setwd("<path>"), replacing <path> with your desired path.
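
As a minimal sketch of these two functions, the following switches the working directory and then restores it. The use of tempdir() as the target is an assumption made only so that the example runs anywhere; substitute your own path.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# tempdir() stands in for "<path>" here (an assumption for illustration).
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Relative file names would now resolve against new_wd.
# Restore the original working directory.
setwd(old_wd)
```

Saving the old directory first makes it easy to return to it once the file operations are done.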

How to do it...

Perform the following steps to read and write data with R:

  1. To view the built-in datasets of R, type the following command:
        > data()
     R returns a list of the datasets in the loaded packages, such as the datasets package. The list comprises the name and description of each dataset.
  2. To load the dataset iris into an R session, type the following command:
        > data(iris)
     The dataset iris is now loaded as a data frame, which is a common data structure in R for storing a data table.
  3. To view the data type of iris, use the class function:
        > class(iris)
        [1] "data.frame"
     The printed data.frame shows that the iris dataset is stored as a data frame.
  4. Use the save function to store an object in a file. For example, to save the loaded iris data into myData.RData, use the following command:
        > save(iris, file="myData.RData")
  5. Use the load function to read a saved object into an R session. For example, to load iris data from myData.RData, use the following command:
        > load("myData.RData")
  6. In addition to using built-in datasets, R also provides functions to import data from text into a data frame. For example, the read.table function can parse a given text into a data frame:
        > test.data = read.table(header = TRUE, text = "
        + a b
        + 1 2
        + 3 4
        + ")
  7. You can also use col.names and row.names to specify the names of columns and rows:
        > test.data = read.table(text = "
        + 1 2
        + 3 4",
        + col.names=c("a","b"),
        + row.names = c("first","second"))
  8. View the class of the test.data variable:
        > class(test.data)
        [1] "data.frame"
     The class function shows that the test.data variable contains a data frame.
  9. In addition to importing data with the read.table function, you can use the write.table function to export data to a text file:
        > write.table(test.data, file = "test.txt" , sep = " ")
     The write.table function writes the content of test.data into test.txt in the current working directory (which you can find by typing getwd()), using a space as the delimiter.
  10. Similar to write.table, write.csv can also export data to a file. However, write.csv uses a comma as the default delimiter:
        > write.csv(test.data, file = "test.csv")
  11. With the read.csv function, the csv file can be imported as a data frame. However, the previous step wrote the column and row names of the data frame to the test.csv file. Therefore, set header to TRUE and row.names to 1 (the first column) so that the header and the first column are not treated as values:
        > csv.data = read.csv("test.csv", header = TRUE, row.names=1)
        > head(csv.data)
               a b
        first  1 2
        second 3 4
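
The export and import steps above can be sketched as a round trip. This is only an illustration under stated assumptions: the file is written to a temporary location via tempfile(), and the second half shows an alternative not used in the recipe, namely passing row.names = FALSE to write.csv so that no row.names argument is needed when reading.

```r
# Recreate the small data frame from the recipe.
test.data <- data.frame(a = c(1L, 3L), b = c(2L, 4L),
                        row.names = c("first", "second"))

# A temporary file is used instead of "test.csv" (assumption for illustration).
csv_path <- tempfile(fileext = ".csv")

# Default export: row names are written as an unnamed first column,
# so row.names = 1 is needed to read them back as row names.
write.csv(test.data, file = csv_path)
with_names <- read.csv(csv_path, header = TRUE, row.names = 1)

# Alternative: drop the row names on export; the import then needs
# no row.names argument and R assigns default row names.
write.csv(test.data, file = csv_path, row.names = FALSE)
without_names <- read.csv(csv_path, header = TRUE)

print(rownames(with_names))
print(rownames(without_names))
```

Which variant to prefer depends on whether the row names carry information; if they are just sequence numbers, dropping them keeps the file simpler.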

R can also read data from a database. To connect to PostgreSQL, the RPostgreSQL package is required; install it with the following command:

        > install.packages("RPostgreSQL")

This command installs the package on your system and requires an active internet connection. Once the package is installed, you can use it to access the database. You need a username, a password, and a database name to access PostgreSQL. Replace the parameter values in the dbConnect function with your own:

        > library("RPostgreSQL")
        > driver = dbDriver("PostgreSQL")
        > connection = dbConnect(driver, dbname="restapp", host="localhost",
        +          port=5432, user="postgres", password="postgres")
        > dbExistsTable(connection, "country")
        [1] TRUE

TRUE shows that the table exists in the database. To query the table, use the dbGetQuery function:

        > data = dbGetQuery(connection, "select * from country")
        > class(data)
        [1] "data.frame"
        > data
          id code    name
        1  1   US     USA
        2 43   AS Austria
        3 55   BR  Brazil

Reading table data returns a data frame in R.
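
The database steps above leave the connection open. A minimal sketch of one way to wrap them so that the connection is always closed is shown below. The helper name fetch_table is hypothetical (it is not part of RPostgreSQL or DBI), and the connection values are the ones from the recipe; the call is wrapped in tryCatch so that the example still runs, and reports the problem, when the package or the database is not available.

```r
# Hypothetical helper: read one table and always close the connection,
# even if the table lookup or the read fails.
fetch_table <- function(table_name, dbname, user, password,
                        host = "localhost", port = 5432) {
  driver <- RPostgreSQL::PostgreSQL()
  connection <- DBI::dbConnect(driver, dbname = dbname, host = host,
                               port = port, user = user, password = password)
  # on.exit() runs when the function exits, normally or through an error.
  on.exit(DBI::dbDisconnect(connection), add = TRUE)

  if (!DBI::dbExistsTable(connection, table_name)) {
    stop("Table not found: ", table_name)
  }
  # dbReadTable() returns the whole table as a data frame.
  DBI::dbReadTable(connection, table_name)
}

# Usage with the recipe's values; result is NULL if anything fails.
result <- tryCatch(
  fetch_table("country", dbname = "restapp",
              user = "postgres", password = "postgres"),
  error = function(e) {
    message("Could not read table: ", conditionMessage(e))
    NULL
  }
)
```

Using dbReadTable instead of pasting the table name into a query string avoids building SQL by hand when the whole table is wanted.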

How it works...

Data to be analyzed is often spread across multiple files in different formats. To exchange data between files and an R session, R provides many built-in functions, such as save, load, read.csv, read.table, write.csv, and write.table.

This example first demonstrates how to load the built-in dataset iris into an R session. The iris dataset is one of the most famous and commonly used datasets in the field of machine learning, which is why it is used as the example here. The recipe then shows how to store an object in an RData file and restore it with the save and load functions. Furthermore, the example explains how to use read.table, write.table, read.csv, and write.csv to exchange data between files and a data frame. The R I/O functions are very important because most data sources are external, so you have to use these functions to load data into an R session.
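
To make the behavior of save and load more concrete, here is a minimal sketch. It assumes a temporary file instead of myData.RData and loads into a separate environment (the env variable is introduced only for illustration) so that it is easy to see which objects the file restored.

```r
# Load the built-in dataset and save it to a temporary .RData file.
data(iris)
rdata_path <- tempfile(fileext = ".RData")
save(iris, file = rdata_path)

# Load into a fresh environment rather than the global one.
env <- new.env()
restored <- load(rdata_path, envir = env)

# load() returns the names of the objects it restored.
print(restored)
print(class(env$iris))
```

Note that load restores objects under the names they had when they were saved, so loading into the global environment silently overwrites any existing object with the same name.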

Reading from a database requires installing the corresponding package. Packages are available for most databases, and once the package is installed, the steps for reading data from the database remain largely the same.

There's more...

For the read.table and read.csv functions, the file to be read can also be a complete URL (for supported URLs, use ?url for more information). The load function accepts a connection, so a remote file can be passed to it through url().
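
As a small sketch of reading from a URL, the example below uses a file:// URL that points at a temporary file, so that it runs without network access. The way the URL is built assumes a Unix-like absolute path; an http:// or https:// URL can be passed to read.csv in the same way.

```r
# Write a small CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), csv_path)

# Build a file:// URL (assumes a Unix-like absolute path such as /tmp/...).
csv_url <- paste0("file://", csv_path)

# read.csv accepts the URL in place of a file name.
from_url <- read.csv(csv_url, header = TRUE)
print(from_url)
```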

On some occasions, data may be in an Excel file instead of a flat text file. The WriteXLS package writes an object to an Excel file: pass the name of the object as the first argument and the file to be written as the second argument (ExcelFileName):

  1. Install the WriteXLS package:
        > install.packages("WriteXLS")
  2. Load the WriteXLS package:
        > library("WriteXLS")
  3. Use the WriteXLS function to write the data frame iris into a file named iris.xls:
        > WriteXLS("iris", ExcelFileName="iris.xls")