RStudio for R Statistical Computing Cookbook

Over 50 practical and useful recipes to help you perform data analysis with R by unleashing every native RStudio feature

Product type: Paperback
Published: Apr 2016
ISBN-13: 9781784391034
Length: 246 pages
Edition: 1st Edition
Author: Andrea Cirillo
Table of Contents

Preface
1. Acquiring Data for Your Project
2. Preparing for Analysis – Data Cleansing and Manipulation
3. Basic Visualization Techniques
4. Advanced and Interactive Visualization
5. Power Programming with R
6. Domain-specific Applications
7. Developing Static Reports
8. Dynamic Reporting and Web Application Development
Index

Loading your data into R with rio packages

The rio package is a relatively recent R package, developed by Thomas J. Leeper, which makes data import and export in R painless and quick.

rio achieves this mainly by making assumptions about the file format: it guesses the format of the file you are trying to import and applies the import function appropriate to that format.

All of this is done behind the scenes, and the user is just required to run the import() function.

As Leeper often states when talking about the package: "it just works."

One of the great results you can obtain by employing this package is streamlining workflows involving different development and productivity tools.

For instance, you can produce tables in SAS and make them available to the R environment without any particular export procedure in SAS. Likewise, you can acquire data in R directly as it is produced in, or entered into, an Excel spreadsheet.

Getting ready

As you would expect, we first need to install and load the rio package:

install.packages("rio")
library(rio)

In the following example, we are going to import our well-known world_gdp_data dataset from a local .csv file.

How to do it...

  1. The first step is to import the dataset using the import() function:
    messy_gdp <- import("world_gdp_data.csv")
  2. Then, we visualize the result with the RStudio viewer:
    View(messy_gdp)
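
Before opening the viewer, it can help to check the imported object from the console. The following is only a sketch under stated assumptions: it writes a tiny made-up CSV to a temporary file (the country and gdp columns are hypothetical, not the real world_gdp_data contents) and falls back to base R's read.csv() when rio is not installed.

```r
# Build a small, made-up CSV so the example does not depend on world_gdp_data.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(country = c("A", "B"), gdp = c(1.5, 2.5)),
  csv_path,
  row.names = FALSE
)

# Use rio when it is available; otherwise fall back to base R.
if (requireNamespace("rio", quietly = TRUE)) {
  small_gdp <- rio::import(csv_path)
} else {
  small_gdp <- read.csv(csv_path)
}

# Inspect the structure and the first rows from the console.
str(small_gdp)
head(small_gdp)
```

str() reports the column types that were inferred during import, which is usually the first thing worth checking.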

How it works...

We first import the dataset using the import() function. To understand the structure of the import() function, we can leverage a useful behavior of the R console: typing a function name without parentheses and running the command prints the function's definition.

Running import (without parentheses) in the R console will produce the following output:
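
The same console behavior can be tried on any function. As a minimal sketch, the example below defines a hypothetical add_one() function (not part of rio) and inspects it without calling it.

```r
# A hypothetical function used only to illustrate the console behavior.
add_one <- function(x) {
  x + 1
}

# Typing the name without parentheses prints the definition instead of calling it.
add_one

# The arguments and the body can also be inspected programmatically.
arg_names <- names(formals(add_one))
body_text <- deparse(body(add_one))
print(arg_names)
print(body_text)
```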

function (file, format, setclass, ...) 
{
    if (missing(format)) 
        fmt <- get_ext(file)
    else fmt <- tolower(format)
    if (grepl("^http.*://", file)) {
        temp_file <- tempfile(fileext = fmt)
        on.exit(unlink(temp_file))
        curl_download(file, temp_file, mode = "wb")
        file <- temp_file
    }
    x <- switch(fmt, r = dget(file = file), tsv = import.delim(file = file, 
        sep = "\t", ...), txt = import.delim(file = file, sep = "\t", 
        ...), fwf = import.fwf(file = file, ...), rds = readRDS(file = file, 
        ...), csv = import.delim(file = file, sep = ",", ...), 
        csv2 = import.delim(file = file, sep = ";", dec = ",", 
            ...), psv = import.delim(file = file, sep = "|", 
            ...), rdata = import.rdata(file = file, ...), dta = import.dta(file = file, 
            ...), dbf = read.dbf(file = file, ...), dif = read.DIF(file = file, 
            ...), sav = import.sav(file = file, ...), por = read_por(path = file), 
        sas7bdat = read_sas(b7dat = file, ...), xpt = read.xport(file = file), 
        mtp = read.mtp(file = file, ...), syd = read.systat(file = file, 
            to.data.frame = TRUE), json = fromJSON(txt = file, 
            ...), rec = read.epiinfo(file = file, ...), arff = read.arff(file = file), 
        xls = read_excel(path = file, ...), xlsx = import.xlsx(file = file, 
            ...), fortran = import.fortran(file = file, ...), 
        zip = import.zip(file = file, ...), tar = import.tar(file = file, 
            ...), ods = import.ods(file = file, ...), xml = import.xml(file = file, 
            ...), clipboard = import.clipboard(...), gnumeric = stop(stop_for_import(fmt)), 
        jpg = stop(stop_for_import(fmt)), png = stop(stop_for_import(fmt)), 
        bmp = stop(stop_for_import(fmt)), tiff = stop(stop_for_import(fmt)), 
        sss = stop(stop_for_import(fmt)), sdmx = stop(stop_for_import(fmt)), 
        matlab = stop(stop_for_import(fmt)), gexf = stop(stop_for_import(fmt)), 
        npy = stop(stop_for_import(fmt)), stop("Unrecognized file format"))
    if (missing(setclass)) {
        return(set_class(x))
    }
    else {
        a <- list(...)
        if ("data.table" %in% names(a) && isTRUE(a[["data.table"]])) 
            setclass <- "data.table"
        return(set_class(x, class = setclass))
    }
}

As you can see, the first thing the import() function does, when no format argument is supplied, is call the get_ext() function, which retrieves the extension from the filename.

Once the file format is clear, the import() function looks for the right subimport function to be used and returns the result of this function.
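
The extension-based dispatch described above can be sketched in a few lines of base R. This is a simplified illustration, not rio's actual code: import_sketch() is a hypothetical helper, it uses tools::file_ext() in place of rio's internal get_ext(), and it handles only three formats.

```r
# Hypothetical helper, not part of rio: choose a base R reader from the extension.
import_sketch <- function(file, format) {
  # Guess the format from the filename unless the caller supplies one.
  fmt <- if (missing(format)) tolower(tools::file_ext(file)) else tolower(format)
  switch(fmt,
    csv = utils::read.csv(file),
    tsv = utils::read.delim(file),
    rds = readRDS(file),
    stop("Unrecognized file format: ", fmt)
  )
}

# Write the same made-up data in two formats and read both back.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), csv_path, row.names = FALSE)
from_csv <- import_sketch(csv_path)

rds_path <- tempfile(fileext = ".rds")
saveRDS(from_csv, rds_path)
from_rds <- import_sketch(rds_path)
```

The caller uses one function for both files. The switch() statement picks the reader, and an unknown extension falls through to the final stop() call, just as in the printed import() definition.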

Next, we visualize the result with the RStudio viewer. One of the most powerful RStudio tools is the data viewer, which lets you get a spreadsheet-like view of your data.frame objects. With RStudio 0.99, this tool got even more powerful, removing the previous 1000-row limit and adding the ability to filter and sort your data.

When using this viewer, you should be aware that none of the filtering and ordering you do there affects the original data.frame object you are visualizing.
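
Because the viewer leaves the object untouched, any ordering or filtering you want to keep has to be done in code and assigned to a new object. A minimal sketch with a made-up data.frame (gdp_data and its columns are hypothetical):

```r
# Made-up data standing in for an imported dataset.
gdp_data <- data.frame(country = c("A", "B", "C"), gdp = c(3, 1, 2))

# Sorting returns a new data.frame; gdp_data itself is unchanged.
gdp_sorted <- gdp_data[order(gdp_data$gdp), ]

# Filtering also returns a new data.frame.
gdp_filtered <- subset(gdp_data, gdp > 1)

print(gdp_data)
print(gdp_sorted)
print(gdp_filtered)
```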

There's more...

As fully illustrated in the rio vignette (which can be found at https://cran.r-project.org/web/packages/rio/vignettes/rio.html), the following formats are supported for import and export:

Format                                        Import               Export
Tab-separated data (.tsv)                     Yes                  Yes
Comma-separated data (.csv)                   Yes                  Yes
CSVY (CSV + YAML metadata header) (.csvy)     Yes                  Yes
Pipe-separated data (.psv)                    Yes                  Yes
Fixed-width format data (.fwf)                Yes                  Yes
Serialized R objects (.rds)                   Yes                  Yes
Saved R objects (.RData)                      Yes                  Yes
JSON (.json)                                  Yes                  Yes
YAML (.yml)                                   Yes                  Yes
Stata (.dta)                                  Yes                  Yes
SPSS and SPSS portable                        Yes (.sav and .por)  Yes (.sav only)
XBASE database files (.dbf)                   Yes                  Yes
Excel (.xls)                                  Yes
Excel (.xlsx)                                 Yes                  Yes
Weka Attribute-Relation File Format (.arff)   Yes                  Yes
R syntax (.R)                                 Yes                  Yes
Shallow XML documents (.xml)                  Yes                  Yes
SAS (.sas7bdat)                               Yes
SAS XPORT (.xpt)                              Yes
Minitab (.mtp)                                Yes
Epiinfo (.rec)                                Yes
Systat (.syd)                                 Yes
Data Interchange Format (.dif)                Yes
OpenDocument Spreadsheet (.ods)               Yes
Fortran data (no recognized extension)        Yes
Google Sheets                                 Yes
Clipboard (default is .tsv)

Since rio is still a growing package, I strongly suggest that you follow its development on its GitHub repository at https://github.com/leeper/rio, where you will easily find out when new formats are added.
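
For formats that the table lists under both Import and Export, a round trip is a quick way to check that a format behaves as expected. The sketch below uses a made-up data.frame (scores is hypothetical) and the tab-separated format. It calls rio's export() and import() when the package is installed and otherwise falls back to base R.

```r
# Made-up data for the round trip.
scores <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))
tsv_path <- tempfile(fileext = ".tsv")

if (requireNamespace("rio", quietly = TRUE)) {
  # rio picks the writer and the reader from the .tsv extension.
  rio::export(scores, tsv_path)
  round_trip <- rio::import(tsv_path)
} else {
  # Base R fallback for the same tab-separated format.
  write.table(scores, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
  round_trip <- read.delim(tsv_path)
}

print(round_trip)
```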
