Mastering Scientific Computing with R

Employ professional quantitative methods to answer scientific questions with a powerful open source data analysis environment

Product type: Paperback
Published: Jan 2015
ISBN-13: 9781783555253
Length: 432 pages
Edition: 1st Edition

Basic plots and the ggplot2 package

This section reviews how to make basic plots using built-in R functions and the ggplot2 package.

Basic plots in R include histograms and scatterplots. To plot a histogram, we use the hist() function:

> x <- c(5, 7, 12, 15, 35, 9, 5, 17, 24, 27, 16, 32)
> hist(x) 

The output is shown in the following plot:

[Figure: histogram of x]
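
By default, hist() chooses the bin boundaries for you. As a minimal sketch, the breaks argument can set them explicitly, and plot=FALSE returns the computed bins without drawing anything. The bin width of 10 used here is an arbitrary choice for illustration.

```r
x <- c(5, 7, 12, 15, 35, 9, 5, 17, 24, 27, 16, 32)

# Draw the histogram with explicit bin boundaries at 0, 10, 20, 30, 40
hist(x, breaks = seq(0, 40, by = 10), main = "Histogram of x", xlab = "x")

# Compute the same bins without plotting, to inspect the counts
h <- hist(x, breaks = seq(0, 40, by = 10), plot = FALSE)
h$counts  # 4 4 2 2
```

Each bin is closed on the right, so a value of 10 would fall in the first bin rather than the second.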

You can plot a mathematical formula by evaluating it over a sequence of x values and passing the result to the plot() function as follows:

> x <- seq(2, 25, by=1)
> y <- x^2 +3
> plot(x, y)

The output is shown in the following plot:

[Figure: plot of y = x^2 + 3 for x from 2 to 25]

You can graph a univariate mathematical function on an interval using the curve() function, with the from and to arguments setting the left and right endpoints, respectively. The expr argument takes either an expression written in terms of x or the name of a function that returns a numeric vector, as follows:

> # Place two plots side by side in one figure
> par(mfrow=c(1,2))
> curve(expr=cos(x), from=0, to=8*pi)
> curve(expr=x^2, from=0, to=32)

In the following figure, the left plot shows the curve for cos(x) and the right plot shows the curve for x^2. As you can see, the from and to arguments specify the range of x values shown in each plot.

[Figure: curves for cos(x) (left) and x^2 (right)]
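
The expr argument of curve() can also be the name of a function, and add=TRUE overlays a curve on the current plot. The following is a small sketch of both; the line type and the object name pts are choices made here for illustration.

```r
# Return to one plot per figure
par(mfrow = c(1, 1))

# Pass a function by name instead of an expression in x
curve(sin, from = 0, to = 2 * pi, ylab = "y")

# Overlay a second curve, written as an expression in x, on the same axes
curve(cos(x), from = 0, to = 2 * pi, add = TRUE, lty = 2)

# curve() invisibly returns the points it evaluated (101 by default)
pts <- curve(x^2 + 3, from = 2, to = 25)
length(pts$x)
```

The last call draws the same formula as the earlier plot(x, y) example, but as a smooth line rather than individual points.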

You can also graph scatterplots using the plot() function. For example, we can use the iris dataset, which is included with R, to plot Sepal.Width against Sepal.Length as follows:

> plot(iris$Sepal.Length, iris$Sepal.Width, main="Iris sepal length vs width measurements", xlab="Length", ylab="Width")

The output is shown in the following plot:

[Figure: scatterplot of iris sepal width against sepal length]
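
The plot() function also accepts a formula together with a data argument, which avoids repeating iris$ for every column. As an illustrative sketch, the points below are additionally colored by Species; the colors and the name species_cols are arbitrary choices, not part of the original example.

```r
# Arbitrary colors, one per species level
species_cols <- c("black", "red", "blue")

# y ~ x formula: Sepal.Width on the y axis, Sepal.Length on the x axis
plot(Sepal.Width ~ Sepal.Length, data = iris,
     col = species_cols[as.integer(iris$Species)],
     main = "Iris sepal length vs width measurements",
     xlab = "Length", ylab = "Width")

# Add a legend mapping each color to a species name
legend("topright", legend = levels(iris$Species),
       col = species_cols, pch = 1)
```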

R also has built-in functions for other types of graphics, such as barplot(), dotchart(), pie(), and boxplot(). The following examples use the VADeaths dataset:

> VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
> # barplot() requires the data to be a vector or a matrix
> barplot(VADeaths, beside=TRUE, legend=TRUE, ylim=c(0, 100), ylab="Deaths per 1000 population", main="Death rate in VA")

The output is shown in the following plot:

[Figure: grouped bar plot of VADeaths, titled "Death rate in VA"]

However, when working with data frames, it is often much simpler to use the ggplot2 package to make a bar plot, since your data does not have to be converted to a vector or matrix first. Be aware, though, that ggplot2 often requires your data to be stored in a data frame in long format rather than wide format.

The following is an example of data stored in wide format. In this example, we look at the expression level of the MYC and BRCA2 genes in two different cell lines after the cells were treated with a vehicle control, drug1, or drug2 for 48 hours:

> geneExpdata.wide <- read.table(header=TRUE, text='
 cell_line  gene control drug1 drug2
       CL1   MYC    20.4  15.9   1.5
       CL2   MYC    26.9  18.1   6.7
       CL1 BRCA2   109.5  18.1  89.8
       CL2 BRCA2   121.3  24.4 120.2
 ')
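
For comparison, plotting this wide table with the built-in barplot() function means pulling the numeric columns into a matrix first. The following is a minimal sketch of that conversion; the object name expr_matrix and the row labels are constructed here for illustration.

```r
geneExpdata.wide <- read.table(header = TRUE, text = '
 cell_line  gene control drug1 drug2
       CL1   MYC    20.4  15.9   1.5
       CL2   MYC    26.9  18.1   6.7
       CL1 BRCA2   109.5  18.1  89.8
       CL2 BRCA2   121.3  24.4 120.2
')

# Keep only the numeric columns and convert them to a matrix
expr_matrix <- as.matrix(geneExpdata.wide[, c("control", "drug1", "drug2")])

# Label each row by gene and cell line (labels made up for this sketch)
rownames(expr_matrix) <- paste(geneExpdata.wide$gene,
                               geneExpdata.wide$cell_line, sep = "_")

# Transpose so each gene/cell line forms a group with one bar per condition
barplot(t(expr_matrix), beside = TRUE, legend.text = TRUE,
        ylab = "Expression value")
```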

The following is the data rewritten in long format:

> geneExpdata.long <- read.table(header=TRUE, text='
   cell_line  gene variable value
1        CL1   MYC  control  20.4
2        CL2   MYC  control  26.9
3        CL1 BRCA2  control 109.5
4        CL2 BRCA2  control 121.3
5        CL1   MYC    drug1  15.9
6        CL2   MYC    drug1  18.1
7        CL1 BRCA2    drug1  18.1
8        CL2 BRCA2    drug1  24.4
9        CL1   MYC    drug2   1.5
10       CL2   MYC    drug2   6.7
11       CL1 BRCA2    drug2  89.8
12       CL2 BRCA2    drug2 120.2
')

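Instead of rewriting the data frame by hand, you can automate this process with the melt() function from the reshape2 package. Here, the variable.name and value.name arguments rename the default variable and value columns to condition and gene_expr_value: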

> library("reshape2")
> geneExpdata.long <- melt(geneExpdata.wide, id.vars=c("cell_line", "gene"), measure.vars=c("control", "drug1", "drug2"), variable.name="condition", value.name="gene_expr_value")
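
If reshape2 is not available, base R's reshape() function can do a similar wide-to-long conversion. This is a sketch of an alternative, not the method used in this section; the object name geneExpdata.long2 is hypothetical, and unlike melt(), the condition column comes back as a character vector rather than a factor.

```r
geneExpdata.wide <- read.table(header = TRUE, text = '
 cell_line  gene control drug1 drug2
       CL1   MYC    20.4  15.9   1.5
       CL2   MYC    26.9  18.1   6.7
       CL1 BRCA2   109.5  18.1  89.8
       CL2 BRCA2   121.3  24.4 120.2
')

# Columns that hold measurements, one per treatment condition
conditions <- c("control", "drug1", "drug2")

# Stack the measurement columns into a single gene_expr_value column,
# recording the source column name in a new condition column
geneExpdata.long2 <- reshape(geneExpdata.wide, direction = "long",
                             varying = conditions,
                             v.names = "gene_expr_value",
                             timevar = "condition",
                             times = conditions,
                             idvar = c("cell_line", "gene"))

# Drop the composite row names that reshape() generates
rownames(geneExpdata.long2) <- NULL
geneExpdata.long2
```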

Now, we can plot the data using ggplot2 as follows:

> library("ggplot2")
> ggplot(geneExpdata.long, aes(x=gene, y=gene_expr_value)) + geom_bar(aes(fill=condition), colour="black", position=position_dodge(), stat="identity")

The output is shown in the following plot:

[Figure: bar plot of gene expression by gene and condition]

Another useful trick is adding error bars to bar plots. Here, we have a summary data frame that gives, for each gene and condition in the geneExpdata.long dataset, the number of observations (N), the mean expression value (gene_expr_value), the standard deviation (sd), the standard error (se), and the confidence interval (ci):

> geneExpdata.summary <- read.table(header=TRUE, text='
   gene condition N gene_expr_value        sd    se        ci
1 BRCA2   control 2          115.40  8.343860  5.90  74.96661
2 BRCA2     drug1 2           21.25  4.454773  3.15  40.02454
3 BRCA2     drug2 2          105.00 21.496046 15.20 193.13431
4   MYC   control 2           23.65  4.596194  3.25  41.29517
5   MYC     drug1 2           17.00  1.555635  1.10  13.97683
6   MYC     drug2 2            4.10  3.676955  2.60  33.03613
')
> # Store the plot in the p object
> p <- ggplot(geneExpdata.summary, aes(x=gene, y=gene_expr_value, fill=condition)) + geom_bar(colour="black", position=position_dodge(), stat="identity")
> # Define the upper and lower limits for the error bars
> limits <- aes(ymax=gene_expr_value + se, ymin=gene_expr_value - se)
> # Add the error bars to the plot
> p + geom_errorbar(limits, position=position_dodge(0.9), size=.3, width=.2)

The result is shown in the following plot:

[Figure: bar plot of gene expression with standard error bars]
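
The text does not show how the summary table was computed. Its values are consistent with se = sd / sqrt(N) and a ci equal to the half-width of a 95% t-based confidence interval, se * qt(0.975, N - 1). The following base R sketch reproduces the table under that assumption; the names summarise_group and summary_check are hypothetical.

```r
geneExpdata.long <- read.table(header = TRUE, text = '
 cell_line  gene condition gene_expr_value
       CL1   MYC   control            20.4
       CL2   MYC   control            26.9
       CL1 BRCA2   control           109.5
       CL2 BRCA2   control           121.3
       CL1   MYC     drug1            15.9
       CL2   MYC     drug1            18.1
       CL1 BRCA2     drug1            18.1
       CL2 BRCA2     drug1            24.4
       CL1   MYC     drug2             1.5
       CL2   MYC     drug2             6.7
       CL1 BRCA2     drug2            89.8
       CL2 BRCA2     drug2           120.2
')

# Per-group size, mean, sd, standard error, and assumed 95% CI half-width
summarise_group <- function(v) {
  n  <- length(v)
  se <- sd(v) / sqrt(n)
  c(N = n, mean = mean(v), sd = sd(v), se = se,
    ci = se * qt(0.975, df = n - 1))
}

agg <- aggregate(gene_expr_value ~ gene + condition,
                 data = geneExpdata.long, FUN = summarise_group)

# aggregate() stores the five statistics in one matrix column; flatten it
summary_check <- do.call(data.frame, agg)
names(summary_check) <- c("gene", "condition", "N",
                          "gene_expr_value", "sd", "se", "ci")
summary_check
```

With only two observations per group, the t quantile is large (about 12.7), which is why the ci values are so much wider than the se values used for the error bars.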

Going back to the VADeaths example, we could also plot a Cleveland dot plot (dot chart) as follows:

> dotchart(VADeaths, xlim=c(0, 75), xlab="Deaths per 1000", main="Death rates in VA")

Note

Note that the built-in dotchart() function requires that the data be stored as a vector or matrix.

The result is shown in the following plot:

[Figure: dot chart of VADeaths, titled "Death rates in VA"]
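
The vector case works the same way. As a brief sketch, a single column of VADeaths can be extracted as a named numeric vector and passed to dotchart(); the object name rural_male and the plot title are chosen here for illustration.

```r
# Extract one column of the VADeaths matrix as a named numeric vector
rural_male <- VADeaths[, "Rural Male"]

# The vector names (age groups) become the labels on the y axis
dotchart(rural_male, xlim = c(0, 75), xlab = "Deaths per 1000",
         main = "Death rates in VA: rural males")
```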

The following are some other graphics you can generate with built-in R functions:

You can generate pie charts with the pie() function as follows:

> labels <- c("grp_A", "grp_B", "grp_C")
> pie_groups <- c(12, 26, 62) 
> pie(pie_groups, labels, col=c("white", "black", "grey"))
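
Pie slices are often easier to read when each label includes its share of the total. The following is a small sketch of building such labels; the names pct and pct_labels are hypothetical.

```r
labels <- c("grp_A", "grp_B", "grp_C")
pie_groups <- c(12, 26, 62)

# Convert each group size to a rounded percentage of the total
pct <- round(100 * pie_groups / sum(pie_groups))

# Combine the group names and percentages into slice labels
pct_labels <- paste0(labels, " (", pct, "%)")
pct_labels  # "grp_A (12%)" "grp_B (26%)" "grp_C (62%)"

pie(pie_groups, labels = pct_labels, col = c("white", "black", "grey"))
```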

You can generate box-and-whisker plots with the boxplot() function as follows:

> boxplot(gene_expr_value ~ condition, data=geneExpdata.long, subset=gene == "MYC", ylab="expression value", main="MYC Expression by Condition", cex.lab=1.5, cex.main=1.5)

Note

Note that unlike the barplot() and dotchart() calls shown earlier, boxplot() is used here with a formula and a data frame supplied through the data argument, so no conversion to a vector or matrix is needed.

Using our cell line drug treatment experiment, we can graph MYC expression for all cell lines by condition. The result is shown in the following plot:

[Figure: box plot of MYC expression by condition]

The following is another example using the iris dataset to plot Petal.Width by Species:

> boxplot(Petal.Width ~ Species, data=iris, ylab="petal width", cex.lab=1.5, cex.main=1.5)

The result is shown in the following plot:

[Figure: box plot of iris petal width by species]
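
The numbers behind a box plot can be inspected directly, because boxplot() returns them and plot=FALSE suppresses the drawing. The following is a minimal sketch; the row labels are added here for readability and are not part of the returned object.

```r
# Compute the box plot statistics without drawing the plot
bp <- boxplot(Petal.Width ~ Species, data = iris, plot = FALSE)

# Each column of bp$stats describes one species, from bottom to top of the box
stats <- bp$stats
colnames(stats) <- bp$names
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")
stats

bp$n    # number of observations per species
bp$out  # values drawn as individual outlier points
```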