Basic plots and the ggplot2 package
This section reviews how to make basic plots using the built-in R functions and the ggplot2 package.
Basic plots in R include histograms and scatterplots. To plot a histogram, we use the hist()
function:
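A minimal sketch, assuming simulated data (the vector x, the seed, and the number of breaks are illustrative choices, not values from the original example):

```r
# Illustrative data: 1,000 draws from a standard normal distribution
set.seed(123)                      # make the draws reproducible
x <- rnorm(1000)

# hist() draws the plot and invisibly returns the binning it used
h <- hist(x, breaks = 20, main = "Histogram of x", xlab = "x")
```

The breaks argument is only a suggestion; R picks "pretty" bin boundaries close to the requested number.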
The output is shown in the following plot:
You can plot mathematical formulas with the plot() function as follows:
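One way to do this, sketched here with an assumed formula (y = x^2) and an assumed grid of x values, is to evaluate the formula on a sequence and pass both vectors to plot():

```r
x <- seq(-3, 3, by = 0.1)   # grid of x values (assumed range)
y <- x^2                    # formula evaluated at each grid point

# type = "l" joins the points with a line instead of drawing symbols
plot(x, y, type = "l", main = "y = x^2")
```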
The output is shown in the following plot:
You can graph a univariate mathematical function on an interval using the curve() function, with the from and to arguments setting the left and right endpoints, respectively. The expr argument takes the name of a function, or an expression written in terms of x, that returns a numeric vector, as follows:
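A minimal sketch of both forms side by side; the endpoints chosen here are assumptions for illustration:

```r
op <- par(mfrow = c(1, 2))   # two panels side by side

# Left panel: cos(x) between -2*pi and 2*pi
c1 <- curve(cos(x), from = -2 * pi, to = 2 * pi)

# Right panel: x^2 between 0 and 10
c2 <- curve(x^2, from = 0, to = 10)

par(op)                      # restore the previous layout
```

curve() invisibly returns the x and y values it evaluated, which is handy for checking the range that was actually drawn.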
In the following figure, the plot on the left shows the curve for cos(x) and the plot on the right shows the curve for x^2. As you can see, the from and to arguments let us specify the range of x values shown in the figure.
You can also graph scatterplots using the plot() function. For example, we can use the iris dataset, which ships with R, to plot Sepal.Length versus Sepal.Width as follows:
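A minimal sketch, with Sepal.Length on the y axis and Sepal.Width on the x axis (the axis labels and title are illustrative):

```r
# iris ships with R, so no loading step is needed
plot(iris$Sepal.Width, iris$Sepal.Length,
     xlab = "Sepal.Width", ylab = "Sepal.Length",
     main = "Sepal.Length versus Sepal.Width")
```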
The output is shown in the following plot:
R has built-in functions that allow you to plot other types of graphics, such as the barplot(), dotchart(), pie(), and boxplot() functions. The following are some examples using the VADeaths dataset:
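As a first sketch, a grouped bar plot of the VADeaths matrix (the title, axis label, and legend placement are illustrative choices):

```r
# VADeaths is a 5 x 4 matrix: age groups in rows, populations in columns
mids <- barplot(VADeaths, beside = TRUE,
                legend.text = rownames(VADeaths),
                args.legend = list(x = "topleft"),
                ylab = "Deaths per 1000",
                main = "Death rates in Virginia (1940)")
```

With beside = TRUE the bars for each age group are drawn next to each other rather than stacked, and barplot() returns the bar midpoints.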
The output is shown in the following plot:
When working with data frames, however, it is often much simpler to use the ggplot2 package to make a bar plot, since your data does not have to be converted to a vector or matrix first. Be aware, though, that ggplot2 often requires your data to be stored in a data frame in long format rather than wide format.
The following is an example of data stored in wide format. In this example, we look at the expression level of the MYC and BRCA2 genes in two different cell lines, after these cells were treated with a vehicle-control, drug1, or drug2 for 48 hours:
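A sketch of such a wide data frame follows. The expression values, the cell line labels (CL1, CL2), and the column names are hypothetical placeholders, not measurements from the original example:

```r
# Wide format: one row per cell line and gene, one column per treatment.
# All numeric values below are made up for illustration.
geneExpdata.wide <- data.frame(
  cell_line = c("CL1", "CL1", "CL2", "CL2"),
  gene      = c("MYC", "BRCA2", "MYC", "BRCA2"),
  control   = c(10.1, 8.2, 9.8, 7.9),
  drug1     = c(6.3, 8.0, 5.9, 8.1),
  drug2     = c(9.7, 4.1, 10.2, 3.8)
)
print(geneExpdata.wide)
```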
The following is the data rewritten in long format:
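In long format, each measurement gets its own row and the treatment becomes a column. The sketch below uses hypothetical placeholder values and assumed column names (condition, gene_expr_value):

```r
# Long format: one row per cell line, gene, and treatment.
# All numeric values below are made up for illustration.
geneExpdata.long <- data.frame(
  cell_line = rep(c("CL1", "CL1", "CL2", "CL2"), times = 3),
  gene      = rep(c("MYC", "BRCA2", "MYC", "BRCA2"), times = 3),
  condition = rep(c("control", "drug1", "drug2"), each = 4),
  gene_expr_value = c(10.1, 8.2, 9.8, 7.9,    # control
                      6.3, 8.0, 5.9, 8.1,     # drug1
                      9.7, 4.1, 10.2, 3.8)    # drug2
)
print(geneExpdata.long)
```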
Instead of rewriting the data frame by hand, this process can be automated using the melt() function, which is part of the reshape2 package:
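A minimal sketch, assuming the reshape2 package is installed; the wide data frame is rebuilt here with hypothetical placeholder values so that the block runs on its own:

```r
library(reshape2)

# Hypothetical wide-format data (values are made up for illustration)
geneExpdata.wide <- data.frame(
  cell_line = c("CL1", "CL1", "CL2", "CL2"),
  gene      = c("MYC", "BRCA2", "MYC", "BRCA2"),
  control   = c(10.1, 8.2, 9.8, 7.9),
  drug1     = c(6.3, 8.0, 5.9, 8.1),
  drug2     = c(9.7, 4.1, 10.2, 3.8)
)

# id.vars stay as columns; every other column is stacked into
# a (condition, gene_expr_value) pair
geneExpdata.long <- melt(geneExpdata.wide,
                         id.vars = c("cell_line", "gene"),
                         variable.name = "condition",
                         value.name = "gene_expr_value")
```

The variable.name and value.name arguments only rename the two new columns; leaving them out gives the defaults variable and value.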
Now, we can plot the data using ggplot2 as follows:
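One possible layout, sketched under the assumption that ggplot2 is installed: genes on the x axis, bars filled by condition, and one panel per cell line. The data values and column names are hypothetical placeholders:

```r
library(ggplot2)

# Hypothetical long-format data (values are made up for illustration)
geneExpdata.long <- data.frame(
  cell_line = rep(c("CL1", "CL1", "CL2", "CL2"), times = 3),
  gene      = rep(c("MYC", "BRCA2", "MYC", "BRCA2"), times = 3),
  condition = rep(c("control", "drug1", "drug2"), each = 4),
  gene_expr_value = c(10.1, 8.2, 9.8, 7.9, 6.3, 8.0, 5.9, 8.1, 9.7, 4.1, 10.2, 3.8)
)

# stat = "identity" plots the values as given instead of counting rows;
# position = "dodge" places the conditions next to each other
p <- ggplot(geneExpdata.long,
            aes(x = gene, y = gene_expr_value, fill = condition)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ cell_line)
print(p)
```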
The output is shown in the following plot:
Another useful trick to know is how to add error bars to bar plots. Here, we have a summary data frame of standard deviation (sd), standard error (se), and confidence interval (ci) for the geneExpdata.long dataset as follows:
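A sketch of one way to build such a summary and draw the error bars, assuming ggplot2 is installed. The data values are hypothetical placeholders, the summary is computed per gene and condition across the two cell lines, and the name summ is an illustrative choice:

```r
library(ggplot2)

# Hypothetical long-format data (values are made up for illustration)
geneExpdata.long <- data.frame(
  cell_line = rep(c("CL1", "CL1", "CL2", "CL2"), times = 3),
  gene      = rep(c("MYC", "BRCA2", "MYC", "BRCA2"), times = 3),
  condition = rep(c("control", "drug1", "drug2"), each = 4),
  gene_expr_value = c(10.1, 8.2, 9.8, 7.9, 6.3, 8.0, 5.9, 8.1, 9.7, 4.1, 10.2, 3.8)
)

# Mean, sd, and n for each gene and condition
summ <- aggregate(gene_expr_value ~ gene + condition,
                  data = geneExpdata.long, FUN = mean)
names(summ)[3] <- "mean"
summ$sd <- aggregate(gene_expr_value ~ gene + condition,
                     data = geneExpdata.long, FUN = sd)$gene_expr_value
summ$n  <- aggregate(gene_expr_value ~ gene + condition,
                     data = geneExpdata.long, FUN = length)$gene_expr_value
summ$se <- summ$sd / sqrt(summ$n)             # standard error of the mean
summ$ci <- summ$se * qt(0.975, summ$n - 1)    # half-width of a 95% CI

# Bars show the mean; error bars show mean +/- one standard error
dodge <- position_dodge(width = 0.9)
p <- ggplot(summ, aes(x = gene, y = mean, fill = condition)) +
  geom_bar(stat = "identity", position = dodge) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.2, position = dodge)
print(p)
```

Using the same position_dodge() object for both layers keeps each error bar centered on its bar.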
The result is shown in the following plot:
Going back to the VADeaths example, we could also plot a Cleveland dot plot (dot chart) as follows:
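A minimal sketch using the built-in dotchart() function (the axis limits, label, and title are illustrative choices):

```r
# Passing the matrix directly groups the dots by column (population)
# and labels them by row (age group)
dotchart(VADeaths, xlim = c(0, 100),
         xlab = "Deaths per 1000",
         main = "Death rates in Virginia (1940)")
```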
Note
Note that the built-in dotchart() function requires that the data be stored as a vector or matrix.
The result is shown in the following plot:
The following are some other graphics you can generate with built-in R functions:
You can generate pie charts with the pie() function as follows:
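A minimal sketch, assuming we chart a single column of VADeaths (the choice of the Rural Male column and the name rural_male are illustrative):

```r
# pie() expects a vector of non-negative values; names become slice labels
rural_male <- VADeaths[, "Rural Male"]
pie(rural_male, main = "Death rates, rural males, by age group")
```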
You can generate box-and-whisker plots with the boxplot() function as follows:
Note
Note that unlike barplot(), dotchart(), and pie(), the boxplot() function accepts data frames as input.
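As a sketch of the formula interface with a data frame, the following plots MYC expression by condition across cell lines. The data values and column names are hypothetical placeholders, and myc is an illustrative name:

```r
# Hypothetical long-format data (values are made up for illustration)
geneExpdata.long <- data.frame(
  cell_line = rep(c("CL1", "CL1", "CL2", "CL2"), times = 3),
  gene      = rep(c("MYC", "BRCA2", "MYC", "BRCA2"), times = 3),
  condition = rep(c("control", "drug1", "drug2"), each = 4),
  gene_expr_value = c(10.1, 8.2, 9.8, 7.9, 6.3, 8.0, 5.9, 8.1, 9.7, 4.1, 10.2, 3.8)
)

# Keep only the MYC rows, then draw one box per condition
myc <- subset(geneExpdata.long, gene == "MYC")
b <- boxplot(gene_expr_value ~ condition, data = myc,
             xlab = "Condition", ylab = "MYC expression")
```

The formula reads "expression grouped by condition", and the data argument tells boxplot() where to look up both columns.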
Using our cell line drug treatment experiment, we can graph MYC expression for all cell lines by condition. The result is shown in the following plot:
The following is another example using the iris dataset to plot Petal.Width by Species:
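A minimal sketch using the formula interface (the axis labels are illustrative):

```r
# One box per level of Species; iris ships with R
b <- boxplot(Petal.Width ~ Species, data = iris,
             xlab = "Species", ylab = "Petal.Width")
```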
The result is shown in the following plot: