Visualization is a powerful way to communicate information through graphical means. Visual presentations make data easier to comprehend. This recipe presents some basic functions to plot charts, and demonstrates how visualizations are helpful in data exploration.
Visualizing data
Getting ready
Ensure that you have completed the previous recipes by installing R on your operating system.
How to do it...
Perform the following steps to visualize a dataset:
- Load the iris data into the R session:
> data(iris)
- Calculate the frequency of species within the iris using the table command:
> table.iris = table(iris$Species) > table.iris Output:
setosa versicolor virginica 50 50 50
- As the frequency in the table shows, each species represents 1/3 of the iris data. We can draw a simple pie chart to represent the distribution of species within the iris:
> pie(table.iris) Output:
- The histogram creates a frequency plot of sorts along the x-axis. The following example produces a histogram of the sepal length:
> hist(iris$Sepal.Length)
- In the histogram, the x-axis presents the sepal length and the y-axis presents the count for different sepal lengths. The histogram shows that for most irises, sepal lengths range from 4 cm to 8 cm.
- Boxplots, also named box and whisker graphs, allow you to convey a lot of information in one simple plot. In such a graph, the line represents the median of the sample. The box itself shows the upper and lower quartiles. The whiskers show the range:
> boxplot(Petal.Width ~ Species, data = iris)
- The preceding screenshot clearly shows the median and upper range of the petal width of the setosa is much shorter than versicolor and virginica. Therefore, the petal width can be used as a substantial attribute to distinguish iris species.
- A scatter plot is used when there are two variables to plot against one another. This example plots the petal length against the petal width and color dots in accordance to the species it belongs to:
> plot(x=iris$Petal.Length, y=iris$Petal.Width, col=iris$Species)
- The preceding screenshot is a scatter plot of the petal length against the petal width. As there are four attributes within the iris dataset, it takes six operations to plot all combinations. However, R provides a function named pairs, which can generate each subplot in one figure:
> pairs(iris[1:4], main = "Edgar Anderson's Iris Data", pch = 21,
bg = c("red", "green3", "blue")[unclass(iris$Species)])
How it works...
R provides many built-in plot functions, which enable users to visualize data with different kinds of plots. This recipe demonstrates the use of pie charts that can present category distribution. A pie chart of an equal size shows that the number of each species is equal. A histogram plots the frequency of different sepal lengths. A box plot can convey a great deal of descriptive statistics, and shows that the petal width can be used to distinguish an iris species. Lastly, we introduced scatter plots, which plot variables on a single plot. In order to quickly generate a scatter plot containing all the pairs of iris dataset, one may use the pairs command.
See also
- ggplot2 is another plotting system for R, based on the implementation of Leland Wilkinson's grammar of graphics. It allows users to add, remove, or alter components in a plot with a higher abstraction. However, the level of abstraction results is slow compared to lattice graphics. For those of you interested in the topic of ggplot, you can refer to this site: http://ggplot2.org/.