Histograms and density plots
Histograms are plots used to explore how one or more quantitative variables are distributed. To show some examples of histograms, we will use the iris data. This dataset contains the length and width of the sepal and petal, measured in centimetres, for 50 flowers from each of three species of iris: Iris setosa, Iris versicolor, and Iris virginica. You can get more details by running ?iris.
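Before plotting, it can help to confirm the structure described above. The following is a minimal sketch that uses only base R; the object name species_counts is introduced here purely for illustration.

```r
# iris ships with base R (the datasets package), so nothing needs installing
data(iris)

# Four numeric measurement columns plus the Species factor
str(iris)

# Number of flowers recorded for each species
species_counts <- table(iris$Species)
print(species_counts)

# Quick numeric overview of one variable before plotting it
summary(iris$Petal.Length)
```

The summary gives a first numeric impression of the variable, which the plots in this section then show graphically.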
To produce a histogram, simply specify geom="histogram" in the qplot function. This default histogram represents the variable specified in the function on the x axis, while the y axis represents the number of elements in each bin. Another very useful way of representing a distribution is the kernel density function, which approximates the distribution of the data as a continuous function instead of discrete bins, by estimating the probability...
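As a rough illustration of the two representations, the sketch below plots the petal length from iris with qplot. It assumes the ggplot2 package is installed. The choice of Petal.Length, the binwidth value, and the object names p_hist, p_hist_bw and p_dens are illustrative assumptions rather than anything prescribed by the text. Note also that recent ggplot2 releases mark qplot as deprecated, so a warning may appear.

```r
library(ggplot2)

# Default histogram: the variable on the x axis, bin counts on the y axis
p_hist <- qplot(Petal.Length, data = iris, geom = "histogram")

# Same plot with an explicit bin width (0.25 cm is an arbitrary choice)
p_hist_bw <- qplot(Petal.Length, data = iris, geom = "histogram",
                   binwidth = 0.25)

# Kernel density estimate: a continuous curve instead of bins
p_dens <- qplot(Petal.Length, data = iris, geom = "density")

# Printing a plot object draws it on the active graphics device
print(p_hist_bw)
```

Comparing p_hist_bw with p_dens shows how both views describe the same data: the histogram reports counts per bin, while the density curve is scaled so that the area under it equals one.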