ggplot2 Essentials

You're reading from ggplot2 Essentials: Explore the full range of ggplot2 plotting capabilities to create meaningful and spectacular graphs

Product type Paperback
Published in Jun 2015
ISBN-13 9781785283529
Length 234 pages
Edition 1st Edition
Author: Donato Teutonico

Bar charts

Bar charts are usually used to explore how one (or more) categorical variables are distributed. In qplot(), this is done using the bar geom (geom="bar"). This geometry counts the number of occurrences of each level of a factor variable in the data. To show an example of a bar chart, we will use the movies dataset, which is included in the ggplot2 package. We have already seen how to load the datasets included with the basic installation of R, but if you are interested in the list of datasets within a specific package (ggplot2 in this case), you can use the following code:

require(ggplot2)         ## Load ggplot2 if needed
data(package="ggplot2")  ## List the datasets within ggplot2
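If you want the dataset names as a character vector rather than the printed listing, you can index the object that data() returns. The helper below, list_datasets (a hypothetical name), is a minimal sketch. It is demonstrated on R's always-installed datasets package so that it runs anywhere; list_datasets("ggplot2") should work the same way once ggplot2 is installed.

```r
## Hypothetical helper: names of the datasets shipped with an installed package
list_datasets <- function(pkg) {
  info <- data(package = pkg)   # object of class "packageIQR"
  info$results[, "Item"]        # the "Item" column holds the dataset names
}

## The base "datasets" package is always available, so it makes a safe demo
base_sets <- list_datasets("datasets")
print(head(base_sets))
```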

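The movies dataset contains information about movies from the http://imdb.com/ website, including their ratings. You can find a more detailed description on the help page of the dataset. Note that from ggplot2 version 2.0.0 onward, this dataset has been moved to the separate ggplot2movies package.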

This dataset contains many variables, but we will not need all of them for our example, so let's rearrange some of its content. For our exercise, we are first interested in knowing how many movies were produced in each category: Action, Animation, Comedy, Drama, Documentary, and Romance. We will also keep the budget of each movie, whether it is a short or regular movie, and its year. So, the steps covered in our code are:

  1. Load the data.
  2. Extract from the dataset the budget, length (short or regular), and year of the movies of each type.
  3. Create a factor variable containing the movie type.

The columns of our final dataset, called myMovieData, will then be Budget, Short, Year, and Type. So, here's our code:

## One subset per movie type, keeping budget, Short, and year
## (with ggplot2 >= 2.0.0, run library(ggplot2movies) first to get movies)
d1 <- data.frame(movies[movies$Action == 1, c("budget", "Short", "year")])
d1$Type <- "Action"
d2 <- data.frame(movies[movies$Animation == 1, c("budget", "Short", "year")])
d2$Type <- "Animation"
d3 <- data.frame(movies[movies$Comedy == 1, c("budget", "Short", "year")])
d3$Type <- "Comedy"
d4 <- data.frame(movies[movies$Drama == 1, c("budget", "Short", "year")])
d4$Type <- "Drama"
d5 <- data.frame(movies[movies$Documentary == 1, c("budget", "Short", "year")])
d5$Type <- "Documentary"
d6 <- data.frame(movies[movies$Romance == 1, c("budget", "Short", "year")])
d6$Type <- "Romance"
## Stack the six subsets and give the columns their final names
myMovieData <- rbind(d1, d2, d3, d4, d5, d6)
names(myMovieData) <- c("Budget", "Short", "Year", "Type")
## Store the movie type as a factor, with one level per type
myMovieData$Type <- factor(myMovieData$Type)
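The six near-identical blocks above can also be written as a loop over the type names. The sketch below assumes a small, made-up stand-in for movies (movies_toy, hypothetical) with the same column names, so it runs without the real dataset. With the real data you would use movies in its place. As in the original code, a movie tagged with several genres appears once for each of its genres. Passing levels = types keeps the categories in the listed order instead of the alphabetical order that factor() uses by default.

```r
## Made-up stand-in for movies: same column names, 0/1 genre indicators
movies_toy <- data.frame(
  budget      = c(1e6, NA, 5e5, 2e7, NA, 3e6),
  Short       = c(0L, 1L, 0L, 0L, 1L, 0L),
  year        = c(1999L, 2001L, 1985L, 2003L, 1970L, 1995L),
  Action      = c(1L, 0L, 0L, 1L, 0L, 0L),
  Animation   = c(0L, 1L, 0L, 0L, 1L, 0L),
  Comedy      = c(0L, 1L, 1L, 0L, 0L, 1L),
  Drama       = c(1L, 0L, 0L, 0L, 0L, 1L),
  Documentary = c(0L, 0L, 0L, 0L, 1L, 0L),
  Romance     = c(0L, 0L, 1L, 0L, 0L, 1L)
)
types <- c("Action", "Animation", "Comedy", "Drama", "Documentary", "Romance")

## One subset per type, tagged with the type name (rep() also handles 0 rows)
pieces <- lapply(types, function(tp) {
  sub <- movies_toy[movies_toy[[tp]] == 1, c("budget", "Short", "year")]
  sub$Type <- rep(tp, nrow(sub))
  sub
})

## Stack the subsets, rename the columns, and make Type a factor
toyMovieData <- do.call(rbind, pieces)
names(toyMovieData) <- c("Budget", "Short", "Year", "Type")
toyMovieData$Type <- factor(toyMovieData$Type, levels = types)
print(table(toyMovieData$Type))
```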

Now that our data is ready, let's create our first bar chart. We use the same qplot() structure as for the other plots and change only the geom specification:

qplot(Type, data=myMovieData, geom="bar", fill=Type)

This standard bar chart draws one bar for each movie type, and the height of each bar is the number of movies of that type. Because we also assigned the fill aesthetic to the same Type variable, each bar is drawn in a different color. The resulting plot is shown in Figure 2.5:


Figure 2.5: This shows a bar chart of the different movie types
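Internally, the bar geometry counts the rows for each level of the x variable. Here is a minimal sketch of the same count using base R's table(), on a few made-up Type values that are not taken from movies:

```r
## Made-up movie types, for illustration only
Type <- factor(c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"))

## One count per factor level: these are the bar heights that geom="bar" draws
bar_heights <- table(Type)
print(bar_heights)
```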

In the plot we just created, the bars are colored differently depending on the movie type. However, we can use the fill argument in a more useful way. If we base the colors on the values of a second variable, we add more information to the plot. In our simple example, we can split each bar according to how many of its movies are short and how many are regular. To do this, we assign the Short column to the fill argument, as shown in the following code:

qplot(Type, data=myMovieData, geom="bar", fill=factor(Short))

The result is shown in Figure 2.6. Each bar is now split into the counts of short and regular movies, and together these add up to the total number of movies of that type.


Figure 2.6: This shows a bar chart of the different movie types with filling split by movie length
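When the fill variable splits the bars, the counts become a two-way table. The sketch below uses made-up values to cross-tabulate Type against Short. Each row holds the stacked segments of one bar, and the row sums give back the heights of the unsplit bars.

```r
## Made-up values, for illustration only
Type  <- factor(c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"))
Short <- c(0L, 0L, 1L, 0L, 1L, 0L)   # 0 = regular movie, 1 = short movie

## Rows: movie types; columns: values of Short (the stacked segments)
segments <- table(Type, Short)
print(segments)

## Summing the segments of each bar gives the totals per movie type
print(rowSums(segments))
```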

As you probably noticed, in this last example we converted the Short variable to a factor when we assigned it to the fill argument. In the previous example, we did not convert the Type variable. The reason is that, to split the bars, the fill aesthetic needs a discrete variable with distinct levels, and each level is then assigned a different color. The Type variable of the previous example was already a factor, where each level represents a movie type. The Short variable, on the other hand, is numeric: 0 for regular movies and 1 for short movies. For this reason, we had to convert it to a factor first, so that qplot could treat it as a discrete variable with two levels. We will discuss the assignment of discrete and continuous variables in more detail in Chapter 4, Advanced Plotting Techniques. You can check the class of the two columns with the following code:

> class(myMovieData$Short)
[1] "integer"
> class(myMovieData$Type)
[1] "factor"
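Here is a small sketch of the conversion itself, on made-up 0/1 codes like those in the Short column. The "Regular"/"Short" labels in the last step are an illustrative choice, not part of the original example. They show how you could get a more readable legend than "0" and "1".

```r
## Made-up integer codes, as stored in the Short column
Short <- c(0L, 1L, 0L, 0L, 1L)
print(class(Short))    # "integer": a numeric, not a discrete, variable

## factor() turns the codes into two discrete levels, "0" and "1"
ShortF <- factor(Short)
print(levels(ShortF))

## Optional: readable level names (chosen here for illustration)
ShortL <- factor(Short, levels = c(0, 1), labels = c("Regular", "Short"))
print(table(ShortL))
```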

One last thing to mention about bar charts is the position argument of the qplot function. This argument defines how the bars are arranged within the chart. The three main options are stack, dodge, and fill. The stack option puts the bars with the same x value on top of each other. The dodge option places the bars with the same x value next to each other. The fill option also stacks the bars, but it rescales each stack to a total height of 1, so the segments show proportions rather than counts. The following code applies each position adjustment to our last example:

qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="stack")
qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="dodge")
qplot(Type, data=myMovieData, geom="bar", fill=factor(Short), position="fill")

Figure 2.7 shows you the resulting plot for each option:


Figure 2.7: This shows the bar chart of different movie types with filling split by movie length for different displays of bars—stack (A), dodge (B), and fill (C)
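You can check the difference between the three position options numerically. In the made-up example below, stack and dodge both draw the raw counts of table(Type, Short), either stacked or side by side. The fill option draws each row divided by its own total, which is what prop.table() with margin = 1 computes. This is a sketch of the arithmetic only, not of ggplot2's internal code.

```r
## Made-up values, for illustration only
Type  <- factor(c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"))
Short <- factor(c(0, 0, 1, 0, 1, 0))   # already converted to a factor

## "stack" and "dodge": raw counts per segment
counts <- table(Type, Short)
print(counts)

## "fill": each bar rescaled so that its segments sum to 1
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```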
