The following image shows one of the heat maps that we are going to create in this recipe from the total count of air passengers:
Download the script 5644_01_01.r from your account at http://www.packtpub.com and save it to your hard disk. The first section of the script, below the comment line starting with ### loading packages, will automatically check for the availability of the R packages gplots and lattice, which are required for this recipe.
If those packages are not already installed, you will be prompted to select an official server from the Comprehensive R Archive Network (CRAN) to allow the automatic download and installation of the required packages.
If you installed those two packages before executing the script, I recommend updating them to the most recent version by calling the following function in the R command line:
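The update call could look like the following minimal sketch. Restricting the update to gplots and lattice through oldPkgs and suppressing the confirmation prompt with ask = FALSE are assumptions made for illustration; a plain update.packages() call would update every outdated package instead.

```r
# Update only the two packages this recipe depends on.
# The call contacts a CRAN mirror, so it is guarded to run
# only in an interactive session (an assumption of this sketch).
if (interactive()) {
  update.packages(oldPkgs = c("gplots", "lattice"), ask = FALSE)
}
```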
Tip
Use the source() function in the R command line to execute an external script from any location on your hard drive.
If you start a new R session from the same directory as the location of the script, simply provide the name of the script as an argument in the function call as follows:
If you started your R session from a directory other than the one that contains the script, you have to provide the absolute or relative path to the script on your hard drive. Refer to the following example:
You can view the current working directory of your current R session by executing the following command in the R command-line:
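The three cases described in this tip can be sketched as follows. To keep the example self-contained, it writes a throwaway script called demo_script.r into a temporary directory; both the file name and the directory are hypothetical stand-ins for the recipe's script and its location.

```r
# Create a throwaway script in a temporary directory (hypothetical names).
script_dir <- tempfile("heatmap_demo_")
dir.create(script_dir)
script_path <- file.path(script_dir, "demo_script.r")
writeLines('message("demo_script.r was executed")', script_path)

# Case 1: the working directory is the script's directory,
# so the bare file name is sufficient.
old_wd <- setwd(script_dir)
source("demo_script.r")
setwd(old_wd)

# Case 2: the session runs from a different directory,
# so the path to the script has to be given.
source(script_path)

# Show the current working directory of the session.
getwd()
```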
Run the 5644OS_01_01.r script in R to execute the following code, and take a look at the output printed on the screen as well as the PDF file, first_heatmaps.pdf, that will be created by this script:
By default, levelplot() places the color key on the right-hand side of the heat map, but it can easily be moved to the top, bottom, or left-hand side of the map by modifying the space parameter of colorkey:
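A minimal sketch of moving the color key to the top is shown here. It assumes the built-in AirPassengers time series, reshaped into a year-by-month matrix named passengers, as a stand-in for the recipe's data; the variable name and axis labels are illustrative only.

```r
library(lattice)

# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# space = "top" moves the color key above the heat map.
p <- levelplot(passengers, xlab = "Year", ylab = "Month",
               colorkey = list(space = "top"))
print(p)
```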
Replacing top with left or bottom will place the color key on the left-hand side or at the bottom of the heat map, respectively.
Moving the color key around for heatmap.2() can be a bit more of a hassle. In this case, we have to modify the parameters of the layout() function. By default, heatmap.2() passes a matrix, lmat, to layout(), which has the following content:
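To the best of my knowledge of the gplots source, the default layout matrix is equivalent to the one built in this short sketch: the key sits in the top-left cell, the column dendrogram in the top-right, the row dendrogram in the bottom-left, and the heat map in the bottom-right.

```r
# Default layout matrix that heatmap.2() hands to layout()
lmat <- rbind(4:3, 2:1)
print(lmat)
#      [,1] [,2]
# [1,]    4    3
# [2,]    2    1
```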
The numbers in the preceding matrix specify the locations of the different visual elements on the plot (1 implies heat map, 2 implies row dendrogram, 3 implies column dendrogram, and 4 implies key). If we want to change the position of the key, we have to modify and rearrange the values of lmat that heatmap.2() passes to layout().
For example, if we want to place the color key at the bottom left-hand corner of the heat map, we need to create a new matrix for lmat as follows:
We can construct such a matrix by using the rbind() function and assigning it to lmat:
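One possible matrix for this arrangement is sketched here. It adds a third row that holds the key (4) in its left-hand cell and uses 0 for cells that stay empty; the exact arrangement is an assumption chosen to match the description, not necessarily the one used in the original script.

```r
# Row 1: empty cell, column dendrogram (3)
# Row 2: row dendrogram (2), heat map (1)
# Row 3: color key (4), empty cell
lmat <- rbind(c(0, 3),
              c(2, 1),
              c(4, 0))
print(lmat)
```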
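Furthermore, we have to pass an argument for the column height parameter lhei to heatmap.2(). Because lhei must contain one height for each row of lmat, and our modified matrix has one more row than the default, this is what allows us to use the new lmat matrix for rearranging the color key: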
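The complete call might look like the following sketch. The passengers matrix (built from the built-in AirPassengers data) and the three relative heights in lhei are assumptions for illustration; the heights may need tuning for a given device size.

```r
library(gplots)

# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# Layout with the color key (4) in the bottom-left cell.
lmat <- rbind(c(0, 3),
              c(2, 1),
              c(4, 0))

# lhei needs one relative height per row of lmat (three here).
hm <- heatmap.2(passengers, lmat = lmat, lhei = c(1.5, 4, 2))
```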
If you don't need a color key for your heat map, you can turn it off by using the argument key = FALSE for heatmap.2() and colorkey = FALSE for levelplot(), respectively.
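Both switches are shown in this short sketch, again assuming a passengers matrix derived from the built-in AirPassengers data as stand-in input.

```r
library(gplots)
library(lattice)

# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# heatmap.2() without a color key
hm <- heatmap.2(passengers, key = FALSE)

# levelplot() without a color key
p <- levelplot(passengers, colorkey = FALSE)
print(p)
```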
Tip
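R also has a base function for creating heat maps, heatmap(), that does not require you to install external packages and is most advantageous if you can go without a color key. The syntax is very similar to the heatmap.2() function, and the clustering and reordering options that we have seen for heatmap.2() in this recipe, such as Rowv and Colv, also apply to heatmap(). Options that control the color key and its layout (key, lmat, and lhei) do not, because heatmap() draws no key: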
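A minimal sketch of the base function follows, assuming the same passengers stand-in matrix built from AirPassengers. Note that heatmap() scales the rows by default, so scale = "none" is set explicitly here to plot the raw counts.

```r
# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# Base R heat map: rows are clustered, columns keep their calendar order.
hm <- heatmap(passengers, Colv = NA, scale = "none")
```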
More information on dendrograms and clustering
By default, the dendrograms of heatmap.2() are created by an agglomerative hierarchical clustering method, also known as bottom-up clustering.
In this approach, all individual objects start as individual clusters and are successively merged until only one single cluster remains. The distance between a pair of clusters is calculated by the farthest neighbor method, also called the complete linkage method, which is based by default on the Euclidean distance of the two points from both clusters that are farthest apart from each other. The computed dendrograms are then reordered based on the row and column means.
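A tiny worked example may make the complete linkage rule more concrete. The three one-dimensional points below are made up for illustration: a and b are merged first at distance 1, and the resulting cluster is then joined with c at the distance of its farthest member, max(|0 - 3|, |1 - 3|) = 3.

```r
# Three hypothetical points on a line
pts <- c(a = 0, b = 1, c = 3)

d  <- dist(pts)    # Euclidean distance by default
hc <- hclust(d)    # complete linkage by default

# Heights at which the merges happen
hc$height
# [1] 1 3
```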
By modifying the default parameters of the dist() function, we can use a distance measure other than the Euclidean distance. For example, if we want to use the Manhattan distance measure (based on a grid-like path rather than a direct connection between two objects), we would modify the method parameter of the dist() function and assign the result to a variable, distance, first:
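This step can be sketched as follows, assuming a passengers matrix built from the built-in AirPassengers data as stand-in input; dist() computes the distances between the rows of the matrix.

```r
# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# Manhattan distances between the rows (years)
distance <- dist(passengers, method = "manhattan")
```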
Other options for the method parameter are: euclidean (default), maximum, canberra, binary, or minkowski.
To use an agglomeration method other than the complete linkage method, we modify the method parameter in the hclust() function and assign the result to another variable, cluster. Note the first argument, distance, that we pass to the hclust() function, which comes from our previous assignment:
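A sketch of this step is given here, assuming the passengers stand-in matrix from AirPassengers. It uses the method name ward.D, which is how current versions of R spell the option that older versions called ward.

```r
# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# Distance matrix from the previous step
distance <- dist(passengers, method = "manhattan")

# Hierarchical clustering with Ward's method instead of complete linkage
cluster <- hclust(distance, method = "ward.D")
```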
Setting the method parameter to ward selects Joe H. Ward's minimum variance method for hierarchical clustering. Since R 3.1.0, this option is named ward.D, and a second variant, ward.D2, is also available. Other options for the method parameter that we can pass as arguments to hclust() are: complete (default), single, average, mcquitty, median, or centroid.
To use our modified clustering parameters, we simply call the as.dendrogram() function within heatmap.2() using the variable cluster that we assigned previously:
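Put together, the call might look like this sketch. It assumes the passengers stand-in matrix and applies the custom clustering to the rows only; the columns keep the default clustering of heatmap.2().

```r
library(gplots)

# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

distance <- dist(passengers, method = "manhattan")
cluster  <- hclust(distance, method = "ward.D")

# Use the custom clustering for the row dendrogram
hm <- heatmap.2(passengers, Rowv = as.dendrogram(cluster))
```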
We can also draw the cluster dendrogram without the heat map by using the plot() function:
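A self-contained sketch, assuming the same passengers stand-in matrix and clustering settings as in the previous steps; the title passed to main is illustrative.

```r
# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

distance <- dist(passengers, method = "manhattan")
cluster  <- hclust(distance, method = "ward.D")

# Draw only the dendrogram of the years
plot(cluster, main = "Clustering of years", xlab = "Year")
```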
Tip
To turn off row and column reordering, we need to turn off the dendrograms and set the parameters Colv and Rowv to NA:
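A minimal sketch, assuming the passengers stand-in matrix built from AirPassengers; dendrogram = "none" suppresses both dendrograms, and the NA values keep the rows and columns in their original order.

```r
library(gplots)

# Assumed stand-in data: monthly passenger counts, one row per year.
passengers <- matrix(AirPassengers, nrow = 12, byrow = TRUE,
                     dimnames = list(1949:1960, month.abb))

# No dendrograms and no reordering of rows or columns
hm <- heatmap.2(passengers, dendrogram = "none", Rowv = NA, Colv = NA)
```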