Working with R packages for Excel manipulation
There are several packages available both on CRAN and on GitHub that allow for reading and manipulation of Excel files. In this section, we are specifically going to focus on the packages: readxl
, openxlsx
, and xlsx
to read Excel files. These three packages all have their own functions to read Excel files. These functions are as follows:
readxl::read_excel()
openxlsx::read.xlsx()
xlsx::read.xlsx()
Each function has a set of parameters and conventions to follow. Since readxl
is part of the tidyverse
collection of packages, it follows its conventions and returns a tibble
object upon reading the file. If you do not know what a tibble is, it is a modern version of R’s data.frame
, a sort of spreadsheet in the R environment. It is the building block of most analyses. Moving on to openxlsx
and xlsx
, they both return a base R data.frame
object, with the latter also able to return a list
object. If you are wondering how this relates to manipulating an actual Excel file, I can explain. First, to manipulate something in R, the data must be in the R environment, so you cannot manipulate the file unless the data is read in. These packages have different functions for manipulating Excel or reading data in certain ways that allow for further analysis and or manipulation. It is important to note that xlsx
does require Java to be installed.
As we transition from our exploration of R packages for Excel manipulation, we’ll turn our attention to the crucial task of effectively reading Excel files into R, thereby unlocking even more possibilities for data analysis and manipulation.