Tidying a long format table into a tidy table with tidyr
In this recipe, we look at the complementary operation to the one in the Tidying a wide format table into a tidy table with tidyr recipe. We'll take a long table and spread one of its columns out into multiple new columns. At first, this might seem to violate the tidy data requirement that each variable gets its own column. However, we occasionally come across data frames that squeeze more than one variable into a single column, and splitting those variables out is exactly what tidying them requires. As in the previous recipe, tidyr has a specification-based function that lets us correct our data frame.
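To make the "complementary" relationship concrete, here is a minimal round-trip sketch on a small hypothetical table. The `wide`, `long`, and `back` objects and the `sample`, `height`, and `weight` columns are invented for illustration. `pivot_longer()` squeezes two variables into one `measurement` column, and `pivot_wider()` spreads them back out again.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per sample, one column per variable
wide <- tibble(
  sample = c("s1", "s2"),
  height = c(150, 162),
  weight = c(55, 61)
)

# Lengthen: both variables end up squeezed into a single 'measurement' column
long <- wide |>
  pivot_longer(cols = c(height, weight),
               names_to = "measurement",
               values_to = "value")

# Widen again: each distinct measurement becomes its own column
back <- long |>
  pivot_wider(names_from = measurement, values_from = value)

long
back
```

The long version has four rows (two samples times two measurements); widening it recovers one row per sample with `height` and `weight` as separate columns.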
Getting ready
We'll use the tidyr package and the treatments data frame in the rbioinfcookbook package. This data frame has four columns, one of which, measurement, holds the names of two different variables; each of these needs to be split out into a column of its own.
How to do it…
In stark contrast to the Tidying a wide format table into a tidy table with tidyr recipe, the code here is extremely terse; we can tidy the long table very easily:
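library(rbioinfcookbook)
library(tidyr)

# Spread each distinct value of 'measurement' into its own column,
# filling the new columns with the matching entries from 'value'
treatments |>
  pivot_wider(
    names_from = measurement,
    values_from = value
  )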
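This is so simple because everything pivot_wider() needs is already in the data frame: the new column names are the values in measurement, and the entries that fill those columns are in value.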
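If the rbioinfcookbook package isn't to hand, the same call can be tried on a small stand-in. This sketch assumes a four-column layout like the one described above; the `toy` table, the `patient` and `treatment` column names, and all of the numbers are hypothetical, not taken from the real treatments data.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the treatments data: the 'measurement' column
# holds the names of two variables (weight and height), 'value' holds their values
toy <- tribble(
  ~patient, ~treatment, ~measurement, ~value,
  "p1",     "A",        "weight",     70,
  "p1",     "A",        "height",     175,
  "p2",     "B",        "weight",     82,
  "p2",     "B",        "height",     180
)

tidy <- toy |>
  pivot_wider(names_from = measurement, values_from = value)

tidy
# Expect 2 rows (one per patient) with columns patient, treatment, weight, height
```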
How it works…
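In this very simple-looking recipe, the specification is gloriously clear: take the measurement column, create a new column for each of its distinct values, and move each entry of value into the matching new column. The names_from argument specifies the column whose values become the new column names, and values_from specifies the column whose values fill those new columns. The remaining columns act as identifiers (the id_cols argument defaults to every column not named in names_from or values_from), so each unique combination of them becomes one row of the output.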
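One way to see what the default identifier columns are doing is to spell them out. In this sketch, on a hypothetical four-column table (`patient` and `treatment` are invented names, as are the values), passing `id_cols` explicitly should give exactly the same result as leaving it out.

```r
library(tidyr)
library(tibble)

# Hypothetical long table with two identifier columns
toy <- tribble(
  ~patient, ~treatment, ~measurement, ~value,
  "p1",     "A",        "weight",     70,
  "p1",     "A",        "height",     175,
  "p2",     "B",        "weight",     82,
  "p2",     "B",        "height",     180
)

# Default: id_cols is every column not named in names_from or values_from
implicit <- pivot_wider(toy, names_from = measurement, values_from = value)

# Spelling out id_cols should give the same result
explicit <- pivot_wider(toy,
                        id_cols = c(patient, treatment),
                        names_from = measurement,
                        values_from = value)

identical(implicit, explicit)
```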
There’s more…
It is also possible to build the new column names from more than one column at a time; just pass a vector of columns to the names_from argument. By default, the values from those columns are joined with names_sep (an underscore), and you can format the generated column names yourself with names_glue.
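As a sketch of both options, the hypothetical table below identifies each value by a measurement and a visit. The `visits` table, its `patient` and `visit` columns, and the `{measurement}_at_{visit}` naming template are invented for illustration; `names_glue` accepts a glue template that can refer to the `names_from` columns.

```r
library(tidyr)
library(tibble)

# Hypothetical data: each value is identified by a measurement and a visit
visits <- tribble(
  ~patient, ~visit, ~measurement, ~value,
  "p1",     "v1",   "weight",     70,
  "p1",     "v2",   "weight",     68,
  "p1",     "v1",   "height",     175,
  "p1",     "v2",   "height",     175
)

# Two names_from columns: their values are joined with names_sep ("_" by default)
default_names <- pivot_wider(visits,
                             names_from = c(measurement, visit),
                             values_from = value)
names(default_names)
# "patient" "weight_v1" "weight_v2" "height_v1" "height_v2"

# names_glue builds each new name from a template over the names_from columns
glued <- pivot_wider(visits,
                     names_from = c(measurement, visit),
                     values_from = value,
                     names_glue = "{measurement}_at_{visit}")
names(glued)
```

The template approach is handy when the default underscore-joined names would clash with existing naming conventions downstream.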