Rearranging data
Sometimes we do not want to filter any part of the data (neither the rows nor the columns); the data is simply not in the most useful order, whether for convenience or for performance reasons, as we have seen, for instance, in Chapter 3, Filtering and Summarizing Data.
Besides the base sort and order functions, or providing the order of variables passed to the [ operator, we can also use some SQL-like solutions with the sqldf package, or query the data in the right format directly from the database. And the previously mentioned dplyr package also provides an effective method for ordering data. Let's sort the hflights data, based on the actual elapsed time for each of the quarter million flights:
> str(arrange(hflights, ActualElapsedTime))
'data.frame':	227496 obs. of  21 variables:
 $ Year             : int  2011 2011 2011 2011 2011 2011 ...
 $ Month            : int  7 7 8 9 1 4 5 6 7 8 ...
 $ DayofMonth       : int  24 25 13 21 3 29 9 21 8 2 ...
 $ DayOfWeek...
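For comparison, the base approaches mentioned earlier (sort, order, and the order of variables passed to the [ operator) can be sketched as follows. This is only a minimal illustration: it uses the built-in mtcars data as a stand-in for hflights so that it runs without extra packages, and the object names (df, by_hp, and so on) are our own choices rather than anything from the hflights example.

```r
# mtcars ships with R; it stands in for hflights here (an assumption
# made only to keep the sketch self-contained).
df <- mtcars[, c("mpg", "cyl", "hp")]

# order() returns the row permutation that sorts the column ascending;
# passing it to `[` rearranges the rows without dropping any of them.
by_hp <- df[order(df$hp), ]

# Several keys: cyl ascending, then mpg descending within each cyl value.
by_cyl_mpg <- df[order(df$cyl, -df$mpg), ]

# sort() works on a single vector and returns the sorted values themselves.
hp_sorted <- sort(df$hp)

# Reordering columns is just the order of the names passed to `[`.
cols_reordered <- df[, c("hp", "cyl", "mpg")]

head(by_hp, 3)
```

Note that order returns positions rather than values, which is why it can rearrange a whole data frame, while sort only returns the sorted vector.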