Filtering data by string matching
Although some filtering methods were already discussed in the previous chapters, the dplyr package contains some handy features that have not yet been covered and are worth mentioning here. As we know by now, the subset function in base R or the filter function from dplyr is used for filtering rows, and the select function can be used to choose a subset of columns.
The row-filtering functions usually take an R expression that evaluates to a logical vector marking the rows to keep; passing that vector to the which function would return the IDs of those rows. Providing such R expressions to describe column names is more problematic for the select function, as it is harder, if not impossible, to evaluate R expressions on column names.
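The relationship between a row filter and the which function can be illustrated with a minimal base R sketch. The flights data frame and its values below are made up for illustration and are not the hflights data:

```r
# Hypothetical toy data; the values are invented for illustration only.
flights <- data.frame(
  carrier  = c("AA", "UA", "AA", "DL"),
  ArrDelay = c(5, -3, 42, 17),
  DepDelay = c(2, 0, 30, 20),
  stringsAsFactors = FALSE
)

# The filtering expression evaluates to one TRUE/FALSE value per row.
keep <- flights$ArrDelay > 10

# which() converts the logical vector into the positions of the TRUE values.
kept_ids <- which(keep)

# subset() evaluates the expression inside the data frame and keeps
# the rows for which it is TRUE.
late <- subset(flights, ArrDelay > 10)
```

Here keep marks the rows to retain, kept_ids holds their row positions, and late contains only those rows.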
The dplyr package provides some useful functions to select columns of the data based on column name patterns. For example, we can keep only the variables whose names end with the string delay:
> library(dplyr)
> library(hflights)
> str(select(hflights...
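As a self-contained sketch of name-based column selection, the following applies the dplyr helper functions ends_with, starts_with, contains, and matches to a small made-up data frame. The flights object and its values are hypothetical and only mimic the naming style of a flights table:

```r
library(dplyr)

# Hypothetical toy data; names and values are invented for illustration.
flights <- data.frame(
  Year     = c(2011, 2011),
  ArrDelay = c(5, 42),
  DepDelay = c(2, 30),
  TaxiIn   = c(7, 9),
  TaxiOut  = c(12, 15)
)

# Columns whose names end with "delay"; matching ignores case by default.
delays <- select(flights, ends_with("delay"))

# Columns whose names start with "taxi".
taxi <- select(flights, starts_with("taxi"))

# Columns whose names contain the literal string "Out".
out <- select(flights, contains("Out"))

# Columns whose names match a regular expression.
arr_dep <- select(flights, matches("^(Arr|Dep)"))
```

Because these helpers work on the column names rather than the column values, no R expression needs to be evaluated against the data itself.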