Data transformation with dplyr
Data transformation refers to a collection of techniques for performing row-level treatment on the raw data using dplyr functions. In this section, we will cover five fundamental functions for data transformation: filter(), arrange(), mutate(), select(), and top_n().
Slicing the dataset using the filter() function
One of the biggest highlights of the tidyverse ecosystem is the pipe operator, %>%, which passes the result of the expression before it as the first argument to the function call after it. Using the pipe operator makes the code easier to read because the steps appear in the order in which they run, and it saves us from repeatedly typing the same input in each statement. Let’s go through an exercise on how to use the pipe operator to slice the iris dataset using the filter() function.
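Before the exercise, a minimal sketch of what the pipe does may help. It assumes the dplyr package is installed (dplyr re-exports %>% from magrittr) and uses the built-in iris dataset. The variable names nested, piped, and chained are illustrative only, and head() and nrow() are base R functions chosen here simply to show the mechanics.

```r
# Assumption: dplyr is installed; loading it makes the %>% operator available.
library(dplyr)

# Without the pipe: the data frame is passed explicitly as the first argument.
nested <- head(iris, 3)

# With the pipe: iris becomes the first argument of head().
piped <- iris %>% head(3)

# Both forms produce the same three rows.
identical(nested, piped)

# Pipes can be chained: each result feeds the next call.
chained <- iris %>% head(10) %>% nrow()
chained
```

Read from left to right, the chained version says "take iris, keep the first 10 rows, then count them", which is the same as nrow(head(iris, 10)) but without the nesting.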
Exercise 2.02 – filtering using the pipe operator
For this exercise, we have been asked to keep only the setosa species in the iris dataset using the pipe operator and the filter...