Sampling a dataset
The dplyr package provides two functions for drawing a random sample of rows from your dataset. sample_n() takes the dataset to operate against and the number of rows you want drawn; sample_frac() takes the dataset and the fraction of rows to draw. Here is an example using sample_n():
data <- sample_n(players, 30)
glimpse(data)
We see the results as shown in the following screenshot:
![](https://static.packt-cdn.com/products/9781785880070/graphics/caa74df2-04f8-4897-8ddd-d38927df639f.png)
Note that there are 30 observations in the results set, as requested.
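To compare the two sampling functions side by side, here is a minimal sketch. It assumes dplyr is installed and uses a small made-up data frame, toy, as a stand-in for players; the column values are invented for illustration only. In dplyr 1.0.0 and later, both functions are superseded by slice_sample(), although they still work.

```r
library(dplyr)

# Hypothetical stand-in for the players data frame (values are made up)
set.seed(42)  # make the random draws repeatable
toy <- data.frame(id = 1:200, h = sample(50:250, 200, replace = TRUE))

# Draw a fixed number of rows
by_count <- sample_n(toy, 30)

# Draw a fraction of the rows (0.1 of 200 rows)
by_fraction <- sample_frac(toy, 0.1)

nrow(by_count)
nrow(by_fraction)
```

Both calls sample without replacement by default, so no row appears twice in either result.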
Filtering rows in a data frame
Another function we can use is the filter function. It takes a data frame and a filtering statement as arguments, passes over each row of the data frame, and returns those rows that meet the filtering statement:
# filter only players with over 200 hits in a season
over200 <- filter(players, h > 200)
head(over200)
nrow(over200)
![](https://static.packt-cdn.com/products/9781785880070/graphics/6120529e-bd5c-4c14-9f64-dfd3399e540a.png)
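To make the row-by-row test concrete, the following sketch applies the same kind of filtering statement to a small made-up data frame. The toy data and its values are assumptions for illustration, not part of the players dataset.

```r
library(dplyr)

# Hypothetical toy data, not the players dataset
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  h    = c(210, 180, NA, 230)
)

# The filtering statement evaluates to one logical value per row
keep <- toy$h > 200

# filter() keeps rows where the statement is TRUE and drops FALSE and NA
kept <- filter(toy, h > 200)

# Base R equivalent, shown for comparison
same <- toy[which(keep), ]

kept$name
```

Note that the row with a missing hit count is dropped rather than kept, because its test result is NA, not TRUE.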
It looks like many players were capable of 200 hits in a season. What about the players who could also hit over 40 home runs in a season?
over200and40hr <- filter(players...