Sampling data with dplyr
Because a single machine cannot efficiently process big data problems, a practical approach is to draw samples that are small enough to analyze yet still support sound conclusions. Here, we will show you how to use dplyr to sample from data.
Getting ready
Ensure that you have installed and loaded dplyr and data.table in your R session. You also need to complete the Enhancing a data.frame with a data.table recipe to load purchase_view.tab and purchase_order.tab as both data.frame and data.table objects into your R environment.
How to do it…
Perform the following steps to sample data with dplyr:
First, we can sample six rows from the data with replacement, setting a seed so that the result is reproducible:
> set.seed(123)
> sample_n(order.dt, 6, replace = TRUE)
                  Time Action       User     Product Quantity Price
1: 2015-07-10 09:22:37  order  U46651253 P0004306934        1   750
2: 2015-07-25 21:42:34  order U232322558 P0014273055        1  3688
3: 2015-07-13 22:55:33  order  U14804834 P0013147260        1 32900
4: 2015-07-29 08:48:18  order U364096419 P0003425855...
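If you do not have the purchase data at hand, the following minimal sketch shows the same sampling step on a small stand-in data frame. The orders object and all of its values are hypothetical and exist only for illustration; only set.seed and sample_n come from the recipe itself.

```r
library(dplyr)

# Hypothetical stand-in for the recipe's order data; all values are made up.
orders <- data.frame(
  User     = paste0("U", 1:10),
  Product  = paste0("P", 101:110),
  Quantity = rep(1L, 10),
  Price    = seq(100, 1000, by = 100)
)

# Fix the random number generator so the sample is reproducible.
set.seed(123)

# With replacement: the same row may be drawn more than once.
with_repl <- sample_n(orders, 6, replace = TRUE)

# Without replacement (the default): every sampled row is distinct.
without_repl <- sample_n(orders, 6)

# Resetting the seed and repeating the call reproduces the first draw.
set.seed(123)
again <- sample_n(orders, 6, replace = TRUE)

print(with_repl)
print(without_repl)
```

Sampling with replacement is what allows a sample larger than the data itself, whereas the default draws each row at most once. Note that in dplyr 1.0.0 and later, sample_n is superseded by slice_sample(n = ...), although sample_n still works.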