The purpose of sampling
Sampling is the process of extracting a subset of a dataset, chosen so that inferences about the properties of the whole dataset can be drawn from it. It is not always practical to use an entire dataset, for the following reasons:
The dataset is too large
The dataset is not available in a timely fashion
Extracting complex features is very computationally intensive
A very large percentage of the training data is labeled with a single class, which requires down-sampling
The data is a continuous signal
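The down-sampling case can be illustrated with a minimal sketch of random under-sampling. It assumes each observation carries a class label, and it reduces every class to the size of the least represented one. The function name downsample_majority and the toy dataset are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def downsample_majority(records, label_of, seed=42):
    """Randomly down-sample every class to the size of the smallest class.

    records:  list of observations (hypothetical structure)
    label_of: function returning the class label of an observation
    """
    rng = random.Random(seed)
    # Group the observations by class label
    by_label = defaultdict(list)
    for r in records:
        by_label[label_of(r)].append(r)
    # Size of the least represented class
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Sampling without replacement: larger classes are reduced,
        # the smallest class is kept whole (in random order)
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced dataset: 950 observations of class 0, 50 of class 1
data = [(i, 0) for i in range(950)] + [(i, 1) for i in range(950, 1000)]
balanced = downsample_majority(data, label_of=lambda r: r[1])
print(len(balanced))  # 100
```

Under-sampling discards observations of the majority class, so it trades some information for a balanced, cheaper training set.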
The most commonly cited benefits of sampling are a reduction in computation cost and in execution latency.
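The simplest way to extract a subset is simple random sampling without replacement, in which every subset of a given size is equally likely to be drawn. The following sketch assumes the dataset fits in memory as a list; the function simple_random_sample and its fraction parameter are hypothetical names, not part of any library.

```python
import random
import statistics

def simple_random_sample(data, fraction, seed=None):
    """Draw a simple random sample without replacement.

    fraction: proportion of the dataset to keep, in (0, 1]
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(data) * fraction))
    rng = random.Random(seed)
    # random.Random.sample draws k distinct positions uniformly at random
    return rng.sample(data, k)

# Hypothetical dataset of 100,000 normally distributed values
data_rng = random.Random(0)
population = [data_rng.gauss(10.0, 2.0) for _ in range(100_000)]
sample = simple_random_sample(population, fraction=0.01, seed=1)
print(len(sample))  # 1000
# The sample mean should be close to the mean of the full dataset
print(statistics.fmean(population), statistics.fmean(sample))
```

Any statistic computed on the 1,000 sampled values costs roughly 1% of the work needed for the full dataset, at the price of some estimation error.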
Note
Independent and identically distributed
It is generally assumed that the original dataset reflects an independent and identically distributed (i.i.d.) population.
The challenge is to devise a procedure that generates a sample that accurately represents the original dataset, so that any inference derived from the sample applies equally to the original dataset.
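One common procedure for keeping a sample representative with respect to a known categorical attribute is stratified sampling: the same fraction is drawn from each category (stratum), so the sample preserves the relative frequencies found in the dataset. This technique is brought in here for illustration and is not described in the text above; the functions stratified_sample and proportions, as well as the 70/20/10 dataset, are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Draw the same fraction from each stratum, without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    sample = []
    for group in strata.values():
        # Rounding per stratum: very small strata may yield zero observations
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample

def proportions(records, stratum_of):
    """Relative frequency of each stratum in a list of records."""
    counts = defaultdict(int)
    for r in records:
        counts[stratum_of(r)] += 1
    return {key: c / len(records) for key, c in counts.items()}

# Hypothetical dataset with three categories in a 70/20/10 split
data = ([(i, "a") for i in range(700)]
        + [(i, "b") for i in range(700, 900)]
        + [(i, "c") for i in range(900, 1000)])
sample = stratified_sample(data, stratum_of=lambda r: r[1], fraction=0.1, seed=7)
print(proportions(data, lambda r: r[1]))    # {'a': 0.7, 'b': 0.2, 'c': 0.1}
print(proportions(sample, lambda r: r[1]))  # {'a': 0.7, 'b': 0.2, 'c': 0.1}
```

Because each stratum is rounded independently, the total sample size can differ slightly from fraction times the dataset size when strata sizes are not multiples of 1/fraction.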