Generating controlled random datasets
In this recipe, we will show different ways of generating random number sequences and word sequences. Some of the examples use standard Python modules, and some use NumPy/SciPy functions.
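As a minimal sketch of the two kinds of sequences mentioned above, the following uses the standard `random` module for uniform numbers, integers, and a word sequence, and NumPy's `default_rng` generator for normally distributed samples. The seeds, the sample sizes, and the five-word `vocabulary` list are hypothetical choices made only for illustration. SciPy's `scipy.stats` distributions could serve the same purpose, but this sketch sticks to NumPy.

```python
import random

import numpy as np

# Seed both generators so the "random" data is reproducible between runs.
random.seed(42)
rng = np.random.default_rng(42)

# Standard library: uniform floats in [0, 1) and integers in [1, 6] (dice rolls).
uniform_floats = [random.random() for _ in range(5)]
dice_rolls = [random.randint(1, 6) for _ in range(5)]

# Word sequence: sample words with replacement from a small, hypothetical vocabulary.
vocabulary = ["alpha", "beta", "gamma", "delta", "epsilon"]
words = random.choices(vocabulary, k=8)

# NumPy: 1,000 draws from a normal distribution with mean 0 and standard deviation 1.
normal_samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(uniform_floats)
print(dice_rolls)
print(words)
print(normal_samples.mean(), normal_samples.std())
```

Seeding matters for controlled datasets: rerunning the script reproduces exactly the same sequences, so any analysis built on them can be repeated.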
We will use some statistical terminology, but we will explain every term, so you won't need a statistics reference book at hand while reading this recipe.
We generate artificial datasets using common Python modules. Doing so helps us understand distributions, variance, sampling, and similar statistical concepts. More importantly, we can use this fake data to check whether our statistical method is capable of discovering the models we are looking for. This works because we know the model in advance, so we can verify the method by applying it to data whose structure we already know. With real-world data we don't have that advantage: there is always some degree of uncertainty that we must accept, and that uncertainty leaves room for errors.
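The idea of testing a method against a known model can be sketched as follows, under the assumption that the "model" is a straight line and the "statistical method" is an ordinary least-squares fit via `np.polyfit`. The true slope, intercept, and noise level are hypothetical values chosen for the example. Because we generated the data ourselves, we can compare the estimated parameters directly with the true ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "true" model we choose in advance (hypothetical values).
TRUE_SLOPE = 2.5
TRUE_INTERCEPT = 1.0
NOISE_STD = 0.5

# Generate artificial data: y = slope * x + intercept + Gaussian noise.
x = np.linspace(0, 10, 200)
noise = rng.normal(loc=0.0, scale=NOISE_STD, size=x.size)
y = TRUE_SLOPE * x + TRUE_INTERCEPT + noise

# Apply the statistical method under test: a degree-1 least-squares fit.
# np.polyfit returns coefficients from the highest power down: (slope, intercept).
est_slope, est_intercept = np.polyfit(x, y, deg=1)

print(f"true slope={TRUE_SLOPE}, estimated={est_slope:.3f}")
print(f"true intercept={TRUE_INTERCEPT}, estimated={est_intercept:.3f}")
```

If the estimates land close to the true values, the method recovers the model we planted; increasing `NOISE_STD` or reducing the number of points shows how quickly that ability degrades.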
Getting ready
We don't need anything new...