Time for action – sampling with numpy.random.choice()
We will use the numpy.random.choice() function to perform bootstrapping.
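As a minimal sketch of what the function does on its own, the snippet below resamples a small array. The array values and the seed are arbitrary, chosen only for illustration. By default, numpy.random.choice() draws with replacement, so a value may appear more than once in the result, which is what bootstrapping relies on.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# A small illustrative array (hypothetical values, not from the text)
original = np.array([10, 20, 30, 40, 50])

# replace=True is the default: the same value may be drawn several times
resample = np.random.choice(original, size=original.size)

# Every resampled value must come from the original array
all_from_original = bool(np.isin(resample, original).all())

print(resample, all_from_original)
```

The resample has the same length as the input, but it is generally not a permutation of it: some values may repeat and others may be missing.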
- Start the IPython or Python shell and import NumPy:
$ ipython
In [1]: import numpy as np
- Generate a data sample following the normal distribution:
In [2]: N = 500
In [3]: np.random.seed(52)
In [4]: data = np.random.normal(size=N)
- Calculate the mean of the data:
In [5]: data.mean()
Out[5]: 0.07253250605445645
- Generate 100 samples of size N from the original data, drawing with replacement, and calculate their means (of course, more samples may lead to a more accurate result):
In [6]: bootstrapped = np.random.choice(data, size=(N, 100))
In [7]: means = bootstrapped.mean(axis=0)
In [8]: means.shape
Out[8]: (100,)
- Calculate the mean, variance, and standard deviation of the arithmetic means we obtained:
In [9]: means.mean()
Out[9]: 0.067866373318115278
In [10]: means.var()
Out[10]: 0.001762807104774598
In [11]: means.std()
Out[11]: 0.041985796464692651
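One way to sanity-check these numbers is to compare the spread of the bootstrapped means with the textbook standard error of the mean, s / sqrt(N). This comparison is an addition, not one of the steps above. The sketch reuses the same seed and sizes as the session; the names bootstrap_se and analytic_se are introduced here for illustration only.

```python
import numpy as np

np.random.seed(52)
N = 500
data = np.random.normal(size=N)

# Each of the 100 columns is one bootstrap resample of size N
bootstrapped = np.random.choice(data, size=(N, 100))
means = bootstrapped.mean(axis=0)

# Spread of the bootstrapped means (the bootstrap estimate of the standard error)
bootstrap_se = means.std()

# Classical estimate: sample standard deviation divided by sqrt(N)
analytic_se = data.std(ddof=1) / np.sqrt(N)

print(bootstrap_se, analytic_se)
```

For data with a standard deviation close to 1 and N = 500, the classical estimate is about 1 / sqrt(500), or roughly 0.045, which is the same order of magnitude as the standard deviation of the means shown above. With only 100 resamples, the two values should agree only approximately.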
If we are assuming a normal distribution for...