Time for action – doing simple statistics
We could use a threshold to weed out outliers, but there is a better way: the median, which is the middle value of a sorted set of values. For example, for the values 1, 2, 3, 4, and 5, the median is 3, since it is in the middle. When the number of values is even, the median is the average of the two middle values. The following are the steps to calculate the median:
Determine the median of the close price. Create a new Python script and call it simplestats.py. You already know how to load the data from a CSV file into an array, so copy that line of code and make sure that it only gets the close price. The code should now look like the following:
c=np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
The function that will do the magic for us is called median. We will call it and print the result immediately. Add the following line of code:
print "median =", np.median(c)
The program prints the following output:
median = 352.055
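To see why the median is less sensitive to outliers than a simple average, here is a minimal sketch. It uses small made-up arrays rather than the close prices from data.csv, and the variable names are only illustrative:

```python
import numpy as np

# Illustrative values only; these are not taken from data.csv.
values = np.array([1, 2, 3, 4, 5])
print("median = %s" % np.median(values))

# Replace the largest value with an outlier: the mean moves a lot,
# while the median stays at the middle value.
with_outlier = np.array([1, 2, 3, 4, 500])
median_out = np.median(with_outlier)
mean_out = np.mean(with_outlier)
print("median with outlier = %s" % median_out)
print("mean with outlier = %s" % mean_out)

# With an even number of values, np.median averages the two middle
# values of the sorted data (here 2 and 3). The input need not be sorted.
even = np.array([4, 1, 3, 2])
median_even = np.median(even)
print("median of even-sized array = %s" % median_even)
```

Each print call takes a single formatted string, so the sketch should behave the same whether or not your Python version treats print as a function.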
Since it is our first time using the median...