Time for action – performing simple statistics
We could use a threshold to weed out outliers, but there is a better way: the median, which is the middle value of a sorted set of values (see https://www.khanacademy.org/math/probability/descriptive-statistics/central_tendency/e/mean_median_and_mode). Half of the data lies below the median and the other half lies above it. For example, for the values 1, 2, 3, 4, and 5, the median is 3, since it is in the middle. With an even number of values, the median is the average of the two middle values.
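To see why the median copes with outliers better than the mean, here is a minimal sketch. The arrays are made-up illustrative values, not data from the CSV file used in this chapter.

```python
import numpy as np

# Illustrative values only; not taken from data.csv.
values = np.array([1, 2, 3, 4, 5])
with_outlier = np.array([1, 2, 3, 4, 500])

# The median is the middle value of the sorted data.
median_plain = np.median(values)
median_outlier = np.median(with_outlier)

# The mean is pulled strongly towards the outlier; the median is not.
mean_plain = np.mean(values)
mean_outlier = np.mean(with_outlier)

print("median:", median_plain, "->", median_outlier)
print("mean:  ", mean_plain, "->", mean_outlier)

# With an even number of values, NumPy averages the two middle values.
even_median = np.median(np.array([1, 2, 3, 4]))
print("even-length median:", even_median)
```

Replacing 5 with 500 leaves the median unchanged while the mean jumps, which is the property we rely on here.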
These are the steps to calculate the median:
Create a new Python script and call it simplestats.py. You already know how to load the data from a CSV file into an array. So, copy that line of code and make sure that it only gets the close price. The code should appear like this:
c=np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)
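If you want to check what usecols=(6,) selects without touching data.csv, the following sketch may help. It reads from an in-memory string instead of a file, and the rows, the XYZ symbol, and the column layout are assumptions made up for illustration. Column indices are zero-based, so 6 refers to the seventh field.

```python
import io

import numpy as np

# Hypothetical rows; the real data.csv may be laid out differently.
# Assumed fields: symbol, date, (blank), open, high, low, close, volume
sample = io.StringIO(
    "XYZ,01-01-2011, ,10.0,11.0,9.5,10.5,1000\n"
    "XYZ,02-01-2011, ,10.5,12.0,10.0,11.5,1200\n"
    "XYZ,03-01-2011, ,11.5,12.5,11.0,12.0,900\n"
)

# usecols=(6,) keeps only the seventh field (index 6), the assumed close price.
# Only the selected column is converted, so the text fields cause no error.
c = np.loadtxt(sample, delimiter=',', usecols=(6,), unpack=True)

print(c)
print(c.shape)
```

With a single selected column, the result is a one-dimensional array holding one value per row.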
The function that will do the magic for us is called median(). We will call it and print the result immediately. Add the following line of code...