The tsfresh Python package
This is a bonus section that is not directly related to the subject of the book, but it is helpful nonetheless. It is about a handy Python package called tsfresh, which can give you a good statistical overview of your time series. We are not going to present all the capabilities of tsfresh, just the ones that you can easily use to get information about your time series data. At this point, you might need to install tsfresh on your machine. Keep in mind that the tsfresh package has lots of package dependencies.
So, we are going to compute the following properties of a dataset – in this case, a time series:
- Mean value: The mean value of a dataset is the sum of all the values divided by the number of values.
- Standard deviation: The standard deviation of a dataset measures the amount of variation in it. It is the square root of the average squared deviation from the mean, but in practice we usually compute it using a function from a Python package.
- Skewness: The skewness of a dataset is a measure of the asymmetry in it. The value of skewness can be positive, negative, zero, or undefined.
- Kurtosis: The kurtosis of a dataset is a measure of its tailedness. In more mathematical terms, kurtosis measures how heavy the tails of a distribution are compared to those of a normal distribution.
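To make the four definitions above more concrete, here is a minimal sketch that computes them by hand using only the Python standard library. The describe() function and its dictionary keys are hypothetical names chosen for this illustration; they are not part of tsfresh. The sketch assumes a non-empty list of numbers and uses the plain (uncorrected) moment formulas, with kurtosis expressed relative to a normal distribution, so its results may differ slightly from what a statistics package reports if that package applies bias corrections.

```python
import math


def describe(values):
    """Return mean, population standard deviation, skewness and excess kurtosis.

    Illustrative helper (not a tsfresh function); expects a non-empty sequence.
    """
    n = len(values)
    mean = sum(values) / n

    # Central moments of order 2, 3 and 4, each divided by n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    std_dev = math.sqrt(m2)
    if m2 == 0:
        # All values are identical: skewness and kurtosis are undefined.
        skewness = kurtosis = math.nan
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return {
        "mean": mean,
        "std_dev": std_dev,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

For a symmetric dataset such as [1, 2, 3, 4, 5], this sketch gives a skewness of zero, whereas a dataset with all values equal has no variation, which is the case where skewness and kurtosis are undefined.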
All these quantities will make much more sense once you plot your data, which is left as an exercise for you; otherwise, they will be just numbers. So, now that we know some basic statistical terms, let us present a Python script that calculates all these quantities for a time series.
The Python code for using_tsfresh.py is as follows:
#!/usr/bin/env python3

import sys
import pandas as pd
import tsfresh

def main():
    # Expect exactly one argument: the path to the gzip-compressed time series
    if len(sys.argv) != 2:
        print("TS")
        sys.exit()

    TS1 = sys.argv[1]
    # Note: by default, read_csv() treats the first line of the file as a header row
    ts1Temp = pd.read_csv(TS1, compression='gzip')
    ta = ts1Temp.to_numpy()
    # Flatten the single-column data into a one-dimensional array
    ta = ta.reshape(len(ta))

    # Mean value
    meanValue = tsfresh.feature_extraction.feature_calculators.mean(ta)
    print("Mean value:\t\t", meanValue)

    # Standard deviation
    stdDev = tsfresh.feature_extraction.feature_calculators.standard_deviation(ta)
    print("Standard deviation:\t", stdDev)

    # Skewness
    skewness = tsfresh.feature_extraction.feature_calculators.skewness(ta)
    print("Skewness:\t\t", skewness)

    # Kurtosis
    kurtosis = tsfresh.feature_extraction.feature_calculators.kurtosis(ta)
    print("Kurtosis:\t\t", kurtosis)

if __name__ == '__main__':
    main()
The output of using_tsfresh.py when processing ts1.gz should look similar to the following:
$ ./using_tsfresh.py ts1.gz
Mean value:              15.706410001204729
Standard deviation:      8.325017802111901
Skewness:                0.008971113265160474
Kurtosis:                -1.2750042973761417
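If you do not have a suitable input file at hand, the following sketch shows one way to produce a gzip-compressed file with one value per line using only the Python standard library. The write_sample_series() function, its parameters, the value range, and the seed are all assumptions made for this illustration; the resulting data is random and will not reproduce the numbers shown above.

```python
import gzip
import random


def write_sample_series(path, count=1000, low=0.0, high=30.0, seed=42):
    """Write `count` pseudo-random values, one per line, to a gzip file.

    Illustrative helper; the defaults are arbitrary assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for _ in range(count):
            out.write(f"{rng.uniform(low, high):.5f}\n")
    return count
```

Calling write_sample_series("sample_ts.gz") creates a file with 1,000 values. Keep in mind that pd.read_csv() treats the first line of a file as a header unless header=None is passed, so the first value of a headerless file is read as a column name rather than as data.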
The tsfresh package can do many more things; we have just presented the tip of the iceberg of its capabilities.
The next section is about creating a histogram of a time series.