Counting the SAX representations of a time series
This section presents a utility that counts the SAX representations of a time series. The Python data structure behind the utility is a dictionary, where the keys are the SAX representations converted into strings and the values are integers that record how many times each representation occurs.
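The counting idiom itself can be sketched in a few lines. The SAX words below are made-up sample values rather than output from a real time series, and the variable names are chosen only for illustration:

```python
# Hypothetical SAX words, already converted to strings (illustrative values only).
words = ["10_01", "11_00", "10_01", "01_10", "10_01"]

counts = {}
for word in words:
    # dict.get() returns the default (0) for a key that has not been seen yet.
    counts[word] = counts.get(word, 0) + 1

for word, total in counts.items():
    print(word, ":", total)
```

The standard library's collections.Counter offers the same behavior with less code, but a plain dictionary keeps the logic explicit.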
The code for counting.py is as follows:
#!/usr/bin/env python3

import sys
import pandas as pd
from sax import sax

def main():
    if len(sys.argv) != 5:
        print("TS1 sliding_window cardinality segments")
        print("Suggestion: The window be a power of 2.")
        print("The cardinality SHOULD be a power of 2.")
        sys.exit()

    file = sys.argv[1]
    sliding = int(sys.argv[2])
    cardinality = int(sys.argv[3])
    segments = int(sys.argv[4])

    if sliding % segments != 0:
        print("sliding MODULO segments != 0...")
        sys.exit()

    if sliding <= 0:
        print("Sliding value is not allowed:", sliding)
        sys.exit()

    if cardinality <= 0:
        print("Cardinality Value is not allowed:", cardinality)
        sys.exit()

    ts = pd.read_csv(file, names=['values'], compression='gzip')
    ts_numpy = ts.to_numpy()
    length = len(ts_numpy)

    KEYS = {}
    for i in range(length - sliding + 1):
        t1_temp = ts_numpy[i:i+sliding]
        # Generate SAX for each subsequence
        tempSAXword = sax.createPAA(t1_temp, cardinality, segments)
        tempSAXword = tempSAXword[:-1]
        if KEYS.get(tempSAXword) == None:
            KEYS[tempSAXword] = 1
        else:
            KEYS[tempSAXword] = KEYS[tempSAXword] + 1

    for k in KEYS.keys():
        print(k, ":", KEYS[k])

if __name__ == '__main__':
    main()
The for loop splits the time series into subsequences and computes the SAX representation of each subsequence using sax.createPAA(), before updating the relevant counter in the KEYS dictionary. The tempSAXword = tempSAXword[:-1] statement removes an unneeded trailing underscore character from the SAX representation. Finally, we print the content of the KEYS dictionary.
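To make the loop easier to experiment with outside the sax package, here is a minimal sketch of the same sliding-window pattern. The simple_sax_word() function is a hypothetical stand-in written for this illustration: it follows the usual SAX recipe (z-normalization, segment means, Gaussian breakpoints) but is not the sax.createPAA() implementation, whose internals are not shown here, and the sample series is made up. It assumes Python 3.8 or later for statistics.NormalDist.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def simple_sax_word(window, cardinality, segments):
    """Simplified SAX-style word for one subsequence (illustrative stand-in)."""
    mu, sigma = mean(window), pstdev(window)
    # Z-normalize the window; a constant window maps to all zeros.
    normalized = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]

    # PAA step: the mean of each equal-sized segment.
    size = len(normalized) // segments
    paa = [mean(normalized[s * size:(s + 1) * size]) for s in range(segments)]

    # Breakpoints that split the standard normal curve into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / cardinality)
                   for k in range(1, cardinality)]
    bits = (cardinality - 1).bit_length()

    word = ""
    for value in paa:
        # The symbol is the number of breakpoints that lie below the value.
        symbol = sum(value > b for b in breakpoints)
        # An underscore follows every symbol, so the word ends with one.
        word += format(symbol, "0{}b".format(bits)) + "_"
    return word


# Made-up series, used only to exercise the loop.
ts = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0]
sliding, cardinality, segments = 4, 4, 2

counts = Counter()
for i in range(len(ts) - sliding + 1):
    window = ts[i:i + sliding]
    # [:-1] drops the trailing underscore before the word is used as a key.
    word = simple_sax_word(window, cardinality, segments)[:-1]
    counts[word] += 1

for word, total in counts.items():
    print(word, ":", total)
```

With this sample series the loop visits five windows, and the counts add up to that number. The exact words produced by the real package may differ, since its normalization and symbol ordering are not assumed to match this sketch.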
The output of counting.py should be similar to the following:
$ ./counting.py ts1.gz 4 4 2
10_01 : 18
11_00 : 8
01_10 : 14
00_11 : 7
What does this output tell us?
For a time series with 50 elements (ts1.gz) and a sliding window size of 4, there exist 18 subsequences with the 10_01 SAX representation, 8 subsequences with the 11_00 SAX representation, 14 subsequences with the 01_10 SAX representation, and 7 subsequences with the 00_11 SAX representation. For easier comparison, and to be able to use a SAX representation as a key to a dictionary, we convert [01 10] into the 01_10 string, [11 00] into 11_00, and so on.
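The reason for the string conversion can be seen in a short sketch. The symbols list below is a hypothetical in-memory form of [01 10], chosen for illustration; the actual intermediate representation used by the sax package is not assumed here.

```python
# Hypothetical per-segment symbols, written as binary strings (illustrative values).
symbols = ["01", "10"]

# A list is mutable and therefore unhashable, so it cannot be a dictionary key.
try:
    {symbols: 1}
except TypeError as err:
    print("list as key fails:", err)

# Joining the symbols with underscores yields a hashable, easy-to-compare string.
key = "_".join(symbols)
print(key)

counts = {key: 1}
print(counts)
```

Note that str.join() places the separator only between items, so this variant leaves no trailing underscore to strip.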
How many subsequences does a time series have?
Keep in mind that given a time series with n elements and a sliding window size of w, the total number of subsequences is n - w + 1.
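As a quick check of this formula against the sample run shown earlier, the sketch below compares n - w + 1 with the sum of the reported counts. The count_subsequences() helper is a name introduced here for illustration, and returning 0 when the window is longer than the series is an assumption of this sketch.

```python
def count_subsequences(n, w):
    """Number of length-w sliding windows in a series of n elements."""
    # Assumption: a window longer than the series yields no subsequences.
    return max(n - w + 1, 0)


# The sample run: 50 elements and a sliding window of 4.
n, w = 50, 4
expected = count_subsequences(n, w)

# Counts copied from the counting.py output shown earlier.
reported = {"10_01": 18, "11_00": 8, "01_10": 14, "00_11": 7}

print(expected)                # 47
print(sum(reported.values()))  # 47
```

Both values agree, which confirms that every subsequence of ts1.gz was counted exactly once.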
counting.py can be used for many practical tasks and will be updated in Chapter 3.
The next section discusses a handy Python package that can help us learn more about processing our time series from a statistical point of view.