Time Series Indexing

You're reading from Time Series Indexing: Implement iSAX in Python to index time series with confidence

Product type Paperback
Published in Jun 2023
Publisher Packt
ISBN-13 9781838821951
Length 248 pages
Edition 1st Edition
Author (1):
Mihalis Tsoukalos

Table of Contents (11 chapters)

Preface
Chapter 1: An Introduction to Time Series and the Required Python Knowledge
Chapter 2: Implementing SAX
Chapter 3: iSAX – The Required Theory
Chapter 4: iSAX – The Implementation
Chapter 5: Joining and Comparing iSAX Indexes
Chapter 6: Visualizing iSAX Indexes
Chapter 7: Using iSAX to Approximate MPdist
Chapter 8: Conclusions and Next Steps
Index
Other Books You May Enjoy

Counting the SAX representations of a time series

This section of the chapter presents a utility that counts how often each SAX representation occurs among the subsequences of a time series. The Python data structure behind the logic of the utility is a dictionary, where the keys are the SAX representations converted into strings and the values are integers that record how many subsequences share each representation.

The code for counting.py is as follows:

#!/usr/bin/env python3
import sys
import pandas as pd
from sax import sax

def main():
    if len(sys.argv) != 5:
        print("Usage: TS1 sliding_window cardinality segments")
        print("Suggestion: The window should be a power of 2.")
        print("The cardinality SHOULD be a power of 2.")
        sys.exit()

    file = sys.argv[1]
    sliding = int(sys.argv[2])
    cardinality = int(sys.argv[3])
    segments = int(sys.argv[4])

    # Reject non-positive values first, so that the modulo below is safe
    if sliding <= 0:
        print("Sliding value is not allowed:", sliding)
        sys.exit()
    if cardinality <= 0:
        print("Cardinality value is not allowed:", cardinality)
        sys.exit()
    if segments <= 0:
        print("Segments value is not allowed:", segments)
        sys.exit()
    # The sliding window must split evenly into segments
    if sliding % segments != 0:
        print("sliding MODULO segments != 0...")
        sys.exit()

    # Read a gzip-compressed file with one value per line
    ts = pd.read_csv(file, names=['values'], compression='gzip')
    ts_numpy = ts.to_numpy()
    length = len(ts_numpy)

    # Maps each SAX representation (as a string) to its number of occurrences
    KEYS = {}
    for i in range(length - sliding + 1):
        t1_temp = ts_numpy[i:i+sliding]
        # Generate SAX for each subsequence
        tempSAXword = sax.createPAA(t1_temp, cardinality, segments)
        # Drop the trailing underscore
        tempSAXword = tempSAXword[:-1]
        KEYS[tempSAXword] = KEYS.get(tempSAXword, 0) + 1

    for k in KEYS.keys():
        print(k, ":", KEYS[k])

if __name__ == '__main__':
    main()

The for loop splits the time series into subsequences, one for each position of the sliding window, and computes the SAX representation of each subsequence using sax.createPAA(), before updating the relevant counter in the KEYS dictionary. The tempSAXword = tempSAXword[:-1] statement removes an unneeded trailing underscore character from the SAX representation. Finally, we print the content of the KEYS dictionary.
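
The same counting pattern can also be written with collections.Counter from the Python standard library. The following is only a minimal sketch and not the code of counting.py: toy_word() is a hypothetical stand-in for sax.createPAA() that marks whether each value in a window goes up or not, so that the example runs without the sax package.

```python
from collections import Counter

def toy_word(window):
    # Hypothetical stand-in for sax.createPAA(): "1" when a value is larger
    # than the previous one, "0" otherwise, joined with underscores.
    return "_".join("1" if b > a else "0" for a, b in zip(window, window[1:]))

def count_words(values, sliding):
    # Visit one subsequence per starting position of the sliding window
    counts = Counter()
    for i in range(len(values) - sliding + 1):
        counts[toy_word(values[i:i + sliding])] += 1
    return counts

values = [1, 3, 2, 4, 3, 5, 4, 6]
counts = count_words(values, 3)
for word, total in counts.items():
    print(word, ":", total)
```

A Counter starts every missing key at zero, so the loop needs no explicit check for whether a key already exists.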

The output of counting.py should be similar to the following:

$ ./counting.py ts1.gz 4 4 2
10_01 : 18
11_00 : 8
01_10 : 14
00_11 : 7

What does this output tell us?

For a time series with 50 elements (ts1.gz) and a sliding window size of 4, there exist 18 subsequences with the 10_01 SAX representation, 8 subsequences with the 11_00 SAX representation, 14 subsequences with the 01_10 SAX representation, and 7 subsequences with the 00_11 SAX representation. For easier comparison, and to be able to use a SAX representation as a key to a dictionary, we convert [01 10] into the 01_10 string, [11 00] into 11_00, and so on.
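
The conversion matters because a Python list is not hashable and so cannot serve as a dictionary key, whereas a string can. A minimal sketch of the idea follows; word_to_key() is a hypothetical helper written for this illustration and is not part of the sax package.

```python
def word_to_key(segments):
    # Join the per-segment binary strings with underscores:
    # ["01", "10"] becomes "01_10"
    return "_".join(segments)

counts = {}
key = word_to_key(["01", "10"])
counts[key] = counts.get(key, 0) + 1

list_error = None
try:
    counts[["01", "10"]] = 1  # a list cannot be used as a dictionary key
except TypeError as err:
    list_error = str(err)

print(key, counts)
print("Using a list as a key fails with:", list_error)
```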

How many subsequences does a time series have?

Keep in mind that given a time series with n elements and a sliding window size of w, the total number of subsequences is n – w + 1.
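
As a quick check of the formula, the following sketch builds every sliding window of a 50-element list and compares the result with n – w + 1. The function name subsequence_count() and the placeholder data are assumptions made for this illustration only.

```python
def subsequence_count(n, w):
    # Number of sliding windows of size w over a series of n elements
    return n - w + 1

series = list(range(50))  # placeholder data; any 50 values will do
w = 4
windows = [series[i:i + w] for i in range(len(series) - w + 1)]

print(subsequence_count(len(series), w), len(windows))
```

Both numbers are 47, which is also the sum of the four counters that counting.py printed for ts1.gz (18 + 8 + 14 + 7).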

counting.py can be used for many practical tasks and will be updated in Chapter 3.

The next section discusses a handy Python package that can help us learn more about processing our time series from a statistical point of view.
