Exploring the PeMS-M dataset
In this section, we will explore our dataset to find patterns and gain insights that will be useful for the task of interest.
The dataset we will use for this application is the medium variant of the PeMSD7 dataset [1]. The original dataset was obtained by collecting traffic speed from 39,000 sensor stations on the weekdays of May and June 2012 using the Caltrans Performance Measurement System (PeMS). The medium variant considers only 228 of these stations, located across District 7 of California. The stations output 30-second speed measurements, which are aggregated into 5-minute intervals in this dataset. For example, the following figure shows the Caltrans PeMS (pems.dot.ca.gov) displaying various traffic speeds:
Figure 15.1 – Traffic data from Caltrans PeMS with high speed (>60 mph) in green and low speed (<35 mph) in red
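To make the aggregation step concrete, here is a minimal sketch of how 30-second readings could be turned into 5-minute intervals with pandas. The readings are synthetic, the station is hypothetical, and averaging is our assumption (the text above only says the measurements are aggregated). The sketch also estimates the length of the time axis we should expect, assuming every weekday of May and June 2012 is present:

```python
import numpy as np
import pandas as pd

# Synthetic 30-second speed readings (in mph) for one hypothetical station,
# covering a single hour. These values are made up for illustration only.
rng = np.random.default_rng(0)
index = pd.date_range("2012-05-01 00:00", periods=120, freq="30s")
raw = pd.Series(rng.uniform(20, 70, size=len(index)), index=index, name="speed_mph")

# Aggregate into 5-minute intervals. Using the mean is an assumption:
# each 5-minute bin then summarizes ten 30-second readings.
agg = raw.resample("5min").mean()

# Expected length of the time axis, assuming all weekdays (Mon-Fri)
# of May and June 2012 are included and no interval is missing.
weekdays = pd.bdate_range("2012-05-01", "2012-06-30")
steps_per_day = 24 * 60 // 5
n_steps = len(weekdays) * steps_per_day

print(len(agg), "five-minute intervals in one hour")
print(len(weekdays), "weekdays x", steps_per_day, "intervals =", n_steps, "time steps")
```

Under these assumptions, one hour of readings collapses into 12 intervals, and the full period spans 44 weekdays of 288 intervals each, that is, 12,672 time steps per station. This figure is worth checking against the shape of the data once it is loaded.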
We can directly load the dataset from GitHub and unzip it:
from io import BytesIO
from urllib.request...