Loading a time series using pandas
In this first recipe, we start by loading a dataset in a Python session using pandas. Throughout this book, we’ll work with time series using pandas data structures. pandas is a useful Python package for data analysis and manipulation. Univariate time series can be structured as pandas Series objects, where the values of the series have an associated index or timestamp with a pandas.Index structure.
Getting ready
We will focus on a dataset related to solar radiation that was collected by the U.S. Department of Agriculture. The data, which contains information about solar radiation (in watts per square meter), spans from October 1, 2007, to October 1, 2013. It was collected at an hourly frequency, for a total of 52,608 observations.
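The observation count quoted above can be sanity-checked with a little date arithmetic. The following is a minimal sketch, assuming the series starts at midnight on October 1, 2007, and that the end date is exclusive; neither detail is stated explicitly in the dataset description.

```python
import pandas as pd

# Assumed boundaries: start at midnight, end date excluded.
start = pd.Timestamp("2007-10-01")
end = pd.Timestamp("2013-10-01")

# Number of hourly steps in the half-open interval [start, end).
n_hours = int((end - start) / pd.Timedelta(hours=1))
print(n_hours)  # 52608
```

The interval covers 2,192 days (including the leap days of 2008 and 2012), and 2,192 × 24 gives the 52,608 hourly observations.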
You can download the dataset from the GitHub URL provided in the Technical requirements section of this chapter. You can also find the original source at the following URL: https://catalog.data.gov/dataset/data-from-weather-snow-and-streamflow-data-from-four-western-juniper-dominated-experimenta-b9e22.
How to do it…
The dataset is a .csv file. In pandas, we can load a .csv file using the pd.read_csv() function:
import pandas as pd

data = pd.read_csv('path/to/data.csv',
                   parse_dates=['Datetime'],
                   index_col='Datetime')

series = data['Incoming Solar']
In the preceding code, note the following:
- First, we import pandas using the import keyword. Importing this library is a necessary step to make its methods available in a Python session.
- The main argument to pd.read_csv is the file location. The parse_dates argument automatically converts the input variables (in this case, Datetime) into a datetime format. The index_col argument sets the index of the data to the Datetime column.
- Finally, we subset the data object using square brackets to get the Incoming Solar column, which contains the information about solar radiation at each time step.
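To see the effect of parse_dates and index_col without downloading the file, the steps above can be sketched on a small in-memory CSV. The two-row csv_text string below is a hypothetical stand-in for the real file (its values are copied from Table 1.1), and io.StringIO is used only so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real .csv file; values copied from Table 1.1.
csv_text = """Datetime,Incoming Solar
2007-10-01 09:00:00,35.4
2007-10-01 10:00:00,63.8
"""

data = pd.read_csv(
    io.StringIO(csv_text),     # a file path would go here in practice
    parse_dates=["Datetime"],  # parse the column as datetimes
    index_col="Datetime",      # use the parsed column as the index
)
series = data["Incoming Solar"]

# The index is now a DatetimeIndex rather than plain strings.
print(isinstance(data.index, pd.DatetimeIndex))  # True
print(type(series).__name__)                     # Series
```

Without parse_dates, the Datetime column would be read as strings, and time-aware operations on the index would not be available.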
How it works…
The following table shows a sample of the data. Each row represents the level of the time series at a particular hour.
Datetime | Incoming Solar
2007-10-01 09:00:00 | 35.4
2007-10-01 10:00:00 | 63.8
2007-10-01 11:00:00 | 99.4
2007-10-01 12:00:00 | 174.5
2007-10-01 13:00:00 | 157.9
2007-10-01 14:00:00 | 345.8
2007-10-01 15:00:00 | 329.8
2007-10-01 16:00:00 | 114.6
2007-10-01 17:00:00 | 29.9
2007-10-01 18:00:00 | 10.9
2007-10-01 19:00:00 | 0.0
Table 1.1: Sample of an hourly univariate time series
The series object that contains the time series is a pandas Series data structure. This structure contains several methods for time series analysis. We could also create a Series object by calling pd.Series with the values of the time series and their respective timestamps. The following is an example of this: pd.Series(data=values, index=timestamps), where values refers to the time series values and timestamps represents the respective timestamp of each observation.
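As a minimal sketch of this alternative, the following builds a short Series by hand. The values are copied from the first three rows of Table 1.1; generating the timestamps with pd.date_range and naming the series Incoming Solar are illustrative choices, not part of the original recipe.

```python
import pandas as pd

# Values copied from the first three rows of Table 1.1.
values = [35.4, 63.8, 99.4]

# Hourly timestamps starting at the first observation in the table.
timestamps = pd.date_range(
    start="2007-10-01 09:00:00",
    periods=len(values),
    freq=pd.Timedelta(hours=1),
)

series = pd.Series(data=values, index=timestamps, name="Incoming Solar")

# Label-based lookup by timestamp.
print(series.loc[pd.Timestamp("2007-10-01 10:00:00")])  # 63.8

# One of the time series methods: change between consecutive hours.
print(series.diff())
```

Because the index holds timestamps, observations can be selected by time, and methods such as diff operate along the temporal order of the series.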