Working with time series data
The native R classes suitable for storing time series data include vector, matrix, data.frame, and ts objects. But the types of data that can be stored in these objects are narrow; furthermore, the methods provided by these representations are limited in scope. Luckily, there exist specialized objects that offer a more general representation of time series data: zoo, xts, or timeSeries objects, available from packages of the same name.
It is not necessary to create time series objects for every time series analysis problem, but more sophisticated analyses require them. You can calculate the mean or variance of time series data represented as a plain vector in R, but if you want to perform a seasonal decomposition using decompose, the data must be stored in a time series object (decompose expects a ts object with a defined frequency).
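The difference between a plain vector and a time series object can be sketched as follows. The series is synthetic (a linear trend plus a 12-month seasonal pattern) and the names x, x_ts, and dec are our own choices for illustration.

```r
# Synthetic monthly series: linear trend plus a 12-month seasonal pattern
# (illustrative values only, not data from the text).
x <- 10 + 0.1 * (1:48) + rep(sin(2 * pi * (1:12) / 12), times = 4)

# Summary statistics work on a plain numeric vector.
mean(x)
var(x)

# decompose() needs a ts object with a known frequency (12 = monthly).
x_ts <- ts(x, start = c(2010, 1), frequency = 12)
dec <- decompose(x_ts)

# The result holds, among others, the seasonal, trend, and random components.
names(dec)
```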
In the following examples, we assume you are working with zoo objects because we think zoo is one of the most widely used packages. Before we can use zoo objects, we need to install and load the zoo package (if you have already installed it, you only need to load it) using the following command:
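A minimal sketch of this step is shown below. The requireNamespace check is our own addition so that the package is only installed when it is missing; installing requires a configured CRAN mirror.

```r
# Install zoo only if it is not already available, then load it.
if (!requireNamespace("zoo", quietly = TRUE)) {
  install.packages("zoo")
}
library(zoo)
```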
In order to familiarize ourselves with the available methods, we create a zoo object called aapl from the daily closing prices of Apple's stock, which are stored in the CSV file aapl.csv. Each line of the file contains a date and a closing price separated by a comma. The first line contains the column headings (Date and Close). The date is formatted according to the recommended primary standard notation of ISO 8601 (YYYY-MM-DD). The closing price is adjusted for stock splits, dividends, and related changes.
We load the data from our current working directory using the following command:
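One way to load such a file is read.zoo from the zoo package. The sketch below is self-contained, so it first writes a small stand-in file with the same layout to a temporary location; the three prices are made-up placeholders, not Apple data. With the real file in the working directory, you would pass "aapl.csv" instead of csv_file.

```r
library(zoo)

# Stand-in for aapl.csv: a temporary file with the same layout
# (Date,Close header, ISO 8601 dates). The prices are made-up placeholders.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("Date,Close",
             "2013-01-02,100.0",
             "2013-01-03,101.5",
             "2013-01-04,99.8"), csv_file)

# Read the file into a zoo object; format tells read.zoo how to parse the dates.
aapl <- read.zoo(csv_file, sep = ",", header = TRUE, format = "%Y-%m-%d")

# The index is parsed as Date.
class(index(aapl))
```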
To get a first impression of the data, we plot the stock price chart and specify a title for the overall plot (using the main argument) and labels for the x and y axes (using xlab and ylab respectively).
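A plot call along these lines could look as follows. The aapl series here is a short synthetic stand-in with placeholder values, and the title and label texts are our own choices.

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(c(100, 102, 101, 105, 107, 104),
            order.by = as.Date("2013-01-01") + 0:5)

plot(aapl,
     main = "Apple closing prices",  # title of the overall plot
     xlab = "Date",                  # x-axis label
     ylab = "Price (USD)")           # y-axis label
```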
We can extract the first or last part of the time series using the following commands:
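For instance, head and tail return the first and last observations. The sketch uses a synthetic ten-day stand-in for aapl with placeholder values.

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(seq(100, 109), order.by = as.Date("2013-01-01") + 0:9)

head(aapl)         # first six observations (the default)
tail(aapl, n = 3)  # last three observations
```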
Apple's all-time high and the day on which it occurred can be found using the following command:
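One way to do this is to combine which.max with subsetting, sketched here on a synthetic stand-in for aapl (placeholder values; the name peak is ours).

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(c(100, 103, 108, 105, 101),
            order.by = as.Date("2013-01-01") + 0:4)

# which.max gives the position of the maximum; subsetting keeps the date.
peak <- aapl[which.max(aapl)]
peak
```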
When dealing with time series, one is normally more interested in returns than in prices, because returns are usually stationary. We therefore calculate simple returns and continuously compounded returns (in percentage terms).
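Both kinds of return can be computed with diff and lag, as in the following sketch. The prices are synthetic placeholders, and the names ret_simple and ret_cont are our own.

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(c(100, 110, 99, 99), order.by = as.Date("2013-01-01") + 0:3)

# Simple returns in percent: (P_t - P_{t-1}) / P_{t-1} * 100.
# In zoo, lag(x, k = -1) places the previous observation on the current date.
ret_simple <- diff(aapl) / lag(aapl, k = -1) * 100

# Continuously compounded (log) returns in percent: log(P_t / P_{t-1}) * 100.
ret_cont <- diff(log(aapl)) * 100

ret_simple
ret_cont
```

Note that the first date drops out of both series, because no previous price is available for it.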
Summary statistics about simple returns can also be obtained. We use the coredata method here to indicate that we are only interested in the return values and not the index (dates).
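A sketch of this call, again on synthetic placeholder prices (the names ret_simple and ret_values are ours):

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(c(100, 110, 99, 99, 104), order.by = as.Date("2013-01-01") + 0:4)
ret_simple <- diff(aapl) / lag(aapl, k = -1) * 100

# coredata strips the date index and returns the plain numeric values.
ret_values <- coredata(ret_simple)

# Minimum, quartiles, mean, and maximum of the returns.
summary(ret_values)
```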
The biggest single-day loss is -51.86%. The date on which that loss occurred can be obtained using the following command:
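The lookup mirrors the search for the maximum, with which.min in place of which.max. The sketch uses synthetic placeholder prices, so it does not reproduce the figure quoted above; the name worst is ours.

```r
library(zoo)

# Synthetic stand-in for the aapl series (placeholder values).
aapl <- zoo(c(100, 110, 99, 99), order.by = as.Date("2013-01-01") + 0:3)
ret_simple <- diff(aapl) / lag(aapl, k = -1) * 100

# which.min gives the position of the smallest return; subsetting keeps the date.
worst <- ret_simple[which.min(ret_simple)]
worst
```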
A quick search on the Internet reveals that the large movement occurred due to the issuance of a profit warning. To get a better understanding of the relative frequency of daily returns, we can plot the histogram. The number of cells used to group the return data can be specified using the breaks argument.
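A histogram call might look like the sketch below. The price path is synthetic (a bounded deterministic series, not Apple data), and the title and label are our own choices. When breaks is a single number, hist treats it as a suggestion for the number of cells rather than an exact count.

```r
library(zoo)

# Synthetic stand-in for the aapl series: 250 placeholder prices.
prices <- 100 + cumsum(sin(1:250))
aapl <- zoo(prices, order.by = as.Date("2013-01-01") + 0:249)
ret_simple <- diff(aapl) / lag(aapl, k = -1) * 100

# Plot the histogram of the return values with about 100 cells.
h <- hist(coredata(ret_simple), breaks = 100,
          main = "Histogram of simple returns", xlab = "%")
```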
We can restrict our analysis to a subset (a window) of the time series. The highest stock price of Apple in 2013 can be found using the following command lines:
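The window function selects the observations between a start and an end date. The sketch below uses synthetic monthly placeholder values that span more than the year 2013; the name aapl_2013 is ours.

```r
library(zoo)

# Synthetic stand-in: monthly placeholder values from Dec 2012 to Jan 2014.
dates <- seq(as.Date("2012-12-01"), as.Date("2014-01-31"), by = "month")
aapl <- zoo(seq_along(dates) * 10, order.by = dates)

# Keep only the observations that fall within 2013.
aapl_2013 <- window(aapl,
                    start = as.Date("2013-01-01"),
                    end   = as.Date("2013-12-31"))

# Highest value within the window, together with its date.
aapl_2013[which.max(aapl_2013)]
```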
The quantiles of the return distribution are of interest from a risk-management perspective. We can, for example, easily determine the 1-day 99% Value-at-Risk using a naive historical approach.
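Under the naive historical approach, the 1-day 99% Value-at-Risk is simply the 1% quantile of the observed daily returns. The sketch below uses a synthetic price path, so the resulting number is illustrative only; the name var_99 is ours.

```r
library(zoo)

# Synthetic stand-in for the aapl series: 250 placeholder prices.
prices <- 100 + cumsum(sin(1:250))
aapl <- zoo(prices, order.by = as.Date("2013-01-01") + 0:249)
ret_simple <- diff(aapl) / lag(aapl, k = -1) * 100

# 1% quantile of the daily simple returns (in percent).
var_99 <- quantile(coredata(ret_simple), probs = 0.01)
var_99
```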
Hence, the probability that the return is below -7% on any given day is only 1%. But if such a day occurs (and with roughly 250 trading days per year, it will occur approximately 2.5 times per year), 7% is the minimum amount you will lose.