Recent years have witnessed the widespread application of statistics and machine learning to derive actionable insights and business value from data in almost every industrial sector. Hence, it is becoming imperative for business analysts and software professionals to be able to tackle different types of datasets. Often, the data is a time series: a sequence of quantitative observations about a system or process, made at successive points in time. Commonly, these points in time are equally spaced. Examples of time series data include gross domestic product, sales volumes, stock prices, and weather attributes, when recorded over a span of several years, months, days, hours, and so on. The frequency of observation depends on the nature of the variable and its applications. For example, gross domestic product, which is used to measure a country's annual economic progress, is publicly reported every year. Sales volumes are published monthly, quarterly, or half-yearly, though figures over longer periods may have been generated by aggregating more granular data, such as daily or weekly sales. Information about stock prices and weather attributes is available every second. At the other extreme, several physical processes generate time series data at intervals of a fraction of a second.
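Aggregating granular observations into coarser periods, as described above for sales figures, can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, assuming hypothetical daily sales of 10 units on every day of the first quarter of 2023; the helper names `daily_series` and `aggregate_monthly` are illustrative and not part of any library. Data analysis libraries such as pandas offer this as resampling, but the plain version shows what the aggregation actually does.

```python
from datetime import date, timedelta


def daily_series(start, values):
    """Pair each observation with a date, one day apart (an equally spaced time series)."""
    return [(start + timedelta(days=i), value) for i, value in enumerate(values)]


def aggregate_monthly(series):
    """Sum daily observations into (year, month) buckets.

    Dicts preserve insertion order (Python 3.7+), so the months stay in time order
    as long as the input series is sorted by date.
    """
    totals = {}
    for day, value in series:
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical data: 10 units sold on each of the 90 days starting 1 January 2023.
sales = daily_series(date(2023, 1, 1), [10] * 90)
monthly = aggregate_monthly(sales)

for (year, month), total in monthly.items():
    print(f"{year}-{month:02d}: {total} units")
```

Run as-is, this prints totals of 310, 280, and 310 units for January, February, and March. The difference comes purely from month lengths, which is a reminder that a monthly series is not strictly equally spaced in days and that calendar-aware tools are usually preferred in practice.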
Used well, time series data makes it possible to monitor the health of a system over time. For example, a company's performance can be tracked through its quarterly profit margins. Time series analysis uses such data for purposes that fall broadly into two categories:
- To understand and interpret the underlying forces that produce the observed state of a system or process over time
- To forecast the future state of the system or process in terms of observable characteristics
To achieve these objectives, time series analysis applies different statistical methods to explore and model the internal structures of time series data, such as trends, seasonal fluctuations, cyclical behavior, and irregular changes. Several mathematical techniques and programming tools exist for writing programs that explore, visualize, and model these patterns.
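The internal structures named above are often pictured as components that add together. The sketch below assumes a purely additive model and builds a hypothetical series from a linear trend, a repeating four-step seasonal pattern that sums to zero, and Gaussian irregular noise; cyclical behavior is left out for brevity. It then applies a trailing moving average over one full season, which cancels the seasonal pattern and leaves a smoothed estimate of the trend, lagged by (window - 1)/2 steps. All function names and parameter values here are illustrative assumptions, not a prescribed method.

```python
import random


def make_additive_series(n, seasonal=(5.0, -3.0, 1.0, -3.0), intercept=100.0,
                         slope=2.0, noise_scale=1.0, seed=0):
    """Return (series, trend, season, irregular) for a hypothetical additive model.

    series[t] = trend[t] + season[t] + irregular[t]
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    period = len(seasonal)
    trend = [intercept + slope * t for t in range(n)]
    season = [seasonal[t % period] for t in range(n)]
    irregular = [rng.gauss(0.0, noise_scale) for _ in range(n)]
    series = [tr + s + e for tr, s, e in zip(trend, season, irregular)]
    return series, trend, season, irregular


def trailing_moving_average(values, window):
    """Mean of each run of `window` consecutive values, ending at each position."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


series, trend, season, irregular = make_additive_series(12)
smoothed = trailing_moving_average(series, window=4)

for t, value in enumerate(series):
    print(f"t={t:2d}  observed={value:7.2f}  trend={trend[t]:6.1f}  seasonal={season[t]:+.1f}")
print("moving average over one season:", [round(v, 2) for v in smoothed])
```

Setting `noise_scale=0.0` makes the effect easy to check: every window of four consecutive points contains each seasonal value exactly once, so the moving average reduces to the average of the trend over that window.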
Before diving into these techniques, however, this chapter explains two aspects:
- The difference between time series and non-time series data
- The internal structures of time series data (some of which were briefly mentioned in the previous paragraph)
For problem solving, this chapter will help you to:
- Distinguish between time series and non-time series data and hence choose the right approach to formulate and solve a given problem.
- Select the appropriate techniques for a time series problem. Depending on the application, one may choose to focus on one or more internal structures of the time series data.
By the end of this chapter, you will understand the different types of datasets you might have to deal with in your analytics projects and be able to differentiate time series from non-time series data. You will also know about the special internal structures of data that make it a time series. The concepts covered in this chapter will help you choose the right approach for dealing with time series.
This chapter will cover the following points:
- Knowing the different types of data you might come across in your analytics projects
- Understanding the internal structures of data that make a time series
- Dealing with autocorrelation, which is the single most important internal structure of a time series and is often the primary focus of time series analysis
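To make autocorrelation concrete: the sample autocorrelation at lag k compares a series with a copy of itself shifted by k steps. The sketch below uses one common estimator, which divides the lag-k sum of cross-products of deviations from the mean by the total sum of squared deviations; statistical libraries may differ in normalization details. The function name `autocorrelation` is illustrative, and the alternating input is a toy example chosen so that the results can be verified by hand.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation r_k of sequence x at the given lag.

    r_k = sum_{t=0}^{n-k-1} (x[t] - mean)(x[t+k] - mean) / sum_{t=0}^{n-1} (x[t] - mean)^2
    """
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    deviations = [v - mean for v in x]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Toy series that flips sign at every step: neighbours are negatively correlated,
# while observations two steps apart move together.
alternating = [1.0, -1.0] * 5
for k in range(4):
    print(f"lag {k}: r = {autocorrelation(alternating, k):+.2f}")
```

For this input the printout is +1.00, -0.90, +0.80, and -0.70. The magnitude shrinks with lag only because fewer pairs contribute to the numerator while the denominator stays fixed, which is a known property of this estimator.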