Time series analysis is the process of extracting meaningful insights from time series data using data visualization tools, statistical applications, and mathematical models. Those insights can be used to explore past events and to forecast future ones. The analysis process can be divided into the following steps:
- Data collection: This step includes extracting data from different data sources, such as flat files (for example, CSV, TXT, and XLSX), databases (for example, SQL Server and Teradata), or other internet sources (such as academic resources and the Bureau of Statistics datasets). Later in this chapter, we will learn how to load data into R from different sources.
- Data preparation: In most cases, raw data is not ready for analysis and may require cleaning, transformation, aggregation, and reformatting. In Chapter 2, Working with Date and Time Objects; Chapter 3, The Time Series Object; and Chapter 4, Working with zoo and xts Objects, we will focus on the core data preparation methods of time series data with R.
- Descriptive analysis: This step uses summary statistics and data visualization tools to extract insights from the data, such as patterns, distributions, cycles, and relationships with other drivers, in order to learn more about past events. In Chapter 5, Decomposition of Time Series Data; Chapter 6, Seasonality Analysis; and Chapter 7, Correlation Analysis, we will focus on descriptive analysis methods of time series data.
- Predictive analysis: This step applies statistical methods to forecast future events. In Chapter 8, Forecasting Strategies; Chapter 9, Forecasting with Linear Regression; Chapter 10, Forecasting with Exponential Smoothing Models; Chapter 11, Forecasting with ARIMA Models; and Chapter 12, Forecasting with Machine Learning Models, we will focus on traditional forecasting approaches (such as linear regression, exponential smoothing, and ARIMA models), as well as advanced forecasting approaches with machine learning models.
It may be surprising, but in practice the first two steps often take most of the process time and effort, mainly because of data challenges and complexity. For instance, companies tend to restructure their business units (BUs) and IT systems every couple of years. This makes it hard to identify and track the historical contribution (production, revenues, unit sales, and so on) of a specific BU before the changes.
In other cases, additional effort is required to clean the raw data and handle missing values and outliers. This sadly leaves less time for the analysis itself. Fortunately, R has a variety of wonderful applications for data preparation, visualization, and time series modeling. These tools help reduce the time spent on the preparation steps and let you allocate more time to the analysis itself. Throughout the rest of this chapter, we will provide background information on R and its applications for time series analysis.