Exploratory Data Analysis
Later in this book, we'll use the field of EDA as a source for concrete examples of functional programming. This field is rich with algorithms and approaches to working with complex datasets; functional programming is often a very good fit between the problem domain and automated solutions.
While details vary from author to author, there are several widely accepted stages of EDA. These include the following:
- Data preparation: This might involve extraction and transformation for source applications. It might involve parsing a source data format and doing some kinds of data scrubbing to remove unusable or invalid data. This is an excellent application of functional design techniques.
- Data exploration: This is a description of the available data. This usually involves the essential statistical functions. This is another excellent place to explore functional programming. We can describe our focus as univariate and bivariate statistics but that sounds too daunting and complex. What this really means is that we'll focus on mean, median, mode, and other related descriptive statistics. Data exploration may also involve data visualization. We'll skirt this issue because it doesn't involve very much functional programming. I'll suggest that you use a toolkit like
SciPy
.Visit the following link to get more information how SciPY works and its usage:
https://www.packtpub.com/big-data-and-business-intelligence/learning-scipy-numerical-and-scientific-computing or https://www.packtpub.com/big-data-and-business-intelligence/learning-python-data-visualization
- Data modeling and machine learning: This tends to be proscriptive as it involves extending a model to new data. We're going to skirt this because some of the models can become mathematically complex. If we spend too much time on these topics, we won't be able to focus on functional programming.
- Evaluation and comparison: When there are alternative models, each must be evaluated to determine which is a better fit for the available data. This can involve ordinary descriptive statistics of model outputs. This can benefit from functional design techniques.
The goal of EDA is often to create a model that can be deployed as a decision support application. In many cases, a model might be a simple function. A simple functional programming approach can apply the model to new data and display results for human consumption.