3.1 Description
Analysts and decision-makers need to acquire data for further analysis. In many cases, the data is available in CSV-formatted files. These files may be extracts from databases or downloads from web services.
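For orientation, here is a minimal sketch of reading such a file with Python’s standard csv module. It assumes the file has a header row. The helper name read_csv_rows is hypothetical, and the values are deliberately left as strings so that type conversion can happen in a later, separate step.

```python
import csv
from collections.abc import Iterable, Iterator


def read_csv_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield each data row as a dict keyed by the names in the header row.

    `lines` can be an open text file or any iterable of strings.
    Values are left as strings; type conversion is a separate step.
    """
    yield from csv.DictReader(lines)
```

When reading from disk, open the file with newline="" so the csv module can handle line endings inside quoted fields, for example with open("extract.csv", newline="") as f: rows = list(read_csv_rows(f)), where extract.csv is a hypothetical file name.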
For testing purposes, it’s helpful to start with something relatively small. Some Kaggle data sets are very large and require sophisticated application design. One of the most enjoyable small data sets to work with is Anscombe’s Quartet. It can serve as a test case for understanding the issues and concerns of acquiring raw data.
We’re interested in a few key features of an application to acquire data:
When gathering data from multiple sources, it’s imperative to convert it to a common format. Data sources vary and often change with software upgrades. The acquisition process needs to be flexible with respect to data sources and avoid assumptions about formats.
A CLI application permits a variety of automation possibilities....
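The two features above, a common output format and a CLI, can be sketched together. This is a minimal sketch under stated assumptions, not a definitive design: the XYPair record, the newline-delimited JSON output, and the --x-col and --y-col options are all hypothetical choices. The column names are command-line options so that a source whose headers change (for example, after a software upgrade) needs new arguments rather than new code.

```python
import argparse
import csv
import json
import sys
from collections.abc import Iterable, Iterator
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional, TextIO


@dataclass(frozen=True)
class XYPair:
    """Hypothetical common format: one (x, y) sample, kept as source text."""

    x: str
    y: str


def to_common(rows: Iterable[dict[str, str]], x_col: str, y_col: str) -> Iterator[XYPair]:
    """Map one source's column names onto the common XYPair format."""
    for row in rows:
        yield XYPair(x=row[x_col], y=row[y_col])


def to_ndjson(pairs: Iterable[XYPair]) -> Iterator[str]:
    """Serialize each pair as one line of JSON (newline-delimited JSON)."""
    for pair in pairs:
        yield json.dumps(asdict(pair)) + "\n"


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line interface (option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Acquire (x, y) pairs from a CSV file.")
    parser.add_argument("source", type=Path, help="CSV file with a header row")
    # Column names are options, so a changed source layout needs no code change.
    parser.add_argument("--x-col", default="x", help="column holding x values")
    parser.add_argument("--y-col", default="y", help="column holding y values")
    return parser


def main(argv: Optional[list[str]] = None, output: TextIO = sys.stdout) -> int:
    """Parse arguments, convert the CSV rows, and write NDJSON to `output`."""
    args = build_parser().parse_args(argv)
    with args.source.open(newline="") as csv_file:
        pairs = to_common(csv.DictReader(csv_file), args.x_col, args.y_col)
        # Write inside the with-block: the generators read the file lazily.
        output.writelines(to_ndjson(pairs))
    return 0
```

Wired up as a script, for example with sys.exit(main()) under an if __name__ == "__main__": guard, a run such as python acquire.py quartet.csv --x-col x1 --y-col y1 would write one JSON object per line to standard output; acquire.py, quartet.csv, and the x1/y1 column names are hypothetical. Because each step is a small function, other automation (a shell loop, a scheduler, a test suite) can call the same conversion without going through the command line.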