A typical analytic scenario using large datasets
One of the most common activities of a data scientist is to analyze a dataset relevant to a business scenario. The goal of the analysis is to identify associations and relationships between variables that reveal new measurable aspects of the business (insights), which can then be used to help it grow. Sometimes the available data is not sufficient to identify strong associations between variables, because additional variables that could be relevant have not been considered. In this case, obtaining new data that is not generated by your organization but that enriches the context of your dataset (a data augmentation process) can improve the strength of the statistical associations between your variables. For example, being able to link weather forecast data to a dataset that reports the measurements of a dam's water level is likely to introduce significant variables to better...