A typical analytic scenario using large datasets
One of the most frequent activities of a data scientist is analyzing a dataset relevant to a business scenario. The objective of the analysis is to identify associations and relationships between variables that reveal new measurable aspects of the business (insights), which can then be used to help it grow. The available data may not be sufficient to establish strong associations between variables, because relevant additional variables have not been considered. In this case, obtaining new data that is not generated by your business but enriches the context of your dataset (a data augmentation process) can improve the strength of the statistical associations between your variables. Being able to link, for example, weather forecast data to a dataset that reports the measurements of the water level of a dam certainly introduces significant variables to better...