Once we have identified the data sources, the next task is to gather all the tuples or records into a homogeneous set. The format can be a tabular arrangement, a series of real values (such as audio or weather variables), or N-dimensional matrices (a set of images or point clouds), among other types.
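As a rough illustration of what a homogeneous set means in practice, the following sketch gathers records from two sources into a single schema and then shows the three formats mentioned above. The sources, field names, and the helpers `normalize` and `is_homogeneous` are hypothetical and exist only for this example. It uses plain Python structures, whereas a real project would more likely rely on a dedicated array or table library.

```python
# Hypothetical records from two sources that name their fields differently.
source_a = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": 19.0}]
source_b = [{"ID": 3, "temperature": 22.1}]

# Assumed mapping from each source's field names to one shared schema.
FIELD_MAP = {"id": "id", "ID": "id", "temp_c": "temp_c", "temperature": "temp_c"}


def normalize(record):
    """Rename a record's fields to the shared schema."""
    return {FIELD_MAP[key]: value for key, value in record.items()}


def is_homogeneous(records):
    """Return True when every record exposes the same set of fields."""
    return len({frozenset(record) for record in records}) <= 1


# Tabular arrangement: one row per record, the same columns in every row.
table = [normalize(record) for record in source_a + source_b]

# Series of real values: a single variable, kept in record order.
series = [row["temp_c"] for row in table]

# N-dimensional matrix: two 2x2 grayscale "images" as nested lists.
images = [[[0, 255], [255, 0]], [[10, 20], [30, 40]]]
```

Checking `is_homogeneous(table)` before moving on is a cheap way to catch a source whose fields were not mapped to the shared schema.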
Dataset definition and retrieval
The ETL process
The stages described so far evolved over several decades within the data processing field, first under the name of data mining and later under the popular name of big data.
One of the best outcomes of these disciplines is the specification of the Extract, Transform, Load (ETL) process.
This process starts with a mix of many data sources from business systems, then moves to a system that transforms...