Data sourcing is one of the earliest steps in the data life cycle. It covers data acquisition, cleaning, and organization, and involves the following specific activities:
- Raw data delivery—push model versus pull (extract) model
- Handling a variety of data formats (CSV, JSON, XML)
- Detecting errors in the delivered data
- Removing bad data
- Data enrichment—filling the gaps in the data
- Combining data with other datasets
- Defining a data model
- Transforming the raw data model into the defined model
- Storing the data
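
Several of the activities above can be sketched as one small pipeline. The following is a minimal illustration, not a definitive implementation: the `Reading` model, its field names, the sample payloads, and the `city_by_sensor` lookup are all hypothetical, and an in-memory SQLite database stands in for whatever store is actually used. Delivery itself (push or pull) is left out; the sketch starts from payloads that have already arrived.

```python
import csv
import io
import json
import sqlite3
from dataclasses import dataclass
from typing import Optional


# Hypothetical target data model; the field names are assumptions for illustration.
@dataclass
class Reading:
    sensor_id: str
    temperature_c: float
    city: str


def parse_raw(payload: str, fmt: str) -> list[dict]:
    """Handle more than one delivery format (only CSV and JSON here)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "json":
        return json.loads(payload)
    raise ValueError(f"unsupported format: {fmt}")


def to_model(row: dict, city_by_sensor: dict[str, str]) -> Optional[Reading]:
    """Validate one raw row, enrich it, and map it onto the data model.

    Returns None when the row fails validation, so the caller can drop it.
    """
    sensor_id = str(row.get("sensor_id") or "").strip()
    try:
        temperature = float(row.get("temperature_c"))
    except (TypeError, ValueError):
        return None  # error detected: missing or non-numeric temperature
    if not sensor_id:
        return None  # error detected: missing identifier
    # Enrichment: fill a missing city by combining with a second dataset.
    city = row.get("city") or city_by_sensor.get(sensor_id, "unknown")
    return Reading(sensor_id, temperature, city)


def store(readings: list[Reading], conn: sqlite3.Connection) -> None:
    """Persist the modeled records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, temperature_c REAL, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [(r.sensor_id, r.temperature_c, r.city) for r in readings],
    )
    conn.commit()


def run_pipeline(sources, city_by_sensor, conn) -> int:
    """Parse every (payload, format) pair, clean, enrich, and store.

    Returns the number of records that were stored.
    """
    raw_rows = [row for payload, fmt in sources for row in parse_raw(payload, fmt)]
    candidates = [to_model(row, city_by_sensor) for row in raw_rows]
    readings = [r for r in candidates if r is not None]  # remove bad data
    store(readings, conn)
    return len(readings)


# Made-up sample payloads: one bad temperature, one empty city, one empty id.
CSV_DATA = "sensor_id,temperature_c,city\ns1,21.5,Oslo\ns2,not-a-number,Bergen\ns3,19.0,\n"
JSON_DATA = '[{"sensor_id": "s4", "temperature_c": 18.2}, {"sensor_id": "", "temperature_c": 20.0}]'
CITY_BY_SENSOR = {"s3": "Tromso", "s4": "Bergen"}

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    stored = run_pipeline(
        [(CSV_DATA, "csv"), (JSON_DATA, "json")], CITY_BY_SENSOR, connection
    )
    print(f"stored {stored} readings")
```

Validation, enrichment, and the mapping onto the model sit together in `to_model` only to keep the sketch short; in a larger pipeline they would more likely be separate stages so that rejected rows can be logged and inspected rather than silently dropped.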