Staged data is treated as immutable: once created, it never changes. With the data staged, the cleaning and normalization process can begin. This step can also include estimating how error-prone the received data is, since big data is expected to contain a certain amount of errors.
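As a rough illustration of estimating the degree of errors, the following minimal sketch counts invalid records in a read-only staged batch. The record fields (`id`, `amount`), the validation rule, and the names `is_valid`, `measure_errors`, and `ErrorReport` are assumptions made for this example; real validation rules depend on the data source.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class ErrorReport:
    """Summary of how many staged records failed validation."""
    total: int
    invalid: int

    @property
    def error_rate(self) -> float:
        # An empty batch is reported as error-free rather than dividing by zero.
        return self.invalid / self.total if self.total else 0.0


def is_valid(record: Mapping[str, str]) -> bool:
    """Hypothetical rule: 'id' is non-empty and 'amount' parses as a number."""
    if not record.get("id", "").strip():
        return False
    try:
        float(record.get("amount", ""))
    except ValueError:
        return False
    return True


def measure_errors(
    staged: Iterable[Mapping[str, str]],
    validator: Callable[[Mapping[str, str]], bool] = is_valid,
) -> ErrorReport:
    """Count invalid records without modifying the staged data."""
    total = invalid = 0
    for record in staged:
        total += 1
        if not validator(record):
            invalid += 1
    return ErrorReport(total=total, invalid=invalid)


# The staged batch is a tuple of read-only mappings, so neither the batch
# nor any record in it can be changed after creation.
STAGED = tuple(
    MappingProxyType(r)
    for r in (
        {"id": "a1", "amount": "10.5"},
        {"id": "", "amount": "3"},       # missing id
        {"id": "a3", "amount": "n/a"},   # non-numeric amount
        {"id": "a4", "amount": "7"},
    )
)

report = measure_errors(STAGED)
print(f"{report.invalid} of {report.total} records invalid ({report.error_rate:.0%})")
```

Because the check only reads the staged records, the same batch can be measured again later with stricter or looser rules, and cleaned output can be written elsewhere without touching the staged copy.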
Raw data from external sources arrives in a variety of formats. These formats are generally designed for data delivery and are not suitable for direct use by the systems that consume the data. It is also very common for several pieces of information to be bundled together as part of data delivery, whereas the consumer of the data needs more fine-grained access to that information.
An example of this is the address part of the data. The data producer might provide a free-form address. The contained information, such as...