Handling Schema Drifts
Schema drift refers to changes in a schema over time that result from changes in the event sources. This could be due to new columns or fields being added, existing columns being deleted, and more. For example, a large online US marketplace might capture real-time purchasing data in order to analyze shopping habits over a holiday period. The company then starts to expand internationally and wants to add fields for customer location, currency, and shipping costs. The company wants to be able to make this change without compromising the integrity of the data stream.
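To make the marketplace example concrete, the following minimal sketch compares an incoming event against a known baseline schema and reports which fields were added or removed. The field names (order_id, item, price, and so on) and the detect_drift helper are hypothetical and used only for illustration; they are not part of any Azure API.

```python
def detect_drift(baseline_fields, event):
    """Return (added, removed) field names relative to the baseline schema."""
    current_fields = set(event)
    added = sorted(current_fields - baseline_fields)
    removed = sorted(baseline_fields - current_fields)
    return added, removed


# Hypothetical baseline: the fields the US-only marketplace originally emitted.
baseline = {"order_id", "item", "price"}

# Hypothetical event after the international expansion.
new_event = {
    "order_id": 1,
    "item": "lamp",
    "price": 19.99,
    "customer_location": "DE",
    "currency": "EUR",
    "shipping_cost": 4.5,
}

added, removed = detect_drift(baseline, new_event)
print("added:", added)
print("removed:", removed)
```

A consumer could run a check like this on each event and decide whether to accept the new fields, ignore them, or reject the event.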
Note
This section primarily focuses on the Handle schema drift concept of the DP-203: Data Engineering on Microsoft Azure exam.
Handling Schema Drifts Using Event Hubs
One way of ensuring data integrity is to always share the schema details along with the data. However, if an event publisher needs to share schema details with the consumer, they must serialize the schema along with the data...
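As a rough illustration of serializing the schema along with the data, the sketch below wraps each record in a JSON envelope that also carries its schema. The envelope layout and the pack_event and unpack_event helpers are assumptions made for this example; they use only the Python standard library and do not represent the Event Hubs SDK.

```python
import json


def pack_event(schema, record):
    """Publisher side: serialize the schema together with the record."""
    envelope = {"schema": schema, "data": record}
    return json.dumps(envelope).encode("utf-8")


def unpack_event(payload):
    """Consumer side: read the schema from the payload and check the record."""
    envelope = json.loads(payload.decode("utf-8"))
    schema, record = envelope["schema"], envelope["data"]
    missing = [f["name"] for f in schema["fields"] if f["name"] not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return schema, record


# Hypothetical schema and record for the marketplace example.
schema = {
    "name": "Purchase",
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "price", "type": "double"},
        {"name": "currency", "type": "string"},
    ],
}
record = {"order_id": 1, "price": 19.99, "currency": "EUR"}

payload = pack_event(schema, record)
received_schema, received_record = unpack_event(payload)
```

Note that in this approach every event carries a full copy of the schema, so the payload grows with the size of the schema rather than only with the size of the data.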