Normalizing data
Data normalization is a technique for cleaning data. Several normalization techniques make data easier to understand and analyze. This section covers the following techniques and use cases:
- Casting data types and mapping column names
- Inferring schemas
- Computing schemas on the fly
- Enforcing schemas
- Flattening nested schemas
- Normalizing scale
- Handling missing values and outliers
- Normalizing date and time values
- Handling error records
Let’s dive in!
Casting data types and mapping column names
Data lakes often ingest data from many different sources, which can lead to inconsistent data types or column names. For example, joining multiple tables whose data types or column names are inconsistent can cause query errors or invalid calculations. To avoid such issues and make further analytics easier, it is a good approach to cast the data types and apply mapping to the data during the...