Chapter 4 – The Most Common Data Cleaning Operations
- B – To enhance data accuracy in the analysis – Removing duplicates is crucial to prevent inaccuracies in data analysis, especially when numerical values are aggregated, because duplicated rows inflate sums and counts.
- C – Product Name, as the main identifier – In the provided example, the Product Name column is selected to remove duplicates, as it serves as the main identifier.
- B – Distorts analysis results – Missing data, or NULL values, can distort analysis results and visuals.
- C – To gain desired dimensions for analysis – for example, splitting a date field – Columns may need to be split to extract specific dimensions for analysis.
- C – Split Columns by Delimiter, based on data format – In the Date table example, the By Delimiter function is used to split the date column based on the / delimiter.
- C – Merging columns to format date data – Merging columns may...