Handling duplicate values
Dealing with duplicate values is a common challenge when analyzing data or building data transformations. Polars provides DataFrame/Series methods and expressions to find duplicate values, remove them, and extract only unique values.
In this recipe, we'll cover how to check for and handle duplicate values in Polars.
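Before turning to the full dataset, here is a minimal sketch on a tiny, made-up DataFrame (the data is invented purely for illustration) showing the methods used throughout this recipe:
import polars as pl
# Toy DataFrame with one fully duplicated row
toy = pl.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
print(toy.is_duplicated())  # Boolean Series: True for rows that occur more than once
print(toy.is_unique())      # Boolean Series: True for rows that occur exactly once
print(toy.n_unique())       # Number of distinct rows: 2
print(toy.unique())         # DataFrame with duplicate rows dropped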
How to do it...
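The steps below assume that df has already been loaded from the recipe's dataset; a rough, hypothetical setup could look like the following (the file name is a placeholder, not the actual dataset path):
import polars as pl
# Placeholder path; replace with the dataset used in this recipe
df = pl.read_csv("../data/dataset.csv")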
Here are the steps:
- Check the shape of the dataset:
df.shape
The preceding code will return the following output:
>> (137700, 16)
- Check the number of duplicated and unique rows at the dataset level, considering all columns:
df.is_duplicated().sum()
The preceding code will return the following output:
>> 0
df.is_unique().sum()
The preceding code will return the following output:
>> 137700
df.n_unique()
The preceding code will return the following output:
>> 137700
- Display the number of unique values in each column:
df.select(pl.all().n_unique())
The preceding...