Removing duplicates
When we start working with data, we will often find duplicates in it. As we discussed in Chapter 2, Understanding Data Quality and Why Data Cleaning is Important, there are a number of reasons why the values in your data may have been duplicated. For example, say we’re a retailer and we accidentally entered the same product twice. Leaving the duplicate in would give us inaccurate numbers for that product, so it’s key that we remove it before we get started with our analysis.
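To make the retailer example concrete, here is a minimal Python sketch of how one duplicated row distorts a total and how finding and removing it restores the figure. This is only an illustration of the idea, not the Power BI workflow used in this chapter. The rows, column meanings, and the helper names `find_duplicates` and `remove_duplicates` are hypothetical and are not taken from Products.xlsx.

```python
from collections import Counter

# Hypothetical product rows (assumed for illustration, not from Products.xlsx):
# (product_id, product_name, units_sold)
rows = [
    (101, "Desk lamp", 40),
    (102, "Office chair", 15),
    (101, "Desk lamp", 40),  # the same product entered a second time by mistake
]


def find_duplicates(records):
    """Return each record that appears more than once."""
    counts = Counter(records)
    return [record for record, n in counts.items() if n > 1]


def remove_duplicates(records):
    """Keep the first occurrence of each record, preserving the original order."""
    return list(dict.fromkeys(records))


duplicates = find_duplicates(rows)
clean_rows = remove_duplicates(rows)

# Compare the total units sold with and without the duplicate row.
total_before = sum(units for _, _, units in rows)
total_after = sum(units for _, _, units in clean_rows)

print("Duplicated rows:", duplicates)
print("Units sold with duplicate:", total_before)
print("Units sold after removal:", total_after)
```

In this sketch a row counts as a duplicate only when every field matches. Whether a partial match (for example, the same product ID with a different name) should also be treated as a duplicate depends on the dataset.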
So, let’s get started. In the following example, we will find, select, and remove the duplicate in the data:
- Download the Products.xlsx dataset from the given GitHub repository.
- Connect to this workbook using Power BI Desktop by selecting Get data in the toolbar (as shown) and then selecting Excel workbook:
Figure 4.1 – The Get data menu within Power BI Desktop
...