Understanding the importance of data quality
Remember the old adage "garbage in, garbage out"? It is especially true in data science. The quality of the data influences the entire downstream project, and people who work on downstream tasks often find it difficult to identify the source of a problem.
In the following section, I will present three examples in which poor data quality caused difficulties.
Understanding why data can be problematic
The three examples fall into different categories, each representing a distinct problem:
- Inherent bias in data
- Miscommunication in large-scale projects
- Insufficient documentation and irreversible preprocessing
Let's start with the first example, a recent and much-discussed one: face generation.
Bias in data sources
The first example we will look at is bias in data. Face-Depixelizer is a tool that can significantly increase the resolution...