Defining data source awareness
The internet makes it easy to locate many common kinds of datasets. For some purposes, so many datasets are available that it’s hard to choose among them based on content alone. However, content isn’t the only consideration. It’s also important to consider the third party that collected the data. In some cases, datasets are extremely biased or have special requirements that make them inappropriate for many kinds of analysis. Even if you were to ignore these issues, any experimentation you perform with such a dataset would yield less-than-useful results.
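As a rough illustration of one narrow kind of bias check, the following sketch measures how unevenly labels are distributed in a candidate dataset. The function names, the sample labels, and the 0.8 threshold are all assumptions made for this example. A skewed label distribution is only one possible sign of bias, so passing this check says nothing about how the data was collected or by whom.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def looks_skewed(labels, max_share=0.8):
    """Flag a dataset when any single label exceeds max_share of the rows.

    max_share is an arbitrary illustrative threshold, not a standard.
    """
    shares = label_shares(labels)
    return any(share > max_share for share in shares.values())


# Hypothetical labels from a candidate dataset (not real data).
sample_labels = ["approved"] * 9 + ["denied"]

print(label_shares(sample_labels))
print(looks_skewed(sample_labels))
```

A check like this is cheap to run before any modeling begins, but it complements rather than replaces reading the dataset's documentation and learning who collected it and why.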
Validating user permissions
Part of data source awareness is ensuring that the people who use a dataset actually have both the need and the credentials to use it. This is especially true of datasets that deal with sensitive or confidential materials, or datasets that are controlled by government regulation, such as medical datasets that must follow...