Avoiding the risk of hidden bias and taking into account ethical considerations in ChatGPT
ChatGPT is equipped with the Moderation API, which prevents it from engaging in conversations that might be unsafe. The Moderation API is a classification system, powered by a GPT model, that labels content according to the following classes: violence, self-harm, hate, harassment, and sexual content. To train it, OpenAI uses anonymized data together with synthetic data generated by a GPT model in zero-shot form.
The Moderation API is based on a more sophisticated version of the content filter model available among the OpenAI APIs. We discussed this model in Chapter 1, where we saw that it is deliberately conservative: it prefers false positives over false negatives, erring on the side of blocking content rather than letting unsafe content through.
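To make the idea of a conservative classifier concrete, here is a minimal sketch of how a client might interpret moderation-style category scores. The category names follow the classes listed above, but the response shape, score values, and threshold are illustrative assumptions, not OpenAI's actual API format:

```python
# Sketch: interpreting moderation-style category scores.
# The categories mirror those named in the text; the score format and
# threshold value are hypothetical, not OpenAI's real response schema.

CATEGORIES = ["violence", "self-harm", "hate", "harassment", "sexual"]

def flag_content(category_scores, threshold=0.2):
    """Return the categories whose score exceeds the threshold.

    A deliberately low threshold mimics the conservative behavior
    described in the text: the filter prefers false positives
    (over-blocking) to false negatives (missing unsafe content).
    """
    return [c for c in CATEGORIES if category_scores.get(c, 0.0) > threshold]

# Made-up scores for a borderline input:
scores = {"violence": 0.35, "self-harm": 0.01, "hate": 0.05,
          "harassment": 0.22, "sexual": 0.0}
print(flag_content(scores))  # with a low threshold, both borderline categories are flagged
```

Raising the threshold would let more borderline content through (fewer false positives, more false negatives); the conservative choice keeps it low.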
However, there is also something we can refer to as hidden bias, which derives directly from the knowledge base the model was trained on. For example, experts believe that the main chunk of GPT-3's training data, known as the Common Crawl, was written mainly by white males...