Mitigating Bias and Prioritizing Ethics in ChatGPT
Explore how ChatGPT may show bias in its responses, particularly regarding racial sentiment, and the ongoing efforts to improve the model in line with responsible AI principles.
Ethical considerations in ChatGPT
ChatGPT is backed by OpenAI's Moderation API, which prevents it from engaging in conversations that might be unsafe. The Moderation API is a classification model, built on a GPT model, that flags content across the following classes: violence, self-harm, hate, harassment, and sexual content. To train it, OpenAI uses anonymized data alongside synthetic data generated by a GPT model in zero-shot form.
The Moderation API is based on a more sophisticated version of the content filter model available among the OpenAI APIs. This model is deliberately conservative: it favors false positives over false negatives, meaning it would rather flag safe content than risk letting unsafe content through.
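As a practical illustration, here is a minimal sketch of how content can be checked against the Moderation API using OpenAI's Python SDK (a v1-style client is assumed, and the input string is purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Ask the Moderation endpoint to classify a (hypothetical) user message
response = client.moderations.create(input="I want to hurt someone.")
result = response.results[0]

print(result.flagged)          # True if any unsafe class is triggered
print(result.categories)       # per-class boolean flags (violence, hate, ...)
print(result.category_scores)  # per-class confidence scores
```

Applications built on top of ChatGPT can run user input through this endpoint first and refuse to proceed whenever `flagged` is true.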
Hidden bias
However, there is also something we can call hidden bias, which derives directly from the knowledge base the model was trained on. For example, the bulk of GPT-3's training data comes from the Common Crawl corpus, which experts believe was written mainly by white males from Western countries. If that is the case, the model already carries a hidden bias: it will inevitably mimic the language and worldview of a limited, unrepresentative group of people.
In their paper, Language Models are Few-Shot Learners, OpenAI researchers Tom Brown et al. describe an experimental setup to investigate racial bias in GPT-3. The model was prompted with phrases containing racial categories, and 800 samples were generated for each category. The sentiment of the words co-occurring with each category was then measured using SentiWordNet, on a scale from -100 to 100 (positive scores indicating positive words, and vice versa).
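To make the scoring idea concrete, here is a rough sketch of SentiWordNet-based sentiment scoring with NLTK. This is a simplification for illustration only, not the paper's exact co-occurrence methodology, and the `sentiment_score` helper is hypothetical:

```python
import nltk
from nltk.corpus import sentiwordnet as swn

# SentiWordNet assigns positive/negative polarity scores to WordNet synsets
nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)

def sentiment_score(text: str) -> float:
    """Average (positive - negative) polarity, scaled to -100..100."""
    scores = []
    for word in text.lower().split():
        synsets = list(swn.senti_synsets(word))
        if synsets:  # skip words unknown to SentiWordNet
            scores.append(synsets[0].pos_score() - synsets[0].neg_score())
    return 100 * sum(scores) / len(scores) if scores else 0.0

print(sentiment_score("a kind and brilliant person"))   # positive score
print(sentiment_score("a cruel and dangerous person"))  # negative score
```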
Exploring racial sentiment
The results showed that the sentiment associated with each racial category varied significantly across models: for instance, "Asian" was associated with a consistently high sentiment, while "Black" was associated with a consistently low sentiment.