Mitigating Bias and Prioritizing Ethics in ChatGPT

Explore how ChatGPT may show bias in its responses, particularly regarding racial sentiment, while highlighting ongoing efforts to improve in line with responsible AI principles.

ChatGPT is paired with the Moderation API so that it does not engage in conversations that might be unsafe. The Moderation API is a classification model, itself built on a GPT model, that assigns content to the following classes: violence, self-harm, hate, harassment, and sex. To train it, OpenAI uses anonymized data together with synthetic data (generated in zero-shot form).

Ethical considerations in ChatGPT

The Moderation API is based on a more sophisticated version of the content filter model available among OpenAI APIs. This model is deliberately conservative: it leans toward false positives (flagging content that is actually safe) rather than false negatives (letting unsafe content through).
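The trade-off between false positives and false negatives can be illustrated with a minimal sketch. It assumes a classifier that returns one score per class and a single cutoff; the `flag` helper, the scores, and the thresholds are all hypothetical and are not the Moderation API's actual output or settings.

```python
# Hypothetical per-class scores in [0, 1] for one borderline message.
# These numbers are invented for illustration, not real Moderation API output.
borderline = {
    "violence": 0.35,
    "self-harm": 0.02,
    "hate": 0.10,
    "harassment": 0.41,
    "sex": 0.01,
}


def flag(scores, threshold):
    """Return the sorted classes whose score meets or exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# A low threshold is conservative: it flags borderline content and so
# accepts more false positives in exchange for fewer false negatives.
conservative = flag(borderline, threshold=0.3)

# A high threshold is permissive: borderline content passes unflagged.
permissive = flag(borderline, threshold=0.8)

print(conservative)  # ['harassment', 'violence']
print(permissive)    # []
```

With the lower threshold the borderline message is flagged in two classes; with the higher one it is not flagged at all, which is the kind of miss a conservative filter is designed to avoid.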

Hidden bias

However, there is also what we can call hidden bias, which derives directly from the data the model has been trained on. For example, the largest portion of GPT-3's training data comes from Common Crawl, and experts believe that this text was written mainly by white males from Western countries. If this is the case, the model already carries a hidden bias, because it will inevitably mimic a limited and unrepresentative category of human beings.

In their paper, Language Models are Few-Shot Learners, OpenAI's researchers Tom Brown et al. created an experimental setup to investigate racial bias in GPT-3. The model was prompted with phrases containing racial categories, and 800 samples were generated for each category. The sentiment of the generated text was measured with SentiWordNet, based on word co-occurrences, on a scale ranging from -100 to 100 (with positive scores indicating positive words and negative scores indicating negative words).
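The scoring step can be sketched as follows. This is a simplified illustration, not the authors' code: it averages the scores of every lexicon word found in the samples rather than only words that co-occur disproportionately with a category, and `TOY_LEXICON`, `mean_sentiment`, the category names, and the sample texts are all hypothetical stand-ins (the lexicon values are not SentiWordNet values).

```python
from collections import Counter

# Hypothetical toy lexicon on the -100..100 scale; NOT SentiWordNet values.
TOY_LEXICON = {"brilliant": 80, "kind": 60, "lazy": -60, "dangerous": -80}


def mean_sentiment(samples, lexicon):
    """Average the lexicon score over every scored word occurrence."""
    counts = Counter(word for text in samples for word in text.lower().split())
    scored = {word: n for word, n in counts.items() if word in lexicon}
    total = sum(scored.values())
    if total == 0:
        return 0.0  # no lexicon words found, so treat the text as neutral
    return sum(lexicon[word] * n for word, n in scored.items()) / total


# Placeholder generations for two anonymous categories (invented text).
samples_by_category = {
    "category_a": ["very kind and brilliant", "kind but lazy"],
    "category_b": ["lazy and dangerous", "brilliant but dangerous"],
}

scores = {
    category: mean_sentiment(samples, TOY_LEXICON)
    for category, samples in samples_by_category.items()
}
print(scores)  # {'category_a': 35.0, 'category_b': -35.0}
```

A gap between the per-category averages, as in this toy output, is the kind of signal the experiment looks for; the real study used 800 generated samples per prompt rather than two.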

Exploring racial sentiment

The results showed that the sentiment associated with each racial category varied across different models, with Asians consistently having a high sentiment and Black people consistently having a low sentiment. In this context, sentiment refers to the emotional tone or attitude associated with different racial categories within the data or models being analyzed. It indicates how positively or negatively these categories are perceived or represented. The authors caution that the results reflect the experimental setup and that socio-historical factors may influence the sentiment associated with different demographics. The study highlights the need for a more sophisticated analysis of the relationship between sentiment, entities, and input data.
