Debiasing ChatGPT

Creating an LLM that isn’t racist or sexist

Rachel Draelos, MD, PhD

--

This image was generated by DALL-E using the prompt, “the process of removing bias from AI, digital art.” (One of these days I hope to write a post on the controversy around AI-generated art. Also, on the topic of bias, vision-language models are biased too.)

Large language models (LLMs) like ChatGPT are racist, sexist, homophobic, and in general packed full of society’s worst biases, because they are trained on biased data. As Brown et al. state, “internet-trained models have internet-scale biases.” In this post, I’ll overview specific examples of bias in LLMs, introduce a few existing techniques for…
