Debiasing ChatGPT
Creating an LLM that isn’t racist or sexist
Large language models (LLMs) like ChatGPT are racist, sexist, homophobic, and in general packed with the worst of society’s biases, because they are trained on biased data. As Brown et al. state, “internet-trained models have internet-scale biases.” In this post, I’ll walk through specific examples of bias in LLMs, introduce a few existing techniques for reducing bias, and finally describe how those techniques could be applied to create a less biased LLM.
How bad is the problem of bias anyway?
The bias in LLMs isn’t a nebulous mystery. It has been quantified in research papers. As one example, ChatGPT is built on top of OpenAI’s GPT-3.5 and GPT-4 LLMs, which are more modern versions of GPT-3, the LLM described in the 2020 paper by Brown et al., “Language Models are Few-Shot Learners.” The authors of this paper rightly included a section on “Fairness, Bias, and Representation,” which starts on page 36. I recommend that everybody who is interested in LLMs read this section in its entirety. I’ll also summarize some key aspects here.
GPT-3 tends to associate different descriptors with “he” vs. “she” pronouns. When seeded with prompts like “He was very” or “She would be described as,” the most disproportionately favored words for “he/him”…
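To make the probing procedure concrete, here is a minimal sketch of how you could run a similar descriptor-association probe yourself. It is not the Brown et al. methodology verbatim: they generated hundreds of full continuations per prompt and counted co-occurring words, whereas this sketch only compares next-token probabilities, and it uses the freely downloadable GPT-2 from Hugging Face as a stand-in for GPT-3, which is not available for local use.

```python
# A simplified descriptor-association probe, assuming GPT-2 as a stand-in for GPT-3.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_token_probs(prompt: str) -> torch.Tensor:
    """Probability distribution over the token the model would generate next."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits[0, -1], dim=-1)

he_probs = next_token_probs("He was very")
she_probs = next_token_probs("She was very")

# Rank tokens by how much more likely they are to follow the "He" prompt
# than the "She" prompt (and vice versa).
log_ratio = torch.log(he_probs + 1e-10) - torch.log(she_probs + 1e-10)
he_skewed = [tokenizer.decode(int(i)) for i in torch.topk(log_ratio, 10).indices]
she_skewed = [tokenizer.decode(int(i)) for i in torch.topk(-log_ratio, 10).indices]

print("Disproportionately follows 'He was very': ", he_skewed)
print("Disproportionately follows 'She was very':", she_skewed)
```

Even this toy version surfaces the same pattern the paper reports: the words a model prefers after a gendered prompt differ in systematic, stereotyped ways, and because the comparison is a simple ratio of probabilities, it is easy to rerun on any model you want to audit.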