LLMs, like people, lie without thinking
The question isn't whether they get stuff wrong, it's how much they do
One of the most persistent and obvious problems with the current generation of large language models (LLMs) is their tendency to hallucinate: to invent, from whole cloth, answers to questions they've been asked by human interlocutors.
This tendency is the source of one of the most persistent criticisms made by AI skeptics. The skeptics say GPT-4 and its ilk are untrustworthy. They are less like search engines and, to repeat a common attack, more like autocomplete.
And they're right: this is not only incredibly annoying, it also seems genuinely dangerous. Every week I seem to come across some otherwise intelligent and well-informed AI optimist arguing that GPT-4 is ready, today, to revolutionize education. But, as many users have pointed out, LLMs don't always say "I don't know" when they don't know something, and it's not always a matter of getting small facts wrong; sometimes they conjure up totally plausible-sounding narratives, histories, or explanations with no grounding in reality.
I agree this is a problem. But it's not, as many AI skeptics seem to think, an automatic reason to distrust advanced LLMs — at least not more than we mistrust humans.
That's because, as decades of research have shown, people also have a tendency to bullshit in exactly the same way as the GPTs have been doing.
In a well-known 2002 study, 121 subjects were secretly videotaped having conversations with someone they'd just met. More than 60% of people lied at least once during these ten-minute interactions, and on average did so roughly three times. Earlier research found that people reported telling two lies per day, on average. A more recent study of 632 subjects over three months suggested that about two-thirds of "survey-person-days" included at least one lie, with the caveat that the vast majority of lies were told by a small minority of prolific fibbers.
The authors of the 2002 study wrote that "more lies were told when participants had an ingratiation or competence goal for the session compared to no specific goal for the interaction." This pattern bears comparison with the training methodology used by many production LLMs: reinforcement learning from human feedback (RLHF), which primes chatbots to respond based on what's previously been given the nod by human trainers. LLMs, like people, aim to please.
Yet a much more compelling piece of evidence, to me, is the observed tendency of so-called "split-brain patients" to lie without meaning to. In such individuals, the two hemispheres of the brain have been surgically separated, allowing experimenters to, for instance, give an instruction to one hemisphere of the brain without the other having heard it. This results in a situation where the other side of the brain attempts to confect an explanation for what the body it's traveling in has just done.
On another occasion, experimenters commanded the right hemisphere of a split-brain patient to stand. The patient stood and the experimenter asked him why he did so. Again, the speaking left hemisphere created an explanation for his behavior, explaining he was thirsty and wanted to get a drink.
- Marinsek, Gazzaniga and Miller 2016 (or on Sci-Hub)
The concept of the left-brain interpreter developed by Michael Gazzaniga and Joseph LeDoux suggests that part of the role of the brain's left hemisphere is to construct sensible explanations about the world. Note that this task is quite distinct from retrieving and repeating true facts. Explanations can be had without any recourse whatsoever to true knowledge, and the split-brain patients have them.
What the left hemisphere is doing in these settings, in some sense, is bullshitting. It's certainly not lying — there's no malice here — but its job is to create an explanation, and it creates one. This is bullshit.
In his memorable treatise "On Bullshit," the philosopher Harry Frankfurt argued that bullshitters are distinguished not by malice or deception but by disregard for the truth.
Frankfurt says of the bullshitter:
His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.
Bullshitters rise to the occasion when they're expected to know something that they simply don't. Their need to respond crashes through the flimsy guardrails of society's expectation for a truthful response, and, like the manipulated split-brain patients, they provide as plausible-sounding a statement as they can.
This is basically what LLMs are doing when they hallucinate. GPT isn't simply retrieving data from a storehouse of all the world's text and rephrasing it for the user. Rather, its answers draw on a statistical "understanding" of the world that it's gained from having read all that text. When you ask GPT-4 a question, it conjures (to oversimplify) the most likely response it can muster based on that understanding. Crucially, such a "most likely response" exists whether or not there is a fact of the matter. Drawing on it is a sort of gamble on what's most likely to be true. It's precisely this kind of bet that bullshitters are making when they speak whereof they do not know.
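To make that concrete, here's a minimal sketch of next-token prediction using the small, open GPT-2 model via Hugging Face's transformers library (GPT-2 is a stand-in for GPT-4, whose weights aren't public, and the made-up prompt is purely illustrative). The point is that the model always produces a ranked distribution over continuations, whether or not there's any fact behind the question.

```python
# Minimal sketch: a causal language model always yields a probability
# distribution over the next token, even when the prompt has no true answer.
# Assumes: pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# A fictional "fact" -- the model will still rank plausible continuations.
prompt = "The 1897 Treaty of Zenobia was signed primarily to"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)        # convert scores to probabilities

# The "most likely response" exists either way; show the top candidates.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {p.item():.3f}")
```

Nothing in that procedure checks the output against reality; sampling from the distribution produces fluent text whether the treaty exists or not.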
Training an LLM to only say true things, even with humans in the loop, is hard precisely because there is no reservoir of facts that it can draw on. It can only make considered judgments about what is likely to be true. This is what human bullshitters do. But, because it's the only thing that can be done in a world of uncertain and constantly evolving knowledge, it's also what human institutions do — including those that are tasked explicitly with saying true things.
Take the textbooks that AI optimists hope will soon be rendered obsolete. Textbooks have been wrong so often over the years that, when I was growing up, one of the most popular books among educators was "Lies My Teacher Told Me." That book was about errors in history textbooks, but I remember plenty of bullshit in less politically charged domains: my ninth-grade physics teacher repeated the old saw about how glass windows slowly flow over time, because glass is really a liquid (they don't, and it isn't), and the following year my school's beloved history teacher argued that Genghis Khan was probably Jewish (sounds a bit like Cohen, doesn't it?).
It's been nearly twenty years since researchers first claimed that Wikipedia was as accurate as Britannica. Back then, educators wouldn't let students cite Wikipedia as a source, and I doubt they do today, since their concern isn't really about Wikipedia's accuracy as such: the problem, in their eyes, is that they let just anybody edit it.
LLMs bullshit. But so do teachers, textbooks, and regular people. If accuracy is the issue, what's called for is a head-to-head. The real question about deploying LLMs in truth-focused domains like education is this: how do they stack up against the alternative?