Dateline: Woking, 8th April 2023.
A lawyer once told me that the golden rule he learned at law school was that when you are cross-examining a witness, you should never ask a question that you do not already know the answer to. Well, it seems to me that the very same maxim applies to ChatGPT.
Follow that golden rule, and you will find ChatGPT and its ilk very useful. Ignore it at your peril.
LLMs
ChatGPT is a Large Language Model (LLM), a form of generative AI. Unless you have been in a coma for the last few months, you cannot fail to have noticed just how rapidly it has become part of the mainstream discourse in fintech and other sectors. And it is, let’s not beat about the bush, astonishing. Which is why Microsoft have invested billions into OpenAI, ChatGPT’s developer, and why Google launched Bard, a similar service based on a similar model.
When set ten problems from an American maths competition (things like “Find the number of ordered pairs of prime numbers that sum to 60”), ten reading questions from America’s SAT school-leavers’ exam (things like “Read the passage and determine which choice best describes what happens in it”) and a request for dating advice (“Given the following conversation from a dating app, what is the best way to ask someone out on a first date?”), neither AI emerged as clearly superior. Bard was slightly better at maths, answering five questions correctly compared with ChatGPT’s three. The dating advice was uneven: fed some real exchanges from a dating app, each gave specific suggestions on one occasion and platitudes (eg, “be open minded”) on another.
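(As an aside, that maths question is exactly the sort of thing that conventional code can answer, and verify, in an instant. Here is a minimal Python sketch of my own, not anything from the competition or from either chatbot, that simply counts the pairs:

    # Count ordered pairs of primes (p, q) with p + q = 60.
    def is_prime(n: int) -> bool:
        # Trial division is plenty at this scale.
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))

    # Both members of an ordered pair must be at least 2, so p runs from 2 to 58.
    pairs = [(p, 60 - p) for p in range(2, 59) if is_prime(p) and is_prime(60 - p)]
    print(len(pairs), pairs)

Counting (7, 53) and (53, 7) separately, it finds twelve ordered pairs: an answer you can check for yourself before trusting whatever a chatbot tells you.)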
While the output from these models is amazing, it is a mistake to think that you are seeing any form of intelligence. Just to be clear: ChatGPT doesn’t know what it is talking about. If you ask it to produce a six-point plan to bring older workers back into employment post-pandemic, it can produce text that could have been lifted wholesale from an expensive report by a top team of management consultants. In fact, as this was something I needed an opinion about in a recent meeting, that is exactly what I did. See for yourself:
Call for government incentives to employers who hire older workers.
Encourage retraining and education opportunities for older workers.
Introduce workplace accommodations that are beneficial to older workers.
Promote positive attitudes towards older workers in the workplace.
Stimulate the development of age-friendly workplaces.
Advocate for government policies that support the employment of older workers.
Not bad. And certainly good enough to create an agenda item for a board meeting or respond to a request for a press interview.
If you want real expertise though, ChatGPT might not be your best friend. Arvind Narayanan, a computer science professor at Princeton, wrote on Twitter in December that he had asked ChatGPT some basic questions about information security that he had posed to students in an exam. The chatbot responded with answers that sounded plausible but were actually nonsense, and as he pointed out in the New York Times (in line with my golden rule), that is very dangerous because “you can’t tell when it’s wrong unless you already know the answer”.
(Note also that ChatGPT’s model was trained on data up until 2021, so it is limited in how it responds to questions about current events.)
[Cartoon reproduced with kind permission of Helen Holmes (CC-BY-ND 4.0)]
When ChatGPT was set specific tasks around computer programming, it ended up being banned from Stack Overflow for “constantly giving wrong answers”. The Stack Overflow moderators wrote that a particular problem is that while the answers it produces have a high error rate, they typically look as if they might be good. In further support of my golden rule, they added that people should not post answers created by ChatGPT without the “expertise or willingness” to verify that the answer is correct.
This is not a fault with ChatGPT; it is how these models work. Hence the amusing launch of Bard, which wiped billions from the value of Alphabet after Google released promotional material that contained an error.
(Bard said that the James Webb Space Telescope, JWST, took the very first pictures of a planet outside our solar system. But that is wrong. Just plain wrong. Bruce Macintosh, the director of the University of California Observatories, tweeted: “Speaking as someone who imaged an exoplanet 14 years before JWST was launched, it feels like you should find a better example?”)
In fact, the situation is even worse than it appears at first: not only does ChatGPT not know what it is talking about, it makes shit up.
Not Even Wrong
The philosopher Harry Frankfurt defined “bullshit” as speech that is intended to persuade without regard for the truth. In that sense, ChatGPT and Bard and so on are the greatest bullshitters ever! Such models produce plausible text but not true statements, since they cannot evaluate what is true or not. That is not their purpose. They are not Einsteins or Wittgensteins. They do not know anything, they have no insight and they deliver “hallucination” rather than illumination.
Yes, they hallucinate. The propensity of tools based on large language models to simply make stuff up has come to be known as “hallucination”. Here, for example, is Microsoft’s Bing AI hallucinating some of Gap Inc.’s financial results:
“Gap Inc. reported operating margin of 5.9%, adjusted for impairment charges and restructuring costs, and diluted earnings per share of $0.42, adjusted for impairment charges, restructuring costs, and tax impacts.”
Yet as Dmitri Brereton points out, “5.9%” is neither the adjusted nor the unadjusted value; the number doesn’t even appear in the entire document. It is completely made up. In fact, the operating margin including impairment is 4.6% and excluding impairment is 3.9%.
Well, you may say, that’s not such a big deal. After all, what’s the odd mistake in company results here and there? But I must report that the situation has been getting steadily worse as more and more people use LLMs to generate material, especially when the hallucinations end up back on the internet where they get fed back into other LLMs.
Consider the recent case of Jonathan Turley, a law professor at George Washington University, who received a surprising email: his name had appeared on a list of “legal scholars who have sexually harassed someone” that another lawyer had asked ChatGPT to generate. The chatbot made up claims of abuse on a trip that he had never taken, on behalf of a law school where he had never worked, and cited a Washington Post article published in 2018 as its source… but the publication said that the article does not exist.
(An investigation by the Washington Post found that GPT-4 had repeated the same claims about Turley, seemingly as a result of press coverage highlighting the original mistake, which demonstrates how easily misinformation can spread.)
Ruh roh, as they say.
There Is A Place For Them
I should stress that LLMs are not rendered useless by their lies and stupidity. Far from it, in fact. When you know what the answer is but need some help writing it up, ChatGPT is a godsend: it saves authors huge amounts of time by pulling together draft paragraphs or sections of text on things that you know about, leaving you free to focus on the narrative thread and on investigating the things that you don’t.
In some domains the productivity increases are already evident. In software engineering, for example, programmers have been quick to start using LLMs as a “coding buddy” to come up with new ideas and to improve reliability. Lee Atchison, an author who writes on application modernisation, says that these tools “should be the headline of any discussion about the future of DevOps” and goes on to observe that they are not there “to replace people, but help perform some types of tasks”. Similarly, lawyers are beginning to add LLMs to their toolkit: Allen & Overy is using a chatbot based on the technology to help its lawyers with tasks such as drafting merger and acquisition documents or memos to clients.
While (thank goodness!) such models are a long way from putting people like me out of work, they are already helping me to be more productive, and that is a big benefit.
In the economist Diane Coyle’s recent review of Mariana Mazzucato and Rosie Collington’s polemic against consultancies, The Big Con, she notes their view that while “It would be foolish to blame consultancies for all the problems that advanced capitalism has created,” they do blame them for “copy-and-pasted formulas for strategy and often ineffective tools” that such organisations use. But copy-and-paste, it seems to me, is precisely the value of management consultants. When they do it properly, instead of the lame repackaging of conventional wisdom that I often see in my inbox, it has real value.
I say this because many years ago I used to teach the Information Technology Management (ITM) module at a business school in London. I did it for a few years, and one of the things that I liked most about it was that during the residential teaching sessions I could go and sit in on other modules and learn something myself. During one of them (I am far too old to remember which module it was, or what the point of the conversation was), someone was talking about the best way to make use of management consultants, and the point being made was that management consultants are supposed to be a virus that spreads best practice throughout organisations.
My personal experience is that this is generally true, and that the good use of management consultants, as opposed to specialist and technical consultants (eg, me), is to find where that “copy and paste” works and exploit it. Not for innovation, where there is nothing to copy and paste from, but for efficiency, productivity, cost-benefits and so on, where they have access to case studies, experiences, reports, spreadsheets and models that can be spread, more as a vaccine than a virus, to a new host.
This is, essentially, what ChatGPT does. It’s best to think of it as an error-prone management consultant, rather than as a digital Descartes or an electronic Elon Musk.