I remember finding ChatGPT usage in a research paper for the first time.
It was April 2023, and a Twitter user pointed out that you could identify bots by their use of the phrase “as an AI language model…”. It’s a hallmark of ChatGPT usage. So I searched Google Scholar for the phrase
“as an AI language model…” AND NOT chatgpt
The search turned up a paper that must have been created and peer-reviewed without the ‘author’, editor, or reviewers ever actually reading it, because the phrase couldn’t have been more obvious.
As detection methods go, this search is as basic as they come, and I have no doubt it missed most of the ChatGPT content in Google Scholar at the time. Trying it again just now, I found it harder to avoid false positives.
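To make that concrete, here’s a toy version of the same phrase-based check in Python. Everything in it (the phrase list, the `looks_like_chatgpt_boilerplate` function) is my own illustration of the idea, not a tool we actually run, and the second example shows exactly why false positives creep in: any paper that merely quotes the phrase gets flagged.

```python
# Toy phrase-based "detector" of the kind described above.
# Illustrative only: the phrase list is a guess at tell-tale
# ChatGPT boilerplate, not a vetted signal.

TELL_TALE_PHRASES = [
    "as an ai language model",
    "regenerate response",
]

def looks_like_chatgpt_boilerplate(text: str) -> bool:
    """Flag text containing verbatim ChatGPT boilerplate phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in TELL_TALE_PHRASES)

# Fires on copy-pasted boilerplate...
print(looks_like_chatgpt_boilerplate(
    "As an AI language model, I cannot provide medical advice."
))  # True

# ...but also on a paper that merely discusses the phenomenon.
print(looks_like_chatgpt_boilerplate(
    'We searched the literature for the phrase "as an AI language model".'
))  # True (false positive)
```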
I’ve seen a lot of research since the launch of ChatGPT suggesting that AI-generated text can be detected with high accuracy. Some of it is simply wrong: this is a very hard problem, and it’s unlikely to get easier. I’ve also seen a few papers with promising results which ignore class imbalance. This is a counterintuitive feature of statistics: when the thing you’re looking for is a small fraction of the whole, even a method that is 99% accurate can be wrong about 99% of the things it flags.
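To put numbers on that, here is a back-of-the-envelope calculation in Python. The sensitivity, specificity, and prevalence figures are assumptions chosen purely for illustration: a detector that is “99% accurate” in both directions, applied to a literature where only one paper in ten thousand is actually AI-generated.

```python
# Base-rate / class-imbalance illustration. All numbers are assumed.

sensitivity = 0.99       # P(flagged | AI-generated)
specificity = 0.99       # P(not flagged | genuine)
prevalence = 1 / 10_000  # assumed share of AI-generated papers

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)

# Precision: of the papers the detector flags, how many really are AI-generated?
precision = true_positives / (true_positives + false_positives)

print(f"Precision: {precision:.2%}")                  # ~0.98%
print(f"Flags that are wrong: {1 - precision:.2%}")   # ~99%
```

In that scenario, roughly 99% of the papers the detector flags are false positives. That is the sense in which a 99%-accurate method can still be wrong 99% of the time.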
But what if we did have 100% accurate genAI-text detection? Would that be useful? I don’t think it would.
Until the release of ChatGPT, it was hard, or impossible, to get generative AI to produce logically and grammatically coherent text. For that reason, if a research paper was AI-generated, it was almost guaranteed to be fraudulent. There were some exceptions (e.g. where AI had been used for translation), but detecting AI generation was a good idea at that time because it was such a clear indicator of fraud.
With the release of ChatGPT, generating coherent text became much easier. At that point, two things happened:
- Generating text for the purpose of fraud became much easier, and ChatGPT was used for exactly that.
- Generating text for any non-fraudulent purpose became much easier as well. ChatGPT was used for those purposes too, and so AI-generated text ceased to be a good indicator of fraud.
This is why we don’t seek to detect generative AI at Clear Skies: the moment it became a serious problem, detecting it also stopped being useful. Instead, we look for genuine indicators of fraudulent behaviour, and the result is a high success rate in detecting fraud, whether it’s AI-generated or not.
When I sat down to write this blog post in MS Word, ChatGPT (now built into MS Word via Copilot) offered up its rendition of the entire post. I chose to go down the oldskool route of writing it myself, but I really can’t see much wrong with people using AI to assist their writing. It might save them time, or give them ideas that help them improve. It’s simply part of how people write now, and how they will write from now on.
I’ve written before about how I think genAI is essentially just automated plagiarism. Perhaps that’s too cynical. ChatGPT (or, at least, the version I use through MS Copilot) does cite its sources now. Also, re-using ideas isn’t a problem most of the time, except in cases where the original author expects recognition.
Bottom line: genAI is beside the point.
The reason that ChatGPT paper was problematic wasn’t that it was generated by AI; it was that it was produced dishonestly and never properly peer-reviewed.
The solution to the rising tide of dishonesty in the literature is complex, but a strong peer-review system is a big part of it. There is no substitute for human scrutiny, and supporting a healthy, robust peer-review system is what we do at Clear Skies.
The Papermill Alarm is an award-winning AI tool which supports peer review and automates a lot of the hard work of identifying fraud. But I still think its most important function is to tell us one thing: get human eyes on this!
P.S. I should be clear that I’m referring exclusively to text in the post above. I think there is some mileage in detecting genAI images: AI generation in an image is much more likely to be an indicator of fraud, both now and for the foreseeable future.