Stanford University has developed a tool capable of identifying text produced by ChatGPT
Will ChatGPT's output soon be detectable in all circumstances? Researchers at Stanford University (California) have developed a model capable of identifying text generated by OpenAI's conversational agent. They shared their first results in a paper published on January 26, 2023.
A 95% reliable model?
As ChatGPT's popularity grows, the education world is taking steps to restrict its use. With just a few sentences of instruction, anyone can ask the AI to generate text on almost any subject: a presentation, answers to a multiple-choice quiz, a dissertation... This is a hard blow for teachers, who have no formal way of identifying such work. The difficulty is real: for the same question, ChatGPT can give a multitude of different answers, which makes it impossible, or nearly so, to tell whether a text was written by a human or by the AI.
However, the Stanford researchers analyzed the chatbot's output and identified features it has in common. To detect it, they rely on a function of the text's (logarithmic) probability, computed over "random perturbations" of that text. Their model, called DetectGPT, would reportedly identify 95% of AI-generated texts, a detection rate much higher than that of most other analysis tools.
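The intuition behind this "random perturbation" test can be sketched in a few lines. A text that sits on a peak of the model's probability landscape tends to lose log-probability when it is slightly rewritten, whereas human text does not do so systematically. The sketch below is a toy illustration resting on loudly stated assumptions: a smoothed unigram model stands in for the language model, random word swaps stand in for the rewrites (DetectGPT itself generates them with a mask-filling model, T5), and the function names are hypothetical. It is not the researchers' code.

```python
import math
import random
import statistics
from collections import Counter


def make_unigram_log_prob(corpus_words):
    """Toy stand-in (assumption) for the language model: a Laplace-smoothed unigram model."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words

    def log_prob(words):
        # Log-probability of a word sequence under the toy model.
        return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)

    return log_prob


def perturb(words, vocab, rate, rng):
    """Toy perturbation (assumption): swap each word for a random vocabulary word
    with probability `rate`. DetectGPT instead rewrites spans with a mask-filling model."""
    return [rng.choice(vocab) if rng.random() < rate else w for w in words]


def detection_score(words, log_prob, vocab, n_perturbations=50, rate=0.15, seed=0):
    """How far the original text's log-probability exceeds the mean of its perturbed
    copies, in units of the perturbed copies' standard deviation."""
    rng = random.Random(seed)
    original = log_prob(words)
    perturbed = [log_prob(perturb(words, vocab, rate, rng)) for _ in range(n_perturbations)]
    spread = statistics.stdev(perturbed) or 1.0  # avoid dividing by zero
    return (original - statistics.mean(perturbed)) / spread


corpus = ("the cat sat on the mat and the dog sat by the door while the "
          "cat watched the dog and the bird sang").split()
vocab = sorted(set(corpus))
log_prob = make_unigram_log_prob(corpus)

# "Machine-like" text: made of words the toy model finds most probable.
machine_like = ("the cat the dog " * 5).split()
# "Human-like" text: made of words the toy model finds least probable.
human_like = ("bird sang while door " * 5).split()

score_machine = detection_score(machine_like, log_prob, vocab)
score_human = detection_score(human_like, log_prob, vocab)
print(f"machine-like: {score_machine:.2f}, human-like: {score_human:.2f}")
```

A clearly positive score means the rewrites almost always made the text less likely, the signature this kind of detector looks for; a score near zero or below suggests no such peak.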
An imperfect imitation of human writing
"In other words, LLMs [large language models, editor's note] that do not perfectly imitate human writing essentially watermark themselves implicitly," the researchers write in their paper. AI-generated text would thus naturally carry a watermark imperceptible to humans, a kind of digital trace. Although DetectGPT's results are very good, the model's design (each text must be scored along with every one of its perturbed versions) consumes a large amount of resources on the system running it, a problem the study's authors are trying to solve.
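The resource problem follows from simple counting. Assuming k perturbed copies are generated for each of n texts, and that each copy must first be produced by the rewriting model and then scored by the language model alongside the original, a rough tally of model calls is:

```latex
\underbrace{n\,k}_{\text{rewrites}} \;+\; \underbrace{n\,(k+1)}_{\text{scoring passes}} \;=\; n\,(2k+1)
```

With illustrative values of k = 100 and n = 1000 texts, that comes to 201,000 calls, against 1,000 for a detector that scores each text only once.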
In the future, the Stanford researchers may try to combine their tool with other detection algorithms to obtain even better results and to keep pace with advances in AI models (GPT-4 in particular). They are also considering whether their research could help detect AI-generated audio, images or video: the properties identified in text-generating models might also be found in these other AIs.
"We hope that the present work will serve as an inspiration for future work aimed at developing efficient and versatile methods to mitigate the potential drawbacks of machine-generated media," the specialists conclude.