OpenAI Has Developed an AI Text Detector That Works Almost Flawlessly. It Doesn't Want to Bring It to Market

  • Recognizing whether a text has been generated by an AI model can be crucial in fields such as academia and literature.

  • So far, no tool has proven to be effective in this area, but OpenAI’s watermark-based detector may change that.

How can we detect whether a text was generated by an artificial intelligence model? We can’t, at least not with any certainty. Previous attempts to detect such texts have been unsuccessful. Well, at least until now.

An infallible detector. According to sources close to OpenAI quoted in a Wall Street Journal article, the company has developed a system for creating watermarks for ChatGPT-generated text, along with a tool for identifying those watermarks. An OpenAI spokesperson confirmed to TechCrunch that the company is indeed working on the watermarking method described in the WSJ. However, they mentioned that the system has “complexities” and could likely have an “impact on the broader ecosystem beyond OpenAI.”

Hesitant to release the new tool. There’s internal division among OpenAI managers, who are debating whether to bring the tool to market. On one hand, they believe it’s their responsibility to do so. On the other, launching the tool could damage their business.

Risks. According to the TechCrunch report, OpenAI’s managers believe that while the system is “technically promising,” it also presents significant issues. “[It] has important risks we’re weighing while we research alternatives, including susceptibility to circumvention by bad actors and the potential to disproportionately impact groups like non-English speakers,” the company’s spokesperson said.

A task that's so far been impossible. In the past, several companies have launched AI-generated text detection tools, but none of them have worked well. OpenAI also created and launched its own tool in early 2023, but eventually admitted that its accuracy was low and ultimately abandoned it.

How the new detector works. The mechanism is relatively simple, although it would only work with ChatGPT. The company would make small changes to the way ChatGPT selects the words it generates, embedding an invisible signature in the text that another tool could then detect.
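OpenAI hasn’t published the details of its scheme, but the description matches “green list” watermarking techniques from the research literature, in which the model’s word choices are statistically biased in a way that only a key holder can later test for. The Python sketch below is purely illustrative: the toy vocabulary, the GREEN_FRACTION bias, and the SECRET_KEY are assumptions made for the demo, not OpenAI’s actual method.

```python
import hashlib
import math
import random

VOCAB = ["the", "a", "model", "text", "word", "choice", "style", "mark"]
GREEN_FRACTION = 0.5     # hypothetical share of the vocabulary favored at each step
SECRET_KEY = "demo-key"  # hypothetical secret shared by generator and detector


def green_list(prev_token):
    """Deterministically split the vocabulary into a favored subset, keyed on the previous token."""
    greens = set()
    for word in VOCAB:
        digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}:{word}".encode()).digest()
        if digest[0] < 256 * GREEN_FRACTION:
            greens.add(word)
    return greens


def generate(n_tokens, seed=0):
    """Toy generator: at each step, prefer words from the current green list."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(n_tokens - 1):
        greens = green_list(tokens[-1])
        # Mimic a logit boost: pick a green word most of the time.
        pool = sorted(greens) if greens and rng.random() < 0.9 else VOCAB
        tokens.append(rng.choice(pool))
    return tokens


def detect(tokens):
    """z-score: how far the green-word rate exceeds what chance would predict."""
    n = len(tokens) - 1
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    expected = n * GREEN_FRACTION
    variance = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - expected) / math.sqrt(variance)


print(detect(generate(300)))                                # large positive: watermarked
print(detect([random.choice(VOCAB) for _ in range(300)]))   # near zero: unmarked
```

The score grows with text length, which is why short snippets are hard to attribute, and any rewrite that replaces the word choices wholesale, such as running the text through a translation system, pushes the score back toward chance.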

ChatGPT's distinct style gives it away. In other words: OpenAI would force ChatGPT to have a “robotic style” of writing that it could then identify. The texts would still be of quality, but the choice of words would give it away. Experts have already noticed this style in academic and financial texts generated by AI models. They use peculiar language, which gives away that they’ve been generated by AI.

Useful in some cases, not so much in others. OpenAI has updated a blog post from May where it discussed the effectiveness of watermarking. The post now says that watermarking “has been highly accurate and even effective against localized tampering, such as paraphrasing.” However, it struggles with “translation systems” and with “asking the model to insert a special character in between every word and then deleting that character.”

Easy to circumvent. OpenAI acknowledges that its tool and system have a significant problem: It’s “trivial” for “bad actors,” such as cybercriminals, to get around them. The company hasn’t clarified whether it’ll launch the tool, but there seems to be potential to achieve something that until now looked impossible, or at least to mitigate misuse such as academic dishonesty. Maybe students won’t be able to use this chatbot for homework, after all.

This article was written by Javier Pastor and originally published in Spanish on Xataka.

Related | Microsoft and OpenAI Have Been Great Allies, Until Now
