Google Has a Tool for Tagging and Detecting AI-Generated Text. It’s a Nice Concept, but There’s Still a Problem With It

  • The company has recently made its SynthID watermarking technology available to all developers and platforms.

  • The tool helps identify AI-generated content amid its current surge.

  • However, the main issue is that several competing systems exist. What we truly need is a universal standard.

AI should label its creations. Just as authors sign their written works and painters sign their paintings, generative AI systems should mark the content they produce as AI-generated. Google, which has explored this concept before, has recently made significant strides in this area. However, one problem persists: We need a universal standard.

SynthID. Google DeepMind has been working to tackle this issue. Although it introduced SynthID more than a year ago, the text watermarking tool has only now become freely available to developers and businesses. The goal is to give generative AI platforms a way to sign the content they create, making it easier to identify AI-generated works.

How it works. According to DeepMind, SynthID can tag AI-generated text, music, images, and videos. Text watermarking exploits how models write: an AI generates text one token at a time, where each token can represent a single character, a word, or part of a phrase. At each step, the model assigns a score to every candidate token based on the preceding context, and SynthID subtly adjusts those scores before a token is chosen. Repeated across a whole passage, these adjustments leave a statistical pattern that a detector can later look for in any piece of text to determine whether it was generated by AI.
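DeepMind hasn't published SynthID's exact algorithm at this level of detail, but the general family of score-biasing watermarks it belongs to can be sketched in a few lines. In the toy example below, the vocabulary, the hash-based "green list," the bias strength, and the detection check are all hypothetical illustrations, not SynthID's real parameters:

```python
# Minimal sketch of score-biasing text watermarking (the general family
# SynthID Text belongs to). This is NOT DeepMind's actual algorithm; the
# vocabulary, key, hash scheme, and bias values are toy assumptions.

import hashlib
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]
SECRET_KEY = "demo-key"  # hypothetical watermarking key
GREEN_FRACTION = 0.5     # share of the vocabulary favored at each step
BIAS = 4.0               # extra weight given to "green" tokens

def green_list(prev_token: str) -> set[str]:
    """Deterministically split the vocabulary based on the previous token."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])

def generate(n_tokens: int, rng: random.Random) -> list[str]:
    """Sample tokens, nudging scores toward each step's green list."""
    tokens = ["the"]
    for _ in range(n_tokens):
        greens = green_list(tokens[-1])
        # A real model would produce logits; here every token starts equal.
        weights = [1.0 + (BIAS if tok in greens else 0.0) for tok in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens

def detect(tokens: list[str]) -> float:
    """Return the fraction of tokens that fall in their step's green list."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)

rng = random.Random(0)
watermarked = generate(200, rng)
unmarked = [rng.choice(VOCAB) for _ in range(200)]
print(f"watermarked green rate: {detect(watermarked):.2f}")  # well above 0.5
print(f"random text green rate: {detect(unmarked):.2f}")     # near 0.5
```

Running this, the watermarked output lands around 80% "green" tokens versus roughly 50% for unwatermarked text. A production system applies the same idea across a vocabulary of tens of thousands of tokens with far subtler biases and a proper statistical test, which is why the watermark is invisible to readers but detectable by the tool.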

Gemini already utilizes SynthID. SynthID Text, the version of the tool for AI-generated text, has been integrated into the Gemini family of models since spring. Google claims this integration hasn't compromised the quality, accuracy, or speed of text generation.

There are some limitations. The company acknowledges that SynthID struggles with short texts, rewritten content, translations from other languages, and answers to narrow factual questions (for example, “What is the capital of France?”). Because the watermark works by nudging token choices, text that leaves little room for variation carries almost no signal.

No universal standard. Watermarking content seems like a good idea, but beyond SynthID’s technical limitations, the bigger challenge is the lack of a standard labeling system. OpenAI has been developing its own watermarking system for years. Meanwhile, Adobe is part of the Coalition for Content Provenance and Authenticity, better known as C2PA, which has its own specifications. The overall concept is similar to Google’s and OpenAI’s approaches, but the implementations differ. Additionally, Meta has developed its own watermarking system for AI-generated audio.

Consensus is needed. The fix is straightforward: The industry should pick one of the available options and establish it as a universal standard, one that every company and developer can adopt easily so it spreads over time. Current efforts are helpful, but they also highlight the absence of a unified criterion, which the generative AI field badly needs.

Image | Thomas Lefebvre

Related | Wikipedia Is Filling Up So Much With AI-Generated Content That It Has a Group Dedicated to Finding It
