Wikipedia Is Filling Up With So Much AI-Generated Content That It Has a Group Dedicated to Finding It

  • The WikiProject AI Cleanup group is a team of volunteers who search Wikipedia for AI-generated content to edit or remove it.

  • It’s not an easy job.

Content generated by AI has reached every corner of the Internet. It’s appeared on Amazon, where books written with ChatGPT are now on sale. It’s also reached media outlets, scientific articles, videos, music, images, and photographs: essentially, everything. Given that landscape, it’s no surprise that generative AI is also present on Wikipedia.

That’s a problem. As such, the platform is addressing it with a group of collaborators dedicated to finding and removing this content. They call themselves the WikiProject AI Cleanup group.

How Wikipedia works. It’s essential to understand that Wikipedia is open for anyone to write and edit articles. This openness has a positive side: if you have information about a topic or are an expert in a field, you can enhance the encyclopedia by adding new information, refining existing content, or correcting errors.

However, the downside is that anyone can also introduce false information. Tools like ChatGPT complicate the problem further.

AI floods everything. According to 404 Media, Ilyas Lebleu, a founder of the WikiProject AI Cleanup initiative, explains that the project began when volunteers noticed “the prevalence of unnatural writing that showed clear signs of being AI-generated.” The team was able to reproduce similar styles using ChatGPT, which confirmed its suspicions and made launching a cleanup effort the obvious next step.

404 Media cites a notable example: the Ottoman fort of Amberlisihar, allegedly built in 1466. Its Wikipedia page, a 2,000-word article, detailed its history, construction, and materials: everything you would expect. The problem is that the fort doesn’t exist. It’s the product of an AI hallucination. The article appeared in January 2023, but editors only discovered the hoax in December of that year.

The same goes for photographs. An article on the Islamic seminary Darul Uloom Deoband included an AI-generated image that at first glance may seem authentic. However, a closer look at the hands and feet reveals its synthetic origin. The WikiProject AI Cleanup team removed the image because it “contributes little to the article, could be mistaken for a contemporary artwork, and is anatomically incorrect.” The team doesn’t remove all AI-generated images, only those deemed inappropriate.

The image’s description read: “An AI-created image of the early days of the Darul Uloom Deoband Islamic seminary. This AI-generated image shows professor Mahmud Deobandi instructing his student Mahmud Hasan Deobandi, the seminary’s first student, who later became known as ‘Shaykh al-Hind’ and played an important role in the Indian independence movement.” The AI-generated nature of the image was evident from details in the hands, the book, and the feet.

Volunteers vs. AI. WikiProject AI Cleanup is “a collaboration to combat the increasing problem of unsourced, poorly written AI-generated content on Wikipedia.” Anyone can join and participate. The goal isn’t to ban or restrict AI, but rather “to verify that its output is acceptable and constructive, and to fix or remove it otherwise.”

This is no easy task. If there’s one thing LLMs are good at, it’s passing off their output as legitimate text. However, they can leave clues. Phrases like “as an AI language model,” vague descriptions like “a town known for its fertile land,” or an overly promotional tone are all signs that AI might be behind the content.
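To make the idea concrete, here’s a minimal Python sketch of the kind of phrase-based screening those clues suggest. The phrase list and function name are hypothetical illustrations, not WikiProject AI Cleanup’s actual tooling; volunteers rely on human judgment, and string matching only catches the most blatant leftovers.

```python
# Hypothetical telltale phrases; real reviewers use judgment, not a fixed list.
TELLTALE_PHRASES = [
    "as an ai language model",         # chatbot boilerplate left in pasted text
    "as of my last knowledge update",  # another common chatbot disclaimer
    "known for its fertile land",      # the kind of generic filler cited above
]

def flag_suspicious_text(text: str) -> list[str]:
    """Return any telltale phrases found in the text."""
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

sample = "The town, known for its fertile land, has a rich history."
print(flag_suspicious_text(sample))  # ['known for its fertile land']
```

A filter like this would flag only text where chatbot boilerplate or stock filler was pasted in verbatim; promotional tone and fabricated facts still require a human reader.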


On the other hand, detecting AI-generated content might seem as simple as checking for references. However, AI is capable of “hallucinating” sources. As the WikiProject AI Cleanup group explains on its Wikipedia page, AI can fabricate sources or cite real ones that are completely off-topic.

One article, on Leninist historiography, was written entirely with AI. It quoted Russian and Hungarian sources that looked legitimate but didn’t exist, so the volunteer team deleted it. In another case, an article on the Estola albosignata beetle cited real French and German sources, but none of them mentioned the beetle. WikiProject AI Cleanup edited that article instead.
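As a rough illustration of that second failure mode, the sketch below checks whether a cited web source actually mentions the article’s subject. This is a deliberately simplified, assumption-laden example (a single keyword match against a fetched page); it isn’t how the volunteers work, and many real citations point to offline books and papers.

```python
import urllib.request

def source_mentions_topic(url: str, topic: str) -> bool:
    """Fetch a cited web source and check whether it mentions the topic.

    A citation that never mentions the article's subject, like the beetle
    sources above, is a red flag, though a missing exact term isn't proof
    of fabrication on its own.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            page = response.read().decode("utf-8", errors="ignore")
    except OSError:
        return False  # unreachable sources also warrant manual review
    return topic.lower() in page.lower()

# Hypothetical usage: the URL is a placeholder, not a real citation.
print(source_mentions_topic("https://example.com/entomology-paper",
                            "Estola albosignata"))
```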

The AI challenge. The use of AI isn’t inherently bad, but it does pose a challenge to credibility. If Wikipedia allowed AI-generated content to flood the platform, it would no longer be trustworthy. AI models tend to “hallucinate,” meaning they can invent information. Even when the text seems plausible and well-written, the data, dates, names, and events may not be accurate.

This isn’t just an issue for Wikipedia. There’s a broader risk that false, inaccurate, or fabricated information could spread across the Internet. Since Wikipedia is one of the key sources for LLMs, inaccurate information on Wikipedia could lead to these models producing more inaccurate results, creating a vicious cycle. That’s why the work of these volunteers is so crucial.

Image | Sanket Mishra

Related | Defining What Is Open Source AI Is Proving to Be a Nightmare, and Purists Are Refusing to Budge
