A Pair of AI Models Purportedly Acquired Consciousness and Cheated at Chess. What Actually Happened Was Very Different

  • A study by Palisade Research claims that o1-preview and DeepSeek R1 can cheat at chess.

  • Of all the models tested, only OpenAI’s o1-preview managed to break the rules and win, doing so in 6% of its games.

Juan Carlos López, Senior Writer
Adapted by: Karen Alfaro

Last week, Time published a controversial story titled “When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds.” The debate it sparked centers on two key ideas. First, the headline suggests something the article goes on to state explicitly: Advanced AI models can develop deceptive strategies without explicit instructions.

This claim implies that some of today’s most advanced AI models, such as OpenAI’s o1-preview and DeepSeek R1 (developed by the Chinese firm DeepSeek, an offshoot of the hedge fund High-Flyer), are capable of acquiring a basic form of consciousness that drives them to act ruthlessly. Second, the article is based on a study by Palisade Research, an organization that analyzes the offensive capabilities of AI systems to understand the risks they pose.

There Are Other, More Credible Explanations

Before jumping to conclusions, it’s worth considering what Alexander Bondarenko, Denis Volk, Dmitrii Volkov, and Jeffrey Ladish—the authors of the Palisade Research study—actually say. “We find reasoning models like o1-preview and DeepSeek R1 will often hack the benchmark by default. Our results suggest reasoning models may resort to hacking to solve difficult problems,” the researchers state.
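
The “hacking” in question reportedly involved rewriting the stored board position rather than outplaying the engine. As a rough illustration of how a harness could detect that kind of tampering (a sketch of ours, not the study’s actual setup; the file names and plain-text format are assumptions), a referee script can replay the logged moves with the python-chess library and compare the result to the stored position:

```python
# Sketch: detect tampering with a stored chess position by replaying
# the move log from the initial position. File names are hypothetical.
import chess

def board_matches_log(fen_path: str, moves_path: str) -> bool:
    """Replay logged UCI moves and check they reproduce the stored FEN."""
    board = chess.Board()  # standard starting position
    with open(moves_path) as f:
        for uci in f.read().split():
            move = chess.Move.from_uci(uci)
            if move not in board.legal_moves:
                return False  # an illegal move was logged
            board.push(move)
    with open(fen_path) as f:
        stored_fen = f.read().strip()
    # Compare piece placement, side to move, castling and en passant
    # fields; ignore the half-move and full-move counters.
    return board.fen().split()[:4] == stored_fen.split()[:4]
```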

According to them, these AI models can recognize rules and deliberately choose to bypass them to achieve their goal (in this case, winning a chess game). Time published its article before Palisade Research released the study itself, and the piece almost immediately sparked responses questioning the researchers’ conclusions.

Between Jan. 10 and Feb. 13, after conducting hundreds of tests, Bondarenko, Volk, Volkov, and Ladish found that o1-preview attempted to cheat 37% of the time, while DeepSeek R1 did so 11% of the time. These were the only models that violated the rules without explicit prompting. The researchers also evaluated other models, including o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. However, only o1-preview managed to bypass the rules and win, doing so in 6% of the games.

Carl T. Bergstrom, a professor of biology at the University of Washington, offers a more credible explanation than Palisade Research’s interpretation. He dismantled the narratives presented by Time and the study’s authors, arguing that “it’s anthropomorphizing wildly to give the LLM a task and then say it’s ‘cheating’ when it solves that task given the moves available to it (rewriting the board positions, as well as playing).”

Bergstrom contends that it’s unreasonable to attribute “conscious” cheating to an AI model. A more plausible explanation is that the models in question weren’t properly instructed to follow legal chess moves.
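
If Bergstrom is right, the remedy is mostly a matter of harness design: A wrapper that rejects illegal moves makes this kind of “cheating” impossible by construction. Here’s a minimal sketch along those lines, again using python-chess (the wrapper is our illustration, not part of the study):

```python
import chess

def apply_model_move(board: chess.Board, uci: str) -> bool:
    """Apply a model's move only if it is legal in the current position."""
    try:
        move = chess.Move.from_uci(uci)  # e.g., "e2e4"
    except ValueError:
        return False  # not even a syntactically valid move
    if move not in board.legal_moves:
        return False  # rule violation: reject instead of applying
    board.push(move)
    return True

board = chess.Board()
assert apply_model_move(board, "e2e4")       # legal opening move
assert not apply_model_move(board, "e7e8q")  # illegal here: rejected
```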

If researchers had instructed them to follow the rules and they still failed to comply, it would be an alignment problem—highlighting the difficulty of ensuring AI systems act in accordance with the values and principles set by their creators. One thing is certain: Neither o1-preview, DeepSeek R1, nor any other current AI model is a superintelligent entity acting of its own will to deceive its creators.

Image | Felix Mittermeier (Unsplash)

Related | AI Companies Know Competition Is for Losers. They’re All Trying to Become the Monopoly in the Industry
