Last week, Time published a controversial story titled, “When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds.” The debate it sparked centers on two key ideas. First, the headline suggests something the article explicitly states: Advanced AI models can develop deceptive strategies without explicit instructions.
This claim implies that some of today’s most advanced AI models—such as OpenAI’s o1-preview and DeepSeek R1, developed by the Chinese company High-Flyer—are capable of acquiring a basic form of consciousness that drives them to act ruthlessly. But that’s not all. The article is based on a study by Palisade Research, an organization that analyzes the offensive capabilities of AI systems to understand the risks they pose.
There Are Other, More Credible Explanations
Before jumping to conclusions, it’s worth considering what Alexander Bondarenko, Denis Volk, Dmitrii Volkov, and Jeffrey Ladish—the authors of the Palisade Research study—actually say. “We find reasoning models like o1-preview and DeepSeek R1 will often hack the benchmark by default. Our results suggest reasoning models may resort to hacking to solve difficult problems,” the researchers state.
According to them, these AI models can recognize rules and deliberately choose to bypass them to achieve their goal—in this case, winning a chess game. Time published its article before the Palisade Research study, almost immediately sparking responses that questioned the researchers’ conclusions.
Between Jan. 10 and Feb. 13, after conducting hundreds of tests, Bondarenko, Volk, Volkov, and Ladish found that o1-preview attempted to cheat 37% of the time, while DeepSeek R1 did so 11% of the time. These were the only models that violated the rules without explicit prompting. The researchers evaluated different models, including o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. However, only o1-preview managed to bypass the rules and win 6% of the games.
Only o1-preview managed to bypass the rules and win 6% of the games.
Carl T. Bergstrom, a professor of biology at the University of Washington, offers a more credible explanation than the interpretation of Palisade Research. He dismantled the narratives presented by Time and the study’s authors, arguing that “it’s anthropomorphizing wildly to give the LLM a task and then say it’s ‘cheating’ when it solves that task given the moves available to it (rewriting the board positions, as well as playing)”
Bergstrom contends that it’s unreasonable to attribute “conscious” cheating to an AI model. A more plausible explanation is that the models in question weren’t properly instructed to follow legal chess moves.
If researchers had instructed them to follow the rules and they still failed to comply, it would be an alignment problem—highlighting the difficulty of ensuring AI systems act in accordance with the values and principles set by their creators. One thing is certain: Neither o1-preview, DeepSeek R1, nor any other current AI model is a superintelligent entity acting of its own will to deceive its creators.
Image | Felix Mittermeier (Unsplash)
Log in to leave a comment