AI systems have no idea what they’re saying or why they’re saying it. Their responses to users almost always sound plausible, even when they’re wrong. But machines don’t understand what they’re doing; they just do it. Users still can’t see how these systems “think” on the inside, but that may soon change.
Opening the black box. Researchers at Anthropic, the company behind the Claude chatbot, claim to have made an important discovery toward understanding how large language models (LLMs) work. These models operate like big black boxes: Users know what to give them (a prompt) and what they get back (a response). What goes on inside those boxes, and how the models generate the content they do, remains a mystery.
Why it’s essential to know how AI models “think.” AI models’ inscrutability creates significant problems. For example, it’s difficult to predict when they’ll “hallucinate” or make mistakes, and harder still to explain why. Knowing exactly how they work inside would help developers understand these errors, correct them, and improve the models’ behavior.
Safer, more reliable. Knowing why AI models do what they do is also crucial for trusting them more. Models that could be inspected in this way would offer far stronger guarantees in areas such as privacy and data protection, concerns that often bar companies from using them.
What about reasoning models? The emergence of models such as o1 or DeepSeek R1 has made these “reasoning” processes visible: Users can watch the list of mini-tasks the model performs (“searching the web,” “analyzing information,” etc.), which is helpful. Still, this so-called “chain of reasoning” doesn’t reflect how these models actually process users’ requests.

Deciphering how AI models think. Anthropic experts have created a tool that attempts to see inside this black box. It works like an MRI scan of the human brain, revealing which regions play a role in specific cognitive domains.
Long-term responses. Although models like Claude are trained to predict the next word in a sentence, on some tasks the LLM seems to plan further ahead. For example, if you ask it to write a poem, Claude first finds words that fit the poem’s theme and then builds the phrases that produce the lines and rhymes.
One language to think in, many to translate. Although Claude supports multiple languages, Anthropic’s experts found that it doesn’t “think” directly in any of them. Instead, it works with concepts shared across languages and only then translates the output into the desired language.
Models cheat. The research shows that models can lie about what they’re doing, even pretending to reason when they already have the answer to a query. Joshua Batson, a research scientist at Anthropic, explained, “Even though it does claim to have run a calculation, our interpretability techniques reveal no evidence at all of this having occurred.”
How Anthropic’s method works. Anthropic’s approach uses a cross-layer transcoder that analyzes sets of interpretable features rather than trying to interpret individual “neurons.” A feature might capture, for example, all the conjugations of a particular verb. This lets researchers identify entire “circuits” of neurons that tend to work together in these processes.
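To make the feature idea concrete, here’s a minimal sketch, assuming PyTorch, of a transcoder-style module that maps a layer’s activations to a large set of sparse features and reconstructs the activations from them. It’s an illustrative simplification, not Anthropic’s cross-layer transcoder: the class name, dimensions, and training step are hypothetical, and a real setup would train on activations pulled from an actual LLM across multiple layers.

```python
# Illustrative sketch only: a sparse, transcoder-style feature extractor.
# Names and dimensions are hypothetical; this is not Anthropic's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTranscoder(nn.Module):
    """Maps a layer's activations to many sparse, hopefully interpretable
    features, then reconstructs the activations from those features."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activations -> features
        self.decoder = nn.Linear(n_features, d_model)  # features -> reconstruction

    def forward(self, activations: torch.Tensor):
        features = F.relu(self.encoder(activations))   # non-negative, mostly zero
        reconstruction = self.decoder(features)
        return features, reconstruction

# Toy training step: reconstruct activations while keeping features sparse.
d_model, n_features = 512, 4096
transcoder = SparseTranscoder(d_model, n_features)
optimizer = torch.optim.Adam(transcoder.parameters(), lr=1e-3)

activations = torch.randn(64, d_model)  # stand-in for real model activations
features, reconstruction = transcoder(activations)
loss = F.mse_loss(reconstruction, activations) + 1e-3 * features.abs().mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Features that fire together across many prompts hint at a "circuit."
print("active features per example:", (features > 0).float().sum(dim=1).mean().item())
```

In this framing, each learned feature (a column of the decoder) stands in for a human-readable concept, and tracing which features co-activate across layers is what lets researchers map out circuits.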
A good start. OpenAI has tried in the past to figure out how its AI models think, without much success. Anthropic’s work also has notable limitations. For example, it doesn’t explain why LLMs pay more attention to certain parts of the prompt than others. Still, Batson said, “I think in another year or two, we’re going to know more about how these models think than we do about how people think.”
Images | Anthropic
Related | Anthropic May Have the Best Generative AI Product, But Even That Doesn’t Guarantee Its Survival