OpenAI Has Just Undertaken a Challenging and Ambitious Journey: Understanding AI’s ‘Black Box’

The company aims to shed light on the inner workings of neural networks.
So-called “sparse autoencoders” hold promise in helping us tackle this challenge.

June 7, 2024, 13:07

Updated June 4, 2025, 15:30 ET

Javier Márquez

Writer

Artificial intelligence has proven to be incredibly useful in various applications. It powers driver assistance systems like Tesla’s Autopilot and enables conversational chatbots like ChatGPT. However, despite its widespread use, we still don’t fully understand how AI works. This lack of understanding poses a challenge when it comes to ensuring the safety of the models we use every day.

On Thursday, OpenAI announced new methods for understanding how GPT-4 works. The company led by Sam Altman is using “sparse autoencoders” to identify features and patterns that can help us comprehend the model. So far, they’ve found 16 million features, but this number is expected to grow as they continue their research.

Understanding AI’s “Black Box”

In the field of AI, experts work with well-defined concepts and utilize extensive datasets to train neural networks in large language models (LLM). When these models become too large and complex to run on existing computing infrastructure, they employ techniques like Mixture of Experts (MoE) to divide the model capacity into different specialties.

OpenAI Has Just Undertaken a Challenging and Ambitious Journey: Understanding AI’s ‘Black Box’

The company aims to shed light on the inner workings of neural networks.

So-called “sparse autoencoders” hold promise in helping us tackle this challenge.

Understanding AI’s “Black Box”