Elon Musk Says AI Systems Have Consumed All Human Knowledge. He Has a Plan B: Synthetic Data

The scarcity of data for training AI models marks a historic turning point. Synthetic data offers an alternative but carries risks.

Javier Lacort

Senior Writer

I write long-form content at Xataka about the intersection between technology, business and society. I also host the daily Spanish podcast Loop infinito (Infinite Loop), where we analyze Apple news and put it into perspective.

Elon Musk, owner of X and CEO of xAI, among other companies, says AI systems are nearing the exhaustion of all available online data for training.

His solution involves crossing the Rubicon of model training by using synthetic data, meaning AI models will generate the data they learn from.

Why it matters. The scarcity of training data marks a pivotal moment in the development of AI tools, and it could slow technological progress.

Context. Large language models require vast amounts of data to improve their performance. The depletion of real data, generated by humans through traditional means, is pushing the industry to seek alternatives to enhance products like chatbots and image generators.

  • The idea isn’t new, and other AI projects have already adopted it. Gartner predicted that by 2024, 60% of the data used in AI projects would be synthetically generated. Companies such as Microsoft, OpenAI, Anthropic, and Meta are turning to synthetic data.
  • Palmyra X 004, a model from the startup Writer designed to power existing AI applications, was trained largely on synthetic data at a reported cost of $700,000.
  • By comparison, training a similarly sized OpenAI model costs an estimated $4.6 million.

What’s different about Musk’s proposal? So far, synthetic data has supplemented real data, not replaced it. Musk believes synthetic data will soon become the only viable training source.

Between the lines. Musk isn’t alone in raising concerns. In December, Ilya Sutskever, OpenAI’s former chief scientist, issued a similar warning: “We have reached the peak of data, and there will be no more data in the future.”

  • The issue with synthetic data lies in the risk of creating a closed loop, where biases and limitations become amplified.
  • This could result in model collapse through a gradual loss of creativity and accuracy.
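
To make the closed loop concrete, here is a minimal Python sketch. It does not come from Musk, xAI, or any company named here. It assumes a toy "model" that is nothing more than a bell curve fitted to ten data points, with each generation trained only on samples drawn from the previous generation. The function name simulate_closed_loop and all of its parameters are hypothetical.

```python
import random
import statistics


def simulate_closed_loop(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples and record the spread per generation."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 stands in for "real" human data
    spreads = [std]
    for _ in range(generations):
        # Each generation sees only data produced by the previous model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Training" here is just re-estimating the mean and the spread.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        spreads.append(std)
    return spreads


if __name__ == "__main__":
    spreads = simulate_closed_loop()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: std = {spreads[generation]:.6f}")
```

In this toy setup the spread of the data tends to shrink generation after generation, a rough analogue of the loss of diversity that model collapse describes. Real language models are far more complex, so this is an aid to intuition and not a prediction.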

Despite these risks, the industry continues to embrace synthetic data.
