Moshi, a Real-Time AI Assistant, Is Challenging Silicon Valley From Europe

Behind Moshi is Kyutai, a French startup that launched its multimodal AI assistant with an extremely low latency of 200 milliseconds.

Europe still has much to say about AI development, and Moshi is one of its best arguments

Javier Lacort

Senior Writer

I write long-form content at Xataka about the intersection between technology, business and society. I also host the daily Spanish podcast Loop infinito (Infinite Loop), where we analyze Apple news and put it into perspective.

Kyutai, a French startup supported by big names in tech and finance, has launched Moshi, an AI assistant capable of speaking and listening in real time. It resembles what OpenAI promised with GPT-4o but ended up delaying.

Why it matters. Moshi marks a leap in conversational AI, offering capabilities that OpenAI hasn’t shipped yet. Along with Mistral, it signals Europe’s growing role in the AI competition.

Context. Founded in November 2023, Kyutai received more than $300 million in investment to enter the AI race with a clear focus on open source and transparency.

It did so with the backing and leadership of several figures:

  • Xavier Niel: French billionaire and founder of Iliad, a telecommunications company.
  • Rodolphe Saadé: French-Lebanese billionaire and CEO of CMA CGM, a shipping giant.
  • Eric Schmidt: Former CEO of Google and technology investor.
  • Patrick Pérez: CEO of Kyutai, former head of Valeo.ai, the AI research arm of Valeo, a century-old automotive supplier.
  • Hervé Jégou: Kyutai’s chief scientific officer and a former member of Google's DeepMind and Meta.

Moshi’s keys:

  • It can express 70 different emotions and styles.
  • It simultaneously processes and generates audio and text, allowing it to “think while it speaks.”
  • It works in near real time, with a latency of 200 milliseconds.
  • It uses Helium, a 7-billion-parameter language model.
  • It can run on a general-purpose computer.
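
To put the last two points in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights of a 7-billion-parameter model would occupy at different numeric precisions. The parameter count comes from the figures above; the precisions listed are illustrative assumptions, since the article doesn't say which format Moshi uses, and the estimate ignores activations and any audio components.

```python
# Rough estimate of weight memory for a 7-billion-parameter model.
# The parameter count is from the article; the precisions below are
# illustrative assumptions, not formats confirmed by Kyutai.

PARAMS = 7_000_000_000

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def footprint_table(params: int = PARAMS) -> dict:
    """Map each assumed precision to its estimated footprint in GB."""
    return {
        name: weight_memory_gb(params, size)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in footprint_table().items():
        print(f"{name:>8}: {gb:5.1f} GB")
```

Under these assumptions, the weights alone would take roughly 14 GB at 16-bit precision and about 3.5 GB at 4-bit, which suggests why a model of this size is plausible on consumer hardware once it is compressed.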

Behind the technology. According to the company, the developers of Helium and Moshi trained these systems on 100,000 synthetic “speech-style” conversations and used 20 hours of audio from a voice actress named “Alice.” Kyutai has declined to give her full name. There’s probably a reason.

Its creations include a watermark indicating that they’re AI-generated, a way to address ethical and security concerns about synthetic content.

Next steps. Kyutai will have to find the balance between innovation and security, a tension that has made previous projects controversial. Its open-source approach may help it move faster.

  • Kyutai plans to release the model's source code, the 7-billion-parameter model, the audio codec, and the full stack.
  • Future versions (1.1, 1.2, and 2.0) will refine the model based on user feedback.
  • The company aims to make the license as permissive as possible to encourage widespread adoption and innovation.

In perspective. Moshi represents a breakthrough in conversational AI and a possible shift in the balance of power in the tech world. It has the backing of influential figures and a focus on transparency and open source that could redefine the AI landscape.

It also seeks to challenge Silicon Valley’s dominance from across the Atlantic and position Europe as a significant player in the future of AI.

You can try it out online here.
