Since early this year, major tech companies have had a clear goal: encouraging conversations with AI. OpenAI, Microsoft, Google, and Meta have been adding voice features to their assistants, but this seems to be just the start. The industry is moving at a rapid pace, and the way users interact with these tools continues to evolve.
Say hello to voice agents. OpenAI has focused on text-based agents for months, with tools such as Operator and Computer-Using Agents. However, the company is preparing its next big move to stand out in the AI development race: the launch of a powerful new generation of voice agents.
New models are entering the scene. On Thursday, OpenAI announced the release of new audio models for converting speech to text and vice versa. These models aren’t part of ChatGPT but will be available through an API, allowing developers to create speech agents. Notably, these models aim to be much more accurate and offer enhanced customization.
OpenAI’s new models will be built on GPT-4o and GPT-4o-mini. They promise improvements over Whisper and previous text-to-speech tools, which will continue to be available via the API. However, performance isn’t the only focus. These models can also modulate their tone to sound to “talk like a sympathetic customer service agent” and more.

Next destination: call centers. OpenAI has made its intentions clear with the recent release. “For the first time, developers can also instruct the text-to-speech model to speak in a specific way… unlocking a new level of customization for voice agents. This enables a wide range of tailored applications, from more empathetic to dynamic customer service voices to expressive narration for creative storytelling experiences,” the company says.
According to OpenAI, this technology will facilitate the creation of much richer “conversational experiences.” ChatGPT was launched in November 2022 powered by GPT-3.5. This makes it evident that progress has been rapid. More importantly, all signs indicate that these models will eventually be implemented in call centers.
Interactions might be somewhat limited initially, but they’ll certainly surpass current voice systems. These new models will move beyond traditional automated assistants, becoming far more natural in their interactions. Over time, the distinction between conversing with a human and an AI could become nearly indistinguishable.
Images | LumenSoft Technologies | OpenAI
Related | I Tested ChatGPT’s Advanced Voice Mode. It’s the Beginning of Something Huge
Log in to leave a comment