OpenAI’s most recent product is quite impressive. GPT-4o, the company's newest AI model, can carry a fluid conversation, understand what it sees through the camera and what users say, and provide live voice responses in practically real time. It’s the closest we’ve ever come to talking to machines. The question is: Where is the voice coming from? Is it synthetic, or is it a person? And why does the voice in the demo sound so much like Scarlett Johansson in Her?
Five voices. ChatGPT has had a voice since September 2023. Well, not one voice, but several. Five, to be exact: Breeze, Cove, Ember, Juniper, and Sky. They’re synthetic, because it isn’t possible to record every single word and phrase in each of the 37 languages ChatGPT currently supports, but behind these five voices are real people. And OpenAI has explained where they come from.
Scarlett Johansson, is that you? When OpenAI first presented GPT-4o, more than a few users thought the voice sounded like Samantha, the AI system in the movie Her. Although ChatGPT's voice does sound like Samantha, the reality is that this voice has been part of the chatbot for a long time. Its “name” is Sky. According to OpenAI: “It is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice.”
However, given the confusion, the company has paused Sky’s voice. To protect the privacy of the professionals it worked with, OpenAI hasn’t shared the names of its voice actors.
The requirements. OpenAI was thoughtful in the selection of the five voices for ChatGPT. In early 2023, the company worked with “well-known, award-winning independent casting directors and producers” (whose names it also didn’t disclose) to create a set of criteria the voices had to meet. After all, many, many people would hear them. These were the requirements:
- Actors from diverse backgrounds or who can speak multiple languages.
- A timeless voice.
- An approachable voice that inspires trust.
- A warm, engaging, trustworthy, charismatic voice with a rich tone.
- Natural and easy to listen to.
400 actors, five selected. In May 2023, OpenAI (through an agency) hosted an open call for participants and received about 400 applications. It provided the actors with a script of ChatGPT-type responses, such as answering mindfulness questions, brainstorming to plan a trip, or having a mundane conversation. The agency selected 14 voices and then reduced them to five. Recording sessions took place between June and July. As OpenAI explains:
“All of the actors are paid above market rates and will continue to be as long as their voices are used in our products.”
Other famous voices and their names. At the moment, we don’t know who has voiced ChatGPT, but we do know the names of other actors. For example, Kat Callaghan is behind TikTok's text-to-speech voice. And in the case of Siri, Susan Bennett was behind the original English voice of Apple's assistant.
Image | Unsplash (Solen Feyissa)