OpenAI Introduces GPT-4o: A Surprising Free AI Model That Works With Voice, Text, and Image at Once

With its latest product upgrade, the company led by Sam Altman aims to take ChatGPT to the next level.
GPT-4o responds to audio inputs in about 320 milliseconds on average, similar to human response time in conversation.

May 14, 2024, 08:02

Updated May 14, 2024, 08:12 ET

Javier Márquez

Writer

At its highly anticipated live event on Monday, OpenAI unveiled a new artificial intelligence model called GPT-4o. This product is expected to be the smartest and fastest in the company’s history.

GPT-4o is inherently multimodal, which addresses latency issues and allows for “real-time” interaction. The good news is that this promising model will be available to all ChatGPT users, including those using the free version.

OpenAI’s Most Advanced Model

OpenAI’s CTO, Mira Murati, said during the broadcast that the new AI model offers “GPT-4 level intelligence” along with improved text, audio, and vision capabilities. This technological advancement has made it possible to develop a new voice mode.

The original voice mode in ChatGPT had an average response latency of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. The new voice mode powered by GPT-4o, however, cuts the average response time to about 320 milliseconds and offers a much more natural interaction.

When it's available, it will give users an assistant they can converse with in natural language, just as they would with another person. For instance, they’ll be able to interrupt the assistant while it’s responding, a feature that hasn’t been seen in any AI chatbot until now.

During the live demonstration, we saw some truly amazing features. The model not only responded instantly but also provided answers in different tones of voice. It can even laugh, sing, express different moods, and solve mathematical problems.

Another notable feature coming to ChatGPT with the new model is instant translation. OpenAI says that users will be able to tell the chatbot what they want translated, and it’ll start working right away. All of this will happen in natural language, according to the company, without the need for specific commands.

Users will also be able to tell ChatGPT that they're with someone who speaks another language, such as Italian, and ask it to translate the conversation into English in real time. After that, all they have to do is start talking, and the chatbot will translate.

As mentioned before, this new model also has enhanced visual capabilities. Users will be able to give it a photo or a screenshot so it can analyze the image and look for related information. Use cases range from identifying a car’s model to checking code for mistakes.

How to Access OpenAI’s New GPT-4o Model

The Microsoft-backed AI company has started rolling out GPT-4o to paid users of ChatGPT Plus and Team. OpenAI has also started to roll out GPT-4o on the free version of its chatbot, with an “iterative” release that includes only the new text and image features.

Paid ChatGPT users will still enjoy some benefits over free users, including usage limits up to five times higher. Additionally, over the next few weeks, paid users will be able to access the new real-time voice mode, reminiscent of Spike Jonze’s Her. Free users will receive these features later on.

OpenAI has also announced the release of a ChatGPT app for macOS, which lets users pull up the assistant with the keyboard shortcut Option + Space. This app is designed to integrate into the user's desktop, enabling them to ask it to analyze a statistical graph or join a video call. The company said that paid users have early access to the macOS app and can download it now.

Image | OpenAI
