Google's avalanche of new generative AI features concluded with a very special announcement: the unveiling of Project Astra. The project is the company's ambitious initiative to create the virtual assistant of the future, powered by AI and by the vision provided by the cameras of our smartphones and other devices.
Project Astra. With this ambitious project, Google wants to “develop universal AI agents that can be useful in our daily lives.” The company emphasizes that such an assistant has to be able to understand and respond the same way humans do. It must also “remember what it sees and hears to understand the context and act.”
Latency is the challenge. Company executives recognized that they've come a long way in understanding multimodal information, including text, voice, audio, and video data. However, “reducing the response time to something conversation-oriented is a difficult engineering challenge.”
Voice tones. Project Astra is working to provide higher-quality speech synthesis models that give different agents a wide range of intonations. According to Google, agents will understand the context they’re being used in better than ever before, which will allow them to respond quickly.
The Gemini app is on the horizon. All of this learning will eventually be integrated into solutions like Gemini’s mobile app, which will be the equivalent of what OpenAI presented with ChatGPT based on its new GPT-4o model. According to Google, the app will be available before the end of the year.
Tell me, what do you see? In the demo, Google showed a preliminary version of Project Astra that uses a cell phone's camera to recognize objects on its own. In fact, Google asked the AI to identify some rather interesting objects in unique situations. For example, company executives drew an arrow on the screen and then asked the AI model to describe what it was. It's reminiscent of the new "Circle to Search" feature, but applied here to live queries about what the AI model could recognize.
Glasses! The most striking part of the video comes when the person giving the demonstration asks, “Where did I leave my glasses?” and the assistant answers her. When she puts them on, the video reveals they aren’t regular glasses but a model with a camera and Project Astra integration. From there, the user quickly demonstrates how the device, with the aid of this integration, helps her and answers her questions in a remarkably natural way.
OpenAI is ahead, but Google is close behind. Project Astra is a direct competitor to the features OpenAI recently introduced with GPT-4o. The company led by Sam Altman may be slightly ahead because its voice interaction options are already rolling out to some users. However, mass availability of OpenAI's new features will take several weeks... or months. Google is behind, yes, but its alternative looks just as promising and will be an exciting way to spice up this competition. One thing is clear: it's the users who will win in the end.
Image | Google