Gemini Live Lets Your Phone See—And It’s Both Fascinating and Creepy. Here’s My Experience

Gemini Live competes directly with ChatGPT. Its vision mode feels both unsettling and useful.


Ricardo Aguilar

Writer
  • Adapted by:

  • Karen Alfaro


In December 2024, OpenAI surprised the world with a powerful feature: ChatGPT had “eyes” and could interpret the world in real time. The demo was breathtaking—the app could recognize everything it saw through the camera.

In early 2025, Google announced a major new feature for Gemini Live: real-time vision. It competes directly with ChatGPT’s vision capability and is already available on the Google Pixel 9 and Samsung Galaxy S25, as long as you subscribe to the Gemini Advanced plan.

I tested this feature on a Google Pixel 9 Pro. And yes, it’s as impressive as you might expect.

Interface. Activating Gemini Live’s new “view” modes is straightforward. Open the app and tap the Gemini Live icon in the bottom right corner.

Gemini Live

Once Gemini Live opens, you’ll see two new shortcuts: one to share your camera and one to share your screen, whose contents Gemini can read in real time.

Camera mode. When you activate camera mode, Gemini sees everything your camera captures. It’s spectacular how it recognizes almost anything, and how quickly it identifies specific items such as plant species and tech device models, without any labels to go on.

You can ask it anything. It acts as a guide, translator, and tutor. The tutor role stood out to me: it solved equations, aptitude-test puzzles, and all kinds of problems, explaining them step by step.

Screen mode. This mode raises more privacy concerns, but if you’re willing, Gemini can read everything displayed on your screen. You can ask it anything.

I didn’t find it especially useful, since Google Lens already offers quick information if you’re searching for something specific. Still, it’s another sign of the expanding capabilities of Gemini.

Gemini Live test

Don’t trust AI, ever. As always, the best approach is to remain skeptical. I tested how well it could recognize my computer from a wide shot of my desk: it failed to identify the device. When I pointed the camera directly at it and asked whether it was an M1 or M4 Mac mini, it guessed M1, a distinction a person can easily make from the ports and size.

It also misread some numbers when I gave it an aptitude test. Overall, you need to stay actively involved to make sure it gives correct answers.

Don’t trust its questions, either. Gemini Live and ChatGPT’s advanced voice mode share a flaw: they’re too inquisitive. They often end answers with follow-up questions to keep the conversation going, which is especially annoying in view mode.

Getting a straight answer is tough, because responses are frequently padded with unnecessary questions. It’s a minor flaw common to most AI models, but it disrupts the flow of the conversation.

Still, I think Gemini Live’s view mode is great.

Images | Amanz (Unsplash) | Xataka On 

Related | Taking AI Beyond the Screen: With Gemini Robotics, Google Enhances Robots’ Interaction With the Physical World
