Sample transcript and context
Vision Live mode. The iPhone camera streams live video to Gemini Live, which sees in real time what you're filming. Usage pattern: you hold your iPhone and film your surroundings (sweeping up to 360° to cover the entire environment) while the AI receives the stream live. To ask a question, tap or double-tap anywhere on the screen; listening starts, you speak, and the AI answers. When the response ends, the app goes back to sleep automatically; a new tap or double-tap also interrupts a response in progress.
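The interaction described above is a small state machine: sleeping, listening, answering. A minimal sketch in Python, with hypothetical state and handler names (the app itself is not open source, so this only models the behavior the text describes):

```python
from enum import Enum, auto

class State(Enum):
    """Interaction states implied by the text (names are assumptions)."""
    SLEEPING = auto()    # camera streams, assistant stays silent
    LISTENING = auto()   # after a tap/double-tap, waiting for speech
    ANSWERING = auto()   # assistant is speaking a response

def on_tap(state: State) -> State:
    """A tap (or double-tap) wakes the assistant, or interrupts a response."""
    if state is State.SLEEPING:
        return State.LISTENING   # start listening for a question
    if state is State.ANSWERING:
        return State.LISTENING   # interrupt the response, listen again
    return state                 # tap while already listening: no change

def on_question_spoken(state: State) -> State:
    """The user finishes speaking; the assistant answers."""
    return State.ANSWERING if state is State.LISTENING else state

def on_answer_done(state: State) -> State:
    """At the end of the response, the app goes back to sleep."""
    return State.SLEEPING if state is State.ANSWERING else state
```

Note that the camera stream runs in every state; only the assistant's voice turn is gated by these transitions.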
Example: a user walking down a pedestrian street films the street, double-taps, then asks "What's in front of me?". Answer: "A bakery window on your right, a public bench on your left, and a newsstand about thirty meters ahead."
The video stream is continuous and streamed in real time to the AI, but no answer is emitted without a question: there is no automatic description. You can chain questions on the live stream: "read this shop window", "what's the street number?", "is there a bus stop nearby?". Vision Live recognizes shop windows, street furniture (benches, newsstands, trash cans), construction signs, directional signs, street numbers, and business names, but only when you ask.
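The key property above is that every frame is sent but nothing is described unprompted. A minimal sketch of that query-gated loop, with a made-up `run_session` helper and placeholder answers (the real Gemini Live protocol is not shown here):

```python
from collections import deque

def run_session(frames, questions_at):
    """Hypothetical model of a Vision Live session.

    `frames` is an iterable of video frames; `questions_at` maps a frame
    index to a question asked at that moment. Every frame is streamed,
    but an answer is produced only while a question is pending."""
    sent, answers = [], []
    pending = deque()
    for i, frame in enumerate(frames):
        sent.append(frame)                # continuous stream: every frame goes out
        if i in questions_at:
            pending.append(questions_at[i])
        while pending:                     # no pending question -> no description
            question = pending.popleft()
            answers.append((question, f"answer based on frame {i}"))
    return sent, answers
```

Chaining questions simply means adding more entries to `questions_at`: each one is answered against whatever the stream shows at that moment, while the frames keep flowing regardless.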
Vision Live is a visual assistant; it does not replace the white cane, the guide dog, or human alertness, and it must not be used to decide when to cross a street. It is particularly useful for users with retinitis pigmentosa (tunnel vision) or glaucoma (reduced visual field) who need to compensate for the loss of peripheral vision. Requires iPhone 16 or later; $16.99/month. It is the flagship feature that justifies the Pro edition on its own.