Gemini Live Agent Challenge 2026

Your Digital World, Through a New Lens

The Spatial Eye is a multimodal AI companion that sees what you see, hears what you hear, and acts on your digital intent in real time using Gemini 2.5 Live.

Real-time Vision

Multimodal AI

Atomic Design

Unified Intelligence

Experience the future of real-time environmental awareness. The Spatial Eye leverages Gemini's multimodal power to bridge the gap between AI and physical reality.

Real-Time Object Tracking

Pinpoint accuracy in the physical world. The AI continuously analyzes the live video feed to identify, track, and visually highlight objects in your environment.
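
To draw a highlight, a detection has to be mapped onto the video frame. A minimal sketch, assuming the model reports boxes in the `[y_min, x_min, y_max, x_max]` order normalized to 0-1000 that Gemini's object detection output uses; the function name `to_pixel_box` is hypothetical, and the page does not confirm this is the format the app consumes.

```python
def to_pixel_box(box_2d, frame_width, frame_height):
    """Map a [y_min, x_min, y_max, x_max] box (0-1000 scale) to pixel corners.

    Returns (left, top, right, bottom) in frame pixels.
    """
    # Clamp first so a slightly out-of-range value cannot leave the frame.
    y_min, x_min, y_max, x_max = (max(0, min(1000, v)) for v in box_2d)
    left = round(x_min / 1000 * frame_width)
    top = round(y_min / 1000 * frame_height)
    right = round(x_max / 1000 * frame_width)
    bottom = round(y_max / 1000 * frame_height)
    return (left, top, right, bottom)
```

For a 1280x720 frame, `to_pixel_box([250, 100, 750, 900], 1280, 720)` yields the overlay rectangle `(128, 180, 1152, 540)`.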

Multimodal Context Alignment

Going beyond simple detection. The assistant understands the relationship between highlighted objects and your verbal queries, providing grounded, context-aware answers.
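
One simple way to keep answers grounded is to send the current highlights alongside the spoken question. The sketch below is illustrative only: the `Highlight` type, the `build_grounded_prompt` helper, and the prompt wording are all assumptions, not the project's actual alignment logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    label: str   # object name as reported by the detector
    box: tuple   # (left, top, right, bottom) in frame pixels


def build_grounded_prompt(query, highlights):
    """Combine the user's question with the objects currently highlighted."""
    if not highlights:
        return f"No objects are currently highlighted.\nUser question: {query}"
    lines = [f"{i}. {h.label} at {h.box}" for i, h in enumerate(highlights, start=1)]
    return (
        "Currently highlighted objects:\n"
        + "\n".join(lines)
        + f"\nUser question: {query}"
    )
```

Listing the highlights explicitly gives the model a fixed set of referents, so a question like "What is this?" can be resolved against what is actually on screen.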

Challenges & Architecture Decisions

Building a real-time multimodal agent requires solving complex synchronization and state management challenges. Here's how we tackled them.

Powering the Future on Google Cloud

Built for the Gemini Live Agent Challenge, our infrastructure leverages high-performance Google Cloud services to ensure sub-second latency and global scalability.

Gemini 2.5 Live: Low-latency WebSocket interaction
Google Cloud Run: Serverless backend orchestration
Firebase Auth: Secure Google Sign-In
Firebase Hosting: Managed CDN & Static Assets
Terraform: Automated IaC for reliability
Next.js 15: Cutting-edge frontend performance
shadcn/ui: Atomic design components

Client Edge

Next.js 15 Frontend
WebRTC & PCM Audio API

Cloud Relay

Google Cloud Run • FastAPI

State Hook Tools

Gemini Live

Multi-modal Brain API
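
The audio leg of this pipeline carries raw PCM. As a rough sketch of what that involves, the snippet below converts floating-point samples to 16-bit little-endian PCM and wraps them for transport. It assumes the 16 kHz mono input format the Gemini Live API accepts; the helper names and the message shape are hypothetical, and it is written in Python for brevity even though capture happens in the browser client.

```python
import base64
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    # Clip out-of-range samples so they saturate instead of wrapping around.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_audio_chunk(samples, sample_rate=16000):
    """Wrap PCM bytes in a JSON-friendly dict (shape is an assumption)."""
    return {
        "mime_type": f"audio/pcm;rate={sample_rate}",
        "data": base64.b64encode(float_to_pcm16(samples)).decode("ascii"),
    }
```

Base64 keeps the payload safe inside a JSON text frame; a binary WebSocket frame would avoid the size overhead at the cost of a separate framing convention.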

Challenges & Architecture Decisions

Action-Looping & Input Hallucination

Ghost Highlights & Verbalized Coordinates

Context Bleed & Tracking Hallucinations
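
As an illustration of the first challenge, one common way to damp action-looping is to suppress identical tool calls that repeat within a short window. This is a generic sketch, not the project's documented fix: the `ToolCallGuard` class, its 2-second default, and the tool name in the usage example are all assumptions.

```python
import json


class ToolCallGuard:
    """Reject a tool call that repeats the same name and arguments too soon."""

    def __init__(self, cooldown_s=2.0):
        self.cooldown_s = cooldown_s
        self._last_allowed = {}  # (name, canonical args) -> time last allowed

    def allow(self, name, args, now):
        # Canonical JSON so argument order does not change the key.
        key = (name, json.dumps(args, sort_keys=True))
        last = self._last_allowed.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # same call seen inside the cooldown: drop it
        self._last_allowed[key] = now
        return True
```

With `guard = ToolCallGuard(2.0)`, a call such as `guard.allow("highlight", {"label": "mug"}, now=0.0)` passes, the same call at `now=1.0` is dropped, and a call with different arguments is unaffected. Passing the clock value in explicitly keeps the guard easy to test.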