Gemini Live Agent Challenge 2026

Your Digital World, Through a New Lens

The Spatial Eye is a multimodal AI companion that sees what you see, hears what you hear, and acts on your digital intent in real time using Gemini 2.5 Live.

Real-time Vision
Multimodal AI
Atomic Design

Unified Intelligence

Experience real-time environmental awareness. The Spatial Eye uses Gemini's multimodal capabilities to bridge the gap between AI and physical reality.

Challenges & Architecture Decisions

Building a real-time multimodal agent requires solving complex synchronization and state management challenges. Here's how we tackled them.

Powering the Future on Google Cloud

Built for the Gemini Live Agent Challenge, our infrastructure leverages high-performance Google Cloud services to ensure sub-second latency and global scalability.

  • Gemini 2.5 Live: Low-latency WebSocket interaction
  • Google Cloud Run: Serverless backend orchestration
  • Firebase Auth: Secure Google Sign-In
  • Firebase Hosting: Managed CDN & Static Assets
  • Terraform: Automated IaC for reliability
  • Next.js 15: Cutting-edge frontend performance
  • shadcn/ui: Atomic design components

Client Edge
Next.js 15 Frontend
WebRTC & PCM Audio API

Cloud Relay
Google Cloud Run • FastAPI
State Hook Tools

Gemini Live
Multimodal Brain API
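
As a rough illustration of the PCM audio path between the Client Edge and the Cloud Relay, the sketch below converts float samples to 16-bit little-endian PCM and splits them into fixed-size frames that a relay could forward over a WebSocket. The 16 kHz sample rate, the 20 ms frame duration, and the function names float_to_pcm16, frame_size_bytes, and iter_frames are assumptions made for illustration, not details of The Spatial Eye's implementation. Python is used for brevity; the project's real client is the Next.js frontend.

```python
import struct
from typing import Iterator, Sequence

# Assumed values for illustration; the source does not state them.
SAMPLE_RATE_HZ = 16_000   # mono capture rate
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # audio duration carried by one message


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM.

    Out-of-range samples are clipped rather than allowed to wrap around.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of mono 16-bit PCM."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    one_second_of_silence = float_to_pcm16([0.0] * SAMPLE_RATE_HZ)
    frames = list(iter_frames(one_second_of_silence, frame_size_bytes()))
    print(len(frames), "frames of", frame_size_bytes(), "bytes")
```

Small fixed-size frames keep per-message latency low, since the relay can forward each one as soon as it arrives instead of waiting for a full utterance.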