Your Digital World,
Through a New Lens
The Spatial Eye is a multimodal AI companion that sees what you see, hears what you hear, and acts on your digital intent in real time using Gemini 2.5 Live.
Unified Intelligence
Experience the future of real-time environmental awareness. The Spatial Eye leverages Gemini's multimodal power to bridge the gap between AI and physical reality.
Real-Time Object Tracking
Pinpoint accuracy in the physical world. The AI continuously analyzes the live video feed to identify, track, and visually highlight objects in your environment.
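To make the highlighting step concrete, here is a minimal sketch of how a detection could be mapped onto a video frame. It assumes the model is prompted to return boxes in the [ymin, xmin, ymax, xmax] order, normalized to 0-1000, that Gemini's object detection guidance describes. The `Detection` and `PixelRect` types and the `toPixelRect` helper are hypothetical names for illustration, not The Spatial Eye's actual code.

```typescript
// Hypothetical shape for one detection; the field names are assumptions.
interface Detection {
  label: string;
  // [ymin, xmin, ymax, xmax], each normalized to the 0-1000 range.
  box2d: [number, number, number, number];
}

// Rectangle in frame pixels, ready to draw on an overlay canvas.
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a normalized box into pixel coordinates for a frame of the given size.
function toPixelRect(
  det: Detection,
  frameWidth: number,
  frameHeight: number,
): PixelRect {
  // Clamp so a slightly out-of-range value cannot draw outside the frame.
  const clamp = (v: number): number => Math.min(1000, Math.max(0, v));
  const [ymin, xmin, ymax, xmax] = det.box2d.map(clamp);

  // Tolerate swapped corners by taking min/max per axis.
  const left = Math.min(xmin, xmax);
  const right = Math.max(xmin, xmax);
  const top = Math.min(ymin, ymax);
  const bottom = Math.max(ymin, ymax);

  // Multiply before dividing to keep integer inputs exact.
  return {
    x: (left * frameWidth) / 1000,
    y: (top * frameHeight) / 1000,
    width: ((right - left) * frameWidth) / 1000,
    height: ((bottom - top) * frameHeight) / 1000,
  };
}
```

For a 1280x720 frame, a box of [100, 200, 500, 600] maps to x = 256, y = 72, width = 512, height = 288. The resulting rectangle can be passed to a canvas call such as `strokeRect` to draw the highlight.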
Multimodal Context Alignment
Going beyond simple detection. The assistant understands the relationship between highlighted objects and your verbal queries, providing grounded, context-aware answers.
Challenges & Architecture Decisions
Building a real-time multimodal agent requires solving complex synchronization and state management challenges. Here's how we tackled them.
Powering the Future
on Google Cloud
Built for the Gemini Live Agent Challenge, our infrastructure runs on Google Cloud services chosen to keep latency under a second and to scale globally.
- Gemini 2.5 Live: Low-latency WebSocket interaction
- Google Cloud Run: Serverless backend orchestration
- Firebase Auth: Secure Google Sign-In
- Firebase Hosting: Managed CDN & Static Assets
- Terraform: Automated IaC for reliability
- Next.js 15: Cutting-edge frontend performance
- shadcn/ui: Atomic design components
Next.js 15 Frontend
WebRTC & PCM Audio API
Cloud Relay
Google Cloud Run • FastAPI
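As an illustration of the PCM audio step between the frontend and the relay, the sketch below converts browser audio samples (32-bit floats in the range -1 to 1, as the Web Audio API produces) into 16-bit signed PCM, the raw format the Gemini Live API accepts for audio input. The `floatTo16BitPCM` helper is a hypothetical name. Resampling to the expected sample rate and the WebSocket framing are assumed to happen elsewhere and are not shown.

```typescript
// Convert Web Audio float samples (-1..1) to 16-bit signed PCM.
// Hypothetical helper for illustration; not taken from the project's source.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so loud input clips instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves scale differently because
    // the int16 range is asymmetric: -32768..32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Example: silence, full-scale positive, full-scale negative, and clipped input.
const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 2, -2]));
console.log(Array.from(pcm)); // [0, 32767, -32768, 32767, -32768]
```

The `Int16Array` buffer uses the platform's byte order, which is little-endian on virtually all browsers in practice. A stricter implementation could write samples through a `DataView` with an explicit little-endian flag.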