
The Ultimate AI Stack: Gemini 3.1 + Claude 4.6
GitHub repo for World Monitor
Helpful Resources
Credit for the geospatial video
Author: bilawalsidhu (ex-Google PM) on X
Why the Gemini 3.1 + Claude 4.6 Combo Is Overpowered
Right now, we are in a unique era of AI pricing and performance. You no longer need the most expensive flagship models to get top-tier results. Instead, developers are using Hybrid AI Workflows, routing specific tasks to the models best suited for them:
- Claude 4.6: Undisputed champion at software engineering, logic, backend architecture, refactoring, and deep debugging.
- Gemini 3.1: State-of-the-art multimodal capabilities, massive 1M+ token context windows, visual web scraping, and real-time data analysis (perfect for analyzing live satellite traffic cams and panoptic feeds).
When you combine them, Claude writes the system architecture, and Gemini acts as the "eyes and ears" to process massive amounts of live geospatial data.
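The routing idea can be made concrete with a small dispatch table. This is a minimal sketch, assuming hypothetical task labels; the model names are placeholders borrowed from this article's naming, not verified API model IDs.

```python
# Hypothetical task-to-model routing table. Task labels and model names are
# illustrative placeholders, not real API identifiers.
ROUTES = {
    "refactor": "claude-4.6",
    "debug": "claude-4.6",
    "architecture": "claude-4.6",
    "vision": "gemini-3.1",
    "long_context": "gemini-3.1",
    "live_data": "gemini-3.1",
}

DEFAULT_MODEL = "claude-4.6"


def route_task(task_type: str) -> str:
    """Return the model name that should handle a given task type."""
    return ROUTES.get(task_type.lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("debug", "vision", "summarize"):
        print(f"{task} -> {route_task(task)}")
```

Unknown task types fall back to the default model, so the table only needs entries for work you want to send elsewhere.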
How to Connect Them
You can't just open two browser tabs and expect them to communicate. To build real systems, you need to use MCP (Model Context Protocol) or CLI bridging tools. Here are the three best ways to make Claude and Gemini work as one hive-mind.
Method 1: The clink CLI Bridge (Easiest for Devs)
The open-source community recently released the PAL MCP Server (Provider Abstraction Layer), which includes a tool called clink (CLI + Link). This allows you to spawn Gemini subagents directly from inside your Claude coding session (github.com).
With clink, Claude Code can spawn an isolated Gemini CLI instance to offload heavy tasks (like analyzing a map screenshot) without polluting Claude's context window.
# Example prompt inside your Claude Code session
clink with gemini panoptic_analyzer to audit live_traffic_feed.jpg for vehicle coordinates
The Gemini subagent runs the visual analysis in isolation and returns only the final structured JSON data to Claude, which then writes the Python script to plot it on a map.
Method 2: Custom Bash Wrapper Script (No Extra Dependencies)
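If you prefer a lightweight approach, you can create a simple wrapper script that allows Claude Code to trigger the Gemini CLI via a /gemini slash command, a method popularized by AI developers working on hybrid workflows (paddo.dev).
1. Install Gemini CLI:
npm install -g @google/gemini-cli
export GEMINI_API_KEY=your_key_here
2. Create the wrapper script (~/.claude/bin/gemini-clean):
#!/bin/bash
# Capture everything the Gemini CLI prints
output=$(gemini "$@" 2>&1)
# If the output is JSON with a .response field, print just that field;
# otherwise (plain text, or a null field) fall back to the raw output
echo "$output" | jq -er '.response' 2>/dev/null || echo "$output"
3. Make it executable:
chmod +x ~/.claude/bin/gemini-clean
Once a /gemini slash command is wired to this script (in Claude Code, custom slash commands are Markdown files in a commands directory such as ~/.claude/commands/), you can type /gemini analyze this architecture while coding with Claude to pass the context to Gemini 3.1.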
Method 3: Enterprise Integration via Composio MCP
If you are building an automated agentic loop (like a bot that runs 24/7 scanning plane coordinates), you'll want to use Anthropic's Claude Agent SDK connected to the Gemini MCP Server via a tool router like Composio (composio.dev).
By integrating Claude with the Gemini MCP, Claude gains live control over Gemini's multimodal and embedding tools.
- Claude dictates the plan.
- Claude calls a tool:
call_gemini_vision(image_url="http://live-traffic-cam...")
- Gemini processes the request using its massive context and returns the data.
- Claude updates your database.
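The four steps above amount to a plan, a named tool call, a parsed result, and a write. This is a minimal sketch with everything stubbed: `call_gemini_vision` here returns canned data instead of reaching Gemini through an MCP server, and `run_step`, the registry, and the in-memory "database" are hypothetical names used only for illustration.

```python
import json
from typing import Callable


def call_gemini_vision(image_url: str) -> str:
    """Stand-in for the Gemini vision tool; returns canned detections as JSON."""
    return json.dumps([{"category": "vehicles", "confidence": 0.9}])


# Tool registry: the orchestrating model selects a tool by name.
TOOLS: dict[str, Callable[..., str]] = {"call_gemini_vision": call_gemini_vision}


def run_step(tool_call: dict, database: list) -> int:
    """Execute one tool call, store the parsed rows, and return how many were added."""
    tool = TOOLS[tool_call["name"]]
    raw = tool(**tool_call["arguments"])
    rows = json.loads(raw)
    database.extend(rows)
    return len(rows)


if __name__ == "__main__":
    db: list = []
    # In a real loop this dict would come from the planning model's tool-use output.
    planned_call = {
        "name": "call_gemini_vision",
        "arguments": {"image_url": "http://example.invalid/cam.jpg"},
    }
    added = run_step(planned_call, db)
    print(f"stored {added} detection(s)")
```

Keeping the tool behind a name-to-function registry means the planner never needs to know how the vision model is reached, only what the tool is called and what arguments it takes.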
Blueprint: Recreating the Geospatial Tracker (Step-by-Step)
This is the section most of you asked for. Below is the full architecture breakdown: not pseudocode, but the actual project structure and logic you'd hand to Claude + Gemini to build.
Project Structure
geospatial-tracker/
├── backend/
│   ├── main.py                  # FastAPI app + WebSocket hub
│   ├── ingestion/
│   │   ├── opensky.py           # Live aircraft positions
│   │   ├── traffic_cams.py      # Public DOT camera feeds
│   │   └── satellite.py         # Sentinel/Planet tile fetcher
│   ├── analysis/
│   │   ├── gemini_client.py     # Gemini vision API wrapper
│   │   └── panoptic.py          # Detection orchestrator
│   ├── models/
│   │   └── schemas.py           # Pydantic models for all data
│   └── config.py                # API keys, polling intervals
├── frontend/
│   ├── src/
│   │   ├── App.tsx
│   │   ├── components/
│   │   │   ├── LiveMap.tsx      # Mapbox GL JS map layer
│   │   │   ├── PlaneLayer.tsx
│   │   │   ├── VehicleLayer.tsx
│   │   │   └── CameraPanel.tsx
│   │   └── hooks/
│   │       └── useWebSocket.ts
│   └── package.json
├── docker-compose.yml
└── .env
Step 1: Real-Time Data Ingestion Pipeline (Claude 4.6)
This is where Claude shines. Ask it to generate the entire backend. Here's exactly what each data source looks like:
Aircraft Tracking: OpenSky Network API (Free, No Key Required)
# backend/ingestion/opensky.py
from typing import Optional

import httpx

from models.schemas import AircraftPosition

OPENSKY_URL = "https://opensky-network.org/api/states/all"


async def fetch_aircraft(bbox: Optional[dict] = None) -> list[AircraftPosition]:
    """
    Fetches live aircraft positions from OpenSky.
    bbox: {"lamin": 45.0, "lomin": -125.0, "lamax": 50.0, "lomax": -115.0}
    Without a bbox, the API returns every aircraft it is tracking worldwide.
    Anonymous requests get 10 s time resolution and a daily credit quota;
    authenticated users get 5 s resolution (check the OpenSky docs for current limits).
    """
    params = bbox or {}
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.get(OPENSKY_URL, params=params)
        resp.raise_for_status()  # surface rate-limit (429) and server errors early
        data = resp.json()

    aircraft = []
    # "states" is null when no aircraft match, so fall back to an empty list
    for state in data.get("states") or []:
        aircraft.append(AircraftPosition(
            icao24=state[0],
            callsign=(state[1] or "").strip(),
            origin_country=state[2],
            longitude=state[5],
            latitude=state[6],
            altitude=state[7],       # meters (barometric)
            velocity=state[9],       # m/s ground speed
            heading=state[10],       # degrees from north
            vertical_rate=state[11],
            on_ground=state[8],
            last_contact=state[4],
        ))
    return aircraft
Traffic Camera Feeds: Public DOT Streams
Most U.S. state Departments of Transportation publish JPEG snapshot URLs or MJPEG streams. For example:
# backend/ingestion/traffic_cams.py
import httpx
from datetime import datetime, timezone

# Example: Caltrans public traffic camera feeds
CAMERA_FEEDS = {
    "I-405_LAX": {
        "url": "https://cwwp2.dot.ca.gov/data/d7/cctv/image/i405-lax/i405-lax.jpg",
        "lat": 33.9425,
        "lon": -118.4081,
    },
    "I-5_Downtown": {
        "url": "https://cwwp2.dot.ca.gov/data/d7/cctv/image/i5-downtown/i5-downtown.jpg",
        "lat": 34.0522,
        "lon": -118.2437,
    },
}


async def capture_frame(camera_id: str) -> dict:
    """Downloads a single JPEG frame from a public traffic camera."""
    cam = CAMERA_FEEDS[camera_id]
    async with httpx.AsyncClient(timeout=10) as client:
        resp = await client.get(cam["url"])
        resp.raise_for_status()  # do not pass an error page on as an image
    return {
        "camera_id": camera_id,
        "image_bytes": resp.content,
        "lat": cam["lat"],
        "lon": cam["lon"],
        # timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
Satellite Imagery: Sentinel Hub or Planet API
For overhead views, you can use the free tier of Sentinel Hub (Copernicus program) or Planet's Explorer:
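# backend/ingestion/satellite.py
import httpx

# Assumption: SENTINEL_INSTANCE_ID is defined in config.py alongside the API keys
from config import SENTINEL_INSTANCE_ID

SENTINEL_WMS = "https://services.sentinel-hub.com/ogc/wms/{instance_id}"


async def fetch_satellite_tile(bbox: list, width: int = 1024, height: int = 1024) -> bytes:
    """
    Fetches a recent Sentinel-2 satellite tile for a bounding box.
    bbox: [min_lon, min_lat, max_lon, max_lat]
    Free tier: 30,000 requests/month
    """
    params = {
        "SERVICE": "WMS",
        # WMS 1.1.1 uses lon,lat axis order for EPSG:4326, matching bbox above;
        # WMS 1.3.0 with CRS=EPSG:4326 expects lat,lon instead
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": "TRUE_COLOR",
        "BBOX": ",".join(map(str, bbox)),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/jpeg",
        "SRS": "EPSG:4326",
        "TIME": "2026-02-01/2026-02-20",  # recent range
    }
    # Fill the {instance_id} placeholder before making the request
    url = SENTINEL_WMS.format(instance_id=SENTINEL_INSTANCE_ID)
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(url, params=params)
        resp.raise_for_status()
    return resp.content
Step 2: Visual Panoptic Detection (Gemini 3.1)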
This is the part people lose their minds over. You're sending raw camera frames and satellite tiles to Gemini and asking it to return structured detection data.
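Turning a pixel in a camera frame into a latitude/longitude ultimately comes down to a bearing and a distance from the camera. As a rough cross-check on whatever coordinates the model returns, the standard great-circle destination-point formula can be computed directly. This is a minimal sketch, assuming a spherical Earth and a distance estimate that you supply; the function name and parameters are illustrative and not part of the project code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def offset_position(lat: float, lon: float, bearing_deg: float, distance_m: float) -> tuple[float, float]:
    """Return the (lat, lon) reached by travelling distance_m from (lat, lon) along bearing_deg."""
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lam2)


if __name__ == "__main__":
    # A vehicle judged to be roughly 200 m due east of a camera at the I-405 position
    print(offset_position(33.9425, -118.4081, bearing_deg=90, distance_m=200))
```

A detection that lands several kilometres from its camera is almost certainly a bad estimate, so a check like this can be used to discard outliers before they reach the map.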
The Gemini Vision Client
# backend/analysis/gemini_client.py
import google.generativeai as genai
import json
from config import GEMINI_API_KEY

genai.configure(api_key=GEMINI_API_KEY)

PANOPTIC_SYSTEM_PROMPT = """You are an advanced geospatial analyst model.
Analyze the provided image and detect ALL visible objects in these categories:
- vehicles (cars, trucks, buses, motorcycles)
- aircraft (planes, helicopters)
- pedestrians
- infrastructure (bridges, intersections)
For each detected object, return:
1. category (string, exactly one of: "vehicles", "aircraft", "pedestrians", "infrastructure")
2. estimated_lat and estimated_lon (float), inferred from the camera metadata provided
3. confidence (float, 0-1)
4. bounding_box (optional, [x1, y1, x2, y2] in pixel coords)
5. attributes (color, direction, estimated_speed if moving)
Return ONLY a valid JSON array of objects. No markdown. No explanation."""


async def analyze_frame(
    image_bytes: bytes,
    camera_lat: float,
    camera_lon: float,
    camera_heading: float = 0,
    fov_degrees: float = 90,
) -> list[dict]:
    """
    Sends a camera frame to Gemini 3.1 for panoptic detection.
    Camera metadata helps Gemini estimate real-world coordinates.
    """
    model = genai.GenerativeModel("gemini-3.1-pro")
    context = f"""Camera metadata:
- Position: ({camera_lat}, {camera_lon})
- Heading: {camera_heading}° from North
- Field of view: {fov_degrees}°
- Image type: Traffic camera JPEG snapshot
Use this metadata to estimate real-world lat/lon for each detected object."""
    # Use the async variant so the API call does not block the event loop
    response = await model.generate_content_async(
        [
            PANOPTIC_SYSTEM_PROMPT,
            context,
            # The SDK accepts raw image bytes here; no base64 step is needed
            {"mime_type": "image/jpeg", "data": image_bytes},
        ],
        generation_config={"response_mime_type": "application/json"},
    )
    detections = json.loads(response.text)
    # Accept either a bare list or an object wrapping the list
    return detections if isinstance(detections, list) else detections.get("detections", [])
The Detection Orchestrator: Ties It All Together
# backend/analysis/panoptic.py
import asyncio
from ingestion.traffic_cams import capture_frame, CAMERA_FEEDS
from ingestion.opensky import fetch_aircraft
from analysis.gemini_client import analyze_frame


async def run_detection_cycle() -> dict:
    """
    One full detection cycle:
    1. Pull aircraft data from OpenSky (structured API, no vision needed)
    2. Capture frames from all traffic cameras
    3. Send each frame to Gemini for panoptic detection
    4. Merge all results into a single GeoJSON payload
    """
    # Aircraft data is already structured, so no Gemini call is needed
    aircraft = await fetch_aircraft()

    # Traffic cam analysis: this is where Gemini earns its keep
    camera_tasks = []
    for cam_id in CAMERA_FEEDS:
        frame = await capture_frame(cam_id)
        camera_tasks.append(
            analyze_frame(
                image_bytes=frame["image_bytes"],
                camera_lat=frame["lat"],
                camera_lon=frame["lon"],
            )
        )
    # Run all Gemini requests concurrently
    all_detections = await asyncio.gather(*camera_tasks)

    # Flatten into unified GeoJSON
    features = []

    # Add aircraft as features
    for ac in aircraft:
        # Compare against None so that a valid 0.0 coordinate is not dropped
        if ac.latitude is not None and ac.longitude is not None:
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [ac.longitude, ac.latitude]},
                "properties": {
                    "category": "aircraft",
                    "callsign": ac.callsign,
                    "altitude": ac.altitude,
                    "velocity": ac.velocity,
                    "heading": ac.heading,
                    "source": "opensky",
                },
            })

    # Add Gemini detections as features
    for cam_id, detections in zip(CAMERA_FEEDS.keys(), all_detections):
        for det in detections:
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    "coordinates": [det["estimated_lon"], det["estimated_lat"]],
                },
                "properties": {
                    **det,
                    "source": f"camera:{cam_id}",
                    "source_model": "gemini-3.1-pro",
                },
            })

    return {"type": "FeatureCollection", "features": features}
Step 3: The WebSocket Hub (Claude 4.6)
This is the heartbeat of the app: a FastAPI server that runs detection cycles on a loop and pushes GeoJSON updates to every connected frontend client in real time.
# backend/main.py
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import json
from analysis.panoptic import run_detection_cycle

app = FastAPI(title="Geospatial Tracker")
# Wide-open CORS is convenient for local development; restrict origins before deploying
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])
connected_clients: list[WebSocket] = []


@app.websocket("/ws/live")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    connected_clients.append(ws)
    try:
        while True:
            await ws.receive_text()  # keep-alive
    except WebSocketDisconnect:
        pass
    finally:
        # The broadcast loop may already have removed this client
        if ws in connected_clients:
            connected_clients.remove(ws)


async def broadcast_loop():
    """Runs a detection cycle, broadcasts the result, then sleeps 10 seconds."""
    while True:
        try:
            geojson = await run_detection_cycle()
            payload = json.dumps(geojson)
            for client in connected_clients.copy():
                try:
                    await client.send_text(payload)
                except Exception:
                    # Drop clients whose connection has gone away
                    if client in connected_clients:
                        connected_clients.remove(client)
        except Exception as e:
            print(f"Cycle error: {e}")
        await asyncio.sleep(10)  # adjust polling interval


@app.on_event("startup")
async def startup():
    # Keep a reference so the background task is not garbage-collected
    app.state.broadcast_task = asyncio.create_task(broadcast_loop())
Step 4: The Frontend Map (Claude 4.6)
Have Claude generate a React + Mapbox GL JS frontend. The key component:
// frontend/src/components/LiveMap.tsx
import { useEffect, useRef, useState } from "react";
import mapboxgl from "mapbox-gl";
import "mapbox-gl/dist/mapbox-gl.css";

mapboxgl.accessToken = import.meta.env.VITE_MAPBOX_TOKEN;

export default function LiveMap() {
  const mapContainer = useRef<HTMLDivElement>(null);
  const map = useRef<mapboxgl.Map | null>(null);
  const [stats, setStats] = useState({ aircraft: 0, vehicles: 0 });

  useEffect(() => {
    map.current = new mapboxgl.Map({
      container: mapContainer.current!,
      style: "mapbox://styles/mapbox/dark-v11",
      center: [-118.25, 34.05], // Los Angeles
      zoom: 10,
    });

    map.current.on("load", () => {
      // Add empty GeoJSON source; it gets updated via WebSocket
      map.current!.addSource("detections", {
        type: "geojson",
        data: { type: "FeatureCollection", features: [] },
      });

      // Aircraft layer: larger icons, colored by altitude
      map.current!.addLayer({
        id: "aircraft-layer",
        type: "circle",
        source: "detections",
        filter: ["==", ["get", "category"], "aircraft"],
        paint: {
          "circle-radius": 8,
          "circle-color": [
            "interpolate", ["linear"], ["get", "altitude"],
            0, "#00ff88",     // ground level = green
            5000, "#ffaa00",  // mid-altitude = orange
            12000, "#ff0044", // cruise altitude = red
          ],
          "circle-stroke-width": 2,
          "circle-stroke-color": "#ffffff",
        },
      });

      // Vehicle layer: smaller dots from camera detections
      map.current!.addLayer({
        id: "vehicle-layer",
        type: "circle",
        source: "detections",
        filter: ["==", ["get", "category"], "vehicles"],
        paint: {
          "circle-radius": 4,
          "circle-color": "#00d4ff",
          "circle-opacity": 0.8,
        },
      });
    });

    // WebSocket connection
    const ws = new WebSocket("ws://localhost:8000/ws/live");
    ws.onmessage = (event) => {
      const geojson = JSON.parse(event.data);

      // Update map source
      const source = map.current!.getSource("detections") as mapboxgl.GeoJSONSource;
      if (source) source.setData(geojson);

      // Update stats
      const features = geojson.features || [];
      setStats({
        aircraft: features.filter((f: any) => f.properties.category === "aircraft").length,
        vehicles: features.filter((f: any) => f.properties.category === "vehicles").length,
      });
    };

    return () => {
      ws.close();
      map.current?.remove();
    };
  }, []);

  return (
    <div style={{ position: "relative", width: "100vw", height: "100vh" }}>
      <div ref={mapContainer} style={{ width: "100%", height: "100%" }} />
      {/* HUD overlay */}
      <div style={{
        position: "absolute", top: 16, left: 16,
        background: "rgba(0,0,0,0.8)", color: "#0f0",
        padding: "12px 20px", borderRadius: 8, fontFamily: "monospace",
      }}>
        <div>AIRCRAFT TRACKED: {stats.aircraft}</div>
        <div>VEHICLES DETECTED: {stats.vehicles}</div>
        <div style={{ fontSize: 10, opacity: 0.6 }}>LIVE • 10s refresh</div>
      </div>
    </div>
  );
}
Step 5: Pydantic Schemas (The Glue That Prevents Chaos)
This is critical. Gemini returns free-form JSON. Without strict validation, one malformed response crashes your entire map. Claude should generate these schemas:
# backend/models/schemas.py
from pydantic import BaseModel, Field
from typing import Optional


class AircraftPosition(BaseModel):
    icao24: str
    callsign: str = ""
    origin_country: str = ""
    longitude: Optional[float] = None
    latitude: Optional[float] = None
    altitude: Optional[float] = None
    velocity: Optional[float] = None
    heading: Optional[float] = None
    vertical_rate: Optional[float] = None
    on_ground: bool = False
    last_contact: Optional[int] = None


class Detection(BaseModel):
    # Category names match the prompt and the frontend layer filters
    category: str = Field(..., description="vehicles, aircraft, pedestrians, infrastructure")
    estimated_lat: float = Field(..., ge=-90, le=90)
    estimated_lon: float = Field(..., ge=-180, le=180)
    confidence: float = Field(..., ge=0, le=1)
    bounding_box: Optional[list[float]] = None
    attributes: dict = Field(default_factory=dict)


class DetectionResponse(BaseModel):
    """Validates Gemini's entire response before it hits your map."""
    detections: list[Detection]
Then in your gemini_client.py, wrap the raw response:
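from models.schemas import DetectionResponse

# After getting raw JSON from Gemini (a list of detection dicts):
validated = DetectionResponse(detections=raw_json)
# Convert back to plain dicts, because the orchestrator indexes detections by key.
# model_dump() is the Pydantic v2 method; on Pydantic v1 use .dict() instead.
return [d.model_dump() for d in validated.detections]  # guaranteed clean data
Step 6: Run It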
# .env
GEMINI_API_KEY=your_gemini_key
# Vite only exposes variables prefixed with VITE_ to the frontend, and by default
# it reads .env from the frontend directory, so this one belongs in frontend/.env
VITE_MAPBOX_TOKEN=your_mapbox_token
# Terminal 1: Backend
cd backend && uvicorn main:app --reload --port 8000
# Terminal 2: Frontend
cd frontend && npm run dev
Open http://localhost:5173. You should see a dark map with live aircraft dots appearing within seconds, and vehicle detections populating as camera frames are analyzed.
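If the map stays empty, it helps to inspect one broadcast payload outside the browser. This is a minimal sketch of a checker that counts features per category in a GeoJSON FeatureCollection, mirroring what the HUD displays; the function name and the sample payload are made up for illustration.

```python
import json
from collections import Counter


def summarize_payload(payload: str) -> Counter:
    """Count features per properties.category in a GeoJSON FeatureCollection string."""
    collection = json.loads(payload)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return Counter(
        # "properties" may legally be null in GeoJSON, hence the `or {}`
        (feature.get("properties") or {}).get("category", "unknown")
        for feature in collection.get("features", [])
    )


if __name__ == "__main__":
    sample = json.dumps({
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature",
             "geometry": {"type": "Point", "coordinates": [-118.40, 33.94]},
             "properties": {"category": "aircraft", "source": "opensky"}},
            {"type": "Feature",
             "geometry": {"type": "Point", "coordinates": [-118.41, 33.94]},
             "properties": {"category": "vehicles", "source": "camera:I-405_LAX"}},
        ],
    })
    print(summarize_payload(sample))
```

If the counts show a category such as "vehicle" or "car" instead of "vehicles", the frontend layer filter will silently skip those features, which is a common reason for an empty vehicle layer.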
Cost Reality Check
| Component | Cost |
|---|---|
| OpenSky API | Free (rate-limited) |
| Gemini 3.1 Pro (vision) | ~$0.002/frame analyzed |
| Sentinel Hub (satellite) | Free tier: 30k req/month |
| Mapbox | Free tier: 50k loads/month |
| Claude 4.6 (generating all this code) | ~$0.30 total for the full project |
Running 6 cameras at 10-second intervals works out to 6 × 8,640 = 51,840 frames per day, or roughly $104/day in Gemini API costs at ~$0.002/frame. To keep the bill near the price of a coffee (about $3/day), poll each camera every five to six minutes instead. That still gets you a CIA-grade surveillance dashboard on a hobby budget.
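The daily figure depends heavily on the polling interval, so it is worth spelling out the arithmetic. This is a minimal sketch, assuming the rough ~$0.002/frame estimate from the table above (an estimate, not a published price); the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_frames(cameras: int, interval_s: float) -> int:
    """Frames analyzed per day when every camera is sampled once per interval."""
    return int(cameras * SECONDS_PER_DAY / interval_s)


def daily_cost(cameras: int, interval_s: float, cost_per_frame: float) -> float:
    """Estimated daily spend in dollars for the given sampling plan."""
    return daily_frames(cameras, interval_s) * cost_per_frame


if __name__ == "__main__":
    # cost_per_frame is the article's rough estimate, not a published price
    for interval in (10, 60, 300):
        frames = daily_frames(6, interval)
        cost = daily_cost(cameras=6, interval_s=interval, cost_per_frame=0.002)
        print(f"{interval:>4}s interval: {frames:>6} frames/day, ${cost:,.2f}/day")
```

With 6 cameras at a 10-second interval this gives 51,840 frames and about $103.68 per day; stretching the interval to 5 minutes brings it down to 1,728 frames and about $3.46 per day.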
Key Takeaway
The power isn't in either model alone; it's in the routing. Claude is your architect and engineer. Gemini is your analyst with superhuman vision. The MCP bridge (github.com) is the nervous system connecting them. This is how real AI-native applications are built in 2026.