# How Yugma works

## Architecture
### The pipeline

```
User prompt
  → referenceResolver ("that" / "the red sphere" → IDs)
  → aiSerializer (YSL scene context, ~45 tokens/object)
  → aiCompose (Cloud Function, the brain)
      ├─ system prompt (coordinates, scale, materials, design principles)
      ├─ 19 tool schemas
      └─ agentic loop (max 3 iterations)
  → TOOL_DISPATCH → Zustand stores → Three.js rerenders
  → collab layer broadcasts deltas to peers
```
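Below is a minimal sketch of the TOOL_DISPATCH step. All names here (`useSceneStore`, `add_object`, the action signatures) are illustrative assumptions, not Yugma's actual identifiers: each parsed tool call is matched by name and applied as a Zustand action, and R3F components subscribed to the store rerender.

```ts
import { create } from "zustand";

type SceneObject = { id: string; [key: string]: unknown };

interface SceneState {
  objects: Record<string, SceneObject>;
  addObject: (obj: SceneObject) => void;
  removeObject: (id: string) => void;
}

// Hypothetical store; Yugma's real stores will differ.
const useSceneStore = create<SceneState>((set) => ({
  objects: {},
  addObject: (obj) =>
    set((s) => ({ objects: { ...s.objects, [obj.id]: obj } })),
  removeObject: (id) =>
    set((s) => {
      const { [id]: _removed, ...rest } = s.objects;
      return { objects: rest };
    }),
}));

interface ToolCall {
  name: string;                  // e.g. "add_object"
  args: Record<string, unknown>; // JSON arguments emitted by the model
}

// Each streamed tool call becomes a typed store mutation; R3F
// components subscribed to `objects` rerender automatically.
export function dispatchToolCall(call: ToolCall): void {
  const store = useSceneStore.getState();
  switch (call.name) {
    case "add_object":
      store.addObject(call.args as SceneObject);
      break;
    case "remove_object":
      store.removeObject(call.args.id as string);
      break;
    default:
      console.warn(`unhandled tool: ${call.name}`);
  }
}
```

Routing everything through store actions keeps the model one layer away from Three.js: it can only mutate typed state, never emit code that touches the renderer.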
### The 19 tools

- Objects: add, update, remove, duplicate, animate
- Scene: set environment, clear scene, focus camera
- Materials: apply material presets
- Organization: align, distribute, group, tag, search-select
- Generation: Sketchfab import, layout generation, preview / commit
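For a sense of what the model receives, here is a hedged sketch of what one tool schema could look like in OpenAI-style function-calling format; the field names and enum values are assumptions, not Yugma's actual schema.

```ts
// Hypothetical schema for a single tool, as it would be sent to the LLM.
const addObjectTool = {
  name: "add_object",
  description: "Add a primitive object to the scene",
  parameters: {
    type: "object",
    properties: {
      name: { type: "string", description: "Human-readable label" },
      type: { type: "string", enum: ["box", "sphere", "cylinder", "plane"] },
      position: {
        type: "array",
        items: { type: "number" },
        minItems: 3,
        maxItems: 3,
        description: "World-space [x, y, z]",
      },
      material: { type: "string", description: "Material preset name" },
    },
    required: ["name", "type", "position"],
  },
} as const;
```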
### The scene graph

Every object is `{ id, name, type, transform, material, geometry, tags, parentId, ... }`. Stored in a Zustand store as the source of truth. Rendered by R3F. Serialized to GLB / USDZ / PNG for export.
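A TypeScript sketch of the node shape implied above; exact field types, the material/geometry internals, and anything behind the `...` are assumptions.

```ts
interface Transform {
  position: [number, number, number];
  rotation: [number, number, number]; // Euler angles, radians (assumed)
  scale: [number, number, number];
}

interface SceneObject {
  id: string;              // stable ID, referenced by later prompts
  name: string;            // human-readable label ("red sphere")
  type: string;            // primitive or imported-model type
  transform: Transform;
  material: { preset?: string; color?: string };
  geometry: Record<string, number>; // e.g. { radius: 0.5 }
  tags: string[];
  parentId: string | null; // grouping via parent links
}
```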
### Why this design

- Speed. Cerebras streams the first tool call ~3-5 s after the prompt is sent; you see objects appear before the AI finishes.
- Reliability. Typed schemas mean the AI can't emit malformed code that breaks rendering.
- Editability. Every object has a stable ID, so subsequent prompts can reference and mutate it (see the sketch after this list).
- Exportability. The graph serializes cleanly to whatever format you need.
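As a concrete, hypothetical example of that editability: after referenceResolver maps "the red sphere" to its ID, a follow-up prompt becomes an update call against that ID. Tool and field names here are illustrative.

```ts
// Follow-up prompt: "make the red sphere twice as big"
const followUp = {
  name: "update_object",
  args: {
    id: "obj_a1b2c3",                // stable ID from the scene graph
    transform: { scale: [2, 2, 2] }, // only the changed fields
  },
};
```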
## FAQ
### How fast is the first tool call?

Streaming SSE delivers the first tool call about 3-5 seconds in; a full scene typically takes 10-30 seconds, depending on object count.
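A minimal client-side sketch of consuming that stream, assuming a hypothetical `/aiCompose` endpoint that emits one JSON tool call per SSE `data:` line; the real wire format may differ. `dispatchToolCall` is the sketch from the pipeline section.

```ts
async function streamCompose(prompt: string): Promise<void> {
  const res = await fetch("/aiCompose", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are separated by blank lines; handle complete ones.
    let sep;
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      const event = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      if (event.startsWith("data: ")) {
        dispatchToolCall(JSON.parse(event.slice(6))); // render as it arrives
      }
    }
  }
}
```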
### How does the AI know spatial relationships?
A spatial pre-processor handles obvious patterns (circle of N, grid, stack, scatter) by computing exact positions. The LLM owns the rest, guided by scale references in the system prompt.
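As an illustration, here is the kind of exact-position math the pre-processor performs for "a circle of N objects"; the function name and the Y-up, XZ-plane conventions are assumptions.

```ts
function circlePositions(n: number, radius: number): [number, number, number][] {
  return Array.from({ length: n }, (_, i): [number, number, number] => {
    const theta = (2 * Math.PI * i) / n; // evenly spaced angles
    return [radius * Math.cos(theta), 0, radius * Math.sin(theta)];
  });
}

// circlePositions(6, 3) → six points on a radius-3 circle, ready to
// be written into add_object calls verbatim.
```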
### Can I see a demo?
Open Yugma Studio; type a sentence; watch the scene build.