Why LLMs Fail at Modern Three.js — and How Tool Calls Fix It
A real complaint from the Three.js community — "how could AI be tuned to avoid this?" — came from a forum thread about ChatGPT generating Three.js code that calls removed methods, hallucinates APIs, or sets correct-looking but broken material props. The cause is structural, and the fix is a pattern Yugma uses across its 19 AI tools.
# Why LLMs lag the API
Three.js evolves quickly. Methods get renamed, materials gain or lose props, and helpers move between core and addons. ChatGPT's training data lags by 6–18 months, so the model emits the API as it remembers it from training — not as it exists today.
Common failure modes:
- Calling `computeFlatVertexNormals()` (removed) instead of `geometry.computeVertexNormals()`.
- Passing `roughness` to the `MeshStandardMaterial` constructor as `roughnessMap` (a typo carried over from old docs).
- Using `THREE.Geometry` (deprecated since r125) instead of `BufferGeometry`.
- Calling `TWEEN.update()` without an import (it lives in a separate package).
The model is not "wrong" — it's confidently emitting a 2022 API.
# The fix: typed tool calls
Yugma never asks an LLM to emit Three.js code. Instead, the LLM sees a typed JSON schema for each of its 19 tools and emits validated calls that are dispatched against the scene store:
```json
{
  "name": "add_object",
  "input": {
    "type": "box",
    "name": "table_main",
    "position": [0, 0.375, 0],
    "scale": [1.2, 0.75, 0.6],
    "color": "#8B6914",
    "roughness": 0.7,
    "metalness": 0
  }
}
```
The LLM cannot accidentally:
- Call a removed method (no methods in the surface area).
- Pass wrong-typed parameters (schema validates).
- Make up tool names (only the 19 are listed).
- Confuse Three.js versions (the schema is version-pinned).
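A minimal sketch of what validation before dispatch looks like for the `add_object` example above. The field names come from the example; the validator itself, its error handling, and the enum of object types are illustrative assumptions, not Yugma's actual implementation:

```typescript
// Hypothetical validator for the add_object tool: any call that doesn't
// match the schema is rejected before it ever touches the scene store.
type Vec3 = [number, number, number];

interface AddObjectInput {
  type: "box" | "sphere" | "cylinder";
  name: string;
  position: Vec3;
  scale: Vec3;
  color: string;     // hex color like "#8B6914"
  roughness: number; // 0..1
  metalness: number; // 0..1
}

function isVec3(v: unknown): v is Vec3 {
  return Array.isArray(v) && v.length === 3 && v.every((n) => typeof n === "number");
}

function validateAddObject(input: Record<string, unknown>): AddObjectInput {
  const errors: string[] = [];
  if (!["box", "sphere", "cylinder"].includes(input.type as string)) errors.push("type");
  if (typeof input.name !== "string") errors.push("name");
  if (!isVec3(input.position)) errors.push("position");
  if (!isVec3(input.scale)) errors.push("scale");
  if (typeof input.color !== "string" || !/^#[0-9a-fA-F]{6}$/.test(input.color)) errors.push("color");
  for (const k of ["roughness", "metalness"]) {
    const v = input[k];
    if (typeof v !== "number" || v < 0 || v > 1) errors.push(k);
  }
  if (errors.length) throw new Error(`invalid add_object input: ${errors.join(", ")}`);
  return input as unknown as AddObjectInput;
}

// The example call from the article passes; a typo'd or wrong-typed field would throw.
const call: Record<string, unknown> = {
  type: "box",
  name: "table_main",
  position: [0, 0.375, 0],
  scale: [1.2, 0.75, 0.6],
  color: "#8B6914",
  roughness: 0.7,
  metalness: 0,
};
console.log(validateAddObject(call).name); // "table_main"
```

Note that the validator rejects by field name, which is what makes the structured log line in the observability point possible: a failed call tells you exactly which parameter was wrong.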
# What you give up
This pattern trades raw flexibility for reliability. The LLM can't do anything outside the 19 tools without a code change. That's intentional: every tool added is an explicit API-surface decision.
# What you gain
- Reliability: tool call success rate >99% in production.
- Versioning: bump the schema, the LLM uses the new shape — no model retraining.
- Composability: 10 parallel tool calls in one response, applied as one transaction.
- Observability: every AI action is a structured log line, not an opaque code blob.
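The composability point can be sketched as a transactional batch apply: all calls in one LLM response mutate a copy of the store, and the copy replaces the original only if every call succeeds. The store shape and tool set here are hypothetical:

```typescript
// Hypothetical transactional batch: parallel tool calls either all apply or
// none do, so a single bad call can't leave the scene half-mutated.
type SceneObject = { name: string; position: [number, number, number] };

type ToolCall =
  | { name: "add_object"; input: SceneObject }
  | { name: "remove_object"; input: { name: string } };

function applyBatch(
  store: Map<string, SceneObject>,
  calls: ToolCall[],
): Map<string, SceneObject> {
  // Work on a copy; only return (commit) it if every call succeeds.
  const next = new Map(store);
  for (const call of calls) {
    if (call.name === "add_object") {
      if (next.has(call.input.name)) throw new Error(`duplicate object: ${call.input.name}`);
      next.set(call.input.name, call.input);
    } else {
      if (!next.delete(call.input.name)) throw new Error(`no such object: ${call.input.name}`);
    }
  }
  return next;
}

// Two parallel add_object calls land as one transaction; the original
// store is untouched until the whole batch succeeds.
const store = new Map<string, SceneObject>();
const next = applyBatch(store, [
  { name: "add_object", input: { name: "table_main", position: [0, 0.375, 0] } },
  { name: "add_object", input: { name: "lamp", position: [0.4, 0.8, 0] } },
]);
console.log(next.size, store.size); // 2 0
```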
# Generalizing the pattern
This works for any AI-driving-state product. Cursor uses it for code edits (typed file-edit tools). Linear uses it for issue management. The recipe:
- Define your domain's mutations as a typed schema.
- Expose those schemas as tools the LLM sees.
- Validate every call before applying.
- Emit them in parallel when independent.
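The recipe above reduces to a small registry at dispatch time: unknown tool names are rejected outright, and known ones are validated before they touch state. Everything here — the registry contents, the validator bodies — is an illustrative assumption, not any product's real code:

```typescript
// Hypothetical tool registry: the LLM can only invoke names that exist
// here, so a made-up tool fails loudly instead of doing something wrong.
type Validator = (input: Record<string, unknown>) => void;

const registry = new Map<string, Validator>([
  ["add_object", (input) => {
    if (typeof input.name !== "string") throw new Error("add_object: name must be a string");
  }],
  ["set_material", (input) => {
    if (typeof input.roughness !== "number") throw new Error("set_material: roughness must be a number");
  }],
]);

function dispatch(call: { name: string; input: Record<string, unknown> }): string {
  const validate = registry.get(call.name);
  if (!validate) throw new Error(`unknown tool: ${call.name}`);
  validate(call.input);
  // ...apply the validated mutation to the domain store here...
  return `applied ${call.name}`;
}

console.log(dispatch({ name: "add_object", input: { name: "table_main" } }));
// "applied add_object"
```

The same registry doubles as the tool list you expose to the LLM, which keeps the advertised surface and the enforced surface from drifting apart.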
Anywhere you'd have asked an LLM to "write code that does X", consider replacing it with "emit a typed call that mutates state for X". The reliability gap is dramatic.