Kiva · Principal Designer · AI Pioneer · 2025–2026

Teaching AI to speak
in tokens.

Even with a three-tier variable library, DTCG JSON exports, Code Connect, style documentation, and Figma Make pointing directly at our design system kit, AI-assisted prototyping still took four to five correction loops per session to produce token-accurate output. The design system wasn't broken. The interface between it and AI was. I built a new one.

Role
Principal Product Designer · AI Pioneer
Stakeholders
Designers, PMs, Engineering, Leadership
Timeline
Q4 2025 – Q1 2026
Tools
Figma, DTCG JSON, Claude Code, Claude Design
Process
Audit existing infrastructure → Root cause diagnosis → Skill design → Modular architecture → Validation framework → Scale
01 — The Problem

Every prototype session started
with a correction loop.

Across the design team, prototyping with AI followed the same frustrating pattern: output a component, spot a token error, correct it, output again, spot another, correct again. By the time a component matched our actual design system, we had spent more time correcting the AI than it had saved us. And it wasn't isolated to one tool. Figma Make, Claude, Cursor: same pattern every time.

The average prototype session required 4–5 rounds of correction. Designers lost the productivity benefit. PMs got slower turnaround. And each session started from scratch — no learning, no memory, no cumulative improvement.

5→1–2
correction loops per prototyping session
0
hardcoded hex values in AI output
100+
teammates able to use the skill
02 — What Already Existed

The design system was complete.
And it still wasn't enough.

Before building anything new, I needed to understand why such comprehensive infrastructure was producing such inconsistent output. We had everything the design community recommends:

Variable Library
Figma variables with Global, Alias, and Mapped layers — full 3-tier architecture with 5 theme modes.
Production
DTCG JSON Exports
Design Token Community Group format exports, machine-readable by any standards-compliant tool.
Exported
Style Guidelines
Documented typography, color, spacing, and component specifications inside Figma.
Documented
Code Connect
Figma Code Connect mappings linking components to their codebase implementation.
Mapped
Figma Make
Figma's AI directly referencing our design system kit — the most integrated scenario possible.
Integrated

Despite all of this, every session still started with corrections. The problem wasn't documentation depth or tooling maturity. The problem was the format. Design systems are written to be understood by humans. AI doesn't read documentation — it processes instructions.

03 — Root Cause

A format designed for humans
reads poorly to machines.

I ran a structured audit comparing what AI needed to produce token-accurate output against what our existing documentation provided. The gap wasn't quantity — it was structure.

What AI was getting
  • Visual hierarchy conveyed through layout, not rules
  • Token names without semantic disambiguation
  • Confusable pairs: title vs subheadline, text.primary vs text.action
  • No explicit hierarchy rule: when to use Alias vs Mapped
  • No "stop and ask" instruction when a token was missing
  • Responsive values buried in long documentation prose
What AI actually needs
  • Ordered, numbered rules — a decision procedure, not a reference
  • Semantic intent tables: role → token name, not just token name
  • Explicit disambiguation for every confusable pair
  • Tier resolution rule: Mapped first, Alias second, Global never
  • A hard stop: flag missing tokens, never invent values
  • Breakpoint-specific values co-located with token definitions

AI models are excellent at following explicit, ranked instructions. They're poor at inferring implicit conventions from visual documentation. The fix wasn't more documentation — it was a different type of documentation, written for how AI actually processes context.

04 — Skill Architecture

Modular by design.
Precision through layering.

A skill that tries to cover everything in one file becomes too large for efficient AI context use and too rigid to update. I designed the architecture as a two-level system: a compact main orchestrator that defines the decision process, pointing to focused reference modules for each token category. AI loads only what it needs.

The 3-tier token architecture maps directly into the skill's resolution rule: AI always looks for the most specific token available — Mapped for known components, Alias for semantic intent, Global only as reference — never hardcoded.

Tier 1 · Primitive
Global
#276A43 · eco-green/3
Tier 2 · Semantic
Alias
background.action
Tier 3 · Component
Mapped
btn.background.primary
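
A minimal sketch of how the three tiers could chain together in a DTCG-style export, where an alias is written as a `{group.token}` reference. Only the three token names and the hex value come from the tier diagram above; the nesting of the tree and the helper names `lookup` and `resolve` are assumptions made for illustration, not the production export.

```python
# Hypothetical DTCG-style token tree. The names and the hex value are from the
# tier diagram; the nesting and the alias chain are assumptions.
TOKENS = {
    # Tier 1, Global (primitive)
    "eco-green": {"3": {"$type": "color", "$value": "#276A43"}},
    # Tier 2, Alias (semantic)
    "background": {"action": {"$type": "color", "$value": "{eco-green.3}"}},
    # Tier 3, Mapped (component)
    "btn": {"background": {"primary": {"$type": "color", "$value": "{background.action}"}}},
}


def lookup(tokens, path):
    """Walk a dotted path such as 'btn.background.primary' to its token node."""
    node = tokens
    for key in path.split("."):
        node = node[key]  # raises KeyError when the token does not exist
    return node


def resolve(tokens, path, seen=()):
    """Follow {alias} references until a literal value is reached."""
    if path in seen:
        raise ValueError("circular alias: " + " -> ".join(seen + (path,)))
    value = lookup(tokens, path)["$value"]
    if isinstance(value, str) and value.startswith("{") and value.endswith("}"):
        return resolve(tokens, value[1:-1], seen + (path,))
    return value


print(resolve(TOKENS, "btn.background.primary"))
```

Under these assumptions the Mapped token resolves through the Alias to the Global primitive, which is why UI code can reference the most specific name and never the raw value.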

Each reference file is scoped to a single token category and sized to stay within efficient AI context limits. The main SKILL.md acts as the entry point — it defines the workflow, architecture rule, and common mistakes, then delegates deep token lookups to the relevant sub-file.
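
One way to picture the delegation from the entry point to its sub-files is a small lookup that returns only the reference modules a task needs, in the skill's load order (typography, color, spacing, radius/elevation, components). This is a sketch: the file paths and the function name `modules_to_load` are hypothetical, and the real skill expresses this as written instructions rather than code.

```python
# Load order follows the skill's workflow; the file paths are hypothetical.
MODULES = [
    ("typography", "references/typography.md"),
    ("color", "references/color.md"),
    ("spacing", "references/spacing.md"),
    ("radius/elevation", "references/radius-elevation.md"),
    ("components", "references/components.md"),
]


def modules_to_load(needed):
    """Return the paths of the needed modules, always in the canonical order."""
    known = {name for name, _ in MODULES}
    unknown = set(needed) - known
    if unknown:
        raise ValueError(f"no reference module for: {sorted(unknown)}")
    return [path for name, path in MODULES if name in needed]


print(modules_to_load({"color", "typography"}))
```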

05 — Inside the Skill

Rules first. Examples second.
Ambiguity never.

The core design principle was: compress everything an experienced designer holds implicitly into an explicit, ordered decision process. The main SKILL.md has three functional sections, each solving a different failure mode from the original audit.

SKILL.md · Quick Reference: Building a Component · 8-step decision process
Ordered workflow — AI follows this sequence for every component
1. Confirm breakpoint: mobile (sm), tablet (md), or desktop (lg, default)
2. Confirm theme: default, green-dark, green-light, marigold, or stone
3–7. Load token modules in order: typography → color → spacing → radius/elevation → components
8. Cross-check: every visual property must trace to a named token. Zero hardcoded values.
Architecture rule — tier resolution order
Always use Alias or Mapped tokens in UI code. Never Global directly.
Check Mapped first (buttons, cards). Fall back to Alias for semantic intent.
Missing token rule — critical guard
! Missing token? Stop and flag it — never invent a value. Propose the closest match and ask.
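
The architecture rule and the missing-token guard read naturally as a small decision procedure. The sketch below is an illustration under stated assumptions: the token sets are tiny samples using names from this case study, and `choose_token` and `MissingToken` are hypothetical names, not part of the skill itself.

```python
from difflib import get_close_matches

# Small sample sets; a real list would come from the token export.
MAPPED = {"btn.background.primary"}
ALIAS = {"background.action", "background.primary", "text.action", "text.primary"}
GLOBAL = {"eco-green/3"}  # listed for reference only, never returned


class MissingToken(Exception):
    """Raised instead of inventing a value; carries the closest known names."""

    def __init__(self, requested, suggestions):
        super().__init__(f"missing token {requested!r}; closest: {suggestions}")
        self.requested = requested
        self.suggestions = suggestions


def choose_token(mapped_name, alias_name):
    """Mapped first, Alias second, Global never."""
    if mapped_name is not None and mapped_name in MAPPED:
        return mapped_name
    if alias_name in ALIAS:
        return alias_name
    # Stop and flag: propose the closest usable match and let a human decide.
    suggestions = get_close_matches(alias_name, sorted(MAPPED | ALIAS), n=1)
    raise MissingToken(alias_name, suggestions)


print(choose_token("btn.background.primary", "background.action"))
```

Note that a Global name such as `eco-green/3` is never a valid answer here: asking for it raises `MissingToken` like any other unknown name.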

The two disambiguation tables were the highest-leverage additions. Before they existed, AI regularly confused token pairs that look similar but serve different roles. Afterward, it made zero errors in these categories.

What AI used to do wrong → What the skill enforces
text.secondary for button labels → text.secondary-button
border.secondary for button outlines → border.secondary-button
background.action for secondary button → background.action-secondary
subheadline (Book) for card headings → title (Medium) for card headings
display for page headings → headline 1 = pages; display = marketing heroes only
Desktop spacing applied on mobile → Always check sm breakpoint values explicitly
text.primary for links → text.action
Hardcoded 16px for radius → base radius token
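
The table above can also be read as data. As a rough sketch, the token-for-token rows become a lookup keyed by the token and the context it was used in, and the final cross-check step becomes a scan for hardcoded hex and pixel values. The context labels, the regular expression, and the function names are assumptions for illustration; the skill itself states these rules in prose.

```python
import re

# (token AI used, context) -> token the skill enforces. Context labels are hypothetical.
CORRECTIONS = {
    ("text.secondary", "button label"): "text.secondary-button",
    ("border.secondary", "button outline"): "border.secondary-button",
    ("background.action", "secondary button"): "background.action-secondary",
    ("subheadline", "card heading"): "title",
    ("display", "page heading"): "headline 1",
    ("text.primary", "link"): "text.action",
}

# Hex colors such as #276A43 and pixel literals such as 16px.
HARDCODED = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+(?:\.\d+)?px\b")


def correct(token, context):
    """Return the enforced token for a confusable pair, else the token unchanged."""
    return CORRECTIONS.get((token, context), token)


def hardcoded_values(css):
    """List every hardcoded hex or px value found in a snippet of CSS."""
    return HARDCODED.findall(css)


print(correct("text.primary", "link"))
print(hardcoded_values("color: #276A43; border-radius: 16px;"))
```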
06 — Validation Framework

Not just functional.
Measurably excellent.

Once the skill was built, I needed a rigorous way to evaluate it — not just "does it work" but "how well does it work, and where can it improve." I developed an 8-dimension evaluation framework spanning structural quality and real-world performance, applying it before and after each iteration cycle.

The framework weights dimensions by their impact on output quality: real-world performance carries 25% of the total score, and instruction specificity, workflow clarity, and overall architecture each carry 15%, reflecting how directly these dimensions determine whether AI produces correct output on the first pass.

Overall skill quality score
8 dimensions · weighted rubric · real-world validated
91 /100
Frontmatter quality 8/10 · weight ×8
Workflow clarity 9/10 · weight ×15
Edge case coverage 10/10 · weight ×10
Checkpoint design 7/10 · weight ×7
Instruction specificity 10/10 · weight ×15
Resource integration 10/10 · weight ×5
Overall architecture 9/10 · weight ×15
Real-world performance 9/10 · weight ×25
Key finding: Checkpoint design scored lowest (7/10) — appropriate for a reference skill where checkpoints are embedded as inline guards ("flag it, don't invent") rather than workflow pauses. This is a deliberate design choice, not a gap. Raising it would increase skill verbosity without improving output quality.
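
For readers who want to check the arithmetic, the overall score is a weighted average of the eight dimension scores. The sketch below uses the scores and weights listed above; the function name `overall_score` is only illustrative.

```python
# dimension -> (score out of 10, weight); the weights sum to 100.
RUBRIC = {
    "Frontmatter quality": (8, 8),
    "Workflow clarity": (9, 15),
    "Edge case coverage": (10, 10),
    "Checkpoint design": (7, 7),
    "Instruction specificity": (10, 15),
    "Resource integration": (10, 5),
    "Overall architecture": (9, 15),
    "Real-world performance": (9, 25),
}


def overall_score(rubric):
    """Weighted average of the 0-10 scores, rescaled to a 0-100 score."""
    total_weight = sum(weight for _, weight in rubric.values())
    weighted_sum = sum(score * weight for score, weight in rubric.values())
    return 10 * weighted_sum / total_weight


print(round(overall_score(RUBRIC)))
```

The weighted sum is 908 over a total weight of 100, which gives 90.8 and rounds to the reported 91.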
07 — The HTML Design Guide

One source of truth.
Two surfaces.

The skill solved AI's format problem. But designers, PMs, and engineers still needed a human-readable reference. Rather than maintain two separate sources, I built a parallel surface: an HTML design guide using Claude Design that maps every token directly to its intended use, with visual previews, copy-pasteable CSS variable names, and developer-ready Tailwind equivalents.

The result: when a developer deploys an AI-generated component, they can cross-reference the same token names in the HTML guide to verify intent. AI codes it. Humans verify it. The same token vocabulary runs through both surfaces — no translation required.

For AI
SKILL.md — Machine-readable
Modular instruction set: ordered decision rules, disambiguation tables, edge case guards. AI reads once, applies correctly every time.
Quick Reference: Building a Component
1. Confirm breakpoint
2. Confirm theme
3–7. Load token modules
8. Cross-check: zero hardcoded values
View skill on GitHub →
For Humans
Claude Design — Visual guide
Interactive HTML reference with visual previews, CSS variable names, Tailwind mappings, and developer-ready token documentation.
View design guide →

This dual-surface approach closes a gap that most teams leave open: design systems that are either too abstract for AI or too visual for programmatic use. The token vocabulary stays identical across both — what AI calls text.action, the HTML guide labels and previews with the same name. Designers, AI, and developers share one language.

Link text · text.action · #276A43
Primary button bg · background.action · #276A43
Card background · background.primary · #FFFFFF
Body paragraph · base / Kiva Post Grot Book 16/22
Page section gap · spacing.structure.XL · 32px desktop
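
Keeping one vocabulary across two surfaces is easy to verify mechanically. A minimal sketch, assuming each surface can export its token names as a plain list (the function name `vocabulary_drift` and the sample lists are hypothetical):

```python
def vocabulary_drift(skill_tokens, guide_tokens):
    """Report token names that appear on one surface but not the other."""
    skill, guide = set(skill_tokens), set(guide_tokens)
    return {
        "missing_from_guide": sorted(skill - guide),
        "missing_from_skill": sorted(guide - skill),
    }


# Sample names taken from the table above.
skill = ["text.action", "background.action", "background.primary", "spacing.structure.XL"]
guide = ["text.action", "background.action", "background.primary"]

print(vocabulary_drift(skill, guide))
```

An empty result on both sides would mean the skill and the HTML guide still share the same names.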
08 — Outcomes

What changed

The shift from 5 correction loops to 1–2 wasn't the most significant outcome. The more durable change was qualitative: AI prototyping went from a source of frustration — something that needed constant supervision — to a reliable capability. People started using it more, because they trusted it more.

5→1–2
AI correction loops per session
91/100
skill quality score across 8 dimensions
2
surfaces: AI skill + HTML design guide

The real lesson: AI tools don't fail because they're dumb. They fail because they're given documentation written for humans. Once the interface matched how AI actually processes instructions, the same model that had needed five correction loops got there in one or two. The capability was always there. The missing piece was a format it could actually use.

For designers
  • Prototype directly in the right tokens — no correction overhead
  • Figma Make, Claude, Cursor all use the same skill
  • First-pass output matches production design system
For engineering
  • AI-generated code references real token names
  • HTML guide cross-references the same vocabulary
  • Design-to-dev handoff requires less translation
For the org
  • Methodology independently adopted by engineering into the shared token library
  • Skill architecture replicated across 8 design system sections in kv-tokens
  • Scalable: update the skill once, all AI tools benefit
09 — In Practice

The proof is in
the prototype.

Same AI model. Same design system. Same prompt. The only variable: whether the skill was loaded. Left is what came back before the skill existed — five correction rounds before the tokens matched. Right is what came back after — accurate on the first pass, tokens correct, no corrections needed.

Before the skill
v1 — ~5 correction loops per session
After the skill
v25 — 1–2 loops, accurate first pass
What's next

Let's close
a gap together.

I'm looking for teams where research drives the roadmap and design ships — not just specs.

lin@linzhao.design