Building an AI-Ready Design System
How I turned an AI hallucination problem into an infrastructure initiative: researching the gap, rebuilding the system from the ground up, and walking 100+ teammates through a new way of working.
Starting with research
Before building anything, I needed to understand where the system was failing, and for whom. I ran structured surveys and interviews across four functions to map the pain points directly, then audited our existing design system against what actually shipped in production.
Then I benchmarked how top-tier companies (Atlassian, Shopify, IBM) structured their design tokens and documentation. The pattern was consistent: where they succeeded with AI, they had built semantic meaning into the system itself. Where we were failing, we had given AI nothing meaningful to work with.
Hex codes. Pixel values. No context, no usage rules, no relationships. The AI wasn't hallucinating randomly — it was filling a vacuum we had created.
AI couldn't read our design system
We had design tokens. We had variables in Figma. We even had AI tools in the workflow. But the output was inconsistent — sometimes on-brand, often generic, never reliably correct. Every AI prototype needed heavy manual correction, which defeated the purpose entirely.
- Tokens existed but carried no semantic meaning for machines. AI tools saw hex codes and pixel values — not intent, context, or usage rules.
- AI prototyping produced inconsistent output: button styling off, spacing wrong, colors approximated. Correcting the hallucinations alone could take hundreds of rounds of prompting.
- Engineers manually overrode tokens because the system output wasn't trustworthy. Changing one value had unpredictable ripple effects.
- The capability lived with one person — not scalable, not resilient, not transferable.
The Insight
The problem wasn't the AI tool. It was that the design system wasn't machine-readable. Fix the input → fix the output.
Rebuilding the foundation
A flat token structure — even a well-named one — creates a hidden consumer problem. Engineers writing Tailwind, designers selecting variables, and AI tools reading exported context each need something fundamentally different from a token at the point of use. When those needs are collapsed into a single layer, any change becomes fragile, and AI gets noise instead of signal.
The fix wasn't cleaning up the existing tokens. It was rethinking the architecture to serve three distinct consumers explicitly, with one layer per job. Layer 1 gives engineering a source of truth they can trust and maintain. Layer 2 gives designers and AI semantic meaning: intent, not just value. Layer 3 gives QA provable coverage: every component traces back to a token, with no hardcoding and no exceptions.
Layer 1, global primitives:
spacing/16: 1rem
radius/lg: 16px
Layer 2, alias tokens with descriptions:
// "Primary text color for body content.
// Use on headings and key labels.
// Meets WCAG AA on surface backgrounds."
Layer 3, component mappings:
card/padding → spacing/content/md
What the alias layer looks like inside Figma's Variable panel — each collection maps semantic intent to global primitives across responsive breakpoints.
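To make the three layers concrete, here is a minimal TypeScript sketch of a component token being traced down to its raw value. The token names come from the examples above, but the link from `spacing/content/md` to `spacing/16`, the description text, and the `traceToken` helper are all hypothetical and only illustrate the shape of the chain.

```typescript
// Layer 1: global primitives (raw values). Names follow the examples above.
const primitives: Record<string, string> = {
  "spacing/16": "1rem",
  "radius/lg": "16px",
};

// Layer 2: alias tokens carry intent: a reference plus a readable description.
interface AliasToken {
  ref: string; // name of a Layer 1 primitive
  description: string; // usage guidance exported alongside the value
}

const aliases: Record<string, AliasToken> = {
  // Hypothetical mapping and description, for illustration only.
  "spacing/content/md": {
    ref: "spacing/16",
    description: "Default inner padding for medium containers.",
  },
};

// Layer 3: component tokens point at aliases, never at raw values.
const components: Record<string, string> = {
  "card/padding": "spacing/content/md",
};

// Walk component -> alias -> primitive -> value and return every hop,
// so the whole chain can be audited.
function traceToken(componentToken: string): string[] {
  const aliasName = components[componentToken];
  if (aliasName === undefined) {
    throw new Error(`Unknown component token: ${componentToken}`);
  }
  const alias = aliases[aliasName];
  if (alias === undefined) {
    throw new Error(`Unknown alias token: ${aliasName}`);
  }
  const value = primitives[alias.ref];
  if (value === undefined) {
    throw new Error(`Unknown primitive: ${alias.ref}`);
  }
  return [componentToken, aliasName, alias.ref, value];
}
```

Because each layer only references the layer beneath it, a change to a primitive propagates through one predictable path, and a broken reference fails loudly instead of silently falling back to a hardcoded value.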
Rebuilding the architecture in Figma is one thing. Proving it matches what actually runs in production is another — and that gap is where engineering trust breaks down. I ran a full audit of token consumption across the production codebase before declaring the rebuild complete.
The three-layer architecture is a claim. The table below is the proof: every token traced end-to-end from Figma alias → HTML element → Tailwind class. This is the cross-team contract — the artifact that makes the architecture auditable and the system legible to all three audiences at once.
Every token maps to a precise HTML element and Tailwind class. Designers work with token names, engineers write Tailwind, AI tools read both — same source, zero translation loss across all three audiences.
| Style Token | HTML Element | Tailwind Class | Mobile | Tablet | Desktop | Notes |
|---|---|---|---|---|---|---|
| Display | <h1> | tw-text-display | 36px | 40px | 44px | Marketing hero only. CSS override needed for Contentful. |
| Headline 1 | <h1> | tw-text-headline | 22px | 22px | 26px | Global h1 default via base CSS. |
| Headline 2 | <h2> | tw-text-headline-two | 20px | 20px | 22px | Global h2 default via base CSS. |
| Subheadline | <h3> | tw-text-subheadline | 18px | 18px | 20px | Global h3 default via base CSS. |
| Title | <h4> | tw-text-title | 18px | 18px | 20px | Global h4 default via base CSS. |
| Base / Body | <p>, <body> | tw-text-base | 16px | 16px | 16px | Global paragraph and body default. |
| Button | <button> | tw-text-button-link | 16px | 16px | 16px | Global button default. CTA labels. |
| Label | <label> | tw-text-label | 14px | 14px | 14px | Form labels, filter names. |
| Caption | <figcaption> | tw-text-caption | 14px | 14px | 14px | Disclaimers, captions. |
| Upper | — | tw-text-upper | 14px | 14px | 14px | Utility class only. Always UPPERCASE. |
| Small | <small> | tw-text-small | 14px | 14px | 14px | Supportive UI text. |
| Blockquote | <blockquote> | tw-text-blockquote | 20px | 20px | 22px | Italic Dovetail serif. |
| Link | <a> | tw-text-link | — | — | — | Inherits size from parent element. |
Heading levels h1–h4 only. Do not use <h5> or <h6>. For deeper hierarchy, use tokens as classes: tw-text-title, tw-text-label, tw-text-caption.
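The heading rule above is simple enough to check mechanically. The following is a rough TypeScript sketch of such a check, assuming markup is available as a string; `findDisallowedHeadings` is a hypothetical helper, not part of the system described here, and a regex scan is a heuristic rather than a full HTML parse.

```typescript
// Return every <h5>/<h6> opening tag found in a markup string.
// Closing tags are ignored so each element is reported once.
function findDisallowedHeadings(markup: string): string[] {
  const found: string[] = [];
  const pattern = /<\s*(h[56])\b/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    found.push(match[1].toLowerCase());
  }
  return found;
}
```

For example, a block such as `<p class="tw-text-title">Details</p>` passes, while `<h5>Details</h5>` is reported as `h5`.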
| Alias Token | Global Primitive | Default Value | Tailwind Class | Notes |
|---|---|---|---|---|
| text/primary | eco-green/4 | #223829 | tw-text-primary | Primary content, headings |
| text/secondary | gray/500 | #757575 | tw-text-secondary | Body copy, supporting text |
| text/tertiary | gray/400 | #9E9E9E | tw-text-tertiary | Placeholder, hint text |
| text/action | eco-green/3 | #276A43 | tw-text-action | Links, interactive labels |
| text/caution | marigold/4 | #593207 | tw-text-caution | Warning messages |
| text/danger | desert-rose/4 | #5C2A22 | tw-text-danger | Error messages |
| background/primary | white | #FFFFFF | tw-bg-primary | Page, card backgrounds |
| background/primary-inverse | eco-green/4 | #223829 | tw-bg-primary-inverse | Dark hero, footer, overlays |
| background/secondary | eco-green/1 | #EDF4F1 | tw-bg-secondary | Subtle section fills |
| background/action | eco-green/3 | #276A43 | tw-bg-action | CTA button fills |
| border/primary | gray/600 | #505050 | tw-border-primary | Default stroke |
| border/tertiary | gray/300 | #C4C4C4 | tw-border-tertiary | Dividers, subtle separators |
| border/action | eco-green/3 | #276A43 | tw-border-action | Focus rings, active states |
Theme variants (green-dark, marigold-light, stone-light) override alias values via CSS custom properties on data-theme attributes. Never reference raw hex values in component code — always use alias tokens.
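The "no raw hex values" rule can be approximated with a small scan over component source. This TypeScript sketch is only an illustration: `findRawHexColors` is a hypothetical helper, and because it is a plain pattern match it can also flag things that are not colors, such as an anchor like `#add`.

```typescript
// Find hex color literals (#RGB, #RGBA, #RRGGBB, #RRGGBBAA) in a source string.
// Longer forms are tried first so a six-digit color is not cut short.
function findRawHexColors(source: string): string[] {
  const pattern = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
  return source.match(pattern) ?? [];
}
```

Run against a component file, an empty result means every color went through an alias class such as `tw-text-action`; any hit is a candidate for replacement with its alias token.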
| Alias Token | Global Primitive | Desktop | Tablet | Mobile | Use Case |
|---|---|---|---|---|---|
| spacing / structure | | | | | |
| spacing/structure/XL | scale/4 | 32px | 32px | 24px | Between major page sections |
| spacing/structure/L | scale/3 | 24px | 24px | 24px | Between large components |
| spacing/structure/M | scale/2 | 16px | 16px | 16px | Between components |
| spacing/structure/S | scale/1 | 8px | 8px | 8px | Between header and subheader |
| spacing / component / gap | | | | | |
| spacing/component/gap/L | scale/2 | 16px | 16px | 16px | Between cards in a grid |
| spacing/component/gap/M | scale/1 | 8px | 8px | 8px | Content & button inside a card |
| spacing/component/gap/S | scale/0-5 | 4px | 4px | 4px | Tags, compact contexts |
| spacing / component / inset | | | | | |
| spacing/component/inset/XL | scale/4 | 32px | 24px | 20px | Modals, lightboxes, sidesheets |
| spacing/component/inset/L | scale/3 | 24px | 24px | 20px | Large widgets, expanded panels |
| spacing/component/inset/M | scale/2 | 16px | 16px | 16px | Medium cards, standard containers |
| spacing/component/inset/S | scale/1 | 8px | 8px | 8px | Small cards, compact containers |
| spacing/component/inset/XS | scale/0-5 | 4px | 4px | 4px | Location chips, tags, badges |
| spacing / micro | | | | | |
| spacing/micro | scale/0-5 | 4px | 4px | 4px | Icon + text pairs, inline elements |
These are alias tokens — they reference global scale values and carry semantic intent. Use the alias token name in code, never the raw scale value. Responsive variants collapse at breakpoints below xl (1440px).
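As a sketch of how a consumer might read the spacing table, the TypeScript below looks up an alias token at a given breakpoint. The values are a subset copied from the table; the `Breakpoint` type and `resolveSpacing` function are hypothetical names used only for illustration.

```typescript
type Breakpoint = "desktop" | "tablet" | "mobile";

// Pixel values per breakpoint, copied from the spacing table (subset).
const spacing: Record<string, Record<Breakpoint, number>> = {
  "spacing/structure/XL": { desktop: 32, tablet: 32, mobile: 24 },
  "spacing/component/inset/XL": { desktop: 32, tablet: 24, mobile: 20 },
  "spacing/component/inset/M": { desktop: 16, tablet: 16, mobile: 16 },
  "spacing/micro": { desktop: 4, tablet: 4, mobile: 4 },
};

// Resolve an alias token to a CSS length for one breakpoint.
// Unknown tokens throw, so a typo never falls back to a raw value.
function resolveSpacing(token: string, breakpoint: Breakpoint): string {
  const entry = spacing[token];
  if (entry === undefined) {
    throw new Error(`Unknown spacing token: ${token}`);
  }
  return `${entry[breakpoint]}px`;
}
```

Here `resolveSpacing("spacing/component/inset/XL", "mobile")` yields `20px`, while the same token on desktop yields `32px`, which is the responsive collapse the table describes.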
| Alias Token | HTML | Desktop xl | Desktop lg | Tablet md | Mobile sm | Mobile xs |
|---|---|---|---|---|---|---|
| layout/breakpoint | @media | 1440px | 1280px | 768px | 480px | 0px |
| layout/grid-columns | .container | 12 | 12 | 12 | 4 | 4 |
| layout/grid-margin | .container | 88px | 64px | 40px | 24px | 16px |
| layout/grid-gutter | .grid | 24px | 24px | 16px | 16px | 16px |
| layout/content-max-width | .container | 1264px | 1152px | 688px | 432px | — |
Layout tokens are applied as CSS custom properties on layout containers — no direct Tailwind utility maps to them. Reference via var(--layout-grid-margin) etc. in component CSS. Grid column count drives the 12-column system used across all Kiva page templates.
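One way to picture layout tokens as CSS custom properties is to generate the declarations from the table. The TypeScript sketch below does that for grid columns and margins. Several things here are assumptions: the mobile-first `min-width` media queries, the `--layout-grid-columns` name (modeled on `--layout-grid-margin`), and the `layoutCss` helper itself.

```typescript
interface LayoutStep {
  minWidth: number; // layout/breakpoint, in px
  gridColumns: number; // layout/grid-columns
  gridMargin: number; // layout/grid-margin, in px
}

// Values copied from the layout table, smallest breakpoint first.
const layoutSteps: LayoutStep[] = [
  { minWidth: 0, gridColumns: 4, gridMargin: 16 },
  { minWidth: 480, gridColumns: 4, gridMargin: 24 },
  { minWidth: 768, gridColumns: 12, gridMargin: 40 },
  { minWidth: 1280, gridColumns: 12, gridMargin: 64 },
  { minWidth: 1440, gridColumns: 12, gridMargin: 88 },
];

// Emit one rule per breakpoint; the 0px step becomes the unwrapped base rule.
function layoutCss(steps: LayoutStep[], selector = ".container"): string {
  return [...steps]
    .sort((a, b) => a.minWidth - b.minWidth)
    .map((step) => {
      const rule =
        `${selector} { --layout-grid-columns: ${step.gridColumns}; ` +
        `--layout-grid-margin: ${step.gridMargin}px; }`;
      return step.minWidth === 0
        ? rule
        : `@media (min-width: ${step.minWidth}px) { ${rule} }`;
    })
    .join("\n");
}
```

Component CSS can then read `var(--layout-grid-margin)` without knowing which breakpoint is active.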
Sample structure documentation — type scales, spacing system, layout grids, and color accessibility — documented as machine-readable artifacts.
Teaching AI to follow design rules
Chapter 3 solved the structure problem. This chapter solves a different one: making the system machine-readable.
A perfectly layered token architecture with no semantic annotations is still noise to an AI tool: a well-organized vocabulary with no grammar. The gap between "AI uses our tokens" and "AI uses our tokens correctly" comes down to one thing: whether the token communicates intent, not just value. That's what this chapter is about.
Token Descriptions — The Semantic Layer
Most teams export variables to AI and assume it will figure out the rest. That assumption is the root cause of the hallucination problem. AI doesn't infer design intent from structure alone — it needs the same contextual knowledge a junior designer has after a week of onboarding: when to use this token, where it belongs, what constraint it carries. Without that, it guesses. With it, it infers correctly the first time.
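A minimal sketch of what "exporting intent, not just value" could look like in TypeScript follows. The `DescribedToken` shape and both helpers are hypothetical, and pairing the sample description with `text/primary` is an assumption made for the example.

```typescript
interface DescribedToken {
  name: string;
  value: string;
  tailwindClass: string;
  description: string; // when to use it, where it belongs, what constraint it carries
}

// Sample token. Pairing this description with text/primary is an assumption.
const sampleTokens: DescribedToken[] = [
  {
    name: "text/primary",
    value: "#223829",
    tailwindClass: "tw-text-primary",
    description:
      "Primary text color for body content. Use on headings and key labels. " +
      "Meets WCAG AA on surface backgrounds.",
  },
];

// Render tokens as plain lines an AI tool can read as context.
function toAiContext(tokens: DescribedToken[]): string {
  return tokens
    .map((t) => `${t.name} (${t.tailwindClass}): ${t.description}`)
    .join("\n");
}

// List tokens that would reach the AI with no usage guidance at all.
function missingDescriptions(tokens: DescribedToken[]): string[] {
  return tokens.filter((t) => t.description.trim() === "").map((t) => t.name);
}
```

A check like `missingDescriptions` is useful before export: any token it returns is one the AI would have to guess at.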
A project-level rules file committed to the Kiva repository. Any designer or engineer using AI coding tools automatically works within design system constraints — without specifying it in every prompt.
Most designers stop at personal workflow. This embeds constraints at the system level — the guardrails are inherited, not requested.
design_system: kiva-ds-2026
tokens: ./tokens/alias.json
constraints:
- Use semantic tokens, never raw values
- Follow spacing scale (4px base)
- All text must meet WCAG AA contrast
- Responsive breakpoints: 480 / 768 / 1280 / 1440
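Of the constraints in the rules file, the WCAG AA contrast rule is the easiest to verify in code. The sketch below implements the standard WCAG relative luminance and contrast ratio formulas in TypeScript. The function names are illustrative, and it assumes six-digit hex input only.

```typescript
// Convert one 8-bit sRGB channel to its linear-light value.
function channelToLinear(channel8Bit: number): number {
  const c = channel8Bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of a six-digit hex color such as "#223829".
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors, from 1 (identical) up to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA requires at least 4.5:1 for normal text and 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

For instance, `meetsAA("#223829", "#FFFFFF")` checks the text/primary default against the background/primary default from the color table.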
Making It Usable
A well-designed system only produces reliable output if people know how to use it correctly. The token descriptions and rules file solve the data problem — they don't solve the workflow problem. If designers don't know how to export variables, how to prompt against the system, or how to validate what AI returns, the system provides no behavioral guarantee. These four deliverables are the bridge between system quality and workflow practice.
The Documentation
Documentation is where a design system's intelligence becomes transferable. The token descriptions, rules file, and workflow guides built in this chapter are only useful if they're backed by documentation thorough enough that someone — or something — can follow them independently.
Each of these four guidelines is written at the intersection of three audiences. For designers: the implementation rules and hierarchy rationale that prevent guesswork. For engineers: the code references and constraint context that eliminate manual overrides. For AI tools: the structured, parseable surface that closes the hallucination gap at source — not at the prompt layer, but before a single query is made.
See the difference
We took a static impact dashboard and had AI rebuild it as an interactive prototype using our design tokens. Same AI tool, same prompt structure — the only variable was the quality of design system input.
Even with all variables exported, AI doesn't nail everything — still about 10–15% off. That's why we built the documentation.
Scaling beyond one person
There's a specific failure mode in infrastructure work that looks like success: the system is well-built, people are using it, but all the tacit knowledge — the edge case judgment, the context behind decisions, the "ask Lin" reflex — still lives with one person. It scales as long as that person is reachable. The moment they're not, the system degrades.
The measure of a truly sustainable system isn't adoption. It's independence under normal conditions: designers building correctly without asking, engineers making token decisions with confidence, new team members onboarding from documentation rather than institutional memory. That requires deliberate effort to move knowledge out of your head and into the system itself — before you think you need to.
AI readiness as a practice
Chapter 6 is about people — making the system usable and resilient to human turnover. This chapter is about something different: making the system resilient to the AI tools themselves changing underneath it.
A design system's AI readiness isn't a state you reach and maintain. It has an expiration date tied to model updates you don't control. A token description that steered a model correctly in Q1 may produce different output in Q3 — not because the documentation changed, but because the model's interpretation of it did. New fine-tuning shifts how context is weighted. New tool interfaces change what gets passed to the model and what gets stripped. New hallucination patterns emerge from new training distributions. This means AI accuracy is a metric that requires measurement, not assumption.
The maintenance cycle below isn't process for its own sake. It's the mechanism that keeps a point-in-time build accurate over time — and the cadence that makes improvement legible to stakeholders beyond the design team.
Validated across three tools — Figma Make, Claude Code, and Google Stitch. Each showed measurable fidelity improvement after each review cycle. The target was never zero hallucination — that's not achievable with current technology, and claiming otherwise would undermine the credibility of everything else. The target was a shrinking gap, tracked visibly, quarter over quarter. That's now a standing practice.
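Tracking "a shrinking gap, quarter over quarter" needs some agreed definition of the gap. The TypeScript below is one hypothetical way to frame it, not the scoring actually used in these reviews: fidelity is the share of checked properties that matched the design system, and the gap is whatever remains.

```typescript
// One review cycle for one tool. All field names are hypothetical.
interface ReviewResult {
  tool: string;
  quarter: string;
  checked: number; // properties inspected in the AI output
  correct: number; // properties that matched the design system
}

// Share of checked properties that were correct, between 0 and 1.
function fidelity(result: ReviewResult): number {
  return result.checked === 0 ? 0 : result.correct / result.checked;
}

// The remaining gap for a review cycle.
function gap(result: ReviewResult): number {
  return 1 - fidelity(result);
}

// True when every cycle has a strictly smaller gap than the one before it.
// Expects the history for a single tool, in chronological order.
function gapIsShrinking(history: ReviewResult[]): boolean {
  for (let i = 1; i < history.length; i++) {
    if (gap(history[i]) >= gap(history[i - 1])) {
      return false;
    }
  }
  return true;
}
```

Keeping the history per tool matters because a model update can move one tool's gap without affecting the others.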
Impact
Reflection
"An AI-ready design system isn't a project you finish. It's a standard you maintain. Every quarter, the gap gets smaller — and that's the whole job."
Let's connect.
Looking for teams where design infrastructure is treated as a product — not a support function.
lin@linzhao.design