Kiva · Principal Designer × AI Systems Architect Lead · 2025–2026

Building an AI-Ready Design System

How I turned an AI hallucination problem into an infrastructure initiative — researching the gap, rebuilding the system from the ground up, and walking 100+ teammates through a new way of working.

Role: Principal Designer × AI Systems Architect
Timeline: Q3 2025 – Q2 2026
Team: Engineering, Design, Stakeholders
Chapter 01

Starting with research

Before building anything, I needed to understand where the system was failing — and for whom. I ran structured surveys and interviews across four functions to map the pain points directly, then audited our existing design system against what actually shipped in production.

10
Designers
Spending 40%+ of prototyping time correcting AI output. Key complaint: AI approximated design system values instead of using them — wrong spacing, wrong font weights, wrong color tokens. Every prototype needed a full manual pass before it was usable.
9
Product Managers
AI-generated prototypes didn't look like the product they knew — delaying stakeholder sign-off. They couldn't tell whether a prototype was intentionally different or just inaccurate. Trust in AI output was low across the board.
30+
Engineers
AI-generated code diverged from the token system, requiring manual overrides on every build. Tokens were defined in Figma, but engineers didn't trust them enough to use them — so they stopped, and hardcoded values crept in instead.
+
Marketing & Ops
Couldn't self-serve design-consistent assets without routing through a designer. Every ad, email, and one-pager required human intervention — not because the work was complex, but because the system wasn't accessible enough to use independently.

Then I benchmarked how top-tier companies — Atlassian, Shopify, IBM — structured their design tokens and documentation. The pattern was consistent: where they succeeded with AI, they had built semantic meaning into the system itself. Where we were failing, we had given AI nothing meaningful to work with.

Hex codes. Pixel values. No context, no usage rules, no relationships. The AI wasn't hallucinating randomly — it was filling a vacuum we had created.

Chapter 02

AI couldn't read our design system

We had design tokens. We had variables in Figma. We even had AI tools in the workflow. But the output was inconsistent — sometimes on-brand, often generic, never reliably correct. Every AI prototype needed heavy manual correction, which defeated the purpose entirely.

  • Tokens existed but carried no semantic meaning for machines. AI tools saw hex codes and pixel values — not intent, context, or usage rules.
  • AI prototyping produced inconsistent output. Hallucinations alone could take hundreds of rounds of prompting to fix. Button styling off, spacing wrong, colors approximated.
  • Engineers manually overrode tokens because the system output wasn't trustworthy. Changing one value had unpredictable ripple effects.
  • The capability lived with one person — not scalable, not resilient, not transferable.

The Insight

The problem wasn't the AI tool. It was that the design system wasn't machine-readable. Fix the input → fix the output.
Chapter 03

Rebuilding the foundation

A flat token structure — even a well-named one — creates a hidden consumer problem. Engineers writing Tailwind, designers selecting variables, and AI tools reading exported context each need something fundamentally different from a token at the point of use. When those needs are collapsed into a single layer, any change becomes fragile, and AI gets noise instead of signal.

The fix wasn't cleaning up the existing tokens. It was rethinking the architecture to serve three distinct consumers explicitly. Three layers because three different jobs: Layer 1 gives engineering a source of truth they can trust and maintain. Layer 2 gives designers and AI semantic meaning — intent, not just value. Layer 3 gives QA provable coverage: every component traces back to a token, no hardcoding, no exceptions.

Layer 1
Global — Primitive Layer
Raw values pulled from existing JSON primitive files that engineering maintains. Covers typography, spacing, color, layout, radius, elevation, and border. Naming conventions reuse existing tokens, so the layer maps directly into the Tailwind config with no translation needed.
color/green/500: #5B8F7B
spacing/16: 1rem
radius/lg: 16px
The new part
Layer 2
Alias — Semantic Layer
Ties values back to Global, now with detailed meaning. Every complex alias variable has a description: when to use it, where it goes, how it should behave. When exported to AI, the AI picks up all that context immediately — you describe intent in natural language, and it follows the design rules accurately.
color/text/primary → color/green/900
// "Primary text color for body content.
// Use on headings and key labels.
// Meets WCAG AA on surface backgrounds."
Layer 3
Mapped — Component Layer
Tokens applied to specific components. Connects everything end-to-end: change a global value → aliases update → components update → no manual override needed anywhere. The system becomes trustworthy by design.
button/primary/bg → color/action/primary
card/padding → spacing/content/md
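
The end-to-end chain can be sketched as a small reference resolver. This is a minimal illustration, not the production pipeline: the global values and the two component mappings come from the examples above, while the two alias targets and the `resolve` helper are assumptions made for the sketch.

```python
# Layer 1: global primitives (values taken from the examples above).
GLOBAL = {
    "color/green/500": "#5B8F7B",
    "spacing/16": "1rem",
    "radius/lg": "16px",
}

# Layer 2: aliases point at globals. Both targets are hypothetical.
ALIAS = {
    "color/action/primary": "color/green/500",
    "spacing/content/md": "spacing/16",
}

# Layer 3: component tokens point at aliases (pairs from the examples above).
MAPPED = {
    "button/primary/bg": "color/action/primary",
    "card/padding": "spacing/content/md",
}


def resolve(name):
    """Follow references Mapped -> Alias -> Global until a raw value is reached."""
    seen = []
    while name not in GLOBAL:
        if name in seen:
            raise ValueError("circular reference: " + " -> ".join(seen + [name]))
        seen.append(name)
        if name in MAPPED:
            name = MAPPED[name]
        elif name in ALIAS:
            name = ALIAS[name]
        else:
            raise KeyError(f"unknown token: {name}")
    return GLOBAL[name]


print(resolve("button/primary/bg"))     # #5B8F7B
GLOBAL["color/green/500"] = "#4F8270"   # hypothetical change to one primitive
print(resolve("button/primary/bg"))     # #4F8270
```

Because components never store a raw value, the single edit to the primitive is the only change needed; everything downstream picks it up on the next resolve.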

Rebuilding the architecture in Figma is one thing. Proving it matches what actually runs in production is another — and that gap is where engineering trust breaks down. I ran a full audit of token consumption across the production codebase before declaring the rebuild complete.

CSS Audit: Before
36+ silent bugs found: tokens defined in Figma but never applied in code, overrides masking system values, deprecated tokens still referenced in production. Engineers had stopped trusting the system and were hardcoding values directly. The design system and the product had quietly diverged.
CSS Audit: After
Every discrepancy fixed systematically. For the first time, what was in Figma matched what shipped. That alignment is the precondition for everything else. Token descriptions, AI rules, and documentation all require the structural layer to be trustworthy first.
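
The three bug classes from the audit can be approximated with a short script. This is a rough sketch, assuming tokens reach code as CSS custom properties; the property names, the sample CSS, and the `audit_css` helper are all hypothetical, and the hex pattern is a heuristic that can also match ID selectors made only of hex letters.

```python
import re

HEX = re.compile(r"#[0-9a-fA-F]{3,8}\b")   # raw hex colors (heuristic)
VAR_REF = re.compile(r"var\(\s*(--[\w-]+)")  # var(--token-name) references


def audit_css(css, defined, deprecated):
    """Report hardcoded values and mismatches between defined and used tokens."""
    used = set(VAR_REF.findall(css))
    return {
        "hardcoded_hex": sorted(set(HEX.findall(css))),
        "defined_but_unused": sorted(defined - deprecated - used),
        "deprecated_in_use": sorted(used & deprecated),
        "undefined_reference": sorted(used - defined),
    }


# Hypothetical component CSS and token inventory.
sample = """
.btn { background: var(--color-action-primary); color: #ffffff; }
.card { border-color: var(--color-border-old); }
"""
report = audit_css(
    sample,
    defined={"--color-action-primary", "--color-border-old", "--color-text-primary"},
    deprecated={"--color-border-old"},
)
for check, findings in report.items():
    print(check, findings)
```

Run against component CSS only; token definition files legitimately contain raw hex values and would show up as false positives.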

The three-layer architecture is a claim. The table below is the proof: every token traced end-to-end from Figma alias → HTML element → Tailwind class. This is the cross-team contract — the artifact that makes the architecture auditable and the system legible to all three audiences at once.

</>  Cross-Team Contract

Every token maps to a precise HTML element and Tailwind class. Designers work with token names, engineers write Tailwind, AI tools read both — same source, zero translation loss across all three audiences.

| Style Token | HTML Element | Tailwind Class | Mobile | Tablet | Desktop | Notes |
|---|---|---|---|---|---|---|
| Display | <h1> | tw-text-display | 36px | 40px | 44px | Marketing hero only. CSS override needed for Contentful. |
| Headline 1 | <h1> | tw-text-headline | 22px | 22px | 26px | Global h1 default via base CSS. |
| Headline 2 | <h2> | tw-text-headline-two | 20px | 20px | 22px | Global h2 default via base CSS. |
| Subheadline | <h3> | tw-text-subheadline | 18px | 18px | 20px | Global h3 default via base CSS. |
| Title | <h4> | tw-text-title | 18px | 18px | 20px | Global h4 default via base CSS. |
| Base / Body | <p>, <body> | tw-text-base | 16px | 16px | 16px | Global paragraph and body default. |
| Button | <button> | tw-text-button-link | 16px | 16px | 16px | Global button default. CTA labels. |
| Label | <label> | tw-text-label | 14px | 14px | 14px | Form labels, filter names. |
| Caption | <figcaption> | tw-text-caption | 14px | 14px | 14px | Disclaimers, captions. |
| Upper | (none) | tw-text-upper | 14px | 14px | 14px | Utility class only. Always UPPERCASE. |
| Small | <small> | tw-text-small | 14px | 14px | 14px | Supportive UI text. |
| Blockquote | <blockquote> | tw-text-blockquote | 20px | 20px | 22px | Italic Dovetail serif. |
| Link | <a> | tw-text-link | | | | Inherits size from parent element. |

Heading levels h1–h4 only. Do not use <h5> or <h6>. For deeper hierarchy, use tokens as classes: tw-text-title, tw-text-label, tw-text-caption.
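
A rule like this is easy to enforce mechanically. The sketch below is one possible lint, not part of the shipped tooling: it scans markup for `<h5>` and `<h6>` opening tags so they can be replaced with a token class. The `find_forbidden_headings` helper and the sample markup are hypothetical.

```python
import re

# Matches opening <h5> / <h6> tags only; closing tags start with "</".
FORBIDDEN_HEADING = re.compile(r"<\s*(h[56])\b", re.IGNORECASE)


def find_forbidden_headings(html):
    """Return (line number, tag) for every <h5>/<h6> opening tag."""
    hits = []
    for number, line in enumerate(html.splitlines(), start=1):
        for match in FORBIDDEN_HEADING.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits


markup = (
    '<h2 class="tw-text-headline-two">Loans</h2>\n'
    "<h5>Fees</h5>\n"
    '<p class="tw-text-title">Fees</p>'
)
print(find_forbidden_headings(markup))  # [(2, 'h5')]
```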

| Alias Token | Global Primitive | Default Value | Tailwind Class | Notes |
|---|---|---|---|---|
| text/primary | eco-green/4 | #223829 | tw-text-primary | Primary content, headings |
| text/secondary | gray/500 | #757575 | tw-text-secondary | Body copy, supporting text |
| text/tertiary | gray/400 | #9E9E9E | tw-text-tertiary | Placeholder, hint text |
| text/action | eco-green/3 | #276A43 | tw-text-action | Links, interactive labels |
| text/caution | marigold/4 | #593207 | tw-text-caution | Warning messages |
| text/danger | desert-rose/4 | #5C2A22 | tw-text-danger | Error messages |
| background/primary | white | #FFFFFF | tw-bg-primary | Page, card backgrounds |
| background/primary-inverse | eco-green/4 | #223829 | tw-bg-primary-inverse | Dark hero, footer, overlays |
| background/secondary | eco-green/1 | #EDF4F1 | tw-bg-secondary | Subtle section fills |
| background/action | eco-green/3 | #276A43 | tw-bg-action | CTA button fills |
| border/primary | gray/600 | #505050 | tw-border-primary | Default stroke |
| border/tertiary | gray/300 | #C4C4C4 | tw-border-tertiary | Dividers, subtle separators |
| border/action | eco-green/3 | #276A43 | tw-border-action | Focus rings, active states |

Theme variants (green-dark, marigold-light, stone-light) override alias values via CSS custom properties on data-theme attributes. Never reference raw hex values in component code — always use alias tokens.
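
The override mechanism can be sketched as a small generator that emits one `:root` block of defaults and one `[data-theme]` block per variant. This is an illustration under assumptions: the default values come from the table above, but the custom property naming (`--text-primary`), the override values for `green-dark`, and the helper names are hypothetical.

```python
# Default alias values (from the table above).
DEFAULTS = {
    "text/primary": "#223829",
    "background/primary": "#FFFFFF",
    "border/primary": "#505050",
}

# Per-theme overrides. These specific values are hypothetical.
THEMES = {
    "green-dark": {"text/primary": "#FFFFFF", "background/primary": "#223829"},
}


def css_var(token):
    """Assumed naming scheme: text/primary -> --text-primary."""
    return "--" + token.replace("/", "-")


def block(selector, values):
    body = "\n".join(f"  {css_var(name)}: {value};" for name, value in values.items())
    return f"{selector} {{\n{body}\n}}"


def build_theme_css(defaults, themes):
    """Emit defaults on :root and overrides on [data-theme="..."] selectors."""
    unknown = {t for overrides in themes.values() for t in overrides} - set(defaults)
    if unknown:
        raise ValueError(f"themes override undefined alias tokens: {sorted(unknown)}")
    blocks = [block(":root", defaults)]
    blocks += [block(f'[data-theme="{name}"]', o) for name, o in themes.items()]
    return "\n\n".join(blocks)


css = build_theme_css(DEFAULTS, THEMES)
print(css)
```

Component code then only ever reads `var(--text-primary)`; which hex value it resolves to depends on the nearest `data-theme` ancestor.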

| Alias Token | Global maps to | Desktop | Tablet | Mobile | Use Case |
|---|---|---|---|---|---|
| spacing / structure | | | | | |
| spacing/structure/XL | scale/4 | 32px | 32px | 24px | Between major page sections |
| spacing/structure/L | scale/3 | 24px | 24px | 24px | Between large components |
| spacing/structure/M | scale/2 | 16px | 16px | 16px | Between components |
| spacing/structure/S | scale/1 | 8px | 8px | 8px | Between header and subheader |
| spacing / component / gap | | | | | |
| spacing/component/gap/L | scale/2 | 16px | 16px | 16px | Between cards in a grid |
| spacing/component/gap/M | scale/1 | 8px | 8px | 8px | Content & button inside a card |
| spacing/component/gap/S | scale/0-5 | 4px | 4px | 4px | Tags, compact contexts |
| spacing / component / inset | | | | | |
| spacing/component/inset/XL | scale/4 | 32px | 24px | 20px | Modals, lightboxes, sidesheets |
| spacing/component/inset/L | scale/3 | 24px | 24px | 20px | Large widgets, expanded panels |
| spacing/component/inset/M | scale/2 | 16px | 16px | 16px | Medium cards, standard containers |
| spacing/component/inset/S | scale/1 | 8px | 8px | 8px | Small cards, compact containers |
| spacing/component/inset/XS | scale/0-5 | 4px | 4px | 4px | Location chips, tags, badges |
| spacing / micro | | | | | |
| spacing/micro | scale/0-5 | 4px | 4px | 4px | Icon + text pairs, inline elements |

These are alias tokens — they reference global scale values and carry semantic intent. Use the alias token name in code, never the raw scale value. Responsive variants collapse at breakpoints below xl (1440px).

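| Alias Token | HTML | Desktop xl | Desktop lg | Tablet md | Mobile sm | Mobile xs |
|---|---|---|---|---|---|---|
| layout/breakpoint | @media | 1440px | 1280px | 768px | 480px | 0px |
| layout/grid-columns | .container | 12 | 12 | 12 | 4 | 4 |
| layout/grid-margin | .container | 88px | 64px | 40px | 24px | 16px |
| layout/grid-gutter | .grid | 24px | 24px | 16px | 16px | 16px |
| layout/content-max-width | .container | 1264px | 1152px | 688px | 432px | |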

Layout tokens are applied as CSS custom properties on layout containers — no direct Tailwind utility maps to them. Reference via var(--layout-grid-margin) etc. in component CSS. Grid column count drives the 12-column system used across all Kiva page templates.
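
One relationship in the layout table can be checked with arithmetic: at each breakpoint that lists a content max width, the value equals the breakpoint width minus two grid margins (for example, 1440 - 2 × 88 = 1264). The sketch below encodes the table and verifies that. Treating breakpoints as minimum viewport widths is an assumption, and the helper names are hypothetical.

```python
from typing import Optional

# name: (min viewport width, columns, margin, gutter), px values from the table.
LAYOUT = {
    "xl": (1440, 12, 88, 24),
    "lg": (1280, 12, 64, 24),
    "md": (768, 12, 40, 16),
    "sm": (480, 4, 24, 16),
    "xs": (0, 4, 16, 16),
}


def breakpoint_for(width: int) -> str:
    """Pick the largest breakpoint whose minimum width fits the viewport."""
    for name, (min_width, _cols, _margin, _gutter) in LAYOUT.items():
        if width >= min_width:
            return name
    raise ValueError("width must be non-negative")


def content_max_width(name: str) -> Optional[int]:
    """Breakpoint width minus both margins; None for xs, which lists no maximum."""
    min_width, _cols, margin, _gutter = LAYOUT[name]
    if min_width == 0:
        return None
    return min_width - 2 * margin


for name in LAYOUT:
    print(name, content_max_width(name))
print(breakpoint_for(1000))  # md
```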

Chapter 04

Teaching AI to follow design rules

Chapter 3 solved the structure problem. This chapter solves a different one: making the system machine-readable.

A perfectly layered token architecture with no semantic annotations is still noise to an AI tool: a well-organized vocabulary with no grammar. The gap between "AI uses our tokens" and "AI uses our tokens correctly" comes down to one thing: whether the token communicates intent, not just value. That's what this chapter is about.

Token Descriptions — The Semantic Layer

Most teams export variables to AI and assume it will figure out the rest. That assumption is the root cause of the hallucination problem. AI doesn't infer design intent from structure alone; it needs the same contextual knowledge a junior designer has after a week of onboarding: when to use this token, where it belongs, what constraint it carries. Without that, it guesses. With it, the first pass lands much closer to what the system intends.

Without descriptions
name: color-action-primary
value: #2E5245
AI sees a hex code. Guesses where to apply it.
With descriptions
name: color-action-primary
value: #2E5245
description:
"Primary action color. Use for CTAs and
interactive elements. Must meet AA on
surface. Never use for decorative fills."
AI understands intent. Applies it correctly.
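
A description only helps if it actually travels with the token. As a minimal sketch, assuming tokens are available as plain records, the export step can refuse to treat undescribed tokens as AI-ready. The first record mirrors the example above; the second record's empty description and both helper names are hypothetical.

```python
import json

tokens = [
    {
        "name": "color-action-primary",
        "value": "#2E5245",
        "description": (
            "Primary action color. Use for CTAs and interactive elements. "
            "Must meet AA on surface. Never use for decorative fills."
        ),
    },
    # Hypothetical token that has not been documented yet.
    {"name": "background-secondary", "value": "#EDF4F1", "description": ""},
]


def missing_descriptions(tokens):
    """Names of tokens an AI tool would have to guess about."""
    return [t["name"] for t in tokens if not t.get("description", "").strip()]


def to_ai_context(tokens):
    """Serialize only described tokens as JSON for an AI tool's context."""
    described = [t for t in tokens if t.get("description", "").strip()]
    return json.dumps(described, indent=2)


print(missing_descriptions(tokens))  # ['background-secondary']
print(to_ai_context(tokens))
```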
</>  Strongest Differentiator
AI Rules File in the Codebase

A project-level rules file committed to the Kiva repository. Any designer or engineer using AI coding tools automatically works within design system constraints — without specifying it in every prompt.

Most designers stop at personal workflow. This embeds constraints at the system level — the guardrails are inherited, not requested.

# .ai-rules (committed to repo)
design_system: kiva-ds-2026
tokens: ./tokens/alias.json
constraints:
  - Use semantic tokens, never raw values
  - Follow spacing scale (4px base)
  - All text must meet WCAG AA contrast
  - Responsive breakpoints: 480 / 768 / 1280 / 1440
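
Two of these constraints are mechanically checkable. The sketch below uses the standard WCAG 2.x relative luminance and contrast ratio formulas (AA requires 4.5:1 for normal text and 3:1 for large text) plus a simple multiple-of-base test for spacing. The helper names are hypothetical, and this is an illustration rather than the check the rules file itself runs.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg, bg, large_text=False):
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


def on_spacing_scale(px, base=4):
    """True when a spacing value is a positive multiple of the base unit."""
    return px > 0 and px % base == 0


# text/primary on background/primary, values from the color table.
print(meets_aa("#223829", "#FFFFFF"))                # True
print(on_spacing_scale(16), on_spacing_scale(18))   # True False
```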

Making It Usable

A well-designed system only produces reliable output if people know how to use it correctly. The token descriptions and rules file solve the data problem — they don't solve the workflow problem. If designers don't know how to export variables, how to prompt against the system, or how to validate what AI returns, the system provides no behavioral guarantee. These four deliverables are the bridge between system quality and workflow practice.

01
Variable Export Workflow
How to export Figma variables to AI tools correctly. Includes which formats work, what gets lost in translation, and how to structure prompts around the variable system.
02
Prompt Guide
Step-by-step AI prototyping process for designers, not engineers. Describe what you want in natural language — don't tweak pixel by pixel. The AI follows design intent accurately.
03
Automated Design QA
A vibe-coded Figma plugin: open the file and it auto-audits every frame against design system guidelines and applies corrections — no checklist, no manual process. Paired with a SKILL.md that gives AI coding tools the same constraints, so AI-generated prototypes self-correct before review.
04
Multi-Audience Documentation
Most design system documentation is written for one reader. This is written for three: concise token-to-code reference for engineers, visual usage examples with rationale for designers, and structured rule sets that give AI tools parseable context to follow guidelines without human correction. Same system, different entry points.
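
The core of an automated QA pass is small: compare each measured value with the token set and propose the closest token when it is off. The sketch below shows that idea on plain data. It does not use the Figma plugin API; the frame names and helper names are hypothetical, and the token values are the desktop spacing values from the table in Chapter 03.

```python
# Desktop spacing values (px) from the spacing table.
SPACING_TOKENS = {
    "spacing/micro": 4,
    "spacing/structure/S": 8,
    "spacing/structure/M": 16,
    "spacing/structure/L": 24,
    "spacing/structure/XL": 32,
}


def nearest_token(px):
    """Closest token by absolute distance; ties go to the smaller value."""
    return min(SPACING_TOKENS.items(), key=lambda item: (abs(item[1] - px), item[1]))


def audit_frames(frames):
    """frames maps a frame name to a measured gap in px. Returns off-scale findings."""
    findings = []
    for frame, px in frames.items():
        name, value = nearest_token(px)
        if value != px:
            findings.append({"frame": frame, "found": px, "suggest": name, "value": value})
    return findings


print(audit_frames({"hero": 24, "card-grid": 18}))
# [{'frame': 'card-grid', 'found': 18, 'suggest': 'spacing/structure/M', 'value': 16}]
```

Snapping to the nearest token is a design choice with a known weakness: a value halfway between two tokens is ambiguous, so a real audit would likely flag those for human review instead of correcting them silently.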

The Documentation

Documentation is where a design system's intelligence becomes transferable. The token descriptions, rules file, and workflow guides built in this chapter are only useful if they're backed by documentation thorough enough that someone — or something — can follow them independently.

Each of these four guidelines is written at the intersection of three audiences. For designers: the implementation rules and hierarchy rationale that prevent guesswork. For engineers: the code references and constraint context that eliminate manual overrides. For AI tools: the structured, parseable surface that closes the hallucination gap at source — not at the prompt layer, but before a single query is made.

Chapter 05

See the difference

We took a static impact dashboard and had AI rebuild it as an interactive prototype using our design tokens. Same AI tool, same prompt structure — the only variable was the quality of design system input.

Before
Without Semantic Tokens
AI output with exported variables but no descriptions, no semantic layer, no usage rules. The tokens are there — but the AI has no way to understand intent.
After
With Full Token Descriptions
AI output with semantic annotations, usage context, and relationship mappings. Same tool, same prompt — dramatically closer to production fidelity.

Even with all variables exported, AI doesn't nail everything — still about 10–15% off. That's why we built the documentation.

  • CSS uses design tokens
  • Animations pure CSS, zero JS
  • Responds to breakpoints
  • Styling reference for engineering
Chapter 06

Scaling beyond one person

There's a specific failure mode in infrastructure work that looks like success: the system is well-built, people are using it, but all the tacit knowledge — the edge case judgment, the context behind decisions, the "ask Lin" reflex — still lives with one person. It scales as long as that person is reachable. The moment they're not, the system degrades.

The measure of a truly sustainable system isn't adoption. It's independence under normal conditions: designers building correctly without asking, engineers making token decisions with confidence, new team members onboarding from documentation rather than institutional memory. That requires deliberate effort to move knowledge out of your head and into the system itself — before you think you need to.

Prompt Guide for Knowledge Transfer
The first test of whether the system was actually self-sustaining: could someone run the AI prototyping workflow without asking me? I wrote the guide to be that proof — not a tutorial, but a complete transfer of methodology. Colleagues ran it independently in the first week. That was the bar, not the document quality.
100+ Colleague Demo — CEO & CTO in Attendance
A 90-minute session with 100+ colleagues across every function — CEO, CTO, 10 designers, 9 PMs, 30+ engineers, plus marketing and operations. The goal wasn't to demonstrate the tool. It was to reframe what the design system was: not a design team deliverable, but shared product infrastructure that engineering, design, and AI tools all relied on. The question in the room shifted from "is this useful?" to "how do we govern this?" That shift was the outcome.
Two-Week Formal Adoption Window
After the demo, a structured two-week window for teams to pilot the workflow and surface issues in real use. Deliberately not a soft launch — a designed feedback loop. Every issue was triaged and resolved before becoming a habit. The problems that emerged during this window were more valuable than any pre-launch testing, because they came from actual team behavior under real conditions.
Automated Design QA Process
Compliance at scale requires removing friction from the right path, not adding gates. The automated QA plugin opens a Figma file and runs a complete audit against guidelines — flagging and correcting deviations without a checklist. The paired SKILL.md brings the same behavior into AI coding tools, so prototypes generated by Claude Code or Cursor inherit the same QA pass automatically. The constraint is built in; it doesn't require discipline to enforce.
Design System as Shared Infrastructure
The final repositioning: moving the design system's organizational identity from "design team output" to shared product infrastructure with engineering co-ownership. This changed how token updates were reviewed, how documentation gaps were prioritized, and who had a stake in keeping the system current. Sustainability isn't just a property of documentation — it's a property of who feels responsible for it.
Chapter 07

AI readiness as a practice

Chapter 6 is about people — making the system usable and resilient to human turnover. This chapter is about something different: making the system resilient to the AI tools themselves changing underneath it.

A design system's AI readiness isn't a state you reach and maintain. It has an expiration date tied to model updates you don't control. A token description that steered a model correctly in Q1 may produce different output in Q3 — not because the documentation changed, but because the model's interpretation of it did. New fine-tuning shifts how context is weighted. New tool interfaces change what gets passed to the model and what gets stripped. New hallucination patterns emerge from new training distributions. This means AI accuracy is a metric that requires measurement, not assumption.

The maintenance cycle below isn't process for its own sake. It's the mechanism that keeps a point-in-time build accurate over time — and the cadence that makes improvement legible to stakeholders beyond the design team.

Stress Test
Run the current AI tools against the design system across a structured prompt battery — covering layout, components, patterns, and documentation interpretation. The goal isn't to catch every failure; it's to get a representative sample across failure categories: token-level misapplication, component-level reinvention, pattern-level drift. What the model got right last cycle is not assumed to still be correct — it's re-verified.
Gap Analysis
Quantify the hallucination surface from the stress test: which tokens are misapplied and under what conditions, which components are being generated from scratch instead of referenced, which documented constraints are being ignored. Findings are logged and compared against the previous cycle — the output is a tracked delta, not a subjective impression. Design and engineering review together, because some gaps require system changes and some require code changes.
Update Cycle
Based on the gap analysis: refine token descriptions to close the most common misapplication patterns, tighten the HTML/Tailwind mapping where it's producing ambiguity, update documentation examples to address observed edge cases. Each update is scoped and reviewed by engineering before merging — the system is a technical contract, not a design artifact, and changes to it affect code. The target isn't a perfect cycle. It's a measurably smaller gap than the last one.
Cross-Functional Demo & Review
Present the before/after delta to a cross-functional audience each quarter. This step does two things: it keeps stakeholders calibrated on where AI accuracy actually stands (not where it was at launch), and it creates organizational accountability for the maintenance work. Showing improvement over time is what converts AI readiness from a design team initiative into a company-level practice with shared ownership.
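
The tracked delta at the center of this cycle can be computed in a few lines. This is a sketch under assumptions: each stress-test prompt is recorded with a failure category and a pass/fail flag, the category names follow the ones listed above, and the sample outcomes are invented placeholders for illustration, not measurements from any review.

```python
from collections import Counter


def failure_rates(results):
    """results: one dict per prompt, {"category": str, "passed": bool}."""
    total = Counter(r["category"] for r in results)
    failed = Counter(r["category"] for r in results if not r["passed"])
    return {category: failed[category] / total[category] for category in total}


def cycle_delta(previous, current):
    """Negative numbers mean the failure rate shrank since the last cycle."""
    return {c: round(current[c] - previous.get(c, 0.0), 3) for c in current}


def run(category, outcomes):
    return [{"category": category, "passed": ok} for ok in outcomes]


# Placeholder outcomes only.
last_cycle = run("token", [True, False, False, True]) + run("component", [True, False])
this_cycle = run("token", [True, True, False, True]) + run("component", [True, False])

change = cycle_delta(failure_rates(last_cycle), failure_rates(this_cycle))
print(change)  # {'token': -0.25, 'component': 0.0}
```

Reporting the change per category, not as one blended score, keeps a regression in one area from being hidden by an improvement in another.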

Validated across three tools — Figma Make, Claude Code, and Google Stitch. Each showed measurable fidelity improvement after each review cycle. The target was never zero hallucination — that's not achievable with current technology, and claiming otherwise would undermine the credibility of everything else. The target was a shrinking gap, tracked visibly, quarter over quarter. That's now a standing practice.

Results

Impact

100+
Colleagues across design, engineering, product, and marketing trained in a single company-wide session — CEO and CTO in attendance. Design system repositioned as shared AI infrastructure, not a design team asset.
36+
CSS inconsistencies found & fixed in token audit — engineers now trust what's in Figma matches what ships
Days → hours
Prototype iteration time compressed — ~60% reduction in design-to-delivery cycle across teams
3 tools
Figma Make, Claude Code, and Google Stitch validated — all producing measurable fidelity improvements after each AI Readiness Review cycle
Quarterly
AI Readiness Reviews established as standing company practice — stress test, gap analysis, update cycle, cross-functional demo

Reflection

"An AI-ready design system isn't a project you finish. It's a standard you maintain. Every quarter, the gap gets smaller — and that's the whole job."
What's next

Let's connect.

Looking for teams where design infrastructure is treated as a product — not a support function.

lin@linzhao.design