We Built Cages Called Templates

There is an uncomfortable truth at the foundation of every interface you have ever used: it was built by someone who had to imagine, in advance, every state you might encounter. Every screen. Every error. Every edge case. The developer's job was to enumerate the possibility space, then build it.

This is so embedded in how we think about software that it does not feel like a design choice. It feels like the nature of the medium. It is not. It is a historical accident — one specific answer to a problem the industry stopped questioning around 1984. And it is finally being undone.

What is happening right now, across the period roughly spanning 2024 through 2026, is that this premise is being dissolved. Not incrementally improved — dissolved. Large language models are demonstrating that the enumeration step can be skipped entirely. The interface does not need to be pre-built. It can be reasoned into existence, in real time, from the context of what a user actually needs at this moment.

This is what the field is calling Generative UI. And it is the subject of this four-part series.

A note on this series

This is Part 1 of four. It is the conceptual entry point, with no framework knowledge required. Subsequent parts go deeper into the engineering stack, design systems, and the longer-term vision. You can read them in any order, but this one comes first for a reason: the rest only makes sense if you accept the premise being argued here.

§ 01 — Foundations

The Sixty-Year Cage

Nielsen Norman Group has argued that AI represents the first new UI paradigm in sixty years. The math is worth pausing on. In that account, batch processing arrived around 1945. Command-based interaction arrived around 1964 with time-sharing, and the graphical interface that went mainstream in 1984 refined that paradigm rather than replacing it. And then, nothing. For six decades, the industry refined what existed rather than reaching for what did not.

The sixty-year gap is not incidental. It reflects how deeply entrenched the old model was. The graphical interface was so cognitively intuitive that it crowded out every competing idea. We built on top of it. We refined it. We added dark mode. But the fundamental transaction — developer imagines states, developer builds states, user visits states — never changed.

The graphical interface solved interface discovery so elegantly that it made interface imagination seem unnecessary. — a framing worth sitting with

The problem with pre-enumerated states is that reality does not pre-enumerate. Think about a site reliability engineer dealing with a cascading infrastructure failure at three in the morning. They do not want to navigate between tabs that some product manager imagined during a planning meeting six months ago. They want the interface that fits this incident, this moment, this specific cluster of correlated signals. That interface has never existed before. It has to be composed from scratch.

Or think about something simpler. A user in a customer support chat asks "can you compare the three plans you offer?" Today, in most products, the answer is a wall of text. The information was always there — pricing pages exist, comparison tables exist — but the chat surface cannot summon them on demand. The interface and the conversation live in separate worlds, joined only by the user's willingness to switch tabs and read carefully. This is not an interaction. It is a handoff.

The formal case for dynamic generation

Two recent papers give the field its first serious empirical footing.

Stanford's SALT NLP Lab published one of the more rigorous treatments in August 2025 — "Generative Interfaces for Language Models," abbreviated as GILM. Their contribution was formalizing what a generative interface actually is, mathematically. They represent it as a directed graph where the nodes are interface views and the edges are transitions triggered by user events or AI decisions. At the component level, each individual widget is governed by a finite state machine — meaning it can only be in one of a small set of valid states, and only certain transitions between those states are allowed.

What this means practically is that even when an interface is composed on the fly, it can maintain logical consistency. The LLM is not just generating HTML out of thin air. It is navigating a structured possibility space — one that has rules, constraints, and predictable transitions. The chaos is bounded. This is the architectural answer to the most common objection ("but what if the AI invents a Delete Database button that does not exist?"): it cannot, because the state graph does not contain that edge.
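
To make the bounded-chaos point concrete, here is a minimal sketch of the idea: views as nodes in a directed graph, widgets as small finite state machines. It is not the GILM authors' implementation. The class names (`Widget`, `InterfaceGraph`), the view names, and the event names are all hypothetical, chosen only to illustrate how a transition that was never declared gets rejected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Widget:
    """A component governed by a small finite state machine (hypothetical)."""
    name: str
    state: str
    # Allowed transitions: (current_state, event) -> next_state
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)

    def fire(self, event: str) -> bool:
        """Apply an event if the FSM allows it; otherwise leave state unchanged."""
        next_state = self.transitions.get((self.state, event))
        if next_state is None:
            return False
        self.state = next_state
        return True


@dataclass
class InterfaceGraph:
    """Views are nodes; edges are transitions triggered by user or model events."""
    current: str
    # Declared edges: (current_view, event) -> target_view
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def navigate(self, event: str) -> bool:
        """Follow an edge only if it was declared in the graph."""
        target = self.edges.get((self.current, event))
        if target is None:
            return False  # the proposed transition is not in the graph
        self.current = target
        return True


# Hypothetical incident-response surface.
graph = InterfaceGraph(
    current="incident_overview",
    edges={
        ("incident_overview", "open_logs"): "log_viewer",
        ("log_viewer", "back"): "incident_overview",
    },
)
ack_button = Widget(
    name="acknowledge",
    state="idle",
    transitions={("idle", "click"): "pending", ("pending", "confirmed"): "done"},
)

print(graph.navigate("open_logs"))        # a declared edge
print(graph.navigate("delete_database"))  # never declared, so it is rejected
print(ack_button.fire("confirmed"))       # not valid while the widget is "idle"
print(ack_button.fire("click"))           # valid while the widget is "idle"
```

Run as written, this prints True, False, False, True. The model can propose whatever it likes; only proposals that match a declared edge or transition change anything.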

GILM · Stanford SALT NLP: 72% improvement in human preference for generated interfaces over conversational UIs. The wall of text doesn't just lose on aesthetics; it loses on cognition.

Google Research · Nov 2025: 44% of generated websites judged comparable to human-expert-crafted ones in blind evaluation. This is not "good enough." This is a category shift.

The GILM result is about preference: given a choice between a wall of text from a chatbot and a generated interface, users overwhelmingly choose the generated interface. That is not surprising. The Google number is more interesting. It is about parity with human experts: in 44% of cases, evaluators judged the generated result comparable to what a human expert had crafted. Parity is the threshold beyond which any economic argument for human-only production starts to weaken.

These are not finished results. The literature is months old. But pre-enumeration is not just a workflow constraint — it is a cognitive ceiling. And the ceiling has cracked.

§ 02 — Taxonomy

Three Ways to Break the Cage

Saying "generate the interface" is easy. Doing it without producing chaos is harder. The field has spent two years converging on a practical answer — not a single technique, but a three-category taxonomy that has survived contact with production systems. This taxonomy turns an overwhelming space into a decision framework.

The three categories sit on a spectrum defined by two competing values: control (who decides what renders) and freedom (what can be rendered). Higher control means safer, more consistent, more predictable. Higher freedom means more expressive, more novel, more capable of surprise. You cannot have both at maximum simultaneously. Every team's job is to decide where on the spectrum each AI feature should sit.

Category 01

Static Generative

The AI selects from components you pre-built. It decides which to use and what data to pass. The code never changes — only the selection does.

Best for: Regulated industries, brand-sensitive products, any context where visual consistency is non-negotiable.

Category 02

Declarative Generative

The AI emits a structured specification. The client renders it using its own native widgets. Data, not code. Safe, but expressive.

Best for: Enterprise multi-platform deployments, design-system-compliant AI features, teams that need cross-framework portability.

Category 03

Open-Ended

The AI generates complete HTML surfaces and custom visualizations. Maximum creative freedom. Requires sandboxing to be safe.

Best for: Creative tools, custom data visualization, contexts where the agent needs to invent novel presentation formats.
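
The difference between the first two categories is easier to see in code. The sketch below is illustrative only: the component names, the registry shape, the spec format, and the functions `select_component` and `render_spec` are all assumptions, not any framework's API. In the static case the model's output is checked against a fixed registry; in the declarative case the model emits a spec as data and the client decides how each node is drawn. The open-ended category is left out because it needs a real sandbox, which a short sketch cannot honestly represent.

```python
from __future__ import annotations

import json

# Category 01 (static): the model may only pick a pre-built component by name.
# Component names and their required props are hypothetical.
COMPONENT_REGISTRY = {
    "PlanComparisonTable": {"plans"},
    "InvoiceCard": {"invoice_id", "amount"},
}


def select_component(model_output: str) -> dict:
    """Validate a static selection: known component, exactly the expected props."""
    choice = json.loads(model_output)
    name, props = choice["component"], choice["props"]
    if name not in COMPONENT_REGISTRY:
        raise ValueError(f"unknown component: {name}")
    if set(props) != COMPONENT_REGISTRY[name]:
        raise ValueError(f"unexpected props for {name}: {sorted(props)}")
    return {"component": name, "props": props}


# Category 02 (declarative): the model emits a spec tree as data, and the
# client maps each node type onto its own native widget.
ALLOWED_NODE_TYPES = {"stack", "heading", "text", "table"}


def render_spec(node: dict, depth: int = 0) -> list[str]:
    """Walk a spec and return an indented outline standing in for native widgets."""
    node_type = node.get("type")
    if node_type not in ALLOWED_NODE_TYPES:
        raise ValueError(f"node type not allowed: {node_type}")
    label = node.get("value", "")
    lines = [f"{'  ' * depth}<{node_type}> {label}".rstrip()]
    for child in node.get("children", []):
        lines.extend(render_spec(child, depth + 1))
    return lines


# Example model outputs (made up for illustration).
static_reply = (
    '{"component": "PlanComparisonTable", '
    '"props": {"plans": ["Free", "Pro", "Team"]}}'
)
spec = {
    "type": "stack",
    "children": [
        {"type": "heading", "value": "Plans"},
        {"type": "text", "value": "Three tiers"},
    ],
}

print(select_component(static_reply)["component"])
print("\n".join(render_spec(spec)))
```

In both cases the model produces data rather than executable code, which is where the safety of these two categories comes from. A request for a component or node type outside the allowed set raises an error instead of rendering.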

All three categories are deployed in production right now, simultaneously, by serious teams. The debate over which approach is "correct" is largely meaningless. They solve different problems.

A financial dashboard serving regulated data to institutional investors will live almost entirely in the static and declarative zones — the visual language matters too much, and the legal exposure of a hallucinated chart is too high. A creative tool where AI composes novel data visualizations from messy real-world datasets will push hard toward open-ended, because the value is the novelty. A copilot embedded in a complex enterprise application will quietly layer all three, choosing the approach contextually per task: static cards for confirmed entities, declarative specs for assembled summaries, open-ended canvases when the user asks for something the catalog cannot describe.
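
The copilot case, choosing a category per task, can be sketched as a small routing function. Everything here is hypothetical: the `Mode` enum, the task labels, and especially the fallback rule (dropping to declarative output when no sandbox is available), which is an assumption of this sketch rather than something the taxonomy prescribes.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    STATIC = "static"
    DECLARATIVE = "declarative"
    OPEN_ENDED = "open_ended"


def choose_mode(task_kind: str, sandbox_available: bool) -> Mode:
    """Pick a generation category for one task (task labels are hypothetical)."""
    if task_kind == "confirmed_entity":
        return Mode.STATIC        # pre-built card, model only fills in data
    if task_kind == "assembled_summary":
        return Mode.DECLARATIVE   # model emits a spec, client renders it
    # Anything the catalog cannot describe goes open-ended, but only when a
    # sandbox exists to contain it. The fallback is this sketch's assumption.
    return Mode.OPEN_ENDED if sandbox_available else Mode.DECLARATIVE


print(choose_mode("confirmed_entity", sandbox_available=True).value)
print(choose_mode("custom_visualization", sandbox_available=True).value)
print(choose_mode("custom_visualization", sandbox_available=False).value)
```

This prints static, open_ended, declarative. A real product would route on much richer signals, but the shape is the same: the category is decided per task, not per product.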

The taxonomy works best not as a competition between approaches but as a set of dials you decide where to set, per feature. You do not commit your entire product to one category. You commit each AI surface to whichever category matches the risk profile, the brand stakes, and the kind of value you are trying to deliver.

What this changes about the design conversation

If you have been in a design review where someone asked "should our AI assistant be a chatbox or a panel or a sidebar?" — this taxonomy reframes the question. The shape of the surface is not the decision that matters. The decision that matters is how much of the visual output is being pre-approved by humans versus composed by the model, and how much creative latitude the model has within whatever envelope you give it.

The chatbox-versus-panel debate is a question about where the AI shows up. The taxonomy is a question about how it shows up. The second question is the one that determines whether the product feels safe, polished, surprising, or unhinged — and it is the one most teams are not asking yet.

§ 03 — Looking Ahead

A Fourth Category Is Emerging

The three-part taxonomy is the cleanest way to make sense of generative UI as it exists today. But research groups and frontier product teams are now describing something the taxonomy does not quite capture — a surface type with no chat interface at all.

In this fourth pattern, the AI never appears as a conversational partner. There is no input box, no chat bubble, no "ask me anything" prompt. Instead, the AI expresses itself purely through mutating the application's native interface. A panel rearranges itself based on what is happening in the data. A field auto-fills when context suggests an obvious value. A new view materializes because the system noticed something the user has not yet asked about.

This pattern — sometimes called chatless AI or ambient generative UI — is already shipping in production at Microsoft 365, Linear, Superhuman, and Datadog, among others. It may turn out to be the most commercially significant variation, because it sidesteps a problem that the chat-default approach never solved: most users cannot articulate, in words, what they want a tool to do. Forcing them to type a prompt is itself a UX failure.

Coming in Part 3

Your Design System Is No Longer Yours Alone

Part 3 of this series takes the chatless pattern seriously. It examines what happens to your design system when the second audience for your components is not a human developer but an LLM trying to compose interfaces from your catalog — and what the death of the chat box means for accessibility, articulation, and the next decade of product design.

Before we get there, though, Part 2 has to do the unglamorous work of mapping the engineering substrate that makes all of this possible — the framework landscape, the protocol stack, and the standards battle quietly being negotiated between Google, Anthropic, OpenAI, and the rest of the major labs. If you build software, that is the part you will want to read carefully.

The cages are coming down. What gets built in their place is the rest of this series.