Part 1 — We Built Cages Called Templates
Part 1 of this series argued that the sixty-year premise of pre-enumerated user interfaces is dissolving. If you accepted that argument, you arrive at the next question almost immediately: okay, so what do I actually build with?
That question has two layers of answer, and most engineers see only the top one. The visible layer is the framework you import — Vercel AI SDK, CopilotKit, json-render, whatever your team has settled on. That conversation is loud, opinionated, and largely settled by which tool you reached for first.
The invisible layer is more consequential. Beneath the frameworks, a three-protocol stack has been quietly forming since late 2024. Within roughly twelve months, it has received buy-in from Google, Microsoft, AWS, Anthropic, Oracle, and OpenAI simultaneously. That kind of cross-vendor convergence is rare in software. The last time something like it happened, the result was HTTP. The time before that, TCP/IP.
This post does two things, in this order. First, it maps the framework landscape as it stands in mid-2026, honestly, with the version numbers and tradeoffs that actually matter. Then it lifts the floor and shows you the protocol stack underneath, which is what your framework choices are increasingly converging toward whether you notice it or not.
Audience note
This is the engineering-heavy post in the series. It assumes basic TypeScript and React familiarity, and a working understanding of how LLM tool-calling fits into a frontend application. If you read Part 1 and it left you wondering what the implementation surface looks like in practice, this is that surface.
§ 01 — Frameworks, Mapped
Six Tools, Honestly Ranked
By mid-2026, the GenUI framework landscape has stratified. The experiments of 2023 and early 2024 have been sorted by three forces: production usage, GitHub star momentum, and — most decisively — whether enterprise teams have actually bet their roadmaps on them. The result is a clear top tier, a couple of contenders with genuine differentiation, and a long tail of projects that are interesting but not yet load-bearing.
Here is an honest assessment of the six tools that matter. The columns are deliberate. DX is how pleasant the tool is to use day-to-day. Cross-platform is whether the same spec can target web, mobile, and native without parallel implementations. Maturity is whether you can ship production traffic through it today without a series of escalating surprises.
| Framework | Approach | Stars | DX | X-platform | Maturity |
|---|---|---|---|---|---|
| Vercel AI SDK | Tool-calling + parts | 23.4K | |||
| CopilotKit | AG-UI protocol + SDK | 30.1K | |||
| json-render | Catalog-constrained spec | 13K | |||
| Google A2UI | Declarative JSON protocol | 13.4K | |||
| Thesys C1 / OpenUI | API middleware + DSL | — | |||
| Tambo | Full-stack + built-in agent | 8K | |||
Each represents a fundamentally different philosophical bet about where the value in a GenUI framework comes from. Vercel won by being the most flexible. Tambo is winning by being the most opinionated. And Thesys is making the most contentious claim of the year. Each is worth a closer look.
Vercel AI SDK v6 — and the abstraction that made it work
The Vercel AI SDK has become the default substrate for TypeScript GenUI work, not because Vercel out-innovated anyone, but because they got to good abstractions early and then kept iterating without breaking everyone's code. With 10.5 million weekly npm downloads, the centre of gravity in the ecosystem is not subtle.
The single most important architectural decision in the SDK's history was the split, introduced in v5, between UIMessage and ModelMessage. To understand why this matters, picture what life was like before it. The shape of the data your frontend rendered was directly determined by the shape of the data the LLM produced. If OpenAI changed its message format, your UI had to change with it. If you wanted to attach a custom field — say, a "user has approved this action" flag — you were either fighting the framework or smuggling state through stringly-typed metadata fields.
The split separates these concerns at the type level. ModelMessage is what the model sees and produces, and it follows whatever the provider's API requires. UIMessage is what your frontend cares about, and it can carry whatever fields make sense for your application — including stateful information about tool calls, attachments, custom UI parts, and approval status. The SDK handles the translation between the two. The frontend data model is no longer hostage to the LLM protocol.
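A minimal sketch of that separation may help. It uses simplified stand-in types, not the SDK's real definitions: `SketchUIMessage`, `SketchModelMessage`, and `toModelMessages` are hypothetical names chosen for this example. The point is only that UI-side fields can exist without ever reaching the model.

```typescript
// Simplified stand-ins for illustration; NOT the SDK's actual type definitions.
type SketchModelMessage = {
  role: 'user' | 'assistant';
  content: string;
};

type SketchTextPart = { type: 'text'; text: string };
type SketchApprovalPart = { type: 'approval'; toolName: string; approved: boolean }; // UI-only state
type SketchUIPart = SketchTextPart | SketchApprovalPart;

type SketchUIMessage = {
  id: string;
  role: 'user' | 'assistant';
  parts: SketchUIPart[];
  metadata?: { pinned?: boolean }; // application-specific, never sent to the model
};

// Translate frontend messages into what the model should see:
// keep the text, drop ids, metadata, and UI-only parts.
function toModelMessages(messages: SketchUIMessage[]): SketchModelMessage[] {
  return messages.map((message) => ({
    role: message.role,
    content: message.parts
      .filter((part): part is SketchTextPart => part.type === 'text')
      .map((part) => part.text)
      .join(''),
  }));
}

const uiHistory: SketchUIMessage[] = [
  { id: 'm1', role: 'user', parts: [{ type: 'text', text: 'Scale the API service.' }] },
  {
    id: 'm2',
    role: 'assistant',
    parts: [
      { type: 'text', text: 'Scaling requires approval.' },
      { type: 'approval', toolName: 'scaleService', approved: false },
    ],
    metadata: { pinned: true },
  },
];

const modelHistory = toModelMessages(uiHistory);
```

In this sketch the approval flag and the `pinned` metadata live only on the UI side, so the frontend can evolve them freely without touching what the provider receives.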
Version 6 builds on that foundation with the ToolLoopAgent class — a reusable agent abstraction with first-class support for human-in-the-loop approval gates — and full Model Context Protocol support. The agent itself is now a primitive you can compose, rather than a control flow you have to hand-write each time.
    import { ToolLoopAgent } from 'ai';
    import { anthropic } from '@ai-sdk/anthropic';

    // Simplified for illustration: getSystemHealth, scaleService, messages,
    // and renderHealthCard are assumed to be defined elsewhere in the app.
    const agent = new ToolLoopAgent({
      model: anthropic('claude-sonnet-4-6'),
      tools: { getSystemHealth, scaleService },
      // Human-in-the-loop gate: destructive actions pause for approval.
      // The frontend gets a "needs-approval" part it can render as a button.
      needsApproval: ({ tool }) => tool.name === 'scaleService',
    });

    // The stream yields a sequence of `parts`: small typed events
    // that mix prose, tool states, and UI fragments in order.
    for await (const part of agent.stream(messages)) {
      // When a tool finishes, we get its full structured output,
      // which the frontend can render as a real component, not a string.
      if (part.type === 'tool-getSystemHealth' && part.state === 'output-available') {
        renderHealthCard(part.output);
      }
    }
The parts array is the practical expression of the UIMessage/ModelMessage separation — it is a typed sequence of small events (text fragment, tool started, tool output available, UI piece) that the frontend can map onto components without ever parsing the model's raw response. And the needsApproval callback is the smallest possible API surface for one of the most important features in the entire framework: pausing an autonomous agent at a destructive boundary so a human can confirm. Both of those are easy to underrate until you have tried to implement them by hand.
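To give a sense of what "by hand" involves, here is a deliberately small, hand-rolled approval gate. It is a sketch under stated assumptions, not how the SDK implements the feature: `runToolCall`, `StepResult`, and `ToolTable` are hypothetical names, and the tools are synchronous stubs.

```typescript
// Hypothetical, hand-rolled sketch; none of these names are SDK APIs.
type ToolInput = Record<string, unknown>;
type ToolCall = { toolName: string; input: ToolInput };
type ToolTable = Record<string, (input: ToolInput) => unknown>;

type StepResult =
  | { state: 'output-available'; toolName: string; output: unknown }
  | { state: 'needs-approval'; toolName: string; input: ToolInput }
  | { state: 'denied'; toolName: string };

function runToolCall(
  call: ToolCall,
  tools: ToolTable,
  needsApproval: (toolName: string) => boolean,
  approval?: boolean, // undefined means the human has not answered yet
): StepResult {
  const tool = tools[call.toolName];
  if (!tool) throw new Error(`Unknown tool: ${call.toolName}`);

  if (needsApproval(call.toolName)) {
    if (approval === undefined) {
      // Pause: nothing executes until the frontend reports a decision.
      return { state: 'needs-approval', toolName: call.toolName, input: call.input };
    }
    if (!approval) return { state: 'denied', toolName: call.toolName };
  }
  return { state: 'output-available', toolName: call.toolName, output: tool(call.input) };
}

let replicas = 2;
const tools: ToolTable = {
  getSystemHealth: () => ({ status: 'ok', replicas }),
  scaleService: (input) => {
    replicas = Number(input['replicas']);
    return { replicas };
  },
};
const gate = (toolName: string) => toolName === 'scaleService';

// Read-only call runs immediately; the destructive call pauses, then resumes once approved.
const health = runToolCall({ toolName: 'getSystemHealth', input: {} }, tools, gate);
const paused = runToolCall({ toolName: 'scaleService', input: { replicas: 5 } }, tools, gate);
const resumed = runToolCall({ toolName: 'scaleService', input: { replicas: 5 } }, tools, gate, true);
```

Even this toy version has to carry the pending call across a round trip to the user, which is the part that grows awkward once streaming and multi-step loops are involved.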
Tambo — when "primitives" is not what you want
If the Vercel SDK is a box of well-shaped LEGO bricks, Tambo is a furniture kit. The bricks are flexible; the furniture is opinionated. Both are valid, and the choice between them is one of the more important architectural decisions a team building GenUI features will make in 2026.
Where the Vercel SDK gives you primitives to assemble — agent loops, streaming utilities, tool definitions — Tambo gives you the whole stack, including the agent that selects and streams component props at runtime. The mental model is unusual at first: you register a React component along with a Zod schema that describes its props, and Tambo's agent becomes capable of invoking it. You do not write the orchestration. You declare the components. The agent fills in the rest.
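The register-then-validate idea can be sketched without any dependencies. In the example below a small hand-written validator stands in for the Zod schema and a string stands in for a React element. `registerComponent` and `resolveComponent` are hypothetical names for this illustration, not Tambo's API.

```typescript
// Dependency-free sketch. A hand-written validator stands in for a Zod schema,
// and none of these names are Tambo's actual API.
type Validator<P> = (raw: unknown) => P; // returns typed props or throws

type Registration<P> = {
  name: string;
  description: string; // what an agent would read when choosing a component
  validate: Validator<P>;
  render: (props: P) => string; // a string stands in for a React element
};

// Prototype-free object so names like "toString" are not treated as registered.
const registry: Record<string, Registration<any>> = Object.create(null);

function registerComponent<P>(registration: Registration<P>): void {
  registry[registration.name] = registration;
}

// Called with whatever the agent produced: a component name plus raw props.
function resolveComponent(name: string, rawProps: unknown): string {
  const registration = registry[name];
  if (!registration) throw new Error(`Component not registered: ${name}`);
  const props = registration.validate(rawProps); // reject malformed model output
  return registration.render(props);
}

type HealthCardProps = { service: string; uptimePercent: number };

registerComponent<HealthCardProps>({
  name: 'HealthCard',
  description: 'Shows uptime for a single service.',
  validate: (raw) => {
    const candidate = raw as Partial<HealthCardProps> | null;
    if (
      !candidate ||
      typeof candidate.service !== 'string' ||
      typeof candidate.uptimePercent !== 'number'
    ) {
      throw new Error('Invalid props for HealthCard');
    }
    return { service: candidate.service, uptimePercent: candidate.uptimePercent };
  },
  render: (props) => `<HealthCard service="${props.service}" uptime="${props.uptimePercent}%" />`,
});

const rendered = resolveComponent('HealthCard', { service: 'api', uptimePercent: 99.9 });
```

The schema does double duty in this pattern: it tells the agent what props exist, and it guards the render path against output that does not match.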
For teams that want "generative UI without building the orchestration layer from scratch," this is the closest thing the ecosystem has to a working shortcut. The 8,000 GitHub stars accumulated in a relatively short window suggest a real audience for that proposition. The differentiator is the compliance posture: Tambo ships with SOC 2 and HIPAA compliance built in. For healthcare, fintech, and any regulated context where "we'll figure out compliance later" is not a viable answer, this is not a small thing. It is often the deciding factor.
The cost of Tambo's opinionation is the usual one. If your needs sit anywhere outside the kit's assumptions — unusual streaming patterns, complex multi-agent topologies, custom transport — you will fight the framework. Vercel will let you build whatever you can imagine; Tambo will let you build what it imagines for you, faster. Choose accordingly.
OpenUI Lang — the 67% claim that matters if it holds
The most philosophically interesting development of early 2026 is Thesys's open-sourcing of OpenUI Lang, a purpose-built domain-specific language for generative UI specifications. The claim being made is unusually concrete: 67% fewer tokens than equivalent JSON for the same UI description.
At production scale, token efficiency compounds in three different directions simultaneously. There is the obvious cost dimension — fewer tokens per render means lower API spend, and at a million renders a day that arithmetic becomes serious. There is the latency dimension — shorter specifications stream faster, and streaming latency is one of the few remaining levers for making GenUI feel as snappy as static UI. And there is the context window dimension — specifications that fit in less budget free more room for tools, instructions, and history.
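The cost dimension is easy to put into numbers. The sketch below projects daily spend on UI specifications before and after a token reduction. Every input is an illustrative assumption (1,000 tokens per render, a price of $10 per million tokens), not a measured figure or a real price list.

```typescript
// All inputs below are illustrative assumptions, not measured figures.
type RenderCostInput = {
  tokensPerRender: number; // output tokens spent on one UI spec
  rendersPerDay: number;
  usdPerMillionTokens: number; // hypothetical price
};

function dailySpecCostUsd(input: RenderCostInput): number {
  const tokensPerDay = input.tokensPerRender * input.rendersPerDay;
  return (tokensPerDay / 1e6) * input.usdPerMillionTokens;
}

// Apply a fractional reduction (0.67 means 67% fewer tokens per render).
function withTokenReduction(input: RenderCostInput, reduction: number): RenderCostInput {
  return { ...input, tokensPerRender: input.tokensPerRender * (1 - reduction) };
}

const jsonBaseline: RenderCostInput = {
  tokensPerRender: 1000,
  rendersPerDay: 1e6,
  usdPerMillionTokens: 10,
};

const baselineCost = dailySpecCostUsd(jsonBaseline);
const reducedCost = dailySpecCostUsd(withTokenReduction(jsonBaseline, 0.67));
```

Under these assumed inputs the daily figure drops from $10,000 to about $3,300. The absolute numbers mean little; the shape of the calculation is what carries over to real traffic.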
If OpenUI Lang delivers equivalent expressiveness at dramatically lower cost, it would justify a format migration the ecosystem will find painful but ultimately rational. The honest caveat is that the 67% number is currently Thesys's own benchmark, on test cases Thesys chose, against a JSON baseline Thesys constructed. Independent reproduction is still pending. If you are about to bet a roadmap on OpenUI Lang, run the benchmark on your own representative payloads first.
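A reproduction does not need much machinery. The sketch below compares token counts for pairs of equivalent specifications. Two things are assumptions: the token counter is a crude proxy of about four characters per token, which should be replaced with your provider's real tokenizer, and the "compact" sample string is a made-up placeholder that is not real OpenUI Lang syntax.

```typescript
// Sketch of a reproducibility check. The token counter is a crude proxy
// (about four characters per token); swap in your provider's real tokenizer.
type EncodingPair = { label: string; json: string; compact: string };
type TokenCounter = (text: string) => number;

const approximateTokens: TokenCounter = (text) => Math.ceil(text.length / 4);

function measureReduction(pairs: EncodingPair[], countTokens: TokenCounter) {
  const rows = pairs.map((pair) => {
    const jsonTokens = countTokens(pair.json);
    const compactTokens = countTokens(pair.compact);
    return {
      label: pair.label,
      jsonTokens,
      compactTokens,
      reduction: 1 - compactTokens / jsonTokens,
    };
  });
  // Weight by total tokens so large payloads count for more than small ones.
  const totalJson = rows.reduce((sum, row) => sum + row.jsonTokens, 0);
  const totalCompact = rows.reduce((sum, row) => sum + row.compactTokens, 0);
  return { rows, overallReduction: 1 - totalCompact / totalJson };
}

// Placeholder payloads: the "compact" string is invented for this sketch and is
// NOT real OpenUI Lang syntax. Replace both columns with your own specs.
const samplePairs: EncodingPair[] = [
  {
    label: 'metric card',
    json: JSON.stringify({ type: 'card', title: 'Latency', value: '120ms' }),
    compact: 'card Latency 120ms',
  },
];

const report = measureReduction(samplePairs, approximateTokens);
```

The useful output is the per-row breakdown: a headline average can hide payload types where the savings are much smaller than advertised.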
§ 02 — The Protocol Stack
What Is Actually Underneath
Step back from the framework debate for a moment, because something more architecturally significant is happening one layer down. The frameworks are all converging on the same protocols. Not because anyone is forcing them to, but because the protocols solve problems the frameworks individually cannot.
Four layers are now visible in production GenUI systems. Each layer has its own job, and each has a clear winner — or at least a clear consensus candidate. Together they form the substrate that the next generation of agentic software will run on.
The cleanest way to read the stack is from the bottom up. Reasoning happens in the model. The model needs data and tools, which MCP provides. The model's output needs to reach the user in real time, which AG-UI handles. And the output needs a structured form the client can render, which A2UI (or its peers) specifies. Each layer is independently swappable. Each layer talks to its neighbour through an open protocol.
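The division of labour between the layers can be sketched in a few dozen lines. The shapes below are simplified illustrations of each layer's role, not the actual MCP, AG-UI, or A2UI wire formats, and every name (`ToolServer`, `AgentEvent`, `UiNode`, `runAgent`) is hypothetical.

```typescript
// Illustrative only: simplified shapes for each layer's role,
// not the actual MCP, AG-UI, or A2UI wire formats.

// Tools and data layer: what the model may call.
type ToolServer = { callTool(name: string, args: Record<string, unknown>): unknown };

// UI description layer: a declarative node the client renders from its own catalog.
type UiNode = { component: string; props: Record<string, unknown> };

// Event channel layer: ordered events sent from agent to frontend.
type AgentEvent =
  | { type: 'text'; delta: string }
  | { type: 'tool-result'; toolName: string; result: unknown }
  | { type: 'ui'; root: UiNode }
  | { type: 'done' };

// Reasoning layer: a stub standing in for the model's decisions.
function runAgent(tools: ToolServer): AgentEvent[] {
  const result = tools.callTool('getSystemHealth', { service: 'api' });
  return [
    { type: 'text', delta: 'Checking service health. ' },
    { type: 'tool-result', toolName: 'getSystemHealth', result },
    { type: 'ui', root: { component: 'HealthCard', props: { result } } },
    { type: 'done' },
  ];
}

// Client side: only components present in the catalog can be rendered.
const catalog: Record<string, (props: Record<string, unknown>) => string> = {
  HealthCard: (props) => `[HealthCard ${JSON.stringify(props['result'])}]`,
};

function renderNode(node: UiNode): string {
  if (!Object.prototype.hasOwnProperty.call(catalog, node.component)) {
    throw new Error(`Component not in catalog: ${node.component}`);
  }
  return catalog[node.component](node.props);
}

const fakeToolServer: ToolServer = {
  callTool: (name) => (name === 'getSystemHealth' ? { status: 'ok' } : null),
};

// The frontend consumes events in order and decides how to present each one.
const transcript: string[] = [];
for (const event of runAgent(fakeToolServer)) {
  if (event.type === 'text') transcript.push(event.delta);
  if (event.type === 'ui') transcript.push(renderNode(event.root));
}
```

Swapping `fakeToolServer` or the catalog changes nothing else in the sketch, which is the property "independently swappable" refers to.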
An emergent architecture, not a designed one
Nobody designed this stack. There was no committee, no RFC process, no founding manifesto. It emerged from three independent projects, started by competitors, that happened to solve complementary problems and then discovered they composed.
| Protocol | Origin | Introduced | Built to standardize |
|---|---|---|---|
| MCP | Anthropic | Nov 2024 | How agents access tools, data sources, and resources |
| AG-UI | CopilotKit | Mid 2025 | The real-time event channel between agents and frontends |
| A2UI | Google | Late 2025 | How UI descriptions travel across platforms |
Three different organizations. Three different motivations. Three protocols that, almost by accident, slot into different positions in the same architecture. When developers discovered the pieces composed, the architecture solidified faster than any committee could have ratified it. This is the way real standards usually arrive — not by design, but by repeated rediscovery of the same problem from different sides.
MCP Apps (SEP-1865) — the surprise of January 2026
If you needed evidence that the protocol stack is consolidating faster than the framework debate suggests, the MCP Apps standard published in January 2026 is the clearest signal yet. The headline is unusual enough that it is worth stating plainly. Anthropic and OpenAI — direct competitors in one of the most consequential technology markets in living memory — co-authored an interoperability spec. Not a press release. An actual technical specification, with reference implementations on both sides.
SEP-1865 defines how interactive HTML interfaces can be embedded inside MCP clients. The technical content is sensible: mandatory iframe sandboxing for security isolation, postMessage JSON-RPC for host-to-UI communication, and a new ui:// URI scheme for declaring UI resources alongside the existing data and tool resources. None of these mechanisms is individually novel. The significance is that they are jointly agreed upon by two organizations that have every commercial reason to fragment the ecosystem and chose not to.
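The messaging half of that design is ordinary JSON-RPC 2.0, which can be sketched independently of the browser. The example below shows a host answering requests from an embedded UI. The method name `demo/getTheme` and the helper names are hypothetical and not taken from the specification; only the JSON-RPC envelope and the "method not found" error code follow the JSON-RPC 2.0 standard.

```typescript
// Sketch of the host side of a JSON-RPC 2.0 exchange with an embedded UI.
// The method name "demo/getTheme" is hypothetical and not taken from the spec.
type JsonRpcRequest = { jsonrpc: '2.0'; id: number; method: string; params?: unknown };
type JsonRpcResponse =
  | { jsonrpc: '2.0'; id: number; result: unknown }
  | { jsonrpc: '2.0'; id: number; error: { code: number; message: string } };

type Handler = (params: unknown) => unknown;

const METHOD_NOT_FOUND = -32601; // standard JSON-RPC 2.0 error code

// UI resources are declared under the ui:// scheme described above.
function isUiResourceUri(uri: string): boolean {
  return uri.indexOf('ui://') === 0;
}

function handleRequest(
  request: JsonRpcRequest,
  handlers: Record<string, Handler>,
): JsonRpcResponse {
  // Own-property check so inherited names like "toString" are not callable methods.
  if (!Object.prototype.hasOwnProperty.call(handlers, request.method)) {
    return {
      jsonrpc: '2.0',
      id: request.id,
      error: { code: METHOD_NOT_FOUND, message: 'Method not found' },
    };
  }
  return { jsonrpc: '2.0', id: request.id, result: handlers[request.method](request.params) };
}

const hostHandlers: Record<string, Handler> = {
  'demo/getTheme': () => ({ theme: 'dark' }),
};

// In a browser, the host would receive each request through a window "message"
// event from the sandboxed iframe and send the response back with postMessage.
const ok = handleRequest({ jsonrpc: '2.0', id: 1, method: 'demo/getTheme' }, hostHandlers);
const missing = handleRequest({ jsonrpc: '2.0', id: 2, method: 'demo/unknown' }, hostHandlers);
```

Keeping the handler pure, as here, makes the protocol logic testable without an iframe; the transport is a thin wrapper around it.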
The practical implication is that any MCP server can now surface an interactive UI inside any MCP-compatible client. A Grafana MCP server can render charts directly inside Claude or ChatGPT — not screenshot URLs, not markdown approximations, real interactive charts. A Linear MCP server can show a working kanban board. The boundary between "talking to a tool" and "using a tool's UI" is dissolving from the bottom of the stack upward.
When six competing organizations adopt compatible specifications within twelve months, the ecosystem locks in. The protocols become commodity infrastructure. The value moves up the stack, to the products built on top.
§ 03 — Looking Ahead
The Implication You May Not Have Noticed
If the protocol stack is converging — and it is — then a question that has been comfortably theoretical for years becomes urgent. Once the substrate is commodity, what is your product actually competing on? Not the framework, which is a choice that will increasingly not matter. Not the protocol, which is being standardized out of your hands. What is left, for most teams, is the catalog of components the AI is allowed to compose from. That catalog is the design system. And it is no longer being read only by humans.
This is the territory Part 3 of the series enters. The arguments there cut across the design-engineering boundary in ways the framework discussion does not — because the people who maintain the design system and the people consuming it (now: large language models) have genuinely different requirements, and the existing tooling was built for only one of them.
Coming in Part 3
Your Design System Is No Longer Yours Alone
Part 3 examines what happens to component libraries when an LLM is the second reader — and why the "chatless" pattern that's quietly shipping at Microsoft, Linear, Superhuman, and Datadog may be the most commercially consequential UI shift of the decade.
Part 4 then closes the series with the longer-horizon question: protocol convergence, anticipatory interfaces, and the governance problem that nobody is asking about yet. The protocol stack mapped above is the most confident prediction in the entire series. The implications of having it are where things get interesting.