Agent UI widgets

An agent on the canvas no longer has to render the built-in chat pane. Its owner can pick any widget — a custom chat, a voice UI, a dashboard — to be the agent's UI. That widget connects to the agent over the host.agents SDK surface: a permission-gated handle that speaks multimodal UAMP, reads chat history + non-secret config, and can drive (and be driven by) the agent.

This guide covers the host.get / host.agents factory, the AgentHandle surface, the connection grant + capability query, and the trust model.

Phase 1 is owner-only: a custom UI resolves only when the viewer is the agent's owner (a non-owner / public visitor always gets the built-in chat). Public exposure is a separately-reviewed later phase.

Resolving an agent handle

host.get(kind, id) is a generic, permission-gated resource-handle factory. host.agents.get(id) is typed sugar for kind: 'agent':

await host.ready();

// Which agents is this widget connected to? `list()` reflects ONLY the
// granted set (never an enumeration of every agent).
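const [agentRef] = await host.agents.list(); // [{ kind:'agent', id }]
// Guard the empty case: with no granted agent, `agentRef` is undefined.
if (!agentRef) throw new Error('No agent is granted to this widget');
const agent = await host.agents.get(agentRef.id);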

When your widget is installed as an agent's UI, the host pre-grants its agent, so host.agents.list() returns it. You can also read the id from your widget config (initialKv) if the installer seeded it.

host.get(kind, id), like any op on a handle, rejects with a permission error for any (kind, id) the widget was not granted. The underlying server routes independently authorize the viewer, so the grant is a convenience gate, not the security boundary.
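
The two sources of an agent id described above (the granted list and a seeded initialKv value) can be combined in a small helper. This is a minimal sketch, not part of the SDK: the helper name pickAgentId and the initialKv key agentId are assumptions made for illustration, and only the { kind: 'agent', id } shape comes from this guide.

```typescript
// Hypothetical shapes: only `{ kind: 'agent', id }` comes from the guide.
interface AgentRef {
  kind: 'agent';
  id: string;
}

// `agentId` is an assumed key name; use whatever key your installer seeds.
type InitialKv = Record<string, unknown>;

/**
 * Pick the agent id to resolve. A seeded id is used only when it is also in
 * the granted set, because a handle for an ungranted id would be rejected.
 * Falls back to the first granted agent, or null when nothing is granted.
 */
function pickAgentId(granted: AgentRef[], initialKv: InitialKv = {}): string | null {
  const seeded = initialKv['agentId'];
  if (typeof seeded === 'string' && granted.some((ref) => ref.id === seeded)) {
    return seeded;
  }
  const first = granted[0];
  return first ? first.id : null;
}
```

A widget would pass the result of host.agents.list() as the first argument and call host.agents.get(id) only when the helper returns a non-null id.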

The AgentHandle surface

interface AgentHandle {
  id: string;
  kind: 'agent';

  // Realtime, multimodal UAMP bus (cookie-authed canonical endpoint).
  uamp: {
    send(event): Promise<{ ok: true }>;        // raw UAMP event(s)
    on(cb): () => void;                         // streamed response.delta events
    turn(content, { chatId?, onDelta?, onDone? }): Promise<{ ok: true }>;
  };

  // Chat history (reads); a turn is sent via `uamp`/`chats.send`.
  chats: {
    list(): Promise<…>;                         // the agent's chats
    get(chatId, { limit?, before? }): Promise<…>; // message history
    send(chatId, content): Promise<{ ok: true }>;
    subscribe(chatId, cb): () => void;          // ambient peer msgs (Phase 1b)
  };

  // Non-secret config: read, + a NARROW owner-only write.
  config: { get(): Promise<…>; update(patch): Promise<…> };

  // Modality + execution — drives UI auto-config (e.g. STT/TTS choice).
  capabilities(): Promise<{ mode, input, output, execution }>;

  // Convenience subscriptions.
  onDelta(cb): () => void;    // streamed response.delta
  onMessage(cb): () => void;  // message.created / message.updated
  onPresent(cb): () => void;  // agent `present` payloads (from the delta stream)
}

Converse over UAMP

// One text turn, streaming the reply.
await agent.uamp.turn('Summarize my last chat', {
  onDelta: (delta) => appendToken(delta?.text ?? ''),
  onDone: () => markComplete(),
});

turn(content) builds a UAMP event sequence (session.create → input.* → response.create) and streams the server's response.delta events back. Pass an object with a UAMP type (e.g. { type: 'input.audio', audio, format }) for non-text modalities.

The transport is the canonical, cookie-authed POST /api/agents/:id/uamp — the same protocol A2A and the NLI skill speak. The bridge holds the connection on your behalf; the sandbox never sees credentials.
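
To make the session.create → input.* → response.create ordering concrete, here is a rough sketch of how such a sequence could be assembled. It is not the SDK's implementation: the function buildTurnEvents, the input.text event name, and every payload field other than type, audio, and format are assumptions for illustration.

```typescript
// Event names session.create, input.audio and response.create appear in the
// guide; `input.text` and all payload fields below are assumptions.
type AudioInput = { type: 'input.audio'; audio: string; format: string };

type UampEvent =
  | { type: 'session.create'; chatId?: string }
  | { type: 'input.text'; text: string }
  | AudioInput
  | { type: 'response.create' };

type TurnContent = string | AudioInput;

/** Build the three-event sequence for one turn, in the order it is sent. */
function buildTurnEvents(content: TurnContent, chatId?: string): UampEvent[] {
  // A plain string becomes a text input; a typed object is passed through.
  const input: UampEvent =
    typeof content === 'string' ? { type: 'input.text', text: content } : content;
  const session: UampEvent = chatId
    ? { type: 'session.create', chatId }
    : { type: 'session.create' };
  return [session, input, { type: 'response.create' }];
}
```

In practice agent.uamp.turn does this work for you; the sketch only shows why a string and a typed object are both valid content arguments.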

Render a custom chat

const { chats } = await agent.chats.list();
const { messages } = await agent.chats.get(chats[0].id, { limit: 50 });
renderHistory(messages);

agent.onDelta((ev) => streamIntoBubble(ev.delta));
agent.onPresent((p) => renderPresented(p)); // agent showed a widget/card
await agent.chats.send(chats[0].id, 'hello');

Auto-configure from capabilities

capabilities() is the UAMP capability handshake: host.agents sends a capabilities.query over the agent's UAMP channel, and the agent replies with a unified, non-secret capabilities object (its model's real modalities + identity + voice). It is not a bespoke endpoint and not a guess:

const caps = await agent.capabilities();
// caps.input  / caps.output : ['text','audio',…]  ← what the agent accepts / emits
// caps.execution            : 'cloud' | 'realtime' | 'local'
// caps.avatarUrl / caps.displayName : agent identity (render the avatar / name)
// caps.voiceId?             : a default voice
if (caps.input.includes('audio') && caps.execution === 'realtime') {
  streamAudioToAgent();              // provider does STT+TTS end-to-end
} else if (caps.input.includes('audio')) {
  sendAudioPerTurn();                // multimodal agent transcribes server-side
} else {
  sttThenSendText();                 // text agent → on-device STT
}
if (!caps.output.includes('audio')) connectTts(); // text out → speak with a TTS model

The widget picks its pipeline from what the agent actually does. (Server: the agent UAMP route answers capabilities.query via buildUampAgentCapabilities() — resolveAgent(id).getCapabilities() + identity + configured voice; the legacy voice-config is only a fallback.)
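
The same branching can be factored into a pure function, which keeps the pipeline decision testable without a live agent. This is a sketch under assumptions: planPipeline, PipelinePlan, and the pipeline labels are hypothetical names, and the capability fields are typed only as far as the example above shows them.

```typescript
// Only input, output and execution are modelled; the guide's other fields
// (mode, avatarUrl, displayName, voiceId) are omitted from this sketch.
interface AgentCapabilities {
  input: string[];
  output: string[];
  execution: 'cloud' | 'realtime' | 'local';
}

// Hypothetical labels for the three input paths in the example above.
type InputPipeline = 'stream-audio' | 'audio-per-turn' | 'stt-then-text';

interface PipelinePlan {
  input: InputPipeline;
  needsTts: boolean; // true when the agent does not emit audio itself
}

function planPipeline(caps: AgentCapabilities): PipelinePlan {
  const acceptsAudio = caps.input.indexOf('audio') !== -1;
  const emitsAudio = caps.output.indexOf('audio') !== -1;

  let input: InputPipeline;
  if (acceptsAudio && caps.execution === 'realtime') {
    input = 'stream-audio'; // provider handles speech end-to-end
  } else if (acceptsAudio) {
    input = 'audio-per-turn'; // agent transcribes server-side
  } else {
    input = 'stt-then-text'; // text-only agent: transcribe on device
  }
  return { input, needsTts: !emitsAudio };
}
```

A widget could call planPipeline(await agent.capabilities()) once at startup and wire its audio path from the returned plan.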

Edit non-secret config (owner-only)

// Allowed: ui (→ the agent's custom-UI choice), greetingMessage, suggestedActions.
await agent.config.update({ greetingMessage: 'Hey there 👋' });

config.update is a narrow allowlist, enforced server-side. It deliberately cannot change the agent's model, instructions, enabledTools, talkTo, or pricing — "non-secret" is not the same as "safe to let an in-page widget repoint the agent's brain". Those stay owner-edited through the normal settings.
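
Because the allowlist is enforced server-side, a widget does not need its own check, but a client-side pre-filter can surface a clear message before the request is made. The sketch below mirrors the three allowed keys listed above; the helper splitConfigPatch is hypothetical, and the server remains the authority on what is accepted.

```typescript
// Mirrors the allowlist stated in this guide. The server enforces the real
// one; this client-side copy is only a convenience and may drift.
const ALLOWED_CONFIG_KEYS = ['ui', 'greetingMessage', 'suggestedActions'] as const;
type AllowedKey = (typeof ALLOWED_CONFIG_KEYS)[number];

interface SplitPatch {
  allowed: Partial<Record<AllowedKey, unknown>>;
  rejected: string[];
}

/** Separate a patch into keys config.update may accept and keys it will not. */
function splitConfigPatch(patch: Record<string, unknown>): SplitPatch {
  const allowed: Partial<Record<AllowedKey, unknown>> = {};
  const rejected: string[] = [];
  const allowlist: readonly string[] = ALLOWED_CONFIG_KEYS;

  for (const key of Object.keys(patch)) {
    if (allowlist.indexOf(key) !== -1) {
      allowed[key as AllowedKey] = patch[key];
    } else {
      rejected.push(key);
    }
  }
  return { allowed, rejected };
}
```

A widget could warn when rejected is non-empty and pass only allowed to agent.config.update.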

The channel is symmetric. Beyond the UAMP response stream (which already lets the agent decide what you render), the agent can call named commands your widget declares via host.commands.handle(...), which reuses the existing WidgetCommandBus. Declare a command interface in your widget registry entry and the connected agent can call those commands (e.g. setTab, highlight) as tools. (Client-declared, LLM-native tools, WebMCP-shaped, are a Phase 1b addition.)
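
The shape of such command handlers can be sketched with a local stand-in for the command bus. This guide does not show the signature of host.commands.handle(...), so the handle(name, handler) form, the LocalCommandBus class, and the argument names tab and id below are all assumptions; treat this as an illustration of validating agent-supplied arguments, not as the host API.

```typescript
// Hypothetical stand-in for the host's command bus. The real
// host.commands.handle(...) signature is not shown in this guide.
type CommandArgs = Record<string, unknown>;
type CommandHandler = (args: CommandArgs) => unknown;

class LocalCommandBus {
  private handlers: Record<string, CommandHandler> = {};

  // Register a named command the agent may invoke.
  handle(name: string, handler: CommandHandler): void {
    this.handlers[name] = handler;
  }

  // Simulate the agent invoking a command by name.
  dispatch(name: string, args: CommandArgs = {}): unknown {
    const known = Object.prototype.hasOwnProperty.call(this.handlers, name);
    const handler = known ? this.handlers[name] : undefined;
    if (!handler) throw new Error(`Unknown command: ${name}`);
    return handler(args);
  }
}

const widgetState = { tab: 'chat', highlighted: [] as string[] };
const bus = new LocalCommandBus();

// Arguments come from the agent, so validate them before touching state.
bus.handle('setTab', (args) => {
  const tab = args['tab'];
  if (typeof tab !== 'string') throw new Error('setTab expects a string "tab"');
  widgetState.tab = tab;
  return { ok: true };
});

bus.handle('highlight', (args) => {
  const id = args['id'];
  if (typeof id !== 'string') throw new Error('highlight expects a string "id"');
  widgetState.highlighted.push(id);
  return { ok: true };
});
```

Calling bus.dispatch('setTab', { tab: 'history' }) here plays the role of the agent invoking the command.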

Trust model (how access is gated)

Owner-only (Phase 1). A custom UI resolves only for the agent's owner.

Server-side authorization is the boundary. Every host.agents op runs as the authenticated viewer over a same-origin route that authorizes ownership / chat-participation. Host-side grants (state.connectedResources) are a UX / defence-in-depth filter, never the sole gate.

Grants are written server-side at install. A sandboxed widget cannot author its own grant.

Secrets never cross. config.get returns non-secret fields only; the agent's provider keys + server tools stay server-side (ADR-v3-12).

The sandbox is the credential boundary. The iframe runs allow-scripts on an opaque origin; the bridge holds all cookies/tokens.

The choice lives on the agent config at metadata.ui = { widgetType?, url?, initialKv? }. Set it in any of three ways:

the "Agent UI" picker in the agent's settings (select a registry widget or a custom URL);

programmatically (owner) via agent.config.update({ ui: { widgetType: 'voice' } });

a config-editing agent through the factory update_agent tool's ui field.

Clear it (ui: null, or pick "Default" in the picker) to restore the built-in chat. The custom UI renders for the owner whenever metadata.ui is set; featureFlags.customUI === false is an explicit per-agent kill-switch (default on).
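
The resolution rules above (owner-only, metadata.ui set, kill-switch not engaged) can be summarised as one decision function. This is a sketch of the stated rules rather than the host's actual resolver: resolveAgentUi, the ResolvedUi type, and the placement of featureFlags alongside metadata on the same object are assumptions.

```typescript
// Field names metadata.ui, widgetType, url, initialKv and
// featureFlags.customUI come from the guide; the object layout is assumed.
interface AgentUiChoice {
  widgetType?: string;
  url?: string;
  initialKv?: Record<string, unknown>;
}

interface AgentConfigView {
  metadata?: { ui?: AgentUiChoice | null };
  featureFlags?: { customUI?: boolean };
}

type ResolvedUi =
  | { kind: 'builtin-chat' }
  | { kind: 'custom'; ui: AgentUiChoice };

function resolveAgentUi(config: AgentConfigView, viewerIsOwner: boolean): ResolvedUi {
  const ui = config.metadata ? config.metadata.ui : undefined;
  // The kill-switch is engaged only by an explicit false (default on).
  const killed = config.featureFlags ? config.featureFlags.customUI === false : false;

  if (!viewerIsOwner || killed || !ui) {
    return { kind: 'builtin-chat' };
  }
  return { kind: 'custom', ui };
}
```

The order of the checks does not matter, since any one failing condition falls back to the built-in chat.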