Authoring collaborative widgets with `host.collab`

A community-author's guide to building multiplayer widgets that live on a Robutler canvas. Companion docs: host-collab.md (SDK reference) and multiplayer-tictactoe-walkthrough.md (tutorial). All three sit on top of the v3 Plan 1 platform.

This guide walks through the eight things you need to know to ship a real multiplayer widget against Robutler's host.collab primitive. Two reference widgets ship in the box — multiplayer-tictactoe and multi-party-rtc — and the walkthroughs in this guide refer back to those files line by line. If you're building anything that two or more people poke at the same time, start by copy-pasting one of those.

1. The collab widget skeleton

A Robutler widget is a single HTML file that lives at public/widgets/<your-widget-id>/index.html. The portal serves it inside a sandboxed iframe (CSP and Permissions-Policy applied per the registry entry). There is no bundler, no framework, no node_modules: every dependency you need is either inlined or pulled at runtime from a CDN listed in your widget's CSP carve-out.

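A collab widget has three structural pieces: the robutler:widget meta block, the host SDK script (/widgets/sdk.v2.js), and an inline module script that joins the collab room. A minimal version: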

```html
<!doctype html>
<html>
  <head>
    <meta name="robutler:widget" content='{"name":"my-widget", ...}' />
    <script src="/widgets/sdk.v2.js"></script>
    <style>/* inline CSS — external sheets blocked by base CSP */</style>
  </head>
  <body>
    <div id="root"><!-- your UI --></div>
    <script type="module">
      // 1) wait for the bridge
      await window.host.ready();

      // 2) mint a collab JWT for the workspace room
      const tok = await window.host.collab.getToken(
        'workspace',
        window.host.workspace.workspaceId,
      );

      // 3) load Yjs + Hocuspocus from the CDN whitelisted in CSP
      const Y = await import('https://cdn.jsdelivr.net/npm/yjs@13.6.30/+esm');
      const { HocuspocusProvider } = await import(
        'https://cdn.jsdelivr.net/npm/@hocuspocus/provider@3.1.4/+esm'
      );

      // 4) join the room
      const ydoc = new Y.Doc();
      const provider = new HocuspocusProvider({
        url: tok.wsUrl,
        name: tok.roomId,
        token: tok.token,
        document: ydoc,
      });

      // 5) … do real work …
    </script>
  </body>
</html>
```

That's the entire skeleton. The same five steps appear in every collab widget; the rest of this guide is about the data-modeling and correctness decisions you make on top of that skeleton.

Why dynamic ESM import? The widget is a single HTML file with no build step, so there is no bundler to resolve npm packages. Instead, Yjs + Hocuspocus are imported as ES modules at runtime from cdn.jsdelivr.net, which is whitelisted in the per-widget CSP carve-out. You don't have to ship them yourself.

Why pure HTML? The widget bundle is a static asset served by the portal's Next.js host. No build step, no transpile, no chance of "my deploy broke production". A community-published widget is literally one HTML file you upload.

Failure modes you should handle

host.collab.getToken rejects with 503 if the collab pod is not enabled in the current environment (kill-switch, or Plan 1 hasn't shipped yet). Show a friendly notice; don't crash.

The dynamic import can fail (CDN outage, offline). Same advice.

provider.on('synced', …) may fire seconds late on cold rooms. Render a "Connecting…" state until then.
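One way to keep these three failure modes from crashing the widget is to tag each boot stage and turn any failure into a status line. The sketch below assumes that a failed getToken rejects with an object carrying an HTTP-style `status` field. Check what host.collab actually rejects with before relying on that. `startupNotice`, `showStatus` and the stage names are hypothetical.

```javascript
// Hypothetical helper: map a failed boot stage to a friendly notice.
// Stages mirror the skeleton: 'token' (host.collab.getToken),
// 'import' (dynamic ESM import from the CDN) and 'sync' (waiting for 'synced').
// Assumption: a getToken rejection exposes an HTTP-like `status` field.
function startupNotice(stage, error) {
  switch (stage) {
    case 'token':
      return error && error.status === 503
        ? 'Multiplayer is not enabled in this environment yet.'
        : 'Could not join the shared room. Try reloading.';
    case 'import':
      return 'Could not load the collaboration libraries. Check your connection and reload.';
    case 'sync':
      return 'Connecting…';
    default:
      return 'Something went wrong while starting this widget.';
  }
}

// Wiring sketch (browser only, not run here; showStatus is hypothetical):
//   let stage = 'token';
//   try {
//     const tok = await window.host.collab.getToken('workspace', wsId);
//     stage = 'import';
//     const Y = await import('https://cdn.jsdelivr.net/npm/yjs@13.6.30/+esm');
//     // ... create ydoc + provider as in the skeleton ...
//     showStatus(startupNotice('sync'));
//     provider.on('synced', () => showStatus(''));
//   } catch (err) {
//     showStatus(startupNotice(stage, err));
//   }
```

Keeping the stage in a variable means one `catch` can still tell a disabled collab pod apart from a CDN outage.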

2. Picking your Yjs data structure

host.collab rooms expose a Y.Doc. You allocate top-level shared types on that doc — getMap('foo'), getArray('bar'), getText('baz'), or nest subdocs. Picking the right one is the single most consequential decision you make.

| Shape | When to use | CRDT semantics |
| --- | --- | --- |
| `Y.Map<K, V>` | Bounded set of keys with last-write-wins per key: game state, settings, slot assignments, tile occupants. | Concurrent writes to different keys merge. Concurrent writes to the same key keep exactly one value, chosen deterministically so every peer agrees (Yjs breaks ties between truly concurrent writes by client ID, not by time). |
| `Y.Array<T>` | Append-mostly logs: chat history, drawing strokes, event timeline. | Concurrent inserts at the same index by different peers are all preserved, in an order every peer agrees on. Deletes are tombstoned. |
| `Y.Text` | Rich-text or code editor buffers. | Character-level sequence CRDT (not operational transform): concurrent edits interleave without losing characters, which makes it ideal for prose. |
| Subdoc (`new Y.Doc()` placed in a parent Map) | Sharded large collections: one subdoc per item, loaded lazily. | Each subdoc is persisted independently; the parent map only references them. |

Eventual consistency: what it means in practice

All Yjs operations are eventually consistent. That has three practical implications you should bake into your widget's UX:

No global ordering. "Player X moved before Player Y" is true only on the peer that observed both writes. Don't build logic that requires a total order — use additive game state where "move sequence" is irrelevant (each cell click is independent).

Last write per key. If two peers write game.set('nextPlayer', 'X') concurrently, exactly one write survives, and every peer agrees which one. A peer whose write lost may render its own value for one tick before the winner arrives. That is fine for game state, because both writes were trying to express the same thing.

No transactions across peers. Yjs transactions (ydoc.transact(...)) bundle writes for observation atomicity within one peer — observers see one consistent snapshot. They do NOT prevent another peer from writing in between. For genuine cross-peer serialization you need a different primitive (an agent skill call, for example).
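Because there is no global order, derive outcomes from the converged state rather than from the order in which moves arrived. Here is a minimal sketch. It assumes the board is a 9-element array of 'X', 'O' or null, as in the tic-tac-toe example. `winnerOf` is a hypothetical helper, not code from the reference widget.

```javascript
// Every winning line on a 3x3 board stored row-major in a 9-element array.
const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

// Derive the outcome purely from the current board. Any peer holding the same
// set of moves computes the same answer, whatever order the moves arrived in.
function winnerOf(board) {
  for (const [a, b, c] of LINES) {
    if (board[a] && board[a] === board[b] && board[a] === board[c]) return board[a];
  }
  return board.every((cell) => cell !== null) ? 'draw' : null;
}
```

One consequence of this design is that "who won" never needs a key of its own for peers to race on. Every peer computes it locally from the board.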

3. Awareness vs persisted state

Yjs has two completely different stores per room:

|  | `ydoc` (persisted) | `awareness` (ephemeral) |
| --- | --- | --- |
| Survives all peers leaving? | Yes | No |
| Survives one peer reloading? | Yes | No (that peer's state vanishes and comes back blank) |
| Latency | Sub-second (CRDT update batch) | Sub-second (broadcast) |
| Quota | Plan 1 hard cap on doc bytes | Plan 1 alert at 16 KB p99 per write |
| Right answer for | Game state, document content, settings | Cursors, "typing now", selection highlight, WebRTC signaling |

Heuristic: if losing the data on a reload would matter, put it in ydoc. If losing it on reload is expected (cursor goes away), put it in awareness.

Reserved awareness namespaces

Plan 1 server-side enforcement reserves these awareness keys for specific producers. Writing them from outside the reserved producer is rejected by Hocuspocus:

user.* — populated by Plan 1 from the JWT; you read this.

presence.* — your widget can write this (cursor, hover, selection).

comment.* — reserved for the canvas comment widget.

webrtc.* — reserved for the multi-party RTC pattern (§5).

Use presence.* for everything that doesn't have a more specific reserved namespace. The reference tic-tac-toe widget puts cursor + hover + slot-claim intent under presence.* — all three are ephemeral and tied to a specific peer's intent.
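setLocalStateField replaces the whole field, so a partial update has to spread the previous value, as the snippets in §4 and §5 do. A small helper can do that. It can also refuse the reserved namespaces on the client, so a mistake surfaces as a local error instead of a server-side rejection. This is a sketch: `mergeAwarenessField` and the `RESERVED` set are hypothetical. The set simply mirrors the list above, and the server remains the real enforcer.

```javascript
// Hypothetical client-side mirror of the server's reserved awareness namespaces.
// `webrtc` should only be written by code following the multi-party RTC pattern.
const RESERVED = new Set(['user', 'comment', 'webrtc']);

// Merge `patch` into one top-level awareness field without clobbering its other keys.
// setLocalStateField replaces the field wholesale, so spread the previous value first.
function mergeAwarenessField(awareness, field, patch, { allowReserved = false } = {}) {
  if (RESERVED.has(field) && !allowReserved) {
    throw new Error(`awareness field "${field}" is reserved; use "presence" instead`);
  }
  const prev = (awareness.getLocalState() || {})[field] || {};
  const next = { ...prev, ...patch };
  awareness.setLocalStateField(field, next);
  return next;
}
```

In a widget you would call it as `mergeAwarenessField(provider.awareness, 'presence', { hover: 4 })`. Any cursor or selection already stored under `presence` stays in place.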

4. Conflict-free patterns

The reference widgets demonstrate three patterns worth memorizing.

Additive ops over destructive ops

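Cell clicks in tic-tac-toe are additive: peer A writes board[3] = 'X', peer B writes board[6] = 'O', and both succeed, so the final board has both moves. That holds only if each cell is its own CRDT entry (for example a Y.Map keyed by cell index). If the whole board is stored as a single plain array under one key, two concurrent moves each replace the array and one of them is lost. Compare a hypothetical "rotate the board 90°" operation: that's a global mutation, and two concurrent rotations would each rewrite the board from their own snapshot, leaving a result neither peer intended. Avoid global mutations except when documented as destructive (see the drawing widget's "Clear canvas" button: explicitly destructive, explicitly documented, and triggered only by a deliberate button press).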

Idempotent transactions

Inside ydoc.transact(...), write what should be true rather than what should change:

```js
const game = ydoc.getMap('game');

ydoc.transact(() => {
  if (!game.has('board')) game.set('board', new Array(9).fill(null));
  if (!game.has('nextPlayer')) game.set('nextPlayer', 'X');
});
```

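Multiple peers running this on first join converge to the same state, because each of them writes the same defaults. There is no "first peer initializes, others read" race. Run it only after the provider fires 'synced', though. A peer that has not yet received the room's existing state sees has('board') as false, and its concurrent default write can win that key and wipe a game in progress.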
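The same idea generalizes to a small helper that only fills in missing keys. Y.Map and the built-in Map both expose has() and set(), so the sketch can be exercised in plain Node with a Map. In the widget you would call it from the provider's 'synced' handler, passing ydoc.getMap('game') and `(fn) => ydoc.transact(fn)`. `ensureDefaults` is a hypothetical name.

```javascript
// Write each default only if its key is absent: "what should be true", not "what should change".
// Works on anything with has()/set(): a Y.Map in the widget, a Map in tests.
// `transact` batches the writes; pass (fn) => ydoc.transact(fn) when using Yjs.
function ensureDefaults(map, defaults, transact = (fn) => fn()) {
  const written = [];
  transact(() => {
    for (const [key, value] of Object.entries(defaults)) {
      if (!map.has(key)) {
        map.set(key, value);
        written.push(key);
      }
    }
  });
  return written; // keys this call actually initialized
}
```

Running it a second time writes nothing, which is exactly the property that makes it safe for every peer to call.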

Intent → claim → confirm (slot races)

When two peers race to fill a single slot, write the intent to awareness first, then write the claim to ydoc:

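```js
// `me` is this peer's player identity (not defined in this snippet).
const prev = provider.awareness.getLocalState()?.presence || {};

// 1) Stake intent: visible to all peers immediately.
provider.awareness.setLocalStateField('presence', {
  ...prev,
  claimingSlot: 'X',
});

// 2) Wait a frame so a tying peer can see our intent.
setTimeout(() => {
  // 3) Commit. If two peers still commit concurrently, the CRDT keeps exactly one write.
  // Note: playerSlots is one plain object under one key, so concurrent claims on
  // different slots (X and O) collide too; a nested Y.Map keyed by slot avoids that.
  const cur = game.get('playerSlots') || {};
  if (cur.X) return; // someone already there; fall back to spectator
  ydoc.transact(() => game.set('playerSlots', { ...cur, X: me }));
}, 16);
```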

The intent stage reduces — but does not eliminate — collisions. CRDT last-write-wins is the fallback; the loser sees the slot go to the other peer and re-renders as spectator.
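The intent stage only helps if the committing peer actually looks at the other peers' intents before writing. One possible tie-break is sketched below: yield whenever another peer with a lower awareness clientID is claiming the same slot. The rule and the helper name are assumptions, not what the reference widget does. `states` has the shape of provider.awareness.getStates(), a Map from clientID to that peer's state.

```javascript
// Decide, after the one-frame wait, whether to back off from claiming `slot`.
// Hypothetical rule: among peers claiming the same slot, the lowest clientID keeps going.
function shouldYieldClaim(states, myClientId, slot) {
  for (const [clientId, state] of states) {
    if (clientId === myClientId) continue;
    const claiming = state && state.presence && state.presence.claimingSlot;
    if (claiming === slot && clientId < myClientId) return true;
  }
  return false;
}
```

In the snippet above, this check would sit at the top of the setTimeout callback, before reading playerSlots. It mirrors the polite-peer rule in §5, where the larger clientID backs off. CRDT convergence still covers the case where two peers miss each other's intent.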

5. Multi-party WebRTC pattern

This is the canonical recipe for any N↔N peer-to-peer media use case. The reference is multi-party-rtc/index.html; copy-paste it whenever you need shared audio, video, or screen streams.

Mesh topology

Each peer in the room maintains one RTCPeerConnection per other peer. Per peer, upload bandwidth and CPU grow as O(N): N − 1 outgoing streams to encode and send, and N − 1 incoming streams to decode. Across the whole room, traffic grows as O(N²). The ceiling is 8 peers (ADR-v3-19); above that, the widget shows a "Maximum 8 participants" banner and the 9th joiner stays as a spectator.
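To make the scaling concrete, here is a quick calculation for a full mesh of n peers. It assumes each pair exchanges one media stream in each direction, with no server-side forwarding.

```javascript
// Connection and stream counts for a full mesh of n peers,
// assuming each pair exchanges one media stream in each direction.
function meshCost(n) {
  if (!Number.isInteger(n) || n < 1) throw new RangeError('n must be a positive integer');
  return {
    connectionsPerPeer: n - 1,           // one RTCPeerConnection per other peer
    totalConnections: (n * (n - 1)) / 2, // one per unordered pair
    uploadsPerPeer: n - 1,               // each peer sends its stream to everyone else
    totalStreams: n * (n - 1),           // every ordered pair carries a stream
  };
}
```

At the 8-peer cap that is 28 connections across the room and 7 uploads per peer. A 9th peer would add 8 more connections and push every peer to 8 uploads.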

Signaling via the `webrtc.*` awareness namespace

The widget never opens its own signaling websocket — it rides the collab room's awareness layer:

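```js
// Send an offer to peer B. `remotePeerId` is B's awareness clientID; `sdp`,
// `mySeq` and `myClientId` (provider.awareness.clientID) are this peer's local state.
const prev = provider.awareness.getLocalState()?.webrtc || {};
mySeq += 1;
provider.awareness.setLocalStateField('webrtc', {
  ...prev,
  [remotePeerId]: { type: 'offer', sdp, seq: mySeq, from: myClientId },
});
```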

Peer B observes the awareness change and reads the entry A addressed to it: A's state.webrtc[String(B's own clientID)]. It dedupes against its own lastSeen[A], then proceeds with the standard offer/answer dance.
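The receiving side can be written as a pure function over provider.awareness.getStates(). The sketch assumes the message shape from the snippet above ({ type, sdp, seq, from }) and a `lastSeen` object keyed by sender clientID. `collectSignals` is a hypothetical name.

```javascript
// Return the signaling messages addressed to me that I have not handled yet,
// recording their seq numbers so a re-broadcast awareness state is not replayed.
// `states` is provider.awareness.getStates(): Map<clientID, state>.
function collectSignals(states, myClientId, lastSeen) {
  const fresh = [];
  for (const [clientId, state] of states) {
    if (clientId === myClientId) continue;
    // Awareness state is JSON, so the recipient key is the stringified clientID.
    const msg = state && state.webrtc && state.webrtc[String(myClientId)];
    if (!msg || msg.seq <= (lastSeen[clientId] ?? 0)) continue;
    lastSeen[clientId] = msg.seq;
    fresh.push(msg);
  }
  return fresh;
}
```

Call it from the awareness 'change' handler. That event also fires when a peer only moves its cursor or updates presence, and the sender's old webrtc entry is still attached each time. The seq check is what keeps one offer from being applied twice.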

Polite-peer rule

The peer with the larger clientID is "polite" and backs off on glare; the peer with the smaller clientID is impolite and proceeds. This is the W3C perfect-negotiation pattern — no bespoke handshake required.
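In code, the rule comes down to two small decisions: who is polite, and whether an incoming offer collides with one we are making. The sketch follows the shape of the perfect-negotiation example in the WebRTC specification and on MDN. The function names are hypothetical, and the RTCPeerConnection calls are only described in comments.

```javascript
// Polite-peer rule from this guide: the larger clientID is polite.
function isPolite(myClientId, remoteClientId) {
  return myClientId > remoteClientId;
}

// Glare handling from the perfect-negotiation pattern.
// makingOffer: true while we are between creating our offer and sending it.
// signalingState: pc.signalingState at the moment the remote description arrives.
function handleRemoteDescription({ type, polite, makingOffer, signalingState }) {
  const offerCollision = type === 'offer' && (makingOffer || signalingState !== 'stable');
  // The impolite peer ignores a colliding offer and keeps its own. The polite peer
  // accepts it; setRemoteDescription(offer) then rolls back its own offer implicitly.
  return offerCollision && !polite ? 'ignore' : 'accept';
}
```

When an offer is accepted, the peer calls setRemoteDescription and then setLocalDescription() to produce an answer. It sends that answer back through its webrtc.* field.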

TURN config from the JWT

The collab JWT embeds tok.turn = { url, username, credential, expiresAt }. Build your ICE config in one shot at startup; no separate REST round-trip needed:

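```js
// STUN for address discovery; TURN (from the token) relays media when no direct path works.
const iceServers = [{ urls: 'stun:stun.l.google.com:19302' }];
if (tok.turn?.url) {
  iceServers.push({
    urls: tok.turn.url,
    username: tok.turn.username,
    credential: tok.turn.credential,
  });
}
// Create one of these per remote peer, reusing the same iceServers.
const pc = new RTCPeerConnection({ iceServers, bundlePolicy: 'max-bundle' });
```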

Cleanup on peer leave

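When a peer's awareness state disappears (they navigated away, closed the tab, or lost network), tear down their RTCPeerConnection. A peer that drops off the network without disconnecting cleanly is only removed once its awareness state times out (30 seconds by default in y-protocols), so expect a short delay in that case: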

```js
// `pcs` is a Map<clientID, RTCPeerConnection> of live connections.
provider.awareness.on('change', () => {
  const seen = new Set(provider.awareness.getStates().keys()); // clientIDs still present
  for (const cid of pcs.keys()) {
    if (!seen.has(cid)) tearDownPeer(cid); // close pc, remove tile
  }
});
```

6. Permissions + meta flags

The portal applies a strict Permissions-Policy and CSP to every widget iframe by default. To unlock browser capabilities your widget needs, declare them in the <meta name="robutler:widget" ...> block:

| Flag | Unlocks | Use when |
| --- | --- | --- |
| `allowMic` | `Permissions-Policy: microphone` on the iframe | You call `getUserMedia({ audio: true })` |
| `allowCamera` | `Permissions-Policy: camera` | You call `getUserMedia({ video: true })` |
| `allowScreen` | `Permissions-Policy: display-capture` | You call `getDisplayMedia()` |
| `allowGpu` | `Permissions-Policy: webgpu` + `connect-src` carve-out for first-party model CDNs | On-device foundation models (seeded widgets only; see anti-patterns) |

Example:

```html
<meta name="robutler:widget" content='{ ..., "permissions": ["allowMic","allowCamera","allowScreen"] }' />
```

The browser may still prompt the user for explicit consent on first use — that's intentional, not a bug.
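The user may decline that prompt, or a missing flag may mean the iframe's Permissions-Policy blocks the feature. In either case getUserMedia / getDisplayMedia reject with a DOMException whose name tells you what happened. The sketch below turns the common names into a message. The wording and the helper name are hypothetical.

```javascript
// Turn a getUserMedia/getDisplayMedia rejection into something a user can act on.
// Only the error's `name` is inspected, so plain objects work in tests too.
function describeMediaError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      // User said no, or the iframe lacks the permission (check allowMic/allowCamera/allowScreen).
      return 'Permission denied. Allow access in the browser prompt or site settings.';
    case 'NotFoundError':
      return 'No matching microphone or camera was found.';
    case 'NotReadableError':
      return 'The device is busy or unavailable (another app may be using it).';
    case 'OverconstrainedError':
      return 'No device satisfies the requested settings.';
    default:
      return 'Could not start media capture.';
  }
}
```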

CSP carve-outs

If your widget loads dependencies from a CDN (Yjs, foundation-model weights, etc.) or talks to a non-portal websocket, declare those in your registry entry's csp field:

```ts
'my-widget': {
  kind: 'iframe',
  entry: '/widgets/my-widget/index.html',
  csp: {
    connectSrc: ['https://cdn.jsdelivr.net', 'wss://collab.robutler.local'],
  },
},
```

The composer in lib/workspaces/widget-csp.ts merges your carve-out with the strict baseline.

7. Testing

Fake Yjs doc for unit tests

Yjs runs identically in Node and in the browser. For per-widget unit tests:

```ts
import * as Y from 'yjs';
import { expect, test } from 'vitest';

test('tic-tac-toe slot claim race', () => {
  const docA = new Y.Doc();
  const docB = new Y.Doc();
  const gameA = docA.getMap('game');
  const gameB = docB.getMap('game');

  // Race: both peers claim X before either has seen the other's write.
  docA.transact(() => gameA.set('playerSlots', { X: 'A' }));
  docB.transact(() => gameB.set('playerSlots', { X: 'B' }));

  // Sync in both directions.
  Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA));
  Y.applyUpdate(docA, Y.encodeStateAsUpdate(docB));

  // Both peers converge on the same single winner (A or B). The winner is
  // deterministic for a given pair of client IDs, but the IDs are random per Doc.
  expect(gameA.get('playerSlots')).toEqual(gameB.get('playerSlots'));
  expect(['A', 'B']).toContain(gameA.get('playerSlots').X);
});
```

Multi-peer e2e patterns

Playwright with multiple browser contexts is the right tool for multi-peer e2e:

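```ts
// Inside test(..., async ({ browser }) => { ... }): two contexts act as two isolated peers.
const a = await browser.newContext();
const b = await browser.newContext();
const [pageA, pageB] = await Promise.all([a.newPage(), b.newPage()]);
// Relative URLs resolve against `baseURL` from playwright.config.
await Promise.all([
  pageA.goto(`/workspace/${ws}#tictactoe`),
  pageB.goto(`/workspace/${ws}#tictactoe`),
]);
// Drive peer A's clicks, assert peer B's board updates.
```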
// Drive peer A's clicks, assert peer B's board updates.

The reference tests live in tests/e2e/widgets/.

8. Anti-patterns

Specific things not to do:

Don't store secrets in awareness. Awareness broadcasts to every workspace member. API keys, OAuth tokens, anything you'd put in an env var: none of it belongs here. Use host.kv (per-widget-instance, workspace-member-only) or skip storage entirely.

Don't use awareness for durable state. A peer's awareness state vanishes when that peer reloads or leaves, and nothing is left once the last peer is gone. If you want it to survive a reload, it goes in ydoc. Period.

Don't bypass reserved namespaces. Writing user.*, comment.*, or webrtc.* (outside the multi-party-rtc pattern) is rejected server-side. Use presence.* for ephemeral peer state.

Don't ship community widgets with allowGpu. The CSP relaxation for GPU/on-device-model CDNs requires the widget's path to be in the WIDGET_ALLOWGPU_SEEDED_PATHS env var, per ADR-v3-07. That's a deliberate first-party-only carve-out. Community widgets get the Permissions-Policy bit (so WebGPU works in principle) but no CDN connect-src — you can't fetch model weights from outside the whitelisted origins.

Don't poll provider.awareness.getStates() on a timer. Subscribe to the change event instead. Polling burns CPU and introduces UI jitter.

Don't run an AnalyserNode per peer at the highest fftSize. 256 is plenty for active-speaker detection; higher values just cost battery on mobile.

Don't open a separate websocket for signaling. The collab room is already a signaling channel. Reuse it via the reserved webrtc.* namespace (or presence.* for non-RTC signals).

Don't assume a global clock. Yjs has no global time, only a logical counter per client. If you need "newer wins by wall clock", store Date.now() in the value and compare on the consumer side, keeping in mind that device clocks can drift.
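Here is a sketch of that consumer-side comparison. It assumes each value is stored as { value, at, by }, where `at` is the writer's Date.now() and `by` is its clientID. The clientID is used only to break exact ties, so every peer picks the same winner. The entry shape, `newerByWallClock` and the `status` map in the comments are hypothetical.

```javascript
// Pick the newer of two wall-clock-stamped entries; a missing entry loses.
// Ties on `at` fall back to the larger clientID so all peers agree.
function newerByWallClock(a, b) {
  if (!a) return b ?? null;
  if (!b) return a;
  if (a.at !== b.at) return a.at > b.at ? a : b;
  return a.by >= b.by ? a : b;
}

// Usage sketch (widget code, not run here): each peer writes under its own key of a
// hypothetical Y.Map `status`, so no write is dropped, and readers reduce to the newest.
//   status.set(String(ydoc.clientID), { value: 'away', at: Date.now(), by: ydoc.clientID });
//   const latest = [...status.values()].reduce(newerByWallClock, null);
```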