Voice widgets
Voice widgets consume Plan 1's host.live.attach('voice:<agentId>').
Two reference widgets ship with the platform and double as the
canonical templates for community widget authors.
voice-chat (Mode 2 — portal-relay realtime)
The Mode 2 widget connects to a portal-mediated provider session
(Gemini Live). host.live.attach returns a portal-relay transport
carrying the relay coordinates — the SDK never opens the wire for you, so
the widget opens the relay WebSocket itself and speaks the UAMP realtime
protocol over it. The provider API key never reaches the widget
(ADR-v3-12); you only ever see a short-lived relayToken.
```js
import { createBarVisualizer } from '/widgets/shared/audio-viz.js';

const agentId = new URL(location.href).searchParams.get('agentId');
await host.ready();

const attach = await host.live.attach(`voice:${agentId}`, {
  transports: ['portal-relay'],
});
const t = attach.transport; // { kind: 'portal-relay', relayUrl, relayToken }

// Append the token via URLSearchParams so it is encoded correctly whether
// or not relayUrl already carries a query string.
const relayUrl = new URL(t.relayUrl);
relayUrl.searchParams.set('token', t.relayToken);
const ws = new WebSocket(relayUrl.href);

// playWavChunk, stopPlayback and showError are the widget's own helpers (not shown).
ws.onmessage = (ev) => {
  const msg = JSON.parse(ev.data);
  if (msg.type === 'session.created') { /* ready to talk */ }
  if (msg.type === 'response.audio.delta') playWavChunk(msg.audio); // base64 WAV, 24kHz
  if (msg.type === 'response.done') { /* agent finished its turn */ }
  if (msg.type === 'response.cancelled') stopPlayback(); // barge-in acknowledged
  if (msg.type === 'session.error') showError(msg.error); // upstream/auth failure
};

// Push-to-talk: stream PCM16 mono 16kHz while held, then commit.
function onHold() {
  // For each captured mic frame:
  // ws.send(JSON.stringify({ type: 'input.audio', audio: base64Pcm16 }));
}
function onRelease() { ws.send(JSON.stringify({ type: 'input.audio_committed' })); }

// Barge-in: interrupt the agent mid-response.
function interrupt() { ws.send(JSON.stringify({ type: 'response.cancel' })); }
```

UAMP realtime protocol (widget ⇄ relay)
| Direction | Event | Meaning |
|---|---|---|
| → relay | input.audio { audio } | base64 PCM16 mono 16kHz chunk |
| → relay | input.audio_committed | end of the user's turn (push-to-talk release) |
| → relay | response.cancel | barge-in — interrupt the agent's current turn |
| ← widget | session.created | relay + provider session ready |
| ← widget | response.audio.delta { audio } | base64 WAV chunk (24kHz) — feed to decodeAudioData |
| ← widget | response.done | agent finished its turn |
| ← widget | response.cancelled | interruption acknowledged — stop local playback |
| ← widget | session.error { error } | upstream/auth failure |
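The table maps directly onto a few JSON encoders and one dispatcher. The following is a minimal sketch, assuming every frame is a JSON text frame shaped like the rows above. The helper names (inputAudio, dispatchRelayEvent, base64ToArrayBuffer) and the handler-object shape are hypothetical, not part of the SDK:

```javascript
// Hypothetical UAMP helpers; only the event names come from the protocol table.

// Widget → relay events, serialized as JSON text frames.
function inputAudio(base64Pcm16) {
  return JSON.stringify({ type: 'input.audio', audio: base64Pcm16 });
}
function inputAudioCommitted() {
  return JSON.stringify({ type: 'input.audio_committed' });
}
function responseCancel() {
  return JSON.stringify({ type: 'response.cancel' });
}

// Decode a base64 payload into the ArrayBuffer that decodeAudioData expects.
function base64ToArrayBuffer(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes.buffer;
}

// Route one relay → widget frame to a handler. Unknown event types are
// ignored so that newer relay events don't break older widgets.
function dispatchRelayEvent(data, handlers) {
  let msg;
  try {
    msg = JSON.parse(data);
  } catch {
    handlers.onError?.('malformed frame');
    return null;
  }
  switch (msg.type) {
    case 'session.created': handlers.onReady?.(); break;
    case 'response.audio.delta': handlers.onAudio?.(base64ToArrayBuffer(msg.audio)); break;
    case 'response.done': handlers.onTurnEnd?.(); break;
    case 'response.cancelled': handlers.onCancelled?.(); break;
    case 'session.error': handlers.onError?.(msg.error); break;
    default: break;
  }
  return msg.type;
}
```

In a widget this would be wired as `ws.onmessage = (ev) => dispatchRelayEvent(ev.data, handlers)`. Keeping the handlers in one object makes it easy to swap playback strategies without touching the socket code.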
The widget declares <meta name="robutler:widget" allowMic> so the iframe
sandbox is granted microphone access at mount time. Audio in is 16kHz;
audio out is 24kHz. Use a dedicated AudioContext for playback (for example,
new AudioContext({ sampleRate: 24000 })) so the capture and playback sample
rates don't conflict. See /widgets/shared/audio-viz.js for the bar
visualizer the reference widget uses.
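The push-to-talk handler in the Mode 2 example leaves mic encoding as a comment. The following is a minimal sketch of that step. It assumes captured frames arrive as Float32Array samples in [-1, 1] at the capture context's rate (commonly 48kHz). The helper names are hypothetical, and the block-averaging resampler is a deliberately simple stand-in for a proper one:

```javascript
// Hypothetical helpers, not part of the widget SDK.

// Average each block of input samples down to targetRate. Block averaging
// acts as a crude low-pass filter; a production resampler would use a better one.
function downsample(float32, inputRate, targetRate = 16000) {
  if (inputRate === targetRate) return float32;
  if (inputRate < targetRate) throw new RangeError('upsampling is not supported');
  const ratio = inputRate / targetRate;
  const out = new Float32Array(Math.floor(float32.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(float32.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += float32[j];
    out[i] = sum / (end - start);
  }
  return out;
}

// Clamp to [-1, 1] and write signed 16-bit little-endian samples.
function floatToPcm16(float32) {
  const view = new DataView(new ArrayBuffer(float32.length * 2));
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}

// btoa works on "binary strings", so map each byte to one char first.
function bytesToBase64(bytes) {
  let bin = '';
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

// One captured frame → the base64 PCM16 mono 16kHz payload for input.audio.
function encodeMicFrame(float32, inputRate) {
  return bytesToBase64(floatToPcm16(downsample(float32, inputRate)));
}
```

Inside onHold, each frame from an AudioWorklet (or similar capture path) would then go out as `ws.send(JSON.stringify({ type: 'input.audio', audio: encodeMicFrame(frame, ctx.sampleRate) }))`, where `ctx` is the capture AudioContext.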
voice-ondevice (Mode 1 — WebGPU)
```js
const agentId = new URL(location.href).searchParams.get('agentId');
await host.ready();
const attach = await host.live.attach(`voice:${agentId}`);
// attach.meta.iframeBootstrap.models === { llm, voiceId, systemPrompt }
const { llm, voiceId, systemPrompt } = attach.meta.iframeBootstrap.models;

// Dynamic import + WebGPU pipeline setup; no portal proxying.
const { pipeline } = await import('https://cdn.jsdelivr.net/npm/@huggingface/transformers');
// Request WebGPU explicitly; without `device`, transformers.js runs on the CPU (wasm) backend.
const stt = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
  device: 'webgpu',
});
const tts = await pipeline('text-to-speech', `onnx-community/Kokoro-82M-${voiceId}`, {
  device: 'webgpu',
});
```

The widget declares <meta name="robutler:widget" allowMic allowGpu>.
allowGpu triggers the CSP relaxation that permits cdn.jsdelivr.net,
*.hf.co, and huggingface.co in connect-src for widgets whose
path is in WIDGET_ALLOWGPU_SEEDED_PATHS.
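Mode 1 only works if the user's browser exposes a usable WebGPU adapter. The following is a minimal sketch, assuming you would rather degrade to transformers.js's slower 'wasm' backend than fail outright. pickDevice is a hypothetical helper; navigator.gpu.requestAdapter() is the standard WebGPU entry point and resolves to null when no adapter is available:

```javascript
// Hypothetical helper: choose a transformers.js device string for the
// Mode 1 pipelines, preferring WebGPU and falling back to 'wasm'.
async function pickDevice(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return 'wasm'; // WebGPU not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm'; // null adapter: GPU blocked or unsupported
  } catch {
    return 'wasm';
  }
}
```

The result would feed the pipeline options, e.g. `pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', { device: await pickDevice() })`. Whether a silent CPU fallback is acceptable for real-time voice is a product decision; surfacing a "WebGPU required" message is the stricter alternative.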
viseme-utils.js
/widgets/shared/viseme-utils.js is a standalone ES module — no SDK
dependency. It provides:
- VISEME_INVENTORY: the 14 visemes the mapper produces.
- mapSpectrumToViseme(spectrum, sampleRate): pure function; the primitive that maps a single FFT frame to a viseme.
- audioToVisemes(source, opts): async iterable yielding { viseme, weight, startMs, endMs }.
- applyVisemesToElement(element, source, opts): drives the element's data-viseme / data-viseme-weight attributes; combine with CSS rules like [data-viseme="aa"] .mouth { transform: scaleY(1.6); }.
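To illustrate what a "pure function from one FFT frame to a viseme" looks like, here is a toy stand-in. It is explicitly not the algorithm inside viseme-utils.js. It uses the spectral centroid as a crude proxy for mouth shape. The labels 'sil', 'oh' and 'ss' are hypothetical (only 'aa' appears above), and the thresholds are arbitrary:

```javascript
// Toy stand-in for mapSpectrumToViseme, NOT the module's algorithm.
// spectrum: linear magnitudes for bins spanning 0 Hz .. sampleRate / 2.
function toySpectrumToViseme(spectrum, sampleRate, { silence = 1e-3, loud = 0.05 } = {}) {
  const binHz = sampleRate / 2 / spectrum.length;
  let energy = 0;
  let weighted = 0;
  for (let i = 0; i < spectrum.length; i++) {
    energy += spectrum[i];
    weighted += spectrum[i] * i * binHz;
  }
  const mean = energy / spectrum.length;
  if (mean < silence) return { viseme: 'sil', weight: 0 };
  // Spectral centroid, roughly: low for open vowels, mid for rounded vowels,
  // high for sibilants. Real viseme mappers use far richer features.
  const centroid = weighted / energy;
  const viseme = centroid < 1000 ? 'aa' : centroid < 2500 ? 'oh' : 'ss';
  return { viseme, weight: Math.min(1, mean / loud) };
}
```

Because the function is pure, it can be unit-tested with synthetic spectra, as the test harness for this sketch does. Fed from an AnalyserNode, note that getFloatFrequencyData returns decibels, which would need converting to linear magnitude (10 ** (dB / 20)) first.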
Mic permissions
Plan 1's iframe-sandbox plumbing reads <meta name="robutler:widget" allowMic>
at mount and adds microphone to the sandboxed iframe's allow attribute.
If the user denies the permission prompt, navigator.mediaDevices.getUserMedia
rejects with a NotAllowedError; the widgets above surface the error inline
rather than crashing.
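A minimal sketch of that inline-error pattern follows. It assumes the caller renders whatever string comes back. openMic and the message texts are hypothetical; NotAllowedError, NotFoundError and NotReadableError are standard getUserMedia rejection names:

```javascript
// Hypothetical helper: request the mic and turn any rejection into a
// user-facing message instead of an uncaught error.
const MIC_ERROR_MESSAGES = {
  NotAllowedError: 'Microphone access was denied. Allow it in your browser to talk.',
  NotFoundError: 'No microphone was found.',
  NotReadableError: 'The microphone is in use by another application.',
};

async function openMic(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices?.getUserMedia) {
    return { stream: null, error: 'This browser cannot capture audio.' };
  }
  try {
    const stream = await mediaDevices.getUserMedia({
      audio: { channelCount: 1, echoCancellation: true },
    });
    return { stream, error: null };
  } catch (err) {
    const known = MIC_ERROR_MESSAGES[err?.name];
    return { stream: null, error: known ?? `Microphone error: ${err?.message ?? err}` };
  }
}
```

A widget would then do something like `const { stream, error } = await openMic(); if (error) statusEl.textContent = error;`, where `statusEl` is a hypothetical status element. Returning a result object rather than throwing keeps the push-to-talk button handler free of try/catch.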