
Voice widgets

Voice widgets consume Plan 1's host.live.attach('voice:<agentId>'). Two reference widgets ship with the platform and double as the canonical templates for community widget authors.

voice-chat (Mode 2 — portal-relay realtime)

The Mode 2 widget connects to a portal-mediated provider session (Gemini Live). host.live.attach returns a portal-relay transport carrying the relay coordinates — the SDK never opens the wire for you, so the widget opens the relay WebSocket itself and speaks the UAMP realtime protocol over it. The provider API key never reaches the widget (ADR-v3-12); you only ever see a short-lived relayToken.

import { createBarVisualizer } from '/widgets/shared/audio-viz.js';

const agentId = new URL(location.href).searchParams.get('agentId');
await host.ready();

const attach = await host.live.attach(`voice:${agentId}`, {
  transports: ['portal-relay'],
});
const t = attach.transport; // { kind:'portal-relay', relayUrl, relayToken }

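// Append the token through the URL API so the query string is well-formed
// whether or not relayUrl already carries parameters.
const relay = new URL(t.relayUrl);
relay.searchParams.set('token', t.relayToken);
const ws = new WebSocket(relay.toString());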

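// playWavChunk and stopPlayback are the widget's own playback helpers (not shown here).
ws.onmessage = (ev) => {
  const msg = JSON.parse(ev.data);
  if (msg.type === 'session.created') { /* ready to talk */ }
  if (msg.type === 'response.audio.delta') playWavChunk(msg.audio); // base64 WAV, 24kHz
  if (msg.type === 'response.done') { /* agent finished its turn */ }
  if (msg.type === 'response.cancelled') stopPlayback(); // barge-in acknowledged
  if (msg.type === 'session.error') console.error('relay session error', msg.error); // upstream/auth failure
};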

// Push-to-talk: stream PCM16 mono 16kHz while held, then commit.
function onHold() { /* for each mic chunk: ws.send(JSON.stringify({ type: 'input.audio', audio: base64Pcm16 })) */ }
function onRelease() { ws.send(JSON.stringify({ type: 'input.audio_committed' })); }

// Barge-in: interrupt the agent mid-response.
function interrupt() { ws.send(JSON.stringify({ type: 'response.cancel' })); }

UAMP realtime protocol (widget ⇄ relay)

| Direction | Event | Meaning |
| --- | --- | --- |
| → relay | input.audio { audio } | base64 PCM16 mono 16kHz chunk |
| → relay | input.audio_committed | end of the user's turn (push-to-talk release) |
| → relay | response.cancel | barge-in: interrupt the agent's current turn |
| ← widget | session.created | relay + provider session ready |
| ← widget | response.audio.delta { audio } | base64 WAV chunk (24kHz); feed to decodeAudioData |
| ← widget | response.done | agent finished its turn |
| ← widget | response.cancelled | interruption acknowledged; stop local playback |
| ← widget | session.error { error } | upstream/auth failure |
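
The `input.audio` payload can be produced from raw microphone samples roughly as follows. This is a minimal sketch, not the reference widget's code: the helper names `floatToPcm16`, `bytesToBase64`, and `inputAudioMessage` are hypothetical, and it assumes the samples are already mono at 16kHz (resampling is not shown).

```javascript
// Hypothetical helpers; names are illustrative and not part of the Robutler SDK.

// Convert Float32 samples in [-1, 1] to little-endian PCM16 bytes.
function floatToPcm16(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}

// Base64-encode a byte array using the platform's btoa.
function bytesToBase64(bytes) {
  let bin = '';
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

// Build the JSON text frame for one chunk (assumes mono 16kHz input).
function inputAudioMessage(samples) {
  return JSON.stringify({
    type: 'input.audio',
    audio: bytesToBase64(floatToPcm16(samples)),
  });
}
```

While the push-to-talk button is held, each captured chunk would be sent with `ws.send(inputAudioMessage(chunk))`.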

The widget declares <meta name="robutler:widget" allowMic> so the iframe sandbox is granted microphone at mount time. Audio in is 16kHz and audio out is 24kHz. An AudioContext runs at a single sample rate, so use a dedicated AudioContext for playback rather than sharing the capture context. See /widgets/shared/audio-viz.js for the bar visualizer the reference widget uses.
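
One way to play the `response.audio.delta` chunks gaplessly on a dedicated context is sketched below. It is an illustration under assumptions, not the reference widget's implementation: `base64ToArrayBuffer` and `createChunkPlayer` are hypothetical names, and each chunk is assumed to be a complete WAV file that `decodeAudioData` can decode on its own.

```javascript
// Hypothetical helpers; not part of the Robutler SDK.

// Decode a base64 string into an ArrayBuffer.
function base64ToArrayBuffer(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes.buffer;
}

// Schedule decoded chunks back to back on the given AudioContext.
function createChunkPlayer(ctx) {
  let nextStart = 0;             // context time at which the next chunk starts
  let epoch = 0;                 // bumped by stop() so queued chunks are dropped
  let chain = Promise.resolve(); // serialises decoding to keep chunks in order
  const live = new Set();

  async function schedule(b64, myEpoch) {
    const buffer = await ctx.decodeAudioData(base64ToArrayBuffer(b64));
    if (myEpoch !== epoch) return; // playback was stopped while decoding
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(ctx.destination);
    nextStart = Math.max(nextStart, ctx.currentTime);
    src.start(nextStart);
    nextStart += buffer.duration;
    live.add(src);
    src.onended = () => live.delete(src);
  }

  return {
    play(b64) {
      const myEpoch = epoch;
      chain = chain.catch(() => {}).then(() => schedule(b64, myEpoch));
      return chain;
    },
    stop() {
      epoch += 1;
      for (const src of live) src.stop();
      live.clear();
      nextStart = 0;
    },
  };
}
```

In a widget this could back the `playWavChunk` and `stopPlayback` helpers, for example `const player = createChunkPlayer(new AudioContext({ sampleRate: 24000 }))`, then `player.play(msg.audio)` on each delta and `player.stop()` on `response.cancelled`.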

voice-ondevice (Mode 1 — WebGPU)

const agentId = new URL(location.href).searchParams.get('agentId');
await host.ready();
const attach = await host.live.attach(`voice:${agentId}`);
// attach.meta.iframeBootstrap.models === { llm, voiceId, systemPrompt }
const { llm, voiceId, systemPrompt } = attach.meta.iframeBootstrap.models;

// Dynamic import + WebGPU pipeline setup; no portal proxying.
const { pipeline } = await import('https://cdn.jsdelivr.net/npm/@huggingface/transformers');
const stt = await pipeline('automatic-speech-recognition', 'onnx-community/whisper-small');
const tts = await pipeline('text-to-speech', `onnx-community/Kokoro-82M-${voiceId}`);

The widget declares <meta name="robutler:widget" allowMic allowGpu>. allowGpu triggers the CSP relaxation that permits cdn.jsdelivr.net, *.hf.co, and huggingface.co in connect-src for widgets whose path is in WIDGET_ALLOWGPU_SEEDED_PATHS.
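
Because Mode 1 depends on WebGPU being usable inside the iframe, a widget may want to probe for it before downloading any models. The source does not say how the reference widget handles this; the sketch below uses the standard `navigator.gpu.requestAdapter()` call, and the function name `hasWebGpu` is hypothetical.

```javascript
// Hypothetical helper; not part of the Robutler SDK.
// Resolves to true only when a WebGPU adapter can actually be obtained.
async function hasWebGpu(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false; // API not exposed in this context
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "not available"
  }
}
```

A widget could call `await hasWebGpu()` right after `host.ready()` and show an inline message instead of starting the model download when it resolves to `false`.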

viseme-utils.js

/widgets/shared/viseme-utils.js is a standalone ES module — no SDK dependency. It provides:

  • VISEME_INVENTORY — the 14 visemes the mapper produces.
  • mapSpectrumToViseme(spectrum, sampleRate) — pure function; the primitive that maps a single FFT frame to a viseme.
  • audioToVisemes(source, opts) — async iterable yielding { viseme, weight, startMs, endMs }.
  • applyVisemesToElement(element, source, opts) — drives the element's data-viseme / data-viseme-weight attributes; combine with CSS rules like [data-viseme="aa"] .mouth { transform: scaleY(1.6); }.
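
To make the shape of the viseme stream concrete, here is a hand-rolled stand-in for what `applyVisemesToElement` is described as doing: it consumes any async iterable of `{ viseme, weight, startMs, endMs }` events and mirrors them onto the element's attributes. This is a sketch, not the module's code; the name `driveMouth`, the two-decimal weight format, and the attribute cleanup at the end are assumptions.

```javascript
// Hypothetical stand-in; the real applyVisemesToElement may behave differently.
async function driveMouth(element, visemes) {
  for await (const { viseme, weight } of visemes) {
    element.setAttribute('data-viseme', viseme);
    // Assumed format: weight rendered with two decimals for CSS attribute selectors.
    element.setAttribute('data-viseme-weight', weight.toFixed(2));
  }
  // Assumed cleanup: clear the attributes once the stream ends.
  element.removeAttribute('data-viseme');
  element.removeAttribute('data-viseme-weight');
}
```

Given the signatures listed above, the iterable returned by `audioToVisemes(source, opts)` could be passed as the second argument. What `source` and `opts` accept is not specified here.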

Mic permissions

Plan 1's iframe-sandbox plumbing reads <meta name="robutler:widget" allowMic> at mount and injects microphone into the sandbox allow attribute. If the user denies the prompt, navigator.mediaDevices.getUserMedia rejects; the widgets above surface the error inline rather than crashing.
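
The denial path can be handled along these lines. This is a minimal sketch, not the reference widgets' code: `requestMic`, the `reason` values, and the message strings are hypothetical. `NotAllowedError` is the standard `DOMException` name that `getUserMedia` rejects with when permission is denied.

```javascript
// Hypothetical helper; returns a result object instead of throwing so the
// widget can render the failure inline.
async function requestMic(mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    return { ok: false, reason: 'unsupported', message: 'Microphone access is not available here.' };
  }
  try {
    const stream = await mediaDevices.getUserMedia({ audio: true });
    return { ok: true, stream };
  } catch (err) {
    const denied = Boolean(err) && (err.name === 'NotAllowedError' || err.name === 'SecurityError');
    return {
      ok: false,
      reason: denied ? 'denied' : 'unavailable',
      message: denied
        ? 'Microphone permission was denied.'
        : 'No usable microphone was found.',
    };
  }
}
```

A widget could call `const mic = await requestMic()` and, when `mic.ok` is `false`, write `mic.message` into a status element rather than letting the rejection propagate.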
