The widget below checks if the transcript file is available. If Whisper is running, you’ll see a red circle (🔴) recording indicator. If not, a help link appears with setup instructions.
- Set the `WHISPER_BIN` and `WHISPER_MODEL` environment variables if needed.
- Set `WHISPER_LANGUAGE` to an ISO 639-1 code (e.g. `fr`, `de`, `ja`) if you are not speaking English; omit it for auto-detection.
- Run `npm run dev:whisper` to start the caption listener and transcript writer.
- If you prefer, run `npm run dev:transcript` to mirror a text file to JSON instead.
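In practice the steps above might look like the following; the binary and model paths are assumptions for illustration and depend on where you built whisper.cpp:

```shell
# Example paths only (assumption) -- adjust to your whisper.cpp build.
export WHISPER_BIN=./whisper.cpp/main
export WHISPER_MODEL=./whisper.cpp/models/ggml-base.en.bin
# Optional: force a spoken language instead of auto-detection.
export WHISPER_LANGUAGE=fr
npm run dev:whisper
```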
- Web Speech API: no install needed. Open in Chrome or Edge, click the captions button, choose Start Web Speech Captions, allow microphone access, and select your spoken language from the dropdown.
- Whisper.cpp (local only): run `npm run dev:whisper` after building the binary (configure with `WHISPER_BIN`, `WHISPER_MODEL`, and optionally `WHISPER_LANGUAGE`).
- Watcher alternative: run `npm run dev:transcript` to mirror `transcript.txt` to JSON.
Whisper.cpp needs a local binary. For static hosting, consider these options:
The Web Speech API (SpeechRecognition) is the easiest path for GitHub Pages: it runs entirely in the browser using the browser's built-in engine, requires no installation, and works over HTTPS. Chrome and Edge support it; Firefox does not.
Whisper WASM loads the Whisper model into the browser via WebAssembly. The whisper-demo/ directory has a placeholder for this approach. It needs a CORS-enabled HTTP server to serve the large model file.
VibeVoice is a browser-first voice-notes tool that may be adaptable for live captioning. It has not been tested with this project; see issue #1 for discussion.
Cloud speech APIs (OpenAI Whisper API, Azure Cognitive Services Speech, AssemblyAI) offer high accuracy and work anywhere, but require an API key and a small server-side proxy to keep the key secret.
See README.md for a full comparison table of all alternatives.
The Web Speech API integration (`slides/webspeech-captions.js`) uses `window.SpeechRecognition` (or `webkitSpeechRecognition` for older Chrome); use Chrome or Edge, since Firefox does not support `SpeechRecognition`. It enables continuous, interim-results mode so words appear as you speak. The final text buffer keeps the last ~30 words visible. When Web Speech is active, the Whisper JSON poll is paused so the two sources do not conflict.
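The pattern described above (continuous mode, interim results, a buffer trimmed to the last ~30 words) can be sketched as follows. This is an illustrative sketch, not the actual code in `slides/webspeech-captions.js`; the helper name and logging are assumptions:

```javascript
// Keep only the last `maxWords` words of the finalized transcript.
function trimToLastWords(text, maxWords = 30) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  return words.slice(-maxWords).join(" ");
}

// Browser-only wiring; guarded so the snippet is inert outside a browser.
const SR = typeof window !== "undefined"
  ? window.SpeechRecognition || window.webkitSpeechRecognition
  : undefined;

if (SR) {
  const recognition = new SR();
  recognition.continuous = true;      // keep listening across utterances
  recognition.interimResults = true;  // emit words as they are spoken
  recognition.lang = "en-US";

  let finalText = "";
  recognition.onresult = (event) => {
    let interim = "";
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) finalText += result[0].transcript + " ";
      else interim += result[0].transcript;
    }
    finalText = trimToLastWords(finalText);
    console.log(finalText + " " + interim); // stand-in for the caption render
  };
  recognition.start();
}
```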
The selected language is stored in localStorage under the key whisperSlides.captionLanguage and defaults to the HTML lang attribute value (en-us). Changing the language in the dialog restarts recognition immediately.
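The persistence behavior above can be sketched with a pure helper plus guarded browser glue; the function names are illustrative, not the project's actual API:

```javascript
const STORAGE_KEY = "whisperSlides.captionLanguage";

// Resolve the caption language: stored preference first,
// then the document's lang attribute, then the "en-us" fallback.
function resolveCaptionLanguage(storedValue, htmlLang) {
  return storedValue || htmlLang || "en-us";
}

// Browser-only: read the preference (guarded for non-browser runs).
function loadCaptionLanguage() {
  if (typeof localStorage === "undefined") return "en-us";
  return resolveCaptionLanguage(
    localStorage.getItem(STORAGE_KEY),
    document.documentElement.lang
  );
}

function saveCaptionLanguage(lang) {
  if (typeof localStorage !== "undefined") {
    localStorage.setItem(STORAGE_KEY, lang);
  }
  // The real integration also restarts recognition at this point.
}
```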
If the demo doesn't load, open it directly: whisper-demo/index.html
The demo polls `/presentations/whisper-demo/transcript.json` every second.
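The polling loop described above can be sketched like this. The JSON shape (`{ "text": "..." }`), error handling, and function name are assumptions; the real widget may differ:

```javascript
// Poll a transcript JSON file once per interval and hand new text to a callback.
// Returns a function that stops the poll.
function startTranscriptPoll(url, onTranscript, intervalMs = 1000) {
  let lastText = "";
  const timer = setInterval(async () => {
    try {
      const res = await fetch(url, { cache: "no-store" });
      if (!res.ok) return; // file not written yet; keep polling
      const data = await res.json();
      const text = data.text ?? ""; // assumed shape: { "text": "..." }
      if (text !== lastText) {
        lastText = text;
        onTranscript(text); // stand-in for the caption render
      }
    } catch {
      // Network/parse errors are ignored; the next tick retries.
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```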