The widget below checks if the transcript file is available. If Whisper is running, you’ll see a red circle (🔴) recording indicator. If not, a help link appears with setup instructions.
- Set the `WHISPER_BIN` and `WHISPER_MODEL` environment variables if needed.
- Set `WHISPER_LANGUAGE` to an ISO 639-1 code (e.g. `fr`, `de`, `ja`) if you are not speaking English; omit it for auto-detection.
- Run `npm run dev:whisper` to start the caption listener and transcript writer.
- If you prefer, run `npm run dev:transcript` to mirror a text file to JSON instead.
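In practice the steps above might look like the following; the binary and model paths are assumptions for illustration and depend on where you built whisper.cpp:

```shell
# Example paths only (assumption) -- adjust to your whisper.cpp build.
export WHISPER_BIN=./whisper.cpp/main
export WHISPER_MODEL=./whisper.cpp/models/ggml-base.en.bin
# Optional: force a spoken language instead of auto-detection.
export WHISPER_LANGUAGE=fr
npm run dev:whisper
```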
- Web Speech API: no install needed. Open in Chrome or Edge, click the captions button, choose Start Web Speech Captions, allow microphone access, and select your spoken language from the dropdown.
- Whisper.cpp (local only): run `npm run dev:whisper` after building the binary (configure with `WHISPER_BIN`, `WHISPER_MODEL`, and optionally `WHISPER_LANGUAGE`).
- Watcher alternative: run `npm run dev:transcript` to mirror `transcript.txt` to JSON.
Whisper.cpp needs a local binary. For static hosting, consider these options:
The Web Speech API (SpeechRecognition) is the easiest path for GitHub Pages: it runs entirely in the browser using the browser's built-in engine, requires no installation, and works over HTTPS. Chrome and Edge support it; Firefox does not.
Whisper WASM loads the Whisper model into the browser via WebAssembly. The whisper-demo/ directory has a placeholder for this approach. It needs a CORS-enabled HTTP server to serve the large model file.
VibeVoice is a browser-first voice-notes tool that may be adaptable for live captioning. It has not been tested with this project; see issue #1 for discussion.
Cloud speech APIs (OpenAI Whisper API, Azure Cognitive Services Speech, AssemblyAI) offer high accuracy and work anywhere, but require an API key and a small server-side proxy to keep the key secret.
See README.md for a full comparison table of all alternatives.
The Web Speech API integration (`slides/webspeech-captions.js`) uses `window.SpeechRecognition` (or `webkitSpeechRecognition` for older Chrome); use Chrome or Edge, since Firefox does not support `SpeechRecognition`. It enables continuous, interim-results mode so words appear as you speak. The final text buffer keeps the last ~30 words visible. When Web Speech is active, the Whisper JSON poll is paused so the two sources do not conflict.
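The pattern described above (continuous mode, interim results, a buffer trimmed to the last ~30 words) can be sketched as follows. This is an illustrative sketch, not the actual code in `slides/webspeech-captions.js`; the helper name and logging are assumptions:

```javascript
// Keep only the last `maxWords` words of the finalized transcript.
function trimToLastWords(text, maxWords = 30) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  return words.slice(-maxWords).join(" ");
}

// Browser-only wiring; guarded so the snippet is inert outside a browser.
const SR = typeof window !== "undefined"
  ? window.SpeechRecognition || window.webkitSpeechRecognition
  : undefined;

if (SR) {
  const recognition = new SR();
  recognition.continuous = true;      // keep listening across utterances
  recognition.interimResults = true;  // emit words as they are spoken
  recognition.lang = "en-US";

  let finalText = "";
  recognition.onresult = (event) => {
    let interim = "";
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) finalText += result[0].transcript + " ";
      else interim += result[0].transcript;
    }
    finalText = trimToLastWords(finalText);
    console.log(finalText + " " + interim); // stand-in for the caption render
  };
  recognition.start();
}
```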
The selected language is stored in localStorage under the key whisperSlides.captionLanguage and defaults to the HTML lang attribute value (en-us). Changing the language in the dialog restarts recognition immediately.
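The persistence behavior above can be sketched with a pure helper plus guarded browser glue; the function names are illustrative, not the project's actual API:

```javascript
const STORAGE_KEY = "whisperSlides.captionLanguage";

// Resolve the caption language: stored preference first,
// then the document's lang attribute, then the "en-us" fallback.
function resolveCaptionLanguage(storedValue, htmlLang) {
  return storedValue || htmlLang || "en-us";
}

// Browser-only: read the preference (guarded for non-browser runs).
function loadCaptionLanguage() {
  if (typeof localStorage === "undefined") return "en-us";
  return resolveCaptionLanguage(
    localStorage.getItem(STORAGE_KEY),
    document.documentElement.lang
  );
}

function saveCaptionLanguage(lang) {
  if (typeof localStorage !== "undefined") {
    localStorage.setItem(STORAGE_KEY, lang);
  }
  // The real integration also restarts recognition at this point.
}
```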
If the demo doesn't load, open it directly: whisper-demo/index.html
The demo polls `/presentations/whisper-demo/transcript.json` every second.
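The polling loop described above can be sketched like this. The JSON shape (`{ "text": "..." }`), error handling, and function name are assumptions; the real widget may differ:

```javascript
// Poll a transcript JSON file once per interval and hand new text to a callback.
// Returns a function that stops the poll.
function startTranscriptPoll(url, onTranscript, intervalMs = 1000) {
  let lastText = "";
  const timer = setInterval(async () => {
    try {
      const res = await fetch(url, { cache: "no-store" });
      if (!res.ok) return; // file not written yet; keep polling
      const data = await res.json();
      const text = data.text ?? ""; // assumed shape: { "text": "..." }
      if (text !== lastText) {
        lastText = text;
        onTranscript(text); // stand-in for the caption render
      }
    } catch {
      // Network/parse errors are ignored; the next tick retries.
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```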