From Voice to Text

Discord (Legcord)
   ↓
PipeWire graph
   ↓
discord_sink (virtual null sink)
   ↓
discord_sink.monitor (monitor source)
   ↓
whisper_mic (remap-source, mono, 16kHz)
   ↓
ffmpeg
   ↓
audio.wav (growing file)
   ↓
whisper-stream

On the device used for transcription, a Discord client (Legcord) runs and joins the session's Discord voice channel. Legcord's audio output is moved to a virtual null sink named discord_sink via PipeWire. The sink's monitor source, discord_sink.monitor, is remapped into a mono 16 kHz source named whisper_mic, which is fed into ffmpeg.
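The routing above can be set up from the command line. The following is a minimal sketch that assumes PipeWire runs with its PulseAudio compatibility layer (pipewire-pulse), so that pactl and the classic module-null-sink and module-remap-source modules are available. The module arguments (channels=1, channel_map=mono, rate=16000) and the helper names are assumptions for illustration, not the author's exact configuration. Check them against your PipeWire version.

```python
import shlex
import subprocess

# Names taken from the pipeline diagram.
SINK_NAME = "discord_sink"
SOURCE_NAME = "whisper_mic"


def pipewire_setup_commands(sink: str = SINK_NAME,
                            source: str = SOURCE_NAME) -> list[list[str]]:
    """Return pactl commands that create the null sink and the remapped source.

    Assumption: pipewire-pulse provides the PulseAudio module interface.
    """
    return [
        # Virtual null sink: audio routed here is not played on any speaker.
        ["pactl", "load-module", "module-null-sink", f"sink_name={sink}"],
        # Remap the sink's monitor into a mono 16 kHz source for Whisper.
        # The argument set is an assumption; verify it for your setup.
        [
            "pactl", "load-module", "module-remap-source",
            f"source_name={source}",
            f"master={sink}.monitor",
            "channels=1",
            "channel_map=mono",
            "rate=16000",
        ],
    ]


def move_stream_command(sink_input_index: int, sink: str = SINK_NAME) -> list[str]:
    """Build the pactl command that moves one playback stream (e.g. Legcord's) to the sink."""
    return ["pactl", "move-sink-input", str(sink_input_index), sink]


def run(commands: list[list[str]]) -> None:
    """Echo and execute each command, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

To apply it, call run(pipewire_setup_commands()), look up Legcord's playback stream index with pactl list sink-inputs, and run the command returned by move_stream_command for that index.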

ffmpeg records whisper_mic into audio.wav, a WAV file that keeps growing for as long as the recording runs.
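The recording step can be sketched as a single ffmpeg invocation that reads whisper_mic through ffmpeg's pulse input device and writes 16-bit PCM WAV. The exact flags the author uses are not given. The options below are standard ffmpeg options, and the helper names are hypothetical.

```python
import subprocess


def ffmpeg_record_command(source: str = "whisper_mic",
                          output: str = "audio.wav") -> list[str]:
    """Build an ffmpeg command that records a PulseAudio/PipeWire source to WAV."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-y",                  # overwrite audio.wav left over from a previous session
        "-f", "pulse",         # read via the PulseAudio API (served by pipewire-pulse)
        "-i", source,          # the remapped mono 16 kHz source
        "-ac", "1",            # force mono output
        "-ar", "16000",        # force 16 kHz, the sample rate Whisper models expect
        "-c:a", "pcm_s16le",   # plain 16-bit little-endian PCM
        output,
    ]


def start_recording(source: str = "whisper_mic",
                    output: str = "audio.wav") -> subprocess.Popen:
    """Start ffmpeg in the background; the output file grows until ffmpeg stops."""
    return subprocess.Popen(ffmpeg_record_command(source, output))
```

Stopping the process with SIGINT (for example proc.send_signal(signal.SIGINT)) lets ffmpeg finalize the WAV header before it exits.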

The whisper-stream process reads the WAV file written by ffmpeg and generates the transcript from it.
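The text does not describe how whisper-stream consumes a file that is still growing. As one possible approach, the sketch below follows such a WAV file. It walks the RIFF chunks to find where the samples start, then repeatedly reads whatever whole samples have been appended since the last read. It assumes 16-bit PCM at 16 kHz. It also ignores the size fields in the header, because ffmpeg normally fills those in only when it finishes writing. The function names are hypothetical.

```python
import struct
import time
from typing import BinaryIO, Iterator

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def find_data_offset(f: BinaryIO) -> int:
    """Walk the RIFF chunks and return the byte offset where PCM samples start."""
    f.seek(0)
    riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise EOFError("data chunk not written yet")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return f.tell()
        # Skip other chunks (fmt, LIST, ...); chunks are padded to even sizes.
        f.seek(chunk_size + (chunk_size & 1), 1)


def read_available(f: BinaryIO, pos: int) -> tuple[bytes, int]:
    """Read every complete sample appended since pos; return (pcm_bytes, new_pos)."""
    f.seek(0, 2)  # jump to the current end of the file
    end = f.tell()
    usable = (end - pos) - (end - pos) % BYTES_PER_SAMPLE
    if usable <= 0:
        return b"", pos
    f.seek(pos)
    data = f.read(usable)
    return data, pos + len(data)


def seconds_of(pcm: bytes, rate: int = 16000) -> float:
    """Duration of a mono PCM buffer in seconds."""
    return len(pcm) / (rate * BYTES_PER_SAMPLE)


def follow(path: str, poll_interval: float = 0.5) -> Iterator[bytes]:
    """Yield newly appended PCM bytes from a WAV file that is still being written."""
    with open(path, "rb") as f:
        while True:  # wait until ffmpeg has written the header
            try:
                pos = find_data_offset(f)
                break
            except (EOFError, struct.error):
                time.sleep(poll_interval)
        while True:
            data, pos = read_available(f, pos)
            if data:
                yield data
            else:
                time.sleep(poll_interval)
```

At 16 kHz mono 16-bit, one second of audio is 16,000 × 2 = 32,000 bytes. A consumer could therefore use seconds_of to decide when it has buffered enough audio to pass a window to the model.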

With the smallest English-only Whisper model (tiny.en), speech is transcribed with a delay of roughly 5 to 10 seconds.