From Voice to Text

Discord (Legcord)
   ↓
PipeWire graph
   ↓
Monitor / Loopback source
   ↓
ffmpeg
   ↓
whisper-stream

On the machine used for transcription, a Discord client (Legcord) is running and joins the session's Discord voice channel. PipeWire exposes the system's audio output as a monitor or loopback source, which ffmpeg captures and passes on to whisper-stream. With Whisper's smallest model (tiny), the speech is transcribed with a delay of 5 to 10 seconds.
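
The capture step of this pipeline could be sketched roughly as follows. This is only an illustration under several assumptions, not the setup actually used here: it assumes ffmpeg was built with PulseAudio support and that pipewire-pulse is running, so the PipeWire monitor source is reachable through ffmpeg's pulse input. The source name legcord_sink.monitor and the helper names build_ffmpeg_capture_cmd and chunk_size_bytes are hypothetical; real source names can be listed with "pactl list short sources". The 16 kHz mono format is what Whisper models expect as input. How the audio is handed over to whisper-stream is left out on purpose, since that depends on the local setup.

```python
import shlex
from typing import List

SAMPLE_RATE_HZ = 16_000  # Whisper models expect 16 kHz audio
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # signed 16-bit PCM (s16le)


def build_ffmpeg_capture_cmd(monitor_source: str) -> List[str]:
    """Build an ffmpeg argv that records from a PulseAudio-compatible
    source and writes raw 16 kHz mono PCM to stdout.

    Assumption: ffmpeg has the "pulse" input device available and
    pipewire-pulse exposes the PipeWire graph to it.
    """
    return [
        "ffmpeg",
        "-loglevel", "error",        # keep stderr quiet
        "-f", "pulse",               # PulseAudio input device
        "-i", monitor_source,        # e.g. the monitor of the Discord sink
        "-ac", str(CHANNELS),        # downmix to mono
        "-ar", str(SAMPLE_RATE_HZ),  # resample to 16 kHz
        "-f", "s16le",               # raw PCM, no container
        "pipe:1",                    # write to stdout
    ]


def chunk_size_bytes(seconds: float) -> int:
    """Number of raw PCM bytes that correspond to `seconds` of audio."""
    return int(seconds * SAMPLE_RATE_HZ) * CHANNELS * BYTES_PER_SAMPLE


if __name__ == "__main__":
    # Hypothetical source name; replace with one from `pactl list short sources`.
    cmd = build_ffmpeg_capture_cmd("legcord_sink.monitor")
    print(shlex.join(cmd))
    print(chunk_size_bytes(5.0))
```

At this format, 5 seconds of audio is 160000 bytes of raw PCM, which gives a sense of how much data accumulates during the lower end of the observed transcription delay.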