# Appendix

Additional information and thoughts for the future.

# Things to Look Into

- [https://picovoice.ai/blog/whisper-cpp-speaker-diarization/](https://picovoice.ai/blog/whisper-cpp-speaker-diarization/)

> [OpenAI Whisper](https://openai.com/index/whisper/) delivers highly accurate speech-to-text transcription, but it does not track speaker changes. Applications that rely on Whisper cannot determine *who* is speaking in a conversation. To add [speaker labels](https://picovoice.ai/blog/speech-to-text-features/) such as "Speaker 1" and "Speaker 2," you can integrate [Falcon Speaker Diarization](https://picovoice.ai/platform/falcon/) with [whisper.cpp](https://github.com/ggml-org/whisper.cpp), creating a fully local, offline, multi-speaker transcription system.
> 
> This tutorial explains how to add speaker segmentation to Whisper, so you can determine "who spoke when" and generate timestamps to produce labeled Whisper transcripts suitable for multi-speaker recordings. This approach is ideal for use cases such as [podcast transcriptions](https://picovoice.ai/blog/new-era-for-podcast-publishers/), [meeting transcription and summarization](https://picovoice.ai/use-cases/meeting-transcription/), or [call center analytics](https://picovoice.ai/blog/challenges-of-call-centers/), where speaker identification is essential for readability and downstream analysis.

---

- [https://github.com/av/harbor](https://github.com/av/harbor)

Doesn't work with whisper.cpp? Would have been an interesting option.

> Harbor is a CLI and companion app that lets you spin up a complete local LLM stack—backends like Ollama, llama.cpp, or vLLM, frontends like Open WebUI, plus supporting services like SearXNG for web search, Speaches for voice chat, and ComfyUI for image generation—all pre-wired to work together with a single `harbor up` command. No manual setup: just pick the services you want and Harbor handles the Docker Compose orchestration, configuration, and cross-service connectivity so you can focus on actually using your models.

---

- LocalAI [https://localai.io/](https://localai.io/)

This will definitely be a starting point on better hardware.

---

- whisper-diarization might make it possible to tell the speakers apart

---

Possibly a better model for German? (Definitely not feasible on a Raspberry Pi 500)  
[https://huggingface.co/primeline/whisper-large-v3-german](https://huggingface.co/primeline/whisper-large-v3-german)

# System Information

<p class="callout info">This page was part of the project write-up I mainly did for myself and is kept for reference.</p>

### Device

<table id="bkmrk-device-name-cox-devi" style="border-collapse:collapse;width:76.0714%;height:60.5938px;border-width:1px;"><colgroup><col style="width:17.7734%;"></col><col style="width:82.2266%;"></col></colgroup><tbody><tr style="height:30.7969px;"><th style="height:30.7969px;background-color:rgb(248,248,248);">**Device Name**</th><td style="height:30.7969px;">Cox</td></tr><tr style="height:29.7969px;"><th style="height:29.7969px;background-color:rgb(248,248,248);">**Device Type**</th><td style="height:29.7969px;">Raspberry Pi 500 DE Version</td></tr></tbody></table>

![Raspberry Pi 500](https://info.zusammenkunft.net/uploads/images/gallery/2026-01/scaled-1680-/1-79-1.png)

### Networking

<table id="bkmrk-address-type-10.20.0" style="border-collapse:collapse;width:47.619%;height:89.3907px;"><colgroup><col style="width:50.0885%;"></col><col style="width:50.0885%;"></col></colgroup><thead><tr style="height:29.7969px;"><td style="height:29.7969px;">**Address**</td><td style="height:29.7969px;">**Type**</td></tr></thead><tbody><tr style="height:29.7969px;"><td style="height:29.7969px;"><span class="s1">10.20.0.94</span>

</td><td style="height:29.7969px;">Wi-Fi</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;">  
</td><td style="height:29.7969px;">Ethernet</td></tr></tbody></table>

### Hardware

<table id="bkmrk-name-type-notes-proc" style="border-collapse:collapse;width:100%;height:242.172px;"><colgroup><col style="width:17.5209%;"></col><col style="width:36.3527%;"></col><col style="width:46.1265%;"></col></colgroup><thead><tr style="height:29.7969px;"><th style="height:29.7969px;">**Name**</th><th style="height:29.7969px;">**Type**</th><th style="height:29.7969px;">**Notes**</th></tr></thead><tbody><tr style="height:46.5938px;"><th style="background-color:rgb(248,248,248);height:46.5938px;">**Device**</th><td style="height:46.5938px;">Raspberry Pi 500</td><td style="height:46.5938px;">Single-board computer integrated into a keyboard case. German keyboard layout variant.</td></tr><tr style="height:29.7969px;"><th style="height:29.7969px;background-color:rgb(248,248,248);">**Processor**</th><td style="height:29.7969px;">Quad-core 64-bit Arm Cortex-A76</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><th style="height:29.7969px;background-color:rgb(248,248,248);">**Memory**</th><td style="height:29.7969px;">8GB</td><td style="height:29.7969px;">LPDDR4X SDRAM</td></tr><tr style="height:46.5938px;"><th style="height:46.5938px;background-color:rgb(248,248,248);">**Storage**</th><td style="height:46.5938px;">MicroSD Card: 32GB</td><td style="height:46.5938px;">Raspberry Pi A2-Class MicroSD Card

Pre-programmed with Raspberry Pi OS

</td></tr><tr style="height:29.7969px;"><th style="height:29.7969px;background-color:rgb(248,248,248);">**Display**</th><td style="height:29.7969px;">Raspberry Pi Monitor 15.6"</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><th style="height:29.7969px;background-color:rgb(248,248,248);">**Sound In/Out**</th><td style="height:29.7969px;">Bluetooth Headset</td><td style="height:29.7969px;">  
</td></tr></tbody></table>

[![Raspberry Pi 500 Back Schematics](https://info.zusammenkunft.net/uploads/images/gallery/2026-01/scaled-1680-/image1-5.webp)](https://info.zusammenkunft.net/uploads/images/gallery/2026-01/image1-5.webp)

### Software

Versions as of the first installation and the time of writing.

<table id="bkmrk-name-description-not" style="border-collapse:collapse;width:100%;height:450.75px;"><colgroup><col style="width:17.64%;"></col><col style="width:48.9869%;"></col><col style="width:33.3731%;"></col></colgroup><thead><tr style="height:29.7969px;"><th style="height:29.7969px;">**Name**</th><th style="height:29.7969px;">**Description**</th><th style="height:29.7969px;">**Notes**</th></tr></thead><tbody><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**OS**</td><td style="height:29.7969px;">Raspberry Pi OS - Debian GNU/Linux 13 (trixie)

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**Sound**</td><td style="height:29.7969px;"><span class="s1">PulseAudio (on PipeWire 1.4.2)</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="background-color:rgb(248,248,248);height:29.7969px;">**libsdl2-dev**

</td><td style="height:29.7969px;"><span class="s1">2.32.4+dfsg-1</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="background-color:rgb(248,248,248);height:29.7969px;">**pulseaudio-utils**</td><td style="height:29.7969px;"><span class="s1">17.0+dfsg1-2+rpt1</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**<span class="s1">ffmpeg</span>**

</td><td style="height:29.7969px;"><span class="s1">version 7.1.3-0+deb13u1+rpt1</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:46.5938px;"><td style="height:46.5938px;background-color:rgb(248,248,248);">**whisper.cpp**</td><td style="height:46.5938px;">[v1.8.2](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.8.2) (2025-10-15)</td><td style="height:46.5938px;">Port of OpenAI's Whisper model in C/C++ (MIT Licence)</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**Model**</td><td style="height:29.7969px;"><span class="s1"><s>ggml-base.bin</s> ggml-tiny.en.bin</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:46.5938px;"><td style="height:46.5938px;background-color:rgb(248,248,248);">**ollama**

</td><td style="height:46.5938px;">  
</td><td style="height:46.5938px;">  
</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**Models**</td><td style="height:29.7969px;">  
</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**tmux**</td><td style="height:29.7969px;"><span class="s1">3.5a-3</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="height:29.7969px;background-color:rgb(248,248,248);">**<span class="s1">build-essential</span>**

</td><td style="height:29.7969px;"><span class="s1">12.12</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="background-color:rgb(248,248,248);height:29.7969px;">**<span class="s1">git</span>**

</td><td style="height:29.7969px;"><span class="s1">1:2.47.3-0+deb13u1</span>

</td><td style="height:29.7969px;">  
</td></tr><tr style="height:29.7969px;"><td style="background-color:rgb(248,248,248);height:29.7969px;">**<span class="s1">cmake</span>**

</td><td style="height:29.7969px;"><span class="s1">3.31.6-2</span>

</td><td style="height:29.7969px;">  
</td></tr><tr><td style="background-color:rgb(248,248,248);">**<span class="s1">Legcord</span>**

</td><td>1.2.1

<span class="s1">stable 499788 (c55be3e)</span>

</td><td><span class="s1">Build Override: N/A  
Vencord dc25c8f (Web)  
Electron 40.1.0  
Chromium 144</span></td></tr></tbody></table>

#### ffmpeg

<details id="bkmrk-ffmpeg-ffmpeg-versio"><summary>ffmpeg</summary>

```bash
ffmpeg version 7.1.3-0+deb13u1+rpt1 Copyright (c) 2000-2025 the FFmpeg developers
built with gcc 14 (Debian 14.2.0-19)

configuration: --prefix=/usr --extra-version=0+deb13u1+rpt1 --toolchain=hardened --incdir=/usr/include/aarch64-linux-gnu \
--enable-gpl --disable-stripping --disable-libmfx --disable-mmal --disable-omx --enable-gnutls --enable-libaom \ 
--enable-libass --enable-libbs2b --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite \
--enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme \ 
--enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg \
--enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr \
--enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx \ 
--enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal \
--enable-opencl --enable-opengl --disable-sndio --disable-libvpl --libdir=/usr/lib/aarch64-linux-gnu \
--arch=arm64 --enable-neon --enable-v4l2-request --enable-libudev --enable-epoxy --enable-libdc1394 \
--enable-libdrm --enable-libiec61883 --enable-vout-drm --enable-chromaprint --enable-frei0r --enable-ladspa \
--enable-libbluray --enable-libcaca --enable-libdvdnav --enable-libdvdread --enable-libjack --enable-libpulse \
--enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 \
--enable-libzmq --enable-libzvbi --enable-lv2 --enable-sand --enable-sdl2 --enable-libplacebo --enable-librav1e \
--enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared

libavutil      59. 39.100 / 59. 39.100
libavcodec     61. 19.101 / 61. 19.101
libavformat    61.  7.100 / 61.  7.100
libavdevice    61.  3.100 / 61.  3.100
libavfilter    10.  4.100 / 10.  4.100
libswscale      8.  3.100 /  8.  3.100
libswresample   5.  3.100 /  5.  3.100
libpostproc    58.  3.100 / 58.  3.100
```

</details>

#### pactl

<details id="bkmrk-pactl-server-string%3A"><summary>pactl</summary>

```
Server String: /run/user/1000/pulse/native
Library Protocol Version: 35
Server Protocol Version: 35
Is Local: yes
Client Index: 77
Tile Size: 65472
User Name: mela
Host Name: Cox
Server Name: PulseAudio (on PipeWire 1.4.2)
Server Version: 15.0.0
Default Sample Specification: float32le 2ch 48000Hz
Default Channel Map: front-left,front-right
Default Sink: auto_null
Default Source: auto_null.monitor
Cookie: 56a6:8e1c
```

</details>

#### PipeWire

<details id="bkmrk-pipewire-mela%40cox%3A%7E-"><summary>PipeWire</summary>

```bash
mela@Cox:~ $ systemctl --user status pipewire pipewire-pulse wireplumber --no-pager
● pipewire.service - PipeWire Multimedia Service
     Loaded: loaded (/usr/lib/systemd/user/pipewire.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-01-07 17:21:03 CET; 3 days ago
 Invocation: d54a55bb3ef54b47b454d6cb42d5e273
TriggeredBy: ● pipewire.socket
   Main PID: 1334 (pipewire)
      Tasks: 3 (limit: 9571)
        CPU: 34ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/pipewire.service
             └─1334 /usr/bin/pipewire

Jan 07 17:21:03 Cox systemd[1301]: Started pipewire.service - PipeWire Multimedia Service.
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Servi…nknown
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit does not give us MaxRealtimePriority, using 1
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Servi…nknown
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit does not give us MinNiceLevel, using 0
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Servi…nknown
Jan 07 17:21:04 Cox pipewire[1334]: mod.rt: RTKit does not give us RTTimeUSecMax, using -1

● pipewire-pulse.service - PipeWire PulseAudio
     Loaded: loaded (/usr/lib/systemd/user/pipewire-pulse.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-01-07 17:21:03 CET; 3 days ago
 Invocation: 1e32511184954e5d9e8903b4482f4731
TriggeredBy: ● pipewire-pulse.socket
   Main PID: 1345 (pipewire-pulse)
      Tasks: 3 (limit: 9571)
        CPU: 27ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/pipewire-pulse.service
             └─1345 /usr/bin/pipewire-pulse

Jan 07 17:21:03 Cox systemd[1301]: Started pipewire-pulse.service - PipeWire PulseAudio.
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit error: org.freedesktop.DBus.Error.…known
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit does not give us MaxRealtimePriori…ing 1
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit error: org.freedesktop.DBus.Error.…known
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit does not give us MinNiceLevel, using 0
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit error: org.freedesktop.DBus.Error.…known
Jan 07 17:21:04 Cox pipewire-pulse[1345]: mod.rt: RTKit does not give us RTTimeUSecMax, using -1

● wireplumber.service - Multimedia Service Session Manager
     Loaded: loaded (/usr/lib/systemd/user/wireplumber.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-01-07 17:21:03 CET; 3 days ago
 Invocation: a5b8a90d609d4ec7a9111718a22690c1
   Main PID: 1342 (wireplumber)
      Tasks: 9 (limit: 9571)
        CPU: 183ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/wireplumber.service
             └─1342 /usr/bin/wireplumber

Jan 07 17:21:03 Cox systemd[1301]: Started wireplumber.service - Multimedia Service Sessi…nager.
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Se…nknown
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit does not give us MaxRealtimePriority…sing 1
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Se…nknown
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit does not give us MinNiceLevel, using 0
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit error: org.freedesktop.DBus.Error.Se…nknown
Jan 07 17:21:04 Cox wireplumber[1342]: mod.rt: RTKit does not give us RTTimeUSecMax, using -1
Jan 07 17:21:04 Cox wireplumber[1342]: wp-internal-comp-loader: Loading profile 'main'
Jan 07 17:21:04 Cox wireplumber[1342]: default: Failed to get percentage from UPower: org…oOwner
Jan 07 17:21:04 Cox wireplumber[1342]: [0:00:08.569845993] [1342]  INFO Camera camera_man…251202
Hint: Some lines were ellipsized, use -l to show in full.

```

</details>

#### Models

# Finding the Right Sound Device

<p class="callout success">If you want to recreate TranscriptOMatic on a system other than a Raspberry Pi 500, you will need to find the right devices and names on your own. This was my approach, which might give you an idea of where to start looking for the right settings in your environment:</p>

##### Finding the System's Audio Sinks

Checking for the system's audio sinks:

```bash
pactl list short sinks
```

The result should look something like this:

```bash
mela@Cox:~ $ pactl list short sinks
35	auto_null	PipeWire	float32le 2ch 48000Hz	SUSPENDED
```

##### Finding the Discord Sound Device

After joining a Discord voice channel, check for the sink-input (playback stream) that Discord creates:

```bash
pactl list short sink-inputs
```

The result should be something like:

```bash
mela@Cox:~ $ pactl list short sink-inputs
184	35	183	PipeWire	float32le 2ch 48000Hz
```
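As far as I can tell, the columns of this short listing are the sink-input index, the index of the sink it plays into, the client index, the driver, and the sample specification. In the example above, sink-input 184 is attached to sink 35, which is the `auto_null` sink from the previous step. A small filter makes that relation explicit. This is only a sketch, and `sink_inputs_on_sink` is a made-up helper name:

```bash
# Hypothetical helper: print the IDs of all sink-inputs attached to one sink.
# Reads the tab-separated output of `pactl list short sink-inputs` on stdin.
sink_inputs_on_sink() {
  local sink_index="$1"
  # Column 1 is the sink-input index, column 2 the index of its sink.
  awk -F'\t' -v sink="$sink_index" '$2 == sink { print $1 }'
}

# Usage (assumes pactl is installed and a stream is playing):
#   pactl list short sink-inputs | sink_inputs_on_sink 35
```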

##### Adding a Persistent Sink (And Using Its Monitor as a Stable Audio Source)

Create a dedicated null sink for Discord audio:

```bash
pactl load-module module-null-sink \
  sink_name=discord_sink \
  sink_properties=device.description=DiscordSink
```

After joining a Discord voice channel, the sink-input may disappear quickly if the channel is silent. To immediately move the active sink-input to the persistent sink:

```bash
pactl move-sink-input $(pactl list short sink-inputs | awk '{print $1}') discord_sink
```

This assumes that only a single sink-input is active, which is a reasonable assumption in a minimal setup, but may not hold true on a typical desktop system.
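On a busier desktop, one way around this is to pick the sink-input by application name. The long listing (`pactl list sink-inputs`) prints a `Sink Input #<id>` header followed by properties such as `application.name`. The sketch below filters on that property. The helper name is hypothetical, and the value Legcord reports as its application name is an assumption that you need to check in your own listing:

```bash
# Hypothetical helper: print the IDs of sink-inputs whose application.name
# matches a pattern (case-insensitive).
# Reads the output of `pactl list sink-inputs` (long format) on stdin.
sink_inputs_for_app() {
  local pattern="$1"
  awk -v pat="$pattern" '
    # Remember the ID of the sink-input block we are currently in.
    /^Sink Input #/ { id = substr($3, 2) }
    # Print that ID when the application.name property matches.
    /application\.name = / && tolower($0) ~ tolower(pat) { print id }
  '
}

# Usage (the name "legcord" is an assumption, check your own listing first):
#   pactl list sink-inputs | sink_inputs_for_app legcord
```

Each ID printed this way can then be passed to `pactl move-sink-input` individually.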

Checking the result:

```bash
pactl list short sinks
```

The result should look something like the following, with the state `IDLE` instead of `SUSPENDED`. `IDLE` means the sink exists persistently but currently receives no audio data.

```bash
mela@Cox:~ $ pactl list short sinks
199	discord_sink	PipeWire	float32le 2ch 48000Hz	IDLE
```
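To check the state from a script rather than by eye, the last column of the short listing can be extracted for a given sink name. This is a minimal sketch, and `sink_state` is a made-up helper name:

```bash
# Hypothetical helper: print the state (last column) of a named sink.
# Reads the tab-separated output of `pactl list short sinks` on stdin.
sink_state() {
  local sink_name="$1"
  awk -F'\t' -v name="$sink_name" '$2 == name { print $NF }'
}

# Usage (assumes pactl is installed):
#   pactl list short sinks | sink_state discord_sink
```

An empty result means no sink with that name exists.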

Getting the sound monitor source:

```bash
pactl list short sources
```

The result should look something like this:

```bash
mela@Cox:~ $ pactl list short sources
199	discord_sink.monitor	PipeWire	float32le 2ch 48000Hz	IDLE
```

##### Changing `meeting-start` to Reflect Your Environment

Based on your results, change the relevant parts of `meeting-start` accordingly:

```bash
# 1) Ensure discord_sink exists
if ! pactl list short sinks | awk '{print $2}' | grep -qx "$DISCORD_SINK"; then
  pactl load-module module-null-sink \
    sink_name="$DISCORD_SINK" \
    sink_properties=device.description=DiscordSink >/dev/null
fi

# 2) Try to move an active sink-input (Discord) to discord_sink
#    This may be transient in silent channels → poll briefly.
moved="no"
for _ in {1..40}; do
  SID="$(pactl list short sink-inputs 2>/dev/null | awk 'NF{print $1}' | head -n1 || true)"
  if [[ -n "${SID:-}" ]]; then
    if pactl move-sink-input "$SID" "$DISCORD_SINK" 2>/dev/null; then
      moved="yes"
      break
    fi
  fi
  sleep 0.25
done

if [[ "$moved" != "yes" ]]; then
  echo "⚠️  Could not move a sink-input to $DISCORD_SINK."
  echo "    Make sure Legcord is connected to a voice channel, then re-run meeting-start."
fi

# 3) Remove existing whisper_mic remap-sources (idempotent)
for mid in $(pactl list short modules | awk '$0 ~ /module-remap-source/ && $0 ~ /source_name=whisper_mic/ {print $1}'); do
  pactl unload-module "$mid" >/dev/null 2>&1 || true
done
```
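The excerpt ends after the old `whisper_mic` remap sources have been removed. After that, the source has to be created again on top of the sink monitor. The following is a sketch of what such a step could look like, not the actual `meeting-start` code. The function name, the `PACTL` override, and the device description are assumptions made for illustration, and `rate=16000` is only a request that the server may not honor:

```bash
# Sketch only: recreate whisper_mic as a remap source of the sink monitor.
# Setting PACTL=echo prints the pactl arguments instead of running them.
create_whisper_mic() {
  local master="${1:-discord_sink.monitor}"
  "${PACTL:-pactl}" load-module module-remap-source \
    source_name=whisper_mic \
    master="$master" \
    rate=16000 \
    source_properties=device.description=WhisperMic
}

# Dry run:
#   PACTL=echo create_whisper_mic discord_sink.monitor
```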

##### A Remapping Check

Under PipeWire, the master source of a remapped source may not be visible via `pactl list sources`. Successful capture via `ffmpeg` confirms correct wiring.

```bash
ffmpeg -f pulse -i whisper_mic -t 3 /tmp/whisper-mic-test.wav
ls -lh /tmp/whisper-mic-test.wav
```

A successful result:

```bash
-rw-rw-r-- 1 mela mela 565K Jan 11 07:23 /tmp/whisper-mic-test.wav
```
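The file size also hints at the format that was actually captured. ffmpeg writes 16-bit PCM into `.wav` files by default, so the expected payload is roughly duration × sample rate × channels × 2 bytes. A quick calculation, with a helper name made up for this sketch:

```bash
# Payload size of uncompressed PCM audio in bytes:
# seconds * sample rate * channels * bytes per sample.
pcm_payload_bytes() {
  local seconds="$1" rate="$2" channels="$3" bytes_per_sample="$4"
  echo $(( seconds * rate * channels * bytes_per_sample ))
}

pcm_payload_bytes 3 48000 2 2   # 48 kHz stereo, 16-bit: 576000 bytes (about 563 KiB)
pcm_payload_bytes 3 16000 1 2   # 16 kHz mono, 16-bit: 96000 bytes (about 94 KiB)
```

The 565K file above is close to the first figure, which suggests the capture ran at 48 kHz stereo. I assume the WAV header and a slightly longer capture account for the small difference.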

---

<p class="callout warning">PipeWire may ignore <code>rate=16000</code>; resampling can be handled downstream (e.g., by ffmpeg or whisper.cpp).</p>
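
If the resampling is left to ffmpeg, the conversion can happen during capture. The sketch below asks ffmpeg for 16 kHz mono 16-bit PCM, which is the WAV format whisper.cpp works with. The function name, the `FFMPEG` override, and the output path are placeholders for illustration:

```bash
# Sketch: capture from whisper_mic and resample to 16 kHz mono 16-bit PCM.
# Setting FFMPEG=echo prints the ffmpeg arguments instead of running them.
capture_16k_mono() {
  local seconds="$1" out="$2"
  "${FFMPEG:-ffmpeg}" -f pulse -i whisper_mic \
    -t "$seconds" -ar 16000 -ac 1 -c:a pcm_s16le "$out"
}

# Usage (the output path is a placeholder):
#   capture_16k_mono 3 /tmp/whisper-mic-16k.wav
```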