voicemail-transcriber: Whisper-Powered Voicemail Transcription via IMAP

Voicemail-to-email services are everywhere — NumberGarage, Google Voice, countless VoIP providers. Most of them send you an audio file attached to an email. Some offer transcription, but the quality is usually poor enough that you’re listening to the audio anyway. voicemail-transcriber fixes that by running the audio through OpenAI’s Whisper model and forwarding you a new email with the transcription at the top.

What it does

It’s a Docker service that monitors one or more IMAP mailboxes, detects emails with audio attachments, transcribes the audio using Whisper, and forwards the email to a destination address with the transcription prepended in a styled block. The original attachment is preserved.
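
To make that pipeline concrete, here is a minimal sketch of the detect-and-forward steps using only Python's standard library. It is not the project's actual code: the function names, the extension list, the "Fwd:" subject prefix, and the inline styling are all assumptions made for illustration, and the transcription is passed in as a plain string.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html import escape

# Assumed list of extensions; the real service may recognize others.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a", ".ogg")


def find_audio_attachments(raw_email: bytes):
    """Return (filename, content_type, data) for each audio part in the email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts, never audio themselves
        filename = part.get_filename() or ""
        is_audio = part.get_content_maintype() == "audio"
        if is_audio or filename.lower().endswith(AUDIO_EXTENSIONS):
            data = part.get_payload(decode=True)
            found.append((filename, part.get_content_type(), data))
    return found


def build_forward(subject, transcription, attachments, sender, forward_to):
    """Build a new email with the transcription on top and the audio re-attached."""
    fwd = EmailMessage()
    fwd["Subject"] = f"Fwd: {subject}"  # prefix is an assumption
    fwd["From"] = sender
    fwd["To"] = forward_to
    fwd.set_content(f"Transcription:\n{transcription}\n")
    # Hypothetical styling for the transcription block.
    html = (
        '<div style="border-left:4px solid #888;padding:8px;">'
        f"<strong>Transcription</strong><br>{escape(transcription)}</div>"
    )
    fwd.add_alternative(html, subtype="html")
    for filename, content_type, data in attachments:
        maintype, subtype = content_type.split("/", 1)
        fwd.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return fwd


def build_sample_email() -> bytes:
    """Create a small fake voicemail email for trying the functions above."""
    msg = EmailMessage()
    msg["Subject"] = "New voicemail"
    msg["From"] = "provider@example.com"
    msg["To"] = "voicemail@example.com"
    msg.set_content("You have a new voicemail.")
    msg.add_attachment(b"RIFFfake", maintype="audio", subtype="wav",
                       filename="voicemail.wav")
    return msg.as_bytes()
```

Calling find_audio_attachments(build_sample_email()) returns a single entry for voicemail.wav, which can be handed straight to build_forward along with the transcribed text.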

The Whisper medium model runs with GPU acceleration. I run it on a GTX 1660 Super, which is not a high-end card, and it handles voicemail transcriptions without issue. The first run downloads the model (about 1.42GB), which is then cached for later runs.
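
For reference, loading and running the medium model with the openai-whisper Python package looks roughly like the sketch below. The helper names are mine, not the project's, and I am assuming the service uses this package in about this way. The import is done lazily so the transcription helper can be exercised with a stand-in model on a machine without a GPU.

```python
def load_medium_model(device: str = "cuda"):
    """Load Whisper's medium model on the given device.

    The first call downloads the weights; later calls reuse the local cache.
    """
    import whisper  # provided by the openai-whisper package

    return whisper.load_model("medium", device=device)


def transcribe_file(model, audio_path: str) -> str:
    """Run the model on one audio file and return the cleaned-up text."""
    result = model.transcribe(audio_path)
    # transcribe() returns a dict; the full transcript is under "text".
    return result["text"].strip()
```

In a container, the download cache would need to live on a mounted volume to survive rebuilds; whether the project does that is not something I can confirm from here.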

Configuration

Single account mode uses discrete environment variables in a .env file:

IMAP_HOST=imap.yourmailserver.com
IMAP_USERNAME=voicemail@yourdomain.com
IMAP_PASSWORD=yourpassword
FORWARD_TO=you@yourdomain.com
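
As an illustration of how a service might consume those four variables, here is a small Python sketch that reads them and fails fast when one is missing. Treating all four as required is my assumption, and load_single_account is a hypothetical helper, not a function from the repository.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("IMAP_HOST", "IMAP_USERNAME", "IMAP_PASSWORD", "FORWARD_TO")


def load_single_account(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the single-account settings, raising if any are missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```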

Multiple accounts use a JSON array, letting you monitor separate mailboxes with separate forwarding destinations. Each account can include an optional phone number that gets injected into the subject line as [555.123.4567] so you know which number was called at a glance.
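
The exact JSON schema is in the repository's example config, so the field names below (host, username, password, forward_to, phone) are placeholders I made up to show the idea. The sketch parses an account array and tags the subject line with the phone number when one is set; putting the tag at the front of the subject is also an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    # Hypothetical field names; check the project's example config for the real ones.
    host: str
    username: str
    password: str
    forward_to: str
    phone: Optional[str] = None


def parse_accounts(raw_json: str) -> List[Account]:
    """Turn a JSON array of account objects into Account instances."""
    return [Account(**entry) for entry in json.loads(raw_json)]


def tag_subject(subject: str, account: Account) -> str:
    """Prefix the subject with [phone] when the account has a phone number."""
    if account.phone:
        return f"[{account.phone}] {subject}"
    return subject


EXAMPLE_JSON = """
[
  {"host": "imap.example.com", "username": "office@example.com",
   "password": "secret", "forward_to": "me@example.com",
   "phone": "555.123.4567"},
  {"host": "imap.example.com", "username": "home@example.com",
   "password": "secret", "forward_to": "family@example.com"}
]
"""
```

With the example above, the first account's voicemails would arrive with a subject starting with [555.123.4567], and the second account's subjects would be left unchanged.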

Running it

git clone https://github.com/kshartman/voicemail-transcriber
cd voicemail-transcriber
cp .env.example .env
# Edit .env with your credentials
docker-compose up -d --build

Processed emails are moved to an archive folder. Emails older than a year are automatically pruned from the archive. The service logs hourly statistics and sends a startup notification email when the container restarts, so you know it’s running.
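
Pruning by age maps naturally onto IMAP's BEFORE search. The sketch below shows one way it could be done with Python's imaplib. It is not the service's actual code: the helper names are mine, the folder name is whatever you pass in, and treating "a year" as 365 days is an assumption.

```python
import imaplib
from datetime import date, timedelta

# IMAP dates use English month abbreviations regardless of locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day: date) -> str:
    """Format a date the way IMAP SEARCH expects, e.g. 15-Mar-2024."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"


def prune_archive(conn: imaplib.IMAP4, folder: str, today: date,
                  max_age_days: int = 365) -> int:
    """Delete messages in `folder` received before the cutoff; return the count."""
    cutoff = today - timedelta(days=max_age_days)
    conn.select(folder)
    _status, data = conn.search(None, "BEFORE", imap_date(cutoff))
    message_ids = data[0].split()
    for message_id in message_ids:
        # Flag for deletion; nothing is removed until expunge() runs.
        conn.store(message_id, "+FLAGS", "\\Deleted")
    conn.expunge()
    return len(message_ids)
```

A caller would open an imaplib.IMAP4_SSL connection, log in, and then call prune_archive(conn, "Archive", date.today()), where "Archive" stands in for whatever the archive folder is actually called.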

Requirements

An NVIDIA GPU with CUDA support is required for reasonable performance. The NVIDIA Container Toolkit needs to be installed on the host. The README has the full setup commands for that.

The code is on GitHub: https://github.com/kshartman/voicemail-transcriber
