Voice Transcription Pipeline (Whisper ASR)

Converts raw audio recordings of sales calls and meetings into structured transcripts that Meeting Intelligence and the Signal Pipeline can process. Uses OpenAI Whisper, an automatic speech recognition (ASR) model, as the core transcription engine.

Why Whisper

Open-source, runs locally (no data leaves the environment)
High accuracy on business/sales conversation audio
Pairs with a separate speaker diarization step (who said what); Whisper itself does not label speakers
Handles noisy environments, accents, technical terminology
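
As a rough illustration of local inference, the sketch below uses the open-source `openai-whisper` Python package. `load_model` and `transcribe` are that package's calls; the model size `"base"`, the helper names, and the use of `exp(avg_logprob)` as a confidence score are assumptions made for illustration, not this pipeline's confirmed configuration. Model weights are downloaded once on first load; after that, inference runs on the local machine.

```python
import math
from typing import Any


def transcribe_locally(audio_path: str, model_name: str = "base") -> dict[str, Any]:
    """Run Whisper on a local audio file and return its raw result dict."""
    # Imported lazily so to_segments() stays usable without the package installed.
    import whisper  # pip install openai-whisper (ffmpeg must be on PATH)

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def to_segments(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Reduce a Whisper result dict to timestamped segments with a rough confidence."""
    segments = []
    for seg in result.get("segments", []):
        segments.append({
            "start": round(float(seg["start"]), 2),
            "end": round(float(seg["end"]), 2),
            "text": seg["text"].strip(),
            # Assumption: exp(avg_logprob) serves as an approximate 0..1 confidence.
            "confidence": round(math.exp(seg["avg_logprob"]), 3),
        })
    return segments
```

Typical use would be `to_segments(transcribe_locally("call.wav"))`, where `call.wav` is a placeholder path.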

Pipeline Flow

Audio File (Zoom / Meet / Sembly export)
       ↓
Whisper ASR (local inference)
       ↓
Raw Transcript (timestamped, speaker-labeled)
       ↓
Cleaning & Normalization
       ↓
Meeting Intelligence Signal Extraction
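
The middle of this flow (speaker labeling, then cleaning) could be sketched as follows. Whisper segments carry timestamps but no speaker identity, so the sketch assumes speaker turns arrive from a separate diarizer as `(start, end, speaker)` tuples. The `Segment` type, both function names, the maximum-overlap rule, and the 1-second merge gap are all hypothetical choices, not the pipeline's confirmed rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float
    end: float
    text: str
    speaker: str = "UNKNOWN"


def label_speakers(segments: list[Segment], turns: list[tuple]) -> list[Segment]:
    """Give each segment the speaker whose diarizer turn overlaps it the most."""
    for seg in segments:
        best, best_overlap = "UNKNOWN", 0.0
        for t_start, t_end, speaker in turns:
            overlap = min(seg.end, t_end) - max(seg.start, t_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        seg.speaker = best
    return segments


def clean_transcript(segments: list[Segment], max_gap: float = 1.0) -> list[Segment]:
    """Normalize whitespace, drop empty segments, merge adjacent same-speaker turns."""
    cleaned: list[Segment] = []
    for seg in segments:
        text = re.sub(r"\s+", " ", seg.text).strip()
        if not text:
            continue  # nothing intelligible in this segment
        prev = cleaned[-1] if cleaned else None
        same_turn = (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start - prev.end <= max_gap
        )
        if same_turn:
            prev.text = f"{prev.text} {text}"
            prev.end = seg.end
        else:
            cleaned.append(Segment(seg.start, seg.end, text, seg.speaker))
    return cleaned
```

Merging adjacent same-speaker segments keeps each speaker turn in one piece, which may make downstream signal extraction simpler; whether that trade-off suits Meeting Intelligence is left open here.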

Output Format

Each processed recording produces:

- Timestamped transcript with speaker labels
- Confidence scores per segment
- Audio quality metadata (for QA flagging)
- Ready-to-extract format for Meeting Intelligence
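
One possible shape for such a record is sketched below. The field names (`recording_id`, `sample_rate_hz`, `duration_s`), the class names, and the QA thresholds are hypothetical; the actual schema is not specified here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TranscriptSegment:
    start: float       # seconds from the start of the recording
    end: float
    speaker: str
    text: str
    confidence: float  # 0..1, per segment


@dataclass
class AudioQuality:
    sample_rate_hz: int
    duration_s: float


@dataclass
class TranscriptRecord:
    recording_id: str
    segments: list[TranscriptSegment]
    audio_quality: AudioQuality

    def needs_review(self, threshold: float = 0.6, max_fraction: float = 0.25) -> bool:
        """Flag the record for QA when too many segments have low confidence."""
        if not self.segments:
            return True  # an empty transcript always deserves a look
        low = [s for s in self.segments if s.confidence < threshold]
        return len(low) / len(self.segments) > max_fraction

    def to_json(self) -> str:
        """Serialize the whole record, nested dataclasses included."""
        return json.dumps(asdict(self), indent=2)


record = TranscriptRecord(
    recording_id="example-001",
    segments=[
        TranscriptSegment(0.0, 2.0, "SPEAKER_00", "Thanks for joining.", 0.9),
        TranscriptSegment(2.4, 4.1, "SPEAKER_01", "Happy to be here.", 0.4),
    ],
    audio_quality=AudioQuality(sample_rate_hz=16000, duration_s=4.1),
)
```

Calling `record.to_json()` yields a plain JSON document that a downstream consumer could parse without importing these classes.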

Current State (2026-04-23)

Epic active (Monday.com: 11507547725)
Integration with the Synthetic Meeting Transcript Generator is being built in parallel
Target: feed cleaned transcripts into signal extraction before Pilot Readiness milestone

Related

Meeting Intelligence: downstream consumer of transcripts
Signal Pipeline: receives extracted signals
Synthetic Training Data Factory: uses transcripts for training data generation
