Voice Transcription Pipeline (Whisper ASR)
Converts raw audio recordings from sales calls and meetings into structured transcripts that Meeting Intelligence and the Signal Pipeline can process. Uses OpenAI Whisper, an automatic speech recognition (ASR) model, as the core transcription engine.
Why Whisper
- Open-source, runs locally (no data leaves the environment)
- High accuracy on business/sales conversation audio
- Speaker labels (who said what) can be added by pairing Whisper with a separate diarization step; Whisper itself does not identify speakers
- Handles noisy environments, accents, technical terminology
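To illustrate how "who said what" labels can be attached to timestamped segments, here is a minimal sketch. It assumes speaker turns come from a separate diarization step; the `Turn` structure, the `label_segments` helper, and the `SPEAKER_00`-style names are hypothetical and not part of this pipeline's documented interface. Each segment is given the speaker whose turn overlaps it the most.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A speaker turn from a diarization step (hypothetical structure)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: list[dict], turns: list[Turn]) -> list[dict]:
    """Attach the speaker whose turn overlaps each segment the most."""
    labeled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t.start, t.end),
            default=None,
        )
        # Fall back to UNKNOWN when no turn overlaps the segment at all.
        has_overlap = best is not None and overlap(
            seg["start"], seg["end"], best.start, best.end
        ) > 0
        labeled.append({**seg, "speaker": best.speaker if has_overlap else "UNKNOWN"})
    return labeled
```

Maximum overlap is a simple choice; it mislabels segments in which two people talk over each other, so a production version would likely need to split or flag such segments.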
Pipeline Flow
Audio File (Zoom / Meet / Sembly export)
↓
Whisper ASR (local inference)
↓
Raw Transcript (timestamped, speaker-labeled)
↓
Cleaning & Normalization
↓
Meeting Intelligence Signal Extraction
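The transcription and cleaning stages of the flow above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual implementation: it uses the open-source `openai-whisper` package (`whisper.load_model` and `model.transcribe`), while the `"base"` model size, the filler-word list, and the function names are placeholders chosen for the example.

```python
import re


def transcribe_locally(audio_path: str, model_name: str = "base") -> dict:
    """Run Whisper on the local machine and return its result dict.

    Requires the open-source `openai-whisper` package and ffmpeg.
    """
    import whisper  # imported lazily so the cleaning step works without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


# Illustrative filler words only; real normalization rules are an assumption.
FILLERS = re.compile(r"\b(?:um+|uh+|erm*)\b[,.]?\s*", flags=re.IGNORECASE)


def clean_text(text: str) -> str:
    """Drop filler words and collapse runs of whitespace."""
    text = FILLERS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_segments(result: dict) -> list[dict]:
    """Keep start/end/text for each segment that is non-empty after cleaning."""
    cleaned = []
    for seg in result.get("segments", []):
        text = clean_text(seg["text"])
        if text:
            cleaned.append({"start": seg["start"], "end": seg["end"], "text": text})
    return cleaned
```

A typical call would be `clean_segments(transcribe_locally("call.wav"))`, where `call.wav` stands in for an exported recording. Keeping the cleaning step separate from inference means it can be exercised on stored or synthetic transcripts without loading a model.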
Output Format
Each processed recording produces:
- Timestamped transcript with speaker labels
- Confidence scores per segment
- Audio quality metadata (for QA flagging)
- Ready-to-extract format for Meeting Intelligence
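One possible shape for such an output record is sketched below. All field names (`recording_id`, `quality`, `needs_review`, and so on) and the 0.5 review threshold are hypothetical, since the source does not define a schema. The confidence score is derived from the `avg_logprob` value that Whisper reports per segment, which is a common heuristic rather than a calibrated probability, and the quality metadata here is based on confidence alone.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    start: float       # seconds from the start of the recording
    end: float
    speaker: str
    text: str
    confidence: float  # 0..1, derived here from Whisper's avg_logprob


def to_segment(raw: dict, speaker: str = "UNKNOWN") -> Segment:
    """Map one Whisper segment dict onto the output record."""
    # exp(avg_logprob) is the geometric-mean token probability of the segment.
    confidence = round(math.exp(raw["avg_logprob"]), 3)
    return Segment(raw["start"], raw["end"], speaker, raw["text"].strip(), confidence)


def build_record(recording_id: str, segments: list[Segment],
                 min_confidence: float = 0.5) -> str:
    """Serialize a transcript plus simple QA metadata as JSON."""
    low = [s for s in segments if s.confidence < min_confidence]
    record = {
        "recording_id": recording_id,
        "segments": [asdict(s) for s in segments],
        "quality": {
            "low_confidence_segments": len(low),
            "needs_review": bool(low),  # flag for QA if any segment is weak
        },
    }
    return json.dumps(record, indent=2)
```

Emitting plain JSON keeps the record easy for a downstream consumer to parse, and the per-segment confidence lets QA target individual passages instead of rejecting a whole recording.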
Current State (2026-04-23)
- Epic active (Monday.com: 11507547725)
- Integration with the Synthetic Meeting Transcript Generator is being built in parallel
- Target: feed cleaned transcripts into signal extraction before the Pilot Readiness milestone
Related
- Meeting Intelligence — Downstream consumer of transcripts
- Signal Pipeline — Receives extracted signals
- Synthetic Training Data Factory — Uses transcripts for training data generation