Voice Transcription Pipeline (Whisper ASR)

Converts raw audio recordings from sales calls and meetings into structured transcripts that the Meeting Intelligence and Signal Pipeline can process. Uses OpenAI Whisper, an automatic speech recognition (ASR) model, as the core transcription engine.

Why Whisper

  • Open-source, runs locally (no data leaves the environment)
  • High accuracy on business/sales conversation audio
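  • Speaker diarization (who said what) can be layered on top; Whisper itself does not label speakers, so this requires a separate diarization step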
  • Handles noisy environments, accents, technical terminology

Pipeline Flow

Audio File (Zoom / Meet / Sembly export)
  → Whisper ASR (local inference)
  → Raw Transcript (timestamped, speaker-labeled)
  → Cleaning & Normalization
  → Meeting Intelligence Signal Extraction
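
The transcription and cleaning stages of the flow above can be sketched in Python. This is a minimal illustration, not the pipeline's actual implementation. It assumes the open-source `openai-whisper` package (`whisper.load_model` and `model.transcribe`). The helper names `transcribe_audio` and `clean_segments`, the `"base"` model size, and the cleaning rules themselves are hypothetical. Whisper's `transcribe` does not emit speaker labels, so the sketch assumes an earlier diarization step has added a `speaker` key to each segment and falls back to `"UNKNOWN"` otherwise.

```python
import re
from typing import Any


def transcribe_audio(path: str, model_name: str = "base") -> list[dict[str, Any]]:
    """Run Whisper locally and return its timestamped segments."""
    # Imported lazily so the cleaning step can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict with keys such as "start", "end", "text", "avg_logprob".
    return result["segments"]


def clean_segments(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse whitespace, drop empty segments, merge same-speaker neighbours."""
    cleaned: list[dict[str, Any]] = []
    for seg in segments:
        text = re.sub(r"\s+", " ", seg["text"]).strip()
        if not text:
            continue  # silence or whitespace-only segment
        # "speaker" is an assumed key added by a separate diarization step.
        speaker = seg.get("speaker", "UNKNOWN")
        if cleaned and cleaned[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the previous turn.
            cleaned[-1]["text"] += " " + text
            cleaned[-1]["end"] = seg["end"]
        else:
            cleaned.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return cleaned
```

In this sketch, a call such as `clean_segments(transcribe_audio("call.wav"))` would yield one entry per speaker turn. The file name is a placeholder.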

Output Format

Each processed recording produces:

  • Timestamped transcript with speaker labels
  • Confidence scores per segment
  • Audio quality metadata (for QA flagging)
  • Ready-to-extract format for Meeting Intelligence
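
One way to picture the output fields listed above is as a small Python record. The sketch below is hypothetical: the class names, field names, the `snr_db` quality metric, and the 0.6 QA threshold are all assumptions, not the pipeline's real schema. Deriving a confidence score as `exp(avg_logprob)` from Whisper's per-segment average log-probability is a rough proxy chosen for illustration only.

```python
import json
import math
from dataclasses import asdict, dataclass, field


@dataclass
class Segment:
    start: float       # seconds from the start of the recording
    end: float
    speaker: str       # label from a diarization step (assumed)
    text: str
    confidence: float  # 0..1, assumed proxy derived from avg_logprob


@dataclass
class TranscriptRecord:
    recording_id: str
    segments: list[Segment] = field(default_factory=list)
    audio_quality: dict[str, float] = field(default_factory=dict)

    def needs_qa(self, min_confidence: float = 0.6) -> bool:
        """Flag the recording if any segment falls below the threshold."""
        return any(s.confidence < min_confidence for s in self.segments)

    def to_json(self) -> str:
        """Serialize the whole record for downstream extraction."""
        return json.dumps(asdict(self), indent=2)


def confidence_from_logprob(avg_logprob: float) -> float:
    """Map an average token log-probability to a score clamped to 0..1."""
    return max(0.0, min(1.0, math.exp(avg_logprob)))


record = TranscriptRecord(
    recording_id="call-001",  # placeholder identifier
    segments=[
        Segment(0.0, 2.5, "SPEAKER_00", "Hello", confidence_from_logprob(-0.1)),
        Segment(2.5, 4.0, "SPEAKER_01", "Hi", confidence_from_logprob(-1.2)),
    ],
    audio_quality={"snr_db": 18.0},  # hypothetical quality metric
)
```

Here `record.needs_qa()` returns `True`, because the second segment's score (about 0.30) is below the assumed threshold, and `record.to_json()` gives a JSON document that a downstream consumer could parse.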

Current State (2026-04-23)

  • Epic active (Monday.com: 11507547725)
  • Integration with the Synthetic Meeting Transcript Generator is being built in parallel
  • Target: feed cleaned transcripts into signal extraction before the Pilot Readiness milestone