Synthetic Training Data Factory

The factory generates labeled synthetic data (meeting transcripts, CRM records, and signal extractions) for two purposes: (1) training and fine-tuning signal extraction models, and (2) populating demo environments with realistic data that contains no customer information.

Why Synthetic Data

  • No real customer data needed for model training or demo scenarios
  • Covers edge cases and rare signal types underrepresented in real data
  • Controllable: generate specific scenarios (competitive mention, executive sponsor drop-off, budget freeze) on demand
  • Safe to share with prospects during demos, because the records contain no real customer information

Outputs

Artifact                          | Use
----------------------------------|---------------------------------------------------------
Synthetic meeting transcripts     | Signal extraction training + demo
Synthetic CRM opportunity records | ERI demo data (Signal Capture, EcoTasks)
Labeled signal extractions        | Model fine-tuning labeled dataset
Scenario packages                 | Specific demo storylines (healthcare CRO, tech VP Sales)
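
As an illustration of how the labeled signal extractions listed above could be stored for fine-tuning, here is a minimal sketch that serializes one transcript plus its signal annotations per line (JSON Lines). The record shape is an assumption, not the factory's actual schema: `LabeledExample`, its field names, and the `budget_freeze` label are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LabeledExample:
    """Hypothetical record: one transcript and its ground-truth signals."""
    transcript_id: str
    transcript: str
    # Each signal is a dict with "type", "start", "end" (character offsets,
    # start inclusive, end exclusive). This layout is an assumption.
    signals: list = field(default_factory=list)


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


def from_jsonl(text):
    """Parse JSON Lines back into LabeledExample records, skipping blank lines."""
    return [LabeledExample(**json.loads(line))
            for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    text = "Buyer: Finance has frozen all new spend."
    start = text.index("Finance")
    example = LabeledExample(
        transcript_id="demo-001",
        transcript=text,
        signals=[{"type": "budget_freeze", "start": start, "end": len(text)}],
    )
    print(to_jsonl([example]))
```

Character offsets keep each label tied to an exact span of the transcript, so a training job can recover the annotated text with a plain slice.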

Components

  • Synthetic Meeting Transcript Generator — LLM-generated meeting transcripts with injected signals
  • Scenario Engine — Parameterized scenario templates (ICP vertical, deal stage, signal density)
  • Labeling Pipeline — Auto-labels generated transcripts with ground-truth signal annotations
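
The three components above can be sketched together as follows. This is a simplified illustration under stated assumptions, not the actual implementation: canned sentences stand in for the LLM call, and `Scenario`, `SignalLabel`, `generate`, the signal type names, and the character-offset label format are all hypothetical. The point it shows is that labels are ground truth by construction, because the generator records where it injected each signal.

```python
import random
from dataclasses import dataclass

# Hypothetical signal types, named after the example scenarios on this page.
# Canned sentences replace the LLM-generated text for the sake of the sketch.
SIGNAL_LINES = {
    "competitive_mention": "We are also evaluating another vendor this quarter.",
    "executive_sponsor_drop_off": "Our executive sponsor has stepped back from this project.",
    "budget_freeze": "Finance has frozen all new spend until next quarter.",
}
FILLER_LINES = [
    "Thanks for joining the call today.",
    "Let's walk through the agenda.",
    "Can you share the latest timeline?",
    "I will send notes after the meeting.",
]


@dataclass(frozen=True)
class Scenario:
    """Scenario template parameters (field names are assumptions)."""
    icp_vertical: str
    deal_stage: str
    signal_density: float  # fraction of turns that carry a signal, 0.0 to 1.0
    num_turns: int = 10
    seed: int = 0


@dataclass(frozen=True)
class SignalLabel:
    signal_type: str
    start: int  # character offset into the transcript, inclusive
    end: int    # character offset, exclusive


def generate(scenario):
    """Return (transcript, labels) with signals injected at random turns."""
    if not 0.0 <= scenario.signal_density <= 1.0:
        raise ValueError("signal_density must be between 0.0 and 1.0")
    rng = random.Random(scenario.seed)  # seeded so a scenario is reproducible
    n_signals = round(scenario.signal_density * scenario.num_turns)
    signal_turns = set(rng.sample(range(scenario.num_turns), n_signals))

    parts, labels, offset = [], [], 0
    for turn in range(scenario.num_turns):
        prefix = "Rep: " if turn % 2 == 0 else "Buyer: "
        if turn in signal_turns:
            signal_type = rng.choice(sorted(SIGNAL_LINES))
            text = SIGNAL_LINES[signal_type]
            # Record the span at injection time: this is the ground-truth label.
            start = offset + len(prefix)
            labels.append(SignalLabel(signal_type, start, start + len(text)))
        else:
            text = rng.choice(FILLER_LINES)
        line = prefix + text + "\n"
        parts.append(line)
        offset += len(line)
    return "".join(parts), labels


if __name__ == "__main__":
    scenario = Scenario("healthcare", "negotiation", signal_density=0.3, seed=7)
    transcript, labels = generate(scenario)
    for label in labels:
        print(label.signal_type, "->", transcript[label.start:label.end])
```

Seeding the generator from the scenario means the same parameters always reproduce the same transcript and labels, which is convenient when a demo storyline has to be regenerated.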

Current State (2026-04-23)

  • Synthetic Meeting Transcript Generator in active development (Epic: 11507545493)
  • Generator output feeds the ERI demo data quality recovery effort (target: 40+ Signal Capture records)
  • EcoTask synthetic records also in scope for demo readiness