Mac AI Lab: Architecture Document
- Owner: mining.mind
- Version: 1.1
- Date: 2026-04-21 (original: March 2026)
- Classification: Internal reference (extended team background)
Executive Summary
The Mac AI Lab is a layered local AI operating environment built on Apple Silicon. It provides a stable, reproducible foundation for:
- Model experimentation and local inference (no API costs for R&D)
- ASR transcription of enterprise meeting recordings
- Signal extraction from transcripts via local LLMs
- RAG (Retrieval-Augmented Generation) workflows
- QLoRA fine-tuning of extraction models
- LLMOps observability (Grafana, Postgres audit log)
Five Core Layers:
1. Infrastructure: macOS, Homebrew, Docker, filesystem
2. Runtime: nvm, Python venvs, shell aliases
3. Model & Inference: Ollama, local LLMs, ASR models, Open WebUI
4. Orchestration: Flowise, Python scripts, LangChain, MCP servers
5. Data, Retrieval & Application: Vector stores, Postgres, Grafana, structured outputs
Architecture Principles
Local-First Execution: Inference and workflow development run locally where possible. External APIs (Claude, GPT-4) are used only when local capacity is insufficient or a task requires higher accuracy than the local models provide.
Runtime Separation: Different tools require different environments. Python venvs per pipeline; nvm for Node tooling.
Port Clarity: Each major service has explicit port assignment — no collisions, all documented.
Modular Orchestration: Ollama, vector stores, and database components are loosely coupled. Failures in one layer don't cascade.
Repeatable Operations: Startup, health checks, and extraction runs are scripted and documented. No manual intervention required for standard operations.
v1.2 Path Discipline: All mining.mind outputs route to canonical paths under /Claude/. The execution environment stays at the old path during the dual-run migration period.
Layer 1: Infrastructure
| Component | Detail |
|---|---|
| Hardware | Mac Studio M3 Ultra (96GB unified memory) — Mac Studio "Santa Cruz" is primary |
| Architecture | ARM64 (Apple Silicon) |
| Package manager | Homebrew (system-wide) |
| Containerization | Docker Desktop — Qdrant, ChromaDB, Grafana, Metabase |
| Shell | zsh with custom aliases in .zshrc |
| Remote access | Tailscale (Tailscale network: ai-factory.cheetah-mulley.ts.net) |
Travel note: Mac Studio access requires Tailscale. As of 2026-04-21, the Mac Studio is unreachable from travel (SEV2), so the Mac laptop is the active execution environment. All transcription this session ran on the travel laptop using mlx-whisper; Parakeet (113x realtime) requires the Mac Studio.
Layer 2: Runtime
| Tool | Version | Purpose |
|---|---|---|
| nvm | Current | Node version management per tool |
| Node 20.x | Stable LTS | Flowise, automation scripts |
| Node 22.x | Latest | Experimental tools |
| Python venv | 3.11+ | Mining.mind pipeline at /ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/.venv/ |
Execution environment (dual-run period):
- Python venv and Ollama execution: /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
- Script canonical references: /Users/rhartley/Claude/scripts/agents/mining-mind/
- All outputs route to /Claude/ canonical paths
Layer 3: Model & Inference
Ollama: Local LLM Fleet
Service: Ollama REST API on port 11434
| Model | Size | Primary Use | Status |
|---|---|---|---|
| llama3.3:70b | ~45GB | Signal extraction workhorse | Production |
| qwen2.5:32b | ~20GB | Secondary extraction + fine-tune base | Production |
| qwen3:30b | ~20GB | Available for evaluation | Available |
| nomic-embed-text | Small | RAG embeddings | Needs ollama pull before first RAG run |
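Because nomic-embed-text must be pulled before the first RAG run, it can help to confirm the fleet is installed before starting a pipeline. The following is a minimal sketch, assuming the Ollama server on port 11434 exposes its installed models at `/api/tags` as a JSON object with a `models` list. The names `REQUIRED_MODELS`, `missing_models`, and `fetch_tags` are hypothetical and not part of the existing scripts.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port from the service line above

# Assumption: these are the models a pipeline run depends on.
REQUIRED_MODELS = ["llama3.3:70b", "qwen2.5:32b", "nomic-embed-text"]


def missing_models(tags_payload, required):
    """Return the required models absent from an /api/tags-style payload.

    A requirement without an explicit tag (e.g. "nomic-embed-text")
    matches any installed tag of that model, such as ":latest".
    """
    installed = {m["name"] for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        if ":" in name:
            found = name in installed
        else:
            found = any(i.split(":", 1)[0] == name for i in installed)
        if not found:
            missing.append(name)
    return missing


def fetch_tags(base_url=OLLAMA_URL, timeout=5.0):
    """Fetch the installed-model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, `missing_models(fetch_tags(), REQUIRED_MODELS)` returns the names that still need an `ollama pull`.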
Previous fleet (superseded): llama2:70b, mistral:7b, codellama:34b, orca-mini:3b, neural-chat:7b — replaced with current higher-quality models.
Open WebUI: Browser chat interface for Ollama — port 8080
ASR (Speech-to-Text): Current Fleet
| Model | Speed | Platform | Status |
|---|---|---|---|
| NVIDIA Parakeet-TDT 0.6B v2 | 113x realtime | Mac Studio only | Production — Mac Studio |
| mlx-whisper large-v3-turbo | 22-61x realtime | Any Apple Silicon | Production — travel/laptop |
| Qwen3-ASR 1.7B | 4-5x realtime | Any | Fallback — multilingual |
| Deepgram | Real-time | Cloud API | Experimental |
mlx-whisper invocation (canonical):
cd /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline
source .venv/bin/activate
python3 scripts/transcribe.py <input.m4a> --backend mlx --model large-v3-turbo \
--output /Users/rhartley/Claude/knowledge/intel/meetings/<session-folder>/
Quality gate needed: mlx-whisper hallucinates (repeating filler words) on ambient noise recordings. Detect repetitive segments and flag low-confidence transcripts before extraction runs.
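One possible shape for this quality gate is a repeated n-gram check over transcript segments. The sketch below is an illustration only: the function names, the trigram window, and the 0.5 threshold are assumptions that would need tuning against real recordings, and it does not use any confidence scores the ASR backend may emit.

```python
from collections import Counter


def repetition_ratio(text, n=3):
    """Fraction of word n-grams that duplicate an earlier n-gram in text."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(grams)


def flag_segments(segments, threshold=0.5, n=3):
    """Return indices of segments whose repetition ratio exceeds threshold."""
    return [
        i for i, seg in enumerate(segments)
        if repetition_ratio(seg, n) > threshold
    ]
```

A transcript with any flagged segment could be held back from extraction until a person reviews it.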
Layer 4: Orchestration
| Tool | Port | Purpose |
|---|---|---|
| Flowise | 3001 | Visual workflow editor |
| Python scripts (mining.mind) | N/A | Extraction pipeline, batch processing |
| LangChain | N/A | Complex chains, RAG, memory |
| LangSmith | External | Tracing and observability |
| MCP servers | Various | Claude Code tool integrations (Slack, Monday, Airtable, etc.) |
| Claude Code | N/A | Primary agent orchestration environment |
Canonical scripts path: /Claude/scripts/agents/mining-mind/
| Script | Purpose |
|---|---|
| transcribe.py | mlx-whisper transcription (single file or batch) |
| batch_humanx.sh | Batch runner for HumanX 2026-04-06/07/08 (17 .m4a files) |
| extract_signals_claude.py | Signal extraction via Claude API |
| batch_extract_signals.py | Batch extraction across multiple transcripts |
Layer 5: Data, Retrieval & Application
Port Architecture (current, Mac Studio)
| Port | Service | Access |
|---|---|---|
| 5432 | PostgreSQL (ai-factory-postgres) | Tailscale (LLMOps audit log, Grafana data source) |
| 6333 | Qdrant REST | Local / Docker |
| 6334 | Qdrant gRPC | Local / Docker |
| 8000 | ChromaDB | Local / Docker (RAG — 1,367 knowledge chunks) |
| 8080 | Open WebUI | Local (Ollama browser interface) |
| 11434 | Ollama | Local (model inference API) |
| 3000 | Grafana | SSH tunnel only — 127.0.0.1:3000 (LLMOps Executive Overview dashboard) |
| 3001 | Metabase | SSH tunnel only — 127.0.0.1:3001 |
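The port assignments above lend themselves to a scripted reachability check. This is a minimal sketch using only a TCP connect. The `SERVICES` mapping copies the table, while the default host and the function names are assumptions. Services reached over Tailscale or an SSH tunnel will only report open from a machine that has that access.

```python
import socket

# Ports copied from the table above; names are labels for the report only.
SERVICES = {
    5432: "PostgreSQL",
    6333: "Qdrant REST",
    6334: "Qdrant gRPC",
    8000: "ChromaDB",
    8080: "Open WebUI",
    11434: "Ollama",
    3000: "Grafana",
    3001: "Metabase",
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_report(host="127.0.0.1", services=SERVICES, timeout=1.0):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port, timeout) for port, name in services.items()}
```

An open port only shows that something is listening; it does not confirm the service behind it is healthy.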
LLMOps stack on Mac Studio:
- Grafana + Postgres = LLMOps Executive Overview (model invocation audit, cost tracking)
- Audit log schema v1.1 at /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
- Every model call in mining.mind pipelines logs to Postgres + flat-file JSONL backup at /Claude/operations/logs/model-invocations/model-invocations.jsonl
Vector Stores
| Store | Port | Collection | Status |
|---|---|---|---|
| ChromaDB | 8000 | 1,367 knowledge chunks + meeting summaries | Active (RAG retrieval) |
| Qdrant | 6333 | eco-meeting-signals | Needs rebuild (nomic-embed-text pull required first) |
Folder Architecture (v1.2 Canonical)
/Users/rhartley/Claude/
├── knowledge/
│   ├── intel/meetings/{YYYY-MM-DD}-{session}/  # Meeting transcripts
│   ├── ip/frameworks/
│   │   ├── meeting-signals/  # Extracted signal JSON
│   │   ├── synthetic-corpus/  # Synthetic corpus JSON
│   │   ├── signal-ontology-v2.0.md  # 35-domain extraction ontology
│   │   └── synthetic-meeting-corpus-report-v2.3.md  # This document's companion
│   └── tools/mac-ai-lab-architecture.md  # This document
├── scripts/agents/mining-mind/  # Canonical script references
├── plans/mining-mind-extraction-operations-plan-v1.0.md
├── operations/logs/model-invocations/  # Audit JSONL (fallback)
└── missions/M-2026-0421-signal-discovery-expansion/
    └── features/F7-signal-master-spreadsheet-v2.0.md  # Customer signal registry

/Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
├── .venv/  # Python venv (stays here during dual-run)
├── scripts/  # Execution copies (canonical refs at /Claude/scripts/)
├── output/
│   ├── humanx-2026-04-06-08/  # Old location; migrated to /Claude/knowledge/intel/meetings/
│   ├── v3-signals/  # 28 real meeting signals, pending migration
│   └── synthetic-v2/  # 1,400 synthetic meetings, pending migration
└── fine-tuning/  # QLoRA infrastructure (execution stays here)
End-to-End Request Path (Signal Extraction)
Meeting Recording (.m4a/.mp3/.mp4)
→ ASR (mlx-whisper large-v3-turbo or Parakeet)
→ Transcript .txt → /Claude/knowledge/intel/meetings/{session}/
→ Signal extraction (Ollama llama3.3:70b or Claude API)
→ Quality check (20 ontology domains present, empty field flags)
→ Signal JSON → /Claude/knowledge/ip/frameworks/meeting-signals/{slug}.json
→ Audit log → Postgres (ai-factory-postgres) + JSONL fallback
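The quality-check step in this path could look roughly like the sketch below. The signal JSON layout is an assumption made for illustration: a top-level `domains` object mapping each ontology domain to a dict of fields. The real schema may differ, and reading "20 ontology domains present" as "at least 20" is also an assumption. `quality_check` is a hypothetical name.

```python
def quality_check(signal, min_domains=20):
    """Summarise domain coverage and empty fields for one signal document.

    Assumes signal["domains"] maps domain name -> {field name: value}.
    """
    domains = signal.get("domains", {})
    # A field counts as empty if it is None, an empty string, list, or dict.
    empty_fields = sorted(
        f"{domain}.{field}"
        for domain, fields in domains.items()
        for field, value in fields.items()
        if value in (None, "", [], {})
    )
    return {
        "domain_count": len(domains),
        "domains_ok": len(domains) >= min_domains,
        "empty_fields": empty_fields,
    }
```

A run could write the signal JSON only when `domains_ok` is true, and attach `empty_fields` to the audit record for review.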
Operating Model
Session-Start Checklist (every mining.mind session)
- Check recording folder (/Users/rhartley/Human X/) for new .m4a/.mp3/.mp4 files
- Check last extraction date in output/v3-signals/; flag to CDO if gap > 2 weeks
- Verify Ollama is running: ollama list
- Confirm Mac Studio Tailscale reachability (SEV2 if unreachable)
- Extract any pending recordings before moving to other work
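The first two checklist items can be sketched in a few lines. The helper names and the idea of tracking processed files by name are assumptions, and how the last extraction date is obtained from output/v3-signals/ is left to the caller.

```python
from datetime import date, timedelta
from pathlib import Path

AUDIO_SUFFIXES = {".m4a", ".mp3", ".mp4"}


def new_recordings(folder, already_processed):
    """List audio files in folder whose names are not in already_processed."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_SUFFIXES
        and p.name not in already_processed
    )


def extraction_gap_exceeded(last_extraction, today, max_gap=timedelta(weeks=2)):
    """Return True when the gap since the last extraction exceeds max_gap."""
    return (today - last_extraction) > max_gap
```

For example, `new_recordings("/Users/rhartley/Human X/", processed_names)` would return the files still waiting for transcription, where `processed_names` is a set the caller maintains.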
Weekly Maintenance
| Frequency | Action |
|---|---|
| Every session | Scan recording folder, extract if new files |
| Weekly | Run health check on all services; confirm Grafana dashboard current |
| Monthly | Synthetic generation batch (20-50 meetings); RAG rebuild after ≥10 new meetings |
| Quarterly | QLoRA fine-tune (when corpus ≥100 meetings, CDO approval required) |
Maturity Roadmap (current state: Phase 2-3)
| Phase | Description | Status |
|---|---|---|
| 1 | Stable workstation baseline — Ollama, basic service orchestration | Complete |
| 2 | Structured workflow layer — version-controlled scripts, extraction pipeline | Active |
| 3 | Persistent retrieval layer — ChromaDB active, Qdrant needs rebuild | Partial |
| 4 | Advanced simulation — synthetic engine + QLoRA fine-tuning | In progress |
| 5 | Production-like local platform — monitoring dashboards, CI/CD for workflows | Planned |
Risk Areas and Controls
| Risk | Mitigation |
|---|---|
| Mac Studio unreachable via Tailscale | SEV2 protocol; travel laptop as fallback for mlx-whisper |
| Old-path migration breaks signal-extraction service | Coordination note filed to Jordan — code.platform reviews before touching plist |
| Whisper hallucination on noisy recordings | Quality gate needed — detect repetitive segments pre-extraction |
| RAG index stale | ollama pull nomic-embed-text → Qdrant rebuild before Operation 4 |
| LLMOps Postgres connection lost during migration | Preserve connection string; don't touch Grafana dashboard definitions |
| Prompt drift in extraction scripts | Scripts version-controlled at /Claude/scripts/agents/mining-mind/ |
LLMOps Audit Requirements
Every model invocation in mining.mind pipelines must log to:
- Primary: ai-factory-postgres (port 5432, Tailscale) — Schema v1.1
- Fallback: /Claude/operations/logs/model-invocations/model-invocations.jsonl
Schema reference: /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
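The primary/fallback arrangement can be sketched as a small wrapper. This treats the JSONL file strictly as a fallback written only when the primary sink fails, which is one reading of the requirement. `log_invocation` and `primary_writer` are hypothetical names, the Postgres insert itself is left to the caller, and the record fields shown are placeholders rather than the fields defined in the schema file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_invocation(record, primary_writer, fallback_path):
    """Write record to the primary sink, or append it to JSONL on failure.

    primary_writer is any callable that persists one record, for example
    a function wrapping a Postgres INSERT. Returns "primary" or "fallback".
    """
    record = {"logged_at": datetime.now(timezone.utc).isoformat(), **record}
    try:
        primary_writer(record)
        return "primary"
    except Exception:
        # Primary sink unavailable: keep the record in the flat file.
        path = Path(fallback_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return "fallback"
```

Passing the writer in as a callable keeps the sketch independent of any particular database driver and makes the fallback path easy to exercise without a live Postgres instance.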
CDO receives weekly model-call digest from Grafana per F9 ops plan requirements.
Updated from original Mac AI Lab Architecture v1.0 (March 2026). Model fleet, ports, paths, and operating model reflect current state as of 2026-04-21.
Source HTML: /Users/rhartley/ecomonetize/Project.Manager/Code.Assistant/ops-docs/mac-ai-lab/Mac-AI-Lab-Architecture.html