Mac AI Lab — Architecture Document

Owner: mining.mind
Version: 1.1
Date: 2026-04-21 (original: March 2026)
Classification: Internal reference, extended team background


Executive Summary

The Mac AI Lab is a layered local AI operating environment built on Apple Silicon. It provides a stable, reproducible foundation for:

  • Model experimentation and local inference (no API costs for R&D)
  • ASR transcription of enterprise meeting recordings
  • Signal extraction from transcripts via local LLMs
  • RAG (Retrieval-Augmented Generation) workflows
  • QLoRA fine-tuning of extraction models
  • LLMOps observability (Grafana, Postgres audit log)

Five Core Layers:

  1. Infrastructure: macOS, Homebrew, Docker, filesystem
  2. Runtime: nvm, Python venvs, shell aliases
  3. Model & Inference: Ollama, local LLMs, ASR models, Open WebUI
  4. Orchestration: Flowise, Python scripts, LangChain, MCP servers
  5. Data, Retrieval & Application: vector stores, Postgres, Grafana, structured outputs


Architecture Principles

Local-First Execution: Inference and workflow development run locally where possible. External APIs (Claude, GPT-4) are used only when local capacity is insufficient or a task requires higher accuracy than the local models provide.

Runtime Separation: Different tools require different environments. Python venvs per pipeline; nvm for Node tooling.

Port Clarity: Each major service has explicit port assignment — no collisions, all documented.

Modular Orchestration: Ollama, vector stores, and database components are loosely coupled. Failures in one layer don't cascade.

Repeatable Operations: Startup, health checks, and extraction runs are scripted and documented. No manual intervention required for standard operations.

v1.2 Path Discipline: All mining.mind outputs route to /Claude/ canonical paths. Execution environment stays at old path during dual-run migration period.


Layer 1: Infrastructure

| Component | Detail |
|---|---|
| Hardware | Mac Studio M3 Ultra (96GB unified memory); the Mac Studio "Santa Cruz" is the primary machine |
| Architecture | ARM64 (Apple Silicon) |
| Package manager | Homebrew (system-wide) |
| Containerization | Docker Desktop: Qdrant, ChromaDB, Grafana, Metabase |
| Shell | zsh with custom aliases in .zshrc |
| Remote access | Tailscale (network: ai-factory.cheetah-mulley.ts.net) |

Travel note: Mac Studio access requires Tailscale. As of 2026-04-21, the Mac Studio is unreachable from travel (SEV2), so the Mac laptop is the active execution environment. All transcription this session ran on the travel laptop using mlx-whisper; Parakeet (113x realtime) requires the Mac Studio.


Layer 2: Runtime

| Tool | Version | Purpose |
|---|---|---|
| nvm | Current | Node version management per tool |
| Node 20.x | Stable LTS | Flowise, automation scripts |
| Node 22.x | Latest | Experimental tools |
| Python venv | 3.11+ | Mining.mind pipeline at /ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/.venv/ |

Execution environment (dual-run period):

  • Python venv and Ollama execution: /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
  • Script canonical references: /Users/rhartley/Claude/scripts/agents/mining-mind/
  • All outputs route to /Claude/ canonical paths


Layer 3: Model & Inference

Ollama — Local LLM Fleet

Service: Ollama REST API on port 11434

| Model | Size | Primary Use | Status |
|---|---|---|---|
| llama3.3:70b | ~45GB | Signal extraction workhorse | Production |
| qwen2.5:32b | ~20GB | Secondary extraction + fine-tune base | Production |
| qwen3:30b | ~20GB | Available for evaluation | Available |
| nomic-embed-text | Small | RAG embeddings | Needs ollama pull before first RAG run |

Previous fleet (superseded): llama2:70b, mistral:7b, codellama:34b, orca-mini:3b, neural-chat:7b — replaced with current higher-quality models.
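
As an illustration of how a pipeline script might call the Ollama REST API on port 11434, here is a minimal sketch using only the Python standard library. It uses Ollama's /api/generate endpoint with streaming disabled. The helper names build_generate_request and generate are hypothetical and are not taken from the mining.mind scripts, and the timeout value is an assumption.

```python
import json
import urllib.request

# Port 11434 comes from this document; localhost is an assumption.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the text from the "response" field."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama with the model already pulled):
#   text = generate("llama3.3:70b", "List the decisions made in this transcript: ...")
```

Separating request construction from the network call keeps the payload easy to inspect and test without a running Ollama instance.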

Open WebUI: Browser chat interface for Ollama — port 8080

ASR (Speech-to-Text) — Current Fleet

| Model | Speed | Platform | Status |
|---|---|---|---|
| NVIDIA Parakeet-TDT 0.6B v2 | 113x realtime | Mac Studio only | Production (Mac Studio) |
| mlx-whisper large-v3-turbo | 22-61x realtime | Any Apple Silicon | Production (travel/laptop) |
| Qwen3-ASR 1.7B | 4-5x realtime | Any | Fallback (multilingual) |
| Deepgram | Real-time | Cloud API | Experimental |

mlx-whisper invocation (canonical):

cd /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline
source .venv/bin/activate
python3 scripts/transcribe.py <input.m4a> --backend mlx --model large-v3-turbo \
  --output /Users/rhartley/Claude/knowledge/intel/meetings/<session-folder>/

Quality gate needed: mlx-whisper hallucinates (repeating filler words) on ambient noise recordings. Detect repetitive segments and flag low-confidence transcripts before extraction runs.
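
One possible shape for this quality gate, offered as a sketch and not as the pipeline's actual implementation: measure how many word n-grams in a transcript repeat an earlier n-gram, and flag the transcript when that share is high. The function names, the n-gram size of 3, and the 0.3 threshold are all assumptions that would need tuning against real recordings.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Return the fraction of word n-grams that repeat an earlier n-gram."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def flag_transcript(text: str, threshold: float = 0.3) -> bool:
    """Flag a transcript as likely hallucinated (threshold is an assumed value)."""
    return repetition_ratio(text) >= threshold
```

A transcript that loops on a filler phrase scores close to 1.0, while ordinary speech usually scores near 0. Applying the same check per segment instead of per file would localize the problem, at the cost of needing segment boundaries from the ASR output.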


Layer 4: Orchestration

| Tool | Port | Purpose |
|---|---|---|
| Flowise | 3001 | Visual workflow editor |
| Python scripts (mining.mind) | N/A | Extraction pipeline, batch processing |
| LangChain | N/A | Complex chains, RAG, memory |
| LangSmith | External | Tracing and observability |
| MCP servers | Various | Claude Code tool integrations (Slack, Monday, Airtable, etc.) |
| Claude Code | N/A | Primary agent orchestration environment |

Canonical scripts path: /Claude/scripts/agents/mining-mind/

| Script | Purpose |
|---|---|
| transcribe.py | mlx-whisper transcription, single file or batch |
| batch_humanx.sh | Batch runner for HumanX 2026-04-06/07/08 (17 .m4a files) |
| extract_signals_claude.py | Signal extraction via Claude API |
| batch_extract_signals.py | Batch extraction across multiple transcripts |

Layer 5: Data, Retrieval & Application

Port Architecture (current — Mac Studio)

| Port | Service | Access |
|---|---|---|
| 5432 | PostgreSQL (ai-factory-postgres) | Tailscale (LLMOps audit log, Grafana data source) |
| 6333 | Qdrant REST | Local / Docker |
| 6334 | Qdrant gRPC | Local / Docker |
| 8000 | ChromaDB | Local / Docker (RAG: 1,367 knowledge chunks) |
| 8080 | Open WebUI | Local (Ollama browser interface) |
| 11434 | Ollama | Local (model inference API) |
| 3000 | Grafana | SSH tunnel only; 127.0.0.1:3000 (LLMOps Executive Overview dashboard) |
| 3001 | Metabase | SSH tunnel only; 127.0.0.1:3001 |
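
A scripted health check against this port map could look roughly like the sketch below, which only tests whether each TCP port accepts a connection. The port numbers come from the table above; the host 127.0.0.1, the one-second timeout, and the function names are assumptions. Services reached over Tailscale or an SSH tunnel would need the appropriate host or an open tunnel.

```python
import socket

# Ports taken from the port architecture table; service names are labels only.
SERVICES = {
    "PostgreSQL": 5432,
    "Qdrant REST": 6333,
    "Qdrant gRPC": 6334,
    "ChromaDB": 8000,
    "Open WebUI": 8080,
    "Ollama": 11434,
    "Grafana": 3000,
    "Metabase": 3001,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}


# Example: print a simple status report.
#   for name, ok in check_services().items():
#       print(f"{name:12s} {'up' if ok else 'DOWN'}")
```

An open port shows only that something is listening. It does not confirm that the expected service is healthy, so an application-level probe is still worth adding for critical services.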

LLMOps stack on Mac Studio:

  • Grafana + Postgres = LLMOps Executive Overview (model invocation audit, cost tracking)
  • Audit log schema v1.1 at /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
  • Every model call in mining.mind pipelines logs to Postgres + flat-file JSONL backup at /Claude/operations/logs/model-invocations/model-invocations.jsonl

Vector Stores

| Store | Port | Collection | Status |
|---|---|---|---|
| ChromaDB | 8000 | 1,367 knowledge chunks + meeting summaries | Active (RAG retrieval) |
| Qdrant | 6333 | eco-meeting-signals | Needs rebuild; nomic-embed-text pull required first |

Folder Architecture (v1.2 Canonical)

/Users/rhartley/Claude/
├── knowledge/
│   ├── intel/meetings/{YYYY-MM-DD}-{session}/     # Meeting transcripts
│   ├── ip/frameworks/
│   │   ├── meeting-signals/                         # Extracted signal JSON
│   │   ├── synthetic-corpus/                        # Synthetic corpus JSON
│   │   ├── signal-ontology-v2.0.md                 # 35-domain extraction ontology
│   │   └── synthetic-meeting-corpus-report-v2.3.md # This document's companion
│   └── tools/mac-ai-lab-architecture.md            # This document
├── scripts/agents/mining-mind/                      # Canonical script references
├── plans/mining-mind-extraction-operations-plan-v1.0.md
├── operations/logs/model-invocations/              # Audit JSONL (fallback)
└── missions/M-2026-0421-signal-discovery-expansion/
    └── features/F7-signal-master-spreadsheet-v2.0.md  # Customer signal registry

/Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
├── .venv/                    # Python venv (stays here during dual-run)
├── scripts/                  # Execution copies (canonical refs at /Claude/scripts/)
├── output/
│   ├── humanx-2026-04-06-08/ # Old location — migrated to /Claude/knowledge/intel/meetings/
│   ├── v3-signals/           # 28 real meeting signals — pending migration
│   └── synthetic-v2/         # 1,400 synthetic meetings — pending migration
└── fine-tuning/              # QLoRA infrastructure (execution stays here)

End-to-End Request Path (Signal Extraction)

Meeting Recording (.m4a/.mp3/.mp4)
  → ASR (mlx-whisper large-v3-turbo or Parakeet)
  → Transcript .txt → /Claude/knowledge/intel/meetings/{session}/
  → Signal extraction (Ollama llama3.3:70b or Claude API)
  → Quality check (20 ontology domains present, empty field flags)
  → Signal JSON → /Claude/knowledge/ip/frameworks/meeting-signals/{slug}.json
  → Audit log → Postgres (ai-factory-postgres) + JSONL fallback
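
The quality-check step in the path above might be sketched as follows. This assumes, purely for illustration, that a signal file is a flat JSON object keyed by ontology domain; the real layout is defined by the signal ontology and is not shown in this document. The domain names in the usage example are hypothetical placeholders.

```python
def quality_check(signal: dict, required_domains: list) -> dict:
    """Report required domains that are missing or present but empty."""
    missing = [d for d in required_domains if d not in signal]
    # Treat None, empty string, empty list, and empty dict as "empty".
    empty = [
        d for d in required_domains
        if d in signal and signal[d] in (None, "", [], {})
    ]
    return {
        "missing_domains": missing,
        "empty_fields": empty,
        "passed": not missing and not empty,
    }


# Example with hypothetical domain names:
#   report = quality_check(
#       {"decisions": ["Ship Friday"], "risks": []},
#       ["decisions", "risks", "action_items"],
#   )
#   report["missing_domains"] is ["action_items"], report["empty_fields"] is ["risks"]
```

Returning a report instead of raising lets a batch run record every failing transcript and continue, which suits unattended extraction.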

Operating Model

Session-Start Checklist (every mining.mind session)

  1. Check recording folder (/Users/rhartley/Human X/) for new .m4a/.mp3/.mp4 files
  2. Check last extraction date in output/v3-signals/ — flag to CDO if gap > 2 weeks
  3. Verify Ollama is running: ollama list
  4. Confirm Mac Studio Tailscale reachability — SEV2 if unreachable
  5. Extract any pending recordings before moving to other work
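
Steps 1 and 2 of the checklist lend themselves to scripting. The sketch below is one way to do it, assuming that file modification time is an acceptable proxy for "new" and that the last extraction date is known to the caller. Both function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

# File types named in the checklist.
RECORDING_SUFFIXES = {".m4a", ".mp3", ".mp4"}


def new_recordings(folder: Path, since: datetime) -> list:
    """List recordings in the folder modified after the given local time."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in RECORDING_SUFFIXES
        and datetime.fromtimestamp(p.stat().st_mtime) > since
    )


def extraction_gap_exceeded(
    last_extraction: datetime,
    now: datetime,
    max_gap: timedelta = timedelta(weeks=2),
) -> bool:
    """Return True when the gap since the last extraction warrants a flag to the CDO."""
    return now - last_extraction > max_gap


# Example:
#   pending = new_recordings(Path("/Users/rhartley/Human X"), last_run_time)
```

Step 3 (ollama list) and step 4 (Tailscale reachability) depend on external services, so they are left out of this sketch.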

Weekly Maintenance

| Frequency | Action |
|---|---|
| Every session | Scan recording folder, extract if new files |
| Weekly | Run health check on all services; confirm Grafana dashboard current |
| Monthly | Synthetic generation batch (20-50 meetings); RAG rebuild after ≥10 new meetings |
| Quarterly | QLoRA fine-tune (when corpus ≥100 meetings, CDO approval required) |
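
The two numeric thresholds in the schedule can be encoded as simple guards, so a maintenance script applies them consistently. This is a sketch: the constant and function names are hypothetical, and how the counts and the approval flag are obtained is left to the caller.

```python
# Thresholds taken from the maintenance schedule.
RAG_REBUILD_MIN_NEW_MEETINGS = 10   # RAG rebuild after >= 10 new meetings
QLORA_MIN_CORPUS_SIZE = 100         # QLoRA fine-tune when corpus >= 100 meetings


def rag_rebuild_due(new_meetings_since_rebuild: int) -> bool:
    """Return True once enough new meetings have accumulated for a RAG rebuild."""
    return new_meetings_since_rebuild >= RAG_REBUILD_MIN_NEW_MEETINGS


def qlora_run_allowed(corpus_size: int, cdo_approved: bool) -> bool:
    """Return True only when the corpus is large enough and the CDO has approved."""
    return cdo_approved and corpus_size >= QLORA_MIN_CORPUS_SIZE
```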

Maturity Roadmap (current state: Phase 2-3)

| Phase | Description | Status |
|---|---|---|
| 1 | Stable workstation baseline: Ollama, basic service orchestration | Complete |
| 2 | Structured workflow layer: version-controlled scripts, extraction pipeline | Active |
| 3 | Persistent retrieval layer: ChromaDB active, Qdrant needs rebuild | Partial |
| 4 | Advanced simulation: synthetic engine + QLoRA fine-tuning | In progress |
| 5 | Production-like local platform: monitoring dashboards, CI/CD for workflows | Planned |

Risk Areas and Controls

| Risk | Mitigation |
|---|---|
| Mac Studio unreachable via Tailscale | SEV2 protocol; travel laptop as fallback for mlx-whisper |
| Old-path migration breaks signal-extraction service | Coordination note filed to Jordan; code.platform reviews before touching plist |
| Whisper hallucination on noisy recordings | Quality gate needed: detect repetitive segments pre-extraction |
| RAG index stale | ollama pull nomic-embed-text → Qdrant rebuild before Operation 4 |
| LLMOps Postgres connection lost during migration | Preserve connection string; don't touch Grafana dashboard definitions |
| Prompt drift in extraction scripts | Scripts version-controlled at /Claude/scripts/agents/mining-mind/ |

LLMOps Audit Requirements

Every model invocation in mining.mind pipelines must log to:

  • Primary: ai-factory-postgres (port 5432, Tailscale), Schema v1.1
  • Fallback: /Claude/operations/logs/model-invocations/model-invocations.jsonl

Schema reference: /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
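
To make the dual-logging requirement concrete, here is a minimal sketch of a logger that always appends to the JSONL file and also attempts the primary write. The record fields, the logged_at key, and the write_primary callback (which would wrap the actual Postgres insert) are assumptions for illustration; the authoritative field list is in the schema reference above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def log_invocation(
    record: dict,
    jsonl_path: Path,
    write_primary: Optional[Callable[[dict], None]] = None,
) -> bool:
    """Append the record to the JSONL file, then try the primary store.

    Returns True if the primary write succeeded, False otherwise.
    """
    entry = {"logged_at": datetime.now(timezone.utc).isoformat(), **record}

    # The flat-file copy is written first so a database outage never loses a record.
    jsonl_path.parent.mkdir(parents=True, exist_ok=True)
    with jsonl_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")

    if write_primary is None:
        return False
    try:
        write_primary(entry)  # hypothetical callback, e.g. an INSERT into Postgres
        return True
    except Exception:
        return False


# Example with hypothetical fields:
#   log_invocation({"model": "llama3.3:70b", "task": "signal-extraction"},
#                  Path("model-invocations.jsonl"))
```

Writing one JSON object per line keeps the file appendable and easy to replay into Postgres after an outage.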

CDO receives weekly model-call digest from Grafana per F9 ops plan requirements.


Updated from original Mac AI Lab Architecture v1.0 (March 2026). Model fleet, ports, paths, and operating model reflect current state as of 2026-04-21. Source HTML: /Users/rhartley/ecomonetize/Project.Manager/Code.Assistant/ops-docs/mac-ai-lab/Mac-AI-Lab-Architecture.html