Mac AI Lab: Architecture Document

Owner: mining.mind
Version: 1.1
Date: 2026-04-21 (original: March 2026)
Classification: Internal reference, extended team background

Executive Summary

The Mac AI Lab is a layered local AI operating environment built on Apple Silicon. It provides a stable, reproducible foundation for:

- Model experimentation and local inference (no API costs for R&D)
- ASR transcription of enterprise meeting recordings
- Signal extraction from transcripts via local LLMs
- RAG (Retrieval-Augmented Generation) workflows
- QLoRA fine-tuning of extraction models
- LLMOps observability (Grafana, Postgres audit log)

Five Core Layers:

1. Infrastructure: macOS, Homebrew, Docker, filesystem
2. Runtime: nvm, Python venvs, shell aliases
3. Model & Inference: Ollama, local LLMs, ASR models, Open WebUI
4. Orchestration: Flowise, Python scripts, LangChain, MCP servers
5. Data, Retrieval & Application: vector stores, Postgres, Grafana, structured outputs

Architecture Principles

Local-First Execution: Inference and workflow development run locally where possible. External APIs (Claude, GPT-4) are used only when local capacity is insufficient or higher accuracy is required.

Runtime Separation: Different tools require different environments. Python venvs per pipeline; nvm for Node tooling.

Port Clarity: Each major service has explicit port assignment — no collisions, all documented.

Modular Orchestration: Ollama, vector stores, and database components are loosely coupled. Failures in one layer don't cascade.

Repeatable Operations: Startup, health checks, and extraction runs are scripted and documented. No manual intervention required for standard operations.

v1.2 Path Discipline: All mining.mind outputs route to /Claude/ canonical paths. The execution environment stays at the old path during the dual-run migration period.

Layer 1: Infrastructure

Component	Detail
Hardware	Mac Studio M3 Ultra (96GB unified memory) — Mac Studio "Santa Cruz" is primary
Architecture	ARM64 (Apple Silicon)
Package manager	Homebrew (system-wide)
Containerization	Docker Desktop — Qdrant, ChromaDB, Grafana, Metabase
Shell	zsh with custom aliases in `.zshrc`
Remote access	Tailscale (Tailscale network: `ai-factory.cheetah-mulley.ts.net`)

Travel note: Mac Studio access requires Tailscale. As of 2026-04-21, the Mac Studio is unreachable while traveling (SEV2), so the Mac laptop is the active execution environment. All transcription this session ran on the travel laptop using mlx-whisper; Parakeet (113x realtime) requires the Mac Studio.

Layer 2: Runtime

Tool	Version	Purpose
nvm	Current	Node version management per tool
Node 20.x	Stable LTS	Flowise, automation scripts
Node 22.x	Latest	Experimental tools
Python venv	3.11+	Mining.mind pipeline at `/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/.venv/`

Execution environment (dual-run period):

- Python venv and Ollama execution: /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
- Script canonical references: /Users/rhartley/Claude/scripts/agents/mining-mind/
- All outputs route to /Claude/ canonical paths

Layer 3: Model & Inference

Ollama: Local LLM Fleet

Service: Ollama REST API on port 11434

Model	Size	Primary Use	Status
`llama3.3:70b`	~45GB	Signal extraction workhorse	Production
`qwen2.5:32b`	~20GB	Secondary extraction + fine-tune base	Production
`qwen3:30b`	~20GB	Available for evaluation	Available
`nomic-embed-text`	Small	RAG embeddings	Needs `ollama pull` before first RAG run
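
The fleet table above can be checked against a running Ollama server before a RAG or extraction run. The following is a minimal sketch, assuming the server listens on `localhost:11434` as documented here and using Ollama's `/api/tags` endpoint to list installed models. The helper names (`missing_models`, `fetch_tags`) and the `REQUIRED_MODELS` list are illustrative and not part of the existing scripts.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port from this document; host is an assumption
REQUIRED_MODELS = ["llama3.3:70b", "qwen2.5:32b", "nomic-embed-text"]  # illustrative list


def installed_model_names(tags_payload: dict) -> set[str]:
    """Collect model names from an /api/tags response body."""
    return {m["name"] for m in tags_payload.get("models", [])}


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that are not installed.

    Ollama reports untagged pulls as "<name>:latest", so a bare name such as
    "nomic-embed-text" is also matched against "nomic-embed-text:latest".
    """
    installed = installed_model_names(tags_payload)
    missing = []
    for name in required:
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama server (requires network access)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags(), REQUIRED_MODELS)` on the execution machine would list anything that still needs an `ollama pull`.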

Previous fleet (superseded): llama2:70b, mistral:7b, codellama:34b, orca-mini:3b, neural-chat:7b — replaced with current higher-quality models.

Open WebUI: Browser chat interface for Ollama — port 8080

ASR (Speech-to-Text): Current Fleet

Model	Speed	Platform	Status
NVIDIA Parakeet-TDT 0.6B v2	113x realtime	Mac Studio only	Production — Mac Studio
mlx-whisper large-v3-turbo	22-61x realtime	Any Apple Silicon	Production — travel/laptop
Qwen3-ASR 1.7B	4-5x realtime	Any	Fallback — multilingual
Deepgram	Real-time	Cloud API	Experimental

mlx-whisper invocation (canonical):

cd /Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline
source .venv/bin/activate
python3 scripts/transcribe.py <input.m4a> --backend mlx --model large-v3-turbo \
  --output /Users/rhartley/Claude/knowledge/intel/meetings/<session-folder>/

Quality gate needed: mlx-whisper hallucinates (repeating filler words) on ambient noise recordings. Detect repetitive segments and flag low-confidence transcripts before extraction runs.
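
One possible shape for this quality gate is sketched below, assuming the transcript is available as a list of text segments. It measures how many word n-grams in a segment repeat an earlier n-gram and flags segments above a threshold. The n-gram size and the 0.5 threshold are untuned assumptions, and the function names are hypothetical rather than existing pipeline code.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that repeat an earlier n-gram."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def flag_repetitive_segments(segments: list[str], threshold: float = 0.5) -> list[int]:
    """Return indices of segments whose repetition ratio meets the threshold."""
    return [i for i, seg in enumerate(segments)
            if repetition_ratio(seg) >= threshold]
```

A transcript with any flagged segments could be held back from extraction until it is reviewed. Thresholds would need calibration against known-bad recordings.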

Layer 4: Orchestration

Tool	Port	Purpose
Flowise	3001	Visual workflow editor
Python scripts (mining.mind)	N/A	Extraction pipeline, batch processing
LangChain	N/A	Complex chains, RAG, memory
LangSmith	External	Tracing and observability
MCP servers	Various	Claude Code tool integrations (Slack, Monday, Airtable, etc.)
Claude Code	N/A	Primary agent orchestration environment

Canonical scripts path: /Claude/scripts/agents/mining-mind/

Script	Purpose
`transcribe.py`	mlx-whisper transcription — single file or batch
`batch_humanx.sh`	Batch runner for HumanX 2026-04-06/07/08 (17 .m4a files)
`extract_signals_claude.py`	Signal extraction via Claude API
`batch_extract_signals.py`	Batch extraction across multiple transcripts

Layer 5: Data, Retrieval & Application

Port Architecture (current, Mac Studio)

Port	Service	Access
5432	PostgreSQL (`ai-factory-postgres`)	Tailscale (LLMOps audit log, Grafana data source)
6333	Qdrant REST	Local / Docker
6334	Qdrant gRPC	Local / Docker
8000	ChromaDB	Local / Docker (RAG — 1,367 knowledge chunks)
8080	Open WebUI	Local (Ollama browser interface)
11434	Ollama	Local (model inference API)
3000	Grafana	SSH tunnel only — `127.0.0.1:3000` (LLMOps Executive Overview dashboard)
3001	Metabase	SSH tunnel only — `127.0.0.1:3001`
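
The port table lends itself to a scripted reachability check. The sketch below is illustrative only: the port numbers come from the table, while the default host, the timeout, and the names `tcp_probe` and `check_ports` are assumptions. The probe is injectable so the mapping logic can be exercised without live services.

```python
import socket
from typing import Callable

# Ports copied from the table above; service labels are shortened.
MAC_STUDIO_PORTS = {
    5432: "PostgreSQL",
    6333: "Qdrant REST",
    6334: "Qdrant gRPC",
    8000: "ChromaDB",
    8080: "Open WebUI",
    11434: "Ollama",
    3000: "Grafana",
    3001: "Metabase",
}


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(ports: dict[int, str], host: str = "127.0.0.1",
                probe: Callable[[str, int], bool] = tcp_probe) -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: probe(host, port) for port, name in ports.items()}
```

Run on the Mac Studio itself, `check_ports(MAC_STUDIO_PORTS)` reports which services accept connections. From a remote machine the host would be the Tailscale address for PostgreSQL and a tunnelled `127.0.0.1` for Grafana and Metabase.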

LLMOps stack on Mac Studio:

- Grafana + Postgres = LLMOps Executive Overview (model invocation audit, cost tracking)
- Audit log schema v1.1 at /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
- Every model call in mining.mind pipelines logs to Postgres + flat-file JSONL backup at /Claude/operations/logs/model-invocations/model-invocations.jsonl

Vector Stores

Store	Port	Collection	Status
ChromaDB	8000	1,367 knowledge chunks + meeting summaries	Active (RAG retrieval)
Qdrant	6333	`eco-meeting-signals`	Needs rebuild — `nomic-embed-text` pull required first

Folder Architecture (v1.2 Canonical)

/Users/rhartley/Claude/
├── knowledge/
│   ├── intel/meetings/{YYYY-MM-DD}-{session}/     # Meeting transcripts
│   ├── ip/frameworks/
│   │   ├── meeting-signals/                         # Extracted signal JSON
│   │   ├── synthetic-corpus/                        # Synthetic corpus JSON
│   │   ├── signal-ontology-v2.0.md                 # 35-domain extraction ontology
│   │   └── synthetic-meeting-corpus-report-v2.3.md # This document's companion
│   └── tools/mac-ai-lab-architecture.md            # This document
├── scripts/agents/mining-mind/                      # Canonical script references
├── plans/mining-mind-extraction-operations-plan-v1.0.md
├── operations/logs/model-invocations/              # Audit JSONL (fallback)
└── missions/M-2026-0421-signal-discovery-expansion/
    └── features/F7-signal-master-spreadsheet-v2.0.md  # Customer signal registry

/Users/rhartley/ecomonetize/Project.Manager/Mining.Mind/transcription-pipeline/
├── .venv/                    # Python venv (stays here during dual-run)
├── scripts/                  # Execution copies (canonical refs at /Claude/scripts/)
├── output/
│   ├── humanx-2026-04-06-08/ # Old location — migrated to /Claude/knowledge/intel/meetings/
│   ├── v3-signals/           # 28 real meeting signals — pending migration
│   └── synthetic-v2/         # 1,400 synthetic meetings — pending migration
└── fine-tuning/              # QLoRA infrastructure (execution stays here)
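
The canonical layout above can be encoded in small path helpers so scripts do not hard-code output locations. This is a sketch under assumptions: the function names are hypothetical, and the session and slug strings are passed through unchanged because this document does not specify how they are derived.

```python
from datetime import date
from pathlib import Path

CANONICAL_ROOT = Path("/Users/rhartley/Claude")  # v1.2 canonical root from the tree above


def meeting_folder(meeting_date: date, session: str,
                   root: Path = CANONICAL_ROOT) -> Path:
    """Build knowledge/intel/meetings/{YYYY-MM-DD}-{session} under the root."""
    name = f"{meeting_date.isoformat()}-{session}"
    return root / "knowledge" / "intel" / "meetings" / name


def signal_path(slug: str, root: Path = CANONICAL_ROOT) -> Path:
    """Build knowledge/ip/frameworks/meeting-signals/{slug}.json under the root."""
    return root / "knowledge" / "ip" / "frameworks" / "meeting-signals" / f"{slug}.json"
```

Keeping the root as a parameter makes it straightforward to point the same helpers at a scratch directory during the dual-run period.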

End-to-End Request Path (Signal Extraction)

Meeting Recording (.m4a/.mp3/.mp4)
  → ASR (mlx-whisper large-v3-turbo or Parakeet)
  → Transcript .txt → /Claude/knowledge/intel/meetings/{session}/
  → Signal extraction (Ollama llama3.3:70b or Claude API)
  → Quality check (20 ontology domains present, empty field flags)
  → Signal JSON → /Claude/knowledge/ip/frameworks/meeting-signals/{slug}.json
  → Audit log → Postgres (ai-factory-postgres) + JSONL fallback
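
The quality-check step in the path above could look roughly like the following. The signal JSON schema is not given in this document, so the sketch assumes a flat object keyed by ontology domain name. The function name, the result keys, and the choice to fail only on missing domains (empty fields are flagged but do not fail the check) are all assumptions.

```python
def quality_check(signal: dict, expected_domains: list[str]) -> dict:
    """Report missing domains and empty fields for one extracted signal.

    Assumes `signal` maps domain names to extracted values (hypothetical schema).
    """
    missing = [d for d in expected_domains if d not in signal]
    # A field counts as empty if it is None, an empty string, list, or dict.
    empty = [d for d in expected_domains
             if d in signal and signal[d] in (None, "", [], {})]
    return {
        "missing_domains": missing,
        "empty_fields": empty,
        "passed": not missing,
    }
```

The list of expected domains would come from the signal ontology file rather than being hard-coded.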

Operating Model

Session-Start Checklist (every mining.mind session)

- Check recording folder (/Users/rhartley/Human X/) for new .m4a/.mp3/.mp4 files
- Check last extraction date in output/v3-signals/; flag to CDO if gap > 2 weeks
- Verify Ollama is running: ollama list
- Confirm Mac Studio Tailscale reachability (SEV2 if unreachable)
- Extract any pending recordings before moving to other work
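
The first two checklist items can be scripted. The sketch below is a minimal illustration: the extension set and the two-week gap come from the checklist, while the function names and the way the last extraction date is supplied are assumptions.

```python
from datetime import date, timedelta
from pathlib import Path

RECORDING_EXTENSIONS = {".m4a", ".mp3", ".mp4"}  # from the checklist above


def find_recordings(folder: Path) -> list[Path]:
    """List recording files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in RECORDING_EXTENSIONS
    )


def extraction_gap_exceeded(last_extraction: date, today: date,
                            max_gap: timedelta = timedelta(weeks=2)) -> bool:
    """Return True when the time since the last extraction exceeds max_gap."""
    return today - last_extraction > max_gap
```

For example, `find_recordings(Path("/Users/rhartley/Human X"))` would return the candidate files, and a `True` result from `extraction_gap_exceeded` corresponds to the "flag to CDO" condition.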

Weekly Maintenance

Frequency	Action
Every session	Scan recording folder, extract if new files
Weekly	Run health check on all services; confirm Grafana dashboard current
Monthly	Synthetic generation batch (20-50 meetings); RAG rebuild after ≥10 new meetings
Quarterly	QLoRA fine-tune (when corpus ≥100 meetings, CDO approval required)

Maturity Roadmap (current state: Phase 2-3)

Phase	Description	Status
1	Stable workstation baseline — Ollama, basic service orchestration	Complete
2	Structured workflow layer — version-controlled scripts, extraction pipeline	Active
3	Persistent retrieval layer — ChromaDB active, Qdrant needs rebuild	Partial
4	Advanced simulation — synthetic engine + QLoRA fine-tuning	In progress
5	Production-like local platform — monitoring dashboards, CI/CD for workflows	Planned

Risk Areas and Controls

Risk	Mitigation
Mac Studio unreachable via Tailscale	SEV2 protocol; travel laptop as fallback for mlx-whisper
Old-path migration breaks signal-extraction service	Coordination note filed to Jordan — code.platform reviews before touching plist
Whisper hallucination on noisy recordings	Quality gate needed — detect repetitive segments pre-extraction
RAG index stale	`ollama pull nomic-embed-text` → Qdrant rebuild before Operation 4
LLMOps Postgres connection lost during migration	Preserve connection string; don't touch Grafana dashboard definitions
Prompt drift in extraction scripts	Scripts version-controlled at `/Claude/scripts/agents/mining-mind/`

LLMOps Audit Requirements

Every model invocation in mining.mind pipelines must log to:

- Primary: ai-factory-postgres (port 5432, Tailscale), Schema v1.1
- Fallback: /Claude/operations/logs/model-invocations/model-invocations.jsonl

Schema reference: /Claude/knowledge/best-practices/model-invocation-audit-schema-v1.0.md
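
A primary-plus-fallback logger could be structured as in the sketch below. It assumes "fallback" means the JSONL file is written only when the primary write fails; if the JSONL file is meant as an always-on backup, the file write would move outside the `except` branch. The `write_primary` callable stands in for the actual Postgres insert, which is not shown in this document, and `log_invocation` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Callable


def log_invocation(record: dict,
                   write_primary: Callable[[dict], None],
                   fallback_path: Path) -> str:
    """Write an audit record to the primary sink, else append it to JSONL.

    Returns "primary" or "fallback" to indicate where the record landed.
    """
    try:
        write_primary(record)  # e.g. an INSERT into ai-factory-postgres (not shown)
        return "primary"
    except Exception:
        # Any primary failure routes the record to the flat-file fallback.
        fallback_path.parent.mkdir(parents=True, exist_ok=True)
        with fallback_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return "fallback"
```

Returning the sink name lets a caller count fallback writes, which is one way to notice a lost Postgres connection early.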

CDO receives weekly model-call digest from Grafana per F9 ops plan requirements.

Updated from original Mac AI Lab Architecture v1.0 (March 2026). Model fleet, ports, paths, and operating model reflect current state as of 2026-04-21. Source HTML: /Users/rhartley/ecomonetize/Project.Manager/Code.Assistant/ops-docs/mac-ai-lab/Mac-AI-Lab-Architecture.html