// SYSTEM_ARCHITECTURE v24 · 2026-07-02 · OLLAMA RETIRES 4 LANES JUL 15: devstral-2 → kimi-k2.7-code, qwen3-coder 480B/next → mistral-large-3 (pass-2) / Groq (fast), deepseek-v3.2 dropped · Sonnet 5 lead/executor, Opus planner/final-gate, Fable 5 break-glass · 6 frontier models tie → free lanes default · MemSearch 535 files · 116 active skills

Greg's AI Workflow

Model stack · Skills · Subagents · Build pipelines · Safety rails

INPUT

User Request

Opus 4.8

PLANNER

Reasoning, strategy, architecture, plans

nativeFable 5 break-glass

Sonnet 5

LEAD AGENT

Primary executor · Reviews all outputs · Integrates · Deploys

native delegates via MCP

RESEARCH / REASONING

Gemini 3.5 Flash

Research, blog posts, FAQs, comparisons, case studies

research compare find_best

Gemini 3.1 Pro METERED · JESSE $50

Greg's metered MCP Pro stays off (daily billing creep) , but Jesse's pay-per-token key ($50 prepaid) now unlocks Gemini 3.1 Pro via gemini-jesse.sh --pro for the high-stakes 3-family review panel (~$0.07/review, distinct arch) plus Imagen + Veo. Free research/write still go to Gemini Flash.

Groq Llama 4 Scout PREVIEW FREE

Llama 4 MoE, fast multimodal, Groq marks this Preview, "may be discontinued at short notice." Use opportunistically.

fast_llama4

GPT OSS 120B via Groq FREE

OpenAI OSS, the only production-tier cross-architecture reviewer on Groq. Use as second pass in the verification chain.

fast_gpt_oss

Llama 3.3 70B via Groq FREE

Workhorse fast chat, speed-tier fallback when Ollama Pro is down. 1K req/min, 500K req/day.

fast_code

CODE / BACKEND

DeepSeek V4 Pro 1.6T / 49B active FRONTIER · DEFAULT

Default first-pass code reviewer. 1M context, 3 thinking modes (off / low / medium / high). Streaming via ollama-mcp 0b88571, CF 524 fix 2026-05-12. Pin think=off for review, bench 2026-06-17 showed heavy <thinking> output bloat.

ollama_deepseek_pro

GPT 5.5 Pro / 5.4 Pro / Codex Spark (-pro tier) DISABLED

Off , $30/$180 tier, drains $18 in ~6 calls. Cheatsheet DO-NOT-CALL: opencode_gpt55_pro, opencode_gpt54_pro, opencode_gpt53_codex_spark, opencode_kimi_thinking. Now sanctioned (metered, capped): gpt-5.3-codex + gpt-5.5, see paid-lever node →

DeepSeek V4 Flash 158B OLLAMA PRO

Deep reasoning, thinking, agentic tasks, comparable to GPT-5

ollama_deepseek

Devstral 2 123B RETIRES JUL 15

SWE-Bench 72.2%, Mistral's best coder, multi-file editing. Ollama retires devstral-2:123b Jul 15 2026. The ollama_devstral tool repoints to kimi-k2.7-code (surviving SWE specialist), not Ollama's generic mistral-large-3 rec.

ollama_devstral → kimi-k2.7-code

CodeWhale deepseek-tui 0.8.66 INSTALLED · NOT WIRED

Rust coding-agent harness (deepseek-tui, runs deepseek-v4-pro). Installed Jun 9, never wired into routing, no invocations, not in the delegation cheatsheet or any skill/hook. Evaluated lane, not an active one. codewhale exec

codewhale (CLI, not MCP)

GLM 5.2 OLLAMA PRO

Agentic coding, sustains over 100s of rounds, SWE-Bench Pro 58.4%. Verbose, leaks the biggest <thinking> of the stack (bench 2026-06-17); strip it in loops.

ollama_glm

MiniMax M3 OLLAMA PRO

Frontier agentic coding + tool use, MSA arch (non-DeepSeek/Qwen, true cross-arch second opinion). 1M ctx (512K floor) & vision, the only Ollama-Pro route that takes image input or >10k-char prompts. Scoped to vision + long-doc lane, not a coding default.

ollama_minimax ollama_minimax_vision

OpenCode paid levers (~$18, sk-proj) METERED · CAPPED

OpenAI-family, manual-trigger only, free lanes stay default. gpt-5.1-codex-mini = cheapest verify (default). gpt-5.3-codex ($1.75/$14, frontier code) = HIGH-STAKES review when free gpt-oss-120b isn't enough (~$0.04/review, ~430 in $18). gpt-5.5 ($5/$30) = FRONTEND-DESIGN creative escalation only (~$0.26/component, ~70 in $18, cap it). Free fallback: Codex CLI gpt-5.4-mini. qwen3.6-plus (Alibaba, opencode_qwen36) = newer agentic/repo coding + reasoning, ~$0.02/call, 3rd distinct family (non-OpenAI/non-DeepSeek) for cross-arch panels, NOT a default (free Ollama Qwen3-coder 480B covers most), text-only no vision. (added 2026-06-27)
OpenCode placement (settled 2026-07-09): OpenCode is an agent harness, NOT a completion tier (~80% catalog overlaps free Ollama). Exactly 2 slots: (1) qwen3.6 cross-arch verify; (2) a non-Claude model driving an agentic repo session. Completions→Ollama, autonomous repo→Claude Code, grok headless→xAI direct API. opencode run is TUI-hostile headless.
Grok-4.5 (xAI, key loaded 2026-07-09): scripted/headless → xAI DIRECT API api.x.ai/v1/responses (clean text, OpenAI-compat), NOT OpenCode. Reasoning model → cost is reasoning-token-driven ~$0.05 to 0.30/coding call on ANY route (15 to 100 calls/$5), not "cheap." OpenCode grok = agentic-repo driver ONLY. 07-09 4-model bench (Opus/Sonnet/DeepSeek-Pro/Grok) all tied 52/52 + 30/30. Frontier models don't separate on single-fn tasks.

gpt-5.1-codex-mini gpt-5.3-codex gpt-5.5 qwen3.6-plus grok-4.5 (xAI direct)

DeepSeek V4 Pro (direct API) MOTHBALLED 2026-05-30

Off. Ollama Pro already includes DeepSeek V4 Pro 1.6T. The $10 direct-API verifier was redundant (same DeepSeek family, no cross-arch value). Verification moves to GPT 5.1 Codex Mini above. DO-NOT-CALL: deepseek_pro, deepseek_code_review.

CONTENT / UTILITY

Kimi K2.7 OLLAMA PRO

Bulk HTML, multimodal, vision, agent swarm, moved from free to Pro

ollama_kimi

Nemotron 3 Super NVIDIA 120B OLLAMA PRO

Agentic reasoning, content, mid lane (fast)

ollama_nemotron

Nemotron 3 Ultra NVIDIA OLLAMA PRO NEW · 2026-06-07

High-throughput reasoning + long-running agents. NVIDIA-family cross-arch verifier for RULE #0.6 (non-DeepSeek/Qwen second opinion).

ollama_nemotron_ultra

Nemotron 3 Nano 30B NVIDIA OLLAMA PRO NEW · 2026-06-07

Fast/cheap agentic lane, quick tasks, classification, routing, lightweight reasoning.

ollama_nemotron_nano

Qwen3-Coder 480B RETIRES JUL 15

Code tasks, second opinions, large code reviews. Ollama retires qwen3-coder:480b Jul 15 2026. Pass-2 code review moves to mistral-large-3:675b (Mistral family, stays cross-arch vs DeepSeek pass-1); ollama_chat keeps qwen3.5:397b.

ollama_code → mistral-large-3

Qwen3.5 397B OLLAMA PRO

Reasoning, writing, analysis, vision, thinking, zero cost

ollama_chat

Mistral Large 3 675B OLLAMA PRO

Complex reasoning, large documents, vision, biggest model

ollama_mistral

Veo 3.1 · Gemini Jesse $50 · gemini-veo.sh VIDEO

Text-to-video & image-to-video. Veo 3.1 Lite $0.40/8s = default (~125 clips/$50), Fast $1.20, Standard $3.20 , cheapest video lane, beats kie/fal. Seedance (fal) = photos-only fallback.

veo-3.1-lite-generate-preview gemini-veo.sh fal-ai/seedance (fallback)

Results flow back to Opus 4.8 / Sonnet 5 for review, integration & deployment

// MCP_SERVERS , 16 ACTIVE (ubersuggest + 21st-magic added 2026-06-17)

chrome-devtools RULE

Render before judging. DOM + accessibility tree + screenshot, not curl. Default for any UX, design audit, redesign, or rendering task per CLAUDE.md.

new_page · take_snapshot · take_screenshot

dataforseo SEO

Live SERP, keyword volume, backlinks, on-page Lighthouse, business listings, AI visibility (LLM mentions). 80+ tools , the canonical SEO data source.

serp · keywords · backlinks · ai_opt

mirage NEW · 2026-05-07

Multi-server file ops. SSH + RAM mounts under one virtual filesystem, cross-server diff, read, exec without juggling tabs. SSH writes default-deny.

mounts · ls · read · diff · exec

stitch-design UI

Google Stitch, UI generation from prompts. Design systems, screen variants, project scaffolding. Convert output to semantic CSS before shipping.

create_design_system · generate_screen · variants

fal-image

fal.ai image generation + edit endpoints, fallback when cc-nano-banana (Gemini) is unavailable or for specific fal-only models.

generate_image · edit_image

rybbit NEW · 2026-05-18

Privacy-friendly analytics MCP. Org-level Bearer key, reads sites, sessions, events, funnels, goals, outbound, live visitors. FPS live (siteId 10025). 17 tools.

overview · sessions · events · funnel_analyze · live

+ gemini-research (Flash only), groq-fast, ollama-pro, opencode (free models only), xiaomi-mimo, glm-free, deepseek, ubersuggest (keyword/rank data), 21st-magic (UI component gen), model-routing + data MCPs already covered above. Plus Anthropic-managed: Gmail, Google Calendar, Google Drive, HubSpot, Notion, Playwright (via plugins).

// BUILD_PIPELINES

ASTRO SSR FRONT-END BUILD

STEP 1

Design Input

Screenshot, brief, or reference file

DESIGN.md awesome-design-md

STEP 2

Skills Layer

Claude Code skills invoked by Opus

/stitch /astro-ssr /taste /design-system /code-rescue /rag-search /skill-auditor

STEP 3

Subagents

Specialized agents spawned per task

frontend-developer typescript-pro codex:codex-rescue

STEP 4

Models Execute

Fast code gen + design research

ollama_code_fast Gemini 3.5 Flash Llama 4 Scout

STEP 5

Review + Deploy

Code review then ship

/codex:review /codex:rescue rsync deploy

LIBRARIES: awesome-design-md awesome-agent-skills awesome-claude-code-subagents awesome-codex-subagents codex-plugin-cc

// QUALITY_GATES

PRE-DEPLOY GATES , MANDATORY BEFORE EVERY SHIP

ASTRO SSR

Frontend Gate

→ npm run build:css (if Tailwind classes changed)

→ Playwright desktop + mobile screenshot

→ seo-compliance-check.py on live URL

→ git commit before deploy

playwright seo-check

LARAVEL

Backend Gate

→ php artisan test (all suites pass)

→ phpstan analyse --level=5 (no new errors)

→ Check N+1 , eager load with ->with()

→ artisan config:clear + cache:clear

phpstan laravel-boost

ALL PROJECTS

Code Quality Gate

→ /codex:adversarial-review every branch

→ Fix all HIGH items before merge

→ /codex:rescue if stuck > 15 min

→ No hardcoded secrets, no SQL concat

/codex:adversarial /codex:rescue

SEO + DESIGN

Content Gate

→ DESIGN.md at project root before UI work

→ Title <60 chars, meta 130-160 chars

→ IndexNow ping after content deploys

→ /taste → /interface-design → /stitch

IndexNow /taste /stitch

// SAFETY_RAILS

ANTI-MISTAKE SYSTEM , RULES THAT PREVENT DATA LOSS

Every rule below maps to a real incident where production data was lost or a server was overwritten.

DEPLOY SAFETY

rsync Rules

✗ NEVER use rsync --delete, destroys server-only files

→ MD5 drift-check before every deploy, compare local vs server checksums first

→ All deploy scripts use ./deploy.sh with built-in drift protection

→ Server is source of truth, pull before push

deploy.sh drift-check md5sum compare

DEPLOYMENT

Deploy Targets

→ Deploy targets configured per-project in each repo's CLAUDE.md

→ Each site has a deploy.sh script with correct server + path

EDIT RULES

File Safety

✗ Never edit compiled HTML on the server, all sites are Astro SSR, edit .astro source

→ Check for astro.config.mjs FIRST before touching any .html file

→ Playwright screenshot required after every frontend edit

→ Multi-device sync: push to backup remote, pull on second machine

/astro-ssr skill playwright hook

SERVER STATE

Never reset unfamiliar state

✗ git reset --hard HEAD on shared-CMS server nuked 45 min of live Ryan edits (2026-05-19).

→ Audit dirty state + stashes before any destructive git op on a server with active users.

SSH HYGIENE

ControlMaster for rsync chains

→ Add ControlMaster auto + ControlPersist 10m to ~/.ssh/config for contabo.

→ Mac IP keeps tripping fail2ban during rapid rsync deploy chains, multiplexed sessions fix it.

BOT-WALL POLICY

Escalation ladder LOCKED

→ Patchright = top tier. CloakBrowser rejected.

✗ LinkedIn / Trustpilot / G2 / Google SERP scraping ABANDONED, use official APIs.

✗ Public marketing pages only. Never logged-in or personal data (CFAA / DMCA §1201 / EU sui generis floor).

SEO AUDIT WORKFLOW , DO NOT SKIP

STEP 1 , FIRST PASS (FREE)

→ Technical audit → seomator audit <url> (251 rules) or gemini-research

→ AI citability → geo-optimizer (47 criteria) , llms.txt, AI bot rules, passage optimization

→ E-E-A-T + citations → seo-geo-claude-skills (CORE-EEAT 80-item + CITE 40-item)

✗ Never spawn general-purpose agents for SEO research , they default to Sonnet and burn tokens

STEP 2 , SERP LANDSCAPE ONLY

→ seo-specialist agent = SERP landscaping ONLY , who's ranking, competitors, review counts, directory gaps

✗ seo-specialist runs on Haiku and hallucinates technical facts , never trust its page counts, schema claims, or indexed URL lists

STEP 3 , ACT

→ Cross-reference hub scores + Gemini findings + SERP data before touching any file

→ seo-* skill suite for implementation (22 skills, schema, sitemap, technical, content, geo, local, maps, cluster, drift, programmatic, competitor-pages …)

→ Playwright screenshot + deploy + IndexNow ping after every SEO change

⚠ AUDIT GAPS, BACKLOG

Fort Lauderdale POC, SEO audit never ran. Schedule seo-audit + seo-geo + seo-local once content is final. (Flagged 2026-05-12.)

NEW, LATE MAY 2026

Ship Workflow (mandatory)

Every deployed page runs: taste → humanizer → frontend-design → impeccable → SEO suite. SEO skills are not optional, not batch-only, per-page on every ship.

Factcheck Gate (step 2.5)

blog-factcheck verifies every claim against cited sources via WebFetch. BLOCKS ship on NOT-FOUND, unverified entity/date/quote, rubric < 90, or P0 fabricated stat. Pair with blog-factcheck-fix, Ollama Pro + subagents rewrite, Claude orchestrates (10-25× cheaper than Claude rewriting).

Browser-verify Alpine (hard rule)

For Alpine-heavy templates (phone.html, contacts/detail.html), open as affected user with DevTools open BEFORE commit. Server-side smoke tests miss silent x-text binding failures. Locked 2026-05-25 after phone-page incident.

NEW, MAY 2026

ollama_deepseek_pro default reviewer

DeepSeek V4 Pro 1.6T is now first-pass code review. CF 524 fix shipped 2026-05-12 (ollama-mcp 0b88571): streaming + AbortSignal.any + 270s wallclock budget + jittered overload retry + think:false default. Parity rule (bench 2026-06-17): Opus / Sonnet / DeepSeek / GLM / Gemini-Pro tie on correctness, spend paid verify only for cross-arch echo-breaking + code craft, never correctness.

Mirage MCP

Multi-server file ops, SSH + RAM mounts under one virtual filesystem for cross-server diff/read/exec. SSH writes default-deny.

chrome-devtools rule

Render before judging. DOM + a11y + screenshot, not curl. Default for any UX, render, audit, or redesign task.

DAILY-USE SKILLS, HIGHEST LEVERAGE

humanizer

Remove AI-writing tells. Detects/fixes inflated symbolism, em-dash overuse, rule-of-three, AI vocab. Run on every Gemini/GPT-drafted copy before publish.

frontend-design + impeccable

Distinct production-grade UI. impeccable flags AI-slop tells (side-stripe borders, gradient text, glassmorphism, hero-metric clichés). Pair for design audit + polish.

taste-skill + gpt-tasteskill

Senior UI/UX engineer. Editorial typography, gapless bento grids, strict GSAP scroll triggers, massive section spacing. Default for any web dev (per CLAUDE.md taste-skill rule).

SEO suite, 22 skills

seo-audit · seo-technical · seo-content · seo-geo · seo-google · seo-local · seo-maps · seo-schema · seo-sitemap · seo-cluster · seo-drift · seo-firecrawl · seo-rotation · seo-sxo · seo-image-gen · seo-programmatic · seo-competitor-pages · seo-ecommerce · seo-plan · seo-dataforseo · seo-page · seo-backlinks

cc-nano-banana

Required for ALL image generation. Nano Banana (Gemini CLI) for blog images, thumbnails, icons, diagrams, illustrations, photos. Free local fallback (2026-06-07): sd.cpp (Metal + SDXL, local-image.sh) for bulk/uncensored stills, keeps fal=video, recraft=text-in-image.

notebooklm

Query + manage Google NotebookLM via CLI. Master notebook has 38 sources. Vault-to-Master sync after every session.

visual-review + playwright

Mandatory after any frontend edit, screenshot via Playwright, Read the PNG before calling work done. Per CLAUDE.md rule #4.

verify-my-work + verification-before-completion

Run after ANY code edit, deployment, or fix. Requires running verification commands and confirming output before claiming completion.

handoff

Write handoff doc so fresh agent continues in clean context window. Use for scope-creep split-outs, ~120k dumb-zone compression, planner→prototype→planner round-trip, or cross-CLI pass (Codex, Copilot, OpenCode). Triggers: "handoff", "spawn agent for", "fresh session".

ponytail + caveman

ponytail least-code ladder (YAGNI → stdlib → native → one line) cuts the code; caveman cuts the prose. Run both. 10×Opus 4.8 bench: ponytail −28% LOC on realistic builds, same correctness, faster. Levels: light/full/ultra. Installed 2026-06-19.

NEW TOOLS , APRIL 2026

/design-system

66 DESIGN.md files from Stripe, Figma, Apple, PlayStation, WIRED , VoltAgent/awesome-design-md

/code-rescue

Codex CLI second opinion , gpt-4.1-mini, invoke when stuck >2 attempts or need architecture review

/rag-search

Semantic search across all 40+ projects via RAG-Anything + OpenAI embeddings

/skill-auditor

Weekly audit: broken symlinks, GitHub freshness, usage stats, duplicate detection

15 Stack Agents

fastapi · python · laravel · php · sql · deployment · devops · security · code-reviewer + more , VoltAgent

/start + /wrap

Session lifecycle , /start reads Memory+Obsidian+NotebookLM, /wrap commits+pushes+updates Obsidian

116 Active Skills

116 active skills (~/.claude/skills/), 151 stashed to skills-disabled/ to cut session baseline (curated from 305 on 2026-05-31, re-expanded since). Active set: humanizer, frontend-design, impeccable, taste-skill, full 22-skill SEO suite, Astro suite, cc-nano-banana, notebooklm, visual-review, verify-my-work. Plus plugin marketplaces (caveman, ponytail, obra-superpowers, interface-design, codex, memsearch).

Seedance 1.0

ByteDance text-to-video via fal.ai , Pro (1080p) + Lite (720p). Live now.

SEO INTELLIGENCE TOOLS , INSTALLED GLOBALLY

seo-audit-skill CLI

251 audit rules across 20 categories , @seomator engine. CLI: seomator audit <url>

technical content schema

geo-optimizer PYTHON

47 AI citability criteria , llms.txt, AI bot rules, passage-level optimization for ChatGPT/Perplexity/Gemini

GEO llms.txt AI search

seo-geo-claude-skills 20 SKILLS

CORE-EEAT (80-item) + CITE (40-item) frameworks. 12 commands: audit, optimize, schema, keywords, alerts

E-E-A-T citations monitor

NEW , APRIL 28, 2026

Xiaomi MiMo-V2.5-Pro SKIPPED 2026-05-05

Evaluated for purchase ($162/yr). 1T params (42B active MoE), 1M ctx, 1000+ tool call coherence. Declined, ollama_deepseek_pro on Ollama Pro covers the frontier slot (1.6T/49B active, 1M ctx, 3 thinking modes) at zero marginal cost. Reassess after 30-day delta check.

declined mimo-mcp (TTS still free)

MiMo-V2.5-TTS

Text-to-speech with emotion control, voice cloning from 30s audio sample, and voice design from text description. 24kHz, multilingual. Free during open beta.

FREE BETA 3 TTS tools

MiMo-V2.5 Base

Full-modal base model: native text, image, audio, video understanding. 1M context, 131K max output. Pro-level agentic at half the cost ($0.40/M input).

omnimodal 1M context

APRIL 23, 2026

NVIDIA NIM

~80 free AI models via OpenAI-compatible API. MiniMax M3, DeepSeek V4, GLM 5.2, Kimi K2.7, Sarvam-M, GPT-OSS 120B + nvidia_ask (any model)

FREE 7 tools nvidia-mcp · archived

MemSearch

Semantic vector search across 535 curated memory files (was 135). ONNX bge-m3 embeddings (local, free). Auto-captures sessions via hooks. Backed by MEMORY.md multi-index (root + 5 sub-indexes: active / infra / apis-skills / feedback / hot).

/memory-recall 4 hooks

Memory Scripts

memory-audit.py (orphans / empties / oversized) + reorganize-memory.sh (split >50-line files, fix index) + sync-memsearch-ifchanged.sh (AUTO re-index , do NOT manual sync-memsearch.sh) + weekly agent-os hygiene (Layer 5, report-only, Sat)

Level 2 audit

Karpathy Guidelines

4 behavioral principles from Andrej Karpathy: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution. + RULE #0.7 (Boris Cherny, 2026-06-07): Loops not Prompts · Pre-compute > Inference · Tokens over Headcount.

/karpathy-guidelines skill

NotebookLM Tier 3

Semantic across vault + master notebook. 38-source master. CLI authenticated (Mac + Fedora). Vault → project notebook → Master after every session.

notebooklm skill gbentz2@gmail

Obsidian Vault Tier 2

~/Documents/greg-claude/, Projects, Servers, Sessions, NotebookLM. Per-project file pattern. Check FIRST before asking.

RULE #0

PLANNED , IN PROGRESS

ECharts Dashboard

Interactive reporting dashboard for CrawlHound , grade history, scan trends, site-wide metrics visualization

PDF Reports (ReportLab)

Downloadable PDF audit reports for CrawlHound, MrBotsworth, and gjapp , branded, shareable

CI/CD , Gitea Actions

Automated build + test + deploy pipeline on git.myseodesk.com , lint, SEO checks, rsync deploy on push

// HOOK_LAYER_ENFORCEMENT , v1 (2026-05-05)

DRIFT SIGNALS FROM CLAUDE USAGE REPORT, WHAT THE HOOKS FIX

72% DRIFT

Subagent-heavy sessions. Top spawns: backend-developer, frontend-developer, code-reviewer, all have free MCP twins.

target after hooks: < 30%

76% DRIFT

Usage at >150k context. Long mixed-topic sessions, no /clear or /compact between unrelated tasks.

target after hooks: < 30%

4% DRIFT

Skill invocation share. The 100+ installed skills (visual-review, redesign, audits) barely fire, same work runs through paid subagents instead.

target after hooks: > 15%

FIVE HOOKS, INSTALLED IN ~/.claude/hooks/

agent-spawn-gate.py PreToolUse

Fires on every Agent tool call. Drift list (6 named agents): warns ≤65% context used, blocks >65%. code-reviewer + general-purpose in ALWAYS_BLOCK. Whitelist (Explore/Plan/gsd/audit/seo/feature-dev) silent. Override: [paid-subagent-OK] token logs + allows.

delegation-table-injector.py UserPromptSubmit

Three triggers: session start, topic-switch keywords, post-/compact detection (transcript msg-count drop >50%). Injects ~250 tokens of routing rules from ~/.claude/rules/delegation-cheatsheet.md.

topic-switch-nudge.py UserPromptSubmit

Detects topic-switch keywords (“okay new task”, “switching to”, “moving on”) and appends a /clear reminder. 10-message debounce so it doesn’t fire on every consecutive switch.

skill-trigger-injector.py UserPromptSubmit

Highest-leverage hook. 26 keyword triggers in ~/.claude/rules/skill-triggers.json (regex, priority-ranked). On match, injects an EXTREMELY_IMPORTANT system-reminder naming the skill, same mechanism as using-superpowers.

chrome-devtools-reminder.py UserPromptSubmit NEW

Enforces the “render before judging” rule. When a prompt looks like UX / design / audit / redesign work, injects a reminder to use mcp__chrome-devtools__new_page + take_snapshot + take_screenshot from the main session, not curl, not WebFetch, not subagent.

SIX DRIFT SUBAGENTS, PAID CLAUDE CLONES WITH FREE MCP TWINS

drift agent	free mcp route	when in-agent is justified
backend-developer	ollama_devstral, ollama_deepseek_pro	5+ files coordinated, or hard arch
frontend-developer	ollama_kimi (HTML), ollama_code (review)	multi-framework full-stack work
fastapi-developer	ollama_deepseek_pro, ollama_devstral	complex async patterns + tests
code-reviewer	HARD-BLOCKED. Chain: ollama_deepseek_pro (DEFAULT) → ollama_code → fast_gpt_oss	`[paid-subagent-OK]` override only
security-auditor	/security-auditor skill + /api-keys skill	whole-system audit (5+ services)
database-administrator	/sql-pro skill + mysql/postgres MCP	HA/replication infra, not just queries

VERIFICATION CHAIN, MULTI-EYE REVIEW PATTERN

Apply when work is non-trivial: >50 LOC change, security-sensitive, or first-time deploy. Review prompts are short, so paid free-tier rate-limits go further than on bulk gen.

1. DRAFT

ollama_kimi / ollama_devstral / ollama_deepseek_pro

2. FIRST REVIEW, DEFAULT

ollama_deepseek_pro (DeepSeek V4 Pro 1.6T)

3. CODE-SPECIALIZED

ollama_code (Mistral Large 3 675B, cross-arch)

4. CROSS-ARCH

fast_gpt_oss (GPT OSS 120B)

5. PAID VERIFIER (~$18 cap)

gpt-5.1-codex-mini · gpt-5.3-codex (high-stakes)

6. INTEGRATE

Sonnet (here)

// SUBAGENTS , 50 ACTIVE

SPECIALIZED SUBAGENTS , SPAWNED BY OPUS VIA AGENT TOOL

STACK (15)

fastapi-developer python-pro laravel-specialist php-pro typescript-pro sql-pro backend-developer frontend-developer deployment-engineer devops-engineer security-engineer security-auditor database-administrator code-reviewer seo-specialist

Source: awesome-claude-code-subagents

SEO (13)

seo-technical seo-content seo-geo seo-google seo-schema seo-sitemap seo-local seo-maps seo-performance seo-visual seo-backlinks seo-dataforseo seo-image-gen

Technical truth → myseodesk-seo-hub

ADS AUDIT (6)

audit-google audit-meta audit-budget audit-compliance audit-creative audit-tracking

ADS CREATIVE (4)

creative-strategist copy-writer visual-designer format-adapter

Full campaign pipeline

GSD SUITE (12)

gsd-planner gsd-executor gsd-debugger gsd-verifier gsd-plan-checker gsd-phase-researcher gsd-project-researcher gsd-roadmapper gsd-codebase-mapper gsd-integration-checker gsd-nyquist-auditor gsd-research-synthesizer

Structured milestone planning

BUILT-IN (5)

general-purpose Explore Plan claude-code-guide statusline-setup

PLUGIN (1)

codex:codex-rescue

Use: stuck >2 attempts or arch review

MCP CONFIG FIX (Apr 2026)

MCPs now in ~/.claude.json (user scope) , loads in ALL CLI + VSCode sessions automatically

// TOKEN_ESTIMATION

ROUGH COST PER PLAN TYPE , OPUS 4.8 ($5/M in · $25/M out) · 3× CHEAPER THAN 4.7

Estimates assume Opus as planner + Sonnet subagents. Actual cost varies by context size and iterations.

Plan Type	Lead Model	~Tokens	~Cost	Notes
Quick Ask	Ollama Pro (Nemotron/Chat)	1-3K	$0 (Pro)	ollama_nemotron / ollama_chat , Tier 1
Bug Fix	Sonnet 5	5-15K	~$0.05-0.15	Read file → diagnose → patch → verify
Single Feature	Opus 4.8 + Sonnet	20-60K	~$0.17-0.70	Plan → subagent execute → codex:review
Static Page Build	Opus + Kimi K2.7	30-80K	~$0.17-0.50	DESIGN.md → /stitch → Playwright check → deploy
Astro SSR Feature	Opus + ollama_devstral / ollama_code	50-120K	~$0.35-1	/astro-ssr skill + full pre-deploy gate
Laravel API	Opus + ollama_devstral, verify GPT 5.1 Codex Mini	60-150K	~$0.50-1.30	Models + migrations + tests + PHPStan + review
Full Site (5-10 pages)	Opus + mix	150-400K	~$1.30-4	Full 6-stage pipeline: PLAN→DESIGN→BUILD→SEO→CRO→QA
SEO Audit	Gemini 3.5 Flash	10-30K	~$0.10-0.30	Schema + compliance + content + schema , mostly Gemini

MINIMIZE COST

→ Use Ollama Pro (Tier 1) for drafts, analysis, code review
→ Use Kimi K2.7 (Pro) for bulk HTML generation
→ Use Gemini 3.5 Flash for research/content
→ Groq/NVIDIA = fallback only (Tier 3)
→ /compact regularly in long sessions

CONTEXT WATCH

→ Each file read adds ~1-5K tokens
→ Large HTML files: 10-30K tokens each
→ Long conversations drift , use /clear or new session
→ Subagents get fresh context , use them for big files
→ CLAUDE.md loaded every session (~5K tokens)

WHEN TO UPGRADE

→ Stuck after 2 attempts → /codex:rescue
→ Complex architecture → Opus (not Sonnet)
→ Critical prod deploy → gpt-5.1-codex-mini; high-stakes → gpt-5.3-codex (~$18 cap)
→ Fresh frontend design take → gpt-5.5 (metered, capped)
→ Free cross-arch verify → fast_gpt_oss or Codex CLI gpt-5.4-mini
→ Free model fails tool calls → switch to Sonnet

// REFERENCE_TABLE

Role	Model	MCP Server	Use Case	Cost
Planner	Opus 4.8	native	Reasoning, architecture, strategy	paid
Lead	Sonnet 5	native	Executes tasks, reviews, deploys	paid
Frontier reviewer	DeepSeek V4 Pro 1.6T	ollama-pro	Default first-pass code review (post-streaming-fix 2026-05-12). 1M ctx, off/low/med/high thinking modes.	PRO
Research	Gemini 3.5 Flash	gemini-research	Research, comparisons, blog posts, FAQs	paid
Reasoning	Gemini 3.1 Pro	gemini-research	Disabled 2026-05-18, use ollama_deepseek_pro (free)	DISABLED
Frontier	Gemini 3.1 Pro	gemini-research	Disabled 2026-05-18, use ollama_deepseek_pro (free, 1.6T) or opencode gpt-5.1-codex-mini / gpt-5.3-codex (~$18 cap)	DISABLED
Fast 8B	Llama 3.1 8B Instant	groq-fast	Sub-second 8B, gates, autocomplete, micro-tasks	FREE
Fast 70B	Llama 3.3 70B Versatile	fast_code	Workhorse fast chat, speed-tier fallback	FREE
Preview	Llama 4 Scout 17B/16E	fast_llama4	Llama 4 MoE, preview tier per Groq, opportunistic only	FREE
Cross-arch	GPT OSS 120B	fast_gpt_oss	Verification chain second pass, only production-tier cross-arch on Groq	FREE
STT	Whisper Large v3 Turbo	groq-fast	Speech-to-text, 400 req/min, 4M sec/day	FREE
Verify	GPT 5.1 Codex Mini	opencode	Default paid cross-arch verify, manual pre-deploy only (~$18 cap). GPT 5.5 Pro DISABLED.	paid
Verify+	GPT 5.3 Codex	opencode	$1.75/$14 frontier code, HIGH-STAKES review only (~$0.04/review). (2026-06-17)	paid
Frontend	GPT 5.5	opencode	$5/$30 flagship, fresh design-creativity escalation only, capped (~70 gens/$18). (2026-06-17)	paid
Reason	Kimi K2 Thinking	opencode	Complex logic, multi-step planning	paid
Code	Trinity Large Preview	opencode	General coding and reasoning	paid
HTML	Kimi K2.7	ollama_kimi	Bulk HTML, multimodal, agent swarm	PRO
Content	Nemotron 3 Super 120B	ollama_nemotron	Agentic reasoning, content	PRO
Ollama Pro	Mistral Large 3 675B was qwen3-coder 480B, retires Jul 15	ollama_code	Pass-2 code review, cross-arch vs DeepSeek	PRO
Ollama Pro	Qwen3.5 397B	ollama_chat	Reasoning, writing, analysis, vision, thinking	PRO
Ollama Pro	Mistral Large 3 675B	ollama_mistral	Complex reasoning, large documents, vision	PRO
Ollama Pro	DeepSeek V4 Flash 158B	ollama_deepseek	Deep reasoning, thinking, agentic tasks	PRO
Ollama Pro	Kimi K2.7 Code was devstral-2 123B, retires Jul 15	ollama_devstral	SWE coding, multi-file editing (surviving specialist)	PRO
Ollama Pro	GLM 5.2	ollama_glm	Agentic coding, sustained over 100s of rounds	PRO
MiMo	MiMo-V2.5-Pro	mimo	Frontier reasoning, 1T/42B MoE, 1M ctx, 1000+ tool calls	$1/$3M
MiMo	MiMo-V2.5	mimo	Full-modal (text+image+audio+video), 1M ctx	$0.4/$2M
MiMo	MiMo-V2.5-TTS	mimo	Text-to-speech, voice clone, voice design	FREE
NVIDIA · archived	MiniMax M3	nvidia-nim	Large MoE reasoning + code (exclusive to NVIDIA)	FREE
NVIDIA · archived	DeepSeek V4 Flash	nvidia-nim	158B MoE reasoning (Tier 3 fallback)	FREE
NVIDIA · archived	GLM 5.2	nvidia-nim	Agentic coding (Tier 3 fallback)	FREE
NVIDIA · archived	Kimi K2.7	nvidia-nim	Moonshot reasoning (Tier 3 fallback)	FREE
NVIDIA · archived	Sarvam-M	nvidia-nim	Indian multilingual (exclusive to NVIDIA)	FREE
NVIDIA · archived	GPT-OSS 120B	nvidia-nim	OpenAI OSS (Tier 3 fallback)	FREE

Models

Ollama Pro

Free Models

MCP Servers

Active Skills

Active Agents

+ 1 active hook (agent-spawn-gate; 4 UserPromptSubmit injectors , delegation-table, topic-switch, skill-trigger, chrome-devtools-reminder , disabled 2026-05-31 to stop autocompact thrash) · 535 memory files in MemSearch · 38-source NotebookLM master · Obsidian vault per-project files