// SYSTEM_ARCHITECTURE v14 · streaming-fix + DeepSeek default + chrome-devtools rule

Greg's AI Workflow

Model stack · Skills · Subagents · Build pipelines · Safety rails

INPUT
User Request
Opus 4.7
PLANNER

Reasoning, strategy, architecture, plans

native
Sonnet 4.6
LEAD AGENT

Primary executor · Reviews all outputs · Integrates · Deploys

native delegates via MCP
RESEARCH / REASONING
Gemini 3.1 Flash

Research, blog posts, FAQs, comparisons, case studies

research compare find_best
Gemini 3.1 Pro PREMIUM

Deep reasoning, complex analysis, architecture decisions

pro_reason deep_research
Gemini 3.1 Pro FRONTIER

Most capable model — hardest problems, deepest analysis

frontier
Groq Llama 4 Scout PREVIEW FREE

Llama 4 MoE, fast multimodal — Groq marks this Preview, "may be discontinued at short notice." Use opportunistically.

fast_llama4
GPT OSS 120B via Groq FREE

OpenAI OSS — the only production-tier cross-architecture reviewer on Groq. Use as second pass in the verification chain.

fast_gpt_oss
Llama 3.3 70B via Groq FREE

Workhorse fast chat — speed-tier fallback when Ollama Pro is down. 1K req/min, 500K req/day.

fast_code
CODE / BACKEND
DeepSeek V4 Pro 1.6T / 49B active FRONTIER · DEFAULT

Default first-pass code reviewer. 1M context, thinking off or one of 3 levels (low / medium / high). Streaming via ollama-mcp 0b88571 (CF 524 fix, 2026-05-12).

ollama_deepseek_pro
GPT 5.4 Pro PREMIUM

Hardest backend problems, algorithms, APIs

opencode_gpt54_pro
GPT 5.3 Codex Spark

Fast code generation, everyday backend tasks

opencode_gpt53_codex_spark
Kimi K2 Thinking

Complex logic, multi-step reasoning

opencode_kimi_thinking
Trinity Large Preview

General coding and reasoning

opencode_trinity_preview
DeepSeek V4 Flash 158B OLLAMA PRO

Deep reasoning, thinking, agentic tasks — comparable to GPT-5

ollama_deepseek
Devstral 2 123B OLLAMA PRO

SWE-Bench 72.2% — Mistral's best coder, multi-file editing

ollama_devstral
GLM 5.1 OLLAMA PRO

Agentic coding — sustains over 100s of rounds, SWE-Bench Pro 58.4%

ollama_glm
MiniMax M2.7 OLLAMA PRO

Agentic coding + tool use — alternative architecture for cross-arch second-opinion review (different family from Qwen / DeepSeek).

ollama_minimax
DeepSeek V3.2 legacy OLLAMA PRO

671B MoE, 160K ctx. Use V4 Flash / Pro instead; V3.2 kept as compatibility fallback.

ollama_deepseek3
CONTENT / UTILITY
Kimi K2.6 OLLAMA PRO

Bulk HTML, multimodal, vision, agent swarm — moved from free to Pro

ollama_kimi
Nemotron 3 Super NVIDIA 120B OLLAMA PRO

Agentic reasoning, content — moved from free to Pro

ollama_nemotron
Qwen3-Coder 480B OLLAMA PRO

Code tasks, second opinions, large code reviews — cloud-proxied

ollama_code
Qwen3.5 397B OLLAMA PRO

Reasoning, writing, analysis, vision, thinking — zero cost

ollama_chat
Mistral Large 3 675B OLLAMA PRO

Complex reasoning, large documents, vision — biggest model

ollama_mistral
Seedance 1.0 ByteDance / fal.ai VIDEO

Text-to-video & image-to-video — Pro (1080p) + Lite (720p, faster)

fal-ai/seedance/v1/pro fal-ai/seedance/v1/lite
Results flow back to Sonnet 4.6/4.7 for review, integration & deployment
// MCP_SERVERS — 12 ACTIVE
chrome-devtools RULE

Render before judging. DOM + accessibility tree + screenshot, not curl. Default for any UX, design audit, redesign, or rendering task per CLAUDE.md.

new_page · take_snapshot · take_screenshot
dataforseo SEO

Live SERP, keyword volume, backlinks, on-page Lighthouse, business listings, AI visibility (LLM mentions). 80+ tools — the canonical SEO data source.

serp · keywords · backlinks · ai_opt
mirage NEW · 2026-05-07

Multi-server file ops. SSH + RAM mounts under one virtual filesystem — cross-server diff, read, exec without juggling tabs. SSH writes default-deny.

mounts · ls · read · diff · exec
stitch-design UI

Google Stitch — UI generation from prompts. Design systems, screen variants, project scaffolding. Convert output to semantic CSS before shipping.

create_design_system · generate_screen · variants
mistral-code

Mistral code-specialized models — review, explain, generate. Complements Ollama Pro + OpenCode coverage.

code_review · code_explain · code_generate
fal-image

fal.ai image generation + edit endpoints — fallback when cc-nano-banana (Gemini) is unavailable or for specific fal-only models.

generate_image · edit_image

+ gemini-research, groq-fast, ollama-pro, opencode, xiaomi-mimo, glm-free — the 6 model-routing MCPs already covered above. Plus Anthropic-managed: Gmail, Google Calendar, Google Drive, HubSpot, Notion, Playwright (via plugins).

// BUILD_PIPELINES
ASTRO SSR FRONT-END BUILD
STEP 1
Design Input

Screenshot, brief, or reference file

DESIGN.md awesome-design-md
STEP 2
Skills Layer

Claude Code skills invoked by Opus

/stitch /astro-ssr /taste /design-system /code-rescue /rag-search /skill-auditor
STEP 3
Subagents

Specialized agents spawned per task

frontend-developer typescript-pro codex:codex-rescue
STEP 4
Models Execute

Fast code gen + design research

GPT 5.3 Spark Gemini 3.1 Flash Llama 4 Scout
STEP 5
Review + Deploy

Code review then ship

/codex:review /codex:rescue rsync deploy
LIBRARIES: awesome-design-md awesome-agent-skills awesome-claude-code-subagents awesome-codex-subagents codex-plugin-cc
// QUALITY_GATES
PRE-DEPLOY GATES — MANDATORY BEFORE EVERY SHIP
ASTRO SSR
Frontend Gate
npm run build:css (if Tailwind classes changed)
Playwright desktop + mobile screenshot
seo-compliance-check.py on live URL
git commit before deploy
playwright seo-check
LARAVEL
Backend Gate
php artisan test (all suites pass)
phpstan analyse --level=5 (no new errors)
Check N+1 — eager load with ->with()
artisan config:clear + cache:clear
phpstan laravel-boost
ALL PROJECTS
Code Quality Gate
/codex:adversarial-review every branch
Fix all HIGH items before merge
/codex:rescue if stuck > 15 min
No hardcoded secrets, no SQL concat
/codex:adversarial /codex:rescue
SEO + DESIGN
Content Gate
DESIGN.md at project root before UI work
Title <60 chars, meta 130-160 chars
IndexNow ping after content deploys
/taste → /interface-design → /stitch
IndexNow /taste /stitch
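The title and meta-description limits in the Content Gate can be checked mechanically. A minimal sketch, assuming lengths are counted in characters after trimming whitespace and that the 130-160 range is inclusive; `check_content_gate` is a hypothetical helper name, not one of the tools listed here.

```python
def check_content_gate(title: str, meta_description: str) -> list[str]:
    """Return a list of gate violations; an empty list means the page passes."""
    problems = []
    title_len = len(title.strip())
    meta_len = len(meta_description.strip())
    # Gate: title must be under 60 characters.
    if title_len >= 60:
        problems.append(f"title is {title_len} chars, must be < 60")
    # Gate: meta description must be 130-160 characters (inclusive, assumed).
    if not 130 <= meta_len <= 160:
        problems.append(f"meta description is {meta_len} chars, must be 130-160")
    return problems
```

An empty return value means both limits are met; anything else can be printed as the reason the gate failed.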
// SAFETY_RAILS
ANTI-MISTAKE SYSTEM — RULES THAT PREVENT DATA LOSS
Every rule below maps to a real incident where production data was lost or a server was overwritten.
DEPLOY SAFETY
rsync Rules
NEVER use rsync --delete — destroys server-only files
MD5 drift-check before every deploy — compare local vs server checksums first
All deploy scripts use ./deploy.sh with built-in drift protection
Server is source of truth — pull before push
deploy.sh drift-check md5sum compare
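The drift check above amounts to comparing two checksum manifests before anything is pushed. The sketch below is illustrative only and is not the contents of deploy.sh: `md5_manifest` and `find_drift` are hypothetical names, and it assumes the server-side manifest has already been fetched into a dict (for example by running md5sum over SSH).

```python
import hashlib
from pathlib import Path


def md5_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def find_drift(local: dict[str, str], server: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and report what a blind push would clobber."""
    shared = set(local) & set(server)
    return {
        # On the server only: these are the files rsync --delete would destroy.
        "server_only": sorted(set(server) - set(local)),
        # On both sides with different content: pull before push.
        "changed": sorted(p for p in shared if local[p] != server[p]),
    }
```

If either list is non-empty, the safe move under these rules is to stop and pull from the server first.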
DEPLOYMENT
Deploy Targets
Deploy targets configured per-project in each repo's CLAUDE.md
Each site has a deploy.sh script with correct server + path
EDIT RULES
File Safety
Never edit compiled HTML on the server — all sites are Astro SSR, edit .astro source
Check for astro.config.mjs FIRST before touching any .html file
Playwright screenshot required after every frontend edit
Multi-device sync: push to backup remote, pull on second machine
/astro-ssr skill playwright hook
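The "check for astro.config.mjs first" rule can be expressed as a small guard. This is a sketch under the assumption that any .html file living below a directory with astro.config.mjs is build output; `find_astro_root` and `safe_to_edit_html` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def find_astro_root(file_path: Path) -> Optional[Path]:
    """Walk up from file_path; return the first directory holding astro.config.mjs."""
    for directory in file_path.resolve().parents:
        if (directory / "astro.config.mjs").is_file():
            return directory
    return None


def safe_to_edit_html(file_path: Path) -> bool:
    """An .html file inside an Astro project is compiled; edit the .astro source."""
    if file_path.suffix.lower() != ".html":
        return True
    return find_astro_root(file_path) is None
```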
SEO AUDIT WORKFLOW — DO NOT SKIP
STEP 1 — FIRST PASS (FREE)
Technical audit → seomator audit <url> (251 rules) or gemini-research
AI citability → geo-optimizer (47 criteria) — llms.txt, AI bot rules, passage optimization
E-E-A-T + citations → seo-geo-claude-skills (CORE-EEAT 80-item + CITE 40-item)
Never spawn general-purpose agents for SEO research — they default to Sonnet and burn tokens
STEP 2 — SERP LANDSCAPE ONLY
seo-specialist agent = SERP landscaping ONLY — who's ranking, competitors, review counts, directory gaps
seo-specialist runs on Haiku and hallucinates technical facts — never trust its page counts, schema claims, or indexed URL lists
STEP 3 — ACT
Cross-reference hub scores + Gemini findings + SERP data before touching any file
seo-* skill suite for implementation (22 skills — schema, sitemap, technical, content, geo, local, maps, cluster, drift, programmatic, competitor-pages …)
Playwright screenshot + deploy + IndexNow ping after every SEO change
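For the IndexNow ping in Step 3, the submission is a JSON POST listing the changed URLs. The sketch below only builds the request so it can be inspected before sending. It assumes the shared api.indexnow.org endpoint and a key file served from the site root; the function name is hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(urls: list[str], key: str) -> urllib.request.Request:
    """Build (but do not send) an IndexNow submission for URLs on a single host."""
    hosts = {urlparse(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError("all URLs in one submission must share a host")
    host = hosts.pop()
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is served from the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Sending is then a single `urllib.request.urlopen(request, timeout=10)` call at the end of the deploy step.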
AUDIT GAPS — BACKLOG

Fort Lauderdale POC — SEO audit never ran. Schedule seo-audit + seo-geo + seo-local once content is final. (Flagged 2026-05-12.)

NEW — MAY 2026
ollama_deepseek_pro default reviewer

DeepSeek V4 Pro 1.6T is now first-pass code review. CF 524 fix shipped 2026-05-12 (ollama-mcp 0b88571): streaming + AbortSignal.any + 270s wallclock budget + jittered overload retry + think:false default.
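Two of the ingredients named above, the wallclock budget and the jittered overload retry, fit in a few lines. This is a Python sketch of the general pattern, not the ollama-mcp code: the `Overloaded` exception, the delay constants, and the function name are assumptions, and streaming and AbortSignal.any are not shown.

```python
import random
import time


class Overloaded(Exception):
    """Hypothetical marker for an upstream 'overloaded' response."""


def call_with_budget(request, budget_s=270.0, base_delay_s=1.0, max_delay_s=30.0,
                     clock=time.monotonic, sleep=time.sleep, rng=random.random):
    """Retry `request` on Overloaded until it succeeds or the budget runs out."""
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return request()
        except Overloaded:
            # Full jitter: wait a random fraction of an exponentially growing cap.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            delay = rng() * cap
            # Give up rather than sleep past the wallclock deadline.
            if clock() + delay >= deadline:
                raise TimeoutError("wallclock budget exhausted while retrying")
            sleep(delay)
            attempt += 1
```

The clock, sleep, and random source are parameters so the retry schedule can be exercised in tests without real waiting.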

Mirage MCP

Multi-server file ops — SSH + RAM mounts under one virtual filesystem for cross-server diff/read/exec. SSH writes default-deny.

chrome-devtools rule

Render before judging. DOM + a11y + screenshot, not curl. Default for any UX, render, audit, or redesign task.

DAILY-USE SKILLS — HIGHEST LEVERAGE
humanizer

Remove AI-writing tells. Detects/fixes inflated symbolism, em-dash overuse, rule-of-three, AI vocab. Run on every Gemini/GPT-drafted copy before publish.

frontend-design + impeccable

Distinct production-grade UI. impeccable flags AI-slop tells (side-stripe borders, gradient text, glassmorphism, hero-metric clichés). Pair for design audit + polish.

taste-skill + gpt-tasteskill

Senior UI/UX engineer. Editorial typography, gapless bento grids, strict GSAP scroll triggers, massive section spacing. Default for any web dev (per CLAUDE.md taste-skill rule).

SEO suite — 22 skills

seo-audit · seo-technical · seo-content · seo-geo · seo-google · seo-local · seo-maps · seo-schema · seo-sitemap · seo-cluster · seo-drift · seo-firecrawl · seo-rotation · seo-sxo · seo-image-gen · seo-programmatic · seo-competitor-pages · seo-ecommerce · seo-plan · seo-dataforseo · seo-page · seo-backlinks

cc-nano-banana

Required for ALL image generation. Nano Banana (Gemini CLI) for blog images, thumbnails, icons, diagrams, illustrations, photos.

notebooklm

Query + manage Google NotebookLM via CLI. Master notebook has 38 sources. Vault-to-Master sync after every session.

visual-review + playwright

Mandatory after any frontend edit — screenshot via Playwright, Read the PNG before calling work done. Per CLAUDE.md rule #4.

verify-my-work + verification-before-completion

Run after ANY code edit, deployment, or fix. Requires running verification commands and confirming output before claiming completion.

NEW TOOLS — APRIL 2026
/design-system

66 DESIGN.md files from Stripe, Figma, Apple, PlayStation, WIRED — VoltAgent/awesome-design-md

/code-rescue

Codex CLI second opinion — gpt-4.1-mini, invoke when stuck >2 attempts or need architecture review

/rag-search

Semantic search across all 40+ projects via RAG-Anything + OpenAI embeddings

/skill-auditor

Weekly audit: broken symlinks, GitHub freshness, usage stats, duplicate detection

15 Stack Agents

fastapi · python · laravel · php · sql · deployment · devops · security · code-reviewer + more — VoltAgent

/start + /wrap

Session lifecycle — /start reads Memory+Obsidian+NotebookLM, /wrap commits+pushes+updates Obsidian

305 Skills

305 skills installed (~/.claude/skills/). Includes humanizer, frontend-design, impeccable, taste-skill, full 22-skill SEO suite, cc-nano-banana, notebooklm, visual-review, verify-my-work. Plus plugin marketplaces (caveman, obra-superpowers, interface-design, codex, memsearch).

Seedance 1.0

ByteDance text-to-video via fal.ai — Pro (1080p) + Lite (720p). Live now.

SEO INTELLIGENCE TOOLS — INSTALLED GLOBALLY
seo-audit-skill CLI

251 audit rules across 20 categories — @seomator engine. CLI: seomator audit <url>

technical content schema
geo-optimizer PYTHON

47 AI citability criteria — llms.txt, AI bot rules, passage-level optimization for ChatGPT/Perplexity/Gemini

GEO llms.txt AI search
seo-geo-claude-skills 20 SKILLS

CORE-EEAT (80-item) + CITE (40-item) frameworks. 12 commands: audit, optimize, schema, keywords, alerts

E-E-A-T citations monitor
NEW — APRIL 28, 2026
Xiaomi MiMo-V2.5-Pro SKIPPED 2026-05-05

Evaluated for purchase ($162/yr). 1T params (42B active MoE), 1M ctx, 1000+ tool call coherence. Declined: ollama_deepseek_pro on Ollama Pro covers the frontier slot (1.6T/49B active, 1M ctx, off/low/medium/high thinking modes) at zero marginal cost. Reassess after 30-day delta check.

declined mimo-mcp (TTS still free)
MiMo-V2.5-TTS

Text-to-speech with emotion control, voice cloning from 30s audio sample, and voice design from text description. 24kHz, multilingual. Free during open beta.

FREE BETA 3 TTS tools
MiMo-V2.5 Base

Full-modal base model: native text, image, audio, video understanding. 1M context, 131K max output. Pro-level agentic at half the cost ($0.40/M input).

omnimodal 1M context
APRIL 23, 2026
NVIDIA NIM

~80 free AI models via OpenAI-compatible API. MiniMax M2.7, DeepSeek V3.2, GLM 5.1, Kimi K2.5, Sarvam-M, GPT-OSS 120B + nvidia_ask (any model)

FREE 7 tools nvidia-mcp · archived
MemSearch

Semantic vector search across 135 curated memory files (was 101). ONNX bge-m3 embeddings (local, free). Auto-captures sessions via hooks.

/memory-recall 4 hooks
Memory Scripts

reorganize-memory.sh (audit: orphans, empties, oversized files) + sync-memsearch.sh (copy curated memories to MemSearch, re-index)

Level 2 audit
Karpathy Guidelines

4 behavioral principles from Andrej Karpathy: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution

/karpathy-guidelines skill
NotebookLM Tier 3

Semantic across vault + master notebook. 38-source master. CLI authenticated (Mac + Fedora). Vault → project notebook → Master after every session.

notebooklm skill gbentz2@gmail
Obsidian Vault Tier 2

~/Documents/greg-claude/ — Projects, Servers, Sessions, NotebookLM. Per-project file pattern. Check FIRST before asking.

RULE #0
PLANNED — IN PROGRESS
ECharts Dashboard

Interactive reporting dashboard for CrawlHound — grade history, scan trends, site-wide metrics visualization

PDF Reports (ReportLab)

Downloadable PDF audit reports for CrawlHound, MrBotsworth, and gjapp — branded, shareable

CI/CD — Gitea Actions

Automated build + test + deploy pipeline on git.myseodesk.com — lint, SEO checks, rsync deploy on push

// HOOK_LAYER_ENFORCEMENT — v1 (2026-05-05)
DRIFT SIGNALS FROM CLAUDE USAGE REPORT — WHAT THE HOOKS FIX
72% DRIFT

Subagent-heavy sessions. Top spawns: backend-developer, frontend-developer, code-reviewer — all have free MCP twins.

target after hooks: < 30%
76% DRIFT

Usage at >150k context. Long mixed-topic sessions, no /clear or /compact between unrelated tasks.

target after hooks: < 30%
4% DRIFT

Skill invocation share. The 100+ installed skills (visual-review, redesign, audits) barely fire — same work runs through paid subagents instead.

target after hooks: > 15%
FIVE HOOKS — INSTALLED IN ~/.claude/hooks/
agent-spawn-gate.py PreToolUse

Fires on every Agent tool call. Drift list (6 named agents): warns at ≤65% context used, blocks above 65%. code-reviewer + general-purpose are in ALWAYS_BLOCK. Whitelisted agents (Explore/Plan/gsd/audit/seo/feature-dev) pass silently. Override: the [paid-subagent-OK] token logs the call and allows it.
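The gate's decision logic can be sketched as a pure function. This is a reading of the rules above, not the contents of agent-spawn-gate.py. The following are assumptions: the override token is looked up in the prompt text, whitelist entries such as gsd/audit/seo are treated as name prefixes, and agents on no list are allowed.

```python
ALWAYS_BLOCK = {"code-reviewer", "general-purpose"}
DRIFT_AGENTS = {
    "backend-developer", "frontend-developer", "fastapi-developer",
    "code-reviewer", "security-auditor", "database-administrator",
}
# Assumption: whitelist entries match either an exact name or a name prefix.
WHITELIST_EXACT = {"Explore", "Plan", "feature-dev"}
WHITELIST_PREFIXES = ("gsd-", "audit-", "seo-")
OVERRIDE_TOKEN = "[paid-subagent-OK]"
CONTEXT_THRESHOLD_PCT = 65


def gate_decision(agent: str, context_pct: float, prompt: str) -> str:
    """Return 'allow', 'warn', or 'block' for one Agent tool call."""
    if OVERRIDE_TOKEN in prompt:
        return "allow"  # the hook described above also logs the override
    if agent in WHITELIST_EXACT or agent.startswith(WHITELIST_PREFIXES):
        return "allow"
    if agent in ALWAYS_BLOCK:
        return "block"
    if agent in DRIFT_AGENTS:
        return "block" if context_pct > CONTEXT_THRESHOLD_PCT else "warn"
    return "allow"  # assumption: unlisted agents pass silently
```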

delegation-table-injector.py UserPromptSubmit

Three triggers: session start, topic-switch keywords, post-/compact detection (transcript msg-count drop >50%). Injects ~250 tokens of routing rules from ~/.claude/rules/delegation-cheatsheet.md.
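The post-/compact trigger is a simple ratio test on the transcript message count. A minimal sketch, assuming the hook keeps the previous count from the last prompt; both function names are hypothetical.

```python
def compact_detected(previous_count: int, current_count: int) -> bool:
    """True when the message count fell by more than 50% since the last prompt."""
    if previous_count <= 0:
        return False
    drop = (previous_count - current_count) / previous_count
    return drop > 0.5


def should_inject(is_session_start: bool, has_topic_switch: bool,
                  previous_count: int, current_count: int) -> bool:
    """Any one of the three triggers is enough to inject the routing rules."""
    return (is_session_start or has_topic_switch
            or compact_detected(previous_count, current_count))
```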

topic-switch-nudge.py UserPromptSubmit

Detects topic-switch keywords (“okay new task”, “switching to”, “moving on”) and appends a /clear reminder. 10-message debounce so it doesn’t fire on every consecutive switch.
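The debounce can be sketched as a counter that remembers when the last reminder fired. The pattern below uses only the three phrases quoted above, and the class name is hypothetical. If the hook runs as a fresh process per prompt, the two counters would need to be persisted between runs, which this sketch does not show.

```python
import re

# The three phrases quoted above; the real keyword list may be longer.
SWITCH_PATTERN = re.compile(r"\b(okay new task|switching to|moving on)\b", re.IGNORECASE)
DEBOUNCE_MESSAGES = 10


class TopicSwitchNudge:
    def __init__(self):
        self.message_index = 0
        self.last_nudge_index = None

    def on_prompt(self, prompt: str) -> bool:
        """Return True when a /clear reminder should be appended to this prompt."""
        self.message_index += 1
        if not SWITCH_PATTERN.search(prompt):
            return False
        recently_nudged = (
            self.last_nudge_index is not None
            and self.message_index - self.last_nudge_index < DEBOUNCE_MESSAGES
        )
        if recently_nudged:
            return False
        self.last_nudge_index = self.message_index
        return True
```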

skill-trigger-injector.py UserPromptSubmit

Highest-leverage hook. 26 keyword triggers in ~/.claude/rules/skill-triggers.json (regex, priority-ranked). On match, injects an EXTREMELY_IMPORTANT system-reminder naming the skill — same mechanism as using-superpowers.
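Priority-ranked regex matching, as described for this hook, can be sketched as follows. The schema of skill-triggers.json is not shown in this document, so the trigger entries, their field names, and the "higher number wins" convention are all assumptions made for illustration.

```python
import re

# Hypothetical entries; the real skill-triggers.json schema is not shown here.
TRIGGERS = [
    {"pattern": r"\b(redesign|design audit)\b", "skill": "impeccable", "priority": 5},
    {"pattern": r"\bscreenshot\b", "skill": "visual-review", "priority": 10},
    {"pattern": r"\b(blog image|thumbnail)\b", "skill": "cc-nano-banana", "priority": 8},
]


def match_skill(prompt: str, triggers=TRIGGERS):
    """Return the skill of the highest-priority matching trigger, else None."""
    hits = [t for t in triggers if re.search(t["pattern"], prompt, re.IGNORECASE)]
    if not hits:
        return None
    # Assumption: a larger priority number outranks a smaller one.
    return max(hits, key=lambda t: t["priority"])["skill"]
```

When several triggers match the same prompt, only the top-ranked skill is named, which keeps the injected reminder short.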

chrome-devtools-reminder.py UserPromptSubmit NEW

Enforces the “render before judging” rule. When a prompt looks like UX / design / audit / redesign work, injects a reminder to use mcp__chrome-devtools__new_page + take_snapshot + take_screenshot from the main session — not curl, not WebFetch, not subagent.

SIX DRIFT SUBAGENTS — PAID CLAUDE CLONES WITH FREE MCP TWINS
drift agent | free mcp route | when in-agent is justified
backend-developer | ollama_devstral, opencode_gpt54_pro | 5+ files coordinated, or hard arch
frontend-developer | ollama_kimi (HTML), ollama_code (review) | multi-framework full-stack work
fastapi-developer | opencode_gpt54_pro, ollama_deepseek_pro | complex async patterns + tests
code-reviewer | HARD-BLOCKED. Chain: ollama_deepseek_pro (DEFAULT) → ollama_code → fast_gpt_oss | [paid-subagent-OK] override only
security-auditor | /security-auditor skill + /api-keys skill | whole-system audit (5+ services)
database-administrator | /sql-pro skill + mysql/postgres MCP | HA/replication infra, not just queries
VERIFICATION CHAIN — MULTI-EYE REVIEW PATTERN

Apply when work is non-trivial: >50 LOC change, security-sensitive, or first-time deploy. Review prompts are short, so paid and free-tier rate limits go further than on bulk generation.

1. DRAFT
ollama_kimi / ollama_devstral / opencode_gpt54_pro
2. FIRST REVIEW — DEFAULT
ollama_deepseek_pro (DeepSeek V4 Pro 1.6T)
3. CODE-SPECIALIZED
ollama_code (qwen3-coder 480B)
4. CROSS-ARCH
fast_gpt_oss (GPT OSS 120B)
5. FRONTIER (gated)
Gemini pro_reason / Codex CLI
6. INTEGRATE
Sonnet (here)
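The chain's entry condition and reviewer order can be written down as data plus one predicate. This is a sketch: the tool names come from the steps above, while the function names and the `allow_frontier` flag (standing in for however the frontier step is gated) are assumptions.

```python
REVIEW_CHAIN = [
    ("first review", "ollama_deepseek_pro"),
    ("code-specialized", "ollama_code"),
    ("cross-arch", "fast_gpt_oss"),
]
# Gated step; Codex CLI is the listed alternative to pro_reason.
FRONTIER_STEP = ("frontier", "pro_reason")


def needs_chain(loc_changed: int, security_sensitive: bool, first_deploy: bool) -> bool:
    """Non-trivial work: >50 LOC changed, security-sensitive, or first-time deploy."""
    return loc_changed > 50 or security_sensitive or first_deploy


def plan_reviews(loc_changed, security_sensitive=False, first_deploy=False,
                 allow_frontier=False):
    """Return the ordered reviewer steps to run between draft and integration."""
    if not needs_chain(loc_changed, security_sensitive, first_deploy):
        return []
    steps = list(REVIEW_CHAIN)
    if allow_frontier:  # assumption: the gate is an explicit opt-in
        steps.append(FRONTIER_STEP)
    return steps
```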
// SUBAGENTS — 56 ACTIVE
SPECIALIZED SUBAGENTS — SPAWNED BY OPUS VIA AGENT TOOL
STACK (15)
fastapi-developer python-pro laravel-specialist php-pro typescript-pro sql-pro backend-developer frontend-developer deployment-engineer devops-engineer security-engineer security-auditor database-administrator code-reviewer seo-specialist
Source: awesome-claude-code-subagents
SEO (13)
seo-technical seo-content seo-geo seo-google seo-schema seo-sitemap seo-local seo-maps seo-performance seo-visual seo-backlinks seo-dataforseo seo-image-gen
Technical truth → myseodesk-seo-hub
ADS AUDIT (6)
audit-google audit-meta audit-budget audit-compliance audit-creative audit-tracking
ADS CREATIVE (4)
creative-strategist copy-writer visual-designer format-adapter
Full campaign pipeline
GSD SUITE (12)
gsd-planner gsd-executor gsd-debugger gsd-verifier gsd-plan-checker gsd-phase-researcher gsd-project-researcher gsd-roadmapper gsd-codebase-mapper gsd-integration-checker gsd-nyquist-auditor gsd-research-synthesizer
Structured milestone planning
BUILT-IN (5)
general-purpose Explore Plan claude-code-guide statusline-setup
PLUGIN (1)
codex:codex-rescue
Use: stuck >2 attempts or arch review
MCP CONFIG FIX (Apr 2026)
MCPs now in ~/.claude.json (user scope) — loads in ALL CLI + VSCode sessions automatically
// TOKEN_ESTIMATION
ROUGH COST PER PLAN TYPE — OPUS 4.7 ($15/M in · $75/M out)
Estimates assume Opus as planner + Sonnet subagents. Actual cost varies by context size and iterations.
Plan Type | Lead Model | ~Tokens | ~Cost | Notes
Quick Ask | Ollama Pro (Nemotron/Chat) | 1–3K | $0 (Pro) | ollama_nemotron / ollama_chat (Tier 1)
Bug Fix | Sonnet 4.6 | 5–15K | ~$0.05–0.15 | Read file → diagnose → patch → verify
Single Feature | Opus 4.7 + Sonnet | 20–60K | ~$0.50–2 | Plan → subagent execute → codex:review
Static Page Build | Opus + Kimi K2.6 | 30–80K | ~$0.50–1.50 | DESIGN.md → /stitch → Playwright check → deploy
Astro SSR Feature | Opus + GPT 5.3 Spark | 50–120K | ~$1–3 | /astro-ssr skill + full pre-deploy gate
Laravel API | Opus + GPT 5.4 Pro | 60–150K | ~$1.50–4 | Models + migrations + tests + PHPStan + review
Full Site (5–10 pages) | Opus + mix | 150–400K | ~$4–12 | Full 6-stage pipeline: PLAN→DESIGN→BUILD→SEO→CRO→QA
SEO Audit | Gemini 3.1 Flash | 10–30K | ~$0.10–0.30 | Schema + compliance + content, mostly Gemini
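The Opus figures in the heading translate into a cost per call as tokens times the per-million rate, summed over input and output. A minimal sketch using only the two prices stated above; the function name is hypothetical, and the table rows mix in cheaper models, so they will not match an all-Opus calculation.

```python
OPUS_INPUT_USD_PER_M = 15.0   # from the heading: $15/M in
OPUS_OUTPUT_USD_PER_M = 75.0  # from the heading: $75/M out


def opus_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one Opus exchange at the stated per-million-token rates."""
    total = input_tokens * OPUS_INPUT_USD_PER_M + output_tokens * OPUS_OUTPUT_USD_PER_M
    return total / 1_000_000
```

For example, 40K input tokens plus 10K output tokens comes to 0.60 + 0.75 = $1.35, which shows how quickly output tokens dominate the bill.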
MINIMIZE COST
  • → Use Ollama Pro (Tier 1) for drafts, analysis, code review
  • → Use Kimi K2.6 (Pro) for bulk HTML generation
  • → Use Gemini 3.1 Flash for research/content
  • → Groq/NVIDIA = fallback only (Tier 3)
  • → /compact regularly in long sessions
CONTEXT WATCH
  • → Each file read adds ~1–5K tokens
  • → Large HTML files: 10–30K tokens each
  • → Long conversations drift — use /clear or new session
  • → Subagents get fresh context — use them for big files
  • → CLAUDE.md loaded every session (~5K tokens)
WHEN TO UPGRADE
  • → Stuck after 2 attempts → /codex:rescue
  • → Complex architecture → Opus (not Sonnet)
  • → Critical prod deploy → Gemini 3.1 Pro review
  • → Adversarial review finds HIGH bugs → GPT 5.4 Pro fix
  • → Free model fails tool calls → switch to Sonnet
// REFERENCE_TABLE
Role | Model | MCP Server | Use Case | Cost
Planner | Opus 4.7 | native | Reasoning, architecture, strategy | paid
Lead | Sonnet 4.6 / 4.7 | native | Executes tasks, reviews, deploys | paid
Frontier reviewer | DeepSeek V4 Pro 1.6T | ollama-pro | Default first-pass code review (post-streaming-fix 2026-05-12). 1M ctx, off/low/med/high thinking modes. | PRO
Research | Gemini 3.1 Flash | gemini-research | Research, comparisons, blog posts, FAQs | paid
Reasoning | Gemini 3.1 Pro | gemini-research | Deep analysis, architecture, complex reasoning | paid
Frontier | Gemini 3.1 Pro | gemini-research | Hardest problems, frontier intelligence | paid
Fast 8B | Llama 3.1 8B Instant | groq-fast | Sub-second 8B: gates, autocomplete, micro-tasks | FREE
Fast 70B | Llama 3.3 70B Versatile | fast_code | Workhorse fast chat, speed-tier fallback | FREE
Preview | Llama 4 Scout 17B/16E | fast_llama4 | Llama 4 MoE, preview tier per Groq, opportunistic only | FREE
Cross-arch | GPT OSS 120B | fast_gpt_oss | Verification chain second pass, only production-tier cross-arch on Groq | FREE
STT | Whisper Large v3 Turbo | groq-fast | Speech-to-text, 400 req/min, 4M sec/day | FREE
Backend | GPT 5.4 Pro | opencode | Hardest backend, algorithms, APIs | paid
Code | GPT 5.3 Codex Spark | opencode | Fast code generation, everyday tasks | paid
Reason | Kimi K2 Thinking | opencode | Complex logic, multi-step planning | paid
Code | Trinity Large Preview | opencode | General coding and reasoning | paid
HTML | Kimi K2.6 | ollama_kimi | Bulk HTML, multimodal, agent swarm | PRO
Content | Nemotron 3 Super 120B | ollama_nemotron | Agentic reasoning, content | PRO
Ollama Pro | Qwen3-Coder 480B | ollama_code | Code tasks, large code reviews | PRO
Ollama Pro | Qwen3.5 397B | ollama_chat | Reasoning, writing, analysis, vision, thinking | PRO
Ollama Pro | Mistral Large 3 675B | ollama_mistral | Complex reasoning, large documents, vision | PRO
Ollama Pro | DeepSeek V4 Flash 158B | ollama_deepseek | Deep reasoning, thinking, agentic tasks | PRO
Ollama Pro | Devstral 2 123B | ollama_devstral | SWE coding, multi-file editing (72.2% SWE-Bench) | PRO
Ollama Pro | GLM 5.1 | ollama_glm | Agentic coding, sustained over 100s of rounds | PRO
MiMo | MiMo-V2.5-Pro | mimo | Frontier reasoning, 1T/42B MoE, 1M ctx, 1000+ tool calls | $1/$3M
MiMo | MiMo-V2.5 | mimo | Full-modal (text+image+audio+video), 1M ctx | $0.4/$2M
MiMo | MiMo-V2.5-TTS | mimo | Text-to-speech, voice clone, voice design | FREE
NVIDIA · archived | MiniMax M2.7 | nvidia-nim | Large MoE reasoning + code (exclusive to NVIDIA) | FREE
NVIDIA · archived | DeepSeek V3.2 | nvidia-nim | 671B MoE reasoning (Tier 3 fallback) | FREE
NVIDIA · archived | GLM 5.1 | nvidia-nim | Agentic coding (Tier 3 fallback) | FREE
NVIDIA · archived | Kimi K2.5 | nvidia-nim | Moonshot reasoning (Tier 3 fallback) | FREE
NVIDIA · archived | Sarvam-M | nvidia-nim | Indian multilingual (exclusive to NVIDIA) | FREE
NVIDIA · archived | GPT-OSS 120B | nvidia-nim | OpenAI OSS (Tier 3 fallback) | FREE
40+
Models
12
Ollama Pro
10
Free Models
12
MCP Servers
305
Skills
56
Agents

+ 5 hooks (agent-spawn-gate, delegation-table-injector, topic-switch-nudge, skill-trigger-injector, chrome-devtools-reminder) · 135 memory files in MemSearch · 38-source NotebookLM master · Obsidian vault per-project files