CLI for consulting powerful reasoning models from Claude Code and other coding agents
consult-llm is a tool for getting a second opinion from another AI model,
right inside your existing agent workflow. Use it to plan architecture,
review changes, debate approaches, or get unstuck on tricky bugs. It supports GPT-5.5, Gemini 3.1 Pro, Claude Opus 4.7,
DeepSeek V4 Pro, and MiniMax M2.7, with API and local CLI backends, multi-turn
threads, git diff context, web-mode clipboard export, and a live monitor TUI.
Why a second opinion?
A model reviewing its own work isn't an independent check. Even in a fresh
context, it shares the same training, priors, and many of the same failure
modes. A different model was trained differently and makes different
mistakes, so it's more likely to push back, challenge weak reasoning, or
expose a blind spot.
Model strengths are uneven and hard to predict per task. A quick cross-check
is cheap, and sometimes the other model is the one that finds the path forward.
How it works
1. Install skills into your agent (Claude Code, Codex, OpenCode)
2. Trigger with a slash command: /consult, /debate, /collab
3. The skill pipes your prompt into consult-llm, which calls the backend and streams the response back inline
> The hint unlocking is still broken after your fix. Ask gemini
⏺ Skill(consult-llm)
⎿ Successfully loaded skill · 1 tool allowed
⏺ Let me consult Gemini to understand this segmentation mismatch:
⏺ Bash(cat issue.md | consult-llm -m gemini -f HintsText.tsx -f hints.ts)
⎿ [model:gemini-3.1-pro-preview]
The root cause is a race condition in `handleConfirmGenerateHints`.
After `await generateHints()`, the component state updates in the
background, but the function continues with a stale `activeSegment`
variable, requesting a hint for "isn't" instead of "isn't it".
⏺ Gemini identified the real issue: a race condition with stale state
after async operations. I was looking at the wrong thing entirely.

> [!NOTE]
> This project previously used MCP as its transport layer. If you were using that version, see the migration guide below.
What you can do
- Get a second opinion from another model from your agent with relevant file context (`/consult`)
- Have models debate the best approach and synthesize a consensus (`/debate`)
- Use existing subscriptions via CLI backends without API keys (Gemini CLI, Codex CLI, Cursor CLI, OpenCode)
- Continue conversations across requests with `thread_id`
- Copy prompts to clipboard for browser-based LLMs (`--web`)
- **Monitor active and past runs** in a real-time TUI
Quick Start
1. Install the binaries:
# Homebrew
brew install raine/consult-llm/consult-llm
# Shell script (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/raine/consult-llm/main/scripts/install.sh | bash

2. Configure a backend. CLI backends are the easiest if you have an existing subscription (no API key needed):
consult-llm config set gemini.backend gemini-cli # requires: gemini login
consult-llm config set openai.backend codex-cli # requires: codex login

Or set API keys:
consult-llm config set openai.api_key your_openai_key
consult-llm config set gemini.api_key your_gemini_key
# or via environment variables: OPENAI_API_KEY, GEMINI_API_KEY, etc.

3. Verify your setup:
consult-llm models # see available models and resolved selectors
echo "hello" | consult-llm -m gemini # quick smoke test4. Install the skills so your agent can call consult-llm for you:
consult-llm install-skills

Then invoke skills from inside your agent (see Usage below).
Example workflows
Some real-world examples. Click to expand.
Explain the problem, and tell CC to consult a smarter LLM
Consult LLM for assistance, use gemini
⏺ I see the issue! The problem is that the frontend is trying to use the hint
for "isn't" but the server generated a hint for "isn't it" (the full phrase).
This is a mismatch between how the frontend segments text and how the server
generates hints.
Let me consult the LLM to get assistance on this segmentation mismatch issue.
⏺ Write(hint-segmentation-issue.md)
⎿ Wrote 53 lines to hint-segmentation-issue.md
# Hint Segmentation Mismatch Issue
## Problem Description
There's a mismatch between how the frontend segments text for clickable
words and how the backend generates hints. This causes hint unlocking to
fail when the segments don't match exactly.
## Example Case
**Original Text**: "Hello! Nice weather, isn't it?"
… +43 lines (ctrl+r to expand)
⏺ Bash(cat hint-segmentation-issue.md | consult-llm -m gemini -f hint-segmentation-issue.md -f web/src/components/HintsText.tsx -f web/src/utils/hints.ts -f server/src/routes/public/hints.ts -f server/src/llm/hints.ts)
⎿ [model:gemini-3.1-pro-preview] [thread_id:a3f7c...]
An excellent analysis of the problem. The root cause has been correctly
identified: a race condition within the `handleConfirmGenerateHints`
function in `HintsText.tsx`. The component state updates in the background
after `await generateHints()`, but the function continues executing with a
stale `activeSegment` variable, causing it to request a hint for a smaller
text segment (e.g., "isn't") instead of the full, newly-available hint
phrase (e.g., "isn't it").
… +607 lines (ctrl+r to expand)
⏺ Perfect! The LLM identified the core issue - there's a race condition in
handleConfirmGenerateHints where the function uses stale state after async
operations. The solution is to re-target the hint key after generation to
find the correct matching hint.
Let me implement this solution:
⏺ Update Todos
⎿ ☐ Fix hint segmentation mismatch by implementing smarter hint key
targeting in handleConfirmGenerateHints

Using web mode to copy prompts to clipboard for browser-based LLMs
> I need help implementing a -b/--background flag for my Rust CLI. Use consult LLM with web mode
⏺ I'll use web mode to copy the formatted prompt to your clipboard so you can
paste it into any browser-based LLM service.
⏺ Bash(cat
[User pastes the response from the external LLM]
⏺ Thank you! Based on the recommendations, I'll now implement the solution...

This is useful when:
- You want to use a free browser-based LLM service instead of API credits
- You prefer a specific LLM's web interface
- You want to review the full prompt before submitting it
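For reference, a direct web-mode invocation looks roughly like this; `--web` copies the formatted prompt to the clipboard instead of calling a backend (the prompt and file names here are hypothetical):

```sh
cat background-flag-question.md | consult-llm --web -f src/cli.rs -f src/main.rs
```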
Debugging Neovim plugin treesitter API breakage
⏺ Fixed! Added nil checks for both node and child and updated the iter_matches call to handle the newer API properly.
> Still not working:
E5108: Error executing lua /Users/raine/code/testonly.nvim/lua/testonly.lua:92: attempt to call method 'child' (a nil value)
stack traceback:
/Users/raine/code/testonly.nvim/lua/testonly.lua:92: in function 'reset_all_exclusive'
/Users/raine/code/testonly.nvim/lua/testonly.lua:110: in function 'reset_all_exclusive'
[string ":lua"]:1: in main chunk
Ask gemini
⏺ Let me consult with a more powerful AI to understand the current Neovim treesitter API changes:
⏺ Bash(cat
Using a slash command to consult multiple LLMs for comparison

/consult The branch completions should be dynamic. Git should be called only when
using tab to autocomplete. Ask gemini and codex
⏺ I'll consult both Gemini and Codex about making the branch completions dynamic
(only calling git when tab-completing). Let me gather the relevant context first.
⏺ Bash(cat
Usage
The CLI is invoked by your agent via the installed skills; you don't call it directly. From inside Claude Code, OpenCode, or Codex:
/consult what's the best way to model this state machine?
/consult --gemini review this design for edge cases
/debate should this be a separate service or stay in the monolith?

CLI utilities
consult-llm models # list available models and resolved selectors
consult-llm doctor # diagnose backend auth and config
consult-llm config set # set a config value (user config by default)
consult-llm init-config # scaffold ~/.config/consult-llm/config.yaml
consult-llm init-prompt # scaffold ~/.config/consult-llm/SYSTEM_PROMPT.md
consult-llm install-skills # install bundled skills to platform skill dirs
consult-llm update # self-update the binary

consult-llm models shows which models are active based on the configuration loaded for the current directory, useful for verifying that project-level config overrides and allowed_models restrictions are taking effect as expected.
consult-llm doctor checks that each provider's backend dependency (API key or CLI binary) is satisfied, shows which config files were loaded, and validates session storage. Pass --verbose to see all config keys including unset defaults.
Providers & Configuration
consult-llm separates model families from backends.
A model family is what you ask for: gemini, openai, deepseek, minimax, or anthropic.
A backend is how consult-llm reaches that model family:
- **api**: direct HTTP calls using an API key
- CLI backends: shell out to a local CLI tool already installed and logged in
| Model family | api backend | CLI backends available | API key env var |
|---|---|---|---|
| Gemini | yes | gemini-cli, cursor-cli, opencode | GEMINI_API_KEY |
| OpenAI | yes | codex-cli, cursor-cli, opencode | OPENAI_API_KEY |
| DeepSeek | yes | opencode | DEEPSEEK_API_KEY |
| MiniMax | yes | opencode | MINIMAX_API_KEY |
| Anthropic | yes | none | ANTHROPIC_API_KEY |
API backend
Direct HTTP calls to the provider. Requires an API key. Set it in your user config or as an environment variable:
# User config (recommended, persists across sessions)
consult-llm config set openai.api_key your_openai_key
consult-llm config set gemini.api_key your_gemini_key
# Or as environment variables
export OPENAI_API_KEY=your_openai_key
export GEMINI_API_KEY=your_gemini_key

The api backend is the default. To set it explicitly:
consult-llm config set gemini.backend api
consult-llm config set openai.backend api

CLI backends
Shell out to an already-installed local CLI. No API keys needed in consult-llm; authentication is handled by the CLI tool.
A key advantage over the API backend: CLI agents can browse your codebase, run commands, and do their own research before responding. The API backend receives only the prompt and files you explicitly include.
Gemini CLI: requires the Gemini CLI and gemini login:
consult-llm config set gemini.backend gemini-cli

Codex CLI: requires Codex CLI and codex login:
consult-llm config set openai.backend codex-cli
consult-llm config set openai.reasoning_effort high # none | minimal | low | medium | high | xhigh
# Optional: append extra args to every `codex exec` invocation. Shell-quoted.
# Useful e.g. to skip the sandbox in environments that already isolate Codex:
consult-llm config set openai.extra_args '--dangerously-bypass-approvals-and-sandbox'

The same extra_args field is supported on gemini: for the Gemini CLI backend.
Cursor CLI: routes through cursor-agent:
consult-llm config set openai.backend cursor-cli
consult-llm config set gemini.backend cursor-cli

If your prompts need shell commands in Cursor CLI ask mode, allow them in ~/.cursor/cli-config.json.
OpenCode: routes through opencode to Copilot, OpenRouter, or other providers:
consult-llm config set openai.backend opencode
consult-llm config set gemini.backend opencode
consult-llm config set deepseek.backend opencode
consult-llm config set minimax.backend opencode
# Optional: configure OpenCode provider routing
consult-llm config set opencode.default_provider copilot
consult-llm config set openai.opencode_provider openai

Multi-turn conversations
CLI backends support multi-turn conversations. The first response includes a
[thread_id:xxx] prefix; pass that ID back with --thread-id to continue
the conversation with full context from prior turns.
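A minimal sketch of continuing a thread from the CLI, assuming a CLI backend such as gemini-cli (the prompt files, source file, and thread ID are placeholders):

```sh
# First turn: the response is prefixed with [thread_id:...]
cat question.md | consult-llm -m gemini -f src/cache.rs

# Follow-up turn: pass the ID back to keep the prior context
cat follow_up.md | consult-llm -m gemini --thread-id a3f7c -f src/cache.rs
```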
> Ask codex what's the best caching strategy for our read-heavy API
⏺ Bash(cat

Keep .consult-llm.local.yaml out of version control, for example via a global gitignore (~/.gitignore_global). If you use workmux worktrees, symlink it into new worktrees automatically by adding it to your .workmux.yaml:
files:
  symlink:
    - .consult-llm.local.yaml

Scaffold the user config and set values:
consult-llm init-config
consult-llm config set default_model gemini
consult-llm config set gemini.backend gemini-cli
# Write to project config instead of user config:
consult-llm config set --project default_model openai
# Write to local project overrides (not committed):
consult-llm config set --local openai.backend codex-cli

Values are parsed as YAML, so booleans and lists work naturally:
consult-llm config set no_update_check true
consult-llm config set allowed_models '[gemini, openai]'

Example ~/.config/consult-llm/config.yaml:
default_model: gemini
gemini:
  backend: gemini-cli
openai:
  backend: codex-cli
  reasoning_effort: high
opencode:
  default_provider: copilot

API keys
API keys can be set in your user config, a project-local config file, or as environment variables. Environment variables take highest precedence.
User config (~/.config/consult-llm/config.yaml), applies everywhere:
openai:
  api_key: your_openai_key
gemini:
  api_key: your_gemini_key

Project-local config (.consult-llm.local.yaml in the repo root, gitignored), overrides the user config for that project:
openai:
  api_key: your_project_specific_key

API keys are not allowed in .consult-llm.yaml (the committed project config). The tool will refuse to load it and tell you to move the key to .consult-llm.local.yaml.
Environment variables (highest precedence, useful for CI):
OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY, MINIMAX_API_KEY
**direnv** is an alternative to .consult-llm.local.yaml for project-specific keys via environment variables. Add a .envrc in the repo root and direnv allow it, then put keys in a .env file (both gitignored):
# .envrc
dotenv

# .env
OPENAI_API_KEY=your_project_specific_key

direnv loads the variables automatically when you enter the directory and unloads them when you leave.
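Because environment variables take the highest precedence, you can also override a configured key for a single run; a minimal sketch (the key value is a placeholder):

```sh
echo "hello" | GEMINI_API_KEY=placeholder-key consult-llm -m gemini
```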
Custom system prompt
consult-llm init-prompt # scaffold ~/.config/consult-llm/SYSTEM_PROMPT.md

Override the path in config:
system_prompt_path: /path/to/project/.consult-llm/SYSTEM_PROMPT.md

All environment variables
Environment variables override config file values.
| Variable | Description | Allowed values | Default |
|---|---|---|---|
| OPENAI_API_KEY | OpenAI API key | | |
| GEMINI_API_KEY | Gemini API key | | |
| ANTHROPIC_API_KEY | Anthropic API key | | |
| DEEPSEEK_API_KEY | DeepSeek API key | | |
| MINIMAX_API_KEY | MiniMax API key | | |
| CONSULT_LLM_DEFAULT_MODEL | Model or selector to use when -m is omitted | selector or exact model ID | first available |
| CONSULT_LLM_GEMINI_BACKEND | Backend for Gemini models | api gemini-cli cursor-cli opencode | api |
| CONSULT_LLM_OPENAI_BACKEND | Backend for OpenAI models | api codex-cli cursor-cli opencode | api |
| CONSULT_LLM_DEEPSEEK_BACKEND | Backend for DeepSeek models | api opencode | api |
| CONSULT_LLM_MINIMAX_BACKEND | Backend for MiniMax models | api opencode | api |
| CONSULT_LLM_ANTHROPIC_BACKEND | Backend for Anthropic models | api | api |
| CONSULT_LLM_ALLOWED_MODELS | Comma-separated allowlist; restricts which models are enabled | model IDs | all |
| CONSULT_LLM_EXTRA_MODELS | Comma-separated extra model IDs to add to the catalog | model IDs | |
| CONSULT_LLM_CODEX_REASONING_EFFORT | Reasoning effort for Codex CLI backend | none minimal low medium high xhigh | high |
| CONSULT_LLM_CODEX_EXTRA_ARGS | Extra CLI args appended to codex exec (shell-quoted) | e.g. --dangerously-bypass-approvals-and-sandbox | |
| CONSULT_LLM_GEMINI_EXTRA_ARGS | Extra CLI args appended to gemini (shell-quoted) | shell-quoted args | |
| CONSULT_LLM_OPENCODE_PROVIDER | Default OpenCode provider prefix for all models | provider name | per-model default |
| CONSULT_LLM_OPENCODE_OPENAI_PROVIDER | OpenCode provider for OpenAI models | provider name | openai |
| CONSULT_LLM_OPENCODE_GEMINI_PROVIDER | OpenCode provider for Gemini models | provider name | google |
| CONSULT_LLM_OPENCODE_DEEPSEEK_PROVIDER | OpenCode provider for DeepSeek models | provider name | deepseek |
| CONSULT_LLM_OPENCODE_MINIMAX_PROVIDER | OpenCode provider for MiniMax models | provider name | minimax |
| CONSULT_LLM_SYSTEM_PROMPT_PATH | Path to a custom system prompt file | file path | ~/.config/consult-llm/SYSTEM_PROMPT.md |
| CONSULT_LLM_NO_UPDATE_CHECK | Disable background update checks | 1 true yes | |
Logging
All prompts and responses are logged to:
$XDG_STATE_HOME/consult-llm/consult-llm.log

Default: ~/.local/state/consult-llm/consult-llm.log
Each entry includes tool call arguments, the full prompt, the full response,
and token usage with cost estimates.
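To watch entries as they are written, you can simply follow the log file (assuming the default location shown above):

```sh
tail -f ~/.local/state/consult-llm/consult-llm.log
```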
Example log entry
[2025-06-22T20:16:04.675Z] PROMPT (model: deepseek-v4-pro):
## Relevant Files
### File: src/main.ts
...
Please provide specific suggestions for refactoring with example code structure
where helpful.
================================================================================
[2025-06-22T20:19:20.632Z] RESPONSE (model: deepseek-v4-pro):
Based on the analysis, here are the key refactoring suggestions to improve
separation of concerns and maintainability:
...
This refactoring maintains all existing functionality while significantly
improving maintainability and separation of concerns.
Tokens: 3440 input, 5880 output | Cost: $0.014769 (input: $0.001892, output: $0.012877)
================================================================================

Monitor
consult-llm-monitor is a real-time TUI for active runs and history.
consult-llm-monitor

It reads the per-run spool written by consult-llm, including active snapshots,
run metadata, event streams, and shared history.
Skills
Architecture
The skill system has two layers:
**consult-llm (base CLI)** handles the mechanics: reading stdin, attaching file context, calling the right backend, streaming the response, and managing thread IDs for multi-turn conversations. A dedicated consult-llm reference skill documents this contract and is loaded by other skills before they invoke the CLI.
Workflow skills compose on top. They gather context from the codebase, decide which models to call and how, and synthesize the results for you. When you run /consult or /debate, the agent reads a skill file that tells it how to orchestrate one or more consult-llm calls and what to do with the responses.
Invocation
When a workflow skill runs, the agent pipes the prompt via stdin and passes file context with -f:
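The shape of that call, sketched with hypothetical prompt and file names:

```sh
cat prompt.md | consult-llm -m gemini -f web/src/components/HintsText.tsx -f web/src/utils/hints.ts
```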
Workflow skills accept model selector flags matching the selectors reported by `consult-llm models` (e.g. `--gemini`, `--openai`, `--deepseek`). With no selector flag, the multi-model skills default to consulting all available selectors.
- [`consult`](skills/consult/SKILL.md): ask one or more external LLMs; any number of selector flags, plus `--browser` for clipboard/web mode
- [`collab`](skills/collab/SKILL.md): multiple LLMs brainstorm together, building on each other's ideas
- [`collab-vs`](skills/collab-vs/SKILL.md): the agent brainstorms with one partner LLM (one selector flag required) in alternating turns
- [`debate`](skills/debate/SKILL.md): multiple LLMs propose and critique competing approaches
- [`debate-vs`](skills/debate-vs/SKILL.md): the agent debates one opponent LLM (one selector flag required), then synthesizes the best answer
- [`panel`](skills/panel/SKILL.md): role-asymmetric advisory panel; each model speaks from one expert lens, agent synthesizes a trade-off resolution. The agent picks roles to fit the task (with a `--roles` override). Modes: `--mode design` (default) or `--mode review` for diff critique
- [`review-panel`](skills/review-panel/SKILL.md): standalone multi-model code review of a diff with identical prompts; agent dedupes findings by severity/confidence. Read-only by default; `--fix` opt-in for localized must-fix items
- [`implement`](skills/implement/SKILL.md): autonomous spec → plan → review → implement → red-team workflow. Evidence-gated reviewers, written feedback ledger, triggered debug loop, opt-in commits. Rigor knob: `--rigor lite|standard|deep`
- [`phased-implement`](skills/phased-implement/SKILL.md): coordinator that breaks a large task into a DAG of phases, each running `/implement` in its own [workmux](https://github.com/raine/workmux) worktree. Supports sequential, parallel, and mixed dependencies; per-phase merge with `/merge --keep` and ancestry verification; failure halts dependents. Requires `workmux`
- [`workshop`](skills/workshop/SKILL.md): interactive design session — agent clarifies the idea with the user, fans out to multiple LLMs in parallel for divergent approach generation, user picks one, then co-design with optional multi-LLM critique. Saves a design doc; hand it to `/implement` to build
See `skills/*/SKILL.md` for the exact prompts and invocation patterns.
## Updating

consult-llm update
This downloads the latest GitHub release, verifies its SHA-256 checksum, updates
`consult-llm`, and updates `consult-llm-monitor` if it lives alongside it.
## Migrating from MCP
If you previously used the MCP server version (`consult-llm-mcp` npm package):
1. **Install the CLI binary** (see [Quick Start](#quick-start)).
2. **Install skills** so your agent can call `consult-llm` for you: `consult-llm install-skills`
3. **Migrate your config.** Any env vars you set in the MCP `"env"` block can move to `~/.config/consult-llm/config.yaml`, including API keys.
For example, this MCP config in `~/.claude.json`:

"mcpServers": {
  "consult-llm": {
    "command": "npx",
    "args": ["-y", "consult-llm-mcp"],
    "env": {
      "CONSULT_LLM_GEMINI_BACKEND": "api",
      "CONSULT_LLM_OPENAI_BACKEND": "codex-cli",
      "CONSULT_LLM_CODEX_REASONING_EFFORT": "xhigh",
      "CONSULT_LLM_ALLOWED_MODELS": "gpt-5.4,gemini-3.1-pro-preview,MiniMax-M2.7",
      "CONSULT_LLM_MINIMAX_BACKEND": "opencode",
      "CONSULT_LLM_OPENCODE_MINIMAX_PROVIDER": "minimax"
    }
  }
}
becomes:

allowed_models: [gpt-5.4, gemini-3.1-pro-preview, MiniMax-M2.7]
gemini:
  backend: api
openai:
  backend: codex-cli
  reasoning_effort: xhigh
minimax:
  backend: opencode
  opencode_provider: minimax
Put this in `~/.config/consult-llm/config.yaml` for user-wide settings, or in `.consult-llm.yaml` at the project root if the settings were specific to that project.
4. **Remove the MCP server registration** from your Claude Code config (`~/.claude.json`):

"mcpServers": {
  // remove this entry:
  "consult-llm": { ... }
}
5. **Uninstall the npm package** if you installed it globally: `npm uninstall -g consult-llm-mcp`
## Development

git clone https://github.com/raine/consult-llm.git
cd consult-llm
just check
`just check` runs the standard local validation, including build and tests. Use `cargo build` or `cargo test` directly only when iterating on one step.
Try the local binary directly:

cat <<'EOF' | cargo run -- -m gemini
Sanity-check the local build and explain what this CLI does well.
EOF
## Releasing
See [RELEASE.md](RELEASE.md).
## Related Projects
- [workmux](https://github.com/raine/workmux)
- [claude-history](https://github.com/raine/claude-history)
- [tmux-file-picker](https://github.com/raine/tmux-file-picker)
- [tmux-agent-usage](https://github.com/raine/tmux-agent-usage)