Track MCP LogoTrack MCP
Track MCP LogoTrack MCP

The world's largest repository of Model Context Protocol servers. Discover, explore, and submit MCP tools.

Product

  • Categories
  • Top MCP
  • New & Updated
  • Submit MCP

Company

  • About

Legal

  • Privacy Policy
  • Terms of Service
  • Cookie Policy

© 2026 TrackMCP. All rights reserved.

Built with ❤️ by Krishna Goyal

    Locallama Mcp

    An MCP Server that works with Roo Code/Cline.Bot/Claude Desktop to optimize costs by intelligently routing coding tasks between local LLMs free APIs and paid APIs.

    40 stars
    TypeScript
    Updated Oct 6, 2025
    clinebot
    mcp-server
    roocode
    vscode

    Table of Contents

    • Overview
    • Requirements
    • Installation
    • Configuration
    • Key environment variables
    • MCP Client Configuration
    • Tools
    • Core tools (always available)
    • OpenRouter tools (require OPENROUTER_API_KEY)
    • Async task flow
    • Resources
    • Static resources
    • Resource templates
    • Usage
    • Starting the server
    • Running benchmarks
    • Dashboard
    • Live monitoring metadata
    • _server_reminder ambient metadata
    • Remote access
    • Provider integrations
    • Ollama
    • LM Studio
    • llama.cpp (llama-server)
    • OpenRouter
    • Code search
    • Development
    • Architecture
    • Project docs
    • Troubleshooting
    • Security notes
    • License

    Table of Contents

    • Overview
    • Requirements
    • Installation
    • Configuration
    • Key environment variables
    • MCP Client Configuration
    • Tools
    • Core tools (always available)
    • OpenRouter tools (require OPENROUTER_API_KEY)
    • Async task flow
    • Resources
    • Static resources
    • Resource templates
    • Usage
    • Starting the server
    • Running benchmarks
    • Dashboard
    • Live monitoring metadata
    • _server_reminder ambient metadata
    • Remote access
    • Provider integrations
    • Ollama
    • LM Studio
    • llama.cpp (llama-server)
    • OpenRouter
    • Code search
    • Development
    • Architecture
    • Project docs
    • Troubleshooting
    • Security notes
    • License

    Documentation

    LocalLama MCP Server

    Status: experimental

    Latest release

    License: ISC

    Local-first, provider-neutral Model Context Protocol server for coding-agent workflows. Routes tasks across local models (Ollama, LM Studio, llama.cpp), free OpenRouter models, and paid frontier models using cost, latency, context capacity, and benchmark history.

    Node.js: >=22

    ⚠️ Early / experimental — not yet a stable release. This project is under active, rapid development and has not been fully verified end-to-end. MCP tool signatures, configuration, and behavior may change between releases without notice.

    Version numbers follow SemVer mechanically (they're derived from Conventional Commit messages, not hand-picked), so a 1.x number signals only *"a public surface exists"* — it is not a promise of stability or completeness. If you depend on this server, pin to an exact version.

    - **Tagged releases on main** are the relatively safer builds.

    - **The testing channel** publishes bleeding-edge pre-releases (x.y.z-testing.n) for trying unproven changes early.

    Overview

    LocalLama MCP reduces token costs without sacrificing quality. Tasks are queued asynchronously — route_task returns a task_id immediately; callers poll get_task_status for results. The decision engine chooses local → free → paid based on measured provider capabilities and configurable thresholds.

    Supported MCP clients: Codex, Claude Code, Claw Code, Cursor, GitHub Copilot Agent mode, and any generic MCP stdio client.

    Requirements

    • Node.js 22+
    • npm
    • At least one of: Ollama, LM Studio, llama.cpp server, or an OpenRouter API key

    Installation

    bash
    git clone https://github.com/Heratiki/locallama-mcp.git
    cd locallama-mcp
    npm install
    npm run build

    Configuration

    Copy .env.example to .env and edit with your values. The server resolves .env from its own root directory (or LOCALLAMA_ROOT_DIR when set), not from the MCP host's CWD.

    env
    # Local LLM Endpoints
    LM_STUDIO_ENDPOINT=http://localhost:1234/v1
    OLLAMA_ENDPOINT=http://localhost:11434/api
    # LLAMA_CPP_ENDPOINT=http://localhost:8080   # leave unset to disable
    
    # Routing thresholds
    DEFAULT_LOCAL_MODEL=qwen2.5-coder-3b-instruct
    TOKEN_THRESHOLD=1500
    COST_THRESHOLD=0.02
    QUALITY_THRESHOLD=0.7
    
    # Provider concurrency
    PROVIDER_HEALTH_PROBE_INTERVAL_MS=60000
    PROVIDER_MAX_CONCURRENT_LOCAL=1
    PROVIDER_MAX_CONCURRENT_REMOTE=5
    PROVIDER_TIMEOUT_MS=120000
    OLLAMA_TIMEOUT=120
    
    # Code search (native BM25, no Python required)
    CODE_SEARCH_ENABLED=true
    CODE_SEARCH_EXCLUDE_PATTERNS=["node_modules/**","dist/**",".git/**"]
    CODE_SEARCH_INDEX_ON_START=true
    CODE_SEARCH_REINDEX_INTERVAL=3600
    
    # Benchmarks
    BENCHMARK_RUNS_PER_TASK=3
    BENCHMARK_PARALLEL=false
    BENCHMARK_MAX_PARALLEL_TASKS=2
    BENCHMARK_TASK_TIMEOUT=60000
    BENCHMARK_SAVE_RESULTS=true
    BENCHMARK_RESULTS_PATH=./benchmark-results
    RELIABLE_BENCHMARK_COUNT=3
    MIN_VALIDATOR_SCORE=0.6
    VALIDATION_RETRY_BUDGET=1
    
    # Lock file
    LOCK_FILE_CHECK_ACTIVE_PROCESS=true
    REMOVE_STALE_LOCK_FILES=true
    
    # OpenRouter (optional)
    OPENROUTER_API_KEY=your_openrouter_api_key_here
    OPENROUTER_FREE_ONLY=false
    
    # Logging
    LOG_LEVEL=debug
    
    # Operational testing
    # EXPECT_LOCAL_PROVIDER_DOWN=true

    Key environment variables

    VariableDefaultDescription
    LM_STUDIO_ENDPOINT—LM Studio API base URL
    OLLAMA_ENDPOINT—Ollama API base URL
    LLAMA_CPP_ENDPOINT—llama-server URL; leave unset to disable provider
    DEFAULT_LOCAL_MODEL—Model name used when offloading to local provider
    TOKEN_THRESHOLD1500Token count above which local offload is considered
    COST_THRESHOLD0.02USD cost above which local offload is preferred
    QUALITY_THRESHOLD0.7Quality score below which paid API is always used
    RELIABLE_BENCHMARK_COUNT3Benchmark runs required before empirical scores are treated as fully reliable
    MIN_VALIDATOR_SCORE0.6Minimum validation score required before a model is eligible for external validation
    VALIDATION_RETRY_BUDGET1Validation retry attempts allowed after an initial failed validation
    PROVIDER_MAX_CONCURRENT_LOCAL1Shared local execution slot count
    PROVIDER_MAX_CONCURRENT_REMOTE5Per-remote-provider slot count
    OPENROUTER_API_KEY—Enables OpenRouter provider and related tools
    OPENROUTER_FREE_ONLYfalseRestrict OpenRouter to free-tier models only
    EXPECT_LOCAL_PROVIDER_DOWN—Set true in test-operational.mjs to assert no local suggestion

    MCP Client Configuration

    Build the server, then point your MCP client at node dist/index.js:

    json
    {
      "mcpServers": {
        "locallama": {
          "command": "node",
          "args": ["/path/to/locallama-mcp/dist/index.js"],
          "env": {
            "LM_STUDIO_ENDPOINT": "http://localhost:1234/v1",
            "OLLAMA_ENDPOINT": "http://localhost:11434/api",
            "DEFAULT_LOCAL_MODEL": "qwen2.5-coder-3b-instruct",
            "TOKEN_THRESHOLD": "1500",
            "COST_THRESHOLD": "0.02",
            "QUALITY_THRESHOLD": "0.07",
            "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
          }
        }
      }
    }

    Claude Code users can place this in .mcp.json (project-scoped) or ~/.claude/settings.json (global).

    Tools

    Core tools (always available)

    ToolInputsDescription
    route_tasktask, context_length, expected_output_length?, complexity?, priority?, preemptive?Queue a task asynchronously. Returns task_id immediately. Poll get_task_status for results.
    get_task_statustask_idPoll a non-blocking route_task submission. Returns status, progress, and inline result when complete.
    cancel_tasktask_idCancel all queued or in-progress jobs for a task.
    cancel_jobjob_idCancel a single background job.
    preemptive_route_tasktask, context_length, expected_output_length?, complexity?, priority?Heuristic routing check with no LLM calls. Returns model/provider recommendation without executing the task.
    get_cost_estimatecontext_length, expected_output_length?, model?Estimate USD cost before calling route_task. Local and free-tier models return 0.
    benchmark_tasktask_id, task, context_length, expected_output_length?, complexity?, local_model?, paid_model?, runs_per_task?Benchmark one task across local vs paid models.
    benchmark_taskstasks[], runs_per_task?, parallel?, max_parallel_tasks?Benchmark multiple tasks in one call.
    benchmark_modelmodel_id, provider_id?, task_categories?Run built-in benchmark suites against a specific model. Persists results to benchmarks.db and updates ModelRegistry capability scores.
    retriv_initdirectories[], exclude_patterns?, chunk_size?, force_reindex?, bm25_options?Index code with the native BM25 engine (no Python required).
    retriv_searchquery, limit?Search indexed code using native BM25.
    reload_config—Reload .env at runtime. Atomic: invalid config is rejected.
    check_for_updates—Check whether the server is up to date with the latest GitHub commit.
    update_server—Pull latest changes from GitHub, run npm install and npm run build. Restart the server manually after.

    OpenRouter tools (require OPENROUTER_API_KEY)

    ToolInputsDescription
    get_free_models—List free models available from OpenRouter.
    clear_openrouter_tracking—Clear cached model list and force a fresh fetch.
    benchmark_free_modelstasks[], runs_per_task?, parallel?, max_parallel_tasks?Benchmark free OpenRouter models. Results written to benchmarks.db.
    set_model_prompting_strategymodel_id, system_prompt, user_prompt, use_chat, assistant_prompt?, success_rate?, quality_score?Set a custom prompting strategy for an OpenRouter model.

    Async task flow

    code
    route_task → { task_id }
                    ↓ poll
    get_task_status → { status: "pending" | "in_progress" | "completed" | "failed", result? }

    When local providers are contended by benchmark workloads, route_task surfaces contention metadata:

    json
    {
      "task_id": "...",
      "status": "queued",
      "queue_position": 2,
      "benchmark_contention": {
        "local_slot_contended": true,
        "active_benchmark_runs": 1,
        "queued_benchmark_runs": 2,
        "message": "Local execution slot currently contended by benchmark workloads."
      }
    }

    Resources

    Static resources

    URIDescription
    locallama://statusServer status
    locallama://modelsAvailable local models
    locallama://jobs/activeCurrently active jobs
    locallama://memory-bankMemory bank file list (if directory exists)
    locallama://openrouter/modelsAll OpenRouter models (requires API key)
    locallama://openrouter/free-modelsFree OpenRouter models (requires API key)
    locallama://openrouter/statusOpenRouter integration status (requires API key)

    Resource templates

    URI templateDescription
    locallama://usage/{api}Token usage and costs for a specific API (e.g. openrouter)
    locallama://jobs/progress/{jobId}Progress for a specific job
    locallama://openrouter/model/{modelId}Details for an OpenRouter model (requires API key)
    locallama://openrouter/prompting-strategy/{modelId}Prompting strategy for an OpenRouter model (requires API key)

    Usage

    Starting the server

    bash
    npm start

    A lock file prevents multiple instances. Stale locks from crashed processes are detected and cleaned up automatically.

    Running benchmarks

    bash
    npm run benchmark
    npm run benchmark:comprehensive

    Results are stored in benchmark-results/ as JSON and Markdown summaries.

    Dashboard

    When the server is running, a web dashboard is available at http://localhost:3001 (server-local).

    Features:

    • Real-time job queue with status, provider/model, and queue position
    • Task monitoring with per-job details and ETA
    • Manual route_task submission form
    • Task and job cancellation
    • Benchmark history

    REST API endpoints:

    MethodPathDescription
    GET/api/queueQueue summary and jobs. Filters: status, provider, model, task_id, q, page, page_size
    GET/api/tasksRecent tasks. Filters: status, provider, model, q, page, page_size
    GET/api/tasks/:taskIdDetailed task status
    POST/api/tasksSubmit a task (route_task)
    POST/api/tasks/:taskId/cancelCancel a task
    POST/api/jobs/:jobId/cancelCancel a job

    Example submission:

    bash
    curl -X POST http://localhost:3001/api/tasks \
      -H "Content-Type: application/json" \
      -d '{"task": "Refactor parser for readability", "context_length": 4096, "complexity": 0.6, "priority": "quality"}'

    Live monitoring metadata

    When the JobTracker WebSocket server is running, task-executing tools include:

    json
    {
      "task_id": "task-123",
      "monitoring": {
        "websocketUrl": "ws://127.0.0.1:8081",
        "activeJobsUri": "locallama://jobs/active",
        "jobProgressUriTemplate": "locallama://jobs/progress/{jobId}",
        "note": "Connect to websocketUrl for live updates, or use MCP resources."
      }
    }

    websocketUrl is scope: server-local — in SSH/container/Codespaces/WSL setups, forward the port before connecting.

    _server_reminder ambient metadata

    Tools attach a _server_reminder field at most once every 30 minutes to surface monitoring info:

    json
    {
      "_server_reminder": {
        "schemaVersion": 1,
        "kind": "monitoring-reminder",
        "status": "reachable",
        "scope": "server-local",
        "message": "Optional monitoring available from MCP server host.",
        "monitoringUrl": "http://127.0.0.1:3001",
        "lastCheckedAt": 1747699200000
      }
    }

    Remote access

    If your MCP client is not on the same machine as the server:

    bash
    # SSH
    ssh -L 8081:127.0.0.1:8081 -L 3001:127.0.0.1:3001 user@host
    • Dev Containers / Codespaces: forward ports 8081 (WebSocket) and 3001 (dashboard) via the VS Code Ports view.
    • WSL client + WSL server: use the WebSocket URL directly. Windows client + WSL server: forward port 8081 via VS Code or a local tunnel.

    Provider integrations

    Ollama

    Set OLLAMA_ENDPOINT in .env. The server probes for available models on startup.

    LM Studio

    Set LM_STUDIO_ENDPOINT in .env. Exposes an OpenAI-compatible API.

    llama.cpp (llama-server)

    bash
    # Single model
    llama-server -m /path/to/model.gguf --port 8080
    
    # Router mode (multiple models)
    llama-server --model /path/model1.gguf --model /path/model2.gguf --port 8080

    Set LLAMA_CPP_ENDPOINT=http://localhost:8080 in .env. If the endpoint is unset or unreachable, the provider initialises silently — other providers are unaffected. The server does not manage the llama-server process lifecycle.

    OpenRouter

    Set OPENROUTER_API_KEY. The server fetches ~240 available models on startup (30+ free). Use clear_openrouter_tracking to force a refresh. Set OPENROUTER_FREE_ONLY=true to restrict to free-tier models.

    Code search

    Code search uses a native TypeScript BM25 engine — no Python or external dependencies required.

    code
    # Via MCP tool
    retriv_init { "directories": ["/path/to/repo"], "force_reindex": true }
    retriv_search { "query": "pagination logic" }

    Development

    bash
    npm run build        # compile TypeScript + copy assets
    npm start            # run compiled server
    npm run dev          # TypeScript watch mode
    npm test             # build + run Jest (23 suites, 186 tests)
    npm run lint         # ESLint (note: eslint-plugin-import not installed — lint currently fails)
    npm run lint:fix     # ESLint with auto-fix

    All test files mock server state to prevent multiple real instances during test runs.

    Architecture

    code
    src/
      index.ts                        entry point, lock file, MCP lifecycle
      modules/
        api-integration/              tool definitions, resources, routing adapters
        decision-engine/              task analysis, model selection, coordination
        cost-monitor/                 token accounting, cost estimation
        benchmark/                    execution, scoring, summaries, DB storage
        lm-studio/                    LM Studio provider
        ollama/                       Ollama provider
        llama-cpp/                    llama-server provider
        openrouter/                   OpenRouter provider
        core/provider/                shared provider registry and execution queue
        updater/                      self-update logic (check_for_updates, update_server)
        job-store/                    persistent Task/Job store
        websocket-server/             live monitoring side channel

    Decision engine uses two model data stores:

    • ModelRegistry + CapabilityDetector: benchmark-derived capability scores (authoritative for full routing)
    • modelsDbService: heuristic performance data seeded from ModelRegistry at startup; used by preemptiveRouting()

    Project docs

    FilePurpose
    docs/AGENTS.mdShared operating guide for all coding agents
    docs/PROJECT_STATE.mdCurrent snapshot of completed and in-progress work
    docs/ROADMAP.mdLong-form modernization backdrop
    docs/ROADMAP_ACTIVE.mdActive roadmap tasks
    docs/PLAN.mdBranch implementation plan
    docs/OPERATIONAL_TEST_PLAN.mdLive test record and verified behavior
    docs/LIVE_TESTING.mdReal-world MCP test results and known open bugs
    docs/audits/ARCHITECTURAL_TRUTHS.mdCore design principles and constraints
    docs/history/memory-bank/Historical append-only project memory

    Troubleshooting

    Server won't start — lock file detected

    1. Check if another instance is running (ps aux | grep locallama).

    2. Stale locks from crashes are cleaned up automatically (REMOVE_STALE_LOCK_FILES=true).

    3. If needed, manually remove locallama.lock from the project root.

    OpenRouter models not appearing

    Use clear_openrouter_tracking through the MCP interface to force a fresh fetch.

    **npm run lint fails**

    eslint-plugin-import is referenced in the config but not installed. Known issue. Build and tests are unaffected.

    Security notes

    • API keys belong in .env, which is excluded from version control.
    • All log output goes to stderr; stdout is reserved for MCP JSON-RPC. Never write non-JSON to stdout.
    • Treat MCP tools as model-controlled surfaces. Avoid mutations without user approval.

    License

    ISC

    Similar MCP

    Based on tags & features

    • MC

      Mcp Open Library

      TypeScript·
      42
    • ME

      Metmuseum Mcp

      TypeScript·
      14
    • MC

      Mcp Ipfs

      TypeScript·
      11
    • LI

      Liveblocks Mcp Server

      TypeScript·
      11

    Trending MCP

    Most active this week

    • PL

      Playwright Mcp

      TypeScript·
      22.1k
    • SE

      Serena

      Python·
      14.5k
    • MC

      Mcp Playwright

      TypeScript·
      4.9k
    • MC

      Mcp Server Cloudflare

      TypeScript·
      3.0k
    View All MCP Servers

    Similar MCP

    Based on tags & features

    • MC

      Mcp Open Library

      TypeScript·
      42
    • ME

      Metmuseum Mcp

      TypeScript·
      14
    • MC

      Mcp Ipfs

      TypeScript·
      11
    • LI

      Liveblocks Mcp Server

      TypeScript·
      11

    Trending MCP

    Most active this week

    • PL

      Playwright Mcp

      TypeScript·
      22.1k
    • SE

      Serena

      Python·
      14.5k
    • MC

      Mcp Playwright

      TypeScript·
      4.9k
    • MC

      Mcp Server Cloudflare

      TypeScript·
      3.0k