    Knowledge Base MCP Server

    This MCP server provides tools for listing and retrieving content from different knowledge bases.

    31 stars
    TypeScript
    Updated Oct 30, 2025

    Table of Contents

    • Demo
    • Setup Instructions
    • Install (one command)
    • Install (CLI alongside the MCP server, RFC 012)
    • Research evidence packets for agents
    • Local retrieval benchmarks
    • Local LLM answers (RFC 015)
    • Comparing embedding models (RFC 013)
    • MCP error codes
    • Install via Smithery
    • Install from source
    • Install (local development, live kb from your checkout)
    • Usage
    • Semantic Search Results
    • Remote transport (optional)
    • Troubleshooting & Logging
    • KB availability smoke check
    • Distinguishing search failure modes
    • Other tips
    • Security

    Demo

    kb CLI demo — list knowledge bases, ask a natural-language question, get scored markdown results

    Real output, no mock data: the capture drives the kb CLI against a small knowledge base seeded from this repo's own docs/, indexed with the default Ollama embedding model. Regenerate it with [docs/assets/record-demo.sh](docs/assets/record-demo.sh).

    Setup Instructions

    These instructions assume you have Node.js (version 20 or higher) and npm installed on your system.

    Install (one command)

    bash
    npx -y @jeanibarz/knowledge-base-mcp-server@latest

    npx fetches the package from npm and launches the stdio server. Point your MCP client at npx -y @jeanibarz/knowledge-base-mcp-server@latest and configure the environment variables documented below. See docs/clients.md for copy-pasteable snippets (Claude Desktop, Codex CLI, Cursor, Continue, Cline).

    **Pin @latest, not the unversioned spec.** npx -y @jeanibarz/knowledge-base-mcp-server (no version) caches the resolved version in ~/.npm/_npx/ indefinitely — subsequent client launches reuse that cached version even after a new release ships. The @latest form hashes to a different cache key and re-resolves on every launch, so new fixes arrive on the next client restart instead of requiring a manual ~/.npm/_npx/ clear. See RFC 012 §2.4.

    Install (CLI alongside the MCP server, RFC 012)

    For an interactive shell or AI-agent shell-tool flow, install globally and use the kb bin directly. The OS resolves the binary on every invocation, so npm i -g …@latest is picked up without restarting any AI client that has the MCP server loaded:

    bash
    npm install -g @jeanibarz/knowledge-base-mcp-server@latest
    kb list                       # list available knowledge bases
    kb stats                      # read-only index/corpus stats
    kb search "your query"                       # read-only dense search
    kb search "your query" --timing              # include retrieval-stage timings
    kb search "your query" --format=compact      # one-line-per-hit operator table (#446)
    printf '{"query":"q1"}\n{"query":"q2"}\n' | kb search --batch-jsonl  # batched JSONL stdin (#440)
    kb search "query" --refresh                  # also re-scan KB files (write path)
    kb search "query" --explain-empty            # opt-in deep diagnostics when results are empty (#328)
    kb search "INDEX_NOT_INITIALIZED" --mode=lexical --refresh   # BM25 debug surface (#206 stage 1)
    kb search "retrieval benchmarks" --mode=lexical --lexical-unit=source  # source-level BM25
    kb search "INDEX_NOT_INITIALIZED" --mode=hybrid              # dense ⨁ BM25 fused via RRF (#206 stage 2)
    kb search "src/cli-search.ts" --mode=auto    # opt-in heuristic: hybrid for code/path/error-shaped queries
    kb search "runbook rollback" --context-window=1  # include adjacent chunks around dense hits
    kb search "agent evidence" --diverse --format=json           # source-aware representative sampling
    kb search "agent evidence" --anti-query="frontend styling"   # contrastive, positive-support constrained
    kb search "queue debt" --plus="slow loop" --minus="UI layout" --format=json
    kb open alpha/docs/deploy.md#L42-L78         # resolve a chunk id / kb:// URI / result path to its source file
    kb related alpha/docs/deploy.md#L42-L78      # find dense neighbors from an existing result chunk
    kb llm use-endpoint http://127.0.0.1:8080/v1/chat/completions --profile=local-research-agent
    kb ask "what changed in the daemonization notes?" --timing   # retrieval + local LLM answer with timings
    kb ask "what changed?" --kb=work --save-transcript --title="Ask - daemon changes" --yes
    kb research plan "autonomous research agents and evals" --format=json
    kb research collect "autonomous research agents and evals" --run-dir runs/agents --format=json
    kb remember --suggest --kb=work --title="Quarterly plan"
    printf '# Quarterly plan\n\n...' | kb remember --kb=work --title="Quarterly plan" --stdin --yes
    printf '\nFollow-up note.\n' | kb remember --kb=work --append=quarterly-plan.md --stdin --yes
    kb import-url --kb=research https://example.com/article   # snapshot a URL into a provenance-tagged note
    kb superseded --kb=work       # read-only review for obsolete/contradicted notes
    kb feedback add --kb=work --query="rollback procedure" --source=runbooks/deploy.md --verdict=relevant
    kb feedback promote --kb=work --query="rollback procedure" --fixture=docs/testing/feedback-fixture.yml --yes  # promote ledger entries into an eval fixture
    kb eval retrieval-eval.yml     # run fixture-driven retrieval checks
    kb eval-gate docs/testing/fixtures/rfc-018-gate-eval/queries.yml  # RFC 018 gate validation harness
    kb reindex --with-context     # rebuild the FAISS index with RFC 017 contextual prefaces
    kb reindex status --format=json  # ledger of recent / in-flight reindex passes (#417)
    kb logs show --request-id=     # read canonical request logs by id (#397)
    kb logs recent --limit=20 --format=json  # most recent canonical log entries
    kb serve                      # start the loopback CLI daemon (warm reads); --port=17799 --idle-timeout-ms=300000
    kb serve status               # daemon liveness + degraded-mode diagnostics (#420)
    kb config validate            # static env-var schema validation before startup
    kb doctor                     # availability snapshot (index, embedding backend, LLM)
    kb doctor --endpoints         # focused MCP/daemon/Ollama/LLM endpoint preflight
    kb doctor --locks             # write-lock owner and stale-lock diagnosis
    kb doctor --kb-symlinks       # inventory KB-root symlinks and escaping targets
    kb doctor --bug-report=/tmp   # write a redacted support bundle for issue reports
    kb --help                     # top-level command list
    kb help search                # per-command help (also: kb search --help)
    kb completion bash            # generate a bash shell completion script

    The kb bin shares the same env vars as the MCP server (KNOWLEDGE_BASES_ROOT_DIR, FAISS_INDEX_PATH, EMBEDDING_PROVIDER, OLLAMA_*, OPENAI_*, HUGGINGFACE_*). The consolidated operator matrix for retrieval flags, defaults, per-call overrides, rollout status, and validation commands lives in docs/feature-flags.md.

    kb stats [--kb=] [--format=md|json] mirrors the MCP kb_stats payload for local shell use: per-KB file/chunk/byte counts, last indexed time, embedding model, index path, version context, filesystem enumeration failure diagnostics, contextual-preface cache/failure counters (when RFC 017 ingest is enabled), and remote-transport request/auth/backoff counters (when the HTTP or SSE transport is active). It is read-only and does not refresh the index.

    kb search also defaults to read-only dense retrieval: it loads the existing FAISS index but does not re-scan KB files. Pass --refresh to re-index. Use --mode=hybrid for explicit dense+BM25 rank fusion, or --mode=auto to keep dense for prose queries while selecting hybrid for code, path, flag, error-code, and issue-reference shaped queries. Dense-only neighbor context flags (--context-before, --context-after, --context-window) attach adjacent chunks from the same source around each ranked semantic match; see docs/search-neighbor-context.md for examples, JSON shape, and tradeoffs.

    Exploration operators stay additive and read-only: --diverse reranks a bounded dense candidate pool for source-aware representative coverage; --anti-query= penalizes candidates close to a negative query but only among positively supported candidates; --plus= and --minus= add vector-composition-style positive and negative query components. JSON output includes an advanced_retrieval explanation block with mode, constraints, query components, and per-result scoring signals.

    Add --timing to kb search or kb ask when you need per-stage elapsed milliseconds in either markdown or JSON output. --format=compact collapses each result to a single score|kb|path:line line for terse operator listings; --batch-jsonl reads {"query":"…","kb":"…","k":N} records from stdin and emits one JSON result envelope per line. Search output includes a freshness footer indicating whether the index is up-to-date relative to KB file mtimes.
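
The --batch-jsonl input described above is plain JSON Lines, so an agent can build it programmatically. The following is a minimal sketch, assuming kb and k are optional per-record fields (the README example passes only query); the BatchQuery type and toBatchJsonl helper are hypothetical names, not part of the package.

```typescript
// Shape of one --batch-jsonl input record as described in the README.
// Assumption: only `query` is required; `kb` and `k` are optional.
interface BatchQuery {
  query: string;
  kb?: string;
  k?: number;
}

// Serialize records as JSON Lines: one compact JSON object per line,
// each newline-terminated so the final record is a complete line.
function toBatchJsonl(records: BatchQuery[]): string {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

// Example payload that could be piped to `kb search --batch-jsonl`.
const batchInput: string = toBatchJsonl([
  { query: "rollback procedure", kb: "work", k: 5 },
  { query: "reboot recovery" },
]);
```

The resulting string can be written to the stdin of kb search --batch-jsonl; each output line is then one JSON result envelope.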

    Research evidence packets for agents

    kb research is a read-only workflow for agent shells that need a broad evidence pass before writing an answer, RFC, eval plan, or issue. It does not call an LLM, trigger local-research-agent, refresh indexes, or write KB notes.

    Run plan first to inspect the deterministic shelf/query plan, then run collect with a run directory:

    bash
    kb research plan "synthesize an end-to-end approach for autonomous research agents and evals" --format=json
    kb research collect "synthesize an end-to-end approach for autonomous research agents and evals" \
      --run-dir /tmp/kb-research-autonomous-agents-evals-20260521 \
      --format=json

    After collect, read the generated evidence_packet.md and synthesize manually. The run directory also contains run.json, plan.json, ledger.json, and events.jsonl; ledger.json stays lossless for audit/debug use, while evidence_packet.md is the human-scannable packet. The JSON contract and stable artifact fields are documented in [docs/cli-json-contracts.md](docs/cli-json-contracts.md#kb-research); a longer operator walk-through (when to use it, how to read the packet, downstream kb ask + kb feedback plumbing) lives in [docs/operations/research-workflow.md](docs/operations/research-workflow.md).

    For local day-two operations with Ollama, llama-server, n8n, systemd user units, remote MCP transports, or kb serve, see the local service operations runbook. For active incidents, start with the symptom-keyed incident response runbook.

    RFC 018 relevance gating is off by default. Enable it per process with KB_RELEVANCE_GATE=on, or per CLI call with kb search --gate; bypass an enabled process with --no-gate or MCP gate: "off". The judge uses --task-context= / --task-context-file= or MCP task_context, and reads KB_GATE_LLM_ENDPOINT / KB_GATE_LLM_MODEL (falling back to KB_LLM_ENDPOINT / KB_LLM_MODEL). Tuning env vars are KB_GATE_SCORE_FLOOR (default 0.95), KB_GATE_JUDGE_INPUT (default 10), KB_GATE_LLM_TIMEOUT_MS (default 8000), and KB_GATE_MIN_TASK_TOKENS (default 8). KB_GATE_EMPTY_VERDICT defaults to off; turn it on only when you are comfortable letting the gate return no retrieved context.
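
To make the defaults and fallbacks above concrete, here is a small sketch of how a wrapper script might read the gate settings from an environment map. It is illustrative only: readGateConfig, numberOr, and the GateConfig shape are hypothetical, and falling back to the default on an unparsable number is an assumption (the server may reject such values instead).

```typescript
type Env = Record<string, string | undefined>;

interface GateConfig {
  enabled: boolean;
  scoreFloor: number;
  judgeInput: number;
  llmTimeoutMs: number;
  minTaskTokens: number;
  emptyVerdict: boolean;
  llmEndpoint: string | undefined;
  llmModel: string | undefined;
}

// Parse a numeric variable; use the documented default when the variable is
// unset, empty, or not a finite number (assumed behavior, see lead-in).
function numberOr(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function readGateConfig(env: Env): GateConfig {
  return {
    enabled: env["KB_RELEVANCE_GATE"] === "on", // off by default
    scoreFloor: numberOr(env, "KB_GATE_SCORE_FLOOR", 0.95),
    judgeInput: numberOr(env, "KB_GATE_JUDGE_INPUT", 10),
    llmTimeoutMs: numberOr(env, "KB_GATE_LLM_TIMEOUT_MS", 8000),
    minTaskTokens: numberOr(env, "KB_GATE_MIN_TASK_TOKENS", 8),
    emptyVerdict: env["KB_GATE_EMPTY_VERDICT"] === "on", // off by default
    // Gate-specific variables win; the general LLM variables are the fallback.
    llmEndpoint: env["KB_GATE_LLM_ENDPOINT"] ?? env["KB_LLM_ENDPOINT"],
    llmModel: env["KB_GATE_LLM_MODEL"] ?? env["KB_LLM_MODEL"],
  };
}
```

Passing the environment as a plain map keeps the function easy to test; a caller would typically hand it the process environment.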

    kb search --format=vimgrep prints one quickfix-compatible line per result: path:line:col:preview. JSON results include a stable chunk_id such as alpha/docs/deploy.md#L42-L78 when chunk metadata has a KB, path, and line range; chunks without line metadata fall back to #chunk-N. Set KB_EDITOR_URI=vscode, cursor, or file to add opt-in absolute-path editor_uri fields and markdown Open links. The default KB_EDITOR_URI=none omits local absolute paths. kb open resolves any of those pointers back to the absolute source path, validated against the KB root; it is read-only and prints the path (add --json for the cited line range and an editorUri). kb related retrieves an indexed seed chunk from that handle, runs dense search with the seed text, and excludes the seed chunk unless --include-self is passed.
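
A consumer of the JSON output may want to split a chunk_id back into its parts. The sketch below assumes the id is laid out as kb/path#Lstart-Lend (matching the alpha/docs/deploy.md#L42-L78 example) or kb/path#chunk-N for the fallback, with the first path segment being the KB name; parseChunkId and ChunkRef are hypothetical names, and kb open remains the supported way to resolve these handles.

```typescript
type ChunkRef =
  | { kb: string; path: string; kind: "lines"; start: number; end: number }
  | { kb: string; path: string; kind: "chunk"; index: number };

// Split "kb/path#fragment" into its parts. Returns null when the id does not
// match either documented fragment form.
function parseChunkId(id: string): ChunkRef | null {
  const hash = id.lastIndexOf("#");
  if (hash < 0) return null;
  const location = id.slice(0, hash);
  const fragment = id.slice(hash + 1);

  // Assumption: the KB name is the first path segment.
  const slash = location.indexOf("/");
  if (slash <= 0 || slash === location.length - 1) return null;
  const kb = location.slice(0, slash);
  const path = location.slice(slash + 1);

  const lines = /^L(\d+)-L(\d+)$/.exec(fragment);
  if (lines) {
    return { kb, path, kind: "lines", start: Number(lines[1]), end: Number(lines[2]) };
  }
  const chunk = /^chunk-(\d+)$/.exec(fragment);
  if (chunk) {
    return { kb, path, kind: "chunk", index: Number(chunk[1]) };
  }
  return null;
}
```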

    kb remember is a conservative CLI write path for agent shells. --suggest lists likely existing targets from note filenames/headings, does not read stdin or write notes, and may update a small .index heading cache. Creates and appends require both --stdin and --yes; create uses a slugified .md filename and refuses overwrites, while append accepts only existing KB-relative paths. Plain EOF appends and kb capture --append serialize per target and commit through a temp-file fsync plus atomic rename. kb capture redacts common credentials from captured stdout and the displayed command line by default; pass --no-redact only when raw output is required. Add --refresh to re-index the affected KB after a successful write. For machine-readable command shapes, see [docs/cli-json-contracts.md](docs/cli-json-contracts.md).

    kb import-url --kb= snapshots a web page or PDF into one new KB note while preserving provenance. It fetches the URL over http(s), routes HTML and PDF responses through the same loaders the indexer uses, and writes a note whose YAML frontmatter records source_url, fetched_at, content_sha256, content_type, http_status, and byte_count. The fetch is guarded: only http/https schemes are allowed, redirects are followed manually and re-validated per hop (--max-redirects, default 5), responses are size-capped (--max-bytes, default 8 MiB) and time-bounded (--timeout, default 30000 ms), and private/loopback/link-local addresses are refused unless --allow-local-network is passed (SSRF guard). The note path defaults to a slug of the page title; pass --note= to choose it, --title to override the title, and --refresh to re-index afterwards. It refuses to overwrite an existing note.
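
The guard described above combines a scheme allowlist with an address check. The following sketch shows the general shape of such a check for IPv4 literals only; it is not the project's implementation. A real SSRF guard must also resolve hostnames, handle IPv6, and repeat the check on every redirect hop, as the paragraph above notes. The function names are hypothetical.

```typescript
// Allow only http and https URLs, matching the documented scheme restriction.
function isAllowedScheme(url: string): boolean {
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(url);
  if (!match) return false;
  const scheme = (match[1] ?? "").toLowerCase();
  return scheme === "http" || scheme === "https";
}

// Parse a dotted-quad IPv4 literal into four octets, or null if malformed.
function parseIpv4(host: string): [number, number, number, number] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return null;
  const a = Number(m[1]);
  const b = Number(m[2]);
  const c = Number(m[3]);
  const d = Number(m[4]);
  if (a > 255 || b > 255 || c > 255 || d > 255) return null;
  return [a, b, c, d];
}

// True for loopback (127/8), private (10/8, 172.16/12, 192.168/16),
// and link-local (169.254/16) IPv4 literals.
function isLocalNetworkIpv4(host: string): boolean {
  const octets = parseIpv4(host);
  if (octets === null) return false;
  const [a, b] = octets;
  if (a === 127) return true;
  if (a === 10) return true;
  if (a === 172 && b >= 16 && b <= 31) return true;
  if (a === 192 && b === 168) return true;
  if (a === 169 && b === 254) return true;
  return false;
}
```

Because the sketch returns false for anything that is not an IPv4 literal, it must never be used alone: a hostname that resolves to 127.0.0.1 would pass it.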

    kb superseded --kb= is a read-only active-forgetting review. It scans markdown frontmatter for explicit contradiction, deprecated/dormant lifecycle status, stale verification dates, and low-confidence active notes, then uses the existing semantic index to add conservative newer-neighbor evidence when available. Use --format=json for agent workflows and --include-clean when you need a full inventory.

    kb feedback add --kb= --query="" --source= --verdict=relevant|irrelevant|stale|misleading [--relevance=0..3] [--chunk-id=] appends a relevance judgment to the per-KB ledger (/.index/relevance-feedback.jsonl). kb feedback list --kb= reviews recent entries, and kb feedback promote --kb= --query="" --fixture= --yes materialises every ledger row for a query into a retrieval-eval case so accumulated judgments become regression coverage. See [docs/operations/feedback-workflow.md](docs/operations/feedback-workflow.md).

    kb eval-gate runs the RFC 018 relevance-gate validation harness end-to-end against a labelled-queries fixture (Stage A statistical floor + optional Stage B LLM judge) and reports per-stage precision, recall, and false-empty rate. Use it when you change KB_GATE_* tuning or the judge prompt before promoting changes to production. See [docs/operations/eval-gate-harness.md](docs/operations/eval-gate-harness.md).

    kb logs is the canonical reader for the structured request log emitted under KB_LOG_FORMAT=canonical or both. Use kb logs show --request-id= to pull every line of a specific retrieval, kb logs show --query-sha= to follow recurring queries, and kb logs recent --limit= for the most recent entries. --format=json produces one line per record for downstream tooling.

    kb serve runs the loopback CLI daemon used by clients that pass --daemon for warm reads. The bare kb serve [--host=127.0.0.1] [--port=17799] [--idle-timeout-ms=300000] brings it up; kb serve status [--json] reports reachability, pid, idle timeout, and supported commands at the configured KB_DAEMON_URL (defaults to http://127.0.0.1:17799); SIGINT or SIGTERM stops it. CLI commands fall back to direct in-process execution when the daemon is unavailable. See [docs/operations/daemon-lifecycle.md](docs/operations/daemon-lifecycle.md).

    kb reindex --with-context rebuilds the FAISS index with RFC 017 contextual prefaces (requires KB_CONTEXTUAL_RETRIEVAL=on and a KB_LLM_ENDPOINT). kb reindex status reads the .reindex.run.json ledger and reports the current or most recent run (kb scope, model, started/finished timestamps, chunks processed, cache hit rate, and any failure code). The --kb= flag is a guard/estimator hint only; the rebuild always covers the entire single-index-per-model FAISS layout (see RFC 017 §5).

    Set KB_INGEST_SECRET_SCAN=on to scan chunks and frontmatter metadata for credential-shaped content before they are embedded. Hits are quarantined with reason secret_detected, skipped before FAISS writes, and can be inspected with kb quarantine list --reason=secret_detected; use KB_SECRET_SCAN_BYPASS_KBS=trusted-kb,... only for shelves that intentionally store credential examples. See Ingest Secret Scan.

    Retrieved chunks are passed through the untrusted-content guard (KB_INJECTION_GUARD, default tag). The guard scans chunk text for prompt-injection signals — system-role markers, instruction-override phrases, Unicode bidi/zero-width/tag controls — and either annotates metadata (tag), fences the content with sentinel strings (wrap), or both (both). Set KB_INJECTION_GUARD=off only on shelves you fully control, or use KB_INJECTION_GUARD_BYPASS_KBS=trusted-kb,... to opt specific shelves out without weakening the global default.

    kb eval runs retrieval checks from fixtures. Each case can set query, optional kb, required_sources, forbidden_sources, expected_metadata, max_duplicate_groups, stale_policy, and gate. Failing ungated cases print warnings and exit 0; failing gated cases exit 1 for CI.

    yaml
    gate: false
    cases:
      - name: deployment runbook
        query: rollback procedure
        kb: work
        gate: true
        required_sources: [runbooks/deploy.md]
        forbidden_sources: [archive/old-deploy.md]
        expected_metadata:
          frontmatter.status: approved
        max_duplicate_groups: 1
        stale_policy: fresh
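
The exit-code rule for kb eval (ungated failures warn and exit 0, gated failures exit 1) can be sketched as follows. This assumes the top-level gate value is a default that each case may override, which is how the fixture above reads; CaseResult and evalExitCode are hypothetical names, not the tool's internals.

```typescript
interface CaseResult {
  name: string;
  passed: boolean;
  gate?: boolean; // per-case setting; assumed to override the fixture default
}

// Exit 1 only when at least one failing case is gated; otherwise exit 0.
function evalExitCode(results: CaseResult[], fixtureGate: boolean = false): 0 | 1 {
  const gatedFailure = results.some(
    (result) => !result.passed && (result.gate ?? fixtureGate),
  );
  return gatedFailure ? 1 : 0;
}

// With the fixture above: top-level gate is false, the case sets gate: true,
// so a failure of "deployment runbook" would fail CI.
const exampleExit = evalExitCode(
  [{ name: "deployment runbook", passed: false, gate: true }],
  false,
);
```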

    The MCP server (knowledge-base-mcp-server bin) is unchanged and still works with all the configurations in docs/clients.md. The CLI is additive.

    Local retrieval benchmarks

    Use npm run bench:beir to run a local BEIR benchmark with credential-free lexical retrieval:

    bash
    npm run bench:beir -- --dataset=scifact --split=test --mode=lexical --lexical-unit=source --output-dir=/tmp/kb-beir-scifact

    The runner builds a temporary KB root, emits metrics JSON plus a TREC run file, and records reproduction metadata such as git SHA, command, dataset checksum, runtime versions, chunking config, and latency percentiles. These are local benchmark artifacts, not official BEIR leaderboard submissions; the current lexical source path is scored at document level and is also available in the CLI via kb search --mode=lexical --lexical-unit=source. See benchmarks/README.md for smoke-test commands and caveats, and benchmarks/results/README.md for the archived SciFact run.

    Optuna tuning is optional and only runs when explicitly invoked. A SciFact example that sweeps lexical chunking and writes a replayable best-config file:

    bash
    npm run bench:tune -- \
      --trials=12 \
      --direction=maximize \
      --metric=metrics.ndcgAt10 \
      --study-name=scifact-lexical \
      --best-config-out=/tmp/kb-scifact-lexical-best.json \
      --param-int=KB_CHUNK_SIZE=256:1024:128 \
      --param-int=KB_CHUNK_OVERLAP=0:128:32 \
      -- npm run bench:beir -- --dataset=scifact --split=test --mode=lexical --max-queries=25 --output-dir=/tmp/kb-scifact-tune

    Replay the best trial without importing Optuna:

    bash
    npm run bench:tune -- --replay-config=/tmp/kb-scifact-lexical-best.json

    Local LLM answers (RFC 015)

    kb ask keeps retrieval deterministic and adds a local OpenAI-compatible chat step on top. It resolves the LLM endpoint from --endpoint, KB_LLM_ENDPOINT, --llm-profile, the active kb llm profile, then finally the local-research-agent default on 127.0.0.1:8080.
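
The precedence order above can be read as "first configured source wins". A minimal sketch under stated assumptions: the profiles map (profile name to endpoint URL) is a hypothetical stand-in for however kb llm stores profiles, an unknown profile name simply falls through to the next source, and the default URL reuses the /v1/chat/completions path from the use-endpoint example rather than a documented constant.

```typescript
interface EndpointInputs {
  flagEndpoint?: string; // --endpoint
  envEndpoint?: string; // KB_LLM_ENDPOINT
  flagProfile?: string; // --llm-profile
  activeProfile?: string; // active kb llm profile
  profiles: Record<string, string>; // hypothetical: profile name -> endpoint URL
}

// Assumed default for the local-research-agent endpoint on 127.0.0.1:8080.
const DEFAULT_LLM_ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions";

function resolveLlmEndpoint(inputs: EndpointInputs): { endpoint: string; source: string } {
  // Candidates in documented precedence order.
  const candidates: Array<[string, string | undefined]> = [
    ["--endpoint", inputs.flagEndpoint],
    ["KB_LLM_ENDPOINT", inputs.envEndpoint],
    [
      "--llm-profile",
      inputs.flagProfile !== undefined ? inputs.profiles[inputs.flagProfile] : undefined,
    ],
    [
      "active profile",
      inputs.activeProfile !== undefined ? inputs.profiles[inputs.activeProfile] : undefined,
    ],
  ];
  for (const [source, endpoint] of candidates) {
    if (endpoint !== undefined && endpoint !== "") return { endpoint, source };
  }
  return { endpoint: DEFAULT_LLM_ENDPOINT, source: "default" };
}
```

Returning the source alongside the endpoint makes it easy to log which setting took effect when debugging an unexpected target.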

    Notes can opt out of LLM prompt packing with this YAML frontmatter: kb_policy: { no_llm_context: true }. Search can still return the chunk, but kb ask and MCP ask_knowledge skip it before calling the LLM and report context_packing.policy_filtered_chunks.

    For offline development, set KB_LLM_FAKE=on to route kb ask, RFC 018 relevance-gate judge calls, and RFC 017 contextual-preface generation to a deterministic in-process fake LLM. Use npm run dev:mockllm -- --port=18080 when a client needs an OpenAI-compatible localhost endpoint instead. Rules can be customized with KB_LLM_FAKE_RULES; see docs/testing/fake-llm.md.

    Add --save-transcript --kb= --yes to persist the generated answer as a new markdown note in that KB. The saved record includes the question, answer, citations, source chunk ids, LLM endpoint/profile/model, retrieval model, and timing metadata when --timing is present. --title= controls the note title and slug; existing transcript notes are never overwritten.

    bash
    # Reuse an already-running local-research-agent llama-server.
    kb llm use-endpoint http://127.0.0.1:8080/v1/chat/completions --profile=local-research-agent
    kb llm status
    kb ask "Which notes discuss reboot recovery?" --kb=operating-environment
    kb ask "Which notes discuss reboot recovery?" --kb=operating-environment \
      --save-transcript --title="Reboot recovery answer" --yes
    
    # Optional managed service for machines that want kb to own the warm model.
    kb llm install --profile=qwen --runner=llama-server \
      --bin=/path/to/llama-server --model=/path/to/model.gguf --port=8091
    kb llm start --profile=qwen
    kb llm set-model --profile=qwen --model=/path/to/other-model.gguf --start
    kb llm uninstall --profile=qwen

    External profiles are reuse-only: kb llm stop, restart, uninstall, and reap do not stop services owned by local-research-agent. Managed profiles are namespaced as kb-llm@.service, bind to 127.0.0.1, and write leases under the user state directory so stale managed models can be reaped instead of staying loaded forever.

    Comparing embedding models (RFC 013)

    The CLI can keep multiple embedding models side-by-side and query each by id. Useful for retrieval-quality A/B without losing the previous model:

    bash
    # List registered models. The * marks the active one.
    kb models list
    
    # Add a second model — embeds your KB once under the new model.
    # For paid providers, prints an estimated cost and prompts before any HTTP traffic.
    kb models add ollama nomic-embed-text          # local, free
    kb models add openai text-embedding-3-small    # paid; estimate first
    kb models add huggingface BAAI/bge-small-en-v1.5
    
    # Query a specific model without changing the default.
    kb search "your query" --model=openai__text-embedding-3-small
    
    # Side-by-side comparison: unified rank/score table over both models' top-k.
    kb compare "your query" ollama__nomic-embed-text-latest openai__text-embedding-3-small
    
    # Switch the default model.
    kb models set-active openai__text-embedding-3-small
    
    # Remove a model (refuses to remove the active one).
    kb models remove huggingface__BAAI-bge-small-en-v1.5

    A model id joins the provider and a slugged form of the model name with __, derived deterministically from (provider, model_name) as typed (e.g. OLLAMA_MODEL=nomic-embed-text:latest → ollama__nomic-embed-text-latest). On-disk layout: each model lives at ${FAISS_INDEX_PATH}/models//. The active model is recorded in ${FAISS_INDEX_PATH}/active.txt and overridable per-process via KB_ACTIVE_MODEL. See [docs/rfcs/013-multimodel-support.md](docs/rfcs/013-multimodel-support.md) for the full design.
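
The ids used in the commands above (ollama__nomic-embed-text-latest, huggingface__BAAI-bge-small-en-v1.5) are consistent with replacing ":" and "/" in the model name with "-" and prefixing the provider. The sketch below reproduces only those examples; the exact slug rules live in RFC 013 and may cover more characters, and modelId is a hypothetical helper name.

```typescript
// Derive a model id from (provider, model name).
// Assumption: only ":" and "/" are rewritten to "-", which is enough to
// reproduce the ids shown in this README.
function modelId(provider: string, modelName: string): string {
  const slug = modelName.replace(/[:/]/g, "-");
  return `${provider}__${slug}`;
}

const exampleIds: string[] = [
  modelId("ollama", "nomic-embed-text:latest"),
  modelId("openai", "text-embedding-3-small"),
  modelId("huggingface", "BAAI/bge-small-en-v1.5"),
];
```

Because the id is derived from the name as typed, nomic-embed-text and nomic-embed-text:latest would produce different ids under this sketch.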

    Migration from the legacy single-model layout is automatic on first server (or kb) start: the existing single-model index is moved into ${FAISS_INDEX_PATH}/models// and active.txt is written. Cross-process starts coordinate the migration with ${FAISS_INDEX_PATH}/.kb-migration.lock; MCP and CLI processes no longer rely on a long-lived single-instance PID advisory. Keep a backup of the previous ${FAISS_INDEX_PATH} if you need rollback safety before upgrading from an older checkout.

    MCP surface — retrieve_knowledge gains an optional model_name argument; a new list_models tool returns the registered models; kb_stats reports the latest in-process updateIndex summary under last_index_update alongside the static index counts. Tools that don't pass model_name keep working unchanged (wire format is byte-equal to 0.2.x).

    MCP error codes

    Tool errors are returned with isError: true and a JSON text payload so MCP clients can branch without substring matching:

    json
    {
      "error": {
        "code": "PROVIDER_AUTH",
        "message": "OPENAI_API_KEY environment variable is required when using OpenAI provider"
      }
    }

    | Code | Meaning | Typical client action |
    | --- | --- | --- |
    | INDEX_NOT_INITIALIZED | A search ran before a FAISS index was available. | Retry after initialization or trigger a refresh. |
    | PROVIDER_UNAVAILABLE | The embedding provider is temporarily unavailable. | Retry with backoff. |
    | PROVIDER_TIMEOUT | The embedding provider timed out. | Retry with backoff. |
    | PROVIDER_AUTH | Provider credentials are missing or invalid. | Ask the user to configure a valid API key. |
    | KB_NOT_FOUND | The requested knowledge base does not exist. | Prompt for one of the listed knowledge bases. |
    | PERMISSION_DENIED | The server cannot read or write a required local path. | Surface to the operator/admin. |
    | CORRUPT_INDEX | The persisted FAISS index is corrupt or unreadable. | Rebuild or recover the index. |
    | VALIDATION | A caller-supplied argument failed validation. | Fix the request before retrying. |
    | INTERNAL | An unclassified server error occurred. | Surface the message and logs for investigation. |
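
A client can branch on these codes by parsing the JSON text payload of an isError result. The following is a sketch of that client-side handling, not code from the server: parseToolError and shouldRetry are hypothetical helpers, and treating INDEX_NOT_INITIALIZED as retryable is one reading of the "retry after initialization" guidance.

```typescript
const KNOWN_CODES = [
  "INDEX_NOT_INITIALIZED",
  "PROVIDER_UNAVAILABLE",
  "PROVIDER_TIMEOUT",
  "PROVIDER_AUTH",
  "KB_NOT_FOUND",
  "PERMISSION_DENIED",
  "CORRUPT_INDEX",
  "VALIDATION",
  "INTERNAL",
] as const;

type KbErrorCode = (typeof KNOWN_CODES)[number];

interface KbToolError {
  code: KbErrorCode;
  message: string;
}

// Parse the text payload of an isError tool result. Returns null when the
// payload is not the documented { error: { code, message } } shape.
function parseToolError(text: string): KbToolError | null {
  let payload: unknown;
  try {
    payload = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof payload !== "object" || payload === null) return null;
  const error = (payload as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return null;
  const { code, message } = error as { code?: unknown; message?: unknown };
  if (typeof code !== "string" || typeof message !== "string") return null;
  const known = KNOWN_CODES.find((candidate) => candidate === code);
  return known === undefined ? null : { code: known, message };
}

// Codes whose documented client action is some form of retry.
function shouldRetry(code: KbErrorCode): boolean {
  return (
    code === "PROVIDER_UNAVAILABLE" ||
    code === "PROVIDER_TIMEOUT" ||
    code === "INDEX_NOT_INITIALIZED"
  );
}
```

Unknown codes return null here so that a newer server adding a code degrades to generic error handling rather than a wrong branch.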

    Install via Smithery

    To install Knowledge Base Server for Claude Desktop automatically via Smithery:

    bash
    npx -y @smithery/cli install @jeanibarz/knowledge-base-mcp-server --client claude

    Install from source

    Use this path if you want to develop against the repo or pin an unreleased commit.

    Prerequisites

    • Node.js (version 20 or higher)
    • npm (Node Package Manager)

    1. Clone the repository:

    bash
    git clone 
    cd knowledge-base-mcp-server

    2. Install dependencies:

    bash
    npm install

    3. Configure environment variables:

    This server supports three production embedding providers: Ollama (recommended for reliability), OpenAI and HuggingFace (fallback option). A deterministic EMBEDDING_PROVIDER=fake backend also exists for tests and offline fixtures; it is not suitable for deployed retrieval quality.

    ### Option 1: Ollama Configuration (Recommended)

    • Set EMBEDDING_PROVIDER=ollama to use local Ollama embeddings
    • Install Ollama and pull an embedding model: ollama pull dengcao/Qwen3-Embedding-0.6B:Q8_0
    • Configure the following environment variables:
    bash
    EMBEDDING_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434  # Default Ollama URL
    OLLAMA_MODEL=dengcao/Qwen3-Embedding-0.6B:Q8_0  # Default embedding model
    KNOWLEDGE_BASES_ROOT_DIR=$HOME/knowledge_bases
    • Minimum context window: the embedding model must accept at least ~500 tokens of input. The default chunker emits ~1000-character chunks which commonly tokenize past 256 tokens, so models like all-minilm (256 ctx) will reject every request. Use nomic-embed-text (8192 ctx), dengcao/Qwen3-Embedding-0.6B:Q8_0 (32K ctx), or any model with ≥512 ctx instead.

    ### Option 2: OpenAI Configuration

    • Set EMBEDDING_PROVIDER=openai to use OpenAI API for embeddings
    • Configure the following environment variables:
    bash
    EMBEDDING_PROVIDER=openai
    OPENAI_API_KEY=your_api_key_here
    OPENAI_MODEL_NAME=text-embedding-3-small
    KNOWLEDGE_BASES_ROOT_DIR=$HOME/knowledge_bases
    • As of this release, the OpenAI default is text-embedding-3-small (up from text-embedding-ada-002). Both produce 1536-dim vectors, but the model name change will trigger a one-time FAISS index rebuild on the next query. Override with OPENAI_MODEL_NAME=... if you prefer the old default.

    ### Option 3: HuggingFace Configuration (Fallback)

    • Set EMBEDDING_PROVIDER=huggingface or leave unset (default)
    • Obtain a free API key from HuggingFace
    • Configure the following environment variables:
    bash
    EMBEDDING_PROVIDER=huggingface          # Optional, this is the default
    HUGGINGFACE_API_KEY=your_api_key_here
    HUGGINGFACE_MODEL_NAME=BAAI/bge-small-en-v1.5
    HUGGINGFACE_PROVIDER=hf-inference       # Optional, router provider for serverless inference
    KNOWLEDGE_BASES_ROOT_DIR=$HOME/knowledge_bases
    • As of this release, the HuggingFace default is BAAI/bge-small-en-v1.5 (up from sentence-transformers/all-MiniLM-L6-v2). Both produce 384-dim vectors, but the model name change will trigger a one-time FAISS index rebuild on the next query. Override with HUGGINGFACE_MODEL_NAME=... if you prefer the old default.
    • HuggingFace retired the legacy api-inference.huggingface.co/models/... endpoint in 2025. Feature-extraction calls are now routed through the Inference Providers router at https://router.huggingface.co/hf-inference/models//pipeline/feature-extraction by default. Set HUGGINGFACE_PROVIDER to choose a different supported Inference Provider such as together, replicate, fireworks-ai, sambanova, nebius, or novita. The existing HUGGINGFACE_API_KEY value can be either a Hugging Face token or a compatible provider key, depending on how the request is authenticated upstream. To target a self-hosted or dedicated Inference Endpoint, set HUGGINGFACE_ENDPOINT_URL to the full POST URL; explicit endpoint URLs bypass router provider selection.

    ### Additional Configuration

    • The server supports the FAISS_INDEX_PATH environment variable to specify the path to the FAISS index. If not set, it will default to $HOME/knowledge_bases/.faiss. For a complete defaults and validation matrix across retrieval, ingest, diagnostics, and transport flags, see docs/feature-flags.md.
    • **Shared FAISS_INDEX_PATH coordination.** Multiple MCP/CLI processes may share a trusted FAISS_INDEX_PATH; mutating paths serialize through per-model write locks and versioned atomic saves. Keep the index directory trusted and local — do not mount it on a shared filesystem writable by untrusted peers. See [docs/architecture/threat-model.md](docs/architecture/threat-model.md) for the current concurrency posture.
    • Logging can be routed to a file by setting LOG_FILE=/path/to/logs/knowledge-base.log. Log verbosity defaults to info and can be adjusted with LOG_LEVEL=debug|info|warn|error.
    • Mutation audit log (opt-in). Set KB_MUTATION_AUDIT_LOG=/path/to/kb-mutations.jsonl to capture an append-only JSONL ledger of KB content writes. Each line records surface (cli.kb-remember / cli.kb-capture / cli.kb-ask / mcp.add_document / mcp.delete_document), operation, kb, relative_path, timestamp, before_sha256, after_sha256, write_performed, refresh_requested, refresh_status, and per-surface decision_flags. Note content is not stored, only hashes and metadata. The feature is best-effort: an audit write failure logs a warning to stderr but never aborts the primary mutation. Because KB names and paths are inherent to the records, treat the audit log with the same sensitivity as the underlying KB directory.
    • Tailor tool descriptions per deployment. The model-facing MCP tool descriptions can be overridden before server start via RETRIEVE_KNOWLEDGE_DESCRIPTION, ASK_KNOWLEDGE_DESCRIPTION, LIST_KNOWLEDGE_BASES_DESCRIPTION, LIST_MODELS_DESCRIPTION, and KB_STATS_DESCRIPTION. Unset or empty falls back to the built-in defaults. Example:
    bash
    RETRIEVE_KNOWLEDGE_DESCRIPTION="Search engineering runbooks, RFCs, and postmortems."
    ASK_KNOWLEDGE_DESCRIPTION="Answer from engineering runbooks with citations."
    LIST_KNOWLEDGE_BASES_DESCRIPTION="List available engineering knowledge bases."
    LIST_MODELS_DESCRIPTION="List embedding models registered for engineering retrieval."
    KB_STATS_DESCRIPTION="Report engineering KB index and transport health."
    • Ingest filter overrides (RFC 011 M1). The server embeds only files whose extension is in {.md, .markdown, .txt, .rst, .html, .htm} and excludes PDFs, workflow sidecars (_seen.jsonl, _index.jsonl), log / staging subtrees (logs/, tmp/, _tmp/), and OS metadata files (.DS_Store, Thumbs.db, desktop.ini). To extend the allowlist or add more exclusions:
    bash
    # Comma-separated extensions (case-insensitive; leading dot optional).
    INGEST_EXTRA_EXTENSIONS=".json,.yaml"
    # Comma-separated minimatch globs relative to the KB root.
    INGEST_EXCLUDE_PATHS="drafts/**,scratch.md"

    Extensionless files (e.g. README, LICENSE, Makefile) and PDFs are not embedded by the default allowlist; use INGEST_EXTRA_EXTENSIONS=".pdf" only when PDF extraction is intentional. The base exclusions are authoritative: operators can add more but cannot remove the built-ins.

    • You can set these environment variables in your .bashrc or .zshrc file, or directly in the MCP settings.
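    The extension rules for the ingest filter described above (base allowlist, comma-separated extras, case-insensitive, leading dot optional) can be sketched as below. This is an illustration only: parseExtraExtensions, extensionOf, and isAllowedExtension are hypothetical names, and the sketch deliberately ignores INGEST_EXCLUDE_PATHS and the built-in exclusions.

```typescript
// Base allowlist as documented above.
const BASE_EXTENSIONS = [".md", ".markdown", ".txt", ".rst", ".html", ".htm"];

// Hypothetical parser for an INGEST_EXTRA_EXTENSIONS-style value.
function parseExtraExtensions(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0)
    .map((part) => (part.startsWith(".") ? part : `.${part}`));
}

// Returns the lowercased extension, or "" for extensionless and dot-only names.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot).toLowerCase() : "";
}

function isAllowedExtension(path: string, extra: string[] = []): boolean {
  const ext = extensionOf(path);
  return ext !== "" && (BASE_EXTENSIONS.includes(ext) || extra.includes(ext));
}
```

    With no extras, notes/guide.MD passes while README and report.pdf do not; passing parseExtraExtensions("pdf") as the second argument opts PDFs in.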

    4. Build the server:

    bash
    npm run build

    5. Add the server to your MCP client:

    See docs/clients.md for copy-pasteable configuration snippets for Claude Desktop, Codex CLI, Cursor, Continue, and Cline.

    6. Create knowledge base directories:

    • Create subdirectories within the KNOWLEDGE_BASES_ROOT_DIR for each knowledge base (e.g., company, it_support, onboarding).
    • Place text files (e.g., .txt, .md) containing the knowledge base content within these subdirectories.
    • The server recursively reads ingestable files within the specified knowledge base subdirectories: the base allowlist is .md, .markdown, .txt, .rst, .html, and .htm, plus any extensions added with INGEST_EXTRA_EXTENSIONS.
    • The server skips hidden files and directories (those starting with a .).
    • For each file, the server calculates the SHA256 hash and stores it in a file with the same name in a hidden .index subdirectory. This hash is used to determine if the file has been modified since the last indexing.
    • File content is split into chunks before indexing: .md files use MarkdownTextSplitter (heading-aware), and every other text file uses RecursiveCharacterTextSplitter. Both splitters share the same chunkSize: 1000, chunkOverlap: 200 defaults, so a large .txt, .rst, or source file produces many chunks rather than a single embedding.
    • The content of each chunk is then added to a FAISS index, which is used for similarity search.
    • The FAISS index is automatically initialized when the server starts. It checks for changes in the knowledge base files and updates the index accordingly.
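    Two of the indexing steps above, hash-based change detection and overlapping chunks, can be sketched as follows. This is a simplified illustration under stated assumptions: sha256Hex, needsReindex, and splitWithOverlap are hypothetical names, and the fixed-window splitter stands in for MarkdownTextSplitter and RecursiveCharacterTextSplitter, which also respect headings and separators.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA256 of a file's content.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// A file needs re-indexing when no hash was stored or the stored hash differs.
function needsReindex(content: string, storedHash: string | undefined): boolean {
  return storedHash !== sha256Hex(content);
}

// Naive fixed-window splitter using the documented defaults (1000 / 200).
function splitWithOverlap(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const stride = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

    With these defaults each window starts 800 characters after the previous one, so a 2600-character file yields three chunks in this sketch.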

    Install (local development, live kb from your checkout)

    Use this when you're actively developing on the repo and want your global kb and knowledge-base-mcp-server bins to always reflect the current state of main (or your feature branch) — without npm publish and without manual reinstalls after each git pull.

    bash
    git clone https://github.com/jeanibarz/knowledge-base-mcp-server.git
    cd knowledge-base-mcp-server
    npm run dev:setup

    dev:setup does three things, all idempotent:

    1. **npm install + npm run build** — first build, so the bins exist before linking.

    2. **npm link** — symlinks kb and knowledge-base-mcp-server into the global node prefix (printed during setup so you can verify it lands where you expect). From then on, every npm run build overwrites build/ in place and the global bins pick up the new code on the next invocation. No re-link needed after rebuilds.

    3. **git config core.hooksPath .githooks** — points git at the tracked [.githooks/](./.githooks) directory so the post-merge and post-rewrite hooks fire after every git pull (merge or rebase) and git merge. The hook re-runs npm install if package.json changed and npm run build if any source changed. Skips quietly when nothing relevant moved. The hook order puts this last, so a failed install/build leaves the repo in its original state.

    After setup, the daily loop is just:

    bash
    git pull            # hook rebuilds automatically (merge or rebase)
    kb search "..."     # uses the freshly-built bin from this checkout

    Or, when editing locally:

    bash
    # edit src/...
    npm run build       # global `kb` immediately reflects your change

    For source-mapped CLI debugging without rebuilding or relinking, run the TypeScript CLI entrypoint directly:

    bash
    npm run dev:cli -- --help
    npm run dev:cli -- search "rollback procedure" --kb=work --k=5

    The wrapper prints the active KNOWLEDGE_BASES_ROOT_DIR, FAISS_INDEX_PATH, embedding provider, and embedding model to stderr before each invocation so you can confirm which KB and index a command would touch.

    For disposable HTTP/SSE transport debugging without touching your real KB or FAISS index, run:

    bash
    npm run dev:remote -- --transport=http
    npm run dev:remote -- --transport=sse

    dev:remote creates a seeded scratch KB, chooses a free loopback port, generates an MCP_AUTH_TOKEN, prints curl examples for the selected transport, starts the TypeScript server against the scratch state, and removes that state when the server exits. Add --keep to inspect the generated files afterward, or --print-env to emit the environment and examples without starting the server.

    Switching back to the published npm release (e.g. to compare behaviour):

    bash
    npm unlink -g @jeanibarz/knowledge-base-mcp-server
    npm install -g @jeanibarz/knowledge-base-mcp-server@latest

    **Why npm link instead of npm install -g .?** npm link is a symlink, so npm run build is reflected without reinstalling. npm install -g . copies the build snapshot, so every change requires a re-install.

    Hook scope. The hooks trigger on git pull / git merge / git pull --rebase, not on git checkout between branches. Run npm run build manually after a branch switch if needed. If a rebuild fails, the hook prints a warning and exits 0 so the pull itself isn't reported as failed — fix the build, then run npm run build manually.

    Usage

    Writing notes that retrieve well? See [docs/authoring-knowledge.md](docs/authoring-knowledge.md) — six-section guide on chunk-friendly markdown, frontmatter taxonomy that lifts into filters, content-boundary safety, and when to split a KB. For query-time neighbor windows around dense matches, see [docs/search-neighbor-context.md](docs/search-neighbor-context.md).

    The MCP server exposes the current tool set registered by src/KnowledgeBaseServer.ts:

    • list_knowledge_bases: Lists the available knowledge bases.
    • retrieve_knowledge: Retrieves similar chunks from the knowledge base based on a query. If a knowledge base is specified, only that one is searched; otherwise, all available knowledge bases are searched. By default, it returns at most 10 chunks whose score is below a threshold of 2. Pass the optional threshold parameter to use a different cutoff.
    • ask_knowledge: Retrieves KB snippets and asks a configured local/OpenAI-compatible LLM to answer with citations.
    • list_models: Lists registered embedding models and the active model.
    • kb_stats: Returns read-only corpus, index, model, cache, and transport statistics.
    • diff_index: Compares retrieval results across two persisted FAISS index versions.
    • add_document: Writes a text document into a KB through the guarded MCP mutation path.
    • delete_document: Deletes a KB-relative document through the guarded MCP mutation path.
    • reindex_knowledge_base: Forces a global FAISS rebuild, optionally validating a named KB before the rebuild.
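    The retrieve_knowledge defaults in the list above (at most 10 chunks, score below 2) amount to a filter followed by a cap. The sketch below is illustrative only: ScoredChunk and selectResults are hypothetical names, and it assumes lower scores mean closer matches, which the "score below a threshold" wording implies but does not state.

```typescript
// Hypothetical shape of a scored search hit.
interface ScoredChunk {
  content: string;
  source: string;
  score: number; // assumption: lower means more similar
}

// Keep hits under the threshold, best first, capped at maxResults.
function selectResults(
  hits: ScoredChunk[],
  threshold = 2,
  maxResults = 10,
): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score < threshold)
    .sort((a, b) => a.score - b.score)
    .slice(0, maxResults);
}
```

    Lowering the threshold in a call such as selectResults(hits, 0.8) returns fewer, closer matches.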

    You can use these tools through the MCP interface. The kb CLI covers the shell-oriented surfaces shown above.

    The server also exposes MCP resources for clients that want to enumerate and read source documents directly. resources/list returns kb:/// URIs for ingestable, non-quarantined files under KNOWLEDGE_BASES_ROOT_DIR, and resources/read returns the raw document content as text or, when .pdf is opted into ingest, a base64 PDF blob. See [docs/mcp-resources.md](docs/mcp-resources.md) for client-facing URI, MIME type, and percent-encoding details.

    The retrieve_knowledge tool performs a semantic search using a FAISS index. The index is automatically updated when the server starts or when a file in a knowledge base is modified.

    The output of the retrieve_knowledge tool is a markdown formatted string with the following structure:

    markdown
    ## Semantic Search Results

    **Result 1:**

    [Content of the most similar chunk]

    **Source:**
    {
      "source": "[Path to the file containing the chunk]"
    }

    ---

    **Result 2:**

    [Content of the second most similar chunk]

    **Source:**
    {
      "source": "[Path to the file containing the chunk]"
    }

    > **Disclaimer:** The provided results might not all be relevant. Please cross-check the relevance of the information.

    Each result includes the chunk content, the source file, and a similarity score.

    When chunk metadata includes line information, the markdown source header links a stable chunk handle such as alpha/docs/deploy.md#L42-L78 to the matching kb://alpha/docs/deploy.md#L42-L78 resource URI. Set KB_EDITOR_URI=vscode, cursor, or file before launching the server to also include editor-open links with local absolute paths.
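    The chunk handle and resource URI shapes from the example above can be sketched as simple string builders. ChunkLocation, chunkHandle, resourceUri, and sourceLink are hypothetical names, and the sketch skips the percent-encoding rules that docs/mcp-resources.md covers.

```typescript
// Hypothetical description of where a chunk lives.
interface ChunkLocation {
  kb: string;
  relativePath: string;
  startLine: number;
  endLine: number;
}

// e.g. alpha/docs/deploy.md#L42-L78
function chunkHandle(loc: ChunkLocation): string {
  return `${loc.kb}/${loc.relativePath}#L${loc.startLine}-L${loc.endLine}`;
}

// e.g. kb://alpha/docs/deploy.md#L42-L78 (no percent-encoding in this sketch)
function resourceUri(loc: ChunkLocation): string {
  return `kb://${chunkHandle(loc)}`;
}

// Markdown link from the handle to the resource URI.
function sourceLink(loc: ChunkLocation): string {
  return `[${chunkHandle(loc)}](${resourceUri(loc)})`;
}
```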

    Remote transport (optional)

    By default the server speaks MCP over stdio — every supported client (Claude Desktop, Codex, Cursor, Continue, Cline) launches it as a child process. RFC 008 adds opt-in SSE and streamable HTTP transports for browser-based clients, Smithery remote mode, and shared deployments. Stdio is unchanged unless you set MCP_TRANSPORT.

    bash
    export MCP_TRANSPORT=http                         # stdio (default), sse, or http
    export MCP_AUTH_TOKEN="$(openssl rand -base64 32)"   # must be ≥32 characters; shorter tokens abort startup
    export MCP_ALLOWED_ORIGINS="http://localhost:5173"   # comma-separated; leave unset to deny all browser origins
    export MCP_PORT=8765                                  # default
    export MCP_BIND_ADDR=127.0.0.1                        # default — loopback only
    export MCP_AUTH_BACKOFF_THRESHOLD=5                   # failed bearer attempts before backoff; 0 disables
    export MCP_AUTH_BACKOFF_MS=30000                      # Retry-After window for auth backoff
    node build/index.js

    Endpoints exposed in this mode:

    • GET /health — unauthenticated liveness probe; returns 200 {"status":"ok"} only. Per RFC 008 §6.8 it intentionally exposes no version, uptime, or filesystem fingerprint to anonymous callers.
    • MCP_TRANSPORT=sse: GET /sse opens the long-lived SSE stream and POST /messages?sessionId= sends JSON-RPC messages for that session.
    • MCP_TRANSPORT=http: POST /mcp initializes and sends JSON-RPC messages using streamable HTTP. The server returns Mcp-Session-Id during initialization; clients must send it on subsequent GET, POST, and DELETE /mcp requests.

    All non-health transport endpoints require Authorization: Bearer .
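    A client-side request for the streamable HTTP endpoint could be assembled as sketched below. buildMcpPost and McpRequest are hypothetical names; the sketch assumes the bearer value is the MCP_AUTH_TOKEN the server was started with, and the Accept header follows the MCP streamable HTTP transport rather than anything this README states.

```typescript
// Hypothetical request description that can be passed to fetch(url, init).
interface McpRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildMcpPost(
  baseUrl: string,
  token: string,
  message: unknown,
  sessionId?: string,
): McpRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
    // Assumption: the server may answer with JSON or an SSE stream.
    Accept: "application/json, text/event-stream",
  };
  // After initialization, echo the session id the server returned.
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    init: { method: "POST", headers, body: JSON.stringify(message) },
  };
}
```

    A caller would send the initialize message without a session id, read Mcp-Session-Id from the response headers, and pass it to every later call.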

    Repeated bearer failures from the same remote address enter a bounded in-memory backoff and return Retry-After; valid authentication clears the address state. When the server sits behind a reverse proxy, the key is the proxy socket address, not X-Forwarded-For, so configure proxy-side throttling for internet exposure.

    For a disposable remote playground during development, use npm run dev:remote from the local development section above.
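    The auth backoff behaviour described above can be sketched as a small per-address counter. This is an illustration, not the server's implementation: AuthBackoff is a hypothetical class, the defaults mirror MCP_AUTH_BACKOFF_THRESHOLD=5 and MCP_AUTH_BACKOFF_MS=30000, and the exact counting and reset semantics are assumptions. A real version would also cap the map size to stay bounded.

```typescript
// Hypothetical in-memory backoff keyed by remote socket address.
class AuthBackoff {
  private readonly state = new Map<string, { failures: number; blockedUntil: number }>();
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(threshold = 5, windowMs = 30_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Seconds to advertise in Retry-After, or 0 when the address is not blocked.
  retryAfterSeconds(address: string): number {
    const entry = this.state.get(address);
    if (!entry) return 0;
    const remaining = entry.blockedUntil - this.now();
    return remaining > 0 ? Math.ceil(remaining / 1000) : 0;
  }

  recordFailure(address: string): void {
    if (this.threshold === 0) return; // 0 disables backoff
    const entry = this.state.get(address) ?? { failures: 0, blockedUntil: 0 };
    entry.failures += 1;
    if (entry.failures >= this.threshold) {
      entry.blockedUntil = this.now() + this.windowMs;
    }
    this.state.set(address, entry);
  }

  // Valid authentication clears the address state.
  recordSuccess(address: string): void {
    this.state.delete(address);
  }
}
```

    Injecting the clock through the constructor makes the window easy to test without waiting.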

    Security defaults: the server refuses to start in SSE or streamable HTTP mode without MCP_AUTH_TOKEN, binds only to loopback, and uses a constant-time bearer comparison. Operators exposing the endpoint off-host should set MCP_BIND_ADDR=0.0.0.0 *and* terminate TLS in a reverse proxy — TLS is out of scope for this server. Multiple MCP/CLI processes may share a FAISS_INDEX_PATH; mutating paths serialize through per-model locks and versioned atomic saves (see [docs/architecture/threat-model.md](./docs/architecture/threat-model.md)).
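    Two of the security defaults above, the minimum token length and the constant-time bearer comparison, can be sketched with Node's crypto module. assertTokenLongEnough and bearerMatches are hypothetical names; hashing both sides before timingSafeEqual is one common way to compare values of different lengths and is not necessarily how this server does it.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

const MIN_TOKEN_LENGTH = 32;

// Mirrors the documented startup rule: shorter tokens abort startup.
function assertTokenLongEnough(token: string | undefined): string {
  if (!token || token.length < MIN_TOKEN_LENGTH) {
    throw new Error(`MCP_AUTH_TOKEN must be at least ${MIN_TOKEN_LENGTH} characters`);
  }
  return token;
}

// Compares SHA256 digests so both buffers always have equal length.
function bearerMatches(authorization: string | undefined, expectedToken: string): boolean {
  const prefix = "Bearer ";
  if (!authorization || !authorization.startsWith(prefix)) return false;
  const presented = authorization.slice(prefix.length);
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expectedToken).digest();
  return timingSafeEqual(a, b);
}
```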

    Troubleshooting & Logging

    For a command-oriented runbook covering empty results, stale-index footers, linked-checkout/global-bin drift, missing active models, backend availability, and refresh lock contention, see [docs/troubleshooting-local-kb.md](docs/troubleshooting-local-kb.md). For operator incidents keyed by symptom, use [docs/operations/incident-response.md](docs/operations/incident-response.md).

    KB availability smoke check

    When kb search (or the MCP retrieve_knowledge tool) is not returning results, run the read-only kb doctor command first — it is the canonical availability check for retrieval and also reports local LLM readiness for kb ask:

    bash
    kb doctor                # human-readable report
    kb doctor --format=json  # machine-readable for agent shells
    kb doctor --endpoints    # focused bind/connect endpoint preflight
    kb doctor --locks        # model write-lock owner/stale diagnosis
    kb doctor --kb-symlinks  # KB-root symlink inventory and target classification
    kb doctor --bug-report=/tmp  # redacted support bundle directory

    The report covers active-model resolution, FAISS index version + mtime, the latest in-process index-update summary, per-KB stale counts, embedding-backend reachability (Ollama / HuggingFace / OpenAI), local LLM endpoint readiness, CLI version, and local git state. The command exits non-zero when any required retrieval check fails (active model unresolved, index missing, backend unreachable); LLM endpoint failures are WARN rows because search can remain healthy while kb ask is not ready.

    Use kb doctor --endpoints when you only need configured local endpoint readiness before starting or wiring clients. It checks MCP bind address/port availability, configured KB_DAEMON_URL health, configured Ollama embedding reachability, and configured KB_LLM_ENDPOINT or active LLM profile readiness without loading the full index health report.

    Use kb doctor --locks when refresh or model writes report lock contention. It scans per-model .kb-write.lock paths, reports lock age, recorded owner PID/command for new locks, stale suspicion, and conservative next actions without deleting any lock file.

    Use kb doctor --kb-symlinks when auditing KB content roots. It scans with lstat, does not follow symlink directories, and classifies symlink targets as inside-root, escaping, broken, or loop/error with capped examples.

    Use kb doctor --bug-report[=] when opening an issue or handing diagnostics to another operator. It writes a timestamped directory containing redacted doctor.json, stats.json, recent canonical log summaries, runtime/env metadata, and a README. The bundle does not include note contents or raw API keys; KB names, paths, model names, and log metadata can still be sensitive.

    Distinguishing search failure modes

    kb search failures are classified into one of six categories so a user or agent can tell what to fix without reading stack traces. Each failure carries a stable code, a category, a human message, and a concrete next_action:

    | Category | Typical codes | What to try |
    | --- | --- | --- |
    | configuration | PROVIDER_AUTH, KB_NOT_FOUND, ACTIVE_MODEL_UNRESOLVED | Set the missing API key, run kb list / kb models list, or kb models set-active . |
    | indexing | INDEX_NOT_INITIALIZED, CORRUPT_INDEX | Build or rebuild the index with kb search --refresh. |
    | provider | PROVIDER_UNAVAILABLE, PROVIDER_TIMEOUT | Verify the embedding backend is reachable (ollama serve, provider status page). |
    | permissions | PERMISSION_DENIED | Grant write access to $FAISS_INDEX_PATH and per-KB .index/. |
    | input | VALIDATION | Adjust the rejected field named in the message. |
    | lock | REFRESH_LOCK_BUSY | Retry shortly; only one kb search --refresh writer runs per model. |

    With --format=md the same fields render to stderr as kb search: followed by category: and next: lines. With --format=json they render to stdout as {"error":{"code","category","message","next_action",...}} so an agent can branch on the category programmatically. When the cause is unclear, the next_action falls back to kb doctor which prints the exact health snapshot needed to diagnose.

    Exit codes mirror the CLI's existing convention — 2 for configuration and input problems the user can fix without retry, 1 for runtime / index / provider / permissions / lock problems.
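    An agent consuming --format=json output could branch on the category as sketched below. The field names come from the error envelope described above; SearchFailure, parseFailure, exitCodeFor, and isRetryable are hypothetical names, and treating only the lock category as retryable is an assumption based on the "Retry shortly" guidance.

```typescript
type FailureCategory =
  | "configuration"
  | "indexing"
  | "provider"
  | "permissions"
  | "input"
  | "lock";

interface SearchFailure {
  code: string;
  category: FailureCategory;
  message: string;
  next_action: string;
}

// Returns the error envelope if stdout carries one, otherwise undefined.
function parseFailure(stdout: string): SearchFailure | undefined {
  const parsed = JSON.parse(stdout) as { error?: SearchFailure };
  return parsed.error;
}

// 2 for problems the user fixes without retrying, 1 for everything else.
function exitCodeFor(category: FailureCategory): 1 | 2 {
  return category === "configuration" || category === "input" ? 2 : 1;
}

// Assumption: only lock contention is worth an automatic retry.
function isRetryable(failure: SearchFailure): boolean {
  return failure.category === "lock";
}
```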

    Other tips

    • Set LOG_FILE to capture structured logs (JSON-RPC traffic continues to use stdout). This is especially helpful when diagnosing MCP handshake errors because all diagnostic messages are written to stderr and the optional log file.
    • Permission errors when creating or updating the FAISS index are surfaced with explicit messages in both the console and the log file. Verify that the process can write to FAISS_INDEX_PATH and the .index directories inside each knowledge base.
    • Run npm test to execute the Jest suite (serialised with --runInBand) that covers logger fallback behaviour and FAISS permission handling.

    Security

    The server is designed to run as a local tool: one user, one machine, one trusted terminal. Two trust boundaries matter in practice. The $FAISS_INDEX_PATH directory is a code-execution boundary — FaissStore.load deserialises the docstore via pickleparser, so the directory must only contain files written by this server (no untrusted backups, no shared-write mounts). The $KNOWLEDGE_BASES_ROOT_DIR tree is a content boundary — its contents are embedded and returned verbatim to the MCP client, so markdown from untrusted sources is a prompt-injection risk for downstream agents. Multiple MCP/CLI processes may share a trusted FAISS_INDEX_PATH; writes are coordinated by per-model locks and atomic save. Full discussion, including provider-key handling and concurrency, is in [docs/architecture/threat-model.md](./docs/architecture/threat-model.md).
