Track MCP

The world's largest repository of Model Context Protocol servers. Discover, explore, and submit MCP tools.


© 2026 TrackMCP. All rights reserved.

Built with ❤️ by Krishna Goyal

    mcp-server-datahub

    The official Model Context Protocol (MCP) server for DataHub (https://datahub.com)

    60 stars
    Python
    Updated Oct 17, 2025

    Table of Contents

    • What is DataHub?
    • Use Cases
    • Why DataHub MCP Server?
    • Structured Search with Context Filtering
    • SQL Intelligence & Query Generation
    • Table & Column-Level Lineage
    • Understands Your Data Ecosystem
    • Usage
    • Demo
    • Tools
    • Mutation Tools
    • User Tools
    • Document Tools
    • Configuration
    • Environment Variables
    • Example: Data Discovery & Understanding Flow (for Agents Using DataHub Tools)
    • 1. User Asks a Question
    • 2. Search for Relevant Datasets
    • 3. Inspect Candidate Datasets
    • 3.1 List Schema Fields
    • 3.2 Fetch Lineage (optional)
    • 3.3 Get Example Queries
    • 4. Understand Entity Relationships
    • get_entities
    • get_lineage_paths_between
    • 5. Construct a Query
    • 6. Return the Final Answer
    • Summary of Tools Used
    • Developing



    DataHub MCP Server

    A Model Context Protocol server implementation for DataHub.

    What is DataHub?

    DataHub is an open-source context platform that gives organizations a single pane of glass across their entire data supply chain. DataHub unifies data discovery, governance, and observability under one roof for every table, column, dashboard, pipeline, document, and ML model.

    With powerful features for data profiling, data quality monitoring, data lineage, data ownership, and data classification, DataHub brings together both technical and organizational context, allowing teams to find, create, use, and maintain trustworthy data.

    Use Cases

    The DataHub MCP Server enables AI agents to:

    • Find trustworthy data: Search across the entire data landscape using natural language to find the tables, columns, dashboards, & metrics that can answer your most mission-critical questions. Leverage trust signals like data popularity, quality, lineage, and query history to get it right, every time.
    • Explore data lineage & plan for data changes: Understand the impact of important data changes _before_ they impact your downstream users through rich data lineage at the asset & column level.
    • Understand your business: Navigate important organizational context like business glossaries, data domains, data products, _and_ data assets. Understand how key metrics, business processes, and data relate to one another.
    • Explain & generate SQL queries: Generate accurate SQL queries to answer your most important questions with the help of critical context like data documentation, data lineage, and popular queries across the organization.

    Why DataHub MCP Server?

    With DataHub MCP Server, you can instantly give AI agents visibility into your entire data ecosystem. Find and understand data stored in your databases, data lake, data warehouse, and BI visualization tools. Explore data lineage, understand usage & use cases, identify the data experts, and generate SQL - all through natural language.

    Structured Search with Context Filtering

    Go beyond keyword matching with powerful query & filtering syntax:

    • Wildcard matching: /q revenue_* finds revenue_kpis, revenue_daily, revenue_forecast
    • Field searches: /q tag:PII finds all PII-tagged data
    • Boolean logic: /q (sales OR revenue) AND quarterly for complex queries
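
    These operators compose. As an illustrative request (the exact query-string shape an agent sends may differ), a search call combining wildcards, field filters, and boolean logic might look like:

    ```json
    {
      "query": "/q (revenue_* OR sales_*) AND tag:PII"
    }
    ```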

    SQL Intelligence & Query Generation

    Access popular SQL queries and generate new ones accurately:

    • See how analysts query tables (perfect for SQL generation)
    • Understand join patterns and common filters
    • Learn from production query patterns

    Table & Column-Level Lineage

    Trace data flow at both the table and column level:

    • Track how user_id becomes customer_key downstream
    • Understand transformation logic
    • Upstream and downstream exploration (1-3+ hops)
    • Handle enterprise-scale lineage graphs
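
    As a sketch of what a lineage request for the get_lineage tool (described under Tools below) could look like - the parameter names here are illustrative assumptions, not the server's documented schema:

    ```json
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.public.adoptions,PROD)",
      "direction": "upstream",
      "max_hops": 2
    }
    ```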

    Understands Your Data Ecosystem

    Understand how your data is organized before searching:

    • Discover relevant data domains, owners, tags and glossary terms
    • Browse across data platforms and environments
    • Navigate the complexities of your data landscape without guessing

    Usage

    See instructions in the DataHub MCP server docs.
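
    As one illustrative setup - the command, package name, and environment variable names below are assumptions, so follow the official docs above for the authoritative instructions - an MCP client configuration might look like:

    ```json
    {
      "mcpServers": {
        "datahub": {
          "command": "uvx",
          "args": ["mcp-server-datahub"],
          "env": {
            "DATAHUB_GMS_URL": "https://your-instance.example.com/gms",
            "DATAHUB_GMS_TOKEN": "<your-token>"
          }
        }
      }
    }
    ```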

    Demo

    Check out the demo video, done in collaboration with the team at Block.

    Tools

    The DataHub MCP Server provides the following tools:

    search

    Search DataHub using structured keyword search (/q syntax) with boolean logic, filters, pagination, and optional sorting by usage metrics.

    get_lineage

    Retrieve upstream or downstream lineage for any entity (datasets, columns, dashboards, etc.) with filtering, query-within-lineage, pagination, and hop control.

    get_dataset_queries

    Fetch real SQL queries referencing a dataset or column—manual or system-generated—to understand usage patterns, joins, filters, and aggregation behavior.

    get_entities

    Fetch detailed metadata for one or more entities by URN; supports batch retrieval for efficient inspection of search results.

    list_schema_fields

    List schema fields for a dataset with keyword filtering and pagination, useful when search results truncate fields or when exploring large schemas.

    get_lineage_paths_between

    Retrieve the exact lineage paths between two assets or columns, including intermediate transformations and SQL query information.

    Mutation Tools

    These tools allow modifying metadata in DataHub. They are enabled via the TOOLS_IS_MUTATION_ENABLED=true environment variable.

    add_tags / remove_tags

    Add or remove tags from entities or schema fields (columns). Supports bulk operations on multiple entities.
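
    For instance, a bulk add_tags call might look like the following (the argument names are illustrative assumptions, not the tool's documented schema):

    ```json
    {
      "urns": [
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.public.adoptions,PROD)"
      ],
      "tags": ["urn:li:tag:PII"]
    }
    ```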

    add_terms / remove_terms

    Add or remove glossary terms from entities or schema fields. Useful for applying business definitions and data classification.

    add_owners / remove_owners

    Add or remove ownership assignments from entities. Supports different ownership types (technical owner, data owner, etc.).

    set_domains / remove_domains

    Assign or remove domain membership for entities. Each entity can belong to one domain.

    update_description

    Update, append to, or remove descriptions for entities or schema fields. Supports markdown formatting.

    add_structured_properties / remove_structured_properties

    Manage structured properties (typed metadata fields) on entities. Supports string, number, URN, date, and rich text value types.

    User Tools

    These tools provide information about the authenticated user. Enabled via TOOLS_IS_USER_ENABLED=true.

    get_me

    Retrieve information about the currently authenticated user, including profile details and group memberships.

    Document Tools

    These tools work with documents (knowledge articles, runbooks, FAQs) stored in DataHub. Document tools are automatically hidden if no documents exist in the catalog.

    search_documents

    Search for documents using keyword search with filters for platforms, domains, tags, glossary terms, and owners.

    grep_documents

    Search within document content using regex patterns. Useful for finding specific information across multiple documents.

    save_document

    Save standalone documents (insights, decisions, FAQs, notes) to DataHub's knowledge base. Documents are organized under a configurable parent folder.
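
    A minimal save_document call could look like this sketch (the field names are hypothetical, for illustration only):

    ```json
    {
      "title": "Monthly adoption metric definition",
      "content": "An adoption is counted once per row in adoptions, in the month of its adoption_date."
    }
    ```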

    Configuration

    Environment Variables

    | Variable | Default | Description |
    |---|---|---|
    | TOOLS_IS_MUTATION_ENABLED | false | Enable mutation tools (add/remove tags, owners, etc.) |
    | TOOLS_IS_USER_ENABLED | false | Enable user tools (get_me) |
    | DATAHUB_MCP_DOCUMENT_TOOLS_DISABLED | false | Completely disable document tools |
    | SAVE_DOCUMENT_TOOL_ENABLED | true | Enable/disable the save_document tool |
    | SAVE_DOCUMENT_PARENT_TITLE | Shared | Title for the parent folder of saved documents |
    | SAVE_DOCUMENT_ORGANIZE_BY_USER | false | Organize saved documents by user |
    | SAVE_DOCUMENT_RESTRICT_UPDATES | true | Only allow updating documents in the shared folder |
    | TOOL_RESPONSE_TOKEN_LIMIT | 80000 | Maximum tokens for tool responses |
    | ENTITY_SCHEMA_TOKEN_BUDGET | 16000 | Token budget per entity for schema fields |
    | DISABLE_NEWER_GMS_FIELD_DETECTION | false | Disable adaptive GMS field detection |
    | DATAHUB_MCP_DISABLE_DEFAULT_VIEW | false | Disable automatic default view application |
    | SEMANTIC_SEARCH_ENABLED | false | Enable semantic (AI-powered) search |
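
    These variables are set in the server's environment. As a sketch (assuming the server is launched with `uvx mcp-server-datahub`; adapt to however you run it):

    ```shell
    # Enable mutation and user tools, and tighten the response token budget.
    TOOLS_IS_MUTATION_ENABLED=true \
    TOOLS_IS_USER_ENABLED=true \
    TOOL_RESPONSE_TOKEN_LIMIT=40000 \
    uvx mcp-server-datahub
    ```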

    Example: Data Discovery & Understanding Flow (for Agents Using DataHub Tools)

    This example illustrates how an AI agent could orchestrate DataHub MCP tools to answer a user's data question. It demonstrates the decision-making flow, which tools are called, and how responses are used.

    1. User Asks a Question

    Example:

    "How can I find out how many pets were adopted last month?"

    The agent recognizes this as a data discovery → query construction workflow. It needs to (a) find relevant datasets, (b) inspect metadata, (c) construct a correct SQL query.

    2. Search for Relevant Datasets

    The agent begins with the search tool (semantic or keyword depending on configuration).

    Tool: search

    Input: natural-language query

    Example Call:

    ```json
    {
      "query": "pet adoptions"
    }
    ```

    Purpose: Identify datasets like adoptions, pet_profiles, pet_details.

    3. Inspect Candidate Datasets

    For each dataset returned by search, the agent may fetch metadata.

    3.1 List Schema Fields

    Tool: list_schema_fields

    Input: URN of dataset

    Purpose: Understand schema, datatype, candidate fields for querying.

    Example:

    ```json
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.public.adoptions,PROD)"
    }
    ```

    3.2 Fetch Lineage (optional)

    Tool: get_lineage

    Purpose: Determine whether dataset is derived or authoritative.

    3.3 Get Example Queries

    Tool: get_dataset_queries

    Purpose: Learn typical usage patterns and query templates for the dataset.

    4. Understand Entity Relationships

    If the question requires joining or entity navigation (e.g., connecting pets → adoptions):

    get_entities

    To retrieve entities related to a given URN, such as upstream/downstream tables.

    get_lineage_paths_between

    To calculate exact lineage paths between datasets if needed (e.g., between pet_profiles and adoptions).

    5. Construct a Query

    The agent now has:

    • The correct dataset
    • Its schema
    • Key fields
    • Sample queries
    • Relationship and lineage context

    The agent constructs an accurate SQL query.

    Example:

    ```sql
    SELECT COUNT(*)
    FROM mydb.public.adoptions
    WHERE adoption_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1' MONTH)
      AND adoption_date < DATE_TRUNC('month', CURRENT_DATE);
    ```

    6. Return the Final Answer

    The agent may either:

    • return the SQL directly,
    • run it (if in an environment where query execution is allowed), or
    • provide a natural-language answer based on query output.
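
    The end-to-end flow above can be sketched in Python. The tool names match the server, but the stub responses and the crude URN-to-table helper are hypothetical stand-ins for real MCP tool calls:

    ```python
    def answer_question(tools, question):
        """Orchestrate search -> schema inspection -> SQL construction."""
        # Steps 1-2: search for candidate datasets and take the top hit.
        hits = tools["search"]({"query": question})
        urn = hits[0]["urn"]

        # Step 3.1: inspect the schema to find a usable date column.
        fields = tools["list_schema_fields"]({"urn": urn})
        date_col = next(f["name"] for f in fields if f["type"] == "date")

        # Step 3.3: example queries could refine the SQL; unused in this sketch.
        _ = tools["get_dataset_queries"]({"urn": urn})

        # Step 5: build the query. Extracting the table name from the URN by
        # splitting on commas is a simplification for illustration.
        table = urn.split(",")[1]
        return (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {date_col} >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1' MONTH) "
            f"AND {date_col} < DATE_TRUNC('month', CURRENT_DATE)"
        )


    # Stub tools emulating MCP responses for the pet-adoption example.
    stub_tools = {
        "search": lambda args: [
            {"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.public.adoptions,PROD)"}
        ],
        "list_schema_fields": lambda args: [
            {"name": "pet_id", "type": "string"},
            {"name": "adoption_date", "type": "date"},
        ],
        "get_dataset_queries": lambda args: [],
    }

    sql = answer_question(stub_tools, "pet adoptions")
    print(sql)
    ```

    A real agent would replace each stub with a call through its MCP client session, and would typically weigh several search hits rather than taking the first.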

    Summary of Tools Used

    | Tool Name | Purpose |
    |---|---|
    | search | Find relevant datasets for the question. |
    | list_schema_fields | Understand dataset structure. |
    | get_lineage | Assess data authority and provenance. |
    | get_dataset_queries | Learn how the dataset is typically queried. |
    | get_entities | Retrieve related entities for context. |
    | get_lineage_paths_between | Understand deeper relationships between datasets. |

    Developing

    See DEVELOPING.md.

