Model Context Protocol (MCP) server with an HTTP API endpoint to access data from various open access data publishers
EOSC Data Commons Search server
A server for the EOSC Data Commons project MatchMaker service, providing natural language search over open-access datasets. It exposes an HTTP POST endpoint and supports the Model Context Protocol (MCP) to help users discover datasets and tools via a Large Language Model-assisted search.
Endpoints
The HTTP API comprises 2 main endpoints:
- /mcp: MCP server that searches for relevant data to answer a user question using the EOSC Data Commons OpenSearch service
  - Uses Streamable HTTP transport
  - Available tools:
    - [x] Search datasets
    - [x] Get metadata for the files in a dataset (name, description, type of files)
    - [ ] Search tools
    - [ ] Search citations related to datasets or tools
- /chat: HTTP POST endpoint (JSON) for chatting with the MCP server tools via an LLM provider (API key provided through env variable at deployment)
  - Streams Server-Sent Events (SSE) response complying with the AG-UI protocol.
[!TIP]
It can also be used just as an MCP server through the pip package.
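As a rough illustration of the /chat endpoint described above, the following Python sketch builds a request body in the same shape as the curl example in the development section of this README, and decodes a Server-Sent Events stream into JSON events. The helper names (build_chat_request, iter_sse_events) and the sample event fields (type, delta) are assumptions made for illustration; the actual event schema is defined by the AG-UI protocol and is not spelled out in this README.

```python
import json
from typing import Iterable, Iterator


def build_chat_request(question: str, model: str = "cesnet/qwen3-coder") -> dict:
    """Build the JSON body for POST /chat, mirroring the shape of the curl example."""
    return {
        "items": [
            {"type": "message", "role": "user", "content": [{"text": question}]}
        ],
        "model": model,
    }


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    In the SSE format, an event is a run of `data:` lines terminated by a blank line.
    Comment lines (starting with `:`) and other fields are ignored in this sketch.
    """
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            yield json.loads("\n".join(data_lines))
            data_lines = []


# Hypothetical stream content, used only to exercise the parser offline.
sample_stream = [
    'data: {"type": "TEXT_MESSAGE_CONTENT", "delta": "Hello"}\n',
    "\n",
    ": keep-alive comment\n",
    'data: {"type": "TEXT_MESSAGE_CONTENT", "delta": " world"}\n',
    "\n",
]
events = list(iter_sse_events(sample_stream))
text = "".join(event.get("delta", "") for event in events)
payload = build_chat_request("Educational datasets from Switzerland")
```

Against a running server, the same parser could be fed the lines of the HTTP response body instead of the in-memory sample.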
Connect to MCP server
The system can be used directly as an MCP server using either STDIO or Streamable HTTP transport.
[!WARNING]
You will need access to a pre-indexed OpenSearch instance for the MCP server to work.
Follow the instructions of your client, and use the /mcp URL of your deployed server (e.g. http://localhost:8000/mcp)
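For readers who want to see what a client sends to the /mcp URL, here is a minimal sketch of the first JSON-RPC message of an MCP session over Streamable HTTP, built with the Python standard library. The protocol version string, the client name and version, and the function name build_initialize_request are assumptions chosen for illustration; a real client library would normally handle this handshake for you.

```python
import json
import urllib.request


def build_initialize_request(mcp_url: str) -> urllib.request.Request:
    """Prepare (but do not send) an MCP `initialize` request for Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use the one your client and server agree on.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        mcp_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request("http://localhost:8000/mcp")
# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```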
To add a new MCP server to VSCode GitHub Copilot:
- Open the Command Palette (ctrl+shift+p or cmd+shift+p)
- Search for MCP: Add Server...
- Choose HTTP, and provide the MCP server URL http://localhost:8000/mcp
Your VSCode mcp.json should look like:
{
"servers": {
"data-commons-search-http": {
"url": "http://localhost:8000/mcp",
"type": "http"
}
},
"inputs": []
}
Or with STDIO transport:
{
"servers": {
"data-commons-search": {
"type": "stdio",
"command": "uvx",
"args": ["data-commons-search"],
"env": {
"OPENSEARCH_URL": "OPENSEARCH_URL"
}
}
}
}
Or using a local folder for development:
{
"servers": {
"data-commons-search": {
"type": "stdio",
"cwd": "~/dev/data-commons-search",
"env": {
"OPENSEARCH_URL": "OPENSEARCH_URL"
},
"command": "uv",
"args": ["run", "data-commons-search"]
}
}
}
Development
[!IMPORTANT]
Requirements:
- [x] [uv](https://docs.astral.sh/uv/getting-started/installation/), to easily handle scripts and virtual environments
- [x] docker, to deploy the OpenSearch service (or just access to a running instance)
- [x] API key for an LLM provider: e-infra CZ, Mistral.ai, or OpenRouter
Install dev dependencies
uv sync --all-extras
Install pre-commit hooks:
uv run --all-extras pre-commit install
Create a keys.env file with your LLM provider API key(s), and optionally other configurations:
CESNET_API_KEY=YOUR_API_KEY
MISTRAL_API_KEY=YOUR_API_KEY
OPENROUTER_API_KEY=YOUR_API_KEY
OIDC_CLIENT_ID=
OIDC_CLIENT_SECRET=
LANGFUSE_HOST=https://cloud.langfuse.com
LANGFUSE_PUBLIC_KEY=
LANGFUSE_SECRET_KEY=
OPENSEARCH_URL=http://localhost:9200
REDIS_URL=redis://localhost:6379
POSTGRES_USER=app
POSTGRES_PASSWORD=app_password
POSTGRES_HOST=localhost
LOG_CONFIG=logging.dev.yml
RATE_LIMITING_ENABLED=False
Start dev server
[!IMPORTANT]
The search system needs to connect to a PostgreSQL database to store authenticated users' conversations. Deploy and initialize the metadata-warehouse first; see the Database section below for more details on managing the database.
Start the server in dev at http://localhost:8000, with MCP endpoint at http://localhost:8000/mcp pointing to a running OpenSearch instance:
uv run --all-extras uvicorn src.data_commons_search.main:app --reload
Default OPENSEARCH_URL=http://localhost:9200
Customize server port through environment variable:
OPENSEARCH_URL=http://localhost:9200 SERVER_PORT=8001 uv run --all-extras uvicorn src.data_commons_search.main:app --host 0.0.0.0 --port 8001 --reload
[!NOTE]
You can deploy the matchmaker frontend in dev on the side, pointing to this dev server:
```sh
cd ../matchmaker
npm run dev
```
[!TIP]
Example curl request:
```sh
curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" \
-d '{"items": [{"type": "message", "role": "user", "content": [{"text": "Educational datasets from Switzerland covering student assessments, language competencies, and learning outcomes, including experimental or longitudinal studies on pupils or students."}]}], "model": "cesnet/qwen3-coder"}'
```
With an authenticated user's access token obtained from http://127.0.0.1:8000/auth/login:
```sh
curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" \
-H "Cookie: access_token=$ACCESS_TOKEN" \
-d '{"items": [{"type": "message", "role": "user", "content": [{"text": "Educational datasets from Switzerland covering student assessments, language competencies, and learning outcomes, including experimental or longitudinal studies on pupils or students."}]}], "model": "cesnet/qwen3-coder"}'
```
Get last conversation:
```sh
curl -X GET "http://localhost:8000/conversation/$(curl -s http://localhost:8000/conversations -H "Content-Type: application/json" -H "Cookie: access_token=$ACCESS_TOKEN" | jq -r '.[-1].thread_id')" -H "Content-Type: application/json" -H "Cookie: access_token=$ACCESS_TOKEN"
```
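The nested curl command above first lists the user's conversations, then uses the jq expression `.[-1].thread_id` to pick the last entry. The same selection step can be sketched in Python as follows; the function name last_conversation_url and the sample thread identifiers are hypothetical, and the only field assumed on each conversation is thread_id, as used by the jq expression.

```python
from typing import Optional


def last_conversation_url(
    conversations: list, base_url: str = "http://localhost:8000"
) -> Optional[str]:
    """Return the /conversation/<thread_id> URL for the last listed conversation."""
    if not conversations:
        return None  # nothing stored yet for this user
    thread_id = conversations[-1]["thread_id"]
    return f"{base_url}/conversation/{thread_id}"


# Hypothetical response body of GET /conversations, for illustration only.
sample_conversations = [{"thread_id": "thread-a"}, {"thread_id": "thread-b"}]
url = last_conversation_url(sample_conversations)
```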
List the models available from the CESNET provider:
```sh
curl -H "Authorization: Bearer $CESNET_API_KEY" https://llm.ai.e-infra.cz/v1/models | jq ".data[].id"
```
Recommended model: cesnet/qwen3-coder or cesnet/gpt-oss-120b (smaller, faster)
Database
The search system needs to connect to a PostgreSQL database to store authenticated users' conversations. To provide it, deploy and initialize the metadata-warehouse:
cd metadata-warehouse
docker compose up postgres
Initialize db (from metadata-warehouse repo):
uv run --directory scripts/postgres_data create_db.py --db appdb --reset
[!IMPORTANT]
For staging and production environments you will want to update the app user password:
ALTER USER app WITH PASSWORD 'newpassword';
Reset db:
docker compose down --volumes --remove-orphans
Export schema from db.py to metadata-warehouse (run this command at the root of the data-commons-search repo; it expects the metadata-warehouse folder to sit alongside the data-commons-search folder):
uv run scripts/export_db_schema.py ../metadata-warehouse/scripts/postgres_data/create_sql/appdb/tables.sql
Build for production
Build the package (source distribution and wheel) in dist/:
uv build
Deploy with Docker
Create a keys.env file with the API keys:
CESNET_API_KEY=YOUR_API_KEY
MISTRAL_API_KEY=YOUR_API_KEY
OPENROUTER_API_KEY=YOUR_API_KEY
SEARCH_API_KEY=SECRET_KEY_YOU_CAN_USE_IN_FRONTEND_TO_AVOID_SPAM
[!TIP]
SEARCH_API_KEY can be used to add a layer of protection against bots that might spam the LLM. If it is not provided, no API key is needed to query the API.
You can use the prebuilt docker image [ghcr.io/eosc-data-commons/data-commons-search:main](https://github.com/EOSC-Data-Commons/data-commons-search/pkgs/container/data-commons-search)
Example compose.yml:
services:
  mcp:
    image: ghcr.io/eosc-data-commons/data-commons-search:main
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      OPENSEARCH_URL: "http://opensearch:9200"
      CESNET_API_KEY: "${CESNET_API_KEY}"
Build and deploy the service:
docker compose up
[!IMPORTANT]
Deployment to the staging server is currently done automatically through GitHub Actions at each push to the main branch.
When a push is made the workflow will:
- Pull the main branch from the frontend repository
- Build the frontend, and add it to src/data_commons_search/webapp
- Build the docker image for the server
- Publish the docker image as main/latest
- The staging infrastructure then automatically pulls the latest version of the image and deploys it.
Run tests
[!CAUTION]
You need to first start the server on port 8000 (see start dev server section) and PostgreSQL.
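Since the tests depend on a server already listening on port 8000, a quick reachability check before launching them can save a confusing failure. The sketch below uses only the standard library; the function name is_port_open is hypothetical, and the check only confirms that something accepts TCP connections on the port, not that it is the expected service.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8000 is the dev server port used throughout this README.
    if is_port_open("localhost", 8000):
        print("Dev server reachable on port 8000")
    else:
        print("Nothing is listening on port 8000; start the dev server first")
```

The same helper can be pointed at the host and port of your PostgreSQL instance.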
uv run pytest
To display all logs for debugging:
uv run pytest -s
Run search benchmark:
uv run tests/benchmark.py
Format code and type check
uvx ruff format && uvx ruff check --fix && uvx ty check
Reset the environment
Upgrade uv:
uv self update
Clean uv cache:
uv cache clean
Release process
[!IMPORTANT]
Get a PyPI API token at pypi.org/manage/account.
Run the release script providing the version bump: fix, minor, or major
.github/release.sh fix
[!TIP]
Add your PyPI token to your environment, e.g. in ~/.zshrc or ~/.bashrc:
```sh
export UV_PUBLISH_TOKEN=YOUR_TOKEN
```
Acknowledgments
The LLM provider cesnet is a service provided by e-INFRA CZ and operated by CERIT-SC Masaryk University.
Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.