Automatic operation of on-screen GUI.
Documentation
omniparser-autogui-mcp
(日本語版はこちら)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.
License notes
This is MIT license, but Excluding submodules and sub packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
1. Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py(Other than Windows, use `export instead of set`.)
(If you want `langchain_example.py to work, uv sync --extra langchain` instead.)
2. Add this to your `claude_desktop_config.json`:
{
"mcpServers": {
"omniparser_autogui_mcp": {
"command": "uv",
"args": [
"--directory",
"D:\\CLONED_PATH\\omniparser-autogui-mcp",
"run",
"omniparser-autogui-mcp"
],
"env": {
"PYTHONIOENCODING": "utf-8",
"OCR_LANG": "en"
}
}
}
}(Replace `D:\\CLONED_PATH\\omniparser-autogui-mcp` with the directory you cloned.)
`env` allows for the following additional configurations:
- `
OMNI_PARSER_BACKEND_LOAD`
If it does not work with other clients (such as LibreChat), specify `1`.
- `
TARGET_WINDOW_NAME`
If you want to specify the window to operate, please specify the window name.
If not specified, operates on the entire screen.
- `
OMNI_PARSER_SERVER`
If you want OmniParser processing to be done on another device, specify the server's address and port, such as `127.0.0.1:8000`.
The server can be started with `uv run omniparserserver`.
- `
SSE_HOST,SSE_PORT`
If specified, communication will be done via SSE instead of stdio.
- `
SOM_MODEL_PATH,CAPTION_MODEL_NAME,CAPTION_MODEL_PATH,OMNI_PARSER_DEVICE,BOX_TRESHOLD`
These are for OmniParser configuration.
Usually, they are not necessary.
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.
Similar MCP
Based on tags & features
Trending MCP
Most active this week