MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
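The primary/fallback behaviour listed above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: `describe_with_fallback` and the `Describer` type are hypothetical names, and the server's actual fallback logic may differ.

```python
from typing import Callable, Optional

# Hypothetical type: a provider is any callable that turns image bytes into text.
Describer = Callable[[bytes], str]


def describe_with_fallback(
    image: bytes,
    primary: Describer,
    fallback: Optional[Describer] = None,
) -> str:
    """Try the primary provider; use the fallback only if the primary raises."""
    try:
        return primary(image)
    except Exception:
        if fallback is None:
            # No fallback configured: surface the original error.
            raise
        return fallback(image)
```

In this sketch the fallback is consulted only when the primary call fails, which matches the idea of FALLBACK_PROVIDER being optional.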
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional) - Required for text extraction feature
  - Windows: Download and install from UB-Mannheim/tesseract
  - Linux: sudo apt-get install tesseract-ocr
  - macOS: brew install tesseract
Installation
1. Clone the repository:
   git clone https://github.com/mario-andreschak/mcp-image-recognition.git
   cd mcp-image-recognition
2. Create and configure your environment file:
   cp .env.example .env
   # Edit .env with your API keys and preferences
3. Build the project:
   build.bat
Usage
Running the Server
Start the server using Python:
python -m image_recognition_server.server
Start the server using the batch script instead:
run.bat server
Start the server in development mode with the MCP Inspector:
run.bat debug
Available Tools
1. describe_image
- Input: Base64-encoded image data and MIME type
- Output: Detailed description of the image
2. describe_image_from_file
- Input: Path to an image file
- Output: Detailed description of the image
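As a rough sketch of preparing input for describe_image, the snippet below reads a file, checks its extension against the formats listed under Features, and Base64-encodes it. The argument names `image` and `mime_type` and the helper `build_describe_image_args` are assumptions for illustration; check the server's tool schema for the real parameter names.

```python
import base64
from pathlib import Path

# Extensions for the formats listed under Features (JPEG, PNG, GIF, WebP).
EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> dict:
    """Return hypothetical describe_image arguments: Base64 data plus MIME type."""
    suffix = Path(path).suffix.lower()
    mime_type = EXTENSION_TO_MIME.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    data = Path(path).read_bytes()
    # The key names below are assumptions, not confirmed by the source.
    return {
        "image": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }
```

For describe_image_from_file, only the path would be passed and the server would read the file itself.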
Environment Configuration
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to Tesseract executable.
- OPENAI_MODEL: OpenAI Model (default: gpt-4o-mini). Can use OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
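To show how these variables fit together, here is a minimal sketch of a loader that reads them into a typed object. It is not the server's actual configuration code: `Settings` and `load_settings` are hypothetical names, and every default except the gpt-4o-mini model (VISION_PROVIDER=anthropic, LOG_LEVEL=INFO, ENABLE_OCR=false) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]
    enable_ocr: bool
    tesseract_cmd: Optional[str]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical loader)."""
    provider = env.get("VISION_PROVIDER", "anthropic").lower()  # default is assumed
    if provider not in ("anthropic", "openai"):
        raise ValueError(f"VISION_PROVIDER must be anthropic or openai, got {provider!r}")
    timeout = env.get("OPENAI_TIMEOUT")
    return Settings(
        vision_provider=provider,
        fallback_provider=env.get("FALLBACK_PROVIDER") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_model=env.get("OPENAI_MODEL") or "gpt-4o-mini",  # documented default
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        openai_timeout=float(timeout) if timeout else None,
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",  # assumed default
        tesseract_cmd=env.get("TESSERACT_CMD") or None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed default
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.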
Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
1. Obtain an API key from OpenRouter.
2. Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
3. Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
4. Set OPENAI_MODEL to the desired model using the OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
5. Set VISION_PROVIDER to openai.
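The steps above amount to four entries in .env. As a small sketch, the hypothetical helper below (`openrouter_env` is not part of the project) renders those entries as text; the API key value is a placeholder to replace with your own.

```python
def openrouter_env(api_key: str, model: str = "anthropic/claude-3.5-sonnet:beta") -> str:
    """Render the .env entries for OpenRouter described in the steps above."""
    settings = {
        "OPENAI_API_KEY": api_key,  # your OpenRouter key goes here
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
        "OPENAI_MODEL": model,
        "VISION_PROVIDER": "openai",
    }
    return "\n".join(f"{name}={value}" for name, value in settings.items())


if __name__ == "__main__":
    # Placeholder key for illustration only.
    print(openrouter_env("<your-openrouter-api-key>"))
```

The printed lines can be pasted into .env alongside any other settings you use.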
Default Models
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
- OpenRouter: Use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.
Development
Running Tests
Run all tests:
run.bat test
Run a specific test suite:
run.bat test server
run.bat test anthropic
run.bat test openai
Docker Support
Build the Docker image:
docker build -t mcp-image-recognition .
Run the container:
docker run -it --env-file .env mcp-image-recognition
License
MIT License - see LICENSE file for details.
Release History
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support