Fetcher MCP

An MCP server for fetching web page content using a Playwright headless browser.


Advantages

JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

bash

npx -y fetcher-mcp

First-time setup: install the required browser by running the following command in your terminal:

bash

npx playwright install chromium

HTTP and SSE Transport

Use the --transport=http parameter to start the Streamable HTTP endpoint and the SSE endpoint at the same time:

bash

npx -y fetcher-mcp --log --transport=http --host=0.0.0.0 --port=3000

After startup, the server provides the following endpoints:

/mcp - Streamable HTTP endpoint (modern MCP protocol)
/sse - SSE endpoint (legacy MCP protocol)

Clients can connect through either endpoint, depending on which transport they support.
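As a rough illustration of how a client might derive the two endpoint URLs from the --host and --port values used above, consider the sketch below. The helper name `endpointUrls` is hypothetical and not part of Fetcher MCP, and plain `http` plus a client on the same machine are assumptions.

```typescript
// Hypothetical helper (not part of Fetcher MCP): derive the two endpoint
// URLs from the host and port the server was started with.
interface FetcherEndpoints {
  streamableHttp: string; // modern MCP transport, served at /mcp
  sse: string; // legacy MCP transport, served at /sse
}

function endpointUrls(host: string, port: number): FetcherEndpoints {
  // 0.0.0.0 is a bind address, not a destination. Assumption: the client
  // runs on the same machine, so it connects to localhost instead.
  const clientHost = host === "0.0.0.0" ? "localhost" : host;
  const base = `http://${clientHost}:${port}`;
  return { streamableHttp: `${base}/mcp`, sse: `${base}/sse` };
}

const endpoints = endpointUrls("0.0.0.0", 3000);
console.log(endpoints.streamableHttp); // http://localhost:3000/mcp
console.log(endpoints.sse); // http://localhost:3000/sse
```

A client that supports Streamable HTTP would normally use the first URL and fall back to the second only if it speaks the legacy SSE transport.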

Debug Mode

Run with the --debug option to show the browser window for debugging:

bash

npx -y fetcher-mcp --debug

Configuration MCP

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
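If a setup script needs to produce this entry, for instance to append the --debug flag described earlier, something like the following sketch could be used. The names `fetcherServerEntry` and `addFetcher` are hypothetical, and the config type covers only the two fields shown in the example above.

```typescript
// Shape of one entry under "mcpServers", limited to the fields used above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: build the "fetcher" entry, optionally with --debug.
function fetcherServerEntry(debug = false): McpServerEntry {
  const args = ["-y", "fetcher-mcp"];
  if (debug) args.push("--debug");
  return { command: "npx", args };
}

// Hypothetical helper: add the entry without disturbing other servers.
function addFetcher(config: ClaudeDesktopConfig, debug = false): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: { ...config.mcpServers, fetcher: fetcherServerEntry(debug) },
  };
}

const updatedConfig = addFetcher({ mcpServers: {} }, true);
console.log(JSON.stringify(updatedConfig, null, 2));
```

Returning a new object rather than mutating the input keeps any existing server entries intact if the write to disk fails partway.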

Docker Deployment

Running with Docker

bash

docker run -p 3000:3000 ghcr.io/jae-jae/fetcher-mcp:latest

Deploying with Docker Compose

Create a docker-compose.yml file:

yaml

version: "3.8"

services:
  fetcher-mcp:
    image: ghcr.io/jae-jae/fetcher-mcp:latest
    container_name: fetcher-mcp
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    # Using host network mode on Linux hosts can improve browser access efficiency
    # network_mode: "host"
    volumes:
      # For Playwright, may need to share certain system paths
      - /tmp:/tmp
    # Health check
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000"]
      interval: 30s
      timeout: 10s
      retries: 3

Then run:

bash

docker-compose up -d

Features

fetch_url - Retrieve web page content from a specified URL

Uses a Playwright headless browser to execute JavaScript on the page
Supports intelligent extraction of main content and conversion to Markdown
Supports the following parameters:
  url: The URL of the web page to fetch (required)
  timeout: Page loading timeout in milliseconds; default is 30000 (30 seconds)
  waitUntil: When navigation is considered complete; options are 'load', 'domcontentloaded', 'networkidle', and 'commit'; default is 'load'
  extractContent: Whether to intelligently extract the main content; default is true
  maxLength: Maximum length of returned content, in characters; default is no limit
  returnHtml: Whether to return HTML content instead of Markdown; default is false
  waitForNavigation: Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification); default is false
  navigationTimeout: Maximum time to wait for additional navigation, in milliseconds; default is 10000 (10 seconds)
  disableMedia: Whether to block media resources (images, stylesheets, fonts, media); default is true
  debug: Whether to enable debug mode (showing the browser window); when specified, overrides the --debug command line flag
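The parameter list above can be written down as a type, which makes it easier to see what a call looks like on the wire. The sketch below wraps the arguments in the JSON-RPC `tools/call` request that MCP uses for tool invocations. The helper name `fetchUrlRequest` is hypothetical; in practice an MCP client library usually builds this envelope for you.

```typescript
// Arguments accepted by fetch_url, as documented in the list above.
interface FetchUrlArgs {
  url: string;
  timeout?: number; // ms, documented default 30000
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit";
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number; // ms, documented default 10000
  disableMedia?: boolean;
  debug?: boolean;
}

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wrap fetch_url arguments in a tools/call request.
function fetchUrlRequest(id: number, args: FetchUrlArgs): ToolCallRequest<FetchUrlArgs> {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch_url", arguments: args },
  };
}

const exampleRequest = fetchUrlRequest(1, {
  url: "https://example.com",
  waitUntil: "networkidle",
  maxLength: 5000,
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

Omitted optional fields are simply left out of the request, so the server-side defaults listed above apply.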

fetch_urls - Retrieve web page content from multiple URLs in parallel

Uses multi-tab parallel fetching for improved performance
Returns combined results with clear separation between web pages
Supports the following parameters:
  urls: Array of URLs to fetch (required)
  All other parameters are the same as for fetch_url
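Because fetch_urls takes one set of options for the whole batch, it can help to prepare the arguments in one place. The following is a minimal client-side sketch: `fetchUrlsArgs` is a hypothetical helper, and the de-duplication and http(s) check are conveniences assumed here, not behavior of the server.

```typescript
// Options shared by every URL in a batch: a subset of the fetch_url
// parameters, which fetch_urls is documented to accept unchanged.
interface SharedFetchOptions {
  timeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
  maxLength?: number;
}

interface FetchUrlsArgs extends SharedFetchOptions {
  urls: string[];
}

// Hypothetical helper: trim and de-duplicate the list, reject anything
// that is not an http(s) URL, then attach the shared options.
function fetchUrlsArgs(urls: string[], options: SharedFetchOptions = {}): FetchUrlsArgs {
  const unique = Array.from(new Set(urls.map((u) => u.trim())));
  if (unique.length === 0) {
    throw new Error("fetch_urls needs at least one URL");
  }
  const invalid = unique.filter((u) => !/^https?:\/\//i.test(u));
  if (invalid.length > 0) {
    throw new Error(`not http(s) URLs: ${invalid.join(", ")}`);
  }
  return { ...options, urls: unique };
}

const batch = fetchUrlsArgs(
  ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
  { timeout: 60000 },
);
console.log(batch.urls.length); // 2
```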

browser_install - Install the Playwright Chromium browser binary automatically

Installs the required Chromium browser binary when it is not available
Suggested automatically when browser installation errors occur
Supports the following parameters:
  withDeps: Install system dependencies required by the Chromium browser; default is false
  force: Force installation even if Chromium is already installed; default is false
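For completeness, the two browser_install parameters can be modeled the same way. This sketch only fills in the documented defaults and produces the `params` object of a `tools/call` request; `browserInstallParams` is a hypothetical name.

```typescript
// Arguments for browser_install with the documented defaults made explicit.
interface BrowserInstallArgs {
  withDeps: boolean; // also install system dependencies for Chromium
  force: boolean; // reinstall even if Chromium is already present
}

// Hypothetical helper: fill in defaults and name the tool to call.
function browserInstallParams(
  overrides: Partial<BrowserInstallArgs> = {},
): { name: string; arguments: BrowserInstallArgs } {
  return {
    name: "browser_install",
    arguments: {
      withDeps: overrides.withDeps ?? false,
      force: overrides.force ?? false,
    },
  };
}

console.log(JSON.stringify(browserInstallParams({ withDeps: true })));
// {"name":"browser_install","arguments":{"withDeps":true,"force":false}}
```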

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

code

Please wait for the page to fully load

Sets waitForNavigation: true.

Increase Timeout Duration: For websites that load slowly:

code

Please set the page loading timeout to 60 seconds

This adjusts both timeout and navigationTimeout parameters accordingly.
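In terms of tool arguments, the two prompts in this section roughly translate to the values sketched below. `slowSiteArgs` is a hypothetical helper, and giving timeout and navigationTimeout the same value is an assumption; the exact numbers chosen in practice depend on how the model interprets the prompt.

```typescript
// Arguments for a page that needs extra time or passes through a
// verification step before the real content loads.
interface SlowSiteArgs {
  url: string;
  waitForNavigation: boolean;
  timeout: number; // ms
  navigationTimeout: number; // ms
}

// Hypothetical helper: convert a timeout in seconds to the two
// millisecond parameters and enable waiting for a follow-up navigation.
function slowSiteArgs(url: string, timeoutSeconds: number): SlowSiteArgs {
  if (!Number.isFinite(timeoutSeconds) || timeoutSeconds <= 0) {
    throw new RangeError("timeoutSeconds must be a positive number");
  }
  const ms = Math.round(timeoutSeconds * 1000);
  // Assumption: both timeouts are raised to the same value.
  return { url, waitForNavigation: true, timeout: ms, navigationTimeout: ms };
}

const slowArgs = slowSiteArgs("https://example.com", 60);
console.log(slowArgs.timeout, slowArgs.navigationTimeout); // 60000 60000
```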

Content Retrieval Adjustments

Preserve Original HTML Structure: When content extraction might fail:

code

Please preserve the original HTML content

Sets extractContent: false and returnHtml: true.

Fetch Complete Page Content: When extracted content is too limited:

code

Please fetch the complete webpage content instead of just the main content

Sets extractContent: false.

Return Content as HTML: When HTML format is needed instead of default Markdown:

code

Please return the content in HTML format

Sets returnHtml: true.
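The three adjustments above differ only in two flags, which the following sketch lays out side by side. The mode names (`preserveHtml`, `fullPage`, `htmlOutput`) are labels invented for this illustration; only the flag values come from the descriptions above.

```typescript
interface ContentFlags {
  extractContent: boolean;
  returnHtml: boolean;
}

// Illustrative labels for the prompts above (not names used by the server).
type ContentMode = "default" | "preserveHtml" | "fullPage" | "htmlOutput";

const DOCUMENTED_DEFAULTS: ContentFlags = { extractContent: true, returnHtml: false };

// Flags each prompt changes relative to the defaults.
const CONTENT_MODES: Record<ContentMode, Partial<ContentFlags>> = {
  default: {},
  preserveHtml: { extractContent: false, returnHtml: true },
  fullPage: { extractContent: false },
  htmlOutput: { returnHtml: true },
};

// Resolve a mode to the complete pair of flags sent to fetch_url.
function contentFlags(mode: ContentMode): ContentFlags {
  return { ...DOCUMENTED_DEFAULTS, ...CONTENT_MODES[mode] };
}

console.log(contentFlags("fullPage")); // { extractContent: false, returnHtml: false }
console.log(contentFlags("htmlOutput")); // { extractContent: true, returnHtml: true }
```

Note that `htmlOutput` still runs content extraction, so it returns the main content as HTML, while `preserveHtml` returns the page's HTML without extraction.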

Debugging and Authentication

Enabling Debug Mode

Dynamic Debug Activation: To display the browser window during a specific fetch operation:

code

Please enable debug mode for this fetch operation

This sets debug: true even if the server was started without the --debug flag.
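The precedence between the per-request debug parameter and the --debug flag can be summarized in a few lines. This is a sketch of the documented behavior (the request value wins when it is specified), not the server's actual source; `effectiveDebug` is a hypothetical name.

```typescript
// Hypothetical helper: decide whether the browser window is shown.
// A debug value on the request overrides the command line flag;
// when the request omits it, the flag decides.
function effectiveDebug(requestDebug: boolean | undefined, startedWithDebugFlag: boolean): boolean {
  return requestDebug ?? startedWithDebugFlag;
}

console.log(effectiveDebug(true, false)); // true
console.log(effectiveDebug(undefined, true)); // true
console.log(effectiveDebug(false, true)); // false
```

Under this reading, an explicit debug: false would also keep a single request headless on a server that was started with --debug.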

Using Custom Cookies for Authentication

Manual Login: To log in using your own credentials:

code

Please run in debug mode so I can manually log in to the website

Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

Interacting with Debug Browser: When debug mode is enabled:

1. The browser window remains open

2. You can manually log into the website using your credentials

3. After login is complete, content will be fetched with your authenticated session

Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

code

Please enable debug mode for this authentication step

Sets debug: true for this specific request only, opening the browser window for manual login.

Development

Install Dependencies

bash

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

bash

npm run install-browser

Build the Server

bash

npm run build

Debugging

Use MCP Inspector for debugging:

bash

npm run inspector

You can also enable visible browser mode for debugging:

bash

node build/index.js --debug

Related Projects

g-search-mcp: An MCP server for Google search that runs searches for multiple keywords in parallel. Well suited to batch search operations and data collection.

License

Licensed under the MIT License
