Chatterbox TTS MCP Server

A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.

Overview

This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, temporary file management, and audio playback seamlessly.

Features

Single Tool: `speak_text`

The speak_text tool provides complete text-to-speech functionality:

Parameters:
`text` (required): The text to convert to speech
`exaggeration` (optional): Controls expressiveness (0.0 to 1.0, default 0.5)
`cfg_weight` (optional): Controls the classifier-free guidance weight (0.0 to 1.0, default 0.5)
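
As a rough illustration of how these parameters fit together, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send for `speak_text`. The helper name `build_speak_text_call` is hypothetical, and rejecting values outside 0.0 to 1.0 is an assumption about how a caller might guard its input, not documented server behavior.

```python
import json


def build_speak_text_call(text, exaggeration=0.5, cfg_weight=0.5, request_id=1):
    """Build a JSON-RPC `tools/call` request for the speak_text tool.

    Hypothetical client-side helper: the range check mirrors the documented
    0.0 to 1.0 ranges, but the server may validate differently.
    """
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak_text",
            "arguments": {
                "text": text,
                "exaggeration": exaggeration,
                "cfg_weight": cfg_weight,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_speak_text_call("Hello!", exaggeration=0.7), indent=2))
```

Omitting `exaggeration` or `cfg_weight` in this sketch falls back to the documented default of 0.5.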

Features:
Automatic model loading with progress notifications
Generates speech using temporary files (auto-cleanup)
Plays audio automatically on macOS using afplay
Real-time progress updates during all phases: model initialization and loading, speech generation, and audio playback

Resource: `chatterbox://model-info`

Get information about the TTS model status and device capabilities:

Model loading status (loaded/not loaded)
Device information (MPS/CUDA/CPU)
Hardware acceleration availability

Progress Notifications

The server provides detailed progress notifications throughout the speech generation process:

1. Model Loading Phase:

"Loading Chatterbox TTS model..."
"Initializing PyTorch device..."
"Loading model weights..."
"Model loaded successfully!"

2. Speech Generation Phase:

"Starting speech generation..."
"Speech generated, saving to temporary file..."

3. Playback Phase:

"Playing audio..."
"Audio playback completed!"

4. Status Updates:

Device selection (MPS/CUDA/CPU)
Voice prompt usage when applicable
Success/error messages
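
The phases above can be sketched as an ordered list of messages pushed through a callback. This is only an illustration: `run_with_progress` and the `notify(step, total, message)` signature are hypothetical, and a real server would route these messages through whatever progress API its MCP SDK provides. The assumption that an already-loaded model skips the loading phase follows from the "loads once on first use" behavior described in this README.

```python
from typing import Callable, List, Tuple

# Phase messages copied from the list above; the phase keys are illustrative.
PHASES: List[Tuple[str, List[str]]] = [
    ("model_loading", [
        "Loading Chatterbox TTS model...",
        "Initializing PyTorch device...",
        "Loading model weights...",
        "Model loaded successfully!",
    ]),
    ("speech_generation", [
        "Starting speech generation...",
        "Speech generated, saving to temporary file...",
    ]),
    ("playback", [
        "Playing audio...",
        "Audio playback completed!",
    ]),
]

Notify = Callable[[int, int, str], None]


def run_with_progress(notify: Notify, model_loaded: bool = False) -> int:
    """Send every phase message to notify(step, total, message), in order.

    The model-loading phase is skipped when the model is already loaded.
    Returns the number of notifications sent.
    """
    messages = [
        message
        for phase, phase_messages in PHASES
        if not (model_loaded and phase == "model_loading")
        for message in phase_messages
    ]
    total = len(messages)
    for step, message in enumerate(messages, start=1):
        notify(step, total, message)
    return total


if __name__ == "__main__":
    run_with_progress(lambda step, total, msg: print(f"[{step}/{total}] {msg}"))
```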

Installation

1. Install dependencies:

```bash
pip install mcp torch torchaudio
```

2. Install Chatterbox TTS:

Follow the Chatterbox TTS installation instructions to ensure the chatterbox.tts module is available.
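
One quick way to confirm the installation is to check that each top-level package can be found before starting the server. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the package list is simply the set of names mentioned in the steps above.

```python
import importlib.util

# Top-level packages named in the installation steps above.
REQUIRED = ("mcp", "torch", "torchaudio", "chatterbox")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the packages are discoverable; it does not import them or verify versions.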

Configuration

Audio File Storage

By default, the server stores audio files in ~/.chatterbox/audio. You can configure a custom location using:

Command line argument:

```bash
python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory
```

Environment variable:

```bash
export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py
```

Priority order:

1. Command line --audio-dir argument (highest priority)

2. CHATTERBOX_AUDIO_DIR environment variable

3. Default: ~/.chatterbox/audio (lowest priority)
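
The priority order above could be implemented along these lines. This is a minimal sketch, not the server's actual code: `resolve_audio_dir` is a hypothetical helper, and passing the environment in as a dictionary is a choice made here to keep the example easy to test.

```python
import os
from pathlib import Path

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"


def resolve_audio_dir(cli_value=None, env=None):
    """Pick the audio directory: CLI argument, then env variable, then default.

    `~` is expanded and relative paths are made absolute.
    """
    env = os.environ if env is None else env
    raw = cli_value or env.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    return Path(raw).expanduser().resolve()


if __name__ == "__main__":
    audio_dir = resolve_audio_dir()
    # Create the directory on demand, as the README says the server does.
    audio_dir.mkdir(parents=True, exist_ok=True)
    print(audio_dir)
```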

Audio File TTL (Time To Live)

By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:

Command line argument:

```bash
python chatterbox_mcp_server.py --audio-ttl-hours 24  # Keep files for 24 hours
```

Environment variable:

```bash
export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py
```

Priority order:

1. Command line --audio-ttl-hours argument (highest priority)

2. CHATTERBOX_AUDIO_TTL_HOURS environment variable

3. Default: 1 hour (lowest priority)
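
The same precedence applies to the TTL. The sketch below is one possible reading of it; `resolve_ttl_hours` is a hypothetical helper, and both the acceptance of fractional hours and the rejection of non-positive values are assumptions rather than documented behavior.

```python
import os

DEFAULT_TTL_HOURS = 1.0


def resolve_ttl_hours(cli_value=None, env=None):
    """Pick the TTL in hours: CLI argument, then env variable, then default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        ttl = float(cli_value)
    elif env.get("CHATTERBOX_AUDIO_TTL_HOURS"):
        ttl = float(env["CHATTERBOX_AUDIO_TTL_HOURS"])
    else:
        ttl = DEFAULT_TTL_HOURS
    if ttl <= 0:
        # Assumption: a zero or negative TTL is treated as a configuration error.
        raise ValueError(f"TTL must be positive, got {ttl}")
    return ttl


if __name__ == "__main__":
    print(resolve_ttl_hours())
```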

Model Auto-Loading

By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:

Command line argument:

```bash
python chatterbox_mcp_server.py --auto-load-model
```

This loads the model during server startup. Startup takes a few seconds longer, but the first TTS request does not have to wait for the model to load.
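
For reference, the three command line options described in this section could be declared with `argparse` roughly as follows. The flag names come from this README; the argument types, defaults, and the `build_parser` helper are assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Declare the documented flags; None means "fall back to env or default"."""
    parser = argparse.ArgumentParser(description="Chatterbox TTS MCP server (sketch)")
    parser.add_argument("--audio-dir", default=None,
                        help="directory for generated audio files")
    parser.add_argument("--audio-ttl-hours", type=float, default=None,
                        help="hours to keep audio files before cleanup")
    parser.add_argument("--auto-load-model", action="store_true",
                        help="load the TTS model at startup instead of on first use")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.auto_load_model:
        # A real server would trigger its model loader here.
        print("Model would be pre-loaded at startup.")
    else:
        print("Model would be loaded on first use.")
```

Leaving `--auto-load-model` off keeps the default lazy behavior, since `store_true` flags default to `False`.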

Audio Storage Features:

Audio files are stored persistently with configurable automatic cleanup
Files are accessible via chatterbox://audio/{resource_id} resources
Directory is created automatically if it doesn't exist
Supports relative paths (will be expanded) and ~ home directory notation
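
A TTL-based cleanup pass could look something like the sketch below. It is an illustration under stated assumptions: `cleanup_expired` is a hypothetical helper, file age is judged by modification time, and only `.wav` files directly inside the audio directory are considered.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_expired(audio_dir, ttl_hours, now=None):
    """Delete .wav files older than ttl_hours; return the removed file names."""
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = []
    for path in Path(audio_dir).glob("*.wav"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


if __name__ == "__main__":
    # Demo in a throwaway directory: one file is backdated by two hours.
    with tempfile.TemporaryDirectory() as tmp:
        old_file = Path(tmp) / "old.wav"
        new_file = Path(tmp) / "new.wav"
        old_file.write_bytes(b"")
        new_file.write_bytes(b"")
        two_hours_ago = time.time() - 2 * 3600
        os.utime(old_file, (two_hours_ago, two_hours_ago))
        print(cleanup_expired(tmp, ttl_hours=1))
```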

Usage

Running the Server

Standalone:

```bash
python chatterbox_mcp_server.py
```

With the MCP development tools (requires the `mcp` command line tool, for example from the `mcp[cli]` extra):

```bash
mcp dev chatterbox_mcp_server.py
```

Integration with Claude Desktop

Add to your Claude Desktop MCP configuration:

Basic configuration:

```json
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": ["/path/to/chatterbox_mcp_server.py"],
      "env": {}
    }
  }
}
```

With custom configuration:

```json
{
  "mcpServers": {
    "chatterbox-tts": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py",
        "--audio-dir", "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours", "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}
```

Example Usage from LLM

1. Basic text-to-speech:

```
Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"
```

2. Expressive speech:

```
Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness
```

The tool will automatically:

Load the model if needed (with progress updates)
Generate the speech
Play the audio
Clean up temporary files
Provide status updates throughout

Technical Details

Device Support

Apple Silicon (M1/M2/M3/M4): Uses MPS acceleration when available
NVIDIA GPUs: Uses CUDA when available
CPU fallback: Works on any system
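
Device selection along these lines can be sketched with a small pure function plus a PyTorch probe. The preference for MPS over CUDA over CPU follows the order given in this README; the helper names are hypothetical, and falling back to `"cpu"` when PyTorch is not importable is an assumption made so the example runs anywhere.

```python
def pick_device(mps_available, cuda_available):
    """Apply the documented preference: MPS first, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Probe PyTorch for accelerators; fall back to "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = bool(mps_backend and mps_backend.is_available())
    return pick_device(mps_available, torch.cuda.is_available())


if __name__ == "__main__":
    print(detect_device())
```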

Audio Processing

Uses temporary files for audio storage
Automatic cleanup after playback
WAV format output
High-quality audio generation

Model Management

Model loads once on first use
Shared across all subsequent requests
Thread-safe loading with progress tracking
Automatic device detection and optimization
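
A common way to get "load once, share everywhere, thread-safe" behavior is a lock-guarded lazy holder with a double check. The sketch below shows that general pattern under the assumption that the loader is an ordinary callable; `LazyModel` is a hypothetical name, and this is not necessarily how the server implements it.

```python
import threading


class LazyModel:
    """Load a model at most once, even when several threads ask for it."""

    def __init__(self, loader):
        self._loader = loader          # callable that returns the loaded model
        self._lock = threading.Lock()
        self._model = None

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        if self._model is None:            # fast path: skip the lock once loaded
            with self._lock:
                if self._model is None:    # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model


if __name__ == "__main__":
    lazy = LazyModel(lambda: {"name": "stand-in model"})
    print(lazy.loaded)   # False before first use
    lazy.get()
    print(lazy.loaded)   # True afterwards
```

Passing the loader in as a callable keeps the holder independent of the TTS library, which also makes it easy to substitute a stand-in loader in tests.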

File Structure

```
chatterbox-mcp/
├── chatterbox_mcp_server.py    # MCP server implementation
└── README.md                   # This documentation
```

Development

Key Improvements in This Version

1. Simplified Interface: Single speak_text tool instead of multiple tools

2. Automatic Playback: No need to manually play generated files

3. Progress Notifications: Real-time updates on model loading and generation

4. Persistent Audio Storage: Audio files are stored with configurable automatic cleanup

5. Better Error Handling: Comprehensive error reporting and recovery

6. Streamlined Workflow: One command generates and plays speech

Troubleshooting

Common Issues:

1. Model loading is slow:

First-time loading downloads model weights
Progress notifications show current status
Subsequent uses are much faster

2. Audio playback issues:

afplay command is macOS-specific
Ensure system audio is working
Check volume settings

3. Memory issues:

Model requires significant GPU/CPU memory
Monitor system resources during loading
Consider closing other applications

4. Device selection:

Server automatically selects best available device
Check model info resource for current device
Selection priority: MPS (Apple Silicon), then CUDA (NVIDIA), then CPU
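
When diagnosing the playback issue listed above, it can help to check whether `afplay` is available before trying to play anything. The following is a minimal sketch: `playback_command` and `play` are hypothetical helpers, and returning `None` on non-macOS systems is an assumption about how one might handle that case, since the README only documents macOS playback.

```python
import shutil
import subprocess
import sys


def playback_command(path, platform=sys.platform, which=shutil.which):
    """Return the argv that plays `path`, or None when afplay is unavailable."""
    if platform != "darwin":
        return None                      # afplay ships with macOS only
    executable = which("afplay")
    return [executable, str(path)] if executable else None


def play(path):
    """Play the file if possible; return True when playback ran."""
    command = playback_command(path)
    if command is None:
        return False
    subprocess.run(command, check=True)  # blocks until playback finishes
    return True


if __name__ == "__main__":
    if playback_command("example.wav") is None:
        print("afplay is not available on this system.")
    else:
        print("afplay was found.")
```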

License

This MCP server implementation follows the same license as the underlying Chatterbox TTS model.
