Chatterbox TTS MCP Server
A simplified Model Context Protocol (MCP) server that provides text-to-speech generation with automatic playback using the Chatterbox TTS model. The server loads the model automatically on first use and provides real-time progress notifications to keep users informed throughout the process.
Overview
This MCP server exposes Chatterbox TTS functionality through a single, streamlined tool that generates speech from text and plays it automatically. The server handles model loading, progress reporting, temporary file management, and audio playback seamlessly.
Features
Single Tool: speak_text
The speak_text tool provides complete text-to-speech functionality:
- Parameters:
- text (required): The text to convert to speech
- exaggeration (optional): Controls expressiveness (0.0-1.0, default 0.5)
- cfg_weight (optional): Controls classifier-free guidance (0.0-1.0, default 0.5)
- Features:
- Automatic model loading with progress notifications
- Generates speech using temporary files (auto-cleanup)
- Plays audio automatically on macOS using afplay
- Real-time progress updates during all phases:
- Model initialization and loading
- Speech generation
- Audio playback
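The documented parameter ranges can be enforced before any model work starts. This is a minimal sketch built around a hypothetical validate_speak_args helper. The source does not say whether the real server rejects or clamps out-of-range values, so rejecting them is an assumption made here.

```python
def validate_speak_args(text, exaggeration=0.5, cfg_weight=0.5):
    """Hypothetical helper: check speak_text arguments against the documented ranges."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        # Both optional parameters are documented as 0.0-1.0; inclusive bounds are assumed.
        if not 0.0 <= float(value) <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # Return normalized arguments with the documented defaults filled in.
    return {"text": text, "exaggeration": float(exaggeration), "cfg_weight": float(cfg_weight)}
```

For example, validate_speak_args("Hello") returns the text together with both defaults (0.5). Calling it with exaggeration=1.5 raises ValueError.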
Resource: chatterbox://model-info
Get information about the TTS model status and device capabilities:
- Model loading status (loaded/not loaded)
- Device information (MPS/CUDA/CPU)
- Hardware acceleration availability
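The resource body only needs to report the three items listed above. This minimal sketch serializes them as JSON. The field names and the model_info_payload helper are hypothetical, and the real server may use a different layout. When PyTorch is installed, the two availability flags would normally come from torch.backends.mps.is_available() and torch.cuda.is_available(). Here they are passed in as plain booleans so the sketch runs without torch.

```python
import json

def model_info_payload(loaded, device, mps_available, cuda_available):
    """Hypothetical helper: build the text returned for chatterbox://model-info."""
    return json.dumps({
        "model_loaded": loaded,             # loaded / not loaded
        "device": device,                   # e.g. "mps", "cuda" or "cpu"
        "acceleration": {                   # hardware acceleration availability
            "mps_available": mps_available,
            "cuda_available": cuda_available,
        },
    }, indent=2)
```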
Progress Notifications
The server provides detailed progress notifications throughout the speech generation process:
1. Model Loading Phase:
- "Loading Chatterbox TTS model..."
- "Initializing PyTorch device..."
- "Loading model weights..."
- "Model loaded successfully!"
2. Speech Generation Phase:
- "Starting speech generation..."
- "Speech generated, saving to temporary file..."
3. Playback Phase:
- "Playing audio..."
- "Audio playback completed!"
4. Status Updates:
- Device selection (MPS/CUDA/CPU)
- Voice prompt usage when applicable
- Success/error messages
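The phases above are a fixed sequence of messages, so progress can be reported as "step i of n". The loading phase is skipped when the model is already in memory. This minimal sketch uses a hypothetical report(progress, total, message) callback. How the real server forwards these updates to the MCP client is not documented here, and flattening the phases into one list is an assumption.

```python
# Messages copied from the list above; treating them as one flat sequence is an assumption.
PROGRESS_STEPS = [
    "Loading Chatterbox TTS model...",
    "Initializing PyTorch device...",
    "Loading model weights...",
    "Model loaded successfully!",
    "Starting speech generation...",
    "Speech generated, saving to temporary file...",
    "Playing audio...",
    "Audio playback completed!",
]

def run_with_progress(report, model_loaded):
    """Emit each phase message through the hypothetical report callback; return the step count."""
    # Skip the four model-loading messages when the model is already loaded.
    steps = PROGRESS_STEPS[4:] if model_loaded else PROGRESS_STEPS
    total = len(steps)
    for i, message in enumerate(steps, start=1):
        report(i, total, message)
        # The real work for each phase (load, generate, play) would happen here.
    return total
```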
Installation
1. Install dependencies:
pip install mcp torch torchaudio
2. Install Chatterbox TTS:
Follow the Chatterbox TTS installation instructions to ensure the chatterbox.tts module is available.
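Before starting the server, you can check that every required module is importable. This minimal sketch uses importlib.util.find_spec. The module list mirrors the two steps above, and check_dependencies is a hypothetical helper name.

```python
import importlib.util

REQUIRED_MODULES = ["mcp", "torch", "torchaudio", "chatterbox.tts"]

def check_dependencies(modules=REQUIRED_MODULES):
    """Hypothetical pre-flight check: map each module name to whether it can be found."""
    status = {}
    for name in modules:
        try:
            # For dotted names, find_spec imports the parent package first, so a missing
            # parent (e.g. "chatterbox") raises instead of returning None.
            status[name] = importlib.util.find_spec(name) is not None
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```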
Configuration
Audio File Storage
By default, the server stores audio files in ~/.chatterbox/audio. You can configure a custom location using:
Command line argument:
python chatterbox_mcp_server.py --audio-dir /path/to/custom/audio/directory
Environment variable:
export CHATTERBOX_AUDIO_DIR="/path/to/custom/audio/directory"
python chatterbox_mcp_server.py
Priority order:
1. Command line --audio-dir argument (highest priority)
2. CHATTERBOX_AUDIO_DIR environment variable
3. Default: ~/.chatterbox/audio (lowest priority)
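The priority order above can be resolved in a few lines. This minimal sketch uses argparse and an environment mapping. The resolve_audio_dir name is hypothetical, and treating an empty value as "not set" is an assumption.

```python
import argparse
import os
from pathlib import Path

DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"

def resolve_audio_dir(argv=None, environ=None):
    """Hypothetical helper: CLI flag > CHATTERBOX_AUDIO_DIR > default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio-dir", default=None)
    args, _ = parser.parse_known_args(argv)  # argv=None means sys.argv[1:]
    # `or` falls through on None and on empty strings.
    raw = args.audio_dir or environ.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    return Path(raw).expanduser()  # expand ~ notation
```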
Audio File TTL (Time To Live)
By default, audio files are automatically cleaned up after 1 hour. You can configure a custom TTL:
Command line argument:
python chatterbox_mcp_server.py --audio-ttl-hours 24 # Keep files for 24 hours
Environment variable:
export CHATTERBOX_AUDIO_TTL_HOURS=24
python chatterbox_mcp_server.py
Priority order:
1. Command line --audio-ttl-hours argument (highest priority)
2. CHATTERBOX_AUDIO_TTL_HOURS environment variable
3. Default: 1 hour (lowest priority)
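A TTL cleanup pass deletes files older than the configured number of hours. This minimal sketch assumes three things the source does not specify: age is measured from each file's modification time, only .wav files are considered, and the function is called periodically. The cleanup_expired_audio name is hypothetical.

```python
import os
import time
from pathlib import Path

def cleanup_expired_audio(audio_dir, ttl_hours=1.0, now=None):
    """Hypothetical helper: delete .wav files older than ttl_hours; return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600  # hours -> seconds
    removed = []
    for path in Path(audio_dir).glob("*.wav"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```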
Model Auto-Loading
By default, the TTS model is loaded on first use to minimize startup time. You can pre-load it at startup:
Command line argument:
python chatterbox_mcp_server.py --auto-load-model
This loads the model during server startup. Startup takes longer (including any first-time download of the model weights), but the first TTS request no longer has to wait for the model to load.
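The flag switches between eager loading and lazy loading. This minimal sketch wires it up with argparse. Here load_model is a hypothetical placeholder for the real, slow model load, and returning None to mean "load on first request" is an assumption of the sketch.

```python
import argparse

def load_model():
    """Hypothetical placeholder for the real model load; returns a dummy object here."""
    return object()

def main(argv=None):
    parser = argparse.ArgumentParser(description="Chatterbox TTS MCP server (sketch)")
    parser.add_argument("--auto-load-model", action="store_true",
                        help="load the TTS model at startup instead of on first use")
    args = parser.parse_args(argv)
    # Eager: load now. Lazy: leave as None so the first speak_text call loads it.
    model = load_model() if args.auto_load_model else None
    return model
```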
Audio Storage Features:
- Audio files are stored persistently with configurable automatic cleanup
- Files are accessible via chatterbox://audio/{resource_id} resources
- Directory is created automatically if it doesn't exist
- Supports relative paths (will be expanded) and ~ home directory notation
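The path handling above takes only a few pathlib calls. This minimal sketch expands ~, makes relative paths absolute against the current working directory, and creates the directory. The resource_id scheme is not documented, so using the file name without its extension as the resource_id is a hypothetical choice. Both helper names are also hypothetical.

```python
from pathlib import Path

def prepare_audio_dir(raw_path):
    """Hypothetical helper: expand ~, make the path absolute, and create it if missing."""
    path = Path(raw_path).expanduser().resolve()  # relative paths resolve against the cwd
    path.mkdir(parents=True, exist_ok=True)
    return path

def audio_resource_uri(audio_file):
    """Hypothetical mapping: use the file stem as the resource_id."""
    return f"chatterbox://audio/{Path(audio_file).stem}"
```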
Usage
Running the Server
Standalone:
python chatterbox_mcp_server.py
With the MCP CLI (development mode):
mcp dev chatterbox_mcp_server.py
Integration with Claude Desktop
Add to your Claude Desktop MCP configuration:
Basic configuration:
{
"mcpServers": {
"chatterbox-tts": {
"command": "python",
"args": ["/path/to/chatterbox_mcp_server.py"],
"env": {}
}
}
}
With custom configuration:
{
"mcpServers": {
"chatterbox-tts": {
"command": "python",
"args": [
"/path/to/chatterbox_mcp_server.py",
"--audio-dir", "/custom/audio/path",
"--auto-load-model",
"--audio-ttl-hours", "24"
],
"env": {
"CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
"CHATTERBOX_AUDIO_TTL_HOURS": "24"
}
}
}
}
Example Usage from LLM
1. Basic text-to-speech:
Please use the speak_text tool to say "Hello, welcome to the Chatterbox TTS demonstration!"
2. Expressive speech:
Use speak_text to generate enthusiastic speech for "This is amazing!" with high expressiveness
The tool will automatically:
- Load the model if needed (with progress updates)
- Generate the speech
- Play the audio
- Clean up temporary files
- Provide status updates throughout
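On the wire, each prompt above becomes an MCP tools/call request, which is a JSON-RPC 2.0 message carrying the tool name and its arguments. This minimal sketch builds that message. The numeric value the client picks for "high expressiveness" (0.9 in the usage example) is an assumption, and speak_text_request is a hypothetical helper name.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique per session

def speak_text_request(text, exaggeration=None, cfg_weight=None):
    """Hypothetical helper: build a tools/call request for speak_text."""
    arguments = {"text": text}
    if exaggeration is not None:
        arguments["exaggeration"] = exaggeration
    if cfg_weight is not None:
        arguments["cfg_weight"] = cfg_weight
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "speak_text", "arguments": arguments},
    })
```

For example, speak_text_request("This is amazing!", exaggeration=0.9) yields a request whose arguments are the text plus exaggeration. The server's default applies to cfg_weight.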
Technical Details
Device Support
- Apple Silicon (M1/M2/M3/M4): Uses MPS acceleration when available
- NVIDIA GPUs: Uses CUDA when available
- CPU fallback: Works on any system
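The fallback order above reduces to a short priority check. In this minimal sketch, select_device takes plain booleans so it runs anywhere. The optional detect_device wrapper reads them from PyTorch when it is installed, using torch.backends.mps.is_available() (guarded for older builds without the MPS backend) and torch.cuda.is_available(). Both function names are hypothetical.

```python
def select_device(mps_available, cuda_available):
    """Hypothetical helper: pick a device using the documented priority MPS > CUDA > CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"

def detect_device():
    """Query PyTorch if it is installed; otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    mps_ok = mps is not None and mps.is_available()
    return select_device(mps_ok, torch.cuda.is_available())
```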
Audio Processing
- Uses temporary files for audio storage
- Automatic cleanup after playback
- WAV format output
- High-quality audio generation
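The temporary-file lifecycle (write WAV, play, always delete) can be sketched with the standard library alone. This sketch writes 16-bit mono integer samples with the wave module instead of a model output tensor, so it runs without torch. The play callback is hypothetical, and the real server's write path is not documented here.

```python
import os
import tempfile
import wave

def play_via_temp_wav(samples, sample_rate, play):
    """Write 16-bit mono PCM samples to a temp WAV, pass its path to play(), then delete it."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open reopens the file by name
    try:
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(sample_rate)
            wav.writeframes(b"".join(int(s).to_bytes(2, "little", signed=True) for s in samples))
        play(path)
    finally:
        os.remove(path)  # cleanup after playback, even if play() raises
```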
Model Management
- Model loads once on first use
- Shared across all subsequent requests
- Thread-safe loading with progress tracking
- Automatic device detection and optimization
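"Loads once, shared, thread-safe" is commonly implemented with double-checked locking. This minimal sketch wraps a hypothetical loader callable, which in the server would be whatever actually loads the Chatterbox model. It assumes the loader never returns None.

```python
import threading

class LazyModel:
    """Load a model once, on first use, even if several requests arrive concurrently."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable performing the expensive load
        self._model = None
        self._lock = threading.Lock()

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        if self._model is None:  # fast path: no locking once loaded
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model
```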
File Structure
chatterbox-mcp/
├── chatterbox_mcp_server.py # MCP server implementation
└── README.md # This documentation
Development
Key Improvements in This Version
1. Simplified Interface: Single speak_text tool instead of multiple tools
2. Automatic Playback: No need to manually play generated files
3. Progress Notifications: Real-time updates on model loading and generation
4. Persistent Audio Storage: Audio files are stored with configurable automatic cleanup
5. Better Error Handling: Comprehensive error reporting and recovery
6. Streamlined Workflow: One command generates and plays speech
Troubleshooting
Common Issues:
1. Model loading slow:
- First-time loading downloads model weights
- Progress notifications show current status
- Subsequent uses are much faster
2. Audio playback issues:
- The afplay command is macOS-specific
- Ensure system audio is working
- Check volume settings
3. Memory issues:
- Model requires significant GPU/CPU memory
- Monitor system resources during loading
- Consider closing other applications
4. Device selection:
- Server automatically selects best available device
- Check model info resource for current device
- Priority order: MPS (Apple Silicon) > CUDA (NVIDIA) > CPU
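For audio playback issues, a quick check is whether afplay can run at all on the current machine. This minimal sketch covers only the documented macOS path and fails loudly elsewhere. Both function names are hypothetical, and the server's own error handling may differ.

```python
import shutil
import subprocess
import sys

def build_playback_command(path, platform=sys.platform):
    """Hypothetical helper: return the afplay command, or fail on non-macOS platforms."""
    if platform != "darwin":
        raise RuntimeError("automatic playback uses afplay, which is only available on macOS")
    return ["afplay", str(path)]

def play(path):
    cmd = build_playback_command(path)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)  # blocks until playback finishes
```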
License
This MCP server implementation follows the same license as the underlying Chatterbox TTS model.