Text-to-Speech (TTS)
Text-to-Speech (TTS) functionality for fmus-vox.
This module provides functionality for synthesizing text into speech using various models and techniques.
- fmus_vox.tts.speak(text: str, voice: str = 'default', output: str | None = None, model: str = 'vits', **kwargs) → Audio | None
Synthesize speech from text using a specified model and voice.
This is a simple functional API for quick speech synthesis. For more control, use the Speaker class directly.
- Parameters:
text – Text to synthesize
voice – Voice to use (name or ID)
output – Path to save audio file (if None, returns Audio object)
model – Model to use for synthesis (vits, coqui, etc.)
**kwargs – Additional model-specific parameters
- Returns:
Audio object if output is None, otherwise None
- Raises:
SynthesisError – If synthesis fails
Examples
>>> # Synthesize speech and play it
>>> audio = speak("Hello, world!")
>>> audio.play()

>>> # Synthesize speech and save to file
>>> speak("Hello, world!", output="hello.wav")
Speaker Class
- class fmus_vox.tts.speaker.Speaker(model: str = 'vits', **kwargs)
Bases: object
Base class for text-to-speech synthesis.
This class provides the common interface for all TTS models and handles model loading, voice selection, and synthesis.
- Parameters:
model – Name of the model to use (vits, coqui, etc.)
voice – Voice ID or name to use
device – Computation device (cpu, cuda, auto)
**kwargs – Additional model-specific parameters
- classmethod register_model(name: str, implementation: type) → None
Register a model implementation.
- Parameters:
name – Model name
implementation – Model implementation class
- static __new__(cls, model: str = 'vits', **kwargs) → Speaker
Create a new Speaker instance of the appropriate subclass.
- Parameters:
model – Name of the model to use
**kwargs – Additional model-specific parameters
- Returns:
Speaker instance
- Raises:
ModelError – If the model is not supported
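The registration and dispatch behavior described above could work roughly as follows. This is a minimal sketch of the general registry-plus-`__new__` pattern, not the fmus-vox implementation: `SketchSpeaker`, `SketchVitsSpeaker`, the `_registry` attribute, and the locally defined `ModelError` are hypothetical stand-ins introduced only for illustration.

```python
class ModelError(Exception):
    """Local stand-in for the library's ModelError (assumption)."""


class SketchSpeaker:
    """Illustrative stand-in; not the real fmus_vox Speaker."""

    # Maps model name -> implementation class (hypothetical attribute).
    _registry = {}

    @classmethod
    def register_model(cls, name: str, implementation: type) -> None:
        cls._registry[name] = implementation

    def __new__(cls, model: str = "vits", **kwargs):
        # Dispatch only when the base class itself is being instantiated;
        # subclasses instantiated directly skip the registry lookup.
        if cls is SketchSpeaker:
            try:
                impl = cls._registry[model]
            except KeyError:
                raise ModelError(f"Unsupported model: {model!r}") from None
            return super().__new__(impl)
        return super().__new__(cls)

    def __init__(self, model: str = "vits", voice: str = "default", **kwargs):
        # Runs on the instance returned by __new__, because that instance
        # is still an instance of SketchSpeaker.
        self.model = model
        self.voice = voice


class SketchVitsSpeaker(SketchSpeaker):
    """Hypothetical model-specific subclass."""


SketchSpeaker.register_model("vits", SketchVitsSpeaker)

speaker = SketchSpeaker(model="vits", voice="default")
print(type(speaker).__name__)
```

Under this pattern the caller always writes `SketchSpeaker(...)` and receives whichever subclass is registered for the requested model name; an unregistered name raises `ModelError` before any instance is created.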
- __init__(model: str = 'vits', voice: str = 'default', device: str | None = None, **kwargs)
Initialize the speaker.
- Parameters:
model – Name of the model to use
voice – Voice ID or name to use
device – Computation device (cpu, cuda, auto)
**kwargs – Additional model-specific parameters
- speak(**kwargs)
- speak_with_metadata(**kwargs)
- async speak_async(text: str) → Audio
Synthesize speech from text asynchronously.
- Parameters:
text – Text to synthesize
- Returns:
Audio object with synthesized speech
- Raises:
SynthesisError – If synthesis fails
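Because `speak_async` is a coroutine, several texts can be synthesized concurrently with `asyncio.gather`. The sketch below shows only the calling pattern: `StubSpeaker` is a hypothetical stand-in that returns a placeholder string instead of an `Audio` object, and `synthesize_all` is a helper name invented for this example.

```python
import asyncio


class StubSpeaker:
    """Hypothetical stand-in exposing the same coroutine name as Speaker."""

    async def speak_async(self, text: str) -> str:
        # Yield control to the event loop, as real synthesis work might.
        await asyncio.sleep(0)
        return f"<audio for {text!r}>"


async def synthesize_all(speaker, texts):
    # gather schedules the coroutines concurrently and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(speaker.speak_async(t) for t in texts))


results = asyncio.run(synthesize_all(StubSpeaker(), ["Hello", "world"]))
print(results)
```

With a real speaker, any `SynthesisError` raised by one of the coroutines would propagate out of `gather` unless `return_exceptions=True` is passed.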
- async speak_with_metadata_async(text: str) → SpeechResult
Synthesize speech from text asynchronously with additional metadata.
- Parameters:
text – Text to synthesize
- Returns:
SpeechResult object
- Raises:
SynthesisError – If synthesis fails