Text-to-Speech (TTS)

Text-to-Speech (TTS) functionality for fmus-vox.

This module synthesizes speech from text. Synthesis is handled by interchangeable models (such as vits and coqui) that are selected by name.

fmus_vox.tts.speak(text: str, voice: str = 'default', output: str | None = None, model: str = 'vits', **kwargs) → Audio | None

Synthesize speech from text using a specified model and voice.

This is a simple functional API for quick speech synthesis. For more control, use the Speaker class directly.

Parameters:

text – Text to synthesize
voice – Voice to use (name or ID)
output – Path to save audio file (if None, returns Audio object)
model – Model to use for synthesis (vits, coqui, etc.)
**kwargs – Additional model-specific parameters

Returns:

Audio object if output is None, otherwise None

Raises:

SynthesisError – If synthesis fails

Examples

>>> from fmus_vox.tts import speak
>>> # Synthesize speech and play it
>>> audio = speak("Hello, world!")
>>> audio.play()

>>> # Synthesize speech and save to file
>>> speak("Hello, world!", output="hello.wav")
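The functional API's return contract (an Audio object when output is None, otherwise a file on disk and no return value) can be sketched as a thin wrapper around a speaker class. This is a minimal illustration, not the fmus-vox implementation: StubAudio, StubSpeaker, its synthesize method, and speak_sketch are hypothetical stand-ins so the example runs without the library.

```python
from pathlib import Path
from typing import Optional


class StubAudio:
    """Hypothetical stand-in for the fmus-vox Audio object."""

    def __init__(self, text: str) -> None:
        self.text = text

    def save(self, path: str) -> None:
        # A real Audio object would write waveform data; the stub writes the text.
        Path(path).write_text(self.text)


class StubSpeaker:
    """Hypothetical stand-in for Speaker, used only to illustrate delegation."""

    def __init__(self, model: str = "vits", voice: str = "default", **kwargs) -> None:
        self.model = model
        self.voice = voice
        self.options = kwargs  # model-specific options are forwarded untouched

    def synthesize(self, text: str) -> StubAudio:
        # Hypothetical method name; a real model would run inference here.
        return StubAudio(text)


def speak_sketch(text: str, voice: str = "default", output: Optional[str] = None,
                 model: str = "vits", **kwargs) -> Optional[StubAudio]:
    # Build a speaker for this single call, forwarding model-specific options.
    speaker = StubSpeaker(model=model, voice=voice, **kwargs)
    audio = speaker.synthesize(text)
    if output is None:
        return audio      # caller receives the audio object
    audio.save(output)    # otherwise write it to the requested path...
    return None           # ...and return nothing
```

This mirrors the guidance above: the function is a convenience layer, and code that needs to reuse one configured voice across many calls would keep a speaker object instead.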

Speaker Class

class fmus_vox.tts.speaker.Speaker(model: str = 'vits', **kwargs)

Bases: object

Base class for text-to-speech synthesis.

This class provides the common interface for all TTS models and handles model loading, voice selection, and synthesis.

Parameters:

model – Name of the model to use (vits, coqui, etc.)
voice – Voice ID or name to use
device – Computation device (cpu, cuda, auto)
**kwargs – Additional model-specific parameters

classmethod register_model(name: str, implementation: type) → None

Register a model implementation.

Parameters:

name – Model name
implementation – Model implementation class

static __new__(cls, model: str = 'vits', **kwargs) → Speaker

Create a new Speaker instance of the appropriate subclass.

Parameters:

model – Name of the model to use
**kwargs – Additional model-specific parameters

Returns:

Speaker instance

Raises:

ModelError – If the model is not supported
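register_model and __new__ together describe a registry-plus-factory pattern: implementations are registered under a name, and constructing the base class returns an instance of the registered subclass. The sketch below shows one way that pattern can work in plain Python. It is an assumption about the design, not the library's source; SpeakerSketch, VitsSketch, and the local ModelError class are hypothetical.

```python
from typing import Dict


class ModelError(Exception):
    """Local stand-in for the fmus-vox ModelError."""


class SpeakerSketch:
    # Hypothetical registry mapping model names to implementation classes.
    _registry: Dict[str, type] = {}

    @classmethod
    def register_model(cls, name: str, implementation: type) -> None:
        cls._registry[name] = implementation

    def __new__(cls, model: str = "vits", **kwargs):
        target = cls
        # Dispatch only when the base class itself is constructed;
        # constructing a subclass directly bypasses the registry.
        if cls is SpeakerSketch:
            try:
                target = SpeakerSketch._registry[model]
            except KeyError:
                raise ModelError(f"Unsupported model: {model!r}") from None
        return super().__new__(target)

    def __init__(self, model: str = "vits", voice: str = "default", **kwargs) -> None:
        self.model = model
        self.voice_id = voice
        self.options = kwargs


class VitsSketch(SpeakerSketch):
    """Hypothetical implementation registered under the name 'vits'."""


SpeakerSketch.register_model("vits", VitsSketch)
```

Because __new__ returns an instance of a subclass of the class being constructed, Python still calls that instance's __init__ with the original arguments, so the subclass receives model, voice, and any extra keyword arguments.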

__init__(model: str = 'vits', voice: str = 'default', device: str | None = None, **kwargs)

Initialize the speaker.

Parameters:

model – Name of the model to use
voice – Voice ID or name to use
device – Computation device (cpu, cuda, auto)
**kwargs – Additional model-specific parameters

speak(**kwargs)

speak_with_metadata(**kwargs)

async speak_async(text: str) → Audio

Synthesize speech from text asynchronously.

Parameters:

text – Text to synthesize

Returns:

Audio object with synthesized speech

Raises:

SynthesisError – If synthesis fails
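Since speak_async is a coroutine, several requests can be awaited together with asyncio.gather. The sketch below uses a hypothetical AsyncSpeakerStub whose speak_async returns a placeholder string instead of an Audio object, so it runs without fmus-vox.

```python
import asyncio
from typing import List


class AsyncSpeakerStub:
    """Hypothetical stand-in exposing a speak_async coroutine."""

    async def speak_async(self, text: str) -> str:
        # Simulate non-blocking synthesis latency; a real model would return Audio.
        await asyncio.sleep(0.01)
        return f"<audio:{text}>"


async def synthesize_all(speaker: AsyncSpeakerStub, lines: List[str]) -> List[str]:
    # Schedule every line at once; gather returns results in input order.
    return list(await asyncio.gather(*(speaker.speak_async(line) for line in lines)))


results = asyncio.run(synthesize_all(AsyncSpeakerStub(), ["Hello", "world"]))
```

Whether this shortens wall-clock time with a real model depends on how fmus-vox implements speak_async (for example, whether CPU-bound inference is moved off the event loop), which this reference does not state.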

async speak_with_metadata_async(text: str) → SpeechResult

Synthesize speech from text asynchronously with additional metadata.

Parameters:

text – Text to synthesize

Returns:

SpeechResult object

Raises:

SynthesisError – If synthesis fails

stream(text_generator: Generator[str, None, None]) → Generator[Audio, None, None]

Stream synthesis for incoming text chunks.

Parameters:

text_generator – Generator yielding text chunks

Yields:

Audio object for each synthesized chunk

Raises:

SynthesisError – If synthesis fails
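The reference does not say how stream decides chunk boundaries. One common strategy, shown here as an assumption rather than fmus-vox behavior, is to buffer incoming text and synthesize only complete sentences, flushing the remainder when the generator ends. stream_sketch and its synthesize callback are hypothetical; in practice the callback would produce an Audio object.

```python
import re
from typing import Callable, Generator, Iterable, TypeVar

T = TypeVar("T")

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sketch(text_chunks: Iterable[str],
                  synthesize: Callable[[str], T]) -> Generator[T, None, None]:
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield synthesize(sentence.strip())
        # Keep the unfinished tail for the next chunk.
        buffer = parts[-1]
    # Flush whatever is left once the text generator is exhausted.
    if buffer.strip():
        yield synthesize(buffer.strip())
```

Buffering by sentence trades a little latency for more natural prosody than synthesizing each raw fragment, since a fragment such as "Hello, wor" would otherwise be spoken on its own.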

set_voice(voice_id: str) → Speaker

Set the voice to use.

set_speed(speed: float) → Speaker

Set the speaking speed (1.0 is normal).

set_pitch(pitch: float) → Speaker

Set the voice pitch in semitones (0.0 is normal).
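set_pitch takes an offset in semitones. Under the usual equal-tempered definition (assumed here; the reference only says "semitones"), an offset of n semitones scales the fundamental frequency by 2^(n/12). The helper name semitones_to_ratio is hypothetical and is not part of fmus-vox.

```python
def semitones_to_ratio(semitones: float) -> float:
    # Equal temperament: each semitone scales frequency by 2 ** (1/12),
    # so 12 semitones (one octave) doubles it and 0.0 leaves it unchanged.
    return 2.0 ** (semitones / 12.0)
```

For example, +12.0 doubles the frequency (one octave up), -12.0 halves it, and +3.0 gives a ratio of about 1.19.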

set_style(style: str) → Speaker

Set the speaking style (e.g., 'neutral', 'happy', 'sad').

voice(voice_id: str) → Speaker

Set the voice to use (alias for set_voice).

speed(speed: float) → Speaker

Set the speaking speed (alias for set_speed).

pitch(pitch: float) → Speaker

Set the voice pitch (alias for set_pitch).

style(style: str) → Speaker

Set the speaking style (alias for set_style).
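Each setter is documented as returning a Speaker, which is what allows calls to be chained, and the short names are aliases for the set_* methods. A minimal sketch of that pattern follows, assuming the setters mutate and return the same instance (the reference does not say whether they return self or a copy). FluentSpeakerSketch is hypothetical.

```python
class FluentSpeakerSketch:
    """Hypothetical illustration of chainable setters with short aliases."""

    def __init__(self) -> None:
        # Underscored attributes avoid shadowing the alias method names
        # (an instance attribute named "voice" would hide the voice() method).
        self._voice = "default"
        self._speed = 1.0
        self._pitch = 0.0
        self._style = "neutral"

    def set_voice(self, voice_id: str) -> "FluentSpeakerSketch":
        self._voice = voice_id
        return self  # returning self is what makes chaining possible

    def set_speed(self, speed: float) -> "FluentSpeakerSketch":
        self._speed = speed
        return self

    def set_pitch(self, pitch: float) -> "FluentSpeakerSketch":
        self._pitch = pitch
        return self

    def set_style(self, style: str) -> "FluentSpeakerSketch":
        self._style = style
        return self

    # Short aliases bound to the very same functions.
    voice = set_voice
    speed = set_speed
    pitch = set_pitch
    style = set_style


s = FluentSpeakerSketch().voice("alice").speed(1.2).pitch(-2.0).style("happy")
```

Binding the alias names to the same function objects keeps the two spellings from drifting apart, since there is only one implementation of each setter.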

get_available_voices() → List[Dict[str, Any]]

Get the list of available voices.

Returns:

List of voice dictionaries, each with id, name, and language keys
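Because get_available_voices returns plain dictionaries with id, name, and language keys, selecting voices for one language is an ordinary list filter. The helper and sample data below are hypothetical; the exact language-code format fmus-vox uses is not documented here, so the comparison is case-insensitive to be lenient.

```python
from typing import Any, Dict, List


def voices_for_language(voices: List[Dict[str, Any]], language: str) -> List[str]:
    # Match case-insensitively on the "language" key; skip entries without one.
    wanted = language.lower()
    return [v["id"] for v in voices
            if str(v.get("language", "")).lower() == wanted]


# Hypothetical sample shaped like the documented return value.
sample = [
    {"id": "en-f1", "name": "Alice", "language": "en"},
    {"id": "id-m1", "name": "Budi", "language": "id"},
    {"id": "en-m1", "name": "Bob", "language": "EN"},
]
```

The first matching id could then be passed to set_voice.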