Text-to-Speech (TTS)

Text-to-Speech (TTS) functionality for fmus-vox.

This module synthesizes speech from text and supports multiple model backends (for example, vits and coqui).

fmus_vox.tts.speak(text: str, voice: str = 'default', output: str | None = None, model: str = 'vits', **kwargs) → Audio | None

Synthesize speech from text using a specified model and voice.

This is a simple functional API for quick speech synthesis. For more control, use the Speaker class directly.

Parameters:
  • text – Text to synthesize

  • voice – Voice to use (name or ID)

  • output – Path to save audio file (if None, returns Audio object)

  • model – Model to use for synthesis (vits, coqui, etc.)

  • **kwargs – Additional model-specific parameters

Returns:

Audio object if output is None, otherwise None

Raises:

SynthesisError – If synthesis fails

Examples

>>> # Synthesize speech and play it
>>> audio = speak("Hello, world!")
>>> audio.play()
>>> # Synthesize speech and save to file
>>> speak("Hello, world!", output="hello.wav")

Speaker Class

class fmus_vox.tts.speaker.Speaker(model: str = 'vits', **kwargs)

Bases: object

Base class for text-to-speech synthesis.

This class provides the common interface for all TTS models and handles model loading, voice selection, and synthesis.

Parameters:
  • model – Name of the model to use (vits, coqui, etc.)

  • voice – Voice ID or name to use

  • device – Computation device (cpu, cuda, auto)

  • **kwargs – Additional model-specific parameters

classmethod register_model(name: str, implementation: type) → None

Register a model implementation.

Parameters:
  • name – Model name

  • implementation – Model implementation class

static __new__(cls, model: str = 'vits', **kwargs) → Speaker

Create a new Speaker instance of the appropriate subclass.

Parameters:
  • model – Name of the model to use

  • **kwargs – Additional model-specific parameters

Returns:

Speaker instance

Raises:

ModelError – If the model is not supported
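
The registration and dispatch behaviour described above can be sketched with a small registry. This is a minimal illustration under stated assumptions, not the fmus-vox source: SketchSpeaker, VitsSketch, the _registry attribute, and the local ModelError class are all hypothetical stand-ins, and the real library may resolve models differently.

```python
class ModelError(Exception):
    """Hypothetical stand-in for the library's ModelError."""


class SketchSpeaker:
    """Toy base class showing registry-based dispatch (not the fmus-vox code)."""

    # Maps a model name to the subclass that implements it (assumed layout).
    _registry: dict[str, type] = {}

    @classmethod
    def register_model(cls, name: str, implementation: type) -> None:
        SketchSpeaker._registry[name] = implementation

    def __new__(cls, model: str = "vits", **kwargs):
        target = cls
        # Dispatch only when the base class itself is being instantiated.
        if cls is SketchSpeaker:
            try:
                target = SketchSpeaker._registry[model]
            except KeyError:
                raise ModelError(f"Unsupported model: {model!r}") from None
        return super().__new__(target)

    def __init__(self, model: str = "vits", **kwargs):
        self.model = model


class VitsSketch(SketchSpeaker):
    """Hypothetical model-specific subclass."""


SketchSpeaker.register_model("vits", VitsSketch)

speaker = SketchSpeaker("vits")
print(type(speaker).__name__)
```

Because the returned object is still an instance of the base class, Python goes on to call __init__ with the same arguments, which matches the separate __new__ and __init__ entries in this reference.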

__init__(model: str = 'vits', voice: str = 'default', device: str | None = None, **kwargs)

Initialize the speaker.

Parameters:
  • model – Name of the model to use

  • voice – Voice ID or name to use

  • device – Computation device (cpu, cuda, auto)

  • **kwargs – Additional model-specific parameters

speak(**kwargs)
speak_with_metadata(**kwargs)
async speak_async(text: str) → Audio

Synthesize speech from text asynchronously.

Parameters:

text – Text to synthesize

Returns:

Audio object with synthesized speech

Raises:

SynthesisError – If synthesis fails
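
As a rough illustration of how a coroutine with this signature might be used, the sketch below synthesizes several sentences concurrently with asyncio.gather. StubSpeaker is a hypothetical stand-in that returns bytes in place of an Audio object, so the example runs without fmus-vox installed. Whether real synthesis actually overlaps depends on the model backend.

```python
import asyncio


class StubSpeaker:
    """Hypothetical stand-in exposing a speak_async coroutine."""

    async def speak_async(self, text: str) -> bytes:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return text.encode("utf-8")  # placeholder for an Audio object


async def synthesize_all(speaker, sentences):
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(speaker.speak_async(s) for s in sentences))


results = asyncio.run(synthesize_all(StubSpeaker(), ["Hello.", "Goodbye."]))
print(len(results))
```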

async speak_with_metadata_async(text: str) → SpeechResult

Synthesize speech from text asynchronously with additional metadata.

Parameters:

text – Text to synthesize

Returns:

SpeechResult object

Raises:

SynthesisError – If synthesis fails

stream(text_generator: Generator[str, None, None]) → Generator[Audio, None, None]

Stream synthesis for incoming text chunks.

Parameters:

text_generator – Generator yielding text chunks

Yields:

Audio object for each synthesized chunk

Raises:

SynthesisError – If synthesis fails
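
The generator-in, generator-out shape of stream can be sketched as follows. StubStreamSpeaker and the sentences helper are hypothetical, and bytes stand in for Audio objects. The point is only that each text chunk is consumed lazily and produces one output item.

```python
from typing import Generator, Iterable


class StubStreamSpeaker:
    """Hypothetical stand-in with a stream() method of the same shape."""

    def stream(self, text_generator: Iterable[str]) -> Generator[bytes, None, None]:
        for chunk in text_generator:
            # One synthesized item per incoming text chunk.
            yield chunk.encode("utf-8")  # placeholder for an Audio object


def sentences(text: str) -> Generator[str, None, None]:
    """Naive splitter used here only to feed the stream (illustrative)."""
    for part in text.split(". "):
        part = part.strip()
        if part:
            yield part


chunks = list(StubStreamSpeaker().stream(sentences("First sentence. Second sentence.")))
print(len(chunks))
```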

set_voice(voice_id: str) → Speaker

Set the voice to use.

set_speed(speed: float) → Speaker

Set the speaking speed (1.0 is normal).

set_pitch(pitch: float) → Speaker

Set the voice pitch in semitones (0.0 is normal).

set_style(style: str) → Speaker

Set the speaking style (e.g., 'neutral', 'happy', 'sad').

voice(voice_id: str) → Speaker

Set the voice to use (alias for set_voice).

speed(speed: float) → Speaker

Set the speaking speed (alias for set_speed).

pitch(pitch: float) → Speaker

Set the voice pitch (alias for set_pitch).

style(style: str) → Speaker

Set the speaking style (alias for set_style).
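
Since each setter and alias above is documented as returning a Speaker, calls can presumably be chained. The sketch below shows that fluent pattern with a hypothetical FluentSketch class. The private attribute names and default values are assumptions made for illustration, not the library's internals.

```python
class FluentSketch:
    """Hypothetical class showing setters that return self for chaining."""

    def __init__(self):
        # Defaults are assumed, based on the 'normal' values in the descriptions.
        self._voice = "default"
        self._speed = 1.0
        self._pitch = 0.0
        self._style = "neutral"

    def set_voice(self, voice_id: str) -> "FluentSketch":
        self._voice = voice_id
        return self

    def set_speed(self, speed: float) -> "FluentSketch":
        self._speed = speed
        return self

    def set_pitch(self, pitch: float) -> "FluentSketch":
        self._pitch = pitch
        return self

    def set_style(self, style: str) -> "FluentSketch":
        self._style = style
        return self

    # Short aliases bound to the same functions.
    voice = set_voice
    speed = set_speed
    pitch = set_pitch
    style = set_style


configured = FluentSketch().voice("alice").speed(1.2).pitch(-2.0).style("happy")
print(configured._voice, configured._speed, configured._pitch, configured._style)
```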

get_available_voices() → List[Dict[str, Any]]

Get list of available voices.

Returns:

List of voice dictionaries with id, name, and language
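
Given the documented return shape (a list of dictionaries with id, name, and language), a caller might filter voices by language as sketched below. The sample data and the voices_for_language helper are invented for illustration. Real voice IDs and language codes depend on the installed model.

```python
from typing import Any


def voices_for_language(voices: list[dict[str, Any]], language: str) -> list[str]:
    """Return the ids of voices whose 'language' field matches exactly."""
    return [v["id"] for v in voices if v.get("language") == language]


# Hypothetical data in the documented shape; not real fmus-vox voices.
sample_voices = [
    {"id": "v1", "name": "Voice One", "language": "en"},
    {"id": "v2", "name": "Voice Two", "language": "de"},
    {"id": "v3", "name": "Voice Three", "language": "en"},
]

english_ids = voices_for_language(sample_voices, "en")
print(english_ids)
```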