Core Module

Core functionality for fmus-vox.

This module contains the fundamental components and utilities used throughout the library.

Audio Class

class fmus_vox.core.audio.Audio(data: ndarray, sample_rate: int)[source]

Bases: object

Main class for audio operations in fmus-vox.

The Audio class provides an intuitive interface for loading, processing, and manipulating audio data. It supports method chaining for clean, readable code.

Examples

>>> # Load and process audio
>>> audio = Audio.load("recording.wav")
>>> processed = audio.normalize().denoise().resample(target_sr=16000)
>>> processed.save("processed.wav")
>>>
>>> # Record and save audio
>>> audio = Audio.record(seconds=5)
>>> audio.save("recording.wav")

__init__(data: ndarray, sample_rate: int)[source]

Initialize an Audio object.

Parameters:
  • data – Audio data as a numpy array

  • sample_rate – Sample rate of the audio in Hz

classmethod load(source: str | Path | BinaryIO | ndarray, sample_rate: int | None = None) Audio[source]

Load audio from a file path, file-like object, or numpy array.

Parameters:
  • source – Audio source (file path, file-like object, or numpy array)

  • sample_rate – Target sample rate for loading. If None, use the source’s rate. If source is a numpy array, this must be provided.

Returns:

Audio object

Raises:

AudioError – If the audio cannot be loaded

classmethod record(seconds: float | None = None, sample_rate: int = 44100, **kwargs) Audio[source]

Record audio from microphone.

Parameters:
  • seconds – Duration in seconds to record. If None, records until stopped.

  • sample_rate – Sample rate to record at

  • **kwargs – Additional arguments for recording

Returns:

Audio object containing the recorded audio

Raises:

AudioError – If recording fails

save(path: str | Path, format: str | None = None, **kwargs) str[source]

Save audio to file.

Parameters:
  • path – Path to save the audio file

  • format – Audio format (inferred from path if None)

  • **kwargs – Additional arguments for saving

Returns:

Path to the saved file

Raises:

AudioError – If saving fails

play() None[source]

Play audio through speakers.

Raises:

AudioError – If playback fails

trim(start: float = 0, end: float | None = None) Audio[source]

Trim audio to specified time range.

Parameters:
  • start – Start time in seconds

  • end – End time in seconds. If None, trim to the end of the audio.

Returns:

New Audio object with trimmed audio
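
Trimming by time comes down to converting seconds into sample indices. The sketch below illustrates that arithmetic on a plain Python list; `trim_samples` is a hypothetical helper written for this example, not the library's implementation, and the clamping behaviour is an assumption.

```python
def trim_samples(samples, sample_rate, start=0.0, end=None):
    """Return the samples between `start` and `end`, both given in seconds."""
    total = len(samples)
    # Convert times to sample indices and clamp them to the valid range.
    first = max(0, int(round(start * sample_rate)))
    last = total if end is None else min(total, int(round(end * sample_rate)))
    if last < first:
        raise ValueError("end must not precede start")
    return samples[first:last]


one_second = list(range(8000))  # 1 s of dummy samples at 8 kHz
clip = trim_samples(one_second, 8000, start=0.25, end=0.75)
tail = trim_samples(one_second, 8000, start=0.5)  # end=None keeps the rest
```

Here `clip` holds the middle half second (4000 samples) and `tail` holds the final half second.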

denoise(strength: float = 0.5) Audio[source]

Remove noise from audio.

Parameters:

strength – Denoising strength (0.0 to 1.0)

Returns:

New Audio object with denoised audio

normalize(target_db: float = -3) Audio[source]

Normalize audio volume.

Parameters:

target_db – Target peak dB level

Returns:

New Audio object with normalized audio
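
Peak normalization scales the signal so that its largest absolute sample reaches the target level. The sketch below shows the arithmetic, assuming floating-point samples where full scale is 1.0 (0 dBFS). `peak_normalize` is a hypothetical name and the library may compute the gain differently.

```python
def peak_normalize(samples, target_db=-3.0):
    """Scale `samples` so the peak magnitude sits at `target_db` dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    # Convert the dB target to a linear amplitude: a = 10 ** (dB / 20).
    target_peak = 10.0 ** (target_db / 20.0)
    gain = target_peak / peak
    return [s * gain for s in samples]


normalized = peak_normalize([0.1, -0.5, 0.25], target_db=-6.0)
new_peak = max(abs(s) for s in normalized)
```

A target of -6 dB corresponds to a linear peak of roughly 0.501, so `new_peak` lands there regardless of the input level.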

resample(target_sr: int = 16000) Audio[source]

Resample audio to target sample rate.

Parameters:

target_sr – Target sample rate in Hz

Returns:

New Audio object with resampled audio
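
Resampling changes the number of samples in proportion to the ratio of the two rates. The sketch below uses plain linear interpolation to make that relationship concrete. It is an illustration only: production resamplers normally apply band-limited filtering to avoid aliasing, and nothing here is taken from the library's implementation. `resample_linear` is a hypothetical name.

```python
def resample_linear(samples, source_sr, target_sr):
    """Resample by linear interpolation between neighbouring samples."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * target_sr / source_sr))
    step = source_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], source_sr=4, target_sr=8)
downsampled = resample_linear([0.0] * 44100, source_sr=44100, target_sr=16000)
```

One second of audio at 44100 Hz becomes 16000 samples at 16000 Hz, so the duration is preserved while the sample count changes.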

detect_vad(threshold: float = 0.5) List[Tuple[float, float]][source]

Detect voice activity segments.

Parameters:

threshold – Energy threshold for voice detection (0.0 to 1.0)

Returns:

List of (start_time, end_time) tuples in seconds
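
One simple way to produce (start_time, end_time) segments is to compare per-frame energy against a threshold. The sketch below does this with RMS energy over fixed 20 ms frames and treats the threshold as a fraction of the loudest frame. Both choices are assumptions made for illustration, and `detect_voice_segments` is a hypothetical name; the library's detector may work quite differently.

```python
def detect_voice_segments(samples, sample_rate, threshold=0.5, frame_ms=20):
    """Return (start, end) times in seconds for frames above the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    # RMS energy of each frame.
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append((sum(s * s for s in frame) / len(frame)) ** 0.5)
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or silent input
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        active = energy / peak >= threshold
        t = i * frame_len / sample_rate
        if active and seg_start is None:
            seg_start = t  # segment opens at this frame
        elif not active and seg_start is not None:
            segments.append((seg_start, t))  # segment closes here
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples) / sample_rate))
    return segments


# 0.1 s of silence, 0.2 s of signal, 0.1 s of silence at 1 kHz.
signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100
segments = detect_voice_segments(signal, sample_rate=1000)
```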

split_on_silence(min_silence_len: int = 500, silence_thresh: float = -40) List[Audio][source]

Split audio on silence into segments.

Parameters:
  • min_silence_len – Minimum silence length in milliseconds

  • silence_thresh – Silence threshold in dB

Returns:

List of Audio objects, one for each non-silent segment

change_speed(speed_factor: float = 1.0) Audio[source]

Change the playback speed of the audio.

Parameters:

speed_factor – Speed factor (1.0 = original speed)

Returns:

New Audio object with changed speed

change_pitch(semitones: float = 0.0) Audio[source]

Change the pitch of the audio.

Parameters:

semitones – Number of semitones to shift (-12 to +12)

Returns:

New Audio object with changed pitch
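
In twelve-tone equal temperament, a shift of n semitones multiplies every frequency by 2^(n/12). The short sketch below shows that conversion; `semitone_ratio` is a hypothetical helper for illustration and says nothing about how the library performs the shift itself.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


octave_up = semitone_ratio(12)     # doubles the frequency
octave_down = semitone_ratio(-12)  # halves the frequency
# A4 (440 Hz) raised by three semitones lands near C5.
c5 = 440.0 * semitone_ratio(3)
```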

property duration: float

Get audio duration in seconds.

property sample_rate: int

Get audio sample rate.

property data: ndarray

Get audio data as numpy array.

__len__() int[source]

Get length of audio in samples.

Config Module

Configuration management for fmus-vox.

This module provides facilities for loading, storing, and accessing configuration settings throughout the library.

class fmus_vox.core.config.Config[source]

Bases: object

Configuration manager for fmus-vox.

Handles loading, saving, and accessing configuration settings. Supports both global and model-specific configurations.

__init__()[source]

Initialize configuration with default values.

save() None[source]

Save current configuration to user config file.

get(key: str, default: Any | None = None) Any[source]

Get configuration value.

Parameters:
  • key – Configuration key

  • default – Default value if key doesn’t exist

Returns:

Configuration value

set(key: str, value: Any) None[source]

Set configuration value.

Parameters:
  • key – Configuration key

  • value – Configuration value

update(config_dict: Dict[str, Any]) None[source]

Update multiple configuration values.

Parameters:

config_dict – Dictionary of configuration key-value pairs

reset() None[source]

Reset configuration to default values.
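
The get, set, update, and reset methods above follow a familiar dictionary-backed pattern. The sketch below is a minimal stand-in that mirrors those four method names. The class name `DictConfig`, its default keys, and its in-memory storage are all assumptions for illustration; the real Config also handles saving to a user config file, which is not modelled here.

```python
import copy
from typing import Any, Dict, Optional


class DictConfig:
    """Tiny in-memory configuration store (illustrative only)."""

    # Hypothetical defaults; the library's actual keys are not documented here.
    DEFAULTS: Dict[str, Any] = {"device": "cpu", "sample_rate": 16000}

    def __init__(self) -> None:
        self._values = copy.deepcopy(self.DEFAULTS)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._values.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value

    def update(self, config_dict: Dict[str, Any]) -> None:
        self._values.update(config_dict)

    def reset(self) -> None:
        # Copy the defaults so later edits never mutate DEFAULTS itself.
        self._values = copy.deepcopy(self.DEFAULTS)


cfg = DictConfig()
cfg.set("device", "cuda:0")
cfg.update({"sample_rate": 44100, "language": "en"})
device_before_reset = cfg.get("device")
cfg.reset()
```

After `reset()`, keys added through `set` or `update` are gone and the defaults are back in place.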

get_model_path(model_type: str, model_name: str) Path[source]

Get path to a specific model.

Parameters:
  • model_type – Type of model (e.g., ‘stt’, ‘tts’)

  • model_name – Name of model (e.g., ‘whisper’, ‘vits’)

Returns:

Path to model directory

get_device() str[source]

Get the computation device to use.

Returns:

Device string (‘cpu’, ‘cuda:0’, etc.)

property as_dict: Dict[str, Any]

Get configuration as dictionary.

fmus_vox.core.config.get_config() Config[source]

Get the global configuration instance.

Returns:

Global Config instance
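
A global instance accessor is commonly written as a module-level singleton that is created on first use. The sketch below shows that pattern with hypothetical names (`Settings`, `get_settings`); whether the library creates its instance lazily or at import time is not stated in this reference.

```python
from typing import Optional


class Settings:
    """Placeholder settings object for the example."""

    def __init__(self) -> None:
        self.values = {}


_instance: Optional[Settings] = None


def get_settings() -> Settings:
    """Return the shared Settings object, creating it on first call."""
    global _instance
    if _instance is None:
        _instance = Settings()
    return _instance


first = get_settings()
first.values["device"] = "cpu"
second = get_settings()  # same object, so the change above is visible
```

Because every caller receives the same object, a value set in one part of a program is visible everywhere else that calls the accessor.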

Utils Module

Utility functions for fmus-vox.

This module contains various utility functions used throughout the library.

fmus_vox.core.utils.get_logger(name: str, level: str | None = None) Logger[source]

Get a logger with the given name and level.

Parameters:
  • name – Logger name

  • level – Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Returns:

Configured logger instance
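
A helper with this signature can be built on the standard `logging` module. The sketch below is one possible shape, under the assumption that the level is passed as a name such as "DEBUG" and that a stream handler is attached once per logger. `make_logger` and the format string are hypothetical, not the library's code.

```python
import logging
from typing import Optional


def make_logger(name: str, level: Optional[str] = None) -> logging.Logger:
    """Return a named logger, optionally setting its level by name."""
    logger = logging.getLogger(name)
    if level is not None:
        # Map "debug" / "DEBUG" to logging.DEBUG and so on.
        logger.setLevel(getattr(logging, level.upper()))
    if not logger.handlers:
        # Attach a handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = make_logger("demo.sketch", "debug")
same_log = make_logger("demo.sketch")  # returns the already configured logger
log.debug("logger ready")
```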

fmus_vox.core.utils.timed(func: Callable) Callable[source]

Decorator to time function execution.

Parameters:

func – Function to time

Returns:

Wrapped function that logs execution time
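
A timing decorator of this kind typically wraps the call between two clock readings and logs the difference. The sketch below is a generic version using `time.perf_counter` and `functools.wraps`. The names `timed_sketch` and `timing_sketch`, the log level, and the message format are assumptions for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger("timing_sketch")  # hypothetical logger name


def timed_sketch(func):
    """Log how long each call to `func` takes, then return its result."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even when func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)

    return wrapper


@timed_sketch
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(2, 3)
```

Using `functools.wraps` matters in practice: without it, the decorated function would report the wrapper's name in logs and tracebacks.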

fmus_vox.core.utils.ensure_path_exists(path: str | Path) Path[source]

Ensure that a directory path exists, creating it if necessary.

Parameters:

path – Directory path

Returns:

Path object for the directory
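
The behaviour described here maps directly onto `pathlib.Path.mkdir` with `parents=True` and `exist_ok=True`. The sketch below shows that idiom under the hypothetical name `ensure_dir`; it is an equivalent written for illustration rather than the library's source.

```python
import tempfile
from pathlib import Path
from typing import Union


def ensure_dir(path: Union[str, Path]) -> Path:
    """Create `path` and any missing parents, then return it as a Path."""
    directory = Path(path)
    # exist_ok=True makes the call safe to repeat on an existing directory.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp:
    nested = ensure_dir(Path(tmp) / "models" / "stt")
    created = nested.is_dir()
    again = ensure_dir(str(nested))  # second call changes nothing
```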

fmus_vox.core.utils.download_file(url: str, dest_path: str | Path, progress: bool = True) Path[source]

Download a file from a URL to a destination path.

Parameters:
  • url – URL to download from

  • dest_path – Path to save the file to

  • progress – Whether to show progress bar

Returns:

Path to the downloaded file

Raises:

FmusVoxError – If download fails

class fmus_vox.core.utils.LazyLoader(init_func: Callable[[], T])[source]

Bases: Generic[T]

Lazy loader for objects that are expensive to initialize.

Initializes the object only when it’s first accessed.

__init__(init_func: Callable[[], T])[source]

Initialize the lazy loader.

Parameters:

init_func – Function that initializes the object

get() T[source]

Get the object, initializing it if necessary.

Returns:

The initialized object

reset() None[source]

Reset the object, forcing re-initialization on next get().
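
The deferred initialization described above can be sketched in a few lines. `LazySketch` below is a hypothetical stand-in that mirrors the documented `get()` and `reset()` methods. It is not thread-safe, and whether the real class adds locking is not stated in this reference.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySketch(Generic[T]):
    """Call `init_func` on first access and cache the result."""

    def __init__(self, init_func: Callable[[], T]) -> None:
        self._init_func = init_func
        self._value: Optional[T] = None
        self._loaded = False  # separate flag so a None result is cached too

    def get(self) -> T:
        if not self._loaded:
            self._value = self._init_func()
            self._loaded = True
        return self._value  # type: ignore[return-value]

    def reset(self) -> None:
        self._value = None
        self._loaded = False


calls = []


def load_model():
    calls.append("load")  # record each real initialization
    return {"name": "demo-model"}


model = LazySketch(load_model)
calls_before_access = len(calls)  # nothing has been loaded yet
first = model.get()
second = model.get()  # served from the cache
calls_after_two_gets = len(calls)
model.reset()
third = model.get()  # initialized again after reset
```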

fmus_vox.core.utils.format_timestamp(seconds: float) str[source]

Format seconds as a timestamp (MM:SS.mmm).

Parameters:

seconds – Time in seconds

Returns:

Formatted timestamp
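
The MM:SS.mmm layout can be produced by working in whole milliseconds and splitting with `divmod`. The sketch below uses the hypothetical name `format_mmss`. How the library rounds, and how it treats negative values or durations of an hour or more, is not documented here, so those behaviours are assumptions.

```python
def format_mmss(seconds: float) -> str:
    """Format a non-negative duration in seconds as MM:SS.mmm."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Round once to whole milliseconds to avoid results like 59.9995 -> 60.000.
    total_ms = int(round(seconds * 1000))
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder_ms, 1000)
    # Minutes are not wrapped at 60, so one hour renders as "60:00.000".
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


short = format_mmss(3.25)
longer = format_mmss(75.5)
```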