Core Module

Core functionality for fmus-vox.

This module contains the fundamental components and utilities used throughout the library.

Audio Class

class fmus_vox.core.audio.Audio(data: ndarray, sample_rate: int)

Bases: object

Main class for audio operations in fmus-vox.

The Audio class provides an intuitive interface for loading, processing, and manipulating audio data. It supports method chaining for clean, readable code.

Examples

>>> # Load and process audio
>>> audio = Audio.load("recording.wav")
>>> processed = audio.normalize().denoise().resample(target_sr=16000)
>>> processed.save("processed.wav")
>>>
>>> # Record and save audio
>>> audio = Audio.record(seconds=5)
>>> audio.save("recording.wav")

__init__(data: ndarray, sample_rate: int)

Initialize an Audio object.

Parameters:

data – Audio data as a numpy array
sample_rate – Sample rate of the audio in Hz

classmethod load(source: str | Path | BinaryIO | ndarray, sample_rate: int | None = None) → Audio

Load audio from a file path, file-like object, or numpy array.

Parameters:

source – Audio source (file path, file-like object, or numpy array)
sample_rate – Target sample rate for loading. If None, use the source’s rate. If source is a numpy array, this must be provided.

Returns:

Audio object

Raises:

AudioError – If the audio cannot be loaded

classmethod record(seconds: float | None = None, sample_rate: int = 44100, **kwargs) → Audio

Record audio from microphone.

Parameters:

seconds – Duration in seconds to record. If None, records until stopped.
sample_rate – Sample rate to record at
**kwargs – Additional arguments for recording

Returns:

Audio object containing the recorded audio

Raises:

AudioError – If recording fails

save(path: str | Path, format: str | None = None, **kwargs) → str

Save audio to file.

Parameters:

path – Path to save the audio file
format – Audio format (inferred from path if None)
**kwargs – Additional arguments for saving

Returns:

Path to the saved file

Raises:

AudioError – If saving fails

play() → None

Play audio through speakers.

Raises:

AudioError – If playback fails

trim(start: float = 0, end: float | None = None) → Audio

Trim audio to specified time range.

Parameters:

start – Start time in seconds
end – End time in seconds. If None, trim to the end of the audio.

Returns:

New Audio object with trimmed audio
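Trimming by time comes down to converting seconds into sample indices. The following is a minimal sketch of that conversion on a plain list of samples; it is not the library's implementation, and the helper name `trim_samples` and the rounding behaviour are assumptions made for illustration.

```python
def trim_samples(samples, sample_rate, start=0.0, end=None):
    """Return the samples that fall between start and end (both in seconds)."""
    total = len(samples)
    # Convert times to sample indices and clamp them to the valid range.
    start_idx = max(0, int(round(start * sample_rate)))
    end_idx = total if end is None else min(total, int(round(end * sample_rate)))
    return samples[start_idx:end_idx]


# Eight samples at 8 Hz represent exactly one second of audio.
one_second = list(range(8))
middle = trim_samples(one_second, sample_rate=8, start=0.25, end=0.75)
tail = trim_samples(one_second, sample_rate=8, start=0.5)
```

Here `middle` holds samples 2 through 5 and `tail` holds the second half of the signal, mirroring the documented behaviour of `end=None`.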

denoise(strength: float = 0.5) → Audio

Remove noise from audio.

Parameters:

strength – Denoising strength (0.0 to 1.0)

Returns:

New Audio object with denoised audio

normalize(target_db: float = -3) → Audio

Normalize audio volume.

Parameters:

target_db – Target peak level in dB

Returns:

New Audio object with normalized audio
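Peak normalization to a dB target can be sketched as a single gain computation. The example below assumes that `target_db` is measured relative to a full-scale amplitude of 1.0 (dBFS) and that samples are floats; both are assumptions, and `peak_normalize` is a hypothetical helper, not the library's code.

```python
def peak_normalize(samples, target_db=-3.0):
    """Scale samples so that the largest absolute value sits at target_db."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent input: there is nothing to scale.
        return list(samples)
    # Convert the dB target to a linear amplitude (assumes 0 dB == 1.0).
    target_peak = 10.0 ** (target_db / 20.0)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet = [0.1, -0.25, 0.05]
louder = peak_normalize(quiet, target_db=-3.0)
```

With the default of -3 dB the loudest sample ends up at an amplitude of roughly 0.708.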

resample(target_sr: int = 16000) → Audio

Resample audio to target sample rate.

Parameters:

target_sr – Target sample rate in Hz

Returns:

New Audio object with resampled audio

detect_vad(threshold: float = 0.5) → List[Tuple[float, float]]

Detect voice activity segments.

Parameters:

threshold – Energy threshold for voice detection (0.0 to 1.0)

Returns:

List of (start_time, end_time) tuples in seconds
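To make the shape of the return value concrete, here is a minimal sketch of an energy-based detector. It is only one possible approach and not necessarily what `detect_vad` does internally: the 20 ms frame size, the use of RMS energy, and the interpretation of `threshold` as a fraction of the loudest frame are all assumptions, and `detect_voice_segments` is a hypothetical name.

```python
def detect_voice_segments(samples, sample_rate, threshold=0.5, frame_ms=20):
    """Return (start, end) times in seconds for frames above the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    # RMS energy of each fixed-size frame.
    energies = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energies.append((sum(s * s for s in frame) / len(frame)) ** 0.5)
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or completely silent input
    segments = []
    seg_start = None
    for idx, energy in enumerate(energies):
        active = (energy / peak) >= threshold
        t = idx * frame_len / sample_rate
        if active and seg_start is None:
            seg_start = t  # a new segment begins at this frame
        elif not active and seg_start is not None:
            segments.append((seg_start, t))  # the segment ended before this frame
            seg_start = None
    if seg_start is not None:
        # The signal ended while still active.
        segments.append((seg_start, len(samples) / sample_rate))
    return segments


# 0.1 s of silence, 0.2 s of signal, 0.1 s of silence at 1 kHz.
signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100
segments = detect_voice_segments(signal, sample_rate=1000)
```

For this synthetic signal the sketch reports a single segment running from about 0.1 s to about 0.3 s.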

split_on_silence(min_silence_len: int = 500, silence_thresh: float = -40) → List[Audio]

Split audio on silence into segments.

Parameters:

min_silence_len – Minimum silence length in milliseconds
silence_thresh – Silence threshold in dB

Returns:

List of Audio objects, one for each non-silent segment
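The two parameters use different units (milliseconds and dB), so it can help to see what the defaults mean in samples and linear amplitude. The conversion below is a sketch under the assumption that the threshold is relative to a full-scale amplitude of 1.0; the helper name is hypothetical and how the library applies these values internally is not documented here.

```python
def silence_params_to_native(min_silence_len_ms, silence_thresh_db, sample_rate):
    """Convert a duration in ms and a level in dB to samples and amplitude."""
    min_samples = int(round(min_silence_len_ms * sample_rate / 1000))
    amplitude = 10.0 ** (silence_thresh_db / 20.0)  # assumes 0 dB == 1.0
    return min_samples, amplitude


# The documented defaults, evaluated for 16 kHz audio.
min_samples, amplitude = silence_params_to_native(500, -40.0, 16000)
```

At 16 kHz the defaults correspond to 8000 consecutive samples whose amplitude stays below about 0.01.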

change_speed(speed_factor: float = 1.0) → Audio

Change the playback speed of the audio.

Parameters:

speed_factor – Speed factor (1.0 = original speed)

Returns:

New Audio object with changed speed

change_pitch(semitones: float = 0.0) → Audio

Change the pitch of the audio.

Parameters:

semitones – Number of semitones to shift (-12 to +12)

Returns:

New Audio object with changed pitch
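In twelve-tone equal temperament, a shift of n semitones multiplies frequency by 2^(n/12), so the documented range of -12 to +12 spans one octave down to one octave up. The helper below only illustrates that relationship; `semitones_to_ratio` is a hypothetical name and not part of the library.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


octave_up = semitones_to_ratio(12)     # doubles the frequency
octave_down = semitones_to_ratio(-12)  # halves the frequency
fifth_up = semitones_to_ratio(7)       # close to a 3:2 ratio
```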

property duration: float

Get audio duration in seconds.

property sample_rate: int

Get audio sample rate.

property data: ndarray

Get audio data as numpy array.

__len__() → int

Get length of audio in samples.

Config Module

Configuration management for fmus-vox.

This module provides facilities for loading, storing, and accessing configuration settings throughout the library.

class fmus_vox.core.config.Config

Bases: object

Configuration manager for fmus-vox.

Handles loading, saving, and accessing configuration settings. Supports both global and model-specific configurations.

__init__()

Initialize configuration with default values.

save() → None

Save current configuration to user config file.

get(key: str, default: Any | None = None) → Any

Get configuration value.

Parameters:

key – Configuration key
default – Default value if key doesn’t exist

Returns:

Configuration value

set(key: str, value: Any) → None

Set configuration value.

Parameters:

key – Configuration key
value – Configuration value

update(config_dict: Dict[str, Any]) → None

Update multiple configuration values.

Parameters:

config_dict – Dictionary of configuration key-value pairs

reset() → None

Reset configuration to default values.
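The interplay of get, set, update, and reset can be illustrated with a small dictionary-backed stand-in. This is a sketch that mirrors the documented method signatures only; the class name `ConfigSketch`, the default keys, and the storage strategy are assumptions and do not describe how `Config` is actually implemented.

```python
from typing import Any, Dict, Optional


class ConfigSketch:
    """Dictionary-backed stand-in mirroring the documented Config methods."""

    # Hypothetical defaults, chosen purely for illustration.
    _DEFAULTS: Dict[str, Any] = {"device": "cpu", "sample_rate": 16000}

    def __init__(self) -> None:
        self._values: Dict[str, Any] = dict(self._DEFAULTS)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Fall back to the caller's default when the key is unknown.
        return self._values.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value

    def update(self, config_dict: Dict[str, Any]) -> None:
        # Merge several key-value pairs in one call.
        self._values.update(config_dict)

    def reset(self) -> None:
        # Discard all changes and return to the defaults.
        self._values = dict(self._DEFAULTS)


cfg = ConfigSketch()
cfg.set("device", "cuda:0")
cfg.update({"sample_rate": 22050, "language": "en"})
device_before_reset = cfg.get("device")
cfg.reset()
device_after_reset = cfg.get("device")
```

After `reset()` the stand-in reports the default device again, and keys added through `update` are gone.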

get_model_path(model_type: str, model_name: str) → Path

Get path to a specific model.

Parameters:

model_type – Type of model (e.g., ‘stt’, ‘tts’)
model_name – Name of model (e.g., ‘whisper’, ‘vits’)

Returns:

Path to model directory

get_device() → str

Get the computation device to use.

Returns:

Device string (‘cpu’, ‘cuda:0’, etc.)

property as_dict: Dict[str, Any]

Get configuration as dictionary.

fmus_vox.core.config.get_config() → Config

Get the global configuration instance.

Returns:

Global Config instance
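A module-level accessor that always returns the same object is commonly written as below. This sketches the general pattern under the assumption of lazy creation on first call; whether the library creates its instance lazily or at import time is not stated here, and all names in the example are hypothetical.

```python
class _Settings:
    """Placeholder for a configuration object."""

    def __init__(self) -> None:
        self.values = {}


_global_settings = None  # created on first request


def get_settings() -> _Settings:
    """Return the shared settings object, creating it on first use."""
    global _global_settings
    if _global_settings is None:
        _global_settings = _Settings()
    return _global_settings


first = get_settings()
second = get_settings()
first.values["device"] = "cpu"  # visible through every reference
```

Because both calls return the same object, a change made through `first` is visible through `second`.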

Utils Module

Utility functions for fmus-vox.

This module contains various utility functions used throughout the library.

fmus_vox.core.utils.get_logger(name: str, level: str | None = None) → Logger

Get a logger with the given name and level.

Parameters:

name – Logger name
level – Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Returns:

Configured logger instance
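A helper with this signature can be approximated with the standard `logging` module alone. The sketch below is an assumption about the general approach, not the library's code; in particular, it does not attach handlers or formatters, which the real function may well do. `get_logger_sketch` is a hypothetical name.

```python
import logging
from typing import Optional


def get_logger_sketch(name: str, level: Optional[str] = None) -> logging.Logger:
    """Return the named logger, optionally setting its level by name."""
    logger = logging.getLogger(name)
    if level is not None:
        # Logger.setLevel accepts level names such as "DEBUG" or "INFO".
        logger.setLevel(level.upper())
    return logger


log = get_logger_sketch("fmus_vox.demo", level="debug")
log.debug("logger configured")
```

Note that `logging.getLogger` returns the same object for the same name, so repeated calls do not create duplicate loggers.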

fmus_vox.core.utils.timed(func: Callable) → Callable

Decorator to time function execution.

Parameters:

func – Function to time

Returns:

Wrapped function that logs execution time
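A timing decorator of this kind is typically built from `time.perf_counter` and `functools.wraps`. The following is a minimal sketch of the pattern, not the library's implementation; the logger name, log level, and message format are assumptions, and `timed_sketch` is a hypothetical name.

```python
import functools
import logging
import time

logger = logging.getLogger("timing_sketch")  # hypothetical logger name


def timed_sketch(func):
    """Wrap func so that each call logs how long it took."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returned normally or raised.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", func.__name__, elapsed)

    return wrapper


@timed_sketch
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(2, 3)
```

Using `try`/`finally` means the duration is logged even when the wrapped function raises.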

fmus_vox.core.utils.ensure_path_exists(path: str | Path) → Path

Ensure that a directory path exists, creating it if necessary.

Parameters:

path – Directory path

Returns:

Path object for the directory
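The described behaviour maps closely onto `pathlib.Path.mkdir` with `parents=True` and `exist_ok=True`. The sketch below shows that approach under the assumption that missing parent directories should also be created; `ensure_dir` is a hypothetical stand-in rather than the library function.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create the directory (and any missing parents) and return it as a Path."""
    directory = Path(path)
    # exist_ok=True makes the call a no-op if the directory is already there.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    nested = ensure_dir(Path(tmp) / "models" / "stt")
    created = nested.is_dir()
    same_again = ensure_dir(nested) == nested  # a second call is harmless
```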

fmus_vox.core.utils.download_file(url: str, dest_path: str | Path, progress: bool = True) → Path

Download a file from a URL to a destination path.

Parameters:

url – URL to download from
dest_path – Path to save the file to
progress – Whether to show progress bar

Returns:

Path to the downloaded file

Raises:

FmusVoxError – If download fails

class fmus_vox.core.utils.LazyLoader(init_func: Callable[[], T])

Bases: Generic[T]

Lazy loader for objects that are expensive to initialize.

Initializes the object only when it’s first accessed.

__init__(init_func: Callable[[], T])

Initialize the lazy loader.

Parameters:

init_func – Function that initializes the object

get() → T

Get the object, initializing it if necessary.

Returns:

The initialized object

reset() → None

Reset the object, forcing re-initialization on next get().

fmus_vox.core.utils.format_timestamp(seconds: float) → str

Format seconds as a timestamp (MM:SS.mmm).

Parameters:

seconds – Time in seconds

Returns:

Formatted timestamp
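The MM:SS.mmm layout can be produced with integer arithmetic on milliseconds, which avoids floating-point surprises at the field boundaries. The sketch below assumes non-negative input and lets the minutes field grow past 59 rather than rolling over into hours; both choices are assumptions, and `format_timestamp_sketch` is a hypothetical name, not the library function.

```python
def format_timestamp_sketch(seconds):
    """Format a non-negative duration in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


short = format_timestamp_sketch(3.25)   # "00:03.250"
longer = format_timestamp_sketch(75.5)  # "01:15.500"
```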