fmus-vox: A Speech Processing Library

Welcome to fmus-vox, a Python library for audio processing, speech-to-text, text-to-speech, voice cloning, and conversational AI.

Features

Audio Processing: Load, manipulate, and analyze audio with an intuitive interface
Speech-to-Text: Transcribe speech with support for multiple models (Whisper, Wav2Vec, etc.)
Text-to-Speech: Synthesize natural-sounding speech with various voices and styles
Voice Cloning: Create synthetic speech that mimics a specific voice
Wake Word Detection: Detect custom wake words in audio streams
Conversational AI: Build voice-driven conversational agents
Streaming: Real-time audio processing with low latency
API: Easy integration with web applications

Quick Example

Audio Processing:

from fmus_vox import Audio

# Load a recording from disk
audio = Audio.load("recording.wav")
# Chain the steps: normalize, denoise, then resample to 16 kHz
processed = audio.normalize().denoise().resample(target_sr=16000)
processed.save("processed.wav")
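
For readers curious what steps such as normalize and resample typically involve, here is a minimal NumPy-only sketch. It is not fmus-vox's implementation: the helper names peak_normalize and resample_linear are hypothetical, and the linear-interpolation resampler is a simplification that omits the anti-aliasing filter a production resampler would apply.

```python
import numpy as np


def peak_normalize(samples: np.ndarray, peak: float = 1.0) -> np.ndarray:
    """Scale samples so the largest absolute value equals `peak`.

    Hypothetical helper for illustration; not part of fmus-vox.
    """
    samples = samples.astype(np.float64)
    max_abs = np.max(np.abs(samples)) if samples.size else 0.0
    if max_abs == 0.0:
        return samples  # silence or empty input: nothing to scale
    return samples * (peak / max_abs)


def resample_linear(samples: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    Simplified on purpose: no low-pass filtering is applied, so
    downsampling content above target_sr / 2 will alias.
    """
    samples = samples.astype(np.float64)
    if orig_sr == target_sr or samples.size == 0:
        return samples
    n_out = int(round(samples.size / orig_sr * target_sr))
    old_t = np.arange(samples.size) / orig_sr  # original sample times (s)
    new_t = np.arange(n_out) / target_sr       # target sample times (s)
    return np.interp(new_t, old_t, samples)


# One second of a quiet 440 Hz tone at 44.1 kHz stands in for a recording.
sr = 44100
t = np.arange(sr) / sr
tone = 0.25 * np.sin(2 * np.pi * 440.0 * t)

normalized = peak_normalize(tone)
resampled = resample_linear(normalized, sr, 16000)
```

After this runs, normalized peaks at 1.0 and resampled holds one second of audio at 16 kHz, the same target rate used in the example above.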

Speech-to-Text:

from fmus_vox import transcribe

text = transcribe("recording.wav")
print(f"Transcription: {text}")
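
To transcribe several files, the single call above can be wrapped in a small loop. The sketch below assumes, as the example suggests, that transcribe takes a file path and returns the text as a string. The helper transcribe_many is hypothetical and not part of fmus-vox; it accepts the transcription function as a parameter, so the demo can run with a stand-in.

```python
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Map each audio path to its transcription.

    Hypothetical helper: `transcribe_fn` is assumed to take a path and
    return text, matching how `transcribe` is used in the example above.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[path] = transcribe_fn(path)
    return results


def fake_transcribe(path: str) -> str:
    """Stand-in used only so this sketch runs without fmus-vox installed."""
    return f"transcript of {path}"


demo = transcribe_many(["a.wav", "b.wav"], fake_transcribe)
for path, text in demo.items():
    print(f"{path}: {text}")
```

With the library installed, passing the imported transcribe function in place of fake_transcribe should work unchanged, provided it has the signature assumed here.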

Text-to-Speech:

from fmus_vox import speak

speak("Hello, welcome to fmus-vox!", output="welcome.wav")

Voice Cloning:

from fmus_vox import clone_voice

# Arguments: reference recording of the voice, text to speak, output file
clone_voice("my_voice.wav", "Hello with my voice", output="cloned.wav")
