fmus-vox: A Speech Processing Library
Welcome to fmus-vox, a Python library for audio processing, speech-to-text, text-to-speech, voice cloning, and conversational AI.
Features
Audio Processing: Load, manipulate, and analyze audio with an intuitive interface
Speech-to-Text: Transcribe speech with support for multiple models (Whisper, Wav2Vec, etc.)
Text-to-Speech: Synthesize natural-sounding speech with various voices and styles
Voice Cloning: Create synthetic speech that mimics a specific voice
Wake Word Detection: Detect custom wake words in audio streams
Conversational AI: Build voice-driven conversational agents
Streaming: Real-time audio processing with low latency
API: Easy integration with web applications
Quick Examples
Audio Processing:
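from fmus_vox import Audio
# Load a recording from disk
audio = Audio.load("recording.wav")
# Chain the processing steps: normalize, denoise, then resample to 16 kHz
processed = audio.normalize().denoise().resample(target_sr=16000)
# Write the processed audio to a new file
processed.save("processed.wav")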
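For readers new to audio preprocessing, the sketch below shows in plain Python what a normalize step and a resample step typically do to a list of samples. It is an illustration only and not fmus-vox's implementation: the function names peak_normalize and resample_linear are hypothetical, normalization is assumed to mean peak normalization, and the resampler uses simple linear interpolation with no anti-aliasing filter.

```python
from typing import List


def peak_normalize(samples: List[float], peak: float = 1.0) -> List[float]:
    """Scale samples so the largest absolute value equals `peak` (hypothetical helper)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = peak / max_abs
    return [s * gain for s in samples]


def resample_linear(samples: List[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample by linear interpolation (hypothetical helper, no anti-aliasing)."""
    if not samples or orig_sr == target_sr:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last interval: hold the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


if __name__ == "__main__":
    quiet = [0.25, -0.5, 0.1, 0.0]
    print(peak_normalize(quiet))
    print(resample_linear([0.0, 1.0, 2.0, 3.0], orig_sr=4, target_sr=2))
```

A production resampler would low-pass filter before downsampling to avoid aliasing; the sketch leaves that out to keep the idea visible.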
Speech-to-Text:
from fmus_vox import transcribe
text = transcribe("recording.wav")
print(f"Transcription: {text}")
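To transcribe several recordings in one pass, a small wrapper along these lines may help. It is a sketch under assumptions: transcribe_many is a hypothetical helper that is not part of fmus-vox, and it only assumes that the transcription function takes a file path and returns text, as in the example above. The function is passed in as an argument so the helper can be tried with a stand-in before wiring up the real one.

```python
from typing import Callable, Dict, Iterable, Tuple


def transcribe_many(
    paths: Iterable[str],
    transcribe_fn: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `transcribe_fn` on each path (hypothetical helper).

    Returns two dicts keyed by path: transcriptions, and error messages
    for files that raised an exception.
    """
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        key = str(path)
        try:
            texts[key] = transcribe_fn(key)
        except Exception as exc:  # keep going if one file fails
            errors[key] = f"{type(exc).__name__}: {exc}"
    return texts, errors


if __name__ == "__main__":
    # Stand-in transcriber for a dry run; swap in fmus_vox.transcribe for real use.
    def fake_transcribe(path: str) -> str:
        return f"(transcript of {path})"

    texts, errors = transcribe_many(["a.wav", "b.wav"], fake_transcribe)
    for name, text in texts.items():
        print(f"{name}: {text}")
```

With fmus-vox installed, the call would presumably look like transcribe_many(sorted(str(p) for p in Path("recordings").glob("*.wav")), transcribe), where Path comes from pathlib and the "recordings" folder is a placeholder.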
Text-to-Speech:
from fmus_vox import speak
speak("Hello, welcome to fmus-vox!", output="welcome.wav")
Voice Cloning:
from fmus_vox import clone_voice
# First argument: a recording of the voice to mimic; second: the text to speak
clone_voice("my_voice.wav", "Hello with my voice", output="cloned.wav")