fmus-vox: A Speech Processing Library

Welcome to fmus-vox, a Python library for audio processing, speech-to-text, text-to-speech, voice cloning, and conversational AI.

Features

  • Audio Processing: Load, manipulate, and analyze audio with an intuitive interface

  • Speech-to-Text: Transcribe speech with a choice of models, such as Whisper and Wav2Vec

  • Text-to-Speech: Synthesize natural-sounding speech with various voices and styles

  • Voice Cloning: Create synthetic speech that mimics a specific voice

  • Wake Word Detection: Detect custom wake words in audio streams

  • Conversational AI: Build voice-driven conversational agents

  • Streaming: Real-time audio processing with low latency

  • API: Easy integration with web applications
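To make the Streaming item above concrete: low-latency processing usually means handling audio in small fixed-size frames instead of whole files. The sketch below shows that idea in plain Python and does not use fmus-vox itself. The helper names `iter_frames` and `frame_latency_ms` are hypothetical and are not part of the library's API.

```python
from typing import Iterable, Iterator, List


def iter_frames(samples: Iterable[float], frame_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size frames; the last frame may be shorter."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    frame: List[float] = []
    for sample in samples:
        frame.append(sample)
        if len(frame) == frame_size:
            yield frame
            frame = []
    if frame:
        # Flush whatever is left when the stream ends mid-frame.
        yield frame


def frame_latency_ms(frame_size: int, sample_rate: int) -> float:
    """Buffering delay, in milliseconds, of waiting for one full frame."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return 1000.0 * frame_size / sample_rate


if __name__ == "__main__":
    # Hypothetical stream of 5 samples, split into frames of 2.
    for frame in iter_frames([0.0, 0.1, 0.2, 0.3, 0.4], frame_size=2):
        print(frame)
    # 320 samples at 16 kHz is a common frame size for speech.
    print(frame_latency_ms(320, 16000))
```

Smaller frames lower the buffering delay but increase per-frame overhead, so the frame size is a trade-off to tune for each application.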

Quick Example

Audio Processing:

from fmus_vox import Audio

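# Load a recording from disk
audio = Audio.load("recording.wav")
# Chain the processing steps: normalize, denoise, then resample to 16 kHz
processed = audio.normalize().denoise().resample(target_sr=16000)
# Write the processed audio to a new file
processed.save("processed.wav")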
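For readers unfamiliar with these steps, the sketch below shows what normalizing and resampling can mean at the sample level. It is a simplified illustration in plain Python, not how fmus-vox implements `normalize()` or `resample()`. The function names `peak_normalize` and `resample_linear` are hypothetical, and the resampler uses linear interpolation with no anti-aliasing filter, which a production resampler would normally add.

```python
from typing import List


def peak_normalize(samples: List[float], peak: float = 1.0) -> List[float]:
    """Scale samples so the largest absolute value equals `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


def resample_linear(samples: List[float], source_sr: int, target_sr: int) -> List[float]:
    """Resample by linear interpolation between neighbouring samples."""
    if source_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or source_sr == target_sr:
        return list(samples)
    out_len = max(1, round(len(samples) * target_sr / source_sr))
    step = source_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    quiet = [0.25, -0.5, 0.1]
    print(peak_normalize(quiet))
    print(resample_linear([0.0, 1.0, 2.0, 3.0], source_sr=4, target_sr=2))
```

Normalizing before resampling, as in the example above, keeps the signal at a predictable level for later stages; 16 kHz is a common input rate for speech models.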

Speech-to-Text:

from fmus_vox import transcribe

text = transcribe("recording.wav")
print(f"Transcription: {text}")

Text-to-Speech:

from fmus_vox import speak

speak("Hello, welcome to fmus-vox!", output="welcome.wav")

Voice Cloning:

from fmus_vox import clone_voice

# Speak the given text in the voice sampled from my_voice.wav
clone_voice("my_voice.wav", "Hello with my voice", output="cloned.wav")
