Streaming
fmus_vox.stream - Audio streaming functionality.
This module provides interfaces and implementations for audio streaming, including microphone input, WebSocket streaming, and real-time processing.
- class fmus_vox.stream.Microphone(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)[source]
Bases: object
Enhanced class for recording audio from a microphone device.
This class provides both blocking and streaming interfaces for capturing audio from microphone input devices, with support for device selection, audio visualization, and real-time processing.
- FORMAT_MAP = {'float32': None, 'int16': None, 'int24': None, 'int32': None, 'int8': None, 'uint8': None}
- __init__(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)[source]
Initialize a microphone input stream.
- Parameters:
device_index – Index of the input device to use. None for default.
sample_rate – Sample rate to record at.
channels – Number of audio channels to record.
format – Audio format (‘float32’, ‘int16’, etc.)
chunk_size – Size of audio chunks to process at once.
**kwargs – Additional parameters for PyAudio.
- __exit__(exc_type, exc_val, exc_tb)[source]
Close the microphone stream when exiting context manager.
- open()[source]
Open the microphone stream.
- Raises:
DeviceError – If the specified device cannot be opened.
- add_filter(filter: AudioFilter) Microphone[source]
Add an audio processing filter.
- Parameters:
filter – The audio filter to add
- Returns:
Self for method chaining
- remove_filter(filter_name: str) bool[source]
Remove an audio processing filter by name.
- Parameters:
filter_name – Name of the filter to remove
- Returns:
True if filter was removed, False if not found
- set_visualization_callback(callback: Callable[[Dict[str, float]], None]) Microphone[source]
Set a callback for audio level visualization.
The callback will be called with a dictionary containing:
- rms: Root mean square level (0.0 to 1.0)
- peak: Peak level (0.0 to 1.0)
- avg_rms: Average RMS over window
- avg_peak: Average peak over window
- Parameters:
callback – Function to call with audio level data
- Returns:
Self for method chaining
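The four level values can be illustrated with a small stand-alone meter. This is a minimal sketch, not the fmus_vox implementation: the LevelMeter class and its update method are hypothetical names, and it assumes samples are floats in the range -1.0 to 1.0 and that the window is counted in chunks.

```python
import math
from collections import deque


class LevelMeter:
    """Hypothetical meter that produces the rms/peak/avg_rms/avg_peak keys."""

    def __init__(self, window_size=10):
        # Keep only the most recent `window_size` chunk measurements.
        self._rms_history = deque(maxlen=window_size)
        self._peak_history = deque(maxlen=window_size)

    def update(self, samples):
        """Measure one chunk of float samples in [-1.0, 1.0]."""
        if samples:
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            peak = max(abs(s) for s in samples)
        else:
            rms = peak = 0.0
        self._rms_history.append(rms)
        self._peak_history.append(peak)
        return {
            "rms": rms,
            "peak": peak,
            "avg_rms": sum(self._rms_history) / len(self._rms_history),
            "avg_peak": sum(self._peak_history) / len(self._peak_history),
        }
```

A visualization callback would then receive a dictionary of this shape and could, for example, draw a bar whose length is proportional to rms.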
- read(num_frames: int | None = None) bytes[source]
Read audio data from the microphone.
- Parameters:
num_frames – Number of frames to read. If None, reads one chunk.
- Returns:
Raw audio data as bytes.
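Because read returns raw bytes, callers typically decode them according to the format given to the constructor. The sketch below shows one way to do that for 'int16' and 'float32' data. It assumes little-endian samples, which this reference does not state, and the helper names are hypothetical.

```python
import struct


def int16_bytes_to_floats(raw: bytes) -> list:
    """Decode little-endian 16-bit PCM into floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    values = struct.unpack(f"<{count}h", raw[: count * 2])
    return [v / 32768.0 for v in values]


def float32_bytes_to_floats(raw: bytes) -> list:
    """Decode little-endian 32-bit float samples."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw[: count * 4]))
```

For multi-channel audio the decoded list would normally be interleaved (one sample per channel in turn), which is also an assumption here.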
- stop_recording() Audio[source]
Stop recording and return the recorded audio.
- Returns:
Audio object containing the recorded audio
- record_until_silence(silence_threshold: float = 0.01, silence_duration: float = 1.0, max_seconds: float | None = None, pre_buffer_seconds: float = 0.5) Audio[source]
Record until silence is detected.
- Parameters:
silence_threshold – Threshold for silence detection (0.0 to 1.0)
silence_duration – Duration of silence required to stop recording (seconds)
max_seconds – Maximum recording duration (seconds)
pre_buffer_seconds – Seconds of audio to include before speech starts
- Returns:
Audio object containing the recorded audio
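How the four parameters interact can be sketched over pre-chunked sample data. This is an illustration under stated assumptions, not the library's algorithm: it assumes silence is judged by per-chunk RMS, and the function name, the chunks iterable, and chunk_seconds are hypothetical.

```python
import math
from collections import deque


def chunk_rms(chunk):
    """Root mean square of one chunk of float samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def record_until_silence_sketch(chunks, chunk_seconds, silence_threshold=0.01,
                                silence_duration=1.0, max_seconds=None,
                                pre_buffer_seconds=0.5):
    """Return the samples from just before speech starts until silence ends it."""
    # Rolling buffer of quiet chunks heard before speech starts.
    pre_buffer = deque(maxlen=int(pre_buffer_seconds / chunk_seconds))
    recorded = []
    speaking = False
    silent_seconds = 0.0
    elapsed = 0.0
    for chunk in chunks:
        elapsed += chunk_seconds
        loud = chunk_rms(chunk) >= silence_threshold
        if not speaking:
            if loud:
                speaking = True
                recorded.extend(pre_buffer)  # include the lead-in audio
                recorded.append(chunk)
            else:
                pre_buffer.append(chunk)
        else:
            recorded.append(chunk)
            silent_seconds = 0.0 if loud else silent_seconds + chunk_seconds
            if silent_seconds >= silence_duration:
                break
        if max_seconds is not None and elapsed >= max_seconds:
            break
    return [sample for chunk in recorded for sample in chunk]
```

In this sketch the trailing silence is kept in the result; whether the library trims it is not stated in this reference.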
- record(seconds: float, visualization_callback: Callable | None = None) Audio[source]
Record audio for a specified duration.
- Parameters:
seconds – Duration to record in seconds
visualization_callback – Optional callback for visualization during recording
- Returns:
Audio object containing the recorded audio
- static list_devices() List[Dict[str, Any]][source]
List available audio input devices.
- Returns:
List of dictionaries containing device information
- static get_default_device() Dict[str, Any] | None[source]
Get the default audio input device.
- Returns:
Default device information or None if not found
- class fmus_vox.stream.VoiceStream(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)[source]
Bases: object
Real-time voice processing stream.
This class provides functionality for continuous voice processing, including voice activity detection, speech segmentation, and real-time transcription.
- __init__(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)[source]
Initialize a voice stream for continuous processing.
- Parameters:
input_device – Microphone device index or Microphone instance. If None, the default input device is used.
sample_rate – Sample rate for audio processing.
channels – Number of audio channels.
buffer_duration – Maximum duration of audio buffer in seconds.
vad_mode – Voice activity detection sensitivity (‘aggressive’, ‘normal’, or ‘relaxed’).
min_silence_duration – Minimum silence duration to consider a speech segment complete.
min_speech_duration – Minimum speech duration to consider a speech segment valid.
**kwargs – Additional parameters for the microphone.
- on_audio(callback: Callable[[ndarray, Dict[str, Any]], None]) None[source]
Register a callback for raw audio data.
- Parameters:
callback – Function that takes (audio_data, metadata) parameters.
- on_speech_start(callback: Callable[[Dict[str, Any]], None]) None[source]
Register a callback for when speech begins.
- Parameters:
callback – Function that takes a metadata dictionary.
- on_speech_end(callback: Callable[[Audio, Dict[str, Any]], None]) None[source]
Register a callback for when speech ends.
- Parameters:
callback – Function that takes (audio, metadata) parameters.
- on_speech(callback: Callable[[Audio, Dict[str, Any]], None]) None[source]
Register a callback for complete speech segments.
Equivalent to on_speech_end but with a more intuitive name.
- Parameters:
callback – Function that takes (audio, metadata) parameters.
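The relationship between the speech callbacks and the two duration thresholds can be sketched as a small state machine. This is a hedged illustration, not the VoiceStream implementation: the SpeechSegmenter class, its feed method, the per-frame is_speech flag, and the metadata keys are all hypothetical.

```python
class SpeechSegmenter:
    """Hypothetical segmenter driven by per-frame voice activity decisions."""

    def __init__(self, frame_seconds, min_silence_duration=0.5,
                 min_speech_duration=0.3):
        self.frame_seconds = frame_seconds
        self.min_silence_duration = min_silence_duration
        self.min_speech_duration = min_speech_duration
        self._start_callbacks = []
        self._end_callbacks = []
        self._frames = []
        self._speech_seconds = 0.0
        self._silence_seconds = 0.0
        self._in_speech = False

    def on_speech_start(self, callback):
        self._start_callbacks.append(callback)

    def on_speech_end(self, callback):
        self._end_callbacks.append(callback)

    # Alias, mirroring the documented equivalence of the two methods.
    on_speech = on_speech_end

    def feed(self, frame, is_speech):
        """Consume one frame together with its voice activity decision."""
        if is_speech:
            if not self._in_speech:
                self._in_speech = True
                self._frames = []
                self._speech_seconds = 0.0
                for callback in self._start_callbacks:
                    callback({"frame_seconds": self.frame_seconds})
            self._silence_seconds = 0.0
            self._speech_seconds += self.frame_seconds
            self._frames.append(frame)
        elif self._in_speech:
            self._silence_seconds += self.frame_seconds
            self._frames.append(frame)
            if self._silence_seconds >= self.min_silence_duration:
                self._in_speech = False
                # Segments shorter than min_speech_duration are dropped.
                if self._speech_seconds >= self.min_speech_duration:
                    metadata = {"speech_seconds": self._speech_seconds}
                    for callback in self._end_callbacks:
                        callback(list(self._frames), metadata)
```

In this sketch a short burst still triggers the start callback but never reaches the end callback, which is one plausible reading of "valid" segments.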
- class fmus_vox.stream.StreamBuffer(max_duration: float = 10.0, sample_rate: int = 16000, channels: int = 1, dtype: numpy.dtype = numpy.float32)[source]
Bases: object
Audio buffer for streaming applications.
This class manages a ring buffer for audio data, providing methods to add, retrieve, and manipulate audio frames for streaming processing.
- __init__(max_duration: float = 10.0, sample_rate: int = 16000, channels: int = 1, dtype: numpy.dtype = numpy.float32)[source]
Initialize an audio buffer for streaming.
- Parameters:
max_duration – Maximum buffer duration in seconds.
sample_rate – Sample rate of the audio.
channels – Number of audio channels.
dtype – Data type for the buffer.
- write(data: ndarray | bytes) int[source]
Write audio data to the buffer.
- Parameters:
data – Audio data to write, as numpy array or bytes.
- Returns:
Number of samples written.
- read(duration: float | None = None, n_samples: int | None = None) ndarray[source]
Read audio data from the buffer.
- Parameters:
duration – Duration to read in seconds.
n_samples – Number of samples to read (overrides duration if provided).
- Returns:
Numpy array containing the requested audio data.
- read_latest(duration: float) ndarray[source]
Read the most recent audio data from the buffer.
- Parameters:
duration – Duration to read in seconds.
- Returns:
Numpy array containing the most recent audio data.
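The ring-buffer behaviour described for StreamBuffer can be sketched in pure Python. This is a minimal mono illustration using plain lists instead of NumPy arrays, not the library's code. It assumes that read consumes the oldest samples while read_latest only peeks at the newest ones, which this reference does not confirm; the class name is hypothetical.

```python
from collections import deque


class RingBufferSketch:
    """Hypothetical fixed-capacity buffer that drops the oldest samples when full."""

    def __init__(self, max_duration=10.0, sample_rate=16000):
        self.sample_rate = sample_rate
        self._data = deque(maxlen=int(max_duration * sample_rate))

    def write(self, samples):
        """Append samples, silently discarding the oldest ones on overflow."""
        self._data.extend(samples)
        return len(samples)

    def read(self, duration=None, n_samples=None):
        """Remove and return the oldest samples; n_samples overrides duration."""
        if n_samples is None:
            if duration is None:
                n_samples = len(self._data)
            else:
                n_samples = int(duration * self.sample_rate)
        count = min(n_samples, len(self._data))
        return [self._data.popleft() for _ in range(count)]

    def read_latest(self, duration):
        """Return the newest samples without removing them."""
        count = min(int(duration * self.sample_rate), len(self._data))
        items = list(self._data)
        return items[len(items) - count:]
```

With a capacity of four samples, writing six leaves only the last four available, which is the property that makes such a buffer useful for bounded-memory streaming.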
- class fmus_vox.stream.AudioWebSocket(sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 4096)[source]
Bases: object
WebSocket handler for streaming audio data.
This class manages WebSocket connections for real-time audio streaming, supporting bidirectional communication with audio input and output.
- Parameters:
sample_rate – Audio sample rate
channels – Number of audio channels
format – Audio format (float32, int16, etc.)
chunk_size – Size of audio chunks for streaming
- __init__(sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 4096)[source]
Initialize the WebSocket audio stream.
- Parameters:
sample_rate – Audio sample rate
channels – Number of audio channels (1=mono, 2=stereo)
format – Audio format
chunk_size – Size of audio chunks
- async connect(uri: str) None[source]
Connect to a WebSocket server.
- Parameters:
uri – WebSocket URI to connect to
- Raises:
ConnectionError – If connection fails
- async send_audio(audio: Audio) None[source]
Send audio data through the WebSocket.
- Parameters:
audio – Audio object to send
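One way chunk_size could come into play is splitting a long sample sequence into fixed-size binary frames before sending. The sketch below only illustrates that idea: the wire format used by AudioWebSocket is not described in this reference, so the little-endian float32 encoding and the function name are assumptions.

```python
import struct


def iter_audio_chunks(samples, chunk_size=4096):
    """Yield byte frames holding at most chunk_size float32 samples each."""
    for start in range(0, len(samples), chunk_size):
        block = samples[start:start + chunk_size]
        # Assumed encoding: little-endian 32-bit floats.
        yield struct.pack(f"<{len(block)}f", *block)
```

Each yielded frame could then be passed to a WebSocket client's binary send call; the final frame is shorter when the sample count is not a multiple of chunk_size.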
- async send_text(data: Dict[str, Any]) None[source]
Send a text message through the WebSocket.
- Parameters:
data – Dictionary to send as JSON
- async send_transcription(text: str, confidence: float = 1.0) None[source]
Send a transcription result.
- Parameters:
text – Transcribed text
confidence – Confidence score
- class fmus_vox.stream.WebSocketVoiceStream(ws_uri: str | None = None, **kwargs)[source]
Bases: VoiceStream
VoiceStream with WebSocket support for remote audio processing.
This extends VoiceStream to add WebSocket connectivity for real-time audio streaming over the network.
- Parameters:
ws_uri – WebSocket URI to connect to
**kwargs – Additional arguments for VoiceStream
- __init__(ws_uri: str | None = None, **kwargs)[source]
Initialize the WebSocket voice stream.
- async connect_websocket(uri: str | None = None) None[source]
Connect to a WebSocket server.
- Parameters:
uri – WebSocket URI (uses self.ws_uri if not provided)
- async stream_to_websocket(audio_generator: AsyncIterator[Audio]) None[source]
Stream audio to a WebSocket connection.
- Parameters:
audio_generator – Async generator of Audio objects
- async fmus_vox.stream.create_websocket_server(host: str = '0.0.0.0', port: int = 8765, on_client_connect: Callable | None = None, on_audio_receive: Callable | None = None) None[source]
Create a WebSocket server for audio streaming.
- Parameters:
host – Host to bind to
port – Port to bind to
on_client_connect – Callback when a client connects
on_audio_receive – Callback when audio is received
- Raises:
ImportError – If websockets library is not installed
VoiceStream Class
- class fmus_vox.stream.voice_stream.VoiceStream(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)[source]
Bases: object
Real-time voice processing stream.
This class provides functionality for continuous voice processing, including voice activity detection, speech segmentation, and real-time transcription.
- __init__(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)[source]
Initialize a voice stream for continuous processing.
- Parameters:
input_device – Microphone device index or Microphone instance. If None, the default input device is used.
sample_rate – Sample rate for audio processing.
channels – Number of audio channels.
buffer_duration – Maximum duration of audio buffer in seconds.
vad_mode – Voice activity detection sensitivity (‘aggressive’, ‘normal’, or ‘relaxed’).
min_silence_duration – Minimum silence duration to consider a speech segment complete.
min_speech_duration – Minimum speech duration to consider a speech segment valid.
**kwargs – Additional parameters for the microphone.
- on_audio(callback: Callable[[ndarray, Dict[str, Any]], None]) None[source]
Register a callback for raw audio data.
- Parameters:
callback – Function that takes (audio_data, metadata) parameters.
- on_speech_start(callback: Callable[[Dict[str, Any]], None]) None[source]
Register a callback for when speech begins.
- Parameters:
callback – Function that takes a metadata dictionary.
- on_speech_end(callback: Callable[[Audio, Dict[str, Any]], None]) None[source]
Register a callback for when speech ends.
- Parameters:
callback – Function that takes (audio, metadata) parameters.
- on_speech(callback: Callable[[Audio, Dict[str, Any]], None]) None[source]
Register a callback for complete speech segments.
Equivalent to on_speech_end but with a more intuitive name.
- Parameters:
callback – Function that takes (audio, metadata) parameters.
Microphone Class
fmus_vox.stream.microphone - Enhanced microphone audio streaming implementation.
This module provides comprehensive functionality for capturing audio from microphone devices, with support for device selection, audio visualization, and real-time processing.
- class fmus_vox.stream.microphone.AudioFilter(name: str = 'AudioFilter')[source]
Bases: object
Base class for real-time audio filters.
Subclasses should implement the process method to perform audio processing on incoming audio data.
- __init__(name: str = 'AudioFilter')[source]
Initialize an audio filter.
- Parameters:
name – Name of the filter for identification
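The subclass-and-chain pattern can be sketched without the library. This is a hedged illustration: the signature of process is not given in this reference, so the version here (a list of floats in, a list of floats out) is an assumption, and FilterSketch, Gain, HardClip, and FilterChain are hypothetical names. The add_filter and remove_filter semantics mirror the ones documented for Microphone.

```python
class FilterSketch:
    """Hypothetical base class: subclasses override process()."""

    def __init__(self, name="FilterSketch"):
        self.name = name

    def process(self, samples):
        raise NotImplementedError


class Gain(FilterSketch):
    """Multiply every sample by a constant factor."""

    def __init__(self, factor):
        super().__init__(name="Gain")
        self.factor = factor

    def process(self, samples):
        return [s * self.factor for s in samples]


class HardClip(FilterSketch):
    """Clamp samples to the range -1.0 to 1.0."""

    def __init__(self):
        super().__init__(name="HardClip")

    def process(self, samples):
        return [max(-1.0, min(1.0, s)) for s in samples]


class FilterChain:
    """Applies filters in insertion order."""

    def __init__(self):
        self._filters = []

    def add_filter(self, audio_filter):
        self._filters.append(audio_filter)
        return self  # enables method chaining

    def remove_filter(self, filter_name):
        for index, audio_filter in enumerate(self._filters):
            if audio_filter.name == filter_name:
                del self._filters[index]
                return True
        return False

    def process(self, samples):
        for audio_filter in self._filters:
            samples = audio_filter.process(samples)
        return samples
```

Returning self from add_filter is what allows calls such as FilterChain().add_filter(Gain(2.0)).add_filter(HardClip()).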
- class fmus_vox.stream.microphone.NoiseReduction(strength: float = 0.5)[source]
Bases: AudioFilter
Noise reduction filter.
Reduces background noise in audio recordings.
- __init__(strength: float = 0.5)[source]
Initialize noise reduction filter.
- Parameters:
strength – Noise reduction strength (0.0 to 1.0)
- class fmus_vox.stream.microphone.Normalization(target_db: float = -3.0)[source]
Bases: AudioFilter
Audio normalization filter.
Normalizes audio volume to a target level.
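A target level in decibels maps to a linear amplitude of 10^(dB/20), so -3.0 dB is roughly 0.708 of full scale. The sketch below applies that to peak normalization. It assumes target_db refers to the peak level relative to full scale; the library may measure RMS instead, and the function names are hypothetical.

```python
def db_to_amplitude(db):
    """Convert a decibel value to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def normalize_peak(samples, target_db=-3.0):
    """Scale samples so the largest absolute value sits at target_db."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence cannot be scaled to a target level
    gain = db_to_amplitude(target_db) / peak
    return [s * gain for s in samples]
```

Guarding against a zero peak avoids a division by zero on silent input.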
- class fmus_vox.stream.microphone.AudioLevelMeter(window_size: int = 10)[source]
Bases: object
Audio level meter for real-time visualization.
Provides RMS and peak level measurements for audio visualization.
- class fmus_vox.stream.microphone.Microphone(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)[source]
Bases: object
Enhanced class for recording audio from a microphone device.
This class provides both blocking and streaming interfaces for capturing audio from microphone input devices, with support for device selection, audio visualization, and real-time processing.
- FORMAT_MAP = {'float32': None, 'int16': None, 'int24': None, 'int32': None, 'int8': None, 'uint8': None}
- __init__(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)[source]
Initialize a microphone input stream.
- Parameters:
device_index – Index of the input device to use. None for default.
sample_rate – Sample rate to record at.
channels – Number of audio channels to record.
format – Audio format (‘float32’, ‘int16’, etc.)
chunk_size – Size of audio chunks to process at once.
**kwargs – Additional parameters for PyAudio.
- __exit__(exc_type, exc_val, exc_tb)[source]
Close the microphone stream when exiting context manager.
- open()[source]
Open the microphone stream.
- Raises:
DeviceError – If the specified device cannot be opened.
- add_filter(filter: AudioFilter) Microphone[source]
Add an audio processing filter.
- Parameters:
filter – The audio filter to add
- Returns:
Self for method chaining
- remove_filter(filter_name: str) bool[source]
Remove an audio processing filter by name.
- Parameters:
filter_name – Name of the filter to remove
- Returns:
True if filter was removed, False if not found
- set_visualization_callback(callback: Callable[[Dict[str, float]], None]) Microphone[source]
Set a callback for audio level visualization.
The callback will be called with a dictionary containing:
- rms: Root mean square level (0.0 to 1.0)
- peak: Peak level (0.0 to 1.0)
- avg_rms: Average RMS over window
- avg_peak: Average peak over window
- Parameters:
callback – Function to call with audio level data
- Returns:
Self for method chaining
- read(num_frames: int | None = None) bytes[source]
Read audio data from the microphone.
- Parameters:
num_frames – Number of frames to read. If None, reads one chunk.
- Returns:
Raw audio data as bytes.
- stop_recording() Audio[source]
Stop recording and return the recorded audio.
- Returns:
Audio object containing the recorded audio
- record_until_silence(silence_threshold: float = 0.01, silence_duration: float = 1.0, max_seconds: float | None = None, pre_buffer_seconds: float = 0.5) Audio[source]
Record until silence is detected.
- Parameters:
silence_threshold – Threshold for silence detection (0.0 to 1.0)
silence_duration – Duration of silence required to stop recording (seconds)
max_seconds – Maximum recording duration (seconds)
pre_buffer_seconds – Seconds of audio to include before speech starts
- Returns:
Audio object containing the recorded audio
- record(seconds: float, visualization_callback: Callable | None = None) Audio[source]
Record audio for a specified duration.
- Parameters:
seconds – Duration to record in seconds
visualization_callback – Optional callback for visualization during recording
- Returns:
Audio object containing the recorded audio
- static list_devices() List[Dict[str, Any]][source]
List available audio input devices.
- Returns:
List of dictionaries containing device information
- static get_default_device() Dict[str, Any] | None[source]
Get the default audio input device.
- Returns:
Default device information or None if not found
- fmus_vox.stream.microphone.Mic
alias of Microphone
AudioPlayer Class
fmus_vox.stream.audioplayer - Audio playback functionality.
This module provides classes for audio playback with support for file playback, streaming playback, and real-time audio output processing.
- class fmus_vox.stream.audioplayer.AudioEffect(name: str = 'AudioEffect')[source]
Bases: object
Base class for real-time audio output effects.
Subclasses should implement the process method to perform audio processing on outgoing audio data.
- __init__(name: str = 'AudioEffect')[source]
Initialize an audio effect.
- Parameters:
name – Name of the effect for identification
- class fmus_vox.stream.audioplayer.Equalizer(bands: Dict[str, float] | None = None)[source]
Bases: AudioEffect
Simple equalizer effect for audio playback.
Applies gain adjustments to different frequency bands.
- __init__(bands: Dict[str, float] | None = None)[source]
Initialize equalizer with frequency band gains.
- Parameters:
bands – Dictionary of frequency bands and their gains (in dB). Default bands: “low”, “mid”, “high”.
- class fmus_vox.stream.audioplayer.AudioPlayer(device_index: int | None = None, sample_rate: int = 44100, channels: int = 2, format: str = 'float32', buffer_size: int = 1024, **kwargs)[source]
Bases: object
Class for playing audio from files or streams.
This class provides functionality for audio playback with support for real-time effects processing and audio format conversion.
- FORMAT_MAP = {'float32': None, 'int16': None, 'int24': None, 'int32': None, 'int8': None, 'uint8': None}
- __init__(device_index: int | None = None, sample_rate: int = 44100, channels: int = 2, format: str = 'float32', buffer_size: int = 1024, **kwargs)[source]
Initialize an audio player.
- Parameters:
device_index – Index of the output device to use. None for default.
sample_rate – Sample rate for playback.
channels – Number of audio channels for playback.
format – Audio format (‘float32’, ‘int16’, etc.)
buffer_size – Size of audio buffer chunks for playback.
**kwargs – Additional parameters for PyAudio.
- open()[source]
Open the audio playback stream.
- Raises:
DeviceError – If the specified device cannot be opened.
- add_effect(effect: AudioEffect) AudioPlayer[source]
Add an audio processing effect.
- Parameters:
effect – The audio effect to add
- Returns:
Self for method chaining
- remove_effect(effect_name: str) bool[source]
Remove an audio processing effect by name.
- Parameters:
effect_name – Name of the effect to remove
- Returns:
True if effect was removed, False if not found
- on_playback_complete(callback: Callable[[], None]) AudioPlayer[source]
Set callback for when playback completes.
- Parameters:
callback – Function to call when playback finishes
- Returns:
Self for method chaining
- on_position_change(callback: Callable[[float, float], None]) AudioPlayer[source]
Set callback for playback position updates.
The callback will be called with current position (seconds) and total duration (seconds) as arguments.
- Parameters:
callback – Function to call with position updates
- Returns:
Self for method chaining
- play(audio: Audio | ndarray | str) None[source]
Play audio from an Audio object, numpy array, or file.
- Parameters:
audio – Audio data to play. Can be:
- Audio object
- Numpy array (float32, -1.0 to 1.0 range)
- String path to audio file
- seek(position_seconds: float) None[source]
Seek to a specific position in the audio.
- Parameters:
position_seconds – Position in seconds to seek to
- get_position() float[source]
Get current playback position in seconds.
- Returns:
Current position in seconds
- get_duration() float[source]
Get total duration of the loaded audio in seconds.
- Returns:
Total duration in seconds
- is_playing() bool[source]
Check if audio is currently playing.
- Returns:
True if audio is playing, False otherwise
- fmus_vox.stream.audioplayer.Player
alias of AudioPlayer