Streaming

fmus_vox.stream - Audio streaming functionality.

This module provides interfaces and implementations for audio streaming, including microphone input, websocket streaming, and real-time processing.

class fmus_vox.stream.Microphone(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)

Bases: object

Enhanced class for recording audio from a microphone device.

This class provides both blocking and streaming interfaces for capturing audio from microphone input devices, with support for device selection, audio visualization, and real-time processing.

FORMAT_MAP = {'float32': None, 'int16': None, 'int24': None, 'int32': None, 'int8': None, 'uint8': None}

Class attribute whose keys are the supported values of the format argument.

__init__(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)

Initialize a microphone input stream.

Parameters:
  • device_index – Index of the input device to use. None for default.

  • sample_rate – Sample rate to record at.

  • channels – Number of audio channels to record.

  • format – Audio format (‘float32’, ‘int16’, etc.)

  • chunk_size – Size of audio chunks to process at once.

  • **kwargs – Additional parameters for PyAudio.

__enter__()

Start the microphone stream when used as a context manager.

__exit__(exc_type, exc_val, exc_tb)

Close the microphone stream when exiting context manager.

open()

Open the microphone stream.

Raises:

DeviceError – If the specified device cannot be opened.

close()

Close the microphone stream and release resources.

add_filter(filter: AudioFilter) → Microphone

Add an audio processing filter.

Parameters:

filter – The audio filter to add

Returns:

Self for method chaining

remove_filter(filter_name: str) → bool

Remove an audio processing filter by name.

Parameters:

filter_name – Name of the filter to remove

Returns:

True if filter was removed, False if not found

set_visualization_callback(callback: Callable[[Dict[str, float]], None]) → Microphone

Set a callback for audio level visualization.

The callback will be called with a dictionary containing:

  • rms – Root mean square level (0.0 to 1.0)

  • peak – Peak level (0.0 to 1.0)

  • avg_rms – Average RMS over the averaging window

  • avg_peak – Average peak over the averaging window

Parameters:

callback – Function to call with audio level data

Returns:

Self for method chaining

read(num_frames: int | None = None) → bytes

Read audio data from the microphone.

Parameters:

num_frames – Number of frames to read. If None, reads one chunk.

Returns:

Raw audio data as bytes.
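read() returns raw bytes rather than samples. As a minimal sketch, assuming the bytes hold interleaved samples in the machine's native byte order and in the format passed to the constructor, they can be decoded with NumPy. The helper names and the dtype table are hypothetical and not part of fmus_vox. 'int24' is omitted because NumPy has no native 24-bit integer type. The same arithmetic gives the time covered by one chunk: with the defaults, 1024 frames at 16000 Hz is 64 ms.

```python
import numpy as np

# Hypothetical mapping from the documented format names to NumPy dtypes.
# 'int24' is left out: NumPy has no native 24-bit integer type.
NUMPY_DTYPES = {
    "float32": np.float32,
    "int32": np.int32,
    "int16": np.int16,
    "int8": np.int8,
    "uint8": np.uint8,
}


def bytes_to_frames(raw: bytes, fmt: str = "float32", channels: int = 1) -> np.ndarray:
    """Decode interleaved PCM bytes into an array of shape (frames, channels)."""
    samples = np.frombuffer(raw, dtype=NUMPY_DTYPES[fmt])
    # Interleaved layout (L0 R0 L1 R1 ...) becomes one row per frame.
    return samples.reshape(-1, channels)


def chunk_latency_ms(chunk_size: int = 1024, sample_rate: int = 16000) -> float:
    """Duration of one chunk of frames, in milliseconds."""
    return 1000.0 * chunk_size / sample_rate
```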

start_recording()

Start recording audio to internal buffer.

stop_recording() → Audio

Stop recording and return the recorded audio.

Returns:

Audio object containing the recorded audio

record_until_silence(silence_threshold: float = 0.01, silence_duration: float = 1.0, max_seconds: float | None = None, pre_buffer_seconds: float = 0.5) → Audio

Record until silence is detected.

Parameters:
  • silence_threshold – Threshold for silence detection (0.0 to 1.0)

  • silence_duration – Duration of silence to stop recording (seconds)

  • max_seconds – Maximum recording duration (seconds)

  • pre_buffer_seconds – Seconds of audio to include before speech starts

Returns:

Audio object containing the recorded audio
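The stopping rule is not spelled out above. The sketch below shows one plausible reading of the parameters, working over an iterable of float chunks instead of a live device. Quiet chunks fill a rolling pre-buffer. The first chunk whose RMS exceeds silence_threshold starts the recording. Recording ends after silence_duration seconds of consecutive quiet chunks, or after max_seconds of audio recorded since speech began. The function name and the use of RMS as the level measure are assumptions.

```python
from collections import deque
from typing import Iterable, Optional

import numpy as np


def record_until_silence_sketch(
    chunks: Iterable[np.ndarray],
    sample_rate: int = 16000,
    silence_threshold: float = 0.01,
    silence_duration: float = 1.0,
    max_seconds: Optional[float] = None,
    pre_buffer_seconds: float = 0.5,
) -> np.ndarray:
    """Return audio from just before the first loud chunk until trailing silence."""
    # Work in sample counts to avoid accumulating floating-point time.
    pre_limit = round(pre_buffer_seconds * sample_rate)
    silence_limit = round(silence_duration * sample_rate)
    max_limit = None if max_seconds is None else round(max_seconds * sample_rate)

    pre_buffer: deque = deque()  # quiet chunks seen before speech started
    pre_samples = 0
    recorded: list = []
    started = False
    speech_samples = 0  # samples recorded since speech started
    silent_samples = 0  # consecutive quiet samples since the last loud chunk

    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float32)
        rms = float(np.sqrt(np.mean(np.square(chunk))))
        is_loud = rms > silence_threshold

        if not started:
            if not is_loud:
                # Keep roughly the last `pre_buffer_seconds` of quiet audio.
                pre_buffer.append(chunk)
                pre_samples += len(chunk)
                while pre_buffer and pre_samples - len(pre_buffer[0]) >= pre_limit:
                    pre_samples -= len(pre_buffer.popleft())
                continue
            started = True
            recorded.extend(pre_buffer)

        recorded.append(chunk)
        speech_samples += len(chunk)
        silent_samples = 0 if is_loud else silent_samples + len(chunk)
        if silent_samples >= silence_limit:
            break
        if max_limit is not None and speech_samples >= max_limit:
            break

    if not recorded:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(recorded)
```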

record(seconds: float, visualization_callback: Callable | None = None) → Audio

Record audio for a specified duration.

Parameters:
  • seconds – Duration to record in seconds

  • visualization_callback – Optional callback for visualization during recording

Returns:

Audio object containing the recorded audio

static list_devices() → List[Dict[str, Any]]

List available audio input devices.

Returns:

List of dictionaries containing device information

static get_default_device() → Dict[str, Any] | None

Get the default audio input device.

Returns:

Default device information or None if not found

calibrate_noise_profile(seconds: float = 2.0) → None

Calibrate noise reduction filter with ambient noise.

Records ambient noise to calibrate the noise reduction filter.

Parameters:

seconds – Duration to record ambient noise in seconds

create_vad_detector(threshold: float = 0.02, window_size: int = 10) → None

Create a Voice Activity Detector filter.

Parameters:
  • threshold – Threshold for voice detection (0.0 to 1.0)

  • window_size – Window size for smoothing in frames
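The detector's algorithm is not documented. A common design that fits a 0.0 to 1.0 threshold and a smoothing window measured in frames is an energy detector that averages per-frame RMS over the last window_size frames. The EnergyVAD class below is a hypothetical sketch of that technique, not the fmus_vox implementation.

```python
from collections import deque

import numpy as np


class EnergyVAD:
    """Hypothetical energy-based VAD with moving-average smoothing."""

    def __init__(self, threshold: float = 0.02, window_size: int = 10) -> None:
        self.threshold = threshold
        # Holds the RMS of the most recent `window_size` frames.
        self.history: deque = deque(maxlen=window_size)

    def is_speech(self, frame: np.ndarray) -> bool:
        samples = np.asarray(frame, dtype=np.float64)
        rms = float(np.sqrt(np.mean(samples ** 2)))
        self.history.append(rms)
        # Averaging over the window keeps single clicks or dropouts from flipping the state.
        return sum(self.history) / len(self.history) > self.threshold
```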

class fmus_vox.stream.VoiceStream(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)

Bases: object

Real-time voice processing stream.

This class provides functionality for continuous voice processing, including voice activity detection, speech segmentation, and real-time transcription.

__init__(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)

Initialize a voice stream for continuous processing.

Parameters:
  • input_device – Microphone device index or Microphone instance. If None, the default input device is used.

  • sample_rate – Sample rate for audio processing.

  • channels – Number of audio channels.

  • buffer_duration – Maximum duration of audio buffer in seconds.

  • vad_mode – Voice activity detection sensitivity (‘aggressive’, ‘normal’, or ‘relaxed’).

  • min_silence_duration – Minimum silence duration to consider a speech segment complete.

  • min_speech_duration – Minimum speech duration to consider a speech segment valid.

  • **kwargs – Additional parameters for the microphone.
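How VoiceStream turns voice activity into segments is not documented. Assuming it works from per-frame speech/non-speech decisions, the sketch below shows one way min_silence_duration and min_speech_duration can interact. Short pauses inside speech do not split a segment, and bursts shorter than the minimum speech duration are discarded. segment_speech is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def segment_speech(
    vad_flags: Sequence[bool],
    frame_duration: float,
    min_silence_duration: float = 0.5,
    min_speech_duration: float = 0.3,
) -> List[Tuple[float, float]]:
    """Turn per-frame VAD decisions into (start, end) speech segments in seconds."""
    min_silence_frames = max(1, round(min_silence_duration / frame_duration))
    min_speech_frames = max(1, round(min_speech_duration / frame_duration))

    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None        # frame index where the open segment began
    last_speech: Optional[int] = None  # index of the most recent speech frame
    for i, flag in enumerate(vad_flags):
        if flag:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_silence_frames:
            # Enough trailing silence: keep the segment only if it was long enough.
            if last_speech - start + 1 >= min_speech_frames:
                segments.append((start * frame_duration, (last_speech + 1) * frame_duration))
            start = None
    # Flush a segment that is still open when the input ends.
    if start is not None and last_speech - start + 1 >= min_speech_frames:
        segments.append((start * frame_duration, (last_speech + 1) * frame_duration))
    return segments
```

In this sketch a single silent frame between two speech runs is bridged, while an isolated one-frame burst is dropped.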

on_audio(callback: Callable[[ndarray, Dict[str, Any]], None]) → None

Register a callback for raw audio data.

Parameters:

callback – Function that takes (audio_data, metadata) parameters.

on_speech_start(callback: Callable[[Dict[str, Any]], None]) → None

Register a callback for when speech begins.

Parameters:

callback – Function that takes a metadata dictionary.

on_speech_end(callback: Callable[[Audio, Dict[str, Any]], None]) → None

Register a callback for when speech ends.

Parameters:

callback – Function that takes (audio, metadata) parameters.

on_speech(callback: Callable[[Audio, Dict[str, Any]], None]) → None

Register a callback for complete speech segments.

Equivalent to on_speech_end but with a more intuitive name.

Parameters:

callback – Function that takes (audio, metadata) parameters.

on_vad(callback: Callable[[bool, Dict[str, Any]], None]) → None

Register a callback for voice activity detection events.

Parameters:

callback – Function that takes (is_speech, metadata) parameters.

start() → None

Start the voice stream processing.

stop() → None

Stop the voice stream processing.

__enter__()

Start the stream when used as a context manager.

__exit__(exc_type, exc_val, exc_tb)

Stop the stream when exiting context manager.

__del__()

Clean up resources when garbage collected.

class fmus_vox.stream.StreamBuffer(max_duration: float = 10.0, sample_rate: int = 16000, channels: int = 1, dtype: numpy.dtype = numpy.float32)

Bases: object

Audio buffer for streaming applications.

This class manages a ring buffer for audio data, providing methods to add, retrieve, and manipulate audio frames for streaming processing.

__init__(max_duration: float = 10.0, sample_rate: int = 16000, channels: int = 1, dtype: numpy.dtype = numpy.float32)

Initialize an audio buffer for streaming.

Parameters:
  • max_duration – Maximum buffer duration in seconds.

  • sample_rate – Sample rate of the audio.

  • channels – Number of audio channels.

  • dtype – Data type for the buffer.

write(data: ndarray | bytes) → int

Write audio data to the buffer.

Parameters:

data – Audio data to write, as numpy array or bytes.

Returns:

Number of samples written.

read(duration: float | None = None, n_samples: int | None = None) → ndarray

Read audio data from the buffer.

Parameters:
  • duration – Duration to read in seconds.

  • n_samples – Number of samples to read (overrides duration if provided).

Returns:

Numpy array containing the requested audio data.

read_latest(duration: float) → ndarray

Read the most recent audio data from the buffer.

Parameters:

duration – Duration to read in seconds.

Returns:

Numpy array containing the most recent audio data.

clear() → None

Clear the buffer.

to_audio(duration: float | None = None) → Audio

Convert buffer contents to an Audio object.

Parameters:

duration – Duration to convert in seconds. If None, uses all available data.

Returns:

Audio object containing the buffer data.

__len__() → int

Return the number of samples currently in the buffer.
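StreamBuffer is described as a ring buffer. The class below is a simplified, mono-only sketch of that technique. RingBufferSketch is hypothetical and covers only write(), read_latest(), and __len__(). Once capacity (max_duration * sample_rate samples) is reached, new samples overwrite the oldest ones.

```python
import numpy as np


class RingBufferSketch:
    """Hypothetical mono ring buffer that keeps the newest `max_duration` seconds."""

    def __init__(self, max_duration: float = 10.0, sample_rate: int = 16000,
                 dtype=np.float32) -> None:
        self.sample_rate = sample_rate
        self.capacity = int(max_duration * sample_rate)
        self.data = np.zeros(self.capacity, dtype=dtype)
        self.write_pos = 0  # index where the next sample is stored
        self.size = 0       # number of valid samples currently held

    def write(self, samples: np.ndarray) -> int:
        # If more than `capacity` samples arrive, only the newest ones can be kept.
        samples = np.asarray(samples, dtype=self.data.dtype)[-self.capacity:]
        n = len(samples)
        end = self.write_pos + n
        if end <= self.capacity:
            self.data[self.write_pos:end] = samples
        else:
            # Wrap around: fill to the end of the array, then continue at index 0.
            first = self.capacity - self.write_pos
            self.data[self.write_pos:] = samples[:first]
            self.data[:n - first] = samples[first:]
        self.write_pos = end % self.capacity
        self.size = min(self.size + n, self.capacity)
        return n

    def read_latest(self, duration: float) -> np.ndarray:
        n = min(int(duration * self.sample_rate), self.size)
        start = (self.write_pos - n) % self.capacity
        # mode="wrap" lets one np.take call read across the end of the array.
        return np.take(self.data, np.arange(start, start + n), mode="wrap")

    def __len__(self) -> int:
        return self.size
```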

class fmus_vox.stream.AudioWebSocket(sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 4096)

Bases: object

WebSocket handler for streaming audio data.

This class manages WebSocket connections for real-time audio streaming, supporting bidirectional communication with audio input and output.

Parameters:
  • sample_rate – Audio sample rate

  • channels – Number of audio channels

  • format – Audio format (float32, int16, etc.)

  • chunk_size – Size of audio chunks for streaming

__init__(sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 4096)

Initialize the WebSocket audio stream.

Parameters:
  • sample_rate – Audio sample rate

  • channels – Number of audio channels (1=mono, 2=stereo)

  • format – Audio format

  • chunk_size – Size of audio chunks

async connect(uri: str) → None

Connect to a WebSocket server.

Parameters:

uri – WebSocket URI to connect to

Raises:

ConnectionError – If connection fails

async disconnect() → None

Disconnect from the WebSocket server.

async send_audio(audio: Audio) → None

Send audio data through the WebSocket.

Parameters:

audio – Audio object to send
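The wire format of send_audio() is not documented here. Assuming the audio travels as binary messages of raw float32 samples, each at most chunk_size samples long, splitting a signal into payloads could look like the hypothetical iter_chunks helper below. With the defaults, each message would carry 4096 / 16000 = 0.256 s of audio.

```python
from typing import Iterator

import numpy as np


def iter_chunks(audio: np.ndarray, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield float32 audio as byte payloads of at most `chunk_size` samples each."""
    samples = np.asarray(audio, dtype=np.float32)
    for start in range(0, len(samples), chunk_size):
        # The last payload may be shorter than chunk_size.
        yield samples[start:start + chunk_size].tobytes()
```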

async send_text(data: Dict[str, Any]) → None

Send a text message through the WebSocket.

Parameters:

data – Dictionary to send as JSON

async send_transcription(text: str, confidence: float = 1.0) → None

Send a transcription result.

Parameters:
  • text – Transcribed text

  • confidence – Confidence score

async send_synthesis(audio: Audio) → None

Send synthesized speech.

Parameters:

audio – Audio to send

property is_connected: bool

Whether the client is currently connected to a WebSocket server.

property is_streaming: bool

Whether audio is currently being streamed.

class fmus_vox.stream.WebSocketVoiceStream(ws_uri: str | None = None, **kwargs)

Bases: VoiceStream

VoiceStream with WebSocket support for remote audio processing.

This extends VoiceStream to add WebSocket connectivity for real-time audio streaming over the network.

Parameters:
  • ws_uri – WebSocket URI to connect to

  • **kwargs – Additional arguments for VoiceStream

__init__(ws_uri: str | None = None, **kwargs)

Initialize the WebSocket voice stream.

Parameters:
  • ws_uri – WebSocket URI to connect to

  • **kwargs – Additional arguments for VoiceStream

async connect_websocket(uri: str | None = None) → None

Connect to a WebSocket server.

Parameters:

uri – WebSocket URI (uses self.ws_uri if not provided)

async disconnect_websocket() → None

Disconnect from the WebSocket server.

async stream_to_websocket(audio_generator: AsyncIterator[Audio]) → None

Stream audio to a WebSocket connection.

Parameters:

audio_generator – Async generator of Audio objects
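stream_to_websocket() consumes an async iterator. The sketch below shows a minimal producer with that shape, paced at roughly real time. It yields NumPy arrays and uses a stand-in consumer so that it runs on its own. In real use each chunk would be wrapped in an fmus_vox Audio object, whose constructor is not documented in this section. chunk_generator and consume are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, List

import numpy as np


async def chunk_generator(audio: np.ndarray, chunk_size: int,
                          sample_rate: int) -> AsyncIterator[np.ndarray]:
    """Yield successive chunks of `audio`, paced at roughly real-time speed."""
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield chunk
        # Sleeping for the chunk's duration imitates a live capture source.
        await asyncio.sleep(len(chunk) / sample_rate)


async def consume(chunks: AsyncIterator[np.ndarray]) -> List[int]:
    """Stand-in for a sender such as stream_to_websocket(): records chunk sizes."""
    return [len(chunk) async for chunk in chunks]
```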

async fmus_vox.stream.create_websocket_server(host: str = '0.0.0.0', port: int = 8765, on_client_connect: Callable | None = None, on_audio_receive: Callable | None = None) → None

Create a WebSocket server for audio streaming.

Parameters:
  • host – Host to bind to

  • port – Port to bind to

  • on_client_connect – Callback when a client connects

  • on_audio_receive – Callback when audio is received

Raises:

ImportError – If websockets library is not installed

VoiceStream Class

class fmus_vox.stream.voice_stream.VoiceStream(input_device: int | Microphone | None = None, sample_rate: int = 16000, channels: int = 1, buffer_duration: float = 30.0, vad_mode: str = 'normal', min_silence_duration: float = 0.5, min_speech_duration: float = 0.3, **kwargs)

Bases: object

Documented identically to fmus_vox.stream.VoiceStream above; see that entry for the description, constructor parameters, and methods.

Microphone Class

fmus_vox.stream.microphone - Enhanced microphone audio streaming implementation.

This module provides comprehensive functionality for capturing audio from microphone devices, with support for device selection, audio visualization, and real-time processing.

class fmus_vox.stream.microphone.AudioFilter(name: str = 'AudioFilter')

Bases: object

Base class for real-time audio filters.

Subclasses should implement the process method to perform audio processing on incoming audio data.

__init__(name: str = 'AudioFilter')

Initialize an audio filter.

Parameters:

name – Name of the filter for identification

process(data: ndarray, sample_rate: int) → ndarray

Process audio data.

Parameters:
  • data – Audio data as numpy array

  • sample_rate – Sample rate of the audio

Returns:

Processed audio data

enable()

Enable the filter.

disable()

Disable the filter.
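To write a custom filter, you subclass AudioFilter, override process(), and register the instance with Microphone.add_filter(). So that the sketch runs without fmus_vox installed, it uses a local stand-in base class with the documented interface. The enabled flag and the in-order chaining in apply_chain are assumptions about how Microphone applies its filters, not documented behavior.

```python
from typing import List

import numpy as np


class AudioFilterSketch:
    """Stand-in mirroring the documented AudioFilter interface (hypothetical)."""

    def __init__(self, name: str = "AudioFilter") -> None:
        self.name = name
        self.enabled = True  # assumed flag toggled by enable()/disable()

    def process(self, data: np.ndarray, sample_rate: int) -> np.ndarray:
        return data  # base class: pass audio through unchanged

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False


class Gain(AudioFilterSketch):
    """Example subclass: scale samples by a linear factor, then clip to [-1, 1]."""

    def __init__(self, factor: float) -> None:
        super().__init__(name="Gain")
        self.factor = factor

    def process(self, data: np.ndarray, sample_rate: int) -> np.ndarray:
        return np.clip(data * self.factor, -1.0, 1.0)


def apply_chain(filters: List[AudioFilterSketch], data: np.ndarray,
                sample_rate: int) -> np.ndarray:
    """Run each enabled filter in insertion order, feeding its output to the next."""
    for f in filters:
        if f.enabled:
            data = f.process(data, sample_rate)
    return data
```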

class fmus_vox.stream.microphone.NoiseReduction(strength: float = 0.5)

Bases: AudioFilter

Noise reduction filter.

Reduces background noise in audio recordings.

__init__(strength: float = 0.5)

Initialize noise reduction filter.

Parameters:

strength – Noise reduction strength (0.0 to 1.0)

calibrate(noise_sample: ndarray)

Calibrate the noise profile from a sample of background noise.

Parameters:

noise_sample – Audio sample containing only background noise

process(data: ndarray, sample_rate: int) → ndarray

Apply noise reduction to the audio data.
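The noise-reduction algorithm is not documented. Spectral subtraction is a common technique that matches this interface: a strength from 0.0 to 1.0 plus a calibrate() step on noise-only audio. The sketch below implements that technique, which is not necessarily what NoiseReduction does. The fixed frame size and the pass-through for uncalibrated or mismatched chunks are assumptions.

```python
import numpy as np


class SpectralSubtractionSketch:
    """Hypothetical noise reducer based on magnitude spectral subtraction."""

    def __init__(self, strength: float = 0.5, frame_size: int = 512) -> None:
        self.strength = strength      # 0.0 = no reduction, 1.0 = full subtraction
        self.frame_size = frame_size  # chunk length the filter expects, in samples
        self.noise_mag = None         # mean |FFT| of the noise, set by calibrate()

    def calibrate(self, noise_sample: np.ndarray) -> None:
        """Estimate the average noise magnitude spectrum from noise-only audio."""
        n = len(noise_sample) // self.frame_size * self.frame_size
        if n == 0:
            raise ValueError("noise sample is shorter than one frame")
        frames = np.asarray(noise_sample[:n], dtype=np.float64).reshape(-1, self.frame_size)
        self.noise_mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

    def process(self, data: np.ndarray, sample_rate: int) -> np.ndarray:
        if self.noise_mag is None or len(data) != self.frame_size:
            return data  # uncalibrated or unexpected chunk length: pass through
        spectrum = np.fft.rfft(data)
        magnitude = np.abs(spectrum)
        # Subtract the scaled noise estimate, clamping magnitudes at zero.
        cleaned = np.maximum(magnitude - self.strength * self.noise_mag, 0.0)
        # Reuse the noisy phase and return to the time domain.
        phase = np.exp(1j * np.angle(spectrum))
        return np.fft.irfft(cleaned * phase, n=self.frame_size)
```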

class fmus_vox.stream.microphone.Normalization(target_db: float = -3.0)

Bases: AudioFilter

Audio normalization filter.

Normalizes audio volume to a target level.

__init__(target_db: float = -3.0)

Initialize normalization filter.

Parameters:

target_db – Target dB level to normalize to

process(data: ndarray, sample_rate: int) → ndarray

Normalize audio volume.
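The docs do not say whether target_db refers to peak or RMS level. The sketch below assumes peak normalization in dBFS, where a float sample of 1.0 is 0 dBFS. Under that assumption, the gain comes from converting decibels to a linear amplitude, 10 ** (target_db / 20), so the default of -3 dB corresponds to a peak of about 0.708. normalize_peak is a hypothetical helper.

```python
import numpy as np


def normalize_peak(data: np.ndarray, target_db: float = -3.0) -> np.ndarray:
    """Scale float audio so its peak sits at `target_db` dBFS (1.0 == 0 dBFS)."""
    peak = float(np.max(np.abs(data))) if data.size else 0.0
    if peak == 0.0:
        return data  # silence: nothing to scale
    target_peak = 10.0 ** (target_db / 20.0)  # dB -> linear amplitude
    return data * (target_peak / peak)
```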

class fmus_vox.stream.microphone.AudioLevelMeter(window_size: int = 10)

Bases: object

Audio level meter for real-time visualization.

Provides RMS and peak level measurements for audio visualization.

__init__(window_size: int = 10)

Initialize audio level meter.

Parameters:

window_size – Size of the averaging window in frames

process(data: ndarray) → Dict[str, float]

Process audio data and calculate levels.

Parameters:

data – Audio data as numpy array

Returns:

Dictionary with rms and peak levels
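The following is a minimal sketch of a meter that produces the four keys listed for Microphone.set_visualization_callback(). It assumes float samples in the range -1.0 to 1.0 and takes averages over the last window_size chunks. LevelMeterSketch is hypothetical, and AudioLevelMeter may compute its levels differently.

```python
from collections import deque
from typing import Dict

import numpy as np


class LevelMeterSketch:
    """Hypothetical meter producing rms, peak, avg_rms and avg_peak."""

    def __init__(self, window_size: int = 10) -> None:
        self.rms_history: deque = deque(maxlen=window_size)
        self.peak_history: deque = deque(maxlen=window_size)

    def process(self, data: np.ndarray) -> Dict[str, float]:
        samples = np.asarray(data, dtype=np.float64)
        rms = float(np.sqrt(np.mean(samples ** 2))) if samples.size else 0.0
        peak = float(np.max(np.abs(samples))) if samples.size else 0.0
        # Clamp to [0, 1] so the values suit a meter display.
        rms, peak = min(rms, 1.0), min(peak, 1.0)
        self.rms_history.append(rms)
        self.peak_history.append(peak)
        return {
            "rms": rms,
            "peak": peak,
            "avg_rms": sum(self.rms_history) / len(self.rms_history),
            "avg_peak": sum(self.peak_history) / len(self.peak_history),
        }
```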

class fmus_vox.stream.microphone.Microphone(device_index: int | None = None, sample_rate: int = 16000, channels: int = 1, format: str = 'float32', chunk_size: int = 1024, **kwargs)

Bases: object

Documented identically to fmus_vox.stream.Microphone above; see that entry for the description, FORMAT_MAP, constructor parameters, and methods.

fmus_vox.stream.microphone.Mic

alias of Microphone

AudioPlayer Class

fmus_vox.stream.audioplayer - Audio playback functionality.

This module provides classes for audio playback with support for file playback, streaming playback, and real-time audio output processing.

class fmus_vox.stream.audioplayer.AudioEffect(name: str = 'AudioEffect')

Bases: object

Base class for real-time audio output effects.

Subclasses should implement the process method to perform audio processing on outgoing audio data.

__init__(name: str = 'AudioEffect')

Initialize an audio effect.

Parameters:

name – Name of the effect for identification

process(data: ndarray, sample_rate: int) → ndarray

Process audio data.

Parameters:
  • data – Audio data as numpy array

  • sample_rate – Sample rate of the audio

Returns:

Processed audio data

enable()

Enable the effect.

disable()

Disable the effect.

class fmus_vox.stream.audioplayer.Equalizer(bands: Dict[str, float] | None = None)

Bases: AudioEffect

Simple equalizer effect for audio playback.

Applies gain adjustments to different frequency bands.

__init__(bands: Dict[str, float] | None = None)

Initialize equalizer with frequency band gains.

Parameters:

bands – Dictionary mapping frequency band names to gains in dB. Default bands: “low”, “mid”, “high”.

set_gain(band: str, gain_db: float) → None

Set gain for a specific frequency band.

Parameters:
  • band – Band name (“low”, “mid”, “high”, or custom band)

  • gain_db – Gain in decibels (-12 to +12 recommended)

process(data: ndarray, sample_rate: int) → ndarray

Apply equalization to the audio data.
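The band edges and filter design of Equalizer are not documented. The sketch below assumes low < 250 Hz, mid from 250 to 4000 Hz, and high ≥ 4000 Hz. It applies each gain by scaling the FFT bins of a single mono chunk. BAND_EDGES and equalize are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

# Assumed band edges in Hz; the docs name the bands but not their ranges.
BAND_EDGES: Dict[str, Tuple[float, float]] = {
    "low": (0.0, 250.0),
    "mid": (250.0, 4000.0),
    "high": (4000.0, float("inf")),
}


def equalize(data: np.ndarray, sample_rate: int, gains_db: Dict[str, float]) -> np.ndarray:
    """Apply per-band gains (dB) to a mono chunk by scaling its FFT bins."""
    spectrum = np.fft.rfft(data)
    freqs = np.fft.rfftfreq(len(data), d=1.0 / sample_rate)
    for band, gain_db in gains_db.items():
        lo, hi = BAND_EDGES[band]
        mask = (freqs >= lo) & (freqs < hi)
        spectrum[mask] *= 10.0 ** (gain_db / 20.0)  # dB -> linear factor
    return np.fft.irfft(spectrum, n=len(data))
```

Real-time equalizers more often use IIR shelving and peaking filters, because scaling FFT bins chunk by chunk can produce discontinuities at chunk boundaries.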

class fmus_vox.stream.audioplayer.AudioPlayer(device_index: int | None = None, sample_rate: int = 44100, channels: int = 2, format: str = 'float32', buffer_size: int = 1024, **kwargs)

Bases: object

Class for playing audio from files or streams.

This class provides functionality for audio playback with support for real-time effects processing and audio format conversion.

FORMAT_MAP = {'float32': None, 'int16': None, 'int24': None, 'int32': None, 'int8': None, 'uint8': None}

Class attribute whose keys are the supported values of the format argument.

__init__(device_index: int | None = None, sample_rate: int = 44100, channels: int = 2, format: str = 'float32', buffer_size: int = 1024, **kwargs)

Initialize an audio player.

Parameters:
  • device_index – Index of the output device to use. None for default.

  • sample_rate – Sample rate for playback.

  • channels – Number of audio channels for playback.

  • format – Audio format (‘float32’, ‘int16’, etc.)

  • buffer_size – Size of audio buffer chunks for playback.

  • **kwargs – Additional parameters for PyAudio.

__enter__()

Open the audio stream when used as a context manager.

__exit__(exc_type, exc_val, exc_tb)

Close the audio stream when exiting context manager.

open()

Open the audio playback stream.

Raises:

DeviceError – If the specified device cannot be opened.

close()

Close the audio playback stream and release resources.

add_effect(effect: AudioEffect) → AudioPlayer

Add an audio processing effect.

Parameters:

effect – The audio effect to add

Returns:

Self for method chaining

remove_effect(effect_name: str) → bool

Remove an audio processing effect by name.

Parameters:

effect_name – Name of the effect to remove

Returns:

True if effect was removed, False if not found

on_playback_complete(callback: Callable[[], None]) → AudioPlayer

Set callback for when playback completes.

Parameters:

callback – Function to call when playback finishes

Returns:

Self for method chaining

on_position_change(callback: Callable[[float, float], None]) → AudioPlayer

Set callback for playback position updates.

The callback will be called with current position (seconds) and total duration (seconds) as arguments.

Parameters:

callback – Function to call with position updates

Returns:

Self for method chaining

play(audio: Audio | ndarray | str) → None

Play audio from an Audio object, numpy array, or file.

Parameters:

audio – Audio data to play. Can be:

  • an Audio object

  • a NumPy array (float32, in the range -1.0 to 1.0)

  • a string path to an audio file
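The player's class description mentions format conversion, and its defaults (44100 Hz, 2 channels) differ from Microphone's (16000 Hz, 1 channel). As an illustration only, the hypothetical helpers below show two conversions that are commonly needed before a float32 array goes to an 'int16' stereo stream. The first scales [-1.0, 1.0] floats to 16-bit integers with clipping. The second duplicates a mono signal into two channels. Whether AudioPlayer performs these internally is not stated here, and sample-rate conversion is not covered.

```python
import numpy as np


def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1.0, 1.0] to 16-bit PCM, clipping overshoot."""
    clipped = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    # Scale by 32767 so +1.0 maps to the largest positive int16 value.
    return np.round(clipped * 32767).astype(np.int16)


def mono_to_stereo(audio: np.ndarray) -> np.ndarray:
    """Duplicate a mono signal into a (frames, 2) stereo array."""
    return np.repeat(np.asarray(audio)[:, None], 2, axis=1)
```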

stop() → None

Stop audio playback.

pause() → None

Pause audio playback.

resume() → None

Resume audio playback.

seek(position_seconds: float) → None

Seek to a specific position in the audio.

Parameters:

position_seconds – Position in seconds to seek to

get_position() → float

Get current playback position in seconds.

Returns:

Current position in seconds

get_duration() → float

Get total duration of the loaded audio in seconds.

Returns:

Total duration in seconds

is_playing() → bool

Check if audio is currently playing.

Returns:

True if audio is playing, False otherwise

static list_devices() → List[Dict[str, Any]]

List available audio output devices.

Returns:

List of dictionaries containing device information

static get_default_device() → Dict[str, Any] | None

Get the default audio output device.

Returns:

Default device information or None if not found

fmus_vox.stream.audioplayer.Player

alias of AudioPlayer