Speex: A Free Codec For Free Speech-tonmind.com

Blog

Tags

Speex: A Free Codec For Free Speech

November 17 , 2021

Overview

Speex is an Open Source/Free Software patent-free audio compression format designed for speech. The Speex Project aims to lower the barrier of entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Moreover, Speex is well-adapted to Internet applications and provides useful features that are not present in most other codecs. Finally, Speex is part of the GNU Project and is available under the revised BSD license.

Speex is targeted at Voice over IP (VoIP) and file-based compression. The design goals have been to make a codec that would be optimized for high quality speech and low bit rate. To achieve this the codec uses multiple bit rates, and supports ultra-wideband, wideband and narrowband. The codec is determined to be robust to lost packets, but weak to corrupted ones. All this led to the choice of code excited linear prediction (CELP) as the encoding technique to use for Speex.

Features

Sampling rate
Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

Quality
Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real (floating point) number.

Complexity (variable)
With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way similar to the -1 to -9 options to gzip compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 is about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4,[13] though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real-time.

Variable bit-rate (VBR)
Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve lower bit rate for the same quality, or a better quality for a certain bit rate. Despite its advantages, VBR has three main drawbacks: first, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified, at least in a controlled setting with a small dictionary of phrases,[14] by analysing the pattern of variation of the bit rate.

Average bit-rate (ABR)
Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bitrate.

Voice Activity Detection (VAD)
When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG). Last version VAD was working fine is 1.1.12, since v 1.2 it has been replaced with simple Any Activity Detection.

Discontinuous transmission (DTX)
Discontinuous transmission is an addition to VAD/VBR operation which allows ceasing transmitting completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).

Perceptual enhancement
Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (signal-to-noise ratio), but in the end it still sounds better (subjective improvement).

Algorithmic delay
Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.

TONMIND, designer and manufacturer of IP Speaker since 2014. The SIP Speakers have applied Speex audio processing to improve the sound quality.

Our IP Paging Speakers Codec includes OPUS, G711U, G711A, G722, GSM, MP1, MP2, MP3, WAV, LPCM s16le. The various codec also ensures excellent sound quality.

Tomind SIP Speaker can be applied to various application cases, for instance, school, commercial canter, customer service center, hotel, hospital, large venues, etc. Users can connect the SIP Speakers with IPPBX or the PA System Software develloped by our R&D Team. It can also work with Axis software via RTP Multicasting.

Tonmind core strength includes:

• Over 10 years VoIP audio and video experience

• Exclusive technical support .
• Well-trained customer team.
• Customer Oriented.
• Quick market response.

Tags :