Speex is an open-source lossy speech codec designed specifically for VoIP and file-based compression. It is useful for compressing recordings of speeches or interviews. Speex is not a music codec.
Max automatically places the Speex bitstream in an Ogg container.
Configuring Speex Output
Speex configuration is located in the Speex section of the preferences.
Mode
Mode sets the sampling rate used for encoding. Lower sampling rates result in lower quality and smaller files. For reference, CD audio is sampled at 44.1 kHz.
- Narrowband
- This samples your file at 8 kHz, which is the same sampling rate used to transmit telephone calls.
- Wideband
- This samples your file at 16 kHz.
- Ultra-wideband
- This samples your file at 32 kHz.
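The sampling rate also limits the highest frequency a recording can contain. By the Nyquist limit, audio sampled at a given rate can only represent frequencies below half that rate, which is why the lower modes sound less crisp. The sketch below tabulates the three modes. `SPEEX_MODES` and `max_audio_frequency` are illustrative names, not part of Max or Speex.

```python
# Sampling rates of the three Speex modes, in Hz (illustrative table).
SPEEX_MODES = {
    "Narrowband": 8_000,
    "Wideband": 16_000,
    "Ultra-wideband": 32_000,
}

CD_RATE_HZ = 44_100  # CD audio sampling rate, for comparison


def max_audio_frequency(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent (the Nyquist limit)."""
    return sample_rate_hz / 2


for mode, rate in SPEEX_MODES.items():
    print(
        f"{mode}: {rate} Hz sampling, audio up to {max_audio_frequency(rate):.0f} Hz, "
        f"{rate / CD_RATE_HZ:.0%} of the CD sampling rate"
    )
```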
Pre-encoding options
- Denoise input
- Enabling this removes or greatly reduces background noise, such as hiss or hum, before the audio is encoded.
- Apply adaptive gain control
- Enabling this will adjust the volume to a constant level.
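Automatic gain control can be pictured as measuring the loudness of each short frame and scaling it toward a target level. The sketch below is a deliberately simplified illustration, not the preprocessor Speex actually uses. The target RMS level of 0.1, the 10x gain cap, and the names `simple_agc` and `rms` are assumptions chosen for the example.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def simple_agc(frames: list[list[float]], target_rms: float = 0.1,
               max_gain: float = 10.0) -> list[list[float]]:
    """Scale each frame toward target_rms (simplified, hypothetical AGC).

    Frames with an RMS of zero pass through unchanged. Very quiet frames are
    boosted by at most max_gain so background noise is not amplified without bound.
    """
    out = []
    for frame in frames:
        level = rms(frame)
        gain = min(target_rms / level, max_gain) if level > 0 else 1.0
        out.append([s * gain for s in frame])
    return out


# A quiet 20 ms frame and a loud one (160 samples each at 8 kHz).
quiet, loud = [0.01] * 160, [0.5] * 160
for frame in simple_agc([quiet, loud]):
    print(f"output RMS: {rms(frame):.3f}")
```

A practical gain control also smooths the gain over time so the volume does not jump audibly between frames; this sketch omits that to keep the idea visible.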
Encoder target
If file size or audio quality matters to you, take the mode (that is, the sampling rate) into account when choosing an encoder target.
- Quality
- Quality is a subjective measure of how good the encoded audio sounds. A lower quality setting results in lower quality and smaller files.
- Bitrate
- Bitrates from 4kbps to 28kbps are available.
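Because a bitrate is simply a number of bits per second, the size of the encoded audio follows directly from bitrate × duration ÷ 8. The sketch below ignores Ogg container overhead and, for VBR, treats the bitrate as an average; `estimated_size_bytes` is a hypothetical helper.

```python
def estimated_size_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """Approximate size of the Speex bitstream alone (no Ogg container overhead)."""
    bits = bitrate_kbps * 1000 * duration_s
    return round(bits / 8)


# One hour of speech at the lowest and highest bitrates Max offers.
for kbps in (4, 28):
    size_mb = estimated_size_bytes(kbps, 60 * 60) / 1_000_000
    print(f"{kbps} kbps for 1 hour = about {size_mb:.1f} MB")
# prints "4 kbps for 1 hour = about 1.8 MB"
# and    "28 kbps for 1 hour = about 12.6 MB"
```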
Complexity
Complexity controls how much effort Speex spends searching for the best way to encode the audio. A setting of 10 takes roughly five times as long to encode as a setting of 1, but higher settings are useful for encoding non-speech sounds such as DTMF (touch-tone) tones.
A setting between 2 and 4 offers the best trade-off between encoding time and encoding quality.
Encoder options
- Enable variable bitrate mode (VBR)
- Select this to encode with a variable bitrate. Doing so also automatically enables voice activity detection (VAD) under Other options.
- Enable VBR average bitrate mode
- This still varies the bitrate from frame to frame, as VBR does, but continually adjusts the encoding quality so that the file's average bitrate matches the target bitrate.
Other options
Enabling both of these options can greatly reduce the size of files that contain long periods in which no one is speaking.
- Enable voice activity detection
- This option is implicitly enabled for VBR encodings. For CBR encodings, enabling it makes Speex detect non-speech periods and encode them with just enough bits to reproduce the background noise, a technique called comfort noise generation (CNG). CNG avoids the problems caused by replacing non-speech periods with complete silence: the listener may think the file has ended, the speech can sound "choppy" and even become hard to understand, and the sudden drop-out in sound can be jarring.
- Enable discontinuous transmission (DTX)
- When audio is streamed over the internet, DTX stops sending the bitstream while there is only background noise, such as when no one has spoken for a while. This cuts overall bandwidth use and reduces the load on the network. A file cannot simply contain gaps, so when writing to a file, DTX instead encodes such non-speech periods at 250 bps.
- Frames per Ogg packet
- Because Max always places the Speex bitstream in an Ogg container, the container's overhead can be significant at low bitrates, producing a file noticeably larger than the bitstream itself. Putting more frames in each Ogg packet spreads that overhead over more audio and alleviates the problem.
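To see how much DTX can save, the sketch below compares a hypothetical recording with and without DTX. It assumes the speech portions are encoded at a constant bitrate and that every non-speech second costs 250 bps. The 16 kbps bitrate, the 40/20-minute split, and the helper names are illustrative assumptions, and Ogg container overhead is ignored.

```python
DTX_SILENCE_BPS = 250  # bitrate DTX uses for non-speech periods in a file


def size_with_dtx_bytes(speech_bps: int, speech_s: float, silence_s: float) -> int:
    """Approximate bitstream size when non-speech periods are coded at 250 bps.

    Assumes (hypothetically) that speech periods use a constant speech_bps.
    """
    bits = speech_bps * speech_s + DTX_SILENCE_BPS * silence_s
    return round(bits / 8)


def size_without_dtx_bytes(speech_bps: int, speech_s: float, silence_s: float) -> int:
    """Approximate size if every second, speech or not, is coded at speech_bps."""
    return round(speech_bps * (speech_s + silence_s) / 8)


# Hypothetical interview: 40 minutes of talking, 20 minutes of pauses, at 16 kbps.
without_dtx = size_without_dtx_bytes(16_000, 40 * 60, 20 * 60)
with_dtx = size_with_dtx_bytes(16_000, 40 * 60, 20 * 60)
print(f"without DTX: {without_dtx / 1e6:.2f} MB, with DTX: {with_dtx / 1e6:.2f} MB")
# prints "without DTX: 7.20 MB, with DTX: 4.84 MB"
```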
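The effect of the frames-per-packet setting can be estimated from the Ogg page layout: every Ogg page begins with a 27-byte fixed header, and each packet on a page needs at least one byte in the page's segment table. The sketch below assumes, as a worst case, that every packet gets its own page, and that each Speex frame covers 20 ms of audio. The helper names are hypothetical.

```python
import math

OGG_PAGE_HEADER_BYTES = 27  # fixed part of every Ogg page header (RFC 3533)
FRAME_MS = 20  # each Speex frame covers 20 ms of audio


def lacing_bytes(packet_bytes: int) -> int:
    """Segment-table entries for one packet: one 255 per full 255 bytes, plus a final entry below 255."""
    return packet_bytes // 255 + 1


def overhead_fraction(bitrate_bps: int, frames_per_packet: int) -> float:
    """Share of the file spent on Ogg framing, assuming one packet per Ogg page."""
    bits = bitrate_bps * FRAME_MS // 1000 * frames_per_packet
    payload = math.ceil(bits / 8)
    overhead = OGG_PAGE_HEADER_BYTES + lacing_bytes(payload)
    return overhead / (overhead + payload)


for n in (1, 2, 5, 10):
    print(f"{n} frame(s) per packet at 8 kbps: {overhead_fraction(8_000, n):.0%} overhead")
# prints 58%, 41%, 22% and 12% overhead respectively
```

If a page holds several packets, the 27-byte header is shared among them and the real overhead is lower, but the trend is the same: fewer, larger packets spend proportionally less on framing.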