Max Help: Speex

Speex is an open-source lossy speech codec designed specifically for VoIP and file-based compression. It is useful for compressing recordings of speeches or interviews. Speex is not a music codec.

Max automatically places the Speex bitstream in an Ogg container.

Configuring Speex Output

Speex configuration is located in the Speex section of the preferences.

Mode

Mode designates the sampling rate at which to encode. Lower sampling rates result in lower quality and smaller file sizes. For reference, CD audio is sampled at 44.1 kHz.

Narrowband
This samples your file at 8 kHz, which is the same sampling rate used to transmit telephone calls.
Wideband
This samples your file at 16 kHz.
Ultra-wideband
This samples your file at 32 kHz.
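
As a rough illustration of what the three modes mean in practice, the sketch below maps each mode to its sampling rate and derives two figures from it: the highest audio frequency that rate can represent (half the sampling rate) and the number of samples in one 20 ms Speex frame. The names MODE_SAMPLE_RATES, highest_frequency_hz, and samples_per_frame are illustrative only and are not part of Max or of the Speex library.

```python
# Sampling rate in Hz for each mode listed above.
MODE_SAMPLE_RATES = {
    "narrowband": 8000,
    "wideband": 16000,
    "ultra-wideband": 32000,
}

FRAME_MS = 20  # Speex encodes audio in 20 ms frames


def highest_frequency_hz(mode):
    """Highest frequency the mode can represent (half the sampling rate)."""
    return MODE_SAMPLE_RATES[mode] // 2


def samples_per_frame(mode):
    """Number of audio samples covered by one 20 ms frame."""
    return MODE_SAMPLE_RATES[mode] * FRAME_MS // 1000


for mode in MODE_SAMPLE_RATES:
    print(mode, highest_frequency_hz(mode), samples_per_frame(mode))
# narrowband 4000 160
# wideband 8000 320
# ultra-wideband 16000 640
```

Narrowband therefore discards everything above 4 kHz, which is acceptable for speech but is one reason Speex is unsuitable for music.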

Pre-encoding options

Denoise input
Enabling this will remove or significantly lessen the hiss or hum in some recordings.
Apply adaptive gain control
Enabling this will adjust the volume to a constant level.

Encoder target

Keep the mode (that is, the sampling rate) in mind when choosing an encoder target if you're concerned with file size or audio quality.

Quality
Quality is a subjective measurement of a file's audio quality. A lower quality setting results in lower audio quality and smaller file sizes.
Bitrate
Bitrates from 4 kbps to 28 kbps are available.
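
To get a feel for how the bitrate target translates into file size, the following minimal sketch estimates the size of the Speex bitstream for a recording of a given length. It assumes a constant bitrate, takes 1 kbps to mean 1000 bits per second, and ignores the Ogg container overhead; the function name estimated_bitstream_bytes is hypothetical and is not something Max provides.

```python
def estimated_bitstream_bytes(bitrate_kbps, duration_seconds):
    """Approximate bitstream size in bytes at a constant bitrate.

    Assumes 1 kbps = 1000 bits per second and ignores container overhead.
    """
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits // 8


one_hour = 60 * 60  # seconds

print(estimated_bitstream_bytes(4, one_hour))   # 1800000
print(estimated_bitstream_bytes(28, one_hour))  # 12600000
```

Under these assumptions, an hour-long interview occupies roughly 1.8 MB at the lowest available bitrate and roughly 12.6 MB at the highest. Variable bitrate encodings will deviate from this estimate.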

Complexity

Complexity is a measure of how meticulously Speex compresses the audio. A setting of 10 takes around 5 times longer to encode than a setting of 1, but is useful for encoding non-speech sounds like DTMF tones.

A setting between 2 and 4 offers the best trade-off between encoding time and encoding quality.

Encoder options

Enable variable bitrate mode (VBR)
Select this to enable VBR. Doing so automatically enables voice activity detection (VAD) in the Other options box.
Enable VBR average bitrate mode
This adjusts the quality of a variable bitrate encoding as it proceeds so that the file as a whole meets a target average bitrate.

Other options

Enabling both voice activity detection and discontinuous transmission can greatly reduce the size of files in which no one speaks for periods of time.

Enable voice activity detection
This option is implicitly activated in VBR encodings. When used for CBR encodings, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called comfort noise generation (CNG). CNG avoids the problems caused by inserting complete silence into non-speech periods: the listener may believe that the file has ended, the speech may sound "choppy" or even be hard to understand, and the sudden drop-out in sound can be jarring.
Enable discontinuous transmission (DTX)
When the audio is streamed over the internet, DTX stops transmitting the bitstream while there is only background noise, such as when no one has been speaking for a while. This cuts down on the overall bandwidth used and puts less strain on the Speex server. When writing to a file, the bitstream cannot simply stop, so DTX encodes such non-speech periods at 250 bps.
Frames per Ogg packet
At low bitrates, the Ogg container can add significant overhead, producing a file noticeably larger than the bitstream it holds. Putting a higher number of frames in each Ogg packet reduces this overhead.
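
The savings described in this section can be approximated with some simple arithmetic. The sketch below is illustrative only and makes several assumptions: Speex frames last 20 ms, every Ogg page carries a fixed 27-byte header plus one lacing byte for each started 255-byte segment of a packet, and the number of packets per page (packets_per_page) is a hypothetical parameter, since how pages are actually flushed is up to the encoder. The function names ogg_overhead_ratio and dtx_file_bytes are invented for this example and do not correspond to anything in Max.

```python
OGG_PAGE_HEADER_BYTES = 27  # fixed part of every Ogg page header
FRAME_MS = 20               # duration of one Speex frame
DTX_SILENCE_BPS = 250       # rate written to a file for non-speech periods


def ogg_overhead_ratio(bitrate_bps, frames_per_packet, packets_per_page=1):
    """Container bytes per payload byte under a simplified Ogg page model."""
    # Payload bytes in one packet, rounded up to a whole byte.
    packet_bits_x1000 = bitrate_bps * FRAME_MS * frames_per_packet
    packet_bytes = -(-packet_bits_x1000 // 8000)
    # One lacing byte for every full 255 bytes, plus a final one.
    lacing_bytes = packet_bytes // 255 + 1
    segments = lacing_bytes * packets_per_page
    if segments > 255:
        raise ValueError("an Ogg page holds at most 255 segments")
    overhead = OGG_PAGE_HEADER_BYTES + segments
    return overhead / (packet_bytes * packets_per_page)


def dtx_file_bytes(bitrate_bps, speech_seconds, silence_seconds):
    """Approximate bitstream size when silence is written at 250 bps."""
    total_bits = (bitrate_bps * speech_seconds
                  + DTX_SILENCE_BPS * silence_seconds)
    return total_bits // 8


# 4 kbps, one page per packet (a deliberately pessimistic assumption).
print(ogg_overhead_ratio(4000, frames_per_packet=1))   # 2.8
print(ogg_overhead_ratio(4000, frames_per_packet=10))  # 0.28

# Ten minutes at 8 kbps, of which four minutes contain no speech.
print(dtx_file_bytes(8000, speech_seconds=360, silence_seconds=240))  # 367500
```

In this simplified model, a single 10-byte frame per page carries 28 bytes of container data, while ten frames per packet cut the relative overhead tenfold. Likewise, the ten-minute recording shrinks from about 600,000 bytes at a constant 8 kbps to about 367,500 bytes when its silent stretches are written at 250 bps.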
