Adaptive Multi-Rate (AMR)
Adaptive Multi-Rate (AMR) is an audio data compression scheme optimized for speech coding. AMR was adopted as the standard speech codec by 3GPP in October 1999 and is now widely used in GSM. It uses link adaptation to select one of eight different bit rates based on link conditions.
The bit rates 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s are based on frames that contain 160 samples and are 20 milliseconds long (an 8 kHz sampling rate). AMR employs several techniques, such as Algebraic Code Excited Linear Prediction (ACELP), Discontinuous Transmission (DTX), voice activity detection (VAD) and comfort noise generation (CNG).
Using AMR requires optimized link adaptation that selects the codec mode best matching the local radio channel conditions and capacity requirements. If radio conditions are poor, source coding is reduced and channel coding is increased. This improves the quality and robustness of the network connection while sacrificing some voice clarity; in the particular case of AMR, the improvement is roughly 4-6 dB in S/N for usable communication. This intelligent link adaptation allows the network operator to prioritize capacity or quality per base station.
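As an illustration, here is a minimal Python sketch of this kind of link adaptation. The eight bit rates are AMR's actual modes, but the C/I thresholds and the function name select_mode are illustrative assumptions, not values from the 3GPP specification.

```python
# Sketch of AMR-style link adaptation: the codec mode is chosen from a
# measured carrier-to-interference ratio (C/I). Threshold values below
# are illustrative assumptions, not the standardized ones.

# The eight AMR modes and their bit rates in kbit/s.
AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]

# Hypothetical C/I thresholds (dB): below each threshold, fall back to
# the next more robust (lower-rate) mode so that more of the gross bit
# rate can be spent on channel coding.
CI_THRESHOLDS_DB = [13.0, 11.0, 9.5, 8.0, 6.5, 5.0, 3.5]

def select_mode(ci_db: float) -> float:
    """Return the AMR bit rate (kbit/s) for a given C/I estimate."""
    for rate, threshold in zip(AMR_MODES_KBPS, CI_THRESHOLDS_DB):
        if ci_db >= threshold:
            return rate
    return AMR_MODES_KBPS[-1]  # worst channel: most robust 4.75 kbit/s mode

# Every mode codes 20 ms frames of 160 samples (8 kHz sampling), so the
# payload per frame is rate * 20 ms, e.g. 244 bits at 12.2 kbit/s.
for ci in (15.0, 9.0, 2.0):
    rate = select_mode(ci)
    print(f"C/I {ci:5.1f} dB -> {rate} kbit/s ({round(rate * 20)} bits/frame)")
```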
Audio data compression
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are typically referred to as audio codecs. As with other specific forms of data compression, many "lossless" and "lossy" algorithms exist to achieve the compression.
Speech encoding
Speech coding is the compression of speech (into a code) for transmission, using speech codecs that apply audio signal processing and speech processing techniques. The two most important applications of speech coding are mobile telephony and internet telephony.
The techniques used in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in narrowband speech coding, only information in the 400 Hz to 3500 Hz frequency band is transmitted, but the reconstructed signal remains adequate for intelligibility.
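As a concrete illustration of that band limitation, the following Python sketch band-passes an 8 kHz speech signal to roughly the 400 Hz to 3500 Hz band. The use of SciPy's Butterworth design, the filter order and the exact cutoffs are assumptions made for the example, not part of any codec specification.

```python
# Sketch: limit an 8 kHz-sampled signal to the narrowband telephone band.
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000  # narrowband telephone sampling rate (Hz)

def telephone_band(signal: np.ndarray) -> np.ndarray:
    """Band-pass a speech signal to the 400-3500 Hz telephone band."""
    b, a = butter(4, [400, 3500], btype="bandpass", fs=FS)
    return lfilter(b, a, signal)

# Example: a 1 s test signal with components inside and outside the band.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
y = telephone_band(x)  # the 100 Hz component is strongly attenuated
```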
However, speech coding differs from audio coding in that there is a lot more statistical information available about the properties of speech. In addition, some auditory information which is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech, with a constrained amount of transmitted data.
It should be emphasized that the intelligibility of speech includes, besides the literal content, also speaker identity, emotions, intonation and timbre, all of which are important for perfect intelligibility. The more abstract concept of pleasantness of degraded speech is a property distinct from intelligibility, since degraded speech can be completely intelligible yet subjectively annoying to the listener.
In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.
The A-law and μ-law (mu-law) algorithms are used in nearly all landline long-distance telephone communications. They can be seen as a kind of speech encoding: they require only 8 bits per sample but give effectively 12 bits of resolution.
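The following Python sketch shows μ-law companding using the continuous formula with μ = 255; the deployed G.711 standard uses a segmented, table-driven approximation, and the helper names here are illustrative.

```python
# Sketch of mu-law companding: mapping a signal in [-1, 1] through a
# logarithmic curve before 8-bit quantization gives quiet signals roughly
# the 12-bit linear resolution mentioned above.
import numpy as np

MU = 255.0  # mu-law parameter used in North American/Japanese telephony

def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compress samples in [-1, 1] to 8-bit codes 0..255."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1) * 127.5).astype(np.uint8)

def mulaw_decode(code: np.ndarray) -> np.ndarray:
    """Expand 8-bit codes back to samples in [-1, 1]."""
    y = code.astype(np.float64) / 127.5 - 1
    return np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU

# Quiet samples keep much finer steps than uniform 8-bit quantization would.
x = np.array([0.001, 0.01, 0.1, 0.5, 1.0])
print(mulaw_decode(mulaw_encode(x)))
```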
The most common speech coding scheme is Code-Excited Linear Prediction (CELP) coding, which is used, for example, in the GSM standard (the AMR codec described above is based on ACELP, a CELP variant). In CELP, the modelling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model.
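To make the first stage concrete, the following Python sketch estimates the linear predictive (spectral envelope) coefficients of one 20 ms frame with the autocorrelation method and the Levinson-Durbin recursion, then forms the residual that the codebook stage would model. The order-10 predictor and the random test frame are illustrative choices; the codebook search itself is omitted.

```python
# Sketch of the linear predictive stage of CELP.
import numpy as np

def lpc(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Levinson-Durbin: LPC coefficients a[1..order] from autocorrelation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for recursion step i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a[:i] = a[:i] - k * a[:i][::-1]  # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a

# One 20 ms frame (160 samples at 8 kHz); random noise stands in for speech.
frame = np.random.default_rng(0).standard_normal(160)
a = lpc(frame)
# The residual after short-term prediction is what the codebook stage models.
predicted = np.array([np.dot(a, frame[n - 10:n][::-1]) for n in range(10, 160)])
residual = frame[10:] - predicted
```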
In addition to the actual speech coding of the signal, it is often necessary to use channel coding for transmission, to avoid losses due to transmission errors. Usually, speech coding and channel coding methods have to be chosen jointly, with the more important bits in the speech data stream protected by more robust channel coding, in order to get the best overall coding result.
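A toy Python sketch of such paired coding follows: the perceptually important "class A" bits of a frame receive a strong channel code (here, simple 3x repetition) while the remaining bits are sent uncoded. The class split and the repetition code are illustrative assumptions; real systems such as AMR over GSM instead use CRCs and punctured convolutional codes.

```python
# Sketch of unequal error protection for a speech frame's bitstream.

def protect(frame_bits: list[int], n_class_a: int) -> list[int]:
    """Apply 3x repetition to class A bits; leave class B bits uncoded."""
    class_a, class_b = frame_bits[:n_class_a], frame_bits[n_class_a:]
    protected = [b for bit in class_a for b in (bit, bit, bit)]
    return protected + class_b

def recover(coded: list[int], n_class_a: int) -> list[int]:
    """Majority-vote each class A triple; pass class B bits through."""
    triples = coded[:3 * n_class_a]
    class_a = [int(sum(triples[i:i + 3]) >= 2)
               for i in range(0, len(triples), 3)]
    return class_a + coded[3 * n_class_a:]

frame = [1, 0, 1, 1, 0, 0, 1, 0]
coded = protect(frame, n_class_a=3)
coded[1] ^= 1  # a channel error in a protected class A bit is corrected
assert recover(coded, n_class_a=3) == frame
```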