What is Jitter?
If you have ever experimented with the ping program, you probably know that if you send a sequence of packets from point A to point B, each packet needs a slightly different time to reach the destination. The varying transit times are not an issue if you are downloading a web page, but they matter if you want to transmit a stream of real-time data. For example, suppose that a VoIP device sends out one RTP packet every 20 milliseconds. The figure below shows what the stream might look like at the receiving end. Because the packets do not arrive precisely every 20 milliseconds, we cannot play them out as they arrive unless we are willing to accept poor audio quality.
Define Jitter:
Formally, jitter is defined as the statistical variance of the RTP data packet inter-arrival time. In the Real-time Transport Protocol (RTP), jitter is measured in timestamp units. For example, if you transmit audio sampled at the usual 8000 Hz, the unit is 1/8000 of a second (0.125 milliseconds).
The first step in dealing with jitter successfully is to know how large it is. However, we do not need to compute the precise value. In RTP, the receiving endpoint computes an estimate using a simplified formula (a first-order estimator). The jitter estimate is sent to the other party using RTCP (the RTP Control Protocol).
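The first-order estimator mentioned above can be sketched as follows. It uses the running-average update given in RFC 3550, J += (|D| - J) / 16, where D is the change in transit time between two consecutive packets. The class name, the method names and the use of floating-point seconds for arrival times are assumptions made for this illustration, and RTP timestamp wraparound is ignored.

```python
class JitterEstimator:
    """First-order interarrival jitter estimate, in RTP timestamp units."""

    def __init__(self, clock_rate=8000):
        self.clock_rate = clock_rate   # timestamp units per second
        self.jitter = 0.0              # running estimate, in timestamp units
        self._prev_transit = None      # transit value of the previous packet

    def update(self, rtp_timestamp, arrival_seconds):
        """Feed one received packet and return the updated estimate."""
        # Express the arrival time in the same units as the RTP timestamp.
        arrival = arrival_seconds * self.clock_rate
        transit = arrival - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            # Move 1/16 of the way toward the newest sample.
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter

    def jitter_ms(self):
        """Current estimate converted from timestamp units to milliseconds."""
        return 1000.0 * self.jitter / self.clock_rate
```

For example, with packets stamped 0, 160 and 320 (20 ms apart at 8000 Hz) arriving at 0 ms, 20 ms and 45 ms, the third packet is 5 ms late, which is 40 timestamp units, so the estimate moves from 0 to 40/16 = 2.5 units.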
Jitter Buffer
The network delivers RTP packets asynchronously, with variable delays. To be able to play the audio stream with reasonable quality, the receiving endpoint needs to turn the variable delays into constant delays. This can be done by using a jitter buffer.
The jitter buffer implementation is quite simple. You create a buffer to hold, say, 100 milliseconds of audio; at a sampling rate of 8000 Hz, 100 milliseconds corresponds to 800 samples. You place incoming audio frames into the buffer and start the playout when the buffer is, say, at least half full.
Once you start to play the audio, it's a bit of a gamble: you risk both buffer underflow (you need to play another frame but the buffer is empty) and buffer overflow (the buffer is full and you need to throw away the packet you just received). To reduce the risk, you can increase the size of the buffer, but you simultaneously increase latency: if you start playing when there's at least 50 milliseconds of audio, you delay the signal by those 50 milliseconds. To improve the odds, you can implement an adaptive buffer, one that changes its size based on the current jitter.
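A minimal sketch of the fixed-size buffer described above, counting in frames rather than samples: with 20-millisecond frames, 100 milliseconds of audio is 5 frames, and "at least half full" becomes 3 frames. The class and its method names are hypothetical, and the sketch leaves out things a real implementation would need, such as reordering by RTP sequence number and adapting the size to the measured jitter.

```python
from collections import deque


class JitterBuffer:
    """Fixed-size jitter buffer; capacity and threshold are counted in frames."""

    def __init__(self, capacity_frames=5, start_threshold=3):
        self.capacity = capacity_frames
        self.start_threshold = start_threshold
        self.frames = deque()
        self.playing = False
        self.overflows = 0
        self.underflows = 0

    def push(self, frame):
        """Called when a frame arrives from the network."""
        if len(self.frames) >= self.capacity:
            self.overflows += 1    # buffer full: drop the frame just received
            return False
        self.frames.append(frame)
        if not self.playing and len(self.frames) >= self.start_threshold:
            self.playing = True    # enough audio queued, playout may start
        return True

    def pop(self):
        """Called by the playout clock once per frame interval."""
        if not self.playing:
            return None            # still pre-buffering
        if not self.frames:
            self.underflows += 1   # nothing to play: caller should emit silence
            return None
        return self.frames.popleft()
```

The network side calls push() whenever a packet arrives, while the audio side calls pop() at a steady rate; the two counters make it easy to see how often the gamble is lost for a given buffer size.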
Sources of Jitter
I would like to conclude this piece with an observation about the sources of jitter. In addition to varying transit times, jitter can sometimes originate right in the sending computer. This happens when the audio data is not read directly from a sound card (sound cards have a very stable clock, more precise than the computer's on-board clock) but comes from another source: for example, the audio stream is generated by text-to-speech software or simply read from a file. In other words, we are talking about applications like voice mail and interactive voice response (IVR) systems.
*When run on a standard operating system, IVR and voice mail applications can have a problem with precise timing and thus cause high jitter. Quite often, the operating system's process scheduler works with 10-millisecond quanta. Consider an application that wants to send one RTP packet every 30 milliseconds. The application spends, say, 5 milliseconds doing some processing (e.g. text-to-speech synthesis). After that, it would need to sleep for precisely 25 milliseconds, so that the interval between packets is exactly 30 ms. But because of the 10 ms quantum, the length of the sleep is rounded up to the next multiple of 10 ms, that is, to 30 ms. In other words, the interval between packets ends up being 5 + 30 = 35 milliseconds. Should this happen between each pair of packets, you will get really poor audio quality.*
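The arithmetic in this example can be checked with a few lines of code. This is only a model of the simplified scheduler described above (every sleep rounded up to a whole number of quanta); the function name and parameters are made up for the illustration.

```python
import math


def actual_interval_ms(processing_ms, target_interval_ms, quantum_ms=10):
    """Interval between packets when each sleep is rounded up to a full quantum."""
    requested_sleep = target_interval_ms - processing_ms
    # The scheduler cannot wake the process in the middle of a quantum.
    actual_sleep = math.ceil(requested_sleep / quantum_ms) * quantum_ms
    return processing_ms + actual_sleep


print(actual_interval_ms(5, 30))   # 35: the 25 ms sleep becomes 30 ms
print(actual_interval_ms(10, 30))  # 30: a 20 ms sleep is already a whole number of quanta
```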
To overcome the issue, you can do two things:
• Reconfigure the operating system or install a kernel module or driver that will support a more precise timing.
• Or, at the very least, use an adaptive sending algorithm that tries to compensate for the incorrect sleep lengths (see section 6 of the OpenH323 tutorial for more about how to do this).
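One common way to compensate, sketched here under assumptions of my own and not taken from the OpenH323 tutorial, is to sleep toward absolute deadlines instead of sleeping for a fixed relative time. If one sleep runs long, the next one is shortened, so the error does not accumulate. The function and its parameters (send_packet, make_packet, clock, sleep) are hypothetical; clock and sleep default to Python's time.monotonic and time.sleep and are injectable only to make the function easy to test.

```python
import time


def paced_send(send_packet, make_packet, count, interval_s=0.030,
               clock=time.monotonic, sleep=time.sleep):
    """Send `count` packets, aiming for an average of one per `interval_s`."""
    next_deadline = clock()
    for i in range(count):
        send_packet(make_packet(i))
        # Advance the deadline by a fixed step along the schedule, not from
        # "now", so an oversleep in one round shortens the following sleep.
        next_deadline += interval_s
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)
```

With the 10 ms quantum from the example above, individual intervals would still alternate around the target (roughly 35 ms, then 25 ms), but the average rate stays at one packet per 30 ms instead of drifting to one per 35 ms. That residual variation is small enough for a modest jitter buffer at the receiver to absorb.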