These are partly notes I have jotted down from my reading, along with links to helpful VoIP sites. I claim no ownership, except where indicated.

Sunday, January 31, 2010

Jots for today

Understanding Codecs and DSP functionality



-a codec is a device or program capable of encoding or decoding a digital data stream or signal.

-codecs transform VoIP media streams from one format into another: analog to digital (A to D), digital to digital (D to D), or digital to analog (D to A).

-codec choice is especially important on low-speed serial links, where bandwidth is scarce.

Codecs supported by the Cisco IOS GWs:


-G.711

used for encoding telephone audio on a 64 kbps channel

it is a PCM scheme operating at an 8 kHz sample rate, with 8 bits per sample

its companding is widely used in the telecoms industry as it improves the signal-to-noise ratio without increasing the amount of data.

has two subsets:

1. mu-law

used in North American and Japanese phone networks.

2. a-law

used in Europe and elsewhere around the world.

both subsets carry compressed (companded) speech in 8-bit samples. At an 8 kHz sampling rate this gives a bit rate of 64 kbps.
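As a rough illustration of how companding squeezes speech into 8-bit samples, here is a minimal sketch of the continuous mu-law curve with mu = 255. This is an assumption-laden simplification: G.711 itself specifies a segmented (piecewise-linear) approximation of this curve, and the function names below are made up for the example.

```python
import math

MU = 255  # companding parameter for mu-law


def mu_law_compress(x: float) -> float:
    """Map a normalized sample in [-1.0, 1.0] onto the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress: recover the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def pcm_bit_rate(sample_rate_hz: int = 8000, bits_per_sample: int = 8) -> int:
    """Bit rate of a PCM stream: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Quiet samples are boosted before quantization, loud ones are compressed.
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law_compress(sample), 3))
    print(pcm_bit_rate(), "bps")
```

Small amplitudes get a larger share of the 8-bit code space, which is where the signal-to-noise gain for quiet speech comes from.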


-G.722

wideband speech codec.
provides 7 kHz of wideband audio at data rates from 48 kbps to 64 kbps.

technology is based on adaptive differential PCM (ADPCM).

G.722.1 - lower bit-rate compression

G.722.2 (Adaptive Multi-Rate Wideband) - offers even lower bit-rate compression


-G.726

is an ITU-T ADPCM codec which operates at data rates of 40, 32, 24, and 16 kbps.


-G.728

16 kbps low-delay CELP (LD-CELP)


-G.729

uses conjugate-structure algebraic CELP (CS-ACELP)


-G.723.1

describes a dual-rate coder for multimedia communications that compresses speech or audio signal components at a very low bit rate, as part of the H.324 family of standards.

two bit rates are associated with it:

r63: 6.3 kbps using 24-byte frames and Multipulse LPC with Maximum Likelihood Quantization (MPC-MLQ)

r53: 5.3 kbps using 20-byte frames and the ACELP algorithm.


-GSM Full Rate (GSM FR)

frame size of 20 ms and operates at a 13 kbps bit rate.

is a Regular Pulse Excitation-Long Term Prediction (RPE-LTP) coder.

network must support GSM FR codecs in order to write VoiceXML scripts.

-iLBC [Internet Low Bit Rate Codec]

designed for narrowband speech
results in a payload bit rate of 13.33 kbps for 30 ms frames and 15.20 kbps for 20 ms frames.

algorithm is based on block-independent linear predictive coding with the choice of data frame lengths of 20 ms and 30 ms.

When choosing codecs, voice quality has to be balanced against the cost of bandwidth. The higher the codec bandwidth, the higher the cost of each call across the network.


-voice sample size is a variable that can affect total BW used.

-voice sample is defined as the total output from a codec DSP that is encapsulated into a Protocol Data Unit (PDU)

Table of various codecs, their sample sizes, and the number of packets required for VoIP to transmit 1 second of audio:

The larger the sample size, the larger the packet and the fewer the encapsulated samples that have to be sent. This reduces bandwidth, because fewer packet headers are transmitted per second.



Bytes_per_Sample = (Sample_Size * Codec_Bandwidth) /8
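The formula works when Sample_Size is a duration in seconds and Codec_Bandwidth is in bits per second. A minimal sketch follows; the function names are made up for the example, and the sample size is passed in milliseconds purely to keep the arithmetic exact.

```python
def bytes_per_sample(sample_size_ms: int, codec_bandwidth_bps: int) -> float:
    """Bytes_per_Sample = (Sample_Size * Codec_Bandwidth) / 8, with Sample_Size given in ms."""
    return sample_size_ms * codec_bandwidth_bps / 1000 / 8


def packets_per_second(sample_size_ms: int) -> float:
    """Number of samples (one per packet) needed to carry 1 second of audio."""
    return 1000 / sample_size_ms


if __name__ == "__main__":
    # (codec name, codec bandwidth in bps, sample size in ms)
    for name, bps, ms in [("G.711", 64000, 20), ("G.729", 8000, 20), ("G.729", 8000, 30)]:
        print(name, ms, "ms:", bytes_per_sample(ms, bps), "bytes,",
              round(packets_per_second(ms), 2), "packets/s")
```

Going from a 20 ms to a 30 ms sample grows the G.729 payload from 20 to 30 bytes but cuts the packet rate from 50 to about 33 per second.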

Other factors to bear in mind when calculating the overhead of a VoIP call:

1. Layer 2 and security protocols add significantly to the packet size.

Layer 2 overhead for various protocols:

-Ethernet II protocol

carries 18 bytes of overhead
6 for source MAC
6 for dest MAC
2 for type
4 for CRC


-PPP

carries 6 bytes of overhead

1 flag byte to indicate beginning and end of a frame

1 address byte

1 control byte

1 protocol byte

2 bytes for CRC


-Frame Relay

carries 6 bytes of overhead

2 bytes for DLCI header

2 for FRF.12

2 for CRC

-Multilink PPP

carries 6 bytes of overhead

1 for flag

1 for address

2 for control or type

2 for CRC

2. The IP and transport layers also contribute to overhead

IP adds a 20 byte header

UDP adds 8 byte header

RTP adds a 12 byte header

3. Security overhead

IPsec adds 50 to 57 bytes of overhead when a VPN is in use.

L2TP or GRE adds 24 bytes of overhead.

if in use, MLP adds 6 bytes

MPLS adds a 4-byte label to every packet
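To see how these overheads stack up on a single voice packet, here is a minimal sketch that simply adds them together. The dictionary keys and the `packet_size_bytes` helper are made up for this example; the byte counts are the ones listed in these notes, and plain addition is a simplifying assumption (real encapsulations can differ with options and padding). IPsec is kept as a (min, max) pair because the notes quote a range.

```python
# Per-packet overhead in bytes, using the figures from these notes.
L2_OVERHEAD = {"ethernet_ii": 18, "multilink_ppp": 6, "frf12": 6}
IP_UDP_RTP = 20 + 8 + 12  # IP + UDP + RTP headers

# (min, max) bytes added by each optional encapsulation.
EXTRA_OVERHEAD = {"ipsec": (50, 57), "l2tp_or_gre": (24, 24), "mpls": (4, 4)}


def packet_size_bytes(payload: int, l2: str, extras: tuple = ()) -> tuple:
    """Return the (min, max) packet size in bytes for one voice packet."""
    low = high = L2_OVERHEAD[l2] + IP_UDP_RTP + payload
    for name in extras:
        extra_low, extra_high = EXTRA_OVERHEAD[name]
        low += extra_low
        high += extra_high
    return low, high


if __name__ == "__main__":
    # A 20-byte voice payload over Ethernet, plain and then inside an IPsec VPN.
    print(packet_size_bytes(20, "ethernet_ii"))
    print(packet_size_bytes(20, "ethernet_ii", ("ipsec",)))
```

With a 20-byte payload the headers already outweigh the voice data several times over, which is why the overhead matters so much for small-payload codecs.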


Points to consider before calculating:

-the more bandwidth the codec requires, the more total bandwidth is required.

-the more overhead is associated with the data link, the more total bandwidth is required.

-the larger the sample size, the less total bandwidth is required.

-if cRTP is being used, the total bandwidth required is reduced significantly.

As a network engineer:

-you need to calculate the total bandwidth for each VoIP call

-this information can then be used to calculate the total bandwidth for the company's WAN links


Total bandwidth (TBW) = total packet size (bits) * packets per second (pps)

-total packet size in bytes = (Layer 2 header: MPPP, FRF.12, or Ethernet) + (IP/UDP/RTP header) + (voice payload size)

-pps = codec bit rate (bps) / voice payload size (bits)

Protocol header assumptions used for the calculations:

-40 bytes for IP(20)/UDP(8)/RTP(12) headers

-cRTP reduces IP/UDP/RTP to 2 or 4 bytes (cRTP not available over Ethernet)

-6 bytes for MPPP, or FRF.12 L2 header

-1 byte for the end-of-frame flag on MP or Frame Relay frames

-18 bytes for Ethernet L2 headers (including 4 bytes for FCS or CRC)

Example calc:

G.729 codec (8 kbps) with a 20 byte sample size and using FRF.12 without cRTP

total packet size = 6 bytes (FRF.12) + 40 bytes (IP/UDP/RTP) + 20 bytes (voice payload size) = 66 bytes

total packet size (bits) = 66 bytes * 8 bits per byte = 528 bits

pps = 8 kbps / 160 bits (20 bytes * 8 bits) = 8000/160 = 50 pps

BW per call = 528 bits * 50 pps = 26,400 bps = 26.4 kbps.

Example calc:

G.729 codec (8 kbps) with a 20 byte sample size and using FRF.12 with cRTP

total packet size = 6 bytes (FRF.12) + 2 bytes (cRTP) + 20 bytes (voice payload size) = 28 bytes

total packet size (bits) = 28 bytes * 8 bits per byte = 224 bits

pps = 8 kbps / 160 bits (20 bytes * 8 bits) = 50 pps

total BW per call = 224 bits * 50 pps = 11,200 bps = 11.2 kbps
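The two example calculations can be reproduced with a few lines of Python. This is only a sketch under the header assumptions listed earlier (40 bytes for IP/UDP/RTP, 6 bytes for FRF.12, 18 bytes for Ethernet); the function name and parameters are made up for the example.

```python
IP_UDP_RTP_BYTES = 40  # IP (20) + UDP (8) + RTP (12)


def bandwidth_per_call_bps(codec_bps, payload_bytes, l2_bytes, crtp_bytes=None):
    """Total bandwidth = total packet size (bits) * packets per second.

    Pass crtp_bytes (2 or 4) to replace the 40-byte IP/UDP/RTP header with cRTP.
    """
    l3_bytes = IP_UDP_RTP_BYTES if crtp_bytes is None else crtp_bytes
    total_packet_bits = (l2_bytes + l3_bytes + payload_bytes) * 8
    pps = codec_bps / (payload_bytes * 8)
    return total_packet_bits * pps


if __name__ == "__main__":
    # G.729 (8 kbps), 20-byte payload, FRF.12, without and with cRTP.
    print(bandwidth_per_call_bps(8000, 20, l2_bytes=6))
    print(bandwidth_per_call_bps(8000, 20, l2_bytes=6, crtp_bytes=2))
    # G.711 (64 kbps), 160-byte payload, Ethernet.
    print(bandwidth_per_call_bps(64000, 160, l2_bytes=18))
```

Multiplying the per-call figure by the number of simultaneous calls gives a first estimate of the voice bandwidth needed on a WAN link.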

Voice activity detection (VAD) provides a maximum bandwidth saving of 35%. However, VAD should not be taken into account for network design and bandwidth engineering, because features such as music on hold (MOH) and fax render it ineffective. VAD suppresses silence in VoIP conversations and also provides comfort noise generation (CNG).


media resource - a software-based or hardware-based entity that performs media processing functions on the data streams to which it is connected.

-transcoding: conversion from one codec to another.

processed by DSPs on a DSP farm. Sessions are initiated and managed by CUCM, which refers to transcoders as hardware MTPs.

-voice termination: the digitization and packetization of an analog signal on a TDM interface.

-conference bridge: a resource that joins multiple parties into a single call.

hardware conference bridges are used in two environments:

central site

remote site

-MTP (media termination point): an entity that accepts two full-duplex voice streams using the same codec.

can be used for:

repacketization, or transcoding a-law to mu-law and vice versa

H.323 supplementary services

two types of MTPs:

1. software MTP

2. hardware MTP


-Codec complexity

refers to the amount of processing required to perform voice compression.

Two categories:

medium complexity

high complexity
