What is H.264?
H.264 is an industry standard for video compression, the process of converting digital video into a format that takes
up less capacity when it is stored or transmitted. Video compression (or video coding) is an essential technology
for applications such as digital television, DVD-Video, mobile TV, videoconferencing and internet video streaming. Standardizing video compression makes it possible for products from different manufacturers
(e.g. encoders, decoders and storage media) to inter-operate. An encoder converts video into a compressed
format and a decoder converts compressed video back into an uncompressed format. Recommendation
H.264: Advanced Video Coding is a document published by the international standards bodies ITU-T
(International Telecommunication Union) and ISO/IEC (International Organization for Standardization / International Electrotechnical Commission). It defines a format (syntax) for compressed video and a method for decoding this syntax to produce a displayable video sequence. The standard document does not actually
specify how to encode (compress) digital video; this is left to the manufacturer of a video encoder,
but in practice the encoder is likely to mirror the steps of the decoding process. Figure 1 shows
the encoding and decoding processes and highlights the parts that are covered by the H.264 standard.
The H.264/AVC standard was first published in 2003. It builds on the concepts of earlier standards
such as MPEG-2 and MPEG-4 Visual and offers the potential for better compression efficiency
(i.e. better-quality compressed video) and greater flexibility in compressing, transmitting and storing video.
How does an H.264 codec work?
An H.264 video encoder carries out prediction, transform and encoding processes (see
Figure 1) to produce a compressed H.264 bitstream. An H.264 video decoder carries
out the complementary processes of decoding, inverse transform and reconstruction to
produce a decoded video sequence.
The encoder processes a frame of video in units of a macroblock (16×16 displayed pixels). It forms a prediction
of the macroblock based on previously-coded data, either from the current frame (intra prediction) or from
other frames that have already been coded and transmitted (inter prediction). The encoder subtracts the
prediction from the current macroblock to form a residual. The prediction methods supported by H.264
are more flexible than those in previous standards, enabling accurate predictions and hence efficient
video compression. Intra prediction uses 16×16 and 4×4 block sizes to predict the macroblock from surrounding, previously-coded pixels within the same frame (Figure 2). Inter prediction uses a range of block sizes (from 16×16 down to 4×4) to predict pixels in the current frame from similar regions in previously-coded frames (Figure 3).
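The prediction and subtraction steps can be illustrated with a minimal sketch. It assumes one simple intra mode, DC prediction of a 4×4 block, in which every sample is predicted as the rounded mean of the previously-coded neighbours above and to the left. The function names and sample values are hypothetical and chosen only for illustration; a real encoder chooses among many prediction modes and block sizes.

```python
def dc_predict(above, left):
    """Predict a 4x4 block as the rounded mean of its neighbouring samples."""
    neighbours = list(above) + list(left)
    # Integer mean with rounding: (sum + n/2) // n
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * 4 for _ in range(4)]


def subtract(current, prediction):
    """Residual = current block minus prediction, sample by sample."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


# Hypothetical previously-coded neighbours and current block samples.
above = [100, 102, 104, 106]
left = [98, 100, 102, 104]
current = [
    [101, 103, 104, 106],
    [100, 102, 104, 105],
    [101, 102, 103, 105],
    [102, 103, 104, 106],
]

prediction = dc_predict(above, left)
res = subtract(current, prediction)
for row in res:
    print(row)
```

Because the prediction is close to the current block, the residual samples are small numbers, which is what makes the later transform and quantization stages effective.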
Transform and quantization
A block of residual samples is transformed using a 4×4 or 8×8 integer transform, an approximate form of the Discrete Cosine Transform (DCT). The transform outputs a set of coefficients, each of which is a weighting value for a standard basis pattern. When combined, the weighted basis patterns re-create the block of residual samples. Figure 4 shows how the inverse DCT creates an image block by weighting each basis pattern according to a coefficient value and combining the weighted basis patterns.
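As a rough illustration of the 4×4 case, the sketch below applies the integer core transform matrix commonly given for H.264 to a block of residual samples, computing Y = C·X·Cᵀ. It is a sketch only: the scaling that turns this into a true DCT approximation is combined with quantization in a real codec and is omitted here, and the helper names and sample values are hypothetical.

```python
# 4x4 integer core transform matrix (an integer approximation of the DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Unscaled forward transform: Y = CF * X * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


# Hypothetical 4x4 residual block.
residual = [
    [-1, 1, 2, 4],
    [-2, 0, 2, 3],
    [-1, 0, 1, 3],
    [0, 1, 2, 4],
]
coeffs = forward_transform(residual)
for row in coeffs:
    print(row)
```

The top-left coefficient is the sum of all samples in the block (the "DC" term), and a completely flat block produces a single non-zero coefficient, which shows how the transform concentrates the block's energy into a few values.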
The output of the transform, a block of transform coefficients, is quantized, i.e. each coefficient is divided by an integer value. Quantization reduces the precision of the transform coefficients according to a quantization parameter (QP). Typically, the result is a block in which most or all of the coefficients are zero, with a few non-zero coefficients. Setting QP to a high value means that more coefficients are set to zero,
resulting in high compression at the expense of poor decoded image quality. Setting QP to a low value means
that more non-zero coefficients remain after quantization, resulting in better decoded image quality but lower compression.
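The effect of the quantizer setting can be seen in a small sketch. It assumes a plain integer step size standing in for the value that QP selects (the actual mapping from QP to step size is not shown), and it truncates toward zero. The coefficient values and function names are hypothetical.

```python
def quantize(coeffs, step):
    """Divide each coefficient by an integer step, truncating toward zero."""
    def q(c):
        magnitude = abs(c) // step
        return magnitude if c >= 0 else -magnitude
    return [[q(c) for c in row] for row in coeffs]


def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)


# Hypothetical block of transform coefficients.
coeffs = [
    [48, 10, -6, 2],
    [12, -5, 3, -1],
    [-4, 2, 1, 0],
    [3, -1, 0, 0],
]

for step in (4, 16):
    levels = quantize(coeffs, step)
    print(f"step={step}: {count_nonzero(levels)} non-zero of 16 -> {levels}")
```

With the smaller step, six coefficients survive; with the larger step, only one does. Fewer non-zero values means fewer bits to encode, but more of the original detail is discarded.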
The video coding process produces a number of values that must be encoded to form the compressed bitstream.
These values include:
• quantized transform coefficients
• information to enable the decoder to re-create the prediction
• information about the structure of the compressed data and the compression tools used during encoding
• information about the complete video sequence
These values and parameters (syntax elements) are converted into binary codes using variable length coding and/or arithmetic coding. Each of these encoding methods produces an efficient, compact binary representation of the information. The encoded bitstream can then be stored and/or transmitted.
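As one concrete example of variable length coding, the sketch below builds Exp-Golomb codewords, a family of variable-length codes used in H.264 for many syntax elements. Small values get short codewords and larger values get longer ones. The function names are hypothetical, and the sketch covers only this one code; it does not show the context-adaptive variable length or arithmetic coders.

```python
def exp_golomb_unsigned(value):
    """Unsigned Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    info = bin(value + 1)[2:]          # binary representation of value + 1
    return "0" * (len(info) - 1) + info  # prefix of zeros, then the info bits


def exp_golomb_signed(value):
    """Signed Exp-Golomb: map k > 0 to 2k - 1 and k <= 0 to -2k, then encode."""
    mapped = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_unsigned(mapped)


for v in range(5):
    print(v, exp_golomb_unsigned(v))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

The number of leading zeros tells a decoder how many bits follow, so codewords can be concatenated into a bitstream without separators.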
A video decoder receives the compressed H.264 bitstream, decodes each of the syntax elements and extracts the information described above (quantized transform coefficients, prediction information, etc.). This information is
then used to reverse the coding process and recreate a sequence of video images.
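To give a feel for how syntax elements are recovered from a bitstream, the sketch below parses a string of bits, assuming for illustration that every element was written as an unsigned Exp-Golomb codeword (a variable-length code used for many H.264 syntax elements). The function names and the example bit string are hypothetical; a real decoder reads many element types, in an order defined by the syntax.

```python
def read_unsigned_exp_golomb(bits, pos=0):
    """Read one unsigned Exp-Golomb codeword starting at pos.

    Returns (decoded value, position of the next unread bit).
    """
    zeros = 0
    while bits[pos + zeros] == "0":   # count the leading zeros
        zeros += 1
    end = pos + 2 * zeros + 1         # codeword length is 2 * zeros + 1 bits
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end


def decode_all(bits):
    """Decode consecutive codewords until the bit string is exhausted."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = read_unsigned_exp_golomb(bits, pos)
        values.append(value)
    return values


# Codewords for 0, 1, 2 and 3 concatenated: "1" "010" "011" "00100"
print(decode_all("101001100100"))  # [0, 1, 2, 3]
```

No separators are needed between codewords because the run of leading zeros in each one states its own length.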
Rescaling and inverse transform
The quantized transform coefficients are re-scaled. Each coefficient is multiplied by an integer value to
restore its original scale. An inverse transform combines the standard basis patterns, weighted by
the re-scaled coefficients, to re-create each block of residual data. These blocks are combined together
to form a residual macroblock.
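The two steps can be sketched as follows. The example assumes a plain integer step size for rescaling and uses an exact mathematical inverse of the 4×4 integer core transform, computed with fractions. This is not the bit-exact integer procedure that the standard defines; it is only meant to show that rescaling followed by an inverse transform yields residual samples. All function names and values are hypothetical.

```python
from fractions import Fraction

# 4x4 integer core transform matrix; its rows are mutually orthogonal.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
ROW_NORM_SQ = [sum(v * v for v in row) for row in CF]  # [4, 10, 4, 10]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Unscaled forward transform, Y = CF * X * CF^T (used here for the demo)."""
    return matmul(matmul(CF, block), transpose(CF))


def rescale(levels, step):
    """Multiply each quantized level by the step size."""
    return [[level * step for level in row] for row in levels]


def inverse_transform(coeffs):
    """Exact inverse: normalise by the row norms, then X = CF^T * Z * CF."""
    z = [[Fraction(coeffs[i][j], ROW_NORM_SQ[i] * ROW_NORM_SQ[j])
          for j in range(4)] for i in range(4)]
    x = matmul(matmul(transpose(CF), z), CF)
    return [[round(v) for v in row] for row in x]


residual = [
    [-1, 1, 2, 4],
    [-2, 0, 2, 3],
    [-1, 0, 1, 3],
    [0, 1, 2, 4],
]
step = 4
coeffs = forward_transform(residual)
levels = [[int(c / step) for c in row] for row in coeffs]  # encoder-side quantization
decoded = inverse_transform(rescale(levels, step))
for row in decoded:
    print(row)
```

Because quantization discarded precision, the decoded residual is generally close to, but not identical to, the encoder's original residual. With a step size of 1 the round trip is exact.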
For each macroblock, the decoder forms an identical prediction to the one created by the encoder.
The decoder adds the prediction to the decoded residual to reconstruct a decoded macroblock which
can then be displayed as part of a video frame.
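Reconstruction itself is a sample-by-sample addition, sketched below under the assumption of 8-bit video, so results are clipped to the range 0 to 255. The function name and sample values are hypothetical.

```python
def reconstruct(prediction, residual, max_value=255):
    """Add the decoded residual to the prediction and clip to the sample range."""
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


# Hypothetical 4x4 prediction and decoded residual.
prediction = [[102] * 4 for _ in range(4)]
residual = [
    [-1, 1, 2, 4],
    [-2, 0, 2, 3],
    [-1, 0, 1, 3],
    [0, 1, 2, 4],
]

decoded_block = reconstruct(prediction, residual)
for row in decoded_block:
    print(row)
```

Since the decoder's prediction is identical to the encoder's, any difference between the decoded block and the original comes only from the precision lost in quantizing the residual.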
For More Details