The latest IP cameras use the H.264 video compression standard. We have had many questions about this compression method, so here is an article that provides the information you need to better understand the technology.
H.264 is a newer member of the MPEG-4 family of standards, and it provides about twice as much compression as the older MPEG-4 Part 2. Apple has been using this standard for a number of years, and it is now available in the latest IP cameras. A number of manufacturers have begun to introduce this technology. Axis is in the lead at the moment, but other companies such as Sony and IQinvision are slowly introducing their new models as well.
This latest video compression standard, H.264 (also known as MPEG-4 Part 10/AVC for Advanced Video Coding), is becoming the video standard of choice.
Compression Concept
The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (e.g. half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems.
H.264 is an open, licensed standard that supports the most efficient video compression techniques available today. Without compromising image quality, an H.264 encoder can reduce the size of a digital video file by more than 80% compared with the Motion JPEG format and as much as 50% more than with the MPEG-4 Part 2 standard. This means that much less network bandwidth and storage space are required for a video file. Or seen another way, much higher video quality can be achieved for a given bit rate.
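To make the bandwidth and storage claim concrete, the following is a minimal sketch of the arithmetic. The 5000 kbit/s Motion JPEG bit rate and the 24-hour recording period are hypothetical values chosen for illustration; only the 80% reduction figure comes from the text above.

```python
# Hypothetical example values; they do not come from a real camera.
MJPEG_KBPS = 5000        # assumed Motion JPEG bit rate for one camera (kbit/s)
H264_REDUCTION = 0.80    # "more than 80%" smaller than Motion JPEG (from the text)
HOURS = 24               # assumed recording period


def reduced_bitrate(reference_kbps, reduction):
    """Bit rate left after a fractional reduction (0.80 means 80% smaller)."""
    return reference_kbps * (1.0 - reduction)


def storage_gb(kbps, hours):
    """Storage in gigabytes (10^9 bytes) for a constant bit rate stream."""
    bits = kbps * 1000 * hours * 3600
    return bits / 8 / 1e9


h264_kbps = reduced_bitrate(MJPEG_KBPS, H264_REDUCTION)
print(f"Motion JPEG: {storage_gb(MJPEG_KBPS, HOURS):.1f} GB per day")
print(f"H.264:       {storage_gb(h264_kbps, HOURS):.1f} GB per day")
```

Under these assumptions, one camera needs about 54 GB per day with Motion JPEG and about 10.8 GB per day with H.264. Real streams vary with scene content, so treat the result as a rough planning figure only.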
Jointly defined by standardization organizations in the telecommunications and IT industries, H.264 is expected to be more widely adopted than previous standards.
Video compression is about reducing and removing redundant video data so that a digital video file can be effectively sent and stored. The process involves applying an algorithm to the source video to create a compressed file that is ready for transmission or storage. To play the compressed file, an inverse algorithm is applied to produce a video that shows virtually the same content as the original source video. The time it takes to compress, send, decompress and display a file is called latency. The more advanced the compression algorithm, the higher the latency, given the same processing power.
A pair of algorithms that work together is called a video codec (encoder/decoder). Video codecs that implement different standards are normally not compatible with each other; that is, video content compressed using one standard cannot be decompressed with a different standard. For instance, an MPEG-4 Part 2 decoder will not work with an H.264 encoder, simply because one algorithm cannot correctly decode the output of another. It is possible, however, to implement many different algorithms in the same software or hardware, which enables multiple formats to be compressed. Different video compression standards use different methods of reducing data, and hence their results differ in bit rate, quality and latency.
The graph below provides a bit rate comparison, given the same level of image quality, among the following video standards: Motion JPEG, MPEG-4 Part 2 (no motion compensation), MPEG-4 Part 2 (with motion compensation) and H.264 (baseline profile).
Figure 1. An H.264 encoder generated up to 50% fewer bits per second for a sample video sequence than an MPEG-4 encoder with motion compensation. The H.264 encoder was at least three times more efficient than an MPEG-4 encoder with no motion compensation and at least six times more efficient than Motion JPEG.
Frames
Depending on the H.264 profile, different types of frames such as I-frames, P-frames and B-frames, may be used by an encoder.
An I-frame, or intra frame, is a self-contained frame that can be independently decoded without any reference to other images. The first image in a video sequence is always an I-frame. I-frames are needed as starting points for new viewers or as resynchronization points if the transmitted bit stream is damaged. I-frames can be used to implement fast-forward, rewind and other random access functions. An encoder will automatically insert I-frames at regular intervals or on demand if new clients are expected to join in viewing a stream. The drawback of I-frames is that they consume many more bits, but on the other hand, they do not generate many artifacts.
A P-frame, which stands for predictive inter frame, makes references to parts of earlier I and/or P frame(s) to code the frame. P-frames usually require fewer bits than I-frames, but a drawback is that they are very sensitive to transmission errors because of the complex dependency on earlier P and I reference frames.
A B-frame, or bi-predictive inter frame, is a frame that makes references to both an earlier reference frame and a future frame.
Figure 2.
When a video decoder restores a video by decoding the bit stream frame by frame, decoding must always start with an I-frame. P-frames and B-frames, if used, must be decoded together with the reference frame(s).
In the H.264 baseline profile, only I- and P-frames are used. This profile is ideal for network cameras and video encoders because leaving out B-frames keeps latency low.
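The dependency rules for I- and P-frames can be sketched in a few lines. This is a simplified model, not decoder code: it assumes a baseline-style stream with only I- and P-frames, and it assumes each P-frame references the frame immediately before it, although H.264 also allows other reference choices. The function names are illustrative only.

```python
def first_decodable_index(frame_types, join_index):
    """Index of the first frame that a viewer joining at join_index can decode.

    Decoding has to start at an I-frame, so earlier P-frames are skipped.
    Returns None if no I-frame follows the join point.
    """
    for i in range(join_index, len(frame_types)):
        if frame_types[i] == "I":
            return i
    return None


def frames_affected_by_loss(frame_types, lost_index):
    """Frames that cannot be decoded correctly when frame lost_index is lost.

    Assumption: every P-frame depends on the frame right before it, so the
    damage propagates until the next I-frame resynchronizes the stream.
    """
    affected = [lost_index]
    for i in range(lost_index + 1, len(frame_types)):
        if frame_types[i] == "I":
            break
        affected.append(i)
    return affected


stream = list("IPPPIPPP")                  # an I-frame every four frames
print(first_decodable_index(stream, 2))    # viewer joins at a P-frame
print(frames_affected_by_loss(stream, 1))  # a P-frame is lost in transit
```

With this stream, a viewer who joins at frame 2 has to wait for the I-frame at index 4, and losing frame 1 also damages frames 2 and 3. This is why shorter I-frame intervals improve robustness at the cost of a higher bit rate.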
Basic Concepts of Reducing the Data
A variety of methods can be used to reduce video data, both within an image frame and between a series of frames.
Within an image frame, data can be reduced simply by removing unnecessary information, which will have an impact on the image resolution. Motion JPEG uses this approach.
In a series of frames, video data can be reduced by such methods as difference coding, which is used by MPEG-4 and H.264. In difference coding, a frame is compared with a reference frame (i.e. an earlier I- or P-frame) and only pixels that have changed with respect to the reference frame are coded. In this way, the number of pixel values that are coded and sent is reduced.
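As a minimal sketch of difference coding at the pixel level, the example below stores only the pixels that changed relative to a reference frame and then rebuilds the frame from them. Frames are modeled as plain lists of grayscale values, and the function names are hypothetical. Real codecs work on blocks and transformed data rather than on individual pixel lists.

```python
def encode_difference(reference, current):
    """Return (row, col, value) entries for pixels that differ from the reference."""
    changes = []
    for r, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for c, (ref_px, cur_px) in enumerate(zip(ref_row, cur_row)):
            if ref_px != cur_px:
                changes.append((r, c, cur_px))
    return changes


def decode_difference(reference, changes):
    """Rebuild the current frame from the reference frame plus the changes."""
    frame = [row[:] for row in reference]
    for r, c, value in changes:
        frame[r][c] = value
    return frame


reference = [[0, 0, 0, 0],
             [0, 9, 0, 0],
             [0, 0, 0, 0]]
current = [[0, 0, 0, 0],
           [0, 0, 9, 0],   # the bright pixel moved one step to the right
           [0, 0, 0, 0]]

changes = encode_difference(reference, current)
print(changes)
print(decode_difference(reference, changes) == current)
```

Only two of the twelve pixels need to be coded here, because the rest of the frame is identical to the reference.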
Figure 3. With Motion JPEG format, the three images in the above sequence are coded and sent as separate unique images (I-frames) with no dependencies on each other.
Figure 4. With difference coding (used in most video compression standards including H.264), only the first image (I-frame) is coded in its entirety. In the two following images (P-frames), references are made to the first picture for the static elements, i.e. the house, and only the moving parts, i.e. the running man, are coded using motion vectors, thus reducing the amount of information that is sent and stored.
The amount of encoding can be further reduced if detection and encoding of differences is based on blocks of pixels (macroblocks) rather than individual pixels; therefore, bigger areas are compared and only blocks that are significantly different are coded. The overhead associated with indicating the location of areas to be changed is also reduced.
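The block-based variant can be sketched as follows. The 4x4 block size, the threshold value, and the use of the sum of absolute differences as the measure of "significantly different" are assumptions made for this illustration; the text above does not prescribe them.

```python
def changed_blocks(reference, current, block=4, threshold=10):
    """Return the (top, left) corners of blocks that differ significantly.

    A block counts as changed when the sum of absolute pixel differences
    against the reference frame exceeds the threshold.
    """
    height, width = len(reference), len(reference[0])
    result = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            sad = sum(
                abs(current[r][c] - reference[r][c])
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            )
            if sad > threshold:
                result.append((top, left))
    return result


reference = [[0] * 8 for _ in range(8)]
current = [row[:] for row in reference]
current[5][6] = 50   # a clear change in the bottom-right block
current[0][0] = 3    # small noise in the top-left block, below the threshold

print(changed_blocks(reference, current))
```

Only the bottom-right block is reported, so the encoder would signal one block position instead of a list of individual pixels, and the small noise in the top-left block is ignored.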
Difference coding, however, would not significantly reduce data if there was a lot of motion in a video. Here, techniques such as block-based motion compensation can be used. Block-based motion compensation takes into account that much of what makes up a new frame in a video sequence can be found in an earlier frame, but perhaps in a different location. This technique divides a frame into a series of macroblocks. Block by block, a new frame—for instance, a P-frame—can be composed or ‘predicted’ by looking for a matching block in a reference frame. If a match is found, the encoder simply codes the position where the matching block is to be found in the reference frame. Coding the motion vector, as it is called, takes up fewer bits than if the actual content of a block were to be coded.
Figure 5. Illustration of block-based motion compensation
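The search for a matching block can be sketched as an exhaustive search over a small window, using the sum of absolute differences (SAD) as the matching cost. This is an illustrative simplification: the block size, the search range, and the function names are assumptions, and practical encoders use faster search strategies than trying every offset.

```python
def block_sad(reference, current, ref_top, ref_left, cur_top, cur_left, block):
    """Sum of absolute differences between a reference block and a current block."""
    return sum(
        abs(current[cur_top + r][cur_left + c] - reference[ref_top + r][ref_left + c])
        for r in range(block)
        for c in range(block)
    )


def find_motion_vector(reference, current, top, left, block=4, search=2):
    """Search the reference frame around (top, left) for the best matching block.

    Returns (dy, dx, cost), where (dy, dx) is the offset from the block's
    position in the current frame to the best match in the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_top, ref_left = top + dy, left + dx
            if (ref_top < 0 or ref_left < 0
                    or ref_top + block > height or ref_left + block > width):
                continue  # candidate block would fall outside the frame
            cost = block_sad(reference, current, ref_top, ref_left, top, left, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]


def frame_with_object(top, left, size=8, block=4):
    """Build a black frame containing one 4x4 object with distinct pixel values."""
    frame = [[0] * size for _ in range(size)]
    for r in range(block):
        for c in range(block):
            frame[top + r][left + c] = 10 + r * block + c
    return frame


reference = frame_with_object(1, 1)   # object position in the earlier frame
current = frame_with_object(2, 3)     # object moved down by 1 and right by 2

print(find_motion_vector(reference, current, top=2, left=3))
```

The search finds a perfect match (cost 0) one row up and two columns to the left, so the encoder can code the pair (-1, -2) instead of the 16 pixel values of the block.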
Improving Compression Even More with H.264
H.264 takes video compression technology to a new level. With H.264, a new and advanced intra prediction scheme is introduced for encoding I-frames. This scheme can greatly reduce the bit size of an I-frame and maintain high quality by enabling the successive prediction of smaller blocks of pixels within each macroblock in a frame. This is done by trying to find matching pixels among the earlier-encoded pixels that border a new 4x4 pixel block to be intra-coded. By reusing pixel values that have already been encoded, the bit size can be drastically reduced. The new intra prediction is a key part of the H.264 technology that has proven to be very efficient. For comparison, if only I-frames were used in an H.264 stream, it would have a much smaller file size than a Motion JPEG stream, which uses only I-frames.
In this mode, four bottom pixels from the block above are copied vertically into part of an intra-coded macroblock.
In this mode, four right-most pixels from the block to the left are copied horizontally into part of an intra-coded macroblock.
In this mode, eight bottom pixels from the blocks above are copied diagonally into part of an intra-coded macroblock.
Figure 6. Illustrations of some of the modes that intra prediction can take in coding 4x4 pixels within one of the 16 blocks that make up a macroblock. Each of the 16 blocks in a macroblock may be coded using different modes.
[Figure 7 panels: original source image, intra predicted image, residual image, output image]
Figure 7. The above images illustrate the efficiency of H.264’s intra prediction scheme, whereby the intra predicted image is sent for “free”. Only the residual content and the intra prediction modes need to be coded to produce the output image.
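The relationship between prediction, residual and output can be sketched for a single 4x4 block. The example implements the vertical and horizontal modes described above plus a DC (average) mode, which is also one of the 4x4 intra modes in the standard; the diagonal modes are left out. The mode selection by smallest absolute residual and the function names are assumptions for illustration, and a real encoder would additionally transform and quantize the residual.

```python
MODES = ("vertical", "horizontal", "dc")


def predict(mode, above, left):
    """Build a 4x4 prediction from the four pixels above and the four to the left."""
    if mode == "vertical":
        return [list(above) for _ in range(4)]       # copy the row above downward
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]     # copy the left column rightward
    if mode == "dc":
        mean = (sum(above) + sum(left) + 4) // 8     # rounded average of neighbours
        return [[mean] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def encode_block(block, above, left):
    """Pick the mode with the smallest residual; return (mode, residual)."""
    best = None
    for mode in MODES:
        pred = predict(mode, above, left)
        residual = [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
        cost = sum(abs(v) for row in residual for v in row)
        if best is None or cost < best[0]:
            best = (cost, mode, residual)
    return best[1], best[2]


def decode_block(mode, residual, above, left):
    """Rebuild the block: prediction from the neighbours plus the coded residual."""
    pred = predict(mode, above, left)
    return [[pred[r][c] + residual[r][c] for c in range(4)] for r in range(4)]


above = [10, 20, 30, 40]     # already-decoded pixels bordering the block on top
left = [12, 12, 12, 12]      # already-decoded pixels bordering the block on the left
block = [[10, 20, 30, 40],
         [10, 20, 30, 40],
         [10, 20, 30, 40],
         [10, 20, 30, 41]]

mode, residual = encode_block(block, above, left)
print(mode, residual)
print(decode_block(mode, residual, above, left) == block)
```

Here the vertical mode predicts the block almost perfectly, so the residual contains a single non-zero value. Only the mode and that small residual need to be coded, because the decoder can rebuild the prediction from pixels it already has.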
Block-based motion compensation—used in encoding P- and B-frames—has also been improved in H.264. An H.264 encoder can choose to search for matching blocks—down to sub-pixel accuracy—in a few or many areas of one or several reference frames. The block size and shape can also be adjusted to improve a match. In areas where no matching blocks can be found in a reference frame, intra-coded macroblocks are used. The high degree of flexibility in H.264’s block-based motion compensation pays off in crowded surveillance scenes where the quality can be maintained for demanding applications. Motion compensation is the most demanding aspect of a video encoder and the different ways and degrees with which it can be implemented by an H.264 encoder can have an impact on how efficiently video is compressed.
With H.264, the typical blocky artifacts seen in highly compressed video using Motion JPEG and MPEG standards other than H.264 can be reduced using an in-loop deblocking filter. This filter smooths block edges using an adaptive strength to deliver an almost perfect decompressed video.
Figure 8. Blocky artifacts in the highly compressed image at left are reduced when a deblocking filter is applied, as seen in the image at right.
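The idea behind an adaptive deblocking filter can be sketched on a single row of pixels. This is a deliberately simplified illustration and not the filter defined in the H.264 standard: the 4-pixel block width, the threshold, and the quarter-step smoothing are all assumptions. The point it shows is the adaptivity: small steps at a block boundary are treated as compression artifacts and softened, while large steps are treated as real image edges and left alone.

```python
def deblock_row(row, block=4, threshold=8):
    """Soften small steps at block boundaries in one row of pixel values."""
    out = list(row)
    for edge in range(block, len(row), block):
        p, q = row[edge - 1], row[edge]   # pixels on either side of the boundary
        step = q - p
        if 0 < abs(step) <= threshold:
            delta = int(step / 4)         # move each side a quarter of the step
            out[edge - 1] = p + delta
            out[edge] = q - delta
        # Larger steps are assumed to be genuine edges and are not filtered.
    return out


blocky = [10, 10, 10, 10, 18, 18, 18, 18]      # small step between two blocks
real_edge = [10, 10, 10, 10, 100, 100, 100, 100]

print(deblock_row(blocky))
print(deblock_row(real_edge))
```

In the first row the boundary pixels become 12 and 16, which halves the visible step, while the second row is returned unchanged because its step is far above the threshold.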
Conclusion
H.264 compression provides a significant improvement in video compression technology. It is supported by many different standards groups, making it one of the most widely accepted standards. Because it provides a dramatic improvement in compression, it reduces the bandwidth and storage required. It can reduce file size by more than 80% compared with Motion JPEG and by as much as 50% compared with MPEG-4 Part 2. It is now available in the latest cameras from Axis and other manufacturers.
If you need more information about this compression or the cameras that use it, contact us at 914-944-3425 or through our contact form.