Digital Imaging & Data Compression

 

Padding Techniques for MPEG-4

While MPEG-4 specifies fixed boundary-pixel padding for encoding arbitrarily shaped objects, our research investigates a number of alternatives to see whether any improvement can be achieved. Our work can be classified into:

(i) Linear extrapolation padding;

(ii) Extrapolated average padding;

(iii) Hybrid padding schemes

1. Linear Extrapolation Padding

In this scheme, we use a single iteration of a row-based Linear Extrapolation Padding [LEP] step to predict those exterior pixels adjacent to the boundary pixels of the reference VO. The algorithm is described in mathematical detail as follows:

Given any row of pixels in a boundary block, we define all those exterior pixels that are immediately next to the boundary pixels in that row as projected pixels [see Figure 1]. Only these pixels are padded by the LEP technique.

[Figure 1: examples of projected pixels A, B, C and D in a boundary block]

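Assume that, in a given row, there exist $N$ ($N \geq 2$) consecutive pixels inside the VO, which are bounded by a projected pixel either on the left or on the right. Let these $N$ pixel values be represented by $P_n$, $n = 1, \dots, N$. The first step of our algorithm is to construct a linear equation of the form $P = AX + B$ to fit all these pixel values and determine the two coefficients, $A$ and $B$. Here $X_n$ represents the column number of $P_n$ with respect to the projected pixel, which indicates the location of $P_n$ inside the row being considered. The coefficients $A$ and $B$ are found in the least-squares sense as follows:

Firstly, the prediction error for a single pixel is given by

$$e_n = P_n - (A X_n + B) \qquad (1)$$

Secondly, to minimise the total squared error $E = \sum_{n=1}^{N} e_n^2$, we require $\partial E / \partial A = 0$ and $\partial E / \partial B = 0$, i.e.

$$\sum_{n=1}^{N} X_n \, (P_n - A X_n - B) = 0 \qquad (2)$$

and

$$\sum_{n=1}^{N} (P_n - A X_n - B) = 0 \qquad (3)$$

By simplifying equations 2 and 3 above, we arrive at the following matrix equation:

$$\begin{pmatrix} \sum_{n} X_n^2 & \sum_{n} X_n \\ \sum_{n} X_n & N \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} \sum_{n} X_n P_n \\ \sum_{n} P_n \end{pmatrix} \qquad (4)$$

After the constants $A$ and $B$ are found from equation 4, the equation $P = AX + B$ is used to perform the extrapolation and find the projected pixel value.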
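The fit and extrapolation step can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in our experiments. It assumes that the projected pixel sits at column offset X = 0, that the interior pixels sit at X = 1, 2, ..., and that the result is rounded and clipped to the 8-bit range. The function names `fit_line` and `extrapolate_projected` are hypothetical.

```python
def fit_line(xs, ps):
    """Least-squares fit of P = A*X + B; returns (A, B).

    Solves the 2x2 normal equations in closed form.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two interior pixels to fit a line")
    sx = sum(xs)
    sp = sum(ps)
    sxx = sum(x * x for x in xs)
    sxp = sum(x * p for x, p in zip(xs, ps))
    det = n * sxx - sx * sx  # non-zero whenever the X values are distinct
    a = (n * sxp - sx * sp) / det
    b = (sxx * sp - sx * sxp) / det
    return a, b


def extrapolate_projected(ps):
    """Predict the projected pixel from interior pixels ps.

    ps[0] is the interior pixel adjacent to the projected pixel
    (assumed offset X = 1), ps[1] the next one inwards, and so on.
    """
    xs = list(range(1, len(ps) + 1))
    a, b = fit_line(xs, ps)
    value = b  # the fitted line evaluated at X = 0
    # Assumption: 8-bit samples, so round and clip to [0, 255].
    return min(255, max(0, round(value)))
```

For interior values 100, 110 and 120 (nearest first) the fitted line has slope 10 per column, so the projected pixel continues the trend at 90.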

A straightforward application of the above scheme is the determination of the pixel labelled A in Figure 1. Note that, here, Pn consists of the two pixels located to the right of A. In this way, the projected pixel is determined by the variation trend of the interior pixels that lie inside the VO and close to the boundary. However, specific implementations of the above scheme are necessary for the following cases:

(i) When a run of interior pixels is bounded by two projected pixels [e.g. the pixels denoted by B in Figure 1], both projected pixels are determined using the same linear equation.

(ii) If a projected pixel is flanked by interior pixels on both sides, as illustrated by pixel C in Figure 1, the above process is performed in both directions and the average of the two resulting extrapolated pixel values is taken as the value of the projected pixel.

(iii) For projected pixels bounded by only a single interior pixel ($N = 1$), as shown by pixel D in Figure 1, the pixel value is taken as equal to the single interior pixel value.
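The three special cases can be sketched for a single row as follows. This is an illustrative sketch only, assuming the row is given as a list of sample values plus a boolean mask marking interior pixels, that column offsets are measured from the projected pixel (X = 0), and that results are rounded and clipped to 8 bits. The names `lep_row` and `_extrapolate` are hypothetical.

```python
def _extrapolate(run):
    """Extrapolate one step beyond run[0].

    run lists interior pixel values, nearest to the projected pixel first.
    """
    n = len(run)
    if n == 1:
        return float(run[0])  # case (iii): copy the single interior pixel
    xs = range(1, n + 1)      # assumed offsets from the projected pixel
    sx, sp = sum(xs), sum(run)
    sxx = sum(x * x for x in xs)
    sxp = sum(x * p for x, p in zip(xs, run))
    det = n * sxx - sx * sx
    return (sxx * sp - sx * sxp) / det  # intercept B, i.e. the line at X = 0


def lep_row(row, inside):
    """Return {column: value} for every projected pixel in one row."""
    width = len(row)
    projected = {}
    for c in range(width):
        if inside[c]:
            continue
        estimates = []
        # Interior run immediately to the left, nearest pixel first.
        left, k = [], c - 1
        while k >= 0 and inside[k]:
            left.append(row[k])
            k -= 1
        if left:
            estimates.append(_extrapolate(left))
        # Interior run immediately to the right, nearest pixel first.
        right, k = [], c + 1
        while k < width and inside[k]:
            right.append(row[k])
            k += 1
        if right:
            estimates.append(_extrapolate(right))
        if estimates:
            # Case (ii): with two estimates this is their average.
            value = sum(estimates) / len(estimates)
            projected[c] = min(255, max(0, round(value)))
    return projected
```

Because the two projected pixels at either end of a run are both extrapolated from a fit to the same interior values, they lie on the same line, which corresponds to case (i).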

After all the projected pixels are determined, the existing MPEG-4 horizontal and vertical padding techniques are used to pad the rest of the exterior pixels within the block. Note that the projected pixels that were produced by the linear extrapolation now act as the new boundary pixels. After all boundary macro-blocks are padded similarly, extended padding is used to pad the remaining exterior macro-blocks within the VOP bounding rectangle.
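For readers unfamiliar with the horizontal and vertical padding passes, the following sketch shows the general idea of repetitive padding within one block: each undefined sample takes the value of the nearest defined sample on its row, or the average of the two nearest when it lies between them, and rows with no defined samples are then filled column-wise. It is a simplified sketch, not a conformant MPEG-4 implementation. The rounding of the average and the function names `pad_line` and `repetitive_pad` are assumptions. In the LEP scheme the `defined` mask would include the projected pixels.

```python
def pad_line(values, defined):
    """Fill undefined samples on one line from the nearest defined ones."""
    n = len(values)
    out = list(values)
    for i in range(n):
        if defined[i]:
            continue
        left = next((values[k] for k in range(i - 1, -1, -1) if defined[k]), None)
        right = next((values[k] for k in range(i + 1, n) if defined[k]), None)
        if left is not None and right is not None:
            out[i] = (left + right + 1) // 2  # assumed rounding of the average
        elif left is not None:
            out[i] = left
        elif right is not None:
            out[i] = right
    return out


def repetitive_pad(block, defined):
    """Horizontal pass over rows, then vertical pass over the empty rows."""
    rows, cols = len(block), len(block[0])
    out = [list(r) for r in block]
    # A row takes part in the horizontal pass only if it has a defined sample.
    row_done = [any(defined[r]) for r in range(rows)]
    for r in range(rows):
        if row_done[r]:
            out[r] = pad_line(block[r], defined[r])
    # Rows left untouched are filled column by column from the padded rows.
    for c in range(cols):
        column = pad_line([out[r][c] for r in range(rows)], row_done)
        for r in range(rows):
            out[r][c] = column[r]
    return out
```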

2. Extrapolated Average Padding

In our tests, we found that MPEG-4 did not perform well when encoding arbitrarily shaped video objects that have been severely distorted (or have changed shape severely) between consecutive video frames. The LEP technique we proposed also fails to perform well under similar geometrical conditions.

Figure 2 illustrates the reason for this failure. Note that, in the non-overlapping area, the pixel errors are the difference between the corresponding extrapolated boundary pixel values of the reference VOP (Pad) and the interior pixel values (closer to the boundary) of the boundary macro-block (Cur) to be encoded. If the non-overlapping area is large, the boundary pixel values of the reference VOP (or the linearly extrapolated pixel values) may not be a good representation of the interior pixel values of the arbitrarily shaped boundary macro-block to be encoded, especially for pixels that are further away from the boundary. Thus, the methods discussed above (i.e. the MPEG-4 method and the LEP technique) fail to produce low-magnitude error blocks.

[Figure 2: non-overlapping area between the padded reference VOP (Pad) and the boundary macro-block to be encoded (Cur)]

As a solution, we propose an Extrapolated Average Padding (EAP) technique for severely distorted reference VOPs.

Firstly, the arithmetic mean value $A$ of all the pixels $p(i,j)$ of the boundary macro-block situated in the interior of the reference VOP is calculated using the following formula:

$$A = \frac{1}{N} \sum_{(i,j) \in L} p(i,j) \qquad (5)$$

where $L$ is the object region and $N$ is the number of pixels situated within the reference VOP. The division by $N$ is done by rounding to the nearest integer. The next step is to assign $A$ to each block pixel situated outside the object region $L$, i.e.

$$p(i,j) = A \quad \text{for all } (i,j) \notin L \qquad (6)$$

After the boundary macro-blocks are padded according to the above technique, Extended Padding is used to pad the exterior macro-blocks. Any exterior macro-block that remains unpadded at the end of this stage is filled with the value 128.
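The EAP step for a single boundary block can be sketched as below. This is a minimal sketch, assuming the block is a list of rows of non-negative integer samples and that a boolean mask of the same shape marks the pixels inside the object region. The function name `eap_pad` is hypothetical.

```python
def eap_pad(block, inside):
    """Fill every exterior pixel of a block with the rounded interior mean."""
    interior = [
        block[r][c]
        for r in range(len(block))
        for c in range(len(block[0]))
        if inside[r][c]
    ]
    n = len(interior)
    if n == 0:
        raise ValueError("block has no interior pixels; it is not a boundary block")
    # Integer division rounded to the nearest integer (halves round up),
    # valid for non-negative sample values.
    mean = (sum(interior) + n // 2) // n
    return [
        [block[r][c] if inside[r][c] else mean for c in range(len(block[0]))]
        for r in range(len(block))
    ]
```

Interior pixels are left untouched; only the exterior positions receive the mean value.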

3. Hybrid Padding

Detailed experimental investigations showed that the LEP technique works best when the matching shapes are close, and thus performs well in sequences where objects change shape slowly. In contrast, the EAP technique discussed above works best when the shape changes are large. Most video sequences fall into the first category. However, the importance of dealing with large shape mismatches between adjacent VOPs cannot be ignored, especially since most video sequences contain VOs of both types. Alternatively, a video sequence may have VOs that are distorted between frames only in certain areas or sections of the object. In either case, a hybrid approach should improve compression efficiency. We therefore investigated several schemes that could be used to design a hybrid between the LEP and EAP techniques. Details are given below:

Method 1: A video sequence may consist of a collection of video objects which, due to their motion and occlusion, may change shape considerably between frames. Assuming that these video objects can be accurately categorised as either ‘distorted’ or ‘non-distorted’, it is possible to use the appropriate coding strategy (i.e. EAP or LEP respectively) to code a given classified object at a given temporal location in the video sequence. Assuming that the video objects are bounded by a closed contour, the following attributes could be used to make this decision.

(i) The number of pixels representing the perimeter of the contour, ‘n’.

(ii) The first invariant moment ‘h1’, computed from the x and y co-ordinates of the points along the contour. Note that ‘h1’ is invariant to translation, rotation and isotropic scale changes.

(iii) The second invariant moment ‘h2’, computed from the same contour co-ordinates as in (ii).

In cases where the object is partly bounded by an edge of the video frame, the edge is considered as a part of the object contour. The values of attributes ‘n’, ‘h1’ and ‘h2’ are used in order to determine whether a video object has been ‘distorted’ or ‘non-distorted’. However, the invariant moment attributes ‘h1’ and ‘h2’ were found to produce more reliable results as compared to using the contour length attribute, ‘n’.
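As one possible realisation of such contour attributes, the sketch below computes the contour length together with the first two Hu moment invariants of the contour points. It is an assumption that these correspond to ‘h1’ and ‘h2’ as used in our experiments. The normalisation by the number of points raised to the power p + q + 1 is likewise an assumed convention for contour point sets, and the function name `contour_attributes` is hypothetical.

```python
def contour_attributes(points):
    """Return (n, h1, h2) for a closed contour given as (x, y) points."""
    n = len(points)
    if n == 0:
        raise ValueError("contour has no points")
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n

    def mu(p, q):
        # Central moment: removes the dependence on translation.
        return sum((x - xbar) ** p * (y - ybar) ** q for x, y in points)

    def eta(p, q):
        # Assumed contour normalisation for approximate scale invariance.
        return mu(p, q) / n ** (p + q + 1)

    h1 = eta(2, 0) + eta(0, 2)
    h2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return n, h1, h2
```

A classifier for ‘distorted’ versus ‘non-distorted’ objects would then compare these attributes between the reference and current contours. The comparison rule and any thresholds are not specified here.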

Once the video objects are categorised as above, the LEP technique is used to pad the reference VOs of the objects that are classified as ‘non-distorted’. The EAP technique is used to pad the reference VOs of the objects that are classified as ‘distorted’. As the shape information of the VO streams is transmitted as binary alpha planes, the categorisation can be done independently of the VO prediction.

Method 2: The notion of categorising a video object as ‘distorted’ or ‘non-distorted’, as described in the above section, is based on the assumption that when an object is ‘distorted’, either all or a major proportion of its boundary blocks have severely changed shape. However, detailed experimental investigations with several test video sequences indicated that this assumption is far from true. In most VOPs, only a part of the VO changes shape. Thus, a coding scheme that identifies the areas of an object that can be classified as ‘distorted’, and subsequently uses the EAP technique to pad the reference frame boundary blocks of such areas, would perform better than the method described in the previous section. The following scheme addresses this issue.

[Figure 3: mismatch distances H1, H2 and H3 between a matching pair of blocks]

Firstly, the LEP technique is used to pad the reference frame VOP. Motion compensation for all the arbitrarily shaped boundary blocks in the current frame VOP is then performed, taking this padded VOP as the reference. Subsequently, for each matching pair, a measure of mismatch is calculated by averaging the three largest ‘mismatch distances’, H1, H2 and H3 (see Figure 3). If this average is greater than a threshold T (2.0 in our experiments), a decision is made to re-pad the pixels that are within the best matching block, but outside the shape of the reference video object, using the EAP technique, i.e.

If $(H_1 + H_2 + H_3)/3 > T$, re-pad using EAP.

The prediction errors are calculated based on these new padding values. For the remaining blocks the prediction values are calculated using the original padding values [i.e. padded using LEP] of the reference frame VOP.
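The per-block decision of Method 2 can be sketched as follows. The sketch assumes that the mismatch distances for a matching pair have already been measured and are supplied as a list of numbers; how they are measured is defined by Figure 3 and is not modelled here. Averaging over fewer than three values when fewer are available is an assumption, and the names `mismatch_measure` and `choose_padding` are hypothetical.

```python
def mismatch_measure(distances):
    """Average of the three largest mismatch distances (H1, H2, H3)."""
    top = sorted(distances, reverse=True)[:3]
    if not top:
        return 0.0  # assumption: no measured mismatch means a perfect match
    return sum(top) / len(top)


def choose_padding(distances, threshold=2.0):
    """Return 'EAP' if the block should be re-padded, otherwise 'LEP'."""
    return "EAP" if mismatch_measure(distances) > threshold else "LEP"
```

With distances 0.5, 3, 1, 4 and 2, the three largest average to 3.0, which exceeds the threshold of 2.0, so the block is re-padded with EAP. A block whose three largest distances average exactly 2.0 keeps its LEP padding.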

