Digital Imaging & Data Compression
Padding Techniques for MPEG-4

While MPEG-4 proposes fixed boundary pixel padding for arbitrarily shaped object encoding, our research focuses on a number of alternatives to see whether any improvement can be achieved. Our work can be classified into: (i) Linear extrapolation padding; (ii) Extrapolated average padding; (iii) Hybrid padding schemes.

1. Linear Extrapolation Padding

In this scheme, we use a single iteration of a row-based Linear Extrapolation Padding (LEP) step to predict those exterior pixels adjacent to the boundary pixels of the reference VO. The algorithm is described in mathematical detail as follows: Given any row of pixels in
a boundary block of size […]. Assume that, in a given row, there exist […]. Firstly, the prediction error for a single pixel […]. Secondly, to obtain the minimum error, we have: […] and […]. By simplifying equation 2 and equation 3 above, we arrive at the following matrix equation: […]
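For orientation, a standard least-squares line fit leads to a matrix equation of this kind. The sketch below is an assumption about the form of the derivation, not a reproduction of the original equations: the symbols $x_k$ (pixel position in the row), $p_k$ (pixel value), $N$ (number of interior pixels used), and the constants $a$ and $b$ are all introduced here for illustration.

```latex
% Assumed linear predictor and per-pixel prediction error
\hat{p}_k = a\,x_k + b, \qquad e_k = p_k - (a\,x_k + b)

% Total squared error over the N interior pixels used for the fit
E = \sum_{k=1}^{N} e_k^{2}

% Minimum-error conditions
\frac{\partial E}{\partial a} = -2\sum_{k=1}^{N} x_k\,\bigl(p_k - a\,x_k - b\bigr) = 0,
\qquad
\frac{\partial E}{\partial b} = -2\sum_{k=1}^{N} \bigl(p_k - a\,x_k - b\bigr) = 0

% Simplified into matrix form
\begin{pmatrix}
\sum_k x_k^{2} & \sum_k x_k \\
\sum_k x_k     & N
\end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} \sum_k x_k\,p_k \\ \sum_k p_k \end{pmatrix}
```

Under these assumptions, solving the 2x2 system gives the slope $a$ and offset $b$ of the line that is then evaluated at the position of the exterior pixel to be projected.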
After constants […]

A straightforward application of the above scheme is used to determine the pixel A in Figure 1. Note that here Pn consists of the two pixels located to the right of A. In this way, the projected pixel is determined by the variation trend of the interior pixels that lie inside the VO and close to the boundary. However, specific implementations of the above scheme are necessary for the following cases,
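The row-based projection step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lep_project`, the parameter `n_fit`, the position convention (projected pixel at x = 0, interior pixels at x = 1, 2, ...), the clipping to the 8-bit range, and the single-pixel fallback are all hypothetical choices.

```python
def lep_project(interior, n_fit=2):
    """Project one exterior pixel from the interior pixels next to it.

    interior: pixel values of one row, ordered from the boundary inward
              (interior[0] is the boundary pixel adjacent to the
              exterior pixel being projected).
    n_fit:    number of interior pixels used for the line fit
              (assumed parameter; the text uses two pixels for pixel A).
    """
    p = [float(v) for v in interior[:n_fit]]
    if not p:
        raise ValueError("need at least one interior pixel")
    if len(p) == 1:
        # Assumed fallback: with one pixel no trend exists, so repeat it.
        return p[0]

    # Interior pixels sit at x = 1, 2, ...; the projected pixel is at x = 0.
    xs = [float(k) for k in range(1, len(p) + 1)]
    n = float(len(p))
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sp = sum(p)
    sxp = sum(x * v for x, v in zip(xs, p))

    # Solve the 2x2 normal equations of a least-squares line p = a*x + b.
    det = sxx * n - sx * sx
    a = (sxp * n - sx * sp) / det
    b = (sxx * sp - sx * sxp) / det

    # The line evaluated at x = 0 is b; clip to the 8-bit pixel range.
    return min(255.0, max(0.0, b))


# Interior values fall from 110 to 100 towards the boundary, so the
# projected exterior pixel continues that trend.
projected = lep_project([100, 110])
```

The slope `a` is computed but not returned because only the value at x = 0 is needed for a single-iteration projection.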
After all the projected pixels are determined, the existing MPEG-4 horizontal and vertical padding techniques are used to pad the rest of the exterior pixels within the block. Note that the projected pixels produced by the linear extrapolation now act as the new boundary pixels. After all boundary macro-blocks are padded in this way, extended padding is used to pad the remaining exterior macro-blocks within the VOP bounding rectangle.

2. Extrapolated Average Padding

From our experience of testing MPEG-4, we found that it does not perform well when encoding arbitrarily shaped video objects that have been severely distorted (or have changed shape severely) between consecutive video frames. The LEP technique we proposed also fails to perform well under similar geometrical conditions. Figure 2 illustrates the reason for this failure. Note that in the non-overlapping area the pixel errors are the differences between the corresponding extrapolated boundary pixel values of the reference VOP (Pad) and the interior pixel values (closer to the boundary) of the boundary macro-block (Cur) to be encoded. If the non-overlapping area is large, the boundary pixel values of the reference VOP (or the linearly extrapolated pixel values) may not be a good representation of the interior pixel values of the arbitrarily shaped boundary macro-block to be encoded, especially for pixels that are further away from the boundary. Thus, the methods discussed above (i.e. the MPEG-4 method and the LEP technique) would fail to produce error blocks of low magnitude.
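As a solution, we propose an Extrapolated Average Padding (EAP) technique for severely distorted reference VOPs. Firstly, the arithmetic mean value A of all the pixels p(i,j) of the boundary macro-block situated in the interior of the reference VOP is calculated using the following formula:

$$A = \frac{1}{N} \sum_{(i,j) \in \mathrm{VOP}} p(i,j)$$

where N is the number of pixels of the boundary macro-block that lie in the interior of the reference VOP.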
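The mean computation can be sketched as below. This is a hedged illustration only: the function name `eap_pad_block` and the representation of the block and its binary alpha mask as nested lists are assumptions, and assigning A directly to every exterior pixel is an assumed way of using the mean, since the text above only specifies how A is calculated.

```python
def eap_pad_block(block, alpha):
    """Compute the interior mean A of a boundary block and pad with it.

    block: 2-D list of pixel values for one boundary macro-block.
    alpha: 2-D list of the same shape; truthy where the pixel lies
           inside the reference VOP, falsy where it is exterior.
    Returns (padded_block, A).
    """
    interior = [
        block[i][j]
        for i in range(len(block))
        for j in range(len(block[i]))
        if alpha[i][j]
    ]
    if not interior:
        raise ValueError("a boundary block must contain interior pixels")

    # Arithmetic mean A of all pixels inside the reference VOP.
    mean_a = sum(interior) / len(interior)

    # ASSUMPTION: exterior pixels simply take the value A.
    padded = [
        [block[i][j] if alpha[i][j] else mean_a for j in range(len(block[i]))]
        for i in range(len(block))
    ]
    return padded, mean_a


# 2x2 toy block: three interior pixels, one exterior pixel (bottom right).
padded, mean_a = eap_pad_block([[10, 20], [30, 99]], [[1, 1], [1, 0]])
```

Interior pixels are left untouched, so only the exterior region of the block changes.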
After the boundary macro-blocks are padded according to the above technique, Extended Padding is used to pad the exterior macro-blocks. Any exterior macro-block that remains unpadded at the end of this stage is padded with the value 128.

3. Hybrid Padding

Detailed experimental investigations showed that the LEP technique works best when the matching shapes are close, and thus performs well in sequences where objects change shape slowly. In contrast, the EAP technique discussed above works best when the shape changes are large. Most video sequences fall into the first category. However, the importance of dealing with large shape mismatches between adjacent VOPs cannot be ignored, especially because most video sequences would contain VOs of both types. Alternatively, a video sequence may have VOs that are distorted between frames only in certain areas or sections of the object. Thus, in either case a hybrid approach can be expected to improve compression efficiency. As a result, we investigated several schemes that could be used to design a hybrid between the LEP and EAP techniques. Details are given below.

Method 1: A video sequence may consist of a collection of video objects which, due to their motion and occlusion, may change shape considerably between frames. Assuming that these video objects can be accurately categorised as either distorted or non-distorted, it is possible to use the appropriate coding strategy (i.e. EAP or LEP respectively) to code a given classified object at a given temporal location in the video sequence. Assuming that the video objects are bounded by a closed contour, the following attributes could be used in order to make the above decision.
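Attributes of this kind can be computed directly from a binary alpha plane, as in the sketch below. It rests on assumptions that should be read as such: n is taken to be the number of object pixels on the contour, and h1 and h2 are taken to be the first two Hu invariant moments, which is one common choice of invariant moment and may not match the exact definitions used in this work. The function name `shape_attributes` is hypothetical.

```python
def shape_attributes(alpha):
    """Return (n, h1, h2) for a binary alpha plane given as a 2-D list.

    n:  contour length, counted as object pixels that have at least one
        4-neighbour outside the object (ASSUMED definition).
    h1, h2: first two Hu invariant moments (ASSUMED to correspond to
        the invariant moment attributes mentioned in the text).
    """
    rows, cols = len(alpha), len(alpha[0])
    pts = [(x, y) for y in range(rows) for x in range(cols) if alpha[y][x]]
    if not pts:
        raise ValueError("alpha plane contains no object pixels")

    def inside(x, y):
        # Pixels beyond the frame count as outside, so a frame edge
        # becomes part of the object contour.
        return 0 <= x < cols and 0 <= y < rows and bool(alpha[y][x])

    n = sum(
        1
        for x, y in pts
        if not (inside(x - 1, y) and inside(x + 1, y)
                and inside(x, y - 1) and inside(x, y + 1))
    )

    m00 = float(len(pts))
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Normalised central moment of order (p, q).
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return n, h1, h2


# A 3x3 square object centred in a 5x5 frame.
square = [[1 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(5)]
          for y in range(5)]
n, h1, h2 = shape_attributes(square)
```

A classifier would compare these values between the alpha planes of consecutive frames; the comparison rule and any threshold are not given in the text and are left out of the sketch.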
In cases where the object is partly bounded by an edge of the video frame, the edge is considered part of the object contour. The values of the attributes n, h1 and h2 are used to determine whether a video object is distorted or non-distorted. However, the invariant moment attributes h1 and h2 were found to produce more reliable results than the contour length attribute, n. Once the video objects are categorised as above, the LEP technique is used to pad the reference VOs of the objects that are classified as non-distorted. The EAP technique is used to pad the reference VOs of the objects that are classified as distorted. As the shape information of the VO streams is transmitted as binary alpha planes, the categorisation can be done independently of the VO prediction.

Method 2: The notion of categorising a video object as distorted or non-distorted, as described in Method 1, is based on the assumption that when an object is distorted, either all or a major proportion of its boundary blocks have severely changed shape. However, detailed experimental investigations with several test video sequences indicated that this assumption is far from true. In most VOPs only a part of the VO changes shape. Thus, a coding scheme that identifies the areas of an object that can be classified as distorted, and subsequently uses the EAP technique to pad the reference frame boundary blocks of those areas, would perform better than Method 1. The following scheme addresses this issue.
Firstly, the LEP technique is used to pad the reference frame VOP. Motion compensation for all the arbitrarily shaped boundary blocks in the current frame VOP is then performed taking this padded VOP as the reference. Subsequently, for each matching pair a measure of mismatch is calculated by averaging the three largest mismatch distances, H1, H2 and H3 (see Figure 3). If this average is greater than a threshold T (2.0 in our experiments), a decision is made to re-pad the pixels that are within the best matching block, but outside the shape of the reference video object, using the EAP technique. That is, the block is re-padded if $(H_1 + H_2 + H_3)/3 > T$.
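The per-block decision rule can be sketched as follows. The function name `needs_eap_repad` is hypothetical, the mismatch distances are assumed to be supplied already measured (their geometric definition comes from Figure 3 and is not reproduced here), and averaging fewer than three values when fewer are available is an assumption.

```python
def needs_eap_repad(mismatch_distances, threshold=2.0):
    """Decide whether a matched boundary block should be re-padded with EAP.

    mismatch_distances: mismatch distances measured for one matching
        pair of blocks (how they are measured is not modelled here).
    threshold: T in the text; 2.0 is the value reported for the
        experiments.
    """
    # H1, H2, H3: the three largest mismatch distances.
    largest = sorted(mismatch_distances, reverse=True)[:3]
    if not largest:
        # ASSUMPTION: no measured mismatch means the LEP padding is kept.
        return False
    return sum(largest) / len(largest) > threshold


# Average of the three largest distances (5, 3, 1) is 3.0, above T = 2.0.
repad = needs_eap_repad([1.0, 5.0, 0.5, 3.0])
```

The comparison is strict, matching "greater than a threshold" in the text, so an average exactly equal to T leaves the LEP padding in place.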
The prediction errors are calculated based on these new padding values. For the remaining blocks, the prediction values are calculated using the original padding values (i.e. padded using LEP) of the reference frame VOP.

References