Digital Imaging & Data Compression

 

Object-based Stereo Image Compression

1. Overview

To reflect the true disparity between left and right frames, free-form object based matching would be more efficient than block-based techniques. This has been proved in video compression, such as MPEG-4. However, the price to pay is to encode the arbitrary shape of those objects and the complexity level will also be increased accordingly. To make a better balance, we proposed a hybrid scheme with features of both block-based and object-based coding techniques to achieve data compression for stereo image pairs. The proposed algorithm comprises the following operations:

  1. Extract objects by contour analysis for both frames, and the objects are then matched to find those areas, which are similar in terms of their shapes;

  2. Enclose the matching object pairs by object bounding rectangles, in a similar spirit of MPEG-4. A parity based contour filling algorithm is further used to identifty the interior and exterior pixels of the object, in which the shapes of the objects are identified by binary object planes. These planes are only used as an aid in the coding process rather than being encoded.

  3. Encode the objects in terms of three groups: (a) objects that are enclosed by closed contours; (b) objects enclosed by contours that terminate at the image frame boundaries; and (c) unidentified areas that are treated as the background.

2. Algorithm Design

Contour Analysis of Stereo Images

The contour is extracted by a two-step process. Firstly, the image is convolved (LoG*Image) with a Laplacian-of-Gaussian (LoG) operator, defined by:

(1)

where and is the standard deviation. To facilitate its application, equation (1) can be discretized in various ways.

Secondly, edges are detected at the zero-crossing points (e.g., patterns such as "++--" and "--++" along both vertical and horizontal directions). In this step, the slopes of the LoG of the image along both x and y directions, denoted by Sx and Sy, are used to compute the edge strength at each zero-crossing point. An edge strength at a point ( x,y ) is defined as follows:

S (x,y) = (2)

The contour points are chosen using a hysteresis thresholding technique, which is given as follows:

The edge strength at each point along the contour is greater than Tl and at least one point on the contour has an edge strength greater than Tu , where Tl and Tu are pre-set thresholds and Tl < Tu . Generally, Tl is set sufficiently low to preserve the whole contour around the region boundary and Tu is chosen large enough to avoid spurious edges. Contour search is initiated whenever one point with a value greater than Tu is scanned. The search is conducted in both directions of the contour and the neighbouring pixels, with values greater than Tl being accepted as contour points. The search is terminated when no neighbouring pixels are found to satisfy this condition. Then all edge strength values along the detected contour are set to zero so that these points will not be visited again. The same search operation continues until the whole edge strength array has been scanned. Figure 1 illustrate an example:

                wpe3.jpg (22617 bytes)

Contour Matching

The contoured objects are matched in terms of: (i) those that are closed in as illustrated in Fig. 3(a); and (ii) those which are open, and terminated at the image boundaries as shown in Fig. 3(b).

        wpe4.jpg (11341 bytes)

Closed contour matching: For every closed-in contour, three shape attributes are computed: (i) the number of pixels representing the perimeter of the contour, ; (ii) the y co-ordinate of the centroid ; and (iii) the first invariant moment, h (invariant to translation, rotation and isotropic scale changes). Note that is considered as a good matching attribute as we assume that the vertical stereo disparity between points in the image frames are negligible. This is because that parallel axis geometry [1] is used in obtaining all the test stereo image pairs. If and represent the x and y co-ordinates of the points along the contour, can be defined as follows:

                        (3)

Where              and . (4)

Every closed-in contour of the right frame is compared with that of the left frame. A contour from the left frame is accepted as an initial candidate match for the right frame contour, if the differences between each of their shape attributes fall below some pre-set thresholds (e.g. 20% for h, and 10% for y0 and n). At the end of this procedure, closed-in contours in right frames may have multiple candidate matches and a further step is necessary to find the best match among them. This is performed as follows.

Firstly, the contours are converted into their chain codes . A chain code is a more succinct way of representing a contour than a simple collection of co-ordinate points along the contour. It describes a contour as a series of one-pixel vector transitions at different orientations (8 different move directions for 8-connected contours), with only the starting point being defined explicitly. In our experiments, we chose the leftmost pixel out of those topmost pixels in a given contour as the starting point. However, the final result would be independent of this starting point. To prevent the so called wrap around problem in standard chain code representations, we first convert a given standard chain code {a1 a2 ...an} with length n into a modified code {b1 b2 ... bn} by a shifting operation defined recursively by:

b1= a1

bi = qi , qi is an integer such that (qi -ai) mod 8 =0 and is miniminzed, i = 2, 3, ... n.

Let and be the chain code representations of two contours L and R (modified as above), which correspond to the left and right frames respectively, and let NL and NR be their lengths. We can define a measure of correlation MCkl between two n-point segments, one starting at index k of contour L and the other starting at index l of contour R. The specific definition of MCkl is given below:

                                    MCkl = (5)

Where, and defined similarly.

In order to identify the location of the best fit between the two contours, an n-point segment of L, starting at index k (e.g. k=0), is slid over contour B. The similarity function, , where specifies the search range, is then used to locate the best fit. Contour R in the right image and contour L in the left image are selected as a matched pair, if the following two conditions are satisfied.

  1. , whererepresents all the contours with similar shapes (initial candidate matches, above) to contour.

  2. , where

is a pre-set threshold that eliminates matches with poor correlation.

In the case that multiple contours get matched to the same contour, the pair with the highest is selected as the matching pair.

Open contour matching: For matching of open contours, the salient segments along the contours are used as the matching primitives. Salient segments such as corners can be detected from the chain code representation. Specifically, for a contour of length with chain code , we define a measure of curvature at the point as,

                                            (6)

where, is the standard deviation of the LoG operator used for the contour extraction. The point along the contour is chosen as a salient point if both of the following conditions are satisfied,

  1. , and

  2. for all

where p is a constant that determines the minimum distance between the salient points, and is a threshold specifying the minimum acceptable curvature. The contour segments surrounding the salient points are then used as 1-D templates in finding the corresponding matches in the other image.

Contour Blocking

Contour blocking is a process designed to divide the matched objects into blocks in a most efficient way so that different encoding techniques can be applied to compress the stereo image pair. The basic principle is similar to MPEG-4 techniques, which include fitting an object into smallest possible rectangle, padding the reference object and produce disparity compensated predictive errors.

Object Encoding

The stereo image pair is encoded on an object basis by identifying three types of objects within the images. These include: objects that are bounded within closed contours, objects that have open contours and both ends of the open contour terminate at image boundaries, and the rest which fall into the background. Although there exist differences among the three types and their encoding requires individual co-ordination, the major proposed encoding techniques can be described as: (i) padding of left object bounding rectangles; (ii) disparity-based prediction for both arbitrary shaped boundary blocks and internal blocks; (iii) background encoding.

Therefore, the proposed encoding scheme can be highlighted as follows:

  1. padding of left object bounding rectangles features local gradient inside the object;

  2. The disparity values and the prediction errors for each arbitrary shaped boundary block is transmitted on an object by object basis. In this way, the error blocks and disparity values can be properly identified and used to reconstruct the right objects at the decoding end.

  3. In the prediction process, the shape of the objects in the left frame are used to produce errors for right objects, and thus no bits are consumed to encode the shape information. Yet correct shape can be reconstructed at the decoding end since the lost information will be picked up by corresponding background block encoding;

  4. Internal areas are encoded adaptive to their local texture.

Back to Research Activities