Digital Imaging & Data Compression

 

Block-bases Stereo Image Compression

1. Overview of Stereo Image Compression:

3D stereo images are acquired by simulating human’s eyesight effect upon observing objects through two horizontally separated perspectives. Correspondingly, two frames are resulted for one 3-D image, labelled as left frame and right frame. If these two frames are to be transmitted with the idea of reconstructing the 3D image at the receiver end, we would need double the bandwidth required for monocular image transmission. Therefore, data compression is necessary.

To develop data compression algorithms for stereo images, two important factors need to be considered:

1.1 Camera Geometry: The arrangement for the separation and relative orientation of the two cameras is referred to as the camera geometry. There are normally two types of camera geometry. One is parallel axes geometry popular with still stereo image pairs and the other is converging axes geometry, which is mostly used for stereo image sequence and stereo computer vision. The camera geometry establishes a model and a foundation for all data compression algorithm development, which is mainly represented by disparity estimation, motion estimation to remove the redundancy and achieve data compression.

1.1.1 Parallel axes geometry: In a parallel axis geometry, the image planes of the two cameras are arranged as coplanar and collinear with identical optical characteristics to acquire the stereo image pair. This can be illustrated as follows:

                        wpe1.jpg (12813 bytes)

 Where: f – the focal length, the distance between the camera location and the image planes;

P(x,y,z) – a general scene point, which is projected to PL on the left image and PR on the right image;

(xL,yL), (xR,yR) and (x,y,z) represent left co-ordinates, right co-ordinates and global co-ordinates respectively.

If we superimpose one of the stereo frames on top of the other, we would observe that the matching pixels on the images do not coincide, but are apparently displaced from one another. This is called disparity or binocular parallax. The parallel axis geometry has the following properties:

  1. the vertical component of the disparity vectors is always zero;

  2. the depth of an object, z-f, is inversely proportional to the disparity, which is given by:

                        (1)

Equation (1) is obtained from Figure 2:

                        wpe2.jpg (10572 bytes)

1.1.2. Converging axes geometry: This camera geometry can be illustrated in Figure 3:

                                            wpe3.jpg (8329 bytes)

The converging axes geometry has the following properties:

  1. It enables the fields of view to overlap fully in the scene volume of interest;

  2. It is easily calibrated as such that the origin of the global co-ordinate system is positioned at where the optical axes of the two cameras converge (see Figure 3).

  3. The disparity will have both horizontal and vertical components and none of them is normally zero.

1.2 Normal Block-based Compression Strategy

  1. Compress left frame by using any conventional still image compression algorithm;

  2. For each block in the right frame, search the left frame for the best match to estimate the disparity;

  3. Calculate the DFD (displaced frame difference) similar to MPEG;

  4. DCT the DFD, quantization and entropy coding, or using other alternatives to encode the DFD, such as wavelets.

2. Our Pioneering-Block Based Compression Technique

The pioneering-block based compression technique features low complexity in algorithm design, low-cost in implementation and easy to develop. The basic idea can be illustrated in Figure 4.

                                                wpe4.jpg (10711 bytes)

Let represent the block to be encoded, which is of size 88 pixels, and in the ith row and jth column of the right frame and Lij the corresponding block in the left frame. Two blocks, and , can be selected in the right frame to construct a so called pioneering block since they are the immediate neighbours both vertically and horizontally to this block.

Since the stereo pair is acquired by parallel axes geometry, the disparity between the two frames will only occur horizontally rather than vertically. Hence, the two neighbouring blocks, and , can be used instead to search for the best match. In this way, overhead bits can be eliminated as the two blocks are always available at the decoding end before the block is decoded. To align the right frame with the left and search for the best match between the two frames, a pioneering block denoted as is produced as follows from the two neighbouring blocks.

                                = (2)

where , (0,1) are used to determine the weight of each block in the construction of .

We then use the to search the left frame for its best match constructed in the same way as in equation (2) from two neighbouring blocks inside the left frame. Considering the fact that we are only concerned with the horizontal disparity, the search in the same row as that of the right frame can actually be limited to a window of N pixels wide. Although this can be taken to extend upto the right end of the left image, our experiments showed that N=64 is a reasonable choice.

Let represent the pioneering block constructed from the right frame, and the corresponding blocks constructed from equation (2) inside the search window of the left frame, the distance between the two blocks can be calculated as below:

                        (3)

where , (18) stands for pixels from the right frame and the left frame respectively.

The best match for is then selected by :

                         (4)

Assuming that the best match , comes from the two blocks, and , the best match for the block to be encoded can then be determined as the block . As the search is limited inside the window in which all possible blocks are in the same row, the best matching block , may only have different column index represented by . The whole process of searching, matching and predicting is illustrated in Figure 4.

is then transformed into the frequency domain to give and the frequency domain predictor is found as, , dropping the subscripts of . Letbe the frequency domain representation of the block to be predicted, . The prediction error is then found in the quantized, frequency domain as :

(5)

where represents the scalar quantization function. This error is then encoded and transmitted in conjunction with the Runlegth and Huffman coding procedures described in JPEG.

Further details about the performance of the algorithm are available in the research report at Standford University, USA.

Back To Research Activities