Digital Imaging & Data Compression
|
|
Low-cost rate control algorithm development 1. Overview Rate control addresses an important issue that image compression ratios should be controlled at a given bit rate regardless the variety of compressibility characterised by input image content. While the rate control inevitably incurs additional distortion in order to force the compression ratio be close to the target, it is desired to minimise the distortion and thus optimise the output image quality at the decoding end. As research advances, such a rate control is becoming an important issue for applications where similar or constant compression ratios are required no matter how different the input images may vary. Typical applications of rate control can be illustrated in image communication, where constant bit rate is favoured in order to fully exploit the communication channel capacity. In computer science, especially database development, storage space management also requires rate-control in the sense that the compression ratios for all images be maintained at the same level, when data compression is concealed from end users. For example, one often takes it for granted that the space of one deleted image should be sufficient for storing another replacement image in a database. If data compression is applied, however, the true storage space would depend on the compression ratio achieved for each individual image, and the compression ratio varies between images. Without rate-control, the space allocated for the new image could be insufficient due to the fact that the new image is more difficult to compress than the old one. Existing research in still image compression has, so far, not addressed the rate control problem, since most of the data compression applications in computer science are limited at end user level rather than system level. Hence it is down to the end-users to be aware of the nature of variable compression ratios and see to the problems with manual control. As data compression becomes more and more involved at system level, requirement of controlling compression ratios at a certain level will be inevitable. In fact, the emerging JPEG-2000 standard has already included rate control as one of its target specifications for the algorithm development[5]. In telecommunication, however, rate control is under intensive research for the reason that the limited bandwidth needs to be utilised as efficient as possible. Hence, the international standards of MPEGs propose to use buffers to reduce the effect of the variable bit rate by using variable quantization tables[9-10]. Accordingly, the problem can be theoretically formulated as follows: Given
subject to: Where From the above formulation, it can be seen that the rate control is designed and applied within a group of N images. In the case of MPEG, N is the number of frames inside the GOP (group of pictures from an I-frame to the next I-frame). Existing research is represented by a number of rate-distortion based approaches[1-3,6-8,110-16] which are of high complexity in algorithm design, as the R-D (rate-distortion) characteristics of the input data is to be measured before any decision can be made about the quantization step assignment or information loss distribution. Typical work includes dynamic programming[11] and Lagrange multiplier optimisation[2,8,13,15]. To predict the R-D characteristics, various techniques have been proposed. These include trellis-based[11], model based[7,8] and piecewise approximation[8] approaches. Apart from the high complexity involved in estimating the R-D characteristics, the main drawback is that the optimisation of MSE (mean-square-error) measured distortion may not give the best image quality in terms of human visual perception. This is because MSE or PSNR values can only quantify the difference between every original pixel and its reconstruction, without considering human visual perception of those differences. Yet the human visual perception of each intensity difference indeed varies, depending on its location, neighbourhood, and image content represented. Our approach along this line of research is characterised by:
The following is one example of our proposed algorithms. 2. Algorithm Design 2.1 Review of JPEG-LS Near Lossless Image Compression The latest JPEG efforts in new international standards for lossless and near lossless image compression is represented by JPEG-LS[4] in which the main compression techniques proposed can be summarised by (i) run-length coding, (ii) non-linear prediction, (iii) context based statistics modelling; and (iv) entropy coding.
Run-length coding and prediction-based entropy coding are selected by a template of four neighbouring pixels as illustrated in Figure 1. To reduce the computing cost for statistical modelling and for selection of appropriate coding mode, JPEG-LS proposed the following three delta values to implement the local texture analysis.
For near lossless compression, information loss is introduced in JPEG-LS as a fixed, pre-determined constant value represented by a parameter NEAR[4]. If all the four neighbouring pixel values are the same (lossless) or their differences are less than NEAR (near lossless), it is a good indication that the local region surrounding the pixel to be encoded is of smooth texture. Hence, run-length coding is applied for encoding the next sequence of pixels until the run is broken. Otherwise, the texture may not be that smooth and thus a prediction based entropy coding is selected. This type of near lossless encoding takes the effect at the decoding end that the reconstructed pixels may have a maximum distortion of NEAR in comparison with their original values. Hence, the selection of its coding mode in JPEG-LS can be summarised below:
Prediction is designed by exploiting a simple local texture analysis among all the three context pixels inside the predictive template. Specifically, comparisons of intensity values of the three surrounding pixels, a, b, and c, are made to see if any horizontal or vertical edge can be detected. When an edge is detected among the three pixels, the pixel that is not on the edge will be taken as the predictive value. Otherwise, the predictive value will be a well-balanced value drawn from all three pixels. The entire prediction scheme can be described as follows:
2.2 Rate Control Algorithm Design To achieve rate-control for the near lossless compression mode of JPEG-LS, the feature of local texture analysis implemented by run-length mode and non-linear prediction needs to be further exploited to keep the rate-control effect to a minimum for the decoded image quality. Firstly, the compressibility of local texture can be classified into two categories represented by run-length coding mode and predictive coding mode. In the run-length coding mode, it can be observed that the local compressibility is determined by the value of its run-length. The higher the value of run-length, the easier the local region is compressed. Secondly, within the predictive coding mode, three further categories can be identified to determine the smoothness of its local texture decided by the three predictive values given in equation (4). It is recognised that human visual perception of distortion is less sensitive for noisier textures than for smoother textures. Hence, by considering all the facts, it is possible to take advantage of the local texture analysis embedded in the prediction of JPEG-LS to design a low complexity rate control algorithm. As an example, for cases where noisier texture is detected, human visual perception of the image quality allows us to introduce more information loss, as it tends to tolerate more distortion than the smooth texture. Therefore, information loss can be introduced in a way that is adaptive to the local texture analysis, and adaptive to the extent of difficulty in its regional compression. Considering the fact that JPEG-LS is designed to compress each pixel in one of the three different ways: (i) run-length coding; (ii) non-edge detected prediction and (iii) edge detected prediction, three different information loss levels are firstly designed to implement the principle that information loss should be introduced in regions where larger distortion is tolerable in terms of human visual perception. Consequently, the constant information loss parameter, NEAR in JPEG-LS, can be made to vary within the three different levels represented by NEAR_L, NEAR_M and NEAR_H, in the proposed algorithm, to indicate low level information loss, medium level information loss and high level information loss respectively. The corresponding coding operation can be modified below:
Generally, it is expected that the local region with predictive coding mode would normally represent a noisier texture. But it is difficult to substantiate that the edge detected area should be noisier than the non-edge detected. Considering the nature of the predictive coding proposed by JPEG-LS, further analysis reveals that the balanced predictive value, a+b-c, in fact accommodates those uncertain situations including (a) similar intensity values among all a, b, c pixels, and (b) an edge involves the pixel to be encoded, which is not detectable from the predictive pattern only. Therefore, it may not be exactly right to allocate NEAR_M to non-edge detected prediction and NEAR_H to edge detected prediction. As proved by our experiments, the initial values selected for both cases are actually the same. The design in equation (5), however, remains useful to reserve its flexibility and variety in consideration of the principle adopted for information loss distribution. It is also noted that the edge-detected prediction could represent a local area where visual perception may not tolerate high information loss. This occurs when the three pixels in the predictive pattern coincidentally represent a pure step edge inside the input image. Considering the fact that the prediction proposed in JPEG-LS is basically a median filter rather than a proper edge detector, it would be very unlikely that the edge detected by such a prediction scheme happens to be a pure step edge inside the image. In fact, the JPEG-LS prediction is only designed to reflect the intensity variety among a small group of three pixels to produce a good prediction. Hence, in most cases, the so-called edge that is detected by the prediction scheme may not represent any true edge inside the local area. On the other hand, the prediction may not detect any edge even when a pure step edge does exist. In addition, the distortion level allocated is location sensitive, down to each individual pixel. In a very unlikely case that a high distortion level is mistakenly allocated to a pure step edge, the effect is normally compensated in the next few pixels, since a smoother texture is bound to follow and the distortion level will be reduced accordingly. Having said that, further improvement may be achievable if a more reliable texture analysis method is used to pinpoint the information loss level for each pixel encoded. This would significantly increase the complexity of the overall algorithm design, which may not be justified in the case of JPEG-LS. To detect the compressibility of each local region inside the input image, the compression ratio can be computed and compared with the target compression ratio as the encoding proceeds. Such a comparison would give us a good indication whether the information loss introduced should be increased or decreased, in order to make necessary adjustment and force the compression ratio to be close to or equal to the pre-determined target compression ratio. Therefore, we introduce a way of assessing on-line compression at the end of each row. By on-line compression, we mean a compression ratio that is computed and achieved so far at a point before the next pixel is to be encoded. Indeed, by designating those points at the end of each row along the scanning route, we have the advantage of allowing the compression ratio to adjust itself by the local compressibility embedded inside each row. At each compression assessment point, the rate control is implemented on the simple principle that if the on-line compression ratio is smaller than the target compression ratio, we should introduce more information loss to force the on-line compression ratio be close to the target. Otherwise we decrease the information loss to allow better compression quality or to allow opportunities of compensation for any extra information loss introduced earlier. The increase/decrease of information loss is designed in such an order that the local texture analysis in prediction is always considered whenever the information loss needs to be increased or decreased. This would enable us to avoid any excessive information loss, when the reconstructed image quality is interpreted by human visual perception. Firstly, the initialisation of the three parameters, NEAR_H, NEAR_M and NEAR_L, should be designed to reflect the principle that for noisy texture, more information loss should be introduced, and for smooth textures, less information loss should be introduced. Once the initial values of those parameters are designed, it is desirable to keep their distances that way unless the local compressibility demands a different distribution. This will be accommodated by the rate control scheme in which, each individual parameter will be adjusted accordingly. To further exploit the local texture based information loss distribution in the rate control process, we design the order of increase as: H-M-L and the order of decrease as: L-M-H. This is designed to keep it minimum for the information loss introduced to sensitive areas by leaving the parameter L be increased at the end of the adjustment cycle, and decreased at the first of the adjustment cycle. For the convenience of description, we use H, M, and L to represent NEAR_H, NEAR_M and NEAR_L respectively. In order to keep the change of the distances among the three parameters minimum, however, the value of increase or decrease is designed to be unity, and the order of both increase and decrease is always maintained as designed at every possible stage of the adjustment. In order to correctly identify the right parameter for the next adjustment, either increase or decrease, we use two arrays of three elements to hold the parameters, which have already been adjusted through the cycle, as well as those remaining parameters. This is illustrated in Figure 2, which shows the initial state before rate control is started. Each time a parameter is increased or decreased, it will be moved across from its original position to the corresponding position in the other array, i.e., H at the top, M at the middle and L at the bottom. Hence, the rate control algorithm can be described by the movement of the three parameters from one array to the other according to the order of adjustment pre-defined.
At each compression assessment point, the on-line compression ratio is compared with the target compression ratio to determine whether the information loss parameter for the next row of pixels should be increased or decreased. Whenever the on-line compression ratio is detected to be smaller than the target, we increase the information loss parameter to force the compression ratio changing towards the target. Otherwise, we would decrease the parameters to prevent any excessive distortion from being introduced and to improve the image quality. To facilitate the rate control operation of identifying the correct parameter for the next adjustment, we name the array, which holds the remaining parameters, I-array (for increase) or D-array (for decrease) depending on whether the information loss parameter has just been increased or decreased for the current row. Once this array is named correctly, the other array must be either D-array or I-array accordingly. In this way, the action by rate control upon the information loss parameter for the next row can be described as follows:
To further illustrate the key operations of the algorithm, we present an example as follows, in which it is assumed that the rate control operations for a sequence of nine compression assessment points are given below: I, I, D, D, D, D, I, I, I Where I stands for increase of parameters and D for decrease of parameters. Further assume that the starting state of the parameters is the same as that illustrated in Figure 2. Hence, the array full of parameters can be named as I-array, since the next movement is to increase the parameter. Along the top-down direction, the two parameters, H and M, can be increased by one and moved to the D-array as a result of the two I-operations in the above sequence. After that, the state of parameters can be described by Figure 3-(a).
As the next move in the sequence is to decrease the parameter, the search will be conducted instead in the D-array along the bottom-up direction to identify the parameter to be decreased and moved to the I-array. This would give us a state of parameters as shown in Figure 3-(b). After the second D operation inside the sequence is completed, hence, it can be seen that the D-array becomes empty and the array full of parameters needs to be renamed, according to the rate control algorithm proposed. This state before its rename is illustrated in Figure 3-(c). Since the next move is another decrease, the original I-array should now be renamed as D, and the parameter L is moved across to the I-array as shown in Figure 3-(d). Carrying on with the remaining operations in the sequence, the rest of all corresponding states can be obtained accordingly. In the above algorithm design, each parameter is only increased or decreased by 1 whenever it is being moved across to the other array. This is designed to reflect the principle that only the smallest possible adjustment is made, in order to avoid any excessive information loss being introduced unnecessarily. 2.3. Alternative Approaches Two alternatives can be added or incorporated into the above algorithm. One is to design the arrays with variable number of elements to have different orders when the distortion level is increased or decreased. The other is to introduce a prediction scheme to predict the terminal compression ratio at each compression assessment point (end of each row). Thus the distortion level will be varied depending on the comparison between the target compression ratio and the predicted compression ratio. As a matter of fact, by varying the size of the two arrays, the algorithm can be designed to have larger differences in interpreting local textures to optimise the image quality in terms of human visual perception. For example, in cases high target compression ratios (>3:1) are required, it may be more important to have a better quality for those smooth regions. Under this context, it may be desirable to further delay increasing the value of parameter L. Hence, the visual inspection can be made more comfortable to see high quality in smooth areas. The principle of so doing is to add more elements of M and H into the two arrays. One example is shown in Figure 4, where two arrays of five elements are designed and thus the adjustment of L would not be made until both M and H are adjusted twice.
Prediction can be designed as a line-fitting operation, in which the predictive line is constructed in such a way that the distance between the line and all the concerned OCR (on-line compression ratio) values is kept minimum. Hence, the remaining issue is to determine how many OCR values should be involved in this line-fitting process and how those values should be selected. In fact, a simpler predictive line can be constructed by selecting only two points, (current_row, OCR[current_row]) and (current_row-distance, OCR[current_row-distance]). This is illustrated in Figure 5. As a result, the final compression ratio at the end of input image can be predicted by the following linear equations, corresponding to the point: (height, PCR): and hence:
Selection of distance enables us to reduce the effect of those drastic and temporary changes in past OCRs, which may be unreliable to the prediction of the final compression ratio.
References
Choi J. and Park D. A stable feedback control of the buffer state using controlled Lagrange multiplier method, IEEE Trans. Image Processing, Vol 3, Sept 1994, pp 546-558. Hsu C. Y., Ortega A. and Reibman A. Joint selection of source and channel rate for VBR video transmission under ATM policing constraints IEEE J. Select. Areas Commun., Vol 15, Aug. 1997, pp 1016-1028. ISO/IEC JTC 1/SC 29/WG1 FCD 14495-public draft, July 16th, 1997. http://www.jpeg.org/public/jpeglinks.htm JTC 1.29.14, 15444: Call for contributions for JPEG-2000, http://www.jpeg.org Keesman G., Shah I. And Klein-Gunnewiek R. Bit-rate control for MPEG encoders Signal Processing, Image communication, Vol 6, pp 545-560, Feb 1995. Lin L.J., Ortega A. and Kuo C.C.J. Rate control using spline-interpolated R-D characteristics Proc. SPIE Visual Commun. Image Processing96, Orlando, FL., Mar. 1996, pp 111-122. Lin L.J. and Ortega A. Bit-rate control using piecewise approximated rate-distortion characteristics IEEE Trans. Circuits & Systems for Video Technology, Vol 8, No 4, August 1998, pp 446-459. MPEG-2 encoder v.1.1a, MPEG Software simulation Group, (online) http://www.mpeg.org/tristan/MPEG/MSSG
|