Digital Imaging & Data Compression

 

Image Compression with Neural Networks

Overview

Apart from the existing technology on image compression represented by series of JPEG, MPEG, and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting with those traditional compression techniques. Existing research can be summarised as follows:

  1. Back-Propagation Image Compression;

  2. Hebbian Learning Based Image Compression;

  3. Vector Quantization Neural Networks;

  4. Predictive Coding Neural Networks.

1. Basic Back-Propagation Neural Network

The neural network structure can be illustrated in Fig.1. Three layers, one input layer, one output layer and one hidden layer, are designed. Both input layer and output layer are fully connected to the hidden layer. Compression is achieved by designing the value of K, the number of neurones at the hidden layer, less than that of neurones at both input and output layers. The input image is split up into blocks or vectors of 88, 44 or 1616 pixels. When the input vector is referred to as N-dimensional which is equal to the number of pixels included in each block, all the coupling weights connected to each neurone at the hidden layer can be represented by {wji, j = 1, 2, ... K and i = 1, 2, ... N}, which can also be described by a matrix of KN. From the hidden layer to the output layer, the connections can be represented by {w’ij : } which is another weight matrix of NK. Image compression is achieved by training the network in such a way that the coupling weights, {wji}, scale the input vector of N-dimension into a narrow channel of K-dimension (K < N) at the hidden layer and produce the optimum output value which makes the quadratic error between input and output minimum. In accordance with the neural network structure, the operation can be described as follows:

                                            1jK             (1)

for encoding and

                                            1iN              (2)

for decoding.

where xi[0, 1] denotes the normalised pixel values for grey scale images with grey levels [0, 255]. The reason of using normalised pixel values is due to the fact that neural networks can operate more efficiently when both their inputs and outputs are limited to a range of [0, 1]. Good discussion on a number of normalization functions and their effect on neural network performances can be found in reference.

                                                        wpe1.jpg (10009 bytes)

The above linear networks can also be designed into non-linear if a transfer function such as sigmoid is added to the hidden layer and the output layer to scale the summation down in the above equations. There is no proof, however, that the non-linear network can provide better solution than its linear counterpart.

With this basic back-propagation neural network, compression is conducted in two phases: training and encoding. In the first phase, a set of image samples is designed to train the network via back-propagation learning rule which uses each input vector as the desired output. This is equivalent to compressing the input into the narrow channel represented by the hidden layer and then reconstructing the input from the hidden to the output layer.

The second phase simply involves the entropy coding of the state vector hj at the hidden layer. In cases that adaptive training is conducted, the entropy coding of those coupling weights may also be required in order to catch up with some input characteristics that are not encountered at the training stage. The entropy coding is normally designed as the simple fixed length binary coding although many advanced variable length entropy coding algorithms are available.

This neural network development, in fact, is in the direction of K-L transform technology, which actually provides the optimum solution for all linear narrow channel type of image compression neural networks. When equations (1) and (2) are represented in matrix form, we have

                                [h] = [W]T[x]                             (3)

                                                (4)

for encoding and decoding.

The K-L transform maps input images into a new vector space where all the coefficients in the new space is de-correlated. This means that the covariance matrix of the new vectors is a diagonal matrix whose elements along the diagonal are eigen-values of the covariance matrix of the original input vectors. Let ei and , i = 1, 2, ... n, be eigen-vectors and eigenvalues of cx, the covariance matrix for input vector x, and those corresponding eigen-values are arranged in a descending order so that , for i =1, 2, ... n-1. To extract the principal components, K eigen-vectors corresponding to the K largest eigen-values in cx are normally used to construct the K-L transform matrix, [AK], in which all rows are formed by the eigen-vectors of cx. In addition, all eigen-vectors in [AK] are ordered in such a way that the first row of [AK] is the eigen-vector corresponding to the largest eigen-value, and the last row is the eigen-vector corresponding to the smallest eigen-value. Hence, the forward K-L transform or encoding can be defined as:

                                [y] = [AK]([x] - [mx])                  (5)

and the inverse K-L transform or decoding can be defined as:

                                 [] = [AK]T[y] + [mx]                (6)

where [mx] is the mean value of [x] and [] represents the reconstructed vectors or image blocks. Thus the mean square error between x and is given by the following equation:

                                     (7)

where the statistical mean value E{.} is approximated by the average value over all the input vector samples which, in image coding, are all the non-overlapping blocks of 44 or 88 pixels.

Therefore, by selecting the K eigen-vectors associated with the largest eigen-values to run K-L transform over input image pixels, the resulting errors between reconstructed image and original one can be minimised due to the fact that the values of ’s decrease monotonically.

From the comparison between the equation pair (3-4) and the equation pair (5-6), it can be concluded that the linear neural network reaches the optimum solution whenever the following condition is satisfied:

                                [W’][W]T=[AK]T[AK]                  (8)

Under this circumstance, the neurone weights from input to hidden and from hidden to output can be described respectively as follows:

                                [W’] = [AK][U]-1; [W]T = [U][AK]T                  (9)

where [U] is an arbitrary KK matrix and [U][U]-1 gives an identity matrix of KK. Hence, it can be seen that the linear neural network can achieve the same compression performance as that of K-L transform without necessarily obtaining its weight matrices being equal to [AK]T and [AK].

2. Hierarchical Back-Propagation Neural Network

The basic back-propagation network can be further extended to construct a hierarchical neural network by adding two more hidden layers into the existing network, in which the three hidden layers are termed as combiner layer, compressor layer and decombiner layer. The structure can be shown in Figure 2. The idea is to exploit correlation between pixels by inner hidden layer and to exploit correlation between blocks of pixels by outer hidden layers. From input layer to combiner layer and decombiner layer to output layer, local connections are designed which has the same effect as M fully connected neural sub-networks.

                                        wpe2.jpg (31302 bytes)

Training of such a neural network can be conducted in terms of: (i) Outer Loop Neural Network (OLNN) Training; (ii) Inner Loop Neural Network (ILNN) Training; and (iii) Coupling weight allocation for the Overall Neural Network.

3. Adaptive Back-Propagation Neural Network

Adaptive back-propagation neural network is designed to make the neural network compression adaptive to the content of input image. The general structure for a typical adaptive scheme can be illustrated in Fig. 3, in which a group of neural networks with increasing number of hidden neurones, (hmin, hmax), is designed. The basic idea is to classify the input image blocks into a few sub-sets with different features according to their complexity measurement. A fine tuned neural network then compresses each sub-set.

                                        wpe5.jpg (18838 bytes)

Training of such a neural network can be designed as: (a) parallel training; (b) serial training; and (c) activity based training;

The parallel training scheme applies the complete training set simultaneously to all neural networks and use S/N (signal-to-noise) ratio to roughly classify the image blocks into the same number of sub-sets as that of neural networks. After this initial coarse classification is completed, each neural network is then further trained by its corresponding refined sub-set of training blocks.

Serial training involves an adaptive searching process to build up the necessary number of neural networks to accommodate the different patterns embedded inside the training images. Starting with a neural network with pre-defined minimum number of hidden neurones, hmin, the neural network is roughly trained by all the image blocks. The S/N ratio is used again to classify all the blocks into two classes depending on whether their S/N is greater than a pre-set threshold or not. For those blocks with higher S/N ratios, further training is started to the next neural network with the number of hidden neurones increased and the corresponding threshold readjusted for further classification. This process is repeated until the whole training set is classified into a maximum number of sub-sets corresponding to the same number of neural networks established.

In the next two training schemes, extra two parameters, activity A(Pl) and four directions, are defined to classify the training set rather than using the neural networks. Hence the back propagation training of each neural network can be completed in one phase by its appropriate sub-set.

The so called activity of the lth block is defined as:

                                                     (10)

and                  (11)

where Ap(Pl(i,j)) is the activity of each pixel which concerns its neighbouring 8 pixels as r and s vary from -1 to +1 in equation (11).

Prior to training, all image blocks are classified into four classes according to their activity values which are identified as very low, low, high and very high activities. Hence four neural networks are designed with increasing number of hidden neurones to compress the four different sub-sets of input images after the training phase is completed.

On top of the high activity parameter, further feature extraction technique is applied by considering four main directions presented in image details, i.e., horizontal, vertical and the two diagonal directions. These preferential direction features can be evaluated by calculating the values of mean squared differences among neighbouring pixels along the four directions.

For those image patterns classified as high activity, further four neural networks corresponding to the above directions are added to refine their structures and tune their learning processes to the preferential orientations of the input. Hence, the overall neural network system is designed to have six neural networks among which two correspond to low activity and medium activity sub-sets and other four networks correspond to the high activity and four direction classifications.

4. Hebbian Learning Based Image Compression

While the back-propagation based narrow-channel neural network aim at achieving compression upper bounded by K-L transform, a number of Hebbian learning rules have been developed to address the issue how the principal components can be directly extracted from input image blocks to achieve image data compression. The general neural network structure consists of one input layer and one output layer. Hebbian learning rule comes from Hebb’s postulation that if two neurones were very active at the same time which is illustrated by the high values of both its output and one of its inputs, the strength of the connection between the two neurones will grow or increase. Hence, for the output values expressed as [h] = [w]T[x], the learning rule can be described as:

                                                         (12)

where, Wi(t+1) = {wi1, wi2, ... wiN} - the ith new coupling weight vector in the next cycle (t+1); 1iM and M is the number of output neurones.

- learning rate; hi(t) - ith output value; X(t) - input vector, corresponding to each individual image block.

- Euclidean norm used to normalise the updated weights and make the learning stable.

From the basic learning rule, a number of variations have been developed in the existing research.

5. Vector Quantization Neural Networks

Since neural networks are capable of learning from input information and optimising itself to obtain the appropriate environment for a wide range of tasks, a family of learning algorithms have been developed for vector quantization.  The input vector is constructed from a K-dimensional space. M neurones are designed to compute the vector quantization code-book in which each neurone relates to one code-word via its coupling weights. The coupling weight, {wij}, associated with the i’th neurone is eventually trained to represent the code-word ci in the code-book. As the neural network is being trained, all the coupling weights will be optimised to represent the best possible partition of all the input vectors. To train the network, a group of image samples known to both encoder and decoder is often designated as the training set, and the first M input vectors of the training data set are normally used to initialise all the neurones. With this general structure, various learning algorithms have been designed and developed such as Kohonen’s self-organising feature mapping, competitive learning, frequency sensitive competitive learning, fuzzy competitive learning, general learning, and distortion equalised fuzzy competitive learning and PVQ (predictive VQ) neural networks.

Let Wi(t) be the weight vector of the i’th neurone at the t’th iteration, the basic competitive learning algorithm can be summarised as follows:

                                                     (13)

                                                                     (14)

where d(x, Wi(t)) is the distance in L2 metric between input vector x and the coupling weight vector Wi(t) = {wi1, wi2, ... wiK}; K = pp; is the learning rate, and zi is its output.

A so called under-utilisation problem occurs in competitive learning which means some of the neurones are left out of the learning process and never win the competition. Various schemes are developed to tackle this problem. Kohonen self-organising neural network overcomes the problem by updating the winning neurone as well as those neurones in its neighbourhood.

Frequency sensitive competitive learning algorithm addresses the problem by keeping a record of how frequent each neurone is the winner to maintain that all neurones in the network are updated an approximately equal number of times. To implement this scheme, the distance is modified to include the total number of times that the neurone i is the winner. The modified distance measurement is defined as:

                                                 (15)

where ui(t) is the total number of winning times for neurone i up to the t’th training cycle. Hence, the more the ith neurone wins the competition, the greater its distance from the next input vector. Thus, the chance of winning the competition diminishes. This way of tackling the under-utilisation problem does not provide interactive solutions in optimising the code-book.

Around the competitive learning scheme, fuzzy membership functions are introduced to control the transition from soft to crisp decisions during the code-book design process. The essential idea is that one input vector is assigned to a cluster only to a certain extent rather than either ‘in’ or ‘out’. The fuzzy assignment is useful particularly at earlier training stages which guarantees that all input vectors are included in the formation of new code-book represented by all the neurone coupling weights. Representative examples include direct fuzzy competitive learning, fuzzy algorithms for learning vector quantization and distortion equalised fuzzy competitive learning algorithm etc.

6. Predictive Coding Neural Networks

Predictive coding has been proved a powerful technique in de-correlating input data for speech compression and image compression where a high degree of correlation is embedded among neighbouring data samples. Although general predictive coding is classified into various models such as AR and ARMA etc., auto-regressive model (AR) has been successfully applied to image compression. Hence, predictive coding in terms of applications in image compression can be further classified into linear and non-linear AR models. Conventional technology provides a mature environment and well developed theory for predictive coding which is represented by LPC (linear predictive coding) PCM(pulse code modulation), DPCM(delta PCM) or their modified variations. Non-linear predictive coding, however, is very limited due to the difficulties involved in optimising the coefficients extraction to obtain the best possible predictive values. Under this circumstance, neural network provides a very promising approach in optimising non-linear predictive coding.

With linear AR model, predictive coding can be described by the following equation:

                                                                (16)

where p represents the predictive value for the pixel Xn which is to be encoded in the next step. Its neighbouring pixels, Xn-1, Xn-2, ... Xn-N, are used by the linear model to produce the predictive value. vn stands for the errors between the input pixel and its predictive value. vn can also be modelled by a set of zero-mean independent and identically distributed random variables.

Based on the above linear AR model, a multi-layer perceptron neural network can be constructed to achieve the design of its corresponding non-linear predictor as shown in Fig. 4. For the pixel Xn which is to be predicted, its N neighbouring pixels obtained from its predictive pattern are arranged into one dimensional input vector X = {Xn-1, Xn-2, ... Xn-N} for the neural network. A hidden layer is designed to carry out back propagation learning for training the neural network. The output of each neurone, say the j’th neurone, can be derived from the equation given below:

                                                             (17)

where f(v)=is a sigmoid transfer function.

                                                 wpe6.jpg (14213 bytes)

To predict those drastically changing features inside images such as edges, contours etc., high-order terms are added to improve the predictive performance. This corresponds to a non-linear AR model expressed as follows:

                                             (18)

Hence, another so called functional link type neural network can be designed to implement this type of non-linear AR model with high-order terms. The structure of the network is illustrated in Fig. 5. It contains only two layers of neurones, one for input and the other for output. Coupling weights, {wi}, between input layer and output layer are trained towards minimising the residual energy which is defined as:

                        RE =                              (19)

where is the predictive value for the pixel Xn.

                                   wpe7.jpg (16015 bytes)

Back to Research Activities