Digital Imaging & Data Compression
|
|
Image Compression with Neural Networks Overview Apart from the existing technology on image compression represented by series of JPEG, MPEG, and H.26x standards, new technology such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have now become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting with those traditional compression techniques. Existing research can be summarised as follows:
1. Basic Back-Propagation Neural Network The neural network
structure can be illustrated in Fig.1. Three layers, one input layer, one output layer and
one hidden layer, are designed. Both input layer and output layer are fully connected to
the hidden layer. Compression is achieved by designing the value of K, the number
of neurones at the hidden layer, less than that of neurones at both input and output
layers. The input image is split up into blocks or vectors of 8
for encoding and
for decoding. where xi
The above linear networks can also be designed into non-linear if a transfer function such as sigmoid is added to the hidden layer and the output layer to scale the summation down in the above equations. There is no proof, however, that the non-linear network can provide better solution than its linear counterpart. With this basic back-propagation neural network, compression is conducted in two phases: training and encoding. In the first phase, a set of image samples is designed to train the network via back-propagation learning rule which uses each input vector as the desired output. This is equivalent to compressing the input into the narrow channel represented by the hidden layer and then reconstructing the input from the hidden to the output layer. The second phase simply involves the entropy coding of the state vector hj at the hidden layer. In cases that adaptive training is conducted, the entropy coding of those coupling weights may also be required in order to catch up with some input characteristics that are not encountered at the training stage. The entropy coding is normally designed as the simple fixed length binary coding although many advanced variable length entropy coding algorithms are available. This neural network development, in fact, is in the direction of K-L transform technology, which actually provides the optimum solution for all linear narrow channel type of image compression neural networks. When equations (1) and (2) are represented in matrix form, we have [h] = [W]T[x] (3)
for encoding and decoding. The K-L transform maps
input images into a new vector space where all the coefficients in the new space is
de-correlated. This means that the covariance matrix of the new vectors is a diagonal
matrix whose elements along the diagonal are eigen-values of the covariance matrix of the
original input vectors. Let ei and [y] = [AK]([x] - [mx]) (5) and the inverse K-L transform or decoding can be defined as:
[ where [mx]
is the mean value of [x] and [
where the statistical mean
value E{.} is approximated by the average value over all the input vector samples
which, in image coding, are all the non-overlapping blocks of 4 Therefore, by selecting the
K eigen-vectors associated with the largest eigen-values to run K-L transform over
input image pixels, the resulting errors between reconstructed image and original one can
be minimised due to the fact that the values of From the comparison between the equation pair (3-4) and the equation pair (5-6), it can be concluded that the linear neural network reaches the optimum solution whenever the following condition is satisfied: [W][W]T=[AK]T[AK] (8) Under this circumstance, the neurone weights from input to hidden and from hidden to output can be described respectively as follows: [W] = [AK][U]-1; [W]T = [U][AK]T (9) where [U] is an
arbitrary K 2. Hierarchical Back-Propagation Neural Network The basic back-propagation network can be further extended to construct a hierarchical neural network by adding two more hidden layers into the existing network, in which the three hidden layers are termed as combiner layer, compressor layer and decombiner layer. The structure can be shown in Figure 2. The idea is to exploit correlation between pixels by inner hidden layer and to exploit correlation between blocks of pixels by outer hidden layers. From input layer to combiner layer and decombiner layer to output layer, local connections are designed which has the same effect as M fully connected neural sub-networks.
Training of such a neural network can be conducted in terms of: (i) Outer Loop Neural Network (OLNN) Training; (ii) Inner Loop Neural Network (ILNN) Training; and (iii) Coupling weight allocation for the Overall Neural Network. 3. Adaptive Back-Propagation Neural Network Adaptive back-propagation neural network is designed to make the neural network compression adaptive to the content of input image. The general structure for a typical adaptive scheme can be illustrated in Fig. 3, in which a group of neural networks with increasing number of hidden neurones, (hmin, hmax), is designed. The basic idea is to classify the input image blocks into a few sub-sets with different features according to their complexity measurement. A fine tuned neural network then compresses each sub-set.
Training of such a neural network can be designed as: (a) parallel training; (b) serial training; and (c) activity based training; The parallel training scheme applies the complete training set simultaneously to all neural networks and use S/N (signal-to-noise) ratio to roughly classify the image blocks into the same number of sub-sets as that of neural networks. After this initial coarse classification is completed, each neural network is then further trained by its corresponding refined sub-set of training blocks. Serial training involves an adaptive searching process to build up the necessary number of neural networks to accommodate the different patterns embedded inside the training images. Starting with a neural network with pre-defined minimum number of hidden neurones, hmin, the neural network is roughly trained by all the image blocks. The S/N ratio is used again to classify all the blocks into two classes depending on whether their S/N is greater than a pre-set threshold or not. For those blocks with higher S/N ratios, further training is started to the next neural network with the number of hidden neurones increased and the corresponding threshold readjusted for further classification. This process is repeated until the whole training set is classified into a maximum number of sub-sets corresponding to the same number of neural networks established. In the next two training schemes, extra two parameters, activity A(Pl) and four directions, are defined to classify the training set rather than using the neural networks. Hence the back propagation training of each neural network can be completed in one phase by its appropriate sub-set. The so called activity of the lth block is defined as:
and where Ap(Pl(i,j)) is the activity of each pixel which concerns its neighbouring 8 pixels as r and s vary from -1 to +1 in equation (11). Prior to training, all image blocks are classified into four classes according to their activity values which are identified as very low, low, high and very high activities. Hence four neural networks are designed with increasing number of hidden neurones to compress the four different sub-sets of input images after the training phase is completed. On top of the high activity parameter, further feature extraction technique is applied by considering four main directions presented in image details, i.e., horizontal, vertical and the two diagonal directions. These preferential direction features can be evaluated by calculating the values of mean squared differences among neighbouring pixels along the four directions. For those image patterns classified as high activity, further four neural networks corresponding to the above directions are added to refine their structures and tune their learning processes to the preferential orientations of the input. Hence, the overall neural network system is designed to have six neural networks among which two correspond to low activity and medium activity sub-sets and other four networks correspond to the high activity and four direction classifications. 4. Hebbian Learning Based Image Compression While the back-propagation based narrow-channel neural network aim at achieving compression upper bounded by K-L transform, a number of Hebbian learning rules have been developed to address the issue how the principal components can be directly extracted from input image blocks to achieve image data compression. The general neural network structure consists of one input layer and one output layer. Hebbian learning rule comes from Hebbs postulation that if two neurones were very active at the same time which is illustrated by the high values of both its output and one of its inputs, the strength of the connection between the two neurones will grow or increase. Hence, for the output values expressed as [h] = [w]T[x], the learning rule can be described as:
where, Wi(t+1)
= {wi1, wi2, ... wiN} - the ith new coupling
weight vector in the next cycle (t+1); 1
From the basic learning rule, a number of variations have been developed in the existing research. 5. Vector Quantization Neural Networks Since neural networks are capable of learning from input information and optimising itself to obtain the appropriate environment for a wide range of tasks, a family of learning algorithms have been developed for vector quantization. The input vector is constructed from a K-dimensional space. M neurones are designed to compute the vector quantization code-book in which each neurone relates to one code-word via its coupling weights. The coupling weight, {wij}, associated with the ith neurone is eventually trained to represent the code-word ci in the code-book. As the neural network is being trained, all the coupling weights will be optimised to represent the best possible partition of all the input vectors. To train the network, a group of image samples known to both encoder and decoder is often designated as the training set, and the first M input vectors of the training data set are normally used to initialise all the neurones. With this general structure, various learning algorithms have been designed and developed such as Kohonens self-organising feature mapping, competitive learning, frequency sensitive competitive learning, fuzzy competitive learning, general learning, and distortion equalised fuzzy competitive learning and PVQ (predictive VQ) neural networks. Let Wi(t) be the weight vector of the ith neurone at the tth iteration, the basic competitive learning algorithm can be summarised as follows:
where d(x, Wi(t))
is the distance in L2 metric between input vector x and the coupling
weight vector Wi(t) = {wi1, wi2, ... wiK};
K = p A so called under-utilisation problem occurs in competitive learning which means some of the neurones are left out of the learning process and never win the competition. Various schemes are developed to tackle this problem. Kohonen self-organising neural network overcomes the problem by updating the winning neurone as well as those neurones in its neighbourhood. Frequency sensitive competitive learning algorithm addresses the problem by keeping a record of how frequent each neurone is the winner to maintain that all neurones in the network are updated an approximately equal number of times. To implement this scheme, the distance is modified to include the total number of times that the neurone i is the winner. The modified distance measurement is defined as:
where ui(t) is the total number of winning times for neurone i up to the tth training cycle. Hence, the more the ith neurone wins the competition, the greater its distance from the next input vector. Thus, the chance of winning the competition diminishes. This way of tackling the under-utilisation problem does not provide interactive solutions in optimising the code-book. Around the competitive learning scheme, fuzzy membership functions are introduced to control the transition from soft to crisp decisions during the code-book design process. The essential idea is that one input vector is assigned to a cluster only to a certain extent rather than either in or out. The fuzzy assignment is useful particularly at earlier training stages which guarantees that all input vectors are included in the formation of new code-book represented by all the neurone coupling weights. Representative examples include direct fuzzy competitive learning, fuzzy algorithms for learning vector quantization and distortion equalised fuzzy competitive learning algorithm etc. 6. Predictive Coding Neural Networks Predictive coding has been proved a powerful technique in de-correlating input data for speech compression and image compression where a high degree of correlation is embedded among neighbouring data samples. Although general predictive coding is classified into various models such as AR and ARMA etc., auto-regressive model (AR) has been successfully applied to image compression. Hence, predictive coding in terms of applications in image compression can be further classified into linear and non-linear AR models. Conventional technology provides a mature environment and well developed theory for predictive coding which is represented by LPC (linear predictive coding) PCM(pulse code modulation), DPCM(delta PCM) or their modified variations. Non-linear predictive coding, however, is very limited due to the difficulties involved in optimising the coefficients extraction to obtain the best possible predictive values. Under this circumstance, neural network provides a very promising approach in optimising non-linear predictive coding. With linear AR model, predictive coding can be described by the following equation:
where p represents the predictive value for the pixel Xn which is to be encoded in the next step. Its neighbouring pixels, Xn-1, Xn-2, ... Xn-N, are used by the linear model to produce the predictive value. vn stands for the errors between the input pixel and its predictive value. vn can also be modelled by a set of zero-mean independent and identically distributed random variables. Based on the above linear AR model, a multi-layer perceptron neural network can be constructed to achieve the design of its corresponding non-linear predictor as shown in Fig. 4. For the pixel Xn which is to be predicted, its N neighbouring pixels obtained from its predictive pattern are arranged into one dimensional input vector X = {Xn-1, Xn-2, ... Xn-N} for the neural network. A hidden layer is designed to carry out back propagation learning for training the neural network. The output of each neurone, say the jth neurone, can be derived from the equation given below:
where f(v)=
To predict those drastically changing features inside images such as edges, contours etc., high-order terms are added to improve the predictive performance. This corresponds to a non-linear AR model expressed as follows:
Hence, another so called functional link type neural network can be designed to implement this type of non-linear AR model with high-order terms. The structure of the network is illustrated in Fig. 5. It contains only two layers of neurones, one for input and the other for output. Coupling weights, {wi}, between input layer and output layer are trained towards minimising the residual energy which is defined as:
RE = where
|