Now, I think it's time to present the implementation of the 2D-DCT in detail.
In the previous post, I explained that I use the row-column decomposition approach, which computes the 2D-DCT with two 1D-DCT stages according to the equation $Z = A X A^T$. The first 1D-DCT stage takes its inputs serially, one per clock cycle, outputs an 8-element vector in parallel, and implements the equation $Y = A X^T$, where $X$ is the 8x8 input block and $A$ is the DCT coefficient matrix described in the previous post. The 1D-DCT module consists of 8 multipliers working in parallel, 8 adders and some control signals. Because the inputs arrive serially, each incoming input is multiplied by the 8 coefficients of the corresponding column of $A$, one coefficient per multiplier. Each product then goes to an adder, which adds it to the result previously stored by that adder (a multiply-accumulate). So after 8 cycles, plus the latency of the pipeline stages, we have the first 8-element output vector of the 1D-DCT. In the following picture we can see the waveform of the signals of the 1D-DCT module:
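The multiply-accumulate scheme just described can be illustrated with a minimal behavioral sketch in plain Python. It is not the MyHDL code of the module: the names `dct_matrix` and `serial_dct_1d` are hypothetical, and the sketch assumes the orthonormal DCT-II form of $A$ in floating point, whereas the hardware presumably uses fixed-point coefficients.

```python
import math

N = 8  # block size

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix A[k][j] (assumption: the real design may
    use scaled, integer-quantized coefficients instead)."""
    a = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        a.append([c * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return a

def serial_dct_1d(row, a):
    """Behavioral model of the serial-in / parallel-out 1D-DCT.

    One input sample arrives per clock cycle; the 8 multiply-accumulate
    lanes (8 multipliers + 8 adders) all update on that same cycle.
    """
    acc = [0.0] * N                 # one accumulator register per lane
    for j, x in enumerate(row):     # clock cycle j: sample X[i][j] arrives
        for k in range(N):          # in hardware these 8 lanes run in parallel
            acc[k] += a[k][j] * x   # lane k uses A[k][j], i.e. column j of A
    return acc                      # after 8 cycles: column i of Y = A * X^T

A = dct_matrix()
```

For a constant input row only the first (DC) lane ends up non-zero, because every other row of $A$ is orthogonal to a constant vector.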
In the above waveform we can see that the inputs are taken serially and the 8-element output vector is produced in parallel some cycles later. For the first row the output is ready after 8 + 3 = 11 cycles, where the 3 extra cycles are the latency of the pipeline stages; after that, a new output vector is valid every 8 cycles. Each time a row has been computed, the data_valid signal is high for one cycle. This signal is used by the other 1D-DCTs in the final 2D-DCT module.
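The timing of data_valid can be illustrated with a small cycle-level model. This is a sketch under stated assumptions, not the actual control logic: cycle 1 is taken to be the clock on which the first input is sampled, the 3 pipeline stages are modeled as a plain delay line, and the function name `data_valid_cycles` is hypothetical.

```python
from collections import deque

ROW_LEN = 8        # inputs per row, one per clock
PIPE_LATENCY = 3   # extra pipeline stages, as stated in the post

def data_valid_cycles(num_rows, row_len=ROW_LEN, latency=PIPE_LATENCY):
    """Return the cycles on which data_valid is high (assumed convention:
    cycle 1 is the clock on which the first input is sampled)."""
    pipe = deque([False] * latency)   # delay line of 'row finished' flags
    valid_at = []
    last_input = num_rows * row_len
    for cycle in range(1, last_input + latency + 1):
        # the accumulation of a row completes on every 8th input cycle
        row_done = cycle <= last_input and cycle % row_len == 0
        pipe.append(row_done)         # flag enters the pipeline
        if pipe.popleft():            # and leaves it 'latency' cycles later
            valid_at.append(cycle)
    return valid_at
```

With 8 rows this gives strobes at cycles 11, 19, 27, ..., 67: the 8 + 3 = 11 cycle startup followed by one strobe every 8 cycles.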
The 1D-DCT module was implemented in MyHDL, and its behavior was verified with unit tests.
Now it's time to describe the final 2D-DCT module. The first 1D-DCT stage delivers each result vector in parallel, and this result is $Y = A X^T$. The second stage must implement the equation $Z = A (A X^T)^T = A X A^T$. To implement this equation we use 8 1D-DCT modules, each of which computes one row of the final 2D-DCT result. Each of these modules takes one of the 8 parallel outputs of the first stage as its serial input, and together they output the final 8x8 result matrix in parallel. This implementation has some advantages: the flow of inputs never stops, no additional storage (such as a transpose buffer) is needed for each block between the two stages, and no complex finite state machine is needed, since the control signals are integrated in the design. However, this approach needs more hardware resources than some other methods: with 9 1D-DCT modules (1 in the first stage, 8 in the second) of 8 multipliers and 8 adders each, we need 72 multipliers and 72 adders. In the following pictures I present some waveforms of the signals of the 2D-DCT module.
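The dataflow of the complete two-stage architecture can be sketched at transaction level, ignoring clocking and pipeline latency. This is a hypothetical Python model, not the MyHDL source: the class `Dct1d` and the function `dct_2d_two_stage` are illustrative names, and floating-point orthonormal coefficients are assumed. It shows how lane k of the first stage feeds second-stage module k, and it reproduces the 9 x 8 = 72 multiplier count. Whether a module's output is read as a row or a column of the result depends on how the output block is indexed; in this sketch module k produces the values Z[m][k].

```python
import math

N = 8

def dct_matrix(n=N):
    # Assumed orthonormal DCT-II coefficients (floating point for clarity).
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)]
            for k in range(n)]

class Dct1d:
    """One serial-in / parallel-out 1D-DCT: 8 multipliers + 8 accumulators."""
    MULTIPLIERS = N

    def __init__(self, a):
        self.a = a
        self.acc = [0.0] * N
        self.j = 0                       # index of the next serial input

    def push(self, x):
        """Absorb one serial input; return the output vector after 8 inputs."""
        for k in range(N):
            self.acc[k] += self.a[k][self.j] * x
        self.j += 1
        if self.j == N:                  # vector complete: emit and restart
            out, self.acc, self.j = self.acc, [0.0] * N, 0
            return out
        return None

# 1 first-stage module + 8 second-stage modules, 8 multipliers each
TOTAL_MULTIPLIERS = (1 + N) * Dct1d.MULTIPLIERS

def dct_2d_two_stage(x, a):
    stage1 = Dct1d(a)
    stage2 = [Dct1d(a) for _ in range(N)]  # one module per stage-1 output lane
    lines = [None] * N
    for row in x:                          # pixels stream in row by row
        for pixel in row:
            y = stage1.push(pixel)
            if y is not None:              # corresponds to the data_valid pulse
                for k in range(N):         # lane k feeds second-stage module k
                    out = stage2[k].push(y[k])
                    if out is not None:
                        lines[k] = out     # lines[k][m] == Z[m][k]
    return [[lines[k][m] for k in range(N)] for m in range(N)]

A = dct_matrix()
```

A constant 8x8 block of value v should produce 8v in Z[0][0] and (numerically) zero elsewhere, which is a quick sanity check for the model.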
In the above waveform we can see that each input is converted to a signed signal after one cycle, and then 128 is subtracted from it in the next cycle. So after two cycles the input is valid and is fed into the first 1D-DCT stage.
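The input stage can be pictured as two registers in series. The sketch below is a hypothetical Python model (`level_shift_pipeline` is an illustrative name) that assumes 8-bit unsigned pixels; subtracting 128 centers them in the signed range -128..127, the same level shift used in baseline JPEG. `None` marks cycles on which the output is not yet valid.

```python
def level_shift_pipeline(pixels):
    """Two-register sketch of the input stage (assumes 8-bit unsigned pixels).

    Register 1 holds the pixel as a signed value; register 2 holds that
    value minus 128. Returns the register-2 contents after each clock.
    """
    reg_signed = None    # stage 1: pixel reinterpreted as a signed value
    reg_shifted = None   # stage 2: pixel after subtracting 128
    outputs = []
    for p in list(pixels) + [None, None]:   # two idle cycles flush the pipe
        # both registers update on the same clock edge, so stage 2 must
        # read the *old* stage-1 value before stage 1 is overwritten
        reg_shifted = reg_signed - 128 if reg_signed is not None else None
        reg_signed = int(p) if p is not None else None
        outputs.append(reg_shifted)
    return outputs
```

Every value passes through both registers before it reaches the first 1D-DCT stage, which is where the two cycles of input latency come from in this model.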
In the above waveform we can see that the output signals of the first stage feed the eight 1D-DCT blocks of the second stage, one output per block. As we can see, the data_valid signal is active for one clock cycle. This signal enables the eight 1D-DCT modules of the second stage, so that they compute only when it is high. Thus, when all the rows have been processed, we have the output of the 2D-DCT. In the following waveform some of the outputs of the 2D-DCT are shown.
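The enable behavior of a second-stage module can be sketched at cycle level. Again this is a hypothetical Python model rather than the MyHDL source: `GatedDct1d` and `run_with_strobe` are illustrative names, floating-point coefficients are assumed, and the default strobe period of 8 cycles mirrors the data_valid timing described for the first stage.

```python
import math

N = 8

def dct_matrix(n=N):
    # Assumed orthonormal DCT-II coefficients (floating point for clarity).
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)]
            for k in range(n)]

class GatedDct1d:
    """Cycle-level sketch of one second-stage 1D-DCT module whose
    registers change only when `enable` (data_valid) is high."""

    def __init__(self, a):
        self.a = a
        self.acc = [0.0] * N     # accumulator registers
        self.count = 0           # enabled samples absorbed so far
        self.out = None          # last completed output vector
        self.out_valid = False   # high for one call when `out` is updated

    def clock(self, x, enable):
        self.out_valid = False
        if not enable:           # strobe low: hold every register, ignore x
            return
        for m in range(N):
            self.acc[m] += self.a[m][self.count] * x
        self.count += 1
        if self.count == N:      # eighth enabled sample: vector complete
            self.out, self.out_valid = self.acc, True
            self.acc, self.count = [0.0] * N, 0

def run_with_strobe(values, gap=7, junk=999.0):
    """Feed one value per strobe, with `gap` disabled cycles of junk between."""
    mod = GatedDct1d(dct_matrix())
    for v in values:
        for _ in range(gap):
            mod.clock(junk, enable=False)
        mod.clock(v, enable=True)
    return mod.out
```

Because the registers are held while the strobe is low, whatever appears on the input between strobes has no effect on the result.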
The final 2D-DCT module was verified with unit tests.