MyHDL GSOC 2016: June 2016

Sunday, June 26, 2016

Extended Modularity and Scalability

Today, I extended my 1d-dct and 2d-dct modules to support NxN block transformations. The 1D-DCT can now transform N-element vectors and the 2D-DCT can transform NxN blocks. There were a lot of issues to deal with.

Firstly, there were some issues when I tried to use lists of signals. I integrated some of the assignments into the interfaces.

Secondly, there were some issues when I tried to use lists of interfaces. I had to define some blocks in the code which use the interfaces in order for the design to be convertible.

Sunday, June 19, 2016

Midterm Evaluation

The midterm evaluation comes this week, so I will write a short summary of what I have done in the previous weeks.

As I described in my proposal, during the first 3 weeks I implemented the color space conversion module in MyHDL. I wrote all the convertible test units (VHDL, MyHDL, and Verilog) to prove the correct behavior of the module. Moreover, I familiarized myself with the travis-ci, landscape and coverage tools. I changed some files in the branch to make travis-ci work and to increase the coverage and health of my branch. Finally, the branch in which I developed my module was merged into the original repository.

In the 4th week, I experimented with different designs of the 2d-dct and chose a simple and straightforward architecture. In the previous posts, I described the design which I used and some of the DCT theory. In my working branch of the 2d-dct, I uploaded the 1d-dct module with its convertible test benches. The 1d-dct works fine, and its correct behavior is shown by the test units and the different tests.

In the following days, I will upload the full working 2d-dct module with the respective test units.

As Christopher proposed, we should use a minimal flow for our design and, if possible, avoid complex FSMs. Thus, the flow in the frontend part of the encoder is kept as minimal as possible. Each pixel is fed serially into the first module (Color Space Conversion), row by row, for each 8x8 block of the image. The outputs of the color space conversion module (Y, Cb, Cr) are fed in parallel into three 2D-DCT modules, and each 2D-DCT module, after some cycles, outputs in parallel the transformed 8x8 block, which consists of 64 signals. The positions of the 64 signals in each of the three 8x8 blocks should then be rearranged according to the zig-zag ordering. Then, the 3 blocks are passed to the backend part for processing.
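The zig-zag reordering mentioned above can be sketched in plain Python. This is only an illustration, assuming the conventional JPEG scan order (anti-diagonals traversed in alternating directions); the function names `zigzag_order` and `zigzag_reorder` are hypothetical and are not taken from the project code.

```python
def zigzag_order(n=8):
    """Return row-major indices of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(i * n + j for i, j in cells)
    return order


def zigzag_reorder(block):
    """Flatten a square block (list of rows) into a zig-zag ordered list."""
    n = len(block)
    flat = [value for row in block for value in row]
    return [flat[k] for k in zigzag_order(n)]
```

In hardware this is pure wiring: output signal k is simply connected to input signal `zigzag_order()[k]`, so no extra logic or storage is implied by the sketch.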

2D-DCT Part 2

Now, I think it's time to present in detail the implementation of the 2d-dct.

In the previous post, I described that I used the row-column decomposition approach, which uses two 1d-dcts and relies on the following equation: Z=A*X*A^T. The first 1d-dct takes each input serially, outputs the 8-signal vector in parallel and implements the following equation: Y = A * X^T. X is the 8x8 block and A is the DCT coefficient matrix as described in the previous post. The 1d-dct module consists of 8 multipliers in parallel, 8 adders and some control signals. Because the inputs enter the module serially, in each cycle the multipliers take the coefficients of one column of A and multiply them by the current input. After the multiplication, each product goes to an adder, which adds it to the previously stored result of that adder. So in 8 cycles plus the latency of the pipeline stages we have the first 8-signal vector, which is the result of the 1d-dct. In the following picture we can see the waveform of the signals of the 1d-dct module:

In the above waveform we can see that we take each input serially and we output the 8-signal vector in parallel after some cycles. For the first row the output is computed after 8+3=11 cycles, where the 3 extra cycles come from the pipeline stages; for the other rows the output is valid after 8 cycles. Each time a row has been computed, the data_valid signal goes high for one cycle. This signal is used by the other 1d-dcts in the final 2d-dct module.

The 1D-DCT module was implemented in MyHDL and tested for its correct behavior using test units.
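The serial multiply-accumulate scheme of the 1d-dct can be modeled behaviorally in a few lines of Python. This is a floating-point sketch, not the MyHDL module: it assumes the orthonormal DCT-II definition for A, ignores the pipeline latency and fixed-point widths, and the names `dct_matrix` and `serial_dct_1d` are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix A (assumed convention)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def serial_dct_1d(samples, a):
    """Consume one sample per 'cycle' and return the n accumulated outputs."""
    n = len(a)
    acc = [0.0] * n  # one accumulator per multiplier/adder pair
    for i, x in enumerate(samples):  # input i arrives in cycle i
        for k in range(n):  # these n multiply-adds run in parallel in hardware
            acc[k] += a[k][i] * x  # column i of A times the current input
    return acc  # valid after the last sample of the row
```

For example, `serial_dct_1d([1] * 8, dct_matrix())` leaves all the energy in the first (DC) output, since a constant row has no higher-frequency content.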

Now, it's time to describe the final 2D-DCT module. After the first 1d-dct module we have the result for each vector in parallel, and the result is Y = A * X^T. In the next 1d-dct stage we must implement the following equation: Z = A * (A * X^T)^T. In order to implement this equation we use 8 1d-dct modules, each of which computes one row of the final 2d-dct result. Each of these 1d-dcts takes one of the 8 output signals of the first stage, and together they output the final result, which is an 8x8 matrix, in parallel. This implementation has some advantages. The flow of the inputs never stops, there is no need for additional storage for each block, and there is no need for complex finite state machines, since the control signals are integrated in the design. However, this approach has higher resource utilization than some other methods. We need 72 multipliers and 72 adders (8 in the first stage plus 8x8 in the second stage). In the following pictures I will present some waveforms of the signals of the 2D-DCT module.

In the above waveform we can see that each input is converted to a signed signal after one cycle and then the value 128 is subtracted after another cycle. After two cycles the input is valid and is fed into the first 1d-dct stage.

In the above waveform we can see that the output signals of the 1st stage are used in each of the eight 1d-dct blocks. As we can see, the data valid signal is active for one clock cycle. This signal makes the 8 1d-dct modules of the 2nd stage compute only when it is high. Thus, when all the rows have been processed we have the output of the 2d-dct. In the following waveform some of the outputs of the 2d-dct are shown.

The final 2d-dct module was verified for its correct behavior with the use of test units.
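The two-stage data flow described in this post can be sketched as a behavioral Python model. It is a floating-point illustration under assumptions: A is taken to be the orthonormal DCT-II matrix, pipeline latency and the level shift by 128 are left out, and the names `dct_matrix`, `dct_2d_two_stage` and `multiplier_count` are hypothetical rather than taken from the MyHDL sources.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix A (assumed convention)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]


def dct_2d_two_stage(block, a):
    """Stream the block row by row through one 1d-dct feeding n 1d-dcts."""
    n = len(a)
    # Second stage: n units, each holding n accumulators.
    stage2 = [[0.0] * n for _ in range(n)]
    for r, row in enumerate(block):
        # First stage: after the n samples of row r, y holds A * row.
        y = [sum(a[k][i] * row[i] for i in range(n)) for k in range(n)]
        # data_valid pulse: unit k consumes y[k] as its r-th serial input.
        for k in range(n):
            for m in range(n):
                stage2[k][m] += a[m][r] * y[k]
    # Unit k ends up holding Z[m][k] for m = 0..n-1.
    return [[stage2[k][m] for k in range(n)] for m in range(n)]


def multiplier_count(n=8):
    """One first-stage unit plus n second-stage units, n multipliers each."""
    return n + n * n
```

Because every second-stage unit only accumulates, no transpose memory is needed between the stages, which matches the claim above that the input flow never stops. `multiplier_count(8)` reproduces the figure of 72.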

Wednesday, June 15, 2016

2D-DCT Part 1

The forward 2D-DCT is computed from the following equation:
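The equation was an image taken from the referenced paper. Assuming the standard orthonormal form of the 2D DCT-II (the paper's scaling may differ), and using the symbols x(i, j), X(m, n) and N from this post, it reads:

```latex
X(m,n) = \frac{2}{N}\, c(m)\, c(n)
         \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j)\,
         \cos\!\left(\frac{(2i+1)\,m\,\pi}{2N}\right)
         \cos\!\left(\frac{(2j+1)\,n\,\pi}{2N}\right),
\qquad
c(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```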

In the previous equation N is the block size (in our case the block is 8x8, so N=8), x(i, j) is the input sample and X(m,n) is the DCT-transformed matrix.

A straightforward implementation of the previous equation requires N⁴ multiplications. However, the DCT is a separable transform and it can be expressed in matrix notation (the row-column decomposition) as two 1D-DCTs, which need only 2N³ multiplications, as follows:

Z=A*X*A^T

The matrix A is an NxN (8x8) matrix whose basis vectors are sampled cosines defined as:
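The definition was an image taken from the referenced paper. Assuming the orthonormal DCT-II convention (an assumption, chosen so that Z = A*X*A^T matches the standard forward 2D-DCT), the entries of A are:

```latex
A_{m,i} = \sqrt{\frac{2}{N}}\; c(m)\,
          \cos\!\left(\frac{(2i+1)\,m\,\pi}{2N}\right),
\qquad m, i = 0, \dots, N-1,
\qquad
c(m) = \begin{cases} 1/\sqrt{2}, & m = 0 \\ 1, & m > 0 \end{cases}
```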

Each NxN (8x8) matrix-matrix multiply is separated into N (8) matrix-vector products. A standard block diagram of a 2D-DCT processor with the use of 1D-DCT units is shown in the following figure:

The first unit computes Y = A*X and the second computes Z = Y*A^T. Each 1D-DCT unit must be capable of computing N multiplies per input sample.

The first 1D-DCT unit operates on rows of A and columns of X. However, the input arrives row by row, so in each cycle we have an element of a row of X instead of an element of a column. So it is wiser to compute first the product Y = A * X^T. Thus, in the second stage we have to perform a 1D-DCT in order to compute the product Z = A*Y^T. Since (A * X^T)^T = X * A^T, it follows that Z = A * (A * X^T)^T equals Z = A * X * A^T.

I wrote a simple script in Python to confirm that the matrix multiplications are equal to the final 2D-DCT. I confirmed the results of the row-column decomposition approach against the results of the built-in MATLAB dct() function using a test block.

In the next part, I will describe in detail my implementation of the 2D-DCT in MyHDL.
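A check of this kind can be sketched in pure Python as follows. This is not the original script: it assumes the orthonormal DCT-II definitions for A and for the direct double-sum formula, uses an arbitrary test block, and all helper names are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix A (assumed convention)."""
    return [[math.sqrt((1.0 if m == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * m * math.pi / (2 * n))
             for i in range(n)] for m in range(n)]


def transpose(p):
    return [list(col) for col in zip(*p)]


def matmul(p, q):
    qt = transpose(q)
    return [[sum(a * b for a, b in zip(row, col)) for col in qt] for row in p]


def dct_2d_direct(x):
    """Direct evaluation of the double sum (N^4 multiplications)."""
    n = len(x)

    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    def cos_term(i, m):
        return math.cos((2 * i + 1) * m * math.pi / (2 * n))

    return [[2.0 / n * c(m) * c(k)
             * sum(x[i][j] * cos_term(i, m) * cos_term(j, k)
                   for i in range(n) for j in range(n))
             for k in range(n)] for m in range(n)]


def max_abs_diff(p, q):
    return max(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))


A = dct_matrix(8)
X = [[(17 * i + 5 * j) % 256 - 128 for j in range(8)] for i in range(8)]  # arbitrary test block
Z_two_pass = matmul(A, transpose(matmul(A, transpose(X))))  # A * (A * X^T)^T
Z_matrix = matmul(matmul(A, X), transpose(A))               # A * X * A^T
Z_direct = dct_2d_direct(X)
```

Comparing `max_abs_diff(Z_two_pass, Z_matrix)` and `max_abs_diff(Z_matrix, Z_direct)` against a small tolerance confirms that the three formulations agree up to floating-point rounding.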

All the above pictures and equations were taken from the paper "A 100 MHz 2-D 8x8 DCT/IDCT Processor for HDTV Applications".

Monday, June 13, 2016

2D-DCT Implementation

The third week has passed, and the color space conversion module with its unit tests was merged into the original repository.

These days, I figured out how to implement the 2D-DCT in a simple and straightforward way. The implementation follows the row-column decomposition method.

First I created and tested the 1D-DCT module and then I created and tested the final 2D-DCT module. In the following posts, I will explain my design in detail, with waveforms, a lot of interesting stuff and some theory about the DCT.

Thursday, June 9, 2016

Created separate tests and improved code quality

As Chris pointed out, I split the original test unit into two separate tests. The first test checks the color conversion module with the MyHDL simulator, while the second test checks the outputs of the converted testbench in Verilog and VHDL against the outputs of the MyHDL simulator.

Moreover, I improved the code quality of rgb2ycbcr.py in order to increase the health of the module as reported by landscape.io.

As the 3rd week approaches, I have to read about the 2D-DCT module and how to implement it in MyHDL.

Saturday, June 4, 2016

Code Refactoring

As of today, I have refactored all the code that I submitted in the previous commits. Following what Christopher Felton showed me, the code for the color space conversion and the test benches was refactored accordingly. However, there are still some changes to be made in order for the code of the test unit to be more readable. I created a separate branch in which I edit my reworked code and submitted a PR: https://github.com/mkatsimpris/test_jpeg/pull/4

These days I learned a lot of useful new things. I learned about Travis-CI and how to use it in projects on GitHub, and about some other useful tools like landscape and coveralls. The experience I gained through experimenting with these tools is invaluable. I managed to get my branch to build with some changes in travis.yml and some scripts that install GHDL.

There is still some work to be done regarding the code of the color space conversion, and with the help of Christopher I think that in the following week the PR will be mergeable into the main repo.

Finally, I believe that it's a good time to start reading about the 2D-DCT and start a discussion with Chris and Nikos about the architecture that we will use.