Wednesday, August 17, 2016

Work Product

The following post summarizes the overall work done during GSoC.

Completed Work

  1. Color Space Conversion Module with parallel inputs and parallel outputs interface
  2. Color Space Conversion Module with serial inputs and serial outputs interface
  3. 1D-DCT Module
  4. 2D-DCT Module
  5. Zig-Zag Scan Module
  6. Complete Frontend Part of the Encoder
  7. Block Buffer
  8. Input Block Buffer
All the above modules except 7 and 8 are merged in the main repository. Each module is verified for its correct behavior against a software reference, and additional MyHDL and VHDL/Verilog-convertible testbenches were created for each module. All the modules are modular and scalable in terms of the NxN block size they can process and the number of fractional bits they use for the fixed-point computations. The documentation for the overall project is here and can be built using make html; it can also be read online here. The documentation includes the description of each module, interface, and testbench, as well as the coverage results for each module and the synthesis results for the complete frontend part.

Github Links

      1. https://github.com/jandecaluwe/myhdl/pull/156
      2. https://github.com/cfelton/test_jpeg/pull/33
      3. https://github.com/mkatsimpris/test_jpeg/pull/2
      4. https://github.com/mkatsimpris/test_jpeg/pull/3
      5. https://github.com/cfelton/test_jpeg/pull/12
      6. https://github.com/mkatsimpris/test_jpeg/pull/4
      7. https://github.com/cfelton/test_jpeg/pull/13
      8. https://github.com/mkatsimpris/test_jpeg/pull/5
      9. https://github.com/cfelton/test_jpeg/pull/18
      10. https://github.com/cfelton/test_jpeg/pull/23
      11. https://github.com/cfelton/test_jpeg/pull/35
      12. https://github.com/cfelton/test_jpeg/pull/39
      13. https://github.com/cfelton/test_jpeg/pull/41

  • List of all PRs in the main repo: here


Friday, August 12, 2016

Documentation and Coverage Completed

The coverage, documentation, and synthesis results are in the PR. I am waiting for Chris to review them and tell me what to change.

Monday, August 8, 2016

Documentation

Today, I started writing the documentation for all of my modules and the complete frontend part. I will use Sphinx. As the backend is not ready yet, I will fill the time with this task.

Sunday, August 7, 2016

Week 11

This week I completed the convertible tests for the frontend part and for the new color converter. Vikram made a PR for the backend part, so in the next days we can integrate it with my part and complete the encoder. However, the backend still lacks complete test coverage against a software prototype. In the days until 15 August, which is the end of the coding period, I will try to finish the following tasks:

1) Complete the encoder.
2) Ensure correct behavior and convertibility.
3) Synthesize it and take some measurements (resource utilization and max frequency).
4) Write documentation for all my modules and clean up the code.

Sunday, July 31, 2016

Block Buffer

Today, I implemented the convertible testbench for the block buffer and the triple_buffer.

The triple buffer has 4 block buffers which store the data from the video source (lines of the image) and output them to the frontend part in 8x8 blocks. Each 8x8 block is output three times to the frontend. The video source and the overall design share the same clock, so in order for the input data from the video source and the output data to be read correctly, 4 buffers were used. The output of the data to the frontend is continuous. The video source can stop sending data when the stop_source signal is valid: the stop signal is True when all the buffers are full and becomes False once the 3 buffers have been read by the frontend. The next thing to be done is to add docstrings to these modules.
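As an illustrative sketch of the stop_source handshake idea (the class and method names here are hypothetical, not the actual MyHDL code):

```python
class TripleBufferModel:
    """Behavioral model of the buffer-full handshake (illustrative names)."""
    NUM_BUFFERS = 4

    def __init__(self):
        self.full = [False] * self.NUM_BUFFERS  # one flag per block buffer

    @property
    def stop_source(self):
        # The video source pauses only when every buffer holds unread data.
        return all(self.full)

    def write_block(self, index):
        self.full[index] = True   # source filled this buffer

    def read_block(self, index):
        self.full[index] = False  # frontend consumed it (after 3 reads)
```

For example, after the source fills all 4 buffers `stop_source` becomes True, and it drops back to False as soon as the frontend frees a buffer.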

Moreover, I completed the convertible testbench for the new color space converter.

Thursday, July 28, 2016

Frontend Synthesis and Block Buffer

This week I managed to eliminate the inferred latches in the design by changing the code. The unexpected latches caused a lot of timing problems. Moreover, I implemented a block buffer which takes each row serially and outputs an 8x8 block serially to the frontend. This block buffer still needs a lot of documentation and a complete convertible testbench.

In the following days I will try to complete the remaining tasks:

1) Include convertible testbenches for the changed designs of my modules.
2) Change the pytest settings as Chris mentioned.
3) Complete the block buffer module.
4) Glue the backend and the frontend parts together to complete the encoder!
5) Clean up the code and write detailed documentation for each module.

Saturday, July 23, 2016

Week 9

This week we had a lot of problems defining the interfaces between the frontend and the backend parts. On my side, I changed the outputs from parallel to serial in order to communicate with the backend part without problems.

Moreover, I synthesized the frontend part, and the design inferred some latches which cause a lot of timing problems. I changed some of the code, and now only some latches in the color converter remain to be fixed.

Also, I created a new version of the frontend part and a new version of the color converter in order to output the data serially.

There are a lot of things to do in the next days:

1) Fix the inferred latches in the design.
2) Create the block buffer module for the frontend part.
3) Complete the encoder.
4) Complete the testbenches for the changed zig-zag module and the new color space converter.
5) Write documentation for all the modules.

Sunday, July 17, 2016

Week 8

This week passed with a lot of work. The frontend part of the encoder and the zig-zag module were merged into the main repository. So, at this stage the frontend part is ready, and only the backend part of the encoder is missing to create the complete JPEG encoder. As discussed with Christopher, I have to write documentation for each module and not leave it for the last days. Moreover, this week I will try to synthesize each module with different parameters and measure the utilization.

Sunday, July 10, 2016

Week 7

The PR with the zig-zag module is waiting to be reviewed and merged by Christopher. Meanwhile, a new branch was created which contains the frontend part of the encoder. This part consists of the color-space conversion, dct-2d, and zig-zag modules for 8x8 blocks. After a short discussion with Christopher, we decided that the hardware utilization for different configurations of the modules should be computed. This will happen in the following days.

Sunday, July 3, 2016

Week 6

This week the 2d-dct and 1d-dct modules were merged into the main repository. Moreover, a new branch (zig-zag) was created which contains the zig-zag module and its test units. Christopher re-organized the contributed code and gave us some pointers for writing in a more compact style. In the following days, the code will be refactored according to Christopher's recommendations.

Friday, July 1, 2016

Zig-Zag Core

The 2d-dct and 1d-dct modules were merged into the main repository by Christopher. Now it's time for the zig-zag core to be implemented. In the following days I will create a new branch with the zig-zag core and its unit test.

Sunday, June 26, 2016

Extended Modularity and Scalability

Today, I improved my 1d-dct and 2d-dct modules to support NxN block transformations: the 1D-DCT can transform N-point vectors and the 2D-DCT can transform NxN blocks. There were a lot of issues to deal with.

Firstly, there were some issues when I tried to use lists of signals. I integrated some of the assignments into the interfaces.

Secondly, there were some issues when I tried to use lists of interfaces. I had to define some blocks in the code which use the interfaces in order to get them converted.

Sunday, June 19, 2016

Midterm Evaluation

The midterm evaluation comes this week, so I will write a little summary of the things I have done in the previous weeks.

As described in my proposal, in the first 3 weeks I implemented the color space conversion module in MyHDL. I wrote all the convertible test units (VHDL, MyHDL, and Verilog) to prove the correct behavior of the module. Moreover, I familiarized myself with the travis-ci, landscape, and coverage tools. I changed some files in the branch to make travis-ci work and to increase the coverage and health of my branch. Finally, the branch in which I was working on my module got merged into the original repository.

In the 4th week, I experimented with different designs of the 2d-dct and chose a simple and straightforward architecture. In the previous posts, I described the design I used and some of the DCT theory. In my working branch for the 2d-dct, I uploaded the 1d-dct module with its convertible test-benches. The 1d-dct works fine, and its correct behavior is shown by the test units and the different tests.

The following days, I will upload the full working 2d-dct module with the respective test units.

As Christopher proposed, we should use a minimal flow for our design and, if possible, avoid complex FSMs. Thus, the flow in the frontend part of the encoder is kept as minimal as possible. Each pixel is inserted serially into the first module (color space conversion), row by row for each 8x8 block of the image. The outputs of the color space conversion module (Y, Cb, Cr) are fed in parallel into three 2D-DCT modules, and after some cycles each 2D-DCT module outputs the transformed 8x8 block in parallel as 64 signals. The positions of the 64 signals in the three 8x8 blocks are then reordered according to the zig-zag ordering. Finally, the three blocks are passed to the backend part for processing.

2D-DCT Part 2

Now, I think it's time to present the implementation of the 2d-dct in detail.

In the previous post, I described that I used the row-column decomposition approach, which uses two 1d-dcts and implements the equation Z = A*X*A^T. The first 1d-dct takes each input serially, outputs the 8-signal vector in parallel, and implements the equation Y = A * X^T. X is the 8x8 block and A is the DCT coefficient matrix described in the previous post. The 1d-dct module consists of 8 multipliers in parallel, 8 adders, and some control signals. As each input enters the module serially, the multipliers take the coefficients of each column and multiply them with the input. After the multiplication, each product goes to an adder which adds it to the adder's previously stored result. So, in 8 cycles plus the latency of the pipeline stages, we have the first 8-signal vector, which is the result of the 1d-dct. In the following picture we can see the waveform of the signals of the 1d-dct module:

In the above waveform we can see that each input is taken serially and the 8-signal vector is output in parallel after some cycles. For the first row, the output is computed after 8+3 = 11 cycles (the 3 extra cycles are for the pipeline stages); for the other rows, the output is valid after 8 cycles. After each row is computed, the data_valid signal is high for one cycle. This signal is used by the other 1d-dcts in the final 2d-dct module.

The 1D-DCT module was implemented in MyHDL and tested for its correct behavior using test units.

Now it's time to describe the final 2D-DCT module. After the first 1d-dct module, we have the result of each vector in parallel: Y = A * X^T. The next 1d-dct stage must implement the equation Z = A * (A * X^T)^T. To do so, we use 8 1d-dct modules, one computing each row of the final 2d-dct result. Each of these 1d-dcts takes one of the 8 output signals of the first stage, and the stage outputs the final 8x8 result matrix in parallel. This implementation has some advantages: the flow of inputs never stops, no additional storage per block is needed, and no complex finite state machines are required, as the control signals are integrated into the design. However, this approach has higher utilization requirements than some other methods: we need 72 multipliers and 72 adders. In the following pictures I present some waveforms of the signals of the 2D-DCT module.


In the above waveform we can see that each input is converted to a signed signal after one cycle, and then the value 128 is subtracted after another cycle. After these two cycles the input is valid and enters the first 1d-dct stage.

In the above waveform we can see that the output signals of the 1st stage are used in each of the eight 1d-dct blocks. As we can see, the data_valid signal is active for one clock cycle. This signal makes the eight 1d-dct modules of the 2nd stage compute only while it is high. Thus, when all the rows are processed, we have the output of the 2d-dct. In the following waveform some of the outputs of the 2d-dct are shown.

The final 2d-dct module was verified for its correct behavior with the use of test units.
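The serial 1D-DCT datapath described above can be sketched behaviorally in plain Python (float coefficients for clarity; the actual MyHDL module uses fixed point and pipeline registers):

```python
import math

N = 8
# DCT coefficient matrix A: sampled cosines, orthonormal scaling
A = [[math.sqrt((1 if m == 0 else 2) / N) *
      math.cos((2 * n + 1) * m * math.pi / (2 * N))
      for n in range(N)] for m in range(N)]

def dct_1d_serial(row):
    """Model of the serial 1D-DCT: one input per 'cycle', N parallel MACs."""
    acc = [0.0] * N                      # one accumulator per output coefficient
    for cycle, x in enumerate(row):      # inputs arrive serially, one per cycle
        for m in range(N):               # the N multipliers work in parallel
            acc[m] += A[m][cycle] * x    # multiply-accumulate with column coeffs
    return acc                           # full 8-signal vector valid after N cycles
```

After 8 "cycles" the accumulators hold the matrix-vector product A times the input row, matching the behavior shown in the waveform (ignoring the 3-cycle pipeline latency).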

Wednesday, June 15, 2016

2D-DCT Part 1


The forward 2D-DCT is computed from the following equation:

X(m,n) = (2/N) * c(m) * c(n) * sum_{i=0..N-1} sum_{j=0..N-1} x(i,j) * cos[(2i+1)*m*pi / 2N] * cos[(2j+1)*n*pi / 2N]

where c(0) = 1/sqrt(2) and c(k) = 1 for k > 0.

In the previous equation N is the block size (our blocks are 8x8, so N = 8), x(i,j) is the input sample, and X(m,n) is the DCT-transformed matrix.
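For reference, a direct Python evaluation of this definition (an illustrative model, not the hardware implementation):

```python
import math

def dct_2d_direct(x, N=8):
    """Direct evaluation of the 2D-DCT definition (O(N^4) multiplies)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    X = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (x[i][j]
                          * math.cos((2 * i + 1) * m * math.pi / (2 * N))
                          * math.cos((2 * j + 1) * n * math.pi / (2 * N)))
            X[m][n] = (2 / N) * c(m) * c(n) * s
    return X
```

For an all-ones 8x8 block this yields a DC coefficient of 8 and zero AC coefficients, as expected from the orthonormal scaling.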

A straightforward implementation of the previous equation requires N^4 multiplications. However, the DCT is a separable transform, so it can be expressed in matrix notation (the row-column decomposition) as two 1D-DCTs as follows:
Z = A*X*A^T
The matrix A is an NxN (8x8) matrix whose basis vectors are sampled cosines, defined as:

A(m,n) = sqrt(2/N) * c(m) * cos[(2n+1)*m*pi / 2N], with c(0) = 1/sqrt(2) and c(k) = 1 for k > 0

Each NxN (8x8) matrix-matrix multiply is separated into N (8) matrix-vector products. A standard block diagram of a 2D-DCT processor built from 1D-DCT units is shown in the following figure:

The first unit computes Y = A*X and the second computes Z = Y*A^T. Each 1D-DCT unit must be capable of computing N multiplies per input sample.

The first 1D-DCT unit operates on rows of A and columns of X. However, in each cycle we receive the elements of a row rather than a column, so it is wiser to first compute the product Y = A * X^T. In the second stage we then perform another 1D-DCT to compute Z = A * Y^T. It is easy to verify that Z = A * (A * X^T)^T equals Z = A * X * A^T.

I wrote some simple Python code to confirm that the matrix multiplications equal the final 2D-DCT. I confirmed the results of the row-column decomposition approach against the results of Matlab's integrated dct() function using a test block.
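A small check along those lines can be done with numpy (used here in place of Matlab's dct()):

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT coefficient matrix A (basis vectors are sampled cosines)
A = np.sqrt(2.0 / N) * np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)  # the m = 0 row uses c(0) = 1/sqrt(2)

rng = np.random.default_rng(0)
X = rng.random((N, N))      # a random test block

# Two 1D passes, Z = A * (A * X^T)^T, must equal the direct Z = A * X * A^T
Y = A @ X.T
Z = A @ Y.T
assert np.allclose(Z, A @ X @ A.T)
assert np.allclose(A @ A.T, np.eye(N))  # A is orthonormal
```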

In the next part, I will describe in details my implementation of the 2D-DCT in MyHDL.

All the above pictures and equations were taken from the paper "A 100 MHz 2-D 8x8 DCT/IDCT Processor for HDTV Applications" link.

Monday, June 13, 2016

2D-DCT Implementation

The third week has passed, and the color space conversion module with its unit tests was merged into the original repository.

These days, I figured out how to implement the 2D-DCT in a simple and straightforward way. The implementation follows the row-column decomposition method.

First I created and tested the 1D-DCT module and then I created and tested the final 2D-DCT module. In the following posts, I will explain my design in details with waveforms and a lot of interesting stuff and some theory about the DCT.

Thursday, June 9, 2016

Created separate tests and improved code quality

As Chris pointed out, I split the original test unit into two separate tests. The first test checks the color conversion module with the MyHDL simulator, while the second compares the outputs of the converted testbench in Verilog and VHDL with the outputs of the MyHDL simulator.

Moreover, I improved the code quality of rgb2ycbcr.py in order to increase the health of the module using landscape.io.

As the 3rd week approaches, I have to read about the 2D-DCT module and how to implement it in MyHDL.

Saturday, June 4, 2016

Code Refactoring

Up to today, I have refactored all the code I submitted in the previous commits. Following Christopher Felton's guidance, the code for the color space conversion and the test-benches was refactored accordingly. However, there are still some changes to be made for the code of the test unit to be more readable. I created a different branch where I edit my reworked code and submitted a PR: https://github.com/mkatsimpris/test_jpeg/pull/4

These days I learned a lot of useful new things. I learned about Travis-CI and how to use it in GitHub projects, plus some other useful tools like landscape and coveralls. The experience I gained through experimenting with these tools is invaluable. I managed to get my branch built with some changes to travis.yml and some scripts to get GHDL installed.

There is still some work to be done regarding the color space conversion code, and with Christopher's help I think the PR will be mergeable into the main repo in the following week.

Finally, I believe it's a good time to start reading about the 2D-DCT and start a discussion with Chris and Nikos about the architecture we will use.

Monday, May 30, 2016

A lot of changes should be made

Today Christopher checked my pull request regarding the color space conversion module.
The PR can be found here:

The main issues that must be fixed are:
  • PEP8 code refactoring
  • Interfaces change (add enable_in as data_valid in interfaces)
  • Add docstring comments in interfaces and in blocks
  • Add a comment pointing to the wikipedia page or the JFIF standard for the coefficients
Also, after some discussion with Nikolaos, we agreed that the converted code of the module and the testbenches must be verified for MyHDL, Verilog, and VHDL.

In the following commits the above issues will be fixed.

Saturday, May 28, 2016

Make more tests

Today, Christopher pointed out some changes to make to my code in the pull requests in order to improve my module. I made these minor changes and will wait for more feedback in order to improve my code in terms of scalability, modularity, and readability. When all the changes are implemented, the pull request will finally be merged into the original repository.

Meanwhile, I ran some tests of the color transformation module with different numbers of fractional bits for the coefficients. With the fractional bits set to 14, some of the output values were not correct and some tests didn't pass. This is natural, because operating in a fixed-point representation introduces some error. If the fractional bits are set to 15, all the tests pass.
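To illustrate why the fractional width matters, here is a small sketch (illustrative, not the project's test suite) of the coefficient quantization error. Each coefficient is off by at most half an LSB; with 8-bit inputs and three accumulated terms per output, an error of this size can still flip the LSB of a result, which is consistent with some tests failing at 14 bits:

```python
# Coefficients from the RGB -> YCbCr equations
COEFFS = (0.299, 0.587, 0.114, -0.1687, -0.3313, 0.5, -0.4187, -0.0813)

def quant_error(frac_bits):
    """Worst-case coefficient quantization error for a given fractional width."""
    scale = 1 << frac_bits
    return max(abs(round(c * scale) / scale - c) for c in COEFFS)

# Rounding keeps every coefficient within half an LSB of its true value,
# so each extra fractional bit roughly halves the worst-case error.
for f in (8, 14, 15):
    assert quant_error(f) <= 0.5 / (1 << f)
```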

Friday, May 27, 2016

Implementing the Color Space conversion module

This week my responsibilities are to understand how the color space conversion module works and to write a test unit for it. However, I moved a bit ahead: I implemented both the module and the test unit in MyHDL.

Firstly, I read a lot of material to understand how the conversion from RGB to luminance and chrominance works. The equations used in the conversion are:


Y = (0.299*R)+(0.587*G)+(0.114*B)
Cb = (-0.1687*R)-(0.3313*G)+(0.5*B)+128
Cr = (0.5*R)-(0.4187*G)-(0.0813*B)+128

As we can see from the above equations, the coefficients in the multiplications are numbers smaller than 1. The hardware implementation of the coefficients will use a fixed-point 2's complement representation. In a fixed-point representation, the format must be specified by the number of integer bits, fractional bits, and the sign bit. In these equations all the coefficients are below 1, so we have no integer bits, only fractional bits and a sign bit. The implemented module will be scalable with respect to the specified fractional bits and the coefficients.
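A minimal Python sketch of this fixed-point scheme (a behavioral model, not the MyHDL module; the rounding convention used here is an assumption):

```python
FRAC_BITS = 15          # fractional bits; no integer bits since all |coeff| < 1
SCALE = 1 << FRAC_BITS

def to_fixed(c):
    """Quantize a coefficient to signed fixed point (sign bit + fractional bits)."""
    return round(c * SCALE)

Y_COEFFS  = [to_fixed(c) for c in ( 0.299,   0.587,   0.114)]
CB_COEFFS = [to_fixed(c) for c in (-0.1687, -0.3313,  0.5)]
CR_COEFFS = [to_fixed(c) for c in ( 0.5,    -0.4187, -0.0813)]

def rgb2ycbcr_model(r, g, b):
    """Fixed-point RGB -> YCbCr: multiply, round the accumulator, add offset."""
    def channel(coeffs, offset):
        acc = coeffs[0] * r + coeffs[1] * g + coeffs[2] * b
        return ((acc + (SCALE >> 1)) >> FRAC_BITS) + offset  # round, then shift back
    return (channel(Y_COEFFS, 0),
            channel(CB_COEFFS, 128),
            channel(CR_COEFFS, 128))
```

For example, a black pixel maps to (0, 128, 128) and a white pixel to (255, 128, 128), matching the float equations.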

In my first attempt to implement the module and the test-bench I faced some issues, which I will describe in the next paragraphs.

1) Bad conversion of the MyHDL test-bench to Verilog code. There is an issue in MyHDL with the assignment of negative numbers from a list to a signal. The bug can be seen in this gist: https://gist.github.com/mkatsimpris/6490baf7e76e054f76880da3fabb2e65.

2) The code written in MyHDL must respect each language's (VHDL, Verilog) peculiarities. When I wrote the code of the module and the test unit in MyHDL, everything worked fine: the output values of my module (Y, Cr, Cb) were correct compared to a software implementation of the color space converter. However, when I tried to verify the converted test-bench in Verilog, everything failed. The reason was that I didn't account for how Verilog handles 2's complement operations. For example, (a*-b) and -(a*b) give different outputs in Verilog; the correct form is -(a*b) in order to handle the output as 2's complement. So some changes had to be made in the MyHDL code in order to pass the conversion and verification tests.

To work around all of the above issues, I will verify the conversion and the correct output of the test units in VHDL with the help of GHDL instead of using Verilog and iverilog.

Sunday, May 22, 2016

GSoC Community Bonding Period

Good news! My proposal was accepted by the Python Software Foundation for GSoC 2016. I will be working under the MyHDL sub-organization, writing code from 23 May till 23 August.

The main goal of the proposed project is to create the frontend part of a JPEG encoder implemented in MyHDL. The frontend part consists of the color-space converter, the 2D DCT, the top-level FSM, and the input buffer. Based on a reference design, the final implementation in MyHDL will provide a more modular and scalable design. Throughout the project, all the required unit tests will be developed to prove the correct functionality of each block of the frontend, keeping in mind not only verifying the correct behavior of the design but also producing synthesizable VHDL/Verilog code. The ideal goal would be to create a fully working JPEG encoder by combining the frontend and backend parts written in MyHDL and to implement the converted code on an FPGA board in order to measure metrics like resource utilization and performance.

As regards the community bonding period, my supervisor Nikolaos Kavvadias and I arranged some Skype calls to discuss some issues about the proposal and get to know each other. Nikolaos gave me a lot of useful advice and pointed me in the right directions to start designing each module. His experience and help are appreciated, and I think we will have a great time during the project.

During the community bonding period, a lot of helpful advice came from the guys in the IRC channel, which got me unstuck from some problems I had with MyHDL. Christopher Felton is online 24/7 in the channel and provides really useful help on MyHDL issues and on general hardware questions about the project.

In the following paragraphs I will give some short details about each module which I will design in the frontend part of the JPEG encoder.

General System Architecture

The image is taken from the documentation of the reference design which will be used during the project http://opencores.org/project,mkjpeg.

Buffer Fifo Module
 
The host data interface writes the input image line by line to BUF_FIFO. This FIFO is intended to minimize the latency between raw image loading and encoding start. It performs raster-to-block conversion (line-wise input to 8x8 block conversion). BUF_FIFO is actually a RAM; it must be able to store at least 8 lines of the input image so that JPEG encoding can start on 8x8 blocks.
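The raster-to-block idea can be sketched in a few lines of Python (an illustrative model of the data reordering only; the real BUF_FIFO streams this from RAM):

```python
def raster_to_blocks(lines, block=8):
    """Group 'block' raster lines into 8x8 blocks, scanned left to right.

    'lines' is a list of image rows; the row count and width are assumed
    to be multiples of 'block'.
    """
    blocks = []
    for top in range(0, len(lines), block):          # every 8-line band
        for left in range(0, len(lines[0]), block):  # every 8-pixel column
            blocks.append([row[left:left + block]
                           for row in lines[top:top + block]])
    return blocks
```

For example, an 8x16 image yields two 8x8 blocks, the left half followed by the right half.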


RGB to YCbCr Module 

The following equations are implemented by means of multipliers and adders to perform the conversion:


Y = (0.299*R)+(0.587*G)+(0.114*B)
Cb = (-0.1687*R)-(0.3313*G)+(0.5*B)+128
Cr = (0.5*R)-(0.4187*G)-(0.0813*B)+128

The constants used in these equations have a format of 14 bits of precision plus 1 sign bit as the MSB.

FDCT 

FDCT is intended to perform the following functions: RGB to YCbCr conversion, chroma subsampling, input level shift, and 2D DCT (discrete cosine transform).

Zig Zag


The zig-zag block is responsible for performing the so-called zig-zag scan. It is simply a reordering of the sample positions in one 8x8 block according to the following tables.

input order (natural):

 0   1   2   3   4   5   6   7
 8   9  10  11  12  13  14  15
16  17  18  19  20  21  22  23
24  25  26  27  28  29  30  31
32  33  34  35  36  37  38  39
40  41  42  43  44  45  46  47
48  49  50  51  52  53  54  55
56  57  58  59  60  61  62  63

Zig-zag output order:

 0   1   5   6  14  15  27  28
 2   4   7  13  16  26  29  42
 3   8  12  17  25  30  41  43
 9  11  18  24  31  40  44  53
10  19  23  32  39  45  52  54
20  22  33  38  46  51  55  60
21  34  37  47  50  56  59  61
35  36  48  49  57  58  62  63

This means, for example, that the first sample (position 0) is mapped to the same output position 0, sample 2 is mapped to output position 5, etc.
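The mapping in the tables can also be generated programmatically. Here is a Python sketch using the standard anti-diagonal traversal (an illustrative generator, not necessarily how the hardware computes it):

```python
def zigzag_map(n=8):
    """Return m where m[p] is the zig-zag output position of natural sample p."""
    order = []
    for s in range(2 * n - 1):                        # anti-diagonals r + c = s
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        for r in (rows if s % 2 else reversed(rows)): # alternate direction
            order.append(r * n + (s - r))             # natural index visited next
    inv = [0] * (n * n)
    for k, p in enumerate(order):
        inv[p] = k                                    # natural position p -> output k
    return inv

m = zigzag_map()
assert m[0] == 0 and m[2] == 5 and m[63] == 63    # matches the examples above
assert m[:8] == [0, 1, 5, 6, 14, 15, 27, 28]      # first row of the output table
```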

Control State Machine

Control State Machine is an FSM which will control the whole encoding process.