Sunday, 21 August 2016

GSoC final summary and development

Hey guys,

This is my final post in the GSoC series for the myhdl version of the Leros tiny processor. Here I will describe the work done, some of the challenges I faced, the work still remaining, and my future plans for this project.

My GSoC project was to redesign the Leros tiny processor in myhdl, convert and synthesize it, test it on hardware, and develop a command bridge assembly for it so that it can be interfaced with other rhea designs.
The entire project repository can be found at:

https://github.com/forumulator/pyLeros

The pull requests that divide the project into stages (now merged) are at:

4.  The conversion branch. The complete convertible code, including the example programs, assembler, and the generated .rom and .vhd files.
https://github.com/forumulator/pyLeros/pull/4

3.  The python based simulator and increased test coverage
https://github.com/forumulator/pyLeros/pull/3

2.  The complete working and tested pyLeros modules
https://github.com/forumulator/pyLeros/pull/2

1.  Work before the midterms PR (partly the modules and the tests).
https://github.com/forumulator/pyLeros/pull/1

pyLeros Development Summary 


In my GSoC proposal, I had outlined 5 major goals for the development of Leros:
1. Writing the tools, like the simulator and assembler.
2. Development of the processor and test suite.
3. Code refactoring and conversion from myhdl to VHDL.
4. Synthesis and hardware testing with examples.
5. Writing of the UART + command bridge on Leros, the real world application.

I'm happy to tell you that goals 1, 2, and 3 are all over and done with. The various Python-based modules have been written and well tested (Coveralls indicates that the test coverage is currently around 88%). I wrote a simulator in Python for the processor, which has also been used for the tests. This is a good idea because processor simulators are much easier to write in software than the processors themselves, and this increases test coverage. Many of the bugs were caught this way. There were also many challenges, like hazards, and getting the timing just right.
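
The idea of checking a design against a simpler software model can be sketched in plain Python. This is only an illustration, not code from the pyLeros test suite: reference_add plays the role of the simulator, ripple_carry_add is a hypothetical stand-in for the hardware description, and the 16-bit width is an assumption.

```python
import random

WIDTH = 16                 # assumed accumulator width
MASK = (1 << WIDTH) - 1


def reference_add(acc, val):
    """Software model: what a simulator would compute for ADD."""
    return (acc + val) & MASK


def ripple_carry_add(acc, val):
    """Hypothetical stand-in for the hardware: a bit-by-bit adder."""
    result, carry = 0, 0
    for bit in range(WIDTH):
        a = (acc >> bit) & 1
        b = (val >> bit) & 1
        result |= (a ^ b ^ carry) << bit
        carry = (a & b) | (carry & (a ^ b))
    return result          # the final carry is dropped, so the sum wraps


def find_mismatches(trials=1000, seed=0):
    """Feed both models the same random operands and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        acc = rng.randrange(MASK + 1)
        val = rng.randrange(MASK + 1)
        if reference_add(acc, val) != ripple_carry_add(acc, val):
            mismatches.append((acc, val))
    return mismatches
```

An empty list from find_mismatches() means the two models agreed on every operand pair that was tried; any entry is a concrete input to debug with.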

That part (points 1 and 2) was done in the first 7 weeks. Unfortunately, it took more time than originally anticipated because the slightest of bugs in the description can cause a program to go AWOL. Also, debugging is hard because you actually have to step through the execution, looking at each instruction as it runs.

Then I started the code refactoring and conversion phase. This is important because lots of things that can be written in software and run perfectly in simulation can either not convert to VHDL or not synthesize. For example, the decoder signals, which go from the decoder to the various components and cause the execution, were originally a list of signals. Since a list of signals can't be converted, I used interfaces instead. Other things that were giving either no results or poor results in VHDL were also changed.

Synthesis and testing

Finally came the synthesis and testing part, which meant further refining the structure of the myhdl code so that the converted VHDL can be synthesized and is semantically correct too (this is important so that resources are not mis-assigned during synthesis). There are actually still some bugs in this; for example, the synthesizer keeps assigning a couple thousand registers instead of on-board memory for the RAM.

Examples and hardware testing

Next came the examples. Because I felt like I was repeating almost exactly what Martin had done with his assembler in Java, I scrapped my half-written Python assembler and ported over the Java one in a day. Then I wrote examples covering some of the common tasks that can be used to test processors, for example sorting algorithms. These were written, assembled, tested in simulation, converted to VHDL, tested in VHDL simulation, and finally tested on the FPGA. And they work!

External Design of the processor


The main design of the processor goes as follows: we write an assembly file and assemble it. While instantiating the processor in our designs, we pass this file as an argument to the main pyleros @myhdl.block, and also connect the 16-bit I/O ports and 1-bit strobes to the appropriate places. And that's it, the processor is up and running. Such a design, along with the I/O, also has tests in the test suite. This was done because the processor is supposed to work as a general purpose peripheral, so the memories have been included and are not exposed outside. This design can be modified as needed.

Git development flow

The git development workflow I followed was something like this: a little of the initial development was done in the branch. Then I moved the development to the branch core. After the midterms, a PR was given from the core branch to master, and the development continued in dev-exp. dev-stg2 contains the code for simulation and some refactoring. The conversion branch contains the conversion refactoring, tests with the assembled code, and the generated VHDL, along with the synthesis. Each branch was branched from the previous one after giving a PR to the one before that. The branches have been merged to main after a successful testing phase, because of time constraints.

The PRs can be viewed here:
https://github.com/forumulator/pyLeros/pulls

Future plans

Unfortunately, I wasn't able to complete one of the goals outlined, the creation of the command bridge on pyleros, and it has been postponed to after GSoC. However, I have a clear view in mind of what has to be done, and I expect to finish it off in the next couple of weeks. That way, I can also test the processor with existing rhea cores. Beyond that, the part remaining is writing more examples, trying to increase coverage to 100%, the works. In short, the majority (over 95%) of the project has been done.

And that brings us to the end of this long post. It has been a long and eventful summer with many ups and downs along the way. But after all this, the project is finally done, and I can tell anyone who asks that summer 2016 was a summer well spent!






Sunday, 26 June 2016

Week 5 Summary - Code cleanup and test coverage

Hello Readers,

I am back again with my weekly updates. This week was the midterm evaluations week. Thus, the first half of the week went into code cleanup and improving code health. This was to prepare a pull request from the core branch to master, which will be evaluated for the midterms. After that, starting Thursday, I worked on the dev-exp branch, where I focused on writing more tests and eliminating errors, module by module.

Test Coverage 

Test coverage measures how much of the code is exercised by the tests written. Loosely, this corresponds to how many of the lines of code or possible paths the tests cover. Good test coverage is absolutely vital before deploying any code because, once it's all integrated together, if a subtle error occurs, it becomes very difficult to determine where the bug is. Testing each small part of the code individually helps us 'catch' such bugs early.

In a TDD-based development flow, we write some of the tests first, to get an idea of the interface to whatever we are going to code, and also an idea of how the actual code would look. However, it sometimes becomes difficult to test all the possible paths the code is going to take before we write the code itself.

Thus, I mainly wrote the tests for the fetch/decode and execute modules of pyleros. It is a bit tricky to write individual tests, because the pipeline stages were only designed to work with one another (synchronously) and not separately. In the design, the data flows automatically from fedec to execute and vice versa. Thus, to design tests for, say, fedec, I had to somewhat emulate parts of the other module (execute) in my test code. So, for example, to test the add instruction, I initialize all the IN/OUT signals required, and then instantiate the myhdl.block for fedec. I also initialise the alu. Then, I pass the OUT signals of the fedec to the IN of the alu, and finally make an assertion on the result. At the input of the fedec, I give the corresponding instructions, and check one by one whether they produce correct results from the alu. This tests whether the fetch and decoding of the instruction is happening correctly. In this way, the instruction set is divided into different classes, and tested for each class. A similar procedure is followed for the execute module.

Future Plan

In the next week, I plan to:
1.) Further increase test coverage, ideally taking it > 95%.
2.) Write examples for the simulation and test that they work correctly. 
3.) Refactor the code to use interfaces and some of the other higher level features of myhdl.

This should have my simulation part ready in the next week. The week after that will be focused on hardware setup and testing. 

All in all, the work is going pretty well and is thoroughly enjoyable.

Tuesday, 21 June 2016

Midterm report and future plan

Hello Readers, 

The mid-term evaluations are here. For this, I am required to submit a report of my work so far, and list the plan for the future weeks. So here goes.

The work till now

In the month since the GSoC coding period started, I have:

 Week 1 - 2: Created the tools, simulator and assembler, for the core, to better understand the design and instruction set architecture.
 Week 3 - 4: Written tests for and coded the main modules of the processor.

With respect to the timeline detailed in my GSoC proposal, I have met most of my deadlines. The core and tools have been coded. Tests have been written and are passing for the most part (some tests that are not yet passing are marked @pytest.mark.xfail, to be fixed next).

A PR has been given from the main development branch, core, to the master of the repo, which can be seen at:

https://github.com/forumulator/pyLeros/pull/1

Issues

Unfortunately, I had to take a couple of unplanned trips urgently, due to which my work, and more importantly, the work flow, suffered in the first couple of weeks. But I have worked extra during the next two weeks to make up for the slow start, and now I am almost at my midterm goals.

Work wise, the one major thing that I planned that has been shifted to post-midterm is setting up the hardware and testing the core on the Atlys and Basys FPGA boards, both of which I own. Unfortunately, this is not the simplest task. Subtle issues in the code manifest themselves in actual hardware execution that do not show up during simulation. For example, there's the issue of the delta delays that occur between simulation steps, which are not present in hardware and can lead to subtle differences. Further, setting up I/O properly for the boards is a significant task. This makes building for hardware different from building for simulation.

Plan for the coming weeks

In the next couple of weeks, I plan to have a completely working processor, including on the hardware. Further, the code will be refactored to take advantage of some of the advanced features of myHDL, including interfaces. That leaves me with enough time to devote to working on the SoC design, and to a comparison of the VHDL and myHDL versions of the core.

Week 5: Clean up the code and add documentation wherever missing. Make sure that all the tests pass and the simulation of the processor is working.
Week 6: Add I/O, reusing uart from rhea if possible. Refactor the code to use interfaces. Write small examples for the instruction set.
Week 7: Set up the Atlys and Basys boards. Make sure that the processor works on FPGAs, along with all the examples. Add I/O for the hardware. Write a script to build for the two boards.

In conclusion, I worked, had issues, completed almost all goals for the midterm evals, and hope to resolve the issues in the coming weeks. I'm really enjoying this experience.

Week 3 - 4 summary : Pyleros myHDL code

Reserved

Week 1-2 Summary : Assembler and Simulator

Hello Readers,

This post is a little late in coming, I know. As I mentioned in the earlier post, I was completely cut off from the internet for the first couple of weeks, and the communication part of the project has been a little weak.

Anyway, this is about the work done in the first 2 weeks. The first 2 weeks were dedicated to studying the design of the processor and creating the tools, including the simulator and the assembler. Creating the assembler and linker helped me get thoroughly familiar with the instruction set, while the simulator helps in understanding the data paths that need to be built in the actual processor. Plus, these tools are useful for quickly writing examples to test on the actual core.

What follows is a description of both the tools.


Simulator:

An instruction set simulator, or simply a simulator, for those who don't know, is a piece of software that does what the processor would do, given the same input. We are 'simulating' the behaviour of hardware on a piece of software. The mechanism, of course, is completely different.

An ISS is usually built for a processor to model how it would behave in different situations. Compared to describing the entire datapath of the processor, a simulator is much simpler to code. For example, consider the instruction:

0x08 0x12 #ADD r1

Here 0x08 is the opcode, and 0x12 is the address of the register described by the identifier r1. Since the design of Leros is accumulator based, one of the operands is implicit (the accumulator), and this instruction adds the content of memory location r1 to the contents of the accumulator and stores the result back in the accumulator. The actual processor would need a decoder for this.

On a simulator, this can easily be modelled by a decoder function containing if-else statements that do the same job, for example,

opcode = instr >> 8            # high 8 bits hold the opcode
if opcode == 0x08:             # ADD
    addr = instr & 0xff        # low 8 bits hold the address
    val = data_mem[addr]
    acc += val

The storage units, for example the accumulator, the register file, or the data/instruction memory, are modelled by variables. And that's pretty much all there is to a simulator.
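
To make this concrete, here is a minimal sketch of such a simulator loop in Python. It is not the pyLeros simulator: only the ADD opcode (0x08) comes from the example above, while the LOAD and STORE opcode values, the 16-bit accumulator width, and the run function itself are assumptions made for illustration.

```python
OP_ADD = 0x08    # opcode used in the example above
OP_LOAD = 0x20   # assumed value for this sketch
OP_STORE = 0x30  # assumed value for this sketch


def run(program, data_mem, acc=0):
    """Execute a list of 16-bit instruction words and return the accumulator."""
    pc = 0
    while pc < len(program):
        instr = program[pc]
        opcode = instr >> 8        # high 8 bits
        addr = instr & 0xFF        # low 8 bits
        if opcode == OP_ADD:
            # Wrap to 16 bits (assumed accumulator width).
            acc = (acc + data_mem[addr]) & 0xFFFF
        elif opcode == OP_LOAD:
            acc = data_mem[addr]
        elif opcode == OP_STORE:
            data_mem[addr] = acc
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x}")
        pc += 1
    return acc
```

The program counter, accumulator, and data memory are just a Python integer, another integer, and a list, which is why the simulator stays so much smaller than the hardware description.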

Assembler:

Before assembly code can be simulated, it needs to be assembled into binary for a particular instruction set, and that is the job of the assembler. The major difference between an assembler and a compiler is that most assembly code is just a human readable version of the binary that the processor executes. The main jobs of an assembler are:

1. Process assembler directives for data declaration, like a_: DA 2, which assigns an array of two bytes to a.
2. Convert identifiers to actual memory locations.
3. Convert instructions fully into binary.

When a program is split into multiple files, there are often external references, which are resolved by the linker. The linker's job is to take two assembled files, resolve the external references, and combine them into a single memory image for loading.
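
Steps 2 and 3 can be sketched as a tiny two-pass assembler in Python. This is a toy under stated assumptions, not the assembler used in pyLeros: the "MNEMONIC identifier" line syntax, the rule that identifiers get data addresses in order of first appearance, and the LOAD/STORE opcode values are all made up for illustration (only ADD = 0x08 appears in this post).

```python
OPCODES = {"ADD": 0x08, "LOAD": 0x20, "STORE": 0x30}  # only ADD is from the post


def assemble(lines):
    """Return (words, symbols) for lines such as 'ADD r1'."""
    # Strip '#' comments and split each line into [mnemonic, operand].
    parsed = [line.split("#")[0].split() for line in lines]
    parsed = [p for p in parsed if p]          # drop blank/comment-only lines

    # Pass 1: give every identifier a data-memory address.
    symbols = {}
    for _, operand in parsed:
        if operand not in symbols:
            symbols[operand] = len(symbols)

    # Pass 2: emit one 16-bit word per instruction (opcode high, address low).
    words = [(OPCODES[mnemonic] << 8) | symbols[operand]
             for mnemonic, operand in parsed]
    return words, symbols
```

Two passes are needed in general because an instruction may refer to an identifier that is only defined later in the file.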

Leros instruction set and tools


Since the Leros instruction set is of constant length (16 bits) and uses only one operand (the other being implicit, the accumulator), the job was greatly simplified. The first pass of the assembler only has to maintain a list of all the identifiers. There are no complex instructions like in the 8085 instruction set, or a complex encoding like the MIPS instruction set.

The high 8 bits represent the opcode, with the lowest opcode bit indicating whether the instruction is immediate. The next two bits describe the ALU operation, which can be arithmetic like
ADD, SUB
or logical like
OR, AND, XOR, SHR
Data read and write from the memory is done using the instructions
LOAD, STORE, IND LOAD, IND STORE, LOADH
The addressing can be either direct or immediate, with the first 256 (2^8) words of the memory directly accessible, the address being given by the lower 8 bits, instr(7 downto 0). The higher addresses can be accessed by using indirect loads and stores, in which an 8-bit offset is added to an address that is itself retrieved from the memory using a load.

Finally, branching is done by using the
BRANCH, BRZ, BRNZ, BRP, BRN
instructions, which respectively mean the unconditional branch, branch if zero, branch if non zero, branch if positive, branch if negative. 
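
The five branch conditions can be written down as a small Python function. This is a sketch, not taken from pyLeros: it assumes a 16-bit two's complement accumulator and treats "positive" as "sign bit clear" (so zero counts as positive), which may differ from the actual definition.

```python
def branch_taken(mnemonic, acc):
    """Return True if the given branch would be taken for this accumulator."""
    acc &= 0xFFFF                      # assumed 16-bit accumulator
    negative = bool(acc & 0x8000)      # sign bit in two's complement
    conditions = {
        "BRANCH": True,                # unconditional
        "BRZ": acc == 0,
        "BRNZ": acc != 0,
        "BRP": not negative,           # assumption: zero counts as positive
        "BRN": negative,
    }
    return conditions[mnemonic]
```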

I/O can be specified by the
IN, OUT
instructions along with the I/O address given as the lower 8-bits of the instruction. 
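
Putting the encoding together, splitting a 16-bit instruction word into its fields might look like the following Python sketch. The Fields tuple and decode function are hypothetical names, and reading "the next two bits" as opcode bits 1 and 2 is my interpretation of the description above, so treat the alu_op field in particular as an assumption.

```python
from collections import namedtuple

Fields = namedtuple("Fields", "opcode immediate alu_op operand")


def decode(instr):
    """Split a 16-bit instruction word into its fields."""
    opcode = (instr >> 8) & 0xFF           # high 8 bits
    return Fields(
        opcode=opcode,
        immediate=bool(opcode & 1),        # lowest opcode bit
        alu_op=(opcode >> 1) & 0x3,        # assumed: opcode bits 1 and 2
        operand=instr & 0xFF,              # address, constant, or I/O address
    )
```

For the ADD r1 example from earlier in this post, decode(0x0812) gives opcode 0x08, immediate False, and operand 0x12.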

That's the end of that. Stay tuned for more!

Monday, 20 June 2016

GSoC development progress and the first blog

June 20, 2016
Hello Readers,

About the project:

I got selected for GSoC 2016 for the Leros microprocessor project under the myHDL organization, which is a sub-org of Python. The project consists of porting and refactoring the code for the Leros microprocessor from VHDL, in which it was originally developed by Martin Schoeberl (https://github.com/schoeberl), to Python and myHDL. This will then be used to build small SoC designs and test the performance on real hardware, on the Atlys and Basys development boards. An advantage of Leros is that it is optimized for minimal hardware usage on low cost FPGA boards. The architecture, the instruction set, and the pipeline have been constructed with this as the primary aim.

The original GitHub repository for the VHDL version is available at: https://github.com/schoeberl/leros , and the documentation with the details is at: https://github.com/schoeberl/leros/blob/master/doc/leros.pdf

The situation so far

 
The GSoC coding period began on 22 May 2016 and ends on 27 August 2016. Today, the date is 20 June 2016. It has been almost a month since the start of the coding period, and due to unfortunate circumstances, the work, I'm sorry to say, was a little slow in the first couple of weeks. On top of that, I have not really blogged about my progress all that frequently, and thus the situation looked quite bleak a couple of weeks ago. However, week 3 saw a dramatic rise in the amount of work being done, and thanks to the extra week I reserved before the midterm, I have almost completely caught up to all my goals for the midterm. The blogging is still lagging a little, but I will be making up for that with posts describing my weekly progress for the first 4 weeks.

Summary of weekly work

The summary of my weekly work is as follows:

Community bonding period: Wrote code samples and got familiar with the myHDL design process.
Week 1: Studied the design of Leros thoroughly and made the major design decisions for the Python version. Started with the instruction set simulator.

Week 2:  Finished with the instruction set simulator.

Week 3: Wrote a crude assembler and linker to complement the simulator which has a high level version of the processor. Started on the actual core with the tests.

Week 4: Integration and continued work on the actual core. The core is more or less where it should be according to my timeline.

As mentioned earlier, I will be following up with blog posts detailing the work of each of the weeks described above.

Further work and midterms eval

TO DO: The major thing that I have not been able to do is set up the processor on actual hardware (the Atlys and Basys boards), as planned before the midterm. That has been shifted to the week after the midterms.

The work for this week, before the midterm evaluation is to clean up the code in the development branches and make sure the tests pass, then give a PR to the master which I will be showing for the midterm evaluations.

I will also be writing a midterm blog post detailing the complete work and report for the evaluation.

I am immensely enjoying my work so far.

Wednesday, 23 March 2016

First Post



This is the first post on the Leros Development blog for Google Summer of Code 2016. Application ready, fingers crossed. Hopefully lots more to come soon!