This is a write-up about a work-in-progress RISC-V assembler written in Python. The RISC-V ISA is simple enough to implement in a few hundred lines of code. Besides, it looked like a good project for a nice evening.

The following items are on my todo list:

  • define all rv32i instructions
  • handle the ABI naming convention (a0-a7, sp, etc.)
  • handle the pseudoinstructions defined in chapter 25
  • write a tkinter GUI for the assembler
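
For the ABI item, one option is a plain lookup table from ABI names to x-register numbers. The sketch below is only an idea of how that could look, not code from the assembler: `ABI_NAMES` and `reg_index` are hypothetical names. The mapping itself is the standard RISC-V calling convention (a0-a7 are x10-x17, sp is x2, and so on).

```python
# Hypothetical lookup table: ABI register name -> x-register number.
ABI_NAMES = {"zero": 0, "ra": 1, "sp": 2, "gp": 3, "tp": 4, "fp": 8}
ABI_NAMES.update({f"t{i}": 5 + i for i in range(3)})       # t0-t2  -> x5-x7
ABI_NAMES.update({f"s{i}": 8 + i for i in range(2)})       # s0-s1  -> x8-x9
ABI_NAMES.update({f"a{i}": 10 + i for i in range(8)})      # a0-a7  -> x10-x17
ABI_NAMES.update({f"s{i}": 16 + i for i in range(2, 12)})  # s2-s11 -> x18-x27
ABI_NAMES.update({f"t{i}": 25 + i for i in range(3, 7)})   # t3-t6  -> x28-x31


def reg_index(name):
    """Return the register number (0..31) for an ABI name or an xN name."""
    name = name.strip().lower()
    if name in ABI_NAMES:
        return ABI_NAMES[name]
    if name.startswith("x") and name[1:].isdigit() and int(name[1:]) <= 31:
        return int(name[1:])
    raise ValueError(f"unknown register: {name!r}")


print(reg_index("a2"), reg_index("sp"), reg_index("x31"))  # 12 2 31
```

With something like this in place, the register helper used by the format classes would only need to format the returned number as a 5-bit string.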

RISC-V rv32i ISA

It’s a RISC ISA (duh!) with fixed 32-bit instructions. From Chapter 2:

In the base RV32I ISA, there are four core instruction formats (R/I/S/U), as shown in Figure 2.2. All are a fixed 32 bits in length and must be aligned on a four-byte boundary in memory.

The formats are designed for consistency, which simplifies the decoding hardware:

The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding.
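
To make that concrete, here is a quick sketch (not taken from the spec) that pulls the register fields out of a 32-bit instruction word using the same bit positions regardless of format. The function name `register_fields` is made up for illustration; the bit ranges are the ones from the base formats (rd in bits 11:7, rs1 in bits 19:15, rs2 in bits 24:20). Keep in mind that not every format uses all three fields.

```python
def register_fields(word):
    """Extract rd, rs1 and rs2 from a 32-bit RV32I instruction word."""
    return {
        "rd": (word >> 7) & 0x1F,    # bits 11:7
        "rs1": (word >> 15) & 0x1F,  # bits 19:15
        "rs2": (word >> 20) & 0x1F,  # bits 24:20
    }


# 0x00418133 is the encoding of "add x2,x3,x4"
print(register_fields(0x00418133))  # {'rd': 2, 'rs1': 3, 'rs2': 4}
```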

The instruction formats are:

Example image

Instruction opcodes are defined in Chapter 24:

Example image

Implementation

To get things rolling, I decided to implement each instruction format as a class derived from Instruction. This way I can customize the parsing and code generation per format. I may need to refactor this later to consolidate the common parts (well, move fast and break things).

import re

# Field, reg_to_index_bin and bin_to_hex are helpers used below; they are not shown in this post.


class Instruction():
    def __init__(self, name):
        self.name = name
        self.type = ""
        self.fields = []  # filled in by each format subclass, used by __str__
        self.hex_instruct = None
        self.asm_instruct = None

    def __str__(self):
        fields = '\n'.join([str(f) for f in self.fields])
        return f"{self.type} : {self.name} {fields}"

    def assemble(self, asm_instruct):
        raise NotImplementedError()


class RTypeInstruction(Instruction):
    def __init__(self, name, opcode=None, funct3=None, funct7=None):
        Instruction.__init__(self, name)

        self.type = "R-type"

        # Listed from the least significant field (opcode) to the most significant (funct7).
        self.fields = [
            Field("opcode", 7, opcode),
            Field("rd", 5),
            Field("funct3", 3, funct3),
            Field("rs1", 5),
            Field("rs2", 5),
            Field("funct7", 7, funct7)
        ]

    def assemble(self, asm_instruct):
        self.asm_instruct = asm_instruct
        # Expect "<name> rd,rs1,rs2" with nothing but whitespace after the last operand.
        m = re.match(rf'{self.name} ([a-z0-9]+),([a-z0-9]+),([a-z0-9]+)\s*$', asm_instruct)
        if m:
            rd, rs1, rs2 = m.groups()
            self.fields[1].value = reg_to_index_bin(rd)
            self.fields[3].value = reg_to_index_bin(rs1)
            self.fields[4].value = reg_to_index_bin(rs2)
            # Fields are stored LSB-first, so reverse them to build the 32-bit string.
            i = ''.join(f.value for f in reversed(self.fields))

            self.hex_instruct = bin_to_hex(i, 8)
            return self
        else:
            return None
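
The class above leans on Field, reg_to_index_bin and bin_to_hex, none of which appear in this post. Below is a minimal guess at what they could look like, inferred only from how they are called and from the printed output further down. Treat the bodies as assumptions: in particular, this version accepts only x0..x31 and does not handle ABI names.

```python
class Field:
    """One bit field of an instruction: a name, a width in bits, a binary-string value."""

    def __init__(self, name, width, value=None):
        self.name = name
        self.width = width
        self.value = value

    def __str__(self):
        return f"{self.name}[{self.width}] {self.value}"


def reg_to_index_bin(reg):
    """Turn 'x3' into '00011'. Assumption: only plain x-register names are supported."""
    if not (reg.startswith("x") and reg[1:].isdigit()):
        raise ValueError(f"unsupported register name: {reg!r}")
    index = int(reg[1:])
    if index > 31:
        raise ValueError(f"register out of range: {reg!r}")
    return format(index, "05b")


def bin_to_hex(bits, digits):
    """Turn a binary string into a zero-padded lowercase hex string of `digits` digits."""
    return format(int(bits, 2), f"0{digits}x")


print(Field("rd", 5, reg_to_index_bin("x2")))  # rd[5] 00010
print(bin_to_hex("0110011", 2))                # 33
```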

The plan is to define all instructions using the format classes and let the client code iterate over them, calling assemble on each one until a format matches the assembly line. Again, a hacky API, but good enough for now.

rv_isa = [
    RTypeInstruction("add",  "0110011", "000", "0000000"),
    RTypeInstruction("or",   "0110011", "110", "0000000"),
    ITypeInstruction("andi", "0010011", "111"),
    STypeInstruction("sb",   "0100011", "000"),
    BTypeInstruction("beq",  "1100011", "000"),
]
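
The I, S and B format classes are not shown here, and the main thing they add over R-type is the immediate. As a rough sketch of that part only, here is how the immediates could be turned into bit strings. All function names are hypothetical, and I am assuming the branch operand is a byte offset; the bit placement follows the base format definitions.

```python
def to_twos_complement(value, width):
    """Return `value` as a two's-complement bit string of `width` bits."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode_i_imm(imm):
    """I-type: imm[11:0] goes into instruction bits 31:20."""
    return to_twos_complement(imm, 12)


def encode_s_imm(imm):
    """S-type: returns (imm[11:5] for bits 31:25, imm[4:0] for bits 11:7)."""
    bits = to_twos_complement(imm, 12)  # bits[0] is imm[11]
    return bits[:7], bits[7:]


def encode_b_imm(offset):
    """B-type: returns (imm[12|10:5] for bits 31:25, imm[4:1|11] for bits 11:7)."""
    if offset % 2:
        raise ValueError("branch offsets must be even")
    bits = to_twos_complement(offset, 13)  # bits[k] is imm[12 - k]
    imm12, imm11 = bits[0], bits[1]
    imm10_5, imm4_1 = bits[2:8], bits[8:12]
    return imm12 + imm10_5, imm4_1 + imm11


print(encode_i_imm(8))    # 000000001000
print(encode_s_imm(-20))  # ('1111111', '01100')
print(encode_b_imm(-20))  # ('1111111', '01101')
```

The split immediates are the reason S and B need their own classes: the two halves land in the slots that R-type uses for funct7 and rd.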

Example of client code:

    code = [
        "add x2,x3,x4",
        "or x1,x2,x4",
        "andi x1,x2,8",
        "sb x1,-20(x2)",
        "beq x1,x2,-20",
    ]
    for asm in code:
        found = False
        for insformat in rv_isa:
            inst = insformat.assemble(asm)
            if inst:
                found = True
                print(inst)
                break
        if not found:
            print("Error: asm not found")
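
One way to make this loop less hacky would be to look the format up by mnemonic instead of trying every entry in turn. The sketch below shows the idea with a stand-in class, since the real format classes need helpers that are not shown here; `StubFormat`, `build_table` and `assemble_line` are hypothetical names.

```python
class StubFormat:
    """Stand-in for a format class; only `name` and `assemble` matter for the lookup."""

    def __init__(self, name):
        self.name = name

    def assemble(self, asm):
        # The real classes would return the instruction object or None.
        return f"{self.name} <- {asm}"


def build_table(isa):
    """Index the instruction definitions by mnemonic."""
    return {fmt.name: fmt for fmt in isa}


def assemble_line(table, asm):
    """Pick the definition by the first word of the line; None if it is unknown."""
    parts = asm.split(None, 1)
    fmt = table.get(parts[0]) if parts else None
    return fmt.assemble(asm) if fmt else None


table = build_table([StubFormat("add"), StubFormat("or")])
for asm in ["or x1,x2,x4", "xor x1,x2,x3"]:
    result = assemble_line(table, asm)
    print(result if result else "Error: asm not found")
```

This assumes each mnemonic appears once in the list, which holds for the definitions above.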

For output, __str__ should work with all instruction formats:

R-type : add opcode[7] 0110011
rd[5] 00010
funct3[3] 000
rs1[5] 00011
rs2[5] 00100
funct7[7] 0000000
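
As a sanity check, the fields printed above can be joined by hand, most significant first, to get the final instruction word. The helper name `pack_fields` is hypothetical; the field values are copied from the output above.

```python
def pack_fields(fields):
    """Join (name, bits) pairs given LSB-first into one binary string, MSB-first."""
    return "".join(bits for _, bits in reversed(fields))


fields = [
    ("opcode", "0110011"),
    ("rd", "00010"),
    ("funct3", "000"),
    ("rs1", "00011"),
    ("rs2", "00100"),
    ("funct7", "0000000"),
]

bits = pack_fields(fields)
print(bits)                      # 00000000010000011000000100110011
print(f"{int(bits, 2):08x}")     # 00418133
```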