This is a write-up about a work-in-progress, simple RISC-V assembler in Python. The RISC-V ISA is simple enough to implement in a few hundred lines of code. Besides, it looked like a good project for a nice evening.
The following items are on my todo list:
- define all rv32i instructions
- handle the ABI naming convention (a0-a7, sp, etc.)
- handle the pseudoinstructions defined in Chapter 25
- write a tkinter GUI for the assembler
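For the ABI naming item, a minimal sketch of a lookup table could look like the following. The register numbering follows the standard RISC-V calling convention; the names `ABI_NAMES`, `REGISTER_NUMBERS` and `register_bits` are made up for this illustration and are not part of the assembler.

```python
# ABI names in register order: index in this list == register number (x0..x31).
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + [f"a{i}" for i in range(8)]       # a0-a7  -> x10-x17
    + [f"s{i}" for i in range(2, 12)]   # s2-s11 -> x18-x27
    + [f"t{i}" for i in range(3, 7)]    # t3-t6  -> x28-x31
)

# Hypothetical table accepting both ABI names and plain x-names.
REGISTER_NUMBERS = {name: i for i, name in enumerate(ABI_NAMES)}
REGISTER_NUMBERS.update({f"x{i}": i for i in range(32)})
REGISTER_NUMBERS["fp"] = 8  # fp is an alias for s0


def register_bits(reg):
    """Return the 5-bit binary string for a register name (hypothetical helper)."""
    try:
        return format(REGISTER_NUMBERS[reg], "05b")
    except KeyError:
        raise ValueError(f"unknown register: {reg!r}") from None


print(register_bits("a2"))  # 01100
print(register_bits("sp"))  # 00010
```

A plain dict keeps the lookup trivial and makes unknown names fail loudly instead of silently encoding the wrong register.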
riscv rv32i ISA
It’s a RISC ISA (duh!) with fixed 32-bit instructions. From Chapter 2:
In the base RV32I ISA, there are four core instruction formats (R/I/S/U), as shown in Figure 2.2. All are a fixed 32 bits in length and must be aligned on a four-byte boundary in memory.
The formats are designed for consistency to simplify the decoding hardware:
The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding.
The instruction formats are:
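As a rough sketch, the four core layouts can be written down as Python data. The field names and widths come from the base spec; `FORMATS` and `field_positions` are made-up names for this illustration only.

```python
# Field layouts of the four core RV32I formats, listed from bit 31 down to bit 0.
# Each entry is (field name, width in bits).
FORMATS = {
    "R": [("funct7", 7), ("rs2", 5), ("rs1", 5), ("funct3", 3), ("rd", 5), ("opcode", 7)],
    "I": [("imm[11:0]", 12), ("rs1", 5), ("funct3", 3), ("rd", 5), ("opcode", 7)],
    "S": [("imm[11:5]", 7), ("rs2", 5), ("rs1", 5), ("funct3", 3), ("imm[4:0]", 5), ("opcode", 7)],
    "U": [("imm[31:12]", 20), ("rd", 5), ("opcode", 7)],
}


def field_positions(fmt):
    """Return {field name: (high bit, low bit)} for one format (hypothetical helper)."""
    positions = {}
    high = 31
    for name, width in FORMATS[fmt]:
        positions[name] = (high, high - width + 1)
        high -= width
    return positions


for fmt, layout in FORMATS.items():
    # Every format must add up to exactly 32 bits.
    assert sum(width for _, width in layout) == 32
    print(fmt, field_positions(fmt))
```

Running it shows rs1 at bits 19..15 and rd at bits 11..7 in every format that has them, which is the consistency the spec quote refers to.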
Instruction opcodes are defined in Chapter 24:
Implementation
To get things rolling, I decided to implement each instruction format as a class derived from Instruction.
This way I can customize the parsing and code generation for each format. Maybe I need to refactor this to consolidate the common parts (well, move fast and break things).
import re

# Field, reg_to_index_bin and bin_to_hex are helpers that are not shown in this section.


class Instruction:
    def __init__(self, name):
        self.name = name
        self.type = ""
        self.fields = []  # filled in by each format subclass
        self.hex_instruct = None
        self.asm_instruct = None

    def __str__(self):
        fields = '\n'.join(str(f) for f in self.fields)
        return f"{self.type} : {self.name} {fields}"

    def assemble(self, asm_instruct):
        raise NotImplementedError()


class RTypeInstruction(Instruction):
    def __init__(self, name, opcode=None, funct3=None, funct7=None):
        super().__init__(name)
        self.type = "R-type"
        # Fields are listed from the least significant bits (opcode) upwards.
        self.fields = [
            Field("opcode", 7, opcode),
            Field("rd", 5),
            Field("funct3", 3, funct3),
            Field("rs1", 5),
            Field("rs2", 5),
            Field("funct7", 7, funct7),
        ]

    def assemble(self, asm_instruct):
        self.asm_instruct = asm_instruct
        # Expected syntax: "<name> rd,rs1,rs2"
        m = re.match(f'{self.name} ([a-z0-9]+),([a-z0-9]+),([a-z0-9]+)', asm_instruct)
        if not m:
            return None
        rd, rs1, rs2 = m.groups()
        self.fields[1].value = reg_to_index_bin(rd)
        self.fields[3].value = reg_to_index_bin(rs1)
        self.fields[4].value = reg_to_index_bin(rs2)
        # Reverse so that funct7 lands in the most significant bits.
        i = ''.join(f.value for f in reversed(self.fields))
        self.hex_instruct = bin_to_hex(i, 8)
        return self
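The class above relies on `Field`, `reg_to_index_bin` and `bin_to_hex`, whose bodies are not shown here. The following is only a guess at what minimal versions might look like, written to match the call sites and the `name[width] value` lines in the sample output; the real helpers may well differ. This sketch only accepts plain x-names.

```python
# Hypothetical stand-ins for helpers used by the format classes.
class Field:
    """One bit field of an instruction: a name, a width, and a binary-string value."""

    def __init__(self, name, width, value=None):
        self.name = name
        self.width = width
        self.value = value

    def __str__(self):
        return f"{self.name}[{self.width}] {self.value}"


def reg_to_index_bin(reg):
    """Turn 'x3' into '00011'. Only x0-x31 are handled in this sketch."""
    if not (reg.startswith("x") and reg[1:].isdigit() and 0 <= int(reg[1:]) <= 31):
        raise ValueError(f"unsupported register: {reg!r}")
    return format(int(reg[1:]), "05b")


def bin_to_hex(bits, digits):
    """Turn a binary string into a zero-padded hex string of `digits` characters."""
    return format(int(bits, 2), f"0{digits}x")


# add x2,x3,x4: concatenate the fields from funct7 (MSB) down to opcode (LSB).
fields = ["0000000", reg_to_index_bin("x4"), reg_to_index_bin("x3"), "000",
          reg_to_index_bin("x2"), "0110011"]
word = bin_to_hex("".join(fields), 8)
print(word)  # 00418133
```

Keeping every field as a binary string makes the final concatenation a simple join, at the cost of one string-to-int conversion at the end.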
The plan is to define all instructions using the format classes and let the client code iterate over them, calling format.assemble on each until one matches the assembly line. Again, a hacky API, but good enough for now.
rv_isa = [
    # name, opcode, funct3, funct7 (R-type only)
    RTypeInstruction("add", "0110011", "000", "0000000"),
    RTypeInstruction("or", "0110011", "110", "0000000"),
    ITypeInstruction("andi", "0010011", "111"),
    STypeInstruction("sb", "0100011", "000"),
    BTypeInstruction("beq", "1100011", "000"),
]
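The I-, S- and B-type entries carry an immediate in addition to registers. Their classes are not shown, so the following is only a sketch of one way the 12-bit immediate of an S-type instruction could be encoded. `imm_to_bin` and `encode_sb` are hypothetical names, and registers are passed as plain indices to keep the example self-contained.

```python
def imm_to_bin(value, width=12):
    """Encode a signed integer as a two's-complement bit string of `width` bits."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"immediate {value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode_sb(rs2, rs1, offset):
    """Encode `sb rs2, offset(rs1)` and return the 32-bit word as 8 hex digits."""
    imm = imm_to_bin(offset)  # string index 0 holds bit 11
    bits = (
        imm[:7]               # imm[11:5]
        + format(rs2, "05b")
        + format(rs1, "05b")
        + "000"               # funct3 for sb
        + imm[7:]             # imm[4:0]
        + "0100011"           # store opcode
    )
    return format(int(bits, 2), "08x")


print(imm_to_bin(-20))       # 111111101100
print(encode_sb(1, 2, -20))  # fe110623
```

B-type instructions split their immediate differently (imm[12|10:5] and imm[4:1|11]), so they would need their own shuffle rather than reusing this one.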
Example of client code:
code = [
    "add a2,x3,x4",
    "or x1,x2,x4",
    "andi x1,x2,8",
    "sb x1,-20(x2)",
    "beq x1,x2,-20",
]

for asm in code:
    found = False
    # Try every known instruction until one accepts the line.
    for insformat in rv_isa:
        inst = insformat.assemble(asm)
        if inst:
            found = True
            print(inst)
            break
    if not found:
        print("Error: asm not found")
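The loop above tries every entry until one matches. One possible alternative, which is not what the code above does, is to key the table by mnemonic and look the instruction up directly. In this sketch `split_mnemonic`, `assemble_line` and `EchoFormat` are all hypothetical; `EchoFormat` merely stands in for the real format classes so the example runs on its own.

```python
def split_mnemonic(asm):
    """Return (mnemonic, operand string) for a line like 'add x2,x3,x4'."""
    mnemonic, _, operands = asm.strip().partition(" ")
    return mnemonic, operands.strip()


def assemble_line(asm, table):
    """Look up the mnemonic in `table` and call the matching object's assemble()."""
    mnemonic, _ = split_mnemonic(asm)
    entry = table.get(mnemonic)
    if entry is None:
        raise ValueError(f"unknown instruction: {mnemonic!r}")
    return entry.assemble(asm)


class EchoFormat:
    """Stand-in for an instruction-format object; it only echoes its input."""

    def __init__(self, name):
        self.name = name

    def assemble(self, asm):
        return f"{self.name} <- {asm}"


table = {fmt.name: fmt for fmt in [EchoFormat("add"), EchoFormat("or")]}
print(assemble_line("or x1,x2,x4", table))  # or <- or x1,x2,x4
```

A dict lookup also gives a natural place to report unknown mnemonics separately from malformed operands.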
For output, __str__ should work with all instruction formats.
R-type : add opcode[7] 0110011
rd[5] 00010
funct3[3] 000
rs1[5] 00011
rs2[5] 00100
funct7[7] 0000000