Introduction
These are my personal notes for learning RISC-V.
A compiled version can be found at https://dmytrish.net/lib/riscv
RISC-V basics
The basic sets and extensions
The basic instruction sets are:
- RV32I for 32-bit integers and addresses;
- RV64I for 64-bit integers and addresses;
- Some thought was given to RV128I as well.
A specific implementation MUST implement the basic set (allowing for software emulation of some instructions), and CAN implement standard and non-standard extensions.
The most common standard extensions:
- M: integer Multiplication and division
- A: Atomic memory operations
- F: single-precision (32-bit) Floating-point operations
- D: Double-precision (64-bit) floating-point operations
- C: Compressed instruction encoding
The General variant, RVnnG, is a shortcut for RVnnIMAFD.
A popular compilation target is the GC combination:
$ rustup target list | grep riscv
riscv32i-unknown-none-elf
riscv32imac-unknown-none-elf
riscv32imc-unknown-none-elf
riscv64gc-unknown-linux-gnu
riscv64gc-unknown-none-elf
riscv64imac-unknown-none-elf
Endianness
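The base set memory system is assumed to be little-endian with respect to parcels: the lowest-addressed parcel holds the lowest-numbered bits of an instruction.
Each 16-bit parcel itself can be stored in the native byte order, e.g.
// Example: store the 32-bit value in x2 at the address in x3, parcel by parcel, in native endianness
sh x2, 0(x3) // store the low parcel of x2 at x3
srli x2, x2, 16 // right-shift the integer by 16 bits
sh x2, 2(x3) // store the high parcel of x2 at x3 + 2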
Exceptions, traps, interrupts
Exceptions are caused by not being able to proceed with the normal execution of a thread. A trap is a synchronous transfer of control to a trap handler (usually executed in a more privileged environment).
Interrupts are caused by events external to the current thread of execution.
Instruction Encoding
Instruction Length Encoding
The base set contains fixed-length 32-bit instructions, naturally aligned to 32-bit boundaries. These instructions have the five least significant bits set to xxx11, except 11111 (see below).
The encoding supports variable-length instructions of one or more 16-bit parcels; 16-bit instructions have the least significant bits 00, 01, or 10.
The standard C (compressed) extension relaxes the alignment to 16-bit boundaries.
___________________
| xxxxxxxx xxxxxxAA | 16-bit (AA ≠ 11)
_______________________________________
| xxxxxxxx xxxxxxxx | xxxxxxxx xxxBBB11 | 32-bit (BBB ≠ 111)
_______________________________________________
..xxxx | xxxxxxxx xxxxxxxx | xxxxxxxx xx011111 | 48-bit
_______________________________________________
..xxxx | xxxxxxxx xxxxxxxx | xxxxxxxx x0111111 | 64-bit
_______________________________________________
..xxxx | xxxxxxxx xxxxxxxx | xNNNxxxx x1111111 | (80 + 16*NNN)-bit (NNN ≠ 111)
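The length rules in the diagram above can be checked mechanically. Here is a minimal sketch in Rust; the function name `instruction_length_bits` is made up for these notes, and it assumes that only the first (lowest-addressed) 16-bit parcel is inspected:

```rust
/// Length in bits of the instruction that starts with `parcel`,
/// or `None` for the reserved NNN = 111 pattern.
pub fn instruction_length_bits(parcel: u16) -> Option<u32> {
    if parcel & 0b11 != 0b11 {
        Some(16) // AA != 11
    } else if parcel & 0b1_1100 != 0b1_1100 {
        Some(32) // BBB != 111
    } else if parcel & 0b11_1111 == 0b01_1111 {
        Some(48)
    } else if parcel & 0b111_1111 == 0b011_1111 {
        Some(64)
    } else {
        // At this point the low seven bits are 1111111; NNN sits at [14:12].
        let nnn = u32::from((parcel >> 12) & 0b111);
        if nnn == 0b111 { None } else { Some(80 + 16 * nnn) }
    }
}
```

For example, `instruction_length_bits(0x0013)` (the low parcel of a nop) gives `Some(32)`.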
Encodings
Every RV32I instruction is 32 bits long.
It is encoded in one of six encoding types (R-type, I-type, S-type, SB-type, U-type, UJ-type) and may contain the following parts:
part | instruction bits | description | which encoding types |
---|---|---|---|
opcode | 7 at [6:0] | operation selector | (in all encoding types) |
funct3 | 3 at [14:12] | suboperation selector | (except U/UJ type) |
funct7 | 7 at [31:25] | suboperation selector | (only in R type) |
rd | 5 at [11:7] | Destination Register index | (except S/SB type) |
rs1 | 5 at [19:15] | Source 1 Register index | (except U/UJ type) |
rs2 | 5 at [24:20] | Source 2 Register index | (R, S/SB type) |
imm | 5/7/12/20 bits | an immediate value | (except R-type) |
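The fixed field positions from the table make decoding a matter of shifts and masks. A small sketch in Rust; the `Fields` struct and the `fields` function are names made up for these notes, and every field is extracted regardless of whether the encoding type actually uses it:

```rust
/// The fixed-position fields of a 32-bit instruction word.
#[derive(Debug, PartialEq)]
pub struct Fields {
    pub opcode: u32, // [6:0]
    pub rd: u32,     // [11:7]
    pub funct3: u32, // [14:12]
    pub rs1: u32,    // [19:15]
    pub rs2: u32,    // [24:20]
    pub funct7: u32, // [31:25]
}

pub fn fields(inst: u32) -> Fields {
    Fields {
        opcode: inst & 0x7f,
        rd: (inst >> 7) & 0x1f,
        funct3: (inst >> 12) & 0x7,
        rs1: (inst >> 15) & 0x1f,
        rs2: (inst >> 20) & 0x1f,
        funct7: (inst >> 25) & 0x7f,
    }
}
```

For instance, `fields(0x002081B3)` (the word for add x3, x1, x2) yields opcode 0b0110011, rd 3, rs1 1 and rs2 2.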
R-type encoding
Contains rd, funct3, rs1, rs2, funct7
bit R-type
_____ ___
31 |
30 |
29 |
28 | funct7
27 |
26 |
25 _|_
__24 |
23 |
22 | rs2
21 |
20 _|_
19 |
18 |
17 | rs1
__16 |
15 _|_
14 |
13 | funct3
12 _|_
11 |
10 |
9 | rd
___8 |
7 _|_
6 |
5 |
4 |
3 | opcode
2 |
1 | 1
___0 _|_1
I-type encoding
Contains rd, funct3, rs1, 12-bit imm
bit I-type
_____ ___
31 |
30 |
29 |
28 |
27 |
26 | imm[11:0]
25 |
__24 |
23 |
22 |
21 |
20 _|_
19 |
18 |
17 | rs1
__16 |
15 _|_
14 |
13 | funct3
12 _|_
11 |
10 |
9 | rd
___8 |
7 _|_
6 |
5 |
4 |
3 | opcode
2 |
1 |
___0 _|_
S-type encoding
Contains funct3, rs1, rs2, 12-bit imm (imm[11:5] at bit 25, imm[4:0] at bit 7).
bit S-type
_____ ___
31 |
30 |
29 |
28 | imm[11:5]
27 |
26 |
25 _|_
__24 |
23 |
22 | rs2
21 |
20 _|_
19 |
18 |
17 | rs1
__16 |
15 _|_
14 |
13 | funct3
12 _|_
11 |
10 |
9 | imm[4:0]
___8 |
7 _|_
6 |
5 |
4 |
3 | opcode
2 |
1 |
___0 _|_
SB-type encoding
Like S-type, but without imm[0], imm[12] at bit 31, imm[11] at bit 7
bit SB-type
_____ ___
31 |_imm[12]
30 | imm[10]
29 | imm[9]
28 | imm[8]
27 | imm[7]
26 | imm[6]
25 _|_imm[5]
__24 |
23 |
22 | rs2
21 |
20 _|_
19 |
18 |
17 | rs1
__16 |
15 _|_
14 |
13 | funct3
12 _|_
11 | imm[4]
10 | imm[3]
9 | imm[2]
___8 |_imm[1]
7 _|_imm[11]
6 |
5 |
4 |
3 | opcode
2 |
1 |
___0 _|_
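The scattered SB-type immediate is easier to follow when reassembled in code. A sketch in Rust, with hypothetical helper names `sext` and `imm_b`:

```rust
/// Sign-extend the low `bits` bits of `value`.
fn sext(value: u32, bits: u32) -> i32 {
    ((value << (32 - bits)) as i32) >> (32 - bits)
}

/// Reassemble the 13-bit branch offset of an SB-type instruction.
pub fn imm_b(inst: u32) -> i32 {
    let imm = ((inst >> 31) & 0x1) << 12 // imm[12]   at bit 31
        | ((inst >> 7) & 0x1) << 11      // imm[11]   at bit 7
        | ((inst >> 25) & 0x3f) << 5     // imm[10:5] at bits 30:25
        | ((inst >> 8) & 0xf) << 1;      // imm[4:1]  at bits 11:8
    sext(imm, 13) // imm[0] is always 0
}
```

With this, `imm_b(0x00000463)` (beq x0, x0, 8) is 8 and `imm_b(0xFE209EE3)` (bne x1, x2, -4) is -4.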
U-type encoding
Contains rd, 20-bit imm at [31:12]
bit U-type
_____ ___
31 |
30 |
29 |
28 |
27 |
26 |
25 |
__24 |
23 |
22 |
21 | imm[31:12]
20 |
19 |
18 |
17 |
__16 |
15 |
14 |
13 |
12 _|_
11 |
10 |
9 | rd
___8 |
7 _|_
6 |
5 |
4 |
3 | opcode
2 |
1 |
___0 _|_
UJ-type encoding
Like U-type, but imm without imm[0], imm[20] at 31, imm[10:1] at 21, imm[11] at 20, imm[19:12] at 12
bit UJ-type
_____ ___
31 |_imm[20]
30 | imm[10]
29 | imm[9]
28 | imm[8]
27 | imm[7]
26 | imm[6]
25 | imm[5]
__24 | imm[4]
23 | imm[3]
22 | imm[2]
21 |_imm[1]
20 |_imm[11]
19 | imm[19]
18 | imm[18]
17 | imm[17]
__16 | imm[16]
15 | imm[15]
14 | imm[14]
13 | imm[13]
12 _|_imm[12]
11 |
10 |
9 | rd
___8 |
7 _|_
6 |
5 |
4 |
3 | opcode
2 |
1 |
___0 _|_
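The UJ-type immediate can be reassembled the same way. A sketch in Rust, with hypothetical helper names `sext` and `imm_j`:

```rust
/// Sign-extend the low `bits` bits of `value`.
fn sext(value: u32, bits: u32) -> i32 {
    ((value << (32 - bits)) as i32) >> (32 - bits)
}

/// Reassemble the 21-bit jump offset of a UJ-type instruction.
pub fn imm_j(inst: u32) -> i32 {
    let imm = ((inst >> 31) & 0x1) << 20 // imm[20]    at bit 31
        | ((inst >> 12) & 0xff) << 12    // imm[19:12] at bits 19:12
        | ((inst >> 20) & 0x1) << 11     // imm[11]    at bit 20
        | ((inst >> 21) & 0x3ff) << 1;   // imm[10:1]  at bits 30:21
    sext(imm, 21) // imm[0] is always 0
}
```

With this, `imm_j(0x008000EF)` (jal x1, 8) is 8 and `imm_j(0xFFDFF06F)` (jal x0, -4) is -4.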
RV32I
RV32I registers
The user-visible architectural state is:
- x0/zero: its value is always 0; writes do not change it.
- 31 read/write data registers x1..x31;
- pc: the program counter register pointing to the start of the current instruction
Instructions must be stored naturally aligned in little-endian byte order.
Instruction classes
RV32I specification describes 47 instructions:
- computational instructions (21 instructions)
- memory access instructions (10 instructions)
- control flow instructions (8 instructions)
- system instructions (8 instructions, control and status registers)
Major opcodes for classes
op[1:0]=11 for all instructions in RV32I.
op[4:2]= | 000 | 001 | 010 | 011 | 100 | 101 | 110 |
---|---|---|---|---|---|---|---|
op[6:5]=00 | Loads | F-ext | | Fences | Arithm | AUIPC | RV64I |
op[6:5]=01 | Stores | F-ext | | A-ext | Arithm | LUI | RV64I |
op[6:5]=10 | F-ext | F-ext | F-ext | F-ext | F-ext | | RV128I |
op[6:5]=11 | Branches | JALR | | JAL | System | | RV128I |
Least significant byte of an instruction by class
Byte | _3 | _7 | _B | _F |
---|---|---|---|---|
0_/8_ | Loads | F-ext | | Fences |
1_/9_ | Arithm. (I) | AUIPC | RV64I | |
2_/A_ | Stores | F-ext | | A-ext |
3_/B_ | Arithm. (R) | LUI | RV64I | |
4_/C_ | F-ext | F-ext | F-ext | F-ext |
5_/D_ | F-ext | | RV128I | |
6_/E_ | Branches | JALR | | JAL |
7_/F_ | System | | RV128I | |
RV32I: Computational Instructions
There are 21 computational instructions.
rd denotes a _d_estination _r_egister, rs1 and rs2 some _s_ource _r_egisters. imm stands for "immediate value".
Instruction | "C" | Meaning |
---|---|---|
add rd, rs1, rs2 | rd = rs1 + rs2 | |
sub rd, rs1, rs2 | rd = rs1 - rs2 | |
sll rd, rs1, rs2 | rd = rs1 << rs2[4:0] | shift left logical by register |
srl rd, rs1, rs2 | rd = (uint)rs1 >> rs2[4:0] | shift right logical by register |
sra rd, rs1, rs2 | rd = (int)rs1 >> rs2[4:0] | shift right arithm. by register |
and rd, rs1, rs2 | rd = rs1 & rs2 | bitwise AND |
or rd, rs1, rs2 | rd = rs1 \| rs2 | bitwise OR |
xor rd, rs1, rs2 | rd = rs1 ^ rs2 | bitwise XOR |
slt rd, rs1, rs2 | rd = ((int)rs1 < (int)rs2) | compare, set 1/0 |
sltu rd, rs1, rs2 | rd = ((uint)rs1 < (uint)rs2) | compare, set 1/0 |
addi rd, rs1, imm | rd = rs1 + (int32_t)imm | |
slli rd, rs1, imm | rd = rs1 << imm[4:0] | shift left logical by immediate |
srli rd, rs1, imm | rd = (uint)rs1 >> imm[4:0] | shift right logical by immediate |
srai rd, rs1, imm | rd = (int)rs1 >> imm[4:0] | shift right arithm. by immediate |
andi rd, rs1, imm | rd = rs1 & imm | bitwise AND with immediate |
ori rd, rs1, imm | rd = rs1 \| imm | bitwise OR with immediate |
xori rd, rs1, imm | rd = rs1 ^ imm | bitwise XOR with immediate |
slti rd, rs1, imm | rd = ((int)rs1 < (int)imm) | sign-extend imm, compare, set 1/0 |
sltiu rd, rs1, imm | rd = ((uint)rs1 < (uint)imm) | sign-extend imm, compare as unsigned, set 1/0 |
lui rd, imm | rd = (imm << 12) | load upper immediate, set lower 12 bits to 0 |
auipc rd, imm | rd = pc + (imm << 12) | add upper immediate to pc |
Signed integers are stored as two's complement. All instructions that take a 12-bit immediate sign-extend it.
Definitions:
- logical left shift by n: equivalent to multiplication by 2^n (modulo 2^32).
- logical right shift by n: equivalent to unsigned division by 2^n, rounding towards 0
- arithmetic right shift by n: equivalent to signed division by 2^n, rounding down (towards negative infinity)
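The difference between the two right shifts shows up only for values with the top bit set. A sketch in Rust that models a register as a `u32`; the function names just mirror the instruction mnemonics and are not a real API:

```rust
/// srl: zero-filling right shift by the low five bits of `rs2`.
pub fn srl(rs1: u32, rs2: u32) -> u32 {
    rs1 >> (rs2 & 0x1f)
}

/// sra: sign-filling right shift by the low five bits of `rs2`.
pub fn sra(rs1: u32, rs2: u32) -> u32 {
    ((rs1 as i32) >> (rs2 & 0x1f)) as u32
}
```

With -7 (0xFFFF_FFF9) as input, `sra` by 1 gives -4 (rounding down), whereas signed division by 2 would give -3; `srl` by 1 gives 0x7FFF_FFFC.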
Notes:
- not rd, rs1 can be implemented as xori rd, rs1, -1
- a pseudoinstruction seqz rd, rs1: sltiu rd, rs1, 1, computes if rs1 is 0.
- a pseudoinstruction li rd, imm: lui rd, imm[31:12]; addi rd, rd, imm[11:0]. Since addi sign-extends its immediate, the lui part must be incremented by 1 when imm[11] is set.
- nop is usually defined as addi x0, x0, 0
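Splitting a 32-bit constant for a lui + addi pair has one catch: addi sign-extends its 12-bit immediate, so the upper part has to compensate when bit 11 is set. A sketch in Rust; `split_hi_lo` is a name made up for these notes:

```rust
/// Split `value` into (hi20, lo12) such that
/// (hi20 << 12) + lo12 == value (modulo 2^32),
/// with lo12 in the signed 12-bit range -2048..=2047.
pub fn split_hi_lo(value: i32) -> (u32, i32) {
    let lo = (value << 20) >> 20; // sign-extended low 12 bits
    let hi = (value.wrapping_sub(lo) as u32) >> 12; // upper 20 bits, adjusted
    (hi, lo)
}
```

For 0x12345678 this gives (0x12345, 0x678); for 0x12345FFF it gives (0x12346, -1), i.e. the upper part is bumped by one.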
Integer overflow is ignored: the result wraps around modulo 2^32 and no exception is raised.
auipc is a position-independent code shortcut, e.g.:
auipc x4, 0x1
lw x4, 0x234(x4)
reads a word from memory at pc + 0x1234 into x4, where pc is the address of the auipc instruction itself.
Encoding auipc and lui
lui and auipc are U-type.
The least significant byte looks like 37/B7 for lui and 17/97 for auipc.
instr | imm[31:12] | rd | opcode |
---|---|---|---|
lui | | | 01 101 11 |
auipc | | | 00 101 11 |
Encoding register instructions
Instructions with rs2 are R-type:
instr | funct7 | rs2 | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|---|
add | 0000000 | | | 000 | | 01 100 11 |
sub | 0100000 | | | 000 | | 01 100 11 |
sll | 0000000 | | | 001 | | 01 100 11 |
slt | 0000000 | | | 010 | | 01 100 11 |
sltu | 0000000 | | | 011 | | 01 100 11 |
xor | 0000000 | | | 100 | | 01 100 11 |
srl | 0000000 | | | 101 | | 01 100 11 |
sra | 0100000 | | | 101 | | 01 100 11 |
or | 0000000 | | | 110 | | 01 100 11 |
and | 0000000 | | | 111 | | 01 100 11 |
Encoding instructions with immediates
Everything else is I-type.
instr | imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|---|
addi | | | | 000 | | 00 100 11 |
slti | | | | 010 | | 00 100 11 |
sltiu | | | | 011 | | 00 100 11 |
xori | | | | 100 | | 00 100 11 |
ori | | | | 110 | | 00 100 11 |
andi | | | | 111 | | 00 100 11 |
slli | 0000000 | shamt | | 001 | | 00 100 11 |
srli | 0000000 | shamt | | 101 | | 00 100 11 |
srai | 0100000 | shamt | | 101 | | 00 100 11 |
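To double-check the R-type and I-type tables, the fields can be packed back into instruction words. A sketch in Rust; `encode_r` and `encode_i` are hypothetical helpers that take raw field values and do no validation beyond masking:

```rust
/// Pack an R-type instruction word.
pub fn encode_r(funct7: u32, rs2: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    (funct7 & 0x7f) << 25
        | (rs2 & 0x1f) << 20
        | (rs1 & 0x1f) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1f) << 7
        | (opcode & 0x7f)
}

/// Pack an I-type instruction word; only the low 12 bits of `imm` are kept.
pub fn encode_i(imm: i32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    ((imm as u32) & 0xfff) << 20
        | (rs1 & 0x1f) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1f) << 7
        | (opcode & 0x7f)
}
```

For example, `encode_r(0, 2, 1, 0b000, 3, 0b0110011)` is add x3, x1, x2 (0x002081B3), and `encode_i(0, 0, 0b000, 0, 0b0010011)` is nop (0x00000013).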
Least significant byte:
- 13/93 for instructions with immediates
- 33/B3 for register instructions
- 37/B7 for lui
- 17/97 for auipc
RV32I: Memory Access Instructions
Computational instructions only work on the contents of registers. Memory access instructions exchange 8/16/32-bit values ("bytes"/"halfs"/"words") between registers and RAM locations.
There are 5 load ("RAM to register") instructions and 3 store ("register to RAM") instructions. They use byte addresses in RAM, computed as the value in register rs1 plus a 12-bit sign-extended immediate.
2 fence instructions serialize concurrent accesses to RAM from different hardware threads.
Misaligned RAM access is allowed, but can be non-atomic and/or much slower.
Load instructions
instr | description | "C" |
---|---|---|
lw rd, imm(rs1) | "load word" | int32_t *ptr = rs1 + (int32_t)imm; rd = *ptr; |
lh rd, imm(rs1) | "load half", sign-extend | int16_t *p = rs1 + (int32_t)imm; rd = (int32_t)*p; |
lb rd, imm(rs1) | "load byte", sign-extend | int8_t *p = rs1 + (int32_t)imm; rd = (int32_t)*p; |
lhu rd, imm(rs1) | "load half unsigned" | uint16_t *p = rs1 + (int32_t)imm; rd = (uint32_t)*p; |
lbu rd, imm(rs1) | "load byte unsigned" | uint8_t *p = rs1 + (int32_t)imm; rd = (uint32_t)*p; |
Store instructions
instr | description | "C" |
---|---|---|
sw rs2, imm(rs1) | "store word" | *(int32_t *)(rs1 + (int32_t)imm) = rs2 |
sh rs2, imm(rs1) | "store half" | *(int16_t *)(rs1 + (int32_t)imm) = (int16_t)rs2 |
sb rs2, imm(rs1) | "store byte" | *(int8_t *)(rs1 + (int32_t)imm) = (int8_t)rs2 |
Fence instructions
instr | description |
---|---|
fence pred, succ | an explicit barrier for the specified kinds of concurrent memory accesses |
fence.i | an explicit barrier for writing and executing instructions in RAM concurrently |
When multiple harts (hardware threads, "cores") are present and share the same RAM, it is necessary to control how changes by one hart are perceived by another.
Some architectures provide much stronger ordering guarantees (x86_64, ahem, with its total store order, comes close to sequential consistency). Sequential consistency guarantees that any observed state can be explained by some interleaving of the sequential operations of each hart. This model makes it easier to reason about machine code, but can significantly complicate hardware: speculative and out-of-order execution must maintain a separate externally visible sequentially-consistent state.
Since different harts work with different areas of RAM most of the time, RISC-V assumes a relaxed memory model, which requires explicit synchronization when needed.
A fence instruction provides an ordering guarantee between memory accesses before and after the fence. The arguments describe:
- the predecessor set: kinds of accesses by prior instructions that must be completed before fence
- the successor set: kinds of accesses by subsequent instructions that must not start before the fence
The kinds of accesses are:
- R: "read memory"
- W: "write memory"
- I: "device input"
- O: "device output"
E.g. fence rw, w guarantees that all reads and writes by preceding instructions appear completed before this instruction, and any reordered writes by subsequent instructions must wait until this instruction.
Note: reads by subsequent instructions can happen before this fence.
A fence.i synchronizes RAM data accesses and instruction fetches. E.g. if a hart writes instructions to RAM and then executes them, fence.i guarantees that instruction fetches after the fence.i see the preceding stores of the same hart. It does not by itself make the stores visible to instruction fetches on other harts: each of those harts has to execute its own fence.i once the stores are visible to it.
Encoding
Stores are in S-type format:
instr | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
---|---|---|---|---|---|---|
sb | | | | 000 | | 01 000 11 |
sh | | | | 001 | | 01 000 11 |
sw | | | | 010 | | 01 000 11 |
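The split immediate of the S-type format is easy to get wrong by hand, so here is a sketch in Rust that packs a store; `encode_s` is a hypothetical helper and the opcode 01 000 11 is hardcoded:

```rust
/// Pack an S-type store: funct3 selects sb (000), sh (001) or sw (010).
pub fn encode_s(imm: i32, rs2: u32, rs1: u32, funct3: u32) -> u32 {
    let imm = imm as u32; // two's complement; only the low 12 bits are used
    ((imm >> 5) & 0x7f) << 25    // imm[11:5]
        | (rs2 & 0x1f) << 20
        | (rs1 & 0x1f) << 15
        | (funct3 & 0x7) << 12
        | (imm & 0x1f) << 7      // imm[4:0]
        | 0b0100011
}
```

For example, `encode_s(12, 1, 2, 0b010)` encodes sw x1, 12(x2) as 0x00112623.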
The following instructions are in I-type format:
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
lb | | | 000 | | 00 000 11 |
lh | | | 001 | | 00 000 11 |
lw | | | 010 | | 00 000 11 |
lbu | | | 100 | | 00 000 11 |
lhu | | | 101 | | 00 000 11 |
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
fence | 0000 pred succ | 00000 | 000 | 00000 | 00 011 11 |
fence.i | 0000 0000 0000 | 00000 | 001 | 00000 | 00 011 11 |
Least significant byte looks like:
- 03/83 for loads
- 23/A3 for stores
- 0F for fences
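The pred and succ masks are 4-bit fields at [27:24] and [23:20]; in each of them the bits stand for I, O, R, W, from the most to the least significant one.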
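One way to read the pred/succ fields of fence is to build the word by hand. A sketch in Rust, assuming the bit order I, O, R, W (most to least significant) within each 4-bit mask, as in the unprivileged spec; the constant and function names are made up for these notes:

```rust
pub const FENCE_I: u32 = 0b1000; // device input
pub const FENCE_O: u32 = 0b0100; // device output
pub const FENCE_R: u32 = 0b0010; // memory reads
pub const FENCE_W: u32 = 0b0001; // memory writes

/// Pack `fence pred, succ`: pred at [27:24], succ at [23:20],
/// rs1 = rd = 0, funct3 = 000, opcode = 00 011 11.
pub fn encode_fence(pred: u32, succ: u32) -> u32 {
    (pred & 0xf) << 24 | (succ & 0xf) << 20 | 0b0001111
}
```

For example, `encode_fence(FENCE_R | FENCE_W, FENCE_W)` is fence rw, w, i.e. 0x0310000F.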
Control Flow Instructions
instr | description |
---|---|
beq rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if rs1 = rs2 |
bne rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if rs1 ≠ rs2 |
blt rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if (int)rs1 < (int)rs2 |
bltu rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if (uint)rs1 < (uint)rs2 |
bge rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if (int)rs1 >= (int)rs2 |
bgeu rs1, rs2, imm[12:1] | Branch to pc + sext(imm) if (uint)rs1 >= (uint)rs2 |
jal rd, imm[20:1] | Jump and link |
jalr rd, rs1, imm[11:0] | Jump and link register |
Conditional branches
A conditional branch can jump anywhere in a range of ±4 KiB (±1K instructions) relative to pc (at a 16-bit boundary).
Jump-and-link
jal rd, imm[20:1] (jump-and-link):
- writes the address of the subsequent instruction (pc + 4) to rd
- sets pc to pc + sext(imm), allowing for jumps in a ±1 MiB range.
Note: a one-way "goto" can be jal x0, offset to discard the "return" address.
"Long jumps" (to an arbitrary 32-bit offset) can be done with:
auipc x1, offset[31:12]
jalr x0, offset[11:0](x1)
Since jalr sign-extends its immediate, offset[31:12] must be incremented by 1 when offset[11] is set.
Jump-and-link-register
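jalr rd, rs1, imm[11:0] allows for indirect jumps (switch statements, function returns, indirect function calls, vtable dispatch, etc):
- writes the address of the next instruction (pc + 4) into rd
- sets pc to rs1 + sext(imm), with the least significant bit of the result cleared
If rd is the same register as rs1, the jump target is computed from the original value of rs1. The call pseudoinstruction relies on this: auipc x1, offset[31:12]; jalr x1, offset[11:0](x1).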
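The effect of jalr on the link register and on pc can be modelled with plain integers. A sketch in Rust; the function `jalr` here is only an illustration that returns the pair (value written to rd, new pc), not an emulator API:

```rust
/// Model of `jalr rd, rs1, imm` on RV32:
/// returns (link value written to rd, new pc).
pub fn jalr(pc: u32, rs1: u32, imm: i32) -> (u32, u32) {
    let link = pc.wrapping_add(4);
    // The target does not depend on pc; bit 0 of the sum is cleared.
    let target = rs1.wrapping_add(imm as u32) & !1;
    (link, target)
}
```

For example, `jalr(0x1000, 0x2001, 2)` returns (0x1004, 0x2002).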
Encoding
jal is UJ-type.
instr | imm[20,10:1,11,19:12] | rd | opcode |
---|---|---|---|
jal | | | 11 011 11 |
jalr is I-type.
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
jalr | | | 000 | | 11 001 11 |
Conditional branches are SB-type.
instr | imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode |
---|---|---|---|---|---|---|
beq | | | | 000 | | 11 000 11 |
bne | | | | 001 | | 11 000 11 |
blt | | | | 100 | | 11 000 11 |
bge | | | | 101 | | 11 000 11 |
bltu | | | | 110 | | 11 000 11 |
bgeu | | | | 111 | | 11 000 11 |
Least significant byte 6_/E_ encodes a branch instruction:
- 63/E3 for conditional jumps
- 67/E7 for jalr
- 6F/EF for jal
System Instructions
Environment interaction
There are two instructions to interact with the operating system:
- ecall for system calls
- ebreak for calling a debugger
Control and Status Registers (CSRs)
Control and Status Registers (CSRs) provide a general facility for system control and I/O. There is a CSR address space for up to 2^12 (4096) registers.
The instructions to modify CSRs are:
instr using rs1 | instr using imm | description |
---|---|---|
csrrw rd, csr, rs1 | csrrwi rd, csr, imm | atomically copy a value from csr to rd and overwrite csr with the value in rs1 or imm |
csrrc rd, csr, rs1 | csrrci rd, csr, imm | atomically copy a value from csr to rd and clear bits in csr 1 |
csrrs rd, csr, rs1 | csrrsi rd, csr, imm | atomically copy a value from csr to rd and set bits in a csr 1 |
Note: csrrs x1, csr, x0 can be used to read from csr without modifying it.
It is abbreviated as a pseudoinstruction csrr rd, csr.
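The read-modify-write behaviour of the three CSR instructions can be modelled on a plain integer. A sketch in Rust, where a CSR is just a `&mut u32` and the function names mirror the mnemonics; it models only the values, not side effects or access permissions:

```rust
/// csrrw: return the old value and overwrite the CSR with `src`.
pub fn csrrw(csr: &mut u32, src: u32) -> u32 {
    std::mem::replace(csr, src)
}

/// csrrs: return the old value and set every bit that is 1 in `mask`.
pub fn csrrs(csr: &mut u32, mask: u32) -> u32 {
    let old = *csr;
    *csr = old | mask;
    old
}

/// csrrc: return the old value and clear every bit that is 1 in `mask`.
pub fn csrrc(csr: &mut u32, mask: u32) -> u32 {
    let old = *csr;
    *csr = old & !mask;
    old
}
```

With a zero mask, `csrrs` leaves the register unchanged and only returns its value, which is what the csrr pseudoinstruction relies on.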
Mandatory user-readable CSRs
CSR | at | description |
---|---|---|
cycle | 0xC00 | cycle counter |
cycleh | 0xC80 | upper 32 bit of cycle counter |
time | 0xC01 | real-time clock |
timeh | 0xC81 | upper 32 bit of real-time clock |
instret | 0xC02 | instructions retired counter |
instreth | 0xC82 | upper 32 bit of instret |
Encoding
All of the following are in I-format:
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
ecall | 0000 0000 0000 | 00000 | 000 | 00000 | 11 100 11 |
ebreak | 0000 0000 0001 | 00000 | 000 | 00000 | 11 100 11 |
csrrw | csr | rs1 | 001 | rd | 11 100 11 |
csrrs | csr | rs1 | 010 | rd | 11 100 11 |
csrrc | csr | rs1 | 011 | rd | 11 100 11 |
csrrwi | csr | imm | 101 | rd | 11 100 11 |
csrrsi | csr | imm | 110 | rd | 11 100 11 |
csrrci | csr | imm | 111 | rd | 11 100 11 |
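1 The bits to clear or set are selected by the mask in rs1 (or in the zero-extended 5-bit imm): every bit that is 1 in the mask is cleared or set in csr, all other bits stay unchanged.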
Linux "Hello world" in RISC-V GNU assembly
Let's write the smallest possible RISC-V Linux program that:
- outputs "Hello world" to the standard output
- exits successfully
Cross-compilation and RISCV-emulation packages on Ubuntu
I'm using an x86_64 machine with Ubuntu 22.04 and a RISC-V GCC toolchain from its repositories, gcc-riscv64-unknown-elf.
In order to get Linux-specific APIs for RISC-V, we'll also use the package linux-libc-dev-riscv64-cross that provides RISC-V specific C headers in /usr/riscv64-linux-gnu/include.
Package qemu-user makes it possible to run RV64 binaries in a software-emulated RV64 Linux environment.
Documentation and references:
The assembly code
First of all, the program must exit successfully. On Linux, this is done via the exit system call: https://linux.die.net/man/2/exit
How do we write RISC-V assembly to actually call it?
According to man 2 syscalls, the actual syscall numbers for the host instruction set architecture can be found in /usr/include/asm/unistd.h as __NR_xxx constants (e.g. __NR_exit for the exit syscall).
This RISC-V cross-compilation toolchain defines the actual number in /usr/riscv64-linux-gnu/include/asm-generic/unistd.h as __NR_exit.
According to man 2 syscall, the RISC-V way of making Linux system calls looks like this:
- put the syscall number into a7: li a7, __NR_exit
- put the syscall arguments into a0, a1, ..., a5
- ecall performs the system call
- returned values can be found in a0, a1
Therefore, _exit(0) in C translates to:
li a7, __NR_exit
li a0, 0
ecall
man 2 write describes the syscall arguments:
#define STDOUT_FILENO 1
.text
# write(STDOUT_FILENO, greeting, greetlen):
li a7, __NR_write
li a0, STDOUT_FILENO # `int fd`
la a1, greeting # `const void *buf`
li a2, greetlen # `size_t count`
ecall
Symbols greeting and greetlen are defined in section .rodata:
.section .rodata
greeting: .asciz "Hello world\n"
.equ greetlen, . - greeting
The complete assembly code in hello.S (capital .S means "assembly source, preprocessed"):
#include <asm-generic/unistd.h>
#define STDOUT_FILENO 1
.section .rodata # a section for read-only data
greeting: .asciz "Hello world\n" # const char greeting[] = "Hello world\n";
.equ greetlen, . - greeting # const size_t greetlen = sizeof greeting; (counts the trailing NUL added by .asciz)
.text # a section for executable code
.globl _start # export the program entrypoint symbol for the linker
_start: # linkers use `_start` as the default entrypoint
li a7, __NR_write
li a0, STDOUT_FILENO
la a1, greeting
li a2, greetlen
ecall # write(STDOUT_FILENO, greeting, greetlen)
li a7, __NR_exit
li a0, 0
ecall # _exit(0)
1: j 1b # hang, in the very unlikely case `exit` failed
Compilation
By default, riscv64-unknown-elf-gcc tries to link start code from some crt0.o to enable libc functionality. Our program does not need libc, so let's add -nostdlib to LDFLAGS.
To be able to include asm-generic/unistd.h from /usr/riscv64-linux-gnu/include, adjust ASFLAGS to include files from there: -I /usr/riscv64-linux-gnu/include.
The Makefile:
CC = riscv64-unknown-elf-gcc
ASFLAGS += -I /usr/riscv64-linux-gnu/include
LDFLAGS += -nostdlib
# GNU make has an implicit rule for %: %.S which is roughly
# $(CC) $(ASFLAGS) $(LDFLAGS) $< -o $@
hello: hello.S
# find and clean all the executables here
clean:
-find -executable -type f -delete
# `clean` is not a file!
.PHONY: clean
Running make produces a 1208-byte ELF executable that runs via qemu-riscv64 and outputs "Hello world" (or just run it directly if qemu-user-binfmt is installed):
$ make hello
riscv64-unknown-elf-gcc -I /usr/riscv64-linux-gnu/include -nostdlib hello.S -o hello
$ stat hello
File: hello
Size: 1208 Blocks: 8 IO Block: 4096 regular file
...
$ qemu-riscv64 hello
Hello world
Linux "Hello world" in Rust with GNU libc
Let's write a Rust program that outputs "Hello world" to stdout and compile it to a RISC-V ELF binary. Bonus points: actually run it.
If you have a RISC-V Linux installation, congratulations, everything is handled by the default configuration and toolchains (cargo init && cargo run should just work).
The following instructions assume cross-compilation on an x86_64 host machine with Ubuntu 22.04.
Packages and tools for cross-compilation
Ensure that rustup is installed. This is needed to manage rustc targets.
Make sure that the riscv64gc-unknown-linux-gnu rustc target is installed:
rustup target add riscv64gc-unknown-linux-gnu
It seems to be the case that rustc targets do not try to bring their own GCC toolchains with them or guess what system packages provide it (which is reasonable, since each Linux distribution has its own non-standard packaging for cross-compilation toolchains and glibc; I wish it was not so).
For Ubuntu, make sure that the following packages are installed:
- gcc-riscv64-linux-gnu, the cross-compilation GCC toolchain for RISC-V
- libc6-riscv64-cross for a dynamically-linked RISC-V version of glibc
- qemu-user to run a RISC-V binary on x86_64
- patchelf in case you want to run a dynamically-linked binary on x86_64
Documentation and references:
Compilation
Create a Cargo project template in an empty directory:
$ cargo init --name hello-libc
$ cat src/main.rs
fn main() {
println!("Hello world!");
}
Now, adjust .cargo/config.toml:
[build]
target = "riscv64gc-unknown-linux-gnu" # build for this target by default
[target.riscv64gc-unknown-linux-gnu] # settings for this target
linker = "riscv64-linux-gnu-gcc" # mandatory to link the binary
runner = "qemu-riscv64" # to make `cargo run` work on x86_64
linker = "riscv64-linux-gnu-gcc" is crucial for cross-compilation: without it, rustc just tries to use whichever ld it finds in $PATH and fails miserably. I still don't understand why different GCC toolchains require mostly-the-same, but different GNU linkers.
Also, you cannot use linker = "riscv64-linux-gnu-ld" directly, since it will not be able to find -lgcc_s on its own. There might be a way to tweak this, but GNU toolchain options are pain</rant>
Now cargo build should produce a RISC-V executable in target/riscv64gc-unknown-linux-gnu/debug/hello-libc (or in release/hello-libc with --release).
Compiling with a statically-linked glibc
Since Rust 1.19, it is possible to link glibc statically. To make a statically-linked executable, add the somewhat cryptically named target-feature=+crt-static to rustc flags in .cargo/config.toml:
[target.riscv64gc-unknown-linux-gnu]
...
rustflags = [
"-C", "target-feature=+crt-static", # link glibc statically
]
Running it
Let's assume the following shell variables are set, to make snippets more human-friendly:
$ BIN_DIR=./target/riscv64gc-unknown-linux-gnu/debug
$ RV_SYS_DIR=/usr/riscv64-linux-gnu
The compiled binary, $BIN_DIR/hello-libc, can be copied to a RISC-V Linux installation and it should be able to run there (I did not verify this).
If it is statically linked, it should just work with qemu-riscv64.
If you want to run a dynamically linked RISC-V executable on an x86_64 machine, things get complicated:
$ qemu-riscv64 $BIN_DIR/hello-libc
qemu-riscv64: Could not open '/lib/ld-linux-riscv64-lp64d.so.1': No such file or directory
There is no such ELF loader ("ELF interpreter"), /lib/ld-linux-riscv64-lp64d.so.1, installed, but:
$ apt-file search ld-linux-riscv64-lp64d.so
libc6-riscv64-cross: /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1
What does not work to fix this:
- trying to change the executable RPATH does not change the hardcoded ELF interpreter path. The interpreter is not a regular shared library, and its path is always absolute.
- there seems to be no way to convince the linker to use the interpreter in $RV_SYS_DIR/lib/. The GNU toolchain insists on hardcoding a specific ELF interpreter path it was itself configured with.
One (dirty) way to solve this is to symlink $RV_SYS_DIR/lib/ld-linux-riscv64-lp64d.so.1 into /lib/ manually and use LD_LIBRARY_PATH=$RV_SYS_DIR/lib to override system libraries.
A better way is to patch the ELF interpreter and RPATH in (a copy of) the executable:
$ patchelf $BIN_DIR/hello-libc \
--set-interpreter $RV_SYS_DIR/lib/ld-linux-riscv64-lp64d.so.1 \
--set-rpath $RV_SYS_DIR/lib
$ qemu-riscv64 $BIN_DIR/hello-libc
Hello world!
Troubleshooting the linking
These are some tricks I found useful to understand what was (not) going on:
- troubleshooting link failures with cargo build with -vv. The actual rustc command is still a hostile lump of text.
$ cargo build -vv
Compiling hello-libc v0.1.0 (/home/user/code/lang/arch/riscv/hello-libc)
Running `CARGO=/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo CARGO_BIN_NAME=hello-libc CARGO_CRATE_NAME=hello_libc CARGO_MANIFEST_DIR=/home/user/code/lang/arch/riscv/hello-libc CARGO_PKG_AUTHORS='' CARGO_PKG_DESCRIPTION='' CARGO_PKG_HOMEPAGE='' CARGO_PKG_LICENSE='' CARGO_PKG_LICENSE_FILE='' CARGO_PKG_NAME=hello-libc CARGO_PKG_REPOSITORY='' CARGO_PKG_RUST_VERSION='' CARGO_PKG_VERSION=0.1.0 CARGO_PKG_VERSION_MAJOR=0 CARGO_PKG_VERSION_MINOR=1 CARGO_PKG_VERSION_PATCH=0 CARGO_PKG_VERSION_PRE='' CARGO_PRIMARY_PACKAGE=1 LD_LIBRARY_PATH='/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib' rustc --crate-name hello_libc --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=cac59addb6dfe60a -C extra-filename=-cac59addb6dfe60a --out-dir /home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps --target riscv64gc-unknown-linux-gnu -C linker=riscv64-linux-gnu-gcc -C incremental=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/incremental -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps`
Finished dev [unoptimized + debuginfo] target(s) in 0.62s
Copying the command into $EDITOR and breaking into human-digestible lines helps.
Alternatively, if you're in the mood for a shell vibe from the 80s, use some `sed`:
$ sed -e 's/ \(CARGO_\|LD_\|-C\|--\|-L\|rustc\|src\)/\n\1/g' < tmp/link-command.txt
CARGO=/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo
CARGO_BIN_NAME=hello-libc
CARGO_CRATE_NAME=hello_libc
CARGO_MANIFEST_DIR=/home/user/code/lang/arch/riscv/hello-libc
CARGO_PKG_AUTHORS=''
CARGO_PKG_DESCRIPTION=''
CARGO_PKG_HOMEPAGE=''
CARGO_PKG_LICENSE=''
CARGO_PKG_LICENSE_FILE=''
CARGO_PKG_NAME=hello-libc
CARGO_PKG_REPOSITORY=''
CARGO_PKG_RUST_VERSION=''
CARGO_PKG_VERSION=0.1.0
CARGO_PKG_VERSION_MAJOR=0
CARGO_PKG_VERSION_MINOR=1
CARGO_PKG_VERSION_PATCH=0
CARGO_PKG_VERSION_PRE=''
CARGO_PRIMARY_PACKAGE=1
LD_LIBRARY_PATH='/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib'
rustc
--crate-name hello_libc
--edition=2021
src/main.rs
--error-format=json
--json=diagnostic-rendered-ansi,artifacts,future-incompat
--crate-type bin
--emit=dep-info,link
-C embed-bitcode=no
-C debuginfo=2
-C metadata=cac59addb6dfe60a
-C extra-filename=-cac59addb6dfe60a
--out-dir /home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps
--target riscv64gc-unknown-linux-gnu
-C linker=riscv64-linux-gnu-gcc
-C incremental=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/incremental
-L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps
-L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps
Tip: rustc -C help is your friend.
- getting verbose output from gcc with rustflags = ["-C", "link-arg=-v"] in the [target.riscv64gc-unknown-linux-gnu] section of .cargo/config.toml:
rustflags = [
    "-C", "link-arg=-v", # make gcc more talkative
    "-C", "link-arg=-Wl,--verbose", # make linker more talkative
]
- getting the ELF interpreter via file:
$ file $BIN_DIR/hello-libc
./target/riscv64gc-unknown-linux-gnu/debug/hello-libc: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1, ...
Linux "Hello world" in no_std Rust
The Rust-glibc example used glibc, which provided Rust with OS APIs. In this section, the goal is to write the same "Hello world" program without any libc. It must rely only on RISC-V Linux ABI, reimplementing the required OS API as we need it, just like the GNU assembly "Hello world".
This section is inspired by Embeddonomicon. It takes the Linux process sandbox as a kind of an "embedded" environment, with complete control over memory, without any external code, with only Linux ABI as its "hardware". A custom runtime will be grown as we go.
Tools and references
Ensure that rustup is installed.
Make sure that the rustc target riscv64gc-unknown-none-elf is installed:
$ rustup target add riscv64gc-unknown-none-elf
Unlike riscv64gc-unknown-linux-gnu, it assumes a "baremetal" environment and does not try to link any libraries.
cargo-binutils is not strictly necessary, but it's nice to have cargo objdump and cargo nm:
$ cargo install cargo-binutils
Documentation:
- Embeddonomicon shows how to bring up a custom Rust runtime in a new environment.
- Rust inline assembly is the definitive reference manual for the topic.
The minimal no_std binary
$ cargo init --name hello-nostd
Set the default target and runner in .cargo/config.toml:
[build]
target = "riscv64gc-unknown-none-elf" # build for this target by default
[target.riscv64gc-unknown-none-elf] # configuration for this target
runner = "qemu-riscv64" # for `cargo run` to work on x86_64
Set src/main.rs to be #![no_main] and #![no_std].
A no_std environment still requires at least two basic runtime mechanisms, both related to Rust panicking:
- what to do when unwinding the stack on panic. This is implemented by a #[lang = "eh_personality"] function or just by waiving it off in Cargo.toml:
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
Although: riscv64gc-unknown-none-elf assumes "panic-strategy": "abort" by default.
- a #[panic_handler] function to execute when a panic happens. In src/main.rs:
#![no_main]
#![no_std]
#[panic_handler]
fn panic_handler(_panic: &core::panic::PanicInfo) -> ! {
    loop {} // for now, just hang to satisfy the typechecker.
}
#![no_main] means that we should also remove fn main(). We'll get back to it later.
The binary that can be built at this stage does not actually contain any executable code.
$ cargo run
'cargo run' terminated by signal SIGSEGV (Address boundary error)
$ cargo objdump --release -- -d | rustfilt
hello-nostd: file format elf64-littleriscv
A minimal executable that exits successfully
The output of cargo rustc -- -Z unstable-options --print target-spec-json suggests that riscv64gc-unknown-none-elf uses rust-lld as its default linker.
I did not dig into details, but I guessed that its default linker script uses a _start symbol as its entrypoint.
Reading the Rust inline assembly guide and translating the knowledge from the assembly "Hello world", we get this:
Writing to stdout
Tidying up: linux-rt and its linker script
TODO
Troubleshooting
Getting the JSON spec of the current rustc target (requires nightly):
$ cargo rustc -- -Z unstable-options --print target-spec-json
Compiling hello-nostd v0.1.0 (/home/user/code/learn/eval/rvemu/riscv/hello-nostd)
{
"arch": "riscv64",
"code-model": "medium",
"cpu": "generic-rv64",
"data-layout": "e-m:e-p:64:64-i64:64-i128:128-n64-S128",
"eh-frame-header": false,
"emit-debug-gdb-scripts": false,
"features": "+m,+a,+f,+d,+c",
"is-builtin": true,
"linker": "rust-lld",
"linker-flavor": "ld.lld",
"llvm-abiname": "lp64d",
"llvm-target": "riscv64",
"max-atomic-width": 64,
"panic-strategy": "abort",
"relocation-model": "static",
"target-pointer-width": "64"
}