Introduction

These are my personal notes for learning RISC-V.

A compiled version can be found at https://dmytrish.net/lib/riscv

RISC-V basics

The basic sets and extensions

The basic instruction sets are:

  • RV32I for 32-bit integers and addresses;
  • RV64I for 64-bit integers and addresses;
  • Some thought was given to RV128I as well.

A specific implementation MUST implement the basic set (allowing for software emulation of some instructions), and CAN implement standard and non-standard extensions.

Most common standard extensions:

  • M: integer Multiplication and division
  • A: integer Atomic operations
  • F: single-precision (32-bit) Floating-point operations
  • D: Double-precision (64-bit) floating-point operations
  • C: Compressed instruction encoding

The General variant, RVnnG, is a shortcut for RVnnIMAFD.

A popular compilation target is the GC combination:

$ rustup target list | grep riscv
riscv32i-unknown-none-elf
riscv32imac-unknown-none-elf
riscv32imc-unknown-none-elf
riscv64gc-unknown-linux-gnu
riscv64gc-unknown-none-elf 
riscv64imac-unknown-none-elf

Endianness

The base set memory system is assumed to be little-endian in respect to parcels.

The bytes within each 16-bit parcel are stored in the implementation's native endianness, e.g.

    // Example: store x2 at x3 in native endianness
    sh   x2, 0(x3)    // store the low parcel of x2 at x3
    srli x2, x2, 16   // right-shift the integer by 16 bit
    sh   x2, 2(x3)    // store the high parcel of x2 at x3 + 2

Exceptions, traps, interrupts

Exceptions are caused by not being able to proceed with the normal execution of a thread. A trap is a synchronous transfer of control to a trap handler (usually executed in a more privileged environment).

Interrupts are caused by events external to the current thread of execution.

Instruction Encoding

Instruction Length Encoding

The base set contains fixed-length 32-bit instructions, naturally aligned to 32-bit boundaries. These instructions have the five least significant bits set to xxx11, except 11111 (see below).

The encoding supports variable-length instructions of one or more 16bit parcels (with least significant bits 00, 01, 10).

The standard C (compressed) extension relaxes boundaries to be 16bit.

                              ___________________ 
                             | xxxxxxxx xxxxxxAA |     16-bit (AA ≠ 11)

          _______________________________________
         | xxxxxxxx xxxxxxxx | xxxxxxxx xxxBBB11 |     32-bit (BBB ≠ 111)

  _______________________________________________
  ..xxxx | xxxxxxxx xxxxxxxx | xxxxxxxx xx011111 |     48-bit

  _______________________________________________
  ..xxxx | xxxxxxxx xxxxxxxx | xxxxxxxx x0111111 |     64-bit

  _______________________________________________
  ..xxxx | xxxxxxxx xxxxxxxx | xNNNxxxx x1111111 |     (80 + 16*NNN)-bit (NNN ≠ 111)
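
The diagram above can be turned into a small decoder. This is a minimal sketch in Rust, not code from the spec: the function name `instruction_length_bits` is my own, and it assumes the lowest 16-bit parcel has already been read as a little-endian `u16`.

```rust
/// Returns the instruction length in bits, judging only by the lowest
/// 16-bit parcel. `None` stands for the NNN = 111 pattern, which the
/// diagram excludes.
pub fn instruction_length_bits(low_parcel: u16) -> Option<u32> {
    if low_parcel & 0b11 != 0b11 {
        return Some(16); // AA != 11
    }
    if low_parcel & 0b1_1100 != 0b1_1100 {
        return Some(32); // BBB != 111
    }
    if low_parcel & 0b11_1111 == 0b01_1111 {
        return Some(48); // ...011111
    }
    if low_parcel & 0b111_1111 == 0b011_1111 {
        return Some(64); // ...0111111
    }
    // The low seven bits are all ones: the length comes from NNN in bits [14:12].
    let nnn = u32::from((low_parcel >> 12) & 0b111);
    if nnn == 0b111 {
        None
    } else {
        Some(80 + 16 * nnn)
    }
}
```

Only the first parcel is needed to know how many more parcels to fetch, which is the point of putting the length bits at the low end.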

Encodings

Every RV32I instruction is 32 bits long.

It is encoded by one of the six encoding types (R-type, I-type, S-type/SB-type, U-type/UJ-type) and may contain the following parts:

part     instruction bits   description                  which encoding types
opcode   7 at [6:0]         operation selector           (in all encoding types)
funct3   3 at [14:12]       suboperation selector        (except U/UJ type)
funct7   7 at [31:25]       suboperation selector        (only in R type)
rd       5 at [11:7]        Destination Register index   (except S/SB type)
rs1      5 at [19:15]       Source 1 Register index      (except U/UJ type)
rs2      5 at [24:20]       Source 2 Register index      (R, S/SB type)
imm      5/7/12/20 bits     an immediate value           (except R-type)

R-type encoding

Contains rd, funct3, rs1, rs2, funct7

 bit      R-type           
_____       ___            
  31         |             
  30         |             
  29         |             
  28         | funct7      
  27         |             
  26         |             
  25        _|_            
__24         |             
  23         |             
  22         | rs2         
  21         |             
  20        _|_            
  19         |             
  18         |             
  17         | rs1         
__16         |             
  15        _|_            
  14         |             
  13         | funct3      
  12        _|_            
  11         |             
  10         |             
   9         | rd          
___8         |             
   7        _|_            
   6         |             
   5         |             
   4         |             
   3         | opcode      
   2         |             
   1         | 1           
___0        _|_1           
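
To check my reading of the R-type layout, here is a small Rust sketch that packs and unpacks the fields. The `RType` struct and the function names are hypothetical helpers of mine; only the bit positions come from the diagram above.

```rust
/// Fields of an R-type instruction, named as in the diagram above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct RType {
    pub funct7: u32,
    pub rs2: u32,
    pub rs1: u32,
    pub funct3: u32,
    pub rd: u32,
    pub opcode: u32,
}

/// Packs the fields into a 32-bit instruction word.
pub fn encode_r(f: RType) -> u32 {
    ((f.funct7 & 0x7f) << 25)
        | ((f.rs2 & 0x1f) << 20)
        | ((f.rs1 & 0x1f) << 15)
        | ((f.funct3 & 0x7) << 12)
        | ((f.rd & 0x1f) << 7)
        | (f.opcode & 0x7f)
}

/// Splits a 32-bit instruction word into R-type fields.
pub fn decode_r(insn: u32) -> RType {
    RType {
        funct7: (insn >> 25) & 0x7f,
        rs2: (insn >> 20) & 0x1f,
        rs1: (insn >> 15) & 0x1f,
        funct3: (insn >> 12) & 0x7,
        rd: (insn >> 7) & 0x1f,
        opcode: insn & 0x7f,
    }
}
```

For example, add x3, x1, x2 (funct7 = 0, funct3 = 0, opcode = 01 100 11) packs to 0x002081B3.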

I-type encoding

Contains rd, funct3, rs1, 12-bit imm

 bit      I-type           
_____       ___            
  31         |             
  30         |             
  29         |             
  28         |             
  27         |             
  26         | imm[11:0]   
  25         |             
__24         |             
  23         |             
  22         |             
  21         |             
  20        _|_            
  19         |             
  18         |             
  17         | rs1         
__16         |             
  15        _|_            
  14         |             
  13         |  funct3     
  12        _|_            
  11         |             
  10         |             
   9         |  rd         
___8         |             
   7        _|_            
   6         |             
   5         |             
   4         |             
   3         | opcode      
   2         |             
   1         |             
___0        _|_            

S-type encoding

Contains funct3, rs1, rs2, 12-bit imm (imm[11:5] at bit 25, imm[4:0] at bit 7).

 bit      S-type         
_____       ___          
  31         |           
  30         |           
  29         |           
  28         | imm[11:5] 
  27         |           
  26         |           
  25        _|_          
__24         |           
  23         |           
  22         |  rs2      
  21         |           
  20        _|_          
  19         |           
  18         |           
  17         |  rs1      
__16         |           
  15        _|_          
  14         |           
  13         |  funct3   
  12        _|_          
  11         |           
  10         |           
   9         | imm[4:0]  
___8         |           
   7        _|_          
   6         |           
   5         |           
   4         |           
   3         |  opcode   
   2         |           
   1         |           
___0        _|_          

SB-type encoding

Like S-type, but without imm[0], imm[12] at bit 31, imm[11] at bit 7

 bit      SB-type         
_____       ___          
  31         |_imm[12]          
  30         | imm[10]   
  29         | imm[9]
  28         | imm[8]
  27         | imm[7]       
  26         | imm[6]       
  25        _|_imm[5]    
__24         |           
  23         |           
  22         |  rs2      
  21         |           
  20        _|_          
  19         |           
  18         |           
  17         |  rs1      
__16         |           
  15        _|_          
  14         |           
  13         |  funct3   
  12        _|_          
  11         | imm[4]          
  10         | imm[3]
   9         | imm[2]
___8         |_imm[1]
   7        _|_imm[11]
   6         |           
   5         |           
   4         |           
   3         |  opcode   
   2         |           
   1         |           
___0        _|_          
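
The scrambled SB-type immediate is easier to follow in code. A sketch in Rust, with helper names of my own (`encode_b_imm`, `decode_b_imm`); it handles only the immediate bits and leaves rs1, rs2, funct3 and opcode at zero.

```rust
/// Scatters a branch offset (even, within -4096..=4094) into the
/// SB-type immediate bit positions of an instruction word.
pub fn encode_b_imm(offset: i32) -> u32 {
    let imm = offset as u32;
    (((imm >> 12) & 0x1) << 31)      // imm[12]   -> bit 31
        | (((imm >> 5) & 0x3f) << 25) // imm[10:5] -> bits 30:25
        | (((imm >> 1) & 0xf) << 8)   // imm[4:1]  -> bits 11:8
        | (((imm >> 11) & 0x1) << 7)  // imm[11]   -> bit 7
}

/// Gathers the SB-type immediate from an instruction word and
/// sign-extends it from bit 12.
pub fn decode_b_imm(insn: u32) -> i32 {
    let imm = (((insn >> 31) & 0x1) << 12)
        | (((insn >> 7) & 0x1) << 11)
        | (((insn >> 25) & 0x3f) << 5)
        | (((insn >> 8) & 0xf) << 1);
    // Move bit 12 to the top, then shift back arithmetically.
    ((imm << 19) as i32) >> 19
}
```

imm[0] is never stored, so an odd offset silently loses its lowest bit in this sketch.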

U-type encoding

Contains rd, 20-bit imm at [31:12]

 bit      U-type         
_____       ___          
  31         |           
  30         |           
  29         |           
  28         |           
  27         |           
  26         |           
  25         |           
__24         |           
  23         |           
  22         |           
  21         | imm[31:12]
  20         |           
  19         |           
  18         |           
  17         |           
__16         |           
  15         |           
  14         |           
  13         |           
  12        _|_          
  11         |           
  10         |           
   9         |  rd       
___8         |           
   7        _|_          
   6         |           
   5         |           
   4         |           
   3         |  opcode   
   2         |           
   1         |           
___0        _|_          

UJ-type encoding

Like U-type, but imm without imm[0], imm[20] at 31, imm[10:1] at 21, imm[11] at 20, imm[19:12] at 12

 bit      UJ-type        
_____       ___          
  31         |_imm[20]
  30         | imm[10]   
  29         | imm[9]   
  28         | imm[8]       
  27         | imm[7]       
  26         | imm[6]       
  25         | imm[5]       
__24         | imm[4]       
  23         | imm[3]       
  22         | imm[2]       
  21         |_imm[1]
  20         |_imm[11]     
  19         | imm[19]          
  18         | imm[18]          
  17         | imm[17]          
__16         | imm[16]          
  15         | imm[15]          
  14         | imm[14]          
  13         | imm[13]          
  12        _|_imm[12]          
  11         |           
  10         |           
   9         |  rd       
___8         |           
   7        _|_          
   6         |           
   5         |           
   4         |           
   3         |  opcode   
   2         |           
   1         |           
___0        _|_          
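
The same exercise for the UJ-type immediate, again as a Rust sketch with my own helper names (`encode_j_imm`, `decode_j_imm`). Only the immediate bits are handled; rd and opcode are left at zero.

```rust
/// Scatters a jump offset (even, within about ±1 MiB) into the
/// UJ-type immediate bit positions of an instruction word.
pub fn encode_j_imm(offset: i32) -> u32 {
    let imm = offset as u32;
    (((imm >> 20) & 0x1) << 31)       // imm[20]    -> bit 31
        | (((imm >> 1) & 0x3ff) << 21) // imm[10:1]  -> bits 30:21
        | (((imm >> 11) & 0x1) << 20)  // imm[11]    -> bit 20
        | (((imm >> 12) & 0xff) << 12) // imm[19:12] -> bits 19:12
}

/// Gathers the UJ-type immediate from an instruction word and
/// sign-extends it from bit 20.
pub fn decode_j_imm(insn: u32) -> i32 {
    let imm = (((insn >> 31) & 0x1) << 20)
        | (((insn >> 12) & 0xff) << 12)
        | (((insn >> 20) & 0x1) << 11)
        | (((insn >> 21) & 0x3ff) << 1);
    // Move bit 20 to the top, then shift back arithmetically.
    ((imm << 11) as i32) >> 11
}
```

Combined with the jal opcode (11 011 11 = 0x6F) and rd = x0, an offset of -4 gives the word 0xFFDFF06F.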

RV32I

RV32I registers

The user-visible architectural state is:

  • Value of x0/zero is always 0; writes do not change it.
  • 31 read/write data registers x1 .. x31;
  • pc: the program counter register pointing to the start of the current instruction

Instructions must be stored naturally aligned in little-endian byte order.

Instruction classes

The RV32I specification describes 47 instructions.

Major opcodes for classes

op[1:0]=11 for all instructions in RV32I.

op[4:2]=     000        001     010     011      100      101     110
op[6:5]=00   Loads      F-ext   -       Fences   Arithm   AUIPC   RV64I
op[6:5]=01   Stores     F-ext   -       A-ext    Arithm   LUI     RV64I
op[6:5]=10   F-ext      F-ext   F-ext   F-ext    F-ext    -       RV128I
op[6:5]=11   Branches   JALR    -       JAL      System   -       RV128I

Least significant byte of an instruction by class

Byte    _3            _7      _B       _F
0_/8_   Loads         F-ext   -        Fences
1_/9_   Arithm. (I)   AUIPC   RV64I    -
2_/A_   Stores        F-ext   -        A-ext
3_/B_   Arithm. (R)   LUI     RV64I    -
4_/C_   F-ext         F-ext   F-ext    F-ext
5_/D_   F-ext         -       RV128I   -
6_/E_   Branches      JALR    -        JAL
7_/F_   System        -       RV128I   -

RV32I: Computational Instructions

There are 21 computational instructions.

rd denotes a destination register, rs1 and rs2 denote source registers. imm stands for "immediate value".

Instruction          "C"                            Meaning
add rd, rs1, rs2     rd = rs1 + rs2
sub rd, rs1, rs2     rd = rs1 - rs2
sll rd, rs1, rs2     rd = rs1 << rs2                shift left logical by register
srl rd, rs1, rs2     rd = (uint)rs1 >> rs2          shift right logical by register
sra rd, rs1, rs2     rd = (int)rs1 >> rs2           shift right arithm. by register
and rd, rs1, rs2     rd = rs1 & rs2                 bitwise AND
or rd, rs1, rs2      rd = rs1 | rs2                 bitwise OR
xor rd, rs1, rs2     rd = rs1 ^ rs2                 bitwise XOR
slt rd, rs1, rs2     rd = ((int)rs1 < (int)rs2)     compare, set 1/0
sltu rd, rs1, rs2    rd = ((uint)rs1 < (uint)rs2)   compare, set 1/0
addi rd, rs1, imm    rd = rs1 + (int32_t)imm
slli rd, rs1, imm    rd = rs1 << imm[4:0]
srli rd, rs1, imm    rd = (uint)rs1 >> imm[4:0]     shift right logical by immediate
srai rd, rs1, imm    rd = (int)rs1 >> imm[4:0]      shift right arithm. by immediate
andi rd, rs1, imm    rd = rs1 & imm                 bitwise AND with immediate
ori rd, rs1, imm     rd = rs1 | imm                 bitwise OR with immediate
xori rd, rs1, imm    rd = rs1 ^ imm                 bitwise XOR with immediate
slti rd, rs1, imm    rd = ((int)rs1 < (int)imm)     sign-extend imm, compare, set 1/0
sltiu rd, rs1, imm   rd = ((uint)rs1 < (uint)imm)   sign-extend imm, compare as unsigned, set 1/0
lui rd, imm          rd = (imm << 12)               load upper immediate, set lower 12 bits to 0
auipc rd, imm        rd = pc + (imm << 12)          add upper immediate to pc

Signed integers are stored in two's complement. All instructions sign-extend their operands if needed.

Definitions:

  • logical left shift by n: equivalent to multiplication by 2^n.
  • logical right shift by n: equivalent to unsigned division by 2^n, rounding towards 0.
  • arithmetic right shift by n: equivalent to signed division by 2^n, rounding down (towards negative infinity).
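
A quick way to see the difference between the two right shifts is to model them in Rust, where `>>` is logical on unsigned types and arithmetic on signed ones. The function names mirror the RV32I mnemonics but are just my own helpers.

```rust
/// Models `sll`: only the low 5 bits of the shift amount are used.
pub fn sll(value: i32, shamt: u32) -> i32 {
    ((value as u32) << (shamt & 31)) as i32
}

/// Models `srl`: zeros are shifted in from the top.
pub fn srl(value: i32, shamt: u32) -> i32 {
    ((value as u32) >> (shamt & 31)) as i32
}

/// Models `sra`: copies of the sign bit are shifted in from the top.
pub fn sra(value: i32, shamt: u32) -> i32 {
    value >> (shamt & 31)
}
```

Note that sra(-7, 1) is -4 (rounding down), while -7 / 2 in Rust or C is -3 (rounding towards 0), so an arithmetic shift is not a drop-in replacement for signed division of negative numbers.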

Notes:

  • not rd, rs1 can be implemented as xori rd, rs1, -1
  • a pseudoinstruction seqz: sltiu rd, rs1, 1, computes if rs1 is 0.
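  • a pseudoinstruction li rd, imm: lui rd, imm[31:12]; addi rd, rd, imm[11:0] (the lui operand must be incremented by 1 when imm[11] is set, because addi sign-extends its immediate)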
  • nop is usually defined as addi x0, x0, 0
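
The li expansion has one catch: addi sign-extends its 12-bit immediate, so the upper part has to compensate when bit 11 of the constant is set. A sketch of the split in Rust; `split_li` and `apply_lui_addi` are hypothetical names of mine, not assembler internals.

```rust
/// Splits a 32-bit constant into the 20-bit `lui` operand and the
/// sign-extended 12-bit `addi` operand.
pub fn split_li(value: i32) -> (u32, i32) {
    // What addi will actually add: imm[11:0] sign-extended to 32 bits.
    let lo = (value << 20) >> 20;
    // The upper part compensates for a negative `lo`.
    let hi = (value.wrapping_sub(lo) as u32) >> 12;
    (hi, lo)
}

/// Models `lui rd, hi; addi rd, rd, lo` and returns the final rd.
pub fn apply_lui_addi(hi: u32, lo: i32) -> i32 {
    ((hi << 12) as i32).wrapping_add(lo)
}
```

For 0x12345678 the split is the obvious (0x12345, 0x678), but for 0x12345FFF it becomes (0x12346, -1).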

TODO: what happens on integer overflow?

auipc is a position-independent code shortcut, e.g.:

auipc   x4, 0x1
lw      x4, 0x234(x4)

This reads a word from memory at pc + 0x1234 into x4.
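
The address arithmetic of this pair can be sketched in Rust. As far as I read the spec, the pc used by auipc is the address of the auipc instruction itself; treat that as my assumption here. The function names are mine.

```rust
/// Models `auipc rd, imm20`: pc of the auipc itself plus (imm20 << 12).
pub fn auipc(pc: u32, imm20: u32) -> u32 {
    pc.wrapping_add(imm20 << 12)
}

/// Models the address computed by `lw rd, imm12(base)`.
pub fn effective_address(base: u32, imm12: i32) -> u32 {
    base.wrapping_add(imm12 as u32)
}
```

With the auipc at 0x80000000, the lw above would read from 0x80001234.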

TODO: pc at which point?

Encoding auipc and lui

lui and auipc are U-type. Least significant byte looks like 37/B7, 17/97.

        imm[31:12]   rd   opcode
lui                       01 101 11
auipc                     00 101 11

Encoding register instructions

Instructions with rs2 are R-type:

       funct7    rs2   rs1   funct3   rd   opcode
add    0000000               000           01 100 11
sub    0100000               000           01 100 11
sll    0000000               001           01 100 11
slt    0000000               010           01 100 11
sltu   0000000               011           01 100 11
xor    0000000               100           01 100 11
srl    0000000               101           01 100 11
sra    0100000               101           01 100 11
or     0000000               110           01 100 11
and    0000000               111           01 100 11

Encoding instructions with immediates

Everything else is I-type.

        imm[11:5]   imm[4:0]   rs1   funct3   rd   opcode
addi                                 000           00 100 11
slti                                 010           00 100 11
sltiu                                011           00 100 11
xori                                 100           00 100 11
ori                                  110           00 100 11
andi                                 111           00 100 11
slli    0000000     shamt            001           00 100 11
srli    0000000     shamt            101           00 100 11
srai    0100000     shamt            101           00 100 11

Least significant byte

  • 13/93 for instructions with immediates
  • 33/B3 for register instructions
  • 37/B7 for lui
  • 17/97 for auipc

RV32I: Memory Access Instructions

Computational instructions only work on the contents of registers. Memory access instructions exchange 8/16/32-bit values ("bytes"/"halfs"/"words") between registers and RAM locations.

There are 5 load ("RAM to register") instructions and 3 store ("register to RAM") instructions. They address RAM by byte: the address is the value in register rs1 plus a 12-bit sign-extended immediate.

2 fence instructions serialize concurrent accesses to RAM from different hardware threads.

Misaligned RAM access is allowed, but can be non-atomic and/or much slower.

Load instructions

instr               description                  "C"
lw rd, imm(rs1)     "load word"                  int32_t *ptr = rs1 + (int32_t)imm; rd = *ptr;
lh rd, imm(rs1)     "load half", sign-extend     int16_t *p = rs1 + (int32_t)imm; rd = (int32_t)*p;
lb rd, imm(rs1)     "load byte", sign-extend     int8_t *p = rs1 + (int32_t)imm; rd = (int32_t)*p;
lhu rd, imm(rs1)    "load half unsigned"         uint16_t *p = rs1 + (int32_t)imm; rd = (uint32_t)*p;
lbu rd, imm(rs1)    "load byte unsigned"         uint8_t *p = rs1 + (int32_t)imm; rd = (uint32_t)*p;
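
The sign-extending and zero-extending loads differ only in how the upper bits of rd are filled. A sketch in Rust over a plain byte slice standing in for RAM; the function names mirror the mnemonics, and the already-computed `addr` parameter (rs1 + imm) is my simplification.

```rust
/// `lb`: load one byte and sign-extend it to 32 bits.
pub fn lb(mem: &[u8], addr: usize) -> u32 {
    mem[addr] as i8 as i32 as u32
}

/// `lbu`: load one byte and zero-extend it to 32 bits.
pub fn lbu(mem: &[u8], addr: usize) -> u32 {
    u32::from(mem[addr])
}

/// `lh`: load a little-endian halfword and sign-extend it.
pub fn lh(mem: &[u8], addr: usize) -> u32 {
    i16::from_le_bytes([mem[addr], mem[addr + 1]]) as i32 as u32
}

/// `lhu`: load a little-endian halfword and zero-extend it.
pub fn lhu(mem: &[u8], addr: usize) -> u32 {
    u32::from(u16::from_le_bytes([mem[addr], mem[addr + 1]]))
}

/// `lw`: load a little-endian word.
pub fn lw(mem: &[u8], addr: usize) -> u32 {
    u32::from_le_bytes([mem[addr], mem[addr + 1], mem[addr + 2], mem[addr + 3]])
}
```

With the bytes 80 FF 34 12 in memory, lb at offset 0 gives 0xFFFFFF80 while lbu gives 0x00000080.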

Store instructions

instr               description     "C"
sw rs2, imm(rs1)    "store word"    *(int32_t *)(rs1 + (int32_t)imm) = rs2
sh rs2, imm(rs1)    "store half"    *(int16_t *)(rs1 + (int32_t)imm) = (int16_t)rs2
sb rs2, imm(rs1)    "store byte"    *(int8_t *)(rs1 + (int32_t)imm) = (int8_t)rs2

Fence instructions

instr               description
fence pred, succ    an explicit barrier for the specified kinds of concurrent memory accesses
fence.i             an explicit barrier for writing and executing instructions in RAM concurrently

When multiple harts (hardware threads, "cores") are present and share the same RAM, it is necessary to control how changes by one hart are perceived by another.

Some architectures provide strong memory models (ahem, x86_64, whose total store ordering is close to sequential consistency). Sequential consistency guarantees that any observed state can be described by some interleaving of the sequential operations of each hart. This model makes it easier to reason about machine code, but can significantly complicate hardware. Under sequential consistency, speculative and out-of-order execution must maintain a separate externally visible sequentially-consistent state.

Since different harts work with different areas of RAM most of the time, RISC-V assumes a relaxed memory model, which requires explicit synchronization when needed.

A fence instruction provides an ordering guarantee between memory accesses before and after the fence. The arguments describe:

  1. the predecessor set: kinds of accesses by prior instructions that must be completed before fence
  2. the successor set: kinds of accesses by subsequent instructions that must not start before the fence

The kinds of accesses are:

  • R: "read memory"
  • W: "write memory"
  • I: "device input"
  • O: "device output"

E.g. fence rw, w guarantees that all reads and writes by preceding instructions appear completed before this instruction, and that any writes by subsequent instructions wait until this instruction. Note: reads by subsequent instructions can happen before this fence.

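A fence.i synchronizes RAM data accesses and instruction fetches on the hart that executes it: stores already visible to that hart become visible to its subsequent instruction fetches. E.g. if one hart writes instructions to RAM and another executes them, the executing hart must run fence.i once the writes are visible to it; a fence.i on the writing hart alone does not guarantee this.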

Encoding

Stores are in S-type format:

instr   imm[11:5]   rs2   rs1   funct3   imm[4:0]   opcode
sb                              000                 01 000 11
sh                              001                 01 000 11
sw                              010                 01 000 11

The following instructions are in I-type format:

instr   imm[11:0]   rs1   funct3   rd   opcode
lb                        000           00 000 11
lh                        001           00 000 11
lw                        010           00 000 11
lbu                       100           00 000 11
lhu                       101           00 000 11

instr     imm[11:0]        rs1     funct3   rd      opcode
fence     0000 pred succ   00000   000      00000   00 011 11
fence.i   0000 0000 0000   00000   001      00000   00 011 11

Least significant byte looks like:

  • 03/83 for loads
  • 23/A3 for stores
  • 0F for fences

TODO: clarify encoding of pred/succ masks.
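
My current reading of the spec for the TODO above: pred and succ are 4-bit masks with the bits in the order I, O, R, W from the most significant bit down. Here is a sketch of the fence encoding in Rust under that assumption; the constant and function names are my own.

```rust
// Access kinds as bits of a 4-bit mask (assumed order: I, O, R, W).
pub const FENCE_I: u32 = 0b1000; // device input
pub const FENCE_O: u32 = 0b0100; // device output
pub const FENCE_R: u32 = 0b0010; // memory reads
pub const FENCE_W: u32 = 0b0001; // memory writes

/// Encodes `fence pred, succ`: pred in bits [27:24], succ in bits [23:20],
/// rs1 = funct3 = rd = 0, opcode 00 011 11.
pub fn encode_fence(pred: u32, succ: u32) -> u32 {
    ((pred & 0xf) << 24) | ((succ & 0xf) << 20) | 0b000_1111
}
```

Under this reading, fence rw, w is the word 0x0310000F, and a full fence iorw, iorw is 0x0FF0000F.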

Control Flow Instructions

instr                       description
beq rs1, rs2, imm[12:1]     Branch to pc + sext(imm) if rs1 = rs2
bne rs1, rs2, imm[12:1]     Branch to pc + sext(imm) if rs1 ≠ rs2
blt rs1, rs2, imm[12:1]     Branch to pc + sext(imm) if (int)rs1 < (int)rs2
bltu rs1, rs2, imm[12:1]    Branch to pc + sext(imm) if (uint)rs1 < (uint)rs2
bge rs1, rs2, imm[12:1]     Branch to pc + sext(imm) if (int)rs1 >= (int)rs2
bgeu rs1, rs2, imm[12:1]    Branch to pc + sext(imm) if (uint)rs1 >= (uint)rs2
jal rd, imm[20:1]           Jump and link
jalr rd, rs1, imm[11:0]     Jump and link register

Conditional branches

A conditional jump to anywhere in range of ±4 KiB (1K instructions) relative to pc (at 16 bit boundary).

jal rd, imm[20:1] (jump-and-link):

  • writes the address of the subsequent instruction (pc + 4) to rd
  • sets pc to pc + sext(imm), allowing for jumps in a ±1MiB range.

Note: a one-way "goto" can be jal x0, offset to discard the "return" address.

"Long jumps" (to an arbitrary 32-bit offset) can be done with:

    auipc x1, offset[31:12]
    jalr  x0, offset[11:0](x1)

jalr rd, rs1, imm[11:0] allows for indirect jumps (switch statements, function returns, indirect function calls, vtable dispatch, etc):

  • write the address of the next instruction (pc + 4) into rd
  • set pc to rs1 + sext(imm), with the least significant bit of the result cleared (the target is not relative to pc)
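
The two jumps can be modelled as pure functions that return the new pc and the link value. A sketch in Rust with my own function names; as far as I read the spec, jalr computes its target from rs1 alone (not from pc) and clears the lowest bit.

```rust
/// Models `jal rd, offset`: returns (new pc, value written to rd).
pub fn jal(pc: u32, offset: i32) -> (u32, u32) {
    (pc.wrapping_add(offset as u32), pc.wrapping_add(4))
}

/// Models `jalr rd, rs1, imm`: returns (new pc, value written to rd).
/// The target is computed from the value of rs1 passed in, so the
/// result does not depend on rd in this model.
pub fn jalr(pc: u32, rs1: u32, imm: i32) -> (u32, u32) {
    let target = rs1.wrapping_add(imm as u32) & !1;
    (target, pc.wrapping_add(4))
}
```

A function return is then jalr x0, x1, 0: jump to the saved link address and discard the new one.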

TODO: if rd is rs1, does it use the original value?

Encoding

jal is UJ-type.

       imm[20,10:1,11,19:12]   rd   opcode
jal                                 11 011 11

jalr is I-type.

       imm[11:0]   rs1   funct3   rd   opcode
jalr                     000           11 001 11

Conditional branches are SB-type

       imm[12,10:5]   rs2   rs1   funct3   imm[4:1,11]   opcode
beq                               000                    11 000 11
bne                               001                    11 000 11
blt                               100                    11 000 11
bge                               101                    11 000 11
bltu                              110                    11 000 11
bgeu                              111                    11 000 11

Least significant byte 6_/E_ encodes a branch instruction:

  • 63/E3 for conditional jumps
  • 67/E7 for jalr
  • 6F/EF for jal

System Instructions

Environment interaction

There are two instructions to interact with the operating system:

  • ecall for system calls
  • ebreak for calling a debugger

Control and Status Registers (CSRs)

Control and Status Registers (CSRs) provide a general facility for system control and I/O. There is a CSR address space for up to 2^12 (4096) registers.

The instructions to modify CSRs are:

instr using rs1       instr using imm        description
csrrw rd, csr, rs1    csrrwi rd, csr, imm    atomically copy a value from csr to rd and overwrite csr with the value in rs1 or imm
csrrc rd, csr, rs1    csrrci rd, csr, imm    atomically copy a value from csr to rd and clear bits in csr [1]
csrrs rd, csr, rs1    csrrsi rd, csr, imm    atomically copy a value from csr to rd and set bits in csr [1]

Note: csrrs rd, csr, x0 can be used to read from csr without modifying it. It is abbreviated as the pseudoinstruction csrr rd, csr.
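
The read-modify-write behaviour of the three CSR instructions can be modelled with a mutable `u32` standing in for the CSR. A sketch in Rust, assuming (as I read the spec) that rs1 or imm acts as a bit mask for the set and clear variants; atomicity is not modelled.

```rust
/// `csrrw`: returns the old CSR value and overwrites the CSR with `src`.
pub fn csrrw(csr: &mut u32, src: u32) -> u32 {
    std::mem::replace(csr, src)
}

/// `csrrs`: returns the old CSR value and sets the bits given in `mask`.
pub fn csrrs(csr: &mut u32, mask: u32) -> u32 {
    let old = *csr;
    *csr |= mask;
    old
}

/// `csrrc`: returns the old CSR value and clears the bits given in `mask`.
pub fn csrrc(csr: &mut u32, mask: u32) -> u32 {
    let old = *csr;
    *csr &= !mask;
    old
}
```

With a zero mask, csrrs returns the value and leaves the CSR untouched, which is why it doubles as a plain read.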

Mandatory user-readable CSRs

CSR        at      description
cycle      0xC00   cycle counter
cycleh     0xC80   upper 32 bit of cycle counter
time       0xC01   real-time clock
timeh      0xC81   upper 32 bit of real-time clock
instret    0xC02   instructions retired counter
instreth   0xC82   upper 32 bit of instret

Encoding

All of the following are in I-format:

         imm[11:0]        rs1     funct3   rd      opcode
ecall    0000 0000 0000   00000   000      00000   11 100 11
ebreak   0000 0000 0001   00000   000      00000   11 100 11
csrrw                             001              11 100 11
csrrs                             010              11 100 11
csrrc                             011              11 100 11
csrrwi                            101              11 100 11
csrrsi                            110              11 100 11
csrrci                            111              11 100 11

[1] TODO: according to the mask in rs1/imm ?

Linux "Hello world" in RISC-V GNU assembly

Let's write the smallest possible RISC-V Linux program that:

  • outputs "Hello world" to the standard output
  • exits successfully

Cross-compilation and RISCV-emulation packages on Ubuntu

I'm using an x86_64 machine with Ubuntu 22.04 and a RISC-V GCC toolchain from its repositories, gcc-riscv64-unknown-elf.

In order to get Linux-specific APIs for RISC-V, we'll also use package linux-libc-dev-riscv64-cross that provides RISC-V specific C headers in /usr/riscv64-linux-gnu/include.

Package qemu-user makes it possible to run RV64 binaries in a software-emulated RV64 Linux environment.

The assembly code

First of all, the program must exit successfully. On Linux, this is done via the exit system call: https://linux.die.net/man/2/exit

How do we write RISC-V assembly to actually call it?

According to man 2 syscalls, the actual syscall numbers for the host instruction set architecture can be found in /usr/include/asm/unistd.h as __NR_xxx constants (e.g. __NR_exit for the exit syscall).

This RISC-V cross-compilation toolchain defines the actual number in /usr/riscv64-linux-gnu/include/asm-generic/unistd.h as __NR_exit.

According to man 2 syscall, the RISC-V way of making Linux system calls looks like this:

  • put the syscall number into a7: li a7, __NR_exit
  • put the syscall arguments into a0, a1, ..., a5
  • ecall performs the system call
  • returned values can be found in a0, a1

Therefore, _exit(0) in C translates to:

    li  a7, __NR_exit
    li  a0, 0
    ecall

man 2 write describes the syscall arguments:

#define STDOUT_FILENO   1

.text
    # write(STDOUT_FILENO, greeting, greetlen):
    li  a7, __NR_write
    li  a0, STDOUT_FILENO       # `int fd`
    la  a1, greeting            # `const void *buf`
    li  a2, greetlen            # `size_t count`
    ecall

Symbols greeting and greetlen are defined in section .rodata:

.section .rodata

greeting: .asciz "Hello world\n"
.equ greetlen, . - greeting - 1     # do not count the NUL terminator added by .asciz

The complete assembly code in hello.S (capital .S means "assembly source, preprocessed"):

#include <asm-generic/unistd.h>

#define STDOUT_FILENO   1

.section .rodata                    # a section for read-only data

greeting: .asciz "Hello world\n"    # const char *greeting = "Hello world\n";
.equ greetlen, . - greeting - 1     # const size_t greetlen = strlen(greeting); skips the NUL from .asciz

.text                               # a section for executable code
.globl _start                       # export the program entrypoint symbol for the linker
_start:                             # linkers use `_start` as the default entrypoint
    li  a7, __NR_write
    li  a0, STDOUT_FILENO
    la  a1, greeting
    li  a2, greetlen
    ecall                           # write(STDOUT_FILENO, greeting, greetlen)

    li  a7, __NR_exit
    li  a0, 0
    ecall                           # _exit(0)
1:  j 1b                            # hang, in the very unlikely case `exit` failed

Compilation

By default, riscv64-unknown-elf-gcc tries to link start code from some crt0.o to enable libc functionality. Our program does not need libc, so let's add -nostdlib to LDFLAGS.

To be able to include asm-generic/unistd.h from /usr/riscv64-linux-gnu/include, adjust ASFLAGS to include files from there: -I /usr/riscv64-linux-gnu/include.

The Makefile:

CC = riscv64-unknown-elf-gcc
ASFLAGS += -I /usr/riscv64-linux-gnu/include
LDFLAGS += -nostdlib

# GNU make has an implicit rule for %: %.S which is roughly
#   $(CC) $(ASFLAGS) $(LDFLAGS) $< -o $@
hello: hello.S

# find and clean all the executables here
clean:
	-find -executable -type f -delete

# `clean` is not a file!
.PHONY: clean

Running make produces a 1208-byte ELF executable that runs via qemu-riscv64 and outputs "Hello world" (or just run it directly if qemu-user-binfmt is installed):

$ make hello
riscv64-unknown-elf-gcc -I /usr/riscv64-linux-gnu/include -nostdlib hello.S -o hello

$ stat hello
  File: hello
  Size: 1208            Blocks: 8          IO Block: 4096   regular file
  ...

$ qemu-riscv64 hello
Hello world

Linux "Hello world" in Rust with GNU libc

Let's write a Rust program that outputs "Hello world" to stdout and compile it to a RISC-V ELF binary. Bonus points: actually run it.

If you have a RISC-V Linux installation, congratulations, everything is handled by the default configuration and toolchains (cargo init && cargo run should just work).

The following instructions assume cross-compilation on an x86_64 host machine with Ubuntu 22.04.

Packages and tools for cross-compilation

Ensure that rustup is installed. This is needed to manage rustc targets.

Make sure that the riscv64gc-unknown-linux-gnu rustc target is installed:

rustup target add riscv64gc-unknown-linux-gnu

It seems to be the case that rustc targets do not try to bring their own GCC toolchains with them or guess what system packages provide it (which is reasonable, since each Linux distribution has its own non-standard packaging for cross-compilation toolchains and glibc; I wish it was not so).

For Ubuntu, make sure that the following packages are installed:

  • gcc-riscv64-linux-gnu, the cross-compilation GCC toolchain for RISCV.
  • libc6-riscv64-cross for a dynamically-linked RISCV version of glibc
  • qemu-user to run a RISCV binary on x86_64
  • patchelf in case you want to run a dynamically-linked binary on x86_64

Compilation

Create a Cargo project template in an empty directory:

$ cargo init --name hello-libc

$ cat src/main.rs
fn main() {
    println!("Hello world!");
}

Now, adjust .cargo/config.toml:

[build]
target = "riscv64gc-unknown-linux-gnu"      # build for this target by default

[target.riscv64gc-unknown-linux-gnu]        # settings for this target
linker = "riscv64-linux-gnu-gcc"            # mandatory to link the binary
runner = "qemu-riscv64"                     # to make `cargo run` work on x86_64

linker = "riscv64-linux-gnu-gcc" is crucial for cross-compilation; without it rustc just tries to use whichever linker it finds in $PATH and fails miserably. I still don't understand why different GCC toolchains require mostly-the-same, but different GNU linkers.

Also, you cannot use linker = "riscv64-linux-gnu-ld" directly, since it will not be able to find -lgcc_s on its own. There might be a way to tweak this, but GNU toolchain options are pain. </rant>

Now cargo build should produce a RISC-V executable in target/riscv64gc-unknown-linux-gnu/debug/hello-libc (with --release, it ends up in target/riscv64gc-unknown-linux-gnu/release/hello-libc instead).

Compiling with a statically-linked glibc

Since Rust 1.19, it is possible to link glibc statically. To make a statically-linked executable, add the somewhat cryptically named target-feature=+crt-static to the rustc flags in .cargo/config.toml:

[target.riscv64gc-unknown-linux-gnu]
...
rustflags = [
    "-C", "target-feature=+crt-static",     # link glibc statically
]

Running it

Let's assume the following shell variables are set, to make snippets more human-friendly:

$ BIN_DIR=./target/riscv64gc-unknown-linux-gnu/debug
$ RV_SYS_DIR=/usr/riscv64-linux-gnu

The compiled binary, $BIN_DIR/hello-libc, can be copied to a RISC-V Linux installation and it should be able to run there (I did not verify this).

If it is statically linked, it should just work with qemu-riscv64.

If you want to run a dynamically linked RISC-V executable on an x86_64 machine, things get complicated:

$ qemu-riscv64 $BIN_DIR/hello-libc
qemu-riscv64: Could not open '/lib/ld-linux-riscv64-lp64d.so.1': No such file or directory

There is no such ELF loader ("ELF interpreter"), /lib/ld-linux-riscv64-lp64d.so.1, installed, but:

$ apt-file search ld-linux-riscv64-lp64d.so
libc6-riscv64-cross: /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1
What does not work to fix this:
  • trying to change the executable's RPATH does not change the hardcoded ELF interpreter path. The interpreter is not a regular shared library, and its path is always absolute.

  • there seems to be no way to convince the linker to use the interpreter in $RV_SYS_DIR/lib/. The GNU toolchain insists on hardcoding a specific ELF interpreter path it was itself configured with.

One (dirty) way to solve this is to symlink $RV_SYS_DIR/lib/ld-linux-riscv64-lp64d.so.1 into /lib/ manually and use LD_LIBRARY_PATH=$RV_SYS_DIR/lib to override system libraries.

A better way is to patch the ELF interpreter and RPATH in (a copy of) the executable:

$ patchelf $BIN_DIR/hello-libc \
    --set-interpreter $RV_SYS_DIR/lib/ld-linux-riscv64-lp64d.so.1 \
    --set-rpath $RV_SYS_DIR/lib

$ qemu-riscv64 $BIN_DIR/hello-libc
Hello world!

Troubleshooting the linking

These are some tricks I found useful to understand what was (not) going on:

  • troubleshooting link failures with cargo build with -vv.

    The actual rustc command is still a hostile lump of text.
    $ cargo build -vv
       Compiling hello-libc v0.1.0 (/home/user/code/lang/arch/riscv/hello-libc)
         Running `CARGO=/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo CARGO_BIN_NAME=hello-libc CARGO_CRATE_NAME=hello_libc CARGO_MANIFEST_DIR=/home/user/code/lang/arch/riscv/hello-libc CARGO_PKG_AUTHORS='' CARGO_PKG_DESCRIPTION='' CARGO_PKG_HOMEPAGE='' CARGO_PKG_LICENSE='' CARGO_PKG_LICENSE_FILE='' CARGO_PKG_NAME=hello-libc CARGO_PKG_REPOSITORY='' CARGO_PKG_RUST_VERSION='' CARGO_PKG_VERSION=0.1.0 CARGO_PKG_VERSION_MAJOR=0 CARGO_PKG_VERSION_MINOR=1 CARGO_PKG_VERSION_PATCH=0 CARGO_PKG_VERSION_PRE='' CARGO_PRIMARY_PACKAGE=1 LD_LIBRARY_PATH='/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib' rustc --crate-name hello_libc --edition=2021 src/main.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 -C metadata=cac59addb6dfe60a -C extra-filename=-cac59addb6dfe60a --out-dir /home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps --target riscv64gc-unknown-linux-gnu -C linker=riscv64-linux-gnu-gcc -C incremental=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/incremental -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps`
        Finished dev [unoptimized + debuginfo] target(s) in 0.62s
    

    Copying the command into $EDITOR and breaking into human-digestible lines helps.

    Alternatively, if you're in the mood for a shell vibe from the 80s, use some `sed`
    $ sed -e 's/ \(CARGO_\|LD_\|-C\|--\|-L\|rustc\|src\)/\n\1/g' < tmp/link-command.txt
    CARGO=/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo
    CARGO_BIN_NAME=hello-libc
    CARGO_CRATE_NAME=hello_libc
    CARGO_MANIFEST_DIR=/home/user/code/lang/arch/riscv/hello-libc
    CARGO_PKG_AUTHORS=''
    CARGO_PKG_DESCRIPTION=''
    CARGO_PKG_HOMEPAGE=''
    CARGO_PKG_LICENSE=''
    CARGO_PKG_LICENSE_FILE=''
    CARGO_PKG_NAME=hello-libc
    CARGO_PKG_REPOSITORY=''
    CARGO_PKG_RUST_VERSION=''
    CARGO_PKG_VERSION=0.1.0
    CARGO_PKG_VERSION_MAJOR=0
    CARGO_PKG_VERSION_MINOR=1
    CARGO_PKG_VERSION_PATCH=0
    CARGO_PKG_VERSION_PRE=''
    CARGO_PRIMARY_PACKAGE=1
    LD_LIBRARY_PATH='/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib:/home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib'
    rustc
    --crate-name hello_libc
    --edition=2021
    src/main.rs
    --error-format=json
    --json=diagnostic-rendered-ansi,artifacts,future-incompat
    --crate-type bin
    --emit=dep-info,link
    -C embed-bitcode=no
    -C debuginfo=2
    -C metadata=cac59addb6dfe60a
    -C extra-filename=-cac59addb6dfe60a
    --out-dir /home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps
    --target riscv64gc-unknown-linux-gnu
    -C linker=riscv64-linux-gnu-gcc
    -C incremental=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/incremental
    -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/riscv64gc-unknown-linux-gnu/debug/deps
    -L dependency=/home/user/code/lang/arch/riscv/hello-libc/target/debug/deps
    

    Tip: rustc -C help is your friend.

  • getting verbose output from gcc and the linker by adding rustflags to the [target.riscv64gc-unknown-linux-gnu] section of .cargo/config.toml:

    rustflags = [
      "-C", "link-arg=-v",                # make gcc more talkative
      "-C", "link-arg=-Wl,--verbose",     # make linker more talkative
    ]
    
  • getting the ELF interpreter via file:

    $ file $BIN_DIR/hello-libc
    ./target/riscv64gc-unknown-linux-gnu/debug/hello-libc: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), 
    dynamically linked, interpreter /usr/riscv64-linux-gnu/lib/ld-linux-riscv64-lp64d.so.1, ...
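
The interpreter path that file prints comes from the PT_INTERP program header of the binary. Here is a minimal sketch of reading it by hand, assuming a 64-bit little-endian ELF image (which is what "ELF 64-bit LSB" above means). The names read_le and elf64_interpreter are made up for this note and come from no library.

```rust
/// Reads a little-endian unsigned integer of `len` bytes (at most 8) at `offset`.
fn read_le(image: &[u8], offset: usize, len: usize) -> Option<u64> {
    let bytes = image.get(offset..offset.checked_add(len)?)?;
    // The last byte is the most significant one, so fold from the end.
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

const PT_INTERP: u64 = 3; // program header type of the interpreter path

/// Returns the PT_INTERP path of a little-endian ELF64 image, if it has one.
fn elf64_interpreter(image: &[u8]) -> Option<&str> {
    // e_ident: magic, EI_CLASS = 2 (64-bit), EI_DATA = 1 (little-endian).
    if image.get(..6)? != b"\x7fELF\x02\x01" {
        return None;
    }
    let phoff = read_le(image, 0x20, 8)? as usize; // e_phoff
    let phentsize = read_le(image, 0x36, 2)? as usize; // e_phentsize
    let phnum = read_le(image, 0x38, 2)? as usize; // e_phnum

    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if read_le(image, ph, 4)? == PT_INTERP {
            let offset = read_le(image, ph.checked_add(8)?, 8)? as usize; // p_offset
            let filesz = read_le(image, ph.checked_add(32)?, 8)? as usize; // p_filesz
            let segment = image.get(offset..offset.checked_add(filesz)?)?;
            // The path is stored NUL-terminated inside the segment.
            let path = segment.split(|&b| b == 0).next()?;
            return core::str::from_utf8(path).ok();
        }
    }
    None // statically linked binaries have no PT_INTERP
}
```

Feeding it the bytes of the binary (for example, the result of std::fs::read) should give the same path that file reports.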
    

Linux "Hello world" in no_std Rust

The Rust-glibc example used glibc, which provided Rust with OS APIs. In this section, the goal is to write the same "Hello world" program without any libc. It must rely only on the RISC-V Linux ABI, reimplementing the required OS APIs as we need them, just like the GNU assembly "Hello world".

This section is inspired by the Embedonomicon. It treats the Linux process sandbox as a kind of "embedded" environment, with complete control over memory, without any external code, and with only the Linux ABI as its "hardware". A custom runtime will be grown as we go.

Tools and references

Ensure that rustup is installed.

Make sure that the rustc target riscv64gc-unknown-none-elf is installed:

$ rustup target add riscv64gc-unknown-none-elf

Unlike riscv64gc-unknown-linux-gnu, it assumes a "baremetal" environment and does not try to link any libraries.

cargo-binutils is not strictly necessary, but it's nice to have cargo objdump and cargo nm:

$ cargo install cargo-binutils

Documentation:

The minimal no_std binary

$ cargo init --name hello-nostd

Set the default target and runner in .cargo/config.toml:

[build]
target = "riscv64gc-unknown-none-elf"   # build for this target by default

[target.riscv64gc-unknown-none-elf]     # configuration for this target
runner = "qemu-riscv64"                 # for `cargo run` to work on x86_64

Set src/main.rs to be #![no_main] and #![no_std].

A no_std environment still requires at least two basic runtime mechanisms, both related to Rust panicking:

  • what to do when unwinding the stack on panic. This is implemented by a #[lang = "eh_personality"] function, or simply waved off in Cargo.toml:

    [profile.dev]
    panic = "abort"
    
    [profile.release]
    panic = "abort"
    

    That said, riscv64gc-unknown-none-elf already assumes "panic-strategy": "abort" by default.

  • a #[panic_handler] function, which is called whenever a panic happens:

    src/main.rs:

    #![no_main]
    #![no_std]
    
    #[panic_handler]
    fn panic_handler(_panic: &core::panic::PanicInfo) -> ! {
        loop {}           // for now, just hang to satisfy the typechecker.
    }
    

#![no_main] means that we should also remove fn main(). We'll get back to it later.

The binary that can be built at this stage does not actually contain any executable code.

$ cargo run
'cargo run' terminated by signal SIGSEGV (Address boundary error)

$ cargo objdump --release -- -d | rustfilt
hello-nostd:    file format elf64-littleriscv

A minimal executable that exits successfully

The output of cargo rustc -- -Z unstable-options --print target-spec-json suggests that riscv64gc-unknown-none-elf uses rust-lld as its default linker. I did not dig into the details, but I guessed that it uses the _start symbol as its default entry point.

Reading the Rust inline assembly guide and translating the knowledge from the assembly "Hello world", we get this:


#![no_main]
#![no_std]

use core::arch::asm;            // to use asm!()

#[panic_handler]
fn panic_handler(_panic: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]                    // for the linker to be able to see `_start`
pub unsafe extern "C"           // everything about this function is unsafe!
fn _start() -> ! {              // does not return
    asm!(
        "ecall",
        in("a7") 93,            // __NR_exit
        in("a0") 0,             // status code 0
        options(noreturn),
    )                           // `noreturn` assigns this block the return type `!`
}

Writing to stdout


// ...
fn _start() -> ! {
    static HELLO: &[u8] = b"Hello world!\n";

    asm!(
        "ecall",
        in("a7") 64,                    // __NR_write
        inlateout("a0") 1 => _,         // STDOUT_FILENO in; the kernel's return value out
        in("a1") HELLO.as_ptr().addr(), // #![feature(strict_provenance)]
        in("a2") HELLO.len(),
        options(readonly),              // expect no changes to memory
    );

    // ...
}
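
One thing the snippet above glosses over: write returns its result in a0, either the number of bytes actually written (which may be less than requested) or a negative errno value. Below is a minimal sketch of a retry loop around it. The names write_all and sys_write are my own invention. The raw syscall is passed in as a closure so that the loop itself can be tried on any host, and the riscv64 wrapper is only compiled for that architecture.

```rust
/// Keeps calling `raw_write` until all of `buf` has been written.
///
/// `raw_write` stands in for the `write` ecall: it gets the bytes still
/// pending and returns what the kernel leaves in `a0`, i.e. the number of
/// bytes written, or a negative errno value on failure.
fn write_all(mut buf: &[u8], mut raw_write: impl FnMut(&[u8]) -> isize) -> Result<(), isize> {
    while !buf.is_empty() {
        let ret = raw_write(buf);
        if ret <= 0 {
            // Negative: errno. Zero: no progress, so give up instead of spinning.
            return Err(ret);
        }
        // Skip what was written; clamp in case `raw_write` over-reports.
        let written = (ret as usize).min(buf.len());
        buf = &buf[written..];
    }
    Ok(())
}

/// Hypothetical wrapper around the RISC-V Linux `write` syscall (number 64).
#[cfg(target_arch = "riscv64")]
fn sys_write(fd: usize, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        core::arch::asm!(
            "ecall",
            in("a7") 64usize,            // __NR_write
            inlateout("a0") fd => ret,   // fd in, result out
            in("a1") buf.as_ptr(),
            in("a2") buf.len(),
            options(nostack, readonly),  // reads memory, never writes it
        );
    }
    ret
}
```

In _start this would be used as write_all(HELLO, |chunk| sys_write(1, chunk)). Both functions only need core, so they should work unchanged under #![no_std].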

Tidying up: linux-rt and its linker script

TODO

Troubleshooting

Getting the JSON spec of the current rustc target (requires nightly):

$ cargo rustc -- -Z unstable-options --print target-spec-json
   Compiling hello-nostd v0.1.0 (/home/user/code/learn/eval/rvemu/riscv/hello-nostd)
{
  "arch": "riscv64",
  "code-model": "medium",
  "cpu": "generic-rv64",
  "data-layout": "e-m:e-p:64:64-i64:64-i128:128-n64-S128",
  "eh-frame-header": false,
  "emit-debug-gdb-scripts": false,
  "features": "+m,+a,+f,+d,+c",
  "is-builtin": true,
  "linker": "rust-lld",
  "linker-flavor": "ld.lld",
  "llvm-abiname": "lp64d",
  "llvm-target": "riscv64",
  "max-atomic-width": 64,
  "panic-strategy": "abort",
  "relocation-model": "static",
  "target-pointer-width": "64"
}
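
The "features" and "target-pointer-width" fields are where the "64gc" in the target name comes from: G is the shortcut for IMAFD, and C is the compressed encoding. Here is a small sketch of that mapping, assuming the base set is always I and the features are single-letter extensions listed in canonical order. The helper names isa_name and abbreviate_general are invented for this note.

```rust
/// Builds an ISA name such as "rv64imafdc" from the `target-pointer-width`
/// and `features` fields of a target spec.
fn isa_name(pointer_width: u32, features: &str) -> String {
    let mut name = format!("rv{pointer_width}i"); // assumption: base set is I
    for feature in features.split(',') {
        // Only enabled features ("+x") contribute to the name.
        if let Some(ext) = feature.trim().strip_prefix('+') {
            name.push_str(ext);
        }
    }
    name
}

/// Collapses the "imafd" run into the "g" shortcut.
fn abbreviate_general(isa: &str) -> String {
    isa.replacen("imafd", "g", 1)
}
```

For the spec above, isa_name(64, "+m,+a,+f,+d,+c") gives "rv64imafdc", and abbreviate_general turns that into "rv64gc".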