RV32I: Memory Access Instructions

Computational instructions only work on the contents of registers. Memory access instructions exchange 8-, 16- and 32-bit values ("bytes", "halfs", "words") between registers and RAM locations.

There are 5 load ("RAM to register") instructions and 3 store ("register to RAM") instructions. Each one addresses RAM by byte: the address is the value in register rs1 plus a 12-bit immediate sign-extended to 32 bits.
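
A minimal C sketch of that address calculation, for illustration only. The names sign_extend_imm12 and effective_address are made up here and do not come from any specification or library; the immediate is passed as the raw 12-bit field taken from the instruction word.

```c
#include <stdint.h>

/* Sign-extend the low 12 bits of an instruction's immediate field. */
int32_t sign_extend_imm12(uint32_t imm12)
{
    imm12 &= 0xFFFu;                 /* keep only the 12 encoded bits */
    /* Bit 11 set means the value is negative: subtract 2^12. */
    return (imm12 & 0x800u) ? (int32_t)imm12 - 0x1000 : (int32_t)imm12;
}

/* Byte address used by a load or store: rs1 + sext(imm), wrapping at 32 bits. */
uint32_t effective_address(uint32_t rs1, uint32_t imm12)
{
    return rs1 + (uint32_t)sign_extend_imm12(imm12);
}
```

So the reachable offsets are -2048..2047 bytes around rs1; e.g. an immediate field of 0xFFC addresses the word just below rs1.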

2 fence instructions serialize concurrent accesses to RAM from different hardware threads.

Whether misaligned RAM access is allowed depends on the execution environment: it may trap, and where it is supported it can be non-atomic and/or much slower.

Load instructions

instr               description                   "C"
lw  rd, imm(rs1)    "load word"                   int32_t *ptr = rs1 + (int32_t)imm;
                                                  rd = *ptr;
lh  rd, imm(rs1)    "load half", sign-extend      int16_t *p = rs1 + (int32_t)imm;
                                                  rd = (int32_t)*p;
lb  rd, imm(rs1)    "load byte", sign-extend      int8_t *p = rs1 + (int32_t)imm;
                                                  rd = (int32_t)*p;
lhu rd, imm(rs1)    "load half unsigned"          uint16_t *p = rs1 + (int32_t)imm;
                                                  rd = (uint32_t)*p;
lbu rd, imm(rs1)    "load byte unsigned"          uint8_t *p = rs1 + (int32_t)imm;
                                                  rd = (uint32_t)*p;
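
The difference between the signed and unsigned loads can be modeled in C over a flat byte array. This is a sketch under assumptions: RAM is little-endian (the usual RISC-V configuration), the names read_le and load_* are hypothetical, and alignment and bounds checks are left out.

```c
#include <stdint.h>

/* Read nbytes (1, 2 or 4) starting at addr, least significant byte first. */
uint32_t read_le(const uint8_t *ram, uint32_t addr, unsigned nbytes)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < nbytes; i++)
        value |= (uint32_t)ram[addr + i] << (8 * i);
    return value;
}

/* Each function returns the 32-bit value that ends up in rd. */
uint32_t load_lw(const uint8_t *ram, uint32_t addr)
{
    return read_le(ram, addr, 4);
}

uint32_t load_lh(const uint8_t *ram, uint32_t addr)
{
    /* Narrow to 16 bits; the signed casts then replicate bit 15 upwards. */
    return (uint32_t)(int32_t)(int16_t)read_le(ram, addr, 2);
}

uint32_t load_lb(const uint8_t *ram, uint32_t addr)
{
    /* Narrow to 8 bits; the signed casts then replicate bit 7 upwards. */
    return (uint32_t)(int32_t)(int8_t)read_le(ram, addr, 1);
}

uint32_t load_lhu(const uint8_t *ram, uint32_t addr)
{
    return read_le(ram, addr, 2);   /* upper 16 bits stay zero */
}

uint32_t load_lbu(const uint8_t *ram, uint32_t addr)
{
    return read_le(ram, addr, 1);   /* upper 24 bits stay zero */
}
```

With the byte 0x80 in RAM, lb produces 0xFFFFFF80 while lbu produces 0x00000080.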

Store instructions

instr               description      "C"
sw rs2, imm(rs1)    "store word"     *(int32_t *)(rs1 + (int32_t)imm) = rs2;
sh rs2, imm(rs1)    "store half"     *(int16_t *)(rs1 + (int32_t)imm) = (int16_t)rs2;
sb rs2, imm(rs1)    "store byte"     *(int8_t *)(rs1 + (int32_t)imm) = (int8_t)rs2;
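
Stores write only the low bytes of rs2 and leave neighbouring bytes alone. A small C model of that, assuming little-endian RAM; write_le and the store_* helpers are illustrative names, with no alignment or bounds checks.

```c
#include <stdint.h>

/* Write the low nbytes (1, 2 or 4) of value at addr, least significant byte first. */
void write_le(uint8_t *ram, uint32_t addr, uint32_t value, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++)
        ram[addr + i] = (uint8_t)(value >> (8 * i));
}

void store_sw(uint8_t *ram, uint32_t addr, uint32_t rs2) { write_le(ram, addr, rs2, 4); }
void store_sh(uint8_t *ram, uint32_t addr, uint32_t rs2) { write_le(ram, addr, rs2, 2); }
void store_sb(uint8_t *ram, uint32_t addr, uint32_t rs2) { write_le(ram, addr, rs2, 1); }
```

There is no signed/unsigned variant for stores: truncation to 16 or 8 bits gives the same bytes either way.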

Fence instructions

instr               description
fence pred, succ    an explicit barrier for the specified kinds of concurrent memory accesses
fence.i             an explicit barrier for writing and executing instructions in RAM concurrently

When multiple harts (hardware threads, "cores") are present and share the same RAM, it is necessary to control how changes made by one hart are perceived by another.

Some architectures provide a strong memory model. The strongest one, sequential consistency, guarantees that any observed state can be explained by some interleaving of the harts' sequential accesses; x86_64 (ahem) comes close with total store order, which only lets a store become visible after later loads by the same core. A strong model makes it easier to reason about machine code, but can significantly complicate hardware: speculative and out-of-order execution must still maintain a separate externally visible state that obeys the model.

Since different harts work with different areas of RAM most of the time, RISC-V assumes a relaxed memory model, which requires explicit synchronization when needed.

A fence instruction provides an ordering guarantee between memory accesses before and after the fence. The arguments describe:

  1. the predecessor set: kinds of accesses by prior instructions that must be completed before the fence
  2. the successor set: kinds of accesses by subsequent instructions that must not start before the fence

The kinds of accesses are:

  • R: "read memory"
  • W: "write memory"
  • I: "device input"
  • O: "device output"

E.g. fence rw, w guarantees that all reads and writes by preceding instructions appear completed before any write by a subsequent instruction becomes visible: writes after the fence cannot be reordered ahead of it. Note: reads by subsequent instructions can still happen before this fence.
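
In C these fences are usually reached through <stdatomic.h> rather than written by hand. The sketch below assumes the conventional compiler mapping for RISC-V, where a release fence becomes fence rw, w and an acquire fence becomes fence r, rw; payload, ready, publish and try_consume are illustrative names only.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload;     /* plain data handed from one hart to another */
static atomic_uint ready;    /* flag announcing that payload is valid */

/* Producer: the payload write must be visible before the flag write. */
void publish(uint32_t value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* typically fence rw, w */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and fills *out once the flag has been observed. */
int try_consume(uint32_t *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);   /* typically fence r, rw */
    *out = payload;
    return 1;
}
```

Without the release fence the flag write could become visible before the payload write, and a consumer on another hart could read stale data.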

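A fence.i synchronizes data accesses and instruction fetches on the same hart: it guarantees that instruction fetches after the fence see all stores that this hart made before it. E.g. code that writes instructions to RAM and then jumps to them must execute fence.i in between. It does not by itself make those stores visible to instruction fetches of other harts; for that, the writing hart executes a data fence and every other hart executes its own fence.i before running the new code. Recent versions of the specification moved fence.i from base RV32I into the separate Zifencei extension.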

Encoding

Stores are in S-type format:

instr   imm[11:5]   rs2   rs1   funct3   imm[4:0]   opcode
sb                              000                 01 000 11
sh                              001                 01 000 11
sw                              010                 01 000 11
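
A hedged C sketch of packing an S-type store from its fields, following the column order of the table. encode_store and the FUNCT3_* constants are names invented for this example; register numbers are plain 0..31 and the immediate is a signed offset in the range -2048..2047.

```c
#include <stdint.h>

enum { FUNCT3_SB = 0, FUNCT3_SH = 1, FUNCT3_SW = 2 };

/* Build the 32-bit instruction word for sb/sh/sw rs2, imm(rs1). */
uint32_t encode_store(unsigned funct3, unsigned rs1, unsigned rs2, int32_t imm)
{
    uint32_t uimm = (uint32_t)imm & 0xFFFu;      /* 12-bit two's complement */
    return ((uimm >> 5) << 25)                   /* imm[11:5] */
         | ((uint32_t)(rs2 & 31u) << 20)
         | ((uint32_t)(rs1 & 31u) << 15)
         | ((uint32_t)(funct3 & 7u) << 12)
         | ((uimm & 31u) << 7)                   /* imm[4:0] */
         | 0x23u;                                /* opcode 01 000 11 */
}
```

For example, encode_store(FUNCT3_SW, 2, 5, 8) yields 0x00512423, which is sw x5, 8(x2). The immediate is split in two so that rs1 and rs2 stay at the same bit positions as in other formats.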

The following instructions are in I-type format:

instr   imm[11:0]   rs1   funct3   rd   opcode
lb                        000           00 000 11
lh                        001           00 000 11
lw                        010           00 000 11
lbu                       100           00 000 11
lhu                       101           00 000 11
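
Going the other way, an I-type word can be split back into its fields. The struct and function names below are hypothetical; the bit positions follow the standard I-type layout (opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, immediate in 31:20).

```c
#include <stdint.h>

/* Fields of an I-type instruction word. */
struct itype_fields {
    unsigned opcode;   /* bits 6:0 */
    unsigned rd;       /* bits 11:7 */
    unsigned funct3;   /* bits 14:12 */
    unsigned rs1;      /* bits 19:15 */
    int32_t  imm;      /* bits 31:20, sign-extended */
};

struct itype_fields decode_itype(uint32_t insn)
{
    struct itype_fields f;
    f.opcode = insn & 0x7Fu;
    f.rd     = (insn >> 7) & 0x1Fu;
    f.funct3 = (insn >> 12) & 0x7u;
    f.rs1    = (insn >> 15) & 0x1Fu;
    f.imm    = (int32_t)(insn >> 20);   /* 0..4095 at this point */
    if (f.imm & 0x800)
        f.imm -= 0x1000;                /* bit 11 set: negative offset */
    return f;
}
```

For example, the word 0x00812283 decodes to opcode 0x03, funct3 010, rd 5, rs1 2 and imm 8, i.e. lw x5, 8(x2).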

instr     imm[11:0]        rs1     funct3   rd      opcode
fence     0000 pred succ   00000   000      00000   00 011 11
fence.i   0000 0000 0000   00000   001      00000   00 011 11
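
A sketch of building the two fence words in C. It assumes the mask layout given in the RISC-V unprivileged specification: pred in instruction bits 27:24, succ in bits 23:20, and within each mask the bits I, O, R, W from most to least significant. The FENCE_* constants and function names are invented for this example.

```c
#include <stdint.h>

/* One bit per access kind inside a 4-bit pred or succ mask. */
enum { FENCE_W = 1, FENCE_R = 2, FENCE_O = 4, FENCE_I = 8 };

/* fence pred, succ: upper 4 immediate bits, rs1 and rd are all zero. */
uint32_t encode_fence(unsigned pred, unsigned succ)
{
    return ((uint32_t)(pred & 0xFu) << 24)
         | ((uint32_t)(succ & 0xFu) << 20)
         | 0x0Fu;                          /* opcode 00 011 11, funct3 000 */
}

/* fence.i: everything zero except funct3 = 001 and the opcode. */
uint32_t encode_fence_i(void)
{
    return (1u << 12) | 0x0Fu;
}
```

Under these assumptions encode_fence(FENCE_R | FENCE_W, FENCE_W) gives 0x0310000F for fence rw, w, and encode_fence_i() gives 0x0000100F.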

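The least significant byte of an encoded instruction holds the 7-bit opcode plus one more bit (the lowest bit of rd, or of imm[4:0] for stores), so it looks like: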

  • 03/83 for loads
  • 23/A3 for stores
  • 0F for fences

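The pred and succ masks are 4 bits each: pred sits in instruction bits 27:24 and succ in bits 23:20, and within each mask the bits stand for I, O, R, W from most to least significant.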