RV32I: Memory Access Instructions
Computational instructions only work on the contents of registers. Memory access instructions exchange 8-, 16- and 32-bit values ("bytes", "halfs", "words") between registers and RAM locations.
There are 5 load ("RAM to register") instructions and 3 store ("register to RAM") instructions. Each computes its byte address in RAM as the value of register rs1 plus a sign-extended 12-bit immediate.
2 fence instructions order concurrent accesses to RAM, e.g. from different hardware threads.
Misaligned RAM access is allowed, but can be non-atomic and/or much slower.
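The address arithmetic and the alignment rule can be sketched in C as follows. This is a minimal sketch, not a reference implementation. The helper names sext12, effective_address and is_naturally_aligned are hypothetical, and the immediate is passed as its raw 12-bit field. "Misaligned" is taken to mean "not naturally aligned", which is how the RISC-V specification uses the term.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the raw 12-bit immediate field. */
static int32_t sext12(uint32_t imm12) {
    imm12 &= 0xFFFu;                       /* keep only the 12-bit field */
    return (imm12 & 0x800u) ? (int32_t)imm12 - 0x1000 : (int32_t)imm12;
}

/* Byte address used by loads and stores: rs1 + sext(imm),
   wrapping modulo 2^32 like 32-bit register arithmetic. */
uint32_t effective_address(uint32_t rs1, uint32_t imm12) {
    return rs1 + (uint32_t)sext12(imm12);
}

/* An access of `size` bytes (1, 2 or 4) is naturally aligned when the
   address is a multiple of `size`; anything else is misaligned. */
bool is_naturally_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) == 0;
}
```

For example, an immediate field of 0xFFF means -1, so effective_address(0x1000, 0xFFF) is 0x0FFF. A lw from that address is misaligned.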
Load instructions
instr | description | "C" |
---|---|---|
lw rd, imm(rs1) | "load word" | rd = *(int32_t *)(rs1 + (int32_t)imm); |
lh rd, imm(rs1) | "load half", sign-extend | rd = (int32_t)*(int16_t *)(rs1 + (int32_t)imm); |
lb rd, imm(rs1) | "load byte", sign-extend | rd = (int32_t)*(int8_t *)(rs1 + (int32_t)imm); |
lhu rd, imm(rs1) | "load half unsigned", zero-extend | rd = (uint32_t)*(uint16_t *)(rs1 + (int32_t)imm); |
lbu rd, imm(rs1) | "load byte unsigned", zero-extend | rd = (uint32_t)*(uint8_t *)(rs1 + (int32_t)imm); |
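The difference between the signed and unsigned loads is easiest to see on a byte array standing in for RAM. This is a minimal emulation sketch. The emu_* and read_le names are hypothetical, `addr` is assumed to be the already computed rs1 + sext(imm), and memory is assumed little-endian, which is standard for RISC-V.

```c
#include <stdint.h>

/* Read nbytes starting at addr; byte 0 is the least significant (little-endian). */
static uint32_t read_le(const uint8_t *mem, uint32_t addr, int nbytes) {
    uint32_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | mem[addr + (uint32_t)i];
    return v;
}

uint32_t emu_lw(const uint8_t *mem, uint32_t addr)  { return read_le(mem, addr, 4); }
uint32_t emu_lhu(const uint8_t *mem, uint32_t addr) { return read_le(mem, addr, 2); } /* upper 16 bits = 0 */
uint32_t emu_lbu(const uint8_t *mem, uint32_t addr) { return read_le(mem, addr, 1); } /* upper 24 bits = 0 */

/* Sign-extend: copy bit 15 into the upper 16 bits. */
uint32_t emu_lh(const uint8_t *mem, uint32_t addr) {
    uint32_t v = read_le(mem, addr, 2);
    return (v & 0x8000u) ? (v | 0xFFFF0000u) : v;
}

/* Sign-extend: copy bit 7 into the upper 24 bits. */
uint32_t emu_lb(const uint8_t *mem, uint32_t addr) {
    uint32_t v = read_le(mem, addr, 1);
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}
```

With the bytes 80 FF 34 12 in memory, lb from offset 0 gives 0xFFFFFF80 and lbu gives 0x00000080. The same pair of bytes read as a half gives 0xFFFFFF80 (lh) or 0x0000FF80 (lhu).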
Store instructions
instr | description | "C" |
---|---|---|
sw rs2, imm(rs1) | "store word" | *(int32_t *)(rs1 + (int32_t)imm) = rs2; |
sh rs2, imm(rs1) | "store half" | *(int16_t *)(rs1 + (int32_t)imm) = (int16_t)rs2; |
sb rs2, imm(rs1) | "store byte" | *(int8_t *)(rs1 + (int32_t)imm) = (int8_t)rs2; |
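Stores are simpler because nothing is extended: only the low 8, 16 or 32 bits of rs2 are written. This is a minimal sketch under the same assumptions as a load emulator would use: a little-endian byte array as RAM and an already computed address. The emu_* and write_le names are hypothetical.

```c
#include <stdint.h>

/* Write the low nbytes of value, least significant byte first. */
static void write_le(uint8_t *mem, uint32_t addr, uint32_t value, int nbytes) {
    for (int i = 0; i < nbytes; i++) {
        mem[addr + (uint32_t)i] = (uint8_t)(value & 0xFFu);
        value >>= 8;
    }
}

void emu_sw(uint8_t *mem, uint32_t addr, uint32_t rs2) { write_le(mem, addr, rs2, 4); }
void emu_sh(uint8_t *mem, uint32_t addr, uint32_t rs2) { write_le(mem, addr, rs2, 2); } /* low 16 bits */
void emu_sb(uint8_t *mem, uint32_t addr, uint32_t rs2) { write_le(mem, addr, rs2, 1); } /* low 8 bits */
```

After sw of 0x11223344, the bytes in RAM are 44 33 22 11. A following sh of 0xAABBCCDD to the same address overwrites only the first two bytes, giving DD CC 22 11.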
Fence instructions
instr | description |
---|---|
fence pred, succ | an explicit barrier for the specified kinds of concurrent memory accesses |
fence.i | an explicit barrier between stores to RAM and subsequent instruction fetches from it (e.g. for code written at runtime) |
When multiple harts (hardware threads, such as separate cores) are present and share the same RAM, software must control how changes made by one hart are perceived by another.
Some architectures provide strong ordering guarantees. The strongest, sequential consistency, guarantees that any observed state can be explained by some interleaving of the harts' sequential instruction streams. x86_64 (ahem) comes close: its total store order (TSO) only allows a core's store to become visible after that core's later loads. Such strong models make it easier to reason about machine code, but can significantly complicate hardware: speculative and out-of-order execution must maintain a separate externally visible state that respects the ordering.
Since different harts work with different areas of RAM most of the time, RISC-V uses a relaxed memory model (RVWMO), which requires explicit synchronization where it is needed.
A fence instruction provides an ordering guarantee between memory accesses before and after the fence. The arguments describe:
- the predecessor set: kinds of accesses by prior instructions that must be completed before the fence
- the successor set: kinds of accesses by subsequent instructions that must not start before the fence
The kinds of accesses are:
- R: "read memory"
- W: "write memory"
- I: "device input"
- O: "device output"
E.g. fence rw, w guarantees that all reads and writes by preceding instructions appear completed before any write by a subsequent instruction.
Note: reads by subsequent instructions are not ordered by this fence and can happen before it.
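In portable C, this kind of ordering is usually requested through C11 fences rather than written as fence instructions directly. The sketch below shows the classic message-passing pattern. The names payload, ready, publish and consume are hypothetical. The RISC-V manual's recommended C/C++ mapping translates the release fence into fence rw, w and the acquire fence into fence r, rw, but the exact instructions are up to the compiler (older compilers may emit a full fence instead).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static int32_t payload;      /* ordinary, non-atomic data */
static atomic_bool ready;    /* publication flag; static storage starts as false */

/* Producer: write the data, then raise the flag. The release fence keeps
   the payload store from becoming visible after the flag store. */
void publish(int32_t value) {
    payload = value;
    atomic_thread_fence(memory_order_release);   /* e.g. fence rw, w */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: wait for the flag, then read the data. The acquire fence keeps
   the payload load from being performed before the flag load. */
int32_t consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* spin until the producer has published */
    }
    atomic_thread_fence(memory_order_acquire);   /* e.g. fence r, rw */
    return payload;
}
```

In real use, publish and consume run on different harts. Without the two fences, a relaxed memory model would allow the consumer to see ready == true and still read a stale payload.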
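A fence.i synchronizes data accesses and instruction fetches: instruction fetches by a hart after its fence.i see all stores made by the same hart before it, which is needed when code is written to RAM and then executed. fence.i alone does not make stores by one hart visible to instruction fetches of another hart. For that, the writing hart has to execute a data fence, and then the executing hart has to execute fence.i. (Newer versions of the specification move fence.i out of the base ISA into the Zifencei extension.)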
Encoding
Stores are in S-type format:
instr | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
---|---|---|---|---|---|---|
sb | | | | 000 | | 01 000 11 |
sh | | | | 001 | | 01 000 11 |
sw | | | | 010 | | 01 000 11 |
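To check the S-type layout, a store can be assembled by hand. This is a minimal sketch. encode_store is a hypothetical name, registers are passed as plain numbers 0..31 (x0..x31), and funct3 is 0, 1 or 2 for sb, sh or sw. The key detail is that the immediate is split: imm[11:5] goes to bits 31:25 and imm[4:0] goes to bits 11:7, so rs1 and rs2 stay at the same positions as in other formats.

```c
#include <stdint.h>

/* Assemble an S-type store: sb/sh/sw differ only in funct3 (0/1/2). */
uint32_t encode_store(uint32_t funct3, uint32_t rs2, uint32_t rs1, int32_t imm) {
    uint32_t u = (uint32_t)imm & 0xFFFu;      /* 12-bit two's-complement offset */
    return ((u >> 5) << 25)                   /* imm[11:5] -> bits 31:25 */
         | ((rs2 & 0x1Fu) << 20)              /* rs2       -> bits 24:20 */
         | ((rs1 & 0x1Fu) << 15)              /* rs1       -> bits 19:15 */
         | ((funct3 & 0x7u) << 12)            /* funct3    -> bits 14:12 */
         | ((u & 0x1Fu) << 7)                 /* imm[4:0]  -> bits 11:7  */
         | 0x23u;                             /* opcode 01 000 11        */
}
```

For example, encode_store(2, 11, 10, 0) gives 0x00B52023, which is sw x11, 0(x10).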
The following instructions are in I-type format:
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
lb | | | 000 | | 00 000 11 |
lh | | | 001 | | 00 000 11 |
lw | | | 010 | | 00 000 11 |
lbu | | | 100 | | 00 000 11 |
lhu | | | 101 | | 00 000 11 |
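Loads are in I-type format, so the whole 12-bit immediate sits contiguously in bits 31:20. This is a minimal sketch. encode_load is a hypothetical name, registers are plain numbers 0..31, and funct3 is 0, 1, 2, 4 or 5 for lb, lh, lw, lbu or lhu.

```c
#include <stdint.h>

/* Assemble an I-type load: funct3 selects lb/lh/lw/lbu/lhu (0/1/2/4/5). */
uint32_t encode_load(uint32_t funct3, uint32_t rd, uint32_t rs1, int32_t imm) {
    return (((uint32_t)imm & 0xFFFu) << 20)   /* imm[11:0] -> bits 31:20 */
         | ((rs1 & 0x1Fu) << 15)              /* rs1       -> bits 19:15 */
         | ((funct3 & 0x7u) << 12)            /* funct3    -> bits 14:12 */
         | ((rd & 0x1Fu) << 7)                /* rd        -> bits 11:7  */
         | 0x03u;                             /* opcode 00 000 11        */
}
```

For example, encode_load(2, 1, 2, 12) gives 0x00C12083, which is lw x1, 12(x2).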
instr | imm[11:0] | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|
fence | 0000 pred succ | 00000 | 000 | 00000 | 00 011 11 |
fence.i | 0000 0000 0000 | 00000 | 001 | 00000 | 00 011 11 |
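The least significant byte of the instruction looks like:
- 03 / 83 for loads
- 23 / A3 for stores
- 0F for fences
Bits 6:0 hold the opcode. Bit 7 is the lowest bit of rd for loads and of imm for stores, while fences have rd = x0.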
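The pred and succ fields are 4-bit masks with one bit per kind of access, ordered I, O, R, W from most to least significant bit. E.g. fence rw, w has pred = 0011 and succ = 0001, and a plain fence is shorthand for fence iorw, iorw (pred = succ = 1111).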
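A fence can be assembled the same way as the other I-type instructions. This is a minimal sketch that assumes the standard layout from the RISC-V unprivileged specification. In that layout, fm = 0000 is in bits 31:28, pred is in bits 27:24 and succ is in bits 23:20. Within each mask the bits are I, O, R, W from most to least significant. encode_fence and the FENCE_* constants are hypothetical names.

```c
#include <stdint.h>

/* One bit per kind of access inside a 4-bit pred/succ mask. */
enum { FENCE_W = 1, FENCE_R = 2, FENCE_O = 4, FENCE_I = 8 };

/* Assemble `fence pred, succ` with fm = 0000 and rs1 = rd = x0. */
uint32_t encode_fence(uint32_t pred, uint32_t succ) {
    return ((pred & 0xFu) << 24)   /* pred -> bits 27:24 */
         | ((succ & 0xFu) << 20)   /* succ -> bits 23:20 */
         | 0x0Fu;                  /* funct3 000, opcode 00 011 11 */
}
```

For example, encode_fence(FENCE_R | FENCE_W, FENCE_W) gives 0x0310000F for fence rw, w, and encode_fence(0xF, 0xF) gives 0x0FF0000F for a plain fence.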