iota
NOTE: this is an old draft, the current design document is at https://dmytrish.net/lib/iota/
A low-level language aiming to be small and principled:
- Reifies abstractions from full linear/ordered dependent typing to the actual bare-metal bytes manipulation
- Nothing should be forced to be `unsafe` at any possible level of compilation or execution, as long as you can or want to prove safety.
- A limited, ascetic and opinionated set of language features
- Unambiguous, simple, minimalistic syntax; context-free grammar
- Having nested dot-namespaces and context-aware completion is a must.
- Make everything (threads, stacks, allocators, effects, assumptions) as explicit as possible, while keeping it not too verbose.
- Every bit of data is transparent, structured and accounted for. Its meaning can be erased only explicitly.
- Everything that happens at runtime can be reflected on. A systems language is more than just a compiler blueprint.
Inspired by:
- Golang: syntax, syntactic minimalism
- Rust: safe memory management, traits, namespaces
- Zig: compile-time evaluation, explicit allocators.
- Swift: witness tables, layout-agnostic structs.
- Austral: linear types, capabilities, runtime minimalism
- Lean: proofs, properties, standard library, syntax?
References:
Syntax
Go-style, minimalistic (no semicolons, less parentheses, less special characters). AST maps into const-datastructures that can be manipulated by const-functions to enable metaprogramming.
```
# a one-line comment.
# the language is built around databags, which are like Lua tables.
bag_example = (
    key1 = value
    /*0 = */ val
    nested_bag = (one = 1, two = 2)  # if on one line, use commas
)
# <name>.iota files are databags themselves. Directories are databags.
```
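The databag above can be modeled as an ordered associative array where entries without an explicit key get implicit consecutive integer keys, Lua-style. A minimal Python sketch of that behavior (the `Bag` class and its API are illustrative, not the iota semantics):

```python
# A toy model of a databag: insertion-ordered mapping with implicit
# integer keys for positional entries, like a Lua table.
class Bag:
    def __init__(self, *positional, **named):
        self.entries = {}                 # Python dicts preserve insertion order
        for i, v in enumerate(positional):
            self.entries[i] = v           # implicit keys 0, 1, 2, ...
        self.entries.update(named)

    def __getitem__(self, key):
        return self.entries[key]

# bag_example = (key1 = "value", /*0 = */ "val", nested_bag = (one = 1, two = 2))
bag = Bag("val", key1="value", nested_bag=Bag(one=1, two=2))
assert bag[0] == "val"                    # implicit integer key
assert bag["key1"] == "value"             # explicit key
assert bag["nested_bag"]["two"] == 2      # nesting
assert list(bag.entries) == [0, "key1", "nested_bag"]  # order preserved
```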
```
# functions
sqr [T: has_mul] (x: T): T = { x * x }
s1 = sqr 12        # eagerly, comptime-evaluated to 144
s2 = { sqr 12 }    # curly braces make it lazy
less_than (y: int) (x: int): bool = { x < y }
less_than_42 = less_than 42   # func(x: int): bool { x < 42 }
```
But: Go syntax is not really minimalistic when it comes to lambdas and complex, Lean-level details. Application by juxtaposition is good; parentheses are not necessary. Can Haskell/Lean syntax be made less alien?
The `t: T` syntax for the typing relation is too entrenched to try to express it any other way?
Semantics
Dimensions of the language:
- before-time / const-time / run-time
- capabilities: const-time (e.g. async) / run-time capabilities
- the data language. Data notation. Code is data. Run-time data. Encoding vs schema.
- value aspects: types, traits, lifetimes and linearity, mutability (?).
- source code: unstructured text, semistructure (i.e. "json"), structure (i.e. AST + modules/packages), richer-than-Lisp homoiconicity based on databags. Progressive semantic enrichment: text $\rightarrow$ structure $\rightarrow$ AST (+tags) $\rightarrow$ const-modules. Attribute grammars / ECS-style AST.
- run-time model: packages, dynamic capabilities, memory management (allocations). Error handling.
- practicalities: complexity budget, familiarity, syntax, tooling, testing.
- scaling up and down: easy to start.
Evaluation
The language is stratified into:
- the before-time, everything fixed about the language: AST entities, the full list of capabilities, keywords, kinds, prototypes of builtin types, etc. Schemas for types: U8, UPtr, Array, Dict, AST nodes, EnumCapabilities, Fn0, Fn1, ... ?
- the compile-time, types, AST/metaprogramming, a subset of "lifted" run-time functions and executable semantics. Executable layout/segments/static linking. Dynamic type tables. Simple non-programmable C-style "metatypes", sufficient to program generics.
- the run-time: values, functions, struct values, array values, heap/stack, threads/async, allocators.
The run time can be compiled for any host environment; the compile time should run on WASM/RISC-V emulation (like cksystemsteaching/selfie).
A run time unit is a package. A package is a compiled library that can also have an entry point to run as an executable. Packages are units of translation. Packages are linked (dynamically too?) to run processes. Packages get dynamic capabilities from packages that import them. Steal from Swift: type witness tables and layout-independent interop.
A package consists of static logical namespaces called modules. Modules get their capabilities from their package into their contexts and are statically verified. A module that gets the `proc.memory` capability should be considered unsafe (like unsafe code in Rust): it can violate memory and therefore capability safety. Modules are like Haskell monads or an effect system: they group functions with similar effects (?).
Interop
Goals:
- make C ABI interop rich and powerful (so it works well)
- make C headers interop easy, almost seamless technically (~ `@cImport`)
- make it complex and demanding, proof-heavy when imported (lots of additional static information)
- Rust interop via `extern "interop_abi"`?
- WIT from WASM?
- Q: GC-based languages
Entities
-1. propositions
Propositions are like `Prop` in Coq: they describe some properties of other entities and their relations.
???
0. functions
Function arguments should be clear. The annotation clutter (types, resources/lifetimes, mutability, ref/use-borrow/own, traits, etc.) can be moved out someplace else.
"Functions as abstractions" (typed values) vs "functions as executable code" (resources that consume other resources like stack and executable segments, use and provide capabilities, etc.). A stack frame is basically a struct value (also: good for async and reified state machines). Threads of execution are explicit.
Runtime relies on heavily inlined functions:
- arithmetic operations are functions
- struct field access is a function
- array access is a function
- ...
1. pure values
Some values are pure, "platonic", they carry meaning, type and properties, but do not have a fixed memory encoding (i.e. size, alignment, etc). E.g.
Booleans: `Bool = True | False`
Boolean operations:
```
(b: Bool).not() -> Bool
(b1: Bool).or(b2: Bool) -> Bool
(b1: Bool).and(b2: Bool) -> Bool
```
- ...
Builtin numbers:
- `Bit = 0 | 1`
- `num.Int`: ..., -1, 0, 1, ...; a pure number, without run-time representation guarantees; hidden in `lang.num` by default.
- `Int.zero(): Int`
- `(n: Int).succ(): Int`
- `Int.one(): Int = Int.zero().succ()`
- `(m: Int).add(n: Int): Int`
- `(n: Int).twice(): Int`
- `(n: Int).exp2(): Int`
Builtin witnesses:
- `(num: Type).rangeFrom(start: num)`: a refinement type.
- `(num: Type).rangeTo(end: num)`: up to `end`, non-inclusive.
Aliases:
```
num.Nat: Type = Int.rangeFrom(Int.zero)
num.Fin: (n: Nat) -> Type = Nat.rangeTo(n)
bin.Num: (n: Nat) -> Type = Fin(n.exp2())
```
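The pure-number operations and the `rangeFrom`/`rangeTo` refinements can be modeled with plain Python integers and predicates. This is only an illustration of the intended meaning; the function names mirror the draft, the encoding is not iota's:

```python
# Pure-number operations from the draft, on Python ints.
def succ(n): return n + 1
def twice(n): return 2 * n
def exp2(n): return 2 ** n          # Fin(exp2(n)) = bin.Num(n): n-bit numbers

def range_from(start):
    """rangeFrom: a predicate witnessing start <= x."""
    return lambda x: x >= start

def range_to(end):
    """rangeTo: a predicate witnessing x < end (non-inclusive)."""
    return lambda x: x < end

is_nat = range_from(0)                                    # num.Nat
def fin(n): return lambda x: is_nat(x) and range_to(n)(x) # num.Fin(n)
def bin_num(n): return fin(exp2(n))                       # bin.Num(n)

assert succ(0) == 1 and twice(3) == 6 and exp2(8) == 256
assert is_nat(0) and not is_nat(-1)
assert bin_num(8)(255) and not bin_num(8)(256)  # an 8-bit number holds 0..255
```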
2. memory resources: cells and places
A cell: a piece of memory made of `size: Nat` octets of `Bit`s (assumption: octets of bits).
```
# Addr is an abstract identity of a Cell:
const Addr
func (addr: Addr).space: AddrSpace
func (a1: Addr).is(a2: Addr): Bool

const Cell  # TODO: it is a Resource on Memory
func (p: Cell).addr: Addr
func (p: Cell).size: Nat
func (p: Cell).parent: Resource  # parent Cell?
func (p: Cell).manager: Maybe(Allocator)
```
`Addr` is a member of a set of type `AddrSpace`. It can be many things: a bare `bin.Num` as a C-style pointer, a pair of `(AddrSpace, bin.Num)`, a 16-bit x86 pair of `(Segment, U16)`, a CHERI memory handle (TODO), a hash for a content-addressable store (?), etc.
Process memory segments are `Cell`s without values (but e.g. unused `.bss` can have a witness that all its bits are zero). They are narrowed down to smaller and smaller non-overlapping `Cell`s used as `Place`s for values of specific types. Cells are linear resources.
A place is a combination of a cell + a value type + a binary encoding of values into sequences of bits.
```
with (
    size: Int,
    ty: Type,
    encoder: ty -> Seq(size, U8),
    decoder: Seq(size, U8) -> ty,
) {
    const Place(ty)

    with (p: Place(ty)) {
        p.cell(): Cell
        p.get(): ty
        p.ref(): Ref(ty)
        do p.set(value: ty)  # linear action, consumes and produces
    }
}
```
Q: should the `Place` <-> `Type` relation be 1:1? Or: can a `Cell` be a part of many `Place`s?
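The cell/place split sketched above can be modeled in a few lines of Python: a cell is a raw byte buffer, and a place pairs it with an encoder/decoder for one value type. The class names mirror the draft; the `struct`-based I32 encoding is an assumption for the example:

```python
import struct

class Cell:
    """A memory resource: `size` octets, no interpretation attached."""
    def __init__(self, size):
        self.mem = bytearray(size)
        self.size = size

class Place:
    """Cell + value type + binary encoding, as in the draft's Place(ty)."""
    def __init__(self, cell, encoder, decoder):
        self.cell, self.encoder, self.decoder = cell, encoder, decoder
    def get(self):
        return self.decoder(bytes(self.cell.mem))
    def set(self, value):
        self.cell.mem[:] = self.encoder(value)

# An I32 place: a 4-octet cell with a little-endian two's-complement encoding.
i32 = Place(Cell(4),
            encoder=lambda v: struct.pack('<i', v),
            decoder=lambda b: struct.unpack('<i', b)[0])
i32.set(-42)
assert i32.get() == -42
assert i32.cell.mem == bytearray(struct.pack('<i', -42))
```

The same `Cell` could back a different `Place` with another encoding, which is exactly the 1:n question the draft raises.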
2. arrays
Arrays are fixed-size resources, laid out in contiguous memory (possibly with stride != `item.size`?).
No access by bare `Nat` indexes. Indexes should be opaque and be/have witnesses that they are valid, plus more static metainformation.
```
test {
    let a = do mem.Array(Int, 5)
    # with a statically known size, the compiler can verify this (implicit witness?):
    do a.at(1) = 43
    # out-of-bounds accesses must be handled:
    do a.at(5) = -1 or error("out of bounds")
}

func (a: Array) Reverse() {
    var i0 = a.first()   # witness: i0 < a.len
    var i1 = a.last()    # witness: i1 < a.len
    while i0 < i1 {      # witness: i0 < i1
        a.swap(i0, i1)   # since i0 < i1, these are different places
        do i0.next()     # since i0 < a.len, no need to handle OOB
        do i1.prev()     # since i0 < i1, no need to handle OOB
    }
}
```
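The idea behind `Reverse` — indexes as opaque objects that are only valid by construction, so element accesses need no per-access bounds check — can be sketched in Python. The `Index` class is illustrative; the assertion plays the role of the witness:

```python
# Opaque indexes: valid by construction, moved only while witnesses hold.
class Index:
    def __init__(self, array, pos):
        assert 0 <= pos < len(array)   # the "witness": in bounds at creation
        self.array, self.pos = array, pos
    def __lt__(self, other):
        return self.pos < other.pos
    def next(self): self.pos += 1      # caller holds witness: pos < len - 1 or loop exits
    def prev(self): self.pos -= 1      # caller holds witness: pos > 0

def reverse(a):
    if not a:
        return
    i0, i1 = Index(a, 0), Index(a, len(a) - 1)
    while i0 < i1:                     # witness: i0 < i1
        a[i0.pos], a[i1.pos] = a[i1.pos], a[i0.pos]  # distinct places
        i0.next()
        i1.prev()

xs = [1, 2, 3, 4, 5]
reverse(xs)
assert xs == [5, 4, 3, 2, 1]
```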
TODO: Accommodate Structs of Arrays naturally.
3. structs
Structs are less important and more obfuscated than in many other low-level languages. The notion of a composite value (a typed record) and the memory resource holding it (a `Cell`) is explicitly separated.
Structs do not necessarily live in contiguous blobs of memory (e.g. SoA). Abstract references do not necessarily correspond to a pointer. Access happens via accessors that are efficiently inlined.
4. traits
Can add complexity, but very useful to keep dot-namespaces flexible and extensible. Not sure how to fit them into the language yet.
Types
a) Either full dependent typing with universes (comptime "stages", to sound less bombastic) or b) limited or no type expressivity at compile time itself (?). const-eval should have a fixed set of simple C-like types (ints; chars; structs; fixed arrays?). They are projected into run time and extended by user-defined types.
Yes, structs in compile time + associated methods, because compile time must have capabilities.
Type system: linear types/generics/traits + metaprogrammable, like in Zig.
1. values and functions
2. witnesses and properties
???
Resources
Resources are provided by the execution context. They are linear and should be recycled exactly once.
Capabilities
Dynamic capabilities: run-time objects that allow specific run-time actions. Structured linear collections of values + value-specific actions + limits on actions? $$Capability = (object, action, limits)$$ Capability objects usually allow narrowing down to a narrower capability (either consuming or uniq-ref'ing their parent capability?). Or a narrower capability can be split from the parent (e.g. a buffer from a buffer).
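The $(object, action, limits)$ triple and the narrowing rule can be sketched as a small Python model. The filesystem-path object and the action names are assumptions for the example, not part of the draft:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    obj: str            # e.g. a directory subtree the capability covers
    actions: frozenset  # e.g. {"read", "write"}

    def allows(self, action, path):
        return action in self.actions and path.startswith(self.obj)

    def narrow(self, obj=None, actions=None):
        """Split off a capability that is no broader than this one."""
        obj = obj if obj is not None else self.obj
        actions = frozenset(actions) if actions is not None else self.actions
        # narrowing can only shrink the object and the action set:
        assert obj.startswith(self.obj) and actions <= self.actions
        return Capability(obj, actions)

root = Capability("/home/user", frozenset({"read", "write"}))
ro_docs = root.narrow("/home/user/docs", {"read"})
assert ro_docs.allows("read", "/home/user/docs/a.txt")
assert not ro_docs.allows("write", "/home/user/docs/a.txt")
assert not ro_docs.allows("read", "/etc/passwd")
```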
Static capabilities:
- rewriting code in a scope (e.g. async) during compilation
- ensuring that some dynamic capability cannot be used in a scope (e.g. no `core.mem`, no `core.alloc`, no `core.panic`, no `os`).
It only makes sense for non-native, emulated/JIT execution. Already compiled native code can use them, but should not be trusted not to use other capabilities.
- `proc.memory`: object: the entire process memory; actions: peek, poke, transmute. This is roughly unsafe Rust. This capability is special: having random memory access means having the ability to get all other capabilities as well. Can objects be reduced to specific buffers?
- `proc.alloc`: object: heaps and their memory; actions: malloc, realloc, free.
- `proc.panic`: object: thread; actions: exit/abort/atexit; not giving this to a library gives you `#[no_panic]` in Rust.
- `proc.mmap`: object: process memory segments; actions: mmap/munmap, listing segments of the current process and libraries, managing JIT segments, etc.
- `proc.lib`: object: shared libraries; actions: list, search, dlopen/dlsym/...
- `proc.thread`: objects: pthreads; actions: spawn, kill, synchronize.
- `proc.async`: green threads, async/await? This is an interesting case: importing this capability changes how the compiler generates code ("const-capability"?).
- `os.io`: objects: stdin/stdout/stderr; actions: read/write/close.
- `os.fs`: a list of directories + actions on them; the ability to open/read/write files.
- `os.proc`: interacting with other processes: getpid, fork, exec, kill, pipes, popen, ptrace; IPC?
- `os.time`: a dict of clocks (walltime, monotonic, high-precision, etc.) and timers.
- `os.signal`?
- `os.user`: objects: UIDs/GIDs; actions: getuid/setuid/setgid/...
- `net.dns`: object: DNS domains or wildcards; actions: resolving DNS records.
- `net.ip`: object: IP endpoints; actions: getting `ip addr`/`ip link` info?
- `net.tcp`: object: a TCP endpoint (IP endpoint + TCP port); actions: connecting, send/recv.
- `net.udp`: same as `net.tcp`.
- `gpu.???`: access to GPU contexts, draw calls, textures, shaders, etc.
- `cap.manage`: managing/introspecting capabilities themselves?
Static capabilities.
- async: allows rewriting code of a scope into state machines at compile time.
- no panic: statically guarantees that no panics can come out of this scope.
- no alloc: statically guarantees that no allocations can happen in this scope.
- safe memory: statically verified absence of direct `core.mem` usage.
References:
Open questions:
Safe code built on unsafe code. In other words, the interplay between capabilities and modules/packages. E.g. `std.Rc` must use the `proc.memory` capability.
- but it's in `std`, so it can receive this authority via `std.init()` or something. Then it decides which modules get `proc.memory`?
- Modules get a capability statically: the capability might be present in a loaded package instance, but statically verified not to be used in a specific module.
- Getting a capability via memory introspection: possible using `proc.memory`.
Capabilities might be indirect:
- writing to a file on NFS causes network requests;
- the ability to write `/proc/self/mem` means having `core.mem` (no language is truly memory safe on Linux!).
- Opaque data can smuggle dynamic capabilities through a module that prohibits this capability (e.g. passing a reference to an allocator through a no-alloc module)?
- Any piece of code with access to capabilities can implement a Turing-complete interpreter that makes static analysis of effects undecidable in general, and then provide its capabilities to third-party code indirectly. Is static analysis of capabilities feasible at all? How limited would a statically-verifiable version be?
Run time
Values:
```
// byte := uint8('h')
let byte = 'h' : U8
let x = 12 : I32
let pi = 3.14 : F32
```
Structs/records:
```
// shortcut (?): struct Point { ... }
const Point {
    x: F64
    y: F64
}

// shortcut: `impl Point` (?)
Point = trait {
    radial = (&self) func() {
        return .x
    }
}
```
Minimalistic bootstrapping using the `proc.memory` capability.
Memory management
Run-time behavior depends on given capabilities. Modules can be allocating or non-allocating. A package can request `proc.alloc`; then it is considered alloc-dependent. A sum of dependent package capabilities must be computed to ensure that a package is non-alloc (or e.g. non-panic). `proc.alloc` can be a heap memory allocator or a garbage-collecting allocator. Each package (module?) instance can use its own allocator.
Linear types + defer for resources that must be freed, including heap allocations?
Low-level, static memory management (i.e. ~ Rust `core`) in const-eval?
Compile-time
Heavily unified around `const` functions and values, Zig-style, with possible shortcuts.
The basic building block is const-functions, compile time functions. Modules are const-functions; imports are const-calls of modules. Types are const-values.
Compile-time evaluation: WASM interpreter? Constrain to terminating evaluations only?
const-data language
Const-data language should be (at the same time!)
- a subset of the runtime language to be familiar (?)
- it would be nice to have it more dynamic and declarative than the run-time language. A Jsonnet-like data language?
Q: Are these goals compatible at all?
Parentheses delimit a "databag" structure: an ordered associative array (like a Lua table, possibly with implicit indexes).
```
(
    /*0 = */ 12,   // a literal, with implicit key 0
    key = value,   // explicit pair of const-literals "key", "value"
)
```
A function argument list is a bag at comptime. A record definition is a bag. So are braces and source code files? Zig: files are just const structs.
Modules
| | Iota | Go | Rust | Haskell |
|---|---|---|---|---|
| | ? | "pkg.go.dev" | "crates.io" | a global root |
| other namespaces | ? | "github.com/", etc. | cargo configurable? | cabal configurable? |
| "project" | ? | module, "go.mod" | a cargo workspace or project | a cabal unit |
| "library" | ? | package | a target in a project | a sub-unit |
| "binary" | ? | package main | a target in a project | a sub-unit |
| Translation unit | ? | = package | = crate | = file |
module namespaces
Go-style, one implicit global root like "pkg.iota.dev/", optionally other roots like "github.com/org/". These are root packages that include other packages recursively? The universe of packages is just a lazy bag, like in Nix?
A package file tree of a package `my-pkg`:
```
my-sub-pkg/
    package.iota
    ...
assets/        # static resources
build/         # like target/ in Cargo
docs/
deps/
    github.com/org/package/import.iota
    github.com/org/pkg/subpkg/import.iota
examples/
exe/           # like cmd/ in Go, app/ in Cabal?
    binary1.iota
lib/           # like pkg/ in Go
    pubmod1.iota
    pubmod1.test.iota
    pubmod2.iota
src/           # like internal/ in Go
    internal1.iota
test/          # like tests/ in Cargo?
package.iota
```
It is equivalent to a table:
```
(
    package = @import("package.iota")
    lib = (
        pubmod1 = @import("lib/pubmod1.iota")
        pubmod2 = @import("lib/pubmod2.iota")
    )
    exe = (
        binary1 = @import("exe/binary1.iota")
    )
    src = (internal1 = @import("src/internal1.iota"))
    deps = (
        `github.com`/org/pkg = @import("deps/github.com/org/package/import.iota")
    )
    ...
)
```
What's a package?
- a const-value of:
  - metadata
    - name, description, url, license, etc.
  - dependencies
  - list of its modules
    - metadata
- a const-trait that can `build()`/`test()`/etc. its values?
- iotac executable == "iota const evaluator". iota-"cargo" is just a library.
  - `$ iotac build --target=binary1` == run const-fn `package.build(target = binary1)`
  - `$ iotac test` == run const-fn `package.test()`
  - ...
`package.iota` is a `go.mod`-like project file, an entry point to building. A mix of Cargo.toml, Deno's main file, go.mod, etc.
```
// my_package = package(
// package my_package(ctx: Context)  // shortcut
// like cargo.toml, key `name` set to `my_package`:
name = "my_package"
description = "foos bars"
url = "author/lib"   // if uploaded to the main registry
dependencies = (
    "github.com/org/lib" = (version = "v0.1"),
)
require = (
    // the list of capabilities required from the environment
)
// Q: how to describe build targets and tests?
// Q: how to include modules?
//    like `mod mod1;` in Rust, Go-style syntactically?
// end
```
binary vs library
Entry points require a uniq ref to a process resource (borrowed from the OS).
module definition
```
// shortcut: module main(ctx: Context) (...) { ... }
const main module(ctx: Context) (
    doc = "module documentation goes here?"
    // public definitions, module "signature"
) is
    // imports
    const std = ctx.import('std')
    // private definitions, implementations
end  // closes `is`; optional at EOF
```
module imports
```
// full form
self.import(
    "std"                   // std = "pkg.iota.dev/std"
    "github.com/org/lib"    // lib = "github.com/org/lib"
        rename "name"
)
```
module contexts
The context of a module defines the list of accessible static capabilities.
An executable gets the root dynamic capability from the environment (e.g. from the ELF loader or another loader middleware).
A library gets its capabilities from its importer.
Standard library
logic.bool
The basic building block for data, the two-valued inductive type `bool`. Will become a machine bit.
logic.prop
The basic building block for proofs, types and reasoning about code.
logic.pair
The basic constructor for complex heterogeneous data. It's a building block for records.
logic.list
Type-wise: the basic constructor for homogeneous data. It's a building block for arrays, vectors, etc.
logic.lambda
Type-wise: structural expressions for untyped lambda, which are used later by a prover to manipulate terms. The basic building block for describing semantics of (unrestricted, not linear) code.
Q: should this be dependently-typed right away?
logic.linear
Augments `logic.lambda` with linear typing. It's a basic building block for describing resources, e.g. heap allocations and memory segments.
Describes the algebra of resources: allocation, single use, deallocation.
logic.ordered
Augments `logic.lambda` and `logic.linear` with an ordered type system (see Advanced Topics in Types and Programming Languages by Pierce). A building block for describing stacks and stacked contexts.
Describes the algebra of stacked resources, e.g. lifetimes.
logic.nat, logic.num, logic.fin, logic.field
Basic Peano numbers to be used in proofs. Their properties.
Finite enumerations and sets. Especially of size $2^n$.
(Finite?) fields and their properties. Especially on $2^n$-sized sets (Zig-like `u2`, `u3`, ..., `u8`, `u9`, ...).
logic.array
`vec n T` in dependent types.
A fixed-size sequence of items of some type, indexed by `fin`.
A special case: $2^n$-sized arrays (useful, because can be totally indexed by a bit sequence).
data.byte, data.word
Describe fixed-size sequences of `bool`s: 8 `bool`s for `byte`, 32/64 `bool`s for `word`.
Defines the usual CPU operations on them (`shl`, `shr`, `not`, `and`, ...).
Defines $\tt{u8} = \tt{finfield}(+, 2^8)$, and a mapping $\tt{byte} \iff \tt{u8}$.
The same for `u64` as $\tt{word} \iff \tt{finfield}(+, 2^{64})$.
Describes the two's-complement mapping $[-128 .. 128) \iff \tt{i8}$ and $[-2^{63} .. 2^{63}) \iff \tt{i64}$. Operations on signed numbers.
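The `byte` $\iff$ `u8` mapping and the two's-complement reinterpretation can be checked concretely. A small Python sketch, using Python ints as the "pure" numbers (the function names are illustrative):

```python
def to_u8(bits):
    """8 bools/bits (LSB first) -> u8 in [0, 2^8)."""
    return sum(b << i for i, b in enumerate(bits))

def i8_from_u8(u):
    """Two's-complement reinterpretation: [0, 2^8) -> [-128, 128)."""
    return u - 256 if u >= 128 else u

byte = [1, 1, 1, 1, 1, 1, 1, 1]   # all eight bits set
assert to_u8(byte) == 255
assert i8_from_u8(255) == -1      # 0xFF is -1 in i8
assert i8_from_u8(128) == -128    # the most negative i8
assert i8_from_u8(127) == 127     # non-negative values map to themselves
# u8 addition is addition in the ring Z/2^8: it wraps around
assert (200 + 100) % 256 == 44
```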
data.utf8
Describes a sequence of `byte`s which is a correct UTF-8-encoded Unicode-ish string. A basic building block for symbols?
data.buffer
Describes a linearly typed array of `byte`s which can be read and written into (for `byte`, `word`, maybe `utf8`). Algebra of linearly typed arrays (ownership, splitting, swaps, etc.).
Special case: $2^n$-sized buffers.
data.bag
(Q: what are symbols? Are they just `utf8` byte sequences? Are they (opaque) `word`s, e.g. pointers/hashes? Any additional properties on them?)
Describes a JSON-structured data type, `bag`.
```
<bag> ::= unit                    /* json: null */
        | bool                    /* json: bool */
        | <nat>                   /* json: num */
        | <utf8>                  /* json: str */
        | ([<sym> =] <bag>, ...)  /* json: object/array */
```
A bag is a known-size associative array with a known set of symbols/nats as keys. The order of key-value pairs is preserved. Keys can be implicit, filled by incrementing `nat`s, like in Lua.
Bags are values, their bit representation is not defined. Other things can be coerced to bags (records, any inductive data type structurally?).
data.parse
A small parser-combinator library.
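The kind of library `data.parse` could be can be sketched with a handful of Python combinators. Here a parser is a function from an input string to `(value, rest)` on success or `None` on failure; the API is an illustration, the draft does not specify `data.parse`:

```python
def char(c):
    """Match exactly the character c."""
    return lambda s: (c, s[1:]) if s[:1] == c else None

def seq(p, q):
    """Run p, then q on the remaining input; pair up the results."""
    def run(s):
        r = p(s)
        if r is None:
            return None
        v1, rest = r
        r = q(rest)
        if r is None:
            return None
        v2, rest = r
        return ((v1, v2), rest)
    return run

def alt(p, q):
    """Try p; on failure, try q."""
    return lambda s: p(s) if p(s) is not None else q(s)

def many(p):
    """Apply p zero or more times, collecting the results."""
    def run(s):
        out = []
        while (r := p(s)) is not None:
            v, s = r
            out.append(v)
        return (out, s)
    return run

def digit(s):
    return (s[0], s[1:]) if s[:1].isdigit() else None

assert char('a')("abc") == ('a', "bc")
assert seq(char('('), char(')'))("()x") == ((('(', ')')), "x")
assert alt(char('a'), char('b'))("b!") == ('b', "!")
assert many(digit)("42ab") == (['4', '2'], "ab")
```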
lang.syntax
Describes the language:
- tokens: keywords, literals, symbols, `(`, `)`, ...
- AST structure as an inductive data type (on top of token bags?). Each node has a bag of attributes.
lang.parser
Uses `data.parse` to build an AST from a `utf8` byte sequence.
lang.binary
An interface to serialize/deserialize bags+types.
lang.record
A tuple of <a `bag` of data, a data type>. Generic (serde-like) serialization/deserialization interface.
lang.struct
Mapping (binary serialization/deserialization implementation) of a `record` onto a linear `buffer`.
lang.func, lang.coro?
A `func` is a lambda that actually consumes/produces resources (stack frames, thread time, etc.).
Describes how lambda values become machine resources and run: stack frames, returns, calls, ....
Calling conventions?
A `coro` is an executable state machine.
Q: closures?
core.mem
Provides: the `core.mem` capability. Reading most (?) properties is safe (? but: ASLR). Destructive (i.e. write) changes are equivalent to having any process-wide capability.
But: not every capability depends on `core.mem` (e.g. `os.fs`, `os.user`, `net.ip.block` are limits external to a process).
Objects:
- address spaces: usually one, with its traits (32-bit, 64-bit, etc), but e.g. code space, data space for Harvard architecture.
- memory segments: a set of $(offset, size)$ mappings with their own traits and allowed operations. Segments can grow (e.g. heap, stack). Can be executable, writable. Can be created/deleted (e.g. mmap/munmap, MMU operations). Q: are these just special, root memory regions?
- memory regions (places?) (~ a resource that can hold a representation of a value, possibly complex one). Type encodings and transmutation, memory layout, alignment. Properties: offset, size, stride (may differ from size), read/write, volatile, executable, etc. Access patterns (random access, sequential access, etc).
- references: a way to refer to a memory region for some purpose (e.g. reading, writing, managing, executing instructions). Operations: peek, poke. Explicitly shy away from pointers, especially as integers.
A remark about Rust: Rust does a very good job when relying on its set of assumptions (e.g. one address space, nonvolatile memory regions, internal ABI, pointers as integers), but is literally `unsafe` when stepping outside them (e.g. lifetimes of JIT-compiled functions, memory segments going away underneath "safe" references, some other ABIs, NUMA, etc.). Let's bring more clarity and explicitness here.
core.alloc
Requires: the `core.mem` capability.
Provides: the `core.alloc` capability; negation of `core.noalloc`.
Objects:
- allocators: objects that can create/resize/consume memory places as resources.
Allocators can be created from a memory region and recycled themselves by their memory segment or parent region (e.g. enabling `munmap`). Allocators should be explicit (at least at the module level) and nudge towards allocating memory closer to call-stack roots, unlike the global allocator in Rust.
Allocation is fallible. Every allocation should either be handled, or it implies the possibility of a panic.
Allocated places are bound by the lifetime of their allocator.
Allocators can be garbage-collecting, with no-op "freeing".