iota - design

This is a draft design for a programming language/environment.

aspirations

A low-level language aiming to be small and principled:

  • A limited, ascetic and opinionated set of language features. Unambiguous, simple, minimalistic syntax; context-free grammar. Still richer than Forth/Lisp/Smalltalk.
  • Having data notation, generic maps, nested dot-namespaces and context-aware completion is a must.
  • Make everything (threads, stacks, allocators, effects, assumptions) as explicit as possible, while keeping it not too verbose. Capabilities and contexts instead of ambient authority.
  • But: try to keep complexity gradual, scale down.
  • Reifies abstractions from full linear/ordered dependent typing down to actual bare-metal byte manipulation. Nothing is forced to be unsafe at any level of compilation or execution, as long as you can and want to prove safety.
  • Everything that happens at runtime can be reflected on. A systems language is more than just a compiler blueprint. Every bit of data is transparent, structured and accounted for. Its meaning can be erased only explicitly.
  • Think about async, fallibility and diagnostics upfront.

layers

Progressive enrichment of semantics, staged run-time:

  • ground layer: plain text.
  • data layer: a notation/model for data (like EDN for Clojure). It's map-structured with additional amenities for Lean-style functional programming (think JSON/S-expressions, but just rich enough to write actual code, austerely).
  • schema layer: adding some rudimentary typing/schemas on top of data layer. Suitable for configs/serialization. Data should be usable both in-memory and when streamed somewhere else.
  • lambda layer: a Calculus of Inductive Constructions semantics able to encode Lean-style proofs. It is built on top of the data layer by adding grammar restrictions and executable/CoIC semantics. Resource management is not a concern.
  • systems layer: linearly/ordered-typed modification of lambda layer that is concerned with resource management (memory: allocations, references, borrow checking; input/output, async, threads). Can be compiled to a standalone executable, fast and close to metal, with the runtime environment still retaining a lot of metainformation and providing good reflection and dynamic linking (fatter than Rust, slimmer than Go runtime).

inspirations

  • Lean: proofs, properties, standard library, syntax?
  • Clojure: data notation, maps, rich homoiconicity.
  • Swift: witness tables, layout-agnostic structs, dynamic runtime.
  • Golang: syntax, syntactic minimalism.
  • Rust: safe memory/resource management, traits, namespaces.
  • Zig: compile-time evaluation, explicit allocators.
  • Austral: linear types, capabilities, runtime minimalism.

Iota Data Model

At the beginning is text, a sequence of Unicode code points. It is shapeless and unstructured.

Let's define a basic structure on top of it, called the Iota Data Model.

iod concepts

Atomic entities are:

  • numbers (TODO: unbounded nat or i64 or f64 or generic u N / i <N>?)
  • strings: sequences of Unicode code points
  • symbols: strings usable as names.

Pairs are <key> = <value> where <key>/<value> are any other entities (atomic or compound).

Maps are sets of pairs. They are generally unordered (? - although there are situations where the ordering of keys matters). Keys are not required to be unique; the last one "wins". Keys can be explicit or implicitly assigned to autoincrementing numbers (?) or special symbols like $0, $1, ... (?).

Lists are ordered sets of entities as well: [], [a], [a, b].

  • unlike tuples, [a] is not equal to a.

Tuples are ordered sets of entities: (), (a, b), (a, b, c), ...

  • () is an empty tuple, unit in FP-speak.
  • (<entity>) is a singleton tuple; it's equivalent to <entity>.

Chains are ordered sequences of values: a b c. They are more like lists in FP, and can be iterated on.

Lookups are dot-separated sequences of entities, usually representing hierarchical access in nested maps.

Metadata, tags, annotations (TODO: choose one name): a : t

TODO: choice (|).
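
To make the model concrete, here is a minimal sketch of these entities as a host-language ADT (Haskell; the constructor names are illustrative, not part of the spec, and choice is not modeled):

-- The Iota data model as a plain ADT; number/string types kept simple.
data Iota
  = Num Integer          -- numbers (exact bounds are TBD above)
  | Str String           -- strings: sequences of Unicode code points
  | Sym String           -- symbols: strings usable as names
  | Pair Iota Iota       -- <key> = <value>
  | Map [(Iota, Iota)]   -- set of pairs; on duplicate keys the last wins
  | List [Iota]          -- [a, b]
  | Tuple [Iota]         -- (a, b); a singleton tuple collapses to its item
  | Chain [Iota]         -- a b c
  | Lookup [Iota]        -- dot-separated hierarchical access
  | Tagged Iota Iota     -- a : t (metadata/tags/annotations)
  deriving (Eq, Show)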

Iota Object Notation

ion ebnf

<sep> ::= <nl> | ',' | ';'
<nl> ::= '\n'

<str> ::= <json:str>
<num> ::= <json:num>

<sym> ::=
   | (* alphanumeric, with '_', '$', '\', digits *)
   | (* punctuation mix? *)

<item> ::=
   | '{' <choice> '}'       (* <map> *)
   | '[' <choice> ']'       (* <list> *)
   | '(' <choice> ')'       (* <tuple> *)
   | 'do' <seq> 'end'       (* <scope> *)
   | '.' <item>             (* <quote> *)
   | '$' <item>             (* <unquote> *)
   | <str>
   | <num>
   | <sym>

<chain> ::=
   | <item> <item>*                    (* whitespace, no newlines *)
   | <item> '=>' <nl>? <chain>         (* short bind *)
   | 'with' <typed> '=>' <nl>? <chain> (* long bind *)

<typed> ::= <chain> (<nl>? ':' <nl>? <typed>)* (* type annotations *)
<choice> ::= <seq> ('|' <seq>)*                (* choice; a bare <seq> is a one-arm choice *)

<pair> ::=
   | '#'<item> <nl>? <pair>                  (* pre-meta *)
   | (<typed> <nl>? '=' <nl>?)? <chain>      (* "key-value" pairs *)

<seq> ::= <pair>? (<sep> <pair>?)*             (* toplevel *)

TODO: think of a syntax for Haskell-style a $ b $ c chain grouping.
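
To make the grammar concrete, a hand-rolled parser-combinator sketch for a tiny fragment of <item> (numbers, symbols, tuples only; everything here is illustrative, not a reference implementation):

import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isAlphaNum, isDigit)

newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P (\s -> fmap (\(a, r) -> (f a, r)) (p s))

instance Applicative P where
  pure a = P (\s -> Just (a, s))
  P pf <*> P pa = P (\s -> do (f, r) <- pf s; (a, r') <- pa r; Just (f a, r'))

instance Alternative P where
  empty = P (const Nothing)
  P p <|> P q = P (\s -> maybe (q s) Just (p s))

satisfy :: (Char -> Bool) -> P Char
satisfy ok = P f
  where f (c:r) | ok c = Just (c, r)
        f _            = Nothing

spaces :: P ()
spaces = () <$ many (satisfy (== ' '))

sepBy :: P a -> P b -> P [a]
sepBy p sep = ((:) <$> p <*> many (sep *> p)) <|> pure []

data Item = Num Integer | Sym String | Tuple [Item] deriving Show

item, num, sym, tuple :: P Item
num   = Num . read <$> some (satisfy isDigit)
sym   = Sym <$> ((:) <$> satisfy isAlpha <*> many (satisfy isAlphaNum))
tuple = Tuple <$> (satisfy (== '(') *> (item `sepBy` satisfy (== ',')) <* satisfy (== ')'))
item  = spaces *> (num <|> sym <|> tuple) <* spaces

-- runP item " (a, 12, (b, c)) "
--   ==> Just (Tuple [Sym "a",Num 12,Tuple [Sym "b",Sym "c"]], "")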

iota syntax by example

comments

There is a piece of internet folklore claiming that the syntax of comments in a programming language generates ~5x more discussion than the syntax of the rest of the language (TODO: source). So, let's be as boring as possible here:

// this is a comment until the end of the line
/* an inline comment */

I have a suspicion that for some irrational reason ("this code with asterisks looks so pretty!"?) humans like asterisks so much that they are willing to put up with bare pointers as long as they look like asterisks. So asterisks are good. </joke>

(Q: should comments translate into metadata and be attached to nodes somehow?)

symbols

<sym> examples:

  • a, a0, hello_world
  • $a, $, hello$world
  • \to, \lambda
  • _, _1, _hello_
  • TODO: unicode?

// Symbols:
symbol
01234
hello_world
\alpha

strings and numbers

Self-evaluating atomic values.

TODO: check if JSON strings and numbers make sense. TODO: multi-line strings?

tuples

// tuples, ordered sequences of data:
()            // an empty tuple
(symbol)      // a one-element tuple, semantically equal to `symbol`
((symbol))    //   same as above
(a, b)        // a tuple of two values,
              //   or pairs, to be precise: (a = a, b = b)

pairs

Key-value pairs can be used for definitions and building maps:

// Pairs, `<key> = <value>`:
zero = 0
one = 1

a = b = c               // a single pair: a = {b = c}

"key is optional, this is a value and a key"

maps

Maps are namespaced partial functions from a key to a value:

// Maps, sets of pairs:
{ x = 12, y = 15 }    // a set of pairs
{
    x = 12            // newlines are separators
    y = 15
}

// empty maps:
{}
{,, , }        // an empty sequence inside
{  | |}        // an empty choice inside

// implicit keys are equal to their values:
{
    /* a = */ a
    /* b = */ b
}

{ { zero, one } = x }   // keys can be maps themselves

// more examples:
{ {} = {} }          // key `{}` -> value `{}`
{ {} }               // implicit key `{}` -> value `{}`
{one = 1, two = 2}   // symbol `one` -> 1, `two`-> 2
{a, b}               // implicit keys: a -> a, b -> b

(application) chains

This is inspired by Haskell: applying a function to something should be just a (left-associative) juxtaposition. Juxtaposition sounds too clever, so Iota calls these (application) chains instead:

// [application] chains, `<item> <item>*`...:
f x y               // means `((f x) y)`

This is a very flexible syntax primitive, use with care.
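
Left associativity is just a left fold over the items of a chain; a one-line sketch (Haskell, with an illustrative Expr type):

data Expr = Sym String | App Expr Expr deriving Show

chain :: [Expr] -> Expr
chain = foldl1 App

-- chain [Sym "f", Sym "x", Sym "y"]
--   ==> App (App (Sym "f") (Sym "x")) (Sym "y")   -- i.e. ((f x) y)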

dots and quoting

Dots are just a way of quoting something, like 'a or (quote a) in Lisp.

// quoting and unquoting:
.a             // just a symbol `a` that does not refer to anything
.{}            // a quoted map

A nice way to use this is for namespacing or "dot access" of values in a complex structure:

// namespacing and accessing a map:
m = { a = b = 2 }
.a.b          // application of `.a` to `.b` produces a <path> `.a.b`
m.a.b         // a path can be applied to an expression
              //   to access something by key, "project" it.
              //   (m.a).b == m.(.a.b) = m.a.b
m .a .b       // i.e. `(m .a) .b`, it's just an application, see
m[.a][1]      // i.e. `(m [.a]) [1]`,  "index" something by something
m.(a, b)      // i.e. `m .(a, b)`, "project" two fields at once

// but: in Lean `m.f a.b` means `(m .f) (a .b)` and it's awesome.
//   Q: Should I steal this somehow?

// Q: map comprehensions?

When you say quote, you must then say unquote. Lisp uses (unquote a) or ,a for this, which looks pretty alien to any modern language. Modern developers are used to dollars for substitution instead, so why not use $<item> for unquoting:

.{             // any <item> can be quoted, e.g. maps
   a = $a         // `$<item>`  means unquoting, splicing, i.e. substitution from context
   b = $(f x y)   // any <item> can be spliced
}

// Q: is the following worth it?
//   Q: should it be unrestricted `$<item>` or `$(<chain>)` only?
s = "string"
"unquoted $(s)"         // ??? - use ION here?
."still quoted $(s)"    // literally "still quoted $(s)"?

specifying types

Typing relation (<term> : <term>) is special: in a dependently-typed language types are really a part of the term. They cannot be squeezed into some compiler metadata for expressions. They need respect and their own syntax:

// Typed expressions:
x : t             // means `x` has type `t`
a b : t u         // means chain `a b` has type `t u`
x : t : u         // i.e. `x` has type `t : u`

scopes

I have spent quite a bit of time trying to imagine a language built on maps. Map-based homoiconicity was supposed to be the central idea of this language experiment.

The problem with maps is that it does not make sense to make them ordered by default. This imposes too much of a burden on serialization, efficient in-memory representation, etc. Maps with keys and independent ordering of pairs are not simple partial functions anymore.

At the same time, it is crucial for source code "statements" (be it real statements or monadic bindings) to be represented sequentially and in some order.

Haskell's do-notation is just a chain of nested >>= calls with lexical scope. Its syntax hides this fact very cleanly.
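
For reference, the Haskell version and the nested (>>=) chain it desugars to:

echo :: IO ()
echo = do
  a <- getLine
  b <- getLine
  putStr "concat: "
  putStrLn (a ++ b)

-- what the compiler actually sees after desugaring:
echo' :: IO ()
echo' =
  getLine >>= \a ->
  getLine >>= \b ->
  putStr "concat: " >>
  putStrLn (a ++ b)

The iota sketch below plays the same game, with the monadic context made explicit: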

echo = with {T} {ctx : Monad} : Monad T =>
   ctx.do getLine (a =>
      ctx.do getLine (b =>
         ctx.do (putStr "concat: ") (_ =>
            putStrLn (a ~ b))))

// can as well be

echo = do
   a = getLine()
   b = getLine()
   putStr "concat: "
   putStrLn < a.merge b
end

TODO: the types do not really make sense here, think of a better monadic/effectful semantics and evaluation.

Scopes start with do and end with end, with an ordered sequence of <pair>s inside. Seen as a data structure, they work roughly as

(<parent context>,
 (a = getLine,
  (b = getLine,
   (putStr "concat",
    putStrLn ((a .merge) b)))))

(late) bindings for names

Scopes are good, and local names in them are good, but function bodies are meant to be scopes with "holes" which are filled by late bindings to other values. A hole is a name without a value (yet), a parameter binding:

// Bindings are usually indicated with `=>`:

// Short binds, "lambdas", `<item> => <chain>`:
x => x.mul x            // binds a name `x` into a "body" chain `x .mul x`, when applied
(x => x.mul x) 12       // intuitively supposed to produce 144

x => y => x.mul y       // right-associative, i.e. `x => (y => (x .mul y))`
(x : Num) => x.mul x    // arguments can be typed

x => .{x = $x}          // the body chain can be one item, one item can be a (quoted) map

[out] => out.println "hello"   // the argument can be a tuple, a list, any <item>

// Long binds, `with <typed> => <chain>`
//   - allow to use many curried arguments
//   - allow to specify the return type:
//   - Lean-style
with (x: Num) (y: Num) : Num => x.mul y

interplay with other text formats

  • markdown
  • html/xml
  • Rust macros
  • URLs
  • yaml/json
  • unix shells
  • wat

ion and json

ion is a json superset

Given

<json:object> ::= '{' (<json:pair>)? (',' <json:pair>)* '}'
<json:array> ::= '[' <json:value>? (',' <json:value>)* ']'
<json:value> ::=
   | <json:object>
   | <json:array>
   | <json:str>
   | <json:num>
   | 'null'
   | 'true' | 'false'

prove that json is a subset of ion:

<json:array>
   \equals '[' <json:value>? (',' <json:value>)* ']'         (* TODO: <nl> *)
   \subset '[' <item>? (<sep> <item>)* ']'
   \subset '[' <chain>? (<sep> <chain>)* ']'
   \subset '[' <typed>? (<sep> <typed>?)* ']'
   \subset '[' <pair>? (<sep> <pair>?)* ']'
   \equals '[' <seq> ']'
   \subset '[' <choice> ']'
   \subset <item>

<json:pair>
   \equals <json:str> ':' <json:value>
   \subset <item> ':' <item>
   \subset <chain> ':' <chain>
   \subset <typed>

<json:object>
   \equals '{' <json:pair>? (',' <json:pair>)* '}'
   \subset '{' <typed>? (',' <typed>)* '}'
   \subset '{' <pair>? (',' <pair>)* '}'
   \equals '{' <seq> '}'
   \subset '{' <choice> '}'
   \equals <map>

<json:value>
   \cases
   | ('true'|'false'|'null')     \subset <sym>
   | <json:num>                  \equals <num>
   | <json:str>                  \equals <str>
   | <json:array>                \subset <list>
   | <json:object>               \subset <map>
   \subset <item>

mapping into json

<seq>   <=> [<json:pair>]

<pair>  <=> <json:chain>   // if no $key and no $meta
        <=> {
                "=": <json:chain>,
                "$key": <json:typed>,       // optional
                "$meta": <json:item>,       // optional
            }

<typed> <=> <json:chain>   // if no types
        <=> {
                ":": <json:chain>,
                "$type": <json:typed>,
            }

<chain> <=> <json:item>     // if one item
        <=> {
                " ": [<json:item>]     // non-empty
                "$bind": <json:item>   // optional
            }

<item>  <=> <json:str>
        <=> <json:num>
        <=> <json:sym>
        <=> <json:dot>
        <=> <json:tuple>
        <=> <json:list>
        <=> <json:map>

<map>   <=> {"{}": <json:seq>}

<list>  <=> {"[]": <json:seq>}

<tuple> <=> {"()": <json:seq>}

<dot>   <=> { ".": <json:item> }

<sym>   <=> ".<sym>"

<str>   <=> <str>   // if it starts with a dot, the dot must be escaped: \.

Examples:

[
// "hello"
"hello"

// hello
".hello"

// ".hello"
"\.hello"

// [1, 2, 3]
{"[]": [1, 2, 3]}

// (a)
{"()": [".a"]}

// {numbers, one=1, two=2}
{"{}": [
  {"=": ".numbers"},
  {"$key": ".one", "=": 1},
  {"$key": ".two", "=": 2},
]}

// x = 42
{"=": 42, "$key": ".x"}}

// f x y
{" ": [".f", ".x", ".y"]}

// f x : Nat .to Nat
{
  ":": {" ", [".f", ".x"]},
  "$ty": {" ", ["Nat", {".": "to"}, "Nat"]}
}

// inc (x: Nat) = x .add 1
{
  "=": {" ": [".x", {".": ".add"}, 1]},
  "$key": {
    " ": [
      ".inc",
      {"()": [
        {":": ".x", "$ty": ".Nat"}
      ]}
    ]
  }
}

// x => x .add 1
{
  " ": [".x", {".": ".add"}, 1],
  "$bind": ".x"
}

// #nat one = 1
{
    "=": 1,
    "$key": ".one",
    "$meta": ".nat"
}

]

mapping to S-exprs

; "hello"
"hello"

; hello
'hello

; .hello
''hello

; x = 42
'(:key x 42)

;; sequences map to lists:
; 1, 2, 3
'(1 2 3)

; [1, 2, 3]:
'(:list 1 2 3)

; (1)
'(:tuple 1)

; ()
'(:tuple)

; f x y
'(:chain f x y)

; f x : Nat .to Nat
'(:type (:chain Nat .to Nat)
  :chain f x)

; inc (x: Nat) = x .add 1
'(:key (:chain inc (:type Nat x))
  :chain x .add 1)

; x => x .add 1
'(:bind x
  :chain x .add 1)

; #nat one = 1
'(:meta nat
  :key one
  1)

Ion query language

Given an Ion tree, produce a subset.

projecting: any <seq> can be indexed by (a list of) keys

Seq-wrappers like map, list are "transparent":

let x = {a, b, c}           // implicit keys equal to values
let y = [a, "b", 1]
let z = {0=zero, 1=one, zero=0, one=1}

// indexing
x.a     // => .a
x[.a]   // => .a
x[.a, .b]   // => {.a, .b}

x[0]    // => _, bottom, a value that cannot be stored at runtime.
y[.a]   // => _
y[0]    // => .a
// TODO: tuples, "the default value":
()      // no values inside
(a)     // implicitly a pair `a = a`, but must evaluate to `a`, how?

filtering?

TODO: look at Haskell optics

Iota Binary

Sources of inspiration:

  • WIT canonical representation: concepts, binary encoding, opaque resources, structural typing
  • capnproto: messages, schema language with expressive types, RPC (?)
  • Apache Arrow columnar format: run-end encoding, binary dictionaries and lists, use of indirection.
  • FIDL and its binary encoding: tables, look for good ideas.

local in-memory representation

When data are used in some local context, they can rely on it for brevity: schemas, relative offsets, implicit meaning defined by a schema, etc. Data are not required to be identified by hashes, value hashes can be implicit and reconstructed when needed.

Local representation should allow for zero-copy access and ideally (append-oriented?) patching.

Considerations:

  • data locality, array-oriented way of storing (minimize separately allocated objects)
  • naturally mapped into the data model, if possible
  • minimize intrusive tags and metadata (tagging every integer/pointer is not an option, try to move as much as possible into context, use zero-sized types to carry context information in the compiler).

distributed context

Units of data must be fully self-contained: serializable, streamable, bringing their schema with them if needed.

Distributed contexts must be mergeable: two independent contexts on different machines must be easy to merge into one context. That's why the distributed format must be content-addressable and encoded as Merkle trees.

maps

algebra of maps

Maps can be used as partial functions from key terms to value terms. A map has a domain (its set of keys) and an image (its set of values).

Maps are more complex than just partial functions, though: they can have linear/ownership semantics. Maps are objects + (linear?) lenses over their structure:

Maps are monoids: they have the neutral element {} and can be concatenated, producing a map (~ for concatenation? But, e.g. HOCON uses juxtaposition for monoidal concatenation):
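
A host-language illustration with Data.Map (an analogy only, not iota semantics; cat plays the role of ~):

import qualified Data.Map as M

-- Right-biased union: {} is the neutral element, the last key wins.
cat :: Ord k => M.Map k v -> M.Map k v -> M.Map k v
cat m1 m2 = M.union m2 m1   -- Data.Map.union is left-biased, so flip it

-- cat (M.fromList [("a",1),("b",2)]) (M.fromList [("b",3),("c",5)])
--   ==> fromList [("a",1),("b",3),("c",5)]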

operations over values

Given an existing map, do something to its values without changing its set of keys ("structure").

// exactly one value cell:
m.a.b

// a prism to get zero or one cell:
m{.a.b}

// a prism to a reference to zero or one cell:
ref m{.a.b}

// a traversal to exactly two cells,
// (`val` uses an implicitly given allocator to make a copy):
#example
val {a=1, b=2, c=3}.(a, b) = {a=1, b=2}

#example
{a=1, c=3}.(a, b) = ERROR "b=?"  // static or dynamic error

// a prism to three values or less:
#example
{a=1, b=2, c=3}{.a, .b, .d} = {a=1, b=2}

// setting a value:
at m.a.b \set 42
// at <cell> <op> <args>...

// popping a value from a map:
//   a linear(?) lens that consumes and splits a map
v = at m \split k

operations over structure

  • projecting into a submap (also: by key)
  • replacing a submap (also: by key)
  • removing a submap (also: by key)

Constructing maps:

// an empty map
{}

// adding a pair to a map:
a' = a ~ {k = v}

// merging two maps:
m = m1 ~ m2
#example
{a=1, b=2} ~ {b=3, c=5} = {a=1, b=3, c=5}
// the last key wins

Monoidal concatenation of maps is abstract, it does not necessarily create a new flat map. It can be a kind of ECMAScript-like prototype chaining with a [[proto]] slot.
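
A sketch of that reading (Haskell, illustrative): concatenation builds no flat map, it just stacks layers, and lookup walks the newest layer first, so the last key still wins.

import qualified Data.Map as M
import Data.Maybe (listToMaybe, mapMaybe)

newtype Layers k v = Layers [M.Map k v]   -- newest layer first

catL :: Layers k v -> Layers k v -> Layers k v
catL (Layers older) (Layers newer) = Layers (newer ++ older)

lookupL :: Ord k => k -> Layers k v -> Maybe v
lookupL k (Layers ms) = listToMaybe (mapMaybe (M.lookup k) ms)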

Additional operations:

#example
{a=1, b=2} \split .a = {b=2}

THINK: addressing keys vs addressing values, e.g.

// an iterator over all values < 2:
#example
{a=1, b=2, c=3}[v => v < 2] = ???

{1=a, b=2, c=3}.[k => ]

abstract value and self

(?) Every value v is equivalent to the notation { v }, which is a shortcut for the pair { self = v }. self is a special key which refers to the "default" value of the map. When a map is used as a value, it dereferences to its last self key.

TODO: it or self?

TODO: what is a map without self? Something that cannot be constructed at runtime without a container? If a map does not have self, it cannot be used as a single value?

Properties:

  • $\tt{v} \equiv \tt{\{v\}} \equiv \tt{\{\{v\}\}} \equiv \dots$
  • $\tt{v} \equiv \tt{v.self} \equiv \tt{v.self.self} \equiv \dots$

self really looks like an identity lens.

TODO: think if this is in any way consistent.

THINK: can partiality be seen as {} instead of a value? Then {} becomes nil, which is smelly.

runtime reification

The notation {...} is an abstract comptime map. It can be used as a blueprint for a runtime value:

// given a runtime container type `HashMap`,
//   construct a HashMap from an abstract map:
m = HashMap { one = 1, two = 2, three = 3}

// Q: is this { self = HashMap, ... } ?

Q: empty runtime maps? HashMap {} is valid?

HOAS

Wiki: higher-order abstract syntax

When we have maps, we can have nested maps (where #<metainfo> <item>):

#outer {
  x = 1
  // #inner is not bound, it's the default value of #outer
  #inner {
    // Q: how this automatic chaining happens? Implicit `self = outer`?
    // here x is 1
    x = 2
    // here x is 2
    #last {
      x
      // currently bound x becomes the default value
      //   of both inner and outer map
    }
  }
  // here x is 1, but no one is able to see this
}

Nesting maps can be seen as a sequencing mechanism. An inner map without a key can be seen as a temporary scope inside the outer map, chained to / inheriting it. Symbol lookups can use the current chain.

TODO: non-shadowable definitions, #const ?

scopes

The vanilla map notation is very awkward for this, though. Scopes are notations for sequences of nested pairs, each inheriting the scope of its parent pair.

This scope corresponds to the example map above:

do
  x = 1       // the body of #outer
  // the value of #outer:
  x = 2       // the body of #inner
  // the value of #inner:
  x           // #last
end   // "evaluates" to 2

Scopes can branch: they can inherit the outer scope, but end up as a separate, non-default value:

do
  one = 1
  three = do two = 2; one + two; end
  three
end

<=>

_ ~  // outer context
{ one = 1 } ~
{ three = self ~ { two = 2 } ~ { one + two } } ~
{ three }

Multiple sequential default values:

do 1; 2; end

<=>

_ ~   // outer scope
{ /* self = */ 1 } ~
{
  // the last `self` binding wins due to monoidal concatenation:
  /* self = */ { /* self = */ 2 }
}

encapsulation

E.g. you have a helper or a value you don't want to expose:

(range: Nat) => (Rand) do
  num = randInt range  // unexported
  #export {
    guess = (n) => n == num
    range = range
  }
end

Importing multiple things from another scope:

// sometimes you want to bring another scope in:
math = @include("math.iota")
at self ~ math.{
  ln, log2, sqrt
}

// or just import everything from another scope:
at self ~ @include("math.iota")

do-notation

aka "programmable scope".

Examples to think of:

// Implicit Identity monad:
do
  one = 1
  three = do two = 2; one + two; end
  three
end

// <=> [by monad expansion]
(ctx: Id _) =>
  Id.lift 1 \Id.bind one =>
    Id.lift (
        Id.lift 2 \Id.bind two =>
          Id.pure (one + two)
    ) \Id.bind three => Id.pure
      three

//   Id.pure : a -> Id a
//   Id.bind : Id a -> (a -> Id b) -> Id b
//   Id.lift : {Has Id m} -> Id a -> m a
// <=> (by definitions of Id.lift)

// (self : Id _) => { ...
(_: Id _) =>
  1 \Id.bind one =>
    (self \Id.bind
      (_: Id _) => {two = 2} ~ Id.pure { one + two }
    ) \Id.bind three =>
      Id.pure three

// <=> [ by def. of `Id.bind` ]
(_: Id _) =>
  { one = 1 } ~
  { three = self ~ {two = 2} ~ {self = one + two} } ~
  { three }

// <=> [no values from `Id` anymore]
{ one = 1 } ~
{ three = {two = 2} ~ { one + two }} ~
{ three }

// <=> [scope reduction]
{ {two = 2} ~ { 1 + two } }

// <=> [scope reduction]
{ 1 + 2 }

// State monad:
St = State
(St) do
  x = get ()
  x = x + 1
  put x
  x
end

// <=> [monad expansion]
(st : St _) =>
  // `lift` closes over the runtime `st`;
  // `St.bind` does not, it goes straight to the type.
  // TODO: think about comptime/runtime binding more.
  st.lift get () \St.bind x =>
    st.lift add x 1 \St.bind x =>
      st.lift put x \St.bind _ =>
        st.lift x \St.bind id

// <=> [expand `st.lift` to `St.lift st`]
(st : St _) =>
  St.lift st get () \St.bind x =>
    St.lift st (add x 1) \St.bind x =>
      St.lift st put x \St.bind _ =>
        St.lift st x \St.bind id

// <=> [State.mk, State.get, State.put]
State.mk (st: _ : Add) =>
  (st, st) \St.bind (x, st) =>
    (add x 1, st) \St.bind (x, st) =>
      ((), x) \St.bind (_, st) =>
        (x, st)

// <=> [State.bind]
State.mk st =>
  {(x, st) = (st, st)} ~
  {(x, st) = (add x 1, st)} ~
  {(_, st) = ((), x)} ~
  (x, st)

// <=>
State.mk st => {x = st} ~ {x = add x 1} ~ {st = x} ~ (x, st)
// <=>
State.mk st => { x = add st 1 } ~ (x, x)
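
For comparison, the standard Haskell State version and its desugaring (plain Haskell with Control.Monad.State from mtl, not the iota expansion above):

import Control.Monad.State

step :: State Int Int
step = do
  x <- get
  let x' = x + 1
  put x'
  pure x'

-- the same chain, desugared:
step' :: State Int Int
step' = get >>= \x -> let x' = x + 1 in put x' >> pure x'

-- runState step 41  ==>  (42, 42)
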
// IO:
(IO) do
  x = read \ getLine ()
  putStr "the number is "
  print x
end

// <=> [monad expansion]
(io: IO) =>
  IO.lift io.getLine () \IO.bind $0 =>
    IO.lift read $0 \IO.bind x =>
      IO.lift io.putStr "the number is " \IO.bind _ =>
        IO.lift (io.print x)

// <=>
// TODO

// An effect bag, (State, IO):
(State | IO) do
  x = get ()
  x = x + 1
  put x
  print x
end

// TODO

misc

  • interplay with XML attributes/children distinction?

Relations, Functions and Automata

The usual named functions can be defined in a style of logical relations:

sqrdist(x: num, y: num): num = x*x + y*y
sqrdist(x: num)(y:num): num = x*x + y*y
sqrdist
    : num -> num -> num
    = fn x y => x*x + y*y

apply42{t: Type}(f: num -> t): t = f 42

Lambdas are values:

fn x => x + 1
fn(x: num): num => x + 1

Precedence: `=>` binds tighter than `=` and less tightly than `:`.

const-computations, comptime

Some parts of modules are executed:

  • at parse time: e.g. notations? Expanded before getting a HOAS tree.
  • at expand time: macros get expanded after having a HOAS tree, before starting compilation
  • at compile time: source code undergoes typechecking; the universe of types is expanded by definitions.
  • at runtime: it computes.

Macros and notations should not be reprogrammable on the fly; ideally, they should be fixed for a package and known beforehand when depending on it. No conditional notations (and macros?).

Q: should packages have a clearly separate section for parse/expand-time things?
Q: how to delineate compile-time/run-time contexts in the code?
Q: Should comptime operations be a subset of runtime operations? Can all types be projected into runtime? If so, how?

Resources

memory resources: cells and places

A cell: a piece of memory made of size: Nat octets of bits (assumption: bytes are octets).

# Addr is an abstract identity of a Cell:
const Addr
func (addr: Addr).space: AddrSpace
func (a1: Addr).is(a2: Addr): Bool

const Cell
# TODO: it is a Resource on Memory
func (p: Cell).addr: Addr
func (p: Cell).size: Nat
func (p: Cell).parent: Resource        # parent Cell?
func (p: Cell).manager: Maybe(Allocator)

Addr is a member of a set of type AddrSpace. It can be many things: a bare bin.Num as a C-style pointer, a pair of (AddrSpace, bin.Num), a 16-bit x86 pair of (Segment, U16), a CHERI memory handle (TODO), a hash for a content-addressable store (?), etc.

Process memory segments are Cells without values (but e.g. unused .bss can have a witness that all its bits are zero). They are narrowed down to smaller and smaller non-overlapping Cells used as Places for values of specific types. Cells are linear resources.

A place is a combination of a cell + a value type + a binary encoding of values into sequences of bits.

with (
    size: Int,
    ty: Type,
    encoder: ty -> Seq(size, U8),
    decoder: Seq(size, U8) -> ty,
) {
    const Place(ty)
    with (p: Place(ty)) {
        p.cell(): Cell
        p.get(): ty
        p.ref(): Ref(ty)
        do p.set(value: ty)     # linear action, consumes and produces
    }
}

Q: should a Place <-> Type relation be 1:1? Or: a Cell can be a part of many Places?

arrays

Arrays are fixed-size resources, laid out in contiguous memory (possibly with stride != item.size?).

No access by bare Nat indexes. Indexes should be opaque, carry witnesses that they are valid, and carry more static metainformation.

test {
    let a = do mem.Array(Int, 5)

    # if statically known size, compiler can verify this (implicit witness?):
    do a.at(1) = 43

    # otherwise, out-of-bounds access must be handled explicitly:
    do a.at(5) = -1 or error("out of bounds")
}

func (a: Array) Reverse() {
    var i0 = a.first()    # witness: i0 < a.len
    var i1 = a.last()     # witness: i1 < a.len
    while i0 < i1 {       # witness: i0 < i1
        a.swap(i0, i1)    # since i0 < i1, these are different places
        do i0.next()      # since i0 < a.len, no need to handle OOB
        do i1.prev()      # since i0 < i1, no need to handle OOB
    }
}
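
The "opaque index with witness" idea, sketched dynamically in a host language (Haskell; the design above wants static witnesses instead, and this sketch does not stop an Idx from being used with a different array):

import Data.Array (Array, bounds, inRange, (!))

newtype Idx = Idx Int   -- constructed only via indexOf, hence valid

indexOf :: Array Int a -> Int -> Maybe Idx
indexOf a i
  | inRange (bounds a) i = Just (Idx i)
  | otherwise            = Nothing

at :: Array Int a -> Idx -> a
at a (Idx i) = a ! i   -- in-bounds by construction, no check needed here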

TODO: Accommodate Structs of Arrays naturally.

structs

Structs are less important and less visible than in many other low-level languages: the notion of a composite value (a typed record) and the memory resource holding it (a Cell) are explicitly separated.

Structs do not necessarily live in contiguous blobs of memory (e.g. SoA). References are abstract and do not necessarily correspond to a pointer. Access happens via accessors that are efficiently inlined.

traits and objects

They can add complexity, but are very useful for keeping dot-namespaces flexible and extensible. Not sure how to fit them into the language yet.

memory management

Run-time behavior depends on given capabilities. Modules can be allocating or non-allocating. A package can request proc.alloc; then it is considered alloc-dependent. The sum of capabilities of dependency packages must be computed to ensure that a package is non-alloc (or e.g. non-panic). proc.alloc can be a heap memory allocator or a garbage-collecting allocator. Each package (module?) instance can use its own allocator.

Linear types + defer for resources that must be freed, including heap allocations?

Low-level, static memory management (i.e. ~ Rust core) in const-eval?

Resources, Contexts and Capabilities

Resources

Resources are provided by the execution context. They are linear and should be recycled exactly once.

Capabilities

Dynamic capabilities: run-time objects that allow specific run-time actions. Structured linear collections of values + value-specific actions + limits on actions? $$Capability = (object, action, limits)$$ Capability objects usually allow narrowing down to a narrower capability (either consuming or uniqrefing their parent capability?). Or a narrower capability can be split off the parent (e.g. a buffer from a buffer).

Static capabilities:

  • rewriting code in a scope (e.g. async) during compilation
  • ensuring that some dynamic capability cannot be used in a scope (e.g. no core.mem, no core.alloc, no core.panic, no os).

Enforcement only makes sense for non-native, emulated/JIT execution: already-compiled native code can use capabilities, but cannot be trusted not to use other ones.

  • proc.memory: object: the entire process memory; actions: peek, poke, transmute. This is roughly unsafe Rust. This capability is special: having random memory access capability means having the ability to get all other capabilities as well. Can objects be reduced to specific buffers?

  • proc.alloc : object: heaps and their memory; actions: malloc, realloc, free.

  • proc.panic: object: thread, action: exit/abort/atexit; not giving this to a library gives you #[no_panic] in Rust.

  • proc.mmap: object: process memory segments, actions: mmap/munmap, listing segments of the current process and libraries, managing JIT segments, etc.

  • proc.lib: object: shared libraries, actions: list, search, dlopen/dlsym/...

  • proc.thread: objects: pthreads; actions: spawn, kill, synchronize;

  • proc.async: green threads, async/await? This is an interesting case: importing this capability changes how compiler generates code ("const-capability"?).

  • os.io: objects: stdin/stdout/stderr, actions: read/write/close.

  • os.fs: a list of directories + actions on them; ability to open/read/write files.

  • os.proc: interacting with other processes: getpid, fork, exec, kill, pipes, popen, ptrace; IPC?

  • os.time : a dict of clocks (walltime, monotonic, high-precision, etc) and timers

  • os.signal ?

  • os.user: objects: UIDs/GIDs, actions: getuid/setuid/setgid/...

  • net.dns: object: DNS domains or wildcards, actions: resolving DNS records

  • net.ip: object: IP endpoints, actions: getting ip addr/ip link info?

  • net.tcp: object: a TCP endpoint (IP endpoint + TCP port), actions: connecting, send/recv

  • net.udp: same as net.tcp

  • gpu.??? : access to GPU contexts, draw calls, textures, shaders, etc.

  • cap.manage: managing/introspecting capabilities themselves?
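
A minimal sketch of a dynamic capability as a first-class value (Haskell; FsCap and its fields are invented for illustration): the object, the action, and the limits travel together, and narrowing derives a strictly weaker capability from a parent one.

import Data.List (isPrefixOf)

data FsCap = FsCap
  { fsRoots :: [FilePath]               -- objects: directories this cap covers
  , fsRead  :: FilePath -> IO String }  -- action: read, limited to fsRoots

-- Narrowing: a derived capability that covers a single subtree.
narrow :: FilePath -> FsCap -> FsCap
narrow root parent = FsCap
  { fsRoots = [root]
  , fsRead  = \p ->
      if root `isPrefixOf` p
        then fsRead parent p
        else ioError (userError ("outside capability: " ++ p)) }

-- The root capability an entry point might receive from its environment:
rootFs :: FsCap
rootFs = FsCap { fsRoots = ["/"], fsRead = readFile }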

Static capabilities:

  • async: allows rewriting code of a scope into state machines at compile time.
  • no panic: statically guarantees that no panics can come out of this scope.
  • no alloc: statically guarantees that no allocations can happen in this scope.
  • safe memory: statically verified absence of direct core.mem usage.

Open questions:

Safe code is built on unsafe code; in other words, there is interplay between capabilities and modules/packages. E.g. std.Rc must use the proc.memory capability.

  • but it's in std, so it can receive this authority via std.init() or something; then it decides which modules get proc.memory?
  • Modules get a capability statically: capability might be present in a loaded package instance, but statically verified to not be used in a specific module.
  • Getting a capability via memory introspection: possible using proc.memory.

Capabilities might be indirect:

  • writing to a file on NFS causes network requests;
  • ability to write /proc/self/mem means having core.mem (no language is truly memory safe on Linux!).
  • Opaque data can smuggle dynamic capabilities through a module that prohibits this capability (e.g. passing a reference to an allocator through a no-alloc module)?
  • Any piece of code with access to capabilities can implement a Turing-complete interpreter, which makes static analysis of effects undecidable in general, and can then provide its capabilities to third-party code indirectly. Is static analysis of capabilities feasible at all? How limited would a statically-verifiable version be?

Packages and Libraries

Iota                        Go                   Rust                          Haskell
global default namespace?   "pkg.go.dev"         "crates.io"                   a global root
other namespaces?           "github.com/", etc   cargo configurable?           cabal configurable?
"project"?                  module, "go.mod"     a cargo workspace or project  a cabal unit
"library"?                  package              a target in a project         a sub-unit
"binary"?                   package main         a target in a project         a sub-unit
Translation unit?           = package            = crate                       = file

module namespaces

Go-style, one implicit global root like "pkg.iota.dev/", optionally other roots like "github.com/org/". These are root packages that include other packages recursively? The universe of packages is just a lazy bag, like in Nix?

The file tree of a package my-pkg:

my-sub-pkg/
  package.iota
  ...

assets/           # static resources
build/            # like target/ in Cargo
docs/
deps/
  github.com/org/package/import.iota
  github.com/org/pkg/subpkg/import.iota
examples/
exe/              # like cmd/ in Go, app/ in Cabal ?
  binary1.iota
lib/              # like pkg/ in Go
  pubmod1.iota
  pubmod1.test.iota
  pubmod2.iota
src/              # like internal/ in Go
  internal1.iota
test/             # like tests/ in Cargo?
package.iota

It is equivalent to a table:

(
    package = @import("package.iota")
    lib = (
        pubmod1 = @import("lib/pubmod1.iota")
        pubmod2 = @import("lib/pubmod2.iota")
    )
    exe = (
        binary1 = @import("exe/binary1.iota")
    )
    src = (internal1 = @import("src/internal1.iota"))
    deps = (
        `github.com`/org/pkg = @import("deps/github.com/org/package/import.iota")
    )
    ...
)

What's a package?

  • a const-value of:
    • metadata
      • name, description, url, license, etc
      • dependencies
    • list of its modules
  • a const-trait that can build()/test()/etc its values?
    • iotac executable == "iota const evaluator". iota-"cargo" is just a library.
    • $ iotac build --target=binary1 == run const-fn package.build(target = binary1)
    • $ iotac test == run const-fn package.test()
    • ...

package.iota is a go.mod-like project file, an entry point to building. A mix of cargo.toml, Deno's main file, go.mod, etc.

// my_package = package(
// package my_package(ctx: Context)   // shortcut

// like cargo.toml,
// key `name` set to `my_package`:
name = "my_package"
description = "foos bars"
url = "author/lib"     // if uploaded to the main registry

dependencies = (
  "github.com/org/lib" = (version = "v0.1"),
)

require = (
  // the list of capabilities required from the environment
)

// Q: how to describe build targets and tests?

// Q: how to include modules?
// like `mod mod1;` in Rust, Go-style syntactically?

// end

binary vs library

Entry points require a uniq ref to a process resource (borrowed from the OS).

module definition

// shortcut: module main(ctx: Context) (...) { ... }
const main module(ctx: Context) (
    doc = "module documentation goes here?"
    // public definitions, module "signature"
) is

// imports
const std = ctx.import('std')

// private definitions, implementations

end // closes is, optional if EOF

module imports

// full form
self.import(
  "std"                 // std = "pkg.iota.dev/std"
  "github.com/org/lib"  // lib = "github.com/org/lib"
  rename "name"
)

module contexts

The context of a module defines the list of accessible static capabilities.

An executable gets the root dynamic capability from the environment (e.g. from the ELF loader or another loader middleware).

A library gets its capabilities from its importer.

Interop

Goals:

  • make C ABI interop rich and powerful (so it works well)
  • make C headers interop easy, almost seamless technically (~ Zig's @cImport).
  • make it complex and demanding, proof-heavy when imported (lots of additional static information)
  • Rust interop via extern "interop_abi"?
  • WIT from WASM?
  • Q: GC-based languages

Standard library

logic

  • logic.bool: the basic building block for data, the two-valued inductive type bool. Will become a machine bit.

  • logic.prop: the basic building block for proofs, types and reasoning about code.

  • logic.pair: the basic constructor for complex heterogeneous data. It's a building block for records.

  • logic.list: Type-wise: the basic constructor for homogeneous data. It's a building block for arrays, vectors, etc.

  • logic.lambda: Type-wise: structural expressions for untyped lambda, which are used later by a prover to manipulate terms. The basic building block for describing semantics of (unrestricted, not linear) code.

    Q: should this be dependently-typed right away?

  • logic.linear: Augments logic.lambda with linear typing. It's a basic building block for describing resources, e.g. heap allocations and memory segments. Describes the algebra of resources: allocation, single use, deallocation.

  • logic.ordered: Augments logic.lambda and logic.linear with an ordered type system (see Advanced Topics in Types and Programming Languages by Pierce). A building block for describing stacks and stacked contexts. Describes the algebra of stacked resources, e.g. lifetimes.

  • logic.nat, logic.num, logic.fin, logic.field: Basic Peano numbers to be used in proofs. Their properties. Finite enumerations and sets. Especially of size $2^n$. (Finite?) fields and their properties. Especially on $2^n$-sized sets (Zig-like u2, u3, ..., u8, u9, ...).

  • logic.array: vec n T in dependent types. A fixed-size sequence of items of some type, indexed by fin. A special case: $2^n$-sized arrays (useful, because can be totally indexed by a bit sequence).

  • data.byte, data.word: Describe fixed-size sequences of bools: 8 bools for byte, 32/64 bools for word. Defines the usual CPU operations on them (shl, shr, not, and, ...). Defines $\tt{u8} = \tt{finfield}(+, 2^8)$ and a mapping $\tt{byte} \iff \tt{u8}$; the same for u64 as $\tt{word} \iff \tt{finfield}(+, 2^{64})$. Describes the two's-complement mappings $[-128 .. 128) \iff \tt{i8}$ and $[-2^{63} .. 2^{63}) \iff \tt{i64}$. Operations on signed numbers.

  • data.utf8: Describes a sequence of bytes which is a correct UTF8-encoded Unicode-ish string. A basic building block for symbols?

  • data.buffer: Describes a linearly typed array of bytes which can be read and written into (for byte, word, maybe utf8). Algebra of linearly typed arrays (ownership, splitting, swaps, etc). Special case: $2^n$-sized buffers.

  • data.bag (Q: what are symbols? Are they just utf8 byte sequences? Are they (opaque) words, e.g. pointers/hashes? Any additional properties on them?)

    Describes a JSON-structured data type, bag.

    <bag> ::= unit                /* json: null */
            | bool                /* json: bool */
            | <nat>               /* json: num */
            | <utf8>              /* json: str */
            | ([<sym> =] <bag>, ...)
            /* json: object/array */
    

    A bag is a known-size associative array with a known set of symbols/nats as keys. The order of key-value pairs is preserved. Keys can be implicit, filled in by incrementing nats, like in Lua.

    Bags are values, their bit representation is not defined. Other things can be coerced to bags (records, any inductive data type structurally?).

  • data.parse: A small parser-combinator library.

lang

  • lang.syntax: Describes the language:

    • tokens: keywords, literals, symbols, (, ), ...
    • AST structure as an inductive data type (on top of token bags?). Each node has a bag of attributes.
  • lang.parser: Uses data.parse to build an AST from a utf8 byte sequence.

  • lang.binary: An interface to serialize/deserialize bags+types.

  • lang.record: A tuple <a bag of data, data type>. Generic (serde-like) serialization/deserialization interface.

  • lang.struct: Mapping (binary serialization/deserialization implementation) of a record onto a linear buffer.

  • lang.func, lang.coro? A func is a lambda that actually consumes/produces resources (stack frames, thread time, etc). Describes how lambda values become machine resources and run: stack frames, returns, calls, .... Calling conventions?

    A coro is an executable state machine.

    Q: closures?

core

core.mem

Provides: core.mem capability. Reading most (?) properties is safe (? but: ASLR). Destructive (i.e. write) changes are equivalent to having any process-wide capability. But: not every capability depends on core.mem (e.g. os.fs , os.user, net.ip.block are limits external to a process).

Objects:

  • address spaces: usually one, with its traits (32-bit, 64-bit, etc), but e.g. code space, data space for Harvard architecture.
  • memory segments: a set of $(offset, size)$ mappings with their own traits and allowed operations. Segments can grow (e.g. heap, stack). Can be executable, writable. Can be created/deleted (e.g. mmap/munmap, MMU operations). Q: are these just special, root memory regions?
  • memory regions (places?) (~ a resource that can hold a representation of a value, possibly a complex one). Type encodings and transmutation, memory layout, alignment. Properties: offset, size, stride (may differ from size), read/write, volatile, executable, etc. Access patterns (random access, sequential access, etc).
  • references: a way to refer to a memory region for some purpose (e.g. reading, writing, managing, executing instructions). Operations: peek, poke. Explicitly shy away from pointers, especially as integers.

A remark about Rust: Rust does a very good job when relying on its set of assumptions (e.g. one address space, nonvolatile memory regions, internal ABI, pointers are integers), but is literally unsafe when stepping outside (e.g. lifetimes on JIT-compiled functions, memory segments going away underneath "safe" references, some other ABIs, NUMA, etc). Let's bring more clarity and explicitness here.

core.alloc

Requires: a core.mem capability. Provides: a core.alloc capability, the negation of core.noalloc.

Objects:

  • allocators: objects that can create/resize/consume memory places as resources.

Allocators can be created from a memory region and can themselves be recycled by their memory segment or parent region (e.g. enabling munmap). Allocators should be explicit (at least at module level) and nudge toward allocating memory closer to call-stack roots, unlike the global allocator in Rust.

Allocation is fallible. Every allocation must either be handled, or it implies the possibility of panic.

Allocated places are bound by the lifetime of their allocator.

Allocators can be garbage-collecting, with no-op "freeing".

TODO

The bootstrapping sequence:

  • an Ion parser and test suite:
    • strings, numbers, symbols
    • tuples
    • pairs and maps
    • dot and paths
    • functions and scopes
  • a simple total language for querying/transformation of ion datastructures
    • list access
    • map transformations
  • a minikanren-style unification engine?
  • UTLC
  • linear lambda
  • arrays, references, memory management
  • static projections, i.e. comptime?

scratch area: free-floating ideas

indirection, mutable value semantics

When a parameter is mentioned without annotations, the compiler is free to choose any way of passing the argument (high-level code is easy on details). If you want a specific way of passing a value (ref / use own / val), specify it.

TODO: a notation for mutable value semantics, most likely * to look C-like.

Linear values can be consumed and/or replaced by another value. An explicit way of passing ownership into and/or out of an automaton.

// e.g. Golang:
func (<resource>)<name>(
   <context>,
   <in> | *<inout>, ...
) (<out>...)

// e.g. Rust:
fn method(
   /* ref/use/own */ self,
   <in>,          // move the cell ownership
   &<in>,         // usually cheap way of passing values
   &mut <inout>,  // this cell cannot be consumed, values can be.
) <out>           // may be a reference!

THINK: bridge between linear lambda and arrays.

array value semantics

i.e. "Fortran-style programming", "data-oriented design", etc.

It's built from:

  • primitive values like u8, f32, etc.
    • can be opaque blobs of bytes.
  • arrays of values: ordered and indexed sequences of values.
  • indexes: handles to a specific array element.
    • When the array length is statically known: elements of Fin len. Can be optimized to u<N> bitsets to avoid bounds checking (total indexing).
    • When the array length is unknown: must carry a proof of validity.
  • slices of other arrays.

When arrays have statically known lengths: mostly solved in Ada, just have the domain of indexes as ranges.

THINK: dynamically-sized arrays, dependently-sized structs ("length-tag-value"s). This very quickly evolves into some kind of dependently-typed programming: array ranges are types dependent on the array length and values.

THINK: a generic way to project statically known things into comptime and how to fall back to dynamic checks gracefully and transparently. A generic way to uplift runtime constructs into comptime when possible.

unification and relations

i.e. relations vs functions vs automata (vs I/O?)

// append as a prolog-style relation
rules (
   $t
   $X: $T
   exists nil : $T
   split: $X => { empty | (head $X : $t, tail $X : $T) }
   cons: ($t, $T) => $T
)
given (T: Type)
append (X: List T) (Y: List T) : List T =
   rec(self)
   match X {
   | [] => Y
   | [H, ...T] => [H, ...self T Y]
   }
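
For contrast, the one-directional functional reading of the same relation (plain Haskell):

append :: [t] -> [t] -> [t]
append []     ys = ys
append (h:tl) ys = h : append tl ys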

It would be nice to have a unified mechanism for expressing multi-way relations, not only one-way pure functions. This can be useful for typechecking, especially traits.

Think: zero or one variant unification for reversible computations. Even a two-way fork in possible options grows exponentially. But: is this practical in any way?

parameters vs context

Some variation of dynamic scoping, ideally typed and somewhat explicit.

Adapt and expand the Golang ctx context.Context idiom to carry everything related to dynamic context, including capabilities. Contexts are what separates pure functions from automata.

metainterpretation

TODO: the language structure and properties should be expressed in itself.

rewriting rules

i.e. "lawful macros".

A compiler can be a bunch of rewriting rules that rewrites a given AST to a bitstring.

complex identifiers

Lisp: almost every possible character (except [(),'"]) can be a part of an identifier. Good: flexible, subsumes almost any other language. Bad: can get unreadable, especially with Unicode.

C: identifiers consist of [_A-Za-z][_A-Za-z0-9]*. Good: a very minimal character set that can be highlighted anywhere with anything, instantly recognizable and mostly unambiguous. Must be used as the basis of ABI. Bad: no paths, no namespaces, no structure.

Clojure: namespaced/symbols, :selfevaluated, ...

Iota: atomic identifiers made of alphabet characters and symbols. "path-like" complex identifiers like /a/b/c or a.b.c. Sigil identifiers like $a, %1, a$b, :a (atomic identifiers + punctuation without whitespace/other gaps).

error handling: conditions?

Why do we always unwind the call stack on error? Why can't we call an error handler on top of it, so it can examine everything? Optionally it can jump into a supplied continuation with unwinding.
Why do we always unwind call stack on error? Why can't we call an error handler of top of it, to be able to examine everything? Optionally it can jump into a supplied continuation with unwinding.