Iota Object Notation

ion ebnf

<sep> ::= <nl> | ',' | ';'
<nl> ::= '\n'

<str> ::= <json:str>
<num> ::= <json:num>

<sym> ::=
   | (* alphanumeric, plus '_', '$', '\' *)
   | (* punctuation mix? *)

<item> ::=
   | '{' <choice> '}'       (* <map> *)
   | '[' <choice> ']'       (* <list> *)
   | '(' <choice> ')'       (* <tuple> *)
   | 'do' <seq> 'end'       (* <scope> *)
   | '.' <item>             (* <quote> *)
   | '$' <item>             (* <unquote> *)
   | <str>
   | <num>
   | <sym>

<chain> ::=
   | <item> <item>*                    (* whitespace, no newlines *)
   | <item> '=>' <nl>? <chain>         (* short bind *)
   | 'with' <typed> '=>' <nl>? <chain> (* long bind *)

<typed> ::= <chain> (<nl>? ':' <nl>? <typed>)* (* type annotations *)
<choice> ::= <seq> ('|' <seq>)*                (* choice of sequences *)

<pair> ::=
   | '#'<item> <nl>? <pair>                  (* pre-meta *)
   | (<typed> <nl>? '=' <nl>?)? <chain>      (* "key-value" pairs *)

<seq> ::= <pair>? (<sep> <pair>?)*             (* toplevel *)

TODO: think of a syntax for Haskell-style a $ b $ c chain grouping.

iota syntax by example

comments

There is a piece of internet folklore claiming that the syntax of comments in a programming language generates ~5x more discussion than the syntax of the rest of the language (TODO: source). So, let's be as boring as possible here:

// this is a comment until the end of the line
/* an inline comment */

I have a suspicion that for some irrational reason ("this code with asterisks looks so pretty!"?) humans like asterisks so much that they are willing to put up with bare pointers as long as they look like asterisks. So asterisks are good. </joke>

(Q: comments should translate into metadata and be attached to nodes somehow?)
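Both comment forms are easy to strip mechanically; a minimal Python sketch (assuming comment markers never occur inside string literals, which a real lexer would have to handle):

```python
import re

def strip_comments(src):
    """Remove /* inline */ comments first, then // line comments."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)  # inline comments
    src = re.sub(r"//[^\n]*", "", src)               # line comments
    return src

strip_comments("zero = 0 // the first number")  # -> 'zero = 0 '
```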

symbols

<sym> examples:

  • a, a0, hello_world
  • $a, $, hello$world
  • \to, \lambda
  • _, _1, _hello_
  • TODO: unicode?
// Symbols:
symbol
01234
hello_world
\alpha

strings and numbers

Self-evaluating atomic values.

TODO: check if JSON strings and numbers make sense. TODO: multi-line strings?

tuples

// tuples, ordered sequences of data:
()            // an empty tuple
(symbol)      // a one-element tuple, semantically equal to `symbol`
((symbol))    //   same as above
(a, b)        // a tuple of two values,
              //   or pairs, to be precise: (a = a, b = b)

pairs

Key-value pairs can be used for definitions and building maps:

// Pairs, `<key> = <value>`:
zero = 0
one = 1

a = b = c               // a single pair: a = {b = c}

"key is optional, this is a value and a key"

maps

Maps are namespaced partial functions from a key to a value:

// Maps, sets of pairs:
{ x = 12, y = 15 }    // a set of pairs
{
    x = 12            // newlines are separators
    y = 15
}

// empty maps:
{}
{,, , }        // an empty sequence inside
{  | |}        // an empty choice inside

// implicit keys are equal to their values:
{
    /* a = */ a
    /* b = */ b
}

{ { zero, one } = x }   // keys can be maps themselves

// more examples:
{ {} = {} }          // key `{}` -> value `{}`
{ {} }               // implicit key `{}` -> value `{}`
{one = 1, two = 2}   // symbol `one` -> 1, `two`-> 2
{a, b}               // implicit keys: a -> a, b -> b
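The implicit-key rule can be sketched in Python (an analogue, not the Iota semantics: maps as dicts, with a bare value becoming both its own key and value):

```python
def make_map(*entries):
    """Each entry is either a (key, value) pair or a bare value;
    a bare value is used as both key and value."""
    m = {}
    for e in entries:
        if isinstance(e, tuple) and len(e) == 2:
            k, v = e          # an explicit `key = value` pair
        else:
            k = v = e         # implicit key, equal to the value
        m[k] = v
    return m

make_map(("one", 1), ("two", 2))  # {'one': 1, 'two': 2}
make_map("a", "b")                # {'a': 'a', 'b': 'b'}
```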

(application) chains

This is inspired by Haskell: applying a function to something should be just a (left-associative) juxtaposition. Juxtaposition sounds too clever, so Iota calls these (application) chains instead:

// [application] chains, `<item> <item>*`...:
f x y               // means `((f x) y)`

This is a very flexible syntax primitive, use with care.
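The left-associative reading of a chain is a left fold of application; a Python sketch (the curried `add` is a made-up example function):

```python
from functools import reduce

def chain(f, *items):
    """`f x y` means `((f x) y)`: fold application left-to-right."""
    return reduce(lambda acc, x: acc(x), items, f)

add = lambda x: lambda y: x + y   # a curried function
chain(add, 2, 3)                  # 5, i.e. ((add 2) 3)
```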

dots and quoting

Dots are just a way of quoting something, like 'a or (quote a) in Lisp.

// quoting and unquoting:
.a             // just a symbol `a` that does not refer to anything
.{}            // a quoted map

A nice way to use this is for namespacing or "dot access" of values in a complex structure:

// namespacing and accessing a map:
m = { a = b = 2 }
.a.b          // application of `.a` to `.b` produces a <path> `.a.b`
m.a.b         // a path can be applied to an expression
              //   to access something by key, to "project" it.
              //   (m.a).b == m.(.a.b) == m.a.b
m .a .b       // i.e. `(m .a) .b`, it's just an application, see
m[.a][1]      // i.e. `(m [.a]) [1]`,  "index" something by something
m.(a, b)      // i.e. `m .(a, b)`, "project" two fields at once

// but: in Lean `m.f a.b` means `(m .f) (a .b)` and it's awesome.
//   Q: Should I steal this somehow?

// Q: map comprehensions?
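The "projection" reading of a path can be sketched in Python (assumptions: maps are dicts, and a path like `.a.b` is represented as a list of keys):

```python
from functools import reduce

def project(m, path):
    """Apply a path to a nested map: m.a.b == project(m, ["a", "b"])."""
    return reduce(lambda acc, key: acc[key], path, m)

m = {"a": {"b": 2}}
project(m, ["a", "b"])   # 2
```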

When you say quote, you must then say unquote. Lisp uses (unquote a) or ,a for this, which looks pretty alien in any modern language. Modern developers are used to dollar signs for substituting something instead, so why not use $<item> for unquoting:

.{             // any <item> can be quoted, e.g. maps
   a = $a         // `$<item>`  means unquoting, splicing, i.e. substitution from context
   b = $(f x y)   // any <item> can be spliced
}

// Q: is the following worth it?
//   Q: should it be unrestricted `$<item>` or `$(<chain>)` only?
s = "string"
"unquoted $(s)"         // ??? - use ION here?
."still quoted $(s)"    // literally "still quoted $(s)"?
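A toy model of quote/unquote: quoted terms are inert Python data, and a (made-up) `("unquote", name)` marker indicates a splice point to be filled from the context:

```python
def splice(template, env):
    """Replace ('unquote', name) leaves with env[name];
    everything else stays quoted (inert)."""
    if isinstance(template, tuple) and template[:1] == ("unquote",):
        return env[template[1]]
    if isinstance(template, dict):
        return {k: splice(v, env) for k, v in template.items()}
    return template

quoted = {"a": ("unquote", "a"), "b": ("unquote", "fxy")}
splice(quoted, {"a": 1, "fxy": 42})   # {'a': 1, 'b': 42}
```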

specifying types

The typing relation (<term> : <term>) is special: in a dependently-typed language, types are really a part of the term. They cannot be squeezed into some compiler metadata for expressions. They need respect and their own syntax:

// Typed expressions:
x : t             // means `x` has type `t`
a b : t u         // means chain `a b` has type `t u`
x : t : u         // i.e. `x` has type `t : u`

scopes

I have spent quite a bit of time trying to imagine a language built on maps. Map-based homoiconicity was supposed to be the central idea of this language experiment.

The problem with maps is that it does not make sense to make them ordered by default. That imposes too much of a burden on serialization, efficient in-memory representation, etc. Maps with keys and an independent ordering of pairs are not just simple partial functions anymore.

At the same time, it is crucial for source code "statements" (be it real statements or monadic bindings) to be represented sequentially and in some order.

Haskell's do-notation is just a chain of nested >>= calls with lexical scope. Its syntax hides this fact very cleanly.

echo = with {T} {ctx : Monad} : Monad T =>
   ctx.do getLine (a =>
      ctx.do getLine (b =>
         ctx.do (putStr "concat: ") (_ =>
            putStrLn (a ~ b))))

// can as well be

echo = do
   a = getLine()
   b = getLine()
   putStr "concat: "
   putStrLn < a.merge b
end

TODO: the types do not really make sense here, think of a better monadic/effectful semantics and evaluation.

Scopes start with do and end with end, with an ordered sequence of <pair>s inside. When seen as a data structure they work roughly as

(<parent context>,
 (a = getLine,
  (b = getLine,
   (putStr "concat: ",
    putStrLn ((a .merge) b)))))
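The desugaring of the `do` block into nested binds can be mimicked in Python with explicit continuations (a sketch, not the intended Iota semantics: "IO" is simulated with a list of input lines and an output buffer, and `.merge` is plain string concatenation):

```python
def run_echo(inputs, output):
    lines = iter(inputs)
    get_line = lambda: next(lines)       # getLine
    put_str = output.append              # putStr

    # do a = getLine; b = getLine; putStr "concat: "; putStrLn (a.merge b) end
    # becomes nested binds, each continuation closing over earlier names:
    (lambda a:
        (lambda b:
            (lambda _:
                put_str(a + b + "\n")    # putStrLn
            )(put_str("concat: "))
        )(get_line())
    )(get_line())

out = []
run_echo(["foo", "bar"], out)
# out is now ["concat: ", "foobar\n"]
```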

(late) bindings for names

Scopes are good, and local names in them are good, but function bodies are meant to be scopes with "holes" which are filled by late bindings to other values. A hole is a name without a value (yet), a parameter binding:

// Bindings are usually indicated with `=>`:

// Short binds, "lambdas", `<item> => <chain>`:
x => x.mul x            // binds a name `x` into a "body" chain `x .mul x`, when applied
(x => x.mul x) 12       // intuitively supposed to produce 144

x => y => x.mul y       // right-associative, i.e. `x => (y => (x .mul y))`
(x : Num) => x.mul x    // arguments can be typed

x => .{x = $x}          // the body chain can be one item, one item can be a (quoted) map

[out] => out.println "hello"   // the argument can be a tuple, a list, any <item>

// Long binds, `with <typed> => <chain>`
//   - allow to use many curried arguments
//   - allow to specify the return type:
//   - Lean-style
with (x: Num) (y: Num) : Num => x.mul y
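The right-associative short bind corresponds to ordinary currying; a Python analogue (assuming `.mul` means numeric multiplication):

```python
# x => y => x.mul y, i.e. x => (y => (x .mul y)):
mul = lambda x: lambda y: x * y

# x => x.mul x, applied to 12:
square = lambda x: mul(x)(x)

square(12)   # 144
mul(3)(4)    # 12
```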

interplay with other text formats

  • markdown
  • html/xml
  • Rust macros
  • URLs
  • yaml/json
  • unix shells
  • wat