When implementing a toy programming runtime, I tried to implement my “language” as a stack of simple layers.
Today, I'd like to describe the bottom of the stack, Barely Even Structured Text or BEST.
Intended as a lightweight (but not strictly minimal) layer, its design can be summarized in a few choices:
- Only produce a sequence, no structure yet as it can be built on top (token stream not abstract syntax tree);
- 2 types of tokens: strings and symbols, nothing more (specialization is made atop);
- Allow for arbitrary bytes everywhere in both, including invalid UTF-8;
- Quoting is exposed to upper layers, escaping is not;
- Keep it simple yet allow as short a representation as possible;
- Follow established conventions otherwise.
Here is the full description:
Spaces are separators outside of double quotes and escape sequences.
Strings are prefixed by
'or delimited by".The rest is symbols. They can be prefixed by
\', or delimited by\"and".Quoting is preserved by parsers and printers, and meaningful in some circumstances (eg in STOR,
[starts a vector but\'[does not).
\can be used to escape:
'single quote"double quotespace\backslashnnewlinercarriage returnttabfform feedvvertical tabbbackspace[0-9A-F][0-9A-F]arbitrary byte in hexu[0-9A-F][0-9A-F][0-9A-F][0-9A-F]unicode code point in hex (represented in UTF-8)
For example, in:
A 'world \'of "dew"A and of are symbols, the latter single-quoted. world and dew are strings, the latter double-quoted.
Here are 3 strings:
'A\ single\ string
"On each line"
"This one\nis two lines"Single-quoted strings end before any unescaped space, whereas double-quoted strings only end before an unescaped double-quote, so this is a fine string too:
"A direct line
break works, so do
escaped quotes like \"!"The naïve, probably bug-ridden implementation of parsing and printing fits in 200 lines of Nim code.
Looking forward to describing the layers above:
- Simple Textual Object Representation (STOR), pretty much a textual CBOR with explicitly sized integers;
- FORM Open Representation Model (FORM), an object model consisting of STOR with cycles;
- STORM, the STOR Machine, a virtual machine implementing a concatenative model through a data deque, a context deque, and a stream deque, all containing references to FORM values;
- STORK, the STOR Kernel, a document fed into the machine at startup to initialize its context stack with a base vocabulary for its primitives and commonplace constructs.