The Forth Lisp Python Continuum (Flpc) is a small high dynamic programming language I'm making. In these series of posts, I will describe and discuss its inner workings, more or less in the order that they are run when the program starts.
This will hopefully help with others making programming languages and serve as some kind of documentation. Though Flpc was designed to be relatively easy to change, there is a surprising amount that stayed unchanged since the start of this project, perhaps unfortunately. I'm sure some of this post will still become outdatated and the intended final format is an interactive tutorial to teach you how to recreate Flpc so different design decisions can be made (similar to How to make these slides) but we're still very far from that.
I've recently ported Flpc's C source to Nim, which now runs faster than C (when all Nim/underlying C compiler optimization flags are turned on) thanks to folks from the Nim forum.
I did this so that I can (hopefully) develop and try out different optimizations more quickly. Since Flpc became self-hosted, I've used the language in its intended way to add syntactic sugar at runtime. For example, I've added
"""This is a long stirng with "quotes" in it"""
{ 3 1 True None }
{ x*x for x in l }
in this way. It's great to see my envisioned workflow pan out, but the feedback loop for editing a grammar and seeing the results is too slow. And this is no good interactively using and modifying the language at runtime. Even without grammar modification, reruning the same script becomes slow after 4-5 functions are added to it.
However, optimization clashes with the dynamic nature of the language and the clarity and simplicity of its internal state at runtime. This combined with the changes needed across multiple layers has made progress on optimization slow. So I'm procrastinating by writing these posts.
If you know anything about optimizing compilers or bytecode and understand the trade-offs stated here, please get in touch by email or otherwise. Because of the long cycles, some intuition about what will be effective could really speed things up.
With that out of the way, let's get to the description of Flpc internals. The model is very similar to other languages I wrote about before so I will concisely state the description and invite you to read those for a slightly more detailed exploration in a simpler setting.
Alongside that, it has a number of other information in its global state
All "global" variables above are fields of g
(g.memory
, g.call_stack
, etc).
Elements in memory
and the data_stack
are integers where
TL=3
bits represents the type of the element andThere are 5 types, which in order (represented by 0, 1, 2, 3, 4
) are
strings
array containing the actual string,memory
represented by an index of memory
primitives
array,For example,
3456 = 54 * (2**3) + 1
has type 1, which is Str
and value 54
. Since strings[54]
is +
, a value of 3456
in memory represents the string "+"
.
There's more to say about the design and functionality of some of those, but lets skip that for now and jump directly to the bootstrapping steps, which is what this is about.
The core of the main loop for running FlpcForth bytecode is [simplification]
proc main_loop() =
while true:
g.call_stack[g.call_stack.len - 1] += 1
func_call(g.memory[g.call_stack[g.call_stack.len - 1] - 1])
That is, it
call_stack
,func_call
.func_call
is also fairly straightforward:
g.primitives
), call it.g.memory
), then add it to the top of the call_stack
.proc func_call(elem: Elem) =
case elem.ftype
of Pointer:
g.call_stack.add(elem.value)
of Prim:
let res = primitives[elem.value]()
if res != NoValue:
data_stack.add(res)
else:
error("Cannot call " & $elem)
For this main_loop
to work, our initial state needs some value on the call_stack
. The initial call_stack
is set to [0]
, meaning the first instruction we run is at g.memory[0]
and so we just need to make sure memory
is initialized with the right value there.
g.memory[0]
is a pointer to the function input_loop2
whose body is
repeat:
call(names.get(input.next_token()))
That is, it repeats these three steps indefinitely
func_call
this primitive or pointer.Notice that we're calling func_call
from Flpc rather than Nim (like we did in main_loop
). To do that, we put a thin wrapper around func_call
and add that wrapper to the primitives
array.
input.next_token
is a primitive but names.get
is not. names.get
is a Flpc function whose body looks something like
return_if(name == "string1" value1)
return_if(name == "string2" value3)
return_if(name == "string3" value3)
...
lookup_error()
The body of this function is modified by appending to it when new entries are added (to what we think of as the names
dictionary though there is no explicit object other than this names.get
function body).
Initial values written to memory
are loaded from a file init_memory.dat are loaded the integer representation of all these elements. 7 functions, including input_loop2
and names.get
are loaded this way. Another pre-loaded function is write_loop
.
With main_loop
, we can run any sequence of primitives or predefined Flpc function. Now we want some way to define new functions.
names.get
associates the (function) name [
with the (pointer to the) function write_loop
which lets us write new functions. Its body is essentially
write_loop <- fun[]:
pointer = Pointer(memory.len())
repeat:
token = input.next_token()
memory.append(names.get(token))
if string_endswith(token ":"):
memory.append(input.next_token())
if string_equal(`token "]"):
return(`pointer)
memory.append
is a primitive that writes is argument to the end of memory and memory.len
returns the current number of cells in memory.
memory.get
and memory.set
(not seen yet) are other primitives for reading and writing directly to memory
.
As a result if we write [ foo bar baz ]
, it will write a new function whose body is foo bar baz
(run foo()
then bar()
then baz()
) and return a pointer to (the beginning of) that function. We can call the result using call
.
For easier reuse, we'd like to name these new functions. To do that, we add them to the body of names.get
. A helper function functions.add
pre-loaded via init_memory.dat
takes a string and pointer and adds them the the names dictionary by essentially adding the line
return_if(name == "new_string" new_pointer)
to the body of memory.get
(and moves the line lookup_error()
down by rewriting it). The full definition is
functions.append <- fun[value]:
memory.set(functions.end() s21()) # pos value
functions.end.increase()
functions.add <- inline[]:
F' push: pick1 functions.append
push: push: functions.append functions.append
push: string_equal functions.append
push: push: functions.append functions.append
push: return_if functions.append
push: lookup_error functions.append
push: ] functions.append
functions.end.decrease functions.end.decrease 'F
functions.end
is a primitive that returns a pointer to the end of the body of names.get
, functions.end.increase
and functions.end.decrease
are primitives to increase and decrease these values.
The pointer from functions_end
is a global variable in Nim initialized by a value in init_memory.dat
(init_memory.dat
contains initial values for more than just memory
). In retrospect, functions.end
could be a value stored in memory
instead of a global in Nim (or C).
lookup_error
is the final pre-loaded function from init_memory.dat
, which just calls
error("lookup_error")
where error
is a premitive for an unrecoverable error.
proc error(msg: string) =
echo msg
raise newException(Exception, msg)
An alternate design (not tried) is to only input_loop2
in init_memory.dat
and move everything else to a later stage stage0
. To do that, we'd have to replace the initial function bodies
[ instr1 instr2 instr3 ]
with
pushf: instr1 memory.append pushf: instr2 memory.append pushf: instr3 memory.append
In other words, we write out by hand the instructions that write_loop
would execute.
But then we'd need a program to generate these sequences (or write them by hand and make them harder to edit). So I think adding a few extra function to init_memory.dat
/boot.flpc
was worth it.
Update: As I'm writing this, I added a new add_primitive
primitive to add new primitives to the names
dictionary. I will remove most primitives not needs for bootstrapping until add_primitive
can be called.
This begs the question of how init_memory.dat
is generated from boot.flpc
and other information. The process is a bit ugly. I still have an old Flpc interpreter written in Python lying around that is not part of the Github repo.
It puts the list of primitives in an order, builds an internal names
dictionary from primitives names to integer representation (the index in the order = the type Prim
). Then it writes down the body of each function in boot.flpc
using its own reimplementation of write_loop
. Then writes the names.get
function described earlier, to memory. Finally, this interpreter dumps its state to init_memory.dat
.
I want to end this post (before the appendix with init_memory.dat
layout) with one way to explore a bit.
THere's a function bp
(which is also a primitive) that can be called from either Nim or Flpc, which I think of as a breakpoint. It drops you to a prompt and lets you into a prompt bp>
and lets you examine. In the C version, this was an empty function that you'd attach to with gdb. In the Nim version, there a few commands (and you could expand these yourself):
bt
- prints the current Flpc state, including stackcs
- prints just the call stack as integers (for when bt
itself is broken)ds
- prints just the data stackst
- prints the Nim stack trace (needs --stacktrace:on
when compiling).mem <pos>
- print memory around the position pos
(an integer).Similar to bp
, there's a debugger
and debuggerv2
function that can be called from Flpc once they are up.
The current of init_memory.dat contains the following lines in order.
#memory <init memory length> <init memory entries>
<space separated memory entries>
where #memory
is a constant string to help navigate the init_memory.dat
file. The first <init memory entries>
entries of memory
are initialized to what's on the second line and the rest (up to <init memory length>
) are all 0.
Entries are integers representing (typed) elements. However, the last 2 bits are used to represent the type (instead of the last 3 bits) and so initial memory contains no value of type 5 = Special
(as they cannot be represented).
#positions <init position entries>
<space separated pairs of entries>
I've not yet described the positions
globals yet, which maps memory
indices to FlpcForth source ranges (start and end character). -1
represent no source positions for things like data.
#strings <init strings length>
<space separated strings entries>
is the initial strings. In particular, this contains the names of primitive and pre-loaded functions.
#indices <pointer to functions_end> <pointer to names_get> <pointer to run>
Pointer to (initial) memory
, essentially allowing a fast names.get
from Nim. The last one is no longer used and could be removed from this file.
#names <init primitive names length>
<space separated primitive names entries>
The names of primitives. This is so we know when initial memory
contains Prim 12, we know what primitive that should be. It is however redundant as we could have made strings
start with all primitive names.
Posted on Jul 3, 2021