Part 1 of this tutorial
Part 2 of this tutorial
Direct link to the final (pregenerated) interpreter at the end of this tutorial. Try running
nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f
We will finally add
We first need to initialize the names dictionnary with the names of all primitive functions. Since values in the (writable)
.bss section cannot be directly initialized, we put initial values in the
.data section and copy it over with a new
setup in header.asm will do the copying and other initialization. We expect
_start to call it before anything else. We'll also move
_start to forth.f.
compiler.py will generate these initial values for us. Which assembly functions should be exposed as primitives? We might as well provide all of them.
We can now replace our placeholder
is_primitive by the actual function. We'll waste the first # primitive entries in
memory and call anything with lower index a primitive and everything else a function.
Make compiler.py automatically generate the number of primitives as well as the index of some useful primitives like return (and replace
Everything is ready and we add
input-loop. Try this in forth20:
python compiler.py > forth.asm nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out
Try it with input
push1 push1 + print
forth-debug.f is also provided for a version with more debugging output.
Let's at least add all the primitives needed for a self interpreter our simulator ran.
If we had already defined
bind:, we could add it this way:
[ next-input names.set ] bind: bind:
(assuming bind isn't called from inside functions). But we can actually replace the only call to
bind: here by its body
[ next-input names.set ] next-input bind: names.set
Running this once our interpreter has started will define bind! (We put its name
bind: right after
next-input instead of at the very end since that's where
next-input will read it).
push: transforms the next token into a string and pushes onto a stack
[ next-command strings.address ] bind: push:
next-input when the stack is empty except for the special memory locations 0 and 1 we reserved. Otherwise, it pushes the next value in memory onto the stack. In this particular case, we expect it to be a string index. We implement it as a primitive since it depends on the call stack length.
It could possibly be interpreted instead of compiled by accounting for the calls to itself.
New primitive added:
As with the Python simulator,
call was already part of
main-loop and we just need to move it out. In compiled Forth, it is
[ is-primitive push: call-primitive push: call_stack.push if-else ] bind: call
call_function in assembly using compiler.py. Try forth21.
python compiler.py > forth.asm nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f
test.f is used as input.
We should have added the option to detect (data) stack underflows earlier.
%macro popdata 1 push r5 dec qword [data_stack_length] js %%negative mov r5, [data_stack_length] mov %1, [data_stack + 8*r5] pop r5 jmp %%end %%negative: prints "Popping empty stack! " inc qword [data_stack_length] call call_stack.len call print call call_stack.pop call print pushdata __LINE__ call print jmp exit %%end: %endmacro
This debug version of
popdata detects an error in
s2. When there's only one element on the stack,
s2 still tries to pop two elements. Fix this by moving
repeat works the same way as in the Python version and just moves the index at top of the call stack back. We need an extra
memory.get to call loops.
[ push1 print ] bind: foo push: foo strings.address memory.get call repeat
As in the Python simulator, since the input isn't saved,
repeat only works when inside the body of a function.
pushe evaluates the next token and then pushes onto the stack. Let's aim for something simpler: parsing integers only. Add primitives
int which turns a string into an integer (read in base 10) and
pushi which reads the following string and turns it into an integer.
int reads on digit at a time keeping a tally of what the integer would be if te current characters is the last one in the token. No validity checks are made on the input.
New primitive added:
write-loop is updated to take into account
pushi just like we did for
Try these two functions in forth22.
python compiler.py > forth.asm nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test.f nasm -f elf64 forth.asm -o forth.o ;and ld forth.o ;and ./a.out < test2.f
The first test runs an infinite loop printing 1 and needs to be interrupted (by press Control-C).
This completes our tutorial for making a Forth interpreter in assembly. Below we discuss some possible extensions and optimizations. The other interesting thing to do would be to add more primitives (either compiled or interpreted).
Here are some possible extensions not shown in this tutorial
memoryand making the right
blk. This probably needs
memoryto be the array allocated in the
SIGSEGVand print the
call_stack(maybe even the primitives stack).
Here are some possible optimizations not done in this tutorial.
Some improvement also make the assembly source look more human written.
Use a binary search tree or hash table for the
names dictionnary. Even a simple implementation should drop the linear search for lookup down to logarithmic.
Some useless instructions (like for
s2) or longer sequences that can be replaced by shorter sequences.
s11: popdata r1 pushdata r1 pushdata r1 ret
could instead just increase
[data_stack_length] and duplicate the top of the stack.
popdatasince it not read from anywhere else.
Maybe the data stack always seem to have the wrong arguments at the top.
If there's no branching in execution, setup the stack first and then call all functions. Try avoiding shuffling the stack in the middle.
A list of primitives in the final version, arranged by type.
main_loop, input_loop, write_loop
next_char, next_input, next_command, print, print_string, printspace, printeol, print_top, push, pushi
string_equal, equal_right_bracket, int
add, sub, not, equal, mul, greater
return, if, if_else, call_function
s11, s21, s2, s1212, s1312, s231
call_stack.is_empty, call_stack.pop, call_stack.push, memory.get, memory.set_at, memory.len, memory.append, strings.len, strings.set_at, strings.append, strings.pop, names.get, names.len, names.keys.get, names.values.get, names.set, strings.get, strings.address
input_buffer.len, input_buffer.push, input_buffer.append, input_buffer.pop, input_buffer.get, input_buffer.set_at, input_buffer.address
exit_main_loop, primitive_case, names_found, names_not_found, space_loop, word_loop, next_char_is_space, is_input_end
is_primitive, call_primitive, is_whitespace, raw_write