Making a low level (Linux) debugger, part 3: our first program

This continues a series where we make a debugger and live editor for (re)creating assembly and C programs.

In part 1, we got the assembly parts: read/write registers and memory, single step, single instruction execution, function calls (although not perfect), set/restore breakpoints, memory allocation and examining upcoming instructions.

In part 2, we got the C parts: read/write variables using ptrace and memory maps, a C read-eval-print loop (REPL), line numbers from DWARF headers and undo using fork().

Its now time to actually use our debugger/editor! We'll try to use it to write a C program.

We'll use three different files [1]

We'll try to keep sample3.c mostly unchanged and instead alter library.c which will eventually become our program (by adding its main function). library.c is initially empty.

Note about sample sessions

I wanted to talk/walk through a sample session. However, it is quite hard for me to convey how difficult (or not) something is. Things like save_state and load_state aren't worth mentioning every time I use them. But there's also the Python interpreter itself (I'm actually using the IPython REPL) and how I'm copying between my text editor and interpreter. There's how helpful (or not) my environment is in case of errors and how many mistakes I make/am led ot make.

A screen recording of the entire unedited session contains everything but lasts too long and its difficult to focus on excerpts I want to draw attention to. An edited session doesn't necessarily do justice to stumbling blocks.

In the end, I'll stick to a similar format as earlier parts and add appendices for a few detours.

Setting up

First, we'll add a reload_library function similar to reload_run_once that will reload

def reload_library():
    command = "gcc -shared -x c -o -fPIC library.c"
    gcc_proc = subprocess.Popen(shlex.split(command))
    out = gcc_proc.communicate()
    if not gcc_proc.returncode: # No error

And in sample3.c, add

void* dllibrary;

void reload_library(){
  if (dllibrary != NULL) dlclose(dllibrary);
  dllibrary = dlopen("./", RTLD_NOW);

similar to reload_run_once(). This is the last time we'll change sample3.c. Recompile it and start the Python interpreter in interactive mode.

python -i

With our three-files setup, we want run_once to have access to variables in library.c but since is also dynamically loaded, we cannot get access to them with extern.

However, as described in part 2, we can use dlsym to get a pointer. For example if

int memory[1000];

is in library.c then

extern void* dllibrary;

void run_once(){
    int* memory = dlsym(dllibrary, "memory");
    // More source here

effectively gives us access to that memory variable.

Add an array c_lib_globals for all variables we want to extract using dlsym and change the generated accordingly.

c_globals.append("void* dllibrary")
c_lib_globals = []

def dl_lib_var(var):
    type_, name = var.split()
    return '%(type)s %(name)s = dlsym(dllibrary, "%(name)s");' % {"type": type_,
                                                              "name": name}

def c_program(c_lines, raw=False):
    return """
    #include <stdio.h>
    #include <dlfcn.h>
    void run_once(){
    }""" % (";\n".join(c_globals),
            ";\n".join(dl_lib_var(var) for var in c_lib_globals),

Change the first few lines of run_c to use c_program.

def run_c(c_lines):
    command = "gcc -shared -x c -o -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    out = gcc_proc.communicate(input=c_program(c_lines.encode('string_escape')))

For the memory array example, we'd simply run

>>> c_lib_globals.append("int* memory")

Test this

>>> run_c('memory[0] = 1;')
>>> run_c('printf("%i\n", memory[0]);')

Unfortunately, all c_lib_globals variable will all be pointers as I've not found a better way. This means for integers that are not an array of integers, we'll have to pretend they are arrays of size 1 and write *varname everywhere we would write varname, including assignments.

A simple model

The goal is to eventually make an interpreter so let's make a simple one as a test. Our debugger can be used to make any program, of course.

For a Forth-like interpreter, we had

  1. memory,
  2. an array of primitives,
  3. a data (parameters) stack, and
  4. a call (return) stack.

Let's make things simpler at first and only have

  1. memory
  2. primitives and
  3. a program counter in lieu of a call stack.

Otherwise, the model is the same: our memory is an array of primitives and the main loop executes the primitive at the program counter, increments the program counter and repeats.

Add memory (if not already done in the previous step) and program_counter to library.c

int memory[1000];
int program_counter = 0;

then reload_library() and check that things are working properly.

>>> reload_library()
>>> c_lib_globals.extend(["int* memory", "int* program_counter"])
>>> run_c('printf("%i\n", *program_counter);')

Interpreter loop

Lets implement the model described. Each iteration of our interpreter's loop will read memory at the program counter, lookup the corresponding function, call it and increment the program counter. Something like


but let's create some test primitives before adding this loop.


What primitive should be start our first test with?

Since we have a debugger that can write values, we don't need to parse and compile the input yet [2]. Instead, we'll use our debugger to write the program we want directly into memory and set the program counter to the starting position.

Perhaps the simplest primitive is the no-op (pass in Python) which does nothing.

void do_nothing() {}

We'll make this primitive #0. (Another option would be to make primitive #0 an error so accidentally executing a fresh piece of memory would immediately fail.)

Let's add something more interesting for primitive #1 and make it read/write to memory (the only other thing to try in our chosen model is to read/write to the program counter). We'll run a first test with just two primitives so let's make the second primitive change the next primitive to run!

void toggle(){
  memory[program_counter + 1] = 1 - memory[program_counter + 1];

Add a function to print out the first few entries of memory and test our primitives.

void print_memory(){
  for (int i=0; i<10; i++){
    printf("%i", memory[i]);

>>> reload_library()
>>> c_func_call("print_memory", starts[""])
>>> c_lib_globals.append("void* toggle")
>>> run_c('toggle()')
<stdin>: In function ‘run_once’:
<stdin>:9:5: error: called object ‘toggle’ is not a function or function pointer
<stdin>:8:7: note: declared here
<stdin>:10:5: error: expected ‘;’ before ‘}’ token

That last line gives an error because we can't directly call a void* pointer. We have to first cast it to a function pointer.

>>> run_c('typedef void (*void_func_ptr_t)(void); ((void_func_ptr_t)toggle )();')
>>> c_func_call("print_memory", starts[""])

Now, put both primitives in an array as we planned

void (*primitives[])(void) = {do_nothing, toggle};

and try calling it by index

>>> reload_library()
>>> c_lib_globals.append("void* primitives")
>>> c_func_call("print_memory", starts[""])
>>> run_c('((void (**)(void))primitives)[1]();')
>>> c_func_call("print_memory", starts[""])

For easier access, we can add a type

typedef void (**func_ptr_array_t)(void);

in c_program so run_c('((func_ptr_array_t)primitives)[1]();') would have the same effect.

Test program

Its still a bit boring to have only two primitives, one of which does nothing but that should be enough for a first test. We now add the interpreter's main loop and run a test program. The "all 0s" program will only do_nothing so lets set one of the entries in memory to 1, say memory[0].

>>> reload_library()
>>> run_c('memory[0] = 1;')
>>> c_func_call("print_memory", starts[""])

"Appendix: String escape" describes an error encountered here. We fix it by making automatic string escapes optional so run_c(c_lines, raw=True) will not try to escape characters in c_lines.

>>> run_c(prog, raw=True)
>>> c_func_call("print_memory", starts[""])

Lets add a function to run one iteration of the main loop.

def iter_():
             (*program_counter)++;''', raw=True)
    run_c('printf("%i\n", *program_counter);')
    c_func_call("print_memory", starts[""])

and test it

>>> iter_()

We can run the loop a few more times. If we put it in a C while loop, since we don't have any halting primitives or stopping condition, it would run until the process is interrupted.

Instead, lets run it until the program_counter is far enough, say at index 60 or higher.

def run_loop():
    prog = """
    while (*program_counter < 60) {
    run_c(prog, raw=True)

Lets also print that much from memory.

void print_memory(){
  for (int i=0; i<60; i++){
    printf("%i", memory[i]);

and run this loop

>>> reload_library()
>>> run_c('memory[0] = 1;')
>>> run_loop()
>>> c_func_call("print_memory", starts[""])

Lets pretend memory is circular (so program_counter is its value modulo 60) and run this loop a few more times.

def run_loop():
    prog = """
    while (*program_counter < 60) {
    *program_counter = 0;"""
    run_c(prog, raw=True)
    c_func_call("print_memory", starts[""])

>>> for _ in range(60):
...     run_loop()

Oh look, its the Sierpinski triangle.

This concludes our first test and is a good place to stop.

Concluding remarks

Working in our debugger was generally pleasant but we didn't get to use a number of features like breakpoints and assembly stepping and execution. Even load_state and save_state was of limited use because reload_library mostly resets our state. Maybe with more complex programs those other feature will become more useful? Hopefully, we'll also be able to evolve this debugger alongside the interpreter we are making so that it is also a debugger for the interpreter's language. And the last thing we haven't gotten around to using a more graphical debugger.

Appendix: Debugging notes

load_state bug

Right off the bat, trying to call a function from a shared library after a load_state gives an error. This is because the old pid is used instead of in load_lib_vars (so the wrong memory maps were read).

Replacing pid with everywhere fixes this.


Reading variables from when it is reloaded sometimes resulted in old value. I'm guessing the problem comes from RTLD_LAZY. From the dlopen man pages:

Relocations shall be performed at an implementation-defined time,
ranging from the time of the dlopen() call until the first reference
to a given symbol occurs.

But accessing a variable ("symbols") from our debugger directly wouldn't count as a first reference to a symbol. At least I think that's the source of the problem. In any case, replacing RTLD_LAZY by RTLD_NOW seem to fix the issue.

String escape

It was convenient to be able to run run_c('printf("Hello world\n");') without having to escape the backslash into \\. But what if we actually want to run multi-line commands?

>>> prog = """
... ((func_ptr_array_t)primitives)[memory[*program_counter]]();\
... (*program_counter)++;
... """
>>> run_c(prog)
<stdin>: In function ‘run_once’:
<stdin>:11:5: error: stray ‘\’ in program
<stdin>:11:6: warning: implicit declaration of function ‘n’ [-Wimplicit-function-declaration]
<stdin>:11:41: error: subscripted value is neither array nor pointer nor vector
<stdin>:11:95: error: stray ‘\’ in program
<stdin>:12:5: error: expected ‘;’ before ‘}’ token

This give a compiler error because our end-of-lines \n are replaced by \\n. We can just change run_c to take an optional argument to prevent it from running .encode('string_escape') on our source.

def run_c(c_lines, raw=False):
    if not raw:
        c_lines = c_lines.encode('string_escape')
    command = "gcc -shared -x c -o -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    out = gcc_proc.communicate(input=c_program(c_lines))

>>> run_c(prog, raw=True)

now runs.

Appendix: Other changes in the interim

Between and, some other changes also happened, which includes

Disabling compiler optimization

Some helpful responses to part 2 suggests adding -O0 would result in assembly that's closer to our C source. So that's something worth trying.



The source for this post is here. The files are all in the sample3 folder.

Posted on Jul 6, 2018

Blog index RSS feed Contact