Making a low level (Linux) debugger, part 3: our first program

This continues a series where we make a debugger and live editor for (re)creating assembly and C programs.

In part 1, we got the assembly parts: read/write registers and memory, single step, single instruction execution, function calls (although not perfect), set/restore breakpoints, memory allocation and examining upcoming instructions.

In part 2, we got the C parts: read/write variables using ptrace and memory maps, a C read-eval-print loop (REPL), line numbers from DWARF headers and undo using fork().

Its now time to actually use our debugger/editor! We'll try to use it to write a C program.

We'll use three different files [1]

the starting file sample3.c containing the main function,
the run-once shared library run_once.so,
another shared library library.so compiled from library.c where our "finalized" source will reside.

We'll try to keep sample3.c mostly unchanged and instead alter library.c which will eventually become our program (by adding its main function). library.c is initially empty.

Note about sample sessions

I wanted to talk/walk through a sample session. However, it is quite hard for me to convey how difficult (or not) something is. Things like save_state and load_state aren't worth mentioning every time I use them. But there's also the Python interpreter itself (I'm actually using the IPython REPL) and how I'm copying between my text editor and interpreter. There's how helpful (or not) my environment is in case of errors and how many mistakes I make/am led ot make.

A screen recording of the entire unedited session contains everything but lasts too long and its difficult to focus on excerpts I want to draw attention to. An edited session doesn't necessarily do justice to stumbling blocks.

In the end, I'll stick to a similar format as earlier parts and add appendices for a few detours.

Setting up

First, we'll add a reload_library function similar to reload_run_once that will reload library.so.

def reload_library():
    command = "gcc -shared -x c -o library.so -fPIC library.c"
    gcc_proc = subprocess.Popen(shlex.split(command))
    out = gcc_proc.communicate()
    if not gcc_proc.returncode: # No error
        c_func_call("reload_library")
        load_lib_vars('library.so')

And in sample3.c, add

void* dllibrary;

void reload_library(){
  if (dllibrary != NULL) dlclose(dllibrary);
  dllibrary = dlopen("./library.so", RTLD_NOW);
}

similar to reload_run_once(). This is the last time we'll change sample3.c. Recompile it and start the Python interpreter in interactive mode.

python -i tutorial3.py

With our three-files setup, we want run_once to have access to variables in library.c but since library.so is also dynamically loaded, we cannot get access to them with extern.

However, as described in part 2, we can use dlsym to get a pointer. For example if

int memory[1000];

is in library.c then

extern void* dllibrary;

void run_once(){
    int* memory = dlsym(dllibrary, "memory");
    // More source here
}

effectively gives us access to that memory variable.

Add an array c_lib_globals for all variables we want to extract using dlsym and change the generated run_once.so accordingly.

c_globals.append("void* dllibrary")
c_lib_globals = []

def dl_lib_var(var):
    type_, name = var.split()
    return '%(type)s %(name)s = dlsym(dllibrary, "%(name)s");' % {"type": type_,
                                                              "name": name}

def c_program(c_lines, raw=False):
    return """
    #include <stdio.h>
    #include <dlfcn.h>
    %s;
    void run_once(){
    %s;
    %s
    }""" % (";\n".join(c_globals),
            ";\n".join(dl_lib_var(var) for var in c_lib_globals),
            c_lines)

Change the first few lines of run_c to use c_program.

def run_c(c_lines):
    command = "gcc -shared -x c -o run_once.so -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    out = gcc_proc.communicate(input=c_program(c_lines.encode('string_escape')))
    [...]

For the memory array example, we'd simply run

>>> c_lib_globals.append("int* memory")

Test this

>>> run_c('memory[0] = 1;')
>>> run_c('printf("%i\n", memory[0]);')
1

Unfortunately, all c_lib_globals variable will all be pointers as I've not found a better way. This means for integers that are not an array of integers, we'll have to pretend they are arrays of size 1 and write *varname everywhere we would write varname, including assignments.

A simple model

The goal is to eventually make an interpreter so let's make a simple one as a test. Our debugger can be used to make any program, of course.

For a Forth-like interpreter, we had

memory,
an array of primitives,
a data (parameters) stack, and
a call (return) stack.

Let's make things simpler at first and only have

memory
primitives and
a program counter in lieu of a call stack.

Otherwise, the model is the same: our memory is an array of primitives and the main loop executes the primitive at the program counter, increments the program counter and repeats.

Add memory (if not already done in the previous step) and program_counter to library.c

int memory[1000];
int program_counter = 0;

then reload_library() and check that things are working properly.

>>> reload_library()
>>> c_lib_globals.extend(["int* memory", "int* program_counter"])
>>> run_c('printf("%i\n", *program_counter);')
0

Interpreter loop

Lets implement the model described. Each iteration of our interpreter's loop will read memory at the program counter, lookup the corresponding function, call it and increment the program counter. Something like

((func_ptr_array_t)primitives)[memory[*program_counter]]();
(*program_counter)++;

but let's create some test primitives before adding this loop.

Primitives

What primitive should be start our first test with?

Since we have a debugger that can write values, we don't need to parse and compile the input yet [2]. Instead, we'll use our debugger to write the program we want directly into memory and set the program counter to the starting position.

Perhaps the simplest primitive is the no-op (pass in Python) which does nothing.

void do_nothing() {}

We'll make this primitive #0. (Another option would be to make primitive #0 an error so accidentally executing a fresh piece of memory would immediately fail.)

Let's add something more interesting for primitive #1 and make it read/write to memory (the only other thing to try in our chosen model is to read/write to the program counter). We'll run a first test with just two primitives so let's make the second primitive change the next primitive to run!

void toggle(){
  memory[program_counter + 1] = 1 - memory[program_counter + 1];
}

Add a function to print out the first few entries of memory and test our primitives.

void print_memory(){
  for (int i=0; i<10; i++){
    printf("%i", memory[i]);
  }
  printf("\n");
}

>>> reload_library()
>>> c_func_call("print_memory", starts["library.so"])
0000000000
>>> c_lib_globals.append("void* toggle")
>>> run_c('toggle()')
<stdin>: In function ‘run_once’:
<stdin>:9:5: error: called object ‘toggle’ is not a function or function pointer
<stdin>:8:7: note: declared here
<stdin>:10:5: error: expected ‘;’ before ‘}’ token

That last line gives an error because we can't directly call a void* pointer. We have to first cast it to a function pointer.

>>> run_c('typedef void (*void_func_ptr_t)(void); ((void_func_ptr_t)toggle )();')
>>> c_func_call("print_memory", starts["library.so"])
0100000000

Now, put both primitives in an array as we planned

void (*primitives[])(void) = {do_nothing, toggle};

and try calling it by index

>>> reload_library()
>>> c_lib_globals.append("void* primitives")
>>> c_func_call("print_memory", starts["library.so"])
0000000000
>>> run_c('((void (**)(void))primitives)[1]();')
>>> c_func_call("print_memory", starts["library.so"])
0100000000

For easier access, we can add a type

typedef void (**func_ptr_array_t)(void);

in c_program so run_c('((func_ptr_array_t)primitives)[1]();') would have the same effect.

Test program

Its still a bit boring to have only two primitives, one of which does nothing but that should be enough for a first test. We now add the interpreter's main loop and run a test program. The "all 0s" program will only do_nothing so lets set one of the entries in memory to 1, say memory[0].

>>> reload_library()
>>> run_c('memory[0] = 1;')
>>> c_func_call("print_memory", starts["library.so"])
1000000000

"Appendix: String escape" describes an error encountered here. We fix it by making automatic string escapes optional so run_c(c_lines, raw=True) will not try to escape characters in c_lines.

>>> run_c(prog, raw=True)
>>> c_func_call("print_memory", starts["library.so"])
1100000000

Lets add a function to run one iteration of the main loop.

def iter_():
    run_c('''((func_ptr_array_t)primitives)[memory[*program_counter]]();
             (*program_counter)++;''', raw=True)
    run_c('printf("%i\n", *program_counter);')
    c_func_call("print_memory", starts["library.so"])

and test it

>>> iter_()
2
1110000000

We can run the loop a few more times. If we put it in a C while loop, since we don't have any halting primitives or stopping condition, it would run until the process is interrupted.

Instead, lets run it until the program_counter is far enough, say at index 60 or higher.

def run_loop():
    prog = """
    while (*program_counter < 60) {
      ((func_ptr_array_t)primitives)[memory[*program_counter]]();
      (*program_counter)++;}"""
    run_c(prog, raw=True)

Lets also print that much from memory.

void print_memory(){
  for (int i=0; i<60; i++){
    printf("%i", memory[i]);
  }
  printf("\n");
}

and run this loop

>>> reload_library()
>>> run_c('memory[0] = 1;')
>>> run_loop()
>>> c_func_call("print_memory", starts["library.so"])
111111111111111111111111111111111111111111111111111111111111

Lets pretend memory is circular (so program_counter is its value modulo 60) and run this loop a few more times.

def run_loop():
    prog = """
    while (*program_counter < 60) {
      ((func_ptr_array_t)primitives)[memory[*program_counter]]();
      (*program_counter)++;}
    *program_counter = 0;"""
    run_c(prog, raw=True)
    c_func_call("print_memory", starts["library.so"])

>>> for _ in range(60):
...     run_loop()
... 
111111111111111111111111111111111111111111111111111111111111
101010101010101010101010101010101010101010101010101010101010
110011001100110011001100110011001100110011001100110011001100
100010001000100010001000100010001000100010001000100010001000
111100001111000011110000111100001111000011110000111100001111
101000001010000010100000101000001010000010100000101000001010
110000001100000011000000110000001100000011000000110000001100
100000001000000010000000100000001000000010000000100000001000
111111110000000011111111000000001111111100000000111111110000
101010100000000010101010000000001010101000000000101010100000
110011000000000011001100000000001100110000000000110011000000
100010000000000010001000000000001000100000000000100010000000
111100000000000011110000000000001111000000000000111100000000
101000000000000010100000000000001010000000000000101000000000
110000000000000011000000000000001100000000000000110000000000
100000000000000010000000000000001000000000000000100000000000
111111111111111100000000000000001111111111111111000000000000
101010101010101000000000000000001010101010101010000000000000
110011001100110000000000000000001100110011001100000000000000
100010001000100000000000000000001000100010001000000000000000
111100001111000000000000000000001111000011110000000000000000
101000001010000000000000000000001010000010100000000000000000
110000001100000000000000000000001100000011000000000000000000
100000001000000000000000000000001000000010000000000000000000
111111110000000000000000000000001111111100000000000000000000
101010100000000000000000000000001010101000000000000000000000
110011000000000000000000000000001100110000000000000000000000
100010000000000000000000000000001000100000000000000000000000
111100000000000000000000000000001111000000000000000000000000
101000000000000000000000000000001010000000000000000000000000
110000000000000000000000000000001100000000000000000000000000
100000000000000000000000000000001000000000000000000000000000
111111111111111111111111111111110000000000000000000000000000
101010101010101010101010101010100000000000000000000000000000
110011001100110011001100110011000000000000000000000000000000
100010001000100010001000100010000000000000000000000000000000
111100001111000011110000111100000000000000000000000000000000
101000001010000010100000101000000000000000000000000000000000
110000001100000011000000110000000000000000000000000000000000
100000001000000010000000100000000000000000000000000000000000
111111110000000011111111000000000000000000000000000000000000
101010100000000010101010000000000000000000000000000000000000
110011000000000011001100000000000000000000000000000000000000
100010000000000010001000000000000000000000000000000000000000
111100000000000011110000000000000000000000000000000000000000
101000000000000010100000000000000000000000000000000000000000
110000000000000011000000000000000000000000000000000000000000
100000000000000010000000000000000000000000000000000000000000
111111111111111100000000000000000000000000000000000000000000
101010101010101000000000000000000000000000000000000000000000
110011001100110000000000000000000000000000000000000000000000
100010001000100000000000000000000000000000000000000000000000
111100001111000000000000000000000000000000000000000000000000
101000001010000000000000000000000000000000000000000000000000
110000001100000000000000000000000000000000000000000000000000
100000001000000000000000000000000000000000000000000000000000
111111110000000000000000000000000000000000000000000000000000
101010100000000000000000000000000000000000000000000000000000
110011000000000000000000000000000000000000000000000000000000
100010000000000000000000000000000000000000000000000000000000

Oh look, its the Sierpinski triangle.

This concludes our first test and is a good place to stop.

Concluding remarks

Working in our debugger was generally pleasant but we didn't get to use a number of features like breakpoints and assembly stepping and execution. Even load_state and save_state was of limited use because reload_library mostly resets our state. Maybe with more complex programs those other feature will become more useful? Hopefully, we'll also be able to evolve this debugger alongside the interpreter we are making so that it is also a debugger for the interpreter's language. And the last thing we haven't gotten around to using a more graphical debugger.

Appendix: Debugging notes

load_state bug

Right off the bat, trying to call a function from a shared library after a load_state gives an error. This is because the old pid is used instead of process.pid in load_lib_vars (so the wrong memory maps were read).

Replacing pid with process.pid everywhere fixes this.

RTLD_LAZY

Reading variables from library.so when it is reloaded sometimes resulted in old value. I'm guessing the problem comes from RTLD_LAZY. From the dlopen man pages:

Relocations shall be performed at an implementation-defined time,
ranging from the time of the dlopen() call until the first reference
to a given symbol occurs.

But accessing a variable ("symbols") from our debugger directly wouldn't count as a first reference to a symbol. At least I think that's the source of the problem. In any case, replacing RTLD_LAZY by RTLD_NOW seem to fix the issue.

String escape

It was convenient to be able to run run_c('printf("Hello world\n");') without having to escape the backslash into \\. But what if we actually want to run multi-line commands?

>>> prog = """
... ((func_ptr_array_t)primitives)[memory[*program_counter]]();\
... (*program_counter)++;
... """
>>> run_c(prog)
<stdin>: In function ‘run_once’:
<stdin>:11:5: error: stray ‘\’ in program
<stdin>:11:6: warning: implicit declaration of function ‘n’ [-Wimplicit-function-declaration]
<stdin>:11:41: error: subscripted value is neither array nor pointer nor vector
<stdin>:11:95: error: stray ‘\’ in program
<stdin>:12:5: error: expected ‘;’ before ‘}’ token

This give a compiler error because our end-of-lines \n are replaced by \\n. We can just change run_c to take an optional argument to prevent it from running .encode('string_escape') on our source.

def run_c(c_lines, raw=False):
    if not raw:
        c_lines = c_lines.encode('string_escape')
    command = "gcc -shared -x c -o run_once.so -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    out = gcc_proc.communicate(input=c_program(c_lines))
    [...]

>>> run_c(prog, raw=True)

now runs.

Appendix: Other changes in the interim

Between tutorial2.py and tutorial3.py, some other changes also happened, which includes

Being able to read_int from shared object library (with an optional _start offset)
Using a global tmp_addr variable instead of reserving new memory with each call to safe_func_call.

Disabling compiler optimization

Some helpful responses to part 2 suggests adding -O0 would result in assembly that's closer to our C source. So that's something worth trying.

Footnotes

[1] A few other arrangements were tried and even though this isn't perfect, it works decently for the workflow of dynamically reload libraries and running single lines.
[2] Without a debugger, we could have hard coded initial values at the start of the program.

Source

The source for this post is here. The files are all in the sample3 folder.

Posted on Jul 6, 2018

Blog index RSS feed Contact