Making a low level (Linux) debugger

gdb and lldb are the debuggers I know best. While they are both customizable with scripts, there are many times where I'd like to have much more control over how my debugger works (both the interactive portion and its internal representation).

Being able to recreate flpc in a more interactive way is one of these times. In this post, I try to make a debugger from more primitive pieces: the ptrace system call wrapped by the python-ptrace library, pyelftools and, later on, the disassembly library distorm3.

Because those debuggers are very large projects, trying to remake them seems daunting. But since I mostly want to debug and live-edit a binary I've created, I don't need maximum compatibility. Simplicity will be favoured over completeness when it seems like a good trade. Hopefully, this post itself exposes enough of the underlying ideas to bridge the gap in case of a slightly different environment and standard.

The source for this post is here. Everything is made and tested for Linux x86_64. The lines are in the order of this tutorial, with functions and imports moved closer to the front. So not only is the final debugger interactive, the steps for making the debugger are also interactive.

Setting up

To avoid permission issues, we will launch the debugged process as a child.

import subprocess
import ptrace.debugger
shell_command = ["./a.out"]
child_proc = subprocess.Popen(shell_command)
pid = child_proc.pid
debugger = ptrace.debugger.PtraceDebugger()
process = debugger.addProcess(pid, False)

This uses the ptrace system call to attach to the child process and pause it. process now contains many convenient methods.

(This follows the hello world example from python-ptrace.)

Reading values

Let's start simple but not linger here for too long.

Get registers

>>> regs = process.getregs()
>>> registers = {k:getattr(regs, k) for k in dir(regs) if not k.startswith('_')}
>>> registers
{'cs': 51L,
 'ds': 0L,
 'eflags': 519L,
 [...]
 'rax': 0L,
 'rbp': 140733962602848L,
 'rbx': 3L,
 'rcx': 139901135742274L,
 'rdi': 3L,
 'rdx': 140733962602656L,
 'rip': 139901135742280L,
 'rsi': 140733962602656L,
 'rsp': 140733962602520L,
 'ss': 43L}
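Some of these values are easier to read once unpacked. As a small sketch (not part of the original source; the helper name decode_eflags is made up for illustration), this lists which of the common x86 status flags are set in an eflags value such as the 519 above.

```python
# Bit positions of the commonly used x86 flags in (e/r)flags.
EFLAGS_BITS = [
    (0, 'CF'), (2, 'PF'), (4, 'AF'), (6, 'ZF'), (7, 'SF'),
    (8, 'TF'), (9, 'IF'), (10, 'DF'), (11, 'OF'),
]

def decode_eflags(value):
    """Return the names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in EFLAGS_BITS if value & (1 << bit)]

print(decode_eflags(519))  # 519 == 0x207
```

Bit 1 of eflags is reserved and always reads as 1, which is why it has no name in the table.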

Read bytes from memory

>>> import binascii
>>> binascii.hexlify(process.readBytes(registers['rsp'], 8))
'70987e453d7f0000'
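The bytes come back in memory order, which on x86_64 is little endian, so the hex string above reads "backwards" compared to an address. A minimal sketch of turning such a read into an integer, using only the standard library (decode_qword is a name made up here, and the input is just the sample output from above):

```python
import binascii
import struct

def decode_qword(raw):
    """Interpret 8 raw bytes as an unsigned little-endian 64-bit integer."""
    (value,) = struct.unpack('<Q', raw)
    return value

raw = binascii.unhexlify('70987e453d7f0000')
print(hex(decode_qword(raw)))
```

With the debugger running, the same helper could presumably be applied directly to process.readBytes(registers['rsp'], 8) to see what the top of the stack points to.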

Assembly REPL

Next we'll want to run assembly instructions one at a time. Let's gather the ingredients.

Single step

>>> process.getreg('rip')
140187902313503L
>>> process.singleStep()
>>> process.getreg('rip')
140187902313507L

(rip is the instruction pointer. The r prefix indicates its length of 64 bits. We see that it has indeed advanced when we took a step.)

Wait until the child raises a signal (SIGTRAP in this case). This may result in an error if the process terminates or raises a different signal.

>>> import signal
>>> process.waitSignals(signal.SIGTRAP)
ProcessSignal('Signal SIGTRAP',)

process.singleStep is non-blocking so we'll add a blocking version for convenience.

def step():
    process.singleStep()
    process.waitSignals(signal.SIGTRAP)

(It's not very clean, but let's use process as a global for the moment.)

Write to memory. In assembly, the instruction int 3 raises SIGTRAP. This instruction can be written as a single byte 0xCC.

>>> process.writeBytes(process.getreg('rip'), chr(0xCC))
>>> process.cont()
>>> process.waitSignals(signal.SIGTRAP)
ProcessSignal('Signal SIGTRAP',)

(We can also check the rip register before and after to see that it increases by exactly 1.)

Set a register

>>> process.setreg('rax', 0)

Now we have everything we need to run a single instruction given as bytes.

def run_asm(instr):
    old_rip = process.getreg('rip')
    old_values = process.readBytes(old_rip, len(instr))
    process.writeBytes(old_rip, instr)
    step()
    # Rewind rip unless the instruction altered it.
    if process.getreg('rip') == old_rip + len(instr):
        process.setreg('rip', old_rip)
    process.writeBytes(old_rip, old_values)

This overwrites the bytes at the instruction pointer with our instruction instr, takes a step, then reverts the overwritten bytes and the position of the instruction pointer. The instruction pointer is only rewound if the instruction didn't change it itself (as a jump or call would).

With a lookup table of assembly instructions to bytes, this could be put into a loop and made into a REPL.
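A rough sketch of what that loop could look like. The table below is hypothetical and tiny (a handful of fixed encodings, not a real assembler), and asm_repl takes the function that runs the bytes as a parameter so the sketch doesn't depend on a live process.

```python
# Hypothetical lookup table: a few fixed x86_64 encodings only.
ASM_TABLE = {
    'nop': b'\x90',
    'int 3': b'\xcc',
    'ret': b'\xc3',
    'syscall': b'\x0f\x05',
    'call rax': b'\xff\xd0',
}

def assemble(line):
    """Look up one instruction, ignoring case and extra whitespace."""
    key = ' '.join(line.lower().split())
    return ASM_TABLE[key]

def asm_repl(run, lines):
    """Pass each known instruction in `lines` to `run` as bytes."""
    for line in lines:
        try:
            instr = assemble(line)
        except KeyError:
            print("unknown instruction: %r" % line)
            continue
        run(instr)
```

In the debugger this would be called with run_asm as run and some source of typed lines, for example iter(raw_input, 'quit') on Python 2.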

Calling a function (first attempt)

What if we want to call an assembly function and pause after returning from it?

Let's write a Python function func_call(func_addr) for that. (Run this function line by line to examine its intermediate state.) First, save some of our current state.

import struct

def func_call(func_addr):
    old_rip = process.getreg('rip')
    old_regs = process.getregs()
    old_values = process.readBytes(old_rip, 6)

We could just use run_asm with the call instruction. That's byte 0xE8 followed by 4 bytes in little endian describing the difference between the address of the next instruction (the current rip plus 5, the length of the call) and our destination.

To pause the child after the call, we can write int 3 (byte 0xCC) after our call instructions.

    diff = func_addr - (old_rip + 5)
    new_values = chr(0xE8) + struct.pack('<i', diff) + chr(0xCC)
    process.writeBytes(old_rip, new_values)
    step()

We can double check that the call was made

    new_rip = process.getreg('rip')
    assert(new_rip == func_addr)

Now let's run until it hits a SIGTRAP (hopefully the one we set).

    process.cont()
    process.waitSignals(signal.SIGTRAP)

And now restore the bytes we've overwritten and the register values. In some cases, we might want to keep some of those.

    process.writeBytes(old_rip, old_values)
    process.setregs(old_regs)
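The byte arithmetic in func_call is easy to get wrong by one, so here is a standalone sketch of just the encoding step, together with its inverse for checking. The helper names encode_call and decode_call are made up for this example, and the addresses in the usage lines are arbitrary.

```python
import struct

def encode_call(rip, target):
    """Bytes for a relative `call target` placed at `rip`, then int 3."""
    diff = target - (rip + 5)  # offset is relative to the next instruction
    return b'\xe8' + struct.pack('<i', diff) + b'\xcc'

def decode_call(rip, raw):
    """Inverse of encode_call: recover the call target from the bytes."""
    assert raw[0:1] == b'\xe8', "not a relative call"
    (diff,) = struct.unpack('<i', raw[1:5])
    return rip + 5 + diff

code = encode_call(0x1000, 0x2000)
print(' '.join('%02x' % b for b in bytearray(code)))
print(hex(decode_call(0x1000, code)))
```

struct.pack('<i', ...) raises struct.error when the offset doesn't fit in a signed 32-bit integer, so a destination that is too far away fails loudly instead of being silently truncated.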

Getting a function's address

In fact, let's try to call C functions (compiled into the binary), for the moment with no argument and void return.

We just need to find the address of the function. We can get it from the binary's ELF symbol table using pyelftools.

from elftools.elf.elffile import ELFFile
from elftools.elf.sections import SymbolTableSection

def variables(filename="a.out"):
    f = ELFFile(open(filename, 'rb'))
    symb_sections = [section for section in f.iter_sections()
                     if isinstance(section, SymbolTableSection)]
    variables = {symb.name:symb['st_value'] for section in symb_sections
                 for symb in section.iter_symbols()}
    return variables

and now call

>>> c_variables = variables("a.out")
>>> func_call(c_variables['some_func_name'])

In fact, this method gets all static variables (I think), not just functions. For shared libraries, we can call variables with the full path to the .so file of that library.

However, this won't always work because the actual region of memory used doesn't always start at 0 and we need to add the start of that region as offset.

For the moment, we can get it like so. We will explain and explore memory regions and /proc/pid/maps a bit later.

>>> line1 = open("/proc/%s/maps" % pid).readline()
>>> _start = int(line1.split("-")[0], 16)
>>> start = _start if _start != 0x400000 else 0
>>> func_call(start + c_variables['some_func_name'])
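For reference, here is a slightly more explicit sketch of the same offset computation, splitting one line of /proc/pid/maps into its fields. The function names and the two sample lines are made up for illustration; only the field layout (address range, permissions, offset, device, inode, optional path) comes from the maps format.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into a small dict."""
    fields = line.split(None, 5)
    low, high = (int(x, 16) for x in fields[0].split('-'))
    path = fields[5].strip() if len(fields) > 5 else ''
    return {'start': low, 'end': high, 'perms': fields[1],
            'offset': int(fields[2], 16), 'path': path}

def load_offset(first_line, fixed_base=0x400000):
    """Offset to add to symbol values, following the rule used above."""
    start = parse_maps_line(first_line)['start']
    return 0 if start == fixed_base else start

# Made-up sample lines, one per kind of binary.
pie = "555c98607000-555c98609000 r-xp 00000000 08:01 131 /tmp/a.out\n"
fixed = "00400000-00401000 r-xp 00000000 08:01 131 /tmp/a.out\n"
print(hex(load_offset(pie)), load_offset(fixed))
```

In the session above, start would then be load_offset(open("/proc/%s/maps" % pid).readline()).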

Setting breakpoints

Now that we have functions' addresses, we can set a breakpoint by just writing int 3 (byte 0xCC) at the start of the function.

def set_breakpoint(addr):
    old = process.readBytes(addr, 1)
    process.writeBytes(addr, chr(0xCC))
    return old

and restore the overwritten value once we hit the breakpoint

def restore_breakpoint(old):
    rip = process.getreg('rip')
    process.setreg('rip', rip - 1)
    addr = rip - 1
    process.writeBytes(addr, old)

They can then be used like so

>>> old = set_breakpoint(start + c_variables['my_func'])
>>> process.cont()
>>> process.waitSignals(signal.SIGTRAP)
>>> restore_breakpoint(old)
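With more than one breakpoint, the saved bytes need to be tracked per address. The following is only a sketch of that bookkeeping: Breakpoints is a hypothetical helper, and FakeProcess is a stand-in with the same method names as used in this post (readBytes, writeBytes, getreg, setreg) so the logic can be tried without a traced process.

```python
class FakeProcess(object):
    """Stand-in for the traced process: flat memory plus a rip register."""
    def __init__(self, memory, rip=0):
        self.memory = bytearray(memory)
        self.regs = {'rip': rip}
    def readBytes(self, addr, size):
        return bytes(self.memory[addr:addr + size])
    def writeBytes(self, addr, data):
        self.memory[addr:addr + len(data)] = data
    def getreg(self, name):
        return self.regs[name]
    def setreg(self, name, value):
        self.regs[name] = value

class Breakpoints(object):
    """Remember the original byte under each int 3 we write."""
    def __init__(self, process):
        self.process = process
        self.saved = {}
    def set(self, addr):
        if addr not in self.saved:  # don't save our own 0xCC by mistake
            self.saved[addr] = self.process.readBytes(addr, 1)
            self.process.writeBytes(addr, b'\xcc')
    def hit(self):
        """Call after SIGTRAP: restore the byte and rewind rip by one."""
        addr = self.process.getreg('rip') - 1
        self.process.writeBytes(addr, self.saved.pop(addr))
        self.process.setreg('rip', addr)
        return addr
```

Passing the real process object instead of FakeProcess should work the same way, assuming the trap was caused by one of our own breakpoints (hit raises KeyError otherwise).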

Calling a function (second attempt)

There are some issues with the first approach to calling functions, although in general, it works surprisingly well.

Call distance too high: call (0xE8) only takes a 4-byte offset as argument, but the difference (diff) between two 64-bit addresses may need 8 bytes to describe. We could either wait (step) until we are in range of the function we want to call (this only works if we don't need to call the function right away) or we could put the destination in a register, say rax, and call rax (bytes FF D0).

Overwritten bytes: since we overwrite 6 bytes (5 for call, one for int 3) and only restore these bytes after the function returns, anything else that reads them in the meantime will get unexpected values. This happens, for example, if we made the call while inside the body of the function being called (so the program reaches old_rip again).

We could potentially restore 5 of the 6 bytes after one step, leaving only 0xCC. This only reduces the size of the problem.

We could manually craft a stack frame. I think this is what gdb does.

Instead, we will reserve a new piece of memory and write our instructions there.
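The range problem described above can be checked up front. As a sketch (choose_call is a made-up helper, not something from the source), this picks the 5-byte relative call when the offset fits in a signed 32-bit integer and falls back to call rax otherwise, returning the value rax would need to hold.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def choose_call(rip, target):
    """Return (instruction bytes, value needed in rax or None)."""
    diff = target - (rip + 5)
    if INT32_MIN <= diff <= INT32_MAX:
        # Near enough: call rel32.
        return b'\xe8' + struct.pack('<i', diff), None
    # Too far: call rax, with the absolute address placed in rax first.
    return b'\xff\xd0', target

print(choose_call(0x400000, 0x400100))
print(choose_call(0x555c9860810d, 0x7f3d457e9870))
```

The second pair of addresses is taken from the outputs shown elsewhere in this post (code in the binary, a value in a library region) and is far outside the 2 GiB reach of a relative call.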

Allocating memory

We can use the mmap system call (call number 9) to reserve some memory. This syscall needs some magic constants, some of which are in ptrace.syscall.

import ptrace.syscall
MMAP_PROT_BITMASK = {k:v for v,k in ptrace.syscall.posix_arg.MMAP_PROT_BITMASK}
MMAP_PROT_BITMASK['PROT_ALL'] = MMAP_PROT_BITMASK['PROT_READ']\
                              | MMAP_PROT_BITMASK['PROT_WRITE']\
                              | MMAP_PROT_BITMASK['PROT_EXEC']
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20
syscalls = {k: v for v, k in ptrace.syscall.linux_syscall64.SYSCALL_NAMES.items()}

With these constants, we can write a function that calls mmap. The syscall instruction is bytes 0F 05.

def reserve_memory(size):
    old_regs = process.getregs()
    regs = {'rax': syscalls['mmap'], 'rdi': 0, 'rsi': size,
            'rdx': MMAP_PROT_BITMASK['PROT_ALL'],
            'r10': MAP_PRIVATE | MAP_ANONYMOUS,
            'r8': -1, 'r9': 0}
    for reg, value in regs.items():
        process.setreg(reg, value)
    run_asm(chr(0x0f) + chr(0x05))
    result = process.getreg('rax')
    process.setregs(old_regs)
    return result

This strategy is adapted from this example. For reference, the constants are

syscalls['mmap'] = 9
MMAP_PROT_BITMASK['PROT_ALL'] = 7
MAP_PRIVATE | MAP_ANONYMOUS = 34

The address at which memory is reserved is in rax after the call so we extract and return it.
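One thing reserve_memory doesn't do is check for failure. A raw Linux system call reports errors by returning a small negative number (minus the errno) in rax, which shows up as a huge value when the register is read as unsigned. A minimal sketch of such a check, with syscall_error being a name invented here:

```python
def syscall_error(rax):
    """Return the errno encoded in a raw syscall result, or 0 on success.

    `rax` is the register value read as an unsigned 64-bit integer.
    """
    signed = rax - 2**64 if rax >= 2**63 else rax
    if -4095 <= signed < 0:
        return -signed
    return 0

print(syscall_error(0x7f3d457e9000))  # looks like a valid address
print(syscall_error(2**64 - 12))      # -12, i.e. ENOMEM on Linux
```

reserve_memory could call this on result and raise an exception for a nonzero value rather than returning a bogus address.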

This lets us write a modified, slightly safer function call.

def safe_func_call(func_addr):
    old_rip = process.getreg('rip')
    old_regs = process.getregs()
    tmp_addr = reserve_memory(6)
    process.setreg('rip', tmp_addr)
    # call rax
    process.setreg('rax', func_addr)
    new_values = chr(0xff) + chr(0xd0) + chr(0xcc)
    process.writeBytes(tmp_addr, new_values)
    step()

    new_rip = process.getreg('rip')
    assert(new_rip == func_addr)
    process.cont()
    process.waitSignals(signal.SIGTRAP)
    process.setregs(old_regs)

This version may still segfault sometimes. I'm not entirely sure why.

Looking around

Let's add a look function to our debugger that tells us what the next instructions are. We need the distorm3 disassembler for this, which can be installed using pip.

PtraceProcess.disassemble then gives us an iterator of the next ten instructions

def look(addr=None):
    print("ip:", hex(process.getreg('rip')))
    for i, instr in enumerate(process.disassemble(start=addr)):
        hexa = instr.hexa
        hexa = ' '.join(hexa[j:j+2] for j in range(0, len(hexa), 2))
        print(str(i).ljust(4), hexa.ljust(24), instr.text.lower())

Running this gives something like

>>> look()
ip: 0x555c9860810dL
0    48 89 c2                 mov rdx, rax
1    48 8d 05 79 0f 20 00     lea rax, [rip+0x200f79]
2    48 89 10                 mov [rax], rdx
3    48 8d 05 6f 0f 20 00     lea rax, [rip+0x200f6f]
4    48 8b 00                 mov rax, [rax]
5    48 89 c6                 mov rsi, rax
6    48 8d 3d af 02 00 00     lea rdi, [rip+0x2af]
7    b8 00 00 00 00           mov eax, 0x0
8    e8 d8 fa ff ff           call 0x555c98607c10
9    48 8d 05 51 0f 20 00     lea rax, [rip+0x200f51]

PtraceProcess.dumpCode works similarly with different formatting.

Examining and modifying C

This post is already getting long. I will write about reading/writing C variables, running single C statements, shared libraries, dynamic loading and memory maps (/proc/pid/maps) next time.

Note about this project

Originally, I wasn't sure if I'd go this low level for my project (or if I really need to). Instead, I could just start debugging once my interpreter is up. It'd still be useful to have a separate interpreter and debugger so the former's state can be modified while frozen. (Imagine trying to alter the call stack while the control flow of the stack-altering function is determined by the top of that stack! It may be possible but is already very hard to reason about.)

I might still try to make the interpreter support having an external debugger plugin at some point.

Source

The source for this post is here.

Posted on Jun 14, 2018
