This post continues the exploration of using the x86 trap flag and signal handlers to observe the execution of a program; this time with the goal of injecting the ‘debugger’ into a compiled binary. This includes an overview of LD_PRELOAD.
In my last post
I introduced the idea of observing every step of
the execution of a program by setting the x86 trap flag which causes
an interrupt after every instruction, and a signal handler to catch
that interrupt within the same process. This turns out to be much
faster than using the Linux ptrace
facilty, which is the ‘proper’
way to write a debugger to observe the execution of a program.
The flow looks kind of like this:
One issue is that as a ‘debugger’ this is not very useful if code has to be inserted into the source of the program that you want to observe. It would be much more useful if we could inject this into an already compiled program.
The easiest way to inject code into a compiled program is to use
LD_PRELOAD
. This is an environment variable that allows overriding
dynamic library functions that a program uses.
Aside: LD_PRELOAD explained
Every time a program is loaded to be executed, there’s a program
called the dynamic linker that, among other things, finds the list of
dynamic library functions that the program uses, and links them into
the executable (dynamic libraries are .so
files).
When the LD_PRELOAD
environment variable refers to the
filename of a dynamic library, the dynamic linker will attempt to
always find a dynamic function in that library first. Thus it allows
us to override dynamic functions.
Julia Evans explains this well in this post, which in turn refers to the nice tutorial Dynamic linker tricks: Using LD_PRELOAD to cheat, inject features and investigate programs by Rafał Cieślak.
Let’s do a simple hello world example program that we will override:
#include <stdio.h>
int main() {
printf("hello\n");
printf("world!\n");
}
We can figure out which dynamic functions are called using nm -D
:
>> nm -D hello
w __gmon_start__
080484cc R _IO_stdin_used
U __libc_start_main
U puts
Apparently the program uses puts
to write the output on screen.
We can write a tiny shared library to override it:
#include <unistd.h>
int puts(const char *str) {
write(1, "NOPE!\n", 6);
return 1; // success
}
Note that we can’t use puts
itself when printing, because this
will result in the function calling itself. We’ll just use
the write system call to output on screen.
We need some special way to compile the function to create a shared library:
>> gcc -shared -fPIC override.c -o override.so
And then we can use LD_PRELOAD
and override puts
in our hello-world example:
>> LD_PRELOAD=./override.so ./hello
NOPE!
NOPE!
We may have a problem if we need to call the original function that
was overridden. Thankfully, we ask the dynamic loader to give us a pointer
to the original function using dlsym
:
#define_GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
typedef int (*putsfn)(const char* str);
int puts(const char *str) {
static putsfn orig_puts = NULL;
if (orig_puts == NULL)
orig_puts = (putsfn)dlsym(RTLD_NEXT, "puts"); // get original puts
return original_puts("NOPE!");
}
To compile, we now need to tell gcc to include libdl
via -ldl
.
Using LD_PRELOAD to inject trap-flag tracing
Back to our problem at hand — how are going to inject trap-flag tracing into an already compiled program? The trivial program I was using in the last post({filename}/blog/2017-01-11-trapflag-tracing.md) was one that merely executes a million instructions and then quits.
void main() {
int num_instructions = 1000000;
unsigned int count = num_instructions >> 1;
asm volatile(
"1:\n" // define local label
"decl %%eax\n" // eax -= 1
"jnz 1b\n" // jump to previous local label 1 (before) if not zero
: // no output regs
: "a"(count) // input count -> eax
);
}
If we want to use LD_PRELOAD
to inject the tracer, we have to find
some dynamic library function this program is calling. And it should
be as early in the program as possible, and should be ideally called
by any program. In our example program we don’t make any explicit
library calls, but may there’s still some setup code? Let’s just check what
the program uses using nm -D
:
>> nm -D loop
w __gmon_start__
0804847c R _IO_stdin_used
U __libc_start_main
This __libc_start_main
function seems like an interesting candidate!
Some poking around the source code of libc
(find your version using ldd --version
) revealed that this is
actually the function that calls main
! So this function
will start executing even before main
.
This means we could intercept __libc_start_main
, start the tracer by
setting the trap-signal handler and setting the trap flag. Then we can
call the original __libc_start_main
to start the original
program. This should allow us to inject the trap-flag tracer into
arbitrary compiled programs, as long as they are based on libc
(and
call libc
dynamically, i.e. they are not statically compiled).
Our overridden __lib_start_main
looks like this:
// declare type of __libc_start_main function
typedef int (*MainFnType)(int (*main)(int, char **, char **),
int argc,
char **argv,
int (*init)(void),
void (*fini)(void),
void (*ldso_fini)(void),
void (*stack_end));
// override __libc_start_main
int __libc_start_main(int (*main)(int, char **, char **),
int argc,
char **argv,
int (*init)(void),
void (*fini)(void),
void (*ldso_fini)(void),
void (*stack_end)) {
// get original function
MainFnType orig_main = (MainFnType)dlsym(RTLD_NEXT,
"__libc_start_main");
// start tracing
startTrace();
// call original function
int result = orig_main(main, argc, argv,
init, fini, ldso_fini, stack_end);
return result;
}
The start/stop trace functions are the same as for the previous post:
static struct sigaction trapSa;
void startTrace() {
// set up trap signal handler
trapSa.sa_flags = SA_SIGINFO;
trapSa.sa_sigaction = trapHandler;
sigaction(SIGTRAP, &trapSa, NULL);
setTrapFlag();
}
void stopTrace() {
clearTrapFlag();
printf("cycles: %lld\n", ccycle);
}
void setTrapFlag() {
asm volatile("pushfl\n" // push status register to stack
"orl $0x100, (%esp)\n" // set trap-flag of on-stack value
"popfl\n" // pop status register
);
}
void clearTrapFlag() {
asm volatile("pushfl\n" // push status register
"andl $0xfffffeff, (%esp)\n" // clear trap-flag
"popfl\n" // pop status register
);
}
In the actual code, I added some trickery catching the exit system call in order to find when the program finishes. I’ll explain how this works in the next post. For now let’s just assume we can catch the exit of the program, and execute the an exit handler. Again we will only count the cycles, and print how many cycles executed at the end of the program.
Now, if we run this on our trivial loop example, we get this:
>> LD_PRELOAD=./override.so loop
intercepted sys exit. cycles:0x000f499c
And it works!
Now we can trace compiled programs. Let’s try tracing some other simple programs:
>> LD_PRELOAD=./override.so /bin/echo hello world
hello world
/bin/echo: write error
intercepted sys exit. cycles:0x0003a70c
>> LD_PRELOAD=./override.so /bin/ls
LICENSE override.c README.md tracer.h
make.sh override.so tracer.c
/bin/ls: write error
intercepted sys exit. cycles:0x0003878f
(Note that we’re explicitly calling the programs, because echo
by itself may be dealt with directly by the shell)
This defintely works, although I’d prefer not to have those write errors.
But overall, now we have a way to step through an arbitrary compiled
program, assuming that is based on the libc
library. At a next step,
we can figure out how exactly I intercepted those exits, and actually
collect some useful information about the execution of the program.
See the source code at the current commit on github.