Reverse Engineering

Introduction

In most reverse engineering situations you have a file or set of files that you are hoping to learn more about. In these cases you do not have the source code available. Instead, you have a compiled program that may or may not have some additional obfuscation that prevents you from reading it. Before understanding how to reverse a compiled program, you must have a high-level understanding of how a compiler works:

A compiler takes a program written in a high-level language and translates it to a less human-readable language. C programs are often compiled to executable machine code, while higher-level languages like Java are often compiled into bytecode and executed by a virtual machine (the JVM in this case).

For some formats, decompilers exist that reverse the steps taken during compilation, such as serialization and opcode translation. Fairly accurate decompilers exist for some popular bytecode formats:

* Java .class - JD
* Python .pyc - Uncompyle2, decompyle, unpyc
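
One reason bytecode decompiles well is that it keeps much of the original program's structure. The following is a minimal sketch in Python; the function check_login and its constant are hypothetical, made up purely for illustration. It shows that local variable names and constants are stored in the compiled code object next to the opcodes:

```python
import dis


def check_login(password):
    # Hypothetical function and constant, used only for illustration.
    expected = "letmein"
    return password == expected


code = check_login.__code__
# Names and constants are stored alongside the opcodes in the code object.
local_names = code.co_varnames
constants = code.co_consts

print("locals:   ", local_names)
print("constants:", constants)
dis.dis(check_login)  # human-readable listing of the bytecode
```

The exact opcodes printed by dis differ between Python versions, but the names and constants are present in all of them.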

Decompilers for machine code are generally not as effective, since so much information is lost during compilation to machine code. Because of this, disassemblers and debuggers are often used to analyze these types of programs. IDA Pro is the state of the art in this field, but there are also some free alternatives like OllyDbg. However, we'll be using the simplest debugger that you're already familiar with: gdb.

First Steps

Get the provided binary and follow these steps to reverse out the secret info.

There are several lightweight tools that can give you an incredible amount of information about the file you are trying to analyze. We will be primarily looking at the following tools: file, strings, strace, objdump, and gdb.

The first tool, file, tells you the type of file you are looking at. It does this primarily by looking at the file's magic number. Different file formats have different magic numbers which are stored as metadata within the files themselves.

ajk138@hamsa:~/reversing$ file unknown_binary
unknown_binary: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0x8cf8e498027e2f83f8e8191aa6c32262805dc007, not stripped

Just based on that, we know the following: This file is an ELF binary. ELF is the standard executable format on Linux and most other Unix-like systems, and it is not tied to any one architecture. This binary is built for a 32-bit x86 Linux machine; LSB means least significant byte first, i.e. little-endian. It is dynamically linked, so it uses some shared library functions. Lastly, the symbol table is not stripped (man strip), which will help later.
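
To make the magic-number idea concrete, here is a rough sketch of the first few checks a tool like file could perform on an ELF header. It is not how file is actually implemented; the function name describe_elf and the small lookup tables are assumptions for illustration, and only a handful of machine types are listed. The field offsets follow the ELF header layout (magic at bytes 0-3, class at byte 4, byte order at byte 5, type and machine at bytes 16-19).

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB (little-endian)", 2: "MSB (big-endian)"}
ELF_TYPE = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}  # small subset only


def describe_elf(header: bytes) -> str:
    """Describe a file from its first 20 bytes (illustrative helper)."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return "not an ELF file"
    ei_class, ei_data = header[4], header[5]
    # Multi-byte header fields use the byte order announced in byte 5.
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return "ELF {} {} {}, {}".format(
        ELF_CLASS.get(ei_class, "unknown class"),
        ELF_DATA.get(ei_data, "unknown byte order"),
        ELF_TYPE.get(e_type, "unknown type"),
        ELF_MACHINE.get(e_machine, "unknown machine"),
    )


def describe_file(path: str) -> str:
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Calling describe_file("unknown_binary") on the lab binary should report the same class, byte order, and machine that file printed above.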

The tool strings prints every sequence of printable characters in a file (by default, sequences at least 4 characters long). This often includes error messages and possibly function names.

ajk138@hamsa:~/reversing$ strings unknown_binary
/user
You must specify an account name
send error
admin
Welcome, admin!
Enter an admin command:
Welcome, admin! Enter a command:
Welcome, %s! Enter a command:
/ping
You must specify an account to ping
Pinged: %s
/super_ddos
You must specify a site to dos
Low Orbit Ion Cannon Activated
/print_secret_admin_info
: invalid command
Usage: ./lab PORT

Compiler-generated strings and function names are left out of the example above, so your output will contain a lot of additional information. These are all raw strings in the binary. Notice that there is a command restricted to admin users that looks like it will print the secret info we want.
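
What strings does is simple enough to approximate in a few lines. The sketch below is a simplified stand-in, not the real tool: the function name printable_strings is an assumption, and it only handles plain ASCII (printable characters plus tab) with a minimum length of 4.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters."""
    # \x20-\x7e covers space through tilde; tab is accepted as well.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def strings_in_file(path: str, min_len: int = 4):
    with open(path, "rb") as f:
        return printable_strings(f.read(), min_len)
```

Running strings_in_file("unknown_binary") should list roughly the same strings as the tool, mixed in with short runs of bytes that merely happen to be printable.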

Running the file gives you some basic usage information: the only required argument is a port number. The tool strace displays every system call a process makes, along with its arguments and return value. All filesystem and network interactions go through system calls. ltrace is the equivalent tool for shared library calls. The first line of strace output will always be a call to execve. Why? Because execve is the system call that loads a new program into a process: strace forks a child, and the child calls execve to become the program being traced. This is how all programs are started.
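
Strictly speaking, starting a program takes two steps: fork creates the child process, and execve replaces the child's image with the new program. A minimal sketch of that pattern follows, assuming a Unix system; the helper name spawn is made up for illustration, and the child used here is just the Python interpreter itself.

```python
import os
import sys


def spawn(argv):
    """Start argv[0] as a child process and return its exit status."""
    pid = os.fork()                      # fork: create the child process
    if pid == 0:
        try:
            # execve: replace the child's image with the new program.
            # On success this call never returns.
            os.execve(argv[0], argv, dict(os.environ))
        finally:
            os._exit(127)                # reached only if execve failed
    _, status = os.waitpid(pid, 0)       # parent: wait for the child
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


# Hypothetical usage: run the Python interpreter itself as the child.
exit_code = spawn([sys.executable, "-c", "raise SystemExit(7)"])
print(exit_code)
```

A shell or strace does essentially the same thing, which is why execve shows up as the first traced call.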

ajk138@hamsa:~/reversing$ chmod +x unknown_binary
ajk138@hamsa:~/reversing$ strace ./unknown_binary 4567
execve("./unknown_binary", ["./unknown_binary", "4567"], [/* 17 vars */]) = 0
<SNIPPED>
socket(PF_NETLINK, SOCK_RAW, 0)                 = 3
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=20992, groups=00000000}, [12]) = 0
time(NULL)                                                            = 1389673000
sendto(3, "\24\0\0\0\26\0\1\3(\272\324R\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0(\272\324R\0R\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 228
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0(\272\324R\0R\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 256
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0(\272\324R\0R\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
close(3)                                                                = 0
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(4567), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(53015), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(4567), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(56404), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(3)                                                                = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(4567), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 5)                                                        = 0
accept(3,

It's waiting for a connection on the port provided. Use nc to connect to it and try using some of the commands identified from strings.
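
The last four calls in the trace (socket, bind, listen, accept) are the usual sequence for a TCP server. As a rough sketch of what the binary appears to be doing at that point, the same sequence in Python looks like this; the function name open_listener is an assumption, and the real program may set options that were snipped from the trace.

```python
import socket


def open_listener(port: int, backlog: int = 5) -> socket.socket:
    """Create a TCP socket listening on all interfaces."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket(PF_INET, SOCK_STREAM, ...)
    srv.bind(("0.0.0.0", port))                              # bind(3, {... "0.0.0.0"}, 16)
    srv.listen(backlog)                                      # listen(3, 5)
    return srv


# Usage sketch: accept() blocks until a client connects, which is
# exactly where the strace output above stops.
#   srv = open_listener(4567)
#   conn, peer = srv.accept()
```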

At this point you know a lot about your file: its type, any strings that were preserved, and any files touched, listening sockets, or other filesystem or network activity. Try some commands based on the output of strings.

ajk138@hamsa:~/reversing$ nc localhost 4567
/test
/test: invalid command
/user
You must specify an account name
/ping
You must specify an account to ping
/ping /user
Pinged: /user
/user /ping
Welcome, /ping! Enter a command:
/user john
Welcome, john! Enter a command:
/ping john
Pinged: john
/user admin
Welcome, admin! Enter a command:
/print_secret_admin_info
You must logged in as an admin to run this command.

Didn't quite work. Since the file is in an executable format, the next step is disassembly analysis.
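
The nc session above can also be scripted, which is handy when you want to repeat the same commands many times. The sketch below is only an assumption about the protocol, based on the session shown: each command is sent as one newline-terminated line, and the server is expected to answer with one line per command. The helper name send_commands is hypothetical.

```python
import socket


def send_commands(host, port, commands, timeout=2.0):
    """Send each command as one line and collect one reply line for each."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="ascii", errors="replace", newline="\n")
        for command in commands:
            sock.sendall(command.encode("ascii") + b"\n")
            # Assumes the server answers every command with exactly one line;
            # if it does not, this read fails once the timeout expires.
            replies.append(reader.readline().rstrip("\n"))
    return replies


# Usage sketch against the lab binary started on port 4567:
#   send_commands("localhost", 4567, ["/user john", "/ping john"])
```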

Disassembly Analysis

A disassembler can reliably translate machine code into assembly code. Both objdump and gdb have disassemblers along with a number of other helpful options for disassembly analysis.

objdump can display the program headers, symbol table and full disassembly of an executable file. The program headers will have some interesting information like memory mappings, executable sections of memory, and the starting address of execution. You can check that out with objdump -x. Here's the symbol table from objdump -t, filtered to the .text section:

ajk138@hamsa:~/reversing$ objdump -t unknown_binary | grep .text
08048670 l        d    .text    00000000                            .text
080486a0 l         F .text    00000000                            __do_global_dtors_aux
08048700 l         F .text    00000000                            frame_dummy
080494c0 l         F .text    00000000                            __do_global_ctors_aux
080494b0 g         F .text    00000002                            __libc_csu_fini
080494b2 g         F .text    00000000                            .hidden __i686.get_pc_thunk.bx
0804876f g         F .text    00000b14                            do_command
08048724 g         F .text    0000004b                            random_token
08049440 g         F .text    00000061                            __libc_csu_init
08048670 g         F .text    00000000                            _start
08049283 g         F .text    000001b2                            main

At this point you can guess that there are three important functions in the .text section: main, do_command, and random_token.
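
The address and size columns in the symbol table are worth a closer look. As a small sketch (the parser name parse_functions is an assumption, and it only handles lines shaped like the ones above), the following pulls out each function's start address and size from a few of the lines shown:

```python
SYMBOLS = """\
08048670 l    d  .text  00000000              .text
08048700 l     F .text  00000000              frame_dummy
080494b2 g     F .text  00000000              .hidden __i686.get_pc_thunk.bx
0804876f g     F .text  00000b14              do_command
08048724 g     F .text  0000004b              random_token
08048670 g     F .text  00000000              _start
08049283 g     F .text  000001b2              main
"""


def parse_functions(objdump_output, section=".text"):
    """Map function name -> (start address, size) for one section."""
    functions = {}
    for line in objdump_output.splitlines():
        fields = line.split()
        if section not in fields:
            continue
        sec = fields.index(section)
        flags = "".join(fields[1:sec])      # e.g. "gF" or "ld"
        if "F" not in flags or len(fields) < sec + 3:
            continue                        # keep only function symbols
        functions[fields[-1]] = (int(fields[0], 16), int(fields[sec + 1], 16))
    return functions


funcs = parse_functions(SYMBOLS)
for name, (addr, size) in sorted(funcs.items(), key=lambda kv: kv[1][0]):
    if size:
        print(f"{name:14s} 0x{addr:08x}-0x{addr + size:08x} ({size} bytes)")
```

With these numbers, random_token ends exactly where do_command begins, and do_command ends exactly where main begins. random_token is also only 0x4b (75) bytes long, so its disassembly is short enough to read in full.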

gdb is a command-line debugger with many uses that will not be described here. Take a look at the documentation if you are not comfortable with it.

If you tried all of the commands earlier, you probably noticed that logging in as a user is simple with /user followed by an account name. However, we need to log in as an admin in order to get the secret info. With some experimenting you can find the following:

$ gdb unknown_binary
(gdb) b main
Breakpoint 1 at 0x8049287
(gdb) b random_token
Breakpoint 2 at 0x8048728
(gdb) b do_command
Breakpoint 3 at 0x8048775
(gdb) run <YOUR PORT>

(in another terminal)
$ nc hamsa.cs.northwestern.edu <YOUR PORT>
/user user
/user admin

(gdb terminal)
(gdb) run 4567
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/ajk138/reversing/unknown_binary 4567

Breakpoint 1, 0x08049287 in main ()
(gdb) c
Continuing.

Breakpoint 3, 0x08048775 in do_command ()
(gdb) c
Continuing.

Breakpoint 3, 0x08048775 in do_command ()
(gdb) c
Continuing.

Breakpoint 2, 0x08048728 in random_token ()
(gdb) c
Continuing.

Only the command "/user admin" hits the function random_token. This tells us two things about do_command: to log in you must pass the string "/user" as the first token, and to reach random_token you must pass the string "admin" as the second token. Looking further at the disassembly of do_command, you can see a third call to strtok whose result is compared with the output of random_token. This means you must pass a particular third token in order to log in as an admin.
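
The control flow inferred so far can be written down as a guess. The sketch below is not the binary's actual code: it is a hypothetical reconstruction of the login branch based only on the behaviour observed above, str.split stands in for the repeated strtok calls, and expected_token is a placeholder for whatever random_token returns (finding that value is the exercise).

```python
def do_command_sketch(line, expected_token):
    """Guess at the /user branch of do_command (hypothetical reconstruction)."""
    tokens = line.split()                 # stands in for repeated strtok calls
    if not tokens or tokens[0] != "/user":
        return "other command"
    if len(tokens) < 2:
        return "You must specify an account name"
    account = tokens[1]
    # Only the account name "admin" reaches the random_token comparison.
    if account == "admin" and len(tokens) >= 3 and tokens[2] == expected_token:
        return "admin login"
    return "regular login"
```

Under this reading, "/user admin" with no third token is treated like any other account, which would match the nc session where /print_secret_admin_info was refused.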

Classwork

Reverse engineer the function random_token in the binary provided in order to find the correct third token needed to log in as an admin. Check that you've logged in by sending the command /print_secret_admin_info.