Reverse Engineering

Introduction

In most reverse engineering situations you have a file or set of files that you are hoping to learn more about. In these cases you do not have the source code available. Instead, you have a compiled program that may or may not have some additional obfuscation that prevents you from reading it. Before understanding how to reverse a compiled program, you must have a high-level understanding of how a compiler works:

A compiler takes a program written in a high-level language and translates it into a lower-level, less human-readable form. C programs are usually compiled to executable machine code, while higher-level languages like Java are usually compiled to bytecode, which is then executed by a virtual machine (the JVM in this case).

For some formats, decompilers exist that reverse the steps taken during compilation, such as serialization and opcode translation. Fairly accurate decompilers exist for some popular bytecode formats:

* Java .class - JD
* Python .pyc - Uncompyle2, decompyle, unpyc
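
To make the idea of bytecode concrete, here is a small sketch that uses Python's own compiler and the standard dis module. The add function is made up purely for illustration, and the exact opcode names printed vary between Python versions.

```python
import dis


def add(a, b):
    # A tiny function whose compiled form we want to inspect.
    return a + b


def opcode_names(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # The raw bytecode is just a sequence of bytes.
    print(add.__code__.co_code)
    # The disassembled view is much easier to read than the raw bytes.
    print(opcode_names(add))
```

Because bytecode stays this close to the source, a decompiler has much more to work with than it does for machine code.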

Decompilers for machine code are generally not as effective because so much information is lost during compilation to machine code. Because of this, disassemblers and debuggers are often used to analyze these types of programs. IDA Pro is the state of the art in this field, but there are also some free alternatives like OllyDbg. However, we'll be using the simplest debugger, one that you're already familiar with: gdb.

First Steps

The binary we will be looking at is on the student image under /mnt/labs/reverse-engineering.

There are several lightweight tools that can give you an incredible amount of information about the file you are trying to analyze. We will be primarily looking at the following tools: file, strings, strace, objdump, and gdb.

The first tool, file, tells you the type of file you are looking at. It does this primarily by looking at the file's magic number. Different file formats have different magic numbers which are stored as metadata within the files themselves.

$ file unknown
unknown: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, not stripped

Just based on that, we know the following: This file is an ELF binary. ELF is the standard binary format on Linux and most other Unix-like systems. It's built for a 64-bit x86 Linux machine, and LSB means it is little-endian (least significant byte first). It is dynamically linked, so it uses some shared library functions. Lastly, the symbol table is not stripped (man strip), which will help (a lot) later.
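
As a rough illustration of what file looks at, the sketch below decodes the first few bytes of an ELF header by hand. The function name parse_elf_ident is made up for this example, and the real file tool checks far more than these three fields.

```python
ELF_MAGIC = b"\x7fELF"


def parse_elf_ident(header):
    """Decode the first bytes of an ELF file into a few readable fields."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        return None  # not an ELF file (or too short to tell)
    bits = {1: 32, 2: 64}.get(header[4])          # EI_CLASS byte
    endian = {1: "LSB", 2: "MSB"}.get(header[5])  # EI_DATA byte
    return {"bits": bits, "endian": endian}


if __name__ == "__main__":
    import sys

    # Usage: python elf_ident.py <path-to-binary>
    with open(sys.argv[1], "rb") as f:
        print(parse_elf_ident(f.read(16)))
```

The magic number is the four bytes 0x7f 'E' 'L' 'F'. The two bytes after it are where the "64-bit" and "LSB" parts of the file output come from.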

The tool strings prints every sequence of printable characters it finds in a file (by default, sequences at least four characters long). This often includes error messages and possibly function names.

$ strings unknown
<SNIPPED>
Usage: %s <port>
Bad port
Socket: bad file descriptor
Failed to bind
Failed to listen
Unauthorized to use this command
Must specify a username!
admin
Authorized as admin user
You may now execute privileged commands
Bad authentication token
Logged in as: %s
/quit
/user
/print_secret
Unrecognized command: %s
<SNIPPED>

The compiler-generated strings, function names, and some other information are left out of the example above, so your output will contain a lot of additional information. These are all raw strings in the binary. There are a few things to note here. The program clearly performs some kind of network activity (port, socket, bind, etc.), and there seems to be a command interface with a permission system.
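
The core of what strings does can be sketched in a few lines. This is a simplified approximation that only handles printable ASCII; the helper name printable_strings is made up for this example.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    # Usage: python mini_strings.py <path-to-binary>
    with open(sys.argv[1], "rb") as f:
        for s in printable_strings(f.read()):
            print(s)
```

Nothing here understands the file format. The tool just scans raw bytes, which is why the output mixes useful messages with a lot of noise.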

Running the file gives you some basic usage information: the only required argument is a port number. The tool strace displays every system call made by a process, along with its arguments and return value. All filesystem and network interactions go through system calls. ltrace is the equivalent tool for calls into dynamically linked libraries. The first line of strace output will always be a call to execve. Why? Because strace first creates a child process, and that child then calls execve to replace itself with the program being traced. This is how programs are started on Linux.
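
The sketch below shows, in simplified form, how a program is typically started on Linux: a child process is created first, and the child then calls execve to become the new program. The spawn helper is made up for this example and ignores details such as children killed by signals.

```python
import os
import sys


def spawn(argv):
    """Run the program at argv[0] in a child process and return its exit code."""
    pid = os.fork()  # step 1: create a child process
    if pid == 0:
        try:
            # step 2: replace the child with the new program
            os.execve(argv[0], argv, os.environ)
        finally:
            os._exit(127)  # only reached if execve failed
    _, status = os.waitpid(pid, 0)  # the parent waits for the child to finish
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    # Usage: python spawn.py /full/path/to/program [args...]
    print(spawn(sys.argv[1:]))
```

A tracer sits between these two steps, so the first system call it can report for the traced program is the execve.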

$ chmod +x unknown
$ strace ./unknown 4567
execve("./unknown", ["./unknown", "4567"], 0x7fffc418a298 /* 76 vars */) = 0
<SNIPPED>
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(4567), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 5)                            = 0
accept(3,

We can see that it has set up the socket and is now waiting for a connection on the port provided. Use nc to connect to it and try using some of the commands identified from strings.

You will find that there doesn't seem to be a way to get the secret info through the interface alone.
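
If you would rather script the interaction than type into nc, a minimal client could look like the sketch below. It assumes the server accepts newline-terminated commands and replies within the timeout. Neither assumption has been checked against the binary, and send_command is a name made up for this example.

```python
import socket


def send_command(host, port, command, timeout=2.0):
    """Send one command line to the server and return whatever it replies."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode() + b"\n")
        try:
            return sock.recv(4096).decode(errors="replace")
        except socket.timeout:
            return ""  # the server sent nothing before the timeout expired


if __name__ == "__main__":
    # Port 4567 matches the strace example; use whichever port you started with.
    print(send_command("localhost", 4567, "/print_secret"))
```

Each call opens a new connection, so any login state is lost between calls. Sending several commands in one session would need a longer-lived socket.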

Disassembly Analysis

A disassembler can reliably translate machine code into assembly code. Both objdump and gdb have disassemblers along with a number of other helpful options for disassembly analysis.

objdump can display the program headers, symbol table and full disassembly of an executable file. The program headers will have some interesting information like memory mappings, executable sections of memory, and the starting address of execution. You can check that out with objdump -x. Here's the symbol table from objdump -t:

$ objdump -t unknown | grep .text
0000000000001190 l    d  .text  0000000000000000              .text
00000000000011c0 l     F .text  0000000000000000              deregister_tm_clones
00000000000011f0 l     F .text  0000000000000000              register_tm_clones
0000000000001230 l     F .text  0000000000000000              __do_global_dtors_aux
0000000000001280 l     F .text  0000000000000000              frame_dummy
0000000000001850 g     F .text  0000000000000005              __libc_csu_fini
0000000000001662 g     F .text  0000000000000171              run
00000000000014c2 g     F .text  0000000000000008              quit
00000000000014ca g     F .text  00000000000000be              authorize
00000000000017e0 g     F .text  0000000000000065              __libc_csu_init
0000000000001400 g     F .text  00000000000000c2              printSecret
0000000000001190 g     F .text  000000000000002f              _start
000000000000128c g     F .text  0000000000000171              main
00000000000015a3 g     F .text  00000000000000bf              parseCommand
0000000000001588 g     F .text  000000000000001b              sanitize

At this point you can guess which functions are worth exploring: authorize and printSecret are probably good places to start.
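
When the symbol table gets long, it can help to parse it and sort the functions by address. The sketch below assumes each line has the same layout as the listing above (address, flags, section, size, name). The helper names are made up, and real objdump output can contain extra fields that this does not handle.

```python
def parse_functions(objdump_output):
    """Map each function symbol to (start offset, size in bytes)."""
    functions = {}
    for line in objdump_output.splitlines():
        fields = line.split()
        # Expected layout: address, flag characters, section, size, name.
        if len(fields) < 5 or "F" not in fields[1:-3]:
            continue  # skip blank lines and non-function symbols
        functions[fields[-1]] = (int(fields[0], 16), int(fields[-2], 16))
    return functions


def by_address(functions):
    """Return (name, (start, size)) pairs ordered by start offset."""
    return sorted(functions.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    import sys

    # Usage: objdump -t unknown | python symbols.py
    for name, (start, size) in by_address(parse_functions(sys.stdin.read())):
        print(f"{start:#06x}-{start + size:#06x}  {name}")
```

Sorting by address shows how the functions are laid out in the file. In the listing above, for example, printSecret starts at 0x1400 and is 0xc2 bytes long, so it ends exactly where quit begins at 0x14c2.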

gdb is a command-line debugger with many uses that will not be described here. Take a look at the documentation if you don't feel too comfortable with it.

If you tried all of the commands earlier, you probably noticed that logging in as an ordinary user is simple with the /user command. However, we need to log in as the admin in order to get the secret info. With some experimenting, you can find a way to use gdb to get yourself authorized as the admin.

$ gdb unknown
Reading symbols from unknown...
(No debugging symbols found in unknown)
(gdb) b authorize
Breakpoint 1 at 0x14ca
(gdb) run 5000
Starting program: unknown 5000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

(in another terminal)
$ nc localhost <YOUR PORT>
/user admin password

(back to gdb terminal)
[New Thread 0x7ffff7d87700 (LWP 85484)]
Starting session 4
[Switching to Thread 0x7ffff7d87700 (LWP 85484)]

Thread 2 "unknown" hit Breakpoint 1, 0x00005555555554ca in authorize ()
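
Notice that the breakpoint was set at 0x14ca before the program started but was hit at 0x00005555555554ca. A likely explanation, given that file reported a pie executable, is that the loader placed the binary at some base address and the symbol-table offset was added to it. The arithmetic below is a sketch under that assumption, and both helper names are made up for this example.

```python
def load_base(runtime_address, file_offset):
    """Infer the load base from one symbol's runtime address and file offset."""
    return runtime_address - file_offset


def runtime_address(base, file_offset):
    """Compute where a symbol-table offset ends up once the binary is loaded."""
    return base + file_offset


if __name__ == "__main__":
    # authorize: offset 0x14ca in the symbol table, hit at 0x5555555554ca.
    base = load_base(0x5555555554CA, 0x14CA)
    print(hex(base))
    # Under the same assumption, printSecret (offset 0x1400) would be here:
    print(hex(runtime_address(base, 0x1400)))
```

This is handy when you want to match addresses from objdump against addresses you see in a running gdb session.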

Only a command beginning with "/user admin" reaches the final call to strtok, which is the call that extracts the third token of the command. You may be able to take advantage of something there.
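
To picture what repeated strtok calls do to a command, the sketch below splits a string the same general way: runs of delimiter characters are skipped and each call yields the next token. The delimiter set (space and newline) is an assumption, since the one used by the binary is not known here, and tokenize is a name made up for this example.

```python
def tokenize(command, delimiters=" \n"):
    """Split command the way repeated strtok calls would, skipping delimiter runs."""
    tokens, current = [], ""
    for ch in command:
        if ch in delimiters:
            if current:  # a delimiter ends the token in progress
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:  # the last token may not be followed by a delimiter
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(tokenize("/user admin password\n"))
```

Under these assumptions, "/user admin password" yields three tokens, while "/user bob" yields only two, so a third extraction would find nothing.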

Hints: the secret string is stored at some memory address, and that address may be found in a register.

Classwork

Reverse engineer the provided binary in order to print the secret information. There are many ways to do this, and you are free to approach the problem however you wish. Some starting ideas: find the admin password and sign in, use gdb to trick the program into thinking you provided the correct password (even when you did not), or even force the program into decoding and printing the secret without ever signing in! In your submission, please also include a brief description of how you obtained the secret information.