Reverse Engineering
Introduction
In most reverse engineering situations you have a file or set of files that you are hoping to learn more about. In these cases you do not have the source code available. Instead, you have a compiled program that may or may not have some additional obfuscation that prevents you from reading it. Before understanding how to reverse a compiled program, you must have a high-level understanding of how a compiler works:
A compiler takes a program written in a high-level language and translates it to a less human-readable language. C programs are often compiled to executable machine code, while higher-level languages like Java are often compiled into bytecode and executed by a virtual machine (the JVM in this case).
For some formats, decompilers exist that reverse the steps taken during compilation, such as serialization and opcode translation. Fairly accurate decompilers exist for some popular bytecode formats:
* Java .class - JD
* Python .pyc - Uncompyle2, decompyle, unpyc
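To make the idea of serialization and opcode translation concrete, here is a minimal sketch using Python's own standard library. It compiles a tiny function to bytecode, serializes it with marshal (the same serialization used for the body of a .pyc file), then loads it back and lists the opcodes. The source snippet and the helper names compile_and_roundtrip and opcode_names are made up for illustration, and the exact opcode names vary between Python versions.

```python
import dis
import marshal

# Hypothetical example source; any small function would do.
SOURCE = "def add(a, b):\n    return a + b\n"


def compile_and_roundtrip(source):
    """Compile source to a code object, serialize it, and load it back."""
    code = compile(source, "<example>", "exec")
    blob = marshal.dumps(code)      # bytes, as stored in the body of a .pyc
    restored = marshal.loads(blob)  # first step a decompiler would undo
    return blob, restored


def opcode_names(code):
    """Return opcode names for a code object and any nested code objects."""
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            names.extend(opcode_names(const))
    return names


if __name__ == "__main__":
    blob, restored = compile_and_roundtrip(SOURCE)
    print(len(blob), "bytes of serialized bytecode")
    print(opcode_names(restored))
```

Because bytecode keeps names like add, a and b along with a simple stack-based instruction set, a decompiler has much more to work with than it would with machine code.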
Decompilers for machine code are generally not as effective because so much information is lost during compilation to machine code. Because of this, disassemblers and debuggers are often used to analyze these types of programs. IDA Pro is the state of the art in this field, but there are also some free alternatives like OllyDbg. However, we'll be using the simplest debugger that you're already familiar with: gdb.
First Steps
The binary we will be looking at is on the student image under /mnt/labs/reverse-engineering.
There are several lightweight tools that can give you an incredible amount of information about the file you are trying to analyze. We will primarily be looking at the following tools: file, strings, strace, objdump, and gdb.
The first tool, file, tells you the type of file you are looking at. It does this primarily by looking at the file's magic number. Different file formats have different magic numbers, which are stored as metadata within the files themselves.
$ file unknown
unknown: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, not stripped
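Just based on that, we know the following: This file is an ELF binary. ELF is the standard executable format on Linux and most other Unix-like systems. It was built for a 64-bit, little-endian (LSB, least significant byte first) x86-64 Linux machine. It is a position-independent executable (pie), so its code can be loaded at any base address. It is dynamically linked, so it uses some shared library functions. Lastly, the symbol table is not stripped (see man strip), which will help (a lot) later.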
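As a rough illustration of what file is checking, the sketch below decodes the first 20 bytes of an ELF file by hand: the 4-byte magic number, the class and byte-order bytes that follow it, and the type and machine fields right after the 16-byte identification block. The function name describe_elf_header is made up for this example, and only the handful of values relevant to this lab are mapped to names.

```python
import struct
import sys

ELF_MAGIC = b"\x7fELF"  # the magic number at the start of every ELF file


def describe_elf_header(header):
    """Decode the start of an ELF header (needs at least 20 bytes)."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    ei_data = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    # Multi-byte fields use the byte order announced by the EI_DATA byte.
    byte_order = "<" if header[5] == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": ei_class,
        "data": ei_data,
        # ET_EXEC = 2 (fixed address), ET_DYN = 3 (shared object or PIE)
        "type": {2: "EXEC", 3: "DYN"}.get(e_type, hex(e_type)),
        # EM_386 = 3, EM_X86_64 = 62
        "machine": {3: "x86", 62: "x86-64"}.get(e_machine, hex(e_machine)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(describe_elf_header(f.read(20)))
```

The real file utility consults a large database of magic numbers and reports far more detail; this only shows where a few of the reported fields come from.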
The tool strings simply returns all printable strings within a file. This often includes error messages and possibly function names.
$ strings unknown
<SNIPPED>
Usage: %s <port>
Bad port
Socket: bad file descriptor
Failed to bind
Failed to listen
Unauthorized to use this command
Must specify a username!
admin
Authorized as admin user
You may now execute privileged commands
Bad authentication token
Logged in as: %s
/quit
/user
/print_secret
Unrecognized command: %s
<SNIPPED>
The compiler-generated strings, function names, and some other information are left out of the example above, so your output will contain a lot more. These are all raw strings in the binary. There are a few things to note here. The program clearly performs some type of network activity (port, socket, bind, etc.), and there seems to be a command interface with a permission system.
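The core of what strings does is simple enough to sketch in a few lines: scan the raw bytes and report every run of printable ASCII characters that is at least four characters long. The function name printable_strings is made up for this example, and the real tool has more options (other encodings, offsets, section selection) that are ignored here.

```python
import re
import sys


def printable_strings(data, min_length=4):
    """Return runs of printable ASCII of at least min_length characters."""
    # 0x20 (space) through 0x7e (~) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for text in printable_strings(f.read()):
            print(text)
```

This also explains why the output contains so much noise: any four bytes that happen to fall in the printable range are reported, whether or not the programmer ever meant them as text.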
First of all, running the file gives you some basic usage information. The only required argument is a port number. The tool strace displays every system call made by a program, along with its arguments and return value. All filesystem and network interactions use system calls. ltrace is the equivalent tool for shared library calls. The first line of strace output will always be a call to execve. Why? Because execve is the system call that replaces the current process image with a new program: strace forks a child process, and that child calls execve to start the program being traced.
$ chmod +x unknown
$ strace ./unknown 4567
execve("./unknown", ["./unknown", "4567"], 0x7fffc418a298 /* 76 vars */) = 0
<SNIPPED>
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(4567), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 5) = 0
accept(3,
We can see that it has set up the socket and is now waiting for a connection on the port provided. Use nc to connect to it and try using some of the commands identified from strings.
You will find that there doesn't seem to be a way to get the secret info through the interface alone.
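If you would rather script your experiments than type them into nc, a small client along these lines may help. It is only a sketch: the function name send_commands is made up, and it assumes the service expects one newline-terminated command at a time and answers each command before reading the next, which you should confirm against the real binary.

```python
import socket


def send_commands(host, port, commands, timeout=2.0):
    """Send each command on its own line and collect whatever comes back."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for command in commands:
            sock.sendall(command.encode() + b"\n")
            try:
                replies.append(sock.recv(4096).decode(errors="replace"))
            except socket.timeout:
                # Assumption: some commands may produce no reply at all.
                replies.append("")
    return replies
```

For example, send_commands("localhost", 4567, ["/user guest", "/print_secret"]) would try the two commands in order against a server started on port 4567, where guest is just a placeholder username.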
Disassembly Analysis
A disassembler can reliably translate machine code into assembly code. Both objdump and gdb have disassemblers along with a number of other helpful options for disassembly analysis.
objdump can display the program headers, symbol table, and full disassembly of an executable file. The program headers contain some interesting information like memory mappings, executable sections of memory, and the starting address of execution. You can check that out with objdump -x. Here's the symbol table from objdump -t:
$ objdump -t unknown | grep .text
0000000000001190 l d .text 0000000000000000 .text
00000000000011c0 l F .text 0000000000000000 deregister_tm_clones
00000000000011f0 l F .text 0000000000000000 register_tm_clones
0000000000001230 l F .text 0000000000000000 __do_global_dtors_aux
0000000000001280 l F .text 0000000000000000 frame_dummy
0000000000001850 g F .text 0000000000000005 __libc_csu_fini
0000000000001662 g F .text 0000000000000171 run
00000000000014c2 g F .text 0000000000000008 quit
00000000000014ca g F .text 00000000000000be authorize
00000000000017e0 g F .text 0000000000000065 __libc_csu_init
0000000000001400 g F .text 00000000000000c2 printSecret
0000000000001190 g F .text 000000000000002f _start
000000000000128c g F .text 0000000000000171 main
00000000000015a3 g F .text 00000000000000bf parseCommand
0000000000001588 g F .text 000000000000001b sanitize
At this point you can guess that there are some important functions to explore: authorize and printSecret are probably good places to start.
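The symbol table is easier to read once it is sorted by address, since that shows how the functions are laid out in the .text section. The sketch below parses lines in the format shown above and keeps only function symbols (flag F) in .text. The function name parse_text_functions is made up, and the parsing assumes whitespace-separated columns exactly as in the listing.

```python
import subprocess
import sys


def parse_text_functions(objdump_output):
    """Return (address, size, name) for each function symbol in .text."""
    functions = []
    for line in objdump_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # headers and blank lines
        flags, section = fields[1:-3], fields[-3]
        if section != ".text" or "F" not in flags:
            continue  # skip section symbols and symbols in other sections
        address, size, name = fields[0], fields[-2], fields[-1]
        functions.append((int(address, 16), int(size, 16), name))
    return sorted(functions)


if __name__ == "__main__" and len(sys.argv) > 1:
    output = subprocess.run(["objdump", "-t", sys.argv[1]],
                            capture_output=True, text=True, check=True).stdout
    for address, size, name in parse_text_functions(output):
        print(f"{address:#08x}  {size:5d} bytes  {name}")
```

Sorting this way makes it obvious which functions were written by the program's author (they sit together between main and run) and which were added by the compiler and C runtime.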
gdb is a command-line debugger with many uses that will not be described here. Take a look at the documentation if you don't feel comfortable with it.
If you tried all of the commands earlier, you probably noticed that logging in as a user is simple with /user, but logging in as the admin is not. We can use gdb to see how the program decides whether to authorize you as the admin.
$ gdb unknown
Reading symbols from unknown...
(No debugging symbols found in unknown)
(gdb) b authorize
Breakpoint 1 at 0x14ca
(gdb) run 5000
Starting program: unknown 5000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
(in another terminal)
$ nc localhost <YOUR PORT>
/user admin password
(back to gdb terminal)
[New Thread 0x7ffff7d87700 (LWP 85484)]
Starting session 4
[Switching to Thread 0x7ffff7d87700 (LWP 85484)]
Thread 2 "unknown" hit Breakpoint 1, 0x00005555555554ca in authorize ()
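Notice that the breakpoint was reported at 0x14ca before the program started but is hit at 0x00005555555554ca. The binary is position independent, so the addresses in the symbol table are offsets from wherever the loader places the program. The arithmetic below recovers that load base from the two numbers in the transcript and uses it to predict where printSecret (offset 0x1400 in the symbol table) should be at runtime. The helper names are made up for this sketch, and the base will differ if address randomization is enabled.

```python
PAGE_SIZE = 0x1000  # load bases are page aligned


def load_base(runtime_address, file_offset):
    """Recover the load base from one symbol's runtime address and offset."""
    base = runtime_address - file_offset
    if base % PAGE_SIZE != 0:
        raise ValueError("addresses do not describe the same symbol")
    return base


def runtime_address(base, file_offset):
    """Translate a symbol table offset into an address in the running process."""
    return base + file_offset


if __name__ == "__main__":
    base = load_base(0x00005555555554CA, 0x14CA)  # authorize, from the transcript
    print(hex(base))
    print(hex(runtime_address(base, 0x1400)))     # printSecret
```

In practice gdb does this translation for you when you refer to a function by name, but the calculation is handy when you need to match raw addresses in registers or on the stack back to the disassembly.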
Only the command "/user admin" reaches the final call to strtok, which is the function that extracts the third word of the command you typed. You may be able to take advantage of something there.
Hints: the secret string is stored in some memory address, and the memory address could be found in some register.
Classwork
Reverse engineer the provided binary in order to print the secret information. There are many ways to do this, and you are free to approach the problem however you wish. Some starting ideas for you include: finding the admin password and signing in, using gdb to trick the program into thinking you provided the correct password (even when you did not), or even forcing the program into decoding and printing the secret without ever needing to sign in! In your submission, please also include a brief description of how you obtained the secret information.