When writing shellcode, the goal is to keep it as small as possible. This makes it flexible enough to use in many situations. Sometimes you might want to do something pretty complicated, though. A good example is a payload that will get you a VNC session on someone's machine. Shellcode for this kind of thing tends to be pretty large, often too large to fit in the space most exploits give you. This is where stagers come in. A stager lets you use a small payload for the initial exploit, then load a larger payload later.
The first stage
As mentioned before, the first stage should be small, so it should do exactly what it needs to do. No more, no less. So what does it need to do? It needs to load the second stage into memory and begin executing it. That's all.
For network-based exploits, that means listening on a port, accepting a connection, and reading the second stage from it. You could also have the stager connect back to a port on your machine and read the second stage from there. Which one to prefer depends on whether the machine you are attacking sits behind any sort of firewall. A firewall that blocks inbound connections will stop a listening stager, while outbound connections are more often allowed, so connecting back is usually the safer choice in that case. If there is no firewall, the best choice is whichever one ends up with the shorter first-stage shellcode.
The second stage
The second stage is really whatever shellcode you would have used on its own if size had not been a problem. It doesn't need to be modified at all. This is one advantage of stagers. Any shellcode that would have worked had it fit within the space restrictions imposed by the vulnerability can now be used without changes, just by putting a stager in front of it.
Making it flexible
There are two things to consider when making a stager that is reasonably general. The first is simple: the second stage is variable length. To handle this, have the sender transmit a fixed-size integer holding the length of the shellcode, and have the stager read that integer before reading the shellcode itself into memory. Pretty simple.
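The length-prefix idea is easier to see in a high-level language than in assembly. The following is only a sketch of the framing, not a stager: it assumes a 4-byte little-endian length field (the width and byte order are my choice for illustration), and the names `recv_exact`, `send_framed`, and `recv_framed` are made up for this example. The received bytes are simply returned as data.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending all bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_framed(sock, payload):
    """Send a 4-byte little-endian length (assumed format), then the payload."""
    sock.sendall(struct.pack("<I", len(payload)) + payload)


def recv_framed(sock):
    """Read the 4-byte length first, then exactly that many payload bytes."""
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for the network connection.
    sender, receiver = socket.socketpair()
    send_framed(sender, b"placeholder bytes for the second stage")
    print(recv_framed(receiver))
    sender.close()
    receiver.close()
```

The loop in `recv_exact` matters because a single read on a stream socket may return fewer bytes than requested, so the reader has to keep going until the full length has arrived.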
The second item is not so simple. Where should the second stage go in memory? It could go anywhere within the program's address space. If you choose a low address within that space, you might have enough mapped memory after it to hold the second stage. If you don't, your stager will crash the program when it tries to write past the end of the mapped memory. Instead, you can use the mmap system call to create a chunk of memory that is sized to hold the second stage. This will make the shellcode for the first stage longer, but it will produce a much more flexible stager.
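To illustrate sizing a fresh mapping from the length that was received, here is a rough sketch using Python's `mmap` module rather than the raw system call. The helper names `alloc_region` and `load_into_region` are hypothetical, and rounding the size up to a whole number of pages is an assumption made for tidiness. The sketch only allocates an ordinary read/write region and copies data into it; it does not execute anything.

```python
import mmap


def alloc_region(length):
    """Return an anonymous mapping of at least `length` bytes.

    The size is rounded up to a multiple of the page size, with a
    minimum of one page so that a zero length is still valid.
    """
    page = mmap.PAGESIZE
    size = max(page, (length + page - 1) // page * page)
    # A file descriptor of -1 asks for anonymous memory, not a file mapping.
    return mmap.mmap(-1, size)


def load_into_region(data):
    """Allocate a region big enough for `data` and copy it in."""
    region = alloc_region(len(data))
    region[:len(data)] = data
    return region


if __name__ == "__main__":
    region = load_into_region(b"placeholder bytes for the second stage")
    print(len(region), region[:11])
    region.close()
```

Because the mapping is created after the length is known, there is no need to guess at a fixed address and hope that enough memory follows it.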