Encoding Shellcode


jms \ \ negation dot net

One of the basic goals in exploit development is to introduce arbitrary code onto the stack in order to modify execution flow.  In some instances, such as muts' HP OpenView NNM exploit, the application will only accept a limited number of characters as valid input. The application strips away or corrupts "bad" characters and execution flow fails.  The solution is to compose a series of simple mathematic instructions using the allowed character set that, when 'solved' via execution, result in the original payload being reconstructed elsewhere in memory and eventually being executed itself.

In more visual terms:

Let's say the following 4 decimal numbers constitute some theoretical payload you want to wind up in a register (let's ignore endian byte formatting, logic, sanity, and just demonstrate the premise);

22 25 58 16

However, this particular application only accepts input composed of characters with a value less than or equal to 20, so 22, 25, 58 are all bad; we cannot use them in the exploit payload.

The principle of shellcode encoding is that instead of trying to write the value "22 25 58 16" to a register, we will compose a mathematic instruction using allowed characters that, when executed, generates the solution "22 25 58 16".

Since we can only use a number as high as 20, and 00 is considered a null byte and can't be used, we can express "22 25 58 16" as the sum of the following:

20 20 20 14 +
01 04 20 01 +
01 01 18 01 +
22 25 58 16

This is the mile-high overview of payload encoding in fairly silly but straight forward terms, now let's tune this process for real world x86 assembly use:

We want to write our code to memory 4 bytes at a time, each expressed as an equation composed of 'good' characters that gets decoded into a register and reconstructed at a safe place in memory.

We will decode our shellcode into memory 'in reverse' 4 bytes at a time; the final 4 bytes get written at the bottom of the stack at the stack pointer address, the next 4 get placed in front of those first 4 bytes, and so on, until you've written your entire shellcode 4 bytes at a time (I'm not going to tuck into this weirdness but the more familiar you get with how data gets written to the stack, the more sense this will make).

For those who are a little sketchy on what hex is...:

In school, we first learn the decimal system, which is based on "10";


Hex, however, is based on 16.


So if we want to express half of 56 in decimal, the answer is 28. 

But half of 0x56 in hex?

Let's use the Gnu Debugger (gdb) to solve that.  Open up a Linux shell on a host with gdb installed, and type "gdb":

jms$ gdb

(gdb) print /x 0x56 / 2
$1 = 0x2b
(gdb) print /x 0x2b + 0x2b
$2 = 0x56

There you go, half of 0x56 is 0x2b.

So, lets say our goal is to put the value 0xf4b2cdf5 into the EAX register.  However, the application at bar discards any hex value higher than 0x7f. 

Our work flow:

1) Zero out the EAX register so we can write to it.
2) Execute a series of SUB commands containing hex values with legal, permitted characters, which populate EAX with our 0xf4b2cdf5. (Expressed as F5 CD B2 F4 because Little Endian memory convention re-orders data so the least significant byte is placed in the smallest address.)

Step one:

Zero out the EAX register.

You can take a register with value 00000000 and use "mov" to write it to EAX, eg:

mov eax,ebx # where ebx is value 00000000 at this point in execution flow

Or.. you can execute "xor eax,eax", which also sets the register to zero.

Or.. you can perform a simple AND instruction which takes 2 hex values whose sum is 0x7f7f7f7f.  Using gdb again, I start out by subtracting a large 4 byte value composed of allowed characters from 0x7f7f7f7f:

(gdb) print /x 0x7f7f7f7f - 0x65656565
$1 = 0x1a1a1a1a

I can confirm my math is correct by adding the value I generated with the output of the equation:

(gdb) print /x 0x65656565 + 0x1a1a1a1a
$2 = 0x7f7f7f7f

Expressed in ASM:

AND EAX,65656565
AND EAX,1a1a1a1a

We have now zero'd out our EAX register using nothing but allowed characters. (Other methods are available to accomplish this of course, these are simply three examples.)

Step two:

How to encode our 4 byte chunks using SUB commands and allowed characters.

Let's work with this value:  0xefb85430

Our first step is to create something called the "2's Compliment" for this value, which is a necessary precurser for binary addition/subtraction.  In math terms, to generate the "2's Compliment" we are going to 'invert the bits and add 1', using a very simple formulae and our Gnu Debugger.

Again, open up gdb in your linux shell.

First, we reverse the order of the 4 bytes (again, endian architecture reasons), so "EF B8 54 30" becomes "30 54 B8 EF" or 0x3054b8ef.

This is how we 'invert the bits and add 1' to 0x3054b8ef:

(gdb) print /x 0xffffffff - 0x3054b8ef + 1    
$1 = 0xcfab4711

Our 2's compliment of 0xefb85430 is 0xcfab4711 or CF AB 47 11.

Right off, we can see that CF and AB are 'bad' characters; they are higher than 0x7f, so this particular application will not accept them.  So what we need are some hex values composed of characters lower than 0x7f that when subtracted from 00000000, generate 0xcfab4711.

Some people can simply look at 0xcfab4711 and chop it up in their heads to generate appropriate values.  More power to them.  For mortals however, there is an easier way:

a) Separate it into its 4 values, CF AB 47 11

b) moving from left to right, if a character is a 'bad' value, subtract a large but safe value from it, say "0x66".  If it's a "good" character, subtract the smallest safe value, "0x01".

CF AB 47 11 ; original
66 66 01 01 ; good values

Test it in GDB:

(gdb) print /x 0xcfab4711 - 0x66660101
$1 = 0x69454610

69 45 46 10 are all allowed characters!

Let's check our math by adding the safe value we used above to the output of our equation:

(gdb) print /x 0x69454610 + 0x66660101
$1 = 0xcfab4711

And if we subtract those two 'good' values from 0x00000000 we get:

(gdb) print /x 0x00000000 - 0x66660101 - 0x69454610
$1 = 0x3054b8ef

Our original value!

Beautiful, we have two 'good' values that will generate our original 4 byte value when the following assembly is run against a zero'd out EAX register:

SUB EAX,69454610
SUB EAX,66660101

So, when put into action:

AND EAX,65656565
AND EAX,1a1a1a1a  ; EAX is now 00000000

SUB EAX,69454610
SUB EAX,66660101  ; 0xcfab4711 on eax register

Once we have 'encoded' every 4 byte chunk of our shellcode to safe memory location in this fashion it is decoded during execution, after being run via egghunting for example. And that's really it. 

Helpful parting hints:

If you can't get 4 good values using 2 SUB commands, don't panic: try using 3 SUB commands.  Or 4. You're writing code and there are many ways to accomplish simple goals, remember that.  Don't be afraid to freestyle a soluion that works.

Also, always check your values by adding them together manually.  Sometimes a value might get 'rounded off', in which case you should increment or decrease one of your equation values by 1 (0x01) until the final value reflects your original 2's compliment value.

If things aren't working out, see if there is a character that's bad that you originally thought was good.  We assumed in the above examples that 0x01 - 0x7f were allowed, but sometimes a few random characters in between generate unexpected results.  Don't be afraid to experiment with different values.  Again, if it takes 5 or 6 SUB statements to get where you want to go, go for it.  As long as your output is correct, you win.

You should also read up on 'stack alignment'.. if your reconstituted payload doesnt begin on an appropriate 4-byte boundary, it wont populate registers correctly.

Hope this helps fill in some blanks on an incredibly useful and powerful technique.  Please feel free to request clarification or correction if anything above seems off to you.