Encoding Shellcode
10/27/2014
jms \ \ negation dot net
One of the basic goals in exploit development is to introduce arbitrary
code onto the stack in order to modify execution flow. In some
instances, such as muts' HP OpenView NNM exploit, the application will
only accept a limited number of characters as valid input. The
application strips away or corrupts "bad" characters and
execution flow fails. The solution is to compose a series of simple
mathematic instructions using the allowed character set that, when
'solved' via execution, result in the original payload being
reconstructed elsewhere in memory and eventually being executed itself.
In more visual terms:
Let's say the following 4 decimal numbers constitute some theoretical
payload you want to wind up in a register (let's ignore endian byte
formatting, logic, sanity, and just demonstrate the premise);
22 25 58 16
However, this particular application only accepts input composed of
characters with a value less than or equal to 20, so 22, 25, 58 are all bad; we
cannot use them in the exploit payload.
The principle of shellcode encoding is that instead of trying to write
the value "22 25 58 16" to a register, we will compose a mathematic
instruction using allowed characters that, when executed, generates the
solution "22 25 58 16".
Since we can only use a number as high as 20, and 00 is considered a
null byte and can't be used, we can express "22 25 58 16" as the sum of
the following:
20 20 20 14 +
01 04 20 01 +
01 01 18 01 +
-----------
22 25 58 16
This is the mile-high overview of payload encoding in fairly silly but
straight forward terms, now let's tune this process for real world x86 assembly
use:
We want to write our code to memory 4 bytes at a time, each
expressed as an equation composed of 'good' characters that gets
decoded into a register and reconstructed at a safe place in memory.
We will decode our shellcode into memory 'in reverse' 4 bytes at a
time; the final 4 bytes get written at the bottom of the stack at the
stack pointer address, the next 4 get placed in front of those first 4
bytes, and so on, until you've written your entire shellcode 4 bytes at
a time (I'm not going to tuck into this weirdness but the more familiar
you get with how data gets written to the stack, the more sense this
will make).
For those who are a little sketchy on what hex is...:
In school, we first learn the decimal system, which is based on "10";
0,1,2,3,4,5,6,7,8,9
Hex, however, is based on 16.
0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f
So if we want to express half of 56 in decimal, the answer is 28.
But half of 0x56 in hex?
Let's use the Gnu Debugger (gdb) to solve that. Open up a Linux
shell on a host with gdb installed, and type "gdb":
jms$ gdb
(gdb) print /x 0x56 / 2
$1 = 0x2b
(gdb) print /x 0x2b + 0x2b
$2 = 0x56
(gdb)
There you go, half of 0x56 is 0x2b.
So, lets say our goal is to put the value 0xf4b2cdf5 into the EAX
register. However, the application at bar discards any hex value
higher than 0x7f.
Our work flow:
1) Zero out the EAX register so we can write to it.
2) Execute a series of SUB commands containing hex values with legal,
permitted characters, which populate EAX with our 0xf4b2cdf5.
(Expressed as F5 CD B2 F4 because Little Endian memory convention re-orders
data so the least significant byte is placed in the smallest address.)
Step one:
Zero out the EAX register.
You can take a register with value 00000000 and use "mov" to
write it to EAX, eg:
mov eax,ebx # where ebx is value 00000000 at this point in execution
flow
Or.. you can execute "xor eax,eax", which also sets the register to zero.
Or.. you can perform a simple AND instruction which takes 2 hex values
whose sum is 0x7f7f7f7f. Using gdb again, I start out by
subtracting a large 4 byte value composed of allowed characters from 0x7f7f7f7f:
(gdb) print /x 0x7f7f7f7f - 0x65656565
$1 = 0x1a1a1a1a
I can confirm my math is correct by adding the value I generated with the output of the equation:
(gdb) print /x 0x65656565 + 0x1a1a1a1a
$2 = 0x7f7f7f7f
(gdb)
Expressed in ASM:
AND EAX,65656565
AND EAX,1a1a1a1a
We have now zero'd out our EAX register using nothing but allowed
characters. (Other methods are available to accomplish this of course, these are simply three examples.)
Step two:
How to encode our 4 byte chunks using SUB commands and allowed
characters.
Let's work with this value: 0xefb85430
Our first step is to create something called the "2's Compliment" for
this value, which is a necessary precurser for binary
addition/subtraction. In math terms, to generate the "2's
Compliment" we are going to 'invert the bits and add 1', using a very
simple formulae and our Gnu Debugger.
Again, open up gdb in your linux shell.
First, we reverse the order of the 4 bytes (again, endian architecture
reasons), so "EF B8 54 30" becomes "30 54 B8 EF" or 0x3054b8ef.
This is how we 'invert the bits and add 1' to 0x3054b8ef:
(gdb) print /x 0xffffffff - 0x3054b8ef + 1
$1 = 0xcfab4711
Our 2's compliment of 0xefb85430 is 0xcfab4711 or CF AB 47 11.
Right off, we can see that CF and AB are 'bad' characters; they are
higher than 0x7f, so this particular application will not accept
them. So what we need are some hex values composed of characters
lower than 0x7f that when subtracted from 00000000, generate 0xcfab4711.
Some people can simply look at 0xcfab4711 and chop it up in their heads
to generate appropriate values. More power to them. For
mortals however, there is an easier way:
a) Separate it into its 4 values, CF AB 47 11
b) moving from left to right, if a character is a 'bad' value, subtract
a large but safe value from it, say "0x66". If it's a "good"
character, subtract the smallest safe value, "0x01".
CF AB 47 11 ; original
66 66 01 01 ; good values
Test it in GDB:
(gdb) print /x 0xcfab4711 - 0x66660101
$1 = 0x69454610
69 45 46 10 are all allowed characters!
Let's check our math by adding the safe value we used above to the
output of our equation:
(gdb) print /x 0x69454610 + 0x66660101
$1 = 0xcfab4711
(gdb)
And if we subtract those two 'good' values from 0x00000000 we get:
(gdb) print /x 0x00000000 - 0x66660101 - 0x69454610
$1 = 0x3054b8ef
(gdb)
Our original value!
Beautiful, we have two 'good' values that will generate our original 4
byte value when the following assembly is run against a zero'd out EAX register:
SUB EAX,69454610
SUB EAX,66660101
So, when put into action:
AND EAX,65656565
AND EAX,1a1a1a1a ; EAX is now 00000000
SUB EAX,69454610
SUB EAX,66660101 ; 0xcfab4711 on eax register
Once we have 'encoded' every 4 byte chunk of our shellcode to safe memory location in this fashion it is decoded during execution, after being run via egghunting for example. And that's really it.
Helpful parting hints:
If you can't get 4 good values using 2 SUB commands, don't panic: try
using 3 SUB commands. Or 4. You're writing code and there are
many ways to accomplish simple goals, remember that. Don't be
afraid to freestyle a soluion that works.
Also, always check your values by adding them together manually.
Sometimes a value might get 'rounded off', in which case you should
increment or decrease one of your equation values by 1 (0x01) until the
final value reflects your original 2's compliment value.
If things aren't working out, see if there is a character that's bad that
you originally thought was good. We assumed in the above examples
that 0x01 - 0x7f were allowed, but sometimes a few random characters in
between generate unexpected results. Don't be afraid to
experiment with different values. Again, if it takes 5 or 6 SUB
statements to get where you want to go, go for it. As long
as your output is correct, you win.
You should also read up on 'stack alignment'.. if your reconstituted
payload doesnt begin on an appropriate 4-byte boundary, it wont
populate registers correctly.
Hope this helps fill in some blanks on an incredibly useful and powerful
technique. Please feel free to request clarification or
correction if anything above seems off to you.