4.1 The Conditions for Buffer Overflow to Occur
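The event-flow analysis in the demonstration showed how a stack-based buffer overflow happens. From that process, the main conditions under which it occurs can be summarized as follows:
1. An unsafe C function is used.
2. The input is not validated.
3. The return address is placed adjacent to the program's data and code.
4. A suitable exploit code is available.
The following section discusses those conditions in more detail.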
4.1.1 Using Unsafe C Function
In the first condition, unsafe C functions are functions that do no bounds checking when copying or moving data to the destination buffer. Such functions (or, in other situations, cases where no null character (\0) has been written explicitly to terminate the string in the buffer where it is supposed to be terminated) mostly relate to string and character manipulation, for example gets() and strcpy(). Unfortunately, the C string and character library is among the most widely used in C programming. Programmers need to fully understand how to use those functions and how improper use creates a buffer overflow vulnerability, and should then add extra code to do the bounds checking manually.
For example, the code used in the demonstration can be made to stop safely by adding the following simple extra code, which checks the permitted boundary and exits the program if it is violated.
/* bofvulcode.c */
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
    /* declare a buffer with max 512 bytes in size */
    char mybuff[512];
    /* verify the input */
    if (argc < 2) {
        printf("Usage: %s <string_input_expected>\n", argv[0]);
        return 1;
    }
    if (strlen(argv[1]) >= sizeof(mybuff)) /* no room for the input plus '\0' */
        return 1;
    /* else if there is an input, copy the string into the buffer */
    strcpy(mybuff, argv[1]);
    /* display the buffer's content */
    printf("Buffer's content: %s\n", mybuff);
    return 0;
}
The program continues only if the input is shorter than 512 characters, so that the string and its terminating null character both fit in the buffer; otherwise it terminates. The drawback is the extra code, which in a large program may affect the program's size and execution speed.
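As an alternative to a manual length check, the copy itself can be bounded so that it can never write past the end of the buffer. The following is a minimal sketch that uses the standard snprintf() function. The helper name copy_bounded is hypothetical and is not part of the demonstration code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst, which holds dstsize bytes.
 * Returns 0 on success. Returns -1 if the arguments are invalid or if
 * src did not fit; in the latter case dst holds a truncated copy that
 * is still null-terminated. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    int needed;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;
    /* snprintf() writes at most dstsize bytes, including the '\0' */
    needed = snprintf(dst, dstsize, "%s", src);
    if (needed < 0 || (size_t)needed >= dstsize)
        return -1;
    return 0;
}
```

In the demonstration program the call would look like copy_bounded(mybuff, sizeof(mybuff), argv[1]), with the program exiting when the return value is -1.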
For the second condition, input validation can also be implemented in the program to stop the buffer overflow. The problem is that not every input combination can be tested during the development or testing phase, and this becomes worse if the program is a shared library.
However, common mistakes and previous exploits can be used as a guide. For example, the previous program could validate its input with checks along the lines of the following pseudocode.
if (strlen(argv[1]) >= 512) exit(1);  /* too long for the buffer */
if (element_of_argv == NOP || element_of_argv == "sh" || ...) exit(1);  /* resembles shellcode */
if (last_argv_element != '\0') exit(1);  /* not properly terminated */
In some situations the last byte may legitimately be a null character that terminates the string. Verifying every character in the string of course makes the code larger and more complex, but it can be done at the coding level.
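One way to keep a per-character check manageable is to accept only what is known to be good (an allow-list) rather than search for every known bad pattern. The following is a minimal sketch under that assumption. The function name is_input_acceptable and the rule "printable characters only" are illustrative choices, not part of the original program.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical validator: accept only input that is shorter than maxlen
 * (so it fits a maxlen-byte buffer with its '\0') and that consists
 * solely of printable characters. Returns 1 if acceptable, 0 otherwise. */
int is_input_acceptable(const char *s, size_t maxlen)
{
    size_t i;

    if (s == NULL || maxlen == 0)
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (i + 1 >= maxlen)                /* no room left for the '\0' */
            return 0;
        if (!isprint((unsigned char)s[i]))  /* rejects bytes such as 0x90 */
            return 0;
    }
    return 1;
}
```

In the default "C" locale this rejects non-printable bytes such as the x86 NOP opcode (0x90), but it does not detect printable strings such as "sh". Whether printable-only input is appropriate depends on what the program legitimately expects.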
Implementing this requires secure C coding knowledge, but it can prevent the buffer overflow problem at the coding stage, even before the code is compiled. Besides the lack of knowledge and skill, the usual complaints are development schedule pressure, the size of the program and the execution speed of the binary. Further research is needed on the relationship between adding code specifically to prevent buffer overflow and the program's size and speed in order to get the true picture. This can be critical for a big program.
Some integrated development environments (IDEs) and languages already incorporate validation features, for example NetBeans (a multi-platform, multi-language IDE) and PHP. Such features make users more aware of the importance of the input validation task and make validation readily available to them.
Figure 4.X: NetBeans IDE with some common validation task feature for Java web application development
The third condition is more closely related to the vendor or implementer that supplies the hardware and the software. As mentioned previously, for a C function call the compiler places the return address adjacent to the program's data and code. Furthermore, from the information discussed in sections 2.5.3 and 2.6, this issue is actually 'inherited' from the processor's execution environment and stack set-up mechanism. Clearly this issue is out of the programmer's control. Although other methods such as relocating and encrypting the return address have been tried before, not all have been successful.
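The effect of placing control data next to a buffer can be illustrated with a simplified model. In the sketch below a struct stands in for a stack frame. The struct layout, the names fake_frame and unchecked_copy_changes_return, and the value 0x1234 are all assumptions made for illustration; this is not the real frame layout of any compiler or processor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a stack frame, NOT a real compiler layout. */
struct fake_frame {
    char      buffer[8];     /* local data */
    uintptr_t saved_return;  /* stands in for the saved return address */
};

/* Copy len bytes to the start of the frame without checking len against
 * sizeof(buffer). The copy is clamped to the size of the struct so that
 * the model itself stays well defined.
 * Returns 1 if saved_return was changed by the copy, 0 otherwise. */
int unchecked_copy_changes_return(const char *input, size_t len)
{
    struct fake_frame frame;
    uintptr_t before;

    memset(&frame, 0, sizeof frame);
    frame.saved_return = (uintptr_t)0x1234;
    before = frame.saved_return;
    if (len > sizeof frame)
        len = sizeof frame;
    /* buffer is the first member, so it starts at the frame's address */
    memcpy((char *)&frame, input, len);
    return frame.saved_return != before;
}
```

Copying 8 bytes leaves saved_return intact, while copying 16 bytes of 'A' changes it, which mirrors how an oversized input reaches the adjacent return address.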
Another thing to consider is the memory management implemented by the processor, such as paging, and the role of the OS in protecting the return address, such as local policy and multi-level privileges. In .NET programming, for example, the coder can implement Code Access Security and Role-based Security. However, this is a coding-level implementation.
In the fourth condition, a suitable exploit code must be available. Without it, the program just terminates. Here more knowledge and skill are needed. After learning that a program is vulnerable to buffer overflow, the attacker needs to create shellcode in assembly language for the known platform and try to overwrite the return address precisely. Where the return address should point is a matter of the attacker's knowledge, skill and creativity. However, tools, sample exploit code, proofs of concept (POC) and so on can easily be found on the Internet, which makes exploitation even easier.
As discussed in the Literature Review section, much research has concentrated on the third and fourth conditions, that is, on protecting against and detecting the buffer overflow after compilation (by the compiler) and while the program is running (through the OS, processor, memory management etc.). The following figure shows a block diagram of detection and prevention in a computer system at various stages.
Figure 4.1: Buffer overflow detection and prevention at various stages of program execution
It is apparent that more mechanisms should be implemented for the first and second conditions, at the coding stage. Practising secure coding and using the secure version of the C library are not enough on their own: without a proper understanding of how to use those functions, the buffer overflow problems might resurface.
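A well-known case of a bounded function being misused is strncpy(), which does not write a terminating null character when the source has as many characters as the size limit or more. The following is a minimal sketch of a wrapper that always terminates the destination. The name copy_terminated is hypothetical.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical wrapper around strncpy(): copy at most dstsize - 1
 * characters and always terminate the destination explicitly, because
 * strncpy() alone leaves dst unterminated when src is too long. */
void copy_terminated(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || src == NULL || dstsize == 0)
        return;
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}
```

Calling strncpy(dst, src, dstsize) directly with an over-long source would leave dst without a terminator, so a later strlen() or printf("%s") on it could read past the end of the buffer.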
Let us recap what was done in the demonstration and the information discussed in the literature review. The following flowchart shows a simplified flow of the demonstration process. It is simplified in that it does not represent a complete application development cycle and serves the purpose of discussion only.
Figure 4.2: Buffer overflow issue during the coding, compiling and running a program
Part A is mainly the programmer's role. The programmer must know and practise secure coding for C and use that knowledge to prevent buffer overflow, for example by adding extra code, understanding and properly using the secure version of the library, and working out an input validation mechanism. The protections include source-code-level mechanisms, third-party tools such as static code scanners, and code or peer review. New mechanisms may use syntax highlighting and IntelliSense to warn or alert the programmer in real time. The advantages include architecture independence and validation of legal input formats. The disadvantage at this stage is the extra code, which may increase the program's size and reduce its speed.
Whether secure coding is practised and the secure version of the C library is used varies widely. It is very difficult to find secure coding incorporated as a topic in a C syllabus. Most courses depend on the instructor's ability and discretion; some have time constraints, and others do not consider the topic important. Many treat secure coding in a separate session.
In part B, it is the role of the compiler, OS, memory and processor to detect and/or prevent buffer overflow. Detection and prevention include syntax and semantic analysis and warning mechanisms. With current compilers, the default settings generate a warning when unsafe constructs are used in the program. However, promoting this warning to an error is not the compiler's responsibility, because programs differ widely in their inputs and outputs. Only the programmer who writes the code knows well which inputs are valid and which are invalid. At this stage, other mechanisms are provided by the OS, processor, memory management and patches. The disadvantages include architecture dependence and the need for patches, re-editing of the source code and recompilation. Of course these things add time, cost and man-hours.
In part C, memory management, the processor and the OS play the roles of preventing the buffer overflow. At this stage the program development cycle is almost complete and resources such as man-hours and cost have mostly been allocated. Imagine that the vendor or developer needs to create a patch, test it and push it through the distribution channel. The disadvantages of buffer overflow solutions at this stage include architecture dependence, the need for patches and vulnerability to invalid input formats.
Clearly, both parts B and C consume more time, cost and man-hours. The patches may also not go through a complete design or development cycle, and applying patches often creates new problems. In conclusion, compared with part A, by the time parts B and C come into play the damage has already been done.
It is clear from Figure 4.2 that if attention is focused on the coding level, the writer believes it is more beneficial to every party in minimizing or preventing the buffer overflow problem. For example, Figure 4.3 shows a modified version of the previous flowchart in which some mechanisms for buffer overflow detection and prevention are implemented at the coding level. The flowchart incorporates secure coding knowledge and an enhanced editor as practical examples.
Figure 4.3: Buffer overflow detection and prevention enhancement at the coding level
Besides ensuring that the programmer has secure coding knowledge, the editor and compiler features can be enhanced to alert or educate the programmer in preventing buffer overflow. For secure coding, it is suggested that any C/C++ syllabus incorporate secure coding at least as an extra topic. The buffer overflow vulnerability and its exploitation should be taught alongside the use of the unsafe C library. The theory class can be extended with a practical lab that gives more real experience of, and exposure to, the severe effects of buffer overflow. An example can be found in the SEED project. Microsoft has taken the lead by implementing the Security Development Lifecycle (SDL), which covers the broad processes of developing software that needs to withstand malicious attacks. The SDL practices are eventually to be released to the wider development community.
In the C editor, the IntelliSense feature can be enhanced to show alerts, warnings or useful information about the buffer overflow vulnerability of the C functions concerned while the programmer is coding, so that both novice and seasoned programmers are made aware of the issue in real time. Most C/C++ compilers and IDEs already have an IntelliSense-style feature, either built in or as a plug-in module. Microsoft Visual C++, for example, has a built-in IntelliSense feature, and gvim, the graphical version of the Vim editor, has a plug-in of this type.
Another interesting thing to consider for compile time and run time is comprehensive exception handling, as seen in Java (class and sample tutorial) and the .NET family. C is a general-purpose programming language whose standard does not mention exception handling; it is left to the implementer to define it, if they want to. For example, Microsoft has integrated Structured Exception Handling (SEH) for C (Win32 programming) into its compiler, but it is still up to the programmer whether to use it. SEH is not comprehensive and contains many undocumented parts. Ideally, the exception handling would catch most security-related issues, including buffer overflows that may be caused by using unsafe C functions.
The implementation should be emphasized at the compile-time stage (though it also applies at run time), to clean up the code as much as possible before running it, and it would be very beneficial if exception handling could be included in the C standard. Another advantage of exception handling is that when problems remain after the Release version has been distributed, they are easier to locate and troubleshoot.
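Standard C has no exception handling, but the idea of a "try" and "catch" around a risky copy can be roughly emulated with the standard setjmp() and longjmp() functions. The following is only a sketch of that substitute technique, not SEH and not a proposal from this section. The names overflow_handler, checked_copy and try_copy are hypothetical, and the single global handler makes it unsuitable for nested or multi-threaded use.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

static jmp_buf overflow_handler;  /* hypothetical single, global handler */

/* Copy src into dst. If src does not fit, "throw" by jumping back to
 * the point where setjmp() was called. */
static void checked_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize)
        longjmp(overflow_handler, 1);
    memcpy(dst, src, len + 1);  /* len + 1 includes the '\0' */
}

/* Returns 0 if the copy succeeded, 1 if the overflow was "caught". */
int try_copy(char *dst, size_t dstsize, const char *src)
{
    if (setjmp(overflow_handler) != 0)
        return 1;                     /* the "catch" branch */
    checked_copy(dst, dstsize, src);  /* the "try" branch */
    return 0;
}
```

With a 4-byte buffer, try_copy(buf, sizeof buf, "abc") returns 0, while try_copy(buf, sizeof buf, "abcd") returns 1 and leaves the buffer untouched.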