HTML5 - Hacking Web Apps: Detecting and Preventing Web Application Security Problems (2012)


Chapter 1. HTML5

Information in this chapter:

• What’s New in HTML5

• Security Considerations for Using and Abusing HTML5

Written language dates back at least 5000 years to the Sumerians, who used cuneiform for things like ledgers, laws, and lists. That original Stone Markup Language carved the way to our modern HyperText Markup Language. And what’s a site like Wikipedia but a collection of byzantine editing laws and lists of Buffy episodes and Star Trek aliens? We humans enjoy recording all kinds of information with written languages.

HTML largely grew as a standard based on de facto implementations. What some (rarely most) browsers did defined what HTML was. This meant that the standard reflected a degree of real-world practice; if you wrote web pages according to spec, then browsers would probably render them as you desired. The drawback of the standard’s early evolutionary development was that pages weren’t as universal as they should have been. Different browsers had different quirks, which led to footnotes like, “Best viewed in Internet Explorer 4” or “Best viewed in Mosaic.” Quirks also created programming nightmares for developers, leading to poor design patterns (the ever-present User-Agent sniffing to determine capabilities as opposed to feature testing) or over-reliance on plugins (remember Shockwave?). The standard also had its own dusty corners with rarely used tags (<acronym>), poor UI design (<frame> and <frameset>), or outright annoying ones (<bgsound> and <marquee>). HTML2 tried to clarify certain variances. It became a standard in November 1995. HTML3 failed to coalesce into something acceptable. HTML4 arrived in its final 4.01 form in December 1999.

Eight years passed before HTML5 appeared as a public draft. It took another year or so to gain traction. Now, close to 12 years after HTML4, the latest version of the standard is preparing to exit draft state and become official. Those intervening 12 years saw the web become a ubiquitous part of daily life, from the first TV commercial to include a website URL, to billion-dollar IPOs, to darker aspects like the scams and crime that follow any technological or cultural shift.

The path to HTML5 included the map of de facto standards that web developers embraced from their favorite browsers. Yet importantly, the developers behind the standard gave careful consideration to balancing historical implementation with better-architected specifications. Likely the most impressive feat of HTML5 is the explicit description of how to parse an HTML document. What seems like an obvious task was not implemented consistently across browsers, which led to HTML and JavaScript hacks to work around quirks or, worse, take advantage of them. We’ll return to some of the security implications of these quirks in later chapters, especially Chapter 2.

This chapter covers the new concepts, concerns, and cares for HTML5 and its related standards. Those wishing to find the quick attacks or trivial exploits against the design of these subsequent standards will be disappointed. The modern security ecosphere of browser developers, site developers, and security testers has given careful attention to HTML5. A non-scientific comparison of HTML4 and HTML5 observes that the words security and privacy appear 14 times and once respectively in the HTML4 standard. The same words appear 73 and 12 times in a current draft of HTML5. While it’s hard to argue more mentions means more security, it highlights the fact that security and privacy have attained more attention and importance in the standards process.

The new standard does not solve all possible security problems for the browser. What it does is reduce the ambiguous behavior of previous generations, provide more guidance on secure practices, establish stricter rules for parsing HTML, and introduce new features without weakening the browser. The benefit will be a better browsing experience. The drawback will be implementation errors and bugs as browsers compete to add support for features and site developers adopt them.


Modern browsers support HTML5 to varying degrees. Many web sites use HTML5 in one way or another. However, the standards covered in this chapter remain formally in working draft mode. Nonetheless, most have settled enough that there should only be minor changes in a JavaScript API or header as shown here. The major security principles remain applicable.

The New Document Object Model (DOM)

Welcome to <!doctype html>. That simple declaration makes a web page officially HTML5. The W3C provides a document that describes the major differences between HTML5 and HTML4. The following list highlights interesting changes:

• <!doctype html> is all you need. Modern browsers take this as an instruction to adopt a standards mode for interpreting HTML. Gone are the days of arguments of HTML vs. XHTML and adding DTDs to the doctype declaration.

• UTF-8 becomes the preferred encoding. This encoding is the friendliest to HTTP transport while being able to maintain compatibility with most language representations. Be on the lookout for security errors due to character conversions to and from UTF-8.

• HTML parsing has explicit rules. No more relying on or being thwarted by a browser’s implementation quirks. Quirks lead to ambiguity which leads to insecurity. Clear instructions on handling invalid characters (like NULL bytes) or unterminated tags reduce the chances of a browser “fixing up” HTML to the point where an HTML injection vulnerability becomes easily exploitable.

• New tags and attributes spell doom for security filters that rely on blacklists. All that careful attention to every tag listed in the HTML4 specification needs to catch up with HTML5.

• Increased complexity implies decreased security; it’s harder to catch corner cases and pathological situations that expose vulnerabilities.

• New APIs for everything from media elements to base64 conversion to registering custom protocol handlers. This speaks to the complexity of implementation that may introduce bugs in the browser.

Specific issues are covered in this chapter and others throughout the book.

Cross-Origin Resource Sharing (CORS)

Some features of HTML5 reflect the real-world experiences of web developers who have been pushing the boundaries of browser capabilities in order to create applications that look, feel, and perform no different than “native” applications installed on a user’s system. One of those boundaries being stressed is the venerable Same Origin Policy—one of the very few security mechanisms present in the first browsers. Developers often have legitimate reasons for wanting to relax the Same Origin Policy, whether to better enable a site spread across specific domain names, or to make possible a useful interaction of sites on unrelated domains. CORS enables site developers to grant permission for one Origin to be able to access the content of resources loaded from a different Origin. (Default browser behavior allows resources from different Origins to be requested, but access to the contents of each response’s resource is isolated per Origin. One site can’t peek into the DOM of another, e.g. set cookies, read text nodes that contain usernames, inject JavaScript nodes, etc.)

One of the browser’s workhorses for producing requests is the XMLHttpRequest (XHR) object. The XHR object is a recurring item throughout this book. Two of its main features, the ability to make asynchronous background requests and the ability to use non-GET methods, make it a key component of exploits. As a consequence, browsers have increasingly limited the XHR’s capabilities in order to reduce its adverse security exposure. With CORS, web developers can stretch those limits without unduly putting browsers at risk.

The security boundaries of cross-origin resources are established by request and response headers. The browser has three request headers (we’ll cover the preflight concept after introducing all of the headers):

Origin—The scheme/host/port of the resource initiating the request. Sharing must be granted to this Origin by the server. The security associated with this header is predicated on it coming from an uncompromised browser. Its value is to be set accurately by the browser; not to be modified by HTML, JavaScript, or plugins.

Access-Control-Request-Method—Used in a preflight request to determine if the server will honor the method(s) the XHR object wishes to use. For example, a browser might only need to rely on GET for one web application, but require a range of methods for a REST-ful web site. Thus, a web site may enforce a “least privileges” concept on the browser whereby it honors only those methods it deems necessary.

Access-Control-Request-Headers—Used in a preflight request to determine if the server will honor the additional headers the XHR object wishes to set. For example, client-side JavaScript is forbidden from manipulating the Origin header (or any Sec- header in the upcoming WebSockets section). On the other hand, the XHR object may wish to upload files via a POST method, in which case it may be desirable to set a Content-Type header (although browsers will limit the values this header may contain).

The server has six response headers that instruct the browser what to permit in terms of sharing access to the data of a response to a cross-origin request:

Access-Control-Allow-Credentials—May be “true” or “false.” By default, the browser will not submit cookies, HTTP authentication (e.g. Basic, Digest, NTLM) strings, or client SSL certificates across origins. This restriction prevents malicious content from attempting to leak the credentials to an unapproved origin. Setting this header to true allows any data in this credential category to be shared across origins.

Access-Control-Allow-Headers—The headers a request may include. This applies to headers like Content-Type as well as custom X- headers. Some headers, such as Host and Origin, are immutable and cannot be set by the requesting script regardless of this list.

Access-Control-Allow-Methods—The methods a request may use to obtain the resource. Always prefer to limit methods to only those deemed necessary, which is usually just GET.

Access-Control-Allow-Origin—The origin(s) with which the server permits the browser to share the server’s response data. This may be an explicit origin, a wildcard (*) to match any origin, or “null.” Be careful with “null”: rather than denying requests, it matches requests whose own Origin is null, such as those from sandboxed iframes or file:// documents, so omitting the header is the safer way to refuse sharing. The wildcard (*) always prevents credentials from being included with a cross-origin request, regardless of the aforementioned Access-Control-Allow-Credentials header.

Access-Control-Expose-Headers—A list of headers that the browser may make visible to the client. For example, JavaScript would be able to read exposed headers from an XHR response.

Access-Control-Max-Age—The duration in seconds for which the response to a preflight request may be cached. Shorter times incur more overhead as the browser is forced to renew its CORS permissions with a new preflight request. Longer times increase the potential exposure of overly permissive controls from a preflight request. This is a policy decision for web developers. A good reference for this value would be the amount of time the web application maintains a user’s session without requiring re-authentication, much like a “Remember Me” button common among sites. Thus, typical durations may be a few minutes, a working day, or two weeks with a preference for shorter times.

Sharing resources cross-origin must be permitted by the web site. Access to response data from usual GET and POST requests will always be restricted to the Same Origin unless the response contains one of the CORS-related headers. A server may respond to these “usual” types of requests with Access-Control-headers. In other situations, the browser may first use a preflight request to establish a CORS policy. This is most common when the XHR object is used.
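Which cross-origin requests trigger a preflight can be sketched as a small function. This is a simplified sketch, assuming the CORS rules for “simple” requests: a GET, HEAD, or POST method, only a handful of safelisted headers, and a Content-Type limited to three form-style values. The helper name needsPreflight is hypothetical, and real browsers apply further checks (for example, on the values of safelisted headers).

```javascript
// Hypothetical sketch of the browser's "does this cross-origin request need
// a preflight?" decision. needsPreflight is not a browser API.
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set([
  "accept", "accept-language", "content-language", "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
]);

function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) {
    return true; // e.g. PUT or DELETE always trigger a preflight
  }
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) {
      return true; // any custom header (e.g. X-Requested-With) triggers one
    }
    if (lower === "content-type") {
      // Compare only the MIME type, ignoring parameters such as charset.
      const mime = String(value).split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("PUT"));                                          // true
console.log(needsPreflight("GET"));                                          // false
console.log(needsPreflight("POST", { "Content-Type": "application/json" })); // true
```

In the chapter’s example, it is the PUT method, not the request for credentials, that forces the preflight; setting withCredentials alone does not.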

In this example, assume the HTML is loaded from an Origin of http://web.site. The following JavaScript shows an XHR request being made with a PUT method to another Origin that desires to include credentials. Credentials are requested by setting the XHR object’s withCredentials property to true; the true passed as the third argument to open() only makes the request asynchronous:

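var xhr = new XMLHttpRequest();
xhr.open("PUT", "…", true);   // "…": URL of a resource on the other Origin
xhr.withCredentials = true;    // send cookies and HTTP authentication cross-origin
xhr.send();
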
Once xhr.send() is called, the browser first sends a preflight request (an OPTIONS request carrying Access-Control-Request- headers) to determine if the server is willing to share a resource from its own origin with the requesting resource’s origin. The request looks something like the following:



User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Accept-Language: en-us,en;q=0.5


Access-Control-Request-Method: PUT

If the server on the other Origin wishes to share resources with the requesting Origin, then it will respond with something like:

HTTP/1.1 200 OK

Date: Tue, 03 Apr 2012 06:51:53 GMT

Server: Apache


Access-Control-Allow-Methods: PUT

Access-Control-Allow-Credentials: true

Access-Control-Max-Age: 10

Content-Length: 0

This exchange of headers instructs the browser to expose the content of responses from the server’s origin to resources loaded from the requesting origin. Thus, an XHR object could receive JSON data from the other origin that the requesting page would be able to read, manipulate, and display.
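The checks a browser applies to a preflight response like the one above can be sketched as follows. This is a simplified sketch under stated assumptions: the function preflightAllows is hypothetical, header names are assumed to be lower-cased already, and https://app.example is a hypothetical stand-in for the requesting origin.

```javascript
// Hypothetical sketch of the checks a browser applies to a preflight
// response. Header names are assumed to be lower-cased already.
const SAFELISTED_METHODS = ["GET", "HEAD", "POST"];

function preflightAllows(requestOrigin, method, withCredentials, headers) {
  const allowOrigin = headers["access-control-allow-origin"];
  if (allowOrigin === undefined) return false;      // no sharing granted at all
  if (allowOrigin === "*") {
    if (withCredentials) return false;              // * never covers credentialed requests
  } else if (allowOrigin !== requestOrigin) {
    return false;                                   // granted to a different origin
  }
  if (withCredentials && headers["access-control-allow-credentials"] !== "true") {
    return false;                                   // credentials not explicitly allowed
  }
  const m = method.toUpperCase();
  if (SAFELISTED_METHODS.includes(m)) return true;  // simple methods need no listing
  const allowed = (headers["access-control-allow-methods"] || "")
    .split(",")
    .map((s) => s.trim().toUpperCase());
  return allowed.includes(m);
}

// Headers modeled on the preflight reply above; the origin is hypothetical.
const preflightResponse = {
  "access-control-allow-origin": "https://app.example",
  "access-control-allow-methods": "PUT",
  "access-control-allow-credentials": "true",
  "access-control-max-age": "10",
};

console.log(preflightAllows("https://app.example", "PUT", true, preflightResponse));  // true
console.log(preflightAllows("https://evil.example", "PUT", true, preflightResponse)); // false
```

Only when every check passes does the browser send the real PUT and expose its response to the page.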

CORS is an agreement between origins that instructs the browser to relax the Same Origin Policy that would otherwise prevent response data from one origin being available to client-side resources of another origin. Allowing CORS carries security implications for a web application. Therefore, it’s important to keep in mind principles of the Same Origin Policy when intentionally relaxing it:

• Ensure the server code always verifies that the Host header names the expected server and that Origin matches a list of permitted values before responding with CORS headers. Follow the principle of “failing secure”: any error should return an empty response or a response with minimal content.

• Remember that CORS establishes sharing on a per-origin basis, not a per-resource basis. If it is only necessary to share a single resource, consider moving that resource to its own subdomain rather than exposing the rest of the web application’s resources. For example, establish a separate origin for API access rather than exposing the API via a directory on the site’s main origin.

• Use a wildcard (*) value for the Access-Control-Allow-Origin header sparingly. This value exposes the resource’s data (e.g. web page) to pages on any web site. Remember, the Same Origin Policy doesn’t prevent a page from loading resources from unrelated origins; it prevents the page from reading the response data from those origins.

• Evaluate the added impact of HTML injection attacks (cross-site scripting). A successful HTML injection will already be able to execute within the victim site’s origin. Any trust relationships established with CORS will additionally be exposed to the exploit.
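The server-side advice above can be sketched as a function that consults an explicit allowlist and fails secure by returning no CORS headers at all. This is a sketch, not a drop-in implementation: ALLOWED_ORIGINS, corsHeadersFor, and the example origins are hypothetical, and the Max-Age value is one arbitrary choice within the range discussed earlier.

```javascript
// Hypothetical server-side helper: grant CORS only to allowlisted origins.
const ALLOWED_ORIGINS = new Set([
  "https://app.example",
  "https://admin.app.example",
]);

function corsHeadersFor(requestOrigin, { allowCredentials = false } = {}) {
  // Missing, malformed, or unlisted origins (including "null") get no CORS
  // headers, so the browser keeps enforcing the Same Origin Policy.
  if (typeof requestOrigin !== "string" || !ALLOWED_ORIGINS.has(requestOrigin)) {
    return {};
  }
  const headers = {
    "Access-Control-Allow-Origin": requestOrigin, // echo one origin, never *
    "Access-Control-Allow-Methods": "GET",        // least privilege
    "Access-Control-Max-Age": "600",              // ten minutes; a policy choice
    "Vary": "Origin",                             // response differs per Origin
  };
  if (allowCredentials) {
    headers["Access-Control-Allow-Credentials"] = "true";
  }
  return headers;
}

console.log(corsHeadersFor("https://app.example"));
console.log(corsHeadersFor("null")); // {}
```

Echoing the single matching origin, rather than sending *, is what permits credentials to be shared with that origin, and Vary: Origin keeps shared caches from serving one origin’s grant to another.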

CORS is one of the HTML5 features that will gain use as a utility for web exploits. This doesn’t mean CORS is fundamentally flawed or insecure. It means that hackers will continue to exfiltrate data from the browser, scan networks for live hosts or open ports, and inject JavaScript using new technologies. Web applications won’t be getting less secure; the exploits will just be getting more sophisticated.

WebSockets

One of the hindrances to building web applications that handle rapidly changing content (think status updates and chat messages) is HTTP’s request/response model. In the race for micro-optimizations of such behavior, sites eventually hit a wall in which the browser must continually poll the server for updates. In other words, the browser always initiates the request, be it GET, POST, or some other method. WebSockets address this design limitation of HTTP by providing a bidirectional, also known as full-duplex, communication channel. WebSocket URL connections use ws:// or wss:// schemes, the latter for connections over SSL/TLS.

Once a browser establishes a WebSocket connection to a server, either the server or the browser may initiate a data transfer across the connection. Previous to WebSockets, the browser had to waste CPU cycles or bandwidth to periodically poll the server for new data. With WebSockets, data sent from the server triggers a browser event. For example, rather than checking every two seconds for a new chat message, the browser can use an event-driven approach that triggers when a WebSocket connection delivers new data from the server. Enough background, let’s dive into the technology.

The following network capture shows the handshake used to establish a WebSocket connection from the browser to the public server at ws://

GET /?encoding=text HTTP/1.1


Connection: keep-alive, Upgrade

Sec-WebSocket-Version: 13


Sec-WebSocket-Key: ZIeebbKKfc4iCGg1RzyX2w==

Upgrade: websocket

HTTP/1.1 101 WebSocket Protocol Handshake

Upgrade: WebSocket

Connection: Upgrade

Sec-WebSocket-Accept: YwDfcMHWrg7gr/aHOOil/tW+WHo=

Server: Kaazing Gateway

Date: Thu, 22 Mar 2012 02:45:32 GMT


Access-Control-Allow-Credentials: true

Access-Control-Allow-Headers: content-type

The browser sends a random 16-byte Sec-WebSocket-Key value. The value is base64-encoded to make it palatable to HTTP. In the previous example, the hexadecimal representation of the Key is 64879e6db28a7dce22086835473c97db. In practice, only the base64-encoded representation is necessary to remember.
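A sketch of generating and decoding such a key, using Node’s crypto module and Buffer as stand-ins for the browser’s internals (newWebSocketKey is a hypothetical helper):

```javascript
const crypto = require("crypto");

// Hypothetical helper: 16 random bytes, base64-encoded (24 characters ending
// in "==" padding), the shape of a Sec-WebSocket-Key value.
function newWebSocketKey() {
  return crypto.randomBytes(16).toString("base64");
}

// Decoding the key from the captured handshake recovers its 16 raw bytes.
const capturedKey = "ZIeebbKKfc4iCGg1RzyX2w==";
console.log(Buffer.from(capturedKey, "base64").toString("hex"));
// 64879e6db28a7dce22086835473c97db
console.log(newWebSocketKey().length); // 24
```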

The browser must also send the Origin header. This header isn’t specific to WebSockets. We’ll revisit this header in later chapters to demonstrate its use in restricting potentially malicious content. The Origin indicates the browsing context in which the WebSockets connection is created. In the previous example, the browser visited the demo’s web page to load the demo, while the WebSockets connection is being made to a different Origin on the ws:// scheme. This header allows the browser and server to agree on which Origins may be mixed when connecting via WebSockets.


Note that the link to the demo site has a trailing slash, but the Origin header does not. Recall that Origin consists of the protocol (http://), port (80), and host, not the path. Resources loaded by file:// URLs have a null Origin. In all cases, this header cannot be influenced by JavaScript or spoofed via DOM methods or properties. Its intent is to strictly identify an Origin so a server may have a reliable indication of the source of a request from an uncompromised browser. A hacker can spoof this header for their own traffic (to limited effect), but cannot exploit HTML, JavaScript, or plugins to spoof this header in another browser. Think of its security in terms of protecting trusted clients (the browser) from untrusted content (third-party JavaScript applications like games, ads, etc.).
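The serialization rules in the paragraph above can be checked with the WHATWG URL class, which is available in both browsers and Node. The URLs here are hypothetical examples; the sketch shows that the path is dropped, a default port is omitted, and a file:// URL yields a null Origin.

```javascript
// The WHATWG URL class serializes an origin as scheme://host[:port]:
// the path is dropped and a scheme's default port is omitted.
// All URLs below are hypothetical examples.
const pages = [
  "http://demo.example/",            // trailing slash, as in the demo link
  "http://demo.example:80/chat/",    // explicit default port
  "https://demo.example:8443/a?b=c", // non-default port is kept
  "file:///home/user/test.html",     // file:// documents get a null origin
];

for (const page of pages) {
  console.log(page, "->", new URL(page).origin);
}
// http://demo.example/ -> http://demo.example
// http://demo.example:80/chat/ -> http://demo.example
// https://demo.example:8443/a?b=c -> https://demo.example:8443
// file:///home/user/test.html -> null
```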

The Sec-WebSocket-Version indicates the version of WebSockets to use. The current value is 13. It was previously 8. As a security exercise, it never hurts to see how a server responds to unused values (9 through 12), negative values (−1), higher values (would be 14 in this case), potential integer overflow values (2^32, 2^32+1, 2^64, 2^64+1), and so on. Doing so would be testing the web server’s code itself as opposed to the web application.
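On the server side, one defensive approach to the fuzzing values above is to compare the header as an exact string instead of parsing it into an integer, which sidesteps overflow entirely. A minimal sketch, assuming a server that supports only version 13; checkVersion is a hypothetical helper, while the error response follows RFC 6455’s guidance of answering with an error code such as 426 Upgrade Required plus a Sec-WebSocket-Version header listing the supported versions.

```javascript
// Hypothetical handshake check for a server that only speaks RFC 6455.
// Comparing the exact header string means values such as 2^64+1 are
// never parsed into integers at all.
const SUPPORTED_VERSION = "13";

function checkVersion(headerValue) {
  if (headerValue === SUPPORTED_VERSION) {
    return { ok: true };
  }
  // Refuse the upgrade and advertise the version the server does understand.
  return {
    ok: false,
    status: 426, // Upgrade Required
    headers: { "Sec-WebSocket-Version": SUPPORTED_VERSION },
  };
}

for (const v of ["13", "8", "12", "-1", "14", "4294967296", "18446744073709551617"]) {
  console.log(v, checkVersion(v).ok); // only "13" prints true
}
```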

The meaning of the server’s response headers is as follows.

The Sec-WebSocket-Accept is the server’s response to the browser’s challenge header, Sec-WebSocket-Key. The response acknowledges the challenge by appending a GUID defined in RFC 6455 to the Sec-WebSocket-Key, taking the SHA-1 hash of the result, and base64-encoding that hash. This acknowledgement is then verified by the browser. If the round-trip Key/Accept values match, then the connection is opened. Otherwise, the browser will refuse the connection. The following example demonstrates the key verification using command-line tools available on most Unix-like systems. The SHA-1 hash of the concatenated Sec-WebSocket-Key and GUID matches the Base64-decoded value of the Sec-WebSocket-Accept header calculated by the server.



$ echo -n 'ZIeebbKKfc4iCGg1RzyX2w==258EAFA5-E914-47DA-95CA-C5AB0DC85B11' | shasum -

6300df70c1d6ae0ee0aff68738e8a5fed5be587a -

$ echo -n 'YwDfcMHWrg7gr/aHOOil/tW+WHo=' | base64 -D | xxd    # -D is the macOS flag; GNU base64 uses -d

0000000: 6300 df70 c1d6 ae0e e0af f687 38e8 a5fe c..p........8...

0000010: d5be 587a
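The same verification can be written in a few lines of JavaScript. This sketch assumes Node’s crypto module; computeAccept and acceptMatches are hypothetical helpers. It checks the chapter’s captured key and also the sample key/accept pair published in RFC 6455.

```javascript
const crypto = require("crypto");

// Fixed GUID defined in RFC 6455.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Server side: SHA-1 of key + GUID, base64-encoded.
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// Browser side: the handshake only succeeds if the values agree.
function acceptMatches(sentKey, receivedAccept) {
  return computeAccept(sentKey) === receivedAccept;
}

console.log(computeAccept("ZIeebbKKfc4iCGg1RzyX2w=="));
// YwDfcMHWrg7gr/aHOOil/tW+WHo=  (the value from the captured handshake)
console.log(acceptMatches("dGhlIHNhbXBsZSBub25jZQ==", "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="));
// true  (the sample pair from RFC 6455)
```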

This challenge/response handshake is designed to create a unique, unpredictable connection between the browser and the server. Several problems might occur if the challenge keys were sequential, e.g. 1 for the first connection, then 2 for the second; or time-based, e.g. epoch time in milliseconds. One possibility is race conditions; the browser would have to ensure challenge key 1 doesn’t get used by two requests trying to make a connection at the same time. Another concern is to prevent WebSockets connections from being used for cross-protocol attacks.

Cross-protocol attacks are an old trick in which the traffic of one protocol is directed at the service of another protocol in order to spoof commands. This is easiest to exploit with text-based protocols. For example, recall the first line of an HTTP request that contains a method, a URI, and a version indicator:


Email uses another text-based protocol, SMTP. Now, imagine a web browser with an XMLHttpRequest (XHR) object that imposes no restrictions on HTTP method or destination. A clever spammer might try to lure browsers to a web page that uses the XHR object to connect to a mail server by trying a connection like:

EHLO https://email.server:587 HTTP/1.0

Or, if the XHR could be given a completely arbitrary method, a hacker would try to stuff a complete email delivery command into it. The rest of the request, including headers added by the browser, wouldn’t matter to the attack:

EHLO%20email.server:587%0a%0dMAIL%20FROM:<>%0a%0dRCPT%20TO:<>%0a%0dDATAspamspamspamspam%0a%0d.%0a https://email.server:587 HTTP/1.1

Host: email.server

Syntax doesn’t always hit 100% correctness for cross-protocol attacks; however, hacks like these arise because of implementation errors (browser allows connections to TCP ports with widely established non-HTTP protocols like 25 or 587, browser allows the XHR object to send arbitrary content, mail server does not strictly enforce syntax).
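One browser-side defense against these attacks is refusing to connect to ports associated with well-known non-HTTP services. The sketch below assumes a small subset of the “bad ports” list from the WHATWG Fetch standard (not the complete list) and a hypothetical helper named isBlockedUrl.

```javascript
// Small subset of the Fetch standard's "bad ports" list (ftp, ssh, telnet,
// SMTP, POP3, IMAP, and their TLS variants). Not the complete list.
const BLOCKED_PORTS = new Set([21, 22, 23, 25, 110, 143, 465, 587, 993, 995]);

const DEFAULT_PORTS = { "http:": 80, "https:": 443, "ws:": 80, "wss:": 443 };

// Hypothetical helper: would a browser refuse to connect to this URL?
function isBlockedUrl(urlString) {
  const url = new URL(urlString);
  // URL.port is "" when the URL uses its scheme's default port.
  const port = url.port === "" ? DEFAULT_PORTS[url.protocol] : Number(url.port);
  return BLOCKED_PORTS.has(port);
}

console.log(isBlockedUrl("https://email.server:587/")); // true  (mail submission)
console.log(isBlockedUrl("http://email.server:25/"));   // true  (SMTP)
console.log(isBlockedUrl("https://web.example/"));      // false (default port 443)
```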

WebSockets are more versatile than the XHR object. As a message-oriented protocol that may transfer binary or text content, they are a prime candidate for attempting cross-protocol attacks against anything from SMTP servers to even binary protocols like SSH. The Sec-WebSocket-Key and Sec-WebSocket-Accept challenge/response ensures that a proper browser connects to a valid WebSocket server as opposed to any type of service (e.g. SMTP). The intent is to prevent hackers from being able to create web pages that would cause a victim’s browser to send spam or perform some other action against a non-WebSocket service; as well as preventing hacks like HTML injection from delivering payloads that could turn a Twitter vulnerability into a high-volume spam generator. The challenge/response prevents the browser from being used as a relay for attacks against other services.

By design, the XMLHttpRequest object is prohibited from setting the Origin header or any header that begins with Sec-. This prevents malicious scripts from spoofing WebSocket connections.

The Sec-WebSocket-Protocol header (not present in the example) gives browsers explicit information about the kind of data to be tunneled over a WebSocket. It will be a comma-separated list of protocols. This gives the browser a chance to apply security decisions for common protocols instead of dealing with an opaque data stream with unknown implications for a user’s security or privacy settings.

Data frames sent from the browser to the server are masked with an XOR operation using a random 32-bit value chosen by the browser. Data is masked in order to prevent unintentional modification by intermediary devices like proxies. For example, a caching proxy might incorrectly return stale data for a request, or a poorly functioning proxy might mangle a data frame. Note the spec does not use the term encryption, as that is neither the purpose nor effect of masking. The masking key is embedded within the data frame it affects, open for any intermediary to see. TLS connections provide encryption with stream ciphers like RC4 or AES in CTR mode.1 Use wss:// to achieve strong encryption for the WebSocket connection, just as you would rely on https:// for links to login pages or, preferably, the entire application.
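Because masking is a simple XOR, anyone holding the frame can undo it. This sketch, with a hypothetical helper applyMask, unmasks the payload of the browser’s data frame shown later in this chapter (in the Data Frames section) using the four-byte masking key that travels inside the frame itself.

```javascript
// Hypothetical helper: XOR each payload byte with byte (i mod 4) of the
// 32-bit masking key. XOR is its own inverse, so this masks and unmasks.
function applyMask(payload, maskKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

// Masking key and masked payload bytes taken from the browser's data frame
// in the Data Frames section (the key travels in the frame, in the clear).
const maskKey = Buffer.from("826ef668", "hex");
const masked = Buffer.from(
  "cb1dd61cea0b840da20f" + "9811e0019211a201831ca21a9e0df00b" + "c9", "hex");

console.log(applyMask(masked, maskKey).toString("utf8")); // Is there anybody out there?
```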

Transferring Data

Communication over a WebSocket is full-duplex; either side may initiate a data transfer. The WebSocket API provides the methods for the browser to receive binary or text data.

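var ws = new WebSocket("wss://…");  // the server's ws:// or wss:// URL
ws.onmessage = function(msg) {
  if (msg.data instanceof Blob) {  // alternately, with ws.binaryType = "arraybuffer": msg.data instanceof ArrayBuffer
    // binary data arrived
  }
  else {
    // text data arrived as a String
  }
};
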
The Blob object is defined in the File API. It holds Blob.size bytes of immutable data. The data is arbitrary, but may be described as a particular MIME type with the Blob.type property. For example, a Blob might hold images retrieved while scrolling through a series of photos, file transfers for chat clients, or a jQuery template for updating a DOM node.

The ArrayBuffer object is defined in the Typed Array Specification. It holds a fixed-length sequence of raw bytes that typed array views interpret as signed/unsigned integers or floating point values of varying bit size (e.g. 8-bit integer, 64-bit floating point).

Message data of strings is always UTF-8 encoded. The browser should enforce this restriction, e.g. a text message containing an invalid UTF-8 byte sequence should cause the connection to fail.

Data is sent using the WebSocket object’s send method. The WebSocket API intends for ArrayBuffer, Blob, and String data to be acceptable arguments to send. However, support for non-String data currently varies. JavaScript strings are natively UTF-16; the browser encodes them to UTF-8 for transfer.
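A sketch of the encoding concerns above, using the TextEncoder and TextDecoder APIs available in both browsers and Node. The helper decodeTextFrame is hypothetical; the fatal option makes the decoder throw on invalid UTF-8 instead of substituting replacement characters, which is the behavior a receiver needs in order to fail the connection as RFC 6455 requires.

```javascript
// TextEncoder turns a JavaScript (UTF-16) string into UTF-8 bytes for sending;
// a TextDecoder created with { fatal: true } throws on invalid UTF-8 instead
// of silently inserting U+FFFD replacement characters.
const encoder = new TextEncoder();
const strictDecoder = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: decode a received text payload or report failure.
function decodeTextFrame(bytes) {
  try {
    return { ok: true, text: strictDecoder.decode(bytes) };
  } catch {
    // RFC 6455: invalid UTF-8 in a text message means failing the connection.
    return { ok: false };
  }
}

const message = "Café ☕";
console.log(message.length);                 // 6 (UTF-16 code units)
console.log(encoder.encode(message).length); // 9 (UTF-8 bytes)

console.log(decodeTextFrame(encoder.encode(message)).ok);            // true
console.log(decodeTextFrame(new Uint8Array([0x48, 0xc3, 0x28])).ok); // false
```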


Always encrypt WebSocket connections by using the wss:// scheme. The persistent nature of WebSocket connections combined with their minimal overhead negates most of the performance-related objections to implementing TLS for all connections.

Data Frames

Browsers expose the minimum necessary API for JavaScript to interact with WebSockets using events like onopen, onerror, onclose, and onmessage plus methods like close and send. The mechanisms for transferring raw data from JavaScript calls to network traffic are handled deep in the browser’s code. The primary concern from a web application security perspective is how a web site uses WebSockets: Does it still validate data to prevent SQL injection or XSS attacks? Does the application properly enforce authentication and authorization for users to access pages that use WebSockets?

Nevertheless, it’s still interesting to have a basic idea of how WebSockets work over the network; in WebSockets terms, that means how data frames carry data. The complete reference is in Section 5 of RFC 6455. Some interesting aspects are highlighted here.

000002AB 81 9b 82 6e f6 68 cb 1d d6 1c ea 0b 84 0d a2 0f ...n.h.. ........

000002BB 98 11 e0 01 92 11 a2 01 83 1c a2 1a 9e 0d f0 0b ........ ........

000002CB c9.

The preceding data frame was sent by the browser. The first byte, 0x81, has two important halves. The value 0x81 is represented in binary as 10000001b. The first bit represents the FIN (message finished) flag, which is set to 1. The next three bits are currently unused and should always be 0. The final four bits may be one of several opcodes. Table 1.1 lists possible opcodes.

Table 1.1 Current WebSocket Opcodes

WebSocket Opcode    Description

0x0    The data frame is a continuation of a previous frame or frames

0x1    The data frame contains text (always UTF-8)

0x2    The data frame contains binary data

0x3-0x7    Currently unused

0x8    Close the connection

0x9    Ping. A keep-alive query not exposed through the JavaScript API.

0xA    Pong. A keep-alive response not exposed through the JavaScript API.

0xB-0xF    Currently unused

Looking at our example’s first byte, 0x81, we determine that it is a single fragment (FIN bit is set) that contains text (opcode 0x01). The second byte carries the length of the message. In the unmasked frame below it is 0x1b, or 27 characters; in the browser’s frame above it is 0x9b, because the high bit flags that a four-byte masking key follows, while the remaining seven bits still encode 27. This type of length-prefixed field is common to many protocols. If you were to step out of web application security to dive into protocol testing, one of the first tests would be modifying the data frame’s length to see how the server reacts to size underruns and overruns. Setting large size values for small messages could also lead to a DoS if the server blithely set aside the requested amount of memory before realizing the actual message was nowhere near so large.

00000150 81 1b 49 73 20 74 68 65 72 65 20 61 6e 79 62 6f ..Is the re anybo

00000160 64 79 20 6f 75 74 20 74 68 65 72 65 3f dy out t here?

Finally, here’s a closing data frame. The FIN bit is set and the opcode 0x08 tells the remote end to terminate the connection.

000002CC 88 82 04 4c 3a 56 07 a4 ...L:V..
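The header layout described above can be sketched as a small parser. This is an illustration under stated assumptions, not a conforming implementation: parseFrame is a hypothetical helper, the whole frame is assumed to be in one Buffer, and it also handles the extended 16-bit and 64-bit length forms that RFC 6455 signals with length values of 126 and 127. Run against the two frames above, it recovers the 27-character message and shows that the close frame’s masked two-byte payload is status code 1000 (normal closure).

```javascript
// Hypothetical frame parser following Section 5.2 of RFC 6455. Assumes one
// complete frame per Buffer and performs only basic sanity checks.
function parseFrame(buf) {
  if (buf.length < 2) throw new Error("frame too short");
  const fin = (buf[0] & 0x80) !== 0;     // first bit: FIN flag
  const rsv = (buf[0] & 0x70) >> 4;      // next three bits: unused, should be 0
  const opcode = buf[0] & 0x0f;          // low four bits: opcode (Table 1.1)
  const masked = (buf[1] & 0x80) !== 0;  // high bit of second byte: MASK flag
  let length = buf[1] & 0x7f;            // low seven bits: payload length
  let offset = 2;
  if (length === 126) {                  // 126: real length in next 2 bytes
    if (buf.length < offset + 2) throw new Error("truncated length");
    length = buf.readUInt16BE(offset);
    offset += 2;
  } else if (length === 127) {           // 127: real length in next 8 bytes
    if (buf.length < offset + 8) throw new Error("truncated length");
    const big = buf.readBigUInt64BE(offset);
    if (big > BigInt(Number.MAX_SAFE_INTEGER)) throw new Error("length too large");
    length = Number(big);
    offset += 8;
  }
  let maskKey = null;
  if (masked) {
    if (buf.length < offset + 4) throw new Error("truncated masking key");
    maskKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }
  // Never trust the declared length; compare it with the bytes actually present.
  if (buf.length - offset < length) throw new Error("declared length exceeds data");
  let payload = buf.subarray(offset, offset + length);
  if (masked) {
    const unmasked = Buffer.alloc(length);
    for (let i = 0; i < length; i++) unmasked[i] = payload[i] ^ maskKey[i % 4];
    payload = unmasked;
  }
  return { fin, rsv, opcode, masked, length, payload };
}

// The unmasked text frame and the close frame shown above.
const echo = parseFrame(Buffer.from(
  "811b497320746865726520616e79626f" + "6479206f75742074686572653f", "hex"));
console.log(echo.fin, echo.opcode, echo.masked, echo.length); // true 1 false 27
console.log(echo.payload.toString("utf8"));                   // Is there anybody out there?

const closeFrame = parseFrame(Buffer.from("8882044c3a5607a4", "hex"));
console.log(closeFrame.opcode, closeFrame.payload.readUInt16BE(0)); // 8 1000
```

The length check before slicing is exactly what the “invalid length values” tip below probes for in a real implementation.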

WebSockets data frames have several other types of composition. However, these aspects are largely out of scope for web application testing since it is browser developers and web server developers who are responsible for them. Even so, a side project on testing a particular WebSockets implementation might be fun. Here are some final tips on areas to review at the protocol layer:

– Setting invalid length values;

– Setting unused flags;

– Mismatched masking flags and masking keys;

– Replaying messages;

– Sending out of order frames or overlapping fragments;

– Setting invalid UTF-8 sequences in text messages (opcode 0x01).

The specification defines how clients and servers should react to error situations, but there’s no reason to expect bug-free code in browsers or servers. This is the difference between security of design and security of implementation.


WebSockets have perhaps the most flux of the HTML5 features in this chapter. The Sec-WebSocket-Version may not be 13 by the time the draft process finishes. Historically, updates have made changes that break older versions or do not provide backwards compatibility. Regardless of past issues, the direction of WebSockets is towards better security and continued support for text, binary, and compressed content.

Security Considerations

Denial of Service (DoS)—Web browsers limit the number of concurrent connections they will make to an Origin (a web application’s page may consist of resources from several Origins). This limit is typically four or six in order to balance the perceived responsiveness of the browser with the connection overhead imposed on the server. WebSockets connections do not have the same per-Origin restrictions. This doesn’t mean the potential for using WebSockets to DoS a site has been ignored. Instead, the protocol defines behaviors that browsers and servers should follow. Thus, the design of the protocol is intended to minimize this concern for site owners, but that doesn’t mean implementation errors that enable DoS attacks won’t appear in browsers.

For example, an HTML injection payload might deliver JavaScript code to create dozens of WebSockets connections from victims’ browsers to the web site. The mere presence of WebSockets on a site isn’t a vulnerability. This example describes using WebSockets to compound another exploit (cross-site scripting) such that the site becomes unusable.

Tunneled protocols—Tunneling binary protocols (i.e. non-textual data) over WebSockets is a compelling advantage of this API. While the WebSocket protocol itself may be securely implemented, the protocol tunneled over it may not be. Web developers must apply the same principles of input validation, authentication, authorization, and so on to the server-side handling of data arriving on a WebSocket connection. Using a wss:// connection from an up-to-date browser has no bearing on potential buffer overflows for the server-side code handling chat, image streaming, or whatever else is being sent over the connection.

This problem isn’t specific to binary protocols, but they are highlighted here because they tend to be harder to inspect. It’s much easier for developers to read and review text data like HTTP requests and POST data than it is to inspect binary data streams. The latter requires extra tools to inspect and verify. Note that this security concern is related to how WebSockets are used, not an insecurity in the WebSocket protocol itself.

Untrusted Server Relay—The ws:// or wss:// endpoint might relay data from the browser to an arbitrary Origin in violation of privacy expectations or security controls. On the one hand, a connection to wss:// might proxy data from the browser to a VNC server on an internal network normally unreachable from the public Internet, as if it were a VPN connection. Such use violates neither the spirit nor the specification of WebSockets. In another scenario, a WebSocket connection might be used to relay messages from the browser to an IRC server. Again, this could be a clever use of WebSockets. However, the IRC relay could monitor messages passed through it, even relaying the messages to different destinations as it desires. In another case, a WebSocket connection might offer a single-sign-on service over an encrypted wss:// connection, but proxy username and password data over unencrypted channels like HTTP.

There’s no more or less reason to trust a server running a WebSocket service than one running normal HTTP. A malicious server will attack a user’s data regardless of the security of the connection or the browser. WebSockets provide a means to bring useful, non-HTTP protocols into the browser, with possibilities from text messaging to video transfer. However, the ability of WebSockets to transfer arbitrary data will revive age-old scams where malicious sites act as front-ends to social media destinations, banking, and so on. WebSockets will simply be another tool that enables these schemes. Just as users must be cautioned not to overly trust the “Secure” in SSL certificates, they must be careful with the kind of data relayed through WebSocket connections. Browser developers and site owners can only do so much to block phishing and similar social engineering attacks.

Web Storage

In the late 1990s many web sites were characterized as HTML front-ends to massive databases. Google’s early home pages boasted of having indexed one billion pages. Today, Facebook has indexed data for close to one billion people. Modern web sites boast of dealing with petabyte-size data sets—growth orders of magnitude beyond the previous decade. There are no signs that this network-centric data storage will diminish considering trends like “cloud computing” and “software as a service” that recall older slogans like, “The network is the computer.”

This doesn’t mean that web developers want to keep everything on a database fronted by a web server. There are many benefits to off-loading data storage to the browser, from bandwidth to performance to storage costs. The HTTP Cookie has always been a workhorse of browser storage. However, cookies have limits on quantity (20 cookies per domain), size (4 KB per cookie), and security (a useless path attribute2) that have been agreed to by browser makers in principle rather than by standard.

Web Storage aims to provide a mechanism for web developers to store large amounts of data in the browser using a standard API across browsers. The principal features of Web Storage attest to its ancestry in the HTTP Cookie: data is stored as key/value pairs and Web Storage objects may be marked as sessionStorage or localStorage (similar to session and persistent cookies).

The keys and values in a storage object are always JavaScript strings. A sessionStorage object is tied to a browsing context. For example, two different browser tabs will have unique sessionStorage objects. Changes to one will not affect the other. A localStorage object’s contents will be accessible to all browser tabs; modifying a key/value pair from one tab will affect the storage for each tab. In all cases, access is restricted by the Same Origin Policy.
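Because keys and values are always strings, anything else handed to a storage object is converted to a string on the way in, which is a common source of bugs: a number comes back as a string, and a plain object comes back as "[object Object]". The sketch below uses a hypothetical in-memory `MemoryStorage` class that imitates the string coercion of the Web Storage `getItem`/`setItem` methods so it can run outside a browser; in a page, `localStorage` or `sessionStorage` would take its place. The `setJSON`/`getJSON` helpers are likewise illustrative, showing structured values serialized explicitly.

```javascript
// Hypothetical stand-in that mimics Web Storage's string-only semantics.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  get length() { return this.items.size; }
  key(i) { return Array.from(this.items.keys())[i] ?? null; }
  getItem(k) {
    const v = this.items.get(String(k));
    return v === undefined ? null : v;
  }
  setItem(k, v) { this.items.set(String(k), String(v)); } // coerces to strings
  removeItem(k) { this.items.delete(String(k)); }
  clear() { this.items.clear(); }
}

const storage = new MemoryStorage();
storage.setItem("visits", 3);
storage.setItem("prefs", { theme: "dark" });
console.log(typeof storage.getItem("visits")); // string
console.log(storage.getItem("prefs"));         // [object Object]

// Store structured values explicitly as JSON text.
function setJSON(store, key, value) {
  store.setItem(key, JSON.stringify(value));
}
function getJSON(store, key) {
  const raw = store.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

setJSON(storage, "prefs", { theme: "dark" });
console.log(getJSON(storage, "prefs").theme); // dark
```

Serializing with JSON does not make the data any less visible to the user or to script running in the same Origin; it only makes the round trip predictable.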

An important aspect of Web Storage security is that the data is viewable and modifiable by the user (see Figure 1.1).


Figure 1.1 A Peek Inside a Browser’s Local Storage Object

The following code demonstrates a common pattern for enumerating keys of a storage object via a loop.

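var key;
for (var i = 0, len = localStorage.length; i < len; i++){
  key = localStorage.key(i);
  // Example use of each key: log its key/value pair.
  console.log(key + ": " + localStorage.getItem(key));
}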
Attaching the lifetime of a sessionStorage object to the notion of a “session” is a weak basis for security. Modern browsers will resume sessions after they have been closed or even after a system has been rebooted. Consequently, there is little security distinction between the lifetimes of the two types of Web Storage objects.

Finally, keep in mind these security considerations. Like most of this chapter, the focus is on how the HTML5 technology is used by a web application rather than vulnerabilities specific to the implementation or design of the technology in the browser.

• Prefer opportunistic purging of data—Determine an appropriate lifetime for sensitive data. Just because a browser is closed doesn’t mean a sessionStorage object’s data will be removed. Instead, the application could delete data after a time (executed when the browser is active, of course) or on a beforeunload event (or onclose, if either event is reliably triggered by the browser).

• Remember that data placed in a storage object has the same exposure as data placed in a cookie. Its security relies on the browser’s Same Origin Policy, the browser’s patch level, plugins, and the underlying operating system. Encrypting data in the storage object has the same security as encrypting the cookie. Placing the decryption key in the storage object (or otherwise sending it to the browser) negates the encrypted data’s security.

• Consider the privacy and sensitivity associated with data to be placed in a storage object. The ability to store more data shouldn’t translate to the ability to store more sensitive data.

• Prepare for compromise—An HTML injection attack that executes within the same Origin as the storage object will be able to enumerate and exfiltrate its data without restriction. Keep this in mind when you select the kinds of data stored in the browser. (HTML injection is covered in Chapter 2.)

• HTML5 doesn’t magically make your site more secure. Features like <iframe> sandboxing and the Origin header are good ways to improve security design. However, they can still be rendered ineffective by poorly configured proxies that strip headers, older browsers that do not support these features, or poor data validation that allows malicious content to infiltrate a web page.
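One way to follow the advice on opportunistic purging is to record an expiry time alongside each sensitive value and purge expired entries whenever the application's code runs, rather than relying on the browser to end a "session." The sketch below assumes a storage object exposing the Web Storage methods (`getItem`, `setItem`, `removeItem`, `key`, `length`); the JSON envelope `{ value, expires }`, the helper names, and the small Map-backed stand-in used to run it outside a browser are all hypothetical.

```javascript
// Store a value together with the time (ms since epoch) at which it expires.
function setWithExpiry(storage, key, value, ttlMs, now = Date.now()) {
  storage.setItem(key, JSON.stringify({ value, expires: now + ttlMs }));
}

// Return the value, or null if it is missing or expired (removing it if so).
function getWithExpiry(storage, key, now = Date.now()) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  const entry = JSON.parse(raw);
  if (now >= entry.expires) {
    storage.removeItem(key);
    return null;
  }
  return entry.value;
}

// Opportunistically purge every expired entry, e.g., at page load.
function purgeExpired(storage, now = Date.now()) {
  // Collect keys first: removing items while iterating by index shifts indices.
  const keys = [];
  for (let i = 0; i < storage.length; i++) keys.push(storage.key(i));
  for (const key of keys) {
    try {
      const entry = JSON.parse(storage.getItem(key));
      if (entry && typeof entry.expires === "number" && now >= entry.expires) {
        storage.removeItem(key);
      }
    } catch (e) {
      // Not one of our JSON envelopes; leave it alone.
    }
  }
}

// Minimal Map-backed stand-in for localStorage/sessionStorage (hypothetical).
const store = {
  map: new Map(),
  get length() { return this.map.size; },
  key(i) { return Array.from(this.map.keys())[i] ?? null; },
  getItem(k) { return this.map.has(k) ? this.map.get(k) : null; },
  setItem(k, v) { this.map.set(k, String(v)); },
  removeItem(k) { this.map.delete(k); },
};

setWithExpiry(store, "draft", "hello", 60000, 0);
setWithExpiry(store, "token", "abc123", 1000, 0);
purgeExpired(store, 5000); // "token" has expired; "draft" has not
console.log(store.length); // 1
```

An expiry check only helps while the application's code is running; the data still sits in the browser until then, so it is no substitute for keeping the most sensitive data out of browser storage entirely.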


The IndexedDB API has its own specification, separate from the Web Storage API. Its status is less concrete and fewer browsers currently support it. However, it is conceptually similar to Web Storage in providing a data storage mechanism for the browser. As such, the major security and privacy concerns associated with Web Storage apply to IndexedDB as well.

A major difference between IndexedDB and Web Storage is that IndexedDB’s key/value pairs are not limited to JavaScript strings. Keys may be objects of type Array, Date, float, or String. Values may be any object that adheres to HTML5’s “structured clone” algorithm.3 Structured cloning is basically a more flexible serialization method than JSON. For example, it can handle Blob objects (an important aspect of WebSockets) and recursive, self-referencing objects. In practice, this means more sophisticated data types may be stored by IndexedDB.
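The difference between structured cloning and JSON is easy to see with the `structuredClone()` global, which exposes the same algorithm in current browsers and in Node.js 17 and later. That function postdates this chapter, so treat its availability as an assumption about the runtime. A self-referencing object that makes `JSON.stringify()` throw survives a structured clone, and a `Date` stays a `Date` rather than becoming a string.

```javascript
// A record with a Date and a reference back to itself.
const record = { user: "alice", saved: new Date(0) };
record.self = record;

// JSON cannot represent the cycle.
let jsonError = null;
try {
  JSON.stringify(record);
} catch (e) {
  jsonError = e; // a TypeError; the message varies by engine
}
console.log(jsonError instanceof TypeError); // true

// Structured cloning preserves both the cycle and the Date type.
const copy = structuredClone(record);
console.log(copy.self === copy);         // true
console.log(copy.saved instanceof Date); // true
console.log(copy !== record);            // true

// A JSON round trip (without the cycle) turns the Date into a string.
const viaJson = JSON.parse(JSON.stringify({ saved: new Date(0) }));
console.log(typeof viaJson.saved);       // string
```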

Web Workers

Today’s web application developers find creative ways to bring traditional desktop software into the browser. This places more burden on the browser to manage objects (more memory), display graphics (faster page redraws), and process more events (more CPU). Developers who bring games to the browser don’t want to create Pong, they want to create full-fledged MMORPGs.

Regardless of what developers want a web application to do, they all want web applications to do more. The Web Workers specification addresses this by exposing concurrent programming APIs to JavaScript. In other words, the error-prone world of thread programming has been introduced to the error-prone world of web programming.

Actually, there’s no reason to be so pessimistic about Web Workers. The specification lays out clear guidelines for the security and implementation of threading within the browser. So, the design (and even implementation) of Workers may be secure, but a web application’s use of them may bring about vulnerabilities.

First, an overview of Workers. They fall under the Same Origin Policy of other JavaScript resources. Workers have additional restrictions designed to minimize any negative security impact.

• No direct access to the DOM. Therefore they cannot enumerate nodes, view cookies, or access the Window object. A Worker’s scope is not shared with the normal global scope of a JavaScript context. Workers still receive and return data associated with the DOM under the usual Same Origin Policy.

• May use the XMLHttpRequest object. Visibility of response data remains limited by the Same Origin Policy. Exceptions made by Cross-Origin Resource Sharing may apply.

• May use a WebSocket object, although support varies by browser.

• The JavaScript source of a Worker object is obtained from a URL, typically a relative one, passed to the constructor of the object. The URL is resolved against the base URL of the script creating the object, and the result must have the same origin. This prevents Workers from loading JavaScript from a different origin.
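The resolution step can be sketched with the WHATWG `URL` class, which browsers and Node.js both provide. The `resolveWorkerScript` helper below is hypothetical and simplified: it resolves the constructor argument against the creating page's URL, then refuses the result if its origin differs, which is where a browser would fail to load the script. It ignores special cases (data: URLs, for example), and the example.com and attacker.example hosts are placeholders. It also doubles as a quick way to see where a suspicious constructor argument would actually point.

```javascript
// Hypothetical, simplified version of the same-origin rule browsers apply
// to the script URL passed to new Worker().
function resolveWorkerScript(specifier, pageUrl) {
  const base = new URL(pageUrl);
  const resolved = new URL(specifier, base); // throws TypeError if unparsable
  if (resolved.origin !== base.origin) {
    throw new Error("cross-origin worker script refused: " + resolved.href);
  }
  return resolved.href;
}

const page = "https://example.com/app/chat.html";
console.log(resolveWorkerScript("worker1.js", page));
// https://example.com/app/worker1.js

// Candidate arguments a tester might try: each must stay on example.com.
for (const candidate of ["../lib/w.js", "//attacker.example/w.js", "https://attacker.example/w.js"]) {
  try {
    console.log("allowed:", resolveWorkerScript(candidate, page));
  } catch (e) {
    console.log("refused:", candidate);
  }
}
```

Note that a scheme-relative URL such as //attacker.example/w.js looks relative at a glance but resolves to another origin, which is exactly the kind of edge case worth testing.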

Web Workers use message-passing events to transfer data between the browsing context that creates the Worker and the Worker itself. Messages are sent with the postMessage() method. They are received with the onmessage event handler. The message is carried in the event’s data property. The following code shows a web page with a form that sends messages back and forth to a Worker. Notice that the JavaScript source of the Worker is loaded from a relative URL passed into the Worker’s constructor, in this case “worker1.js.”

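<!doctype html><html><body><div id="output"></div>
<form action="javascript:void(0);" onsubmit="respond()">
<input id="prompt" type="text">
</form>
<script>
// Load the Worker's source from a relative URL.
var worker1 = new Worker("worker1.js");

// Display each reply from the Worker.
worker1.onmessage = function(evt) {
  document.getElementById("output").textContent = evt.data;
};

// Send the form's text to the Worker, then clear the field.
function respond() {
  var msg = document.getElementById("prompt");
  worker1.postMessage(msg.value);
  msg.value = "";
  return false;
}
</script>
</body></html>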

The worker1.js JavaScript source follows. This example cycles through several functions by changing the assignment of the onmessage event. Of course, the implementation could have also used a switch statement or if clauses to obtain the same effect. The goal of this example is to demonstrate the flexibility of a dynamically changeable interface.

// worker1.js: each handler replies, then installs the next handler.
var msg = "";
onmessage = sayIntroduction;

function sayIntroduction(evt) {
  onmessage = sayHello;
  postMessage("Who's there?");
}

function sayHello(evt) {
  msg = evt.data;
  onmessage = sayDavesNotHere;
  postMessage("Hello, " + msg);
}

function sayDavesNotHere(evt) {
  onmessage = sayGoodBye;
  postMessage("Dave's not here.");
}

function sayGoodBye(evt) {
  onmessage = sayDavesNotHere;
  postMessage("I already said.");
}

Don’t be afraid of using Web Workers. Their mere presence does not create a security problem. However, there are some things to watch out for (or test for if you’re in a hacking mood):

• The constructor must always take a relative URL. It would be a security bug if a Worker’s source were loaded from an arbitrary origin due to implementation errors like mishandling “%00,” “%ff,” or “”

• Resource consumption of CPU or memory. Web Workers do an excellent job of hiding the implementation details of safe concurrency operations from the JavaScript API. Browsers will enforce limitations on the number of Workers that may be spawned, infinite loops inside a worker, or deep recursion issues. However, errors in implementation may expose the browser to Denial of Service style attacks. For example, imagine a Web Worker that attempts to do lots of background processing—perhaps nothing more than multiplying numbers—in order to drain the battery of a mobile device.

• Workers may compound network-based Denial of Service attacks that originate from the browser. For example, consider an HTML injection payload that spawns a dozen Web Workers that in turn open parallel XHR connections to a site the hacker wishes to overwhelm.

• Concurrency issues. Just because the Web Worker API hides threading concepts like locking, deadlocks, race conditions, and so on doesn’t mean that the use of Web Workers will be free from concurrency errors. For example, a site may rely on one Worker to monitor authorization while another Worker performs authorized actions. It would be important to check whether authorization has been revoked before performing an action. Multiple Workers have no guarantee of an order of execution among themselves. In the event-driven model of Workers, a poorly crafted authorization check in one Worker might be reordered behind another Worker’s call that should have otherwise been blocked.
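The ordering hazard in the last item can be illustrated without real threads. The sketch below is a single-threaded analogy built on JavaScript promises, not actual Workers (which exchange messages rather than share an object like this): the hypothetical `session` object stands in for authorization state that one Worker revokes, and each "transfer" stands in for an action performed by another. When the check happens before the code yields and the action happens after, a revocation that arrives in between goes unnoticed.

```javascript
// Single-threaded analogy (not real Workers): `session` stands in for
// authorization state that another Worker would revoke via a message.
const session = { authorized: true };

// Let other queued tasks run, the way another Worker's message might arrive.
const yieldToOthers = () => new Promise((resolve) => setTimeout(resolve, 0));

// Flawed: check, yield, then act. A revocation during the yield is missed.
async function flawedTransfer(log) {
  if (!session.authorized) return "denied";
  await yieldToOthers();
  log.push("transfer");
  return "performed";
}

// Safer: finish any waiting first, then check and act with no yield between.
async function saferTransfer(log) {
  await yieldToOthers();
  if (!session.authorized) return "denied";
  log.push("transfer");
  return "performed";
}

async function demo() {
  const log = [];

  session.authorized = true;
  const flawed = flawedTransfer(log); // checks now, acts later
  session.authorized = false;         // revocation arrives in between
  const flawedResult = await flawed;

  session.authorized = true;
  const safer = saferTransfer(log);
  session.authorized = false;
  const saferResult = await safer;

  return { flawedResult, saferResult, actions: log.length };
}

const demoResult = demo();
demoResult.then((r) => console.log(r.flawedResult, r.saferResult, r.actions));
// performed denied 1
```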

Flotsam & Jetsam

It’s hard to pin down specific security failings when so many of the standards are incomplete or unimplemented. This final section tries to hit some minor specifications not covered in other chapters.

History API

The History API provides a means to manage the session history of a browsing context. It’s like a stack of links for navigating backward and forward. Its security relies on the Same Origin Policy. The object is simple to use. For example, the following code demonstrates pushing a new link onto the object:

history.pushState(null, "Login", "");

The security and privacy considerations of the History object come into play if a browser’s implementation is not correct. If the Same Origin Policy were not correctly enforced, then the History object could be abused by JavaScript loaded in one origin adding links to other origins. For example, imagine a broken browser that loads a page from that in turn creates a social engineering attack around a History object that points to other origins.

history.pushState(null, "Auction Site Login", "");
history.pushState(null, "Home", "");
history.pushState(null, "", "javascript:malicious_code()");

Alternately, the malicious web site could attempt to enumerate links from another origin’s History object, which would be a privacy exposure. The design of the History API prevents this, but there’s no guarantee mistakes won’t happen.
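The rule a correct browser enforces here can be sketched in a few lines. Browsers compare the URL passed to `pushState()` with the document's current URL and throw a SecurityError when the two differ in scheme, username, password, host, or port. The hypothetical `canPushState` helper below approximates that comparison with the WHATWG `URL` class; it ignores special cases such as file: URLs and is not any browser's actual implementation, and the example.com and auction.example hosts are placeholders. A javascript: URL fails the check because its scheme differs from the page's.

```javascript
// Hypothetical approximation of the check browsers apply before
// history.pushState() rewrites the document's URL.
function canPushState(documentUrl, target) {
  const current = new URL(documentUrl);
  let next;
  try {
    next = new URL(target, current); // relative targets resolve against the page
  } catch (e) {
    return false; // unparsable URL
  }
  return (
    next.protocol === current.protocol &&
    next.username === current.username &&
    next.password === current.password &&
    next.hostname === current.hostname &&
    next.port === current.port
  );
}

const page = "https://example.com/account/home";
console.log(canPushState(page, "/login"));                        // true
console.log(canPushState(page, "https://example.com:8443/x"));    // false
console.log(canPushState(page, "https://auction.example/login")); // false
console.log(canPushState(page, "javascript:malicious_code()"));   // false
```

A tester probing a browser for the broken behavior described above would look for any case where the real `pushState()` accepts a URL that a check like this rejects.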

Draft APIs

The W3C maintains an extensive list of web-related specifications in varying states of completion. These range from HTML5 discussed in this chapter to things like using Gamepads for HTML games, describing microformats for sharing information, to mobile browsing, protocols, security, and more.

Reading mailing lists and taking part in discussions are a good way to find out what browser developers and web developers are working on next. It’s a great way to discover potential security problems, understand how new features affect privacy, and stay on top of emerging trends.


“I’m going through changes.” Changes. Black Sabbath

HTML5 has been looming for so long that the label has taken on many meanings outside of its explicit standard, from related items like Web Storage and Web Workers to more ambiguous concepts that used to be called “Web 2.0.” In any case, the clear indication is that web applications have more powerful features that continue to close the gap between desktop applications and pure browser applications. Phenomenally popular games like Angry Birds can transition almost seamlessly from native mobile apps to in-browser games without loss of sound, graphics, or—most important for any application—an engaging experience.

HTML5 exists in your browser now. Some features may be partially implemented, others may still be “vendor prefixed” with strings like -moz, -ms, or -webkit until a specification becomes official. With luck, the proliferation of vendor prefixes won’t lock in a particular implementation quirk or revive the programming anti-patterns of HTML’s earlier days. Keep this amount of flux in mind as you approach web application security. The authors behind HTML5 are striving to maintain a secure design (or at least, not worsen the security model of HTML). As such, there will be major areas to watch for implementation errors as browsers add more features:

• Same Origin Policy—The coarse-grained security model based on scheme, host, and port. Hackers have historically found holes in this model through Java, plugins, and DNS attacks. HTML5 continues to place significant trust in the constancy of this policy.

• Framed content—There are privacy and security concerns related to framing content. For example, an ad banner should be prevented from gathering information about its parent frame. Conversely, an enclosing frame shouldn’t be able to access its child frame resources if they come from a different origin. But clickjacking attacks only rely on the ability to frame content, not access to content. (We’ll return to this in Chapter 3.) HTML5 provides new mechanisms for handling <iframe> restrictions. Modern web sites also perform significant on-the-fly updates of DOM nodes, which have the potential to confuse the Same Origin Policy or leave a node in an indeterminate state—something that’s never good for security. This is more of a concern for browser vendors who continue to wrangle security and the DOM.

• All JavaScript, all the time—More sophisticated browser applications rely more and more on complex JavaScript. HTML5’s APIs are just as useful as an exploit tool as they are for building web sites.

• Browsers can store more information and interact with more types of applications. The browser’s internal security model has to be able to partition sites well enough that one site rife with vulnerabilities doesn’t easily expose data associated with a stronger site. Modern browsers are adopting security coding policies and techniques such as process separation to help protect users.

• Regardless of browser technology, basic security principles must be applied to the server-side application. Enabling a SQL injection hack that steals unencrypted passwords should be an unforgivable offense.

1 An excellent resource for learning about cryptographic fundamentals and security principles is Applied Cryptography by Bruce Schneier. We’ll touch on cryptographic topics at several points in this book, but not at the level of rigorous algorithm review.

2 The Same Origin Policy does not restrict DOM access or JavaScript execution based on a link’s path. Trying to isolate cookies from the same origin, say between and, by their path attribute is trivially bypassed by malicious content that executes within the origin regardless of the content’s directory of execution.

3 Section 2.8.5 of the HTML5 draft dated March 29, 2012.