Honeypots - Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code (2011)

Chapter 2. Honeypots

Honeypots are systems that are designed to be exploited, whether through emulated vulnerabilities, real vulnerabilities, or weaknesses, such as an easily guessable SSH password. By creating such systems, you can attract and log activity from attackers and network worms for the purpose of studying their techniques. Honeypots are usually categorized as either high-interaction or low-interaction:

· High-interaction: Systems with a real non-emulated OS installed on them that can be accessed and explored by attackers. These systems may be virtual machines or physical machines that you can reset after they are compromised. They are frequently used to gain insight into human attackers and toolkits used by attackers.

· Low-interaction: Systems that only simulate parts of an operating system, such as certain network protocols. These systems are most frequently used to collect malware by being “exploited” by other malware-infected systems.

Honeynets, on the other hand, consist of two or more honeypots on a network. Typically, a honeynet is used for monitoring a larger and more diverse network in which one honeypot may not be sufficient. For example, an attacker may gain access to one honeypot and then try to move laterally across the network to another computer. If there are no other computers on the network, the attacker may realize that the environment isn’t the expected corporate network and vanish. The purpose of this chapter is not to study an attacker’s every move, so we do not discuss honeynets or high-interaction honeypots. Instead, this chapter focuses on low-interaction honeypots for the purpose of collecting malware samples.

Setting up a low-interaction honeypot such as nepenthes, dionaea, or mwcollectd (http://code.mwcollect.org/, not covered in this chapter) is a great way to capture the malware that botnets and worms distribute. You can also potentially use them to detect new vulnerabilities being exploited in the wild, study trends and statistics, and develop a workflow that streamlines the process of obtaining, scanning, and reporting on new malicious code. Figure 2-1 shows a diagram of the high-level honeypot infrastructure that you can build with recipes in this chapter.

Figure 2-1: Honeypot example diagram

Nepenthes Honeypots

Nepenthes (http://nepenthes.carnivore.it) is one of the most well-known and widely deployed low-interaction honeypots on the Internet. Markus Kötter and Paul Bächer first developed it in 2005. Nepenthes includes several modules for emulating Microsoft vulnerabilities that can be remotely exploited by systems scanning the Internet. In this section, you’ll learn how to collect malware samples, monitor attacks with IRC logging, and accept web-based submissions of malware from your nepenthes sensors.

Recipe 2-1: Collecting Malware Samples with Nepenthes

Nepenthes runs on a variety of operating systems, including Windows via Cygwin, Mac OS X, Linux, and BSD. The extensive readme file¹ explains how to download pre-compiled binaries or install nepenthes from source for any of the aforementioned systems. However, the instructions in this recipe are specific to using nepenthes on Ubuntu.

Installing Nepenthes

To get started with the installation, type the following command:

$ sudo apt-get install nepenthes

This will install nepenthes and add the user account and group (both named nepenthes) that the daemon process runs as. Once the package is installed, you can start nepenthes as a service with the following command.

$ sudo service nepenthes start

When nepenthes begins running, it binds to several ports on your system. These are the ports on which nepenthes expects to see common remote exploitation. As you can see in the following netstat output, the nepenthes process has a process ID of 14243. Each line represents a different socket in the LISTEN state (waiting for incoming connections). The top line indicates that nepenthes is listening on port 80 of all IPv4 addresses (0.0.0.0) on the machine and there is currently no remote endpoint (0.0.0.0:*) connected to the socket.

$ sudo netstat –ntlp | grep nepenthes

tcp 0.0.0.0:80 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:10000 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:6129 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:465 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:5554 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:27347 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:17300 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:21 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:3127 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:2103 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:2105 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:2745 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:25 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:2107 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:443 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:220 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:445 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:1023 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:1025 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:993 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:995 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:314 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:135 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:5000 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:42 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:139 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:3372 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:110 0.0.0.0:* LISTEN 14243/nepenthes

tcp 0.0.0.0:143 0.0.0.0:* LISTEN 14243/nepenthes

To receive connections on these ports from machines on the Internet, you must allow access to the ports through any firewalls on your network. Also, if you are dropping or restricting traffic to your system with iptables (a host-based firewall), you can use the following command to open access to the ports required by nepenthes.

$ sudo iptables -I INPUT -p tcp --dport <port_number> -j ACCEPT
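Rather than typing that rule once per port, you could generate the commands from netstat output like the listing above. The following is a minimal sketch, assuming the `netstat -ntlp` format in which the last four columns are the local address, foreign address, state, and PID/program name (this holds for both the trimmed listing above and netstat's full output with its Recv-Q and Send-Q columns). The function names are hypothetical and not part of nepenthes. The script only prints commands, so you can review them before running anything as root.

```python
"""Sketch: build iptables ACCEPT rules for every TCP port nepenthes listens on."""


def nepenthes_ports(netstat_output, program="nepenthes"):
    """Return the sorted TCP ports that `program` is listening on."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Need at least: proto, local address, foreign address, state, pid/program
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        local, state, owner = fields[-4], fields[-2], fields[-1]
        if state == "LISTEN" and owner.endswith("/" + program):
            # rsplit handles both 0.0.0.0:80 and IPv6 forms such as :::80
            ports.add(int(local.rsplit(":", 1)[1]))
    return sorted(ports)


def iptables_commands(ports):
    """Build one INPUT ACCEPT rule per port, mirroring the command above."""
    return ["sudo iptables -I INPUT -p tcp --dport %d -j ACCEPT" % port for port in ports]


if __name__ == "__main__":
    # Sample input: one trimmed line, one full-format line, and a non-nepenthes socket
    sample = """\
tcp 0.0.0.0:80 0.0.0.0:* LISTEN 14243/nepenthes
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN 14243/nepenthes
tcp 0.0.0.0:22 0.0.0.0:* LISTEN 812/sshd
"""
    for command in iptables_commands(nepenthes_ports(sample)):
        print(command)
```

On a real sensor you would pass the captured output of `sudo netstat -ntlp` instead of the sample string. Note that without root privileges netstat shows `-` in place of the program name, and the function then finds no ports.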

Note: Nepenthes may also require port forwarding if your system is behind a home router or other device that performs network address translation (NAT). NAT deployments can also be problematic because of bind shells, which may open a random port on the honeypot system for the attacking system to connect to. Unless that port is forwarded as well, the connection never reaches the honeypot.

Nepenthes Logs

The default configuration that nepenthes comes with is enough to start capturing malware. Once up and running, you’ll want to know what attacks your honeypot logged and what files (malware) were downloaded as a result of the attacks. Here is a list of the directories and files that are associated with nepenthes.

· /var/log/nepenthes/: The default logging directory.

· /var/log/nepenthes/logged_downloads: Contains a list of all download attempts.

· /var/log/nepenthes/logged_submissions: Contains a list of all successful download attempts.

· /var/lib/nepenthes/binaries/: Stores downloaded binaries. Each file is named after its MD5 hash and is saved only the first time it is received. It is not stored again if it is seen in subsequent attacks.

· /var/log/nepenthes.log: The primary log file for nepenthes that contains all activity, including detection of duplicate attacks and other messages associated with nepenthes’s health and status.

To see what attacks your honeypot has received and what malware the attacking systems are trying to distribute, take a look at the logged_downloads file. (In the following output, the authors sanitized their honeypot’s IP addresses to 10.1.84.6.)

$ tail /var/log/nepenthes/logged_downloads
[2010-07-07T16:29:38] 74.160.64.241 10.1.84.6 tftp://74.160.64.241/ssms.exe
[2010-07-07T17:00:25] 74.109.128.237 10.1.84.6 tftp://74.109.128.237/ssms.exe
[2010-07-07T17:16:58] 74.72.155.203 10.1.84.6 ftp://1:1@74.72.155.203:56187/ssms.exe
[2010-07-07T18:45:57] 74.109.128.237 10.1.84.6 ftp://1:1@74.109.128.237:51288/ssms.exe
[2010-07-07T19:02:00] 67.55.20.66 10.1.84.6 tftp://67.55.20.66/ssms.exe
[2010-07-07T23:23:05] 74.138.48.239 10.1.84.6 ftp://1:1@74.138.48.239:11781/ssms.exe
[2010-07-08T00:18:02] 113.42.142.88 10.1.84.6 creceive://113.42.142.88:9988/0
[2010-07-08T00:38:47] 74.124.228.117 10.1.84.6 tftp://74.124.228.117/ssms.exe
[2010-07-08T04:56:56] 74.102.142.103 10.1.84.6 tftp://74.102.142.103/ssms.exe
[2010-07-08T07:31:54] 74.51.226.134 10.1.84.6 tftp://74.51.226.134/ssms.exe

This log file is in the format:

[Timestamp] [Source IP] [Destination IP] [Download instructions]

In the output, you can see attacks from nine unique source IP addresses over the course of 15 hours. Although the source addresses are different (with the exception of 74.109.128.237, which probed us twice), the download instructions are similar. Apart from the single creceive entry from 113.42.142.88, the protocol is either FTP or TFTP and the file name is always ssms.exe. When the protocol is FTP, the supplied username and password are both 1 (1:1 in the URL). These patterns suggest that the attacking IPs may all belong to the same botnet or at least share similar code for spreading malware.
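With only ten lines you can spot these patterns by eye, but on a busy sensor it helps to tally the log with a script. The sketch below assumes the four-field format described above (bracketed timestamp, source IP, destination IP, download URL) and uses only the Python standard library. The function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# [timestamp] source_ip destination_ip download_url
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<src>\S+)\s+(?P<dst>\S+)\s+(?P<url>\S+)$"
)


def parse_logged_downloads(lines):
    """Turn logged_downloads lines into dicts, skipping blank or malformed lines."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def summarize(entries):
    """Return (unique sources, protocol counts, requested file name counts)."""
    sources = {entry["src"] for entry in entries}
    # The URL scheme (tftp, ftp, creceive, ...) is the download protocol
    protocols = Counter(urlsplit(entry["url"]).scheme for entry in entries)
    # The last path component is the requested file name
    filenames = Counter(
        urlsplit(entry["url"]).path.rsplit("/", 1)[-1] for entry in entries
    )
    return sources, protocols, filenames
```

To run it on your sensor, open /var/log/nepenthes/logged_downloads and pass the file object to `parse_logged_downloads()`, then hand the result to `summarize()`. For the ten lines shown above, it reports nine unique sources, six TFTP, three FTP, and one creceive download, with ssms.exe requested nine times.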

One thing you can’t tell at this point is whether all remote systems are hosting the same version of ssms.exe. Each system may host a different variant of the malware, despite the identical file name. Any time you want to investigate entries in the logged_downloads file, you can grep the nepenthes.log file for additional information, like this:

$ grep 74.51.226.134 nepenthes.log -A2 | grep Downloaded -A2
[08072010 07:32:17 info down handler dia] Downloaded file tftp://74.51.226.134/ssms.exe 171795 bytes
[08072010 07:32:17 spam mgr submit] Download has flags 0
[08072010 07:32:17 info mgr submit] File ecfbf321d3dea3ec732e7957b1bb7b1a has type PE32 executable for MS Windows (GUI) Intel 80386 32-bit

You can see that the attack resulted in the download of ssms.exe and that file had the MD5 hash ecfbf321d3dea3ec732e7957b1bb7b1a. Now let’s check the timestamp for the corresponding file in the nepenthes download directory:

$ ls -l /var/lib/nepenthes/binaries/ | \
    grep ecfbf321d3dea3ec732e7957b1bb7b1a
-rw-r--r-- 1 nepenthes nepenthes 171795 2010-06-11 20:18 ecfbf321d3dea3ec732e7957b1bb7b1a

Do you notice an inconsistency in the data? According to logged_downloads, 74.51.226.134 instructed the honeypot to download ssms.exe on 2010-07-08, but the timestamp on the corresponding file is 2010-06-11. This isn’t an error. As previously mentioned, nepenthes doesn’t store duplicates of files that already exist in the binaries directory, so the file’s timestamp records when the sample was first seen. Using this first-seen timestamp, you can get an idea of whether the bots are spreading new or old malware samples. Botnets and worms often attempt to spread the same file repeatedly for a long time, so the behavior you’re observing isn’t out of the ordinary.

The following command searches the downloads directory for any activity on 2010-07-08:

$ ls -lt /var/lib/nepenthes/binaries/ | grep 2010-07-08
-rw-r--r-- 1 nepenthes nepenthes 57856 2010-07-08 00:18 e3c1fb9c29107fdab8920840f10d25b5

According to the results, only one of the attacks in the logged_downloads file resulted in a malware sample that the nepenthes sensor had not seen before. This means that all the other download attempts in the log file were duplicates or otherwise resulted in an error. If you want to perform some automated processing of newly collected samples, you can set up a nightly cron job that greps the binaries directory for the current date.
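A cron job like that could run a script such as the following sketch. It lists the files in the binaries directory whose modification date matches a given day. Because nepenthes writes each sample only once, the modification time approximates the first-seen date, as discussed above. The directory default and the function name are assumptions, so adjust them for your installation.

```python
import datetime
import os

BINARIES_DIR = "/var/lib/nepenthes/binaries"  # assumed location; adjust for your install


def samples_first_seen_on(day, directory=BINARIES_DIR):
    """Return names (MD5 hashes) of samples whose mtime falls on `day`, in local time."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        modified = datetime.date.fromtimestamp(os.path.getmtime(path))
        if modified == day:
            hits.append(name)
    return hits


if __name__ == "__main__" and os.path.isdir(BINARIES_DIR):
    # Print today's new samples, one hash per line, for a later processing step
    for md5 in samples_first_seen_on(datetime.date.today()):
        print(md5)
```

Scheduling it shortly before midnight from the nepenthes user's crontab would catch each day's new samples. Its output is a plain list of hashes that other scripts can consume.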

1 http://nepenthes.carnivore.it/documentation:readme

Recipe 2-2: Real-Time Attack Monitoring with IRC Logging

Frequently reviewing your nepenthes log files and directories is a good way to find new activity. However, it is a manual and somewhat tedious process. Fortunately, nepenthes comes with a number of useful modules that you can configure to send you near real-time alerts. This recipe shows you how to set up the log-irc module to receive alerts on an IRC channel of your choice. Before you begin, note that the configuration files for available nepenthes modules are located alongside the main nepenthes configuration file (nepenthes.conf) in the /etc/nepenthes directory.

To set up and configure logging to IRC, follow these steps:

1. Edit nepenthes.conf and make sure the following line is uncommented:

"logirc.so", "log-irc.conf", "" // needs configuration

2. Edit log-irc.conf with the appropriate IRC settings. The following code shows a sample configuration that works with the Rizon IRC network.

log-irc
{
    use-tor "0";
    tor
    {
        server "localhost";
        port "9050";
    };
    irc
    {
        server
        {
            name "irc.rizon.net";
            port "6667";
            pass "";
        };
        user
        {
            nick "nep-cookbook";
            ident "nep-sensor1";
            userinfo "http://nepenthes.mwcollect.org/";
            usermodes "+i";
        };
        channel
        {
            name "#malware_analysts_cookbook";
            pass "";
        };
    };
};

Consider the following tips when setting up your sensor to log to IRC:

· If you plan to use a proxy or Tor, you can set use-tor to "1" and configure the server and port accordingly. See Recipe 1-1 for information on how to set up Tor.

· When you choose a nickname for your logging bot, be sure to choose one that is not in use; otherwise it will never successfully connect to the IRC channel.

· After changing the configuration file, you must restart nepenthes.

Once you restart nepenthes, it begins logging information on probes and attacks to the IRC channel in near real-time. To receive the messages, join the channel with your favorite IRC client. The following output shows what appeared when 113.42.142.88 attacked our nepenthes sensor.

01:17 <nep-cookbook> Unknown ASN1_SMB Shellcode (Buffer 172 bytes) (State 0)
01:17 <nep-cookbook> Unknown PNP Shellcode (Buffer 172 bytes) (State 0)
01:17 <nep-cookbook> Unknown LSASS Shellcode (Buffer 172 bytes) (State 0)
01:17 <nep-cookbook> Unknown DCOM Shellcode (Buffer 172 bytes) (State 0)
01:17 <nep-cookbook> Unknown NETDDE exploit 76 bytes State 1
01:17 <nep-cookbook> Unknown SMBName exploit 0 bytes State 1
01:17 <nep-cookbook> Handler creceive download handler will download creceive://113.42.142.88:9988/0
01:18 <nep-cookbook> File e3c1fb9c29107fdab8920840f10d25b5 has type PE32 executable for MS Windows (GUI) Intel 80386 32-bit

With IRC logging enabled, you can immediately see when activity is occurring and when your honeypot system is successfully exploited. In the preceding example, the system was sent a binary with the MD5 hash e3c1fb9c29107fdab8920840f10d25b5 (fetched with the creceive module, which is a generic TCP downloader). You could then retrieve that file from the binaries directory for analysis.
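If your IRC client saves the channel to a log file, you can pull the hashes of newly downloaded files out of it automatically and queue them for analysis. The sketch below assumes each alert arrives as a single IRC line in the "File <md5> has type <type>" form shown above. The regular expression was written for that sample only and may need adjusting for your client's timestamp format or for other log-irc messages. The names are hypothetical.

```python
import re

# Matches alerts such as:
# 01:18 <nep-cookbook> File e3c1fb9c29107fdab8920840f10d25b5 has type PE32 executable ...
FILE_ALERT = re.compile(
    r"<(?P<nick>[^>]+)>\s+File\s+(?P<md5>[0-9a-f]{32})\s+has type\s+(?P<filetype>.+)$"
)


def new_file_alerts(lines):
    """Yield (sensor nick, md5, file type) for each 'File ... has type ...' alert."""
    for line in lines:
        match = FILE_ALERT.search(line.rstrip())
        if match:
            yield match.group("nick"), match.group("md5"), match.group("filetype")
```

Keying on the sensor's nick lets one channel serve several honeypots while you still know which sensor saw each file.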

Recipe 2-3: Accepting Nepenthes Submissions over HTTP with Python

You can find supporting material for this recipe on the companion DVD.

You might find it useful to automatically send binaries that your honeypot collects to a server elsewhere. This recipe shows you how to create CGI scripts in Python that accept binaries from nepenthes honeypots over HTTP, and then how to configure nepenthes to perform the automated submissions.

On the book’s DVD you will find a file named wwwhoney.tgz, which contains a small Python web server and the necessary scripts to receive HTTP-based submissions from nepenthes and dionaea (see Recipe 2-5 for using the scripts with dionaea). To get started with the web server, extract the archive to your desired location like this:

$ tar -xvf wwwhoney.tgz
wwwhoney/
wwwhoney/binaries/
wwwhoney/README
wwwhoney/cgi-bin/
wwwhoney/cgi-bin/libhoney.py
wwwhoney/cgi-bin/dionaea.py
wwwhoney/cgi-bin/nepenthes.py
wwwhoney/cgiserver.py

Here is a description of the files that you’ll find inside the wwwhoney.tgz archive:

· /binaries/: Directory where received binaries are stored

· /cgi-bin/libhoney.py: Library with functions shared by honeypot scripts

· /cgi-bin/dionaea.py: Script for accepting files from dionaea

· /cgi-bin/nepenthes.py: Script for accepting files from nepenthes

· cgiserver.py: Small Python-based CGI web server used to serve scripts

To start the web server in the background, use the following command:

$ python cgiserver.py &

Server running on port 9000!

The default port is set to 9000 and can be modified by editing the source of cgiserver.py. You can now configure your nepenthes sensor to submit malware samples to your web server. To do this, edit /etc/nepenthes/submit-http.conf. If you were running your web server from the IP 192.168.1.100, you would modify your nepenthes submit-http module to look like this:

submit-http
{
    url "http://192.168.1.100:9000/cgi-bin/nepenthes.py";
    email "your@email"; // optional
    user "httpuser"; // optional
    pass "httppass"; // optional
};

The only required field is the URL to which the binaries are submitted. The URL can be http or https. A username and password can be supplied via the user and pass parameters for basic access authentication if the URL you wish to submit to is restricted to authenticated access only.

At this point, all new binaries received by nepenthes are submitted to the nepenthes.py script. The code that follows shows the source of nepenthes.py.

#!/usr/bin/python
import sys
import cgi
import hashlib
from libhoney import *

form = cgi.FieldStorage()
if not form:
    sys.exit()

(data, filename) = getFile(form, "file")
printHeader()

# the initial POST didn't include the file, so request it
if not data or not filename:
    print("S_FILEREQUEST")
    sys.exit()

# if the file already exists, we don't want it again
md5 = hashlib.md5(data).hexdigest()
if fileExists(md5):
    print("S_FILEKNOWN")
    sys.exit()

# store the file according to its md5 hash
if storeFile(data, md5):
    print("S_FILEOK")
else:
    print("S_ERROR")

If the initial POST does not include the file, the script asks the nepenthes sensor for it by replying with S_FILEREQUEST. Once it has the file, the script computes the file's MD5 hash and replies with S_FILEKNOWN if the file is already in the web server’s archive. Otherwise, it saves the file in the ./binaries/ directory, named according to its MD5 hash, and replies with S_FILEOK (or S_ERROR if the file could not be stored). Keep in mind that this is just a start to your honeypot infrastructure. Here are a few ways that you can extend the template:

· Add a database back end to track and store samples (see the Remote Root website for an example in PHP that logs to MySQL).²

· Import the Python module we present in Recipe 4-4 for scanning submissions with VirusTotal, Jotti, ThreatExpert, and NoVirusThanks.

· Import the Python module presented in Recipe 3-8 to detect malicious attributes in the PE file headers.

· Import the Python modules presented in Chapter 8 to automate the execution of the samples you collect in a VMware or VirtualBox environment.
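The fileExists and storeFile helpers come from the DVD's libhoney.py, whose source isn't reproduced in this recipe. If you rewrite the template, for example to add a database back end, the following is one possible Python 3 implementation of MD5-named, deduplicated storage. It is a hypothetical sketch, not the DVD code. The function names, the binaries directory default, and the hash-format check are our assumptions.

```python
import hashlib
import os
import re

BINARIES_DIR = "binaries"  # hypothetical default, relative to the server's working directory
MD5_RE = re.compile(r"[0-9a-f]{32}")


def file_exists(md5, directory=BINARIES_DIR):
    """Return True if a sample with this MD5 hash is already stored."""
    if not MD5_RE.fullmatch(md5):
        # Reject anything that is not a lowercase hex MD5, such as "../../etc/passwd"
        raise ValueError("not an MD5 hex digest: %r" % md5)
    return os.path.isfile(os.path.join(directory, md5))


def store_file(data, directory=BINARIES_DIR):
    """Store data under its MD5 hash. Return the hash, or None for a duplicate."""
    md5 = hashlib.md5(data).hexdigest()
    os.makedirs(directory, exist_ok=True)
    try:
        # Mode "xb" fails if the file already exists, so a sample is never
        # overwritten, even when two submissions of the same file race each other.
        with open(os.path.join(directory, md5), "xb") as handle:
            handle.write(data)
    except FileExistsError:
        return None
    return md5
```

Because `store_file()` performs the duplicate check and the write in a single step, a handler could reply S_FILEKNOWN whenever it returns None, instead of calling a separate existence check first.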

2 http://www.remoteroot.net/2008/07/21/nepenthes-submit-http-server-with-file-upload/

Working with Dionaea Honeypots

Dionaea (http://dionaea.carnivore.it) is a low-interaction honeypot and is considered the successor to nepenthes. Markus Kötter, one of the original developers of nepenthes, initially developed dionaea as part of the Honeynet Project’s Summer of Code 2009. In this section, you’ll learn how to collect malware samples with dionaea as well as how to send and receive collected samples over HTTP. You’ll also learn how to set up real-time event notification and sample sharing over XMPP, how to analyze and replay attacks, how to integrate p0f to passively identify operating systems, and how to graph attack patterns.

Recipe 2-4: Collecting Malware Samples with Dionaea

Before we begin with installing and setting up dionaea, here are a few of the most interesting features:

· It is written in C, but exposes a Python interface so you can easily add new modules without recompiling the base.

· It supports IPv6 and TLS, and uses libemu (see Recipe 6-10) for shellcode detection.

· It implements a Python-based version of the Windows Server Message Block (SMB) protocol, allowing it to properly establish sessions before being exploited by attacking machines. Other low-interaction honeypots only simulate certain vulnerable functions. Given that attacks over SMB will likely account for the majority of traffic that your honeypot will see, this gives dionaea a big advantage over other honeypots.

· It can send real-time notifications using the XMPP protocol (see Recipe 2-6).

· It logs information on attacks to an SQLite3 database, which gives you a simple way to generate and graph statistics (see Recipe 2-9).
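As a taste of what the SQLite log makes possible, the following sketch counts connections per protocol with Python's sqlite3 module. The table and column names (connections, connection_protocol, remote_host) are assumptions modeled on dionaea's logsql schema. Check them against your own database, for example with the sqlite3 shell's .schema command, before relying on the query. The demo builds a throwaway in-memory table with made-up rows so that the query runs without a sensor.

```python
import sqlite3


def connections_per_protocol(db):
    """Return (protocol, count) pairs, busiest protocol first."""
    query = (
        "SELECT connection_protocol, COUNT(*) AS hits "
        "FROM connections "
        "GROUP BY connection_protocol "
        "ORDER BY hits DESC, connection_protocol"
    )
    return db.execute(query).fetchall()


def demo_database():
    """Build an in-memory stand-in for the assumed connections table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE connections (connection_protocol TEXT, remote_host TEXT)")
    db.executemany(
        "INSERT INTO connections VALUES (?, ?)",
        [("smb", "192.0.2.10"), ("smb", "192.0.2.11"), ("epmap", "192.0.2.10")],
    )
    return db


if __name__ == "__main__":
    for protocol, hits in connections_per_protocol(demo_database()):
        print(protocol, hits)
```

Against a real sensor, you would call `sqlite3.connect()` on dionaea's logsql database file instead of using the demo table.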

Installing dionaea

There are numerous packages to install to properly set up dionaea. Rather than detail each step, we will refer you to the dionaea project page,³ which has the installation process well documented. You need to compile several packages from source, as dionaea needs versions of various packages that are likely not available through your package manager. The recommended OS for installing dionaea is Ubuntu or Debian Linux; however, you should be able to set it up on most Unix-based platforms.

Once you have successfully installed dionaea, you should have all of your files in /opt/dionaea. The next few recipes refer to this directory as $DIONAEA_HOME. One of the first things you’ll want to do is decide on some basic settings found in dionaea’s main configuration file at $DIONAEA_HOME/etc/dionaea/dionaea.conf.

The Logging Section

By default, dionaea will log everything (debug, info, message, warning, critical, and error messages). It’s good to keep the default settings while you install and become familiar with dionaea. However, if you are running a very busy sensor, the size of your log file can increase by several hundred gigabytes per day. Before putting your honeypot into “production” mode, we recommend changing the logging configuration in the following manner:

Table 2-1: Log Level Changes to Consider

Parameters    Original Value    New Value
default       levels = "all"    levels = "all,-debug"
errors        lev               levels = "error"

Like nepenthes, dionaea also has options to submit files over HTTP. By default, the configuration submits binaries to the online sandboxes of Anubis, Norman, and the University of Mannheim’s CWSandbox instance (see Recipe 4-6). If you do not want to submit files to these sandboxes, comment out the relevant portions of the configuration file. In the logging section, you can also set up dionaea to submit code to Joebox or even to your own HTTP handler, as described in Recipe 2-5.

The IP Section

By default, dionaea binds to all IP addresses using both IPv4 and IPv6. Depending on how many IP addresses are configured on your honeypot system, this can make dionaea slow to initialize. If you want dionaea to bind to all IPs without iterating over each one, or you want to restrict the IPs to which it binds, make changes like the following to the configuration file:

mode = "manual" // was "getifaddrs"

In the previous example, we changed the mode from its default, "getifaddrs", to "manual". In manual mode, you must supply the interface(s) and IP address(es) that you want dionaea to bind to. The following five example settings show how you could configure your sensor.

# bind to all IPv4 addresses on eth0 interface
addrs = { eth0 = ["0.0.0.0"] }

# bind to .50 and .51 on eth0 interface
addrs = { eth0 = ["10.14.49.50", "10.14.49.51"] }

# bind to .50 on eth0 and all IPv4 on eth1
addrs = { eth0 = ["10.14.49.50"], eth1 = ["0.0.0.0"] }

# bind to all IPv6 addresses on eth0
addrs = { eth0 = ["::"] }

# bind to all IPv4 and all IPv6 addresses on eth0
addrs = { eth0 = ["::"], eth0 = ["0.0.0.0"] }

You can choose to bind to all IPv4 addresses on an interface by using 0.0.0.0, all IPv4 and IPv6 addresses by using ::, and individual addresses by just listing them out separated by a comma. You can mix and match different settings and protocols with different interfaces.
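If you manage several sensors, you might generate the addrs line instead of typing it. The following hypothetical helper builds a line in the same format as the examples above. It uses Python's ipaddress module to catch a mistyped address before dionaea tries to bind to it.

```python
import ipaddress


def addrs_line(bindings):
    """Build a dionaea-style addrs line from {interface: [address, ...]}."""
    parts = []
    for interface, addresses in bindings.items():
        for address in addresses:
            ipaddress.ip_address(address)  # raises ValueError for e.g. "10.14.49.500"
        quoted = ", ".join('"%s"' % address for address in addresses)
        parts.append("%s = [%s]" % (interface, quoted))
    return "addrs = { %s }" % ", ".join(parts)


print(addrs_line({"eth0": ["10.14.49.50"], "eth1": ["0.0.0.0"]}))
# addrs = { eth0 = ["10.14.49.50"], eth1 = ["0.0.0.0"] }
```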

The Module Section

In the modules section, you can enable, disable, and configure various features and tools used by dionaea. Of particular interest are two of its subsections, ihandlers and services. Their default settings are shown in the following code:

ihandlers = {
    handlers = ["ftpdownload",
        "tftpdownload",
        "emuprofile",
        "cmdshell",
        "store",
        "uniquedownload",
        "logsql",
//      "logxmpp",
//      "p0f",
//      "surfids"
    ]
}

services = {
    serve = ["http",
        "https",
        "tftp",
        "ftp",
        "mirror",
        "smb",
        "epmap"]
}

Dionaea can log to an SQLite database through the logsql handler, which is enabled by default. If you do not want to store your sensor’s activity in an SQLite database, you can comment out that line. You will learn to use the logxmpp and p0f handlers in Recipes 2-6 and 2-8, respectively. As for the services section, you may want to consider removing several of the listed services, such as http, https, and ftp. Use the following information to decide whether to disable any of dionaea’s services.

· smb and epmap: Essential to collecting malware with dionaea, because most of the malware you will see arrives through attacks against the smb and epmap services.

· tftp: Functions as a TFTP server that accepts arbitrary file transfers and also detects attempts to exploit vulnerabilities against the TFTP service.

· http and https: Act as a web server and serve files from $DIONAEA_HOME/var/dionaea/wwwroot/.

· ftp: Permits all logins and captures files should someone choose to upload them. We recommend disabling this service as it does not currently have exploit detection and turning your machine into a file server for the Internet can be dangerous.

If you choose to disable any services, you can delete the service’s name from the configuration or place a comment (//) to the left of the name. We recommend using comments so you don’t forget the service names if you ever want to re-enable them.

Running dionaea

To start dionaea, execute the following command:

$ sudo ./dionaea -u nobody -g nogroup \
    -p /opt/dionaea/var/dionaea.pid -D
Dionaea Version 0.1.0
Compiled on Linux/x86 at Jul 10 2010 13:03:11 with gcc 4.4.3
Started on s1.mac running Linux/i686 release 2.6.32-22-generic-pae
[12072010 22:26:12] dionaea dionaea.c:238: User nobody has uid 65534
[12072010 22:26:12] dionaea dionaea.c:257: Group nogroup has gid 65534

Dionaea is now running and will interact with attacks as they occur. The next recipes show what you can do with the samples after you collect them.

3 http://dionaea.carnivore.it/#compiling

Recipe 2-5: Accepting Dionaea Submissions over HTTP with Python

You can find supporting material for this recipe on the companion DVD.

As mentioned earlier, by default, dionaea is set up to submit samples it receives to three different sandbox systems. However, you can configure dionaea to submit files to any URL that you want. This recipe assumes that you’ve read and followed the same steps described in Recipe 2-3 to set up the wwwhoney Python web server supplied on the book’s DVD. The code that follows shows the contents of dionaea.py, which handles submissions from dionaea.

#!/usr/bin/python
import sys
import cgi
import hashlib
from libhoney import *

form = cgi.FieldStorage()
if not form:
    sys.exit()

(data, filename) = getFile(form, "upfile")
printHeader()

# error if there's no file
if not data or not filename:
    sys.exit()

# if the file already exists, we don't want it again
md5 = hashlib.md5(data).hexdigest()
if fileExists(md5):
    sys.exit()
else:
    storeFile(data, md5)

This script takes binary submissions from the dionaea sensors, checks if the file exists in your collection, and if not, saves the file to the ./binaries/ directory. To configure dionaea to play its role in the setup, you can add the following configuration to your dionaea.conf:

Malware_Analysts_Cookbook =
{
    urls = ["http://192.168.1.100:9000/cgi-bin/dionaea.py"]
    email = "malware@cook.book"
    user = "malware"
    pass = "cookbook"
}

You, of course, need to modify the URL to point to your own server and only need to supply a username and password if you are protecting access to the URL with basic authentication. Once this is set up, you can point any number of dionaea sensors to your server and collect malware binaries in a central location.
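Before pointing sensors at the server, you may want to confirm that dionaea.py accepts uploads. The following sketch posts a file as multipart/form-data in the upfile field that dionaea.py reads, using only the Python 3 standard library. It is our own test client, not part of dionaea or the DVD, and it does not send basic-authentication credentials.

```python
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Encode one file field as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n" % (boundary, field, filename)
    ).encode("ascii")
    tail = ("\r\n--%s--\r\n" % boundary).encode("ascii")
    return head + data + tail, "multipart/form-data; boundary=%s" % boundary


def submit_sample(url, filename, data, field="upfile"):
    """POST a sample to a wwwhoney-style CGI handler and return the response body."""
    body, content_type = build_multipart(field, filename, data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Call `submit_sample()` with the same URL you configured in dionaea.conf and a harmless test file. If the handler works, a file named after the test file's MD5 hash should appear in the wwwhoney binaries directory.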

Recipe 2-6: Real-time Event Notification and Binary Sharing with XMPP

One of the most interesting and innovative modules that comes with dionaea is the Extensible Messaging and Presence Protocol (XMPP) module, which you can use for real-time communications. If you have ever used a Jabber server or Google Talk, you have used XMPP. But dionaea takes real-time communication and binary sharing to a whole new level with its XMPP module. Instead of just logging information to chat channels, dionaea shares the binaries it has received with other clients on the channel. This gives you the power of distributed malware collection if you have friends or relationships with companies who also use dionaea.

Configuring Dionaea to Use XMPP

If you plan to use XMPP, you first need access to an instant messaging server that supports the Jabber/XMPP protocols. The developers of dionaea use a modified version of Prosody,⁴ and it may also be possible to use ejabberd.⁵ Regardless of which software you choose, it is a good idea to use a server that was specifically set up for honeypot activity. Public servers may not permit the volume of data and the file sizes involved, and you could be banned or removed from the server for abuse. You can read more about XMPP on the dionaea developer blog.⁶

For dionaea to use the XMPP module, you first need to enable logxmpp in the ihandlers section of dionaea.conf. The default configuration is set to use the developer’s Prosody server and share binaries anonymously with other clients. This means that identifying host information is removed when data is sent to the chat rooms. The amount of information shared is configurable from within dionaea.conf in the logxmpp section under the events directive.

Logging Attack Data from an XMPP Channel

To log attack data from an XMPP channel, you can use the Python script at $DIONAEA_HOME/modules/python/util/xmpp/pg_backend.py. It logs into the specified XMPP server and parses all the XML messages sent to the chat rooms that you join. These XML messages contain attack information and the malicious binaries seen by the dionaea sensors. When you use pg_backend.py, you can provide a path where binary files should be saved. If you supply database credentials, all attack activity from the various sensors can be logged to a central database. The following command shows the syntax for joining two channels, logging data to a database, and storing binary files in the /tmp directory.

$ python pg_backend.py -U username -P password \
    -M server -C anon-files \
    -C anon-events -d database \
    -u db_user -p db_pass -f /tmp/

Table 2-2 provides a quick explanation of the switches.

Table 2-2: Options for pg_backend.py

Switch    Description
-U        Chatroom username
-P        Chatroom password
-M        XMPP server address
-C        Multi-user chatroom to join
-d        Database
-u        Database username
-p        Database password
-f        File path where binaries will be saved

4 http://prosody.im/

5 http://www.ejabberd.im/

6 http://carnivore.it/2010/01/26/xmpp_-_basics

Recipe 2-7: Analyzing and Replaying Attacks Logged by Dionaea

Dionaea makes use of something the developers call bi-directional streams, or bistreams. Bistreams provide an easy way to retransmit data previously sent to your honeypot, in a manner similar to the tcpreplay tool.⁷ You can leverage bistreams to replay an attack against a target server (your honeypot or any other system) for testing or troubleshooting purposes. Taking it a step further, you can modify bistreams to check whether other inputs lead to exploitable conditions, and perhaps create a Metasploit module from your findings.

To create bistreams, dionaea records all attacks and stores the payloads of the incoming and outgoing packets as a list of Python tuples. The first element of each tuple is the direction ('in' or 'out'), and the second is the data that was sent or received. For example, if a remote machine sent the NULL-terminated string 'hello' to your honeypot and the honeypot responded with 'goodbye', the conversation would be represented like this:

stream = [ ('in', b'hello\x00'), ('out', b'goodbye\x00'), ]

The previous line of code is saved in a Python file named according to the date, the service (such as smb, epmap, http) that handled the traffic, and the remote system’s IP address. Once you determine which file contains the attack data that you want to replay, use the Python script at $DIONAEA_HOME/modules/python/util/retry.py. The following command shows an example of replaying the traffic sent from 99.60.24.198 to your honeypot.

$ ./retry.py -sr -H localhost -p 445 -f smb-99.60.24.198\:4997-LAUhvL.py

doing smb-99.60.24.198:4997-LAUhvL.py

recv 89 of 89 bytes

recv 142 of 142 bytes

recv 142 of 142 bytes

recv 50 of 50 bytes

recv 139 of 139 bytes

recv 128 of 128 bytes

recv 84 of 84 bytes
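The idea behind a replay is simple enough to sketch. The following Python is not retry.py's implementation; it is a minimal illustration, assuming a bistream has already been loaded as a list of ('in', bytes) and ('out', bytes) tuples like the example earlier in this recipe. It sends each recorded 'in' payload and then waits for as many bytes as the honeypot originally sent back, printing progress in the same "recv N of M bytes" style. The function names replay and recv_exactly are hypothetical.

```python
import socket


def recv_exactly(sock, n, timeout=5.0):
    """Read up to n bytes, stopping early on timeout or when the peer closes."""
    sock.settimeout(timeout)
    chunks, received = [], 0
    while received < n:
        try:
            chunk = sock.recv(n - received)
        except socket.timeout:
            break
        if not chunk:  # peer closed the connection
            break
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


def replay(sock, stream, timeout=5.0):
    """Send every 'in' payload and read back as many bytes as each 'out' entry recorded."""
    results = []
    for direction, data in stream:
        if direction == "in":
            sock.sendall(data)
        elif direction == "out":
            got = recv_exactly(sock, len(data), timeout)
            print(f"recv {len(got)} of {len(data)} bytes")
            results.append((len(got), len(data)))
    return results
```

To use it against your own honeypot, open a connection first, for example `with socket.create_connection(("localhost", 445)) as sock: replay(sock, stream)`. Only point it at systems you control.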

If you replay an attack against your dionaea server, the results and activity are logged along with everything else. You can navigate to the bistreams directory and obtain a copy of the replay attack as dionaea sees it. Here’s how you verify that your honeypot received the replay traffic:

$ ls -l |grep 127.0.0.1

-rw------- 1 nobody nogroup 10291 2010-07-12 01:52 smb-127.0.0.1:48060-eaNqUN.py
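Because each bistream file is Python source, you can inspect it programmatically instead of opening it in an editor. The following sketch assumes the file contains a top-level assignment named stream, like the example earlier in this recipe; load_bistream and summarize are hypothetical names. It uses ast.literal_eval so that nothing in the file is executed as code. As the ls output shows, the files are readable only by nobody, so you may need sudo to copy them first.

```python
import ast


def load_bistream(source):
    """Return the `stream` list from bistream source text without executing it.

    Assumes a top-level assignment `stream = [...]` of (direction, bytes) tuples.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
                isinstance(target, ast.Name) and target.id == "stream"
                for target in node.targets):
            # literal_eval accepts only Python literals, never function calls
            return ast.literal_eval(node.value)
    raise ValueError("no top-level 'stream' assignment found")


def summarize(stream):
    """Return {direction: [message_count, total_bytes]} for 'in' and 'out'."""
    summary = {"in": [0, 0], "out": [0, 0]}
    for direction, data in stream:
        summary[direction][0] += 1
        summary[direction][1] += len(data)
    return summary
```

For example, `summarize(load_bistream(open("smb-127.0.0.1:48060-eaNqUN.py").read()))` reports how many messages and bytes went in each direction during the replayed session.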

In practice, replaying an attack against your own dionaea server serves little purpose. It is more useful to test the attack against a Windows VM that you have patched. For example, if you notice a new attack, you can replay it against a fully patched system to test whether it exploits a possible 0-day vulnerability. As previously mentioned, you can also edit the data in a bistream with a text editor and then replay a variation of the original attack.
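Editing by hand works for small changes, but producing many variations is easier in code. The following is a minimal sketch with hypothetical function names: mutate_stream replaces a byte sequence in every 'in' payload, leaving recorded 'out' data untouched, and dump_bistream writes the result back as a stream = [...] file. It assumes retry.py will load any file that defines stream the same way the recorded files do; check the script before relying on that.

```python
def mutate_stream(stream, old, new):
    """Return a copy of the stream with `old` replaced by `new` in each 'in' payload."""
    return [(direction, data.replace(old, new) if direction == "in" else data)
            for direction, data in stream]


def dump_bistream(stream):
    """Serialize a stream as Python source of the form `stream = [...]`."""
    lines = ["stream = ["]
    for direction, data in stream:
        # repr() of str and bytes objects produces valid Python literals
        lines.append(f"  ({direction!r}, {data!r}),")
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = [("in", b"hello\x00"), ("out", b"goodbye\x00")]
    # Example variation: a much longer first message from the "attacker"
    variant = mutate_stream(original, b"hello", b"A" * 500)
    print(dump_bistream(variant))
```

Saving the output to a new file and passing it to retry.py with -f replays the variation instead of the original.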

7 http://tcpreplay.synfin.net/

Recipe 2-8: Passive Identification of Remote Systems with p0f

Dionaea supports integration with p0f 8, a passive operating system identification tool. While not essential to analyzing malware, p0f can identify the operating system (e.g., Windows, Linux), version (e.g., 2000, XP, Vista), service pack, and link type of the systems probing your honeypot. To get started, install p0f using the following command:

$ sudo apt-get install p0f

Next, enable p0f in dionaea.conf by uncommenting both p0f and logsql in the ihandlers section (logsql is required because dionaea logs p0f results to an SQLite database). By default, dionaea reads the data collected by p0f through a Unix domain socket (used for inter-process communication) at /tmp/p0f.sock. You can change this path, as long as dionaea.conf and the p0f command line use the same one. To start p0f so that dionaea can use it, run the following command:

$ sudo p0f -i any -u root -Q /tmp/p0f.sock -q -l -d -o /dev/null \
-c 1024

Table 2-3 provides an explanation of the switches.

Table 2-3: p0f Switches

Switch            Description
-i any            The interface to listen on, such as eth0 or eth1, or any to listen on all available interfaces.
-u root           Chroots and setuids to the specified user (root in this example).
-Q /tmp/p0f.sock  Creates a Unix domain socket with the specified name.
-q                Does not display a banner.
-l                Uses single-line output.
-d                Runs p0f as a daemon.
-o /dev/null      Sends all output to /dev/null.
-c 1024           Sets the cache size used with -Q.

This starts p0f as a daemon and makes it available for dionaea to use. You then need to change the ownership of the socket so that the account dionaea runs under can read it. If you are running dionaea as the user nobody, make the following change:

$ sudo chown nobody:nogroup /tmp/p0f.sock
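Before restarting dionaea, you may want to confirm that the socket exists and has the ownership you expect. This minimal Python sketch assumes the default /tmp/p0f.sock path and the nobody account used above; check_p0f_socket is a hypothetical name, and the code is Unix-only because it relies on the pwd module.

```python
import os
import pwd
import stat


def check_p0f_socket(path="/tmp/p0f.sock", expected_owner="nobody"):
    """Return a list of problems with the p0f query socket; empty means it looks fine."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist; is p0f running with -Q {path}?"]
    problems = []
    if not stat.S_ISSOCK(st.st_mode):
        problems.append(f"{path} is not a Unix domain socket")
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    if owner != expected_owner:
        problems.append(f"{path} is owned by {owner}, expected {expected_owner}")
    return problems


if __name__ == "__main__":
    for problem in check_p0f_socket():
        print("WARNING:", problem)
```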

You must start (or restart) dionaea for the p0f module to initialize. Once your honeypot begins receiving probes and attacks, you can use the following commands to verify that p0f logging is working properly:

$ sqlite3 /opt/dionaea/var/dionaea/logsql.sqlite

sqlite> select p0f,p0f_genre,p0f_link,p0f_detail from p0fs limit 10;

1|Windows|ethernet/modem|2000 SP4, XP SP1+

2|Windows|IPv6/IPIP|2000 SP4, XP SP1+

3|Windows|ethernet/modem|2000 SP4, XP SP1+

4|Windows|ethernet/modem|2000 SP4, XP SP1+

5|Windows|IPv6/IPIP|2000 SP4, XP SP1+

6|Windows|IPv6/IPIP|2000 SP4, XP SP1+

7|Windows|pppoe (DSL)|XP/2000 (RFC1323+, w+, tstamp+)

8|Windows|ethernet/modem|XP SP1+, 2000 SP3

9|Windows|ethernet/modem|2000 SP4, XP SP1+

10|Windows|IPv6/IPIP|2000 SP4, XP SP1+

As you can see, the first ten probes of our honeypot all came from Windows systems running 2000 or XP. This is not surprising, but once you collect data for a while, the statistics may become more meaningful. Keep in mind that p0f results are not guaranteed to be accurate, because some tools can disguise a machine's network stack.
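Rather than scanning rows by eye, you can let SQLite aggregate them. The following Python sketch counts p0fs rows per operating system genre or link type using the standard sqlite3 module. The helper names are hypothetical, and sample_connection builds a simplified in-memory table (an assumed schema containing only the queried columns) loaded with the ten rows shown above; for real data, connect to /opt/dionaea/var/dionaea/logsql.sqlite instead.

```python
import sqlite3

# Columns that may be grouped on; a whitelist keeps arbitrary input out of the SQL text
GROUPABLE = {"p0f_genre", "p0f_link", "p0f_detail"}


def count_by(conn, column):
    """Count p0fs rows per distinct value of `column`, most common first."""
    if column not in GROUPABLE:
        raise ValueError(f"unsupported column: {column}")
    sql = f"SELECT {column}, count(*) FROM p0fs GROUP BY {column} ORDER BY count(*) DESC"
    return dict(conn.execute(sql).fetchall())


def sample_connection():
    """In-memory stand-in for logsql.sqlite holding the ten rows shown above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE p0fs (p0f INTEGER PRIMARY KEY, p0f_genre TEXT, "
                 "p0f_link TEXT, p0f_detail TEXT)")
    links = ["ethernet/modem", "IPv6/IPIP", "ethernet/modem", "ethernet/modem",
             "IPv6/IPIP", "IPv6/IPIP", "pppoe (DSL)", "ethernet/modem",
             "ethernet/modem", "IPv6/IPIP"]
    conn.executemany("INSERT INTO p0fs (p0f_genre, p0f_link) VALUES ('Windows', ?)",
                     [(link,) for link in links])
    return conn


if __name__ == "__main__":
    conn = sample_connection()
    # For real data: conn = sqlite3.connect("/opt/dionaea/var/dionaea/logsql.sqlite")
    print(count_by(conn, "p0f_genre"))  # {'Windows': 10}
    print(count_by(conn, "p0f_link"))   # {'ethernet/modem': 5, 'IPv6/IPIP': 4, 'pppoe (DSL)': 1}
```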

8 http://lcamtuf.coredump.cx/p0f.shtml

Recipe 2-9: Graphing Dionaea Attack Patterns with SQLite and Gnuplot

If you enable logsql so that dionaea stores its activity in an SQLite database, you may want to plot the data in a graph. This recipe shows how to use gnuplot9 to generate graphs from dionaea's SQLite database. In December 2009, the dionaea development team posted two fairly large databases, named berlin and paris,10 which contain a large amount of attack data. This recipe uses the berlin database for plotting, so you can download it and follow the exact steps outlined here.

Berlin and Paris Details

The following list shows details about berlin:

· Contains one month of data (November 5–December 7, 2009)

· Contains 600,000 recorded attacks that resulted in 2,700 binary downloads

· Does not contain attacks by Conficker nodes (IP not in scan range)

· Includes p0f logging

The following list shows details about paris:

· Contains just over a week of data (November 29–December 7, 2009)

· Contains 7.8 million recorded attacks that resulted in 750,000 binary downloads

· Contains large amounts of Conficker traffic

Generating Graphs with gnuplot

To generate graphs from a dionaea database, follow these steps:

1. Download the berlin database from the location specified in the following command. Alternatively, you can use paris or a database created by your own dionaea sensors.

$ wget ftp://ftp.carnivore.it/projects/dionaea/rawdata/\
berlin-20091207-logsql.sqlite.bz2 --no-passive-ftp

$ bunzip2 berlin-20091207-logsql.sqlite.bz2

The ftp.carnivore.it site uses active FTP, so you need to add the --no-passive-ftp flag when using wget.

2. Create an SQL query that retrieves the information you're interested in. The following query obtains the number of binary downloads and attacks for each day in the database. Save this query to a file called query.sql.

SELECT

strftime('%Y-%m-%d',connection_timestamp,'unixepoch',

'localtime')AS date,

count(DISTINCT downloads),

count(DISTINCT connections.connection)

FROM

connections

LEFT OUTER JOIN downloads ON (downloads.connection ==

connections.connection)

GROUP BY

strftime('%Y-%m-%d',connection_timestamp,'unixepoch',

'localtime')

ORDER BY

date ASC;

3. Execute the query against your target database and save the output to a text file.

$ sqlite3 berlin-20091207-logsql.sqlite

sqlite> .output data.txt

sqlite> .read query.sql

4. Exit SQLite by pressing Ctrl+D. Your data.txt file should look like the following:

$ cat data.txt

2009-11-05|80|5290

2009-11-06|62|5893

2009-11-07|73|4904

2009-11-08|92|7366

2009-11-09|76|5882

2009-11-10|94|5947

2009-11-11|65|5121

2009-11-12|59|5618

2009-11-13|56|4217

2009-11-14|53|3423

2009-11-15|51|4276

2009-11-16|69|4779

2009-11-17|83|8327

2009-11-18|69|13719

2009-11-19|362|148790

2009-11-20|3|229618

2009-11-21|9|3324

2009-11-22|75|8308

2009-11-23|68|7936

2009-11-24|87|9503

2009-11-25|114|9823

2009-11-26|87|7769

2009-11-27|114|9168

2009-11-28|141|9420

2009-11-29|63|4919

2009-11-30|95|12034

2009-12-01|65|12383

2009-12-02|79|8373

2009-12-03|77|7597

2009-12-04|112|8263

2009-12-05|96|10438

2009-12-06|81|9846

2009-12-07|16|1927

A pipe separates the columns. The first column is the date of the activity. The second column is the number of binaries that were downloaded on the corresponding date. The third column is the number of attacks that were observed on the corresponding date (not every attack results in a downloaded file).

5. Create a graph from the data using gnuplot. The following commands show how to install gnuplot on your Ubuntu system and then how to set the parameters of the graph.

$ sudo apt-get install gnuplot

$ gnuplot

gnuplot> set terminal png size 750,210 nocrop butt font "/usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf" 8

Terminal type set to 'png'

Options are 'nocrop font /usr/share/fonts/truetype/ttf-liberation/LiberationSans-Regular.ttf 8 butt size 750,210 '

gnuplot> set output "berlin.png"

gnuplot> set xdata time

gnuplot> set timefmt "%Y-%m-%d"

gnuplot> set format x "%b %d"

gnuplot> set ylabel "binaries"

gnuplot> set y2label "attacks"

gnuplot> set y2tics

gnuplot> set datafile separator "|"

gnuplot> plot "data.txt" using 1:2 title "binaries" with lines, \
"data.txt" using 1:3 title "attacks" with lines axes x1y2

You should now have a PNG file called berlin.png in your current working directory with data plotted on it that looks like Figure 2-2.

Figure 2-2: Attacks and binaries from the berlin database


The graph shows the number of attacks as a dotted line, plotted against the Y-axis on the right, and the number of downloaded binaries as a solid line, plotted against the Y-axis on the left. For the most part, the number of downloaded binaries rises and falls with the number of attacks, which makes sense. One notable exception is November 20, when attacks peaked at 229,618 but only 3 binaries were downloaded.
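Reading a trend off a picture is easy to get wrong, so it can be worth checking specific days in data.txt directly. This Python sketch parses the date|binaries|attacks lines shown in step 4, finds the day with the most attacks, and computes binaries per 1,000 attacks. The names DayStats, parse_data, busiest_day, and binaries_per_1000_attacks are illustrative and not part of dionaea.

```python
from typing import NamedTuple


class DayStats(NamedTuple):
    date: str
    binaries: int
    attacks: int


def parse_data(text):
    """Parse sqlite3 list-mode output lines of the form date|binaries|attacks."""
    days = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        date, binaries, attacks = line.split("|")
        days.append(DayStats(date, int(binaries), int(attacks)))
    return days


def busiest_day(days):
    """Return the day with the highest number of attacks."""
    return max(days, key=lambda day: day.attacks)


def binaries_per_1000_attacks(day):
    """Downloads normalized by attack volume; 0.0 for days without attacks."""
    return 1000 * day.binaries / day.attacks if day.attacks else 0.0


if __name__ == "__main__":
    sample = "2009-11-18|69|13719\n2009-11-19|362|148790\n2009-11-20|3|229618\n2009-11-21|9|3324\n"
    days = parse_data(sample)
    print(busiest_day(days).date)  # 2009-11-20
    for day in days:
        print(day.date, round(binaries_per_1000_attacks(day), 2))
```

To analyze the full file, pass it in with `parse_data(open("data.txt").read())`.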

This is just one example of what you can do with the data from the dionaea database. You can write new queries and create all kinds of graphs from different data sets in the database. You can also learn more about gnuplot's features from its website and from other tutorials on the Internet to create more advanced plots.
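If you plan to iterate on several queries, you may prefer to script steps 3 and 4 rather than typing .output and .read each time. The following Python sketch uses the standard sqlite3 module to run a saved query and write rows in the same pipe-separated form that gnuplot reads with set datafile separator "|". The function names are hypothetical, and the sketch assumes the query file holds a single statement, because sqlite3's execute() runs only one statement at a time.

```python
import sqlite3


def export_pipe_delimited(conn, sql, out):
    """Run one SQL statement and write each row as pipe-separated text; return the row count.

    Mirrors the sqlite3 shell's default list mode, which prints NULL as an empty field.
    """
    count = 0
    for row in conn.execute(sql):
        out.write("|".join("" if value is None else str(value) for value in row) + "\n")
        count += 1
    return count


def export_query_file(db_path, sql_path, out_path):
    """Run the query stored in sql_path against db_path and save the rows to out_path."""
    with open(sql_path) as f:
        sql = f.read()
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w") as out:
            return export_pipe_delimited(conn, sql, out)
    finally:
        conn.close()
```

For the berlin example, `export_query_file("berlin-20091207-logsql.sqlite", "query.sql", "data.txt")` should produce a data.txt file in the same format as step 4.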

9 http://www.gnuplot.info/

10 http://carnivore.it/2009/12/08/post_it_yourself