A Primer on Detection for Security - How to Defeat Advanced Malware: New Tools for Protection and Forensics (2015)

Chapter 1. A Primer on Detection for Security

Abstract

The security industry has relied for years on endpoint protection software that aims to detect specific behavioral patterns – signatures – of malware in order to protect IT systems. However, against today’s rapidly evolving, highly tailored malware, it has been shown to be impossible to build a useful signature-based detector for polymorphic malware.

Keywords

malware

polymorphic malware

endpoint protection

endpoint protection industry (EPP)

ROC curve

The security industry has relied for years on endpoint protection software that aims to detect specific behavioral patterns – signatures – of malware in order to protect a system under attack. Most signatures today attempt to capture the key behavioral patterns of all variants of a particular exploit or class of malware. McAfee now reports identifying more than 75,000 unique variants of malware per day, most of which are slight variations on a few successful attacks against a single vulnerability. If the pattern can be captured accurately, a single signature can deal with many variants. This approach is essential because signature volume is already a burden: the average “.dat” signature file measures 100 MB in size, and with thousands of signatures being added every day (Symantec¹ created more than 10 million unique signatures in 2010), distributing signatures to endpoints has become a severe problem, with the net result that PCs can remain unprotected for a long time.

All detectors must be evaluated for accuracy against four key metrics, namely (for a given set of samples) the proportion of {True Positive, True Negative, False Positive, False Negative} results that the detector produces. The meaning of these is straightforward:

TPF: The frequency of samples that contained an attack and were correctly identified

TNF: The frequency of samples that did not contain an attack and were correctly not flagged

FPF: The frequency of samples that did not contain an attack but were incorrectly identified as containing one, and

FNF: The frequency of samples that contained a real attack that was not identified.
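The four fractions above can be computed directly from a labeled sample set. The following is a minimal sketch, not a reference implementation: the function name `confusion_fractions` and the ten-sample data set are hypothetical, and each fraction is taken over all samples so that the four values sum to 1.

```python
from collections import Counter


def confusion_fractions(labels, alarms):
    """Return TPF, TNF, FPF and FNF as fractions of all samples.

    labels: True where the sample really contains an attack.
    alarms: True where the detector flagged the sample.
    """
    labels = list(labels)
    alarms = list(alarms)
    if not labels or len(labels) != len(alarms):
        raise ValueError("labels and alarms must be non-empty and of equal length")

    # Count each (truth, verdict) pair once.
    counts = Counter(zip(labels, alarms))
    total = len(labels)
    return {
        "TPF": counts[(True, True)] / total,    # attack, flagged
        "TNF": counts[(False, False)] / total,  # no attack, not flagged
        "FPF": counts[(False, True)] / total,   # no attack, flagged
        "FNF": counts[(True, False)] / total,   # attack, missed
    }


# Hypothetical sample set: 10 samples, 3 of which are real attacks.
labels = [True, True, True, False, False, False, False, False, False, False]
alarms = [True, True, False, True, False, False, False, False, False, False]

fractions = confusion_fractions(labels, alarms)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the detector catches two of the three attacks, misses one, and raises one false alarm on benign input.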

The ROC (receiver operating characteristic) curve and the four fractions listed above can be shown graphically as the areas of intersection of two statistical distributions. The distributions plot the output value of the detector (e.g., its degree of suspicion that a particular event is a real attack) for nonattack traffic and for actual attacks. An example ROC curve is shown below.

[Figure: example ROC curve]

Every detector has a threshold at which it will trigger an alarm, and setting the threshold is critical to the utility of the detector in practice. The key is the ability of the detector to separate real attacks from normal traffic. A better detector separates the two curves more cleanly, leaving less overlap. The challenge is to accurately detect attacks given the enormous number of slight variations in malware that can be easily generated by an attacker, without increasing the False Positive or False Negative frequencies to the point that the detector is not useful.
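The effect of the threshold and of the overlap between the two distributions can be illustrated with synthetic scores. This is a sketch under stated assumptions: the Gaussian score distributions, their means, and the helper names `rates_at_threshold` and `roc_points` are all hypothetical, chosen only to make the overlap visible. Rates here are normalized per class (benign or attack), as is conventional when plotting a ROC curve, rather than over all samples.

```python
import random


def rates_at_threshold(benign_scores, attack_scores, threshold):
    """Alarm when score >= threshold; return (false positive rate, true positive rate)."""
    fpr = sum(s >= threshold for s in benign_scores) / len(benign_scores)
    tpr = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    return fpr, tpr


def roc_points(benign_scores, attack_scores, thresholds):
    """Sweep the threshold and collect one (FPR, TPR) point per setting."""
    return [rates_at_threshold(benign_scores, attack_scores, t) for t in thresholds]


# Assumption: detector scores are Gaussian. Real detectors need not behave this way.
rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(5000)]
well_separated = [rng.gauss(4.0, 1.0) for _ in range(5000)]  # little overlap with benign
overlapping = [rng.gauss(1.0, 1.0) for _ in range(5000)]     # heavy overlap with benign

thresholds = [t / 2 for t in range(-8, 17)]  # -4.0 to 8.0 in steps of 0.5

for name, attacks in [("well separated", well_separated), ("overlapping", overlapping)]:
    fpr, tpr = rates_at_threshold(benign, attacks, 2.0)
    print(f"{name}: threshold=2.0 FPR={fpr:.3f} TPR={tpr:.3f}")
    curve = roc_points(benign, attacks, thresholds)
    print(f"  {len(curve)} ROC points from {curve[0]} to {curve[-1]}")
```

At the same threshold both detectors raise the same share of false alarms, but the detector whose attack scores overlap the benign scores misses most attacks. Lowering its threshold recovers those attacks only at the price of more False Positives.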

It is important to understand that:

1. No detector is perfect. When a detector fails (False Negative), the attacker will succeed.

2. Tuning a detector is a careful balance of trading off False Positives (which train users/IT teams to ignore alarms) against False Negatives (which in turn allow attackers to successfully avoid detection), and doing so requires careful analysis by experts, and a large, relevant data set to check against.

3. Unfortunately, today’s rapidly moving front of highly tailored malware adapts fast, leaves no time for human assessment, and makes the historical attack data sets used to tune detectors significantly less useful.

4. It has been proven that it is impossible to build a useful signature-based detector for polymorphic malware: “The challenge of signature-based detection is to model a space on the order of O(2^{8n}) signatures to catch attacks hidden by polymorphism. To cover thirty-byte decoders requires O(2^{240}) potential signatures; for comparison there exist an estimated 2^{80} atoms in the universe.”²
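The arithmetic behind the quoted figures is easy to check: an n-byte decoder has 8n bits, so there are 2^{8n} possible byte strings of that length. The sketch below reproduces the numbers for n = 30; the function name `signature_space` is hypothetical, and the atom count is simply the figure quoted in the text.

```python
def signature_space(decoder_bytes):
    """Number of distinct byte strings of the given length: 2^(8n)."""
    return 2 ** (8 * decoder_bytes)


space = signature_space(30)   # thirty-byte decoders, as in the quotation
atoms = 2 ** 80               # comparison figure as quoted in the text
ratio = space // atoms        # how many times larger the signature space is

print(f"exponent for 30 bytes: {8 * 30}")
print(f"signature space: about {float(space):.2e}")
print(f"signature space / quoted atom count = 2^{ratio.bit_length() - 1}")
```

Each additional byte in the decoder multiplies the space by 256, which is why enumerating signatures does not scale.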

1.1. Today’s approach: “compromise-first detection”

The endpoint protection (EPP) industry today relies on classic signature-based attack detection. We call this “compromise-first detection” because the increasing difficulty of differentiating between normal and attacker behavior has resulted in both high False Positives and high False Negatives. This occurs when the detector is unable to sufficiently distinguish between attack and non-attack traffic, causing significant overlap of the two distributions measured by the detector, as shown below. The ratio of the TPF to the FPF is sometimes called the signal-to-noise ratio (SNR). A low SNR loses True Positives in a sea of False Positives, training users and administrators to ignore warnings and wasting the time of security staff.

[Figure: overlapping detector distributions for attack and non-attack traffic]
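To see what a low SNR means for the people who read the alarms, the ratio can be turned into the share of alarms that correspond to real attacks. This is a minimal sketch: the function names and the example fractions are hypothetical, and TPF and FPF are treated as fractions of all samples, following the definitions earlier in the chapter.

```python
def signal_to_noise(tpf, fpf):
    """SNR as described in the text: the ratio of TPF to FPF."""
    if fpf == 0:
        return float("inf")  # no false alarms at all
    return tpf / fpf


def real_alarm_share(tpf, fpf):
    """Fraction of all alarms that correspond to a real attack."""
    alarms = tpf + fpf
    if alarms == 0:
        raise ValueError("the detector raised no alarms")
    return tpf / alarms


# Hypothetical detector: 0.1% of samples are correctly flagged attacks,
# while 2% of samples are benign but flagged anyway.
tpf, fpf = 0.001, 0.02

snr = signal_to_noise(tpf, fpf)
share = real_alarm_share(tpf, fpf)
print(f"SNR = {snr:.3f}")
print(f"share of alarms that are real attacks = {share:.1%}")
```

With these made-up numbers fewer than one alarm in twenty is real, which is the situation in which users and administrators learn to ignore warnings.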

As a result, the EPP industry has come to rely heavily on detectors that are sufficiently accurate only when they detect malware at the point it actually compromises the system, for example, when it overwrites a key Windows system dynamic-link library (DLL) or registry entry, or persists a file with a known-bad signature. Unfortunately, at this point, the system has already been compromised and must at the very least be reimaged, incurring costs to IT and downtime for users. Worse still, sophisticated attacks are crafted to take advantage of an exploit immediately, so with this type of detection, by the time the alert has been raised or blocking initiated (such as terminating a connection), the attacker may already have achieved their goal, such as stealing a file or moving deeper into the enterprise infrastructure. From the moment an attacker first compromises a single machine, the cost of remediation increases exponentially with time, because the attacker will rapidly penetrate deeper into the enterprise, causing more damage, requiring substantial additional remediation, and exposing more users and data.

Compromise-first detection is problematic. Delays in signature distribution together with detector inaccuracy aid the attacker, and the cost of remediation is high – all systems that might have been penetrated must be reimaged.

Ultimately, EPP vendors face an impossible challenge trading off False Positives versus False Negatives: They lose either way, and so do their customers.


1 Wired Business Media, January 6, 2012, “Symantec Confirms Hackers Accessed Source Code of Two Enterprise Security Products.”

2 On the Infeasibility of Modeling Polymorphic Shellcode, Columbia University.