
Threat Modeling: Designing for Security (2014)

Part III. Managing and Addressing Threats

Chapter 9. Trade-Offs When Addressing Threats

After you create a list of threats, you should consider whether standard approaches will work. It is often faster to do so than to assess the risk trade-offs and the variety of ways you might deal with the problem. Of course, it's helpful to understand that there are ways to manage risks other than the tactics and technologies you learned about in Chapter 8, “Defensive Tactics and Technologies,” and those more complex approaches are the subject of this chapter.

For each threat in your list, you need to make one or more decisions. The first decision is your strategy: Should you accept the risk, address it, avoid it, or transfer it? If you're going to address it, you must next decide when, and then how? There are a variety of ways to think about when to address the threat. Table 9.1 provides an example to make these choices appear more concrete and to help separate them:

Table 9.1 Sample Risk Approach Tracking Table

Item # | Threat | Why Not Use Standard Mitigation? | How Addressed
1 | Physical tampering | We don't own the hardware. | Document on website

This may seem like a lot of things to do for every threat, but the first approach to fixing most issues is to try to apply standard mitigations, and only look for an alternative when that fails.

This chapter first teaches you about risk management, in the sense of avoiding, addressing, accepting, or transferring risks. You'll learn how to apply risk management to software design. From there, you'll learn about a variety of threat-specific prioritization approaches, ranging from the simple to the complex. The chapter also covers risk acceptance, and closes with a brief discussion of arms races in mitigation strategies.

Classic Strategies for Risk Management

As you consider each threat, you should make three decisions. Logically, these can be ordered as follows:

1. What's the level of risk?

2. What do you want to do to address that risk?

3. How are you going to achieve that?

Strategizing in this manner can be expensive. Therefore, when there is an easy way to address a problem, you should skip strategizing and just address it. Not only is it easier, it avoids the possibility that your risk management approach will lead you astray.

When you do find it necessary to strategize, a few classic strategies exist for addressing risks: You can avoid them, address them, accept them, or transfer them. You can also ignore risks; that option, while not recommended, is covered at the end of this section.

Avoiding Risks

Avoiding risks is a great approach to the extent that you can do so. A good risk assessment can help you determine whether a risk is greater than the potential reward. If it is, then the risk may be worth avoiding. As the saying goes, “a ship in the harbor is safe, but that is not what ships are for.” So how do you avoid a risk? You don't build the feature, or you change the design sufficiently that the risk disappears.

Addressing Risks

Addressing risks is also a perfectly valid approach. The main ways you do so are via design changes (such as adding cryptography) or operational processes. Design and operational changes were covered in Chapter 8.

Accepting Risks

You can choose to accept a risk by deciding to accept all the costs of things going wrong. This is easier when you're operating a service than when you're building a product. You can't accept risk on behalf of your users or customers—if there's risk that affects them, that risk is transferred.

Sometimes a threat is real but the probability is very low, or the impact is minor. In these situations, it's probably reasonable to accept the risk. Before doing so, it may be helpful to reassess both the probability and the impact, asking whether there are any factors that would change either (e.g., “do not use Panexa while pregnant or when you might become pregnant”). It can also be illuminating to ask whether you would accept the risk for yourself, if you couldn't hand it to customers or others. Note that the word “illuminating” is used, rather than “useful.”

Risk acceptance differs from “ignore it” or “wait and see” (see the sections on each later in this chapter) insofar as it entails accepting that certain things are real risks. For example, Microsoft treats threats that involve unconstrained access to hardware as non-threats (Culp, 2013), and is explicit about the risk acceptance and the rationale behind that decision.

Transferring Risks

Risks that fall on your customers or end users are transferred risks. Many products bundle some level of risk with them, and use some combination of terms of service, licensing agreements, or user interface to transfer the risk. You should clearly disclose such risks.

Ignoring Risks

A traditional approach to risk in information security is to ignore it. Approaches to measuring risk have been hampered by an orientation toward secrecy and obscurity. Historically, both the occurrence of breaches and information about their impact have generally been kept secret. As a result, calculating frequency and making predictions have been hard, and a de facto strategy of ignoring risks emerged. From an executive standpoint, this strategy was highly effective, if frustrating for security staff.

This approach is becoming less effective as a combination of contracts, lawsuits, and laws increases the risk of ignoring risks. In particular, a variety of new American laws make ignoring information security risks more risky. They include breach disclosure laws, general information security laws, sectoral information security laws, and federal law for public companies. Breach disclosure laws usually do not (directly) regulate information security, but require disclosure when it fails. Some states now have general security laws that apply to any storage of certain types of personal information. Outside the U.S., there may also be requirements to disclose security breaches, especially those that involve a loss of control of personal information, or in certain sectors (such as telecommunications in Europe). The overall regulatory situation around security risks is also rapidly evolving, and disclosures are helping security professionals learn from each other's mistakes and focus attention on important issues.

Finally, if you are threat modeling and create a list of security problems that you decide not to address, please send a copy of the list to the author, care of the publisher. There will be quarterly auctions to sell them to plaintiff's attorneys (or other interested parties). Even if you don't send them to me, they may be revealed by whistleblowers, accidentally shared or disclosed, or discovered as part of a legal action. All in all, it seems that “ignore it” is a riskier proposition than it has been before.

Selecting Mitigations for Risk Management

There are many ways to select mitigations. The following sections describe how to integrate risk management into your decisions about how to mitigate threats. The ways to mitigate cover a spectrum, with risk increasing as you move from changing the design through standard tactics and technologies to designing your own.

Changing the Design

The first way to address risks in a design is to eliminate the features to eliminate the risks. For example, if you have payroll software that is accessible to the entire Internet without authentication, then changing the design so that it's only available on your corporate intranet will reduce the risk. This aligns with the risk-avoidance approach. Unfortunately, carried to its logical conclusion, this leaves you with software that doesn't do anything at all. While that has a Zen simplicity to it, Zen simplicity is usually not a requirement; features are.

This brings you to the second way to address risks: Change the design by adding features that reduce risks. For example, you can redesign the software to add authentication and authorization features.

There are two ways to think about changing designs: iterative and comparative. Iterative means altering a small number of components, with each change intended to reduce the number of components or trust boundaries or otherwise eliminate some threats. The comparative method means coming up with two or more designs, and then comparing them. This would appear to be more expensive, and in the short term it is. However, it is common for iterative design to involve many iterations, so often it is more cost-effective (especially in the early days) to consider several designs and choose between them.

For an example of how one might change a design, consider Figure 9.1, which depicts a threat model for sending single-use login tokens to phones. (These tokens are often called one-time tokens, abbreviated OTT.) There are many advantages to this type of authentication, including the capability to deploy to all sorts of phones, including voice-only phones, “dumb” mobile phones, and smartphones. The downside is a plethora of places where that token is subject to information disclosure.


Figure 9.1 An OTT threat model

Those places include the varied systems responsible for mobile phone roaming and the “femtocell” base stations that telephone companies distribute. They also include things such as Google Voice or iMessage, which put text messages onto the Internet in various ways. (These products are mentioned only as examples, rather than a comment on their security.)

There's a variety of ways to change the design and address the information disclosure threats that apply to Figure 9.1. Simplified versions of ways to address these threats follow. In each, m1 is a message from the server to the client, while m2 is a response.

§ Send a nonce, encrypted to a key held on a smartphone, and then send the decrypted nonce to the authentication server. The server checks that nonce_n is the one that it encrypted for the phone, and if it is, approves the transaction. (m1 = e_phone(nonce_n), m2 = nonce_n).

§ Send a nonce to the smartphone, and then send a signed version of the nonce to the server. The server validates that the signature on nonce_phone is from the expected phone, and that it is a good signature on the expected nonce. If both checks pass, then the server approves the transaction. (m1 = nonce_phone, m2 = sign_phone(nonce_phone)).

§ Send a nonce to the smartphone. The smartphone hashes the nonce with a secret value it shares with the server, and then sends that hash back. (m1 = nonce_phone, m2 = hash_phone(nonce_phone)).

It's important to manage the keys appropriately for each of these methods, and to understand that these are simplified examples; they do not include time stamps, message addressing, and other elements needed to make the system fully secure. Including those here would make it harder to see the ways in which cryptographic building blocks can be applied.
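The third scheme, a keyed hash over a server nonce, can be sketched in a few lines of Python. This is a toy illustration of the message shapes only: the function names are invented, there is no time stamping, addressing, or key management, and the shared key is generated inline rather than provisioned to the phone.

```python
import hashlib
import hmac
import secrets

# In practice this key would be provisioned to both the server and the phone
SHARED_KEY = secrets.token_bytes(32)

def server_challenge() -> bytes:
    """m1: the server sends a fresh random nonce to the phone."""
    return secrets.token_bytes(16)

def phone_response(key: bytes, nonce: bytes) -> bytes:
    """m2: the phone hashes the nonce with the secret it shares with the server."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

def server_verify(key: bytes, nonce: bytes, response: bytes) -> bool:
    """The server recomputes the keyed hash and compares in constant time."""
    expected = hmac.new(key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

nonce = server_challenge()              # m1
m2 = phone_response(SHARED_KEY, nonce)  # m2
print(server_verify(SHARED_KEY, nonce, m2))  # True
```

Even in this toy form, the value of an HMAC library over an ad hoc hash construction shows the earlier point about preferring standard building blocks to inventing your own.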

The key in all design changes is to understand the differences introduced by the changes, and how those changes interact with the software requirements as a whole.


This model, and some ways to address the threats, are worked through in more detail in Appendix E, “Case Studies.”

For another example of comparative threat modeling, consider the two systems shown in Figures 9.2 and 9.3. Figure 9.2 depicts an e-mail system, and Figure 9.3 is a version of Figure 9.2 with a “lawful intercept” module added. (“Lawful intercept” is an Orwellian phrase for “thing which allows people to bypass the security features of your system.” Setting aside any arguments of “should we as a society have such a mechanism?” it's possible to assess the technical security implications of adding such mechanisms.)


Figure 9.2 An e-mail system


Figure 9.3 The same e-mail system with a lawful access module

It should be obvious that Figure 9.2 is more secure than Figure 9.3. Using software-centric modeling, Figure 9.3 adds two data flows and a process; thus, by STRIDE-per-element, it has an additional 12 threats (tampering, information disclosure, and denial of service against each flow, for 6; and the six S, T, R, I, D, and E threats against the process, for a total of 12). Additionally, Figure 9.3 has two apparent groupings of elevation-of-privilege threats: those posed by outsiders and those posed by software-allowed, but human-policy-violating, use. Thus, if Figure 9.2 has a list of threats (1…n), then Figure 9.3 has a list of threats (1…n+14).
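The per-element arithmetic can be checked mechanically. This sketch simply encodes the standard STRIDE-per-element mapping (data flows get tampering, information disclosure, and denial of service; processes get all six) and counts the threats added by the new process and its two data flows; it is bookkeeping, not analysis.

```python
# STRIDE-per-element: which threat types apply to which element kinds
STRIDE_PER_ELEMENT = {
    "process": ["S", "T", "R", "I", "D", "E"],
    "data flow": ["T", "I", "D"],
}

# Figure 9.3 adds the intercept process and its two data flows
added_elements = ["process", "data flow", "data flow"]
added_threats = sum(len(STRIDE_PER_ELEMENT[kind]) for kind in added_elements)
print(added_threats)  # 12 per-element threats, before the two EoP groupings
```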

If instead of software-centric modeling you use attacker-centered modeling on the systems shown in Figures 9.2 and 9.3, you find two sets of threats: First, each law enforcement agency that is authorized to connect adds its employees and IT systems as possible threats, and possible threat vectors. Second, attackers are likely to attack these features of the system to abuse them. The 2010 “Aurora” attacks on Google and others allegedly did exactly this (McMillan, 2010, and Adida, 2013). Thus, by comparing them you can see that the addition of these features creates additional risk. You might also wonder where those risks fall, but that's outside the scope of this example.

More subtly, the addition of the code in Figure 9.3 is an obvious source of security vulnerabilities. As such, it may draw attention and possibly effort away from the rest of the system. Thus, the components that comprise Figure 9.2 are likely to be less secure, even ignoring the threats to the additional components. In the same vein, the requests and implementations for such backdoors may be confidential or classified. If that's the case, the features may not go through normal tracking for implementation, testing, or review, again reducing the odds that they are secure. Of course, because such a system is designed to bypass other security controls, any weaknesses are likely to have outsized impact.

Applying Standard Mitigation Technologies

To the extent that standard approaches will effectively solve a problem, they should be used. Devising new approaches to defense is very simple and lots of fun. Unfortunately, devising effective new approaches, building, and testing them can be very time consuming.


To be fair, testing newly devised mitigation approaches can actually be super-quick sometimes, as the new defense will fall with just a few minutes of expert scrutiny. PGP creator Phil Zimmermann tells the story of bringing the cipher he created for the first version of his popular encryption software to a large gathering of cryptographers, where he saw months of his painstaking work demolished over lunch. After that, PGP switched to using standard ciphers.

There are a number of ways in which you can use standard mitigations, including platform-provided ones, developer-implemented ones, or operational ones.

Platform-Provided Mitigations

Software developers have a number of ways to code mitigations to common threats. Each has pros and cons. That's not to say that they're all equal, or that any one has advantages in every situation. As a general rule, using the defenses in whatever platform you're building on is a good idea for several reasons: First, they often run at a higher trust level than defenses you can build. Second, they're generally well designed and subjected to a high level of scrutiny before you get to them. Related to that, they're often more intensively tested than what you can justify. Finally, they're usually either free or included in the price of the platform.

Developer-Implemented Mitigations

There is an entire set of mitigations that are not platform provided, and must be implemented by developers. Almost all of these can be seen as feature development work: Implement a cryptographic scheme to address a spoofing threat; implement a better logging system to address a repudiation threat, and so on.

An interesting and growing set of software is built and operated by the same organization. This creates an additional class of defensive opportunities that you can design for. This class focuses on attack detection, whereby attempts to execute injection or overflow attacks can be detected and the underlying bugs repaired. With the right investment in detection and response capabilities, this allows you to defer some of the cost of finding and fixing those bugs. Doing so requires developing the features to detect such attacks. It also increases the importance of information disclosure attacks, which disclose your source (or binary) to an attacker, possibly enabling them to develop a reliable exploit without triggering your attack detection.

Operational Mitigations

Systems administrators should consider a number of trade-offs between policy, procedural, and software defenses. To the extent that software defenses are available, they will scale better and be more reliable than process-oriented controls. However, process-oriented controls provide defense against a broader swath of issues; in particular, effective change control helps manage a wide variety of issues. A firewall controlling issues at the network edge is easier to manage and maintain, but at the machine level there is less doubt about what a packet means (Ptacek, 1998). Highly targeted defensive programs are often technically superior to more general ones, but they are less well integrated into operations, which may mean that they are less effective overall. Many classes of mitigations that have become commercial standards are “arms race” technologies.

For example, a signature-based anti-virus program is only as good as its last update. In contrast, a firewall will block packets according to its rules. Some firewalls were implemented as packet filters, and can be fooled, but others implemented connection proxying (meaning the connection terminates at the firewall, and code on the firewall makes an additional connection to the target system). Those firewalls don't engage in a meaningful arms race, although some of them include signature-driven intrusion detection or prevention code, and that code requires regular updating. (Arms races are so much “fun” that they get their own section, towards the end of this chapter.)

It is valuable for developers to understand the system administrator's perspective on defenses, and vice versa. This is especially true of standard mitigations, where you can rely on a body of knowledge, analogies, and other facets of the mitigation being a standard approach to make it easier to operate. Developing a defensive system that no one can operate is about as bad as developing one that doesn't work. The ultimate goal of deploying operational technology that meets business needs requires developers and system administrators to understand the limits of available defenses. The example defenses shown in Table 9.2 are for the Acme SQL Database introduced previously in the book.

Table 9.2 Defenses by Who Can Implement Them

Spoofing
§ Developers must implement: Defenses against web/SQL/other clients brute-forcing logins; authentication for DBA (human) and DB users.
§ IT departments must implement: Authentication infrastructure for web client, SQL client, DBA (human), and DB users.

Tampering
§ Developers must implement: Integrity protection for data, management, and logs.
§ IT departments must implement: Integrity protection for front end(s) and database administration.

Repudiation
§ Developers must implement: Logs (log analysis must be protected); certain actions from web and SQL clients will need careful logging, as will certain actions from DBAs.
§ IT departments must implement: Logs (log analysis must be protected); if DBAs are not fully trusted, a system in another privilege domain to log all commands might be required.

Information Disclosure
§ Developers must implement: Data, management, and logs must be protected; front ends must implement access control; only the front ends should be able to access the data.
§ IT departments must implement: ACLs and security groups must be managed; backups must be protected.

Denial of Service
§ Developers must implement: Front ends must be designed to minimize DoS risks.
§ IT departments must implement: The system must be deployed with sufficient resources.

Elevation of Privilege
§ Developers must implement: Avoid trusting the client; the DB should support prepared statements to make injection harder; there should be no default way to run commands on the server, and calls like exec() or system() must be permissioned and configurable if they exist.
§ IT departments must implement: Avoid improperly trusting clients which were written locally; configure the DB appropriately.
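The prepared-statement advice can be illustrated with any database API; here is a sketch using Python's sqlite3 module as a stand-in for the fictional Acme SQL. The parameterized form passes attacker-supplied input purely as data, so it cannot rewrite the statement's structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "x' OR '1'='1"  # a classic injection attempt

# Vulnerable: string concatenation would let the input rewrite the query:
#   "SELECT role FROM users WHERE name = '" + user_input + "'"

# Safer: a prepared statement treats the input as a value, not SQL
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the injection string matches no user
```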

Designing a Custom Mitigation

As explained earlier, custom approaches are risky and expensive to verify, but sometimes you have no choice. Aspects of your design or implementation could prevent all the standard approaches. For example, if you're implementing a low-cost device with an eight-bit processor, it's probably not going to be able to use 2,048-bit RSA keys at acceptable speeds. At that point, you need to consider what you can do.

Because the defense will be custom, there are fewer specifics to talk about. However, some general guidelines apply. Ensure that you have a clearly written definition of the goal of the custom system, along with the constraints you're operating under. Consider what will happen when it breaks, and in particular what you'll do to update it.

Custom mitigations differ from non-security code in a very important way: It's hard to see that they're ineffective. The mistakes made by inexperienced database designers are easy to see; problems like performance will crop up relatively quickly and obviously. With security mitigations, it's easier to be fooled.

When you're designing a custom approach to mitigation, it's worthwhile to take a couple of unusual steps early in the process. The first is to share your motivation and design. You may well get useful feedback on it. You'll get more useful feedback if you make the design public (rather than requiring an NDA), and/or if you pay for feedback, perhaps by hiring experts to break it or by offering a prize to anyone who can break it.


Be careful to explain what you did separately from why you think those defenses make it hard to break; the interleaving can inhibit the free flow of ideas in the same way that criticism can shut down a brainstorming session.

Fuzzing Is Not a Mitigation

Oftentimes, to address a variety of threats, people will say “we'll fuzz that!” Fuzzing is the technique of generating random input for a program, and it is stunningly effective at finding bugs in code that's never been fuzzed. This is especially true of parsers. However, fuzzing is not a way of mitigating threats; it's a way of testing mitigations. Fuzzing will not make your code secure; it will help you find bugs, and as those bugs are fixed, the average time to find the next bug using random input goes up. However, the time for a clever human to find that next bug does not change, so over time fuzzing becomes less effective. When you are tempted to fuzz, ensure that it's in your test plan, but at design time you need to take other actions.
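A minimal fuzzer is little more than a loop feeding random bytes to a parser. The toy parser below is invented for illustration, with two deliberately shallow bugs of the kind fuzzing finds quickly; note that the loop tests the code, it doesn't fix anything.

```python
import random

def toy_parser(data: bytes) -> bool:
    """A deliberately buggy parser for a toy format: [length][payload][checksum]."""
    length = data[0]              # bug: crashes on empty input
    checksum = data[1 + length]   # bug: crashes when the length field lies
    payload = data[1:1 + length]
    return checksum == sum(payload) % 256

random.seed(0)
found = 0
for _ in range(1000):
    blob = bytes(random.randrange(256) for _ in range(random.randrange(1, 8)))
    try:
        toy_parser(blob)
    except IndexError:
        found += 1  # a real fuzzer would save the crashing input for triage
print(f"{found} of 1000 random inputs crashed the parser")
```

Once the length check is added, random inputs stop crashing the parser almost immediately, which is exactly why the time to the next fuzzer-found bug climbs while a human reading the code is unaffected.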

You can do several things to make parsers more secure at design/code time. The first is to design your file format or network protocol for safe parsing (Sassaman, 2013). If the format is not Turing complete, parsing it is easier. If it doesn't contain macros, loops, multiple ways to encode things, or the capability to encapsulate layers of encoding, your parser can be a lot simpler, and thus safer. The next thing you can do to make parsers safer is to use a safer language than the C family. C's lack of type safety and primitive, unsafe string handling functions make safe coding of parsers hard. If the thing being parsed is already defined and you can't redefine it (say, HTML), it may help to create a state machine for your parser. You may, quite appropriately, be laughing at the idea of creating a state machine for HTML. Well, if you can't describe it, good luck with making a safe parser for it. Finally, if there's a canonicalization step, canonicalize early, and run the input through it until the output matches the input.
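The canonicalization advice amounts to a fixed-point loop: apply the canonicalizer until the output stops changing. Here is a sketch that assumes percent-decoding is the canonicalization step; a doubly encoded string takes two passes.

```python
from urllib.parse import unquote

def canonicalize(s: str) -> str:
    """Run the canonicalizer repeatedly until the output matches the input."""
    while True:
        decoded = unquote(s)
        if decoded == s:
            return s
        s = decoded

# "%252e" is a percent-encoded "%2e", which is itself an encoded "."
print(canonicalize("%252e%252e/secret"))  # ../secret
```

Checking security policy against the single-pass decoding ("%2e%2e/secret") would miss the traversal; the fixed point is what the policy check should see.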

Threat-Specific Prioritization Approaches

This section is all about risks you're going to address (rather than avoid, accept, or transfer). For the problems you decide to address, you have choices to make about how to approach those risks. Those choices include simply waiting and seeing whether the risks materialize or fixing the easy stuff first, using threat-ranking techniques of various sorts, or using some approach to estimate the cost of the problem.

Simple Approaches

The very simplest approaches to threat prioritization don't involve any math or calculation. They are “wait and see” and “do the easy fixes first.”

Wait and See

The wait and see approach to security issues sometimes works pretty well, and often fails catastrophically. It differs from “ignore it” when the system in question is an internal network or a service offering that can be monitored for problems. Wait and see is a worse technique for, say, a gas tank than for a website. It can also be an example of what 451 Group analyst Wendy Nather calls “the cheeseburger case”: “Doc, I'm gonna keep eating cheeseburgers until I have a heart attack. Then we'll deal with it” (Nather, 2013). The cheeseburger case is less about accepting risks than about ignoring them, and doing nothing to mitigate them until a catastrophe forces you to pay attention.

Most businesses have long used monitoring as part of their risk management strategy. For example, if a bank notices that one of its employees suddenly has a flashy car, someone is likely to question where the money came from. Monitoring, the “see” part of “wait and see,” needs to be planned effectively. There are four main types of monitoring: change detection, signature attack detection, anomaly attack detection, and impact detection.


Related to the cheeseburger approach in risk management is the ostrich approach of sticking one's head in the sand. Of course, your colleagues don't really stick their heads in the sand; they say “no one would ever do that!” The best response is to ask, “If I can give you an example where someone did that, will you agree to fix it?” (This is another place where the repertoire that people develop en route to becoming an expert can come in very handy.)

Change Detection

Change detection focuses on the operational discipline of ensuring that change is managed. To the extent that changes are managed, anything outside the change management process must be treated as a problem. (Kim, 2006)

Signature Attack Detection

Signature attack detection is based on the idea that an interesting subset of attacks has certain definable signatures, either sequences of bytes or messages that will appear only in an attack. With the rise of production-quality exploit software, this signature-oriented approach appeared promising. However, as the products to detect signatures proliferated, the attack tools added polymorphism, and simple signature detection is less effective than its proponents had hoped.
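At its simplest, signature detection is substring matching over observed bytes, which is why trivial transformations of the payload defeat it. The signatures below are invented for illustration.

```python
# Invented example signatures; real products ship thousands of them
SIGNATURES = [b"/bin/sh", b"cmd.exe"]

def matches_signature(packet: bytes) -> bool:
    """Flag a packet if any known attack signature appears in it."""
    return any(sig in packet for sig in SIGNATURES)

print(matches_signature(b"GET /bin/sh HTTP/1.0"))  # True

# A polymorphic attacker XOR-encodes the payload (decoded by a small stub
# at the target), and the same bytes no longer match any signature
encoded = bytes(b ^ 0x42 for b in b"GET /bin/sh HTTP/1.0")
print(matches_signature(encoded))  # False
```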

Anomaly Attack Detection

Anomaly attack detection is based on the idea that there's a normal and unchanging set of network traffic. This is a pipe dream. As business changes, the normal traffic changes; and when it does, the anomaly detector needs to be retrained about what's normal. As business accelerates and change accelerates, it becomes increasingly difficult to continue training the system so that it knows what's normal. There are also normal abnormalities (for example, every Friday evening, there's an audit run, and every close of quarter there's another one). It may be possible to use people to investigate every anomaly, although the cost of doing so rises very quickly.
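A toy detector makes the retraining problem concrete: flag anything more than a few standard deviations from the trained baseline, and any legitimate change in traffic trips it until the baseline is retrained. The numbers here are invented.

```python
import statistics

# Requests per minute observed during the training window
baseline = [100, 104, 98, 102, 101, 99, 103, 100]
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(value: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations from the trained mean."""
    return abs(value - mean) > k * stdev

print(is_anomalous(105))  # False: ordinary variation
print(is_anomalous(250))  # True: an attack... or Friday evening's audit run
```

The Friday audit run scores exactly like an attack, which is why every "normal abnormality" either becomes an alert to investigate or another exception baked into the model.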

Impact Detection

The final major form of detection is impact detection. If suddenly the shipped product count for the quarter is out of whack with your accounts receivable, there's a problem. (A friend once did a penetration test in which he ordered minus three copies of a book. The system credited his credit card the cost of the three books, minus shipping, and a week later he got the three books in the mail. When the merchant was told what happened, it turned out that the shipping clerk had seen the negative three copies, figured it was a bug, and sent the three copies.)

Generally, mature operations use some combination of all four techniques: change detection, signature attack detection, anomaly attack detection, and impact detection. The precise combination that will work for a given organization's threat profile, risk acceptance, culture, regulatory environment, and so on is specific to that organization.

Wait and see is less valid in (at least) three cases. First, when there's a risk of injury or death. Duh. Enough said. Second, when you're shipping a product to your customer, especially a physical product. Regardless of whether the product is a baby's crib, a car, or a piece of software, waiting to see what goes wrong cannot be the primary risk management approach. Third, when you don't have a response planned. For example, if you operate banking software, you probably have an adjustable dollar level at which you manually check transactions for fraud. You might adjust it based on current capacity in the fraud management department or on the cost-effectiveness of the activity.

Easy Fixes First

Some organizations just starting to threat model may begin by fixing those things that are easy to fix before going on to harder fixes. For many experienced security practitioners, this seems like an odd choice, but it's worth discussing the pros and cons of the approach. On the pro side, fixing security issues is a good thing, and demonstrating that threat modeling is producing actionable bugs may help ensure that threat modeling continues. The downside is that the issues you're fixing may be the wrong things, or things that don't seem relevant to other parts of an organization, and thus appear to be a waste of time.

If you need to start from an easy fix first approach, ensure that you do so in consultation with the people who are performing the fixes, and ensure they don't perceive it as busywork. In addition, plan to move from easy fixes to a more mature approach, as described in the remainder of this section.

Threat-Ranking with a Bug Bar

There are a few techniques that enable you to rank your threats with a little more precision than “easy” and “everything else.” These ranking techniques are designed to provide you with a consistent approach to addressing threats.


One of the first of these was DREAD. This awesome acronym stands for discoverability, reproducibility, exploitability, affected users, and damage. Unfortunately, DREAD is fairly subjective and leads to odd results in many circumstances. Therefore, as of 2010, DREAD is no longer recommended for use by the Microsoft SDL team.

The most effective simple prioritization technique is a bug bar. In a bug bar, bugs are given a severity based on a shared understanding of their impact. That shared understanding comes from a “bug bar” table that lists the criteria used to classify bugs, and might include examples. Over the product lifecycle, the bar defining which bugs must be fixed is adjusted. Thus, six months before shipping, a medium-impact bug would be fixed before release, whereas the same bug discovered on the day of shipping might instead be fixed in a hotfix. The bar will likely be refined, and what must be fixed may change as your security processes mature. Microsoft makes fairly heavy use of the bug bar concept.

A version of the complete Microsoft SDL bug bar is available online. The bug bar is made available under a Creative Commons license that allows you to use it within your organization (Microsoft, 2012).
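Mechanically, a bug bar is a lookup from impact criteria to severity, plus a rule for which severities block a release at each point in the ship cycle. The criteria below are invented placeholders, not drawn from Microsoft's actual bar.

```python
# Invented example criteria; a real bug bar is far more detailed
BUG_BAR = {
    "remote code execution": "critical",
    "information disclosure of user data": "important",
    "local DoS requiring reboot": "moderate",
    "UI spoofing in an edge case": "low",
}

# As shipping approaches, only the higher severities block the release;
# lower ones move to a hotfix or the next version
MUST_FIX = {
    "six months out": {"critical", "important", "moderate"},
    "ship day": {"critical"},
}

def blocks_ship(threat: str, milestone: str) -> bool:
    """Does this bug's severity clear the bar at this milestone?"""
    return BUG_BAR[threat] in MUST_FIX[milestone]

print(blocks_ship("local DoS requiring reboot", "six months out"))  # True
print(blocks_ship("local DoS requiring reboot", "ship day"))        # False: hotfix later
```

The value of writing the bar down is consistency: two triage meetings given the same bug and the same milestone reach the same answer.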

Cost Estimation Approaches

Sometimes there are business reasons to use a threat prioritization approach that produces cost estimates. This section covers two such approaches: probability/impact assessments and FAIR.

Probability/Impact Assessments

Assessing probability and impact is an obvious approach, but effective implementations are rare. There are a few sticking points, including that, unlike hurricanes or tornados, information security events are often caused by malice. However, insurance companies still write theft insurance, and don't go out of business too often. Over time, the industry will likely learn more about both probabilities and impacts from the incident disclosures driven by breach disclosure laws, and probability/impact assessments will become a more useful part of threat modeling. If you're going to use a probability/impact assessment of any form, you'll need to figure out the cost of mitigation, and few approaches help you do that (Gordon, 2006).

Probability assessments in information security are notoriously hard to get right. Well-engineered systems can often be broken with very inexpensive equipment. Kryptonite bike locks were found vulnerable to a Bic pen (Kahney, 2004). Facial recognition systems have been found to recognize photographs of an authorized person (Nguyen, 2009). Fingerprint readers have been beaten by gummy candy and laser printers (Matsumoto, 2002). Expensive equipment may be easier to get than you anticipate; for example, graduate students often have access to million-dollar lab equipment. At the other end of the spectrum, most people would consider an attack that allows someone to steal a few dollars as not worth a lot of time. However, a billion people worldwide live on less than one dollar a day, and their lives would be improved by stealing just a few dollars. (This isn't to say that everyone living at that level of poverty would become a criminal given the opportunity, only that typical Western assessments of cost/benefit trade-offs are challenged by global networking.)
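The classic form of a probability/impact assessment is annualized loss expectancy: expected events per year multiplied by expected loss per event, compared against the cost of a mitigation. The sketch below illustrates the arithmetic; all of the figures are invented.

```python
# Hedged sketch of annualized loss expectancy (ALE) versus mitigation
# cost. Every figure here is invented for illustration.

def annualized_loss_expectancy(annual_rate: float, loss_per_event: float) -> float:
    """ALE = expected events per year * expected loss per event."""
    return annual_rate * loss_per_event

# Suppose we estimate a breach 0.2 times per year, costing $250,000.
ale_before = annualized_loss_expectancy(0.2, 250_000)   # $50,000/year

# A mitigation costing $30,000/year cuts the estimated rate to 0.05/year.
ale_after = annualized_loss_expectancy(0.05, 250_000)   # $12,500/year
net_benefit = (ale_before - ale_after) - 30_000          # $7,500/year

print(ale_before, ale_after, net_benefit)
```

The arithmetic is trivial; the hard part, as the text above notes, is that the rate and loss estimates feeding it are exactly the numbers information security is bad at producing.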

FAIR
FAIR is the acronym for Factor Analysis of Information Risk, developed by Jack Jones while he was Chief Security Officer for a large bank. FAIR focuses on defining business risk associated with technology systems. FAIR defines a threat as “anything (e.g., object, substance, human) that is capable of acting against an asset in a manner that can result in harm.”

The primary use of FAIR is for systems that a business is deploying, regardless of the source of the components (off the shelf or local). It defines risk as a function of loss event frequency and probable loss magnitude. Each of these is decomposed further, as shown in Figure 9.4.


Figure 9.4 FAIR's risk decomposition

FAIR has 10 defined steps in four stages:

Stage 1: Identify scenario components

1. Identify the asset at risk.

2. Identify the threat community under consideration.

Stage 2: Evaluate loss event frequency

3. Estimate the probable threat event frequency.

4. Estimate the threat capability.

5. Estimate control strength.

6. Derive vulnerability.

7. Derive loss event frequency.

Stage 3: Estimate probable loss magnitude

8. Estimate worst-case loss.

9. Estimate probable loss.

Stage 4: Derive and articulate risk

10. Derive and articulate risk.
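The derivation steps above can be sketched as qualitative lookups. FAIR's white paper defines its own rating scales and tables; the ratings and thresholds below are simplified inventions for illustration, not the ones in Jones's paper.

```python
# Simplified sketch of FAIR's derivation steps. FAIR uses qualitative
# lookup tables; the scales and thresholds here are invented
# illustrations, not those from "An Introduction to FAIR".

RATINGS = ["very low", "low", "moderate", "high", "very high"]

def derive_vulnerability(threat_capability: str, control_strength: str) -> str:
    """Step 6: vulnerability is high when capability exceeds controls."""
    gap = RATINGS.index(threat_capability) - RATINGS.index(control_strength)
    return "high" if gap > 0 else "moderate" if gap == 0 else "low"

def derive_lef(threat_event_frequency: str, vulnerability: str) -> str:
    """Step 7: loss event frequency from TEF, damped by low vulnerability."""
    return "low" if vulnerability == "low" else threat_event_frequency

def derive_risk(lef: str, probable_loss_magnitude: str) -> str:
    """Step 10: risk as a function of frequency and magnitude."""
    score = RATINGS.index(lef) + RATINGS.index(probable_loss_magnitude)
    if score >= 6:
        return "critical"
    return "high" if score >= 4 else "medium" if score >= 2 else "low"

vuln = derive_vulnerability("high", "moderate")   # capability beats controls
lef = derive_lef("moderate", vuln)
print(derive_risk(lef, "high"))
```

Note what the sketch makes visible: every output is only as good as the estimated inputs (threat capability, control strength, frequency, magnitude), which is exactly the repeatability-versus-comparability trade-off discussed below.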

The document “An Introduction to FAIR” (Jones, 2006) presents FAIR as an approach to risk management in business. The FAIR white paper starts out at what you might see as a very philosophical level. If you find that frustrating, consider jumping to page 64 of that introductory paper, where FAIR is presented in a more concrete and compact fashion.

There are two issues to discuss regarding FAIR. The first is the way in which it assigns numbers to various elements of risk. The white paper acknowledges these issues in a remarkably frank discussion in the conclusions. It's worth reiterating that without incident data, FAIR is a repeatable way to get to the same results, but those results are hard to compare to results for other systems. The second issue is FAIR's opening steps, and the asset- and attacker-centricity of the system. An asset is defined as “any data, device, or other component of the environment that supports information-related activities, and which can be affected in a manner that results in loss.” This broad definition implies a remarkable amount of work, as FAIR analysis is run on each asset. This is probably ameliorated by an intuitive ranking approach. FAIR at least provides an “example” set of threat communities, but it does not address the effort needed to analyze their behaviors, nor the risks in getting that analysis wrong. FAIR is probably the best of the approaches for quantifying risk. My lovely editor would like to know if that means it's worth using, and so would I. More seriously, if your organization relies on quantified risk assessments, FAIR has many things to recommend it, but if a simpler approach, such as a bug bar, will work, that's probably a better return on investment.

Mitigation via Risk Acceptance

As discussed in the section “Classic Strategies for Risk Management,” it is perfectly reasonable to manage risk via risk acceptance, and there are two ways that is commonly done: either via business risk acceptance or via user risk acceptance. The following sections describe both of these.

Mitigation via Business Acceptance

If an organization is building software for its own use, it is free to make whatever risk acceptance decisions it chooses. For example, if your inventory site exposes your product inventory to the world, that's a choice you can reasonably make, by applying whatever risk approach works for you.

In a number of cases, a business may choose to take into account other perspectives beyond its own in risk acceptance decisions. Those cases include privacy and “fitness for purpose,” a term borrowed from the lawyers to mean something that's good enough to do the job it's intended to serve.

If the software involves personal, private information, then the risk acceptance must take into account the myriad laws that apply to such things. Many of those laws require something like “appropriate security.” This constrains the decisions that the business can make regarding risk. Even if those laws do not apply, losing all your customer data may enable competitors to use that data to market to your customers, or it may upset your customers. Lastly, it may have an impact on your reputation.

Fitness for purpose is the other element that may influence a business's willingness to accept risk. If your system is being sold with the expectation that it can connect to the Internet, then there may be a reasonable expectation that it's sufficiently secure for that purpose. If it's being sold for medical use, “critical infrastructure,” or similar areas, there may be substantial additional expectations. In these types of cases, the business cannot silently accept the risk; at the very least the risk acceptance decisions must be communicated to customers. (I'm not a lawyer, but if you decide not to communicate such risks, I suspect you'll get to know some very unfriendly lawyers.)

Mitigation via User Acceptance

There are times when a software developer or systems administrator can't make a decision for the user. For example, there may be a business reason to visit a website that policy would normally block. (For instance, consider an investment bank. No reason to allow adult content, right? Except when Playboy's CEO visits to pitch her stock to the bank's analysts, she'll want to demo the company's new digital media line of business.) There may be a reason to have viruses on a system. Or maybe there's something that is “obviously” a security error, but the user wants to accept the risk involved.

If you need to warn someone about a potential risk, ensure that such warnings are NEAT: necessary, explanatory, actionable, and tested. These four relatively simple steps work well for designing warnings (or improving existing ones), and NEAT is covered in more detail in Chapter 15, “Human Factors and Usability.”

When authenticating, ensure that the authentication is two-way. If any details have changed since the last authentication, inform the user about what's different. This requires persisting some information, explaining it to the user, and walking them through evaluating it.
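One common implementation of this pattern is trust-on-first-use key pinning, as SSH does with its known_hosts file: persist a fingerprint of the peer's key, and when it changes, show the user exactly what differs so they can evaluate the risk. The sketch below is illustrative; the storage file and key bytes are hypothetical.

```python
# Illustrative trust-on-first-use sketch: persist a server's key
# fingerprint, and surface any change to the user. The store path and
# key bytes are hypothetical placeholders.

import hashlib
import json
import pathlib

STORE = pathlib.Path("known_hosts.json")  # hypothetical fingerprint store

def fingerprint(public_key: bytes) -> str:
    return hashlib.sha256(public_key).hexdigest()

def check_host(host: str, public_key: bytes) -> str:
    known = json.loads(STORE.read_text()) if STORE.exists() else {}
    seen = fingerprint(public_key)
    if host not in known:
        known[host] = seen                      # first use: persist it
        STORE.write_text(json.dumps(known))
        return "first-use"
    if known[host] != seen:
        # Explain what changed, so the user can accept or reject the risk.
        return f"CHANGED: stored {known[host][:12]}..., got {seen[:12]}..."
    return "match"

key = b"server-public-key-bytes"
print(check_host("example.com", key))
print(check_host("example.com", key))
```

The important design point is the middle branch: a changed fingerprint is not silently accepted or silently rejected, but explained, because only the user knows whether the server legitimately changed keys.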

Arms Races in Mitigation Strategies

An arms race describes the predictable set of steps that both attackers and defenders engage in that leave both sides approximately where they were at the start of the arms race, only poorer. A classic example is signature-driven anti-virus software. Such software is only as good as its latest update. There will almost always be viruses that the signature authors have not yet discovered, or for which the signatures have not been tested, shipped, or applied.

It should perhaps go without saying that such arms races are to be avoided, but because they are frequent it's worth a few words on why arms races happen, and what you can do should you find it hard to avoid one.

Arms races happen because some factor makes a perfect defense hard. For example, it turns out that bolting on security defenses that block only what viruses do is nearly impossible. Commercial operating systems support a wide variety of behavior, and legitimate programs make use of those defined behaviors, sometimes in ways that are hard to distinguish from malicious use. Therefore, heuristic modules in anti-virus programs have a high rate of false positives. Similarly, whitelisting of programs in advance is an excellent defensive technique, but one that turns out to be very inhibitory to the normal use of computers.

When you find yourself in an arms race, you are playing an economic game. Your goal should be to minimize your cost and maximize your profit, while simultaneously maximizing your opponent's cost and minimizing their profit. Your costs go up when you must scramble, and your profits will be maximized the longer your opponent spends time reacting to your latest moves. This leads to two related strategies: Aim for the last-mover advantage and have a bag of tricks.

Last-mover advantage is a term credited to cryptographer extraordinaire Paul Kocher (Kocher, 2006). It refers to the idea that the last side to take action has an advantage. Further, your moves should be designed, in part, to flummox the other side and make it tricky for them to respond. Here, the use of obfuscation or anti-debugging techniques can pay dividends. Having a system that's designed to roll forward to a new configuration can also help, and here's where a bag of tricks comes in. A bag of tricks is a set of moves in the arms race that are already coded and tested. When you detect that your opponent has taken a new move, you deploy something from your bag of tricks. For example, if you have a DRM scheme to prevent people from using your music files as they choose, you might have a set of additional restrictions coded up and ready to ship as attackers break your current scheme. This enables you to maximize how long you hold the last-mover advantage.

Summary
There are many strategies you can apply to risk management. The classic risk management strategies of avoid, address, accept, or transfer are applicable to how you address threats.

More specific to threats and security are the approaches of changing the design, applying standard mitigations, and designing custom mitigations. It's expensive and time consuming to test your custom mitigation, and easy to get the design wrong, so custom mitigations should be a fallback. The department of “easy to get wrong” also includes fuzzing, which is more appropriately seen as a test technique. (It's a great technique, but you can't fuzz your way to a secure design or implementation.)

Sometimes you need to prioritize your mitigation approaches, and there's a variety of ways to do so, ranging from simple ones like wait and see (and the associated actions to ensure you do see) to bug bars and quantified risk management approaches such as FAIR.

Now and then you need to accept risks, or ask others to do so. When you need to accept risk, you should ensure that you're doing so with some structure and not using “accepted risk” as a synonym for ignoring it. When you need to ask others to accept risk, you should do so clearly, and the NEAT approach can help you do so.

Sometimes arms races are hard to avoid. If you do find yourself there, there are some strategies to drive your opponent's costs up while keeping yours low. You want to aim for a last-mover advantage that forces your opponent to scramble while you relax. That's what good risk management can get you.