Threat Modeling: Designing for Security (2014)
Part II. Finding Threats
Chapter 6. Privacy Tools
Threat modeling for privacy issues is an emergent and important area. Much like a security threat violates a required security property, a privacy threat violates a required privacy property. Defining privacy requirements is a delicate balancing act, however, for a few reasons: First, the organization offering a service may want or even need a lot of information that the people using the service don't want to provide. Second, people have very different perceptions of what privacy is and what data is private, and those perceptions can change with time. (For example, someone leaving an abusive relationship may become newly sensitive to the value of location privacy, and perhaps consider their address private for the first time.) Lastly, most people are “privacy pragmatists” who will make value tradeoffs for personal information.
Some people take all of this ambiguity to mean that engineering for privacy is a waste. They're wrong. Others assert that concern over privacy is a waste, as consumers don't behave in ways that expose privacy concerns. That's also wrong. People often pay for privacy when they understand the threat and the mitigation. That's why advertisements for curtains, mailboxes, and other privacy-enhancing technologies often lead with the word “privacy.”
Unlike the previous three chapters, each of which focused on a single type of tool, this chapter is an assemblage of tools for finding privacy threats. The approaches described in this chapter are more developed than “worry about privacy,” yet they are somewhat less developed than security attack libraries such as CAPEC (discussed in Chapter 5, “Attack Libraries”). In either event, they are important enough to include. Because this is an emergent area, appropriate exit criteria are less clear, so there are no exit criteria sections here.
In this chapter, you'll learn about the ways to threat model for privacy, including Solove's taxonomy of privacy harms, the IETF's “Privacy Considerations for Internet Protocols,” privacy impact assessments (PIAs), the nymity slider, contextual integrity, and the LINDDUN approach, a mirror of STRIDE created to find privacy threats. It may be reasonable to treat one or more of contextual integrity, Solove's taxonomy or (a subset of) LINDDUN as a building block that can snap into the four-stage model, either replacing or complementing the security threat discovery.
Many of these techniques are easier to execute when threat modeling operational systems, rather than boxed software. (Will your database be used to contain medical records? Hard to say!) The IETF process is more applicable than other processes to “boxed software” designs.
Solove's Taxonomy of Privacy
In his book, Understanding Privacy (Harvard University Press, 2008), George Washington University law professor Daniel Solove puts forth a taxonomy of privacy harms. These harms are analogous to threats in many ways, but also include impact. Despite Solove's clear writing, the descriptions might be most helpful to those with some background in privacy, and challenging for technologists to apply to their systems. It may be possible to use the taxonomy as a tool, applying it to a system under development, considering whether each of the harms presented is enabled. The following list presents a version of this taxonomy derived from Solove, but with two changes. First, I have added “identifier creation,” in parentheses. I believe that the creation of an identifier is a discrete harm because it enables so many of the other harms in the taxonomy. (Professor Solove and I have agreed to disagree on this issue.) Second, exposure is in brackets, because those using the other threat modeling techniques in this Part should already be handling such threats.
§ (Identifier creation)
§ Information collection: surveillance, interrogation
§ Information processing: aggregation, identification, insecurity, secondary use, exclusion
§ Information dissemination: breach of confidentiality, disclosure, increased accessibility, blackmail, appropriation, distortion, [exposure]
§ Invasion: intrusion, decisional interference
Many of the elements of this list are self-explanatory, and all are explained in depth in Solove's book. A few may benefit from a brief discussion. The harm of surveillance is twofold: first, the uncomfortable feeling of being watched; second, the behavioral changes that feeling may cause. Identification means the association of information with a flesh-and-blood person. Insecurity refers to the psychological state of a person made to feel insecure, rather than a technical state. The harm of secondary use of information relates to societal trust. Exclusion is the use of information provided to exclude the provider (or others) from some benefit.
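Applied as a tool, the taxonomy can be held as a simple checklist structure that a review walks through, asking whether each harm is enabled by the system under development. A minimal sketch (the grouping is taken directly from the list above; the checklist function is my illustration, not part of Solove's work):

```python
# Solove's taxonomy of privacy harms, as adapted above, expressed as a checklist.
SOLOVE_TAXONOMY = {
    "(identifier creation)": ["identifier creation"],
    "information collection": ["surveillance", "interrogation"],
    "information processing": ["aggregation", "identification", "insecurity",
                               "secondary use", "exclusion"],
    "information dissemination": ["breach of confidentiality", "disclosure",
                                  "increased accessibility", "blackmail",
                                  "appropriation", "distortion", "[exposure]"],
    "invasion": ["intrusion", "decisional interference"],
}

def harm_checklist():
    """Flatten the taxonomy into (group, harm) questions for a review."""
    return [(group, harm)
            for group, harms in SOLOVE_TAXONOMY.items()
            for harm in harms]

for group, harm in harm_checklist():
    print(f"Does the system enable {harm}? (category: {group})")
```

Each question that cannot be answered with a confident "no" becomes a candidate privacy threat to record and address.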
Solove's taxonomy is most usable by privacy experts, in the same way that STRIDE as a mnemonic is most useful for security experts. To make use of it in threat modeling, the steps include creating a model of the data flows, paying particular attention to personal data.
Finding these harms may be possible in parallel with, or in place of, security threat modeling. The following advice suggests where and how to focus your search.
§ Identifier creation should be reasonably easy for a developer to identify.
§ Surveillance occurs where data is collected about a broad swath of people, or where data is gathered in a way that's hard for a person to notice.
§ Interrogation risks tend to correlate around data collection points, for example, the many “* required” fields on web forms. The tendency to lie on such forms may be seen as a response to the interrogation harm.
§ Aggregation is most frequently associated with inbound data flows from external entities.
§ Identification is likely to be found in conjunction with aggregation or where your system has in-person interaction.
§ Insecurity may associate with where data is brought together for decision purposes.
§ Secondary use may cross trust boundaries, possibly including boundaries that your customers expect to exist.
§ Exclusion happens at decision points, often around fraud-management decisions.
§ Information dissemination threats (all of them) are likely to be associated with outbound data flows; you should look for them where data crosses trust boundaries.
§ Intrusion is an in-person intrusion; if your system has no such features, you may not need to look at these.
§ Decisional interference is largely focused on ways in which information collection and processing may influence decisions, and as such it most likely plays into a requirements discussion.
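Much of the advice above keys off properties of data flows in your model. A hypothetical sketch of mechanizing it, assuming a minimal flow model where each flow records its direction, whether it crosses a trust boundary, and whether it carries personal data (the flags and rules are my illustration of the guidance, not a standard):

```python
def candidate_harms(flow):
    """Suggest Solove-style harms to examine for one data flow.

    `flow` is a dict with illustrative keys:
      direction:        "inbound" or "outbound" relative to your system
      crosses_boundary: True if the flow crosses a trust boundary
      personal_data:    True if the flow carries personal data
    """
    harms = []
    if not flow.get("personal_data"):
        return harms
    if flow["direction"] == "inbound":
        # Aggregation is most frequently associated with inbound flows
        # from external entities.
        harms.append("aggregation")
    if flow["direction"] == "outbound" and flow.get("crosses_boundary"):
        # Dissemination threats cluster where data leaves across a boundary.
        harms += ["breach of confidentiality", "disclosure",
                  "increased accessibility", "secondary use"]
    return harms

telemetry = {"direction": "outbound", "crosses_boundary": True,
             "personal_data": True}
print(candidate_harms(telemetry))
```

The output is a starting list for discussion, not a verdict; harms such as interrogation or decisional interference still need human judgment about the system's context.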
Privacy Considerations for Internet Protocols
The Internet Engineering Task Force (IETF) requires consideration of security threats, and has a process to threat model focused on its organizational needs, as discussed in Chapter 17, “Bringing Threat Modeling to Your Organization.” As of 2013, it sometimes requires consideration of privacy threats. An informational RFC, “Privacy Considerations for Internet Protocols,” outlines a set of combined security-privacy threats and a set of pure privacy threats, and offers a set of mitigations and some general guidelines for protocol designers (Cooper, 2013). The combined security-privacy threats are as follows:
§ Surveillance
§ Stored data compromise
§ Intrusion (in the sense of unsolicited messages and denial-of-service attacks, rather than break-ins)
§ Misattribution
The privacy-specific threats are as follows:
§ Correlation
§ Identification
§ Secondary use
§ Disclosure
§ Exclusion (users are unaware of the data that others may be collecting)
Each is considered in detail in the RFC. The set of mitigations includes data minimization, anonymity, pseudonymity, identity confidentiality, user participation, and security. While somewhat specific to the design of network protocols, the document is clear, free, and likely a useful tool for those attempting to threat model privacy. The model, in terms of the abstracted threats and methods to address them, is an interesting step forward, and is designed to be helpful to protocol engineers.
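To make one of those mitigations concrete: pseudonymity is often implemented by replacing a direct identifier with a keyed, per-context pseudonym, so that records within one context stay linkable while linking across contexts requires the key. A sketch of that common pattern (my illustration of the general technique; the RFC describes the mitigation, not this particular construction):

```python
import hashlib
import hmac

def pseudonym(identifier: str, context: str, key: bytes) -> str:
    """Derive a stable per-context pseudonym for an identifier.

    The same (identifier, context, key) always yields the same pseudonym,
    preserving linkability within a context; different contexts yield
    different pseudonyms, so cross-context linking requires the key.
    """
    message = f"{context}:{identifier}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]

key = b"example-secret-key"  # in practice: randomly generated and protected
p1 = pseudonym("alice@example.org", "billing", key)
p2 = pseudonym("alice@example.org", "analytics", key)
print(p1 != p2)  # different contexts yield different pseudonyms
```

Note that pseudonymization reduces but does not eliminate linkability threats; the re-identification results discussed later in this chapter show that rich data can often be linked even without direct identifiers.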
Privacy Impact Assessments (PIA)
As outlined by Australian privacy expert Roger Clarke in his “An Evaluation of Privacy Impact Assessment Guidance Documents,” a PIA “is a systematic process that identifies and evaluates, from the perspectives of all stakeholders, the potential effects on privacy of a project, initiative, or proposed system or scheme, and includes a search for ways to avoid or mitigate negative privacy impacts.” Thus, a PIA is, in several important respects, a privacy analog to security threat modeling. Those respects include the systematic tools for identification and evaluation of privacy issues, and the goal of not simply identifying issues, but also mitigating them. However, as usually presented, PIAs have too much integration between their steps to snap into the four-stage framework used in this book.
There are also important differences between PIAs and threat modeling. PIAs are often focused on a system as situated in a social context, and the evaluation is often of a less technical nature than security threat modeling. Clarke's evaluation criteria include things such as the status, discoverability, and applicability of the PIA guidance document; the identification of a responsible person; and the role of an oversight agency; all of which would often be considered out of scope for threat modeling. (This is not a critique, but simply a contrast.) One sample PIA guideline from the Office of the Victorian Privacy Commissioner states the following:
“Your PIA Report might have a Table of Contents that looks something like this:
1. Description of the project
2. Description of the data flows
3. Analysis against ‘the’ Information Privacy Principles
4. Analysis against the other dimensions to privacy
5. Analysis of the privacy control environment
6. Findings and recommendations”
Note that step 2, “description of the data flows,” is highly reminiscent of “data flow diagrams,” while steps 3 and 4 are very similar to the “threat finding” building blocks. Therefore, this approach might be highly complementary to the four-step model of threat modeling.
The appropriate privacy principles or other dimensions to consider are somewhat dependent on jurisdiction, but they can also focus on classes of intrusion, such as those offered by Solove, or a list of concerns such as informational, bodily, territorial, communications, and locational privacy. Some of these documents, such as those from the Office of the Victorian Privacy Commissioner (2009a), have extensive lists of common privacy threats that can be used to support a guided brainstorming approach, even if the documents are not legally required. Privacy impact assessments that are performed to comply with a law will often have a formal structure for assessing sufficiency.
The Nymity Slider and the Privacy Ratchet
University of Waterloo professor Ian Goldberg has defined a measurement he calls nymity, the “amount of information about the identity of the participants that is revealed [in a transaction].” Nymity is from the Latin for name, from which anonymous (“without a name”) and pseudonym (“like a name”) are derived. Goldberg has pointed out that you can graph nymity on a continuum (Goldberg, 2000). Figure 6.1 shows the nymity slider. On the left-hand side, there is less privacy than on the right-hand side. As Goldberg points out, it is easy to move towards more nymity, and extremely difficult to move away from it. For example, there are protocols for electronic cash that have most of the privacy-preserving properties of physical cash, but if you deliver it over a TCP connection you lose many of those properties. As such, the nymity slider can be used to examine how privacy-threatening a protocol is, and to compare the amount of nymity a system uses. To the extent that it can be designed to use less identifying information, other privacy features will be easier to achieve.
Figure 6.1 The nymity slider
When using nymity in threat modeling, the goal is to measure how much information a protocol, system, or design exposes or gathers. This enables you to compare it to other possible protocols, systems, or designs. The nymity slider is thus an adjunct to other threat-finding building blocks, not a replacement for them.
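As a comparison aid, the slider can be treated as an ordinal scale. A hedged sketch (the position names, scores, and examples below are my illustrative rendering of Goldberg's continuum, not values from his paper):

```python
# Illustrative ordinal positions on the nymity slider, most to least nymity.
NYMITY = {
    "verinymity": 3,               # e.g., government ID, credit card number
    "persistent pseudonymity": 2,  # e.g., a long-lived handle or pen name
    "linkable anonymity": 1,       # e.g., a prepaid loyalty card
    "unlinkable anonymity": 0,     # e.g., physical cash
}

def compare_designs(a: str, b: str) -> str:
    """Report which of two designs reveals more identity information."""
    if NYMITY[a] == NYMITY[b]:
        return f"{a} and {b} reveal comparable identity information"
    more, less = (a, b) if NYMITY[a] > NYMITY[b] else (b, a)
    return f"{more} reveals more identity information than {less}"

print(compare_designs("verinymity", "unlinkable anonymity"))
```

Because it is easy to slide toward more nymity and hard to slide back, the comparison is most useful early, when a design can still be moved toward the low-nymity end.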
Closely related to nymity is the idea of linkability. Linkability is the ability to bring two records together, combining the data in each into a single record or virtual record. Consider several databases, one containing movie preferences, another containing book purchases, and a third containing telephone records. If each contains an e-mail address, you can learn that firstname.lastname@example.org likes religious movies, that he's bought books on poison, and that several of the people he talks with are known religious extremists. Such intersections might be of interest to the FBI, and it's a good thing you can link them all together! (Unfortunately, no one bothered to include the professional database showing he's a doctor, but that's beside the point!) The key is that you've engaged in linking several datasets based on an identifier. There is a set of identifiers, including e-mail addresses, phone numbers, and government-issued ID numbers, that are often used to link data, which can be considered strong evidence that multiple records refer to the same person. The presence of these strongly linkable data points increases linkability threats.
Linkability as a concept relates closely to Solove's concepts of identification and aggregation. Linkability can be seen as a spectrum, from strongly linkable with multiple validated identifiers to weakly linkable based on similarities in the data. (“John Doe and John E. Doe are probably the same person.”) As data becomes richer, the threat of linkage increases, even if the strongly linkable data points are removed. For example, Harvard professor Latanya Sweeney has shown that date of birth, gender, and ZIP code alone uniquely identify 87 percent of the U.S. population (Sweeney, 2002). There is an emergent stream of scientific research into “re-identification” or “de-anonymization,” which discloses more such results on a regular basis. The release of anonymized datasets carries a real threat of re-identification, as AOL, Netflix, and others have discovered (McCullagh, 2006; Narayanan, 2008; Buley, 2010).
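Mechanically, the linkage described above is just a join on a shared identifier. A small sketch with the made-up records from the example (all names and data are hypothetical):

```python
from collections import defaultdict

movies = [{"email": "firstname.lastname@example.org", "likes": "religious films"}]
books  = [{"email": "firstname.lastname@example.org", "bought": "books on poison"}]
calls  = [{"email": "firstname.lastname@example.org", "talks_to": "religious extremists"}]

def link_on(identifier: str, *datasets):
    """Join records from several datasets on a shared identifier field,
    building one virtual record per identifier value."""
    profiles = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            key = record[identifier]
            profiles[key].update(
                {k: v for k, v in record.items() if k != identifier})
    return dict(profiles)

profile = link_on("email", movies, books, calls)
print(profile["firstname.lastname@example.org"])
```

The point of the sketch is how little machinery linkage requires: any strongly linkable identifier shared across datasets turns three innocuous collections into one revealing virtual record.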
Contextual Integrity
Contextual integrity is a framework put forward by New York University professor Helen Nissenbaum. It is based on the insight that many privacy issues occur when information is taken from one context and brought into another. A context is a term of art with a deep grounding in discussions of the spheres, or arenas, of our lives. A context has associated roles, activities, norms, and values. Nissenbaum's approach focuses on understanding contexts and changes to those contexts. This section draws very heavily from Chapter 7 of her book Privacy in Context (Stanford University Press, 2009) to explain how you might apply the framework to product development.
Start by considering what a context is. If you look at a hospital as a context, then the roles might include doctors, patients, and nurses, but also family members, administrators, and a host of other roles. Each has a reason for being in a hospital, and associated with that reason are activities that they tend to perform there, norms of behavior, and values associated with those norms and activities.
Contexts are places or social areas such as restaurants, hospitals, work, the Boy Scouts, and schools (or a type of school, or even a specific school). An event can be “in a work context” even if it takes place somewhere other than your normal office. Any instance in which there is a defined or expected set of “normal” behaviors can be treated as a context. Contexts nest and overlap. For example, normal behavior in a church in the United States is influenced by the norms within the United States, as well as the narrower context of the parishioners. Thus, what is normal at a Catholic Church in Boston or a Baptist Revival in Mississippi may be inappropriate at a Unitarian Congregation in San Francisco (or vice versa). Similarly, there are shared roles across all schools, those of student or teacher, and more specific roles as you specify an elementary school versus a university. There are specific contexts within a university or even the particular departments of a university.
Contextual integrity is violated when the informational norms of a context are breached. Norms, in Nissenbaum's sense, are “characterized by four key parameters: context, actors, attributes, and transmission principles.” Context is roughly as just described. Actors are senders, recipients, and information subjects. Attributes refer to the nature of the information—for example, the nature or particulars of a disease from which someone is suffering. A transmission principle is “a constraint on the flow (distribution, dissemination, transmission) of information from party to party.” Nissenbaum offers two presentations of contextual integrity: an initial decision heuristic, followed by an augmented heuristic. As the technique is new, and the “augmented” approach is not a strict superset of the initial presentation, it may help you to see both.
Contextual Integrity Decision Heuristic
Nissenbaum first presents contextual integrity as a post-incident analytic tool. The essence of this is to document the context as follows:
1. Establish the prevailing context.
2. Establish key actors.
3. Ascertain what attributes are affected.
4. Establish changes in principles of transmission.
5. Red flag
Step 5 means “if the new practice generates changes in actors, attributes, or transmission principles, the practice is flagged as violating entrenched informational norms and constitutes a prima facie violation of contextual integrity.” You might have noticed a set of interesting potential overlaps with software development and threat modeling methodologies. In particular, actors overlap fairly strongly with personas, in Cooper's sense of personas (discussed in Appendix B, “Threat Trees”). A contextual integrity analysis probably does not require a set of personas for bad actors, as any data flow outside the intended participants (and perhaps some between them) is a violation. The information transmissions, and the associated attributes are likely visible in data flow or swim lane diagrams developed for normal security threat modeling.
Thus, to the extent that threat models are being enhanced from version to version, a set of change types could be used to trigger contextual integrity analysis. The extant diagram is the “prevailing context.” The important change types would include the addition of new human entities or new data flows.
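That trigger can be sketched as a set comparison between the extant model and a revised one. A hypothetical sketch (the dict-of-sets model shape is my own; Nissenbaum's heuristic is an analytic tool, not code):

```python
def red_flags(prevailing: dict, new_practice: dict) -> list:
    """Flag changes in actors, attributes, or transmission principles.

    Each context is modeled as a dict of sets, e.g.:
      {"actors": {...}, "attributes": {...}, "transmission_principles": {...}}
    Per step 5 of the heuristic, any addition relative to the prevailing
    context is a prima facie violation of contextual integrity.
    """
    flags = []
    for parameter in ("actors", "attributes", "transmission_principles"):
        added = new_practice.get(parameter, set()) - prevailing.get(parameter, set())
        if added:
            flags.append(f"new {parameter}: {sorted(added)}")
    return flags

hospital = {"actors": {"doctor", "patient"},
            "attributes": {"diagnosis"},
            "transmission_principles": {"confidentiality"}}
with_ads = {"actors": {"doctor", "patient", "advertiser"},
            "attributes": {"diagnosis"},
            "transmission_principles": {"confidentiality"}}
print(red_flags(hospital, with_ads))  # flags the new 'advertiser' actor
```

A flag is a prompt for discussion, not a verdict: as noted below, whether a flagged change should block a design is a requirements question, not a threat-elicitation one.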
Nissenbaum takes pains to explore the question of whether a violation of contextual integrity is a worthwhile reason to avoid the change. From the perspective of threat elicitation, such discussions are out of scope. Of course, they are in scope as you decide what to do with the identified privacy threats.
Augmented Contextual Integrity Heuristic
Nissenbaum also presents a longer, ‘augmented’ heuristic, which is more prescriptive about steps, and may work better to predict privacy issues.
1. Describe the new practice in terms of information flows.
2. Identify the prevailing context.
3. Identify information subjects, senders, and recipients.
4. Identify transmission principles.
5. Locate applicable norms, identify significant changes.
6. Prima facie assessment
7. Evaluation 1
a. Consider moral and political factors.
b. Identify threats to autonomy and freedom.
c. Identify effects on power structures.
d. Identify implications for justice, fairness, equality, social hierarchy, democracy and so on.
8. Evaluation 2
a. Ask how the system directly impinges on the values, goals, and ends of the context.
b. Consider moral and ethical factors in light of the context.
This is, perhaps obviously, not an afternoon's work. However, in considering how to tie this to a software engineering process, you should note that steps 1, 3, and 4 look very much like creating data flow diagrams. The context of most organizations is unlikely to change substantially, and thus descriptions of the context may be reusable, as may be the work products to support the evaluations of steps 7 and 8.
Perspective on Contextual Integrity
I very much like contextual integrity. It strikes me as providing deep insight into and explanations for a great number of privacy problems. That is, it may be possible to use it to predict privacy problems for products under design. However, that's an untested hypothesis. One area of concern is that the effort to spell out all the aspects of a context may be quite time consuming, but without spelling them out, privacy threats may be missed. This sort of work is challenging when you're trying to ship software, and Nissenbaum goes so far as to describe it as “tedious” (Privacy in Context, page 142). Additionally, the act of fixing a context in software or structured definitions presents the risk that the fixed representation will deviate from social norms as they evolve.
This presents a somewhat complex challenge to the idea of using contextual integrity as a threat modeling methodology within a software engineering process. The process of creating taxonomies or categories is an essential step in structuring data in a database. Software engineers do it as a matter of course as they develop software, and even those who are deeply cognizant of taxonomies often treat it as an implicit step. These taxonomies can thus restrict the evolution of a context—or worse, generate dissonance between the software-engineered version of the context and the evolving social context. I encourage security and privacy experts to grapple with these issues.
LINDDUN
LINDDUN is a mnemonic developed by Mina Deng for her PhD at the Katholieke Universiteit Leuven in Belgium (Deng, 2010). LINDDUN is an explicit mirroring of STRIDE-per-element threat modeling. It stands for the following violations of privacy properties:
§ Linkability
§ Identifiability
§ Non-repudiation
§ Detectability
§ Disclosure of information
§ Content Unawareness
§ Policy and consent Noncompliance
LINDDUN is presented as a complete approach to threat modeling with a process, threats, and a requirements discovery method. It may be reasonable to use the LINDDUN threats or a derivative as a tool for privacy threat enumeration in the four-stage framework, snapping it in either in place of or next to STRIDE security threat enumeration. However, the threats in LINDDUN use somewhat unusual terminology; therefore, the training requirements may be higher, or the learning curve steeper, than with other privacy approaches.
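Used per-element, each DFD element type is checked against the privacy threat categories that can apply to it, just as STRIDE-per-element does for security. A hedged sketch (this coarse mapping is my simplification for illustration only; consult Deng's work for the actual per-element tables):

```python
# Illustrative LINDDUN-per-element mapping (simplified; see Deng, 2010).
LINDDUN_PER_ELEMENT = {
    "external entity": ["linkability", "identifiability",
                        "content unawareness"],
    "process": ["linkability", "identifiability", "non-repudiation",
                "detectability", "disclosure of information",
                "policy and consent noncompliance"],
    "data store": ["linkability", "identifiability", "non-repudiation",
                   "detectability", "disclosure of information",
                   "policy and consent noncompliance"],
    "data flow": ["linkability", "identifiability", "non-repudiation",
                  "detectability", "disclosure of information",
                  "policy and consent noncompliance"],
}

def privacy_threats(element_type: str) -> list:
    """Return the LINDDUN categories to consider for a DFD element type."""
    return LINDDUN_PER_ELEMENT[element_type]

print(privacy_threats("external entity"))
```

The mechanics are the same as STRIDE-per-element: walk each element of the diagram, and for each applicable category, ask how that violation could occur there.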
LINDDUN leaves your author deeply conflicted. The privacy terminology it relies on will be challenging for many readers. However, it is, in many ways, one of the most serious and thought-provoking approaches to privacy threat modeling, and those seriously interested in privacy threat modeling should take a look. As an aside, the tension between non-repudiation as a privacy threat and repudiation as a security threat is delicious.
Summary
Privacy is no less important to society than security. People will usually act to protect their privacy given an understanding of the threats and of how they can address them. As such, it may help you to look for privacy threats in addition to security threats. The ways to do so are less prescriptive than the ways to look for security threats.
There are many tools you can use to find privacy issues, including Solove's taxonomy of privacy harms. (A harm is a threat with its impact.) Solove's taxonomy helps you understand the harm associated with a privacy violation, and thus, perhaps, how best to prioritize it. The IETF has an approach to privacy threats for new Internet protocols. That approach may complement or substitute for privacy impact assessments. PIAs and the IETF's processes are appropriate when a regulatory or protocol-design context calls for their use. Both are more prescriptive than the nymity slider, a tool for assessing the amount of personal information in a system and measuring privacy invasion for comparative purposes. They are also more prescriptive than contextual integrity, an approach that attempts to tease out the social norms of privacy. If your goal is to identify when a design is likely to raise privacy concerns, however, then contextual integrity may be the most helpful. Far more closely related to STRIDE-style threat identification is LINDDUN, which considers privacy violations in the manner that STRIDE considers security violations.