
How to Define and Build an Effective Cyber Threat Intelligence Capability (2015)

Chapter 6. How Technology Models Operationalize Threat Data

Abstract

Before going out and investing in a cyber threat intelligence capability, we stress the importance of developing an architectural plan to support the mission activities and the type of intelligence needed. This chapter covers the pros and cons of the “build everything” model, the “off-the-shelf” model, and the final option of combining both ideas.

Keywords

big data

SIM information

log analysis

DLP alerts

After going over the “why,” you were introduced to how a business objective drives the “what”: a set of mission activities, together with the intelligence or data needed to support them, whether that data is sourced internally, externally, or both. In other words, we are now through “why” and “what.” You are now much better positioned to look at how to turn that data into operational intelligence.


There are lots of ways organizations are trying to turn information into actionable intelligence, or mold it into tools people can actually use. You can ingest data feeds into firewalls, gateways, or other appliances of various types. You can ingest text data into a searchable index. You can look at packages to normalize, visualize, and store lots of data. As of this writing, there are many options to do some of what is needed, but in nearly all cases we have seen, the “solution” has been cobbled together from a range of off-the-shelf and custom parts.

One reason solutions are still largely “home brewed” is the volume and company-specific peculiarities of the data. To handle the so-called “big data,” companies evaluate technologies like Hadoop, Solr, Elasticsearch, MongoDB, or whatever their choice of database for unstructured, semi-structured, and messy data may be. They study visualization and analytics tools ranging from free open-source libraries like Arbor to extremely powerful, and costly, commercial packages such as Splunk, i2, and Palantir. Some folks refer to the systems that result as “Frankenstein boxes,” since they often began life as a standard off-the-shelf product until the security and IT teams began bolting things onto their sides to create a more comprehensive system that actually does all the things its creators need it to do. The clear wish is the ability to ingest and normalize internally sourced data (such as network traffic, SIM information, log analysis, and DLP alerts) that is within the control of the staff and inside the perimeter, together with external data such as feeds from intelligence providers, including human-readable written product and machine-readable data in formats such as STIX, XML, and JSON.
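To make that ingest-and-normalize wish concrete, here is a minimal sketch (ours, not a product recommendation or the authors’ design) of mapping a hypothetical vendor’s JSON indicator feed and an internal proxy-log line into one common record format. The feed layout, field names, and log format are illustrative assumptions rather than any vendor’s real schema.

# A minimal normalization sketch (illustrative only). The vendor feed layout,
# field names, and proxy-log format below are assumptions, not real schemas.
import json
from datetime import datetime, timezone

def normalize_vendor_feed(raw_json: str) -> list:
    """Map a hypothetical feed ({"iocs": [{"value", "type", "seen"}]}) to a common schema."""
    records = []
    for ioc in json.loads(raw_json).get("iocs", []):
        records.append({
            "indicator": ioc["value"],
            "indicator_type": ioc["type"],           # e.g. "domain", "ip", "hash"
            "source": "vendor_feed_a",
            # assume the vendor supplies an ISO-8601 timestamp with an explicit offset
            "first_seen_utc": datetime.fromisoformat(ioc["seen"])
                                      .astimezone(timezone.utc).isoformat(),
        })
    return records

def normalize_proxy_log(line: str) -> dict:
    """Map one hypothetical internal proxy-log line ("<epoch> <domain>") to the same schema."""
    epoch, domain = line.split()
    return {
        "indicator": domain,
        "indicator_type": "domain",
        "source": "internal_proxy",
        "first_seen_utc": datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat(),
    }

# Example: both sources now share one schema and can be correlated on "indicator".
feed = '{"iocs": [{"value": "evil.example.com", "type": "domain", "seen": "2015-06-01T17:30:00+00:00"}]}'
print(normalize_vendor_feed(feed))
print(normalize_proxy_log("1433179800 evil.example.com"))

However the pipeline is actually built, the point is the same: every source, internal or external, must land in one schema before correlation is possible.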

This book does not and will not advocate for one method or product over another. In fact, in a moment, you will see very much the opposite. What we can say at this point is that, if (as of this writing) there is an available product that does the type of intelligence support and correlation just described, the authors are unaware of it, and many companies are desperately seeking it. Regardless of whether you can buy such a solution or have to build one yourself, it is important to understand that before you go out and start buying things, there needs to be an architectural plan for how you are going to support those mission activities and the type of intelligence you just decided you need.

6.1. How – labor options, or “how much do I do myself?”

In addition to technological implementation options, there is also the question of who is going to do the work. Some of the largest organizations, those that enjoy substantial budgets and executive buy-in on the importance of threat intelligence, can dare to look at one extreme end of the spectrum: building everything they need themselves.

If your company fits this description, you can task or hire a whole bunch of people to select technologies, cobble them together, and build the Frankenstein box, and then bring in a completely different group of people to run it, since it is rare for coders and engineers to also be experienced threat analysts and investigators. Some people will go for the absolute build method, which entails cobbling together the technology, selecting all the vendors, ingesting all the data, normalizing it all, visualizing it, putting the analysts and investigators in front of it, and sourcing everything except the feeds in-house. That is a great model for getting a system that does exactly what you ask of it.

Indeed, the “build everything” model offers the best fit precisely because it is absolutely tailored to your mission activities, your network environment, and your organization. It will provide exactly the level of reporting you want, because you decide what it does and how it works, and you are building the whole thing yourselves. Unfortunately, this level of detail and comprehensiveness does not come easily. It carries the highest cost of ownership, and it is the hardest, longest, and slowest to build out, launch, and get value from. In addition, do not forget that you will need both the skills and the people to build the system as well as those to use and operate it. As noted above, those are usually at least two different groups of people, a fact that comes with its share of complications.

At the opposite end of the spectrum, you can outsource everything, looking for off-the-shelf components, feeds, tools, and/or analyst support from a contractor or a security or intelligence shop. In the simplest form, you can say, “Here is the output I require; just send me the finished product, whatever that is, so that I can take some kind of action, or modify my posture or my business operations.” Organizations taking this route want the intelligence at the end of the production process from an outsourced provider and build nothing at all themselves. This can be up and running very quickly. It requires neither hiring nor having the necessary people or skills in-house, and it lowers your cost of ownership. However, like anything bought off-the-shelf, you get what the market has, and it may not be everything you need.

The third option, of course, is to combine both ideas. Whatever options you are considering, it is imperative that you use some form of framework or rigorous construct to evaluate the pros, cons, and tradeoffs. Here, just as an example, we provide a simple matrix that looks at various axes of evaluation for the differing models, a useful framework because it goes beyond the technological architecture and ensures you also look at who is going to do the work.

(Figure: sample evaluation matrix comparing the “build it all,” outsourced, and hybrid models.)

The planning stage must include an architectural and technological framework, but it should also consider how you plan to staff this capability. You should ask the right questions: Do we build it? Do we buy it? Do we bring it in? Do we contract it? Do we contract it while we get up and running, and then either bring the contractors on board or let them go and replace them with our own people? These are all things you have to think about, because they will all affect your timeline, your project plan, and your budget. If you cannot answer these questions, you cannot make effective budget requests, which may mean the capability never gets off the ground at all.

Whether you use this framework or another of your choice is far less important than ensuring you use some rigorous methodology for evaluating the options, tradeoffs, and advantages that each option – from “build it all” to “farm it out” to “somewhere in between” – will offer.

6.2. Implementation – the best laid plans

While talking with your peers during the planning process, you may encounter some who scratch their heads and say, “Gee, I do not even know if this whole thing is really justified.” Some may know what to do and want to do it, but have no idea how. You are likely to find a small number who are already in their own planning phases or far beyond that, that is, in the build-out phase. However far along you may or may not yet be, experience has shown that when implementation actually begins, there are some common landmines that tend to trip people up as they get going.

Here is a very typical “speed bump,” for example. Some people know what business objectives they want to accomplish. They have identified what mission activities they need to perform operationally, as well as the types of data they want to procure from various internal or external sources and vendors. In other words, they have their “why” and their “what” pretty well sorted out. The thing they have not thought about is the operational, nuts-and-bolts level: how do you actually plug these things into one another? What tool, database, or platform is going to hold all of this data, and can it actually support it? In some cases, end users sign the subscriptions, ingest the feeds (and pay for them) for six months, and then drop them on the floor because they did not figure out ahead of time how to effectively manipulate or use the data, only how to store it.

Another common sticking point is when groups start ingesting data feeds, and spending money on them, only to get stuck on the dumbest of problems: the normalization of data. One user said that he did not realize for 45 days that the reason he and his team were not getting some of the correlations they expected out of the engine was that one vendor sent its offerings with a GMT timestamp, whereas the others, all in California like the user, used Pacific-time-zone timestamps. The result was a misalignment of data that prevented some important connections from being made.
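The remedy is conceptually trivial, which is exactly why it gets overlooked: pick one canonical time zone (typically UTC) and convert every feed to it on ingest. The sketch below is purely illustrative; the timestamp formats are assumptions, not the actual vendors’ formats.

# Illustrative only: convert two vendors' local-time conventions to one canonical UTC form.
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

UTC = ZoneInfo("UTC")
PACIFIC = ZoneInfo("America/Los_Angeles")

def from_gmt_vendor(ts: str) -> datetime:
    """Vendor A (hypothetical): timestamps already in GMT/UTC, e.g. '2015-06-01 17:30:00'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=UTC)

def from_pacific_vendor(ts: str) -> datetime:
    """Vendor B (hypothetical): Pacific local time with no zone marker, e.g. '2015-06-01 10:30:00'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=PACIFIC).astimezone(UTC)

# The same moment, reported two different ways, now compares equal and correlates:
assert from_gmt_vendor("2015-06-01 17:30:00") == from_pacific_vendor("2015-06-01 10:30:00")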

Similarly, “domain” and “host” are common fields in a wide variety of data feeds, but “host” does not always mean the same thing to every vendor. Some treat “host” as synonymous with a second-level domain plus a top-level domain, for example, domain.com. Other vendors provide it as www.domain.com, a problem because that is actually a host (www) within a domain (domain.com). Some would argue that “host” is actually synonymous with IP address; others point out that an IP can host many domains, each with many hostnames. On and on it goes, and if a process ends up affected as a result, the whole thing may grind to a halt. This normalization of disparate data sources is an extremely prosaic type of challenge, yet it is a sticking point that gums up the works over and over again.
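Whatever rule you choose, it has to be explicit and applied uniformly on ingest. The sketch below illustrates one naive way to split a vendor’s “host” value into a hostname and a registered domain; the two-label heuristic is a deliberate simplification, since handling names such as example.co.uk properly requires a public-suffix list.

# Illustrative only: one naive rule for reconciling "host" vs. "domain" fields.
# Real pipelines should consult a public-suffix list (e.g. the publicsuffix2 package)
# rather than the two-label heuristic shown here.
def split_host_field(value: str) -> dict:
    """Split a vendor's 'host' value into a hostname and a registered domain."""
    hostname = value.lower().rstrip(".")
    labels = hostname.split(".")
    registered = ".".join(labels[-2:]) if len(labels) >= 2 else hostname
    return {"hostname": hostname, "registered_domain": registered}

print(split_host_field("www.domain.com"))  # {'hostname': 'www.domain.com', 'registered_domain': 'domain.com'}
print(split_host_field("domain.com"))      # {'hostname': 'domain.com', 'registered_domain': 'domain.com'}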

Another extremely common topic that was hinted at a little earlier is the question of skill sets. Do you have the know-how to both build the thing and run it? These two processes typically require two very different groups of people, so you definitely need to take this into consideration.

Finally, suppose everything is up and running. You are getting exactly the data that you need; you are transmogrifying it exactly the way you want, and you are actually getting good information out of the systems and people you have put in place. The question is: Do they actually know what to do when something bad happens? When the magic box and the team of geniuses discover that something has gone wrong, do they have the checklists, procedures, escalations, and phone trees to spring into action appropriately?

Going back to the definition of intelligence as actionable – if the professionals do not know what to do next, and you have not thought through contingency plans or a “what to do” checklist ahead of time, then all the work up to that point is for naught. With a plan in hand, however, the organization will reap the benefits of all the work that went into everything up to that moment. The experts have to know what to do when the unfortunate does occur. For example, do they unplug an infected machine from the network when they realize it is compromised? (An obvious choice, and one a nontechnical executive is almost sure to ask about.) Or do they leave it running (despite the risk of data loss)? “Why in the world would you do that?” asks the executive. Answer: because many sophisticated types of malware self-destruct when decoupled from the internet, precisely to frustrate forensic activities once they have been discovered. Unplug the box and you may lose all evidence of what happened or who is after you. There are many such thorny scenarios to think through – procedures, planning, and checklists are a critical component of ensuring that once the capability is in operation, it delivers on its mission when the moment actually comes.