Information Security Management Handbook, Sixth Edition (2012)

DOMAIN 3: INFORMATION SECURITY AND RISK MANAGEMENT

Security Management Planning

Chapter 11. CERT Resilience Management Model: An Overview

Bonnie A. Goins Pilewski and Christopher Pilewski

The CERT® Resilience Management Model (CERT-RMM) is a process model that seeks to improve the management of risk and maintain operational resilience for an organization. It does this by aligning three disciplines: business continuity management, IT operations management, and security management. It also brings the concepts of quality and process management into the organization. CERT defines quality as “the extent to which an organization controls its ability to operate in a mission-driven, complex risk environment” [CMMI Product Team 2006].

RMM presents the disciplines above as processes, which allows the organization to apply process improvement mechanisms and to develop a basis for metrics and measurement. As most security professionals have experienced within their careers, it is difficult at best to craft meaningful metrics for security implementation; as such, any tool that would assist in this capacity is very welcome indeed! RMM also provides a unified framework for organizing the resilience work that is performed within the organization. As is true of process maturity models such as Capability Maturity Model Integration (CMMI), RMM provides a base for process institutionalization and organizational process maturity.

CERT-RMM v1.0 contains 26 process areas that cover four areas of operational resilience management: enterprise management, engineering, operations, and process management. The practices focus on the activities that an organization performs to actively direct, control, and manage operational resilience. The model does not prescribe specifically how an organization should secure information. Instead, it focuses on identifying critical information assets, making decisions about the activities and controls required to protect and sustain these assets, implementing strategies to achieve asset control, and maintaining control throughout the life of the assets.

The process areas and their tags are listed in Table 11.1.

The model is managed much the same as the CMMI and includes the following levels for measurements:

Level 0: Incomplete

Level 1: Performed

Level 2: Managed

Level 3: Defined

Levels 4 and 5: Quantitatively Managed and Optimizing

Table 11.1 Process Area Tags

Process Area | Tag
Asset Definition and Management | ADM
Access Management | AM
Communications | COMM
Compliance | COMP
Controls Management | CTRL
Environmental Control | EC
Enterprise Focus | EF
External Dependencies Management | EXD
Financial Resource Management | FRM
Human Resource Management | HRM
Identity Management | ID
Incident Management and Control | IMC
Knowledge and Information Management | KIM
Measurement and Analysis | MA
Monitoring | MON
Organizational Process Definition | OPD
Organizational Process Focus | OPF
Organizational Training and Awareness | OTA
People Management | PM
Risk Management | RISK
Resilience Requirements Development | RRD
Resilience Requirements Management | RRM
Resilient Technical Solution Engineering | RTSE
Service Continuity | SC
Technology Management | TM
Vulnerability Analysis and Resolution | VAR

As stated in the CERT-RMM Report, RMM includes a process definition, expressed in capability areas across the four RMM framework competencies (enterprise management, engineering, operations management, and process management); a focus on the resiliency of four essential operational assets (people, information, technology, and facilities); and processes and practices that define a scale of five capability levels for each capability area (incomplete, performed, managed, directed, and continuously improved). It also aligns easily with, and references, common codes of practice such as ISO 27000, ITIL, and COBIT, as well as others such as BS 25999 and ISO 24762.

RMM also includes quantitative process metrics and measurements that can be used to ensure that operational resiliency processes perform as intended.

Key Components of the RMM

RMM capability areas define the resiliency engineering process. Each capability area has a set of goals. Goals are required elements of the capability area. An example of a goal from the Service Continuity (SC) capability area is “SC-1 Prepare for Service Continuity.” These goals are broken down into specific practices. Specific practices are considered to be the “base practices” of the capability. An example of a specific practice from the SC capability area is “SC-1.1 Plan for Service Continuity,” which is a practice aimed at completing the goal “SC-1 Prepare for Service Continuity.”

These practices are also broken down into subpractices. Subpractices are neither specific nor detailed, but help the user determine how specific practices are implemented and how this helps achieve the goals of the capability area. Each organization will have its own subpractices either organically developed by the organization or acquired from a code of practice.

Subpractices can be linked to common codes of practice. Subpractices are typically generic in nature, while codes of practice can be very specific. For example, a subpractice may suggest “set password standards and guidelines” while a specific code of practice may state that “passwords should be changed in no longer than 90 day intervals.”
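The hierarchy described above (capability area, goals, specific practices, subpractices) can be pictured as a small nested data structure. The following is only an illustrative sketch: the class and field names are assumptions made for this example and are not part of RMM. The identifiers and titles come from the Service Continuity example in the text, and the password statements are the ones quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subpractice:
    description: str
    # Optional link to a more specific statement from a code of practice.
    code_of_practice: Optional[str] = None


@dataclass
class SpecificPractice:
    identifier: str
    title: str
    subpractices: List[Subpractice] = field(default_factory=list)


@dataclass
class Goal:
    identifier: str
    title: str
    practices: List[SpecificPractice] = field(default_factory=list)


@dataclass
class CapabilityArea:
    tag: str
    name: str
    goals: List[Goal] = field(default_factory=list)

    def practice_ids(self) -> List[str]:
        """List every specific practice identifier under this area's goals."""
        return [p.identifier for g in self.goals for p in g.practices]


# The Service Continuity example from the text. Subpractices are left empty
# because each organization supplies its own.
service_continuity = CapabilityArea(
    tag="SC",
    name="Service Continuity",
    goals=[
        Goal(
            identifier="SC-1",
            title="Prepare for Service Continuity",
            practices=[
                SpecificPractice("SC-1.1", "Plan for Service Continuity"),
            ],
        )
    ],
)

# The generic subpractice and the specific code-of-practice statement
# quoted in the text, linked together.
password_subpractice = Subpractice(
    description="Set password standards and guidelines",
    code_of_practice="Passwords should be changed in no longer than 90 day intervals",
)

print(service_continuity.practice_ids())
```

Keeping the code-of-practice statement as an optional field on the subpractice mirrors the text: the subpractice stays generic, and the organization attaches whatever specific standard it has adopted.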

Examples of common codes of practice, as described in the RMM Report, are detailed next.

BS 25999

BS 25999 is the British Standards Institution’s (BSI’s) code of practice and specification for business continuity management. The purpose of the standard is to provide a basis for understanding, developing, and implementing business continuity within an organization and to provide confidence in the organization’s dealings with customers and other organizations.

There are two BS 25999 documents: the Code of Practice, BS 25999-1:2006 [BSI 2006], and the specification, BS 25999-2:2007 [BSI 2007].

COBIT

COBIT is the Control Objectives for Information and Related Technology [ITGI 2007]. It was developed by the Information Systems Audit and Control Association (ISACA) and the IT Governance Institute (ITGI) to provide managers, auditors, and IT users with generally accepted information technology control objectives to maximize IT benefits and ensure appropriate IT governance, security, and control.

References are also made to Val IT [ITGI 2006] in this document. Val IT is a reference framework that addresses the governance of IT-enabled business investments.

COSO Enterprise Risk Management

In 2004, the Committee of Sponsoring Organizations of the Treadway Commission (COSO) issued an enterprise risk management framework to help organizations enhance their corporate governance and risk management activities [COSO 2004]. The ERM integrated framework provides a broader risk management view that encompasses COSO’s original focus on internal controls.

CMMI

CMMI® is a process improvement maturity model for the development of products and services. It has several constellations or areas of interest that provide application-specific models that share common content.

The CMMI for Development (CMMI-DEV) represents the systems and software development domain [CMMI 2006]. In addition, the CMMI for Services (CMMI-SVC) constellation is represented by a draft CMMI model designed to cover the activities required to manage, establish, and deliver services [SEI 2007].

DRJ/DRII GAP

The DRJ/DRII GAP (Generally Accepted Practices) is put forth jointly by the Disaster Recovery Journal (DRJ) and the Disaster Recovery Institute International (DRII) [DRJ 2007]. GAP is a set of identified and documented standards and guidelines that aim to create a depository of knowledge by and for the business continuity profession. The practices are aligned with DRII’s 10 areas of professional practice, as detailed in the DRII Professional Practice Guidelines.

FFIEC

The Federal Financial Institutions Examination Council (FFIEC) publishes a series of booklets that comprise the FFIEC Information Technology Examination Handbook. These booklets are published to help bank examiners evaluate the risk management processes of financial institutions and service providers, with the goal of ensuring the availability of critical financial services.

ISO/IEC 20000-2: 2005 (E)

ISO/IEC 20000 is a standard and code of practice for IT service management published by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). It is based on (and supersedes) the earlier British Standard BS 15000. It reflects the best practice guidance for IT service management as provided in the ITIL (Information Technology Infrastructure Library) framework, but also broadly covers other service management standards.

ISO/IEC 24762: 2008

ISO/IEC 24762, “Guidelines for information and communications technology disaster recovery services” [ISO/IEC 2008], is part of the business continuity management standards published by ISO/IEC. It can be applied in-house or to outsourced providers of disaster recovery physical facilities and services.

ISO/IEC 27002: 2005

ISO/IEC 27002, “Code of Practice for Information Security Management” [ISO/IEC 2005b], is also published by ISO/IEC. It is part of a growing “27000 series” that evolved from the original British Standard BS 7799, which was translated to ISO standard ISO 17799.

NFPA 1600

NFPA 1600 is the National Fire Protection Association Standard on Disaster/Emergency Management and Business Continuity Programs [NFPA 2007]. It is primarily focused on the development, implementation, and operation of disaster, emergency, and business continuity programs, including the development of various types of related plans. The 2007 edition of this standard was used for reference and is an update of the 2004 standard.

Crossmapping

Materials are available that demonstrate the relationship among existing standards, their constituent relationships, and the RMM framework.

Figure 11.1 illustrates the relationship of RMM to these bodies of knowledge.

Figure 11.1 Relationship of CERT-RMM to CMMI process areas and bodies of knowledge.

Process Areas

Table 11.2 represents the process areas of the RMM, by category.

These process areas also have equivalents in other process models, such as CMMI. This allows the user to align resiliency processes with ongoing work in the integration activities of the organization.

Table 11.2 Process Areas by Category

Category | Process Area
Engineering | Asset Definition and Management
Engineering | Controls Management
Engineering | Resilience Requirements Development
Engineering | Resilience Requirements Management
Engineering | Resilient Technical Solution Engineering
Engineering | Service Continuity
Enterprise Management | Communications
Enterprise Management | Compliance
Enterprise Management | Enterprise Focus
Enterprise Management | Financial Resource Management
Enterprise Management | Human Resource Management
Enterprise Management | Organizational Training and Awareness
Enterprise Management | Risk Management
Operations | Access Management
Operations | Environmental Control
Operations | External Dependencies Management
Operations | Identity Management
Operations | Incident Management and Control
Operations | Knowledge and Information Management
Operations | People Management
Operations | Technology Management
Operations | Vulnerability Analysis and Resolution
Process Management | Measurement and Analysis
Process Management | Monitoring
Process Management | Organizational Process Definition
Process Management | Organizational Process Focus
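For readers who want to work with the categorization programmatically, for example to tag assessment findings by category, the grouping in Table 11.2 can be captured as a simple lookup. This is a minimal sketch: the tags are those of Table 11.1, while the variable and function names are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List

# Process area tags (Table 11.1) grouped by category (Table 11.2).
PROCESS_AREAS_BY_CATEGORY: Dict[str, List[str]] = {
    "Engineering": ["ADM", "CTRL", "RRD", "RRM", "RTSE", "SC"],
    "Enterprise Management": ["COMM", "COMP", "EF", "FRM", "HRM", "OTA", "RISK"],
    "Operations": ["AM", "EC", "EXD", "ID", "IMC", "KIM", "PM", "TM", "VAR"],
    "Process Management": ["MA", "MON", "OPD", "OPF"],
}


def category_of(tag: str) -> str:
    """Return the category that contains the given process area tag."""
    for category, tags in PROCESS_AREAS_BY_CATEGORY.items():
        if tag in tags:
            return category
    raise KeyError(f"unknown process area tag: {tag}")


# Sanity check against the text: CERT-RMM v1.0 has 26 process areas.
total = sum(len(tags) for tags in PROCESS_AREAS_BY_CATEGORY.values())
print(total, category_of("SC"))
```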

The alignment between CMMI and RMM is represented in Table 11.3.

RMM also contains the generic goals and practices that the organization implements to improve its organizational processes and capability to manage its environment toward resiliency of its operations. These practices exhibit the organization’s commitment and ability to perform resilience management processes, as well as its ability to measure performance and verify implementation. Generic processes are detailed in the RMM for use where noted.

Table 11.3 CMMI to RMM Alignment

CMMI Models Process Areas | Equivalent CERT-RMM Process Areas
CAM – Capacity and Availability Management (CMMI-SVC only) | TM – Technology Management. CERT-RMM addresses capacity management from the perspective of technology assets. It does not address the capacity of services. Availability management is a central theme of CERT-RMM, significantly expanded from CMMI-SVC. Service availability is addressed in CERT-RMM by managing the availability requirement for people, information, technology, and facilities. Thus, the process areas that drive availability management include RRD – Resilience Requirements Development (where availability requirements are established); RRM – Resilience Requirements Management (where the life cycle of availability requirements is managed); EC – Environmental Control (where the availability requirements for facilities are implemented and managed); and KIM – Knowledge and Information Management (where the availability requirements for information are implemented and managed).
IRP – Incident Resolution and Prevention (CMMI-SVC only) | IMC – Incident Management and Control. In CERT-RMM, IMC expands IRP to address a broader incident management system and incident life cycle at the asset level.
MA – Measurement and Analysis | MA – Measurement and Analysis is carried over intact from CMMI. In CERT-RMM, MA is directly connected to MON.
OPD – Organizational Process Definition | OPD – Organizational Process Definition is carried over from CMMI.
OPF – Organizational Process Focus | OPF – Organizational Process Focus is carried over intact from CMMI.
OT – Organizational Training | OTA – Organizational Training and Awareness. OT is expanded to include awareness activities in OTA.
REQM – Requirements Management | RRM – Resilience Requirements Management. Basic elements of REQM are included in RRM, but the focus is on managing the resilience requirements for assets and services.
RD – Requirements Development | RRD – Resilience Requirements Development. Basic elements of RD are included in RRD.
RSKM – Risk Management | RISK – Risk Management. Basic elements of RSKM are reflected in RISK.
SAM – Supplier Agreement Management | EXD – External Dependencies Management. In CERT-RMM, SAM is expanded to address all external dependencies.
SCON – Service Continuity (CMMI-SVC only) | SC – Service Continuity. In CERT-RMM, SC is positioned as an operational risk management activity that addresses what is required to sustain assets and services balanced
TS – Technical Solution | RTSE – Resilient Technical Solution Engineering
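An organization that already tracks CMMI process areas may find it convenient to hold the alignment in Table 11.3 as a two-way lookup. The sketch below is one possible encoding, not an official mapping artifact: the dictionary and function names are assumptions for illustration, and the CAM entry lists TM together with the four process areas the table names as driving availability management.

```python
from typing import Dict, List

# CMMI process area tag -> CERT-RMM process area tags, per Table 11.3.
CMMI_TO_RMM: Dict[str, List[str]] = {
    "CAM": ["TM", "RRD", "RRM", "EC", "KIM"],  # capacity via TM; availability via the rest
    "IRP": ["IMC"],
    "MA": ["MA"],
    "OPD": ["OPD"],
    "OPF": ["OPF"],
    "OT": ["OTA"],
    "REQM": ["RRM"],
    "RD": ["RRD"],
    "RSKM": ["RISK"],
    "SAM": ["EXD"],
    "SCON": ["SC"],
    "TS": ["RTSE"],
}


def rmm_equivalents(cmmi_tag: str) -> List[str]:
    """Return the CERT-RMM process areas aligned with a CMMI process area."""
    return CMMI_TO_RMM.get(cmmi_tag, [])


def cmmi_sources(rmm_tag: str) -> List[str]:
    """Return the CMMI process areas that align with a CERT-RMM process area."""
    return sorted(c for c, rmm in CMMI_TO_RMM.items() if rmm_tag in rmm)


print(rmm_equivalents("SAM"))
print(cmmi_sources("RRM"))
```

The reverse lookup is useful when a resilience team wants to know which existing CMMI work products might be reused for a given CERT-RMM process area.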

The alignment of generic practice to process area and its subsequent implementation in the organization is given in Table 11.4.

Engineering Process

Given that aspects of operational resilience management are requirements-driven, process areas in the Engineering category represent those that are focused on establishing and implementing resilience for organizational assets and business processes. These processes establish the basic building blocks for resilience and create the foundation to protect and sustain the assets.

Engineering process areas fall into three broad categories:

Requirements Management addresses the development and management of the security (protect) and resilience (sustain) objectives for assets and services.

Asset Management establishes the important people, information, technology, and facilities as assets present in the organization.

Establishing and Managing Resilience addresses the selection, implementation, and management of preventive controls. In addition, it addresses the development and implementation of SC and impact management plans and programs. It also recommends the consideration of resilience for software and systems early in the development life cycle.

Table 11.4 Generic Process Mapped to Related Process Area

Generic Practice | Related Process Area | How the Process Area Helps Implement the Generic Practice
GG2.GP1 Establish process governance | Enterprise Focus | Enterprise Focus addresses the governance aspect of managing operational resilience. Mastery of the Enterprise Focus process area can help achieve GG2.GP1 in other process areas.
GG2.GP3 Provide resources | Human Resource Management; Financial Resource Management | Human Resource Management ensures that resources have the proper skill sets and their performance is consistent over time. Financial Resource Management addresses the provision of other resources to the process, such as financial capital.
GG2.GP4 Train people | Organizational Training and Awareness | Organizational Training and Awareness ensures that resources are properly trained.
GG2.GP8 Monitor and control the process | Monitoring; Measurement and Analysis | Monitoring provides the structure and process for identifying and collecting relevant information for controlling processes. Measurement and Analysis provides general guidance about measuring, analyzing, and recording information that can be used in establishing measures for monitoring actual performance of the process [CMMI 2007].
GG2.GP10 Review status with higher level managers | Enterprise Focus | As part of the governance process, Enterprise Focus requires oversight of the resilience process including identifying corrective actions.
GG3.GP1 Establish a defined process | Organizational Process Definition | Organizational Process Definition establishes the organizational process assets necessary to implement the generic practice [CMMI 2007].
GG3.GP2 Collect improvement information | Organizational Process Definition; Organizational Process Focus | Organizational Process Definition establishes the organizational process assets. Organizational Process Focus addresses the incorporation of experiences into the organizational process assets [CMMI 2007].

The Engineering process areas include:

Requirements Management: Resilience Requirements Development (RRD) and Resilience Requirements Management (RRM)

Asset Management: Asset Definition and Management (ADM)

Establishing and Managing Resilience: Controls Management (CTRL), Resilient Technical Solution Engineering (RTSE), and Service Continuity (SC)

Operations

The Operations process areas represent the core activities for managing the operational resilience of assets and services during the operations phase of their life cycle. These process areas are focused on maintaining an acceptable level of operational resilience as determined by the organization. They represent core security, business continuity, and IT operations/service delivery management activities. Areas of focus include the resilience of people, information, technology, and facilities assets.

Operations process areas fall into three broad categories:

Supplier Management addresses the management of external dependencies and the potential impact on the organization’s operational resilience.

Threat, Vulnerability, and Incident Management addresses the organization’s continuous cycle of identifying and managing threats, vulnerabilities, and incidents to minimize organizational disruption.

Asset Resilience Management addresses the asset-level activities that the organization performs to manage operational resilience of people, information, technology, and facilities to ensure that business processes and services are sustained.

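The Operations process areas are:

Supplier Management: External Dependencies Management (EXD)

Threat, Vulnerability, and Incident Management: Access Management (AM), Identity Management (ID), Incident Management and Control (IMC), and Vulnerability Analysis and Resolution (VAR)

Asset Resilience Management: Environmental Control (EC), Knowledge and Information Management (KIM), People Management (PM), and Technology Management (TM)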

Model Relationships

To understand how the elements of the process model translate to relationships in the environment, a number of maps have been created. These maps are depicted below.

Figure 11.2 Enterprise-level relationships.

People

Figure 11.3 shows the CERT-RMM process areas that participate in managing the operational resilience of people. They establish people as an important asset in service delivery and ensure that people meet job requirements and standards, have appropriate skills, are appropriately trained, and have access to other assets as needed to do their jobs.

Information

Figure 11.4 shows the CERT-RMM process areas that drive the operational resilience management of information. Information is established as a key element in service delivery. Requirements for protecting and sustaining information are established and used by processes such as risk management, controls management, and SC planning.

Figure 11.3 Relationships that drive the resilience of people.

Technology

Figure 11.5 shows the CERT-RMM process areas that drive the operational resilience management of technology. These relationships address the complexities of software and systems resilience, as well as the resilience of architectures where the technology assets live, their development and acquisition processes, and other processes such as configuration management and capacity planning and management.

Facilities

Figure 11.6 shows the CERT-RMM process areas that drive the operational resilience management of facilities. As with information and technology assets, relationships that drive the resilience of facilities have special considerations, such as protecting facilities from disruption, ensuring that facilities are sustained, managing the environmental conditions of facilities, determining the dependencies of facilities on their geographical region, and planning for the decommissioning of a facility. Because facilities are often owned and managed by an external party, consideration must also be given to how external parties implement and manage the resilience of facilities at the direction of the organization.

Figure 11.4 Relationships that drive information resilience.

Understanding Capability Levels

Like the standards that were described earlier in this chapter, CERT-RMM is not a prescriptive model. Process improvement is unique to each organization and, as such, CERT-RMM provides the basic structure to allow organizations to chart their own specific improvement path using the model as the basis.

Improvement paths are defined in RMM by capability levels. Levels characterize improvement from a poorly defined state to a state where processes are characterized and used consistently across the organization.

Figure 11.5 Relationships that drive technology resilience.

To reach a particular level, an organization must satisfy all of the relevant goals of the process area (or a set of process areas), as well as the generic goals that apply to the specific capability level. The structure of the continuous representation for CERT-RMM is provided in Table 11.5.

Connecting Capabilities to Process Maturity

Capability levels describe the degree to which a process has been institutionalized. Likewise, the degree to which a process is institutionalized is defined by the generic goals and practices. Table 11.5 links capability levels to the progression of processes and generic goals.

The progression of capability levels and the degree of process adoption are characterized by the following descriptions.

Capability Level 0: Incomplete

An incomplete process is a process that either is not performed or is partially performed. As a result, one or more of the specific goals of the process area are not satisfied [CMMI 2007].

Figure 11.6 Relationships that drive facilities.

Capability Level 1: Performed

Capability Level 1 characterizes a performed process. A performed process is a process that satisfies all of the specific goals of the process area. It also supports work needed to perform RMM practices as defined by the specific goals. Although achieving Capability Level 1 results in important improvements, those improvements can be lost over time if they are not adopted [CMMI 2007].

Table 11.5 Capability Levels Related to Goals and Process Progression

Capability Level | Generic Goal | Process Progression
0 | N/A | Incomplete or no process or partially performed process
1 | GG1 | Performed process
2 | GG2 | Managed process
3 | GG3 | Defined process

Capability Level 2: Managed

As stated in the RMM Manual, a Capability Level 2 process is characterized as a managed process. A managed process is a performed process that has the basic infrastructure in place to support the process. At this level, the process is planned and executed in accordance with policy.

Corrective actions are taken when the actual results and performance deviate significantly from the plan. A managed process achieves the objectives of the plan and is adopted for consistent performance [CMMI 2007].

Organizations operating at this capability level should begin to know that they can achieve and sustain their resilience goals, even as conditions change or new threats emerge. Instead of shifting planning and practices for security and business continuity to address the next threat, the organization defines and refines its processes to address any risk that comes its way.

Capability Level 3: Defined

A Capability Level 3 process is characterized as a defined process. A defined process is a managed process that is tailored from the organization’s set of standard processes. The process also contributes work products, measures, and other process improvement information for use by all organizational units [CMMI 2007].

What does this ultimately mean to the organization? When business units operate with different goals, assumptions, and practices, it is difficult to ensure that the organization’s collective goals and objectives can be reached. This is particularly true with risk management. If the organization’s risk assumptions are not reflected consistently in security, continuity, and IT operations, the organization’s risk management process will not be effective and may actually deter operational resilience. At Capability Level 3, there is more consistency across units and improvements made by each organizational unit can be accessed and used by the organization as a whole. Another significant distinction at Capability Level 3 is that the processes are typically described more rigorously and managed more proactively than at Capability Level 2 [CMMI 2007].

To summarize, an organization that reaches higher capability levels in each process area arguably exhibits a higher degree of organizational maturity with regard to security, continuity, and IT resilience.
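The progression in Table 11.5 can be read as a cumulative rule: each level requires its own generic goal plus everything required by the levels beneath it. The following sketch encodes that reading for a single process area. It is an illustration under stated assumptions rather than an appraisal method: it treats GG1 as shorthand for satisfying all specific goals of the process area (the definition of a performed process given above), and the function and constant names are hypothetical.

```python
from typing import Set

LEVEL_NAMES = {0: "Incomplete", 1: "Performed", 2: "Managed", 3: "Defined"}

# Generic goal required to reach each level, in ascending order (Table 11.5).
GOAL_FOR_LEVEL = [(1, "GG1"), (2, "GG2"), (3, "GG3")]


def capability_level(satisfied_goals: Set[str]) -> int:
    """Return the highest level whose goal, and all lower goals, are satisfied."""
    level = 0
    for candidate, goal in GOAL_FOR_LEVEL:
        if goal not in satisfied_goals:
            break  # a missing lower-level goal blocks all higher levels
        level = candidate
    return level


# A process area that is performed and managed, but not yet defined.
level = capability_level({"GG1", "GG2"})
print(level, LEVEL_NAMES[level])
```

Note that satisfying GG3 without GG2 does not raise the level in this sketch, which reflects the statement that a defined process is a managed process tailored from the organization’s standard processes.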

Informal Diagnosis

Examples of informal diagnostic methods for CERT-RMM include: meetings or exercises in which the people who are responsible for the practices in a given process area meet, review the guidance, and discuss the extent to which the organization’s practices achieve intent; reviews or analyses performed by a single person or a small group to compare the organization’s practices to the guidance; and informal collection and review of evidence that demonstrates appropriate performance. These activities can be useful to guide informal process improvement activities or to provide information for scoping or setting capability level targets for a more formal process improvement project.

Analyzing Gaps

Diagnostic activities typically reveal gaps between the current and the required performance. Before plans are established to close any gaps, it is important to consider the identified gaps in the context of the overall improvement objectives. If it is determined that one or more of the identified gaps are acceptable to the organization, it is recommended that the organization revisit and update the objectives for improvement. This iterative approach is valuable to ensure that the organization spends improvement resources in the most productive manner.
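The iterative approach described above can be sketched as a small calculation: compare current and target capability levels per process area, then lower the target wherever the organization accepts the gap. This is a simplified illustration, not a prescribed RMM procedure. The function names and the sample levels are invented for the example; only the process area tags come from Table 11.1.

```python
from typing import Dict, Iterable


def find_gaps(current: Dict[str, int], target: Dict[str, int]) -> Dict[str, int]:
    """Return {tag: shortfall} for process areas below their target level."""
    return {
        tag: target[tag] - current.get(tag, 0)
        for tag in target
        if current.get(tag, 0) < target[tag]
    }


def accept_gaps(
    current: Dict[str, int], target: Dict[str, int], accepted: Iterable[str]
) -> Dict[str, int]:
    """Return revised targets in which accepted gaps no longer count as gaps."""
    revised = dict(target)
    for tag in accepted:
        revised[tag] = current.get(tag, 0)
    return revised


# Hypothetical diagnostic results and improvement objectives.
current_levels = {"SC": 1, "IMC": 2, "RISK": 0}
target_levels = {"SC": 2, "IMC": 2, "RISK": 2}

gaps = find_gaps(current_levels, target_levels)

# Suppose the organization decides the RISK gap is acceptable for now.
revised_targets = accept_gaps(current_levels, target_levels, ["RISK"])
remaining = find_gaps(current_levels, revised_targets)
print(gaps, remaining)
```

Recomputing the gaps against the revised objectives leaves only the work the organization actually intends to fund, which is the point of revisiting objectives before planning.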

Reference

Caralli, R. A., Allen, J. H., Curtis, P. D., White, D. W., and Young, L. R. CERT® Resilience Management Model, Version 1.0: Improving Operational Resilience Processes. Technical Report CMU/SEI-2010-TR-012, ESC-TR-2010-012, May 2010.