# Information Security Management Handbook, Sixth Edition (2012)

### DOMAIN 5: CRYPTOGRAPHY

*Cryptographic Concepts, Methodologies, and Practices*

### Chapter 19. Format Preserving Encryption

Ralph Spencer Poore

**Introduction**

When an application encrypts data using a standard block cipher (e.g., TDES, IDEA, or AES),*** or all but the special class of algorithms know as format preserving encryption (FPE), the process changes the format to that of a binary string, usually of fixed length. If a datum were a social security account number (SSAN), e.g., with a format that is **NNN-NN-NNNN**, an AES 128-bit electronic codebook mode (ECB) encryption might look (in hexadecimal) like **3E07D4719AF32558BC02411F931E51846**. If a preexisting database had a template for the SSAN (3 digits, a hyphen, 2 digits, a hyphen, 4 digits) taking up 12 bytes of storage, then the encrypted value (32-hexadecimal digits or 16 ASCII characters) would be wrong in length and type. This poses a challenge for legacy systems where a complete redesign of data structures is not a business’ first choice. In some applications, the data must pass through intervening service providers who have standardized on a message format that does not allow changes. In both of these cases, to protect the data through encryption requires a mechanism that does not violate the format requirements. FPE becomes the obvious choice. However, not all encryption techniques that preserve formatting are equally secure.

In *Figure 19.1*, an overview of the generic FPE process is shown. The plaintext datum and the ciphertext datum are identical in length and data type and may be identical in all aspects of format depending on the implementation. The key and (in some implementations) additional variable input are also used in the cipher process.

Conceptually, it is a reversible mapping of a plaintext datum to a ciphertext datum of the same format. Because, in theory, the cryptographic algorithm used in the cipher process could be any keyed, reversible algorithm, an almost limitless number of schemes are possible that would preserve format. However, determining the security afforded by a given scheme is a Herculean (and for some, a Sisyphean!) task. For information security practitioners who are not crypto-mathematicians, relying on published standards and well-vetted implementations is essential.

**Figure 19.1 Overview of an FPE.**

**The Security Issues**

To understand the security issues, consider the protection of a primary account number (PAN). Most of these are now standardized as 16 numeric digits in the format depicted in *Figure 19.2*.

The first six digits form the bank identification number (BIN), which is composed of the major industry identifier (MII) and the issuer identification number (IIN). The next 5–12 digits (usually 9) is an account number (AC), and the last digit is a check digit based on the Luhn formula (LCD). The international standard *ISO 7812 Identification Cards—Identification of Issuers* (ISO/IEC, 2007) defines this in detail.

If the only formatting requirement were that the encrypted result must be 16 numeric digits, a direct cryptographic transformation would have 10^{16} possible inputs and 10^{16} possible outputs. However, as shown in *Figure 19.2*, the formatting rules are potentially more restrictive. For example, the BIN may have to remain cleartext to permit routing through intermediate networks. And the LCD may have to remain valid to prevent rejection by intermediate networks. This would result in only 10^{9} possible inputs and therefore outputs. 1,000,000,000 possibilities is a tiny fraction of the cryptographic key space for AES 128, which is 2^{128}>10^{38}, or 29 orders of magnitude above the billion possible outputs. An SSAN is another example that has, at best, the same 1 billion possibilities. Other sensitive data, e.g., authentication data, which includes personal identification numbers, card validation codes, and expiration dates, may have only three or four digits. Preserving format could so limit the input-to-output mapping that a cryptanalyst would no longer face the more daunting task of key space exhaustion.

**Figure 19.2 PAN.**

**Sample Implementation**

This problem of FPE has received many years of analysis. Lead researchers have included Mihir Bellare, John Black, Michael Brightwell, Moses Liskov, Michael Luby, Stefan Lucks, John Kelsey, Patarin, Charles Rackoff, Thomas Ristenpart, Ronald L. Rivest, Phillip Rogaway, Bruce Schneier, Harry E. Smith, Terence Spies, Till Stegers, and David Wagner. The Bibliography at the end of this chapter provides references used in the development of this chapter and is an excellent source of technical details for those who wish to have a more in-depth understanding.

The theoretical basis for some FPE is well described (if you are a mathematician) in the following writings:

*Ciphers with Arbitrary Finite Domains* by Black and Rogaway^{1}

*Format-Preserving Encryption* by Bellare et al.^{2}

*The FFX Mode of Operation for Format-Preserving Encryption* by Bellare et al.^{3}

*Format-Preserving Encryption: A Survey and Assessment* (draft) by Rogaway^{4}

Two broader treatments directly related to the Payment Card Industry (PCI) are

*FPE for the PCI DSS and VISA Best Practices—Data Field Encryption*^{5}

*Tokenization and Other Methods of Security for Cardholder Data*^{6}

The FFX mode is an enhancement of the Feistel Finite Set Encryption Mode (FFSEM)^{7} currently under consideration by NIST as an approved encryption mode for government use. FFSEM is a submode of FFX. Formal proofs are based on work by Luby–Rackoff,^{8} Black–Rogaway,^{1} and Patarin.^{9} Additional work in this area by Schneier and Kelsey^{10} and by Lucks^{11} document block-cipher design related to Feistel networks. The body of work supporting the mathematical basis for the FFX mode spans more than two decades.

The mathematical basis for FFX relates to two important cryptographic criteria:

1. The extent to which the output is indistinguishable from a random number generator.

2. The extent to which knowledge of previous plaintext/ciphertext pairs gives no knowledge of either ciphertext from another plaintext or plaintext from another ciphertext.

FFX is based on AES, which is a known, good deterministic random number generator. The evaluation of its outputs utilizing statistical tests does not provide any means by which to computationally distinguish it from truly random sources.^{12} This remains true for FFX over the domain used by the input and output sets (e.g., digits).

Another approach to format preservation was described in *Using Datatype-Preserving Encryption to Enhance Data Warehouse Security* by Brightwell and Smith.^{13} This uses indexing and shuffling based on a symmetric key and both cipher-feedback and data-feedback mechanisms. The result is a complex mapping within the domain of the datum’s format, e.g., digits, alphabetic characters, or printable ASCII. It is not based on the same mathematical foundation as FFSEM or FFX. As such, the mathematical proofs of security that apply to Feistel-based FPE may not apply to datatype-preserving encryption (DPE).

Similar to FFX, a generic format-preserving symmetric encryption algorithm designated BPS by Brier et al.^{14} can cipher short or long strings of characters from any given set. Unlike DPE, this algorithm can claim strong security proofs similar to those of FFX.

**Residual Risk**

The best cryptographic scheme ensures that, given a known plaintext/ciphertext pair, no cryptanalytic method exists that will transform an unknown ciphertext into its associated plaintext more efficiently than key-space exhaustion. However, systems that use cryptography rarely protect against the possibility of a perpetrator creating chosen plaintext/ciphertext pairs. Unless the perpetrator’s ability to do this is limited by the number of such pairs she is permitted to create, a data-space exhaustion attack remains a risk. In the case of a four-digit PIN, e.g., only 0000 through 9999 are possible values, i.e., 10,000 pairs would produce a dictionary that is 100 percent effective. This is why a PIN in financial services applications is, by standard, encrypted in a PIN block that includes the PAN. The resulting block has over 10^{20} possible outputs based on an exhaustion of potential inputs. This remains substantially lower than the number of attempts for key-space exhaustion (i.e., two-key TDES has a key space of 2^{112}>10^{33}), but is an improvement by many orders of magnitude.

The exhaustion of the datum-space is a risk independent of the cryptographic system used. Organizations must implement other controls, e.g., limiting the number of incorrect attempts or introducing time delays that make such attacks require exponential time, to address this.

**Conclusion**

FPE provides a means for addressing the need to protect sensitive data elements without adversely impacting databases, message formats, and format-sensitive applications. Care is needed in selecting an FPE that provides the level of cryptographic security required. Methods in addition to cryptography remain essential to address residual risks.

**References**

1. Black, J. and Rogaway, P. Ciphers with Arbitrary Finite Domains. *Topics in Cryptology—CT-RSA 2002*. LNCS 2271, Springer-Verlag, 2002, 114–130.

2. Bellare, M., Ristenpart, T., Rogaway, P., and Stegers, T. Format-preserving encryption, *Selected Areas in Cryptography (SAC 2009)*. LNCS 5867, Springer, 2009, 295–312.

3. Bellare, M., Rogaway, P., and Spies, T. The FFX mode of operation for format-preserving encryption, 2010. *http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf*.

4. Rogaway, P. Format-preserving encryption: A survey and assessment (draft), 2009.

5. Poore, R. S. *Format Preserving Encryption for the PCI DSS and VISA Best Practices—Data Field Encryption*. Cryptographic Assurance Services, LLC, December 2009.

6. Stapleton, J. and Poore, R. S. Tokenization and other methods of security for cardholder data. In *Proceedings of Information Security Journal: A Global Perspective*, 91–99, 2011.

7. Spies, T. Feistel finite set encryption mode. *http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffsem/ffsem-spec.pdf*, 2008.

8. Luby, M. and Rackoff, C. How to construct pseudo-random permutations from pseudo-random functions. In *Advances in Cryptology—CRYPTO 85* (Santa Barbara, CA)*,* LNCS 218, H. C. Williams, Ed. Springer-Verlag New York, New York, p. 447, 1986.

9. Patarin, J. Luby-Rackoff: 7 rounds are enough for 2^{n(1}^{‐}ε^{)} security. *Advances in Cryptology—CRYPTO 2003*, LNCS 2729, 513–529, 2003.

10. Schneier, B. and Kelsey, J. Unbalanced Feistel networks and block-cipher design. *Fast Software Encryption, Third International Workshop Proceedings*, February 1996, Springer-Verlag, 1996, pp. 121–144.

11. Lucks, S. Faster Luby-Rackoff ciphers. *Fast Software Encryption (FSE 1996)*, LNCS 1039, Springer, Heidelberg, pp. 180–203, 1996.

12. Soto, J., Jr. NIST IR 6390—Randomness testing of the advanced encryption standard candidate algorithms. National Institute of Standards and Technology, Gaithersburg, MD, September 1999. *http://csrc.nist.gov/groups/ST/toolkit/rng/documents/AES-REPORT2.doc*.

13. Brightwell, M. and Smith, H. E. Using datatype-preserving encryption to enhance data warehouse security. *20th NISSC Proceedings*, Baltimore, MD, 1997.

14. Brier, E., Peyrin, T., and Stern, J. BPS: A format-preserving encryption proposal. *http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/bps/bps-spec.pdf*.

15. Dworkin, M. Recommendation for block cipher modes of operation: Methods and techniques. NIST Special Publication 800-38A. NIST, Washington, DC, 2001.

16. Liskov, M., Rivest, R. L., and Wagner, D. Tweakable block ciphers, *Advances in Cryptology—CRYPTO 2002.* LNCS 2442, Springer, 2002, 31–46.

17. Morris, B., Rogaway, P., and Stegers, T. How to encipher messages on a small domain deterministic encryption and the Thorp shuffle. *Advances in Cryptology—CRYPTO 2009*. LNCS 5677, Springer, 2009, 286–302.

18. Schadd, J. Use of the advanced encryption standard (AES) encryption algorithm in cryptographic message syntax (CMS). RFC 3565. The Internet Society, 2003. *http://tools.ietf.org/html/rfc3565*.

19. Stapleton, J. and Poore, R. S. Registry of approved cryptographic resources for financial services industry standard—Registry number: 00002—Advanced Encryption Standard (AES). ANSI X9 SD-34-2009.

20. Stapleton, J. and Poore, R. S. Announcing the advanced encryption standard (AES). Federal Information Processing Standards (FIPS) Publication 197. U.S. DoC/NIST, 2001.

21. Stapleton, J. and Poore, R. S. Information technology—Security techniques—Encryption algorithms—Part 3: Block ciphers. ISO/IEC 18033-3:2005.

22. Stapleton, J. and Poore, R. S. VISA best practices—Data field encryption. Version 1.0 (October 5, 2009).

*** Triple Data Encryption Standard (TDES): Based on the Data Encryption Standard (DES), this is the application of DES three times with one or more keys in an encrypt, decrypt, encrypt sequence; International Data Encryption Algorithm (IDEA) is a block cipher designed by James Massey of ETH Zurich and Xuejia Lai; Advance Encryption Standard (AES):*http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf*.