Symmetric Cryptography Fundamentals - Adaptive Code via C#. Agile coding with design patterns and SOLID principles (2014)

Adaptive Code via C#. Agile coding with design patterns and SOLID principles (2014)

Chapter 4. Symmetric Cryptography Fundamentals

Strong cryptography is a critical piece of information security that can be applied at many levels, from data storage to network communication. One of the most common classes of security problems people introduce is the misapplication of cryptography. It's an area that can look deceptively easy, when in reality there are an overwhelming number of pitfalls. Moreover, it is likely that many classes of cryptographic pitfalls are still unknown.

It doesn't help that cryptography is a huge topic, complete with its own subfields, such as public key infrastructure (PKI). Many books cover the algorithmic basics; one example is Bruce Schneier's classic, Applied Cryptography (John Wiley & Sons). Even that classic doesn't quite live up to its name, however, as it focuses on the implementation of cryptographic primitives from the developer's point of view and spends relatively little time discussing how to integrate cryptography into an application securely. As a result, we have seen numerous examples of developers armed with a reasonable understanding of cryptographic algorithms that they've picked up from that book, who then go on to build their own cryptographic protocols into their applications, which are often insecure.

Over the next three chapters, we focus on the basics of symmetric cryptography . With symmetric cryptography, any parties who wish to communicate securely must share a piece of secret information. That shared secret (usually an encryption key) must be communicated over a secure medium. In particular, sending the secret over the Internet is a bad idea, unless you're using some sort of channel that is already secure, such as one properly secured using public key encryption (which can be tough to do correctly in itself). In many cases, it's appropriate to use some type of out-of-band medium for communication, such as a telephone or a piece of paper.

In these three chapters, we'll cover everything most developers need to use symmetric cryptography effectively, up to the point when you need to choose an actual network protocol. Applying cryptography on the network is covered in Chapter 9.

WARNING

To ensure that you choose the right cryptographic protocols for your application, you need an understanding of these basics. However, you'll very rarely need to go all the way back to the primitive algorithms we discuss in these chapters. Instead, you should focus on out-of-the-box protocols that are believed to be cryptographically strong. While we therefore recommend that you thoroughly understand the material in these chapters, we advise you to go to the recipes in Chapter 9 to find something appropriate before you come here and build something yourself. Don't fall into the same trap that many of Applied Cryptography's readers have fallen into!

There are two classes of symmetric primitives, both of utmost importance. First are symmetric encryption algorithms, which provide for data secrecy. Second are message authentication codes (MACs), which can ensure that if someone tampers with data while in transit, the tampering will be detected. Recently, a third class of primitives has started to appear: encryption modes that provide for both data secrecy and message authentication. Such primitives can help make the application of cryptography less prone to disastrous errors.

In this chapter, we will look at how to generate, represent, store, and distribute symmetric-key material. In Chapter 5, we will look at encryption using block ciphers such as AES, and in Chapter 6, we will examine cryptographic hash functions (such as SHA1) and MACs.

TIP

Towards the end of this chapter, we do occasionally forward-reference algorithms from the next two chapters. It may be a good idea to read Recipe 5.1 through Recipe 5.4 and Recipe 6.1 through Recipe 6.4 before reading Recipe 4.10 through Recipe 4.14.

4.1. Representing Keys for Use in Cryptographic Algorithms

Problem

You need to keep an internal representation of a symmetric key. You may want to save this key to disk, pass it over a network, or use it in some other way.

Solution

Simply keep the key as an ordered array of bytes. For example:

/* When statically allocated */

unsigned char *key[KEYLEN_BYTES];

/* When dynamically allocated */

unsigned char *key = (unsigned char *)malloc(KEYLEN_BYTES);

When you're done using a key, you should delete it securely to prevent local attackers from recovering it from memory. (This is discussed in Recipe 13.2.)

Discussion

While keys in public key cryptography are represented as very large numbers (and often stored in containers such as X.509 certificates), symmetric keys are always represented as a series of consecutive bits. Algorithms operate on these binary representations.

Occasionally, people are tempted to use a single 64-bit unit to represent short keys (e.g., a long long when using GCC on most platforms). Similarly, we've commonly seen people use an array of word-size values. That's a bad idea because of byte-ordering issues. When representing integers, the bytes of the integer may appear most significant byte first (big-endian) or least significant byte first (little-endian). Figure 4-1 provides a visual illustration of the difference between big-endian and little-endian storage:

Big-endian versus little-endian

Figure 4-1. Big-endian versus little-endian

Endian-ness doesn't matter when performing integer operations, because the CPU implicitly knows how integers are supposed to be represented and treats them appropriately. However, a problem arises when we wish to treat a single integer or an array of integers as an array of bytes. Casting the address of the first integer to be a pointer to char does not give the right results on a little-endian machine, because the cast does not cause bytes to be swapped to their "natural" order. If you absolutely always cast to an appropriate type, this may not be an issue if you don't move data between architectures, but that would defeat any possible reason to use a bigger storage unit than a single byte. For this reason, you should always represent key material as an array of one-byte elements. If you do so, your code and the data will always be portable, even if you send the data across the network.

You should also avoid using signed data types, simply to avoid potential printing oddities due to sign extension. For example, let's say that you have a signed 32-bit value, 0xFF000000, and you want to shift it right by one bit. You might expect the result 0x7F800000, but you'd actually get0xFF800000, because the sign bit gets shifted, and the result also maintains the same sign.[1]

See Also

Recipe 3.2


[1] To be clear on semantics, note that shifting right eight bits will always give the same result as shifting right one bit eight times. That is, when shifting right an unsigned value, the leftmost bits always get filled in with zeros. But with a signed value, they always get filled in with the original value of the most significant bit.

4.2. Generating Random Symmetric Keys

Problem

You want to generate a secure symmetric key. You already have some mechanism for securely transporting the key to anyone who needs it. You need the key to be as strong as the cipher you're using, and you want the key to be absolutely independent of any other data in your system.

Solution

Use one of the recipes in Chapter 11 to collect a byte array of the necessary length filled with entropy.

When you're done using a key, you should delete it securely to prevent local attackers from recovering it from memory. This is discussed in Recipe 13.2.

Discussion

In Recipe 11.2, we present APIs for getting random data, including key material. We recommend using the spc_keygen( ) function from that API. See that recipe for considerations on which function to use.

To actually implement spc_keygen( ), use one of the techniques from Chapter 11. For example, you may want to use the randomness infrastructure that is built into the operating system (see Recipe 11.3 and Recipe 11.4), or you may want to collect your own entropy, particularly on an embedded platform where the operating system provides no such services (see Recipe 11.19 through Recipe 11.23).

In many cases, you may want to derive short-term keys from a single "master" key. See Recipe 4.11 for a discussion of how to do so.

Be conservative when choosing a symmetric key length. We recommend 128-bit symmetric keys. (See Recipe 5.3.)

See Also

Recipe 4.11, Recipe 5.3, Recipe 11.2, Recipe 11.3, Recipe 11.4, Recipe 11.19, Recipe 11.20, Recipe 11.21, Recipe 11.22, Recipe 11.23, Recipe 13.2

4.3. Representing Binary Keys (or Other Raw Data) as Hexadecimal

Problem

You want to print out keys in hexadecimal format, either for debugging or for easy communication.

Solution

The easiest way is to use the "%X" specifier in the printf() family of functions. In C++, you can set the ios::hex flag on a stream object before outputting a value, then clear the flag afterward.

Discussion

Here is a function called spc_print_hex() that prints arbitrary data of a specified length in formatted hexadecimal:

#include <stdio.h>

#include <string.h>

#define BYTES_PER_GROUP 4

#define GROUPS_PER_LINE 4

/* Don't change these */

#define BYTES_PER_LINE (BYTES_PER_GROUP * GROUPS_PER_LINE)

void spc_print_hex(char *prefix, unsigned char *str, int len) {

unsigned long i, j, preflen = 0;

if (prefix) {

printf("%s", prefix);

preflen = strlen(prefix);

}

for (i = 0; i < len; i++) {

printf("%02X ", str[i]);

if (((i % BYTES_PER_LINE) = = (BYTES_PER_LINE - 1)) && ((i + 1) != len)) {

putchar('\n');

for (j = 0; j < preflen; j++) putchar(' ');

}

else if ((i % BYTES_PER_GROUP) = = (BYTES_PER_GROUP - 1)) putchar(' ');

}

putchar('\n');

}

This function takes the following arguments:

prefix

String to be printed in front of the hexadecimal output. Subsequent lines of output are indented appropriately.

str

String to be printed, in binary. It is represented as an unsigned char * to make the code simpler. The caller will probably want to cast, or it can be easily rewritten to be a void *, which would require this code to cast this argument to a byte-based type for the array indexing to work correctly.

len

Number of bytes to print.

This function prints out bytes as two characters, and it pairs bytes in groups of four. It will also print only 16 bytes per line. Modifying the appropriate preprocessor declarations at the top easily changes those parameters.

Currently, this function writes to the standard output, but it can be modified to return a malloc( )'d string quite easily using sprintf( ) and putc( ) instead of printf( ) and putchar( ).

In C++, you can print any data object in hexadecimal by setting the flag ios::hex using the setf( ) method on ostream objects (the unsetf( ) method can be used to clear flags). You might also want the values to print in all uppercase, in which case you should set the ios::uppercase flag. If you want a leading "0x" to print to denote hexadecimal, also set the flag ios::showbase. For example:

cout.setf(ios::hex | ios::uppercase | ios::showbase);

cout << 1234 << endl;

cout.unsetf(ios::hex | ios::uppercase | ios::showbase);

4.4. Turning ASCII Hex Keys (or Other ASCII Hex Data) into Binary

Problem

You have a key represented in ASCII that you'd like to convert into binary form. The string containing the key is NULL-terminated.

Solution

The code listed in the following Section 4.4.3 parses an ASCII string that represents hexadecimal data, and it returns a malloc( )'d buffer of the appropriate length. Note that the buffer will be half the size of the input string, not counting the leading "0x" if it exists. The exception is when there is whitespace. This function passes back the number of bytes written in its second parameter. If that parameter is negative, an error occurred.

Discussion

The spc_hex2bin( ) function shown in this section converts an ASCII string into a binary string. Spaces and tabs are ignored. A leading "0x" or "0X" is ignored. There are two cases in which this function can fail. First, if it sees a non-hexadecimal digit, it assumes that the string is not in the right format, and it returns NULL, setting the error parameter to ERR_NOT_HEX. Second, if there is an odd number of hex digits in the string, it returns NULL, setting the error parameter to ERR_BAD_SIZE.

#include <string.h>

#include <stdlib.h>

#include <ctype.h>

#define ERR_NOT_HEX -1

#define ERR_BAD_SIZE -2

#define ERR_NO_MEM -3

unsigned char *spc_hex2bin(const unsigned char *input, size_t *l) {

unsigned char shift = 4, value = 0;

unsigned char *r, *ret;

const unsigned char *p;

if (!(r = ret = (unsigned char *)malloc(strlen(input) / 2))) {

*l = ERR_NO_MEM;

return 0;

}

for (p = input; isspace(*p); p++);

if (p[0] = = '0' && (p[1] = = 'x' || p[1] = = 'X')) p += 2;

while (p[0]) {

switch (p[0]) {

case '0': case '1': case '2': case '3': case '4':

case '5': case '6': case '7': case '8': case '9':

value |= (*p++ - '0') << shift;

break;

case 'a': case 'b': case 'c':

case 'd': case 'e': case 'f':

value |= (*p++ - 'a' + 0xa) << shift;

break;

case 'A': case 'B': case 'C':

case 'D': case 'E': case 'F':

value |= (*p++ - 'A' + 0xa) << shift;

break;

case 0:

if (!shift) {

*l = ERR_NOT_HEX;

free(ret);

return 0;

}

break;

default:

if (isspace(p[0])) p++;

else {

*l = ERR_NOT_HEX;

free(ret);

return 0;

}

}

if ((shift = (shift + 4) % 8) != 0) {

*r++ = value;

value = 0;

}

}

if (!shift) {

*l = ERR_BAD_SIZE;

free(ret);

return 0;

}

*l = (r - ret);

return (unsigned char *)realloc(ret, *l);

}

4.5. Performing Base64 Encoding

Problem

You want to represent binary data in as compact a textual representation as is reasonable, but the data must be easy to encode and decode, and it must use printable text characters.

Solution

Base64 encoding encodes six bits of data at a time, meaning that every six bits of input map to one character of output. The characters in the output will be a numeric digit, a letter (uppercase or lowercase), a forward slash, a plus, or the equal sign (which is a special padding character).

Note that four output characters map exactly to three input characters. As a result, if the input string isn't a multiple of three characters, you'll need to do some padding (explained in Section 4.5.3).

Discussion

The base64 alphabet takes 6-bit binary values representing numbers from 0 to 63 and maps them to a set of printable ASCII characters. The values 0 through 25 map to the uppercase letters in order. The values 26 through 51 map to the lowercase letters. Then come the decimal digits from 0 to 9, and finally + and /.

If the length of the input string isn't a multiple of three bytes, the leftover bits are padded to a multiple of six with zeros; then the last character is encoded. If only one byte would have been needed in the input to make it a multiple of three, the pad character ( =) is added to the end of the string. Otherwise, two pad characters are added.

#include <stdlib.h>

static char b64table[64] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

"abcdefghijklmnopqrstuvwxyz"

"0123456789+/";

/* Accepts a binary buffer with an associated size.

* Returns a base64 encoded, NULL-terminated string.

*/

unsigned char *spc_base64_encode(unsigned char *input, size_t len, int wrap) {

unsigned char *output, *p;

size_t i = 0, mod = len % 3, toalloc;

toalloc = (len / 3) * 4 + (3 - mod) % 3 + 1;

if (wrap) {

toalloc += len / 57;

if (len % 57) toalloc++;

}

p = output = (unsigned char *)malloc(((len / 3) + (mod ? 1 : 0)) * 4 + 1);

if (!p) return 0;

while (i < len - mod) {

*p++ = b64table[input[i++] >> 2];

*p++ = b64table[((input[i - 1] << 4) | (input[i] >> 4)) & 0x3f];

*p++ = b64table[((input[i] << 2) | (input[i + 1] >> 6)) & 0x3f];

*p++ = b64table[input[i + 1] & 0x3f];

i += 2;

if (wrap && !(i % 57)) *p++ = '\n';

}

if (!mod) {

if (wrap && i % 57) *p++ = '\n';

*p = 0;

return output;

} else {

*p++ = b64table[input[i++] >> 2];

*p++ = b64table[((input[i - 1] << 4) | (input[i] >> 4)) & 0x3f];

if (mod = = 1) {

*p++ = '=';

*p++ = '=';

if (wrap) *p++ = '\n';

*p = 0;

return output;

} else {

*p++ = b64table[(input[i] << 2) & 0x3f];

*p++ = '=';

if (wrap) *p++ = '\n';

*p = 0;

return output;

}

}

}

The public interface to the above code is the following:

unsigned char *spc base64_encode(unsigned char *input, size_t len, int wrap);

The result is a NULL-terminated string allocated internally via malloc( ). Some protocols may expect you to "wrap" base64-encoded data so that, when printed, it takes up less than 80 columns. If such behavior is necessary, you can pass in a non-zero value for the final parameter, which will cause this code to insert newlines once every 76 characters. In that case, the string will always end with a newline (followed by the expected NULL-terminator).

If the call to malloc( ) fails because there is no memory, this function returns 0.

See Also

Recipe 4.6

4.6. Performing Base64 Decoding

Problem

You have a base64-encoded string that you'd like to decode.

Solution

Use the inverse of the algorithm for encoding, presented in Recipe 4.5. This is most easily done via table lookup, mapping each character in the input to six bits of output.

Discussion

Following is our code for decoding a base64-encoded string. We look at each byte separately, mapping it to its associated 6-bit value. If the byte is NULL, we know that we've reached the end of the string. If it represents a character not in the base64 set, we ignore it unless the strict argument is non-zero, in which case we return an error.

WARNING

The RFC that specifies this encoding says you should silently ignore any unnecessary characters in the input stream. If you don't have to do so, we recommend you don't, as this constitutes a covert channel in any protocol using this encoding.

Note that we check to ensure strings are properly padded. If the string isn't properly padded or otherwise terminates prematurely, we return an error.

#include <stdlib.h>

#include <string.h>

static char b64revtb[256] = {

-3, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*0-15*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*16-31*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63, /*32-47*/

52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -2, -1, -1, /*48-63*/

-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, /*64-79*/

15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1, /*80-95*/

-1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, /*96-111*/

41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1, /*112-127*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*128-143*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*144-159*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*160-175*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*176-191*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*192-207*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*208-223*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, /*224-239*/

-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 /*240-255*/

};

static unsigned int raw_base64_decode(unsigned char *in, unsigned char *out,

int strict, int *err) {

unsigned int result = 0, x;

unsigned char buf[3], *p = in, pad = 0;

*err = 0;

while (!pad) {

switch ((x = b64revtb[*p++])) {

case -3: /* NULL TERMINATOR */

if (((p - 1) - in) % 4) *err = 1;

return result;

case -2: /* PADDING CHARACTER. INVALID HERE */

if (((p - 1) - in) % 4 < 2) {

*err = 1;

return result;

} else if (((p - 1) - in) % 4 = = 2) {

/* Make sure there's appropriate padding */

if (*p != '=') {

*err = 1;

return result;

}

buf[2] = 0;

pad = 2;

result++;

break;

} else {

pad = 1;

result += 2;

break;

}

return result;

case -1:

if (strict) {

*err = 2;

return result;

}

break;

default:

switch (((p - 1) - in) % 4) {

case 0:

buf[0] = x << 2;

break;

case 1:

buf[0] |= (x >> 4);

buf[1] = x << 4;

break;

case 2:

buf[1] |= (x >> 2);

buf[2] = x << 6;

break;

case 3:

buf[2] |= x;

result += 3;

for (x = 0; x < 3 - pad; x++) *out++ = buf[x];

break;

}

break;

}

}

for (x = 0; x < 3 - pad; x++) *out++ = buf[x];

return result;

}

/* If err is non-zero on exit, then there was an incorrect padding error. We

* allocate enough space for all circumstances, but when there is padding, or

* there are characters outside the character set in the string (which we are

* supposed to ignore), then we end up allocating too much space. You can

* realloc( ) to the correct length if you wish.

*/

unsigned char *spc_base64_decode(unsigned char *buf, size_t *len, int strict,

int *err) {

unsigned char *outbuf;

outbuf = (unsigned char *)malloc(3 * (strlen(buf) / 4 + 1));

if (!outbuf) {

*err = -3;

*len = 0;

return 0;

}

*len = raw_base64_decode(buf, outbuf, strict, err);

if (*err) {

free(outbuf);

*len = 0;

outbuf = 0;

}

return outbuf;

}

The public API to this code is:

unsigned char *spc_base64_decode(unsigned char *buf, size_t *len, int strict, int

*err);

The API assumes that buf is a NULL-terminated string. The len parameter is a pointer that receives the length of the binary output. If there is an error, the memory pointed to by len will be 0, and the value pointed to by err will be non-zero. The error will be -1 if there is a padding error, -2 if strict checking was requested, but a character outside the strict set is found, and -3 if malloc( ) fails.

See Also

Recipe 4.5

4.7. Representing Keys (or Other Binary Data) as English Text

Problem

You want to use an easy-to-read format for displaying keys (or fingerprints or some other interesting binary data). English would work better than a hexadecimal representation because people's ability to recognize the key as correct by sight will be better.

Solution

Map a particular number of bits to a dictionary of words. The dictionary should be of such a size that an exact mapping to a number of bits is possible. That is, the dictionary should have a number of entries that is a power of two.

Discussion

The spc_bin2words() function shown here converts a binary string of the specified number of bytes into a string of English words. This function takes two arguments: str is the binary string to convert, and len is the number of bytes to be converted.

#include <string.h>

#include <stdlib.h>

#include "wordlist.h"

#define BITS_IN_LIST 11

#define MAX_WORDLEN 4

/* len parameter is measured in bytes. Remaining bits padded with 0. */

unsigned char *spc_bin2words(const unsigned char *str, size_t len) {

short add_space = 0;

size_t i, leftbits, leftovers, scratch = 0, scratch_bits = 0;

unsigned char *p, *res;

res = (unsigned char *)malloc((len * 8 / BITS_IN_LIST + 1) * (MAX_WORDLEN + 1));

if (!res) abort( );

res[0] = 0;

for (i = 0; i < len; i++) {

leftovers = str[i];

leftbits = 8;

while (leftbits) {

if (scratch_bits + leftbits <= BITS_IN_LIST) {

scratch |= (leftovers << (BITS_IN_LIST - leftbits - scratch_bits));

scratch_bits += leftbits;

leftbits = 0;

} else {

scratch |= (leftovers >> (leftbits - (BITS_IN_LIST - scratch_bits)));

leftbits -= (BITS_IN_LIST - scratch_bits);

leftovers &= ((1 << leftbits) - 1);

scratch_bits = BITS_IN_LIST;

}

if (scratch_bits = = BITS_IN_LIST) {

p = words[scratch];

/* The strcats are a bit inefficient because they start from the front of

* the string each time. But, they're less confusing, and these strings

* should never get more than a few words long, so efficiency will

* probably never be a real concern.

*/

if (add_space) strcat(res, " ");

strcat(res, p);

scratch = scratch_bits = 0;

add_space = 1;

}

}

}

if (scratch_bits) { /* Emit the final word */

p = words[scratch];

if (add_space) strcat(res, " ");

strcat(res, p);

}

res = (unsigned char *)realloc(res, strlen(res) + 1);

if (!res) abort( ); /* realloc failed; should never happen, as size shrinks */

return res;

}

To save space, the dictionary file (wordlist.h) is not provided here. Instead, you can find it on the book's web site.

WARNING

The previous code is subtly incompatible with the S/KEY dictionary because their dictionary is not in alphabetical order. (S/KEY is an authentication system using one-time passwords.) Be sure to use the right dictionary!

The code is written in such a way that you can use dictionaries of different sizes if you wish to encode a different number of bits per word. Currently, the dictionary encodes 11 bits of data (by having exactly 211 words), where no word is more than 4 characters long. The web site also provides a dictionary that encodes 13 bits of data, where no word is more than 6 letters long. The previous code can be modified to use the larger dictionary simply by changing the two appropriate preprocessor definitions at the top.

The algorithm takes 11 bits of the binary string, then finds the word that maps to the unique 11-bit value. Note that it is rare for the number of bits represented by a single word to align exactly to a byte. For example, if you were to encode a 2-byte binary string, those 16 bits would be encoded by 2 words, which could represent up to 22 bits. Therefore, there will usually be leftover bits. In the case of 2 bytes, there are 6 leftover bits. The algorithm sets all leftover bits to 0.

Because of this padding scheme, the output doesn't always encode how many bytes were in the input. For example, if the output is 6 words long, the input could have been either 7 or 8 bytes long. Therefore, you need to manually truncate the output to the desired length.

See Also

Recipe 4.8

4.8. Converting Text Keys to Binary Keys

Problem

A user enters a textual representation of a key or other binary data (see Recipe 4.7). You need to convert it to binary.

Solution

Parse out the words, then look them up in the dictionary to reconstruct the actual bits, as shown in the code included in the next section.

Discussion

This function spc_words2bin( ) uses the wordlist.h file provided on the book's web site, and it can be changed as described in Recipe 4.7.

#include <stdlib.h>

#include <string.h>

#include <ctype.h>

#include "wordlist.h"

#define BITS_IN_LIST 11

#define MAX_WORDLEN 4

unsigned char *spc_words2bin(unsigned char *str, size_t *outlen) {

int cmp, i;

size_t bitsinword, curbits, needed, reslen;

unsigned int ix, min, max;

unsigned char *p = str, *r, *res, word[MAX_WORDLEN + 1];

curbits = reslen = *outlen = 0;

if(!(r = res = (unsigned char *)malloc((strlen(str) + 1) / 2))

return 0;

memset(res, 0, (strlen(str) + 1) / 2);

for (;;) {

while (isspace(*p)) p++;

if (!*p) break;

/* The +1 is because we expect to see a space or a NULL after each and every

* word; otherwise, there's a syntax error.

*/

for (i = 0; i < MAX_WORDLEN + 1; i++) {

if (!*p || isspace(*p)) break;

if (islower(*p)) word[i] = *p++ - ' ';

else if (isupper(*p)) word[i] = *p++;

else {

free(res);

return 0;

}

}

if (i = = MAX_WORDLEN + 1) {

free(res);

return 0;

}

word[i] = 0;

min = 0;

max = (1 << BITS_IN_LIST) - 1;

do {

if (max < min) {

free(res);

return 0; /* Word not in list! */

}

ix = (max + min) / 2;

cmp = strcmp(word, words[ix]);

if (cmp > 0) min = ix + 1;

else if (cmp < 0) max = ix - 1;

} while (cmp);

bitsinword = BITS_IN_LIST;

while (bitsinword) {

needed = 8 - curbits;

if (bitsinword <= needed) {

*r |= (ix << (needed - bitsinword));

curbits += bitsinword;

bitsinword = 0;

} else {

*r |= (ix >> (bitsinword - needed));

bitsinword -= needed;

ix &= ((1 << bitsinword) - 1);

curbits = 8;

}

if (curbits = = 8) {

curbits = 0;

*++r = 0;

reslen++;

}

}

}

if (curbits && *r) {

free(res);

return 0; /* Error, bad format, extra bits! */

}

*outlen = reslen;

return (unsigned char *)realloc(res, reslen);

}

The inputs to the spc_words2bin( ) function are str , which is the English representation of the binary string, and outlen, which is a pointer to how many bytes are in the output. The return value is a binary string of length len. Note that any bits encoded by the English words that don't compose a full byte must be zero, but are otherwise ignored.

You must know a priori how many bytes you expect to get out of this function. For example, 6 words might map to a 56-bit binary string or to a 64-bit binary string (5 words can encode at most 55 bits, and 6 words encodes up to 66 bits).

See Also

Recipe 4.7

4.9. Using Salts, Nonces, and Initialization Vectors

Problem

You want to use an algorithm that requires a salt, a nonce or an initialization vector (IV). You need to understand the differences among these three things and figure out how to select good specimens of each.

Solution

There's a lot of terminology confusion, and the following Section 4.9.3 contains our take on it. Basically, salts and IVs should be random, and nonces are usually sequential, potentially with a random salt as a component, if there is room. With sequential nonces, you need to ensure that you never repeat a single {key, nonce} pairing.

To get good random values, use a well-seeded, cryptographically strong pseudo-random number generator (see the appropriate recipes in Chapter 11). Using that, get the necessary number of bits. For salt, 64 bits is sufficient. For an IV, get one of the requisite size.

Discussion

Salts, nonces, and IVs are all one-time values used in cryptography that don't need to be secret, but still lead to additional security. It is generally assumed that these values are visible to attackers, even if it is sometimes possible to hide them. At the very least, the security of cryptographic algorithms and protocols should not depend on the secrecy of such values.

TIP

We try to be consistent with respect to this terminology in the book. However, in the real world, even among cryptographers there's a lot of inconsistency. Therefore, be sure to follow the directions in the documentation for whatever primitive you're using.

Salts

Salt is random data that helps protect against dictionary and other precomputation attacks. Generally, salt is used in password-based systems and is concatenated to the front of a password before processing. Password systems often use a one-way hash function to turn a password into an "authenticator." In the simplest such system, if there were no salt, an attacker could build a dictionary of common passwords and just look up the original password by authenticator.

The use of salt means that the attacker would have to produce a totally separate dictionary for every possible salt value. If the salt is big enough, it essentially makes dictionary attacks infeasible. However, the attacker can generally still try to guess every password without using a stronger protocol. For a discussion of various password-based authentication technologies, see Recipe 8.1.

If the salt isn't chosen at random, certain dictionaries will be more likely than others. For this reason, salt is generally expected to be random.

Salt can be generated using the techniques discussed in Chapter 11.

Nonces

Nonces [2] are bits of data often input to cryptographic protocols and algorithms, including many message authentication codes and some encryption modes. Such values should only be used a single time with any particular cryptographic key. In fact, reuse generally isn't prohibited, but the odds of reuse need to be exceptionally low. That is, if you have a nonce that is very large compared to the number of times you expect to use it (e.g., the nonce is 128 bits, and you don't expect to use it more than 232 times), it is sufficient to choose nonces using a cryptographically strong pseudo-random number generator.

Sequential nonces have a few advantages over random nonces:

§ You can easily guarantee that nonces are not repeated. Note, though, that if the possible nonce space is large, this is not a big concern.

§ Many protocols already send a unique sequence number for each packet, so one can save space in transmitted messages.

§ The sequential ordering of nonces can be used to prevent replay attacks, but only if you actually check to ensure that the nonce is always incrementing. That is, if each message has a nonce attached to it, you can tell whether the message came in the right order, by looking at the nonce and making sure its value is always incrementing.

However, randomness in a nonce helps prevent against classes of attacks that amortize work across multiple keys in the same system.

We recommend that nonces have both a random portion and a sequential portion. Generally, the most significant bytes should be random, and the final 6 to 8 bytes should be sequential. An 8-byte counter can accommodate 264 messages without the counter's repeating, which should be more than big enough for any system.

If you use both a nonce and a salt, you can select a single random part for each key you use. The nonce on the whole has to be unique, but the salt can remain fixed for the lifetime of the key; the counter ensures that the nonce is always unique. In such a nonce, the random part is said to be a "salt." Generally, it's good to have four or more bytes of salt in a nonce.

If you decide to use only a random nonce, remember that the nonce needs to be changed after each message, and you lose the ability to prevent against capture-replay attacks.

The random portion of a nonce can be generated using the techniques discussed in Chapter 11. Generally, you will have a fixed-size buffer into which you place the nonce, and you will then set the remaining bytes to zero, incrementing them after each message is sent. For example, if you have a 16-byte nonce with an 8-byte counter in the least significant bytes, you might use the following code:

/* This assumes a 16-byte nonce where the last 8 bytes represent the counter! */

void increment_nonce(unsigned char *nonce) {

if (!++nonce[15]) if (!++nonce[14]) if (!++nonce[13]) if (!++nonce[12])

if (!++nonce[11]) if (!++nonce[10]) if (!++nonce[9]) if (!++nonce[8]) {

/* If you get here, you're out of nonces. This really shouldn't happen

* with an 8-byte nonce, so often you'll see: if (!++nonce[9]) ++nonce[8];

*/

}

}

Note that the this code can be more efficient if we do a 32-bit increment, but then there are endian-ness issues that make portability more difficult.

TIP

If sequential nonces are implemented correctly, they can help thwart capture relay attacks (see Recipe 6.1).

Initialization vectors (IVs)

The term initialization vector (IV) is the most widely used and abused of the three terms we've been discussing. IV and nonce are often used interchangeably. However, a careful definition does differentiate between these two concepts. For our purposes, an IV is a nonce with an additional requirement: it must be selected in a nonpredictable way. That is, the IV can't be sequential; it must be random. One popular example in which a real IV is required for maximizing security is when using the CBC encryption mode (see Recipe 5.6).

The big downside to an IV, as compared to a nonce, is that an IV does not afford protection against capture-replay attacks—unless you're willing to remember every IV that has ever been used, which is not a good solution. To ensure protection against such attacks when using an IV, the higher-level protocol must have its own notion of sequence numbers that get checked in order.

Another downside is that there is generally more data to send. Systems that use sequential nonces can often avoid sending the nonce, as it can be calculated from the sequence number already sent with the message.

Initialization vectors can be generated using the techniques discussed in Chapter 11.

See Also

§ Chapter 11

§ Recipe 5.6, Recipe 5.6, Recipe 8.1


[2] In the UK, "nonce" is slang for a child sex offender. However, this term is widespread in the cryptographic world, so we use it.

4.10. Deriving Symmetric Keys from a Password

Problem

You do not want passwords to be stored on disk. Instead, you would like to convert a password into a cryptographic key.

Solution

Use PBKDF2, the password-based key derivation function 2, specified in PKCS #5.[3]

TIP

You can also use this recipe to derive keys from other keys. See Recipe 4.1 for considerations; that recipe also discusses considerations for choosing good salt values.

Discussion

Passwords can generally vary in length, whereas symmetric keys are almost always a fixed size. Passwords may be vulnerable to guessing attacks, but ultimately we'd prefer symmetric keys not to be as easily guessable.

The function spc_pbkdf2( ) in the following code is an implementation of PKCS #5, Version 2.0. PKCS #5 stands for "Public Key Cryptography Standard #5," although there is nothing public-key-specific about this standard. The standard defines a way to turn a password into a symmetric key. The name of the function stands for "password-based key derivation function 2," where the 2 indicates that the function implements Version 2.0 of PKCS #5.

#include <stdio.h>

#include <string.h>

#include <openssl/evp.h>

#include <openssl/hmac.h>

#include <sys/types.h>

#include <netinet/in.h>

#include <arpa/inet.h> /* for htonl */

#ifdef WIN32

typedef unsigned _ _int64 spc_uint64_t;

#else

typedef unsigned long long spc_uint64_t;

#endif

/* This value needs to be the output size of your pseudo-random function (PRF)! */

#define PRF_OUT_LEN 20

/* This is an implementation of the PKCS#5 PBKDF2 PRF using HMAC-SHA1. It

* always gives 20-byte outputs.

*/

/* The first three functions are internal helper functions. */

static void pkcs5_initial_prf(unsigned char *p, size_t plen, unsigned char *salt,

size_t saltlen, size_t i, unsigned char *out,

size_t *outlen) {

size_t swapped_i;

HMAC_CTX ctx;

HMAC_CTX_init(&ctx);

HMAC_Init(&ctx, p, plen, EVP_sha1( ));

HMAC_Update(&ctx, salt, saltlen);

swapped_i = htonl(i);

HMAC_Update(&ctx, (unsigned char *)&swapped_i, 4);

HMAC_Final(&ctx, out, (unsigned int *)outlen);

}

/* The PRF doesn't *really* change in subsequent calls, but above we handled the

* concatenation of the salt and i within the function, instead of external to it,

* because the implementation is easier that way.

*/

static void pkcs5_subsequent_prf(unsigned char *p, size_t plen, unsigned char *v,

size_t vlen, unsigned char *o, size_t *olen) {

HMAC_CTX ctx;

HMAC_CTX_init(&ctx);

HMAC_Init(&ctx, p, plen, EVP_sha1( ));

HMAC_Update(&ctx, v, vlen);

HMAC_Final(&ctx, o, (unsigned int *)olen);

}

static void pkcs5_F(unsigned char *p, size_t plen, unsigned char *salt,

size_t saltlen, size_t ic, size_t bix, unsigned char *out) {

size_t i = 1, j, outlen;

unsigned char ulast[PRF_OUT_LEN];

memset(out,0, PRF_OUT_LEN);

pkcs5_initial_prf(p, plen, salt, saltlen, bix, ulast, &outlen);

while (i++ <= ic) {

for (j = 0; j < PRF_OUT_LEN; j++) out[j] ^= ulast[j];

pkcs5_subsequent_prf(p, plen, ulast, PRF_OUT_LEN, ulast, &outlen);

}

for (j = 0; j < PRF_OUT_LEN; j++) out[j] ^= ulast[j];

}

void spc_pbkdf2(unsigned char *pw, unsigned int pwlen, char *salt,

spc_uint64_t saltlen, unsigned int ic, unsigned char *dk,

spc_uint64_t dklen) {

unsigned long i, l, r;

unsigned char final[PRF_OUT_LEN] = {0,};

if (dklen > ((((spc_uint64_t)1) << 32) - 1) * PRF_OUT_LEN) {

/* Call an error handler. */

abort( );

}

l = dklen / PRF_OUT_LEN;

r = dklen % PRF_OUT_LEN;

for (i = 1; i <= l; i++)

pkcs5_F(pw, pwlen, salt, saltlen, ic, i, dk + (i - 1) * PRF_OUT_LEN);

if (r) {

pkcs5_F(pw, pwlen, salt, saltlen, ic, i, final);

for (l = 0; l < r; l++) *(dk + (i - 1) * PRF_OUT_LEN + l) = final[l];

}

}

The spc_pbkdf2( ) function takes seven arguments:

pw

Password, represented as an arbitrary string of bytes.

pwlen

Number of bytes in the password.

salt

String that need not be private but should be unique to the user. The notion of salt is discussed in Recipe 4.9.

saltlen

Number of bytes in the salt.

ic

"Iteration count," described in more detail later in this section. A good value is 10,000.

dk

Buffer into which the derived key will be placed.

dklen

Length of the desired derived key in bytes.

The Windows version of spc_pbkdf2( ) is called SpcPBKDF2( ) . It has essentially the same signature, though the names are slightly different because of Windows naming conventions. The implementation uses CryptoAPI for HMAC-SHA1 and requires SpcGetExportableContext( ) andSpcImportKeyData( ) from Recipe 5.26.

#include <windows.h>

#include <wincrypt.h>

/* This value needs to be the output size of your pseudo-random function (PRF)! */

#define PRF_OUT_LEN 20

/* This is an implementation of the PKCS#5 PBKDF2 PRF using HMAC-SHA1. It

* always gives 20-byte outputs.

*/

static HCRYPTHASH InitHMAC(HCRYPTPROV hProvider, HCRYPTKEY hKey, ALG_ID Algid) {

HMAC_INFO HMACInfo;

HCRYPTHASH hHash;

HMACInfo.HashAlgid = Algid;

HMACInfo.pbInnerString = HMACInfo.pbOuterString = 0;

HMACInfo.cbInnerString = HMACInfo.cbOuterString = 0;

if (!CryptCreateHash(hProvider, CALG_HMAC, hKey, 0, &hHash)) return 0;

CryptSetHashParam(hHash, HP_HMAC_INFO, (BYTE *)&HMACInfo, 0);

return hHash;

}

static void FinalHMAC(HCRYPTHASH hHash, BYTE *pbOut, DWORD *cbOut) {

*cbOut = PRF_OUT_LEN;

CryptGetHashParam(hHash, HP_HASHVAL, pbOut, cbOut, 0);

CryptDestroyHash(hHash);

}

static DWORD SwapInt32(DWORD dwInt32) {

_ _asm mov eax, dwInt32

_ _asm bswap eax

}

static BOOL PKCS5InitialPRF(HCRYPTPROV hProvider, HCRYPTKEY hKey,

BYTE *pbSalt, DWORD cbSalt, DWORD dwCounter,

BYTE *pbOut, DWORD *cbOut) {

HCRYPTHASH hHash;

if (!(hHash = InitHMAC(hProvider, hKey, CALG_SHA1))) return FALSE;

CryptHashData(hHash, pbSalt, cbSalt, 0);

dwCounter = SwapInt32(dwCounter);

CryptHashData(hHash, (BYTE *)&dwCounter, sizeof(dwCounter), 0);

FinalHMAC(hHash, pbOut, cbOut);

return TRUE;

}

static BOOL PKCS5UpdatePRF(HCRYPTPROV hProvider, HCRYPTKEY hKey,

BYTE *pbSalt, DWORD cbSalt,

BYTE *pbOut, DWORD *cbOut) {

HCRYPTHASH hHash;

if (!(hHash = InitHMAC(hProvider, hKey, CALG_SHA1))) return FALSE;

CryptHashData(hHash, pbSalt, cbSalt, 0);

FinalHMAC(hHash, pbOut, cbOut);

return TRUE;

}

static BOOL PKCS5FinalPRF(HCRYPTPROV hProvider, HCRYPTKEY hKey,

BYTE *pbSalt, DWORD cbSalt, DWORD dwIterations,

DWORD dwBlock, BYTE *pbOut) {

BYTE pbBuffer[PRF_OUT_LEN];

DWORD cbBuffer, dwIndex, dwIteration = 1;

SecureZeroMemory(pbOut, PRF_OUT_LEN);

if (!(PKCS5InitialPRF(hProvider, hKey, pbSalt, cbSalt, dwBlock, pbBuffer,

&cbBuffer))) return FALSE;

while (dwIteration < dwIterations) {

for (dwIndex = 0; dwIndex < PRF_OUT_LEN; dwIndex++)

pbOut[dwIndex] ^= pbBuffer[dwIndex];

if (!(PKCS5UpdatePRF(hProvider, hKey, pbBuffer, PRF_OUT_LEN, pbBuffer,

&cbBuffer))) return FALSE;

}

for (dwIndex = 0; dwIndex < PRF_OUT_LEN; dwIndex++)

pbOut[dwIndex] ^= pbBuffer[dwIndex];

return TRUE;

}

BOOL SpcPBKDF2(BYTE *pbPassword, DWORD cbPassword, BYTE *pbSalt, DWORD cbSalt,

DWORD dwIterations, BYTE *pbOut, DWORD cbOut) {

BOOL bResult = FALSE;

BYTE pbFinal[PRF_OUT_LEN];

DWORD dwBlock, dwBlockCount, dwLeftOver;

HCRYPTKEY hKey;

HCRYPTPROV hProvider;

if (cbOut > ((((_ _int64)1) << 32) - 1) * PRF_OUT_LEN) return FALSE;

if (!(hProvider = SpcGetExportableContext( ))) return FALSE;

if (!(hKey = SpcImportKeyData(hProvider, CALG_RC4, pbPassword, cbPassword))) {

CryptReleaseContext(hProvider, 0);

return FALSE;

}

dwBlockCount = cbOut / PRF_OUT_LEN;

dwLeftOver = cbOut % PRF_OUT_LEN;

for (dwBlock = 1; dwBlock <= dwBlockCount; dwBlock++) {

if (!PKCS5FinalPRF(hProvider, hKey, pbSalt, cbSalt, dwIterations, dwBlock,

pbOut + (dwBlock - 1) * PRF_OUT_LEN)) goto done;

}

if (dwLeftOver) {

SecureZeroMemory(pbFinal, PRF_OUT_LEN);

if (!PKCS5FinalPRF(hProvider, hKey, pbSalt, cbSalt, dwIterations, dwBlock,

pbFinal)) goto done;

CopyMemory(pbOut + (dwBlock - 1) * PRF_OUT_LEN, pbFinal, dwLeftOver);

}

bResult = TRUE;

done:

CryptDestroyKey(hKey);

CryptReleaseContext(hProvider, hKey);

return bResult;

}

The salt is used to prevent against a dictionary attack. Without salt, a malicious system administrator could easily figure out when a user has the same password as someone else, and he would be able to precompute a huge dictionary of common passwords and look to see if the user's password is in that list.

While salt is not expected to be private, it still must be chosen carefully. See Recipe 4.9 for more on salt.

HOW MANY ITERATIONS?

To what value should you set the iteration count? The answer depends on the environment in which you expect your software to be deployed. The basic idea is to increase computational costs so that a brute-force attack with lots of high-end hardware is as expensive as possible, but not to cause too noticeable a delay on the lowest-end box on which you would wish to run legitimately.

Often, password computations such as these occur on a server. However, there are still people out there who run servers on their 33 MHz machines. We personally believe that people running on that kind of hardware should be able to tolerate a one-second delay, at the very least when computing a password for a modern application. Usually, a human waiting on the other end will be willing to tolerate an even longer wait as long as they know why they are waiting. Two to three seconds isn't so bad.

With that guideline, we have timed our PKCS #5 implementation with some standard input. Based on those timings, we think that 10,000 is good for most applications, and 5,000 is the lowest iteration count you should consider in this day and age. On a 33 MHz machine, 10,000 iterations should take about 2.5 seconds to process. On a 1.67 GHz machine, they take a mere 0.045 seconds. Even if your computation occurs on an embedded processor, people will still be able to tolerate the delay.

The good thing is that it would take a single 1.67 GHz machine more than 6 years to guess 232 passwords, when using PKCS #5 and 10,000 iterations. Therefore, if there really is at least 32 bits of entropy in your password (which is very rare), you probably won't have to worry about any attacker who has fewer than a hundred high-end machines at his disposal, at least for a few years.

Expect governments that want your password to put together a few thousand boxes complete with crypto acceleration, though!

Even with salt, password-guessing attacks are still possible. To prevent against this kind of attack, PKCS #5 allows the specification of an iteration count, which basically causes an expensive portion of the key derivation function to loop the specified number of times. The idea is to slow down the time it takes to compute a single key from a password. If you make key derivation take a tenth of a second, the user won't notice. However, if an attacker tries to carry out an exhaustive search of all possible passwords, she will have to spend a tenth of a second for each password she wants to try, which will make cracking even a weak password quite difficult. As we describe in the sidebar "How Many Iterations?", we recommend an iteration count of 10,000.

The actual specification of the key derivation function can be found in Version 2.0 of the PKCS #5 standards document. In brief, we use a pseudo-random function using the password and salt to get out as many bytes as we need, and we then take those outputs and feed them back into themselves for each iteration.

There's no need to use HMAC-SHA1 in PKCS #5. Instead, you could use the Advanced Encryption Standard (AES) as the underlying cryptographic primitive, substituting SHA1 for a hash function based on AES (see Recipe 6.15 and Recipe 6.16).

See Also

§ RSA's PKCS #5 page: http://www.rsasecurity.com/rsalabs/pkcs/pkcs-5/

§ Recipe 4.9, Recipe 4.11, Recipe 5.26, Recipe 6.15, Recipe 6.16


[3] This standard is available from RSA Security at http://www.rsasecurity.com/rsalabs/pkcs/pkcs-5/.

4.11. Algorithmically Generating Symmetric Keys from One Base Secret

Problem

You want to generate a key to use for a short time from a long-term secret (generally a key, but perhaps a password). If a short-term key is compromised, it should be impossible to recover the base secret. Multiple entities in the system should be able to compute the same derived key if they have the right base secret.

For example, you might want to have a single long-term key and use it to create daily encryption keys or session-specific keys.

Solution

Mix a base secret and any unique information you have available, passing them through a pseudo-random function (PRF), as discussed in the following section.

Discussion

The basic idea behind secure key derivation is to take a base secret and a unique identifier that distinguishes the key to be derived (called a distinguisher ) and pass those two items through a pseudo-random function. The PRF acts very much like a cryptographic one-way hash from a theoretical security point of view, and indeed, such a one-way hash is often good as a PRF.

There are many different ad hoc solutions for doing key derivation, ranging from the simple to the complex. On the simple side of the spectrum, you can concatenate a base key with unique data and pass the string through SHA1. On the complex side is the PBKDF2 function from PKCS #5 (described in Recipe 4.10).

The simple SHA1 approach is perhaps too simple for general-purpose requirements. In particular, there are cases where you one might need a key that is larger than the SHA1 output length (i.e., if you're using AES with 192-bit keys but are willing to have only 160 bits of strength). A general-purpose hash function maps n bits to a fixed number of bits, whereas we would like a function capable of mapping n bits to m bits.

PBKDF2 can be overkill. Its interface includes functionality to thwart password-guessing attacks, which is unnecessary when deriving keys from secrets that were themselves randomly generated.

Fortunately, it is easy to build an n-bit to m-bit PRF that is secure for key derivation. The big difficulty is often in selecting good distinguishers (i.e., information that differentiates parties). Generally, it is okay to send differentiating information that one side does not already have and cannot compute in the clear, because if an attacker tampers with the information in traffic, the two sides will not be able to agree on a working key. (Of course, you do need to be prepared to handle such attacks.) Similarly, it is okay to send a salt. See the sidebar, Distinguisher Selection for a discussion.

DISTINGUISHER SELECTION

The basic idea behind a distinguisher is that it must be unique.

If you want to create a particular derived key, we recommend that you string together in a predetermined order any interesting information about that key, separating data items with a unique separation character (i.e., not a character that would be valid in one of the data items). You can use alternate formats, as long as your data representation is unambiguous, in that each possible distinguisher is generated by a single, unique set of information.

As an example, let's say you want to have a different session key that you change once a day. You could then use the date as a unique distinguisher. If you want to change keys every time there's a connection, the date is no longer unique. However, you could use the date concatenated with the number of times a connection has been established on that date. The two together constitute a unique value.

There are many potential data items you might want to include in a distinguisher, and they do not have to be unique to be useful, as long as there is a guarantee that the distinguisher itself is unique. Here is a list of some common data items you could use:

§ The encryption algorithm and any parameters for which the derived key will be used

§ The number of times the base key has been used, either overall or in the context of other interesting data items

§ A unique identifier corresponding to an entity in the system, such as a username or email address

§ The IP addresses of communicating parties

§ A timestamp, or at least the current date

§ The MAC address associated with the network interface being used

§ Any other session-specific information

In addition, to prevent against any possible offline precomputation attacks, we recommend you add to your differentiator a random salt of at least 64 bits, which you then communicate to any other party that needs to derive the same key.

The easiest way to get a solid solution that will resist potentially practical attacks is to use HMAC in counter mode. (Other MACs are not as well suited for this task, because they tend not to handle variable-length keys.) You can also use this solution if you want an all-block cipher solution, because you can use a construction to convert a block cipher into a hash function (see Recipe 6.15 and Recipe 6.16).

More specifically, key HMAC with the base secret. Then, for every block of output you need (where the block size is the size of the HMAC output), MAC the distinguishers concatenated with a fixed-size counter at the end. The counter should indicate the number of blocks of output previously processed. The basic idea is to make sure that each MAC input is unique.

If the desired output length is not a multiple of the MAC output length, simply generate blocks until you have sufficient bytes, then truncate.

WARNING

The security level of this solution is limited by the minimum of the number of bits of entropy in the base secret and the output size of the MAC. For example, if you use a key with 256 bits of entropy, and you use HMAC-SHA1 to produce a 256-bit derived key, never assume that you have more than 160 bits of effective security (that is the output size of HMAC-SHA1).

Here is an example implementation of a PRF based on HMAC-SHA1, using the OpenSSL API for HMAC (discussed in Recipe 6.10):

#include <sys/types.h>

#include <netinet/in.h>

#include <arpa/inet.h>

#include <openssl/evp.h>

#include <openssl/hmac.h>

#define HMAC_OUT_LEN 20 /* SHA1 specific */

void spc_make_derived_key(unsigned char *base, size_t bl, unsigned char *dist,

size_t dl, unsigned char *out, size_t ol) {

HMAC_CTX c;

unsigned long ctr = 0, nbo_ctr;

size_t tl, i;

unsigned char last[HMAC_OUT_LEN];

while (ol >= HMAC_OUT_LEN) {

HMAC_Init(&c, base, bl, EVP_sha1( ));

HMAC_Update(&c, dist, dl);

nbo_ctr = htonl(ctr++);

HMAC_Update(&c, (unsigned char *)&nbo_ctr, sizeof(nbo_ctr));

HMAC_Final(&c, out, &tl);

out += HMAC_OUT_LEN;

ol -= HMAC_OUT_LEN;

}

if (!ol) return;

HMAC_Init(&c, base, bl, EVP_sha1( ));

HMAC_Update(&c, dist, dl);

nbo_ctr = htonl(ctr);

HMAC_Update(&c, (unsigned char *)&nbo_ctr, sizeof(nbo_ctr));

HMAC_Final(&c, last, &tl);

for (i = 0; i < ol; i++)

out[i] = last[i];

}

Here is an example implementation of a PRF based on HMAC-SHA1, using the Windows CryptoAPI for HMAC (discussed in Recipe 6.10). The code presented here also requires SpcGetExportableContext( ) and SpcImportKeyData( ) from Recipe 5.26.

#include <windows.h>

#include <wincrypt.h>

#define HMAC_OUT_LEN 20 /* SHA1 specific */

static DWORD SwapInt32(DWORD dwInt32) {

_ _asm mov eax, dwInt32

_ _asm bswap eax

}

BOOL SpcMakeDerivedKey(BYTE *pbBase, DWORD cbBase, BYTE *pbDist, DWORD cbDist,

BYTE *pbOut, DWORD cbOut) {

BYTE pbLast[HMAC_OUT_LEN];

DWORD cbData, dwCounter = 0, dwBigCounter;

HCRYPTKEY hKey;

HMAC_INFO HMACInfo;

HCRYPTHASH hHash;

HCRYPTPROV hProvider;

if (!(hProvider = SpcGetExportableContext( ))) return FALSE;

if (!(hKey = SpcImportKeyData(hProvider, CALG_RC4, pbBase, cbBase))) {

CryptReleaseContext(hProvider, 0);

return FALSE;

}

HMACInfo.HashAlgid = CALG_SHA1;

HMACInfo.pbInnerString = HMACInfo.pbOuterString = 0;

HMACInfo.cbInnerString = HMACInfo.cbOuterString = 0;

while (cbOut >= HMAC_OUT_LEN) {

if (!CryptCreateHash(hProvider, CALG_HMAC, hKey, 0, &hHash)) {

CryptDestroyKey(hKey);

CryptReleaseContext(hProvider, 0);

return FALSE;

}

CryptSetHashParam(hHash, HP_HMAC_INFO, (BYTE *)&HMACInfo, 0);

CryptHashData(hHash, pbDist, cbDist, 0);

dwBigCounter = SwapInt32(dwCounter++);

CryptHashData(hHash, (BYTE *)&dwBigCounter, sizeof(dwBigCounter), 0);

cbData = HMAC_OUT_LEN;

CryptGetHashParam(hHash, HP_HASHVAL, pbOut, &cbData, 0);

CryptDestroyHash(hHash);

pbOut += HMAC_OUT_LEN;

cbOut -= HMAC_OUT_LEN;

}

if (cbOut) {

if (!CryptCreateHash(hProvider, CALG_HMAC, hKey, 0, &hHash)) {

CryptDestroyKey(hKey);

CryptReleaseContext(hProvider, 0);

return FALSE;

}

CryptSetHashParam(hHash, HP_HMAC_INFO, (BYTE *)&HMACInfo, 0);

CryptHashData(hHash, pbDist, cbDist, 0);

dwBigCounter = SwapInt32(dwCounter);

CryptHashData(hHash, (BYTE *)&dwBigCounter, sizeof(dwBigCounter), 0);

cbData = HMAC_OUT_LEN;

CryptGetHashParam(hHash, HP_HASHVAL, pbLast, &cbData, 0);

CryptDestroyHash(hHash);

CopyMemory(pbOut, pbLast, cbOut);

}

CryptDestroyKey(hKey);

CryptReleaseContext(hProvider, 0);

return TRUE;

}

Ultimately, if you have a well-specified constant set of distinguishers and a constant base secret length, it is sufficient to replace HMAC by SHA1-hashing the concatenation of the key, distinguisher, and counter.

See Also

Recipe 4.10, Recipe 5.26, Recipe 6.10, Recipe 6.15, Recipe 6.16

4.12. Encrypting in a Single Reduced Character Set

Problem

You're storing data in a format in which particular characters are invalid. For example, you might be using a database, and you'd like to encrypt all the fields, but the database does not support binary strings. You want to avoid growing the message itself (sometimes database fields have length limits) and thus want to avoid encoding binary data into a representation like base64.

Solution

Encrypt the data using a stream cipher (or a block cipher in a streaming mode). Do so in such a way that you map each byte of output to a byte in the valid character set.

For example, let's say that your character set is the 64 characters consisting of all uppercase and lowercase letters, the 10 numerical digits, the space, and the period. For each character, do the following:

1. Map the input character to a number from 0 to 63.

2. Take a byte of output from the stream cipher and reduce it modulo 64.

3. Add the random byte and the character, reducing the result modulo 64.

4. The result will be a value from 0 to 63. Map it back into the desired character set.

Decryption is done with exactly the same process.

See Recipe 5.2 for a discussion of picking a streaming cipher solution. Generally, we recommend using AES in CTR mode or the SNOW 2.0 stream cipher.

Discussion

If your character set is an 8-bit quantity per character (e.g., some subset of ASCII instead of Unicode or something like that), the following code will work:

typedef struct {

unsigned char *cset;

int csetlen;

unsigned char reverse[256];

unsigned char maxvalid;

} ENCMAP;

#define decrypt_within_charset encrypt_within_charset

void setup_charset_map(ENCMAP *s, unsigned char *charset, int csetlen) {

int i;

s->cset = charset;

s->csetlen = csetlen;

for (i = 0; i < 256; i++) s->reverse[i] = -1;

for (i = 0; i < csetlen; i++) s->reverse[charset[i]] = i;

s->maxvalid = 255 - (256 % csetlen);

}

void encrypt_within_charset(ENCMAP *s, unsigned char *in, long inlen,

unsigned char *out, unsigned char (*keystream_byte)( )) {

long i;

unsigned char c;

for (i = 0; i < inlen; i++) {

do {

c = (*keystream_byte)( );

} while(c > s->maxvalid);

*out++ = s->cset[(s->reverse[*in++] + c) % s->csetlen];

}

}

The function setup_charset_map( ) must be called once to set up a table that maps ASCII values into an index of the valid subset of characters. The data type that stores the mapping data is ENCMAP . The other two arguments are charset, a list of all characters in the valid subset, and csetlen, which specifies the number of characters in that set.

Once the character map is set up, you can call encrypt_within_charset( ) to encrypt or decrypt data, while staying within the specified character set. This function has the following arguments:

s

Pointer to the ENCMAP object.

in

Buffer containing the data to be encrypted or decrypted.

inlen

Length in bytes of the input buffer.

out

Buffer into which the encrypted or decrypted data is placed.

keystream_byte

Pointer to a callback function that should return a single byte of cryptographically strong keystream.

This code needs to know how to get more bytes of keystream on demand, because some bytes of keystream will be thrown away if they could potentially be leveraged in a statistical attack. Therefore, the amount of keystream necessary is theoretically unbounded (though in practice it should never be significantly more than twice the length of the input). As a result, we need to know how to invoke a function that gives us new keystream instead of just passing in a buffer of static keystream.

It would be easy (and preferable) to extend this code example to use a cipher context object (keyed and in a streaming mode) as a parameter instead of the function pointer. Then you could get the next byte of keystream directly from the passed context object. If your crypto library does not allow you direct access to keystream, encrypting all zeros returns the original keystream.

WARNING

Remember to use a MAC anytime you encrypt, even though this expands your message length. The MAC is almost always necessary for security! For databases, you can always base64-encode the MAC output and stick it in another field. (See Recipe 6.9 for how to MAC data securely.)

Note that encrypt_within_charset( ) can be used for both encryption and decryption. For clarity's sake, we alias decrypt_within_charset( ) using a macro.

The previous code works for fixed-size wide characters if you operate on the appropriate sized values, even though we only operate on single characters. As written, however, our code isn't useful for variable-byte character sets. With such data, we recommend that you accept a solution that involves message expansion, such as encrypting, then base64-encoding the result.

See Also

Recipe 5.2, Recipe 6.9

4.13. Managing Key Material Securely

Problem

You want to minimize the odds of someone getting your raw key material, particularly if they end up with local access to the machine.

Solution

There are a number of things you can do to reduce these risks:

§ Securely erase keys as soon as you have finished using them. Use the spc_memzero( ) function from Recipe 13.2.

§ When you need to store key material, password-protect it, preferably using a scheme to provide encryption and message integrity so that you can detect it if the encrypted key file is ever modified. For example, you can use PBKD2 (see Recipe 4.10) to generate a key from a password and then use that key to encrypt using a mode that also provides integrity, such as CWC (see Recipe 5.10). For secret keys in public key cryptosystems, use PEM-encoding, which affords password protection (see Recipe 7.17).

§ Store differentiating information with your medium- or long-term symmetric keys to make sure you don't reuse keys. (See Recipe 4.11.)

See Also

Recipe 4.10, Recipe 4.11, Recipe 5.10, Recipe 7.17, Recipe 13.2

4.14. Timing Cryptographic Primitives

Problem

You want to compare the efficiency of two similar cryptographic primitives and would like to ensure that you do so in a fair manner.

Solution

Time operations by calculating how many cycles it takes to process each byte, so that you can compare numbers across processors of different speeds more fairly.

Focus on the expected average case, best case, and worst case.

Discussion

When you're looking at timing information, you usually have one of two motivations: either you're interested in comparing the performance of two algorithms, or you'd like to get a sense of how much data you'll actually be able to pump through a particular machine.

Measuring bytes per second is a useful thing when you're comparing the performance of multiple algorithms on a single box, but it gives no real indication of performance on other machines. Therefore, cryptographers prefer to measure how many processor clock cycles it takes to process each byte, because doing so allows for comparisons that are more widely applicable. For example, such comparisons will generally hold fast on the same line of processors running at different speeds.

If you're directly comparing the speed of an algorithm on a 2GHz Pentium 4 against the published speed of the same algorithm run on a 800 MHz Pentium 3, the first one will always be faster when measured in bytes per second. However, if you convert the numbers from bytes per second to cycles per byte, you'll see that, if you run the same implementation of an algorithm on a P3 and a P4, the P3 will generally be faster by 25% or so, just because instructions on a P4 take longer to execute on average than they do on a P3.

If you know the speed of an algorithm in bytes per second, you can calculate the number of cycles per byte simply by dividing by the clock speed in hertz (giving you bytes per cycle) and taking the reciprocal (getting cycles per byte). If you know the speed measured in gigabytes per second, you can divide by the clock speed in gigahertz, then take the reciprocal. For example, you can process data at 0.2 gigabytes per second on a 3 GHz CPU as follows:

.2/3 = 0.066666666666666666 (bytes processed per cycle)

1/0.066666666666666666 = 15.0 cycles per byte

For many different reasons, it can be fairly difficult to get timing numbers that are completely accurate. Often, internal clocks that the programmer can read are somewhat asynchronous from the core processor clock. More significantly, there's often significant overhead that can be included in timing results, such as the cost of context switches and sometimes timing overhead.

TIP

Some CPUs, such as AMD's Athlon, are advertised such that the actual clock speed is not obvious. For example, the Athlon 2000 runs at roughly 1666 MHz, significantly less than the 2000 MHz one might suspect.

Generally, you'll want to find out how quickly a primitive or algorithm can process a fixed amount of data, and you'd like to know how well it does that in a real-world environment. For that reason, you generally shouldn't worry much about subtracting out things that aren't relevant to the underlying algorithm, such as context switches and procedure call overhead. Instead, we recommend running the algorithm many times and averaging the total time to give a good indication of overall performance.

In the following sections we'll discuss timing basics, then look at the particulars of timing cryptographic code.

Timing basics

You need to be able to record the current time with as much precision as possible. On a modern x86 machine, it's somewhat common to see people using inline assembly to call the RDTSC instruction directly, which returns the number of clock cycles since boot as a 64-bit value. For example, here's some inline assembly for GCC on 32-bit x86 platforms (only!) that reads the counter, placing it into a 64-bit unsigned long long that you pass in by address:

#define current_stamp(a) asm volatile("rdtsc" : "=a"(((unsigned int *)(a))[0]),\

"=d"(((unsigned int *)a)[1]))

The following program uses the above macro to return the number of ticks since boot:

#include <stdio.h>

int main(int argc, char *argv[ ]) {

spc_uint64_t x;

current_stamp(&x);

printf("%lld ticks since boot (when I read the clock).\n", x);

return 0;

}

RDTSC is fairly accurate, although processor pipelining issues can lead to this technique's being a few cycles off, but this is rarely a big deal.

On Windows machines, you can read the same thing using QueryPerformanceCounter( ) , which takes a pointer to a 64-bit integer (the LARGE_INTEGER or _ _int64 type).

You can get fairly accurate timing just by subtracting two subsequent calls to current_stamp( ) . For example, you can time how long an empty for loop with 10,000 iterations takes:

#include <stdio.h>

int main(int argc, char *argv[ ]) {

spc_uint64_t start, finish, diff;

volatile int i;

current_stamp(&start);

for (i = 0; i < 10000; i++);

current_stamp(&finish);

diff = finish - start;

printf("That loop took %lld cycles.\n", diff);

return 0;

}

On an Athlon XP, compiling with GCC 2.95.4, the previous code will consistently give 43-44 cycles without optimization turned on and 37-38 cycles with optimization turned on. Generally, if i is declared volatile, the compiler won't eliminate the loop, even when it can figure out that there are no side effects.

Note that you can expect some minimal overhead in gathering the timestamp to begin with. You can calculate the fixed timing overhead by timing nothing:

int main(int argc, char *argv[ ]) {

spc_uint64_t start, finish, diff;

current_stamp(&start);

current_stamp(&finish);

diff = finish - start;

printf("Timing overhead takes %lld cycles.\n", diff);

return 0;

}

On an Athlon XP, the overhead is usually reported as 0 cycles and occasionally as 1 cycle. This isn't really accurate, because the two store operations in the first timestamp call take about 2 to 4 cycles. The problem is largely due to pipelining and other complex architectural issues, and it is hard to work around. You can explicitly introduce pipeline stalls, but we've found that doesn't always work as well as expected. One thing to do is to time the processing of a large amount of data. Even then, you will get variances in timing because of things not under your control, such as context switches. In short, you can get within a few cycles of the truth, and beyond that you'll probably have to take some sort of average.

A more portable but less accurate way of getting timing information on Unix-based platforms is to ask the operating system for the clock using the gettimeofday( ) function. The resolution varies depending on your underlying hardware, but it's usually very good. It can be implemented using RDTSC but might have additional overhead. Nonetheless, on most operating systems, gettimeofday( ) is very accurate.

OTHER WAYS TO GET THE TIME

On many machines, there are other ways to get the time. One way is to use the POSIX times( ) function, which has the advantage that you can separate the time your process spends in the kernel from the time spent in user space running your code. While times( ) is obsoleted on many systems, getrusage( ) does the same thing.

Another alternative is the ISO C89 standard function, clock( ) . However, other timers we discuss generally provide resolution that is as good as or better than this function.

Here's a macro that will use gettimeofday( ) to put the number of microseconds since January 1, 1970 into an unsigned 64-bit integer (if your compiler does not support a 64-bit integer type, you'll have to store the two 32-bit values separately and diff them properly; see below).

#include <sys/time.h>

#define current_time_as_int64(a) { \

struct timeval t; \

gettimeofday(&t, 0); \

*a = (spc_uint64_t)((t.tv_sec * 1000000) + t.tv_usec); \

}

Attackers can often force the worst-case performance for functionality with well-chosen inputs. Therefore, you should always be sure to determine the worst-case performance characteristics of whatever it is you're doing, and plan accordingly.

WARNING

The gettimeofday( ) -based macro does not compute the same thing the RDTSC version does! The former returns the number of microseconds elapsed, while the latter returns the number of cycles elapsed.

You'll usually be interested in the number of seconds elapsed. Therefore, you'll need to convert the result of the gettimeofday( ) call to a number of cycles. To perform this conversion, divide by the clock speed, represented as a floating-point number in gigahertz.

Because you care about elapsed time, you'll want to subtract the starting time from the ending time to come up with elapsed time. You can transform a per-second representation to a per-cycle representation once you've calculated the total running time by subtracting the start from the end. Here's a function to do both, which requires you to define a constant with your clock speed in gigahertz:

#define MY_GHZ 1.6666666666666667 /* We're using an Athlon XP 2000 */

spc_uint64_t get_cycle_count(spc_uint64_t start, spc_uint64_t end) {

return (spc_uint64_t)((end - start) / (doublt)MY_GHZ);

}

Timing cryptographic code

When timing cryptographic primitives, you'll generally want to know how many cycles it takes to process a byte, on average. That's easy: just divide the number of bytes you process by the number of cycles it takes to process. If you wish, you can remove overhead from the cycle count, such as timing overhead (e.g., a loop).

One important thing to note about timing cryptographic code is that some types of algorithms have different performance characteristics as they process more data. That is, they can be dominated by per-message overhead costs for small message sizes. For example, most hash functions such as SHA1 are significantly slower (per byte) for small messages than they are for large messages.

You need to figure out whether you care about optimal performance or average-case performance. Most often, it will be the latter. For example, if you are comparing the speeds of SHA1 and some other cryptographic hash function such as RIPEMD-160, you should ask yourself what range of message sizes you expect to see and test for values sampled throughout that range.