﻿ ﻿Decoding, Decryption, Attacks - Foundations of Coding - Foundations of Coding: Compression, Encryption, Error Correction (2015)

Foundations of Coding: Compression, Encryption, Error Correction (2015)

1.4 Decoding, Decryption, Attacks

To conclude this introductory chapter, we develop encoding methods adopting the point of view of the inverse operation, decoding. We have already seen that, for many reasons, decoding – which consists in inverting the encoding functions – is not a trivial task:

· as in the fax code we detailed at the beginning of this chapter, if the ciphertext is a sequence of bits, recovering the source message without ambiguity by parsing the sequence into blocks requires a particular form of the code;

· an exact decoding is sometimes not even completely attainable, if the encodingmethod does not include all the initial information, when performing a compression for example; we have seen that the fax image loses in quality; there exist many encoding methods “ with loss,” which makes encoding and decoding asymmetrics;

· we have seen that the principle of one-way functions makes the computation of the encoding function and the decoding function completely different; it may happen that there is an efficient algorithm for one of them but not for the other.

We are going to develop all these aspects, using the word decoding as a general term allowing one to recover the source from a codeword, the word decryption for cryptographic decoding, and the words breaking or attack for an unauthorized decryption, namely when the recipient is the only one possessing the information.

1.4.1 Decoding without Ambiguity

The first virtue of a code is its ability to be decoded. This is obvious, but not necessarily a trivial issue.

Let us suppose that the code is a bijective function, which transforms the message written by the sender into a message transmitted through the channel. For a source message , a string over any source alphabet, and for a code alphabet V, let us denote by f the encoding function. Then, one has the codeword , with for all i. The code, seen as the set of all codewords, is then the image of the encoding function f. However, f being bijective is not enough for the message to be decoded without ambiguity by the recipient.

As an example, let us consider the encoding of the alphabet letters using integers written in base 10:

Then, the codeword 1209 may correspond to several messages: for example, BUJ, MAJ, or BCAJ.

Thus, in order to avoid such a problem, one has to add some constraints on the code for any message to be decoded without ambiguity. That is to say, when receiving a codeword, the recipient has to be able to recover a unique message from it. A code C over an alphabet V is called nonambiguous (one sometimes says uniquely decodable) if, for all , there exists at most one sequence such that

The following property is just a simple reformulation:

Property 8

A code C over an alphabet V is nonambiguous if and only if for all sequences and in :

Example 1.10 (Over the Alphabet )

· the code is not uniquely decodable.

· the code is uniquely decodable.

· the code is uniquely decodable.

The decoding constraint implies that all codewords should have a minimum length. Kraft's theorem gives a necessary and sufficient condition on the length of codewords to insure the existence of a uniquely decodable code.

Theorem 14 (Kraft)

There exists a uniquely decodable code over some alphabet V with n codewords of length if and only if

Proof

() Let C be a uniquely decodable code, of arity q (the vocabulary of the code contains q characters). Let m be the length of the longest word in C. For , let be the number of words of length k. One develops the following expression, for any integer u, with :

Once developed, each term of this sum is of the form Then, by regrouping for each value , one obtains the terms Set . The initial expression can be written as follows:

Notice that is the number of combinations of words in C whose overall length is equal to s. As C is uniquely decodable, two combinations of words in C cannot be equal to the same word over the alphabet of C. As C is of arity q, and is lower than the overall number of messages of length s on this alphabet, one has . This implies that

Thus, , and when u tends toward infinity. () The reciprocal proposition is a consequence of McMillan's theorem, which is studied later on in this chapter.

1.4.1.1 Prefix Property

One says that a code C over an alphabet V has the prefix property (one sometimes says that it is instantaneous, or irreducible) if and only if for all pairs of distinct codewords is not a prefix of .

Example 1.11

: b is not a prefix of a. However, c is a prefix of a.

If the prefix property applies, one is able to decode the words of such a code as soon as one has received the whole word (instantaneousness), which is not always the case with uniquely decodable codes: for instance, if , and one receives message . Then one will have to wait for the next occurrence of a 0 to be able to decode the second word (0 or 01?).

Property 9

Any code having the prefix property is uniquely decodable.

Proof

Let C be a code over V that is not uniquely decodable and has the prefix property. Then there exists a string such that , with and codewords of C and for at least one index i. Let us choose the minimum index isuch that (for all ). Then , otherwise, given the choice of i, one would have , which contradicts the definition of i. If , then is a prefix of . Otherwise, is a prefix of . Thus, Cdoes not have the prefix property.

The reciprocal proposition is false: indeed, the code is uniquely decodable but it does not have the prefix property. The following property is obvious, but it insures the decoding ability for some widely used kinds of codes.

Property 10

If all the words of some code are of the same length, then it has the prefix property.

Exercise 1.32

Let S be the source of alphabet with probabilities:One encodes S using the following codes:

2. Is it an instantaneous code?

3. Compute the entropy of the source.

Solution on page 291.

Exercise 1.33

We wish to build a binary compression code over a source (supposed to be infinite) where is the source alphabet and is the probability law of .

One proposes the following code: one enumerates the number of occurrences of “0” before the appearance of “1.” The two encoding rules are as follows:

· A string of four consecutive “0”s (without “1”) is encoded with 0.

· If less than four “0s” appear before a symbol “1,” one encodes the string with the codeword “,” being the binary representation of the number of “0s” before the symbol “1.”

For instance, the appearance of four consecutive zeros “0000” is encoded with “0,” whereas the string “001” is encoded with “110” because two “0”s appear before the symbol “1” (and “10” is the binary representation of 2).

1. Write explicitly the five codewords of this compression code. Does this code have the prefix property?

2. 2. Knowing that the probability of appearance of two successive symbols and is – when supposing that the source is without memory – , compute the probability of occurrence in of a string composed of k “0”s followed by a “1.”

3. For each codeword, compute the number of bits of code required per bit of the source. Deduce the compression rate of this code, namely the mean length per bit of the source.

Solution on page 291.

1.4.1.2 Huffman Trees

A Huffman tree is an object that enables one to easily represent all codes having the prefix property, and this representation makes their manipulation a lot easier. Here, we give the definitions in the binary case. However, these can be extended to codes of any arity.

One calls a Huffman tree a binary tree such that any subtree has either 0 or 2 sons (the tree is locally complete). One assigns the symbol “1” to the edge connecting the local root to the left subtree and “0” to the edge connecting the local root to the right subtree.

Figure 1.12 Example of a Huffman tree

To each leaf of a Huffman tree, one can associate a word in : it is a string composed of the symbols marking the edges of the path from the root to the leaf.

The maximum length of the words in a Huffman tree is called the depth of the tree. One calls a Huffman code the set of words corresponding to the paths in a Huffman tree; the depth of this tree is also called depth of the code C.

Example 1.12 (Code corresponding to the tree of Figure 1.12)

1.4.1.3 Representation of Instantaneous Codes

The introduction of Huffman trees is justified by the two following properties, which enable one to manipulate instantaneous codes with such trees.

Property 11

A Huffman code has the prefix property.

Proof

If a codeword is a prefix of , then the path representing in the Huffman tree is included in the path representing . As and are, by definition, associated with the leaves of the tree, . Thus, there do not exist two different codewords such that one of them is a prefix of the other. Hence, the Huffman code has the prefix property.

Property 12

Any code having the prefix property is included in a Huffman code.

Proof

Let us consider a complete Huffman tree (all leaves are at the same distance from the root) of depth l (the length of the longest word in C). Each codeword in C is associated to a path from the root to a node. Then, one can prune the subtree having this node as a root (all the words that could be represented in the nodes of this subtree would have as a prefix). All other codewords in C remain in the nodes of the resulting tree. It is possible to perform the same operation for all the other words. One eventually obtain a Huffman tree containing all codewords in C.

1.4.1.4 McMillan's Theorem

We have seen that Huffman trees enable one to represent all instantaneous codes. However, they do not enable one to represent all uniquely decodable codes. McMillan's theorem insures that one can avoid the use of uniquely decodable codes (nonambiguous) not having the prefix property. Indeed, there always exists another code that has the prefix property with the same lengths of codewords. Therefore, nothing can be gained by using ambiguous codes.

Theorem 15 (McMillan)

Over an alphabet V, there exists a code having the prefix property whose codewords are of length if and only if

Exercise 1.34

Using the representation of instantaneous codes with Huffman trees, give a proof of McMillan's theorem.

Solution on page 291.

Corollaire 1.81

If there exists a uniquely decodable code whose words are of length , then there exists an instantaneous code whose words are of the same length.

This is a consequence of Kraft's and McMillan's theorems. All decodable codes not having the prefix property do not produce codes with shorter words than instantaneous codes; therefore, one can limit oneself to the latter codes for information compression (their properties make them easier to use).

1.4.2 Noninjective Codes

All codes do not insure a nonambiguous decoding and are not even bijective. It might happen, for several reasons, that encoding functions only process a digest (or a fingerprint) of a message, or only the information which is considered to be sufficient for the needs of transmission. For instance, fingerprints are used for error detection: when receiving a message and its fingerprint, the recipient is able to check that the overall message has not been modified during the transmission by recomputing the fingerprint from the message he has received and comparing it to the fingerprint that was transmitted.

Lossy compression is used, for example, in processing images or sounds. The information is encoded in a way that will enable one to retrieve maybe only a variation of the original data. The differences should be slight enough so that they are not perceptible (for human ear or eye) or so that the new data is still useful.

1.4.2.1 Fingerprint Integrity Check

The most simple principle of error detection is an example of fingerprint computation. We have seen an example of this kind of encoding with the fax code, even if the code we added to each line did not depend on the content of the line. Besides, this only had a limited detection capacity.

The first principle of fingerprints for error detection is the addition of a simple parity bit to the cipherblocks. For a word , the parity bit is equal to . Obviously, this equality is false when an odd number of bits change their value in the set “ message+fingerprint.” Hence, the addition of a parity bit enables one to detect errors on a odd number of bits. We will see this mechanism in more detail in Chapter 4, in particular in Figure 4.2 on page 212.

1.4.2.2 Hash Functions

The hash function follows the same principle, but it encodes a more evolved fingerprint, as it is meant to identify the message. This is the definition of a summary of the message, which will enable one not to recover it, but to identify it using a correspondence table. This works as a human fingerprint, which does not enable one to reconstitute the other characteristics of an individual but which enables one to identify him.

Hash functions are particularly useful in cryptography. They notably enable one to decrease the amount of information to be encrypted. If the image of x by the hash function is called the fingerprint of x, one can – for example – encrypt only the fingerprint. Moreover, they enable one to set up electronic signature and message authentication protocols (see Section 3.5.3) and also to check the integrity of a document, in the same way as the parity bit (which is a particular hash function). Formally, a hash function is an application that transforms a string of any size into a string of fixed size n, as illustrated in Figure 1.13.

Figure 1.13 Principle of a hash function

One talks about a collision between x and when

Considering that the input of a hash function can be of any size (in particular ), collisions are inevitable. If y is such that , then x is called the preimage of y (one recalls that y is the fingerprint of x).

One of the basic constraints in setting up a hash function is efficiency: a fingerprint must be easy to compute. Besides, hash functions have a natural compression property.

Other properties can be expected:

· Preimage resistant: given y, one cannot find – in reasonable time – some x such that .

· Second preimage resistant: given x, one cannot find – in reasonable time – such that ;

· Collision resistant: one can not find in reasonable time – x and such that ;

A one-way hash function is a hash function satisfying the properties of preimage resistance and second preimage resistance.

Exercise 1.35 (Security of Hash Functions)

Prove, using the contrapositive proposition, that collision resistance implies second preimage resistance, which implies preimage resistance.

Solution on page 292.

Exercise 1.36 (A Bad Hash Functionxs)

Let be any function. One proposes an iterative hash function , such that, for some x of size bits, divided into two blocks and , one has where is the concatenation of and . Prove that g is not second preimage resistant.

Solution on page 292.

Hash functions can be used for

· Manipulation Detection Code (MDC) that enable one to check the integrity of a message (in the manner of parity bits);

· MAC that manage both the integrity and the authentication of the source of data.

We will see several examples of such applications in Chapter 3.

Construction of a Merkle–Damgård hash function One of the most famous constructions of hash functions relies on a compression function

Such a function is illustrated in Figure 1.14.

Figure 1.14 Compression function of a hash function

Message M is split into blocks of b bits (one will possibly have to add some padding bits for the size of M to be divisible by b).

One iterates the compression function h according to the scheme presented in Figure 1.15.

Figure 1.15 Merkle–Damgård construction

IV (Initial Value) is a string (of size n) fixed by the algorithm or the implementation. A theorem – which was proved independently by Ralph Merkle and Ivan Damgård – enables one to describe the theoretical properties of such a construction:

Theorem 16 (Merkle–Damgård)

If h is collision resistant, then so is H (Figure 1.15).

It is this result that actually makes Merkle–Damgård construction the most used construction in fingerprint computation.

Exercise 1.37

Prove Merkle–Damgård's theorem using the contrapositive proposition.

Solution on page 292.

Exercise 1.38 (Construction by Composition)

Let be a hash function and let another hash function such that, if , then , standing for the concatenation operation.

1. Prove that if f is collision resistant, then so is h.

2. What is the drawback of this construction ?

Solution on page 293.

Hence, one only needs to make explicit the construction of compression functions h, which are collision resistant. For instance, the Davies–Meyer construction (Figure 1.16) defines , where is a symmetric block encryption function.

Figure 1.16 Davies–Meyer construction

But an attack on preimage resistance was set up by Drew Dean in 1999, who exploited the existence of fixed points in this construction. Therefore, compression functions using this construction are less robust.

The Miyaguchi–Preneel construction (Figure 1.17) is an improvement on the previous construction and is particularly robust from a cryptographic point of view.

Figure 1.17 Miyaguchi–Preneel construction

Function g adapts the construction to the size of the key of the encryption function E. Hence, one has .

Galois hashing Another popular hash function is GHASH, for Galois hashing, which uses multiplication in the field with elements and Horner scheme. The idea is to choose an element h of where the field is usually build as polynomials modulo 2 and modulo the primitive polynomial . Then, a message m is cut into blocks of bits and each block is considered as a coefficient of the reverse polynomial . The hash value is obtained as the evaluation of this polynomial in hvia Algorithm 1.7, where each block is considered as the element :

Exercise 1.39 (Security of GHASH)

1. Suppose that there exist i and j with such that for the chosen element of the multiplication, we have . Deduce a way to build a collision in the hash function.

2. We know that , with . How many possible distinct orders are there for elements of ?

3. For a given h, with a given order o, what size of message would be required to obtain a collision using the first question?

4. If you were to choose an element h for the hashing, which elements would be best?

5. For a randomly chosen non zero element h, what is the probability that does not divide its order?

6. If this is not the case what size of message would be required to obtain a collision using the first question?

Solution on page 293.

1.4.2.3 Lossy Transformations

Other transformation processes end with a message encoded with loss and make full decoding impossible. For example, a fax is just a bad copy of the original image, but one sees that it fulfills its duty of transmitting the information contained in the document. One can also encode an image (respectively a sound) without keeping all the information, provided that the loss is not noticeable to the naked-eye (respectively to the ear).

Numerical information is, in essence, discrete information. We have seen that continuous or analogical data, like sounds and images, can be easily digitized. As for sounds, one can encode at each instant a frequency and an amplitude. For images, one decomposes them into pixels and encodes a color for each pixel.

Yet, this natural digitization might not be the best model for many codes. In images, for example, the colors of two contiguous pixels are often not independent, and it will be more judicious to encode a set of pixels as a function rather than a single pixel as a value. Therefore, one encodes blocks of pixels with a periodic (or almost periodic) function.

Hence, encoding happens to be performed on functions rather than on discrete or numerical entities, and this is the principle of the following section. Therefore, encoding will be a particular function, which will be applied to functions, and that we will call rather atransform, or a transformation.

1.4.2.4 Fourier Transform and Discrete Fourier Transform (DFT)

Let us suppose that a message, or part of a message, can be formulated as a periodic and integrable function h (more precisely h should be in ), varying with respect to time t, and of period . This happens with sounds. As we assume that h is periodic, the same message can be formulated as an amplitude H, varying with respect to a frequency f. The Fourier transform is an encoding process that enables one to switch from one representation another. Like any encoding process, it is associated to its inverse, decoding, and is formulated with the following formulas (Figure 1.18).

Figure 1.18 Fourier transform

For a sound, even if the natural and immediate encoding is rather , one often uses that encodes exactly the same information – as encoding is reversible – and is a lot cheaper, because it makes good use of the periodicity of h. Therefore, the Fourier transform is very efficient for compression.

The DFT follows the same principle but with discrete functions. This will obviously be very useful as – by essence – our numerical messages have to be encoded with discrete information. Now let us suppose that our functions and are discrete functions, that is to say some vectors and with discrete variables. One formulates the transformation by denoting an root of unity. satisfies the equalities: .

Figure 1.19 Discrete Fourier Transform (DFT)

In other words following Figure 1.19, if , then

1.14

Decoding is correct as

The Discrete Cosine Transform (DCT) is a direct consequence of the DFT for some discrete function h. But instead of being time-varying (which is a good model for a sound), it is space varying (which enables one to encode an image); hence, h is a two-variable discrete function . For instance, it is the color of a pixel whose coordinates are x and y. In the same way, it is possible to represent differently the same information with a two-variable discrete function standing for a spectral analysis of the image. The DCT and its inverse are shown in Figure 1.20, where whenever and .

Figure 1.20 Discrete Cosine Transform (DCT)

DCT is also a good compression principle for images, as for any periodic (or almost periodic, i.e., periodic up to a small error) message.

These transformations are reversible. Moreover, not only do they prove themselves to be good compression processes but their interest also lies in the easiness of choosing and keeping only important information. Indeed, during such encoding, it is possible to keep only some coefficients of the DFT or the DCT – to reduce the size of information one has to encode – while not necessarily changing the audio/visual result. We will handle these principles in more detail in Chapter 2.

1.4.2.5 DFT Algorithm

One can write the DFT transformation as a matrix with . Thus, the inverse transformation can be written as .

Remark 2

In some fields, simply does not exist. It is therefore sometimes useful to define the transform with another constant factor: and . For the sake of simplicity, in the following, we will avoid and use so that and .

An immediate algorithm for the calculation of this transform uses a matrix vector product and thus has a complexity of .

A “ divide and conquer” algorithm decreases this complexity, which is extremely important for encoding. The “ divide and conquer” principle is to split the problem into equivalent subproblems of lower size. Here, one divides the expression of the transform into two parts. Assuming that n is even, and setting , one has

However, as is an root of unity, is equal to 1 or according to the parity of k.

If k is even, then one defines the vector

and the even coefficients of H (the transform of h) are the coefficients of the transform of , which is half the size of h:

Now, if k is odd, one defines the vector

and the odd coefficients of H (the transform of h) are the coefficients of the transform of , which is half the size of h:

One obtains Algorithm 1.12.

The complexity of this algorithm is , in consequence . It is almost a linear complexity, thus an important improvement with respect to the initial algorithm. Moreover, the algorithm for the inverse Fourier transform is then straightforward, as

1.4.2.6 DFT and Roots of Unity in a Finite Field

One considers the polynomial in a field with . An root of unity in is a simple root, if there exists such root, of the polynomial . The order of a root of unity is the least integer o such that . As is a root of , one has obviously . Besides, o divides n. Indeed, if one sets , then one has . Thus .

A primitive root of unity is an root of unity of order n.

This notion is crucial for the application of the DFT: in order to compute the DFT in the field , we used a particular root – – which is primitive in .

Now, primitive roots are available for any n in , whereas one is not even sure of their existence in a given finite field. Indeed, in , a primitive root of unity is what we simply called a primitive root in Section 1.3.5.3. In the same way as we did for these roots, the following theorem enables one to determine the fields in which it is possible to make fast calculations on vectors of a given size n:

Theorem 17

Let q be a power of some prime number and let n be coprime with q. The finite field contains an primitive root of unity if and only if n divides .

Proof

If a is an primitive root, then and n is the order of a. As a is also an element of the field with q elements, its order necessarily divides . Reciprocally, one uses a generator g of the field (a primitive root) whose existence is ensured by the algorithm in Section 1.3.5.3. Hence, if then is an primitive root of unity.

One says that a field supports the DFT at order n if there exist primitive roots of unity in this field. All fields supporting DFT for n equal to a power of 2 are obviously very interesting for applying the fast divide and conquer algorithm above. For instance, as we will see in Algorithm 1.13, the field with elements enables one to multiply polynomials of degree up to with the fast algorithm as .

Therefore, one has to compute such an root of unity. It is possible to use a generator, but one would rather use a variant of the algorithm presented in Section 1.3.5.3: draw randomly an root (a root of the polynomial in ) and test whether its order is actually n. The following corollary gives us the probability of success: chances over n.

Corollaire 1.90

If a finite field has at least one primitive root of unity, then it has exactly primitive roots.

Proof

Let . Let g be a generator of the field. Thus, is an primitive root, as well as all for t between 1 and coprime with n. All these are distinct; otherwise g would not be a generator; and these are the only ones as with t not coprime to n is of order strictly lower than n. The primitive roots are necessarily written as : if is an primitive root, then . As g is a generator, one has . This proves that u is in the form tk.

Exercise 1.40

Find a primitive root of unity modulo 31.

Solution on page 293.

However, if the field does not contain an primitive root of unity, one may extend the field. In the same way as with respect to , one can consider a field containing in which the polynomial can be completely factorized into polynomials of degree 1. This field is an extension of the field and it is called a splitting field of .

Exercise 1.41

Find a primitive root of unity in a field of characteristic .

Solution on page 293.

1.4.2.7 Fast Product of Polynomials Using DFT

As for the discrete Fourier transform (DFT), the product of two polynomials – which is often used in coding theory (see all constructions based on polynomials in this chapter) – has a naive algorithm of complexity bound .

The DFT and the calculation algorithm we have just seen enables one to perform this computation in time .

Given two polynomials and , one denotes by and the respective discrete Fourier transform of vectors and , where the coefficients of the polynomials are extended with zeros up to the degree (degree of the product). Then, from the definition of the transform, one can immediately write the coefficients as and . By simply multiplying these two scalars and using the arithmetic of polynomials, one obtains the value of theproduct evaluated at : ; in other words .

This property enables one to build Algorithm 1.13 that computes the product of two polynomials in time .

The complexity is truly , as termwise multiplication is linear and the Fourier transform – as well as its inverse - has a complexity .

1.4.3 Cryptanalysis

We have studied some skewness properties between encoding and decoding. One of them, probably the most important one in cryptography, is that which differentiates decryption (by the recipient) from breaking (by a third party). We dedicate a small part of this book to attack techniques based on the weaknesses of codes, which are developed too quickly.

Cryptographic codes use pseudo-random generators for secret key generation, hash functions for authentication, and one-way functions for public key techniques. We will present separately known attacks for each of these steps. Knowing these attack techniques is essential to generate codes that can resist them.

1.4.3.1 Attacks on Linear Congruential Generator

Linear congruential generators have been looked at in Section 1.3.7 A random number is generated as a function of the previously generated number , using the formula .

Exercise 1.42 (Attack on LCGs)

· If m is prime, what is the maximum period of an LCG? In particular, Fishman and Moore studied generators modulo in 1986. They determined that if then the period is maximum and the generator has good statistical properties. What can you say about ?

· For with p an odd prime, what is the maximum period of an LCG?

One can prove that if is the maximum period, then for and that if with distinct primes.

· Assuming that m is known, how can one recover a and b?

· Now, suppose that m is unknown. How can one find the generator if ? Hint: one may study . What happens if ?

· What is the next integer in this list: 577,114,910,666,107?

Solution on page 294.

1.4.3.2 Berlekamp–Massey Algorithm for the Synthesis of LFSRs

The Berlekamp–Massey algorithm enables one to detect, for an infinite sequence , , of elements in a field , whether – beyond some rank – its elements are linear combinations of the previous elements. This is what we have called linear recurrent sequences.

This algorithm is very useful in coding theory, notably in order to perform the cryptanalysis of random generators and cryptographic keys and even to correct the errors of cyclic codes (Section 4.4.6.1). In particular, it enables one to recover the generator polynomial of an LFSR (Section 1.3.7.2) only knowing the first terms of the sequence generated by this LFSR.

The question is, for the sequence , how to find coefficients , if they exist, such that

If one uses these constants in order to define the polynomial , this polynomial is called an annihilator of the sequence.

The set of annihilators is an ideal in the ring of polynomials over . As is a principal ideal ring, there exists a unitary annihilator polynomial of minimum degree, called the minimal polynomial of the sequence.

How does one find this polynomial only from the coefficients of the sequence? If one knows the degree d of this polynomial, one has to write d linear equations corresponding to the property of linear recurrence for coefficients. Then one has to solve the following linear system:

1.15

The only thing that remains to do is to determine the degree of the minimal polynomial. At first sight, one may try iteratively all possible degrees (starting from a constant polynomial of degree 0) and match each polynomial produced with the sequence in order to see whether it is an annihilator or not. If the sequence is truly infinite, this might never stop.

Otherwise, if the sequence is finite, one notices, when considering the system, that the maximum degree of the minimal polynomial is half the number of elements in the sequence. This algorithm implies that one should solve successively several linear systems. In practice, it is possible to take advantage of the symmetric structure of the system in order to solve it on-the-fly, while adding the elements of the sequence progressively. This gives the following Berlekamp–Massey algorithm that has a complexity bound of only .

The main idea of this algorithm is to explicitly compute the coefficients of the polynomial. Thus, the update of is performed in two steps. The trick of the test is to enable one to perform each of these two steps alternately. Let us explain the algorithm by looking at the three first terms of the sequence. The degree of the minimal polynomial increases by one (at most) every time one adds two elements of the sequence. The are called discrepancies.

The first discrepancy is the first term of the sequence, and , becomes . Discrepancies correspond to the values taken by the polynomial in the sequel of the sequence. If the discrepancy is null, then the polynomial one considers is an annihilator of an additional part of the sequence. Hence, the second discrepancy is applied to the sequence , namely . Therefore, the update of is , namely , which is an annihilator of the sequence . Then, the third discrepancy is equal to and the two polynomials and are, respectively, equal to and . Hence, annihilates and annihilates .

Hence, as multiplication by X of these annihilator polynomials comes to shifting by one position their application in the initial sequence, one obtains and . Then, it is possible to combine these two polynomials in order to also annihilate the sequence , by , which is exactly what the algorithm does. In the sequel, if all next discrepancies are null, this means that the polynomial we have obtained is an annihilator of the sequel of the sequence. One can still continue until one is sure of having the minimal polynomial of the terms of the sequence or one can stop the algorithm earlier without checking the last discrepancies (using the control variable ).

As for complexity, the loop ends after and at most iterations. In each iteration, computing the discrepancy operations and updating the polynomial require both operations, for an overall number of operation of .

It is even possible to use a fast algorithm to reduce this complexity, at least asymptotically. The idea is to see the sequence as a polynomial too. Then, the minimal polynomial of the sequence is such that the product of and has only a finite number of nonzero terms, the terms of degree at most d. It is possible to rewrite this statement in the following way:

1.16

Hence, one notices that computing , Q, and R can be performed by the Euclidean algorithm interrupted in the middle of the computation, as the degree of R is lower than n. Thus, the complexity bound of the computation is the same as for Euclidean algorithm, namely . However, in practice, the Berlekamp–Massey algorithm remains more efficient for values of d up to dozens of thousands.

The Birthday Paradox is a probability result, and it is called a paradox because it seems to go against the first intuition one could have. It is used in several attack methods in cryptanalysis. It also shows that one should distrust intuitions when talking about probabilities.

There are 365 days in a year and still, in a group of more than 23 people, there is more than one chance in two of having at least two of them with the same birth date !

Indeed, let us take a population of k people. Knowing that the number of days in a year is n, the number of combinations of k distinct birth dates is . Therefore, the probability of having all people with distinct birth dates is . Thus, the probability of having at least two people with the same birthday is

Hence, when considering 365 days, this probability is around one chance in in a group of 9 people, more than one chance in 2 in a group of 23 people, and 99.4% in a group of 60 people. More generally, one has the following theorem:

Theorem 18

In a set of elements chosen randomly among n possibilities, the probability of collision is higher than 50%.

Proof

We have seen that the number of collisions, in a space of size with k draws, is . One has to give an estimation of this probability: . Yet, , for x positive, thus

Then, for this probability to be greater than , it is sufficient to have , namely, as k is positive, . Hence, for (again one has, for ).

This kind of collision probability – that one's intuition would tend to weaken – enables one to build attacks against systems for which intuition would give a limited chance of success.

1.4.3.4 Yuval's Attack on Hash Functions

Resistance to collision of hash functions can be measured: one has to determine the probability of finding collisions, which is close to the probability of collision in birth dates (birth dates play the role of fingerprints for individuals).

That is why Yuval's attack on hash functions is also called a birthday attack. It is a question of transmitting some corrupted message instead of a legitimate message M, in such way that the corruption is unnoticeable for a hash function h. Then, one looks for and , such that . After that, one can, for example, fraudulently change M into , or send M and pretend to have sent , which is precisely what h should prevent!

As a consequence of Theorem 18, the expected number of draws of in Yuval's attack is .

If one uses Yuval's attack to send and then to repudiate it later arguing that was actually sent, one has more than one chance in two of succeeding in attempts. This shows that brute force can be efficient if a hash function is not collision resistant.

But is this attack really feasible? A simple calculation is enough to be convinced: for a numerical fingerprint on 128 bits, one should perform around attempts, which is feasible on general public machines of today: a computer running at 3 GHz performs operations per day, thus it would take a little more than two months on the 1000 PCs of a company to find a collision.

But if one uses hash functions on 160 bits, the cost is multiplied by a factor , which is unreachable so far.

1.4.3.5 Factoring Composite Numbers

It is quite easy to distinguish a composite number from a prime number. But knowing the numbers composing it seems to be a much more difficult problem. It is the factorization problem. Although it can be formulated in a quite simple way, so far there does not exist an efficient solution to it (for instance, the famous Sieve of Eratosthenes is useless for numbers of more than 10 digits).

The difficulty of this problem and the efficiency of attack methods are very important, as a lot of one-way functions rely on the difficulty of factorization or on the difficulty of equivalent problems, such as the discrete logarithm problem. Thus, looking for good factorization algorithms is almost a cryptanalysis method.

Many different algorithms do exist. The goal of this section is not to enumerate them all but rather to give an idea of the most efficient ones for numbers of different sizes.

Pollard's Rho algorithm (Numbers of few digits) The first class of target numbers is composed of “ everyday composite numbers,” namely numbers of less than 20 digits. Pollard's algorithm is very efficient for such numbers.

The algorithm only requires a few lines of code (around forty) and is very easy to implement. Let m be the composite number one wishes to factorize. First of all, one has to compute a sequence of the form of large period (the longer the are distinct the better).

Table 1.8 Distribution of the Multiples of p Modulo m

 0 1 2 p p+1 p+2 kp kp+1 kp+2 m-1

Then, the idea is to notice that, if p is a factor of m, the distinct modulo m are less often distinct modulo p (Table 1.8). In this case, if then the GCD of m and is equal to kp and it is a nontrivial factor of m. If the are actually pairwise distinct, the computation ends in at most p steps. A first version of the algorithm consists in producing some and, when adding a new element, computing the GCD with all previous 's. This version has two major drawbacks: first of all, one has to store around pelements; also, it takes GCD computations if i and are the smallest indexes such that . The second trick of Pollard is to use Floyd's cycle detection. It consists in storing only the such that k is a power of 2. Indeed, as the are generated by a function, if , then for all and a cycle is created modulo p, even if it is not directly noticeable.

When only storing powers of 2, the cycle will only be detected for with , as illustrated in Figure 1.21

Figure 1.21 Floyd's cycle detection, in the form of a rho

Yet, . Hence, one performs at most twice the number of requested operations. One single GCD is computed at each step and one single additional element is stored throughout the algorithm. This gives the very fast algorithm presented in Algorithm 1.16.

If one takes , so that the is actually pairwise distinct modulo m, for example, it will take at worst iterations if p is the smallest factor of m, and even less in general:

Theorem 19

Pollard's Rho algorithm has more than one chance in two of succeeding in steps.

Proof

Once again, the proof is similar to that of the birthday paradox. If k distinct values are drawn randomly, then there are combinations without collisions between the for an overall . For the probability of finding a nontrivial factor to be greater than , one must have – according to Theorem 18 - .

In practice, this algorithm factorizes numbers of 1 up to around 25 digits within seconds (with factors of 12 or 13 digits, this gives approximately 10 million operations!) but very quickly, it becomes useless when considering factors of more than 15 digits.

Elliptic curves (Numbers with a few dozen of digits) To go further, one can use a method based on elliptic curves, designed by Pollard and Lenstra.

This method uses elliptic curves of the form , defined in Section 1.3.6. The idea is to consider the set of solutions of the latter equation modulo the number m to be factorized and to try to add them as if this set was a group. As the addition of points (see Theorem 11) requires to invert coordinates modulo m, if the inversion of say fails then it means that is not invertible. In other words contains a proper factor of m, which can be revealed by computing . To simplify, suppose is the product of only two distinct primes. Then, the curve equation defines two proper elliptic curves, one modulo p and one modulo q. In consequence for any point P, Lagrange's theorem (Theorem 1, page 31), ensures that , say modulo q, only if k divides , the number of points of the curve modulo q. If the curve is chosen randomly, then and are close to p and q, and not necessarily primes. Hence, some of their prime factors will differ with high probability. Therefore, if we compute , for a small e, it is likely thate will divide one of or , but less likely that it divides both numbers at the same time. When this is the case, itmeans that is not on the curve modulo m and therefore that its computation will crash. The overall procedure is thus to try many small prime factors e. A way to perform this efficiently is to compute with a not too large B. The algorithm is detailed as Algorithm 1.17.

The computational procedure remains simple (around 150 lines of code) and we can give a few ideas concerning the properties of this algorithm: it is conjectured that, in order to factorize an integer m of smallest factor p, this algorithm requires an average number of operations of the order of

In practice, this algorithm factorizes numbers of 25 to 40 digits (with two factors of similar sizes) within seconds. Besides, if one is lucky and chooses some particular elliptic curve, the latter might enable one to factorize very quickly: the project ECMNET (Elliptic Curve Method on the Net) provides an implementation of this algorithm, available on the Internet. This project has allowed the factorization of numbers with factors up to digits.

The main issue is that good elliptic curves vary for each number one wishes to factorize and so far there does not exist a method of finding the appropriate curve for a given number. Notwithstanding, the rapidity of computation when one has found a “ good” elliptic curve is at the origin of the factorization program on the Internet: indeed, many Internet users can retrieve the ECM program and launch it in order to try several elliptic curves. Hence, numerous elliptic curves can be studied at the same time and possibly speed up the search of prime factors.

Number Field Sieve (World champion) Finally, the current champion of RSA key factorization (product of two large prime numbers, see Section 3.4.2) is the Number Field Sieve algorithm, which seems to require – in order to factorize some number m, product of two factors of similar sizes – an average number of operations of the order of

A number field is an extension of the field of rational numbers (one considers the infinite field ).

The number field sieve is a generalization of the quadratic sieve when considering the field of all integers modulo m. For the sake of simplicity, we only present the main idea of the latter.

The aim is to find couples of numbers whose squares are congruent modulo . Then is a multiple of m. Now, let , if one is lucky, so that and thus d is a nontrivial factor of m.

Example 1.13

Let us try to factorize . We compute randomly some squares: and . Yet, and are small with respect to and thus easier to factorize, for example using Pollard's method. One obtains and . Therefore, one can write . We have found a relation of the form , which gives us a factor of : , and .

The whole difficulty of the algorithm is to find such integers x and y. For this, one has to compute several squares and store those whose remainders modulo m are small enough to be factorized using another method. Then, one has to find a linear combination of these squares that would give another square: in a matrix, let us store the exponents in the columns and the squares in the lines (Table 1.9).

 Exponent of 2 Exponent of 3 of 5 of 7 of 2 3 1 0 2 0 1 1 0 2 1 1

According to this table, is a square if and only if the line of added to the line of only gives even exponents. In other words, if M is the matrix of exponents (rows and columns of Table 1.9), one has to find some binary vector x such that is even or to find a solution modulo 2 to the associated linear system: x such that .

Although the basic idea is quite simple, the calculation is a little more delicate than for the previous algorithms. Still, this algorithm holds the current records. In particular, it enabled one to factorize a 200 digit (665 bits) RSA key in 2005. The computing time for this last factorization was gigantic: more than a year and a half of computation on more than 80 machines !

1.4.3.6 Strong Prime Numbers

RSA cryptography (see Chapter 3) is based on the use of two prime numbers p and q and on the difficulty of factorizing their product m. In order to resist various factorization methods (some of them are presented in the form of exercises in Chapter 3, on page 173), the prime numbers one uses must satisfy several properties:

· In order to resist factorization based on elliptic curves, p and q must have similar sizes and must be large enough. For instance, in order to work with numbers on 1024 bits, they should both have a size of around 512 bits.

· must be large enough: otherwise, it is sufficient to try as a value for p or q all integers close to (Fermat's square root attack).

· In order to resist Pollard's algorithms and (which take advantage of the factorization of and when possible), p and q have to be strong prime numbers, that is, each of them must satisfy the conditions:

· has a large factor, denoted by r.

· has a large factor.

· has a large factor.

Gordon's Algorithm 1.18 enables one to generate strong prime numbers:

Exercise 1.43

Prove that the output of Gordon's algorithm is a strong prime number.

Solution on page 294.

1.4.3.7 Solving the Discrete Logarithm Problem

Alongside modular exponentiation, the other major class of one-way functions relies on the discrete logarithm problem.

Let G be a group of size n admitting a generator (i.e., G is cyclic). For instance, one could consider the group of invertible elements modulo some prime number p, .

Given a primitive root g and an element b in G, the problem is to find the discrete logarithm of b in base g, namely to find x such that .

The naive algorithm for solving this problem is to try all possible x until one finds the right one. The worst and the average complexity bounds are , thus an exponential complexity with respect to , the size of n.

So far, the best known algorithms for solving this problem have a complexity bound in the general case. Most of the time, these algorithms are based on factorization algorithms: we will see that variants of Pollard's Rho – – and index calculus – – algorithms can be applied to groups. However, complexities are raised to the square with respect to factorization. Indeed, if one considers numbers modulo some composite number – a product of two prime numbers – , factorization algorithms have a complexity bound that depends on the smallest prime factor, roughly: . That is why the discrete logarithm method enables one to consider numbers half the size as those used for factorization-based methods with the same level of security. These sizes are even further reduced if one considers more generic groups, such as the group of points of an elliptic curve, for which the best discrete logarithm methods do not apply.

Baby step, Giant step This method was developed by Shanks, and it is divided into two phases: the baby step, tests from to for all x in some interval, and the giant step, jumps from to .

The idea is to decompose x into two pieces , with i and j between 1 and . Hence, one can write , or . Thus, one has to compute all possible (baby step), and all possible in order to checks if one of these values has been computed in the baby step (giant step).

Although computing all these values only takes multiplications, looking for the correspondences with a naive method implies that one has to try possibilities in the worst case.

The trick that decreases this complexity is to sort the in increasing order (complexity ) in order to be able to perform comparisons with a dichotomous research with only tests!

Therefore, the time complexity is improved; unfortunately, the space complexity is such that this algorithm is not practical: one has to store all integers. Even for a reasonable number of operations (around nowadays), the required memory space is then of the order of several billion gigabytes.

Pollard's Rho returns Pollard's Rho algorithm enables one to modify the baby step, giant step method and to introduce Floyd's cycle detection. Hence, one can preserve the time complexity bound , while reducing significantly the memory complexity bound down to only bytes. In comparison with the factorization algorithm, one has to modify the generator function of the sequence in the following way:

Build three subsets , and of G of similar sizes forming a partition of G (for example, in , with one can always take , and ).

Then, one defines the generator function f such that

Hence, each element of the sequence can be written in the form for some and some . Yet, in the same way as the Rho algorithm for factorization, the are more or less equally spread modulo p. Therefore, a collision occurs on the average after draws, even if and . The function f insures, as for factorization, that this collision will be constantly reproduced after k steps; thus, it is possible to find y such that thanks to Floyd's algorithm with only a memory complexity bound of several integers of size .

Then, one has and – at the same time – . In this case, one obtains , which means that, in the space of indexes, one directly finds x (one recalls that ):

Be careful: we are solving logarithms in but the latter equation lies in where ; thus although is nonzero, it is not necessarily invertible modulo n. If this is not the case, then restart the algorithm.

Coppersmith's Index Calculus The same way as we have just modified Pollard's algorithm and adapted it to the computation of discrete logarithms, it is possible to modify sieve algorithms. Then, the obtained method works for the discrete logarithm over the group of invertible of a finite field, but not for any group. In particular, cryptology over curves are still resisting this kind of attack. Let us show anyway how this attack works over finite fields with an example.

Example 1.14

How to find x, such that ? One recalls that is prime and that is a primitive root of for . The idea is to draw randomly some values of i such that can be easily factorized (i.e., it only has small prime factors, or smooth). In practice, one is given a basis of prime factors, for example .

Then, for each prime number , one divides with the greatest possible power of . At the end of the process, if one obtains the value 1, then can be factorized in the basis B.

After several random draws, we keep the values , and :

When considering the logarithms, Theorem 7 guaranties that one obtains a linear system in the space of exponents whose unknown values are the discrete logarithms of 2, 3, and 5, now modulo :

This gives (or in other words ), then and

One has to find a number in the form whose remainder can be factorized in the basis . Still with the same method, one finds – for example – after several random attempts: . Thus, satisfies .

Such index calculus algorithms are more efficient in some particular fields. The record is held by a French team which managed – in November 2005 – to compute a discrete logarithm in the field in only 17 days on the 64 processors of the Bull supercalculator Teranova.

Now we have at our disposal enough tools to start specializing. We are able to build the objects we will use throughout this book, and we have described the common algorithms that are the foundations of coding. Each of the following chapters will refer to the work in this chapter, depending on needs of the specific objectives: compression, encryption, or correction.

1 The word code in natural languages can have several meanings. In our context, we will see that it applies to transformations and their results, but in computer science, it can also mean a computer program. This apparent confusion actually enables one to figure out the link between various mathematical and computer processes. That is why we will keep the word with its multiple meanings.2 Analogous definition can be made for space complexity to measure the amount of space, or memory required by the algorithm.3 r must be coprime with n, the greatest prime factor q of r must satisfy , and n must also satisfy .4 http://csrc.nist.gov/publications/fips/fips186-3/fips_186-3.pdf5 http://hyperelliptic.org/EFD/index.html6 http://galg.acrypta.com/telechargements/ARCANA_ECDB.tgz

﻿