THE SIMPLE SUBSTITUTION CIPHER - Hacking Secret Ciphers with Python (2013)

Hacking Secret Ciphers with Python (2013)

Chapter 17: THE SIMPLE SUBSTITUTION CIPHER

Topics Covered In This Chapter:

· The sort() list method

· Getting rid of duplicate characters from a string

· The isupper() and islower() string methods

· Wrapper functions

“In my role as Wikileaks editor, I've been involved in fighting off many legal attacks. To do that, and keep our sources safe, we have had to spread assets, encrypt everything, and move telecommunications and people around the world to activate protective laws in different national jurisdictions.”

Julian Assange, editor-in-chief of Wikileaks

The transposition and affine ciphers have thousands of possible keys, but a computer can still brute-force through all of them easily. We’ll need a cipher that has so many possible keys, no computer can possibly brute-force through them all.

The simple substitution cipher is effectively invulnerable to a brute-force attack. Even if your computer could try out a trillion keys every second, it would still take twelve million years for it to try out every key.

The Simple Substitution Cipher with Paper and Pencil

To implement the simple substitution cipher, choose a random letter to encrypt each letter of the alphabet. Use each letter once and only once. The key will end up being a string of 26 letters of the alphabet in random order. There are 403,291,461,126,605,635,584,000,000 possible orderings for keys. (To see how this number was calculated, see http://invpy.com/factorial).

Let’s do the simple substitution cipher with paper and pencil first. For example, let’s encrypt the message, “Attack at dawn.” with the key VJZBGNFEPLITMXDWKQUCRYAHSO. First write out the letters of the alphabet and then write the key underneath it.

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

V

J

Z

B

G

N

F

E

P

L

I

T

M

X

D

W

K

Q

U

C

R

Y

A

H

S

O

To encrypt a message, find the letter from the plaintext in the top row and substitute it with the letter in the bottom row. A encrypts to V, and T encrypts to C, C encrypts to Z, and so on. So the message “Attack at dawn.” encrypts to “Vccvzi vc bvax.”

To decrypt, find the letter from the ciphertext in the bottom row and replace it with the letter from the top row. V decrypts to A, C decrypts to T, Z decrypts to C, and so on.

This is very similar to how the Caesar cipher works with the St. Cyr slide, except the bottom row is scrambled instead of in alphabetical order and just shifted over. The advantage of the simple substitution cipher is that there are far more possible keys. The disadvantage is that the key is 26 characters long and harder to memorize. If you write down the key, make sure that this key is never read by anyone else!

Practice Exercises, Chapter 17, Set A

Practice exercises can be found at http://invpy.com/hackingpractice17A.

Source Code of the Simple Substitution Cipher

Open a new file editor window by clicking on FileNew Window. Type in the following code into the file editor, and then save it as simpleSubCipher.py. Press F5 to run the program. Note that first you will need to download the pyperclip.py module and place this file in the same directory as the simpleSubCipher.py file. You can download this file from http://invpy.com/pyperclip.py.

Source code for simpleSubCipher.py

1. # Simple Substitution Cipher

2. # http://inventwithpython.com/hacking (BSD Licensed)

3.

4. import pyperclip, sys, random

5.

6.

7. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

8.

9. def main():

10. myMessage = 'If a man is offered a fact which goes against his instincts, he will scrutinize it closely, and unless the evidence is overwhelming, he will refuse to believe it. If, on the other hand, he is offered something which affords a reason for acting in accordance to his instincts, he will accept it even on the slightest evidence. The origin of myths is explained in this way. -Bertrand Russell'

11. myKey = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'

12. myMode = 'encrypt' # set to 'encrypt' or 'decrypt'

13.

14. checkValidKey(myKey)

15.

16. if myMode == 'encrypt':

17. translated = encryptMessage(myKey, myMessage)

18. elif myMode == 'decrypt':

19. translated = decryptMessage(myKey, myMessage)

20. print('Using key %s' % (myKey))

21. print('The %sed message is:' % (myMode))

22. print(translated)

23. pyperclip.copy(translated)

24. print()

25. print('This message has been copied to the clipboard.')

26.

27.

28. def checkValidKey(key):

29. keyList = list(key)

30. lettersList = list(LETTERS)

31. keyList.sort()

32. lettersList.sort()

33. if keyList != lettersList:

34. sys.exit('There is an error in the key or symbol set.')

35.

36.

37. def encryptMessage(key, message):

38. return translateMessage(key, message, 'encrypt')

39.

40.

41. def decryptMessage(key, message):

42. return translateMessage(key, message, 'decrypt')

43.

44.

45. def translateMessage(key, message, mode):

46. translated = ''

47. charsA = LETTERS

48. charsB = key

49. if mode == 'decrypt':

50. # For decrypting, we can use the same code as encrypting. We

51. # just need to swap where the key and LETTERS strings are used.

52. charsA, charsB = charsB, charsA

53.

54. # loop through each symbol in the message

55. for symbol in message:

56. if symbol.upper() in charsA:

57. # encrypt/decrypt the symbol

58. symIndex = charsA.find(symbol.upper())

59. if symbol.isupper():

60. translated += charsB[symIndex].upper()

61. else:

62. translated += charsB[symIndex].lower()

63. else:

64. # symbol is not in LETTERS, just add it

65. translated += symbol

66.

67. return translated

68.

69.

70. def getRandomKey():

71. key = list(LETTERS)

72. random.shuffle(key)

73. return ''.join(key)

74.

75.

76. if __name__ == '__main__':

77. main()

Sample Run of the Simple Substitution Cipher Program

When you run this program, the output will look like this:

Using key LFWOAYUISVKMNXPBDCRJTQEGHZ

The encrypted message is:

Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr sxrjsxwjr, ia esmm rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa sr pqaceiamnsxu, ia esmm caytra jp famsaqa sj. Sy, px jia pjiac ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor l calrpx ypc lwjsxu sx lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax px jia rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh. -Facjclxo Ctrramm

This message has been copied to the clipboard.

Notice that if the letter in the plaintext was lowercase, it will be lowercase in the ciphertext. If the letter was uppercase in the plaintext, it will be uppercase in the ciphertext. The simple substitution cipher does not encrypt spaces or punctuation marks. (Although the end of this chapter explains how to modify the program to encrypt those characters too.)

To decrypt this ciphertext, paste it as the value for the myMessage variable on line 10 and change myMode to the string 'decrypt'. Then run the program again. The output will look like this:

Using key LFWOAYUISVKMNXPBDCRJTQEGHZ

The decrypted message is:

If a man is offered a fact which goes against his instincts, he will scrutinize it closely, and unless the evidence is overwhelming, he will refuse to believe it. If, on the other hand, he is offered something which affords a reason for acting in accordance to his instincts, he will accept it even on the slightest evidence. The origin of myths is explained in this way. -Bertrand Russell

This message has been copied to the clipboard.

How the Program Works

simpleSubCipher.py

1. # Simple Substitution Cipher

2. # http://inventwithpython.com/hacking (BSD Licensed)

3.

4. import pyperclip, sys, random

5.

6.

7. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

The first few lines are comments describing the program. Then the pyperclip, sys, and random modules are imported. Finally, the LETTERS constant variable is set to a string of all the uppercase letters. The LETTERS string will be our symbol set for the simple substitution cipher program.

The Program’s main() Function

simpleSubCipher.py

9. def main():

10. myMessage = 'If a man is offered a fact which goes against his instincts, he will scrutinize it closely, and unless the evidence is overwhelming, he will refuse to believe it. If, on the other hand, he is offered something which affords a reason for acting in accordance to his instincts, he will accept it even on the slightest evidence. The origin of myths is explained in this way. -Bertrand Russell'

11. myKey = 'LFWOAYUISVKMNXPBDCRJTQEGHZ'

12. myMode = 'encrypt' # set to 'encrypt' or 'decrypt'

The main() function is similar to the main() function of cipher programs in the previous chapters. It contains the variables that store the message, key, and mode that will be used for the program.

simpleSubCipher.py

14. checkValidKey(myKey)

The keys for simple substitution ciphers are easy to get wrong. For example, the key might not have every letter of the alphabet. Or the key may have the same letter twice. The checkValidKey() function (which is explained later) makes sure the key is usable by the encryption and decryption functions, and will exit the program with an error message if they are not.

simpleSubCipher.py

16. if myMode == 'encrypt':

17. translated = encryptMessage(myKey, myMessage)

18. elif myMode == 'decrypt':

19. translated = decryptMessage(myKey, myMessage)

If the program execution returns from checkValidKey() instead of terminating, we can assume the key is valid. Lines 16 through 19 check whether the myMode variable is set to 'encrypt' or 'decrypt' and calls either encryptMessage() or decryptMessage(). The return value ofencryptMessage() and decryptMessage() (which is explained later) will be a string of the encrypted (or decrypted) message. This string will be stored in the translated variable.

simpleSubCipher.py

20. print('Using key %s' % (myKey))

21. print('The %sed message is:' % (myMode))

22. print(translated)

23. pyperclip.copy(translated)

24. print()

25. print('This message has been copied to the clipboard.')

The key that was used is printed to the screen on line 20. The encrypted (or decrypted) message is printed on the screen and also copied to the clipboard. Line 25 is the last line of code in the main() function, so the program execution returns after line 25. Since the main() call is done at the last line of the program, the program will then exit.

The sort() List Method

simpleSubCipher.py

28. def checkValidKey(key):

29. keyList = list(key)

30. lettersList = list(LETTERS)

31. keyList.sort()

32. lettersList.sort()

A simple substitution key string value is only valid if it has each of the characters in the symbol set with no duplicate or missing letters. We can check if a string value is a valid key by sorting it and the symbol set into alphabetical order and checking if they are equal. (Although LETTERS is already in alphabetical order, we still need to sort it since it could be expanded to contain other characters.)

On line 29 the string in key is passed to list(). The list value returned is stored in a variable named keyList. On line 30, the LETTERS constant variable (which, remember, is the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') is passed to list() which returns the list ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'].

The sort() list method will rearrange the order of items in the list into alphabetical order. The lists in keyList and lettersList are then sorted in alphabetical order by calling the sort() list method on them. Note that just like the append() list method, the sort() list method modifies the list in place and does not have a return value. You want your code to look like this:

keyList.sort()

...and not look like this:

keyList = keyList.sort()

simpleSubCipher.py

33. if keyList != lettersList:

34. sys.exit('There is an error in the key or symbol set.')

Once sorted, the keyList and lettersList values should be the same, since keyList was just the characters in LETTERS with the order scrambled. If keyList and lettersList are equal, we also know that keyList (and, therefore, the key parameter) does not have any duplicates in it, sinceLETTERS does not have any duplicates in it.

However, if the condition on line 33 is True, then the value in myKey was set to an invalid value and the program will exit by calling sys.exit().

Wrapper Functions

simpleSubCipher.py

37. def encryptMessage(key, message):

38. return translateMessage(key, message, 'encrypt')

39.

40.

41. def decryptMessage(key, message):

42. return translateMessage(key, message, 'decrypt')

43.

44.

45. def translateMessage(key, message, mode):

The code for encrypting and the code for decrypting are almost exactly the same. It’s always a good idea to put code into a function and call it twice rather than type out the code twice. First, this saves you some typing. But second, if there’s ever a bug in the duplicate code, you only have to fix the bug in one place instead of multiple places. It is (usually) much more reasonable to replace duplicate code with a single function that has the code.

Wrapper functions simply wrap the code of another function, and return the value the wrapped function returns. Often the wrapper function might make a slight change to the arguments or return value of wrapped function (otherwise you would just call the wrapped function directly.) In this case, encryptMessage() and decryptMessage() (the wrapper functions) calls translateMessage() (the wrapped function) and returns the value translateMessage() returns.

On line 45 notice that translateMessage() has the parameters key and message, but also a third parameter named mode. When it calls translateMessage(), the call in encryptMessage() function passes 'encrypt' for the mode parameter, and the call in decryptMessage() function passes'decrypt'. This is how the translateMessage() function knows whether it should encrypt or decrypt the message it is passed.

With these wrapper functions, someone who imports the simpleSubCipher.py program can call functions named encryptMessage() and decryptMessage() like they do with all the other cipher programs in this book. They might create a program that encrypts with various ciphers, like below:

import affineCipher, simpleSubCipher, transpositionCipher

...some other code here...

ciphertext1 = affineCipher.encryptMessage(encKey1, 'Hello!')

ciphertext2 = transpositionCipher.encryptMessage(encKey2, 'Hello!')

ciphertext3 = simpleSubCipher.encryptMessage(encKey3, 'Hello!')

The wrapper functions give the simple substitution cipher program function names that are consistent with the other cipher programs. Consistent names are very helpful, because it makes it easier for someone familiar with one of the cipher programs in this book to already be familiar with the other cipher programs. (You can even see that the first parameter was always made the key and the second parameter is always the message.) This is the reason we have these wrapper functions, because making the programmer call the translateMessage() function would be inconsistent with the other programs.

The Program’s translateMessage() Function

simpleSubCipher.py

45. def translateMessage(key, message, mode):

46. translated = ''

47. charsA = LETTERS

48. charsB = key

49. if mode == 'decrypt':

50. # For decrypting, we can use the same code as encrypting. We

51. # just need to swap where the key and LETTERS strings are used.

52. charsA, charsB = charsB, charsA

The translateMessage() function does the encryption (or decryption, if the mode parameter is set to the string 'decrypt'). The encryption process is very simple: for each letter in the message parameter, we look up its index in LETTERS and replace it with the letter at that same index in the keyparameter. To decrypt we do the opposite: we look up the index in key and replace it with the letter at the same index in the LETTERS.

The table below shows why using the same index will encrypt or decrypt the letter. The top row shows the characters in charsA (which is set to LETTERS on line 47), the middle row shows the characters in charsB (which is set to key on line 48), and the bottom row are the integer indexes (for our own reference in this example).

A

B

C

D

E

F

G

H

I

J

K

L

M

V

J

Z

B

G

N

F

E

P

L

I

T

M

0

1

2

3

4

5

6

7

8

9

10

11

12

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

X

D

W

K

Q

U

C

R

Y

A

H

S

O

13

14

15

16

17

18

19

20

21

22

23

24

25

The code in translateMessage() will always look up the message character’s index in charsA and replace it with the character at that index in charsB.

So to encrypt, we can just leave charsA and charsB as they are. This will replace the character in LETTERS with the character in key, because charsA is set to LETTERS and charsB is set to key.

When decrypting, the values in charsA and charsB (that is, LETTERS and key) are swapped on line 52, so the table would look like this:

V

J

Z

B

G

N

F

E

P

L

I

T

M

A

B

C

D

E

F

G

H

I

J

K

L

M

0

1

2

3

4

5

6

7

8

9

10

11

12

X

D

W

K

Q

U

C

R

Y

A

H

S

O

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

13

14

15

16

17

18

19

20

21

22

23

24

25

Remember, our code in translateMessage() always replaces the character in charsA (the top row) with the character at that same index in charsB (the middle row). So when lines 47 and 48 will swap the values in charsA and charsB, the code in translateMessage() will be doing the decryption process instead of the encryption process.

simpleSubCipher.py

54. # loop through each symbol in the message

55. for symbol in message:

56. if symbol.upper() in charsA:

57. # encrypt/decrypt the symbol

58. symIndex = charsA.find(symbol.upper())

The for loop on line 55 will set the symbol variable to a character in the message string on each iteration through the loop. If the uppercase form of this symbol exists in charsA (remember that both the key and LETTERS only have uppercase characters in them), then we will find the index of the uppercase form of symbol in charsA. This index will be stored in a variable named symIndex.

We already know that the find() method will never return -1 (remember, a -1 from the find() method means the argument could not be found the string) because the if statement on line 56 guarantees that symbol.upper() will exist in charsA. Otherwise line 58 wouldn’t have been executed in the first place.

The isupper() and islower() String Methods

The isupper() string method returns True if:

1. The string has at least one uppercase letter.

2. The string does not have any lowercase letters in it.

The islower() string method returns True if:

1. The string has at least one lowercase letter.

2. The string does not have any uppercase letters in it.

Non-letter characters in the string do not affect whether these methods return True or False. Try typing the following into the interactive shell:

>>> 'HELLO'.isupper()

True

>>> 'HELLO WORLD 123'.isupper()

True

>>> 'hELLO'.isupper()

False

>>> 'hELLO'.islower()

False

>>> 'hello'.islower()

True

>>> '123'.isupper()

False

>>> ''.islower()

False

>>>

simpleSubCipher.py

59. if symbol.isupper():

60. translated += charsB[symIndex].upper()

61. else:

62. translated += charsB[symIndex].lower()

If symbol is an uppercase letter, than we want to concatenate the uppercase version of the character at charsB[symIndex] to translated. Otherwise we will concatenate the lowercase version of the character at charsB[symIndex] to translated.

If symbol was a number or punctuation mark like '5' or '?', then the condition on line 59 would be False (since those strings don’t have at least one uppercase letter in them) and line 62 would have been executed. In this case, line 62’s lower() method call would have no effect on the string since it has no letters in it. Try typing the following into the interactive shell:

>>> '5'.lower()

'5'

>>> '?'.lower()

'?'

>>>

So line 62 in the else block takes care of translating any lowercase characters and non-letter characters.

simpleSubCipher.py

63. else:

64. # symbol is not in LETTERS, just add it

65. translated += symbol

By looking at the indentation, you can tell that the else statement on line 63 is paired with the if statement on line 56. The code in the block that follows (that is, line 65) is executed if symbol is not in LETTERS. This means that we cannot encrypt (or decrypt) the character in symbol, so we will just concatenate it to the end of translated as is.

simpleSubCipher.py

67. return translated

At the end of the translateMessage() function we return the value in the translated variable, which will contain the encrypted or decrypted message.

Practice Exercises, Chapter 17, Set B

Practice exercises can be found at http://invpy.com/hackingpractice17B.

Generating a Random Key

simpleSubCipher.py

70. def getRandomKey():

71. key = list(LETTERS)

72. random.shuffle(key)

73. return ''.join(key)

Typing up a string that contains each letter of the alphabet once and only once can be difficult. To aid the user, the getRandomKey() function will return a valid key to use. Lines 71 to 73 do this by randomly scrambling the characters in the LETTERS constant. See the “Randomly Scrambling a String” section in Chapter 10 for an explanation of how to scramble a string using list(), random.shuffle(), and join().

To use the getRandomKey() function, line 11 can be changed to this:

11. myKey = getRandomKey()

Remember that our cipher program will print out the key being used on line 20. This is how the user can find out what key the getRandomKey() function returned.

simpleSubCipher.py

76. if __name__ == '__main__':

77. main()

Lines 76 and 77 are at the bottom of the program, and call main() if simpleSubCipher.py is being run as a program instead of imported as a module by another program.

Encrypting Spaces and Punctuation

The simple substitution cipher in this chapter only encrypts the letters in the plaintext. This artificial limitation is here because the hacking program in the next chapter only works if the letters alone have been substituted.

If you want the simple substitution program to encrypt more than just the letter characters, make the following changes:

simpleSubCipher.py

7. LETTERS = r""" !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXY Z[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"""

Line 7’s value stored in the LETTERS constant is changed to a string of all the characters Using a triple-quotes raw string so you do not have to escape the quotes and \ slash character makes typing this easier.

The key used must also have all of these characters, so line 11 changes to something like this:

simpleSubCipher.py

11. myKey = r"""/{9@6hUf:q?_)^eTi|W1,NLD7xk(-SF>Iz0E=d;Bu#c]w~'VvHKmpJ+}s8y& XtP43.b[OA!*\Q<M%$ZgG52YloaRCn"`rj"""

The code that differentiates between upper and lowercase letters on lines 58 to 62 can be replaced with these two lines:

simpleSubCipher.py

58. symIndex = charsA.find(symbol.upper())

59. if symbol.isupper():

60. translated += charsB[symIndex].upper()

61. else:

62. translated += charsB[symIndex].lower()

58. symIndex = charsA.find(symbol)

59. translated += charsB[symIndex]

Now when you run the simple substitution cipher program, the ciphertext looks much more like random gibberish:

Using key /{9@6hUf:q?_)^eTi|W1,NLD7xk(-SF>Iz0E=d;Bu#c]w~'VvHKmpJ+}s8y& XtP43.b[OA!*\Q<M%$ZgG52YloaRCn"`rj

The encrypted message is:

#A/3/%3$/\2/ZAAO5O[/3/A3bY/a*\b*/!ZO2/3!3\$2Y/*\2/\$2Y\$bY2)/*O/a\MM/2b5lY\$\nO/\Y/bMZ2OMC)/3$[/l$MO22/Y*O/Oo\[O$bO/\2/ZoO5a*OM%\$!)/*O/a\MM/5OAl2O/YZ/.OM\OoO/\Ye/#A)/Z$/Y*O/ZY*O5/*3$[)/*O/\2/ZAAO5O[/2Z%OY*\$!/a*\b*/3AAZ5[2/3/5O32Z$/AZ5/3bY\$!/\$/3bbZ5[3$bO/YZ/*\2/\$2Y\$bY2)/*O/a\MM/3bbOgY/\Y/OoO$/Z$/Y*O/2M\!*YO2Y/Oo\[O$bOe/p*O/Z5\!\$/ZA/%CY*2/\2/ORgM3\$O[/\$/Y*\2/a3Ce/^0O5Y53$[/Kl22OMM

This message has been copied to the clipboard.

Practice Exercises, Chapter 17, Set C

Practice exercises can be found at http://invpy.com/hackingpractice17C.

Summary

In this chapter, we have learned about the new “set” data type. In many of our programs, lists are much easier to use than sets, but sets are a simple way to get rid of duplicate values from lists or strings.

The isupper() and islower() string methods can tell us if a string value is made up of only uppercase or lowercase letters. And the sort() list method is very useful at putting the items in a list in order.

The simple substitution cipher has far too many possible keys to brute-force through. This makes it impervious to the techniques that our previous cipher hacking programs have used. We are going to have to make smarter programs in order to break this code.

In the next chapter, we will learn how to hack the simple substitution cipher. Instead of brute-forcing through all the keys, we will use a much more intelligent and sophisticated algorithm.