Lab 9 : Cryptanalysis of the Vigenère cipher

0 Preliminaries

Download the following code to get started lab9.py.

To run doctests, add the following to the bottom of your file:

if (__name__ == '__main__'):
  import doctest
  doctest.testmod(optionflags= doctest.NORMALIZE_WHITESPACE)

1 Vigenère cipher

Here is a way to implement the Vigenère encryption and decryption functions. First, divide the text up column-wise, with the number of columns being the length of the key. Second, encrypt (or decrypt) the ith column using the ith letter of the keyword using the additive cipher. Third, interleave the columns back into a single string.

Exercise 1 Implement the Vigenère encryption function. Use every_nth to get the columns, additive_encrypt to encrypt each column, and zip_uneven to interleave the encrypted columns.

def encrypt(s, k):
    """
    >>> encrypt('ONAPLANETHEPLANEISDUE', 'MILK')
    'AVLZXIYOFPPZXIYOUAOEQ'

    """

Exercise 2 Write a function which “inverts” a keyword: that is, it should return a keyword each of whose letters is the additive inverse of the corresponding letter in the input keyword.

def invert_key(k):
    """
    >>> invert_key('MILK')
    'OSPQ'

    """

Exercise 3 Implement the Vigenère decryption function. Use encrypt and invert_key.

def decrypt(s, k):
    """
    >>> decrypt('AVLZXIYOFPPZXIYOUAOEQ','MILK')
    'ONAPLANETHEPLANEISDUE'

    """

2 Guessing keys

Suppose you are given a ciphertext you know to be encrypted with Vigenère, and you have a hypothesis about what the key length is. One strategy would be to try to apply the letter frequency attack you would use against the additive cipher column-wise. So first divide up the ciphertext into n columns, where n is the conjectured key length. Second, determine the most frequent letters in each column and view the result as a single word. Third, guess the keyword by viewing this word as the encryption of a string of English letters of high probability (determined only by letter frequency): e.g., ‘EEEE’.

Exercise 4 Write a function which gives the highest frequency letters in each column. Use every_nth and letters_by_freq.

def column_top_freq(s, step):
   """
   >>> column_top_freq('AVLZXIYOFPPZXIYOUAOEQ', 4)
   ['X', 'I', 'Y', 'O']

   """

Exercise 5 Write a function which computes your guess of the key based on a given string of English letters. Use column_top_freqs and decrypt.

def guess_key(s, t):
    """
    >>> guess_key('AVLZXIYOFPPZXIYOUAOEQ', 'EEEE')
    'TEUK'

    """

Exercise 6 Encrypt frankenstein with a keyword of your choice (use prep_text first). See if you can recover it using this strategy (assuming you know the key length).

Challenge Write a function which computes many guesses according using relatively high probability letters in English.

eng_letters_by_freq = ['E', 'T', 'A', 'O', 'I', 'N', 'S',
                       'H', 'R', 'D', 'L', 'C', 'U', 'M',
                       'W', 'F', 'G', 'Y', 'P', 'B', 'V',
                       'K', 'J', 'X', 'Q', 'Z']

from itertools import product

def prod_high_freq_eng(n, repeat):
    """
    >>> prod_high_freq_eng(2, 3)
    ['EEE', 'EET', 'ETE', 'ETT', 'TEE', 'TET', 'TTE', 'TTT']

    """

def guess_keys(s, step, n):
    """
    >>> guess_keys('AVLZXIYOFPPZXIYOUAOEQ', 4, 2)
    ['TEUK', 'TEUV', 'TEFK', 'TEFV', 'TPUK', 'TPUV', 'TPFK', 'TPFV', 'EEUK', 'EEUV', 'EEFK', 'EEFV', 'EPUK', 'EPUV', 'EPFK', 'EPFV']

    """

4 Index of Coincidence

One method to guess the key length is to compute the Index of Coincidence of column-wise. For example, in the case of there being only one column, if the IC of the text is much nearer to the IC of English than the IC of random text, then you already have evidence that the text was encrypted with a monoalphabetic substitution cipher.

Exercise 7 Write a function which computes the Index of Coincidence of a given string. Use count_letters.

def index_of_coincidence(s):
   """
   Assumes s is string over {A,...,Z}

   >>> index_of_coincidence(prep_text(frankenstein))
   0.06639195158323705
   >>> index_of_coincidence(prep_text(random_text))
   0.03983400000000001

   """

Exercise 8 Encrypt frankenstein with a keyword of your choice. This time, forget you know the key length. Using every_nth see if you can determine the key length using the Index of Coincidence computed on columns.

5 Kasiski test

Exercise 9 Write a function which takes a string and a list of substrings and runs the Kasiski test. Use kasiski_dists and gcd_list.

def kasiski(s, l):
    """
    >>> kasiski('AVLZXIYOFPPZXIYOUAOEQ', ['ZXIYO'])
    8

    """

Exercise 10 Encrypt frankenstein with a keyword of your choice. Using n_grams_by_freq to get repeated substrings, see if you can recover the keylength using kasiski.