Lab 9 : Cryptanalysis of the Vigenère cipher¶
0 Preliminaries¶
Download the following code to get started lab9.py
.
To run doctests, add the following to the bottom of your file:
if (__name__ == '__main__'):
import doctest
doctest.testmod(optionflags= doctest.NORMALIZE_WHITESPACE)
1 Vigenère cipher¶
Here is a way to implement the Vigenère encryption and decryption functions. First, divide the text up column-wise, with the number of columns being the length of the key. Second, encrypt (or decrypt) the ith column using the ith letter of the keyword using the additive cipher. Third, interleave the columns back into a single string.
Exercise 1 Implement the Vigenère encryption function. Use every_nth
to get the columns, additive_encrypt
to encrypt each column, and zip_uneven
to interleave the encrypted columns.
def encrypt(s, k):
"""
>>> encrypt('ONAPLANETHEPLANEISDUE', 'MILK')
'AVLZXIYOFPPZXIYOUAOEQ'
"""
Exercise 2 Write a function which “inverts” a keyword: that is, it should return a keyword each of whose letters is the additive inverse of the corresponding letter in the input keyword.
def invert_key(k):
"""
>>> invert_key('MILK')
'OSPQ'
"""
Exercise 3 Implement the Vigenère decryption function. Use encrypt
and invert_key
.
def decrypt(s, k):
"""
>>> decrypt('AVLZXIYOFPPZXIYOUAOEQ','MILK')
'ONAPLANETHEPLANEISDUE'
"""
2 Guessing keys¶
Suppose you are given a ciphertext you know to be encrypted with Vigenère, and you have a hypothesis about what the key length is. One strategy would be to try to apply the letter frequency attack you would use against the additive cipher column-wise. So first divide up the ciphertext into n columns, where n is the conjectured key length. Second, determine the most frequent letters in each column and view the result as a single word. Third, guess the keyword by viewing this word as the encryption of a string of English letters of high probability (determined only by letter frequency): e.g., ‘EEEE’.
Exercise 4 Write a function which gives the highest frequency letters in each column. Use every_nth
and letters_by_freq
.
def column_top_freq(s, step):
"""
>>> column_top_freq('AVLZXIYOFPPZXIYOUAOEQ', 4)
['X', 'I', 'Y', 'O']
"""
Exercise 5 Write a function which computes your guess of the key based on a given string of English letters. Use column_top_freqs
and decrypt
.
def guess_key(s, t):
"""
>>> guess_key('AVLZXIYOFPPZXIYOUAOEQ', 'EEEE')
'TEUK'
"""
Exercise 6 Encrypt frankenstein
with a keyword of your choice (use prep_text
first). See if you can recover it using this strategy (assuming you know the key length).
Challenge Write a function which computes many guesses according using relatively high probability letters in English.
eng_letters_by_freq = ['E', 'T', 'A', 'O', 'I', 'N', 'S',
'H', 'R', 'D', 'L', 'C', 'U', 'M',
'W', 'F', 'G', 'Y', 'P', 'B', 'V',
'K', 'J', 'X', 'Q', 'Z']
from itertools import product
def prod_high_freq_eng(n, repeat):
"""
>>> prod_high_freq_eng(2, 3)
['EEE', 'EET', 'ETE', 'ETT', 'TEE', 'TET', 'TTE', 'TTT']
"""
def guess_keys(s, step, n):
"""
>>> guess_keys('AVLZXIYOFPPZXIYOUAOEQ', 4, 2)
['TEUK', 'TEUV', 'TEFK', 'TEFV', 'TPUK', 'TPUV', 'TPFK', 'TPFV', 'EEUK', 'EEUV', 'EEFK', 'EEFV', 'EPUK', 'EPUV', 'EPFK', 'EPFV']
"""
4 Index of Coincidence¶
One method to guess the key length is to compute the Index of Coincidence of column-wise. For example, in the case of there being only one column, if the IC of the text is much nearer to the IC of English than the IC of random text, then you already have evidence that the text was encrypted with a monoalphabetic substitution cipher.
Exercise 7 Write a function which computes the Index of Coincidence of a given string. Use count_letters
.
def index_of_coincidence(s):
"""
Assumes s is string over {A,...,Z}
>>> index_of_coincidence(prep_text(frankenstein))
0.06639195158323705
>>> index_of_coincidence(prep_text(random_text))
0.03983400000000001
"""
Exercise 8 Encrypt frankenstein
with a keyword of your choice. This time, forget you know the key length. Using every_nth
see if you can determine the key length using the Index of Coincidence computed on columns.
5 Kasiski test¶
Exercise 9 Write a function which takes a string and a list of substrings and runs the Kasiski test. Use kasiski_dists
and gcd_list
.
def kasiski(s, l):
"""
>>> kasiski('AVLZXIYOFPPZXIYOUAOEQ', ['ZXIYO'])
8
"""
Exercise 10 Encrypt frankenstein
with a keyword of your choice. Using n_grams_by_freq
to get repeated substrings, see if you can recover the keylength using kasiski
.