One way to break a substitution cipher is to use frequency analysis. Let’s focus on the English language for now, but the process works the same way for other languages. In English text, the letter “e” occurs about 12.7% of the time, the letter “t” about 9.1%, and the letter “a” about 8.2%. So, you can count each character in the ciphertext, compute its frequency, and start replacing the most frequent ciphertext characters with the most frequent English letters. After you complete the first, say, 4 characters, you may want to change your strategy. Can you explain why you would want to stop after the first 4 characters? The next step is to look at two-character combinations: “of” is about 4.16% and “to” is about 2%. After that, try three-character frequencies.
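The counting step described above can be sketched in Python. This is only a minimal illustration, not the required solution: the function name ngram_frequencies and the choice to count n-grams inside words only (so they never span a space or punctuation mark) are assumptions made here.

```python
from collections import Counter

def ngram_frequencies(text, n=1):
    """Return (ngram, percent) pairs for letter n-grams, most frequent first.

    n=1 counts single letters, n=2 counts two-character combinations, and so on.
    N-grams are counted inside words only (an assumption of this sketch).
    """
    counts = Counter()
    word = []
    # The trailing space forces the last word to be processed.
    for ch in text.lower() + " ":
        if ch.isalpha():
            word.append(ch)
        else:
            for i in range(len(word) - n + 1):
                counts["".join(word[i:i + n])] += 1
            word = []
    total = sum(counts.values())
    return [(gram, 100.0 * c / total) for gram, c in counts.most_common()]

# Example: show the five most frequent single letters of a short sample.
sample = "the cat sat on the mat"
for gram, pct in ngram_frequencies(sample, 1)[:5]:
    print(gram, round(pct, 1))
```

Calling the same function with n=2 or n=3 gives the two- and three-character frequencies. On short texts the percentages will be noisy, so the comparison with the English figures works best on longer ciphertexts.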
After a period of time, you will be able to decipher most of the text. I am going to ask you to do your first significant homework using frequency analysis.
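While working through a ciphertext, it helps to see the effect of the guesses made so far. The helper below is a hypothetical sketch (the name apply_guesses and the display convention are assumptions): letters that have been guessed are shown as uppercase plaintext, and letters not yet guessed stay as lowercase ciphertext.

```python
def apply_guesses(ciphertext, guesses):
    """Apply a partial mapping of ciphertext letter -> guessed plaintext letter.

    Guessed letters are printed in UPPERCASE; everything else (unknown
    letters, spaces, punctuation) is copied through in lowercase.
    """
    out = []
    for ch in ciphertext.lower():
        if ch in guesses:
            out.append(guesses[ch].upper())
        else:
            out.append(ch)
    return "".join(out)

# Toy example (not the homework text): suppose we guess c -> t and g -> e.
print(apply_guesses("ceg", {"c": "t", "g": "e"}))  # TeE
```

Each new guess can be added to the dictionary and the text printed again, which makes wrong guesses easy to spot because they produce impossible letter patterns.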
One possibility for the substitution cipher:
Plaintext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: V J Z B G N F E P L I T M X D W K Q U C R Y A H S O
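To see how such a key is used, here is a small Python sketch that encrypts and decrypts with the example key listed above. The names PLAIN, CIPHER, encrypt, and decrypt are chosen for this illustration only; the homework text uses a different, unknown key.

```python
import string

PLAIN = string.ascii_uppercase
CIPHER = "VJZBGNFEPLITMXDWKQUCRYAHSO"  # the example key: A->V, B->J, C->Z, ...

# A valid key must be a permutation of the alphabet.
assert sorted(CIPHER) == list(PLAIN)

# Translation tables covering both upper- and lowercase letters.
ENCRYPT = str.maketrans(PLAIN + PLAIN.lower(), CIPHER + CIPHER.lower())
DECRYPT = str.maketrans(CIPHER + CIPHER.lower(), PLAIN + PLAIN.lower())

def encrypt(text):
    """Replace each letter by its ciphertext letter; other characters are unchanged."""
    return text.translate(ENCRYPT)

def decrypt(text):
    """Invert the substitution."""
    return text.translate(DECRYPT)

print(encrypt("Hello"))           # Egttd
print(decrypt(encrypt("Hello")))  # Hello
```

Because spaces and punctuation pass through unchanged, word lengths remain visible in the ciphertext, which is one of the things that makes this cipher easy to attack.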
Note that there are 26 factorial (26!) such possibilities, which is about 4 × 10^26, or roughly 2^88. Remember that this is comparable to an 88-bit key.
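The size of the key space is easy to check: a key is one ordering of the 26 letters, so the count is 26!, and the base-2 logarithm gives the equivalent key length in bits. A quick sketch:

```python
import math

keys = math.factorial(26)   # number of ways to order 26 letters
bits = math.log2(keys)      # equivalent key length in bits

print(keys)            # 403291461126605635584000000
print(round(bits, 1))  # about 88.4
```

A key space this large rules out trying every key, which is why frequency analysis, rather than brute force, is the practical attack.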
Here is your homework: a character substitution was used to encrypt the text below. Can you decipher as much of this text as you can? Submit your results in the drop box. You may use Python to code the solution.
ztrfqvdjrzqt
rci wzyabi wdmwrzrdrzqt jzacif zw g jzacif rcgr cgw miit zt dwi hqf ygtx cdtvfivw qh xigfw (gt iojibbitr czwrqfx zw lzsit zt wzyqt wztlcw ‘rci jqvi mqqk’). zr mgwzjgbbx jqtwzwrw qh wdmwrzrdrztl isifx abgztrior jcgfgjrif hqf g vzhhifitr jzacifrior jcgfgjrif. zr vzhhifw hfqy rci jgiwgf jzacif zt rcgr rci jzacif gbacgmir zw tqr wzyabx rci gbacgmir wczhriv, zr zw jqyabiribx udymbiv.
rci wzyabi wdmwrzrdrzqt jzacif qhhifw sifx bzrrbi jqyydtzjgrzqt wijdfzrx, gtv zr pzbb mi wcqpt rcgr zr jgt mi igwzbx mfqkit isit mx cgtv, iwaijzgbbx gw rci yiwwgliw mijqyi bqtlif (yqfi rcgt wisifgb cdtvfiv jzacifrior jcgfgjrifw).