venerdì 30 settembre 2011

Break easily Caesar Crypto-System

When we would guarantee the confidentiality of our data we can use a lot of ciphering methods. The problem dates back to the ancient Romans: when Julius Caesar have to communicate with his generals he wanted to do it without the bearers could read the contents of the message. For this reason he thought up a simple method of encryption that at our days is one of the simplest and most widely known encryption techniques.
Now we're going to how it works. Simply take a integer number in interval [0 to 26]: this is your chiper key (k). The algorithm based on shifting alphabet of k steps and on corresponding between character in original alphabet with character present in same position in transformed alphabet.
I.e. if we choose k = 4 we obtain that corresponding:

normal alphabet: abcdefghijklmnopqrstuvwxyz
shifted alphabet: efghijklmnopqrstuvwxyzabcd

Note that shifting is obtainable considering modulo operation; in fact if you associate a number for each letter in base of position in the alphabet (a --> 0, b-->1, ..., z-->25), to obtain transformed alphabet compute a modulo operation for each letters. I.e. with k = 4:

'a'-->0; (0+4) mod 26 = 4<--'e'
'o'-->14; (14+4) mod 26 = 18<--'s'
'y'-->24; (24+4) mod 26 = 2<--'c'

In this case the world "Caesar" become "Geiwev" because 'C' correspond to 'G', 'a' correspond to 'e', 'e' correspond to 'i', and so on.

If we refers to classical definition of crypto-system, Caesar Encryption uses these characteristics:

  • M = {sequences of letters}
  • K = {i / i is an integer and 0<=i<=25}
  • E = {E(k,m) = (m+k) mod 26}
  • D = {D(k,c) = (26+c-k) mod 26}
  • C = M
The Caesar cipher can be easily broken even attacker knows the ciphertext and that the used algorithm is Caesar Encryption. Since there are only 26 keys is very easy organize a brute-force attack: taking the decrypting function and trying all 26 keys, one of these works properly and give the original text in clear. But if we want to try few keys, there are the most likely, we can use a mathematical (statistical) approach based on correlation. Considering character frequency on English language (or another language) we can calculate the likeness between chars frequency in ciphertext, like decrypted with a generic key, and chars frequency in English language. In mathematical terms, for each key (from 0 to 25) we have to calculate:

f(c): the frequency of character c in English language
p(c): the frequency of character c in ciphertext
a(i): the correlation function calculate in this way



Consider this example to better understand:

Suppose this is cipher text we want decrypt and this is written in English:

dwnzswna ykza ywaown atwilha

First of all we need chars frequency table in English language: with google (or wikipedia) we can find it easily. I use this:

A --> 0.080000
B --> 0.015000
C --> 0.030000
D --> 0.040000
E --> 0.130000
F --> 0.020000
G --> 0.015000
H --> 0.060000
I --> 0.065000
J --> 0.005000
K --> 0.005000
L --> 0.035000
M --> 0.030000
N --> 0.070000
O --> 0.080000
P --> 0.020000
Q --> 0.002000
R --> 0.065000
S --> 0.060000
T --> 0.090000
U --> 0.030000
V --> 0.010000
W --> 0.015000
X --> 0.005000
Y --> 0.020000
Z --> 0.002000

Then obtain a frequency of letters that appear in ciphertext:

A --> 0.200000
B --> 0.000000
C --> 0.000000
D --> 0.040000
E --> 0.000000
F --> 0.000000
G --> 0.000000
H --> 0.040000
I --> 0.040000
J --> 0.000000
K --> 0.040000
L --> 0.040000
M --> 0.000000
N --> 0.120000
O --> 0.040000
P --> 0.000000
Q --> 0.000000
R --> 0.000000
S --> 0.040000
T --> 0.040000
U --> 0.000000
V --> 0.000000
W --> 0.200000
X --> 0.000000
Y --> 0.080000
Z --> 0.080000
It's easy to calculate: named o(i) the occurrence of generic letters i in the ciphertext. The frequency of letter i in ciphertext is: p(c) = o(i)/numbers of letters in ciphertext.
At this step we are ready to calculare 26 correlation function:

0 --> 0.046560
1 --> 0.020400
2 --> 0.024480
3 --> 0.038080
4 --> 0.032880
5 --> 0.043600
6 --> 0.040200
7 --> 0.050000
8 --> 0.044440
9 --> 0.049440
10 --> 0.035600
11 --> 0.039680
12 --> 0.035480
13 --> 0.034480
14 --> 0.038040
15 --> 0.039280
16 --> 0.023200
17 --> 0.027480
18 --> 0.058880
19 --> 0.039280
20 --> 0.042760
21 --> 0.039880
22 --> 0.067600
23 --> 0.024040
24 --> 0.026880
25 --> 0.036360

The best 5 results are in correspond with keys 22, 18, 7, 8 and 0. If you try this 5 keys you can successfully  discover that the right key is 22 and the text in clear:

 hardware code caesar example

For your curiosity, or even to test Caesar encryption, I develop a package of functions that can be used in Matlab environment that you can download from link below.