Codes and Chains
Paulo Orenstein
Joint work with Juliana Freire
Mathematics Department, PUC-Rio
Some words -- or not?
Correspondences
[a b d g ... h]
[p e u l ... k]
[m i t v ... r]
...
(26! in total)
All the correspondences
But we want a single one...
Counting letters
Each vertex is a correspondence between letters and cipher symbols. The one we are looking for makes the decoded text as similar to Portuguese as possible.
Some correspondences are more plausible than others; let’s quantify that.
Counting the frequencies of letters is a good idea. Counting the frequencies of pairs of letters is even better.
As a reference text, we use Dom Casmurro, with 288,892 characters.
What Dom Casmurro tells us
Most frequent pairs    Least frequent pairs
AS (5511)              CJ, MG, PB (0)
RA (5374)              FG, XD, VC (1)
OT (5186)              WA, DC, HN (2)
ET (5019)              TN, LJ, BT (3)
DE (4902)              DJ, ZH (4)
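Pair counts like the ones above could be gathered along the following lines. This is only a sketch: the function name `count_pairs` and the short sample string are illustrative, and it assumes that everything outside A-Z (spaces, punctuation, accented letters) is dropped before counting, so pairs may span word boundaries. The slides do not say how the original counts treated those characters.

```python
from collections import Counter

def count_pairs(text):
    """Count adjacent letter pairs, ignoring anything that is not A-Z.

    Assumption: non-letters are simply removed, so a pair can join the
    last letter of one word with the first letter of the next.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

# Tiny stand-in for the reference novel (hypothetical sample text).
sample = "as casas amarelas"
pairs = count_pairs(sample)
print(pairs.most_common(3))
```

With a full novel as input, `pairs.most_common(5)` would give a "most frequent pairs" column, and pairs missing from the counter are the ones that never occur.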
Imitating Portuguese
The plausibility of a correspondence c is
$$\mathrm{Pl}(c) = \prod_{\mathrm{par}} \mathrm{port}(\mathrm{par})^{\mathrm{cod}_c(\mathrm{par})},$$
where the product runs over all pairs of letters, port(par) is the number of times that the pair of letters appears in Dom Casmurro, and cod_c(par) is the number of times that the pair appears in the cipher text using the correspondence c.
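A minimal sketch of how the plausibility might be evaluated in practice. Two things here are assumptions rather than part of the slides: the computation is done with logarithms (the raw product would overflow for any realistic text), and 1 is added to each reference count so that a pair never seen in the reference does not produce log(0). The names `log_plausibility`, `key` and `reference_counts` are hypothetical.

```python
import math
from collections import Counter

def count_pairs(text):
    """Count adjacent letter pairs, ignoring anything that is not A-Z."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def log_plausibility(cipher_text, key, reference_counts):
    """Return log Pl(c) = sum over pairs of cod_c(pair) * log(port(pair)).

    key maps each cipher symbol to a plain letter; symbols missing from
    the key are left unchanged.
    """
    decoded = "".join(key.get(ch, ch) for ch in cipher_text.upper())
    total = 0.0
    for pair, times in count_pairs(decoded).items():
        # Assumption: +1 smoothing so unseen pairs do not give log(0).
        total += times * math.log(reference_counts.get(pair, 0) + 1)
    return total

reference = count_pairs("as casas amarelas")   # stand-in reference text
good_key = {"X": "A", "Y": "S"}                # hypothetical correspondences
bad_key = {"X": "S", "Y": "A"}
print(log_plausibility("XY XY", good_key, reference))
print(log_plausibility("XY XY", bad_key, reference))
```

Comparing two correspondences only requires the difference of their log-plausibilities, which is all the later steps need.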
What we have so far...
[Figure: vertices labeled with numerical values.]
Adjacent correspondences
[A B C D] and [A C B D] are adjacent: they differ by exchanging one pair of letters.
A graph of correspondences
[Figure: graph whose vertices carry numerical labels.]
Our first Markov Chain
          Sun     Cloudy  Rain
Sun       0.6     0.3     0.1
Cloudy    0.35    0.35    0.3
Rain      0.2     0.4     0.4
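To get a feel for the weather chain, one could simulate a few days of it. The sketch below assumes that each row of the table is today's weather and each column is tomorrow's (the rows sum to 1, which is consistent with that reading); the function name `walk` is illustrative.

```python
import random

STATES = ["Sun", "Cloudy", "Rain"]
# Assumption: row = today's weather, column = tomorrow's weather.
M = [
    [0.6, 0.3, 0.1],     # from Sun
    [0.35, 0.35, 0.3],   # from Cloudy
    [0.2, 0.4, 0.4],     # from Rain
]

def walk(start, steps, rng):
    """Simulate the chain for `steps` transitions, starting at `start`."""
    path = [start]
    for _ in range(steps):
        row = M[STATES.index(path[-1])]
        path.append(rng.choices(STATES, weights=row)[0])
    return path

rng = random.Random(0)   # fixed seed so the run is repeatable
print(walk("Sun", 7, rng))
```

Each step looks only at the current state, which is the defining property of a Markov chain.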
Walking on the tetrahedron
Stationary distribution
[Figure: initial distribution and stationary distribution.]
[Figure: seven vertices labeled with the stationary probabilities 2/16, 2/16, 3/16, 3/16, 2/16, 2/16, 2/16.]
Stationary distribution
Indeed, this is the stationary distribution: $M^{T}\pi = \pi$, where $M$ is the $7 \times 7$ transition matrix of the walk, each of whose rows has either two entries equal to $\tfrac{1}{2}$ or three entries equal to $\tfrac{1}{3}$ (and zeros elsewhere), and $\pi$ is the column vector with entry $\tfrac{2}{16}$ for each row of the first kind and $\tfrac{3}{16}$ for each row of the second kind.
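The check above can be reproduced on a small example. The graph below is hypothetical: it was chosen only because it has seven vertices, five with two neighbours and two with three, matching the probabilities 2/16 and 3/16; the actual graph on the slide cannot be recovered from the text. The walk moves to a uniformly chosen neighbour, and exact fractions are used so the equality is tested without rounding.

```python
from fractions import Fraction

# Hypothetical graph: a 7-cycle plus one chord (8 edges in total).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0), (2, 4)]
n = 7

neighbors = {v: set() for v in range(n)}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

# M[x][y] = 1/deg(x) if y is a neighbour of x, else 0.
M = [
    [Fraction(1, len(neighbors[x])) if y in neighbors[x] else Fraction(0)
     for y in range(n)]
    for x in range(n)
]

# Candidate stationary distribution: deg(x) / (2 * number of edges).
pi = [Fraction(len(neighbors[x]), 2 * len(edges)) for x in range(n)]

# One step of the chain applied to pi (row vector times matrix).
pi_M = [sum(pi[x] * M[x][y] for x in range(n)) for y in range(n)]
print(pi_M == pi)
```

The same test works for any graph: the number of neighbours of a vertex, divided by twice the number of edges, is left unchanged by one step of the walk.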
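Stationary distribution
Theorem. Let $X$ be a finite set and $M(x, y)$ a Markov chain on $X$. If there is an $n_0$ such that $M^n(x, y) > 0$ for all $n > n_0$ and all $x, y \in X$, then $M$ has a unique stationary distribution $\pi$ and, as $n \to \infty$,
$$M^n(x, y) \to \pi(y) \quad \forall\, x, y \in X.$$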
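The convergence in the theorem can be observed numerically. The sketch below uses the three-state weather matrix from earlier in the talk (all its entries are positive, so the hypothesis holds with n_0 = 0) and raises it to the 50th power; the helper `matmul` and the choice of 50 are illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B)))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]

# Weather chain: rows are Sun, Cloudy, Rain.
M = [
    [0.6, 0.3, 0.1],
    [0.35, 0.35, 0.3],
    [0.2, 0.4, 0.4],
]

power = M
for _ in range(49):
    power = matmul(power, M)   # after the loop, power is M^50

# Every row of M^n approaches the same vector pi, whatever the start x.
for row in power:
    print([round(v, 4) for v in row])
```

All three printed rows should agree to the displayed precision: the starting state is forgotten, and the common row approximates the stationary distribution.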
The key insight
Walking around the graph, we want to find vertices of high plausibility.
It would be good to have a Markov chain with a stationary distribution that gives higher probability to more plausible vertices.
Let’s construct a Markov chain that has as its stationary distribution precisely the (normalized) plausibility: the Metropolis-Hastings algorithm.
Metropolis-Hastings
Given a symmetric matrix $M$ associated to a Markov chain and a vector $P = [P(1) \ldots P(n)]$, let us find another chain $\widetilde{M}$ that has $P$ as its stationary distribution.
$$\widetilde{M}(x, y) = \begin{cases} M(x, y) & x \neq y,\ P(y) \geq P(x) \\ M(x, y)\,\dfrac{P(y)}{P(x)} & x \neq y,\ P(y) < P(x) \\ M(x, y) + \text{correction} & x = y, \end{cases}$$
where the correction is such that the sum of the entries in each row of $\widetilde{M}$ is 1.
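The construction above could be coded as follows. This is a sketch under stated assumptions: the function name `metropolis`, the 3-state symmetric matrix and the vector `P` are made up for illustration, and exact fractions are used so that stationarity can be tested with plain equality.

```python
from fractions import Fraction

def metropolis(M, P):
    """Build the modified chain from a symmetric chain M and weights P."""
    n = len(P)
    new = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                # Keep M(x, y) when P(y) >= P(x); otherwise scale it down
                # by the ratio P(y) / P(x).
                ratio = min(Fraction(1), Fraction(P[y]) / Fraction(P[x]))
                new[x][y] = M[x][y] * ratio
        # Correction on the diagonal: make the row sum to 1.
        new[x][x] = 1 - sum(new[x])
    return new

half = Fraction(1, 2)
M = [[0, half, half], [half, 0, half], [half, half, 0]]   # symmetric
P = [Fraction(5, 10), Fraction(2, 10), Fraction(3, 10)]

new = metropolis(M, P)
P_new = [sum(P[x] * new[x][y] for x in range(3)) for y in range(3)]
print(P_new == P)
```

Only ratios of entries of `P` enter the construction, so `P` does not need to be normalized; this is what makes the method usable when the normalizing constant is out of reach, as with 26! correspondences.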
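Metropolis-Hastings, 3x3
Let $p = (p_1\ p_2\ p_3)$, with $p_1 > p_3 > p_2$.
The matrix $\widetilde{M}$ indeed has $p$ as its stationary distribution:
$$(p_1\ p_2\ p_3) \begin{pmatrix} \alpha & M_{12}\,\dfrac{p_2}{p_1} & M_{13}\,\dfrac{p_3}{p_1} \\ M_{12} & \beta & M_{23} \\ M_{13} & M_{23}\,\dfrac{p_2}{p_3} & \gamma \end{pmatrix} = (p_1\ p_2\ p_3).$$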
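As a worked check of the first coordinate in the 3x3 example, assume (as in the general definition) that the diagonal entry $\alpha$ is whatever makes the first row sum to 1, so $\alpha = 1 - M_{12}\tfrac{p_2}{p_1} - M_{13}\tfrac{p_3}{p_1}$. Then:

```latex
\begin{aligned}
p_1 \alpha + p_2 M_{12} + p_3 M_{13}
  &= p_1\left(1 - M_{12}\frac{p_2}{p_1} - M_{13}\frac{p_3}{p_1}\right)
     + p_2 M_{12} + p_3 M_{13} \\
  &= p_1 - p_2 M_{12} - p_3 M_{13} + p_2 M_{12} + p_3 M_{13} \\
  &= p_1 .
\end{aligned}
```

The other two coordinates work the same way, using the row-sum conditions for $\beta$ and $\gamma$.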
Decoding
‣ Pick a correspondence and calculate its plausibility.
‣ Randomly pick an adjacent correspondence by exchanging a pair of letters, and compare its plausibility with the previous one.
‣ If it improves, accept the candidate correspondence; otherwise, accept it only with a small probability (the ratio of the two plausibilities, as in Metropolis-Hastings).
‣ Repeat the previous two steps many times.
‣ Read the ciphered text after making the prescribed substitutions.
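The steps above can be sketched as a short program. Everything specific here is an assumption made for illustration: the names `decode`, `steps` and `seed`, the identity starting correspondence, the restriction to the 26 letters A-Z, the use of log-plausibilities, and the +1 smoothing of reference counts. It is not the authors' implementation.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def count_pairs(text):
    """Count adjacent letter pairs, ignoring anything that is not A-Z."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def log_plausibility(cipher_counts, key, reference_counts):
    """Sum of cod_c(pair) * log(port(pair)), with +1 smoothing (assumption)."""
    total = 0.0
    for pair, times in cipher_counts.items():
        decoded = key[pair[0]] + key[pair[1]]
        total += times * math.log(reference_counts.get(decoded, 0) + 1)
    return total

def decode(cipher_text, reference_text, steps=10000, seed=0):
    rng = random.Random(seed)
    reference_counts = count_pairs(reference_text)
    cipher_counts = count_pairs(cipher_text)
    # Step 1: pick a correspondence (here the identity) and score it.
    key = dict(zip(ALPHABET, ALPHABET))
    current = log_plausibility(cipher_counts, key, reference_counts)
    for _ in range(steps):
        # Step 2: move to an adjacent correspondence by swapping two letters.
        a, b = rng.sample(ALPHABET, 2)
        key[a], key[b] = key[b], key[a]
        candidate = log_plausibility(cipher_counts, key, reference_counts)
        # Step 3: accept improvements; otherwise accept with probability
        # Pl(candidate) / Pl(current).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            current = candidate
        else:
            key[a], key[b] = key[b], key[a]   # reject: undo the swap
    # Step 5: read the text with the final substitutions applied.
    decoded = "".join(key.get(ch, ch) for ch in cipher_text.upper())
    return decoded, key

text, key = decode("XYZ XYZ", "abc abc abc", steps=200, seed=1)
print(text)
```

On such a tiny input the result is not meaningful; the sketch is meant to be run with a long reference text and a cipher text of at least several hundred characters, and even then a run may settle on a wrong correspondence.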
It could all go wrong...
- Is the text written from left to right?
- English, Portuguese, Chinese?
- Are there space characters?
- Is there even any punctuation?
- Uppercase, lowercase, accents?
Cross your fingers...
1,692 characters!