

SLIDE 1

Codes and Chains

Paulo Orenstein

Mathematics Department, PUC-Rio. Joint work with Juliana Freire.

SLIDE 2

Some words -- or not?




SLIDE 5

Correspondences


[a b d g ... h] [p e u l ... k] [m i t v ... r] . . .

In total, there are 26! possible correspondences.
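As a sanity check on the count: a correspondence is just a permutation of the 26-letter alphabet, so there are exactly 26! of them.

```python
import math

# A correspondence assigns each of the 26 cipher symbols to a distinct
# letter, so the number of correspondences is the number of permutations
# of 26 symbols: 26 factorial.
n_correspondences = math.factorial(26)
print(n_correspondences)  # 403291461126605635584000000, about 4 x 10^26
```

Far too many to try one by one, which is why the talk turns to a smarter search.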

SLIDE 6

All the correspondences

SLIDE 7

But we want a single one...

SLIDE 8

Counting letters


The correspondence we are looking for is the one that makes the decoded text as similar to Portuguese as possible. Some correspondences are more plausible than others; let's quantify that. Counting the frequencies of single letters is a good idea; counting the frequencies of pairs of letters is even better. As a reference, we use Dom Casmurro, with 288,892 characters. Each vertex is a correspondence between letters and ciphers.


SLIDE 14

What Dom Casmurro tells us

Most frequent pairs      Least frequent pairs
AS (5511)                CJ, MG, PB (0)
RA (5374)                FG, XD, VC (1)
OT (5186)                WA, DC, HN (2)
ET (5019)                TN, LJ, BT (3)
DE (4902)                DJ, ZH (4)


SLIDE 15

Imitating Portuguese

The plausibility of a correspondence c is

$$\mathrm{Pl}(c) = \prod_{\text{pair}} \mathrm{port}(\text{pair})^{\,\mathrm{cod}_c(\text{pair})},$$

where port(pair) is the number of times the pair of letters appears in Dom Casmurro, and cod_c(pair) is the number of times the pair appears in the cipher text decoded using the correspondence c.
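The formula can be sketched in code. This is a minimal sketch: a toy reference text stands in for Dom Casmurro, and two implementation choices are added that the slide does not specify, namely working in log-space (the product over pairs underflows quickly) and adding 1 inside the logarithm so unseen pairs do not give log(0).

```python
import math
from collections import Counter

def bigram_counts(text):
    """Count adjacent pairs of characters in a text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def log_plausibility(decoded, port_counts):
    """log Pl(c) = sum over pairs of cod_c(pair) * log(port(pair)),
    with +1 inside the log to avoid log(0) for unseen pairs."""
    cod = bigram_counts(decoded)
    return sum(n * math.log(port_counts.get(pair, 0) + 1)
               for pair, n in cod.items())

# Toy reference text standing in for Dom Casmurro (an assumption for the demo).
port_counts = bigram_counts("asasasraotetde" * 100)

# A decoding that looks like the reference scores higher than one that does not.
print(log_plausibility("asra", port_counts) > log_plausibility("zqxj", port_counts))  # True
```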


SLIDE 18

What we have so far...

[figure: the graph of correspondences, each vertex labeled with its plausibility count]

SLIDE 19

Adjacent correspondences

[A B C D] [A C B D]


SLIDE 20

A graph of correspondences

SLIDE 22

[figure: neighboring correspondences on the graph, each with its plausibility count]


SLIDE 24

Our first Markov Chain


Transition probabilities (each row gives tomorrow's distribution given today's weather):

         Sun    Cloudy   Rain
Sun      0.6    0.3      0.1
Cloudy   0.35   0.35     0.3
Rain     0.2    0.4      0.4
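One way to see this chain's long-run behavior is to raise the transition matrix to a high power: every row then approaches the same limiting distribution, foreshadowing the stationary distribution discussed below. A minimal sketch using the weather matrix above:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Rows are today's weather, columns tomorrow's: Sun, Cloudy, Rain.
M = [[0.60, 0.30, 0.10],
     [0.35, 0.35, 0.30],
     [0.20, 0.40, 0.40]]

P = M
for _ in range(50):  # P = M^51; rows converge to the stationary distribution
    P = mat_mul(P, M)

# All three rows are now numerically identical: the chain forgets where it started.
print(all(abs(P[i][j] - P[0][j]) < 1e-9 for i in range(3) for j in range(3)))  # True
```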

SLIDE 25

Walking on the tetrahedron
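The walk on the tetrahedron can be simulated directly. A sketch assuming the walk jumps to a uniformly random neighbor at each step; since the tetrahedron graph (K4) is regular, the long-run visit frequencies approach 1/4 per vertex.

```python
import random
from collections import Counter

# Tetrahedron: 4 vertices, each adjacent to the other three (complete graph K4).
neighbors = {v: [u for u in range(4) if u != v] for v in range(4)}

random.seed(0)
state, visits = 0, Counter()
for _ in range(100_000):
    state = random.choice(neighbors[state])  # jump to a uniformly random neighbor
    visits[state] += 1

# The graph is regular, so each vertex is visited about a quarter of the time.
print({v: round(visits[v] / 100_000, 3) for v in sorted(visits)})
```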


SLIDE 32

Stationary distribution


[figure: two text samples, one drawn from the initial distribution and one from the stationary distribution]

SLIDE 33

Stationary distribution


[figure: the walk's graph, with stationary probability at each vertex: 2/16 on five of the vertices and 3/16 on the other two]

SLIDE 34


Stationary distribution

Indeed, this is the stationary distribution: with M the transition matrix of the walk, where each row puts probability 1/2 or 1/3 on the neighbors of the corresponding vertex,

$$\begin{pmatrix} \tfrac{2}{16} & \tfrac{2}{16} & \tfrac{3}{16} & \tfrac{2}{16} & \tfrac{3}{16} & \tfrac{2}{16} & \tfrac{2}{16} \end{pmatrix} M = \begin{pmatrix} \tfrac{2}{16} & \tfrac{2}{16} & \tfrac{3}{16} & \tfrac{2}{16} & \tfrac{3}{16} & \tfrac{2}{16} & \tfrac{2}{16} \end{pmatrix}.$$
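The check can be reproduced numerically. The exact seven-vertex graph from the slides is not recoverable here, so the sketch below assumes a hypothetical graph with the same degree profile: a 7-cycle plus one chord, which has 8 edges and hence vertex probabilities 2/16 and 3/16.

```python
# Hypothetical 7-vertex graph: the cycle 0-1-2-3-4-5-6-0 plus the chord (2, 4).
# It has 8 edges, so deg(2) = deg(4) = 3 and every other vertex has degree 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0), (2, 4)]
n = 7
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# Random-walk transition matrix: M(x, y) = 1/deg(x) for each neighbor y of x.
M = [[(1 / len(adj[x]) if y in adj[x] else 0.0) for y in range(n)]
     for x in range(n)]

# Candidate stationary distribution: pi(x) = deg(x) / (2 * #edges) = deg(x)/16.
pi = [len(adj[x]) / 16 for x in range(n)]

# Check pi M = pi, entry by entry.
piM = [sum(pi[x] * M[x][y] for x in range(n)) for y in range(n)]
print(all(abs(a - b) < 1e-9 for a, b in zip(piM, pi)))  # True
```

This is the general fact behind the fractions on the slide: for a random walk on an undirected graph, the stationary probability of a vertex is its degree divided by twice the number of edges.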

SLIDE 35


Stationary distribution

Theorem. Let M(x, y) be a transition matrix on a finite state space X with stationary distribution π, and suppose there exists n0 such that M^n(x, y) > 0 for all n > n0. Then

$$M^n(x, y) \to \pi(y) \quad \text{as } n \to \infty, \quad \forall\, x, y \in X.$$

SLIDE 36

The key insight


Walking around the graph, we want to find vertices of high plausibility. It would be good to have a Markov chain whose stationary distribution gives higher probability to more plausible vertices. Let's construct a Markov chain that has precisely the (normalized) plausibility as its stationary distribution: the Metropolis-Hastings algorithm.


SLIDE 40

Metropolis-Hastings


Given a symmetric matrix M associated to a Markov chain and a vector P = [P(1) . . . P(n)], let us find another chain M̃ that has P as its stationary distribution:

$$\tilde{M}(x, y) = \begin{cases} M(x, y) & x \neq y,\ P(y) \geq P(x), \\[2pt] M(x, y)\,\dfrac{P(y)}{P(x)} & x \neq y,\ P(y) < P(x), \\[2pt] M(x, y) + \text{correction} & x = y, \end{cases}$$

where the correction is such that the sum of the entries in each row of M̃ is 1.


SLIDE 43

Metropolis-Hastings, 3x3


Let p = (p1 p2 p3), with p1 > p3 > p2. The matrix M̃ indeed has p as its stationary distribution:

$$(p_1\ p_2\ p_3) \begin{pmatrix} \alpha & M_{12}\,\tfrac{p_2}{p_1} & M_{13}\,\tfrac{p_3}{p_1} \\ M_{12} & \beta & M_{23} \\ M_{13} & M_{23}\,\tfrac{p_2}{p_3} & \gamma \end{pmatrix} = (p_1\ p_2\ p_3),$$

where the diagonal entries α, β, γ are the corrections that make each row sum to 1.
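The 3x3 matrix above can also be checked numerically. The concrete values of p and of the M_ij below are illustrative choices satisfying p1 > p3 > p2; they are not from the slides.

```python
# Illustrative values with p1 > p3 > p2 and a symmetric proposal chain.
p1, p2, p3 = 0.5, 0.2, 0.3
M12 = M13 = M23 = 1/3

# The slide's matrix; alpha, beta, gamma fill the diagonal so rows sum to 1.
rows = [
    [0.0, M12 * p2 / p1, M13 * p3 / p1],
    [M12, 0.0, M23],
    [M13, M23 * p2 / p3, 0.0],
]
for i, row in enumerate(rows):
    row[i] = 1.0 - sum(row)  # alpha, beta, gamma

p = [p1, p2, p3]
pM = [sum(p[x] * rows[x][y] for x in range(3)) for y in range(3)]
print(all(abs(pM[y] - p[y]) < 1e-12 for y in range(3)))  # True
```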

SLIDE 44

Decoding

  • Pick a correspondence and calculate its plausibility.
  • Randomly pick an adjacent correspondence by exchanging a pair of letters, and compare its plausibility with the previous one.
  • If it improves, accept the candidate correspondence; else, accept the change with a very small probability.
  • Repeat the last few steps several times.
  • Read the ciphered text after making the prescribed substitutions.
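The steps above can be sketched as a toy decoder. This is a sketch under stated assumptions: the reference text, the message, and the helper names are placeholders, and a real run would use Dom Casmurro's bigram counts and far more iterations.

```python
import math
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def log_pl(text, ref):
    """Log-plausibility: sum of cod(pair) * log(port(pair) + 1)."""
    return sum(n * math.log(ref.get(pair, 0) + 1)
               for pair, n in bigrams(text).items())

def decode(cipher, key):
    """Read the cipher as if symbol key[i] stood for letter ALPHABET[i]."""
    return cipher.translate(str.maketrans(key, ALPHABET))

# Toy reference text standing in for Dom Casmurro (assumption for the demo).
reference = bigrams("toy reference text standing in for dom casmurro " * 50)

random.seed(1)
true_key = "".join(random.sample(ALPHABET, 26))
cipher = "we want to read this".translate(str.maketrans(ALPHABET, true_key))

key = list(ALPHABET)                    # 1. pick a correspondence
score = best = log_pl(decode(cipher, "".join(key)), reference)
for _ in range(2000):
    i, j = random.sample(range(26), 2)  # 2. swap two letters: adjacent vertex
    key[i], key[j] = key[j], key[i]
    new = log_pl(decode(cipher, "".join(key)), reference)
    if new >= score or random.random() < math.exp(new - score):
        score = new                     # 3. accept improvements, and worse
        best = max(best, score)         #    moves with small probability
    else:
        key[i], key[j] = key[j], key[i]  # undo the swap
print(best >= log_pl(decode(cipher, ALPHABET), reference))  # True
```

The acceptance rule `min(1, Pl(new)/Pl(old))`, written here in log-space as `exp(new - score)`, is exactly the Metropolis-Hastings correction from the previous slides.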


SLIDE 45

It could all go wrong...

  • Is the text written from left to right?
  • English, Portuguese, Chinese?
  • Are there space characters?
  • Is there even any punctuation?
  • Uppercase, lowercase, accents?



SLIDE 47

Cross your fingers...


1,692 characters!


SLIDE 49

The text is revealed!

“...raguezosignificaouaseumfixhopeoueonoprecisandodosmeuscuidadosdemaecuisjsintomademaicacsveljoouenaosabemouesiacouantomenoraminiatura...”



“...ouandoentrouparaocolegiooraguezpareciaserumsuleitoateouebeminteresantemaseispodiaperfettamentecuidarmelhordoouelheeraseguramenteoftudoisolhepareuumacenaititeresamitepprovavelmenteondeeleestariemetidoparasemprenesecasocomoeoatamentedeveremosatuarlaseiouetemgemnttentandoentendermeumisterisoalfabetoueninguementente...”
SLIDE 51

The author


SLIDE 52

A different kind of ending



SLIDE 54

A different kind of ending


SLIDE 56

References

CHEN, J., ROSENTHAL, J. S. Decrypting Classical Cipher Text Using Markov Chain Monte Carlo. Statistics and Computing, vol. 22, 397-413, 2011.
DIACONIS, P. The Markov Chain Monte Carlo Revolution. Bulletin of the American Mathematical Society, vol. 46, 179-205, 2009.
ESTEVES, B. Grito mudo no muro. Piauí, ed. 65, 14-18. Available at http://revistapiaui.estadao.com.br/edicao-65/questoes-de-criptografia/gritomudonomuro.
JOCKUSCH, W., PROPP, J., SHOR, P. Random Domino Tilings and the Arctic Circle Theorem, 1998. Available at http://arxiv.org/abs/math/9801068.
LEVIN, D. A., PERES, Y., WILMER, E. L. Markov Chains and Mixing Times. American Mathematical Society, 2009.
