SLIDE 1 A Natural Language Approach to Automated Cryptanalysis of Two-time Pads
Joshua Mason Kathryn Watkins Jason Eisner Adam Stubblefield
SLIDE 2
The Two Time Pad Problem
SLIDE 3 ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 4 Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 5 ⊕ Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 6 ⊕ Take the Beach Attack at Dawn ⊕ doQvYcSWIPyXaC doQvYcSWIPyXaC ⊕
⊕
Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 7 ⊕ Take the Beach Attack at Dawn ⊕ doQvYcSWIPyXaC doQvYcSWIPyXaC ⊕
⊕
Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 8 ⊕ Take the Beach Attack at Dawn ⊕ Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 9 Take the Beach ⊕ Attack at Dawn =
15 15 1f 04 43 1f 48 04 54 62 21 00 14 06
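A minimal sketch of why reusing the pad leaks information: XORing the two ciphertexts cancels the keystream and leaves the XOR of the two plaintexts. The keystream bytes below are hypothetical, chosen only for illustration; the two plaintexts are the ones from the slides.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

p0 = b"Take the Beach"
p1 = b"Attack at Dawn"

# Hypothetical 14-byte keystream, reused for both messages (the mistake).
keystream = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2, 0x48, 0xB5,
                   0x6D, 0x13, 0xF0, 0x2E, 0x89, 0x74, 0xC1])

c0 = xor_bytes(p0, keystream)
c1 = xor_bytes(p1, keystream)

# The keystream cancels: c0 ^ c1 == p0 ^ p1, whatever the keystream was.
leak = xor_bytes(c0, c1)
print(leak.hex(" "))  # 15 15 1f 04 43 1f 48 04 54 62 21 00 14 06
```

The attacker never needs the keystream; everything that follows works on `leak` alone.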
SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14 OJNcDfoMncXzYwwQQZRXYWORT190LP
SLIDE 15 OJNcDfoMncXzYwwQQZRXYWORT190LP the ⊕
SLIDE 16 QpL OJNcDfoMncXzYwwQQZRXYWORT190LP the ⊕
SLIDE 17 OJNcDfoMncXzYwwQQZRXYWORT190LP the ⊕
SLIDE 18 the ⊕ Man OJNcDfoMncXzYwwQQZRXYWORT190LP
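The manual attack pictured on these slides is often called crib dragging: XOR a guessed word such as "the" against every offset and look for readable text in the result. The sketch below assumes ASCII plaintext; the two sample plaintexts and the helper names are hypothetical, not taken from the talk.

```python
import string

# Bytes we accept as "readable" output: ASCII letters and space.
ALLOWED = frozenset((string.ascii_letters + " ").encode("ascii"))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def drag_crib(xored: bytes, crib: bytes):
    """Return (offset, text) for every offset where xored ^ crib is readable."""
    hits = []
    for offset in range(len(xored) - len(crib) + 1):
        window = xor_bytes(xored[offset:offset + len(crib)], crib)
        if all(b in ALLOWED for b in window):
            hits.append((offset, window.decode("ascii")))
    return hits

# Hypothetical plaintexts; the attacker only sees their XOR.
p0 = b"see the dog"
p1 = b"old Man ran"
xored = xor_bytes(p0, p1)

hits = drag_crib(xored, b"the")
print(hits)  # includes (4, 'Man'), where the crib lines up with "the" in p0
```

Some offsets may produce readable but meaningless windows, which is why a human (or a language model) has to judge each hit.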
SLIDE 19
- Formalized by F. Rubin in 1978
- Automated by E. Dawson and L. Nielsen in 1996
SLIDE 20 Assumptions
- Uppercase English characters and space
- Space is always the most frequent character
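One consequence of these assumptions, sketched under ASCII encoding: with only uppercase letters and space in play, each XOR byte falls into one of a few classes. This is an illustration of what the restricted alphabet buys, not a reconstruction of the earlier authors' procedure; the function name and labels are hypothetical.

```python
SPACE = 0x20

def classify(x: int):
    """Classify one byte of P0 ^ P1 when both texts use only A-Z and space."""
    if x == 0:
        return ("same", None)            # identical characters in both texts
    letter = x ^ SPACE
    if ord("A") <= letter <= ord("Z"):
        return ("space", chr(letter))    # space in one text, this letter in the other
    if x < 0x20:
        return ("letters", None)         # two different uppercase letters
    return ("invalid", None)             # not possible under the assumptions

xored = bytes.fromhex("6e71006f7961")    # the P0 ^ P1 bytes shown on the slides
result = [classify(b) for b in xored]
print(result)
```

A byte in the "space" class reveals the letter but not which of the two texts contains it; that ambiguity still has to be resolved some other way.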
SLIDE 21
P0 ⊕ P1 = 6e 71 00 6f 79 61
SLIDE 22
P0 ⊕ P1 = 6e 71 00 6f 79 61
SLIDE 23
P0 ⊕ P1 = 6e 71 6f 79 61
SLIDE 24
P0 ⊕ P1 = 6e 71 6f 79 61
SLIDE 25
P1 ⊕ P2 = 67 82 00 00 00 00 00 34
SLIDE 26
P1 ⊕ P2 = 67 82 00 00 00 00 00 34
SLIDE 27
P1 ⊕ P2 = 67 82 00 00 00 34
SLIDE 28 Testing Methodology
- Trained on the first 600K characters of the Bible
- Attempted recovery of passages from the first 600K characters of the Bible
SLIDE 29 Percentage Correctly Recovered
          Dawson & Nielsen
P0 ⊕ P1   62.7%
P1 ⊕ P2   61.5%
P0 ⊕ P1   62.6%
SLIDE 30 Percentage Correctly Recovered
          Dawson & Nielsen   Our Technique
P0 ⊕ P1   62.7%              100%
P1 ⊕ P2   61.5%              99.99%
P0 ⊕ P1   62.6%              99.96%
SLIDE 31 Our Assumptions
- Plaintext has some structure
- Plaintext is in a language we know
SLIDE 32
n-gram count
2 a 2 p 2 l 1 e 2
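A minimal sketch of character n-gram counting and the conditional probabilities derived from the counts. It uses plain maximum-likelihood estimates with no smoothing, which is an assumption made for brevity; the function names are hypothetical.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cond_prob(text: str, context: str, ch: str) -> float:
    """Estimate p(ch | context) as count(context + ch) / count(context + anything)."""
    grams = ngram_counts(text, len(context) + 1)
    total = sum(c for g, c in grams.items() if g.startswith(context))
    return grams[context + ch] / total if total else 0.0

unigrams = ngram_counts("apple", 1)
bigrams = ngram_counts("apple", 2)
print(unigrams["p"], bigrams["pp"])   # 2 1
print(cond_prob("apple", "p", "l"))   # 0.5
```

A real model would be trained on a large corpus and smoothed so that unseen n-grams do not get probability zero.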
SLIDE 33
SLIDE 34 7 billion characters
SLIDE 35 450 million characters 7 billion characters
SLIDE 36 4 billion characters 450 million characters 7 billion characters
SLIDE 38 start a
0e 02 11 02
SLIDE 39 start a
P0 ⊕ P1
0e 02 11 02
SLIDE 40 start a
p(a) p(o) p(o) p(a)
P0 ⊕ P1
0e 02 11 02
SLIDE 41 start a
ap
ap p(p|a) p(r|o) p(r|o) p(p|a) p(a) p(o) p(o) p(a)
P0 ⊕ P1
0e 02 11 02
SLIDE 42 start a
ap
ap app
app p(a) p(o) p(o) p(a) p(p|a) p(r|o) p(r|o) p(p|a) p(p|ap) p(a|or) p(a|or) p(p|ap)
P0 ⊕ P1
0e 02 11 02
SLIDE 43 start a
ap
ap p(p|a) p(r|o) p(r|o) p(p|a) p(a) p(o) p(o) p(a)
P0 ⊕ P1
0e 02 0e 02
SLIDE 44 start a
ap
ap apa
apa p(p|a) p(r|o) p(r|o) p(p|a) p(a) p(o) p(o) p(a) p(a|ap) p(o|or) p(o|or) p(a|ap)
P0 ⊕ P1
0e 02 0e 02
SLIDE 45 start a
ap
ap apa
apa p(a|ap) p(o|or) p(o|or) p(a|ap) p(p|a) p(r|o) p(r|o) p(p|a) p(a) p(o) p(o) p(a)
P0 ⊕ P1
0e 02 0e 02
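The lattice on these slides can be read as follows: each XOR byte admits only certain character pairs, and each pair is scored by the language model probability of both characters given their own histories. The sketch below enumerates candidate pairs and scores the first step with a toy add-one-smoothed bigram model; the alphabet, the tiny training text, and all names are assumptions for illustration only.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "   # assumed plaintext alphabet

def candidate_pairs(x: int):
    """All (a, b) over ALPHABET with ord(a) ^ ord(b) == x."""
    return [(a, chr(ord(a) ^ x)) for a in ALPHABET if chr(ord(a) ^ x) in ALPHABET]

def train_bigram(corpus: str):
    """Return logp(prev, ch); prev == '' means start of text. Add-one smoothing."""
    uni = Counter(corpus)
    ctx = Counter(corpus[:-1])                 # characters that have a successor
    bi = Counter(zip(corpus, corpus[1:]))
    v = len(ALPHABET)
    def logp(prev: str, ch: str) -> float:
        if prev == "":
            return math.log((uni[ch] + 1) / (len(corpus) + v))
        return math.log((bi[(prev, ch)] + 1) / (ctx[prev] + v))
    return logp

logp = train_bigram("apple orange " * 20)      # toy training text

# First byte of the slide example: 0e. Score each pair as p(a) * p(b).
pairs = candidate_pairs(0x0E)
best = max(pairs, key=lambda ab: logp("", ab[0]) + logp("", ab[1]))
print(best)
```

Later steps add terms such as `logp("a", "p") + logp("o", "r")` for the pair (p, r) following (a, o), matching the p(p|a) p(r|o) labels on the slides.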
SLIDE 46
Memory/Computation
SLIDE 47 start a b c
P2 ⊕ P3 01 00 02 02
SLIDE 48 start b c c
P2 ⊕ P3 01 00 02 02
SLIDE 49 start b c c b
P2 ⊕ P3 01 00 02 02
SLIDE 50 start b c c b ba ca bb cb bc cc ca ba cb bb cc bc
P2 ⊕ P3 01 00 02 02
SLIDE 51 start b c c b p(b) p(c) p(c) p(b)
P2 ⊕ P3 01 00 02 02
SLIDE 52 b c c b p(b) p(c) p(c) p(b)
P2 ⊕ P3 01 00 02 02
SLIDE 53 p(b) p(c) p(c) p(b) ba ca bb cb bc cc ca ba cb bb cc bc b c c b
P2 ⊕ P3 01 00 02 02
SLIDE 54 p(b) p(c) p(c) p(b) ba ca bb cb bc cc ca ba cb bb cc bc b c c b p(a|b) p(a|c) p(b|b) p(b|c) p(c|b) p(c|c) p(a|c) p(a|b) p(b|c) p(b|b) p(c|c) p(c|b)
P2 ⊕ P3 01 00 02 02
SLIDE 55 ba ca bb cb bc cc ca ba cb bb cc bc p(a|b) p(a|c) p(b|b) p(b|c) p(c|b) p(c|c) p(a|c) p(a|b) p(b|c) p(b|b) p(c|c) p(c|b)
P2 ⊕ P3 01 00 02 02
SLIDE 56 ba ca bb cb bc cc ca ba cb bb cc bc
P2 ⊕ P3 01 00 02 02
SLIDE 57 ba ca ca ba cc bc
P2 ⊕ P3 01 00 02 02
SLIDE 58 ba ca ca ba cc bc
... P2 ⊕ P3 01 00 02 02
SLIDE 59
... P2 ⊕ P3 01 00 02 02
SLIDE 60
... END P2 ⊕ P3 01 00 02 02
SLIDE 61
END P2 ⊕ P3 01 00 02 02
SLIDE 62 ba ca
... END
b c
P2 ⊕ P3 01 00 02 02
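The memory and computation concern can be illustrated with pruning: extend every surviving hypothesis by every legal character pair, then keep only the best few before moving to the next byte. The sketch below is a simple beam search over a toy bigram model, swapped in here for illustration; the beam width, alphabet, and training text are assumptions, and it is not claimed to match the authors' exact search or pruning rule.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "   # assumed plaintext alphabet

def train_bigram(corpus: str):
    """Return logp(prev, ch) with add-one smoothing; prev == '' means start."""
    uni = Counter(corpus)
    ctx = Counter(corpus[:-1])
    bi = Counter(zip(corpus, corpus[1:]))
    v = len(ALPHABET)
    def logp(prev: str, ch: str) -> float:
        if prev == "":
            return math.log((uni[ch] + 1) / (len(corpus) + v))
        return math.log((bi[(prev, ch)] + 1) / (ctx[prev] + v))
    return logp

def decode(xored: bytes, logp, beam_width: int = 50):
    """Return the best (score, text0, text1) consistent with xored, or None."""
    beam = [(0.0, "", "")]
    for x in xored:
        extended = []
        for score, t0, t1 in beam:
            for a in ALPHABET:
                b = chr(ord(a) ^ x)
                if b not in ALPHABET:
                    continue                      # pair violates the XOR constraint
                s = score + logp(t0[-1:], a) + logp(t1[-1:], b)
                extended.append((s, t0 + a, t1 + b))
        # Pruning step: keep only the highest-scoring hypotheses.
        extended.sort(key=lambda h: h[0], reverse=True)
        beam = extended[:beam_width]
    return beam[0] if beam else None

logp = train_bigram("apple orange " * 20)
xored = bytes(ord(a) ^ ord(b) for a, b in zip("apple", "orang"))
score, t0, t1 = decode(xored, logp)
print(t0, t1)
```

Because the score is symmetric in the two texts, the decoder may return the pair in either order.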
SLIDE 63
Commodity Hardware
- System: Dual Core Pentium 3 GHz
- Memory: 8 GB
- Storage: 1.2 TB
SLIDE 64
- Model Build Time: ~12 hours
- Runtime: 200 ms per byte
- Memory Usage: ~2 GB
SLIDE 65
Our testing methodology
SLIDE 66 402,590 Files 98,699 Files 520,931 Files
SLIDE 67 402,590 Files 98,699 Files 520,931 Files 2,590 Files 8,699 Files 20,931 Files
SLIDE 68 402,590 Files 98,699 Files 520,931 Files 2,590 Files 8,699 Files 20,931 Files 50 Files 50 Files 50 Files
SLIDE 69
            Small
HTML        90.64%
E-mail      82.29%
Documents   53.84%
SLIDE 70
            Small    Medium
HTML        90.64%   92.78%
E-mail      82.29%   89.04%
Documents   53.84%   53.05%
SLIDE 71
            Small    Medium   Large
HTML        90.64%   92.78%   93.79%
E-mail      82.29%   89.04%   90.85%
Documents   53.84%   53.05%   52.72%
SLIDE 72
The Switching Problem
SLIDE 73
I want to remind you about our All-Employee Meeting this Tuesday, Oct. 23, at 10 a.m. Houston time at the Hyatt Regency. We obviously have a lot to talk about. Last week
Well I hope you have Dad doing some of the cleaning! You know how he always has an opinion but yet no participation. Anyway I hope you're doing fine. I'm fine
SLIDE 74
I want to remind you about our All-Employee Meeting this Tuesday, Oct. 23, at 10 a.m. Houston time at the Hyatt Regency participation. Anyway I hope you're doing fine. I'm fine and about to
Well I hope you have Dad doing some of the cleaning! You know how he always has an opinion but yet no. We obviously have a lot to talk about. Last week we reported third quarter earnings. We
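A short sketch of why switching is possible at all: exchanging the tails of the two recovered texts leaves their XOR unchanged, so the XOR constraint alone cannot tell a switched recovery from the correct one. The sample strings are shortened fragments of the e-mails shown above; the function names are hypothetical.

```python
def xor_text(a: str, b: str) -> bytes:
    return bytes(x ^ y for x, y in zip(a.encode("ascii"), b.encode("ascii")))

def switch(p0: str, p1: str, k: int):
    """Swap the two streams from position k onward."""
    return p0[:k] + p1[k:], p1[:k] + p0[k:]

p0 = "We obviously have a lot"
p1 = "Anyway I hope you're ok"

s0, s1 = switch(p0, p1, 12)
print(s0)
print(s1)

# Both pairs explain the observed XOR equally well.
print(xor_text(s0, s1) == xor_text(p0, p1))  # True
```

Only the language model's preference for text that stays coherent across the switch point can keep the two streams apart.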
SLIDE 75
Wu showed that Word 2002 re-uses its one-time pad
SLIDE 76
T13/1510D revision 1
Working T13 Draft 1510D
Revision 1.0 January 17, 2003
ATA/ATAPI Host Adapters Standard (ATA – Adapter)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Tony Goodfellow Pacific Digital Corporation 2052 Alton Parkway Irvine, CA92602 USA Tel: 949-252-1111 Fax: 949-252-9397 Email: tgoodfellow@pacificdigital.com
Working T13 Draft 1532D Volume 1
Revision 2 18 February 2003
Information Technology - AT Attachment with Packet Interface – 7 Volume 1 (ATA/ATAPI-7 V1)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. As such, this is not a completed standard and has not been approved. The contents may be modified by the T13 Technical Committee. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Peter T. McLean Maxtor Corporation 2190 Miller Drive Longmont, CO 80501-6744 USA Tel: 303-678-2149 Fax: 303-682-4811 Email: pete_mclean@maxtor.com
Reference number ANSI INCITS.***
xxx Printed October, 17, 2006 12:56 PM
SLIDE 77
Working T13 Draft 1532D Volume 1
Revision 2 18 February 2003
Information Technology - AT Attachment with Packet Interface – 7 Volume 1 (ATA/ATAPI-7 V1)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. As such, this is not a completed standard and has not been approved. The contents may be modified by the T13 Technical Committee. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Peter T. McLean Maxtor Corporation 2190 Miller Drive Longmont, CO 80501-6744 USA Tel: 303-678-2149 Fax: 303-682-4811 Email: pete_mclean@maxtor.com
Reference number ANSI INCITS.***
xxx Printed October, 17, 2006 12:56 PM
T13/1510D revision 1
Working T13 Draft 1510D
Revision 1.0 January 17, 2003
ATA/ATAPI Host Adapters Standard (ATA – Adapter)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Tony Goodfellow Pacific Digital Corporation 2052 Alton Parkway Irvine, CA92602 USA Tel: 949-252-1111 Fax: 949-252-9397 Email: tgoodfellow@pacificdigital.com
Revision 1 January 17, 2003
SLIDE 78
T13/1510D revision 1
Working T13 Draft 1510D
Revision 1.0 January 17, 2003
ATA/ATAPI Host Adapters Standard (ATA – Adapter)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Tony Goodfellow Pacific Digital Corporation 2052 Alton Parkway Irvine, CA92602 USA Tel: 949-252-1111 Fax: 949-252-9397 Email: tgoodfellow@pacificdigital.com
Working T13 Draft 1532D Volume 1
Revision 2 18 February 2003
Information Technology - AT Attachment with Packet Interface – 7 Volume 1 (ATA/ATAPI-7 V1)
This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. As such, this is not a completed standard and has not been approved. The contents may be modified by the T13 Technical Committee. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce this document for the purposes of INCITS standardization activities without further permission, provided this notice is included. All other rights are reserved. Any commercial or for-profit replication or republication is prohibited. T13 Technical Editor: Peter T. McLean Maxtor Corporation 2190 Miller Drive Longmont, CO 80501-6744 USA Tel: 303-678-2149 Fax: 303-682-4811 Email: pete_mclean@maxtor.com
Reference number ANSI INCITS.***
xxx Printed October, 17, 2006 12:56 PM
Revision 2 18 February 2003
SLIDE 79
November 13, 2002 ATA/ATAPI Host Adapters Standard (ATA Adapter) This is an internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available for review and comment only. Permission is granted to members of INCITS, its technical committees, and their associated task groups to reproduce
SLIDE 80
November 13, 2002 ATA/ATAPI Host Adapters Standard (ATF; h Packet) This is no internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available and has not been approved. The contents may be modified by the T13 Technical technical committees, and their associated task groups to reproduce
SLIDE 81
November 13, 2002 ATA/ATAPI Host Adapters Standard (ATF; h Packet) This is no internal working document of T13, a Technical Committee of Accredited Standards Committee INCITS. The T13 Technical Committee may modify the contents. This document is made available and has not been approved. The contents may be modified by the T13 Technical technical committees, and their associated task groups to reproduce
SLIDE 82
            Exact    Pairwise
HTML        93.79%   99.45%
E-mail      90.85%   98.41%
Documents   52.72%   75.91%
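The slide does not define the two column headings. One plausible reading, offered here purely as an assumption, is that "Exact" counts a position as correct only when each recovered character lands in the right stream, while "Pairwise" also accepts positions where the two characters are right but swapped between streams. A sketch of both measures under that reading, with hypothetical names:

```python
def accuracies(p0: str, p1: str, r0: str, r1: str):
    """Return (exact, pairwise) accuracy of recovered texts r0, r1 against p0, p1.

    Assumed definitions (not confirmed by the slides):
      exact    - r0[i] == p0[i] and r1[i] == p1[i]
      pairwise - {r0[i], r1[i]} == {p0[i], p1[i]}, so swapped positions count too
    """
    n = min(len(p0), len(p1), len(r0), len(r1))
    if n == 0:
        return 0.0, 0.0
    exact = sum(r0[i] == p0[i] and r1[i] == p1[i] for i in range(n))
    pairwise = sum({r0[i], r1[i]} == {p0[i], p1[i]} for i in range(n))
    return exact / n, pairwise / n

# A recovery that switches streams halfway through.
exact, pairwise = accuracies("abcd", "wxyz", "abyz", "wxcd")
print(exact, pairwise)  # 0.5 1.0
```

Under this reading, the gap between the two columns measures how much of the error is due to switching rather than to wrong characters.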
SLIDE 83 Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC
SLIDE 84 Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC Bring me Cakes ⊕ doQvYcSWIPyXaC
SLIDE 85 Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Attack at Dawn doQvYcSWIPyXaC ⊕ Take the Beach Attack at Dawn ⊕
SLIDE 86 ⊕ Attack at Dawn doQvYcSWIPyXaC ⊕ Attack at Dawn ⊕ Bring me Cakes Bring me Cakes ⊕ doQvYcSWIPyXaC
SLIDE 87 Take the Beach ⊕ doQvYcSWIPyXaC ⊕ Take the Beach ⊕ Bring me Cakes Bring me Cakes ⊕ doQvYcSWIPyXaC
SLIDE 88 ⊕ ⊕ ⊕ Attack at Dawn Take the Beach Take the Beach Attack at Dawn Bring me Cakes Bring me Cakes
SLIDE 89 ⊕ Attack at Dawn Take the Beach
SLIDE 90 ⊕ Attack at Dawn Take the Beach A T
SLIDE 91 ⊕ Take the Beach Bring me Cakes ⊕ Attack at Dawn Take the Beach A T T B
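A sketch of the extra constraint that a third message under the same keystream provides: the three pairwise XORs are mutually dependent, and a guess for one plaintext immediately fixes the other two, so a single guess can be checked against two texts at once. The plaintexts are those from the slides; the helper names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p0 = b"Take the Beach"
p1 = b"Attack at Dawn"
p2 = b"Bring me Cakes"

# What the attacker can compute from three ciphertexts sharing one keystream.
x01 = xor_bytes(p0, p1)
x12 = xor_bytes(p1, p2)
x02 = xor_bytes(p0, p2)

# The third XOR is implied by the other two.
print(xor_bytes(x01, x12) == x02)  # True

def implied(guess_p1: bytes, x01: bytes, x12: bytes):
    """Given a guessed prefix of P1, return the prefixes of P0 and P2 it forces."""
    return xor_bytes(guess_p1, x01), xor_bytes(guess_p1, x12)

g0, g2 = implied(b"Attack", x01, x12)
print(g0, g2)  # b'Take t' b'Bring '
```

A wrong guess for P1 has to yield plausible text in both P0 and P2 to survive, which is a much stronger filter than a single pair offers.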
SLIDE 92
            Small
HTML        99.96%
E-mail      98.24%
Documents   69.92%
SLIDE 93
            Small    Medium
HTML        99.96%   99.95%
E-mail      98.24%   98.33%
Documents   69.92%   71.11%
SLIDE 94
            Small    Medium   Large
HTML        99.96%   99.95%   99.95%
E-mail      98.24%   98.33%   98.34%
Documents   69.92%   71.11%   69.39%
SLIDE 96
                Large
HTML            93.79%
E-mail ⊕ HTML   96.60%
E-mail          90.85%
SLIDE 97 Conclusions
- Able to recover plaintext with over 99% accuracy
SLIDE 98 Conclusions
- Able to recover plaintext with over 99% accuracy
- Technique works on different document types
SLIDE 99 Conclusions
- Able to recover plaintext with over 99% accuracy
- Technique works on different document types
- Keystream reuse is a real problem