Linguistic Steganography on Twitter: Hierarchical Language Modelling



slide-1
SLIDE 1

Linguistic Steganography on Twitter: Hierarchical Language Modelling with Manual Interaction

Alex Wilson, Phil Blunsom, and Andrew D. Ker University of Oxford

slide-2
SLIDE 2

Twitter

◮ Twitter is a social networking site, launched in 2006.
◮ Users post short messages (tweets), at most 140 characters long.
◮ 500M tweets are posted each day, from 200M active users.
◮ Twitter is a suitable setting because linguistic steganography generally requires the steganographer to act as the cover source.

slide-3
SLIDE 3

Twitter Steganography

◮ Alice has a Twitter account, and has posted some number of innocent tweets before starting to send steganographic messages.
◮ Bob shares a key with Alice, and has access to her tweets.
◮ We assume the Warden is human.

slide-4
SLIDE 4

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Filter by Hash Value; Alice]

slide-5
SLIDE 5

CoverTweet

Alice

Cover tweet:

gosh now I really don’t want my beard to go away

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Transformed tweets; Stego tweet; Payload; Key; Filter by Hash Value. Caption: Transforms the cover into multiple stego objects.]

slide-6
SLIDE 6

CoverTweet

Alice

Cover tweet:

gosh now I really don’t want my beard to go away

Transformed tweets:

gosh today i truly don’t want anything my beard to move away
gosh now i genuinely don’t want my beard to go away
god now i truly do not want my beard to go away
gosh today i really don’t want my beard to go away
...
gosh now I really don’t want to my barbe of going away
gosh now I genuinely just don’t wanna my beard to go away
gosh there, i really don’t wanna my beard to go away
gosh now I really don’t mean my beard to get away
gosh now I truly don’t want my beard of going away

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Transformed tweets; Stego tweet; Payload; Key; Filter by Hash Value. Caption: Transforms the cover into multiple stego objects.]

slide-7
SLIDE 7

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Stego tweets with correct payload; Filter by Hash Value; Alice. Caption: Assigns values to stego objects.]

gosh today i truly don’t want anything my beard to move away
gosh now i genuinely don’t want my beard to go away
god now i truly do not want my beard to go away
gosh today i really don’t want my beard to go away
...
gosh now I really don’t want to my barbe of going away
gosh now I genuinely just don’t wanna my beard to go away
gosh there, i really don’t wanna my beard to go away
gosh now I really don’t mean my beard to get away
gosh now I truly don’t want my beard of going away

slide-8
SLIDE 8

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Stego tweets with correct payload; Filter by Hash Value; Alice. Caption: Assigns values to stego objects.]

gosh today i truly don’t want anything my beard to move away   0100
gosh now i genuinely don’t want my beard to go away            0100
god now i truly do not want my beard to go away                1100
gosh today i really don’t want my beard to go away             0110
...
gosh now I really don’t want to my barbe of going away         0001
gosh now I genuinely just don’t wanna my beard to go away      0100
gosh there, i really don’t wanna my beard to go away           1101
gosh now I really don’t mean my beard to get away              0110
gosh now I truly don’t want my beard of going away             0100

slide-9
SLIDE 9

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Stego tweets with correct payload; Filter by Hash Value; Alice. Caption: Assigns values to stego objects.]

gosh today i truly don’t want anything my beard to move away   0100
gosh now i genuinely don’t want my beard to go away            0100
god now i truly do not want my beard to go away                1100
gosh today i really don’t want my beard to go away             0110
...
gosh now I really don’t want to my barbe of going away         0001
gosh now I genuinely just don’t wanna my beard to go away      0100
gosh there, i really don’t wanna my beard to go away           1101
gosh now I really don’t mean my beard to get away              0110
gosh now I truly don’t want my beard of going away             0100

Stego tweets with correct payload:

gosh today i truly don’t want anything my beard to move away   0100
gosh now i genuinely don’t want my beard to go away            0100
gosh now I genuinely just don’t wanna my beard to go away      0100
gosh now I truly don’t want my beard of going away             0100

slide-10
SLIDE 10

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Filter by Hash Value; Alice. Caption: Ranked using first distortion measure.]

gosh today i truly don’t want anything my beard to move away
gosh now i genuinely don’t want my beard to go away
gosh now I genuinely just don’t wanna my beard to go away
gosh now I truly don’t want my beard of going away

slide-11
SLIDE 11

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Stego tweet; Payload; Key; Filter by Hash Value; Alice. Caption: Ranked using first distortion measure.]

gosh today i truly don’t want anything my beard to move away
gosh now i genuinely don’t want my beard to go away
gosh now I genuinely just don’t wanna my beard to go away
gosh now I truly don’t want my beard of going away

Ranked using first distortion measure:

gosh now i genuinely don’t want my beard to go away
gosh now I genuinely just don’t wanna my beard to go away
gosh now I truly don’t want my beard of going away
gosh today i truly don’t want anything my beard to move away

slide-13
SLIDE 13

CoverTweet

[Diagram labels: Transformation (PPDB); First distortion measure; Second distortion measure; Cover tweet; Transformed tweets; Stego tweets with correct payload; Ranked using first distortion measure; Stego tweet; Payload; Key; Filter by Hash Value; Alice]

Cover tweet: gosh now I really don’t want my beard to go away
Stego tweet: gosh now i genuinely don’t want my beard to go away

slide-14
SLIDE 14

Statistical Machine Translation

◮ Model the probability that a stego sentence s is a translation of cover sentence c (Pr(s|c)).
◮ Bayes’ law:

$$\Pr(s \mid c) = \frac{\Pr(c \mid s)\,\Pr(s)}{\Pr(c)}$$
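
Since Pr(c) is the same for every candidate s, ranking candidates by Pr(s|c) only needs the product Pr(c|s) Pr(s). The sketch below illustrates this with invented probabilities; the numbers and the names `translation_prob` and `language_prob` are purely illustrative and do not come from the slides.

```python
import math

# Toy values (invented for illustration): Pr(c|s) from a translation
# model and Pr(s) from a language model, for three candidate sentences.
translation_prob = {"s1": 0.20, "s2": 0.05, "s3": 0.10}
language_prob = {"s1": 0.001, "s2": 0.010, "s3": 0.004}


def score(s: str) -> float:
    """Unnormalised log Pr(s|c) = log Pr(c|s) + log Pr(s); Pr(c) is dropped."""
    return math.log(translation_prob[s]) + math.log(language_prob[s])


# Higher score means a more probable translation of the cover.
ranked = sorted(translation_prob, key=score, reverse=True)

# Dividing by the sum over this candidate set stands in for Pr(c)
# and does not change the order.
total = sum(math.exp(score(s)) for s in ranked)
posterior = {s: math.exp(score(s)) / total for s in ranked}
```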

slide-17
SLIDE 17

Language Modelling

◮ Our stego sentence s is made up of words w_1, ..., w_T.

$$\Pr(w_1, \ldots, w_T) = \Pr(w_1) \prod_{i=2}^{T} \Pr(w_i \mid w_1, \ldots, w_{i-1}) \approx \Pr(w_1)\,\Pr(w_2 \mid w_1) \prod_{i=3}^{T} \Pr(w_i \mid w_{i-1}, w_{i-2})$$

◮ This is a 2nd order Markov model.
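
As a minimal sketch of the factorisation above, the function below sums log conditional probabilities, giving each word at most its two predecessors as context. The `cond_prob` callable and the toy `uniform` model are hypothetical stand-ins for a trained language model.

```python
import math
from typing import Callable, Sequence

# cond_prob(word, context) returns Pr(word | context), where context is a
# tuple of up to two preceding words, oldest first. (Hypothetical interface.)
CondProb = Callable[[str, tuple], float]


def sentence_log_prob(words: Sequence[str], cond_prob: CondProb) -> float:
    """log Pr(w_1..w_T) under the second-order Markov approximation."""
    total = 0.0
    for i, word in enumerate(words):
        # Use at most the two previous words as context.
        context = tuple(words[max(0, i - 2):i])
        total += math.log(cond_prob(word, context))
    return total


def uniform(word: str, context: tuple) -> float:
    """Toy model: every word has probability 1/4 regardless of context."""
    return 0.25


log_p = sentence_log_prob(["the", "cat", "sat"], uniform)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.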

slide-18
SLIDE 18

Language Modelling

◮ These probabilities are calculated using maximum likelihood estimation (MLE):

$$\Pr(\text{sat} \mid \text{the}, \text{cat}) = \frac{\operatorname{count}(\text{the cat sat})}{\operatorname{count}(\text{the cat})}$$

◮ Counts are gathered from large text corpora (here 72M tweets). In practice, the counts are smoothed to avoid probabilities of 0.
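
A small sketch of the counting estimate follows. The slides do not say which smoothing method is used; add-k smoothing is assumed here only because it is the simplest to show, and the three-sentence corpus is invented.

```python
from collections import Counter


def train_counts(sentences):
    """Count trigrams and their two-word contexts; also return the vocabulary."""
    trigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        for a, b, c in zip(words, words[1:], words[2:]):
            trigrams[(a, b, c)] += 1
            bigrams[(a, b)] += 1  # counted only where a third word follows
    return trigrams, bigrams, vocab


def trigram_prob(word, context, trigrams, bigrams, vocab, k=0.0):
    """Pr(word | context) with add-k smoothing; k=0 gives the plain MLE."""
    a, b = context
    numerator = trigrams[(a, b, word)] + k
    denominator = bigrams[(a, b)] + k * len(vocab)
    return numerator / denominator if denominator else 0.0


corpus = ["the cat sat", "the cat ran", "the cat sat down"]  # toy corpus
tri, bi, vocab = train_counts(corpus)
mle = trigram_prob("sat", ("the", "cat"), tri, bi, vocab)
unseen_mle = trigram_prob("down", ("the", "cat"), tri, bi, vocab)
unseen_smoothed = trigram_prob("down", ("the", "cat"), tri, bi, vocab, k=1.0)
```

With k = 0 the unseen trigram "the cat down" gets probability 0, which would make any sentence containing it impossible; with k > 0 it receives a small positive probability.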

slide-19
SLIDE 19

Alice’s Language Model

◮ What data can we use to train the language model?
◮ We need to train on cover data, of which we don’t have enough (a few hundred tweets from Alice).
◮ We do have a huge amount of other Twitter data (500M tweets per day!).
◮ This is the problem of language model adaptation.

slide-20
SLIDE 20

Alice’s Language Model

◮ We train a small model on Alice’s data, and a large model on general Twitter data.
◮ The probabilities from both models are then linearly interpolated. For example:

$$\Pr(w_3 \mid w_2, w_1) = (1 - \lambda)\,\Pr_A(w_3 \mid w_2, w_1) + \lambda\,\Pr_G(w_3 \mid w_2, w_1)$$
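
The interpolation can be illustrated with a few lines of code. The probabilities and the value of λ below are made up; the slides do not say how λ is chosen.

```python
def interpolate(p_alice: float, p_general: float, lam: float) -> float:
    """(1 - lam) * Pr_A + lam * Pr_G, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_alice + lam * p_general


# Toy numbers: Alice's small model has never seen this trigram,
# but the general Twitter model has.
p = interpolate(p_alice=0.0, p_general=0.02, lam=0.7)
```

A larger λ leans on the general model, which helps with trigrams Alice has never used; a smaller λ keeps the result closer to Alice's own style.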

slide-21
SLIDE 21

Linguistic Distortion Measure

$$D(c, s) = -\log \frac{\Pr(s \mid c)}{\Pr(c \mid c)}$$

0 ≤ D ≤ ∞
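
A minimal sketch of the distortion computation, reading the formula as the negative log of the ratio Pr(s|c) / Pr(c|c). The probabilities are invented. The lower bound D ≥ 0 relies on the assumption that no candidate is more probable than leaving the cover unchanged, that is, Pr(s|c) ≤ Pr(c|c).

```python
import math


def distortion(p_s_given_c: float, p_c_given_c: float) -> float:
    """D(c, s) = -log( Pr(s|c) / Pr(c|c) )."""
    if p_s_given_c == 0.0:
        return math.inf  # an impossible candidate is infinitely distorted
    return -math.log(p_s_given_c / p_c_given_c)


# Toy probabilities, invented for illustration.
p_identity = 0.4  # Pr(c|c): probability of leaving the cover unchanged
candidates = {"s1": 0.1, "s2": 0.4, "s3": 0.02}
ranked = sorted(candidates, key=lambda s: distortion(candidates[s], p_identity))
```

Sorting by ascending D puts the candidates closest to the cover first.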
slide-22
SLIDE 22

Cover:

I wish I was drinking a mojito right about now #keepingitreal

Possible stego tweets:

   1. i wish i was drinking a mojito law around now #keepingitreal  0.815
   2. i wish i was drink a mojito good about now #keepingitreal  1.229
   3. if only i used to be drinking a mojito right about now #keepingitreal  1.670
   4. i wish i was drinking a mojito right about far #keepingitreal  1.732
   5. i ’d like to be drinking a mojito right around now #keepingitreal  1.878
   ...
3000. i wish i went drinkable a mojito entitled around today #keepingitreal  18.199

slide-23
SLIDE 23

Secondary Distortion Measure: Human Interaction

◮ Language modelling isn’t good enough to guarantee that the option with lowest distortion is actually the best.
◮ Alice can choose the true best choice from the ranked stego objects given by the first distortion measure.
◮ What if no option is fluent?
  ◮ Alice can’t signal no payload.
  ◮ Recipient can’t tell when there are no good options.
  ◮ Alice will have to rewrite the tweet, or not use it.

slide-24
SLIDE 24

Evaluation

◮ Gathered 72M tweets, all posted in May 2013, from the Harvard TweetMap.
◮ Randomly selected 10 users with ‘typical’ characteristics:
  ◮ Average number of words per tweet (11)
  ◮ Average size of vocabulary per tweet (6)
  ◮ Sufficient tweets for training the LM (> 500)
◮ For these 10 users, we trained the LM on the majority of their tweets.

slide-25
SLIDE 25

Evaluation Data

Which of these have a hidden message?

1. i just want to get above and beyond the state of drunk
2. I want someone to come on long midnight walks please, that be perfect
3. in fact i just need a pet tortoise in my life
4. sneaking your favourite foods into the trolley when you go food shopping

slide-27
SLIDE 27

◮ 20 human judges were shown 80 innocent tweets from each user, followed by 20 unclassified tweets (10 stego, 10 innocent).
◮ The judges were asked: which of these are steganographic?

slide-28
SLIDE 28

Results

◮ Of 1000 steganographic tweets shown to judges, only 515 were correctly identified.
◮ Insufficient evidence to reject the null hypothesis that judges are guessing randomly.

[Bar chart: accuracy of stego tweet identification (axis 0% to 60%) per user, U1 to U10]

slide-29
SLIDE 29

Results

◮ Of 1000 steganographic tweets shown to judges, only 515 were correctly identified.
◮ Insufficient evidence to reject the null hypothesis that judges are guessing randomly.

[Bar chart: accuracy of stego tweet identification (axis 0% to 60%) per judge, J1 to J20]
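
One way to check the headline figure is an exact two-sided binomial test of 515 correct identifications out of 1000 against a guessing rate of 0.5. The slides do not state which test was used, and treating the 1000 judgements as independent is a simplifying assumption of this sketch.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for H0: success probability = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value
    sums the probabilities of all outcomes at least as far from
    trials / 2 as the observed count.
    """
    deviation = abs(successes - trials / 2)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= deviation
    )
    return extreme / 2 ** trials


p_value = binomial_two_sided_p(515, 1000)
reject_at_5_percent = p_value < 0.05
```

Under these assumptions the p-value is well above 0.05, which agrees with the slide's statement that random guessing cannot be rejected.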

slide-30
SLIDE 30

Summary

◮ Very secure against human judges!
◮ Embedded more bits per stego object than existing systems.
◮ More work needed on statistical detection methods.
◮ Synchronisation and coding need considerable work.