SLIDE 1

Towards Human Interactive Proofs in the Text-Domain

Richard Bergmair

University of Derby in Austria

and Stefan Katzenbeisser

Technische Universität München Institut für Informatik

Towards Human Interactive Proofs in the Text-Domain – p.1/29

SLIDE 2

Introduction & Prior Work

Many serious threats to Information Security rely on attacks that can only be carried out by computers, not by humans:

  • manipulation of online polls
  • bulk subscription to web-services
  • distribution of spam and worms
  • privacy infringement by unwanted data mining
  • denial-of-service attacks
  • dictionary attacks

SLIDE 3

Introduction & Prior Work

Moni Naor. Verification of a human in the loop or identification via the Turing test. Unpublished manuscript. http://www.wisdom.weizmann.ac.il/~naor/PAPERS/human.ps, 1997.

SLIDE 4

Introduction & Prior Work

Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. CAPTCHA: Using hard AI problems for security. In Advances in Cryptology, Eurocrypt 2003, May 2003.

SLIDE 6

Introduction & Prior Work

Luis von Ahn, Manuel Blum, and John Langford. Telling humans and computers apart automatically. Communications of the ACM, 47(2):56–60, 2004.

SLIDE 7

Introduction & Prior Work

Unpublished Abstract from First Workshop on Human Interactive Proofs, January 2002.

SLIDE 8

Sense Ambiguity

George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. Introduction to WordNet: An on-line lexical database. http://www.cogsci.princeton.edu/~wn/5papers.ps, August 1993.

SLIDE 9

Sense Ambiguity

  • It should move through several more drafts.
  • It should run through several more drafts.
  • It should go through several more drafts.
  • All articles must move through copy-editing.
  • All articles must run through copy-editing.
  • All articles must go through copy-editing.

syn(move) = {move, run, go} ?

SLIDE 10

Sense Ambiguity

  • That sermon will move people.
  • That sermon will impress people.
  • That sermon will strike people.
  • Your speech must move the audience.
  • Your speech must impress the audience.
  • Your speech must strike the audience.

syn(move) = {move, impress, strike} ?

SLIDE 11

Sense Ambiguity

Can we conclude that all these words are generally synonymous with move?

syn(move) = {move, run, go, impress, strike}

Unfortunately, we can’t.

SLIDE 12

Sense Ambiguity

  • It should move through several more drafts.
  • It should run through several more drafts.
  • It should go through several more drafts.

BUT

  • Your speech must move the audience.
  • *Your speech must run the audience.
  • *Your speech must go the audience.

SLIDE 13

Sense Ambiguity

  • That sermon will move people.
  • That sermon will impress people.
  • That sermon will strike people.

BUT

  • All articles must move through copy-editing.
  • *All articles must impress through copy-editing.
  • *All articles must strike through copy-editing.

SLIDE 14

Sense Ambiguity

George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. Introduction to WordNet: An on-line lexical database. http://www.cogsci.princeton.edu/~wn/5papers.ps, August 1993.

SLIDE 15

Sense Ambiguity

We cannot include a synset like

syn(move) = {move, run, go, impress, strike}

in a dictionary! All we can do is state that

syn(c1, move) = {move, run, go}
syn(c2, move) = {move, impress, strike}

for some linguistic contexts c1 ≠ c2.
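
The point can be sketched in a few lines of Python. This is only an illustration: the table `SYN` and the context labels `"c1"` and `"c2"` are hypothetical stand-ins, not entries from WordNet or any real lexicon.

```python
# Hypothetical hand-made table. The labels "c1" and "c2" stand for the two
# linguistic contexts used on the slides; they are illustrative assumptions.
SYN = {
    ("c1", "move"): {"move", "run", "go"},          # "move through drafts"
    ("c2", "move"): {"move", "impress", "strike"},  # "move the audience"
}


def syn(context, word):
    """Return the words that may replace `word` in `context`."""
    return SYN.get((context, word), {word})


def naive_syn(word):
    """Context-free union of all synsets: the set the slides reject."""
    merged = set()
    for (_, w), words in SYN.items():
        if w == word:
            merged |= words
    return merged
```

Here `syn("c1", "move")` never offers "impress", whereas `naive_syn("move")` does, which is exactly the over-generation shown by the starred sentences.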

SLIDE 16

Sense Ambiguity

Pick the sentences that are meaningful replacements of each other:

  • It should move through several more drafts.
  • It should run through several more drafts.
  • It should go through several more drafts.
  • It should impress through several more drafts.
  • It should strike through several more drafts.

syn(c1, move) = {move, run, go}, or
syn(c2, move) = {move, impress, strike} ?

SLIDE 17

Sense Ambiguity

The problem of automatic word-sense disambiguation has been under investigation in a computational context since the 1950s and is of central importance for

  • machine translation
  • text mining
  • spell checking
  • text classification
  • ...

SLIDE 18

Sense Ambiguity

Rada Mihalcea, Timothy Chklovski, and Adam Kilgarriff. The Senseval-3 English lexical sample task. In Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 25–28, Barcelona, Spain, July 2004.

SLIDE 19

Sense Ambiguity

We have introduced sense ambiguity making use of a function syn : C × W → 2^W that assigns to a word w ∈ W used in context c ∈ C the set s ⊂ W of all words that are correct replacements of w.

We have presented evidence to suggest that no machine can reproduce syn with high accuracy.

Humans can produce an annotation by hand-crafting a table of associations sa ⊂ syn, such that |sa| ≪ |syn|.

SLIDE 20

Lexical HIP

What do we need?

  • A public lexicon of words organized into sets of words that are synonymous in some linguistic context (like WordNet).
  • A corpus: a set of sentences that contain words also contained in multiple synsets of the dictionary.
  • An initially hand-crafted secret annotation sa that is a subset of syn.

SLIDE 21

Lexical HIP: Generation Phase

t1:
  • It should move through ... (c)
  • It should run through ... (c1 ∈ R(c))
  • It should go through ... (c2 ∈ R(c))
  • It should impress through ... (c3 ∈ Q(c))
  • It should strike through ... (c4 ∈ Q(c))

t2:
  • We’ll send your order ... (d)
  • We’ll ship your order ... (d1 ∈ R(d))
  • We’ll broadcast your order ... (d2 ∈ Q(d))
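
A rough Python sketch of how such a test could be generated follows. It rests on one reading of the slide: R(c) holds replacements drawn from the synset that the secret annotation marks as correct, and Q(c) holds replacements drawn from the word's other synsets. The names `LEXICON`, `SECRET_ANNOTATION` and `generate_test` are hypothetical, and the data is limited to the examples shown on the slide.

```python
import random

# Hypothetical public lexicon: every synset that contains each ambiguous word.
LEXICON = {
    "move": [{"move", "run", "go"}, {"move", "impress", "strike"}],
    "send": [{"send", "ship"}, {"send", "broadcast"}],
}

# Hypothetical secret annotation sa: (sentence template, word) -> index of the
# synset that is correct for that sentence.
SECRET_ANNOTATION = {
    ("It should {} through ...", "move"): 0,
    ("We'll {} your order ...", "send"): 0,
}


def generate_test(template, word, rng):
    """Build one test: the shuffled sentences plus the secret answer key."""
    synsets = LEXICON[word]
    correct = synsets[SECRET_ANNOTATION[(template, word)]]
    others = set().union(*synsets) - correct
    r = sorted(template.format(w) for w in correct - {word})  # R(c)
    q = sorted(template.format(w) for w in others)            # Q(c)
    original = template.format(word)                          # c
    shown = [original] + r + q
    rng.shuffle(shown)  # hide which sentences belong to R(c) or Q(c)
    key = {s: (s == original or s in r) for s in shown}
    return shown, key
```

Only `shown` would be sent to the testee; `key` stays on the server for the verification phase.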

SLIDE 22

Lexical HIP: Testing Phase

t1:
  • It should move through ... (c)
  • It should run through ... (c1 ∈ R(c))
  • It should go through ... (c2 ∈ R(c))
  • It should impress through ... (c3 ∈ Q(c))
  • It should strike through ... (c4 ∈ Q(c))

t2:
  • We’ll send your order ... (d)
  • We’ll ship your order ... (d1 ∈ R(d))
  • We’ll broadcast your order ... (d2 ∈ Q(d))

SLIDE 23

Lexical HIP: Verification Phase

t1:
  • It should move through ... (c) √
  • It should run through ... (c1 ∈ R(c)) ×
  • It should go through ... (c2 ∈ R(c)) √
  • It should impress through ... (c3 ∈ Q(c)) ×
  • It should strike through ... (c4 ∈ Q(c)) √

t2:
  • We’ll send your order ... (d) √
  • We’ll ship your order ... (d1 ∈ R(d)) √
  • We’ll broadcast your order ... (d2 ∈ Q(d)) √
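
One plausible way to score a response is sketched below in Python. The slides do not state a pass criterion, so the `threshold` parameter and the per-sentence agreement score are assumptions, and `verify` is a hypothetical name.

```python
def verify(key, selected, threshold=0.8):
    """Compare a testee's selection with the secret answer key.

    key       -- dict mapping each shown sentence to True (acceptable) or False
    selected  -- set of sentences the testee marked as acceptable
    threshold -- assumed pass mark; the slides do not specify one
    """
    if not key:
        raise ValueError("answer key must not be empty")
    # A sentence counts as correct when the testee's choice matches the key.
    marks = {s: ((s in selected) == ok) for s, ok in key.items()}
    score = sum(marks.values()) / len(marks)
    return score >= threshold, marks
```

The returned `marks` dictionary plays the role of the per-sentence √ and × column, and the boolean is the overall human-or-machine decision.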

SLIDE 24

Lexical HIP: Learning

We have to trust that sa is private at all times. If we hand-craft it once, it will soon lose this property, because whenever an association is used, it is in fact published to the testee and to the adversary. We therefore have to think of sa as a dynamic resource, where we have to

  • add new private associations
  • remove associations if they are published
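
The two maintenance operations can be modelled as a small pool, sketched here in Python. The class `SecretAnnotation` and the tuple representation of an association are hypothetical; how new associations are actually obtained is not covered by this sketch.

```python
class SecretAnnotation:
    """Toy model of sa as a pool of private associations."""

    def __init__(self, associations):
        self._private = set(associations)  # not yet shown to anyone
        self._published = set()            # already used in some test

    def add(self, association):
        """Add a new private association, unless it was already published."""
        if association not in self._published:
            self._private.add(association)

    def use(self, association):
        """Use an association in a test; this publishes it, so retire it."""
        self._private.remove(association)  # raises KeyError if unavailable
        self._published.add(association)
        return association

    def available(self):
        """Return the associations that are still private."""
        return frozenset(self._private)
```

Retiring on first use is the most conservative policy; a deployment might tolerate limited reuse, but that trade-off is outside what the slides state.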

SLIDE 25

Lexical HIP: Learning Phase

t2:
  • We’ll send your order ... (c) √
  • We’ll ship your order ... (c1 ∈ R(c)) √
  • We’ll broadcast your order ... (c2 ∈ Q(c)) √
  • We’ll cough your order ... (d ∈ P(c)) √
  • We’ll take your order ... (e ∈ P(c)) ?
  • We’ll accept your order ... (e1 ∈ Q(e)) ?
  • We’ll hire your order ... (e1 ∈ Q(e)) ?

SLIDE 26

Conclusions

In this contribution we have

  • shown that the construction of text-based HIPs might in fact be possible.
  • demonstrated word-sense ambiguity as a promising security primitive to build upon.
  • presented the details of a construction automatically distinguishing computers and humans.

SLIDE 27

Conclusions

  • The construction is NOT a CAPTCHA in the sense of a facility that does not rely on any private resources but a randomness source.
  • HOWEVER we demonstrated that the security problems that arise from the use of a private database can be overcome by a learning approach.
  • Details of such a learning construction were given.

SLIDE 28

Future Research

  • We have pointed out the relevance of natural language semantics, and natural language learning to the construction of secure text-based HIPs.
  • Since lexical methods provide only for the tip of the linguistic iceberg, we believe it will be fruitful to investigate the application of other methods as well, perhaps grammatical or ontological in nature.
