Something about audio CAPTCHAs (Elie Bursztein, Romain Bauxis, et al.): PowerPoint presentation



SLIDE 1

Something about audio CAPTCHAs

Elie Bursztein, Romain Bauxis, Daniele Perito, Hristo Paskov, Celine Fabry, John Mitchell


SLIDE 2

Elie Bursztein (@elie) The Failure of Noise-Based Non-Continuous Audio Captchas http://ly.tl/p18

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

CAPTCHAS

SLIDE 7

CAPTCHAS

SLIDE 8

CAPTCHAS

SLIDE 9

CAPTCHAS

SLIDE 10

CAPTCHAS

SLIDE 11

CAPTCHAS

SLIDE 12

CAPTCHAS

SLIDE 13

Audio captchas

SLIDE 14

Audio captchas

SLIDE 15

Outline

  • Audio captchas background
  • Breaking audio captchas
  • Evaluation results
  • Demo

SLIDE 16

Creating audio captcha

[Diagram: Captcha Maker, Super secure captcha]

SLIDE 17

Creating audio captcha

[Diagram: Captcha Maker, Super secure captcha; step: Voices]

SLIDE 18

Creating audio captcha

[Diagram: Captcha Maker, Super secure captcha; step: Noises]

SLIDE 19

Creating audio captcha

[Diagram: Super secure captcha]

SLIDE 20

Types of noise

  • Additive noise, e.g. white noise
  • Convolutive noise, e.g. echo
  • Semantic noise, e.g. music
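The first two noise types differ in how they combine with the voice. Below is a minimal sketch in Python with signals as plain lists of float samples; the function names and the single-echo impulse response are illustrative assumptions, not taken from the talk. Additive noise is summed with the signal sample by sample, while convolutive noise such as echo is the signal convolved with an impulse response.

```python
import random


def add_white_noise(signal, noise_std, seed=0):
    """Additive noise: y[n] = x[n] + w[n], with w[n] drawn from N(0, noise_std^2)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def convolve(signal, impulse_response):
    """Full linear convolution y = x * h (output length len(x) + len(h) - 1)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def echo_impulse_response(delay_samples, decay):
    """Hypothetical echo model: direct path plus one delayed, attenuated copy."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be >= 1")
    h = [0.0] * (delay_samples + 1)
    h[0] = 1.0
    h[delay_samples] = decay
    return h


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    # The echoed impulse shows the direct sound and its attenuated copy.
    print(convolve(impulse, echo_impulse_response(2, 0.5)))
```

Applying echo to a single impulse makes the convolutive nature visible: the output is the impulse response itself, padded to the full convolution length.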

SLIDE 21

Noise intensity (RMS/SNR)

[Figure: sample spoken characters (2, 9, J, A, K, 5, H) from the Microsoft, Digg, and Authorize captchas]
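The slide measures noise intensity by RMS amplitude and SNR. A minimal sketch, assuming the usual amplitude-ratio definition SNR_dB = 20 log10(RMS_signal / RMS_noise); the helper names are hypothetical and the 440 Hz tone merely stands in for a spoken character. It shows how noise can be scaled so that a mix hits a chosen SNR.

```python
import math
import random


def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in decibels, computed from the RMS amplitudes of signal and noise."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so signal/noise has the requested SNR, then add it."""
    gain = rms(signal) / (rms(noise) * 10 ** (target_snr_db / 20.0))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]
    mixed, scaled = mix_at_snr(tone, noise, 5.0)
    print(round(snr_db(tone, scaled), 6))  # 5.0
```

Lower target SNR values give louder noise relative to the voice; negative values mean the noise is louder than the speech.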

SLIDE 22

Sound representation

[Figure: one sound shown in six representations: WAV (raw waveform), DFT, Cep (cepstrum), TFR, TCR, and TDC]
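Two of these representations can be sketched directly. The code below computes the DFT of one frame and its real cepstrum (the inverse DFT of the log magnitude spectrum). It uses a naive O(N^2) DFT in pure Python for readability; a real pipeline would use an FFT library. The function names and the log floor are assumptions, and the TFR, TCR, and TDC representations are not attempted because the slide does not define them.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    `floor` avoids log(0) for empty frequency bins.
    """
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [c.real for c in idft(log_mag)]


if __name__ == "__main__":
    n = 64
    # A cosine completing 8 cycles per frame peaks in DFT bin 8.
    frame = [math.cos(2 * math.pi * 8 * t / n) for t in range(n)]
    mags = [abs(c) for c in dft(frame)]
    print(max(range(n // 2), key=lambda k: mags[k]))  # 8
```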

SLIDE 23

Breaking audio captchas

SLIDE 24

Solving an audio captcha

SLIDE 25

Solving an audio captcha

SLIDE 26

Solving an audio captcha

SLIDE 27

Solving an audio captcha

SLIDE 28

Solving an audio captcha

SLIDE 29

Solving an audio captcha

C

SLIDE 30

Solving an audio captcha

C T T A R A F R S 2

SLIDE 31

Dealing with random noise

  • Statistical learning
  • Supervised learning
  • RLS (Regularized Least Squares) classifier

[Figure: samples of "5" and "J" from the Authorize, eBay, Recaptcha, and Digg captchas]
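To make "RLS classifier" concrete, here is a minimal sketch of a linear, one-vs-all regularized least squares classifier in pure Python. Assumptions: the model is linear (no kernel), the bias term is regularized along with the weights for simplicity, feature extraction from audio is out of scope, and the 2-D training points are made up. This is not Decaptcha's implementation. For each class c it solves the closed form (XᵀX + λI) w_c = Xᵀy_c, with y_c = +1 for samples of class c and -1 otherwise, and predicts the class with the highest score.

```python
def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    `a` is n x n and `b` is n x m (lists of lists); neither is modified.
    """
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


class RLSClassifier:
    """One-vs-all linear regularized least squares (illustrative sketch)."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def fit(self, features, labels):
        self.classes = sorted(set(labels))
        # Append a constant 1 so each class also learns a bias term.
        x = [list(f) + [1.0] for f in features]
        d = len(x[0])
        # Targets: +1 for the sample's own class, -1 for every other class.
        y = [[1.0 if lab == c else -1.0 for c in self.classes] for lab in labels]
        # Normal equations: (X^T X + lam I) W = X^T Y.
        xtx = [[sum(row[i] * row[j] for row in x) + (self.lam if i == j else 0.0)
                for j in range(d)] for i in range(d)]
        xty = [[sum(x[s][i] * y[s][c] for s in range(len(x)))
                for c in range(len(self.classes))] for i in range(d)]
        self.weights = solve(xtx, xty)
        return self

    def predict(self, feature):
        x = list(feature) + [1.0]
        scores = [sum(xi * self.weights[i][c] for i, xi in enumerate(x))
                  for c in range(len(self.classes))]
        return self.classes[max(range(len(scores)), key=scores.__getitem__)]


if __name__ == "__main__":
    # Toy feature vectors for three characters (made-up data).
    train = [([0.0, 0.1], "5"), ([0.1, 0.0], "5"),
             ([1.0, 1.1], "J"), ([1.1, 0.9], "J"),
             ([2.0, 0.1], "K"), ([1.9, 0.0], "K")]
    clf = RLSClassifier(lam=0.1).fit([f for f, _ in train], [l for _, l in train])
    print(clf.predict([0.05, 0.05]), clf.predict([1.05, 1.0]), clf.predict([2.0, 0.0]))
```

The closed-form solution is one reason RLS is attractive here: training reduces to a single regularized linear solve, and λ trades fit against robustness to noisy samples.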

SLIDE 32

Solver efficiency

Solver accuracy = Coverage × Precision^length

  • Coverage: segmentation rate
  • Precision: per-character recognition rate
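The formula can be evaluated directly. This small sketch (the function name is hypothetical) shows how quickly accuracy decays with captcha length. The exponent treats each character's recognition as independent, which is a modeling assumption of the formula.

```python
def solver_accuracy(coverage, precision, length):
    """Expected fraction of captchas solved: Coverage * Precision**length.

    coverage:  fraction of captchas segmented correctly (0..1)
    precision: per-character recognition rate (0..1)
    length:    number of characters per captcha
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= precision <= 1.0):
        raise ValueError("coverage and precision must be in [0, 1]")
    return coverage * precision ** length


if __name__ == "__main__":
    for length in (5, 8, 10):
        print(length, round(solver_accuracy(1.0, 0.9, length), 3))
```

For example, with perfect segmentation and 90% per-character precision, a 10-character captcha is solved only about 35% of the time (0.9^10 ≈ 0.349).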

SLIDE 33

Decaptcha

SLIDE 34

Decaptcha overview

[Diagram: captchas are scraped from the web site and passed through sound processing, yielding discretized and segmented captchas; Mechanical Turk users provide captcha labels; both feed the classifier, which outputs the answers]
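The "discretized and segmented captcha" step can be illustrated with a simple energy threshold. This is a minimal sketch, not Decaptcha's actual segmentation: the audio is cut into fixed windows, windows whose RMS exceeds a threshold count as active, and runs of active windows become candidate character segments. The window size, threshold, and gap length are hypothetical parameters. This kind of split relies on characters being separated in time, as "non-continuous" in the title suggests.

```python
import math


def segment_by_energy(samples, rate, window_ms=10, threshold=0.1, min_gap_windows=3):
    """Return (start, end) sample indices (end exclusive) of loud regions.

    Active windows separated by fewer than `min_gap_windows` quiet windows
    are merged into one segment.
    """
    win = max(1, rate * window_ms // 1000)
    active = []
    for start in range(0, len(samples), win):
        chunk = samples[start:start + win]
        energy = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        active.append(energy > threshold)

    segments = []
    seg_start = None
    quiet = 0
    for i, is_active in enumerate(active):
        if is_active:
            if seg_start is None:
                seg_start = i
            quiet = 0
        elif seg_start is not None:
            quiet += 1
            if quiet >= min_gap_windows:
                # The last active window was i - quiet; close the segment after it.
                segments.append((seg_start * win, (i - quiet + 1) * win))
                seg_start, quiet = None, 0
    if seg_start is not None:
        segments.append((seg_start * win, min(len(samples), (len(active) - quiet) * win)))
    return segments


if __name__ == "__main__":
    rate = 8000
    silence = [0.0] * (rate // 5)  # 0.2 s
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(3 * rate // 10)]  # 0.3 s
    signal = silence + tone + silence + tone + silence
    print(segment_by_energy(signal, rate))  # [(1600, 4000), (5600, 8000)]
```

A fixed threshold like this is exactly what constant or regular background noise is meant to defeat, so a practical segmenter would need to adapt it to the noise level.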

SLIDE 35

Testing corpus

Family, name, and description of each noise:

Constant noise
  • white: White Gaussian noise.
  • buzz: Sine waves at 700 Hz, 2100 Hz and 3500 Hz.
Regular noise
  • pow: 10 ms bursts of white Gaussian noise repeated every 100 ms.
  • rnoise: Every 100 ms, a section of the signal is replaced by white noise of the same RMS amplitude.
  • lofi: Adds distortion, cracks, bandwidth limiting and compression. Simulates old audio equipment.
  • echo: The signal starts to echo at 0.6, 1.32, and 1.92 seconds.
  • disintegrator: Amplifies random half-cycles of the signal by a multiplier. Simulates a bad audio channel.
Semantic noise
  • chopin: Chopin Polonaise for Piano No. 6, Op. 53.
  • gregorian: Gregorian chant.
  • nina: "Just in time" by Nina Simone.

Table III
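Two of the synthetic noises in the table are simple enough to sketch from their descriptions alone. The amplitudes, equal weighting of the three buzz tones, and the random seed are assumptions the table does not specify, and the function names are hypothetical. Note that the 3500 Hz buzz component requires a sample rate above 7000 Hz to be represented without aliasing.

```python
import math
import random


def buzz(num_samples, rate, amplitude=0.1):
    """'buzz': sum of sine waves at 700, 2100 and 3500 Hz (equal amplitudes assumed)."""
    freqs = (700, 2100, 3500)
    return [amplitude * sum(math.sin(2 * math.pi * f * t / rate) for f in freqs)
            for t in range(num_samples)]


def pow_noise(num_samples, rate, std=0.1, burst_ms=10, period_ms=100, seed=0):
    """'pow': 10 ms bursts of white Gaussian noise repeated every 100 ms."""
    rng = random.Random(seed)
    burst = rate * burst_ms // 1000
    period = rate * period_ms // 1000
    # Noise only during the first `burst` samples of each period, silence otherwise.
    return [rng.gauss(0.0, std) if t % period < burst else 0.0
            for t in range(num_samples)]


if __name__ == "__main__":
    rate = 8000
    p = pow_noise(rate, rate)
    print(len(buzz(rate, rate)), sum(1 for v in p if v != 0.0))
```

Either signal can then be added to a spoken captcha at a chosen SNR to build a test corpus like the one described.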

SLIDE 36

Synthetic evaluation

[Figure: per-captcha precision (%) against SNR (dB, -5 to 20) for the white, buzz, gregorian, nina, chopin, and pow noises, with echo, lofi, rnoise, and disintegrator shown together]

SLIDE 37

Semantic noise

SLIDE 38

Captcha features

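  • Authorize: length 5, female voice, background noise none, intermediate noise none, charset 0-9a-z, avg. duration 5.0 s, no beep
  • Digg: length 5, female voice, background noise constant (random), intermediate noise none, charset a-z, avg. duration 6.8 s, no beep
  • eBay: length 6, various voices, background noise constant (random), intermediate noise regular (speech), charset 0-9, avg. duration 4.4 s, no beep
  • Microsoft: length 10, various voices, background noise constant (random), intermediate noise regular (speech), charset 0-9, avg. duration 7.1 s, no beep
  • Recaptcha: length 8, various voices, background noise constant (random), intermediate noise regular (speech), charset 0-9, avg. duration 25.3 s, no beep
  • Yahoo: length 7, child voice, background noise none, intermediate noise regular (speech), charset 0-9, avg. duration 18.0 s, beep

Sample rate (Hz): 8000 8000 8000 8000 8000 8000 22050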

SLIDE 39

Results

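  • Authorize: length 5, coverage 100%, per-digit precision 97%, per-captcha precision 89.2%
  • Digg: length 5, coverage 100%, per-digit precision 76%, per-captcha precision 41.4%
  • eBay: length 6, coverage 85.6%, per-digit precision 92.5%, per-captcha precision 82.9%
  • Microsoft: length 10, coverage 80.6%, per-digit precision 89.6%, per-captcha precision 48.9%
  • Recaptcha: length 8, coverage 99.9%, per-digit precision 40.5%, per-captcha precision 1.5%
  • Yahoo: length 7, coverage 99.1%, per-digit precision 74.7%, per-captcha precision 45.4%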

SLIDE 40

Recaptcha semantic noise

[Figure: a Recaptcha audio captcha with semantic noise, plotted as level (dB, 10 to 70) against time; segments labeled 3, 7, N, 2, 1, 4, 9, N, 5]

SLIDE 41

Recaptcha semantic noise

SLIDE 42

Confusion matrices

SLIDE 43

How many captchas do you need?

[Figure: per-captcha precision (%) against corpus size (in digits, log scale from 10^2 to 10^4) for Authorize, Digg, eBay, MSLive, Recaptcha, and Yahoo]

SLIDE 44

Conclusion

  • Noise-based, non-continuous audio captchas are broken
  • There is an urgent need to come up with the next generation of audio captchas

SLIDE 45

Questions?

Thanks http://ly.tl/p18

Twitter: @elie