SLIDE 1

Reverse Engineering CAPTCHAs WCRE 2008

Reverse Engineering CAPTCHAs

Abram Hindle, Michael W. Godfrey, Richard C. Holt

Software Architecture Group
David R. Cheriton School of Computer Science
University of Waterloo, Canada
http://swag.uwaterloo.ca/

{ahindle,migod,holt}@cs.uwaterloo.ca

Abram Hindle 1

SLIDE 7

Motivation

  • How can we solve that CAPTCHA?
  • How was a CAPTCHA made?

SLIDE 8

Why Reverse Engineer?

  • If we can reverse engineer a CAPTCHA, we can:

    – leverage its weaknesses
    – re-implement the CAPTCHA

      ∗ The more we understand, the easier it is to defeat
      ∗ We can solve by cloning

SLIDE 10

CAPTCHA Properties

SLIDE 12

Common Properties

  • Readable: the captcha must be easily read and decoded by humans.
  • Unguessable: the captcha message cannot be guessed at random with any real confidence.
  • Order-able: characters are read left to right, top to bottom (exceptions could include Hebrew or Arabic captchas). If a captcha is readable, its character ordering should be apparent.
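
To make the "unguessable" property concrete, here is a small sketch of the arithmetic. It assumes each character is drawn independently and uniformly from a fixed alphabet, which is an assumption for illustration and not something the slides state. The alphabet sizes and message lengths are hypothetical.

```python
from fractions import Fraction


def random_guess_probability(alphabet_size: int, length: int) -> Fraction:
    """Chance that a single uniformly random guess matches the captcha message.

    Assumes every character is chosen independently and uniformly.
    """
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length >= 0")
    return Fraction(1, alphabet_size ** length)


# Hypothetical examples: 4 digits versus 6 case-insensitive alphanumerics.
four_digits = random_guess_probability(10, 4)
six_alnum = random_guess_probability(36, 6)
print(four_digits)        # 1/10000
print(float(six_alnum))   # a much smaller probability
```

Under these assumptions, a longer message or a larger alphabet lowers the chance of a blind guess exponentially in the message length.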

SLIDE 13

Bitmap fonts and placement

SLIDE 14

Backgrounds

SLIDE 15

Noise

SLIDE 16

Linear Transformations

SLIDE 17

Non-Linear Transformations

SLIDE 18

Dripping and Fuzzy Text

SLIDE 19

CAPTCHA Breaking

SLIDE 21

Layering

SLIDE 31

Text Pixel Identification and Image Cleanup

SLIDE 32

Erosion and Dilation
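
As a minimal sketch of the two operations named on this slide, the code below applies binary erosion and dilation with a 3x3 square structuring element to a 0/1 mask. The structuring element, the treatment of out-of-bounds pixels as background, and the sample mask are all assumptions for illustration; the slides do not say which variant the authors used.

```python
from typing import Iterator, List

Mask = List[List[int]]  # rows of 0 (background) / 1 (text) pixels


def _neighbourhood(mask: Mask, r: int, c: int) -> Iterator[int]:
    """Yield the 3x3 neighbourhood around (r, c); out-of-bounds counts as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if its whole 3x3 neighbourhood is set."""
    return [[1 if all(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel in its 3x3 neighbourhood is set."""
    return [[1 if any(_neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


# A 3x3 blob of "text" plus one isolated noise pixel in the corner.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
opened = dilate(erode(noisy))  # erosion followed by dilation ("opening")
for row in opened:
    print("".join("#" if v else "." for v in row))
```

Eroding first removes the isolated pixel, and the following dilation restores the blob to its original size, which is why this pairing is a common way to remove speckle noise.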

SLIDE 33

Thresholding
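
A minimal sketch of global thresholding, assuming a grayscale image stored as rows of 0-255 intensities with dark text on a light background. The cutoff of 128 and the sample pixels are hypothetical; a real breaker would probably tune the cutoff per captcha.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 intensities


def threshold(image: Gray, cutoff: int = 128) -> List[List[int]]:
    """Return a binary mask: 1 where a pixel is darker than cutoff, else 0."""
    return [[1 if pixel < cutoff else 0 for pixel in row] for row in image]


# Hypothetical 3x4 image: light background with a few dark "text" pixels.
sample = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 249, 35, 247],
]
mask = threshold(sample, cutoff=128)
for row in mask:
    print("".join("#" if v else "." for v in row))
```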

SLIDE 34

Edge Detection

SLIDE 35

Segmentation

SLIDE 36

Weight Segmentation

SLIDE 37

Box Segmenter

SLIDE 38

Shrinking and K-Means segmentation
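
The slide gives only the technique's name, so the following is one plausible reading and not necessarily the authors' method: cluster the x-coordinates of text pixels with one-dimensional k-means, so that each cluster approximates one character. The function name, the evenly spaced initial centres, and the sample coordinates are assumptions.

```python
from typing import List, Tuple


def kmeans_1d(xs: List[float], k: int,
              max_iterations: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D values into k groups; returns (centres, clusters)."""
    lo, hi = min(xs), max(xs)
    # Deterministic start: centres spread evenly across the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    clusters: List[List[float]] = [[] for _ in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centre.
        updated = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres, clusters


# Hypothetical x-coordinates of text pixels from three separated characters.
xs = [1, 2, 2, 3, 10, 11, 12, 20, 21, 21, 22]
centres, clusters = kmeans_1d(xs, k=3)
print(centres)
```

This requires knowing the number of characters in advance, which is reasonable for captchas with a fixed message length.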

SLIDE 39

Flood Fill
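
A minimal sketch of flood-fill segmentation, assuming a binary mask in which characters do not touch: each 4-connected region of text pixels is collected as one segment, and segments are ordered left to right. The connectivity choice and the sample mask are assumptions for illustration.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Pixel = Tuple[int, int]  # (row, column)


def flood_fill_segments(mask: Mask) -> List[List[Pixel]]:
    """Return 4-connected components of 1-pixels, ordered left to right."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    segments: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first fill starting from this unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            component: List[Pixel] = []
            while queue:
                cr, cc = queue.popleft()
                component.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            segments.append(component)
    # Reading order: sort by each segment's leftmost column.
    segments.sort(key=lambda comp: min(col for _, col in comp))
    return segments


mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
]
segments = flood_fill_segments(mask)
print(len(segments))
```

Overlapping or broken characters defeat this approach, since a fill then merges two characters or splits one.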

SLIDE 40

Normalization and Character Matching

SLIDE 41

PCA of A

SLIDE 42

PCA of F

SLIDE 43

Skeletonization

SLIDE 44

CAPTCHA Solving

  • Character Database
  • Normalization of Characters

– PCA etc.

  • Matching

– Nearest Neighbor
– Shape Matching
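
The matching step can be sketched as a nearest-neighbour lookup against a character database. This sketch assumes glyphs have already been segmented and normalized to a fixed-size binary grid, and it uses Hamming distance as the similarity measure; the distance metric, the function names, and the tiny 3x3 database are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Glyph = List[List[int]]  # fixed-size binary grid


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixel positions where two same-sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def nearest_label(glyph: Glyph,
                  database: Dict[str, List[Glyph]]) -> Tuple[Optional[str], Optional[int]]:
    """Return the label of the closest stored template and its distance."""
    best_label: Optional[str] = None
    best_distance: Optional[int] = None
    for label, templates in database.items():
        for template in templates:
            distance = hamming(glyph, template)
            if best_distance is None or distance < best_distance:
                best_label, best_distance = label, distance
    return best_label, best_distance


# Hypothetical 3x3 character database with one template per label.
database = {
    "I": [[[0, 1, 0], [0, 1, 0], [0, 1, 0]]],
    "L": [[[1, 0, 0], [1, 0, 0], [1, 1, 1]]],
    "T": [[[1, 1, 1], [0, 1, 0], [0, 1, 0]]],
}
noisy_t = [[1, 1, 1], [0, 1, 0], [0, 1, 1]]  # a "T" with one stray pixel
print(nearest_label(noisy_t, database))
```

Storing several templates per label, as the dictionary allows, lets the database cover the different distortions a captcha applies to the same character.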

SLIDE 45

Piratebay Database

SLIDE 46

Digg Database

SLIDE 47

Reverse Engineering

  • Layering
  • Background
  • Noise
  • Text
  • Transforms

SLIDE 49

Captcha Solving Summary

  • Image Clean Up
  • Text Pixel Identification
  • Segmentation
  • Character Matching

– Normalization

SLIDE 50

Solving by Cloning

  • Reverse Engineer captcha
  • Preprocess the captcha
  • Parameterize
  • Generate candidates

– Search through the captchas
– Find best match
– Repeat

SLIDE 51

Watercap demo

  • Provided with a captcha of “WCREWCRE” and the code to generate such captchas
  • Algorithm

    – For each column, we iterate through each character,

      ∗ generating a captcha for each prefix and character,

        · keeping the best match.
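
The greedy search described above can be sketched as follows. The real Watercap generator is not shown in the slides, so `toy_render` is a hypothetical stand-in that deterministically draws each character as three pixel columns with a position-dependent shift. The sketch treats each character position as a "column" and compares candidates by pixel mismatch; the alphabet and the mismatch measure are also assumptions.

```python
from typing import List, Tuple

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
GLYPH_WIDTH = 3  # columns per character in the toy renderer
HEIGHT = 5       # pixels per column

Column = Tuple[int, ...]


def toy_render(text: str) -> List[Column]:
    """Hypothetical captcha generator: returns the image as a list of columns."""
    columns: List[Column] = []
    for position, char in enumerate(text):
        code = ord(char) - ord("A") + 1            # 1..26, fits in 5 bits
        bits = [(code >> b) & 1 for b in range(HEIGHT)]
        for k in range(GLYPH_WIDTH):
            shift = (position + k) % HEIGHT        # toy position-dependent transform
            columns.append(tuple(bits[shift:] + bits[:shift]))
    return columns


def mismatch(candidate: List[Column], target: List[Column]) -> int:
    """Count differing pixels over the columns the candidate covers."""
    return sum(a != b
               for cand_col, target_col in zip(candidate, target)
               for a, b in zip(cand_col, target_col))


def solve_by_cloning(target: List[Column], length: int) -> str:
    """Extend the prefix one character at a time, keeping the best match."""
    prefix = ""
    for _ in range(length):
        best_char = min(ALPHABET,
                        key=lambda ch: mismatch(toy_render(prefix + ch), target))
        prefix += best_char
    return prefix


target = toy_render("WCREWCRE")
print(solve_by_cloning(target, 8))
```

The search renders at most 26 candidates per position instead of enumerating every possible 8-character string, which is what makes cloning practical once the generator is known.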

SLIDE 52

CAPTCHA Example    Accuracy
Digg               30%
PHPBB              99%
Piratebay          61%
Watercap           27% / 93%
Rogers             95%

Minimum accuracy of our captcha breakers

SLIDE 53

How to improve captcha implementations

  • Non-linear transformations
  • Non-flood-fillable letters
  • Use more characters
  • Limit captcha access
  • Text similar to the background
  • Non-continuous and overlapping characters

SLIDE 54

Ethics

  • Spammers
  • Visually Impaired
  • Poor security
  • Options:

– Telephone Confirmation
– Credit Cards
– Web of trust

SLIDE 55

Reverse Engineering Lessons

  • RE can be interpretive
  • Some outputs have properties that allow us to reverse engineer the software that created them

    – In this case, 2D image generation has many common patterns

  • Absence of code still allows RE

SLIDE 56

Future Work

  • Better Breakers
  • Layer recognition
  • Audio captchas

SLIDE 57

Conclusion

  • Reverse engineering captchas highlights techniques that have weaknesses.
  • Captcha generation follows certain patterns which are recoverable and exploitable.
  • Captchas have been defeated

    – Even “good” captchas from Microsoft, Yahoo and Google have been defeated.
