
SLIDE 1

CAPTCHA

I am Andreas Charalampous, April 2020
CS682 - Advanced Security Topics, Instructor: Elias Athanasopoulos

SLIDE 2

Contents

  • 1. Introduction to Captcha
  • 2. Paper 1: Re: Captchas – Understanding Captcha-Solving Services in an economic context
  • 3. Paper 2: I am Robot: (DEEP) Learning to break Semantic Image Captchas
SLIDE 3
  • 1. Introduction to Captcha
  • i. Motivation
  • ii. Definition
  • iii. Type of Captcha Challenges
  • iv. reCaptcha

SLIDE 4

Motivation

  • Using bots, attackers can commit fraud at scale.
  • Fake registrations – creating multiple accounts automatically.
  • Comment/posting spam.
  • Automated purchase of tickets.
  • Any open resource has to be guarded.
  • A defense mechanism is needed to distinguish computers from humans: let humans in and keep spammers out of resources.

SLIDE 5

Definition of Captcha

  • Captcha: Completely Automated Public Turing test to tell Computers and Humans Apart.
  • Reverse Turing Test.
  • Term coined by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford in 2003.
  • Captchas protect open Web Resources from being exploited at scale.
  • Challenge-Response to determine whether the user is human or not.
  • A Captcha challenge must simultaneously be hard for bots and easy for humans to solve.
  • Approximately 10 seconds for a human to solve a typical Captcha.
SLIDE 6

Type of Captcha Challenges

  • The first version of Captcha (v.1) was the “twisted text”, created in 1997.
  • Earliest commercial use by idrive.com and Paypal, in 2002 and 2001 respectively.

  • Math problems captchas.
  • Audio captchas.
  • Picture captchas.
SLIDE 7

Type of Captcha Challenges

Examples: Advertisement Captcha, Game Captcha, SlideLock Captcha, Drag-And-Drop Captcha, Trivial Captcha

SLIDE 8

reCaptcha

  • Developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison in May 2007.
  • It was acquired by Google in September 2009.
  • Used for the digitization of The New York Times archives and books from Google Books.

  • Two of the reCaptcha challenges are image identification and distorted-text identification.

SLIDE 9

No Captcha ReCaptcha

  • Developed in 2014.
  • Consists of a checkbox that the user is simply asked to click.
  • Performs behavioral analysis in the browser to predict whether the user is human.
  • Easier for humans.
  • “Harder” for bots.
SLIDE 10

Evolution and Variety in Captchas

  • Captchas have been evolving for more than 20 years and will keep doing so. Many different kinds of captcha challenges exist.
  • They are continually improved to become easier for users and more difficult for bots.
  • They should provide accessibility to impaired users.
  • Captchas keep being bypassed by automation software or solver services, creating an arms race between solvers and providers.

SLIDE 11

SLIDE 12
  • 2. Paper 1: Re: Captchas – Understanding Captcha-Solving Services in an economic context
  • i. Introduction
  • ii. What is examined in this paper
  • iii. Automated Software Solvers
  • iv. Human Solver Services
  • v. Conclusion
SLIDE 13
Introduction

  • Captchas attached value to the problem of solving them, creating an industrial market in which captcha providers and solvers compete.
  • Providers come up against two types of solvers:
  • Automated solving technology.
  • Real-time human labor.
  • Captchas are evaluated in economic terms.

SLIDE 14

What is examined in the paper?

  • How this new market works:
  • Service quality relative to price.
  • Solving capacity of the market leaders.
  • Details about solving services.
  • How the two categories of solvers work:
  • Automated solving:
  • How it evolved.
  • How the arms race favors the providers (defenders).
  • Human labor:
  • Why it surpassed automated solving.
  • How its cost dropped significantly.
  • Which Captchas are targeted most.
SLIDE 15

To support the study further

  • Interviewed Mr. E., owner of a successful CAPTCHA-solving service, who provided validation of and insight into the underlying business processes.
  • Studied the whole market, from every aspect and viewpoint.
  • Purchased solving services from both categories and tested them.
  • Became part of the human labor pool.
SLIDE 16

Automated Software Solvers

  • Use segmentation algorithms and Optical Character Recognition (OCR).
  • Complex.
  • Fail to replicate human accuracy.
  • Advantages:
  • Near-zero cost: the only cost is in creating the solver.
  • Near-infinite capacity.
  • Tested: Xrumer and reCaptchaOCR.
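The segment-then-recognize pipeline these solvers rely on can be sketched in miniature. The bitmap, the tiny template alphabet, and the exact-match rule below are toy assumptions for illustration, not Xrumer's actual algorithm:

```python
# Toy sketch of an OCR-style solver: split the bitmap into glyphs at
# blank columns, then match each glyph against known templates.
# (Real solvers use far more robust segmentation and recognition.)

# 3-row glyph templates for a tiny, hypothetical alphabet.
TEMPLATES = {
    "I": ((1,), (1,), (1,)),
    "L": ((1, 0), (1, 0), (1, 1)),
}

def segment(bitmap):
    """Split a row-major 0/1 bitmap into glyphs at all-blank columns."""
    width = len(bitmap[0])
    cols_used = [any(row[x] for row in bitmap) for x in range(width)]
    glyphs, start = [], None
    for x, used in enumerate(cols_used + [False]):
        if used and start is None:
            start = x
        elif not used and start is not None:
            glyphs.append(tuple(tuple(row[start:x]) for row in bitmap))
            start = None
    return glyphs

def recognize(bitmap):
    out = []
    for glyph in segment(bitmap):
        for char, tmpl in TEMPLATES.items():
            if glyph == tmpl:
                out.append(char)
                break
        else:
            out.append("?")  # unrecognized glyph
    return "".join(out)

# "IL" rendered side by side, separated by one blank column.
captcha = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]
print(recognize(captcha))  # IL
```

The blank-column segmentation step is exactly what noise and character overlap (as in phpBB's GD captchas) are designed to break.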
SLIDE 17

Xrumer

  • Software for spamming, mostly forums and comment sections.
  • Integrated support for bypassing many different anti-spam mechanisms, including Captchas.
  • Available since 2006; in 2010 it cost $540. The authors purchased it for evaluation.
  • In 2008 it was capable of solving the Captchas of major message boards.
SLIDE 18

Xrumer Tests

  • Tested on a netbook with a 1.6 GHz Intel Atom processor.
  • On all but one of the captchas it scored 100% accuracy, requiring 1 second or less per Captcha.
  • Only on phpBB, which uses the GD Captcha generator and foreground noise, did it score 35% accuracy, requiring 6-7 seconds per captcha.
  • Even though these scores are impressive, a couple of months later these captchas were updated, defeating Xrumer.

SLIDE 19

reCaptchaOCR

  • Created in December 2009.
  • Focused on reCaptcha.
  • Developed to defeat early 2008 reCaptchas.
  • Was able to defeat late 2009 reCaptchas.
  • Early 2010 reCaptcha was updated and

reCaptchaOCR was unable to defeat it.

Samples: (a) Early 2008, (b) Late 2009, (c) Early 2010

SLIDE 20

reCaptchaOCR Tests

  • Tested on a netbook with a 2.13 GHz Intel Core 2 Duo processor.
  • Uses iterations to improve accuracy.
  • With 613 iterations:
  • 100 (a) captchas scoring 30%.
  • 100 (b) captchas scoring 18%.
  • Average 105 seconds per challenge.
  • With 75 iterations:
  • 100 (a) captchas scoring 29%.
  • 100 (b) captchas scoring 17%.
  • Average 12 seconds per challenge.
SLIDE 21

Conclusion

  • Arms races traditionally favor the attacker, but here the attackers face the more challenging recognition problem, while providers can be agile.
  • The economics of automated solving are driven by several factors:
  • Cost of developing new solvers.
  • Accuracy of those solvers.
  • Responsiveness of the sites whose captchas are attacked.
SLIDE 22

Human Solver Services

  • Instead of using automated solving software, the captcha workload is given to humans to solve:
  • Opportunistically.
  • On a “for hire” basis.
SLIDE 23

Opportunistic Solving

  • An individual solves a Captcha as part of some other task.
  • An attacker controlling a popular website might use its visitors to solve third-party Captchas by presenting them as the visitor’s own challenge.
  • Did not play a major role in the market.
SLIDE 24

Paid Solving

  • The core of the CAPTCHA-solving ecosystem.
  • Services pay individuals to solve captchas.
  • Price is quoted as $X/1000, where X is the amount paid for solving 1000 Captchas.
  • An advertisement in 2006 was looking for a full-time CAPTCHA solver for $10/1000.
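Under the $X/1000 model a worker's earnings follow directly from rate and shift length. The solving rate and shift length below are illustrative assumptions of mine, not figures from the paper:

```python
# Back-of-the-envelope earnings under the $X/1000 pricing model.
# 10 captchas/minute over an 8-hour shift is an assumed workload.
def daily_earnings(price_per_1000, captchas_per_minute, hours):
    solved = captchas_per_minute * 60 * hours   # captchas per shift
    return solved * price_per_1000 / 1000

print(daily_earnings(10.0, 10, 8))   # 48.0 -> $48/day at the 2006 rate of $10/1000
print(daily_earnings(0.75, 10, 8))   # 3.6  -> $3.60/day at the 2010 rate
```

The hundredfold gap between the two figures is the wage collapse the next slides document.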

SLIDE 25

Screenshots of solving-service sites: DeCaptcher (decaptcher.com) and PixProfit – workers all around the world

SLIDE 26

Paid Solving Evolution

  • From 2007 to 2010 the market expanded while wages declined:
  • 2007: $10/1000.
  • Mid-2008: $1.5/1000.
  • Mid-2009: $1/1000.
  • 2010: $0.75/1000 – $0.5/1000.
  • Solving is an unskilled activity.
  • Services preferred labor from Eastern Europe, Bangladesh, China, India, and Vietnam.
  • Competition drove wages down even further.
SLIDE 27

Solver Service Quality

  • Evaluated 8 paid services:
  • Antigate https://anti-captcha.com/
  • BeatCaptchas https://beatcaptchas.com.cutestat.com/
  • BypassCaptcha http://bypasscaptcha.com/
  • CaptchaBot http://www.captchabot.com/
  • CaptchaBypass – Ceased Operation during evaluation
  • CaptchaGateway – Ceased Operation during evaluation
  • DeCaptcher https://de-captcher.com/
  • ImageToText – Ceased Operation
  • Based on:
  • 1. Customer Interface
  • 2. Solution Accuracy
  • 3. Response Time
  • 4. Capacity
  • 5. Load and Availability

SLIDE 28

Verifying Results

  • For each captcha, the most frequent solution across solvers is used.
  • If several solutions are tied for most frequent, the captcha is counted as incorrectly answered.
  • Heuristic evaluation:
  • 1025 randomly selected captchas that had at least one solution were checked manually.
  • 1009 correct.
  • 16 incorrect.
  • 6 of them because of character similarities (zero vs. letter O (0 – o), six vs. letter b (6 – b)).
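The plurality-vote heuristic above can be sketched as follows; treating a tie for first place as "no consensus" is my reading of the slide:

```python
# Plurality vote over the answers returned by the different services.
from collections import Counter

def plurality_solution(answers):
    counts = Counter(a.lower().strip() for a in answers if a)
    if not counts:
        return None
    ranked = counts.most_common()
    # A tie between the two most frequent answers means no consensus.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

print(plurality_solution(["wa8xr", "wa8xr", "waBxr"]))  # wa8xr
print(plurality_solution(["a", "b"]))                   # None
```

Note the lowercasing: it absorbs case disagreements between services but cannot fix the 0/o and 6/b confusions mentioned above.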
SLIDE 29

Customer Account Creation

  • All of them required prepayment.
  • Antigate and DeCaptcher offer bidding systems for higher-priority access when load is high.
  • For most services, account registration is accomplished via Web and email.
  • Some of them presented obstacles during registration:
  • CaptchaBot and Antigate required third-party invitation codes.
  • Antigate guards against Western users and required the name of the (Russian) Prime Minister in Cyrillic.
  • Some, like ImageToText, required a live phone call.
SLIDE 30

Evaluation Details

  • Tested as customers for about five months using captchas from 25 popular sites, including PayPal, eBay, and Google.
  • Submitted a single Captcha every five minutes to all services, recording the submission time.

SLIDE 31
  • 1. Customer Interface
  • Most provide an API package for uploading Captchas and receiving results.
  • Two ways of interacting with the services:
  • The API performs one HTTP POST that uploads the image and waits for the result in the HTTP response: BeatCaptchas, BypassCaptcha, CaptchaBypass and CaptchaBot.
  • The API performs one HTTP POST to upload the image, receives an image ID in the HTTP response, and polls the site for the solution using that ID: Antigate, CaptchaGateway, ImageToText.
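The two interaction styles can be sketched against an in-memory stand-in for a service backend; the class, method names, and canned answer are all hypothetical, chosen only to contrast blocking vs. upload-then-poll:

```python
# Sketch of the two customer-API styles the paper observed, using an
# in-memory fake service instead of real HTTP endpoints.
import itertools

class FakeService:
    """Stands in for a solving service's backend."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._solved = {}

    # Style 1 (BeatCaptchas, BypassCaptcha, ...): one blocking call
    # that returns the solution directly in the response.
    def solve_blocking(self, image_bytes):
        return "wa8xr"  # canned answer for the sketch

    # Style 2 (Antigate, ImageToText, ...): upload returns an ID,
    # and the customer polls for the result using that ID.
    def upload(self, image_bytes):
        job_id = next(self._ids)
        self._solved[job_id] = "wa8xr"
        return job_id

    def poll(self, job_id):
        return self._solved.get(job_id)  # None while still unsolved

svc = FakeService()
print(svc.solve_blocking(b"..."))  # wa8xr
job = svc.upload(b"...")
print(svc.poll(job))               # wa8xr
```

The polling style lets a service decouple upload throughput from its human workers' solving speed, which matters for the capacity results later in the deck.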

SLIDE 32
  • 2. Solution Accuracy

Error rate for each combination of service and CAPTCHA type

SLIDE 33
  • 2. Solution Accuracy

Median error rate for all services (left); median error rate for all CAPTCHAs (right)

SLIDE 34
  • 3. Response Time

Median response time for every service (left); median response time for all Captchas (right)

SLIDE 35
  • 3. Response Time

Response time for each combination of service and CAPTCHA type

SLIDE 36
  • 4. Capacity
  • Number of captchas solved in a given time.
  • The load was increased until the service became overloaded.
  • Antigate had the best capacity, 27 to 41 captchas per second:
  • 1,536 threads submitting, with the bid set to $3/1000.
  • Very low rejection rate.
  • Around 400-500 workers served their requests; the actual number may be larger.
  • DeCaptcher and CaptchaBypass sustained 14-15 captchas per second.
  • BeatCaptchas sustained 8 and BypassCaptcha 4 captchas per second.
SLIDE 37
  • 5. Load and Availability
  • Customers can poll services for load reports.
  • Examined how workers are affected by load.

Load per hour reported by Antigate (Left) and DeCaptcher (Right)

SLIDE 38

Workforce

  • Examine solving services from the solver’s (worker’s) perspective.
  • Draw demographic conclusions about the solvers, such as their origin.
  • Evaluate the solvers’ adaptability.
  • Identify the most targeted sites.
  • Tested two job sites:
  • Kolotibablo http://kolotibablo.com/
  • PixProfit http://pixprofit.com/
SLIDE 39

Worker Interface

  • First, an account must be created.
  • Web-based interface.
SLIDE 40

Worker Wages

  • PixProfit: $1/1000.
  • Kolotibablo: $0.5/1000 – $0.75/1000.
  • A daily list of the top 100 solvers is provided:
  • Average payout for 1 December 2009: $106.31.
  • Average payout for 1 January 2010: $47.32.
SLIDE 41

Geolocating Workers

  • Crafted captchas using words from various languages to reveal the geographic demographics of solvers.
  • The captchas showed numbers written in different languages.
  • Instructions were in the same language.
  • Language varieties:
  • Prevalent Web-native languages: English, Chinese, Hindi.
  • Regions with low-cost labor markets: India, China, Latin America.
  • Developed regions: Western Europe.
  • Synthetic language: Klingon.
SLIDE 42

Geolocating Workers

Accuracy of each service on different language captchas

SLIDE 43

Adaptability

  • Examined how services and solvers adapt to changes.
  • Sent image captchas where solvers had to identify cats and dogs.
  • Sent one captcha every 3 minutes to all services, for 12 days.
  • ImageToText averaged 39.9% success.
  • BeatCaptchas averaged 20.4% success.
  • The rest had success rates below 7%.

Error Rate of ImageToText on image captchas

SLIDE 44

Targeted Sites

  • Identified the sites targeted through Kolotibablo and PixProfit:
  • 1. Over 82 days, gathered 25K and 28K captchas from Kolotibablo and PixProfit respectively.
  • 2. Grouped them by image dimensions.
  • 3. Manually tried to identify sites using the same dimensions.
SLIDE 45

Conclusion

  • The nature of captchas made them easy to outsource to the global unskilled labor market.
  • There is a whole, highly competitive business market for solving captchas, following different models.
  • Do Captchas work?
  • 1. Telling computers and humans apart: Succeeded.
  • 2. Preventing automated site access: Failed.
  • 3. Limiting automated site access: Debatable.

SLIDE 46

SLIDE 47
  • 3. Paper 2: I am Robot: (DEEP) Learning to break Semantic Image Captchas
  • i. What is examined in this paper?
  • ii. reCaptcha Analyzed
  • iii. System Overview
  • iv. Automated Solving Image reCaptcha
  • v. Influencing Advanced Risk Analysis System
  • vi. Guidelines and Countermeasures
SLIDE 48

What is examined in this paper?

  • Explores Google’s Advanced Risk Analysis System (ARAS), used in the latest version of reCaptcha:
  • How it works.
  • Its flaws.
  • Methods to influence it.
  • Design of a novel low-cost attack on the image reCaptcha using deep learning technologies.
  • Introduces new safeguards and modifications for preventing the manipulation of ARAS and mitigating attacks on the image reCaptcha.

SLIDE 49

reCaptcha Analyzed

  • ReCaptcha is the most widely used captcha service.
  • 200 million reCaptchas are solved every day.
  • Many captchas deter valid users from visiting a website.
  • Automated solvers are less lucrative than human solvers.
  • The motivation is to make challenges easier for valid users and at the same time harder for frauds, human or automated.
  • Advanced Risk Analysis System:
  • Acquires user information from Google tracking cookies and the browser, even when the user is not logged in or is in incognito mode.
  • Based on the above, ARAS gives the user an easy challenge, a hard one, or no challenge at all.
SLIDE 50

reCaptcha Workflow

  • A site that protects resources with reCaptcha contains a reCaptcha widget.
  • The widget collects information about the user’s browser and checks for automation kits.
  • 1. The user clicks on the checkbox, and a request is sent to Google containing:
  • Referrer
  • Sitekey
  • Cookie
  • Information gathered by the widget.
  • 2. The above are checked by ARAS, and an HTML frame containing the corresponding challenge is sent back.

reCaptcha checkbox

SLIDE 51

reCaptcha Workflow (cont.)

  • 3. When the checkbox is clicked, the HTML field recaptcha-token is populated with a token.
  • If the user is legitimate, the token is marked valid by Google.
  • If not, it remains invalid until the user solves a challenge.
  • 4. The token is then submitted to the site.
  • 5. The website sends a verification request to Google.
  • 6. Google sends a response: a JSON object with a boolean field indicating whether the verification succeeded.
  • If the verification fails, error codes offer more information.
  • The solution must be provided within 55 seconds.
  • If not, the user clicks on the checkbox again to get a new challenge.
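The verification step (5-6) corresponds to Google's siteverify endpoint; a server-side check might look like the sketch below. The secret is a placeholder, and the canned JSON at the end mirrors the response format so the parsing can be shown without a network call:

```python
# Sketch of the step-5/6 verification round trip against Google's
# siteverify endpoint. verify_token is defined but not called here.
import json
from urllib import parse, request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_token(secret, token):
    data = parse.urlencode({"secret": secret, "response": token}).encode()
    with request.urlopen(VERIFY_URL, data=data) as resp:
        result = json.load(resp)
    # On failure, "error-codes" explains why.
    return result.get("success", False), result.get("error-codes", [])

# Parsing a canned failure response of the kind step 6 returns:
canned = json.loads('{"success": false, "error-codes": ["invalid-input-response"]}')
print(canned["success"])  # False
```

The boolean `success` field is exactly the value the protected site acts on; everything else in the response is diagnostic.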
SLIDE 52

reCaptcha Challenges

Challenge types: 1. No captcha reCaptcha, 2. Image reCaptcha, 3a. Scanned words, 3b. Street view numbers, 3c. Distorted one-word, 3d. Distorted two-word, 3e. Fallback captcha

SLIDE 53

System Overview

  • Built on Selenium, specifically the Mozilla WebDriver (Mozilla Firefox v.36):
  • Functionality for locating specific HTML DOM elements.
  • Features for executing JavaScript.
  • Controllers for handling keyboard and mouse events.
  • Easy saving and loading of browser cookies.
  • Has two main components:
  • 1. Cookie Manager.
  • 2. Recaptcha Breaker.
SLIDE 54

Cookie Manager

  • Each cookie receives up to 8 checkbox captchas per day.
  • Around 63,000 cookies per day are required.
  • Cookies are automatically created and trained on a virtual machine, so that they are viewed as belonging to a real user.
  • The system is configured to perform specific human-like actions:
  • Mimicking a diurnal cycle, with random resting intervals between actions.
  • Searching Google for certain terms and following links.
  • Opening videos on YouTube.
SLIDE 55

Recaptcha Breaker

  • Uses the cookies from the cookie manager.
  • 1. Visits sites that employ reCaptchas.
  • 2. The system locates the checkbox element through recaptcha-anchor and performs a mouse-click action.
  • 3. In case of a checkbox challenge, the recaptcha-token is extracted.
  • In case of an image captcha, it is passed to another module.
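The breaker's control flow can be sketched with a stub standing in for Selenium's WebDriver. The element IDs are the ones named in the slides; the stub driver and token value are made up for illustration:

```python
# Control flow of the Recaptcha Breaker, with a fake driver in place
# of Selenium so the sketch runs without a browser.
class StubDriver:
    def __init__(self, page):
        self.page = page  # maps element id -> element

    def find_element_by_id(self, element_id):
        return self.page[element_id]

def break_checkbox(driver):
    # Step 2: locate the checkbox anchor and click it.
    anchor = driver.find_element_by_id("recaptcha-anchor")
    anchor["clicked"] = True
    # Step 3: for a checkbox challenge, extract the token; an image
    # challenge would instead be handed to the image-breaking module.
    token = driver.find_element_by_id("recaptcha-token")
    return token or None

driver = StubDriver({
    "recaptcha-anchor": {"clicked": False},
    "recaptcha-token": "fake-token-value",
})
print(break_checkbox(driver))  # fake-token-value
```

With real Selenium the same flow uses the driver's element-location and mouse-event APIs listed on the System Overview slide.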
SLIDE 56

Breaking the image captcha

  • For an image challenge, a popup is created in the goog-bubble-content element. Inside the popup is an iframe with the challenge.
  • To identify the challenge, the system looks for:
  • rc-imageselect: image captchas.
  • rc-defaultchallenge-response-field: text captchas.
  • For an image captcha:
  • Hint: rc-imageselect-desc.
  • Candidate images: rc-imageselect-tile.
  • Verification button: recaptcha-verify-button.
SLIDE 57

Image Tags

  • Goal: using an Image Annotation Module, get tags for the candidate images that match the given hint.
  • All extracted images are passed to an Image Annotation Module:
  • Clarifai: 20 tags with confidence scores.
  • Alchemy: up to 8 tags with confidence scores.
  • TDL: 8 tags with confidence scores.
  • NeuralTalk: free-form description.
  • Caffe: 10 labels – 5 with a high score, 5 more specific with a lower score.
  • Also took advantage of Google Reverse Image Search (GRIS) to obtain descriptions and page titles; if found, a better-quality version of the image is also obtained.
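Matching the returned tags against the hint might be sketched as follows; the tag list and the confidence threshold are assumptions of mine, not the paper's values:

```python
# Sketch of hint matching over annotation-module output, which arrives
# as (label, confidence) pairs.
def matches_hint(tags, hint, threshold=0.8):
    """tags: list of (label, confidence) from an annotation module."""
    hint = hint.lower()
    return any(label.lower() == hint and conf >= threshold
               for label, conf in tags)

example_tags = [("cat", 0.97), ("pet", 0.91), ("fur", 0.62)]
print(matches_hint(example_tags, "cat"))   # True
print(matches_hint(example_tags, "wine"))  # False
```

Exact string equality is the simplest possible rule; the tag classifier on the next slide exists precisely because tags often describe the right content without literally containing the hint word.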

SLIDE 58

Tag Classifier

  • Implemented a tag classifier, which allows the system to select images with similar content in case the tags do not match the hint.
  • The classifier guesses the content using a subset of the given tags.
SLIDE 59

History Module

  • Many images in captchas are repeated.
  • A labelled dataset is created containing images and their tags.
  • Each image’s hint is stored in hint_list.
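A minimal sketch of such a history module, assuming storage keyed by an MD5 hash of the image bytes (the field names follow the slides; the storage scheme is an assumption):

```python
# Labelled dataset: image hash -> accumulated tags and hints seen.
import hashlib

labelled = {}  # md5(image) -> {"tags": set, "hint_list": set}

def remember(image_bytes, tags, hint):
    key = hashlib.md5(image_bytes).hexdigest()
    entry = labelled.setdefault(key, {"tags": set(), "hint_list": set()})
    entry["tags"].update(tags)
    entry["hint_list"].add(hint)

def lookup(image_bytes):
    return labelled.get(hashlib.md5(image_bytes).hexdigest())

remember(b"fake-image-bytes", {"cat", "pet"}, "cat")
print(lookup(b"fake-image-bytes")["hint_list"])  # {'cat'}
```

Because challenges are drawn from a small pool (see the Image Repetition slide), even a modest dataset like this yields frequent hits during a live attack.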
SLIDE 60

Automated Solving Image reCaptcha

  • Each candidate image is assigned to one of 3 sets: Select, Discard, Undecided.
  • Initially all candidate images are placed in Undecided.
  • 1. If a hint is not provided, the sample image is searched for in the labelled dataset to obtain one.
  • 2. Information about all images is collected from GRIS.
  • 3. Every candidate image is searched for in the labelled dataset of the history module:
  • If found and its tags match the hint, the candidate image is placed in the Select set.
  • Otherwise the hint_list is checked, and on a match the candidate image is placed in the Discard set.

SLIDE 61

Automated Solving Image reCaptcha (cont.)

  • 4. Image annotation processes all images and assigns tags:
  • If a tag matches the hint, the image is added to Select.
  • If it matches one of the tags in the hint_list, it is added to Discard.
  • 5. The system picks images from the Select set; if there are not enough, it also picks from Undecided.
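The set logic of steps 3-5 can be sketched as follows. This is a simplified rendering (history lookup with an annotation fallback, and the three sets); the toy history entries and annotator are made up:

```python
# Simplified select/discard/undecided classification for one challenge.
def classify(candidates, hint, history, annotate):
    select, discard, undecided = [], [], []
    for img in candidates:
        entry = history.get(img)
        tags = entry["tags"] if entry else set(annotate(img))
        if hint in tags:       # tags match the hint -> select
            select.append(img)
        elif entry and tags:   # known image of a different category -> discard
            discard.append(img)
        else:                  # nothing conclusive -> undecided
            undecided.append(img)
    return select, discard, undecided

history = {"img1": {"tags": {"cat"}}, "img2": {"tags": {"dog"}}}
annotate = lambda img: {"tree"} if img == "img3" else set()
print(classify(["img1", "img2", "img3", "img4"], "cat", history, annotate))
# (['img1'], ['img2'], ['img3', 'img4'])
```

Picking from Undecided when Select is short mirrors step 5: with only 2-3 correct images per challenge, a near-miss guess is often still good enough to pass.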
SLIDE 62

Influencing Advanced Risk Analysis System

  • Different approaches to influence ARAS:
  • Browsing History
  • Google Account Usage
  • Geolocation
  • Browser Checks:
  • 1. Automation
  • 2. User-Agent
  • 3. Screen Resolution
  • 4. Mouse
  • 5. Cookie Reputation
  • 6. Site restriction
  • 7. Token Harvesting
SLIDE 63

ARAS Influence Evaluation – Browsing History

  • Quantify the minimum amount of browsing history needed in order to get a checkbox captcha.
  • Multiple network connection setups.
  • Tor connections, with exit nodes in the USA.
  • Result: ARAS is neutralized if the appended cookie is 9 days old, regardless of the network connection.
  • Even without any browsing activity.
SLIDE 64

ARAS Influence Evaluation – Google Account Usage

  • Tried various accounts, with different settings:
  • Without phone verification.
  • With a verified phone.
  • With an alternative email from another provider.
  • Result: with an account, no matter the settings, after 60 days ARAS gives a checkbox captcha.
  • Conclusion: it is easier not to use an account at all.
SLIDE 65

ARAS Influence Evaluation – Cookie Geolocation

  • Used Tor to create cookies from different regions.
  • Result: no restriction on the location of the IP address at cookie creation.
SLIDE 66

ARAS Influence Evaluation – Browser Checks

  • Automation:
  • WebDriver sets the webdriver attribute to TRUE if automation is detected.
  • Manually set the attribute to TRUE using JavaScript.
  • Result: no difference, checkbox captcha provided.
  • Screen Resolution:
  • Tried various resolutions, from 1x1 to 4096x2160.
  • Result: no difference, checkbox captcha provided.
SLIDE 67

ARAS Influence Evaluation – Browser Checks

  • User-Agent:
  • The User-Agent is compared against the canvas fingerprint for validity.
SLIDE 68

ARAS Influence Evaluation – Browser Checks

  • Mouse: tried different behaviors to check whether ARAS is affected:
  • Timing of movements.
  • Erratic movement patterns.
  • Multiple clicks in the widget and checkbox.
  • Used the getElementById().click() JavaScript call to simulate clicking without hovering.
  • Result: none of the above had a negative effect.
SLIDE 69

ARAS Influence Evaluation – Browser Checks

  • Token Harvesting:
  • Tested whether creating a large number of cookies from a single IP is prohibited.
  • Result: 63,000 cookies per day without getting blocked. The only restriction was hit when triggering concurrent requests.
  • Selling harvested tokens at $2/1000 could bring in $104-110 daily, or even more with multiple attacks.

SLIDE 70

ARAS Influence Evaluation – Maximum Number of Checkboxes

  • Identify how many checkbox captchas can be solved in a day without being blocked.

Checkbox captchas obtained per minute

SLIDE 71

ARAS Influence Evaluation – Overall Evaluation

  • ReCaptcha suffers from significant flaws and omissions.
  • In an attempt to remove the burden from legitimate users, attacks were enabled.
  • The checks performed can be used to introduce more safeguards.
SLIDE 72

Automated Solving Image reCaptcha – Evaluation

  • The image-captcha breaking is evaluated on these aspects:
  • Solution flexibility: how many wrong answers are allowed.
  • Image repetition: at what rate challenges/images are repeated.
  • Live attack: real attack results.
SLIDE 73

Automated Solving Image reCaptcha – Solution Flexibility

  • Manually solved image challenges using different combinations of correct and wrong selections.
  • 74% of the image captchas had 2 correct images out of 9 candidates; the rest had 3-4.
  • Based on these results, the system was set to select 3 images.

Combinations of correct and wrong answers that pass the image reCaptcha

SLIDE 74

Automated Solving Image reCaptcha – Image Repetition

  • Searched for challenges with identical MD5 values.
  • Out of 700 captchas, found 6 pairs of identical challenges – on 2 different sites within two hours.
  • Conclusion: challenges are not created on the fly, but drawn from a small pool.
  • Searched images using perceptual hashes:
  • Identified 1368 identical images in total.
  • 358 distinct repeating images.
  • The most repeated image was found 92 times.
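Exact-duplicate detection by MD5, as in the first experiment, can be sketched as follows (the perceptual-hash search for near-duplicates is omitted; the image bytes are toy data):

```python
# Group images by MD5 digest; any group larger than one is a set of
# exact duplicates.
import hashlib
from collections import defaultdict

def find_duplicates(images):
    """images: mapping of name -> raw bytes."""
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.md5(data).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]

images = {"a.jpg": b"\x01\x02", "b.jpg": b"\x01\x02", "c.jpg": b"\x03"}
print(find_duplicates(images))  # [['a.jpg', 'b.jpg']]
```

MD5 only catches byte-identical files, which is why the perceptual-hash pass was needed to find the re-encoded and resized repeats.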
SLIDE 75

Automated Solving Image reCaptcha – Attack Simulation

Accuracy of the simulated attack for different combinations of modules against the image reCaptcha

SLIDE 76

Automated Solving Image reCaptcha – Live Attack

  • Used Clarifai.
  • Labelled dataset: manually labelled 3000 images from challenges with a tag from the hint_list.
  • Ran the attack on 2235 captchas, scoring 70.78% accuracy.
  • Better results because of repetition (1515 sample images and 385 candidate images were found in the labelled dataset).
  • Also found 4 pairs of identical challenges: Google doesn’t remove challenges even after they are completed correctly.

SLIDE 77

Live Attack – Time and Hint Repetition

Cumulative distribution of time required for each step; frequency and success rate for each type of hint

SLIDE 78

Facebook Captcha Attack

  • Facebook uses captchas to prevent bots from sending suspicious URLs and spam.
  • It resizes images dynamically in HTML, allowing access to high-resolution versions.
  • Challenges may have 2 to 10 correct images, 5-7 in most cases.

SLIDE 79

Facebook Captcha Attack

Attack accuracy against Facebook’s image captcha

SLIDE 80

Guidelines and Countermeasures

  • Token Auctioning:
  • The token-verification API has an optional field comparing the IP address of the user who solved the challenge with the one that submitted the token. It should be mandatory.
  • Risk Analysis:
  • Account:
  • Users that are not logged in should have to solve the hardest challenge.
  • Limit the number of tokens per IP address.
  • Cookie Reputation:
  • The number of cookies that can be created within a time period should be regulated.
  • Browser Checks:
  • Take a stricter approach and return no challenge if a browser is overtly suspicious.
  • E.g. a browser/user-agent mismatch.
SLIDE 81

Guidelines and Countermeasures (cont.)

  • Image captcha attacks:
  • Remove flexibility.
  • Increase the number and widen the range of correct images.
  • Repetition:
  • When a challenge is shown, it should be removed from the pool.
  • The pool of challenges should be larger.
  • Hint and Content:
  • The hint should be removed.
  • Providers can run experiments to find image categories that are problematic for image-annotation software.
  • Content homogeneity:
  • Populate challenges with filler images of the same category as the solutions.
SLIDE 82

Guidelines and Countermeasures (cont.)

  • Advanced Semantic Relations:
  • Instead of similar objects, the user could be asked to select semantically related objects.
  • Adversarial Images:
  • By altering a small number of pixels, images are misclassified while remaining visually the same.
  • Introducing noise:
  • Experiment with a random grid, with varying parameters.
  • The grid reduces the probability of retrieving higher-resolution images.
  • Images would have to be cleaned first, adding computational cost.
SLIDE 83

SLIDE 84