
SLIDE 1

EPL682 - PAPERS

Re: CAPTCHAs – Understanding CAPTCHA-Solving Services in an Economic Context

I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs

Antreas Dionysiou, Department of Computer Science, University of Cyprus

February 2019

SLIDE 2


BACKGROUND

SLIDE 3

What are CAPTCHAs?

Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA).
Proposed in 2003 by von Ahn et al.
Also referred to as reverse Turing tests.
CAPTCHAs tell whether a user is human or not.
Different versions of CAPTCHAs exist.
They block attacks by automated bots.
They must resist automated solving.
They must be painless for humans.


SLIDE 4

CAPTCHA Versions


SLIDE 5

Text-based CAPTCHAs

Most widely used CAPTCHA scheme.
CAPTCHA design reflects a trade-off between protection and usability.


SLIDE 6

Paper: “Re: CAPTCHAs – Understanding CAPTCHA-Solving Services in an Economic Context.”


SLIDE 7

What is it all about? (Summary)

Brief explanation of CAPTCHAs.
A CAPTCHA-solving ecosystem has emerged, with 2 major categories:

1. Automated CAPTCHA solvers (software).
2. Real-time human labor.

Evaluation of CAPTCHAs in economic terms.
The underlying cost structure of CAPTCHAs benefits the defender.
Plenty of CAPTCHA-solving services exist, with very low prices (around $1 per 1,000 CAPTCHAs).
CAPTCHAs should be viewed as an economic impediment to an attacker (not only as a technological one).

Motoyama, Marti, et al. "Re: CAPTCHAs-Understanding CAPTCHA-Solving Services in an Economic Context." USENIX Security Symposium. Vol. 10. 2010.

SLIDE 8

What is it all about? (Cont.)

The overall shape of the market is poorly understood.
Automated solving tools have evolved considerably…
…but they have been eclipsed by the emergence of the human-based solving market.
Economic examination of the human-based solving market.

Solver types: human-based, automated (software), and hybrid.


SLIDE 9

Related work

The authors claim that they are the first to identify the growth of human-labor-based CAPTCHA-solving services.
The closest related work is the study by Bursztein et al. [1], but it focuses on CAPTCHA difficulty rather than the underlying business models.
No other related work (at that time).

[1] E. Bursztein, S. Bethard, J. C. Mitchell, D. Jurafsky, and C. Fabry. How good are humans at solving CAPTCHAs? a large scale evaluation. In IEEE S&P ’10, 2010.


SLIDE 10

Authors Tried to Answer Key Questions Like

Which CAPTCHAs are targeted most?
Rough solving capacity?
Quality of service?
Pricing of services?
Workforce demographics?
Services’ adaptability to changes in CAPTCHA schemes?

Overall, the research reasons about the net value of CAPTCHAs under existing threats.


SLIDE 11

CAPTCHA Economics, but why???

A purely technical perspective on CAPTCHAs doesn’t capture the business realities of the CAPTCHA-solving ecosystem.
The profitability of any scam is a function of 3 factors:

1. The cost of CAPTCHA solving.
2. The effectiveness of any secondary defenses.
3. The efficiency of the attacker’s business model.

CAPTCHAs add friction to the attacker’s business model.
CAPTCHAs minimize the cost and legitimate-user impact of heavier-weight secondary defenses (e.g. SMS verification).
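
As a rough illustration of how the three factors above interact, the sketch below computes an attacker's expected profit per attempted action. The function name, its parameters, and every number in it are hypothetical and are not taken from the paper; it also assumes that failed solves are still paid for.

```python
def expected_profit_per_attempt(solve_price_per_1000, solve_accuracy,
                                defense_pass_rate, revenue_per_success):
    """Toy model of one abusive action (e.g. one account signup).

    All parameters are illustrative assumptions, not measurements.
    """
    # Factor 1: cost of CAPTCHA solving. Assuming wrong answers are still
    # billed, the cost per *correct* solution is price / accuracy.
    cost_per_correct_solve = (solve_price_per_1000 / 1000) / solve_accuracy
    # Factor 2: only a fraction of attempts survive secondary defenses.
    # Factor 3: revenue earned per surviving attempt (the business model).
    expected_revenue = defense_pass_rate * revenue_per_success
    return expected_revenue - cost_per_correct_solve


# Hypothetical scenario: $1 per 1,000 solves, 90% accuracy,
# one cent of revenue per successful action.
lenient = expected_profit_per_attempt(1.0, 0.9, defense_pass_rate=0.50,
                                      revenue_per_success=0.01)
strict = expected_profit_per_attempt(1.0, 0.9, defense_pass_rate=0.05,
                                     revenue_per_success=0.01)
print(f"lenient secondary defenses: {lenient:+.5f} $/attempt")
print(f"strict secondary defenses:  {strict:+.5f} $/attempt")
```

Under these made-up numbers the same solving cost is profitable with lenient secondary defenses and unprofitable with strict ones, which is the sense in which CAPTCHAs act as friction rather than as a hard barrier.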


SLIDE 12

Economics of CAPTCHA-solving Market

The market for CAPTCHA-solving services has expanded…
…but the wages of workers have been declining, for these reasons:

1. CAPTCHA solving is an unskilled job.
2. It can easily be outsourced via the Internet to the lowest-cost labor.
3. Competition on the retail side has increased.

Mr. E said that roughly 50% of revenue is profit, roughly 10% goes to servers and bandwidth, and the remainder is split between solving labor and incentives for partners who recruit new customers.
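
The split reported by Mr. E can be written out as simple arithmetic. The sketch below is only an illustration: the function name and the revenue figure are hypothetical, and the percentages are the rough values quoted above.

```python
def split_revenue(revenue, profit_pct=50, infra_pct=10):
    """Split revenue using the rough percentages attributed to Mr. E."""
    # Whatever is not profit or infrastructure is the remainder.
    remainder_pct = 100 - profit_pct - infra_pct
    return {
        "profit": revenue * profit_pct / 100,
        "servers_and_bandwidth": revenue * infra_pct / 100,
        "remainder": revenue * remainder_pct / 100,
    }


# Hypothetical revenue of $1,000 (not a figure from the paper).
shares = split_revenue(1000)
for name, amount in shares.items():
    print(f"{name}: ${amount:.2f}")
```

With these percentages, about 40% of revenue is left after profit and infrastructure.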


SLIDE 13

CAPTCHA-Solving Market Workflow


SLIDE 14

CAPTCHA-Solving Services Analysis

Evaluated services that were well advertised at the time.
Evaluated 8 CAPTCHA-solving services over 5 months, using CAPTCHAs collected from the most popular web sites.
Evaluated several aspects, such as:

1. Customer interface.
2. Solution accuracy.
3. Response time.
4. Availability.
5. Capacity.


SLIDE 15

Quality of Service Assessment


SLIDE 16

Quality of Service Assessment (cont.)

Median error rate and response time (in seconds) for all services.

Services are ranked top-to-bottom in order of increasing error rate.


SLIDE 17

Services Analysis Results

Antigate and ImageToText provided the fastest service.
Accuracy and response time varied with the type of CAPTCHA.
The value of a particular solver depends on 3 factors, namely:

1. Accuracy.
2. Response time.
3. Price.

DeCaptcher and CaptchaBot had the largest solving capacity, as they could solve 14–15 CAPTCHAs per second.
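
One way to see how accuracy and price combine, and what 14–15 CAPTCHAs per second means at scale, is the short calculation below. It is a sketch under stated assumptions: the two example services and their prices and error rates are hypothetical, wrong answers are assumed to be billed, and the per-second rate is assumed to be sustained around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cost_per_correct_solution(price_per_1000, error_rate):
    """Effective price of one correct answer, assuming wrong answers are billed."""
    return (price_per_1000 / 1000) / (1 - error_rate)


def daily_capacity(solves_per_second):
    """CAPTCHAs solved per day if the per-second rate were sustained all day."""
    return solves_per_second * SECONDS_PER_DAY


# Two hypothetical services (numbers invented for illustration).
cheap_but_sloppy = cost_per_correct_solution(price_per_1000=1.0, error_rate=0.20)
pricier_but_accurate = cost_per_correct_solution(price_per_1000=1.1, error_rate=0.10)
print(f"cheap but sloppy:     ${cheap_but_sloppy:.6f} per correct solve")
print(f"pricier but accurate: ${pricier_but_accurate:.6f} per correct solve")

# The 14-15 per second quoted above, extrapolated to a full day.
print(daily_capacity(14), daily_capacity(15))
```

In this made-up comparison the nominally more expensive service is cheaper per correct answer, which is why price alone does not determine a solver's value. At a sustained 14 to 15 per second, daily capacity would be roughly 1.2 to 1.3 million CAPTCHAs.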


SLIDE 18

Worker Wages

They focused on two services, namely Kolotibablo and PixProfit.
Kolotibablo pays workers at a variable rate (from $0.50/1,000 up to over $0.75/1,000 CAPTCHAs) depending on how many CAPTCHAs they have solved.
PixProfit offers a somewhat higher rate of $1/1,000.
A minimum amount must be earned before payout.
Most services pay via an online e-currency system.
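
To put these rates in perspective, the sketch below converts a per-1,000 rate into an hourly wage. The solving time of 10 seconds per CAPTCHA is a hypothetical assumption chosen only for illustration; the rates are the ones quoted above.

```python
def hourly_wage(rate_per_1000, seconds_per_captcha):
    """Dollars earned per hour at a given pay rate and solving speed."""
    solved_per_hour = 3600 / seconds_per_captcha
    return solved_per_hour * rate_per_1000 / 1000


# Assumption: a worker needs 10 seconds per CAPTCHA (hypothetical value).
SECONDS_PER_CAPTCHA = 10
for label, rate in [("Kolotibablo (low)", 0.50),
                    ("Kolotibablo (high)", 0.75),
                    ("PixProfit", 1.00)]:
    wage = hourly_wage(rate, SECONDS_PER_CAPTCHA)
    print(f"{label}: ${wage:.2f} per hour")
```

Under that assumed speed, even the highest quoted rate stays well below one dollar per hour.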


SLIDE 19

Geographic Demographics

All services include a sizeable workforce fluent in Chinese, likely from mainland China.
Antigate has appreciable accuracy for Russian and Hindi, presumably drawing on workforces in Russia and India.
Similarly: CaptchaBypass for Russian; BeatCaptcha for Tamil, Portuguese, and Spanish; DeCaptcher for Tamil.
ImageToText has appreciable accuracy across a remarkable range of languages.


SLIDE 20

Adaptability of CAPTCHA Services

Again focused on the Kolotibablo and PixProfit services.
Tested them on the Asirra CAPTCHA.
ImageToText displayed remarkable adaptability, solving the Asirra CAPTCHA on average 39.9% of the time.

Figure 5: ImageToText error rate for the custom Asirra CAPTCHA over time.


SLIDE 21

Most Popular Targeted CAPTCHAs


SLIDE 22

Conclusions

The low-impact quality of CAPTCHAs makes them attractive to site operators,
…but, at the same time, easy to outsource to the global unskilled labor market.
The CAPTCHA-solving business is a well-developed, highly competitive industry with large capacity.
Wholesale and retail prices for CAPTCHA solving will continue to decline.
CAPTCHAs don’t prevent large-scale automated site access,
…but they do effectively limit it.


SLIDE 23

Conclusions (Cont.)

As the cost of CAPTCHA solving decreases, a site operator must employ secondary defenses more aggressively.
CAPTCHAs should be regarded as an economic impediment (not only a technological one).
CAPTCHAs are low-impact mechanisms that add friction to the attacker’s business model.
CAPTCHAs minimize the cost and legitimate-user impact of heavier-weight secondary defenses.


SLIDE 24

Paper: “I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs.”


SLIDE 25

What is it all about? (Summary)

A study of the latest version of Google’s reCaptcha.
The authors show how to influence reCaptcha’s risk analysis process.
They identify reCaptcha’s flaws, bypass restrictions, and deploy large-scale attacks.
Proposal of an effective and low-cost deep-learning-based attack for the semantic annotation of images.
Proposal of a series of safeguards and modifications for resisting their attacks.

Sivakorn, S., Polakis, I., & Keromytis, A. D. (2016, March). I am robot: (deep) learning to break semantic image CAPTCHAs. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 388-403). IEEE.

SLIDE 26

Related work

Yan et al., “A low-cost attack on a Microsoft CAPTCHA,” in CCS ’08.
Yan et al., “Breaking visual CAPTCHAs with naive pattern recognition algorithms,” in ACSAC ’07.
Li et al., “Breaking e-banking CAPTCHAs,” in ACSAC ’10.
Perez et al., “Breaking reCAPTCHAs with unpredictable collapse: Heuristic character segmentation and recognition,” in MCPR 2012.
Many other papers related to automated CAPTCHA solving...


SLIDE 27

Google’s reCaptcha

The goal of Google’s latest version of reCaptcha is to:

1. Minimize the effort for legitimate users.
2. Require tasks that are more challenging for computers than “simple” text recognition.

reCaptcha is driven by an “advanced risk analysis system”.
reCaptcha widget also performs a series of browser checks.
Most widely used CAPTCHA service.
Leverages information about users’ activities through cookies.


SLIDE 28

How does reCaptcha work?

1. The user clicks on a checkbox.
2. A request is sent containing all the information collected about the user.
3. The request is analyzed by the advanced risk analysis system, which decides the type of CAPTCHA challenge to present to the user.
4. If the user requests multiple challenges or provides several wrong answers, the system returns increasingly harder challenges.


SLIDE 29

CAPTCHA Versions


SLIDE 30

Contributions

Deployed an automation tool without being detected by the reCaptcha widget.
Identified design flaws that allow attackers to “influence” the advanced risk analysis process.
Built an ML-based system for solving image-based CAPTCHAs that extracts semantic information from images.
The system is highly effective and efficient, achieving 70.78% accuracy and solving challenges in ≈19 seconds.
Demonstrated their attack’s generic applicability.
Evaluated their tool in terms of cost-effectiveness (offline mode).
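
The accuracy and timing figures above can be combined into an expected time per successful solve. The sketch below is a back-of-the-envelope estimate, not a result from the paper: it assumes every attempt succeeds independently with the same probability and takes the same time, which ignores the fact that repeated failures trigger harder challenges.

```python
def expected_attempts(success_probability):
    """Mean number of independent tries until the first success (geometric model)."""
    return 1 / success_probability


ACCURACY = 0.7078            # reported accuracy
SECONDS_PER_CHALLENGE = 19   # reported approximate solving time

attempts = expected_attempts(ACCURACY)
expected_seconds = attempts * SECONDS_PER_CHALLENGE
print(f"expected attempts per success: {attempts:.2f}")
print(f"expected seconds per success:  {expected_seconds:.1f}")
```

Under this simplified model, a successful solve takes about 1.4 attempts on average, or a little under half a minute.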


SLIDE 31

Their CAPTCHA-solving system

Their system is built on Selenium and Mozilla Firefox (v.36).
Their system has 2 components:

1. The first is responsible for creating tracking cookies that influence the risk analysis process.
2. The second processes the challenges, using different techniques depending on the type of challenge.


SLIDE 32

Computer Vision Algorithms and Image Annotation Services Used

Google’s reverse image search (GRIS) for conducting an image-based search.
Different image annotation services for assigning tags (keywords) or free-form descriptions to images.
An ML-based classifier that can guess the content of an image based on a subset of the tags.
A manually created labeled dataset of images and their tags, from challenges they have collected (History Module).
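
The idea of picking images from their tags can be illustrated with a toy example. The sketch below is not the authors' classifier: it swaps the ML model for a plain tag-overlap score, and the function names, image names, tags, and the choice of selecting two images are all hypothetical.

```python
def score_candidate(hint_keywords, image_tags):
    """Fraction of an image's tags that also appear among the hint keywords."""
    tags = {t.lower() for t in image_tags}
    if not tags:
        return 0.0
    return len(tags & hint_keywords) / len(tags)


def select_images(hint_keywords, candidates, n_select=2):
    """Rank candidate images by tag overlap and keep the top n_select."""
    hints = {k.lower() for k in hint_keywords}
    ranked = sorted(candidates,
                    key=lambda name: score_candidate(hints, candidates[name]),
                    reverse=True)
    return ranked[:n_select]


# Hypothetical tags, as an annotation service might return them.
candidates = {
    "img_a": ["wine", "glass", "red"],
    "img_b": ["dog", "grass"],
    "img_c": ["bottle", "drink", "table"],
    "img_d": ["car"],
}
chosen = select_images({"wine", "bottle", "drink"}, candidates)
print(chosen)
```

A trained classifier would generalise beyond exact keyword matches; the overlap score is used here only to keep the example self-contained.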


SLIDE 33

Findings

Google’s advanced risk analysis can be neutralized by using a 9-day-old cookie (with or without web surfing).
Being logged in to a Google account, with or without phone verification, does not influence the risk analysis system.
No restriction based on the country in which a cookie is created.
The webdriver variable has no effect.
The user-agent’s browser and engine versions, as well as the actual environment of the experiment, play a critical role.


SLIDE 34

Findings (Cont.)

A user-agent that does not contain complete information, or is misformatted, receives a hard (fallback) CAPTCHA.
The widget does not detect the underlying operating system.
A mismatch between the user-agent used when a cookie is created and the one used when requesting a CAPTCHA with that cookie has no effect.
Screen resolution and mouse behavior do not affect the outcome of the risk analysis.
Cookies are not assigned a reputation score (according to history).
No mechanism prohibits the creation of a large number of cookies from a single IP address.


SLIDE 35

Findings (Cont.)

Capacity per day:

1. On weekdays, they could solve between 52,000 and 55,000.
2. On weekends, they could solve 59,000.

This reCaptcha version suffers from significant flaws and omissions.
In most cases (74%) the number of correct candidate images is 2; the rest contain 3, and they also found two challenges with 4.
Challenges are not created “on the fly” but selected from a relatively small pool of challenges.


SLIDE 36

Findings (Cont.)

The collected challenges contained 1,368 redundant images, belonging to 358 sets of identical images.
The attack is highly efficient, solving challenges in ≈19 seconds; the most time-consuming phase is GRIS.
Only a limited variety of image categories was detected.
Adversaries can deploy accurate and efficient attacks against the image reCaptcha without relying on external services.
Evaluated the economic viability of their CAPTCHA-breaking system.
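
Redundancy of the kind reported above can be measured by grouping images with identical content. The sketch below is a minimal illustration that compares SHA-256 digests of the raw bytes. This is an assumption, since the slides do not say how identical images were detected, and the file names and byte strings are placeholders.

```python
import hashlib
from collections import defaultdict


def group_identical(images):
    """Group image names whose byte content is identical.

    `images` maps a name to its raw bytes. Only groups with at least
    two members (i.e. actual duplicates) are returned.
    """
    by_digest = defaultdict(list)
    for name, data in images.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Placeholder "images": short byte strings stand in for real files.
images = {
    "a.jpg": b"\x01\x02", "b.jpg": b"\x01\x02", "c.jpg": b"\x03",
    "d.jpg": b"\x01\x02", "e.jpg": b"\x04", "f.jpg": b"\x04",
}
groups = group_identical(images)
redundant = sum(len(g) for g in groups)
print(f"{redundant} redundant images in {len(groups)} sets of identical images")
```

Byte-level hashing only catches exact copies; re-encoded or resized copies would need a perceptual comparison instead.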


SLIDE 37

Findings (Cont.)

The authors discuss countermeasures for defending against their attacks, and their potential impact on usability.
reCaptcha was updated after the authors informed Google and Facebook.


SLIDE 38

Conclusions – Future work

The accuracy of their attack can be improved further.
A reassessment of reverse Turing tests (CAPTCHAs) and their design is considered critical.
Demonstrated the feasibility of large-scale CAPTCHA-solving attacks.
reCaptcha’s advanced risk analysis system and widget provide valuable functionality that can be incorporated into future CAPTCHA schemes to mitigate attacks.


SLIDE 39

Thanks for your attention!!!

Any questions?
