Tripwire: Inferring Internet Site Compromise
Joe DeBlasio, Stefan Savage, Geoffrey M. Voelker, Alex C. Snoeren
UC San Diego
2
On account compromise
Compromise of email, social network, and similar accounts is devastating
- Personal/professional reputational and financial damage
Compromise can come from many sources
- Phishing, brute forcing, malware, password re-use
4
Password re-use from data breaches
Site A’s usernames and passwords are exposed…
…then the attacker uses the leaked credentials on unrelated Site B
More than 40% of users reuse passwords; 25% usually use only one [1]
[1] Das, Anupam, et al. "The tangled web of password reuse." Symposium on Network and Distributed System Security (NDSS). 2014.
5
Natural & valuable target: email accounts
Most sites collect an email address
Natural for an attacker to try the leaked password on the email account
Email accounts are valuable
https://krebsonsecurity.com/2013/06/the-value-of-a-hacked-email-account/
7
This work: getting more insight into compromise prevalence
We have no idea how many sites are compromised
8
Detecting site compromise
Publicly known compromises are attacker- or site-identified
But what about when
- attackers are quiet, and
- sites cannot or will not help?
3rd party detection may expose additional compromises
9
Tripwire
Technique for 3rd-party compromise detection using password re-use attacks
10
Tripwire: detecting compromise via re-use attacks
1. Partner with a major email provider
Create many distinct email accounts to act as honeypots
[Diagram: the email provider holds accounts A@, B@, and C@ with passwords PW1, PW2, and PW3]
11
Tripwire: detecting compromise via re-use attacks
2. Register for accounts on the services you want to monitor
Unique email address per site; each site account reuses the password of its email account
[Diagram: A@/PW1, B@/PW2, and C@/PW3 are registered at Site A, Site B, and Site C respectively]
14
Tripwire: detecting compromise via re-use attacks
3. Monitor email accounts for logins
A login to an email account indicates compromise of the corresponding site
[Diagram: a login using PW3 appears at the email provider, implicating Site C]
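The three steps above can be sketched as a simple lookup. This is a minimal illustration, not the authors' code: the `Honeypot` record, the `compromised_sites` helper, and the `provider.example` addresses are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Honeypot:
    email: str      # unique address, registered at exactly one site
    site: str       # the site this address was given to
    password: str   # same password for the email account and the site account


def compromised_sites(honeypots, login_events):
    """Map successful email logins back to the site each address was given to."""
    site_by_email = {h.email: h.site for h in honeypots}
    hits = {}
    for email in login_events:
        site = site_by_email.get(email)
        if site is not None:  # ignore logins to addresses never registered anywhere
            hits[site] = hits.get(site, 0) + 1
    return hits


# Hypothetical data mirroring the diagram: A@/PW1, B@/PW2, C@/PW3.
honeypots = [
    Honeypot("a@provider.example", "Site A", "PW1"),
    Honeypot("b@provider.example", "Site B", "PW2"),
    Honeypot("c@provider.example", "Site C", "PW3"),
]
logins = ["c@provider.example", "c@provider.example"]
result = compromised_sites(honeypots, logins)
print(result)  # {'Site C': 2}
```

Because each address is given to exactly one site, a login needs no further attribution: the address itself names the site.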
15
Tripwire: our contribution
Proof-of-concept implementation and study
- ~2,300 sites under measurement; prototype crawler
Fresh compromises detected
- 19 compromises over 24 months; only one previously public
Large and small sites affected
- Largest site has ~50M users; >100M users impacted across all sites
Today’s plan: Registration, Compromises, Disclosure
16
Ethical considerations
We did not receive consent from sites under measurement (doing so may compromise integrity and is impractical)
We believe that the technical burden on sites is low (few accounts created, accounts unused, rate-limited)
We have obscured compromised sites’ identities (limits reputational damage from involuntary inclusion)
We consulted our group’s and institution’s counsel
17
Tripwire Process
Generate usernames / passwords → Create email accounts → Register on websites → Monitor accounts
18
Full identity created
- e.g. full name, address, phone, mother’s maiden name, etc.
Plausible usernames
- e.g. AngryNeighbor1234
Two types of passwords
- “Easy to crack”, e.g. Website1
- “Hard to crack”, e.g. QpFAiy5BfB
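A rough sketch of how such identities might be generated. The word lists and function names are hypothetical, and the password shapes (a dictionary word plus one digit; ten random alphanumeric characters) are assumptions read off the two examples on the slide, not the authors' actual generator.

```python
import secrets
import string

# Hypothetical word lists, for illustration only.
ADJECTIVES = ["Angry", "Quiet", "Lucky"]
NOUNS = ["Neighbor", "Gardener", "Pilot"]
COMMON_WORDS = ["Website", "Dragon", "Summer"]


def make_username():
    """Plausible-looking username such as AngryNeighbor1234."""
    number = f"{secrets.randbelow(10000):04d}"
    return secrets.choice(ADJECTIVES) + secrets.choice(NOUNS) + number


def make_easy_password():
    """Assumed 'easy' shape: capitalized dictionary word plus one digit."""
    return secrets.choice(COMMON_WORDS) + str(secrets.randbelow(10))


def make_hard_password(length=10):
    """Assumed 'hard' shape: random letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(make_username(), make_easy_password(), make_hard_password())
```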
19
Multiple accounts with differing password strengths allow inference of breach severity
Only accounts with easy passwords accessed?
- Breach contained well-hashed passwords.
Accounts with hard and easy passwords accessed?
- Plaintext or weak hashing.
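The inference rule above fits in a few lines. The function name and return strings are illustrative; treating any access to a hard-password account as a sign of plaintext or weak hashing is an assumption that slightly generalizes the two cases on the slide.

```python
def infer_storage(easy_accessed: bool, hard_accessed: bool) -> str:
    """Guess how a site stored passwords from which honeypot accounts were accessed."""
    if hard_accessed:
        # A random password is unlikely to be recovered from a strong hash.
        return "plaintext or weak hashing"
    if easy_accessed:
        # Only guessable passwords fell, so the hashes resisted brute force.
        return "well-hashed passwords"
    return "no compromise observed"


print(infer_storage(easy_accessed=True, hard_accessed=False))  # well-hashed passwords
print(infer_storage(easy_accessed=True, hard_accessed=True))   # plaintext or weak hashing
```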
20
Send usernames and passwords to email provider
Provider creates matching email accounts
Availability of a username at the email provider is used as a proxy for its availability elsewhere
21
Automated crawler registers for accounts
Best effort! Developers try to make this process hard to automate!
Skips ineligible, confusing, or non-English sites
- e.g. cannot support sites that require a credit card; fails on complicated CAPTCHAs
22
Crawler is provided a URL and an identity
- URLs from Alexa rankings; crawled approximately the top 30k
PhantomJS-based, JavaScript-capable crawler
- Load page → Find registration → Fill form → Email verification
If registration succeeds, registers again with an additional identity (with a different password type)
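The control flow of that pipeline, sketched with stand-in stages. Nothing here drives a real browser: `register_site` and the stub stages are hypothetical, and the actual crawler is PhantomJS-based as stated above.

```python
def register_site(url, identities, stages):
    """Run every stage in order for each identity; stop at the first failure."""
    registered = []
    for identity in identities:
        # all() short-circuits, so later stages are skipped once one fails.
        if not all(stage(url, identity) for stage in stages):
            break  # best effort: give up on this site rather than retry
        registered.append(identity)
    return registered


def stub_stage(name, result=True):
    """Stand-in for a real crawler stage; always returns a fixed result."""
    def run(url, identity):
        return result
    run.__name__ = name
    return run


stages = [
    stub_stage("load_page"),
    stub_stage("find_registration"),
    stub_stage("fill_form"),
    stub_stage("verify_email"),
]
done = register_site("https://site.example", ["easy-identity", "hard-identity"], stages)
print(done)  # ['easy-identity', 'hard-identity']
```

The second identity is only attempted when the first registration went through, matching the "registers again" step.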
23
Email provider monitors ALL created accounts for logins
- Reports back all successful login events
Provider does not know which accounts have been used
- Provider only has the list of all usernames
>100k unused email accounts; none were ever accessed
- Strong evidence that the email provider was not breached
24
Sites Monitored
Of ~30k sites considered
- ~45% are not in English
- ~20% fail to load or are otherwise ineligible
Crawler succeeds on ~20% of eligible sites: 2,300 sites total
~1% of sites measured were compromised!
So, what did we find?
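A quick arithmetic check of these figures. It assumes that both exclusion percentages are fractions of the ~30k sites considered, and it takes the count of 19 compromises from the contribution slide; all inputs are the rounded values shown in the talk.

```python
considered = 30_000

# Assumption: the ~45% and ~20% exclusions both apply to the ~30k considered.
eligible = considered * (100 - 45 - 20) // 100
expected_success = eligible * 20 // 100

# 19 compromises among the ~2,300 sites actually measured.
rate_percent = round(100 * 19 / 2300, 2)

print(eligible)          # 10500
print(expected_success)  # 2100
print(rate_percent)      # 0.83
```

The rounded inputs give about 2,100 sites, the same order as the reported 2,300, and 19 of 2,300 is about 0.83%, which the slides round to ~1%.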
25
Site  Hard?  Alexa*  Category
A     ✔      500     Deals
B            8500    Gaming
C            5500    BitTorrent
D     ✔      20500   Wallpapers
E            16000   Gaming
F            18500   Gaming
G     ✔      17500   RSS Feeds
H     ✔      17500   Marketing
I     ✔      7500    Horoscopes
J     ✔      20500   Gaming
K     ✔      20500   Classifieds
L            11000   Adult
M     ✔      20000   Vacations
N            11500   Gaming
O            18000   Outdoors
P     ?      1500    Adult
Q     ✔      22000   Tourism
R     ✔      22500   Press
S            4000    BTC Forum
* Rounded up to nearest 500
Hard? ✔ = hard passwords accessed
Site S (BTC Forum): BitcoinTalk.org; used salted sha256_crypt
[Highlighted rows] Alexa < 500 in home country
[Highlighted rows] Same company; >30M users
31
Login Activity
More than 1,750 distinct account accesses
- Most via IMAP, but also SMTP, POP, web, and mobile API
Most accounts are not abused; little indication to user
- ~25% used for spam; one password changed
[Figure: login timeline. x-axis: Date (7/15 through 1/17); y-axis: Compromised Site (A through S); legend: Hard / Easy passwords. Per-site counts in parentheses, in extracted order: 6, 77, 27, 3, 1, 23, 570, 9, 4, 11, 152, 90, 243, 119, 22, 99, 27, 83, 2]
39
Who are the attackers?
Of >1,750 logins, >1,300 distinct IPs
- Only 5 sites connected by IP re-use
IPs appear mostly residential
- Some compromised servers
Bursty accesses across IPs
- Access at one immediately after another
Suggests: compromised hosts form a proxy network
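One way to look for sites connected by IP re-use, sketched on made-up data. The `sites_linked_by_ip` helper and the (site, IP) record shape are hypothetical; the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict


def sites_linked_by_ip(logins):
    """logins: iterable of (site, ip). Return {ip: sites} for IPs seen at 2+ sites."""
    sites_by_ip = defaultdict(set)
    for site, ip in logins:
        sites_by_ip[ip].add(site)
    return {ip: sites for ip, sites in sites_by_ip.items() if len(sites) > 1}


# Hypothetical login records.
login_records = [
    ("Site A", "192.0.2.10"),
    ("Site B", "192.0.2.10"),
    ("Site C", "198.51.100.7"),
    ("Site C", "198.51.100.8"),
]
linked = sites_linked_by_ip(login_records)
for ip, sites in linked.items():
    print(ip, sorted(sites))  # 192.0.2.10 ['Site A', 'Site B']
```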
40
Disclosure
41
Disclosure
Notified sites via email
- Recipients via WHOIS, ‘Contact Us’, security@, etc.
First message didn’t disclose; designed to evoke reply
- “We [are] happy to share details…with the appropriate party.”
As helpful as possible
- Explained system, password strengths, answered questions
42
Disclosure: High order bits
Receiving this news is not fun
- Good evidence of breach, but little guidance on finding it
Two-thirds of sites did not respond
No sites notified users. No sites forced a password reset.
43
Disclosure: Smaller sites
Many sites know they have problems
- Old software: out-of-date WordPress, PHP from 2008
- Known vulnerabilities: improper SQL escaping, XSS
- Poor password hygiene: plaintext passwords, unsalted MD5, etc.
In one case, the breach is ongoing
Many promised fixes ‘soon’, but have few cycles to spare
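To illustrate why unsalted MD5 counts as poor hygiene, the sketch below contrasts it with a salted, iterated hash. PBKDF2 from Python's standard library is used here as a stand-in for a slow salted scheme; it is not what any of the measured sites used, and the tiny dictionary is hypothetical.

```python
import hashlib
import os


def unsalted_md5(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()


def salted_pbkdf2(password: str, salt: bytes, iterations: int = 100_000) -> str:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations).hex()


# Unsalted MD5: one precomputed table cracks every account using a listed password.
dictionary = ["Website1", "Password1"]  # hypothetical attacker word list
table = {unsalted_md5(word): word for word in dictionary}
cracked = table.get(unsalted_md5("Website1"))
print(cracked)  # Website1

# A random password is not in the table, so the lookup fails.
print(table.get(unsalted_md5("QpFAiy5BfB")))  # None

# Salted PBKDF2: the same password hashes differently per account,
# so a shared lookup table is useless and each guess costs many iterations.
hash_a = salted_pbkdf2("Website1", os.urandom(16))
hash_b = salted_pbkdf2("Website1", os.urandom(16))
print(hash_a != hash_b)  # True
```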
44
Disclosure: Site L (Adult)
“Our… setup was 60 servers, 30 for media streaming and 30 for back end.” “I started it in 2007 with a low level of IT knowledge… [We] used to employ sysadmins who cost a lot of money and made it their purpose in life to frustrate every possible upgrade… so I threw them out … and took over the sites myself, being thrown in the deep end is an understatement.”
45
Disclosure: Large Sites
Most responded in less than an hour and took us seriously
Rapidly involved lawyers and engineers in calls and emails
Focused on finding alternate explanations
No site corroborated the breach, but no one could offer an alternate explanation for our evidence
46
Disclosure: Site O (Outdoors)
Development halted in 2012; later purchased by largest competitor
Breach detected when accounts were transferred to competitor
47
Disclosure: Site A (Deals)
Worth > $1B; ~50M users; Alexa < 500
Reportedly engaged 3rd-party incident response
Around the time of our observed logins, complaints from users on Twitter
One publication ran a story; site denied allegations
48
Takeaways
3rd parties can detect compromise via password re-use
- Offers insight into otherwise invisible compromise
Nearly 1% of sites were compromised
- Conservative estimate! More time means more detection.
Compromises at sites of all sizes
- Companies with dedicated security teams still impacted
Strong evidence is not enough for sites to disclose
- Unlikely to corroborate; strong incentive to not disclose
49
Thanks!
Anonymized data and crawler source: https://github.com/ccied/tripwire
Joe DeBlasio, Stefan Savage, Geoffrey M. Voelker, Alex C. Snoeren
{jdeblasio,savage,voelker,snoeren}@cs.ucsd.edu