
Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing



  1. Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing • Peter Snyder*, Periwinkle Doerfler+, Chris Kanich*, Damon McCoy+

  2. Overview • Doxing is a targeted form of online abuse • Prior work is qualitative or focuses on defensive techniques • We don't understand the scale or the targets of the problem • This work is the first quantitative, large-scale measurement of doxing

  3. Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions

  4. Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions

  5. What is Doxing? (1/2) • A method of targeted online abuse • Attackers compile sensitive information about the target • Personal: name, addresses, age, photographs, SSN (Social Security number) • Relationships: family members, partners, friends • Financial: work history, investments, CCN (credit card number) • Online: email, social network accounts, passwords, IP addresses

  6. What is Doxing? (2/2) • Information is compiled into plain-text files • Released "anonymously" • Text-sharing sites (e.g., pastebin.com, skidpaste.com) • Online forums (e.g., 4chan, 8chan) • Torrents • IRC, Twitch, social networks, etc.

  7. Example dox (redacted): ==================================================== Full Name: █████ ██████ Aliases> ████████████ Age: ██ DOB: ██ / ██ / ████ Address: ██ ███████ █████ ███████████ , ███████ ██████ // Confirmed Mobile Number: + █ ( ███ ) ███ - ████ // Confirmed Email: ██████████ @ ███████ . ███ // Confirmed Illness: Asthma ==================================================== ISP Records> ISP: Rogers Cable // Previous IP Address: ███ . ███ . ███ . ███ // Previous ==================================================== Parental Information> Father: █ █ ██████ Age: ██

  8. Example dox (redacted): Aliases) ███████████ , ███████████ , █████ Name) ██████ ████ DOB █ / ██ / ██ Address) ██ █ ████ █ , ██████ , ██ █████ Cell Phone) ███ - ███ - ████ – Sprint, Mobile Caller ID) ██████ ████ Old Home Phone) ███ - ███ - ████ – CenturyLink, Landline Last 4 of Mastercard) ████ Emails) ██████████████ @ █████ . ███ , ████████ @ █████ . ███ Snapchat) ███████████ Twitter) @ ███████████ Facebook) https://facebook.com/ █████████ , ███████████ Skype) █████████ , ████████

  9. Doxing Harms

  10. Frequency, Targets and Effects • Prior work is based on qualitative or preventative / risk-management approaches • Research questions: 1. How frequently does doxing happen? 2. What information is shared in doxes? Who is targeted? 3. What is knowable about the large-scale effects and harms? 4. Are anti-abuse tools effective?

  11. Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions

  12. Steps to Protect Victims • Worked closely with IRBs; multiple rounds of study design • Only recorded publicly available data, and were careful not to use it to record data • Careful data storage / analysis methods: only recorded high-level summary data • Data protection best practices (key-based encryption, a single data store, strict access controls)

  13. General Measurement Strategy • Find places online where doxes are frequently shared • Train a classifier to determine how much of that activity is doxing • Measure the extracted doxes to determine what information they contain • Watch the online social network (OSN) accounts of doxing victims for abuse

  14. Dox Collection Pipeline • Fully automated • Single IP at the University of Illinois at Chicago • Two recording periods: • Summer of 2016 • Winter of 2016-17
      [Figure: five sources (4chan pol, 4chan b, 8ch pol, 8ch baphomet, Pastebin; per-source counts shown: 1.45m, 144k, 138k, 3.4k, 512) → Dox Classifier (Sec 3.1.2: 1.73m files in, 5,330 files classified as doxes, the rest "Not Dox") → OSN Extractor (Sec 3.1.3: counts shown: 748, 117, 328, 127, 345, 245) → Dox De-Duplication (Sec 3.1.4: 4,328 files kept, 1,002 duplicates) → Social Network Account Verifier & Scraper (Sec 3.1.5: 552, 228, 305 and 200 accounts)]

  15. Text File Collection • Data recorded from: • pastebin.com • 4chan.org (pol, b) • 8ch.net (pol, baphomet) • Selected because: • "Original" sources of doxes • Anecdotal reputation for doxing

  16. Text File Classification • scikit-learn: TfidfVectorizer, SGDClassifier • Training data: • Manual labeling of a Pastebin crawl • "Proof-of-work" sets
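The slide names scikit-learn's TfidfVectorizer and SGDClassifier. To show what that combination computes without requiring scikit-learn, here is a dependency-free sketch of the same idea: TF-IDF features (smoothed IDF, L2-normalized rows) fed to a linear classifier trained by stochastic gradient descent on the hinge loss, which is SGDClassifier's default loss. It is an illustration under assumptions, not the authors' code: the class name `TfidfHingeSGD`, the constant learning rate (scikit-learn defaults to a different schedule), and the toy training texts are all hypothetical.

```python
import math
import random
import re
from collections import Counter

# Roughly like TfidfVectorizer's default tokenization: lowercase alphanumeric runs.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class TfidfHingeSGD:
    """Hypothetical stand-in for TfidfVectorizer + SGDClassifier(loss="hinge")."""

    def __init__(self, epochs=50, lr=0.1, alpha=1e-4, seed=0):
        self.epochs, self.lr, self.alpha, self.seed = epochs, lr, alpha, seed

    def _vectorize(self, text):
        # Term counts weighted by IDF, restricted to the training vocabulary.
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}  # L2-normalized row

    def _score(self, x):
        return self.b + sum(self.w.get(t, 0.0) * v for t, v in x.items())

    def fit(self, texts, is_dox):
        n = len(texts)
        df = Counter(t for text in texts for t in set(tokenize(text)))
        # Smoothed IDF: the formula scikit-learn uses when smooth_idf=True.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        xs = [self._vectorize(text) for text in texts]
        ys = [1 if label else -1 for label in is_dox]
        self.w, self.b = {}, 0.0
        order = list(range(n))
        rng = random.Random(self.seed)
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                margin = ys[i] * self._score(xs[i])
                for t in self.w:  # L2 penalty shrinks every weight slightly
                    self.w[t] *= 1 - self.lr * self.alpha
                if margin < 1:  # hinge loss is non-zero: step toward the correct side
                    for t, v in xs[i].items():
                        self.w[t] = self.w.get(t, 0.0) + self.lr * ys[i] * v
                    self.b += self.lr * ys[i]
        return self

    def predict(self, text):
        """True if the text scores on the dox side of the decision boundary."""
        return self._score(self._vectorize(text)) > 0
```

With real data, the pipeline would call `TfidfHingeSGD().fit(texts, labels)` on the manually labeled files and then `predict` on each newly collected file; in practice the scikit-learn classes named on the slide would replace this hand-rolled version.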

  17. Text File Classification
      Label         Precision   Recall   # Samples
      Dox           0.81        0.89     258
      Not Dox       0.99        0.98     3,546
      Avg / Total   0.98        0.98     3,804
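The "Avg / Total" row appears to follow the layout of older scikit-learn `classification_report` output, which reported averages weighted by each class's number of samples (support). The short calculation below, using only the numbers in the table, shows how such a row and the per-class F1 score are derived. Because the per-class inputs are already rounded to two decimals, recomputed averages (about 0.978 precision and 0.974 recall) can differ from the table's row in the last digit.

```python
# (precision, recall, support) per class, copied from the classification table.
REPORTED = {
    "Dox": (0.81, 0.89, 258),
    "Not Dox": (0.99, 0.98, 3546),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(rows, index):
    """Support-weighted mean of column `index` (0 = precision, 1 = recall)."""
    total = sum(row[2] for row in rows.values())
    return sum(row[index] * row[2] for row in rows.values()) / total


if __name__ == "__main__":
    for label, (p, r, n) in REPORTED.items():
        print(f"{label}: F1 = {f1(p, r):.2f} over {n} samples")
    print(f"weighted precision = {weighted_average(REPORTED, 0):.3f}")
    print(f"weighted recall    = {weighted_average(REPORTED, 1):.3f}")
```

The Dox class's F1 works out to roughly 0.85, which is the most informative single number here, since the large "Not Dox" class dominates the weighted averages.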

  18. Social Networking Account Extractor • Extracts social networking accounts from doxes • Custom, heuristic-based identifier • Evaluated on 125 labeled doxes
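The slide describes the extractor only as "custom, heuristic-based". As a rough illustration of what such heuristics can look like, the sketch below pulls profile identifiers out of dox text using URL patterns for a few of the networks in the results table, plus one labeled-field pattern for lines like `Twitter) @handle` seen in the redacted examples. Every pattern, the `extract_accounts` name, and the sample text are hypothetical; Google+ is omitted, and the authors' real rules are not shown in the slides.

```python
import re

# Hypothetical URL patterns for some of the networks on the results slide.
URL_PATTERNS = {
    "facebook": re.compile(r"facebook\.com/([A-Za-z0-9.]+)", re.IGNORECASE),
    "twitter": re.compile(r"twitter\.com/([A-Za-z0-9_]{1,15})", re.IGNORECASE),
    "instagram": re.compile(r"instagram\.com/([A-Za-z0-9_.]+)", re.IGNORECASE),
    "youtube": re.compile(r"youtube\.com/(?:user|channel|c)/([A-Za-z0-9_-]+)", re.IGNORECASE),
    "twitch": re.compile(r"twitch\.tv/([A-Za-z0-9_]+)", re.IGNORECASE),
}

# Hypothetical labeled-field pattern, e.g. "Twitter) @handle" or "Twitter: @handle".
LABELED_TWITTER = re.compile(r"\btwitter\s*[:)>]?\s*@([A-Za-z0-9_]{1,15})", re.IGNORECASE)


def extract_accounts(text):
    """Return {network: set of lower-cased identifiers} found in one dox."""
    found = {}
    for network, pattern in URL_PATTERNS.items():
        for handle in pattern.findall(text):
            # Drop a trailing period left over from sentence punctuation.
            found.setdefault(network, set()).add(handle.rstrip(".").lower())
    for handle in LABELED_TWITTER.findall(text):
        found.setdefault("twitter", set()).add(handle.lower())
    return found
```

Heuristics like these trade recall for simplicity, which is consistent with the per-network accuracy differences reported on the next slide; measuring them against a hand-labeled set, as the authors did with 125 doxes, is what makes such rules usable.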

  19. Social Networking Account Extractor
      Network     % of Doxes Including   Extractor Accuracy (%)
      Instagram   11.2                   95.2
      Twitch      9.7                    95.2
      Google+     18.4                   90.4
      Twitter     34.4                   86.4
      Facebook    48.0                   84.8
      YouTube     40.0                   80.0

  20. Dox De-duplication • Duplicates are similar doxes with an identical target • Hash-based comparison is fragile to marginal updates • Instead, compare the referenced OSN accounts • ~14.2% of doxes were duplicates
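The point of this slide is that a content hash changes whenever a re-posted dox is edited even slightly, while the set of social network accounts it references usually stays the same. The sketch below contrasts the two keys. It assumes, hypothetically, that two doxes are duplicates when they reference exactly the same set of (network, handle) pairs, and that doxes with no extracted accounts are always kept; the slides do not say which matching rule the authors actually used, and `dedupe_by_accounts` is an illustrative name.

```python
import hashlib


def content_hash(text):
    """Exact-content fingerprint: any one-character edit changes it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe_by_accounts(doxes):
    """Keep the first dox seen for each distinct set of referenced accounts.

    `doxes` is an iterable of (dox_id, accounts) pairs, where `accounts` is a
    set of (network, handle) tuples. Doxes with no accounts are always kept,
    because an empty set says nothing about who the target is.
    """
    seen = set()
    unique = []
    for dox_id, accounts in doxes:
        key = frozenset(accounts)
        if key and key in seen:
            continue  # same accounts as an earlier dox: treat as a duplicate
        seen.add(key)
        unique.append(dox_id)
    return unique
```

For example, two postings that differ only by an added line get different `content_hash` values, yet both map to the same account key and the second is dropped.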

  21. Social Network Status Watcher • Repeatedly visit the referenced OSN accounts • After 1, 2, 3, 7, 14… days • Only record the status of the account: • public, private, inactive • Single IP @ UIC
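Because the watcher records only an account's status, and never its content, the data it produces reduces to a small state machine per account. The sketch below shows that bookkeeping: a re-check schedule using only the day offsets printed on the slide (the slide elides later offsets, so none are assumed) and a function that turns time-ordered observations into status transitions. Counting offsets from the day a dox was first seen is an assumption, and actually fetching an account's status is deliberately left out; `Status`, `check_dates`, and `status_changes` are hypothetical names.

```python
from datetime import date, timedelta
from enum import Enum


class Status(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    INACTIVE = "inactive"


# Offsets named on the slide; later ones are elided there, so none are added here.
CHECK_OFFSETS_DAYS = (1, 2, 3, 7, 14)


def check_dates(first_seen):
    """Days on which to re-check an account, counted from when its dox was first seen."""
    return [first_seen + timedelta(days=d) for d in CHECK_OFFSETS_DAYS]


def status_changes(observations):
    """Given time-ordered (date, Status) pairs, list (date, old, new) transitions."""
    changes = []
    for (_, before), (day, after) in zip(observations, observations[1:]):
        if after != before:
            changes.append((day, before, after))
    return changes
```

A transition such as PUBLIC to PRIVATE shortly after a dox appears is the kind of signal this design can capture while storing nothing beyond three possible states per check.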

  22. Manual Dox Labeling • Randomly selected 464 doxes • Manually labeled each dox to understand its contents: • Did it include a name, address, phone #, email, etc.? • Age and gender of the target (if included) • Categorization of the victim • Categorization of the attacker's motive

  23. Collection Statistics
      Study Period             Summer 2016   Winter 2016-17   Combined
      Text Files Recorded      484,185       1,253,702        1,737,887
      Classified as Dox        2,976         2,554            5,530
      Doxes w/o Duplicates     2,326         2,202            4,528
      Manually Labeled         270           194              464
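The "Combined" column is the sum of the two study periods, and a few ratios follow directly from the table. The snippet below recomputes them from the per-period counts only. Computed this way, about 0.32% of collected text files were classified as doxes, and de-duplication removed 1,002 of 5,530 doxes (about 18.1%); the slides do not state the base behind the "~14.2%" figure on the de-duplication slide, so the two numbers may measure different things.

```python
# (Summer 2016, Winter 2016-17) counts from the Collection Statistics table.
STATS = {
    "text_files": (484_185, 1_253_702),
    "classified_as_dox": (2_976, 2_554),
    "dox_without_duplicates": (2_326, 2_202),
    "manually_labeled": (270, 194),
}


def combined(name):
    """Sum the two study periods for one row of the table."""
    return sum(STATS[name])


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


if __name__ == "__main__":
    files = combined("text_files")
    doxes = combined("classified_as_dox")
    unique = combined("dox_without_duplicates")
    print(f"{files:,} files, {doxes:,} doxes ({pct(doxes, files):.2f}% of files)")
    print(f"{doxes - unique:,} duplicates removed ({pct(doxes - unique, doxes):.1f}% of doxes)")
```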

  24. Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions

  25. Outline • Results and findings • Doxing targets • Doxing perpetrators • Effects on social networks

  26. Doxing Targets

  27. Victim Demographics • Taken from the 464 manually labeled doxes • Only based on data in the doxes • Careful to avoid further harm (e.g., not taking demographic data from OSN accounts)
      Min Age           10 years old
      Max Age           74 years old
      Mean Age          21.7 years old
      Gender, Female    16.3%
      Gender, Male      82.2%
      Gender, Other     0.4%
      Located in USA    64.5% (of 300 files that included an address)
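Consistent with recording "only high-level summary data", the demographics above are aggregates over per-dox labels rather than anything about individual victims. The sketch below shows one way such a reduction can be written. The `DoxLabel` record and its fields are hypothetical, as are two choices the slide does not spell out: gender shares are taken over all labeled doxes (so unstated genders lower every share), and the USA share is taken only over doxes whose label records a country.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class DoxLabel:
    """Hypothetical record for one manually labeled dox (fields are illustrative)."""
    age: Optional[int] = None      # only if the dox itself states it
    gender: Optional[str] = None   # "female", "male", "other", or None if unstated
    country: Optional[str] = None  # only if the dox includes an address


def summarize(labels):
    """Reduce a non-empty list of per-dox labels to aggregate figures."""
    ages = [x.age for x in labels if x.age is not None]
    located = [x for x in labels if x.country is not None]
    return {
        "min_age": min(ages) if ages else None,
        "max_age": max(ages) if ages else None,
        "mean_age": mean(ages) if ages else None,
        "gender_pct": {
            g: 100 * sum(x.gender == g for x in labels) / len(labels)
            for g in ("female", "male", "other")
        },
        "usa_pct": 100 * sum(x.country == "USA" for x in located) / len(located) if located else None,
    }
```

Only the returned dictionary would need to be kept; the per-dox records can stay inside the protected data store described under the victim-protection steps.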

  28. Types of Data in Doxes
      Frequently Occurring Data
      Category          # of Doxes   % of Doxes*
      Address           422          90.1%
      Phone #           284          61.2%
      Family Info       235          50.6%
      Email             249          53.7%
      Zip Code          227          48.9%
      Date of Birth     155          33.4%
      Highly Sensitive Data
      Category          # of Doxes   % of Doxes*
      School            48           10.3%
      ISP               100          21.6%
      Passwords         40           8.6%
      Criminal Record   6            1.3%
      CCN               20           4.3%
      SSN               10           2.6%
      *All numbers from the 464 manually labeled doxes
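Every percentage in these two tables should equal its count divided by the 464 labeled doxes. The check below recomputes them at the slide's one-decimal precision. It reproduces ten of the twelve rows; for Address (422 of 464 is about 90.9%, not 90.1%) and SSN (10 of 464 is about 2.2%, not 2.6%) either the count or the percentage may be a transcription slip. The arithmetic alone cannot say which figure is correct.

```python
N_LABELED = 464

# (count, reported percentage) pairs copied from the two tables.
ROWS = {
    "Address": (422, 90.1), "Phone #": (284, 61.2), "Family Info": (235, 50.6),
    "Email": (249, 53.7), "Zip Code": (227, 48.9), "Date of Birth": (155, 33.4),
    "School": (48, 10.3), "ISP": (100, 21.6), "Passwords": (40, 8.6),
    "Criminal Record": (6, 1.3), "CCN": (20, 4.3), "SSN": (10, 2.6),
}


def recomputed(count, n=N_LABELED):
    """Percentage of the labeled doxes, rounded to one decimal as on the slide."""
    return round(100 * count / n, 1)


def mismatches(rows):
    """Rows whose reported percentage differs from count / N_LABELED."""
    return {
        name: (recomputed(count), reported)
        for name, (count, reported) in rows.items()
        if recomputed(count) != reported
    }


if __name__ == "__main__":
    for name, (mine, theirs) in mismatches(ROWS).items():
        print(f"{name}: {mine}% recomputed vs {theirs}% reported")
```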
