Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing

SLIDE 1

Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing

Peter Snyder* – Periwinkle Doerfler+ – Chris Kanich* – Damon McCoy+

SLIDE 2

Overview

  • Doxing is a targeted form of online abuse
  • Prior work is qualitative or focused on defenses
  • We don't understand the scope and targets of the problem
  • This work is the first quantitative, large-scale measurement of doxing

SLIDE 3

Outline

  • Problem area
  • Measurement methodology
  • Results and findings
  • Discussion and conclusions

SLIDE 5

What is Doxing? (1/2)

  • Method of targeted online abuse
  • Attackers compile sensitive information about the target
  • Personal: Name, addresses, age, photographs, SSN
  • Relationships: Family members, partners, friends
  • Financial: Work history, investments, criminal history
  • Online: Email, social network accounts, passwords, IPs

SLIDE 6

What is Doxing? (2/2)

  • Information is compiled into plain text files
  • Released "anonymously"
  • Text sharing sites (e.g. pastebin.com, skidpaste.com)
  • Online forums (e.g. 4chan, 8chan)
  • Torrents
  • IRC, Twitch, social networks, etc.

SLIDE 7

====================================================
Full Name: █████ ██████
Aliases> ████████████
Age: ██
DOB: ██/██/████
Address: ██ ███████ █████ ███████████, ███████ ██████ // Confirmed
Mobile Number: +█ (███) ███-████ // Confirmed
Email: ██████████@███████.███ // Confirmed
Illness: Asthma
====================================================
ISP Records>
ISP: Rogers Cable // Previous
IP Address: ███.███.███.███ // Previous
====================================================
Parental Information>
Father: █ █ ██████
Age: ██

SLIDE 8

Aliases) ███████████, ███████████, █████
Name) ██████ ████
DOB █/██/██
Address) ██ █ ████ █, ██████, ██ █████
Cell Phone) ███-███-████ – Sprint, Mobile
Caller ID) ██████ ████
Old Home Phone) ███-███-████ – CenturyLink, Landline
Last 4 of Mastercard) ████
Emails) ██████████████@█████.███, ████████@█████.███
Snapchat) ███████████
Twitter) @███████████
Facebook) https://facebook.com/█████████, ███████████
Skype) █████████, ████████

SLIDE 9

Doxing Harms

SLIDE 10

Frequency, Targets and Effects

  • Prior work is based on qualitative or preventative / risk management approaches
  • Research Questions:
  • 1. How frequently does doxing happen?
  • 2. What information is shared in doxes? Who is targeted?
  • 3. What is knowable about the large-scale effects and harms?
  • 4. Are anti-abuse tools effective?

SLIDE 11

Outline

  • Problem area
  • Measurement methodology
  • Results and findings
  • Discussion and conclusions

SLIDE 12

General Measurement Strategy

  • Find places online where doxes are frequently shared
  • Train a classifier to determine how much activity is doxing
  • Measure extracted doxes to determine the information they contain
  • Watch the OSN (online social network) accounts of doxing victims for abuse

SLIDE 13

Steps to Protect Victims

  • Worked closely with IRBs; multiple rounds of study design
  • Only recorded publicly available data
  • Careful data storage / analysis methods: only recorded high level summary data
  • Data protection best practices (key based encryption, single data store, strict access controls)

SLIDE 14

Dox Collection Pipeline

[Figure: dox collection pipeline]
Sources: Pastebin (1.45m files), 4chan b (138k), 4chan pol (144k), 8ch pol (3.4k), 8ch baphomet (512); 1.73m files in total.
Stages: Dox Classifier (Sec 3.1.2), OSN Extractor (Sec 3.1.3), Dox De-Duplication (Sec 3.1.4), Social Network Account Verifier & Scraper (Sec 3.1.5).
Counts: 5,330 files classified as dox; 1,002 duplicates; 4,328 files after de-duplication.

  • Fully automated
  • Single IP at the University of Illinois at Chicago
  • Two recording periods:
  • Summer of 2016
  • Winter of 2016

SLIDE 15

Text File Collection

  • Data recorded from
  • pastebin.com
  • 4chan.org (pol, b)
  • 8ch.net (pol, baphomet)
  • API and scrapers
  • Selected because:
  • "Original" sources of doxes
  • Anecdotal reputation for toxic behavior / doxing activity

SLIDE 16

Text File Classification

  • Scikit-Learn, TfidfVectorizer, SGDClassifier
  • Training Data:
  • Manual labeling of Pastebin crawl
  • "proof-of-work" sets
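
A minimal sketch of the kind of classifier this slide names, assuming scikit-learn is installed. The slide only lists TfidfVectorizer and SGDClassifier; the pipeline layout, the default hyperparameters, and the tiny synthetic training strings below are assumptions for illustration, not the authors' configuration or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Synthetic placeholder texts (assumption). The real training data came from
# manually labeled Pastebin files and "proof-of-work" sets.
train_texts = [
    "Full Name: EXAMPLE Address: EXAMPLE Phone: EXAMPLE Email: EXAMPLE",
    "Name) EXAMPLE DOB) EXAMPLE Address) EXAMPLE Skype) EXAMPLE",
    "def main(): print('hello world') return 0",
    "Meeting notes: agenda, action items, next steps",
]
train_labels = ["dox", "dox", "not", "not"]


def build_classifier():
    """TF-IDF features feeding a linear model trained with SGD."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        SGDClassifier(random_state=0),  # default hinge loss, i.e. a linear SVM
    )


classifier = build_classifier()
classifier.fit(train_texts, train_labels)

# Classify one unseen (synthetic) text file.
predictions = classifier.predict(["Address: EXAMPLE Phone: EXAMPLE"])
print(predictions[0])
```

With so few training strings the prediction itself is not meaningful; the point is only the shape of the vectorizer-plus-linear-model pipeline.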

SLIDE 17

Text File Classification

Label        Precision  Recall  # Samples
Dox          0.81       0.89    258
Not          0.99       0.98    3,546
Avg / Total  0.98       0.98    3,804
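
The "Avg / Total" row can be approximately reproduced from the two class rows, assuming it is a support-weighted average (the convention scikit-learn's classification report used for a row with that name). The helper name below is hypothetical. Because the per-class inputs are already rounded to two decimals, the recomputed values can differ from the printed row in the last digit.

```python
def support_weighted_average(values, supports):
    """Average per-class scores, weighting each class by its sample count."""
    total = sum(supports)
    return sum(v * n for v, n in zip(values, supports)) / total


supports = [258, 3546]  # Dox, Not (sample counts from the table)
precision = support_weighted_average([0.81, 0.99], supports)
recall = support_weighted_average([0.89, 0.98], supports)

print(f"weighted precision: {precision:.3f}")
print(f"weighted recall:    {recall:.3f}")
```

The weighting explains why the average sits so close to the "Not" row: that class contributes 3,546 of the 3,804 samples.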

SLIDE 18

Social Networking Account Extractor

  • Extract social networking accounts
  • Custom, heuristic-based identifier
  • Example:
  • Facebook:https://facebook.com/example
  • FB example
  • fbs: example - example2 - example3
  • facebooks; example and example2
  • Evaluated on 125 labeled doxes
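
The four example lines above can be handled by a small rule set. The following is an illustrative sketch only: the regular expressions and the function name are assumptions written to cover these four examples, not the authors' extractor.

```python
import re

# A Facebook label ("facebook", "facebooks", "fb", "fbs"), then either a
# punctuation separator or plain whitespace, then the rest of the line.
LABEL = re.compile(r"^\s*(?:facebooks?|fbs?)(?:\s*[:;)\-]\s*|\s+)(.+)$", re.IGNORECASE)
# Optional URL prefix in front of a handle.
URL_PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?facebook\.com/", re.IGNORECASE)
# Separators between several handles on one line.
SPLIT = re.compile(r"\s+-\s+|\s+and\s+|\s*,\s*", re.IGNORECASE)


def extract_facebook_handles(line):
    """Return the Facebook handles named on one line of a dox, if any."""
    match = LABEL.match(line)
    if not match:
        return []
    handles = []
    for part in SPLIT.split(match.group(1).strip()):
        handle = URL_PREFIX.sub("", part).strip().strip("/")
        if handle:
            handles.append(handle)
    return handles


for example in [
    "Facebook:https://facebook.com/example",
    "FB example",
    "fbs: example - example2 - example3",
    "facebooks; example and example2",
]:
    print(example, "->", extract_facebook_handles(example))
```

Rules like these are easy to read and extend per network, but they miss unusual phrasings, which is consistent with the slide evaluating the extractor on labeled doxes rather than assuming it is exact.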

SLIDE 19

Social Networking Account Extractor

Network    % Doxes Including  Extractor Accuracy (%)
Instagram  11.2               95.2
Twitch     9.7                95.2
Google+    18.4               90.4
Twitter    34.4               86.4
Facebook   48.0               84.8
YouTube    40.0               80.0

SLIDE 20

Dox De-duplication

  • Similar doxes, identical target
  • Hash-based comparison is fragile to marginal updates
  • Compare referenced OSN accounts instead
  • ~14.2% of doxes were duplicates
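
A small sketch of why an exact hash fails here and how comparing referenced accounts helps. The matching rule (two files are duplicates if they share at least one account) and all names below are assumptions for illustration; the slide does not state the exact criterion.

```python
import hashlib


def content_hash(text):
    """Exact-content fingerprint; any edit to the text changes it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(doxes):
    """Keep the first file seen for each target.

    `doxes` is a list of (text, accounts) pairs, where `accounts` is a set of
    (network, handle) tuples. Assumption: two files describe the same target
    when they reference at least one common account.
    """
    seen_accounts = set()
    unique = []
    for text, accounts in doxes:
        is_duplicate = bool(accounts & seen_accounts)
        seen_accounts |= accounts  # remember accounts from duplicates too
        if not is_duplicate:
            unique.append((text, accounts))
    return unique


original = ("Name: EXAMPLE\nFB: example", {("facebook", "example")})
updated = ("Name: EXAMPLE\nFB: example\nUpdate: new phone", {("facebook", "example")})
other = ("Name: OTHER\nTwitter: other", {("twitter", "other")})

# The one-line update changes the hash, so a hash comparison misses it...
print(content_hash(original[0]) == content_hash(updated[0]))
# ...while the account comparison treats it as the same target.
print(len(deduplicate([original, updated, other])))
```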

SLIDE 21

Social Network Account Verifier & Scraper

  • Repeatedly visit referenced OSN accounts
  • After 1, 2, 3, 7, 14… days
  • Only record the status of the account:
  • public
  • private
  • inactive
  • Single IP @ UIC
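
The revisit schedule and the status-only record could be sketched as follows. The names are hypothetical, and only the offsets printed on the slide are used; the slide's "…" indicates the series continues, but the later offsets are not given, so none are added.

```python
from datetime import date, timedelta
from enum import Enum


class AccountStatus(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    INACTIVE = "inactive"


# Offsets listed on the slide; later ones are not stated and are omitted.
VISIT_OFFSETS_DAYS = (1, 2, 3, 7, 14)


def visit_dates(first_seen, offsets=VISIT_OFFSETS_DAYS):
    """Dates on which an account is revisited after its dox first appears."""
    return [first_seen + timedelta(days=d) for d in offsets]


def record_visit(log, account, visit_day, status):
    """Store only the coarse status, never any profile content."""
    log.append((account, visit_day.isoformat(), AccountStatus(status).value))


log = []
schedule = visit_dates(date(2016, 7, 1))
record_visit(log, "example", schedule[0], "public")
record_visit(log, "example", schedule[1], "private")
print(schedule)
print(log)
```

Restricting the record to one of three enumerated values is one way to enforce the "high level summary data only" rule in code: anything else raises a `ValueError`.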

SLIDE 22

Manual Dox Labeling

  • Randomly selected 464 doxes
  • Manually label each dox to understand the contents.
  • Did it include name, address, phone #, email, etc.?
  • Age and gender of the target (if included)
  • Categorization of the victim
  • Categorization of the motive of attacker ("why I doxed this person…")

SLIDE 23

Collection Statistics

Study Period          Summer 2016  Winter 2016-17  Combined
Text Files Recorded   484,185      1,253,702       1,737,887
Classified as Dox     2,976        2,554           5,530
Doxes w/o Duplicates  2,326        2,202           4,528
Manually Labeled      270          194             464
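
As a rough worked example, the share of recorded files that the classifier flagged can be computed directly from the first two rows of the table. This is simple arithmetic on the table's figures, not a result reported on the slide, and it reflects classifier output on these sources rather than the prevalence of doxing in general. The function name is hypothetical.

```python
recorded = {
    "Summer 2016": 484_185,
    "Winter 2016-17": 1_253_702,
    "Combined": 1_737_887,
}
classified_as_dox = {
    "Summer 2016": 2_976,
    "Winter 2016-17": 2_554,
    "Combined": 5_530,
}


def dox_share_percent(period):
    """Percentage of recorded text files that were classified as dox."""
    return 100 * classified_as_dox[period] / recorded[period]


for period in recorded:
    print(f"{period}: {dox_share_percent(period):.2f}% of recorded files")
```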

SLIDE 24

Outline

  • Problem area
  • Measurement methodology
  • Results and findings
  • Discussion and conclusions

SLIDE 25

Outline

  • Results and findings
  • Doxing targets
  • Doxing perpetrators
  • Effects on social networks

SLIDE 26

Doxing Targets

SLIDE 27

Victim Demographics

  • Taken from the 464 manually labeled doxes
  • Only based on data in doxes
  • Harm prevention steps (e.g. not taking demographic data from OSN accounts)

Min Age         10 years old
Max Age         74 years old
Mean Age        21.7 years old
Gender, Female  16.3%
Gender, Male    82.2%
Gender, Other   0.4%
Located in USA  64.5% (of 300 files that included address)

SLIDE 28

Types of Data in Doxes

Frequently Occurring Data

Category       # of Doxes  % of Doxes*
Address        422         90.1%
Phone #        284         61.2%
Family Info    235         50.6%
Email          249         53.7%
Zip Code       227         48.9%
Date of Birth  155         33.4%

Highly Sensitive Data

Category         # of Doxes  % of Doxes*
School           48          10.3%
ISP              100         21.6%
Passwords        40          8.6%
Criminal Record  6           1.3%
CCN              20          4.3%
SSN              10          2.6%

*All numbers from 464 manually labeled doxes

SLIDE 29

Doxing Victims by Community

  • Categorization of victim based on listed OSN accounts
  • 16.2% of victims categorizable into 3 categories

Category   Criteria                                                                               # of Labeled  % of Labeled
Hacker     2 or more OSN accounts on hacking sites (e.g. hackforums.net)                          17            3.7%
Gamer      2 or more OSN accounts on gaming sites (e.g. twitch.tv, minecraftforum.net)            53            11.4%
Celebrity  Labelers recognized target independent of doxing (e.g. Donald Trump, Hillary Clinton)  5             1.1%
Total                                                                                             75            16.2%
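
The two account-based criteria in the table lend themselves to a simple rule. The sketch below is illustrative: the site sets contain only the examples named in the table, the function name is hypothetical, and checking "Hacker" before "Gamer" when both apply is an assumption. The "Celebrity" category relied on labeler recognition and is not covered.

```python
# Only the example sites named in the table; real lists would be longer.
HACKING_SITES = {"hackforums.net"}
GAMING_SITES = {"twitch.tv", "minecraftforum.net"}


def categorize_victim(account_sites, threshold=2):
    """Categorize by the sites of the accounts listed in a dox.

    `account_sites` holds one site domain per listed account, so a site can
    appear more than once. Returns "Hacker", "Gamer", or None.
    """
    hacking = sum(1 for site in account_sites if site in HACKING_SITES)
    gaming = sum(1 for site in account_sites if site in GAMING_SITES)
    if hacking >= threshold:
        return "Hacker"
    if gaming >= threshold:
        return "Gamer"
    return None


print(categorize_victim(["twitch.tv", "minecraftforum.net", "facebook.com"]))
print(categorize_victim(["hackforums.net", "hackforums.net"]))
print(categorize_victim(["twitch.tv", "facebook.com"]))
```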

SLIDE 30

Doxing Perpetrators

SLIDE 31

Doxer Motivations

  • Categorization of doxers based on "why I did it" suffixes
  • 28.4% of dox motivations categorizable into 4 categories

Category     Criteria                                                                          # of Labeled  % of Labeled
Competitive  Demonstrating attacker's capabilities / victim's weaknesses                       7             1.5%
Revenge      Because of doxee's actions against doxer (e.g. "you cheated in counterstrike.")   52            11.2%
Justice      Because of doxee's actions against third party (e.g. "you ripped off my friend")  68            14.7%
Political    Because of larger political goal (attacking KKK members or child pornographers)   5             1.1%
Total                                                                                          132           28.4%

SLIDE 32

Doxer Networks

  • Looked for doxer networks based on "credit lines"
  • e.g. "by Alice and @Bob, thx to Charlie (@Charlie for SSN)"
  • 251 aliases given, 213 Twitter handles
  • Undirected graph from doxes and Twitter network

SLIDE 33

Doxer Networks

  • 61 (of 251) aliases appear in cliques of 4 or more
  • 34 Twitter accounts were private
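
To make "cliques of 4 or more" concrete, here is a small sketch that builds an undirected graph from co-credited aliases and lists the aliases that sit in a large clique. It uses the basic Bron-Kerbosch algorithm, chosen here for illustration; the slides do not say how cliques were found. It also assumes edges come only from shared credit lines, whereas the study additionally used the Twitter network. All names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(credit_lists):
    """Connect every pair of aliases credited together on the same dox."""
    graph = defaultdict(set)
    for aliases in credit_lists:
        for a, b in combinations(sorted(set(aliases)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def aliases_in_large_cliques(graph, min_size=4):
    """Aliases that belong to at least one clique of `min_size` or more."""
    members = set()
    for clique in maximal_cliques(graph):
        if len(clique) >= min_size:
            members |= clique
    return members


credit_lists = [["alice", "bob", "carol", "dave"], ["dave", "erin"]]
graph = build_graph(credit_lists)
print(sorted(aliases_in_large_cliques(graph)))
```

The basic recursion is adequate for a graph of a few hundred aliases; a pivoting variant would be the usual choice for much larger graphs.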

SLIDE 34

Harms from Doxing

SLIDE 35

Effects on OSN Accounts

  • 1. Are OSN accounts that are doxed more likely to become more private?
  • 2. Does abuse filtering reduce the impact of doxing on OSN accounts?

Timeline:
  • Summer 2016: First recording period
  • Fall 2016: Facebook and Instagram add abuse filtering
  • Winter 2016: Second recording period

SLIDE 36

Effects on OSN Accounts

  • Measure changes in status of OSN accounts after appearing in dox
  • Public, private, inactive
  • Compared against 13,392 randomly selected Instagram accounts

Social Network  # Doxes  % Doxes
Facebook        983      17.8%
Google+         405      7.3%
Twitter         449      8.1%
Instagram       418      7.5%
YouTube         316      5.7%
Twitch          185      3.4%

SLIDE 37

Doxed vs. Non-Doxed Accounts

Account Condition                % More Private  % More Public  % Any Change  Total #
Instagram default                0.1             0.1            0.2           13,392
Instagram doxed, pre-filtering   17.2            8.1            32.2          87
Instagram doxed, post-filtering  5.7             1.4            9.9           141
Facebook doxed, pre-filtering    22.0            2.0            24.6          191
Facebook doxed, post-filtering   3.0             <0.1           3.3           361
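
The slide does not define the table's columns. As a rough sketch, assuming "more private" means a public account became private, "more public" means the reverse, and "any change" counts every account whose status differs between first and last visit (including transitions involving "inactive"), the percentages could be tallied like this. The function names and sample observations are hypothetical.

```python
from collections import Counter


def classify_change(before, after):
    """Label one account's transition between two recorded statuses."""
    if before == after:
        return "no change"
    if before == "public" and after == "private":
        return "more private"
    if before == "private" and after == "public":
        return "more public"
    return "other change"  # e.g. a transition to or from "inactive"


def change_percentages(observations):
    """observations: list of (status_at_first_visit, status_at_last_visit)."""
    counts = Counter(classify_change(b, a) for b, a in observations)
    total = len(observations)
    changed = total - counts["no change"]
    return {
        "% more private": 100 * counts["more private"] / total,
        "% more public": 100 * counts["more public"] / total,
        "% any change": 100 * changed / total,
    }


sample = [
    ("public", "private"),
    ("public", "public"),
    ("private", "public"),
    ("public", "inactive"),
]
print(change_percentages(sample))
```

Under this reading, "any change" can exceed the sum of the other two columns, which matches the pattern in the table's rows.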

SLIDE 40

Facebook Statuses after Doxing

  • Facebook accounts that changed status, pre-filtering (22.5%)
  • Facebook accounts that changed status, post-filtering (1.7%)

SLIDE 45

Instagram Statuses after Doxing

  • Instagram accounts that changed status, pre-filtering (13.8%)
  • Instagram accounts that changed status, post-filtering (5.0%)

SLIDE 46

Outline

  • Problem area
  • Measurement methodology
  • Results and findings
  • Discussion and conclusions

SLIDE 47

Using Data to Help Victims

  • Notification of doxing victims: "Have I Been Pwned" style service
  • OSN account protection: notify social networks of doxing, for defenses
  • Anti-SWAT-ing list: additional information for law enforcement to evaluate
  • Anti-abuse policies from dox-distributing sites: working with Pastebin to increase automated takedowns

SLIDE 48

Takeaways

  • Automatic dox measurement and classification pipeline
  • 1.7m text files, 4,328 doxes, manual labeling of 464
  • First quantitative analysis of frequency, targets and contents of doxing online
  • Measurement of harm of doxing, via OSN account change
