Tagvisor: A Privacy Advisor for Sharing Hashtags (Yang Zhang et al.), presentation slides

SLIDE 1

Tagvisor: A Privacy Advisor for Sharing Hashtags

Yang Zhang

Joint work with Mathias Humbert, Tahleen Rahman, Cheng-Te Li, Jun Pang and Michael Backes

SLIDE 2

#hashtag

SLIDE 3

#hashtag

SLIDE 4

#hashtag

SLIDE 5

#hashtag

SLIDE 6

#hashtag

#like4like #foodporn #tbt

SLIDE 7

#hashtag

#privacy #locationprivacy

SLIDE 8

#contributions

  • Attack: location inference with hashtags
  • Defense: Tagvisor, a privacy advisor to mitigate the privacy threat posed by hashtags

SLIDE 9

#dataset

  • Collected through Instagram’s APIs
  • New York, Los Angeles, and London
  • Hashtags + locations (check-ins)

SLIDE 10

#attack

  • Bag-of-words for feature representation
  • Random forest classifier
  • Multi-class classification, e.g., 498 classes (locations) in New York
  • A single classifier is trained on all posts together

Example with vocabulary (#a, #b, #c, #d): #a#b#c -> [1, 1, 1, 0], #b#c -> [0, 1, 1, 0], #a#d -> [1, 0, 0, 1]
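
The bag-of-words encoding in the example above can be sketched in Python as follows. This is a minimal illustration, assuming binary presence features over a sorted vocabulary collected from the posts. `build_vocabulary` and `encode` are hypothetical helper names, not the authors' code. The random forest step appears only as a comment, because it would normally come from a library such as scikit-learn.

```python
def build_vocabulary(posts):
    """Collect every distinct hashtag across all posts, in sorted order."""
    return sorted({tag for post in posts for tag in post})


def encode(post, vocabulary):
    """Binary bag-of-words: 1 if the hashtag occurs in the post, else 0."""
    present = set(post)
    return [1 if tag in present else 0 for tag in vocabulary]


posts = [["#a", "#b", "#c"], ["#b", "#c"], ["#a", "#d"]]
vocab = build_vocabulary(posts)  # ['#a', '#b', '#c', '#d']
features = [encode(p, vocab) for p in posts]
print(features)  # [[1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]

# With scikit-learn installed, these rows (one per post) and the posts'
# location labels could be passed to
# sklearn.ensemble.RandomForestClassifier().fit(features, labels).
```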

SLIDE 11

#attack

SLIDE 12

#attack

SLIDE 13

#tagvisor

  • A privacy advisor for sharing hashtags
  • Fool the attacker’s location inference model (an ML classifier)
  • Three defense mechanisms
      • Hiding
      • Replacement
      • Generalization (location category)
  • Utility: preserving the semantic meaning of hashtags

SLIDE 14

#hiding

Delete one hashtag (or more) from a successfully attacked post, e.g., #a#b#c:
  • hide #a -> #b#c
  • hide #b -> #a#c
  • hide #c -> #a#b
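
Here is a minimal sketch of enumerating hiding candidates. It assumes a post is a list of hashtags and that exactly `k` hashtags are hidden at a time. `hiding_candidates` is a hypothetical helper. In Tagvisor, each candidate would still have to be checked against the attacker's classifier.

```python
from itertools import combinations


def hiding_candidates(post, k=1):
    """Yield (hidden, remaining) pairs for every way of hiding k hashtags."""
    for hidden_idx in combinations(range(len(post)), k):
        hidden = [post[i] for i in hidden_idx]
        remaining = [t for i, t in enumerate(post) if i not in hidden_idx]
        yield hidden, remaining


for hidden, remaining in hiding_candidates(["#a", "#b", "#c"]):
    print("hide", "".join(hidden), "->", "".join(remaining))
# hide #a -> #b#c
# hide #b -> #a#c
# hide #c -> #a#b
```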

SLIDE 15

#utility

  • Semantic meaning
  • Skip-gram (word2vec)
  • Skip-gram trained over the hashtags of all posts

Example hashtag vectors (dimensions d1, d2): #a: [3.1, 1.3], #b: [2.5, 1.9], #c: [4.0, 5.1]. (Figure: the posts #a#b#c, #a#c and #a#b shown in the same d1-d2 space.)
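
The slides do not say how hashtag vectors are turned into a utility score. The sketch below shows one plausible scheme, using the toy vectors above. It assumes a post is represented by the mean of its hashtag vectors, and that utility is the cosine similarity between the original post and the obfuscated one. Both choices are assumptions, not taken from the slides.

```python
import math

# Toy 2-D hashtag vectors from the slide's example; real skip-gram
# vectors would have many more dimensions.
VECTORS = {"#a": [3.1, 1.3], "#b": [2.5, 1.9], "#c": [4.0, 5.1]}


def post_vector(post, vectors=VECTORS):
    """Assumed post representation: the mean of its hashtag vectors."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[t][d] for t in post) / len(post) for d in range(dims)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def utility(original, obfuscated):
    """Assumed utility: cosine similarity of the two post vectors."""
    return cosine(post_vector(original), post_vector(obfuscated))


original = ["#a", "#b", "#c"]
for candidate in (["#b", "#c"], ["#a", "#c"], ["#a", "#b"]):
    print("".join(candidate), round(utility(original, candidate), 4))
```

Under these assumptions, #a#c (hiding #b) scores highest and #a#b (hiding #c) scores lowest. Hiding #b would therefore change the post's meaning least.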

SLIDE 16

#replacement

  • Replace each hashtag with every possible hashtag
      • The search space is too big
  • Restrict the candidates to the closest hashtags (by word2vec similarity)
      • Reduces the search space
      • Semantic meaning can be preserved

(Figure: the post #a#b#c is successfully attacked.)
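
Bounding shrinks the search space considerably. Suppose the vocabulary has V hashtags and a post has n. Replacing every hashtag with any hashtag gives up to V^n combinations. Keeping each hashtag or swapping it for one of its k nearest neighbors gives at most (k + 1)^n. The sketch below illustrates the nearest-neighbor restriction. It assumes cosine similarity over word2vec-style vectors and replaces one hashtag at a time. The vectors and helper names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def nearest_hashtags(tag, vectors, k):
    """Return the k hashtags whose vectors are most cosine-similar to `tag`."""
    others = [t for t in vectors if t != tag]
    others.sort(key=lambda t: cosine(vectors[tag], vectors[t]), reverse=True)
    return others[:k]


def replacement_candidates(post, vectors, k):
    """Replace one hashtag at a time with one of its k nearest neighbors."""
    for i, tag in enumerate(post):
        for substitute in nearest_hashtags(tag, vectors, k):
            if substitute not in post:  # avoid duplicate hashtags in a post
                yield post[:i] + [substitute] + post[i + 1:]


# Hypothetical toy embeddings; real ones would come from skip-gram training.
vectors = {
    "#centralpark": [1.0, 0.0],
    "#park": [0.9, 0.1],
    "#nyc": [0.7, 0.7],
    "#food": [0.0, 1.0],
}
print(nearest_hashtags("#centralpark", vectors, k=2))  # ['#park', '#nyc']
```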

SLIDE 17

#generalization

  • Location category from Foursquare
  • #centralpark -> #park
  • Does not apply to all hashtags
      • e.g., #tbt, #love
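
Generalization amounts to a lookup from location-specific hashtags to category hashtags. Hashtags with no location meaning are left untouched. Here is a minimal sketch, assuming such a mapping has already been built. The `CATEGORY_OF` table is hypothetical and is not actual Foursquare data.

```python
# Hypothetical mapping from location-specific hashtags to location-category
# hashtags (in Tagvisor, categories come from Foursquare).
CATEGORY_OF = {"#centralpark": "#park", "#hydepark": "#park"}


def generalize(post, category_of=CATEGORY_OF):
    """Replace location-specific hashtags by their category; keep the rest."""
    return [category_of.get(tag, tag) for tag in post]


print(generalize(["#centralpark", "#tbt", "#love"]))  # ['#park', '#tbt', '#love']
```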

SLIDE 18

#tagvisor

  • Check whether the post’s location is inferred correctly
  • If not, publish the post as is
  • Otherwise, apply the three defense mechanisms
  • Among the resulting hashtag sets that fool the attacker, pick the one with the highest utility
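
The decision procedure above can be sketched as a small function. This is an illustration under stated assumptions, not the authors' implementation. `predict`, the mechanism generators, and `utility` are hypothetical callables supplied by the caller. A candidate counts as safe when the attacker's prediction no longer matches the true location.

```python
from typing import Callable, Iterable, List, Optional

Post = List[str]


def tagvisor(
    post: Post,
    true_location: str,
    predict: Callable[[Post], str],
    mechanisms: Iterable[Callable[[Post], Iterable[Post]]],
    utility: Callable[[Post, Post], float],
) -> Optional[Post]:
    """Return the hashtag set to publish, or None if no candidate is safe."""
    # Step 1: if the attacker already misses the location, publish unchanged.
    if predict(post) != true_location:
        return post
    # Step 2: gather candidates from every mechanism; keep those that fool
    # the attacker.
    safe = [
        candidate
        for mechanism in mechanisms
        for candidate in mechanism(post)
        if predict(candidate) != true_location
    ]
    # Step 3: among the safe candidates, pick the one with the highest utility.
    return max(safe, key=lambda c: utility(post, c), default=None)


# Toy attacker (hypothetical): predicts "central_park" whenever #centralpark appears.
def toy_predict(p: Post) -> str:
    return "central_park" if "#centralpark" in p else "unknown"


def hide_one(p: Post) -> List[Post]:
    return [p[:i] + p[i + 1:] for i in range(len(p))]


def generalize(p: Post) -> List[Post]:
    return [["#park" if t == "#centralpark" else t for t in p]]


def toy_utility(original: Post, candidate: Post) -> float:
    # Stand-in utility: prefer candidates that keep more hashtags.
    return len(candidate)


print(tagvisor(["#centralpark", "#tbt"], "central_park", toy_predict,
               [hide_one, generalize], toy_utility))  # ['#park', '#tbt']
```

Returning `None` when no candidate is safe is a design choice of this sketch. A real advisor might instead warn the user or try obfuscating more hashtags.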

SLIDE 19

#tagvisor

Obfuscating 2 hashtags is enough!

(Figure: obfuscating a bounded number of hashtags)

SLIDE 20

#conclusion

  • First location inference attack with hashtags
  • Sharing hashtags is not safe!!!
  • A privacy advisor to mitigate this risk
  • Minimal risk and maximal utility
  • Fit for the real-world setting

#thankyou

https://yangzhangalmo.github.io/ @yangzhangalmo