walk2friends: Inferring Social Links from Mobility Profiles
Yang Zhang, joint work with Michael Backes, Mathias Humbert, and Jun Pang
Location Privacy
- 4 spatial-temporal points can identify 95% of the individuals
- Mobility traces can be effectively de-anonymized
- You are where you go
- Demographics
- Social relations
Social Relation Privacy
- Social relations can be sensitive, e.g., office romance
- 17.2% -> 56.2% (Facebook users in New York)
- NSA’s co-traveler program
Predict whether two users are friends based on the locations they have visited
- Solution 1: common locations two users have visited
- Almost all data mining approaches take this way
- Location entropy
- Can’t work when two users share no common locations
- Solution 2: mobility profiles/features
- Summarize each user’s mobility profiles
- Friends share more similar mobility profiles than strangers
- Feature engineering
- Tedious efforts and domain expert knowledge
- Time consuming
Every Single Time!!!
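For reference, Solution 1 above (common locations, optionally weighted by location entropy) can be sketched as follows. This is a minimal illustration, not the baselines evaluated in the talk: the `checkins` data and the function names are hypothetical, and location entropy is taken here to mean the Shannon entropy of the visitor distribution at a location.

```python
import math
from collections import Counter

def location_entropy(checkins, location):
    """Shannon entropy (natural log) of who visits `location`.
    Low entropy suggests a private place, high entropy a popular one."""
    counts = Counter(user for user, loc in checkins if loc == location)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def common_locations(checkins, user_a, user_b):
    """Set of locations that both users have visited."""
    visited_a = {loc for user, loc in checkins if user == user_a}
    visited_b = {loc for user, loc in checkins if user == user_b}
    return visited_a & visited_b

# Hypothetical check-ins as (user, location) pairs.
checkins = [
    ("alice", "cafe"), ("bob", "cafe"),
    ("alice", "park"), ("bob", "park"), ("carol", "park"),
    ("carol", "museum"), ("dave", "gym"),
]

shared = common_locations(checkins, "alice", "bob")
no_overlap = common_locations(checkins, "alice", "dave")
```

Here `no_overlap` is empty, which is exactly the case where this family of approaches has no signal to work with.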
Representation Learning
- Learning features (representation/deep learning)
- Follow a general objective (unsupervised)
- Graph representation learning (graph embedding)
- Preserve each user’s neighbors in a social network
- Mobility feature learning
Assumption: A user’s mobility neighbors can reflect his mobility profile/features
- Define each user’s mobility neighbors
- Learn mobility features/profiles
- Infer two users’ social relation
Mobility Neighbors
- A user’s mobility neighbors include
- Locations a user has visited
- Others who have visited similar locations and their locations
- Breadth first search
- Not considering the visiting frequencies
- Random walk sampling
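Random walk sampling of mobility neighbors can be sketched as below. This assumes a bipartite user-location graph whose edge weights are check-in counts, so that frequently visited locations are sampled more often; the graph layout, function names, and example data are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import defaultdict

def build_graph(checkins):
    """Weighted bipartite graph: edge weight = number of check-ins."""
    graph = defaultdict(lambda: defaultdict(int))
    for user, location in checkins:
        u, l = ("user", user), ("loc", location)
        graph[u][l] += 1
        graph[l][u] += 1
    return graph

def random_walk(graph, start, length, rng):
    """Walk `length` nodes, picking each next node with probability
    proportional to the edge weight (the visiting frequency)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Hypothetical check-ins as (user, location) pairs.
checkins = [
    ("alice", "cafe"), ("alice", "cafe"), ("alice", "park"),
    ("bob", "cafe"), ("bob", "museum"), ("carol", "museum"),
]
graph = build_graph(checkins)
walk = random_walk(graph, ("user", "alice"), 7, random.Random(42))
```

The nodes that appear near a user in such walks (locations and other users) play the role of that user's mobility neighbors.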
Feature Learning
- Learn a function:
- Use each node to predict its neighbors
- Softmax
$$\theta : U \to \mathbb{R}^d$$
$$\arg\max_{\theta} \prod_{v} \prod_{n \in N(v)} p(n \mid v; \theta)$$
$$p(n \mid v; \theta) = \frac{\exp(\theta(n) \cdot \theta(v))}{\sum_{m} \exp(\theta(m) \cdot \theta(v))}$$
where $N(v)$ denotes the mobility neighbors of node $v$ and $m$ ranges over all nodes.
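The softmax objective on this slide can be illustrated with a tiny example. The sketch assumes a skip-gram style formulation in which the probability of a neighbor given a node is a softmax over dot products of feature vectors; the `theta` values and the (node, neighbor) pairs are made up. Computing the full softmax is expensive on real graphs, so implementations of this kind of objective typically use an approximation such as negative sampling.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neighbor_probability(theta, node, neighbor):
    """Softmax probability of `neighbor` given `node`,
    normalized over every node that has a feature vector."""
    scores = {m: math.exp(dot(theta[m], theta[node])) for m in theta}
    return scores[neighbor] / sum(scores.values())

def log_likelihood(theta, pairs):
    """Log of the product of p(neighbor | node) over all sampled pairs.
    Training would adjust theta to maximize this value."""
    return sum(math.log(neighbor_probability(theta, v, n)) for v, n in pairs)

# Hypothetical 2-dimensional features and (node, neighbor) pairs.
theta = {
    "alice": [0.5, 0.1],
    "bob": [0.4, 0.2],
    "cafe": [0.6, 0.0],
    "museum": [-0.3, 0.7],
}
pairs = [("alice", "cafe"), ("bob", "cafe"), ("bob", "museum")]
objective = log_likelihood(theta, pairs)
```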
Social Relation Inference
[Figure: example similarity scores s(u, v) for six user pairs: 0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
- Cosine similarity
- Unsupervised
- Predict any social relation
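The inference step can be sketched as ranking user pairs by the cosine similarity of their learned features. The feature vectors and function names below are hypothetical; in practice the vectors would come from the feature learning step.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_pairs(features):
    """All user pairs, most similar first. No labels are needed."""
    scored = [
        ((u, v), cosine_similarity(features[u], features[v]))
        for u, v in combinations(sorted(features), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical learned features.
features = {
    "alice": [0.9, 0.1],
    "bob": [0.8, 0.2],
    "carol": [0.1, 0.9],
}
ranked = rank_pairs(features)
```

Because the score is defined for any two users with features, a pair can be scored even if the two users share no common locations.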
Evaluation: dataset
- Instagram users’ check-ins
- New York, Los Angeles and London
- Foursquare (location semantics)
- Social relations (two users follow each other)
Evaluation: ROC curve
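The ROC curve is summarized by the AUC used in the following evaluation slides. A minimal sketch of that metric, under the standard interpretation of AUC as the probability that a randomly chosen friend pair scores higher than a randomly chosen stranger pair: the scores reuse the example similarity values from the inference slide, while the friend labels are invented for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.
    labels: 1 for friend pairs, 0 for stranger pairs. Ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]  # hypothetical ground truth
value = auc(scores, labels)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of friends above strangers.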
Evaluation: distance metric
[Figure: AUC (0.00 to 0.80) in New York, Los Angeles, and London for each distance metric: Cosine, Euclidean, Correlation, Chebyshev, Bray-Curtis, Canberra, Manhattan]
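For readers unfamiliar with the compared metrics, the sketch below writes out their textbook definitions in plain Python. All functions return a distance (smaller means more similar), so a similarity score for ranking would be the negated or inverted value; the example vectors are arbitrary.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return 1.0 - _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation(a, b):
    # Cosine distance after centering each vector on its mean.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(abs(x + y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms where both coordinates are zero are skipped.
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if x != 0 or y != 0)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
distances = {f.__name__: f(a, b) for f in
             (cosine, euclidean, correlation, chebyshev, bray_curtis, canberra, manhattan)}
```

The two example vectors point in the same direction, so the cosine and correlation distances are zero while the magnitude-sensitive metrics are not.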
Evaluation: baseline models
[Figure: AUC (0.00 to 0.80) in New York, Los Angeles, and London for our approach and the baselines common_p, overlap_p, w_common_p, w_overlap_p, aa_ent, min_ent, aa_p, min_p, geodist, w_geodist, pp, diversity, w_frequency, personal]
Evaluation: hyperparameters
[Figure: three panels showing AUC (0.70 to 0.82) in New York, Los Angeles, and London as a function of lw (10 to 100), tw (2 to 20), and log2(d) (4 to 8)]
Evaluation: check-in numbers
[Figure: AUC (0.71 to 0.83) versus number of check-ins (5 to 30) in New York, Los Angeles, and London]
Evaluation: common locations
[Figure: AUC (0.66 to 0.82) versus number of common locations (1 to 4) in New York, Los Angeles, and London]
Evaluation: geo-coordinates
[Figure: AUC (0.55 to 0.83) versus grid granularity in degrees (10^-3 to 10^-1) in New York, Los Angeles, and London]
Defense Mechanisms
- Hiding
- Delete a certain proportion of check-ins
- Replacement
- Random walk to replace locations
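Hiding and replacement can be sketched as below. Several details are assumptions made for illustration: check-ins to obfuscate are chosen uniformly at random, and replacement starts a random walk at the user's node in a user-location graph and takes an odd number of steps so that it ends on a location. The function names and data are hypothetical.

```python
import random
from collections import defaultdict

def hide(checkins, proportion, rng):
    """Delete a random `proportion` of the check-ins."""
    keep = len(checkins) - round(proportion * len(checkins))
    kept = sorted(rng.sample(range(len(checkins)), keep))
    return [checkins[i] for i in kept]

def build_graph(checkins):
    """Adjacency lists; repeated check-ins appear as repeated entries."""
    graph = defaultdict(list)
    for user, loc in checkins:
        graph[("user", user)].append(("loc", loc))
        graph[("loc", loc)].append(("user", user))
    return graph

def replace(checkins, proportion, steps, rng):
    """Replace a random `proportion` of check-in locations with the
    location reached after `steps` random-walk steps from the user."""
    if steps % 2 == 0:
        raise ValueError("steps must be odd to end on a location node")
    graph = build_graph(checkins)
    chosen = set(rng.sample(range(len(checkins)), round(proportion * len(checkins))))
    result = []
    for i, (user, loc) in enumerate(checkins):
        if i in chosen:
            node = ("user", user)
            for _ in range(steps):
                node = rng.choice(graph[node])
            loc = node[1]
        result.append((user, loc))
    return result

checkins = [
    ("alice", "cafe"), ("alice", "park"), ("bob", "cafe"),
    ("bob", "museum"), ("carol", "museum"), ("carol", "gym"),
]
hidden = hide(checkins, 0.5, random.Random(0))
replaced = replace(checkins, 0.5, 5, random.Random(0))
```

Hiding shrinks the dataset, while replacement keeps the number of check-ins per user unchanged and only alters where they point.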
- Generalization
- Geo-coordinate and location semantics
- MoMA -> art (40.76N, 73.97W)
- Recover location first
- art (40.76N, 73.97W) -> MoMA or Tom Otterness Frog?
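Generalization and the recovery step can be sketched as follows. The venue names, categories, and coordinates are hypothetical placeholders, and snapping coordinates to a grid cell is one assumed way of coarsening geo-coordinates. The example shows why recovery is ambiguous: two venues with the same category in the same cell map to the same generalized check-in.

```python
import math

# Hypothetical venues: name -> (category, latitude, longitude).
VENUES = {
    "gallery_a": ("art", 40.7612, -73.9775),
    "gallery_b": ("art", 40.7618, -73.9771),
    "cafe_c": ("food", 40.7712, -73.9741),
}

def to_cell(lat, lon, granularity):
    """Index of the grid cell containing the point (granularity in degrees)."""
    return (math.floor(lat / granularity), math.floor(lon / granularity))

def generalize(venue, granularity):
    """Publish only the category and the coarse grid cell."""
    category, lat, lon = VENUES[venue]
    return (category, to_cell(lat, lon, granularity))

def recover(generalized, granularity):
    """Candidate venues an attacker could map the generalized check-in to."""
    return sorted(
        name for name in VENUES
        if generalize(name, granularity) == generalized
    )

published = generalize("gallery_a", 0.01)
candidates = recover(published, 0.01)
```

With a grid of 0.01 degrees, `candidates` contains both galleries, so the attacker has to guess which one was actually visited.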
Utility Metric
- Each user’s check-in distribution
- Both original and obfuscated
- Jensen-Shannon divergence
- Average over all users
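The utility computation can be sketched as below, assuming each user's check-ins are summarized as a distribution over locations and that every user still has at least one check-in after obfuscation. Base-2 logarithms keep the Jensen-Shannon divergence in [0, 1]. How the divergence is turned into the final utility score is not stated on the slide, so the sketch stops at the average divergence; the data is hypothetical.

```python
import math
from collections import Counter

def distribution(locations):
    """Relative visit frequency of each location."""
    counts = Counter(locations)
    total = sum(counts.values())
    return {loc: c / total for loc, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl(a):
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def average_divergence(original, obfuscated):
    """Mean per-user divergence between original and obfuscated check-ins."""
    return sum(
        js_divergence(distribution(original[u]), distribution(obfuscated[u]))
        for u in original
    ) / len(original)

original = {"alice": ["cafe", "cafe", "park"], "bob": ["gym"]}
obfuscated = {"alice": ["cafe", "cafe", "park"], "bob": ["bar"]}
result = average_divergence(original, obfuscated)
```

A divergence of 0 means the obfuscated check-ins preserve a user's distribution exactly, and 1 means the two distributions share no locations.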
Defense Evaluation
[Figure: two panels showing AUC (0.52 to 0.80) and utility (0.00 to 1.00) versus proportion of obfuscation (10% to 90%) for hiding and for replacement with step 5, 15, 25, and 35]
Defense Evaluation
[Figure: utility (0.00 to 1.00) plotted against AUC (0.50 to 0.80) for hiding, replacement, and generalization]
Conclusion
- A new social relation inference attack with mobility profiles
- Learning user profiles
- Unsupervised; can predict any social relation
- Three general defense mechanisms
- Replacement and hiding outperform generalization