Link prediction
The link prediction space is vast and imbalanced: practical approaches focus only on the 2-hop social neighborhood, i.e., friends-of-friends. A new source of promising candidates for link prediction: the places visited by each user.
The importance of Place-Friends
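[Figure: pie chart of new links by candidate type. Labels, in the order shown: Friends-of-friends only; Friends-of-friends and place-friends; Place-friends only; Other. Values, in the order shown: 32%, 14%, 16%, 38%.]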
- We have analyzed four monthly snapshots of Gowalla data containing user profiles, friend lists and check-ins.
- We found that about 30% of new links are added among “place-friends”, i.e., users who check in at the same places.
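As a minimal sketch of the place-friends notion described above, the following derives all pairs of users who have checked in at one or more common places. The input format, a list of (user, place) tuples, and the function name place_friends are assumptions made for illustration; they are not taken from the Gowalla data or from the original system.

```python
from collections import defaultdict
from itertools import combinations


def place_friends(checkins):
    """Return the set of unordered user pairs sharing at least one place.

    checkins: iterable of (user, place) tuples (hypothetical input format).
    """
    visitors = defaultdict(set)
    for user, place in checkins:
        visitors[place].add(user)

    pairs = set()
    for users in visitors.values():
        # Every two distinct visitors of the same place are place-friends.
        for u, v in combinations(sorted(users), 2):
            pairs.add((u, v))
    return pairs


# Toy data: repeated check-ins by the same user do not create extra pairs.
checkins = [
    ("ann", "cafe"), ("bob", "cafe"), ("ann", "cafe"),
    ("bob", "gym"), ("cat", "gym"),
]
pf = place_friends(checkins)
```

On real data a very popular place would generate a quadratic number of pairs, so a practical version would likely need to cap or sample such places; that choice is not specified in the slides.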
Reducing the link prediction space
[Figure: number of candidate pairs (log scale, 10M to 10B) in each prediction space: FoF, PF, FoF and PF, Complete.]
Number of candidate pairs in the link prediction spaces: by focusing prediction efforts only on place-friends (PF) and friends-of-friends (FoF), the prediction space can be reduced about 15-fold, while still covering two-thirds of all new links.
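The reduction of the prediction space can be illustrated on a toy graph. This sketch assumes an undirected friendship graph stored as a dict of sets and a precomputed set of place-friend pairs; the names friends_of_friends and complete_space_size, and the toy data, are hypothetical and only show how the candidate space shrinks relative to all unlinked pairs.

```python
from itertools import combinations


def friends_of_friends(friends):
    """Return unordered pairs of unlinked users who share a friend.

    friends: dict mapping each user to the set of their friends
    (assumed undirected: u in friends[v] iff v in friends[u]).
    """
    pairs = set()
    for neighbours in friends.values():
        # Any two friends of the same user are at most two hops apart.
        for u, v in combinations(sorted(neighbours), 2):
            if v not in friends[u]:  # skip pairs that are already linked
                pairs.add((u, v))
    return pairs


def complete_space_size(friends):
    """Number of user pairs that are not yet linked."""
    n = len(friends)
    links = sum(len(f) for f in friends.values()) // 2
    return n * (n - 1) // 2 - links


# Toy graph: a chain a-b-c-d plus an isolated user e.
friends = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set(),
}
place_friend_pairs = {("a", "c"), ("a", "e")}  # assumed given

fof = friends_of_friends(friends)
candidates = fof | place_friend_pairs
reduction = complete_space_size(friends) / len(candidates)
```

Here the complete space has 7 unlinked pairs and the restricted space has 3, so the toy reduction factor is 7/3; the 15-fold figure quoted above comes from the Gowalla data, not from this example.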
The focus theory of social ties
“A focus is a social, psychological, physical or legal entity around which joint activities are organized. [...] individuals whose activities are organized around the same focus will tend to become interpersonally tied. [...] The structure of the social ties is dependent upon the constraint and the size of the underlying foci.”
Scott Feld, “The Focused Organization of Social Ties”, American Journal of Sociology, Vol. 86, No. 5 (1981).
Physical places represent social foci and they correlate with the creation of social ties.
Place properties, focus theory and social ties
Place entropy
$E_k = -\sum_i q_{ik} \log q_{ik}$
where $q_{ik}$ is the fraction of check-ins at place k made by user i.
[Figure: link probability (log scale, 10^-4 to 10^0) versus place entropy in bits (1 to 9), one curve per snapshot (Snapshot 1 to Snapshot 4).]
A place where only a small number of regular users check in is likely to be of significant importance to them, such as a private house, gym or office. A place with sporadic check-ins made by many users is likely to be a public place without great significance to its visitors, such as a tourist attraction, airport or train station.
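The place entropy E_k = -sum_i q_ik log q_ik can be computed directly from the list of users who checked in at a place. The sketch below assumes base-2 logarithms, so that the result is in bits as on the plot axis, and a hypothetical input of one user id per check-in.

```python
import math
from collections import Counter


def place_entropy(checkin_users):
    """Entropy in bits of the check-in distribution at a single place.

    checkin_users: list with one user id per check-in at the place
    (hypothetical input format).
    """
    counts = Counter(checkin_users)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        q = count / total  # q_ik: fraction of check-ins at the place by user i
        entropy -= q * math.log2(q)
    return entropy


# A place with a single regular visitor has zero entropy.
home = place_entropy(["ann", "ann", "ann", "ann"])
# Four users with one check-in each give log2(4) = 2 bits.
station = place_entropy(["ann", "bob", "cat", "dan"])
```

Low values correspond to places dominated by a few regular users, and high values to places whose check-ins are spread over many occasional visitors.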
Prediction features
- Place features (place-friends only)
✦ (weighted) shared places ✦ (weighted) shared check-ins ✦ entropy of shared places ✦ popularity of shared places
- Social features (friends-of-friends only)
✦ shared friends ✦ Jaccard coefficient ✦ Adamic/Adar measure
- Global features (all pairs)
✦ geographic distance ✦ preferential attachment ✦ user activity
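Some of the features listed above have standard textbook definitions, sketched here for a pair of users (u, v) on an undirected friendship graph stored as a dict of sets. The function names and the toy graph are assumptions; the weighted place features are omitted because the slides do not define their weights.

```python
import math


def shared_friends(friends, u, v):
    """Set of common friends of u and v."""
    return friends[u] & friends[v]


def jaccard(friends, u, v):
    """Common friends divided by the size of the union of both friend sets."""
    union = friends[u] | friends[v]
    return len(friends[u] & friends[v]) / len(union) if union else 0.0


def adamic_adar(friends, u, v):
    """Sum of 1 / log(degree) over common friends; rarer friends weigh more."""
    return sum(
        1.0 / math.log(len(friends[z]))
        for z in friends[u] & friends[v]
        if len(friends[z]) > 1  # guard against log(1) = 0
    )


def preferential_attachment(friends, u, v):
    """Product of the two users' degrees."""
    return len(friends[u]) * len(friends[v])


# Toy undirected graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
friends = {
    "a": {"c", "d"}, "b": {"c", "d", "e"}, "c": {"a", "b"},
    "d": {"a", "b", "e"}, "e": {"b", "d"},
}
features = {
    "shared_friends": len(shared_friends(friends, "a", "b")),
    "jaccard": jaccard(friends, "a", "b"),
    "adamic_adar": adamic_adar(friends, "a", "b"),
    "preferential_attachment": preferential_attachment(friends, "a", "b"),
}
```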
System design
We adopt a supervised learning approach to link prediction over three disjoint prediction sets:
- Social: links appearing only among friends-of-friends;
- Place: links appearing only among place-friends;
- Place-social: links appearing among pairs that are both friends-of-friends and place-friends.
We use place features, social features and global features across the three prediction spaces; we then train our models on one set of data and test them on future data.
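The three disjoint prediction sets follow from simple set operations on the friends-of-friends and place-friends candidate pairs. This is a sketch under the assumption that both candidate sets are available as Python sets of unordered pairs; the function name split_prediction_sets is hypothetical.

```python
def split_prediction_sets(fof_pairs, pf_pairs):
    """Partition candidate pairs into three disjoint prediction sets."""
    return {
        "social": fof_pairs - pf_pairs,        # friends-of-friends only
        "place": pf_pairs - fof_pairs,         # place-friends only
        "place-social": fof_pairs & pf_pairs,  # both at once
    }


# Toy candidate sets (hypothetical user ids).
fof_pairs = {("a", "c"), ("b", "d")}
pf_pairs = {("a", "c"), ("a", "e")}
sets = split_prediction_sets(fof_pairs, pf_pairs)
```

Because the sets do not overlap, a separate model can be trained on each one with the feature groups that apply to it.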
Prediction performance: classifiers
[Figure: bar chart of AUC (0.5 to 1) on the Social, Place and Place-social prediction sets for four classifiers: model trees, random forest, J48 and Naive Bayes.]
Area under the ROC curve (AUC) for different classifiers on the three different prediction sets. Results obtained with 10-fold cross-validation.
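For reference, AUC can be read as the probability that a randomly chosen positive pair (a link that forms) receives a higher score than a randomly chosen negative pair, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is a plain illustration of the metric on toy scores, not the evaluation code behind the reported results, and it leaves out the cross-validation loop.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: classifier scores, higher meaning more likely to be a link.
    labels: 1 for pairs that became links, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of examples, so a rank-based formulation would be preferable on prediction sets of realistic size.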
Prediction performance: temporal snapshots
[Figure: AUC (0.8 to 1) in Month 1, Month 2 and Month 3 on the Social, Place and Place-social prediction sets.]
Prediction performance in terms of AUC of model trees on the three separate prediction sets in each temporal snapshot: results obtained by training on one