1
Social Network Analysis
mohamed.bouguessa@uqo.ca
2
Web today
3
Web today: Diverse applications
4
Web today: Millions of users
5
Web today: Rich content
6
Web today: Highly dynamic
7
Web today: Traces of activity
8
Rich interactions between users and content
9
Rich interactions between users and content
Modeled as an interaction network
10
We can all be connected through a series of six contacts appeals to me. It makes the world seem less brutal, and more warm and more friendly.
Six degrees of separation
11
Six degrees of separation
12
Testing the small-world hypothesis
MSN Messenger: network of who talks to whom, with 240M nodes and 1.3 billion edges
Average path length is 6.6
90% of nodes are reachable in fewer than 8 steps
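The average path length quoted above is the mean shortest-path distance between pairs of connected nodes. A minimal sketch of that measurement, assuming an undirected network stored as an adjacency dictionary; the toy network and the function names are hypothetical and have nothing to do with the MSN Messenger data:

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        for target, hops in bfs_distances(adj, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy "who talks to whom" network (undirected).
toy = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
apl = average_path_length(toy)
print(apl)
```

Running one breadth-first search per node is only practical for small graphs; on a network with hundreds of millions of nodes the average would normally be estimated from a sample of source nodes.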
13
Why study networks?
– How do users create content and interact with it and among themselves?
– How to design better services and algorithms?
14
(directly or indirectly) to each other through a common relation or interest.
networks to understand their structure and behavior.
Social Networks Analysis
15
Social Networks
interacting units.
16
Social Networks
Interacting units: actors / nodes (discrete individual, corporate, or collective social units)
17
Relations, linkages or ties: relational ties between actors are channels for the transfer, exchange, or flow of resources.
Social Networks
18
Social Networks
– Adjacency matrix (socio-matrix)
– Graph (socio-graph)
[Figure: adjacency matrix with rows and columns labeled 1 to 9]
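The two representations listed above carry the same information. A minimal sketch of building a socio-matrix from a list of ties, assuming undirected ties; the four-node example is hypothetical and is not the 9-node matrix from the slide:

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 socio-matrix from a list of undirected ties."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected tie: mirror the entry
    return matrix


# Hypothetical actors and ties.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (3, 4)]
socio_matrix = adjacency_matrix(nodes, edges)
for row in socio_matrix:
    print(row)
```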
19
– Social Interaction
– Knowledge Exchange
– Knowledge Discovery
– different about various types of social interactions
– at a very fine granularity
– with practically no reporting bias
Data mining techniques can be used for building descriptive and predictive models of social interactions
Key Drivers for CS Research in SNA
20
SNA Techniques
Prominent problems
21
network data on the web
threads etc.)
network service websites such as Orkut, Friendster and MySpace)
[Figure: sources of social network data: Web documents, communication logs, and actor profiles (Profile_1 to Profile_5) in a social network service]
Social Network Extraction
22
Social Network Extraction
– Asking people about their relations
– Tracking their contacts (emails, phone calls, visits, etc.), such as the Enron project
– Mining their contextual data (papers, interviews, resumes, news, biographies, citations, references, web pages, blogs, portfolios, etc.) → Learning social network
23
Learning Social Networks
– Descriptive vs. predictive model
– We only predict the possible relations between the actors
24
Learning Social Networks
Usually, we can reach documents by knowing people…
25
Learning Social Networks
…and directly or indirectly we will know
through these documents…
26
…and very soon we will have a social network including some individuals who have been connected to each other via some similar contents.
Learning Social Networks
27
SNA Techniques
Prominent problems
28
Identifying prominent expert actors in social networks
Link Analysis Technique
29
number of edges pointing towards it.
number of nodes.
Hubs and Authorities
30
\[ a(p) = \sum_{q \to p} h(q) \]
\[ h(p) = \sum_{p \to q} a(q) \]
Hubs and Authorities
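The two update rules, a(p) as the sum of h(q) over nodes q pointing to p and h(p) as the sum of a(q) over nodes q that p points to, can be iterated until the scores settle. A minimal sketch, assuming a directed edge list and a rescaling of both score vectors to unit length after each round (the rescaling and the iteration count are common choices, not taken from the slides); the edges are hypothetical:

```python
import math


def hits(edges, iterations=50):
    """Iterate the hub/authority updates on a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # a(p): sum of hub scores of the nodes q that point to p.
        auth = {p: sum(hub[q] for q, target in edges if target == p) for p in nodes}
        # h(p): sum of authority scores of the nodes q that p points to.
        hub = {p: sum(auth[q] for source, q in edges if source == p) for p in nodes}
        # Rescale both vectors to unit Euclidean length so the scores stay bounded.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {node: v / auth_norm for node, v in auth.items()}
        hub = {node: v / hub_norm for node, v in hub.items()}
    return hub, auth


# Hypothetical edges: u1, u2, u3 point to x, and u1 also points to y.
edges = [("u1", "x"), ("u2", "x"), ("u3", "x"), ("u1", "y")]
hub, auth = hits(edges)
print(sorted(auth.items(), key=lambda item: -item[1]))
```

In this toy graph x collects more in-links than y, so it ends up with the higher authority score, and u1 is the strongest hub because it points to both.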
31
PageRank values evenly to all the nodes it connects to.
high.
and that where a node has a few highly ranked in-links.
Google’s PageRank
32
How is PageRank calculated?
\[ PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right) \]
C(Ti): the number of out-links of the page/node Ti
That's the equation that calculates a page's PageRank. It's the
developed, and it is probable that Google uses a variation of it but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.
Google’s PageRank
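The PageRank equation can be applied repeatedly, starting from PR = 1 for every node, until the values stop changing. A minimal sketch under stated assumptions: the damping factor d = 0.85 and the iteration count are conventional values not given on the slide, every node in the hypothetical graph has at least one out-link, and the function name is illustrative:

```python
def pagerank(out_links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the nodes T linking to A."""
    pr = {node: 1.0 for node in out_links}
    for _ in range(iterations):
        new_pr = {}
        for node in out_links:
            # Each node T that links here passes on PR(T) split evenly over its C(T) out-links.
            incoming = sum(
                pr[t] / len(out_links[t]) for t in out_links if node in out_links[t]
            )
            new_pr[node] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B to C, C to A, D to C.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

With this form of the equation the scores sum to the number of nodes as long as no node lacks out-links; nodes without out-links would need separate handling.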
33
PR(A) = PR(B) = PR(C) = PR(D) = 1
PR(A) > PR(B) > PR(C) > PR(D)
Google’s PageRank
34
[Figure: example graphs on nodes A, B, C, D and on nodes A, B, C, D, E]
Google’s PageRank
35
Yahoo! Answers: Identifying the expert
[Figure labels: Question; Answer_1 … Answer_n by User_1 … User_n; User Votes from User_x, User_y, User_z]
service
36
Yahoo! Answers
37
Question Life Cycle
38
[Figure: example of interactions between askers and best answerers. Legend: users who usually only ask questions; users who usually only answer questions; users who help each other]
Yahoo! Answers
How to estimate the authority degree for each user?
39
PageRank?
Example: The category of “Programming”
– Is it possible to state that C is more expert than B?
[Figure labels: A, B, C; JAVA, PHP]
40
HITS?
nodes 1, 2, 10, 11 and 12, but a near-zero authority score to node
41
HITS?
while giving zero authority score to node N1.
its hub score. Hence, causing the nodes N9–N15 to receive higher authority scores. However, intuition suggests that node N1 is the most authoritative since it represents an answerer with a large number of best answers.
42
Proposed Approach
answers of each user, normalized so that their squares sum to 1:
\[ \sum_{i=1}^{N} (y_i)^2 = 1 \]
category.
– We are interested in all sets of Ui having large values of yi.
43
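The normalization on the "Proposed Approach" slide, where the values y_i are scaled so that their squares sum to 1, amounts to dividing each value by the Euclidean norm of the whole vector. A minimal sketch, assuming the raw quantity is a per-user count of best answers (the slide text is truncated, so this reading is an assumption) and using hypothetical users and counts:

```python
import math


def normalize_unit_square_sum(counts):
    """Scale the values so that the sum of their squares equals 1."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {user: 0.0 for user in counts}
    return {user: c / norm for user, c in counts.items()}


# Hypothetical best-answer counts for three users in one category.
best_answers = {"U1": 30, "U2": 40, "U3": 0}
y = normalize_unit_square_sum(best_answers)
print(y)
```

Users with large normalized values y_i are the candidates for authoritative users.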
Authority Score
44
Authority Score
45
Automatic Identification of Authorities
46
Experiments
We conduct experiments on datasets which represent users’ activities over one full year for six categories:
Category      % users who ask only   % users who answer only   % users who ask and answer
Engineering   65%                    31%                       4%
Biology       60%                    36%                       4%
Programming   66%                    29%                       5%
Mathematics   64%                    31%                       5%
Physics       60%                    34%                       6%
Chemistry     63%                    32%                       5%
47
Authoritative Users
48
Quality of Content
[Figure: average quality score (axis from 0.1 to 1) by category: Biology, Chemistry, Engineering, Mathematics, Physics, Programming]
content in Yahoo! Answers.
49
SNA Techniques
Prominent problems
50
Existing Approaches
perform spam classification.
characterize the activities of each user in the network.
learn the behavior of spammers and legitimate users.
51
Why a new approach?
– Existing approaches suffer from their dependency on the training data.
– A robust learning approach for spammer detection often requires large amounts of labeled training data.
– The problem here is that labeled samples are more difficult, expensive and time consuming to obtain than unlabeled ones.
52
Why a new approach?
challenges facing current spammer detection methods in social networks.
– There is good reason to focus on unsupervised approaches to explore the vast amount of unlabeled data available at low cost.
53
Proposed Approach
to discriminate between spammers and legitimate users.
score within a network
based on the beta mixture model to identify spammers
54
Interactions in Social Networks
similar pattern such that when a user receives a message from a legitimate sender, he/she will, generally, reply to this message.
receive few or no replies back from their recipients.
55
Communication Reciprocity
message from a user ui
message to ui.
56
Communication Reciprocity
response from each of its neighbors.
– Legitimate accounts have a much higher probability of being responded to.
– Spammers have a very low response rate: CR value close to zero
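The exact CR formula did not survive in the slide text, so the following is only a sketch under one plausible reading: the communication reciprocity of a user is taken to be the fraction of that user's recipients who sent at least one message back. The function name, the message log format, and the accounts are all hypothetical:

```python
def communication_reciprocity(messages):
    """Return, per sender, the fraction of recipients who sent at least one message back.

    messages: iterable of (sender, recipient) pairs.
    """
    sent = {}
    for sender, recipient in messages:
        sent.setdefault(sender, set()).add(recipient)
    cr = {}
    for user, recipients in sent.items():
        # A recipient counts as having responded if the user appears among their own recipients.
        replied = sum(1 for r in recipients if user in sent.get(r, set()))
        cr[user] = replied / len(recipients)
    return cr


# Hypothetical log: alice, bob and carol exchange mail; "spam" writes to all and gets no reply.
log = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "alice"),
    ("spam", "alice"), ("spam", "bob"), ("spam", "carol"),
]
cr = communication_reciprocity(log)
print(cr)
```

Under this reading the legitimate accounts score close to 1 and the account that never receives a reply scores 0, which matches the behavior described on the slide.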
57
The Probabilistic Model
coming from several underlying probability distributions.
mixture model representing users with close CR values
comprehensive model by a mixture form.
58
model due to its shape flexibility
The Probabilistic Model
59
The Probabilistic Model
a mixture density of the form
given by
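The formulas on this slide were lost in extraction. As a rough illustration only, the sketch below evaluates a standard finite beta mixture, that is, a weighted sum of beta densities, and the posterior probability that a CR value belongs to each component. The weights and shape parameters are hypothetical and are not the fitted values from the study:

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x, for 0 < x < 1, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def mixture_pdf(x, weights, params):
    """Density of the mixture: weighted sum of the component beta densities."""
    return sum(w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params))


def responsibilities(x, weights, params):
    """Posterior probability of each component given the observation x."""
    parts = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical two-component mixture: component 0 concentrates near 0 (low CR),
# component 1 concentrates near 1 (high CR).
weights = [0.3, 0.7]
params = [(1.0, 8.0), (6.0, 2.0)]
low = responsibilities(0.05, weights, params)
high = responsibilities(0.9, weights, params)
print(low, high)
```

The density is only defined on the open interval (0, 1), so CR values of exactly 0 or 1 would have to be nudged inward before evaluation.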
60
The Probabilistic Model
components?
Maximum likelihood technique
61
The Probabilistic Model
parameters of the mixture for a range of values
the value of m which optimizes the criterion.
62
The Probabilistic Model
Integrated Classification Likelihood - Bayesian Information Criterion (ICL-BIC)
63
Procedure
64
Experiments: Data Sets
network of the University at Rovira i Virgili (URV) in Tarragona, Spain
– 1,133 users (faculty, researchers, etc.)
– all legitimate users
– the URV email network includes no spam sender
– Transactions from spammers are therefore simulated by generating mock spam accounts in the URV email network data
65
Data Sets
spam accounts in order to inject spam traffic into the URV data set of legitimate senders.
– Data200, Data400, Data600, Data800, Data1000, and Data1200, containing 200, 400, 600, 800, 1,000 and 1,200 spam accounts respectively,
each set.
66
Comparison
supervised learning approach for detecting spammers on email social networks.
– MailNet first extracts a number of features from each user in the email social network.
– Then, support vector machine learning is used to perform spam classification
67
Results
which are close to zero.
that they sent).
replies to most of the messages they send
68
Results
69
Results
the SVM-based approach MailNet.
unlabeled data, a considerable practical advantage for real-world applications in which class labels are not available.
70
Application to Yahoo! Answers
71
Question Life Cycle
72
capability where users can connect and make friends with other members who share similar interests.
consisting of 167,455 Yahoo! Answers user accounts, using our approach to identify spammers
Yahoo! Answers Social Network
73
Results
– We found that 32,724 accounts could be classified as spam.
74
Quality of Content
by users identified as spammers and compared it to the quality of content generated by users classified as non-spammers.
relatively good-quality content
75
generated by the identified authoritative users we use the algorithm of Agichtein et al. (WSDM 2008).
textual content with the user feedback on the site in
answer.
classifier trained on high and low quality examples.
value of the quality score is close to 1
Quality of Content
76
Quality of Content
quality content than do spammers.
77
SNA Techniques
Prominent problems
78
Community Structure in Social Network
Non-Sybil Region Sybil Region
79
Graph Clustering
80
Algorithms based on Czekanovski-Dice Distance
\[ \mathrm{dist}(N_1, N_2) = \frac{|S_1 \cup S_2| - |S_1 \cap S_2|}{|S_1 \cup S_2| + |S_1 \cap S_2|} \]
Distance between two nodes
S1: set of nodes connected to N1 (including N1)
S2: set of nodes connected to N2 (including N2)
Small distance → High similarity
81
Czekanovski-Dice Distance
[Figure: example graph on nodes N1 to N6]
S1 = {N1, N2, N3}
S2 = {N2, N1, N3}
\[ \mathrm{dist}(N_1, N_2) = \frac{|S_1 \cup S_2| - |S_1 \cap S_2|}{|S_1 \cup S_2| + |S_1 \cap S_2|} = \frac{3 - 3}{3 + 3} = 0 \]
S3 = {N3, N1, N2, N4}
S4 = {N4, N3, N5, N6}
\[ \mathrm{dist}(N_3, N_4) = \frac{|S_3 \cup S_4| - |S_3 \cap S_4|}{|S_3 \cup S_4| + |S_3 \cap S_4|} = \frac{6 - 2}{6 + 2} = 0.5 \]
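The two worked distances can be checked with a few lines of code. A minimal sketch, assuming the graph is stored as an adjacency dictionary; the neighbor lists of N1 to N4 follow the sets S1 to S4 given on the slide, while the neighbors of N5 and N6 are not listed there and are assumed to be N4 only:

```python
def czekanovski_dice(adj, n1, n2):
    """Distance between two nodes from their closed neighborhoods (node plus its neighbors)."""
    s1 = set(adj[n1]) | {n1}
    s2 = set(adj[n2]) | {n2}
    union = len(s1 | s2)
    inter = len(s1 & s2)
    return (union - inter) / (union + inter)


adj = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N3"],
    "N3": ["N1", "N2", "N4"],
    "N4": ["N3", "N5", "N6"],
    "N5": ["N4"],  # assumed: the slide does not list the neighbors of N5
    "N6": ["N4"],  # assumed: the slide does not list the neighbors of N6
}
d12 = czekanovski_dice(adj, "N1", "N2")
d34 = czekanovski_dice(adj, "N3", "N4")
print(d12, d34)
```

N1 and N2 share exactly the same neighborhood, so their distance is 0; N3 and N4 share only two of the six nodes in their combined neighborhoods, which gives 0.5.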
82
Czekanovski-Dice Distance
[Figure panels: (a) Graph, (b) Similarity matrix, (c) Dendrogram, (d) Clustering]
83
Application
The Santa Fe Institute collaboration network
84
Application
Enron email network
85
Discovering Knowledge-Sharing Communities in Question-Answering Forums
86
Knowledge-Sharing Community
set of askers and authoritative users.
homogeneous behavior in terms of their interactions with authoritative users than elsewhere.
community.
87
Knowledge-Sharing Community
Existing graph-based community detection methods are not appropriate for our study.
88
Example
a1: e1, e2
a2: e1, e2
a3: e2, e3
a4: e2, e3
a5: e1, e2, e3
a6: e1, e2, e3
89
Example
Modeling user interactions as a graph
90
The GRACLUS Algorithm
91
Modeling Interactions Between Users
– We use a transactional data model to represent the interactions between askers and authoritative users.
92
Boolean representation of the interaction between askers and authoritative users.
Illustration
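The Boolean representation can be sketched with the six askers a1 to a6 and the authoritative users e1 to e3 from the earlier "Example" slide: each asker becomes one transaction, and each column records whether the asker interacted with that authoritative user. The variable names are illustrative, and the layout of the original illustration is not recoverable, so this is only one way to write it:

```python
# Transactions from the "Example" slide: asker -> authoritative users they interacted with.
transactions = {
    "a1": ["e1", "e2"],
    "a2": ["e1", "e2"],
    "a3": ["e2", "e3"],
    "a4": ["e2", "e3"],
    "a5": ["e1", "e2", "e3"],
    "a6": ["e1", "e2", "e3"],
}

# Column order: every authoritative user that appears in at least one transaction.
experts = sorted({e for items in transactions.values() for e in items})

# One Boolean row per asker: 1 if the asker interacted with that authoritative user, else 0.
boolean_rows = {
    asker: [int(e in items) for e in experts]
    for asker, items in transactions.items()
}

print("   ", " ".join(experts))
for asker, row in boolean_rows.items():
    print(asker, "  ".join(str(v) for v in row))
```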
93
The TRANCLUS Algorithm
summarizes the interactions of all askers ai with the identified authoritative users.
94
Problem Definition
Given the set A of askers and the set E of authoritative users,
C = {C1, C2, …, Cnc}
– The identified clusters represent the communities we want to discover.
95
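Criterion Function
[Equations garbled in extraction and not recoverable: the criterion function CF(C), a sum over the clusters C_s (s = 1, …, nc) and over the authoritative users e ∈ C_s, and the weight Z(e), defined in terms of n(e, TD)]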
96
The TRANCLUS Scheme
97
Application to Yahoo! Answers
98
Content Analysis
– The clustered askers tend to post questions on closely related topics
99
Influence of Social Networks on Product Recommendations
[Figure labels: actors A1 … AN; products P1 … PM; Social Network, Product Opinion, Recommendation System]
Emerging Application
100