IR: Information Retrieval
FIB, Master in Innovation and Research in Informatics Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldá
Department of Computer Science, UPC
Fall 2018 http://www.cs.upc.edu/~ir-miri
◮ real networks exhibit small diameter
◮ .. and so does the Erdős–Rényi or random model
◮ real networks have a high clustering coefficient
◮ .. and so does the Watts–Strogatz model
◮ real networks’ degree distribution follows a power law
◮ .. and so does the Barabási–Albert or preferential attachment model
◮ The diameter (longest shortest-path distance): D = max_{i,j} d_{ij}
◮ The average shortest-path length: ℓ = (1 / (n(n−1))) Σ_{i≠j} d_{ij}
◮ The harmonic mean shortest-path length: ℓ_h = ( (1 / (n(n−1))) Σ_{i≠j} 1/d_{ij} )^{−1}
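The three metrics can be sketched in plain Python (not from the slides; the `{node: set of neighbours}` graph representation and the function names are assumptions), using breadth-first search for unweighted shortest paths:

```python
from collections import deque

def bfs_distances(adj, s):
    """Unweighted shortest-path distances from s via breadth-first search."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def path_metrics(adj):
    """Diameter, average and harmonic-mean shortest-path length of a
    connected undirected graph given as {node: set of neighbours}."""
    n = len(adj)
    # all ordered pair distances d_ij with i != j
    all_d = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    diameter = max(all_d)
    avg = sum(all_d) / (n * (n - 1))
    harmonic = (n * (n - 1)) / sum(1 / d for d in all_d)
    return diameter, avg, harmonic

# Toy example: the path graph 0-1-2-3
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

The harmonic mean weighs short distances more heavily, which is why it is sometimes preferred for nearly disconnected graphs.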
◮ The transitivity or clustering coefficient, which basically measures the fraction of connected triples of nodes that close into triangles
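A minimal sketch of the global clustering coefficient, 3 × (number of triangles) / (number of connected triples) — not from the slides, and assuming the `{node: set of neighbours}` representation:

```python
def transitivity(adj):
    """Global clustering coefficient: 3 * (# triangles) / (# connected triples).
    adj maps each node to the set of its neighbours."""
    closed = 0   # triples whose two outer nodes are themselves linked
    triples = 0  # pairs of distinct neighbours of some centre node
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                triples += 1
                if nbrs[j] in adj[nbrs[i]]:
                    closed += 1
    # each triangle closes one triple at each of its 3 vertices,
    # so closed = 3 * (# triangles)
    return closed / triples if triples else 0.0
```

On a triangle this yields 1.0; on a path of three nodes it yields 0.0.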
◮ Hence, ER networks do not have a high clustering coefficient
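This is easy to check empirically: in G(n, p) any two neighbours of a node are themselves linked with probability p, so the average clustering coefficient is about p, far below what real networks show. A sketch (not from the slides; generator and function names are assumptions):

```python
import random

def erdos_renyi(n, p, seed=0):
    """G(n, p): include each of the n(n-1)/2 possible edges independently
    with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def avg_clustering(adj):
    """Mean over all nodes of (# links among neighbours) / (# possible such
    links); nodes of degree < 2 contribute 0 (one common convention)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nbrs = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

G = erdos_renyi(400, 0.05)
# avg_clustering(G) comes out close to p = 0.05
```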
◮ mimics local or geographical connectivity
◮ p = 0: high clustering, high diameter
◮ p = 1: low clustering, low diameter (ER model)
◮ As we increase p from 0 to 1:
  ◮ fast decrease of mean distance
  ◮ slow decrease in clustering
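The Watts–Strogatz construction can be sketched as follows (not from the slides; a simplified version in which each ring edge is rewired at one endpoint with probability p):

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Small-world sketch: ring of n nodes, each linked to its k nearest
    neighbours on each side; each ring edge is then rewired with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):                    # regular ring lattice
        for d in range(1, k + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    for v in range(n):                    # rewire each ring edge with prob. p
        for d in range(1, k + 1):
            u = (v + d) % n
            if u in adj[v] and rng.random() < p:
                w = rng.randrange(n)
                if w != v and w not in adj[v]:   # avoid self-loops/duplicates
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj
```

Sweeping p from 0 to 1 and measuring mean distance and clustering on the result reproduces the small-world regime: distance collapses quickly while clustering stays high for small p.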
◮ so no hubs
◮ The more someone has, the more she is likely to have
◮ the more friends you have, the easier it is to make new ones
◮ the more business a firm has, the easier it is to win more
◮ the more people there are at a restaurant, the more who will want to come
◮ The model controls how a network grows over time
◮ new nodes prefer to attach to well-connected nodes
◮ the process starts with some initial subgraph
◮ each new node comes in with m edges
◮ probability of connecting to existing node i is proportional to its degree ki
◮ results in a power-law degree distribution with exponent 3
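The growth process above can be sketched in a few lines (not from the slides; the repeated-endpoints trick for degree-proportional sampling is one standard implementation choice):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Preferential attachment sketch: each new node attaches m edges, each
    target chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    # start from a small complete seed graph on m + 1 nodes
    adj = {v: set() for v in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # each node appears in this list once per incident edge, so uniform
    # sampling from it is degree-proportional sampling
    endpoints = [v for v in adj for _ in adj[v]]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:       # m distinct degree-weighted targets
            targets.add(rng.choice(endpoints))
        adj[v] = set()
        for t in targets:
            adj[v].add(t)
            adj[t].add(v)
            endpoints.extend([v, t])  # both endpoints gained one degree
    return adj
```

Early nodes keep accumulating edges, producing the hubs that the ER and Watts–Strogatz models lack.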
¹ The clustering coefficient is higher than in random networks, but not as high as in real networks
◮ Degree centrality
◮ Closeness centrality
◮ Betweenness centrality
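The three centralities can be sketched in pure Python (not from the slides; betweenness uses Brandes' algorithm, which is one standard way to compute it):

```python
from collections import deque

def degree_centrality(adj):
    """Degree divided by the maximum possible degree n - 1."""
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    """(n - 1) / (sum of shortest-path distances to all other nodes)."""
    n = len(adj)
    cc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        cc[s] = (n - 1) / sum(dist.values())
    return cc

def betweenness_centrality(adj):
    """Brandes' algorithm: for each node, the (fraction-weighted) number of
    shortest paths between other pairs that pass through it."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                      # BFS that also counts shortest paths
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # back-propagate pair dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # undirected: pairs counted twice
```

On a star graph the centre maximizes all three measures, while the leaves score near zero on betweenness.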
◮ Hierarchical clustering
  ◮ Agglomerative
  ◮ Girvan-Newman
◮ Modularity maximization: Louvain method
◮ Agglomerative
◮ Divisive (Girvan-Newman algorithm)
◮ Louvain method
[Figure: animation of a clustering algorithm on two-dimensional data (axes V1, V2), iterations 001–024]
◮ n_ij = |Γ(i) ∩ Γ(j)| = Σ_k A_ik A_kj, and
◮ k_i = Σ_k A_ik is the degree of node i
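Both quantities fall out of the adjacency matrix: n_ij is entry (i, j) of A², and k_i is the i-th row sum. A small sketch (not from the slides) with plain list-of-lists matrices:

```python
def common_neighbours(A):
    """n_ij = sum_k A_ik A_kj: entry (i, j) of A^2 counts the neighbours
    shared by nodes i and j."""
    n = len(A)
    return [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def degrees(A):
    """k_i = sum_k A_ik: the row sums of the adjacency matrix."""
    return [sum(row) for row in A]

# 4-cycle 0-1-2-3-0: opposite corners share two neighbours
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
```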
² From the equation x · y = |x||y| cos θ
³ Uses the idea that the maximum value of d_ij is when there are no
◮ partitions with large negative Q imply the existence of cluster structure
◮ For each neighbor j of i, consider removing i from its current community and placing it in j’s community
◮ Greedily choose to place i into the community of the neighbor that yields the largest modularity gain
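A naive sketch of this local-move step (not from the slides): here modularity Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) is recomputed from scratch for each candidate move, whereas the real Louvain method uses a constant-time gain formula.

```python
def modularity(adj, comm):
    """Q = (1/2m) * sum over ordered pairs i, j in the same community of
    (A_ij - k_i k_j / 2m), with adj a {node: set of neighbours} dict."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m = sum of degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if comm[i] == comm[j]:
                a = 1 if j in adj[i] else 0
                q += a - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

def local_move(adj, comm, i):
    """One Louvain-style step: try placing i in each neighbour's community
    and keep the assignment with the highest modularity."""
    best, best_q = comm[i], modularity(adj, comm)
    for j in adj[i]:
        trial = dict(comm)
        trial[i] = comm[j]
        q = modularity(adj, trial)
        if q > best_q:
            best, best_q = comm[j], q
    comm[i] = best
    return comm

# Two triangles joined by the edge 2-3: the natural two-community split
tri2 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
        3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Starting from a partition with one misassigned node, a single local move restores the natural split because every candidate community of a neighbor is evaluated.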
68 / 72
69 / 72
70 / 72
71 / 72
72 / 72