

SLIDE 1

CNRS - UPMC, Laboratoire d’Informatique de Paris 6

Proximity measures applied to community detection in complex networks

Maximilien Danisch
Thesis supervised by: Jean-Loup Guillaume and Bénédicte Le Grand
Complex networks team, LIP6-CNRS-UPMC
CRI, Univ. Paris 1

December 15, 2015

SLIDE 2

Complex networks

Definition

Network/Graph: a set of nodes linked by edges.

Network    Nodes       Edges
Facebook   profiles    friendship
Internet   computers   connections
Web        web pages   hyperlinks
P2P        peers       file exchanges

Remark

  • more and more networks
  • larger and larger networks
  • networks share common properties

⇒ need for fast and generic algorithms to understand them and extract knowledge.

Maximilien Danisch — Proximity measures and communities — December 15, 2015 2/56

SLIDE 3

Why detect groups of nodes?

Organization

  • Classification of documents (e.g. Wikipedia pages).
  • Organization of friends’ lists on Facebook.

Recommendation

  • “People you may know” on Facebook or LinkedIn.
  • “You may also like...” on Amazon.

Prediction

  • What is this unknown P2P file?
  • What is the function of this unknown protein?

SLIDE 4

Group of nodes = Community

Some definitions

  • a set of nodes that are similar,
  • a set of nodes that are close to one another,
  • a set of nodes highly connected inside, but poorly connected outside.

Remark

Different visions of communities:

  • overlapping / partition
  • global / local

SLIDE 5

Positioning

Survey articles

  • Partition: Fortunato’s survey 2010 (500 references)
    ⇒ A lot of work has been done.
  • Overlapping communities: Xi et al. 2013
    ⇒ More realistic than partitions, but less work has been done and methods do not scale.
  • From local to global: Kanawati 2014
    ⇒ If you can solve the problem locally, you may be able to solve it globally.

Position

⇒ We chose to work on overlapping communities with a local approach in order to

  • be realistic and
  • be able to handle large networks.

SLIDE 6

Community detection: how?

Observation

Local community detection is done mainly using greedy heuristics to optimize an ad hoc quality function.

Examples of quality functions

  • $r_d(S) = \frac{l_i}{l_i + l_o}$
  • $C(S) = \frac{\triangle_i(S)}{\binom{|S|}{3}} \times \frac{\triangle_i(S)}{\triangle_i(S) + \triangle_o(S)}$ (Friggeri et al. 2011)

Problems:

  1 designing the quality function is difficult,
  2 the optimization can get trapped in local optima.
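
As a rough illustration of the first quality function, the sketch below computes r_d(S) on a toy graph. It assumes that l_i is the number of edges with both endpoints in S and l_o the number of edges with exactly one endpoint in S; the slide does not spell these out. The function name `relative_density` and the toy graph are made up for the example.

```python
def relative_density(adj, community):
    """Return l_i / (l_i + l_o) for the node set `community`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Assumption: l_i counts internal edges, l_o counts edges leaving the set.
    """
    community = set(community)
    internal_endpoints = 0  # every internal edge is seen once from each endpoint
    outgoing = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal_endpoints += 1
            else:
                outgoing += 1
    internal = internal_endpoints // 2
    total = internal + outgoing
    return internal / total if total else 0.0


# Toy graph: a triangle {0, 1, 2} attached to node 3 by a single edge.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
score = relative_density(graph, {0, 1, 2})  # 3 internal edges, 1 outgoing edge
```

A greedy heuristic would add or remove one node at a time as long as such a score improves, which is where the risk of getting stuck comes from.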

SLIDE 7

Let’s not use quality functions!

Idea

A community can be defined as a set of nodes close to each other.

⇒ Let’s try to use proximity measures instead of quality functions to find communities.

SLIDE 8

Outline / Contributions

1 How to use proximity measures for community detection
2 Proposition of two proximity measures:
  • propagated opinion
  • Katz+
3 Framework to:
  • find the communities of a given node (local)
  • find overlapping communities (global)
  • complete a set of nodes into a community

SLIDE 9

Networks and communities

SLIDE 10

Ego-centered community

SLIDE 11

Ego-centered community

SLIDE 12

On a small visualizable network

[Figure: proximity (axis ticks from 0.000 to 0.025) as a function of the rank of the node (axis ticks from 50 to 300).]

SLIDE 13

Wikipedia category = community

[Figure: proximity to the node Magnus Carlsen (log scale) and number of top-k nodes in the category Chess (axis ticks from 500 to 4000), as functions of the rank of the node (log scale).]

SLIDE 14

Ego-centered community

SLIDE 15

Ego-centered community

SLIDE 16

On a small visualizable network

[Figure: proximity (axis ticks from 0.0 to 1.0) as a function of the rank of the node (axis ticks from 50 to 400).]

SLIDE 17

Several behaviors in Wikipedia:

[Figure: proximity as a function of the rank of the node (log-log axes), showing four behaviors: sharp transition, smooth transition, deformed power-law, perfect power-law.]

SLIDE 18

Ego-centered community

SLIDE 19

Ego-centered community

SLIDE 20

Ego-centered community

SLIDE 21

Multi-ego-centered community

For each node: minimum of the two proximities.
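
The combination rule above can be sketched in a few lines. The sketch assumes that each proximity is available as a dictionary from node to score and that a node missing from a dictionary has proximity 0; the names `min_proximity` and `rank_nodes` and the toy values are hypothetical.

```python
def min_proximity(prox_a, prox_b):
    """Node-wise minimum of two proximity maps (missing nodes count as 0)."""
    nodes = set(prox_a) | set(prox_b)
    return {n: min(prox_a.get(n, 0.0), prox_b.get(n, 0.0)) for n in nodes}


def rank_nodes(prox):
    """Nodes sorted by decreasing proximity (ties broken by node name)."""
    return sorted(prox, key=lambda n: (-prox[n], str(n)))


# Toy proximities to two nodes of interest "a" and "b".
prox_to_a = {"a": 1.0, "x": 0.8, "y": 0.1, "b": 0.2}
prox_to_b = {"b": 1.0, "x": 0.7, "y": 0.05, "a": 0.2}

combined = min_proximity(prox_to_a, prox_to_b)
ranking = rank_nodes(combined)
```

Taking the minimum favors nodes that are close to both nodes of interest: here `x` comes first, ahead of `a` and `b` themselves.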

SLIDE 22

Multi-egocentered communities

SLIDE 23

Bi-ego-centered communities

In Wikipedia: Folk wrestling + Torii school =

[Figure: proximity to Torii school and proximity to Folk wrestling as functions of the rank of the node (log-log axes).]

SLIDE 24

Bi-ego-centered communities

In Wikipedia: Folk wrestling + Torii school = Sumo

[Figure, first panel: proximity to Torii school and proximity to Folk wrestling as functions of the rank of the node (log-log axes).]
[Figure, second panel: proximity to Sumo, minimum, and rescaled minimum as functions of the rank of the node (log-log axes).]

SLIDE 25

Which proximity measure?

Classical proximity measures

  • distance: number of hops between two nodes
  • number of common neighbors between two nodes
  • Katz index
  • commute time
  • hitting time
  • rooted PageRank

Remark

  • need to have a discriminative measure
  • need to compute the proximity of all nodes to the node of interest quickly
  • with or without parameters?


SLIDE 27

A parameter-free proximity measure: the propagated opinion

[Figure: proximity (axis ticks from 0.0 to 1.0) as a function of the rank of the node (axis ticks from 50 to 400).]

  • Averaging: $X_t = M X_{t-1}$
  • Resetting: $X_t^i = 1$

SLIDE 28

A parameter-free proximity measure: the propagated opinion

[Figure: proximity (axis ticks from 0.0 to 1.0) as a function of the rank of the node (axis ticks from 50 to 400), after 10, 20, 100 and 10000 iterations.]

  • Averaging: $X_t = M X_{t-1}$
  • Rescaling: $X_t = \frac{X_t - \min(X_t)}{1 - \min(X_t)}$
  • Resetting: $X_t^i = 1$
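
The three steps (averaging, rescaling, resetting) can be sketched as a simple iteration. This is only an illustration under stated assumptions: the slides do not define M, so the averaging step below is assumed to replace each opinion by the mean over the node and its neighbors; the function name `propagated_opinion` and the toy path graph are hypothetical.

```python
def propagated_opinion(adj, source, iterations=100):
    """Iterate averaging, rescaling and resetting, starting from `source`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    opinion = {n: 0.0 for n in adj}
    opinion[source] = 1.0
    for _ in range(iterations):
        # Averaging (X_t = M X_{t-1}); assumed: mean over the node and its neighbors.
        averaged = {}
        for n, neighbors in adj.items():
            total = opinion[n] + sum(opinion[v] for v in neighbors)
            averaged[n] = total / (1 + len(neighbors))
        # Rescaling: bring the smallest opinion back to 0.
        low = min(averaged.values())
        if low < 1.0:
            averaged = {n: (x - low) / (1.0 - low) for n, x in averaged.items()}
        # Resetting: the node of interest keeps opinion 1.
        averaged[source] = 1.0
        opinion = averaged
    return opinion


# Toy path graph 0 - 1 - 2 - 3, with node 0 as the node of interest.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
opinions = propagated_opinion(path, source=0)
```

On this path the resulting opinions decrease with the distance to node 0, which is the behavior expected from a proximity measure.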

SLIDE 29

Problem

[Figure: proximity (axis ticks from 0.00 to 0.06) as a function of the rank of the node (axis ticks from 50 to 250).]


SLIDE 31

A parametrized proximity measure: Katz+

[Figure: three nodes 1, 2 and 3, with n common neighbors.]

Which node is closer to 1: 2 or 3?

“Distance vs. redundancy” trade-off:

$P_\alpha(i, j) = \sum_{l=0}^{\lambda} \alpha^l N_l^{\mathrm{path}}(i, j)$

SLIDE 32

A parametrized proximity measure: Katz+

[Figure: three nodes 1, 2 and 3.]

Which node is closer to 1: 2 or 3?

“Popularity vs. intimacy” trade-off:

$P_{\alpha,\beta}(i, j) = \frac{1}{d_j^\beta} \sum_{l=0}^{\lambda} \alpha^l N_l^{\mathrm{path}}(i, j)$
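
To make the two trade-offs concrete, here is a minimal sketch of the formula above. It rests on assumptions that the slides do not confirm: N_l^path(i, j) is taken to be the number of walks of length l from i to j (as in the classical Katz index), d_j is taken to be the degree of j, and the sum is truncated at λ = `max_length`. The function name `katz_plus` and the toy graph are hypothetical.

```python
def katz_plus(adj, source, alpha=0.1, beta=0.5, max_length=4):
    """Truncated Katz-style proximity from `source`, divided by degree**beta.

    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    # walks[j] = number of walks of the current length from `source` to j.
    walks = {n: 0 for n in adj}
    walks[source] = 1  # l = 0: the empty walk
    score = {n: float(walks[n]) for n in adj}  # term alpha**0 * N_0
    for length in range(1, max_length + 1):
        extended = {n: 0 for n in adj}
        for u, count in walks.items():
            if count:
                for v in adj[u]:
                    extended[v] += count
        walks = extended
        for n in adj:
            score[n] += (alpha ** length) * walks[n]
    # Popularity penalty: divide by the degree of the target raised to beta.
    return {
        n: (score[n] / len(adj[n]) ** beta if adj[n] else score[n])
        for n in adj
    }


# Node "2" is a direct neighbor of "1"; node "3" is two hops away
# but shares two common neighbors ("a" and "b") with "1".
graph = {
    "1": {"2", "a", "b"},
    "2": {"1"},
    "a": {"1", "3"},
    "b": {"1", "3"},
    "3": {"a", "b"},
}
low_alpha = katz_plus(graph, "1", alpha=0.1, beta=0.0, max_length=2)
high_alpha = katz_plus(graph, "1", alpha=1.0, beta=0.0, max_length=2)
```

With a small α the direct neighbor "2" is ranked closer to "1"; with a large α the redundancy of the two-hop connections makes "3" closer. Raising β additionally penalizes high-degree targets.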

SLIDE 33

Result 1: Katz+

[Figure: proximity (axis ticks from 0.5 to 4.5) as a function of the rank of the node (axis ticks from 50 to 250).]

SLIDE 34

Result 2: Katz+

[Figure: proximity (axis ticks from 0.0 to 1.4) as a function of the rank of the node (axis ticks from 50 to 250).]

SLIDE 35

Result 3: Katz+

[Figure: proximity (axis ticks from 2 to 16) as a function of the rank of the node (axis ticks from 50 to 250).]

SLIDE 36

Result 4: Katz+

[Figure: proximity (axis ticks from 0.00 to 0.09) as a function of the rank of the node (axis ticks from 50 to 250).]


SLIDE 38

1- Choosing a set of candidates

SLIDE 39

2- Looking for bi-egocentered communities

[Figure, first panel: proximity as a function of the rank of the node (log-log axes) for Chess Boxing, Chessboard, and their minimum (MIN: Chess notation).]
[Figure, second panel: proximity as a function of the rank of the node (log-log axes) for Chess Boxing, Achi, Nagano, and their minimum (MIN: Morabaraba).]

SLIDE 40

3- Cleaning and labeling the output

[Figure: Jaccard matrix for the communities obtained before cleaning (color scale from 0.1 to 1.0).]

3000 candidates ⇒ 770 communities ⇒ 5 communities after cleaning.
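
A Jaccard matrix like the one shown can be computed directly from the node sets. The cleaning rule itself is not given on the slide, so the greedy filter below (drop a community when it is too similar to one already kept, with a hypothetical threshold of 0.7) is only an assumed stand-in, not the method used in the thesis.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_matrix(communities):
    """Pairwise Jaccard similarities, as a list of rows."""
    return [[jaccard(a, b) for b in communities] for a in communities]


def drop_near_duplicates(communities, threshold=0.7):
    """Assumed cleaning rule: keep a community only if its Jaccard similarity
    to every community kept so far is below `threshold`."""
    kept = []
    for c in communities:
        if all(jaccard(c, k) < threshold for k in kept):
            kept.append(set(c))
    return kept


found = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {7, 8, 9}]
matrix = jaccard_matrix(found)
cleaned = drop_near_duplicates(found)
```

Here the first two communities overlap heavily (Jaccard 0.8), so only one of them survives the filter.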


SLIDE 42

Representative’s community

SLIDE 43

Representatives in Wikipedia

Category        Candidate representative    # nodes in the found community   F-score
Chess           Magnus Carlsen              4259                             0.965
(3850 nodes)    Chess                       4239                             0.970
                Queen’s Gambit              4542                             0.977
                World Chess Championship    4338                             0.970
                Chess Boxing                37                               0.009
Sumo            Sumo                        443                              0.951
(445 nodes)     Taiho Koki                  486                              0.965
                Yokozuna                    504                              0.962
                Lyoto Machida               87                               0.004
Boxing          Boxing                      6814                             0.890
(7289 nodes)    Cruiserweight               8739                             0.924
                Vitali Klitschko            9942                             0.835
                Chess Boxing                37                               0.001

SLIDE 44

Comparing found communities to ground truth communities

Comparing two communities

  • $F_1(a, b) = \frac{2\,|a \cap b|}{|a| + |b|}$

Comparing two sets of communities

  • $F_1(A, B) = \frac{1}{|A|} \sum_{a \in A} \max_{b \in B} F_1(a, b)$
    ⇒ how much A looks like B: not symmetric
  • $S(A, B) = \frac{2\,F_1(A, B)\,F_1(B, A)}{F_1(A, B) + F_1(B, A)}$ (symmetric)
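
These three scores translate almost literally into code. The sketch below assumes communities are given as Python sets of node identifiers; the function names and the toy data are chosen for the example only.

```python
def f1(a, b):
    """F1 score between two communities: 2|a ∩ b| / (|a| + |b|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


def f1_sets(A, B):
    """Average over a in A of the best F1 match in B (not symmetric)."""
    return sum(max(f1(a, b) for b in B) for a in A) / len(A)


def similarity(A, B):
    """Harmonic mean of F1(A, B) and F1(B, A) (symmetric)."""
    x, y = f1_sets(A, B), f1_sets(B, A)
    return 2 * x * y / (x + y) if x + y else 0.0


found = [{1, 2, 3}, {4, 5}]
truth = [{1, 2, 3, 4}]

found_vs_truth = f1_sets(found, truth)
truth_vs_found = f1_sets(truth, found)
overall = similarity(found, truth)
```

The asymmetry is visible on the toy data: the single ground truth community is matched well by one found community, while the second found community has only a poor match.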

SLIDE 45

Comparison

Network    Measure          k-clique percolation   BigClam   link partition   proximity approach
scholar    F1(FC, GTC)      0.14                   0.07      0.16             0.13
(16773)    F1(GTC, FC)      0.16                   0.00      0.12             0.13
           S(FC, GTC)       0.15                   0.01      0.14             0.13
           # communities    26318 (k=3)            100       11133            18845
DBLP       F1(FC, GTC)      0.00                   0.05      0.00             0.32
(24)       F1(GTC, FC)      0.03                   0.10      0.03             0.19
           S(FC, GTC)       0.00                   0.07      0.00             0.24
           # communities    3113 (k=5)             50        12871            15
wiki08     F1(FC, GTC)      -                      -         -                0.67
(79365)    F1(GTC, FC)      -                      -         -                0.03
           S(FC, GTC)       -                      -         -                0.04
           # communities    -                      -         -                355
wiki12     F1(FC, GTC)      -                      -         -                0.72
(730752)   F1(GTC, FC)      -                      -         -                0.03
           S(FC, GTC)       -                      -         -                0.05
           # communities    -                      -         -                2046

F1(FC, GTC), F1(GTC, FC), S(FC, GTC) and number of communities (FC: found communities, GTC: ground truth communities).

  • Performance equivalent to or better than the state of the art.
  • Scales to very large graphs, contrary to the state of the art.


SLIDE 47

Learning the parameters of a parametrized proximity measure

[Figure: network with two kinds of nodes: in the input set of nodes, and not in the input set of nodes.]

Learn the parameters by AUC grid-search optimization.

SLIDE 48

Learning by AUC optimization

[Figure: AUC of the ranking as a function of α (log scale, 0.0001 to 1) and β (0.2 to 1.6); color scale from 0.989 to 0.9904.]
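
A grid search of this kind can be sketched as follows. The AUC is computed here as the probability that a node of the input set is ranked above a node outside it (ties count for one half), which is an assumption about the exact definition used. The scoring function `toy_scores`, its per-node features and the grid values are hypothetical stand-ins for a real parametrized proximity measure.

```python
from itertools import product


def auc(scores, positives):
    """Probability that a random positive node outranks a random negative one."""
    pos = [s for n, s in scores.items() if n in positives]
    neg = [s for n, s in scores.items() if n not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative node")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def grid_search(score_fn, positives, alphas, betas):
    """Return (best_auc, alpha, beta); `score_fn(alpha, beta)` gives {node: proximity}."""
    return max(
        ((auc(score_fn(a, b), positives), a, b) for a, b in product(alphas, betas)),
        key=lambda t: t[0],
    )


# Hypothetical per-node features: (direct links, two-step walks, degree).
FEATURES = {"x": (1, 0, 2), "y": (0, 4, 2), "z": (0, 1, 1), "w": (1, 5, 50)}


def toy_scores(alpha, beta):
    """Toy parametrized proximity: (direct + alpha * two_step) / degree**beta."""
    return {
        n: (direct + alpha * two_step) / degree ** beta
        for n, (direct, two_step, degree) in FEATURES.items()
    }


input_set = {"x", "y"}
best = grid_search(toy_scores, input_set, alphas=[0.01, 0.1, 1.0], betas=[0.0, 1.0])
```

The learned pair (α, β) is then used to rank all nodes, and the top of the ranking completes the input set into a community.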

SLIDE 49

Validation

[Figure: proportion (axis ticks from 0.0 to 1.0) as a function of k (axis ticks from 500 to 4000) for KATZ+, KATZ, T and F, PROP-OP and DISTANCE.]

Comparison to other proximity measures for the task of completing the Wikipedia category “graph theory”.

SLIDE 50

Conclusion

  • We showed how to use proximity measures for community detection.
  • We proposed two new proximity measures.
  • We proposed frameworks to solve three variations of the community detection problem.
  • Our approach is a viable alternative to the classical approach of greedy optimization of a quality function.
  • We suggested solutions to two community-related problems:
      1 detection of social capitalists on Twitter,
      2 detection of pedophile queries made on P2P networks.
  • We underlined the lack of a realistic benchmark to test overlapping community detection algorithms and tried to open a path towards one.

SLIDE 51

Perspectives

Adapt the method to follow dynamic communities

The proximity approach is stable, and communities are labeled by representatives.

Take into account link orientation

  • Survey: Malliaros & Vazirgiannis 2013.
  • How to better take link orientation into account in community detection algorithms?
  • The study of oriented triangles could lead to some insights!

Redefine the notion of community

  • The current classic definition, “set of nodes highly connected inside but poorly connected outside”, is wrong. Other definitions are too broad.
  • Study ground truth communities and capture what makes them communities.
  • Design a better model of networks with communities, and a community detection framework, out of the study.

SLIDE 52

Following dynamic communities

Observation

Traditional approaches detect communities at the different timestamps and match them. Two problems with quality function optimization:

  1 the optimization is not stable under perturbations,
  2 it can be hard to match the communities.

SLIDE 53

Stability despite perturbations on Wikipedia

[Figure: proximity to the node Magnus Carlsen as a function of the rank of the node (log-log axes), with curves for 0%, 5%, 10%, 20% and 40%.]

SLIDE 54

Use representatives as labels: Wikipedia 2008 vs. Wikipedia 2012

[Figure, first panel: proximity to the node Magnus Carlsen in 2008 as a function of the rank of the node (log-log axes).]
[Figure, second panel: proximity to the node Magnus Carlsen in 2012 as a function of the rank of the node (log-log axes).]


SLIDE 56

Oriented triangles and communities

[Figure: the seven types of oriented triangles, numbered 1 to 7.]

Network      Wikipedia   Twitter
triangle 1   0.011       0.021
triangle 2   0.448       0.266
triangle 3   0.028       0.172
triangle 4   1.539       0.733
triangle 5   1.418       0.436
triangle 6   0.797       2.224
triangle 7   12.608      45.9634

Number of triangles divided by the expected number of triangles.


SLIDE 58

13 publications

  • 2 international journals:
      • Towards multi-ego-centered communities: a node similarity approach. M. Danisch, J.-L. Guillaume and B. Le Grand. IJWC 2013.
      • Multi-ego-centered communities in practice. M. Danisch, J.-L. Guillaume and B. Le Grand. SNAM 2013.
  • 4 international conferences:
      • Mining bipartite graphs to improve semantic pedophile activity detection. R. Fournier and M. Danisch. RCIS 2014.
      • On the importance of considering social capitalism when measuring influence on Twitter. M. Danisch, N. Dugué and A. Perez. BESC 2014.
      • Direct Generation of Random Graphs Exactly Realising a Prescribed Degree Sequence. D. Obradovic and M. Danisch. CASoN 2014.
      • Learning a Proximity Measure to Complete a Community. M. Danisch, J.-L. Guillaume and B. Le Grand. DSAA 2014.
  • 1 international workshop:
      • Unfolding ego-centered community structures with “a similarity approach”. M. Danisch, J.-L. Guillaume and B. Le Grand. CompleNet IV 2013.
  • 1 international book chapter:
      • Multi-ego-centered communities. M. Danisch, J.-L. Guillaume and B. Le Grand. ComNetBook 2013.
  • 4 national conferences:
      • Une approche à base de similarité pour la détection de communautés egocentrées. M. Danisch, J.-L. Guillaume and B. Le Grand. ALGOTEL 2013.
      • Complétion de communautés par l’apprentissage d’une mesure de proximité. M. Danisch, J.-L. Guillaume and B. Le Grand. ALGOTEL 2014.
      • Prendre en compte le capitalisme social dans la mesure de l’influence sur Twitter. M. Danisch, N. Dugué and A. Perez. MARAMI 2014.
      • Structures biparties et communautés recouvrantes des graphes de terrains. R. Tackx, M. Danisch and F. Tarissan. MARAMI 2014.
  • 1 national workshop:
      • Déplier les structures communautaires egocentrées - une approche à base de similarité. M. Danisch, J.-L. Guillaume and B. Le Grand. AFGG-EGC 2013.