To be connected, or not to be connected... That is the Minimum - - PowerPoint PPT Presentation




SLIDE 1

To be connected, or not to be connected...

That is the Minimum Inefficiency Subgraph Problem

Natali Ruchansky Francesco Bonchi David Garcia-Soriano Francesco Gullo Nicolas Kourtellis

SLIDE 2

Biologists in Lab X have constructed a large protein-protein interaction network (PPI).

SLIDE 3

Biologists in Lab X have constructed a large protein-protein interaction network (PPI). The PI has tasked them with making an amazing discovery about the relationships among specific proteins P1, P2, and P3.

SLIDE 4

Given a set of subjects in a terrorist network who are suspected of organizing an attack, which other subjects, likely to be involved, should we keep under control?

(Figure labels: suspect 1, suspect 2, suspect 3)

SLIDE 5

Given a set of users who clicked on an ad, who else should the ad be displayed to?

(Figure labels: impression 1, impression 2, impression 3)

SLIDE 6

Given a set of patients infected with a viral disease, which other people should we monitor?

(Figure labels: patient 1, patient 2, patient 3)

SLIDE 7

Community search / seed set expansion

  • General class of problems of the form:

Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a subgraph H of G that “explains” the connections among Q. (H minimizes/maximizes some objective function)

  • Several approaches in the literature

    – H must be a connected subgraph
    – Mostly based on random walks
    – Tend to return rather large solutions
    – Solutions get very large when query nodes belong to different communities
    – Have parameters

SLIDE 8

The Minimum Wiener Connector Problem

(SIGMOD 2015)

Our proposal: find the connected subgraph containing Q and minimizing the Wiener Index (the sum of pairwise shortest-path distances)

  • Parameter-free
  • Returns smaller and denser subgraphs, no matter whether the query nodes belong to the same community or not

  • Adds “important” nodes (high centrality)
  • Efficient algorithm with approximation guarantees
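
To make the objective on this slide concrete, here is a minimal Python sketch of the Wiener index of a candidate connector. It assumes an unweighted, undirected graph stored as an adjacency dictionary and sums distances over unordered pairs; the names `bfs_distances`, `induced`, and `wiener_index`, and the toy graph, are illustrative and not taken from the talk. It only scores a given vertex set; it is not the approximation algorithm mentioned above.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def induced(adj, vertices):
    """Adjacency dictionary of the subgraph induced by the given vertices."""
    keep = set(vertices)
    return {u: [v for v in adj[u] if v in keep] for u in keep}


def wiener_index(adj, vertices):
    """Sum of shortest-path distances over unordered pairs of the induced
    subgraph; infinite if that subgraph is disconnected."""
    sub = induced(adj, vertices)
    total = 0
    for u in sub:
        dist = bfs_distances(sub, u)
        if len(dist) < len(sub):
            return float("inf")
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice


# Toy graph: a path 1-2-3-4 plus a hub 5 adjacent to every path vertex.
adj = {1: [2, 5], 2: [1, 3, 5], 3: [2, 4, 5], 4: [3, 5], 5: [1, 2, 3, 4]}

print(wiener_index(adj, [1, 2, 3, 4]))  # 10: connects 1 and 4 along the path
print(wiener_index(adj, [1, 4, 5]))     # 4: connects 1 and 4 through the hub
```

For the query {1, 4}, the hub-based connector is both smaller and has the lower Wiener index, which matches the "smaller and denser" behavior claimed on the slide.
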
SLIDE 9

Smaller, denser, and more central vertices

SLIDE 10

Relaxing connectivity

Instead of forcing connectivity, relax the constraint.

SLIDE 11

Desired Properties

Parsimonious vertex addition

  • vertices should be added if and only if they help form a more cohesive subgraph

Outlier Tolerance

  • query vertices which are far from others should remain disconnected

Multi-community awareness

  • if the query vertices span multiple communities, connectedness should not be imposed among them

SLIDE 12

Cohesiveness

  • As with the Wiener Connector, we leverage shortest-path distances; however, the distance between disconnected vertices is infinite.

  • Idea: use the reciprocal of the shortest-path distance! This has the useful property of handling disconnection neatly (1/∞ = 0).

Network Efficiency (Latora and Marchiori): E(G) = 1/(n(n−1)) · Σ_{u ≠ v} 1/d(u,v), with n = |V|

Harmonic Centrality (Boldi and Vigna): c(u) = Σ_{v ≠ u} 1/d(v,u)

SLIDE 13

What about these problem statements?

Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a (not-necessarily connected) subgraph H of G, with Q ⊆ V(H), that maximizes network efficiency E(H).

Given a graph G=(V,E) and a set of vertices Q ⊆ V, find a (not-necessarily connected) subgraph H of G, with Q ⊆ V(H), that maximizes the total harmonic centrality C(H).

SLIDE 14

These do not work…

Q = {1, 2, 3}, with no edges among the query vertices: C(G[Q])=0, E(G[Q])=0

H = Q plus an isolated vertex 4: C(H)=0, E(H)=0

H = Q plus a clique of size 100: C(H)=9900, E(H)=0.942

SLIDE 15

Minimize Network Inefficiency

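Given a graph G=(V,E), we define its inefficiency as:

I(G) = Σ_{u,v ∈ V, u ≠ v} (1 − 1/d_G(u,v))

Note: a disconnected pair has d_G(u,v) = ∞ and, taking 1/∞ = 0, contributes exactly 1 to the sum.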
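
A short worked identity may help show why minimizing inefficiency penalizes superfluous vertices while maximizing efficiency or harmonic centrality does not. It assumes that I(G) sums 1 − 1/d over ordered pairs of distinct vertices, that C(G) is the total harmonic centrality over the same pairs, and that E(G) = C(G)/(n(n−1)); these conventions are assumptions, chosen because they reproduce the values quoted on the neighboring slides.

```latex
\begin{align*}
I(G) &= \sum_{u \neq v} \left(1 - \frac{1}{d_G(u,v)}\right)
      = n(n-1) - \sum_{u \neq v} \frac{1}{d_G(u,v)}
      = n(n-1) - C(G) \\
     &= n(n-1)\,\bigl(1 - E(G)\bigr), \qquad n = |V|.
\end{align*}
```

For three isolated query vertices plus a disjoint clique of size 100, n = 103 and C = 100 · 99 = 9900, so I = 103 · 102 − 9900 = 606. The n(n−1) term grows with every added vertex, so extra vertices only pay off when they shorten enough distances.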

SLIDE 16

… and this works

Q = {1, 2, 3}, with no edges among the query vertices: C(G[Q])=0, E(G[Q])=0, I(G[Q])=6

H = Q plus an isolated vertex 4: C(H)=0, E(H)=0, I(H)=12

H = Q plus a clique of size 100: C(H)=9900, E(H)=0.942, I(H)=606
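
The values on this slide can be checked with a short script. The sketch below assumes unweighted graphs stored as adjacency dictionaries, sums over ordered pairs of distinct vertices, treats unreachable pairs as contributing 1/∞ = 0, and assumes the clique shares no edges with the query vertices. All function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def total_harmonic_centrality(adj):
    """C: sum of 1/d(u,v) over ordered pairs; unreachable pairs add 0."""
    total = 0.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += 1.0 / d
    return total


def efficiency(adj):
    """E: total harmonic centrality normalized by the number of ordered pairs."""
    n = len(adj)
    return total_harmonic_centrality(adj) / (n * (n - 1)) if n > 1 else 0.0


def inefficiency(adj):
    """I: sum of (1 - 1/d(u,v)) over ordered pairs, i.e. n(n-1) - C."""
    n = len(adj)
    return n * (n - 1) - total_harmonic_centrality(adj)


def isolated(names):
    """Graph with the given vertices and no edges."""
    return {v: [] for v in names}


def with_clique(adj, size):
    """Copy of adj plus a disjoint clique on `size` fresh vertices."""
    g = {u: list(nbrs) for u, nbrs in adj.items()}
    members = [("k", i) for i in range(size)]
    for m in members:
        g[m] = [other for other in members if other != m]
    return g


q = isolated([1, 2, 3])
q_plus_4 = isolated([1, 2, 3, 4])
q_plus_clique = with_clique(q, 100)

for name, g in [("Q", q), ("Q + 4", q_plus_4), ("Q + clique", q_plus_clique)]:
    print(name, total_harmonic_centrality(g), round(efficiency(g), 3), inefficiency(g))
# Q 0.0 0.0 6.0
# Q + 4 0.0 0.0 12.0
# Q + clique 9900.0 0.942 606.0
```

Both C and E are best for the unrelated clique, whereas I is smallest for the query set alone, which is the behavior the slide calls "working".
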

SLIDE 17

Problem statement and hardness

SLIDE 18

Greedy Algorithm

Connect: Start with the Minimum Wiener Connector for Q

Remove: Remove one vertex at a time until Q is disconnected

Choose: Choose the intermediate solution S that minimizes I(S)
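
The three phases on this slide can be sketched as follows. This is a minimal Python sketch under several assumptions that the slide does not state: the connected starting set is passed in as `start` rather than computed by a Minimum Wiener Connector algorithm, each step removes the non-query vertex whose removal gives the lowest inefficiency, and removal continues until only query vertices remain. The function name `greedy_selective_connector` and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def induced(adj, vertices):
    """Adjacency dictionary of the subgraph induced by the given vertices."""
    keep = set(vertices)
    return {u: [v for v in adj[u] if v in keep] for u in keep}


def inefficiency(adj):
    """Sum of (1 - 1/d) over ordered pairs; unreachable pairs contribute 1."""
    n = len(adj)
    reciprocal = 0.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                reciprocal += 1.0 / d
    return n * (n - 1) - reciprocal


def greedy_selective_connector(adj, query, start):
    """Connect / Remove / Choose, with an assumed greedy removal rule."""
    query = set(query)
    current = set(start) | query                   # Connect: given starting set
    best, best_score = set(current), inefficiency(induced(adj, current))
    while current - query:                         # Remove: one vertex per step
        candidates = [(inefficiency(induced(adj, current - {v})), v)
                      for v in current - query]
        score, victim = min(candidates, key=lambda c: c[0])
        current.remove(victim)
        if score < best_score:                     # Choose: best intermediate set
            best, best_score = set(current), score
    return best, best_score


# Toy tree: query vertices a, b, c hang off hub h; query vertex z is an
# outlier reachable from h only through the path p1-p2-p3.
edges = [("a", "h"), ("b", "h"), ("c", "h"),
         ("h", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "z")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

best, score = greedy_selective_connector(adj, {"a", "b", "c", "z"}, adj.keys())
print(sorted(best), score)  # ['a', 'b', 'c', 'h', 'z'] 11.0
```

Because the toy graph is a tree, the only connected subgraph containing all query vertices is the whole graph, so passing every vertex as `start` coincides with the connector here. The sketch keeps the hub `h`, which shortens distances among `a`, `b`, and `c`, and leaves the outlier `z` disconnected instead of paying for the long path.
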

SLIDE 19

Competitors

KDD’06, KDD’10, SIGMOD’15, SDM’13, ICDE’15

SLIDE 20

Brain Co-activation Network

Relaxing connectivity highlights three different functional relationships and gives a smaller, more interpretable solution.

The 3 components in the solution end up corresponding to different functions: motor, visual, and emotional. The data is a graph where each vertex is an area of the brain and edges are added according to co-activation in experiments. (The graph is one connected component.)

(Figure legend: query vertices, extra vertices)

SLIDE 21

Brain Co-activation Network: competitors

SLIDE 22

Experimental Results

Parsimonious vertex addition

  • vertices should be added if and only if they help form a more cohesive subgraph

Outlier Tolerance

  • query vertices which are far from others should remain disconnected

Multi-community awareness

  • if the query vertices span multiple communities, connectedness should not be imposed among them

SLIDE 23

Experimental Results

(Plot axis labels: solution size; # query vertices; # disconnected singletons in solution; # outliers selected; # of communities spanned by Q; # connected components in solution)

SLIDE 24

Cohesive meal creation

(Figure: three ingredient subgraphs, one per method: Minimum Inefficiency, Bump Hunting, MDL-based. Scallop, black bean, onion, mushroom, bell pepper, honey, and lemongrass appear in all three panels; soybean appears in two; coffee, peanut butter, and beef appear in one panel each.)

SLIDE 25

Biology

(Figure: three gene subgraphs, one per method: Minimum Inefficiency, Bump Hunting, MDL-based. SMAD4, BRAC1, NF1, CTNNB1, ERBB3, NOD2, FAM100B, and NRAS appear in all three panels; ESR1, SMARCA4, PIK3CA, ELAV, MUC1, and GALNT2 appear in one panel each.)

SLIDE 26

Takeaway

“How are we related?” “You love cats!” “But I don’t...”

Selective Connector
