Learning to rank and compare graph layouts, by Toby Dylan Hocking




slide-1
SLIDE 1

Learning to rank and compare graph layouts

Toby Dylan Hocking
toby@sg.cs.titech.ac.jp
http://sugiyama-www.cs.titech.ac.jp/~toby/
Joint work with Supaporn Spanurattana and Masashi Sugiyama
6 Aug 2013

slide-2
SLIDE 2

Introduction: what makes a graph layout good or bad?
Learning to rank and compare graph layouts

slide-3
SLIDE 3

Biology is full of networks (graphs)

Source: Kyoto encyclopedia of genes and genomes (KEGG).

slide-4
SLIDE 4

Biology is full of networks (graphs)

Source: Wikipedia “Citric acid cycle.”

slide-5
SLIDE 5

Goal: find a good layout for a particular graph

Two categories of methods for graph layout

◮ Heuristic layout algorithms:
  ◮ Force-directed
  ◮ Hierarchical clustering (trees/dendrograms)
  ◮ Hive plots
  ◮ ...
◮ Manual layout using programs such as:
  ◮ Cytoscape/cytoscape.js
  ◮ Gephi
  ◮ Image processing: GIMP/Inkscape
  ◮ ...

slide-6
SLIDE 6

Force-directed layout has many tuning parameters

Source: Data-Driven Documents (D3) JavaScript visualization library (Bostock 2011).

parameter       min   default   max
size            ?     1 x 1     ?
link distance   0     20        ∞
link strength   0     1         1
friction        0     0.9       1
charge          −∞    −30       ∞
theta           0     0.8       ∞
gravity         0     0.1       ∞

Question: how to tune these parameters for a specific graph?

slide-7
SLIDE 7

Manual layout using a GUI is time-consuming

◮ Try default parameters of several different algorithms.
◮ Play with tuning parameters, select a combination that looks good.
◮ Finally, refine the algorithm’s layout by dragging nodes to positions that look better.

Goal: learn from a database of manually labeled graphs.


slide-9
SLIDE 9

Pairwise comparison in the graph layout literature

Source: Holten and van Wijk, “Force-Directed Edge Bundling for Graph Visualization,” EuroVis 2009.

slide-10
SLIDE 10

Pairwise comparison in the graph layout literature

Source: Muelder and Ma, “Rapid Graph Layout Using Space Filling Curves,” InfoVis 2008.

slide-11
SLIDE 11

Pairwise comparison in the graph layout literature

Source: Gorochowski et al., “Using Aging to Visually Uncover Evolutionary Processes on Networks,” IEEE Trans. Viz 2012.

slide-12
SLIDE 12

Introduction: what makes a graph layout good or bad?
Learning to rank and compare graph layouts

slide-13
SLIDE 13

Learning a comparison function

We are given $n$ training pairs $(G_i, x_i, x'_i, y_i)$ where we have

◮ a graph $G_i$,
◮ two layouts $x_i, x'_i \in \mathbb{R}^p$ of that graph (feature vectors),
◮ a comparison

\[
y_i =
\begin{cases}
-1 & \text{if } x_i \text{ is better} \\
0 & \text{if } x_i \text{ is as good as } x'_i \\
1 & \text{if } x'_i \text{ is better.}
\end{cases}
\]

Goal: find a comparison function $g : \mathbb{R}^p \times \mathbb{R}^p \to \{-1, 0, 1\}$

◮ Symmetry: $g(x, x') = -g(x', x)$.
◮ Good prediction with respect to the zero-one loss $E$:

\[
\operatorname*{minimize}_{g} \; \sum_{i \in \text{test}} E\big[ y_i, g(x_i, x'_i) \big]
\]
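The two requirements on the comparison function can be checked numerically. The following is a minimal Python sketch, not the authors' code: the names `zero_one_loss`, `total_error`, `is_antisymmetric`, and the toy comparison `toy_g` are hypothetical, and the three toy pairs are made up for illustration.

```python
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]              # a layout feature vector in R^p
Pair = Tuple[Features, Features, int]     # (x_i, x'_i, y_i), y_i in {-1, 0, 1}
Comparison = Callable[[Features, Features], int]


def zero_one_loss(y: int, prediction: int) -> int:
    """E[y, g]: 1 when the prediction disagrees with the label, else 0."""
    return int(y != prediction)


def total_error(g: Comparison, pairs: Sequence[Pair]) -> int:
    """Sum of zero-one losses of g over a set of labeled pairs."""
    return sum(zero_one_loss(y, g(x, x_prime)) for x, x_prime, y in pairs)


def is_antisymmetric(g: Comparison, pairs: Sequence[Pair]) -> bool:
    """Check g(x, x') == -g(x', x) on every given pair."""
    return all(g(x, xp) == -g(xp, x) for x, xp, _ in pairs)


def toy_g(x: Features, x_prime: Features) -> int:
    """Hypothetical rule: the layout with the smaller first feature is better."""
    if x[0] == x_prime[0]:
        return 0
    return -1 if x[0] < x_prime[0] else 1


# Made-up held-out pairs; the last label disagrees with toy_g on purpose.
toy_pairs = [((1.0,), (2.0,), -1), ((3.0,), (3.0,), 0), ((5.0,), (4.0,), -1)]
print(total_error(toy_g, toy_pairs), is_antisymmetric(toy_g, toy_pairs))
```

Only the third toy pair is misclassified, so the summed loss is 1.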

slide-14
SLIDE 14

Learning to rank and compare

We will learn a

◮ Ranking function $f : \mathbb{R}^p \to \mathbb{R}$. Bigger means a better layout.
◮ Threshold $t \in \mathbb{R}_+$. A small difference $|f(x') - f(x)| \le t$ is not significant.
◮ Comparison function

\[
g_t(x, x') =
\begin{cases}
-1 & \text{if } f(x') - f(x) < -t \\
0 & \text{if } |f(x') - f(x)| \le t \\
1 & \text{if } f(x') - f(x) > t.
\end{cases}
\]

The problem becomes

\[
\operatorname*{minimize}_{f,\, t} \; \sum_{i=1}^{n} E\big[ y_i, g_t(x_i, x'_i) \big]
\]
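The thresholded comparison can be written directly from its three cases. This is a sketch under stated assumptions: `make_comparison` and `training_error` are hypothetical names, and the linear ranking function with `weights = (1.0, -0.5)` and the three pairs are made up for illustration.

```python
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]
Pair = Tuple[Features, Features, int]     # (x_i, x'_i, y_i)


def make_comparison(f: Callable[[Features], float], t: float):
    """Build g_t from a ranking function f and a threshold t >= 0."""
    if t < 0:
        raise ValueError("threshold t must be non-negative")

    def g_t(x: Features, x_prime: Features) -> int:
        diff = f(x_prime) - f(x)
        if diff < -t:
            return -1          # x ranks significantly higher
        if diff > t:
            return 1           # x' ranks significantly higher
        return 0               # |diff| <= t: difference is not significant

    return g_t


def training_error(f, t: float, pairs: Sequence[Pair]) -> int:
    """Objective from the slide: summed zero-one loss of g_t on the pairs."""
    g = make_comparison(f, t)
    return sum(int(g(x, xp) != y) for x, xp, y in pairs)


# Hypothetical linear ranking function and made-up training pairs.
weights = (1.0, -0.5)


def f(x: Features) -> float:
    return sum(w * v for w, v in zip(weights, x))


pairs = [
    ((0.0, 0.0), (3.0, 0.0), 1),
    ((2.0, 0.0), (2.5, 0.0), 0),
    ((4.0, 2.0), (0.0, 0.0), -1),
]
print(training_error(f, 1.0, pairs), training_error(f, 0.0, pairs))
```

With `t = 1.0` all three toy pairs are predicted correctly. With `t = 0.0` the tied pair is misclassified, which is why the threshold is learned together with `f`.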

slide-15
SLIDE 15

Some labeled layouts of a 2-node graph

[Figure: six example layouts, in panels titled good 1, good 2, good 3, bad 11, bad 12, and bad 13; axes: x and y.]

slide-16
SLIDE 16

Map 20 layouts $x_i \in \mathbb{R}^2$ to a feature space

[Figure: scatter plot of the 20 layouts; axes: distance and angle; legend: label (good, bad).]
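One way to obtain two such features from a layout of a 2-node graph is sketched below. The slides do not define the features precisely, so this is an assumption: `distance` is taken as the Euclidean length of the edge and `angle` as the edge direction folded into $[0, \pi/2]$ radians. The function name `two_node_features` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def two_node_features(node_a: Point, node_b: Point) -> Tuple[float, float]:
    """Return (distance, angle) for a layout of a 2-node graph.

    Assumption: angle is measured from the horizontal axis and folded into
    [0, pi/2], so swapping the two nodes or mirroring the layout gives the
    same value.
    """
    dx = node_b[0] - node_a[0]
    dy = node_b[1] - node_a[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(abs(dy), abs(dx))
    return distance, angle


distance, angle = two_node_features((0.0, 0.0), (3.0, 4.0))
print(distance, angle)
```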

slide-17
SLIDE 17

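Generate 10 pairwise constraints $x'_i - x_i \in \mathbb{R}^2$

[Figure: scatter plot of the 20 layouts with 10 pairwise constraints drawn between them; axes: distance and angle; legend: label (good, bad).]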

slide-18
SLIDE 18

10 labeled difference vectors $x'_i - x_i \in \mathbb{R}^2$

[Figure: scatter plot of the 10 difference vectors; axes: distance and angle; legend: comparison $y_i$.]

slide-19
SLIDE 19

All 190 labeled difference vectors $x'_i - x_i \in \mathbb{R}^2$

[Figure: scatter plot of all 190 difference vectors; axes: distance and angle; legend: comparison $y_i$.]
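The count of 190 is the number of unordered pairs among 20 layouts, $\binom{20}{2} = 190$. The sketch below enumerates them and labels each difference vector. It rests on assumptions: the rule for deriving $y_i$ from the good/bad labels (a good layout beats a bad one, equal labels tie) is not stated on the slides, the 10 good / 10 bad split is inferred from the panel numbering, and the feature values are made up. The name `labeled_differences` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Features = Tuple[float, float]            # (distance, angle)
Labeled = Tuple[Features, str]            # (features, "good" or "bad")


def labeled_differences(layouts: List[Labeled]):
    """Return (x' - x, y) for every unordered pair of layouts.

    Assumed labeling rule: y = 1 if x' is good and x is bad,
    y = -1 if x is good and x' is bad, y = 0 if both labels agree.
    """
    score = {"good": 1, "bad": 0}
    result = []
    for (x, label_x), (x_prime, label_xp) in combinations(layouts, 2):
        diff = tuple(b - a for a, b in zip(x, x_prime))
        y = score[label_xp] - score[label_x]
        result.append((diff, y))
    return result


# Made-up feature values for 10 good and 10 bad layouts.
layouts = [((100.0 + i, 0.5), "good") for i in range(10)]
layouts += [((200.0 + i, 1.2), "bad") for i in range(10)]

diffs = labeled_differences(layouts)
ties = sum(1 for _, y in diffs if y == 0)
print(len(diffs), ties, len(diffs) - ties)
```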

slide-20
SLIDE 20

Max margin comparison function

[Figure: difference vectors with the lines $f(x') - f(x) = 1$ and $f(x') - f(x) = -1$; axes: distance and angle; legend: line (margin, decision), comparison $y_i$, constraint (active, inactive).]

slide-21
SLIDE 21

Invariance of $\hat{g}$ when switching the training direction $x_i, x'_i$

[Figure: difference vectors with the lines $f(x') - f(x) = 1$ and $f(x') - f(x) = -1$; axes: distance and angle; legend: line (margin, decision), comparison $y_i$, constraint (active, inactive).]

slide-22
SLIDE 22

Defining the margin

Recall: for all pairs $i \in \{1, \dots, n\}$ we have

◮ features $x_i, x'_i \in \mathbb{R}^p$ and
◮ comparisons $y_i \in \{-1, 0, 1\}$.

We define

◮ Ranking function $f(x) = w^\intercal x \in \mathbb{R}$.
◮ Threshold $t = 1$.
◮ Comparison function $g_1(x, x') \in \{-1, 0, 1\}$.

[Figure: three panels, $y_i = -1$, $y_i = 0$, and $y_i = 1$, each showing the predicted rank difference $f(x'_i) - f(x_i)$ and the margin $\mu$.]

slide-23
SLIDE 23

Max margin comparison is a linear program (LP)

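For $y \in \{-1, 0, 1\}$, let $I_y = \{i \mid y_i = y\}$ be the corresponding training indices.

\[
\begin{aligned}
\operatorname*{maximize}_{\mu \in \mathbb{R},\, w \in \mathbb{R}^p} \quad & \mu \\
\text{subject to} \quad & \mu \le 1 - |w^\intercal (x'_i - x_i)|, && \forall\, i \in I_0 \\
& \mu \le -1 + w^\intercal (x'_i - x_i)\, y_i, && \forall\, i \in I_1 \cup I_{-1}.
\end{aligned}
\]

Note: if the optimal $\mu > 0$ then the data are separable.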
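For a fixed weight vector $w$, the largest feasible $\mu$ is the minimum of the right-hand sides of the constraints, and the LP maximizes that quantity over $w$. The sketch below evaluates it directly. It is not an LP solver: the function name `margin`, the 1-D toy pairs, and the coarse grid search standing in for a solver are all illustrative assumptions.

```python
from typing import Sequence, Tuple

Features = Tuple[float, ...]
Pair = Tuple[Features, Features, int]     # (x_i, x'_i, y_i)


def margin(w: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Largest mu that satisfies every LP constraint for a fixed w."""
    bounds = []
    for x, x_prime, y in pairs:
        diff = sum(wj * (b - a) for wj, a, b in zip(w, x, x_prime))
        if y == 0:
            bounds.append(1.0 - abs(diff))      # mu <= 1 - |w'(x' - x)|
        else:
            bounds.append(-1.0 + y * diff)      # mu <= -1 + w'(x' - x) y
    return min(bounds)


# Made-up 1-D training pairs: two strict comparisons and two ties.
toy_pairs = [
    ((0.0,), (3.0,), 1),
    ((4.0,), (0.0,), -1),
    ((0.0,), (0.5,), 0),
    ((0.2,), (0.0,), 0),
]

# Coarse grid search over w in [0, 2], a stand-in for an LP solver.
candidates = [(k / 1000.0,) for k in range(2001)]
best_w = max(candidates, key=lambda w: margin(w, toy_pairs))
print(best_w, margin(best_w, toy_pairs))
```

On this toy data the two binding constraints are $\mu \le 1 - 0.5\,w$ and $\mu \le -1 + 3w$, which meet at $w = 4/7$ with $\mu = 5/7 > 0$, so the toy pairs are separable in the sense of the note above.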

slide-24
SLIDE 24

Related work: reject, rank, and rate

Outputs \ Inputs     single items x     pairs of items x, x′
y ∈ {−1, 1}          SVM                SVMrank
y ∈ {−1, 0, 1}       Reject option      this work

◮ PL Bartlett and MH Wegkamp. Classification with a reject option using a hinge loss. JMLR, 9:1823–1840, 2008. (statistical properties of the hinge loss)
◮ T Joachims. Optimizing search engines using clickthrough data. KDD 2002. (SVMrank)
◮ K Zhou et al. Learning to rank with ties. SIGIR 2008. (boosting, ties are more effective with more output values)
◮ R Herbrich et al. TrueSkill: a Bayesian skill rating system. NIPS 2006. (generalization of Elo for chess)

slide-25
SLIDE 25

SVMrank is a quadratic program (QP)

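\[
\begin{aligned}
\operatorname*{minimize}_{w \in \mathbb{R}^p} \quad & w^\intercal w \\
\text{subject to} \quad & w^\intercal (x'_i - x_i)\, y_i \ge 1, && \forall\, i \in I_1 \cup I_{-1}.
\end{aligned}
\]

[Figure: difference vectors with the lines $f(x') - f(x) = 1$, $f(x') - f(x) = -1$, and $f(x') - f(x) = 0$; axes: distance and angle; legend: line (margin, decision), comparison $y_i$, constraint (active, inactive).]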
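With a single feature ($p = 1$) this QP has a closed-form solution, which gives a small worked check. The sketch below covers only that 1-D special case and is not a general QP solver. The function name `svmrank_1d` and the toy pairs are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Pair1D = Tuple[float, float, int]         # (x_i, x'_i, y_i) with scalar features


def svmrank_1d(pairs: Sequence[Pair1D]) -> Optional[float]:
    """Minimize w^2 subject to w * (x'_i - x_i) * y_i >= 1 for all y_i != 0.

    Pairs with y_i == 0 are ignored, as in the QP above.
    Returns None when the constraints cannot all be satisfied.
    """
    signed = [(x_prime - x) * y for x, x_prime, y in pairs if y != 0]
    if not signed:
        return 0.0                         # no constraints: w = 0 is optimal
    if all(s > 0 for s in signed):
        return 1.0 / min(signed)           # smallest w >= 1/s_i for every i
    if all(s < 0 for s in signed):
        return 1.0 / max(signed)           # mirrored case, w is negative
    return None                            # mixed signs or a zero difference


# Made-up 1-D pairs: two strict comparisons and two ties.
toy_pairs = [(0.0, 3.0, 1), (4.0, 0.0, -1), (0.0, 0.5, 0), (0.2, 0.0, 0)]
print(svmrank_1d(toy_pairs))
```

The two tied pairs do not affect the result, since the QP has no constraint for $i \in I_0$.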

slide-26
SLIDE 26

Conclusions and future work

Learned a function $f(x)$ for ranking a graph layout $x$.

◮ Features for good performance on real graphs?
◮ Tune layout algorithm parameters to maximize $f$.
◮ SVMrank is sufficient under what assumption?

slide-27
SLIDE 27

Thank you!

Supplementary slides appear after this one.

slide-28
SLIDE 28

Layout evaluation metrics (features $x_i, x'_i$)

◮ Number of crossing edges (smaller is better)
◮ Aspect ratio (closer to 1:1 is better?)
◮ Symmetry (more is better when the graph has symmetries)
◮ Edge length (small and less variable is better?)
◮ Angle between edge pairs (big is better?)
◮ Area of smallest bounding box (smaller is better to let small features be more legible)

Source: http://en.wikipedia.org/wiki/Graph_drawing#Quality_measures
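Several of these metrics can be computed directly from node positions and an edge list. The sketch below covers crossings, aspect ratio, bounding-box area, and edge-length statistics for straight-line edges; symmetry and edge-pair angles are left out. All function names and the toy layout are hypothetical, and the bounding box is assumed to be axis-aligned.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[Hashable, Hashable]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Signed area test: > 0 if r is left of the directed line p -> q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True when segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_count(pos: Dict[Hashable, Point], edges: List[Edge]) -> int:
    """Number of edge pairs that cross; edges sharing a node are skipped."""
    count = 0
    for (u, v), (s, t) in combinations(edges, 2):
        if len({u, v, s, t}) < 4:
            continue
        if segments_cross(pos[u], pos[v], pos[s], pos[t]):
            count += 1
    return count


def layout_features(pos: Dict[Hashable, Point], edges: List[Edge]) -> Dict[str, float]:
    """Feature dictionary for one layout (axis-aligned bounding box assumed)."""
    xs = [p[0] for p in pos.values()]
    ys = [p[1] for p in pos.values()]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    lengths = [math.dist(pos[u], pos[v]) for u, v in edges]
    mean_len = sum(lengths) / len(lengths)
    var_len = sum((length - mean_len) ** 2 for length in lengths) / len(lengths)
    return {
        "crossings": float(crossing_count(pos, edges)),
        "aspect_ratio": width / height if height else math.inf,
        "area": width * height,
        "mean_edge_length": mean_len,
        "edge_length_variance": var_len,
    }


# Toy layout: a unit square with both diagonals and one side.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
edges = [(0, 2), (1, 3), (0, 1)]
print(layout_features(pos, edges))
```

The resulting dictionary, in a fixed key order, could serve as a feature vector $x_i$ for the ranking function.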