Querying and Creating Visualizations by Analogy


SLIDE 1

Querying and Creating Visualizations by Analogy

Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire, Cláudio T. Silva

SCI Institute, School of Computing, University of Utah

SLIDE 2

www.vistrails.org

Outline

  • Provenance reuse
  • We have all this rich metadata - let’s use it
  • Query-by-example
  • Visualization by Analogy
  • (VisTrails intro)
  • Transparent provenance tracking
SLIDE 3

Related Work

  • Visualization Systems and Libraries
  • AVS, DX, SCIRun, VTK
  • History tracking and formalisms
  • Jankun-Kelly et al.’s P-Set calculus
  • Kreuseler et al., VDM history
  • Brodlie et al.’s GRASPARC
  • VisTrails
SLIDE 4

Provenance

  • The “pedigree” of an artifact
  • Where did it come from? Who held it?
SLIDE 5

Provenance in VisTrails

  • Process provenance
  • How was this visualization created?
SLIDE 6

Version Tree

  • Persistent
  • Transparent
  • Reuse
  • Can we do better than just presenting?

SLIDE 7

Why not query languages?

SLIDE 8

Why not query languages?

wf{*}: upstream(x) union x
  where x.module = "SoftMean" and executed(x)
    and y in upstream(x)
    and y.module = "AlignWarp" and y.parameter("model") = "12"

SLIDE 9

Why not query languages?

wf{*}: upstream(x) union x
  where x.module = "SoftMean" and executed(x)
    and y in upstream(x)
    and y.module = "AlignWarp" and y.parameter("model") = "12"

This is still only mildly better than straight SQL...

It does not expose the mapping to the relational schema.

SLIDE 10

Query-by-Example

  • Do not teach the user new forms of interaction!
SLIDE 11

Visualization by Analogy

  • Create new visualizations by saying “do as they did”

  • Specify what, not how
SLIDE 12

Query-by-Example

  • Trivially reducible from MAX-CLIQUE
  • ... and MAX-CLIQUE is NP-Complete
  • ... and MAX-CLIQUE is fundamentally hard to approximate

  • Solution: algorithm tailored to problem domain
SLIDE 13

Query-by-Example

  • Split every subgraph into topologically sorted layers
  • Ok, since all pipelines are DAGs in VisTrails

[Figure: pipeline split into layers 1, 2, 3]
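
The layering step can be sketched as follows. This is a minimal illustration, not the VisTrails implementation: the function name `topological_layers`, the edge-list representation, and the module names in the example are all assumptions made here. Layer 0 holds modules with no inputs, and every later layer holds modules whose inputs all come from earlier layers, which is only well defined because pipelines are DAGs.

```python
from collections import defaultdict


def topological_layers(nodes, edges):
    """Group DAG nodes into layers: layer 0 has no predecessors, and each
    later layer holds nodes whose predecessors all sit in earlier layers."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    layers = []
    placed = 0
    current = sorted(n for n in nodes if indegree[n] == 0)
    while current:
        layers.append(current)
        placed += len(current)
        next_layer = []
        for n in current:
            for m in successors[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    next_layer.append(m)
        current = sorted(next_layer)
    if placed != len(nodes):
        # Some node never reached indegree 0, so the graph contains a cycle.
        raise ValueError("graph has a cycle; pipelines must be DAGs")
    return layers


# Hypothetical pipeline: reader -> filter -> {renderer, writer}
example_layers = topological_layers(
    ["reader", "filter", "renderer", "writer"],
    [("reader", "filter"), ("filter", "renderer"), ("filter", "writer")],
)
```

Here `example_layers` has three layers: the reader, then the filter, then the renderer and writer together.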
SLIDE 14

Query-by-Example

  • Now search for layers that are connected in the same way in the database

[Figure: Query pipeline (1 2 3) beside the Database]

SLIDE 15

Query-by-Example

  • Now search for layers that are connected in the same way in the database

[Figure: Query pipeline (1 2 3) beside the Database; database pipeline 4 5 1 2 3: Match]

SLIDE 16

Query-by-Example

  • Now search for layers that are connected in the same way in the database

[Figure: Query pipeline (1 2 3) beside the Database; database pipeline 4 5 1 2 3: Match; database pipeline 1 4 2 3: No match]

SLIDE 17

Query-by-Example

  • Now search for layers that are connected in the same way in the database

[Figure: Query pipeline (1 2 3) beside the Database; database pipeline 4 5 1 2 3: Match; database pipeline 1 4 2 3: No match; database pipeline 1 2 3: No match]

SLIDE 18

Query-by-Example

  • Might return false positives - it ignores the particular connectivity between topological layers
  • Not too harmful - most modules cannot connect to one another

[Figure: Query pipeline (1 2 3) beside the Database]
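
To make the false-positive behavior concrete, here is a rough sketch of a layer-based search. The slides do not spell out the matching rule, so everything here is an assumption: each layer is reduced to a bag of module type names, and a query matches when some run of consecutive database layers contains the query layers one by one. The helper names `layer_contains` and `layers_match` and the module type names are hypothetical.

```python
from collections import Counter


def layer_contains(db_layer, query_layer):
    """True if every module type in the query layer occurs at least as
    often in the database layer (multiset containment)."""
    need, have = Counter(query_layer), Counter(db_layer)
    return all(have[t] >= k for t, k in need.items())


def layers_match(query_layers, db_layers):
    """Return the first offset where consecutive database layers contain
    the query layers, or None. Edges between layers are never checked."""
    span = len(query_layers)
    for offset in range(len(db_layers) - span + 1):
        window = db_layers[offset:offset + span]
        if all(layer_contains(d, q) for d, q in zip(window, query_layers)):
            return offset
    return None


# Hypothetical module type names, already split into topological layers.
query = [["Reader"], ["Contour"], ["Renderer"]]
pipeline_a = [["Reader"], ["Smooth"], ["Contour", "Clip"], ["Renderer"]]
pipeline_b = [["Reader", "Reader"], ["Contour"], ["Renderer"]]

offset_a = layers_match(query, pipeline_a)
offset_b = layers_match(query, pipeline_b)
```

Because only layer contents are compared, `pipeline_b` is reported as a match even if its `Contour` module were wired differently from the query. That is the kind of false positive the slide refers to; an exact connectivity check on the few candidates that survive would remove it.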
SLIDE 19

Query-by-Example

  • Might return false positives - it ignores the particular connectivity between topological layers
  • Not too harmful - most modules cannot connect to one another

[Figure: Query pipeline (1 2 3) beside the Database, with database pipeline 4 5 1 2 3]
SLIDE 20

Query-by-Example

  • Might return false positives - it ignores the particular connectivity between topological layers
  • Not too harmful - most modules cannot connect to one another

[Figure: Query pipeline (1 2 3) and Database pipelines labeled 4 5 1 2 3]
SLIDE 21

Query-by-Example

  • Might return false positives - it ignores the particular connectivity between topological layers
  • Not too harmful - most modules cannot connect to one another

[Figure: Query pipeline (1 2 3) and Database pipelines labeled 4 5 1 2 3]
SLIDE 22

QBE Demo

SLIDE 23

Vistrail diffs

  • A version tree stores a set of actions
  • Each action is a function on the set of all possible visualizations: $a_i : V \to V$
  • A version is the composition of the actions on its path from the root: $a_n \circ a_{n-1} \circ a_{n-2} \circ \cdots \circ a_0$
  • We can use those to determine the difference between visualizations
  • Moving up, then down the version tree

SLIDE 24

Vistrail diffs

  • Action to go from A to B is

$a_3 \circ a_2 \circ a_0^{-1} \circ a_1^{-1}$

[Figure: version tree with versions A and B and actions a0, a1, a2, a3]
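
A small sketch of the "move up, then down" idea, under strong simplifying assumptions: a pipeline is just a frozen set of module names, the only actions are adding and deleting a module, and the classes `AddModule` and `DeleteModule` and the functions `diff` and `apply_all` are hypothetical names, not VisTrails API. Undoing the actions that lead to A in reverse order, then replaying the actions that lead to B, takes A to B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddModule:
    name: str

    def apply(self, pipeline):
        return pipeline | {self.name}

    def inverse(self):
        # Only a true inverse when the module was absent before the add.
        return DeleteModule(self.name)


@dataclass(frozen=True)
class DeleteModule:
    name: str

    def apply(self, pipeline):
        return pipeline - {self.name}

    def inverse(self):
        return AddModule(self.name)


def diff(path_to_a, path_to_b):
    """Actions that turn version A into version B, given each version's
    action list from their common ancestor (oldest action first)."""
    up = [action.inverse() for action in reversed(path_to_a)]
    return up + list(path_to_b)


def apply_all(actions, pipeline):
    for action in actions:
        pipeline = action.apply(pipeline)
    return pipeline


# Hypothetical version tree: A is reached by a0 then a1, B by a2 then a3.
ancestor = frozenset({"Reader"})
a0, a1 = AddModule("Contour"), AddModule("SurfaceRenderer")
a2, a3 = AddModule("Resample"), AddModule("VolumeRenderer")
version_a = apply_all([a0, a1], ancestor)
version_b = apply_all([a2, a3], ancestor)
a_to_b = diff([a0, a1], [a2, a3])
```

Applying `a_to_b` to `version_a` first undoes a1, then a0, then applies a2 and a3, which is the composition written on the slide read right to left.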

SLIDE 25

Visualization by Analogy

  • A diff is a template: reapply it elsewhere
  • How do we match two pipelines?
SLIDE 26

Algorithm Overview

  • Compute the difference: $\delta_{ab} = \Delta(p_a, p_b)$
  • Compute the map: $map_{ac} = map(p_a, p_c)$
  • Apply $map_{ac}$ to $\delta_{ab}$: $\delta_{cb} = map_{ac}(\delta_{ab})$
  • Compute the new pipeline: $p_d = \delta_{cb}(p_c)$
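
The four steps can be walked through on a toy example. This is only a sketch under assumptions made here: pipelines are plain sets of module names (connections and parameters are ignored), a difference is a list of add/delete operations, and the map from $p_a$ to $p_c$ is given by hand as a dictionary rather than computed. The function names and module names are hypothetical.

```python
def compute_delta(p_a, p_b):
    """Delta(p_a, p_b): deletions then additions that turn p_a into p_b."""
    deletions = [("delete", m) for m in sorted(p_a - p_b)]
    additions = [("add", m) for m in sorted(p_b - p_a)]
    return deletions + additions


def remap_delta(delta, map_ac):
    """map_ac(delta_ab): rewrite each action to refer to p_c's modules.
    Modules without a counterpart (such as newly added ones) keep their name."""
    return [(op, map_ac.get(m, m)) for op, m in delta]


def apply_delta(delta, pipeline):
    result = set(pipeline)
    for op, module in delta:
        if op == "add":
            result.add(module)
        else:
            result.discard(module)
    return result


# Hypothetical pipelines: p_a -> p_b swaps an isosurface for volume rendering.
p_a = {"Reader", "Isosurface", "Renderer"}
p_b = {"Reader", "VolumeRender", "Renderer"}
p_c = {"HttpReader", "Contour", "Renderer"}
# Hand-written stand-in for map(p_a, p_c).
map_ac = {"Reader": "HttpReader", "Isosurface": "Contour", "Renderer": "Renderer"}

delta_ab = compute_delta(p_a, p_b)
delta_cb = remap_delta(delta_ab, map_ac)
p_d = apply_delta(delta_cb, p_c)
```

The resulting `p_d` keeps `HttpReader` and `Renderer` from `p_c` and replaces `Contour` with `VolumeRender`, mirroring the change from `p_a` to `p_b`.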

SLIDE 27

Visualization by Analogy

  • Simplest version is again reducible from MAX-CLIQUE
  • We will now use a probabilistic argument to create a Markov chain

SLIDE 28

How does it work?

  • Module compatibility: prior
  • Independent of graph topology
  • Probability of match between a pair
  • Dependent on graph topology
  • Linear combination of probability of match in the neighborhood pairs and data
  • This is a Markov chain!

$f : M^2 \to [0,1]$

SLIDE 29

How does it work?

  • Graph product G of the two input graphs
  • Each vertex in G represents a possible match
  • Similarity is then defined as $\pi = \alpha A(G)\pi + (1-\alpha)c(G) = M_G \pi$
  • $\pi$ is an eigenvector of $M_G$
  • It is the limit distribution of the transition matrix
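
One way to read the last equality, assuming $\pi$ is normalized so that its entries sum to one (the all-ones vector $\mathbf{1}$ is notation introduced here, not on the slide):

```latex
\begin{aligned}
\pi &= \alpha A(G)\,\pi + (1-\alpha)\,c(G) \\
    &= \alpha A(G)\,\pi + (1-\alpha)\,c(G)\,\mathbf{1}^{\mathsf T}\pi
       \qquad \text{since } \mathbf{1}^{\mathsf T}\pi = 1 \\
    &= \bigl(\alpha A(G) + (1-\alpha)\,c(G)\,\mathbf{1}^{\mathsf T}\bigr)\,\pi
     = M_G\,\pi .
\end{aligned}
```

Under that assumption $M_G = \alpha A(G) + (1-\alpha)\,c(G)\,\mathbf{1}^{\mathsf T}$, and $\pi$ is an eigenvector of $M_G$ with eigenvalue 1.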

SLIDE 30

How does it work?

[Figure: input graphs $G_A$ and $G_B$ and their product $G_A \times G_B$]

SLIDE 31

How does it work?

[Figure: input graphs $G_A$ and $G_B$ and their product $G_A \times G_B$]

SLIDE 32

How does it work?

[Figure: input graphs $G_A$ and $G_B$ and their product $G_A \times G_B$]

SLIDE 33

How does it work?

[Figure: input graphs $G_A$ and $G_B$ and their product $G_A \times G_B$]

SLIDE 34

How does it work?

[Figure: input graphs $G_A$ and $G_B$ and their product $G_A \times G_B$]

SLIDE 35

How does it work?

Each node is assigned some initial value. (It doesn’t matter which, as long as the values sum to one!)

SLIDE 36

How does it work?

[Figure: product-graph nodes labeled $p_k(a_0 \to b_0)$, $p_k(a_0 \to b_1)$, $p_k(a_0 \to b_2)$, $p_k(a_0 \to b_3)$, $p_k(a_1 \to b_0)$]

SLIDE 37

How does it work?

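[Figure: product-graph nodes labeled $p_k(a_0 \to b_0)$, $p_k(a_0 \to b_1)$, $p_k(a_0 \to b_2)$, $p_k(a_0 \to b_3)$, $p_k(a_1 \to b_0)$]

$p_{k+1}(a_0 \to b_0) = (1-\alpha)\,c(a_0, b_0) + \frac{\alpha}{3}\bigl(p_k(a_0 \to b_3) + p_k(a_0 \to b_1) + p_k(a_1 \to b_0)\bigr)$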

SLIDE 38

How does it work?

$p_{k+1}(a_0 \to b_0) = (1-\alpha)\,c(a_0, b_0) + \frac{\alpha}{3}\bigl(p_k(a_0 \to b_3) + p_k(a_0 \to b_1) + p_k(a_1 \to b_0)\bigr)$

SLIDE 39

How does it work?

$p_{k+1}(a_0 \to b_0) = (1-\alpha)\,c(a_0, b_0) + \frac{\alpha}{3}\bigl(p_k(a_0 \to b_3) + p_k(a_0 \to b_1) + p_k(a_1 \to b_0)\bigr)$

[Figure label: $c(a_0, b_0)$]

SLIDE 40

How does it work?

$p_{k+1}(a_0 \to b_0) = (1-\alpha)\,c(a_0, b_0) + \frac{\alpha}{3}\bigl(p_k(a_0 \to b_3) + p_k(a_0 \to b_1) + p_k(a_1 \to b_0)\bigr)$

[Figure label: $c(a_0, b_0)$]

Do it for all nodes, until convergence
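
The update above can be tried out on a toy pair of graphs. The sketch below is an illustration under assumptions, not the VisTrails code: the product is taken to be the Cartesian product (which reproduces the three neighbors of $(a_0, b_0)$ in the formula when $a_0$ neighbors $a_1$ and $b_0$ neighbors $b_1$ and $b_3$), the prior values are invented, and the names `product_neighbors` and `iterate_similarity` are hypothetical.

```python
def product_neighbors(adj_a, adj_b):
    """Cartesian product: (a, b) is adjacent to (a2, b) and (a, b2)
    whenever a2 neighbors a in G_A or b2 neighbors b in G_B."""
    return {
        (a, b): [(a2, b) for a2 in adj_a[a]] + [(a, b2) for b2 in adj_b[b]]
        for a in adj_a
        for b in adj_b
    }


def iterate_similarity(neighbors, prior, alpha=0.5, tol=1e-12, max_iter=10_000):
    """Repeat p_{k+1}(v) = (1 - alpha) c(v) + alpha * mean(p_k over neighbors)
    for all product-graph nodes until the largest change drops below tol."""
    p = {v: 1.0 / len(neighbors) for v in neighbors}  # initial values sum to one
    for _ in range(max_iter):
        new_p = {}
        for v, nbrs in neighbors.items():
            spread = sum(p[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_p[v] = (1 - alpha) * prior[v] + alpha * spread
        change = max(abs(new_p[v] - p[v]) for v in neighbors)
        p = new_p
        if change < tol:
            break
    return p


# G_A is a single edge a0 - a1; G_B is the 4-cycle b0 - b1 - b2 - b3 - b0.
adj_a = {"a0": ["a1"], "a1": ["a0"]}
adj_b = {"b0": ["b1", "b3"], "b1": ["b0", "b2"],
         "b2": ["b1", "b3"], "b3": ["b2", "b0"]}
neighbors = product_neighbors(adj_a, adj_b)

# Invented prior: a0 is only compatible with b0, and a1 only with b1.
prior = {v: 0.0 for v in neighbors}
prior[("a0", "b0")] = 0.5
prior[("a1", "b1")] = 0.5

scores = iterate_similarity(neighbors, prior)
```

With `alpha` below one the update is a contraction, so the loop converges regardless of the initial values; the pair `("a0", "b0")` ends with a higher score than `("a0", "b2")`, which has no prior support.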

SLIDE 41

How does it work?

  • $\pi$ is defined over graph product
  • For each module in the second pipeline, pick maximal value of $\pi$ on first pipeline: this is the match
  • Many others possible
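
The selection rule in the second bullet can be sketched as below. The score values are invented, the keys are assumed to be (first-pipeline module, second-pipeline module) pairs, and `best_matches` is a hypothetical helper name. Ties are broken by module name, which is an arbitrary choice made here.

```python
def best_matches(scores, second_pipeline_modules):
    """For each module b of the second pipeline, pick the module a of the
    first pipeline with the largest score for the pair (a, b)."""
    matches = {}
    for b in second_pipeline_modules:
        candidates = [(score, a) for (a, b2), score in scores.items() if b2 == b]
        if candidates:
            matches[b] = max(candidates)[1]
    return matches


# Invented converged scores, keyed by (first-pipeline module, second-pipeline module).
scores = {
    ("a0", "b0"): 0.40, ("a1", "b0"): 0.10,
    ("a0", "b1"): 0.15, ("a1", "b1"): 0.35,
}
matches = best_matches(scores, ["b0", "b1"])
```

Note that this rule lets two modules of the second pipeline pick the same module of the first; a one-to-one assignment would be one of the "many others possible".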

SLIDE 42

The matching algorithm

SLIDE 43

The matching algorithm

SLIDE 44

The matching algorithm

SLIDE 45

The matching algorithm

SLIDE 46

Failure Modes

  • Analogies are not fool-proof
SLIDE 47

Case study

  • Creating a complex visualization out of simple ones
  • (demo)
SLIDE 48

Discussion

  • If your system can encode actions as functions on the space of objects of interest, store these explicitly
  • That will be your “version tree” - everything else is just the same
  • Easy to incorporate domain-specific knowledge in analogies: change $A(G)$ and $c(G)$

SLIDE 49

Acknowledgments

  • Sarang Joshi, Suresh Venkatasubramanian, Erik Anderson, João Comba
  • VisTrails dev team
  • Many open source packages and devs: VTK, SciPy, teem, matplotlib

  • VisTrails is open source! http://www.vistrails.org
  • Shameless plug: Visit the SCI booth!
  • NSF, DOE, IBM Faculty Award
SLIDE 50

Thank you!

  • Questions?
SLIDE 51

Too much data

  • We are better off with visualization systems than without - but it’s still pretty messy

SLIDE 52

Video