"Interesting" Paths = Shortest Paths? - - PowerPoint PPT Presentation


slide-1
SLIDE 1

WiSP:

Weighted Shortest Paths for RDF graphs

Gonzalo Tartari, Aidan Hogan

DCC, Universidad de Chile

slide-2
SLIDE 2

"Interesting" Paths = Shortest Paths?

slide-3
SLIDE 3

"Interesting" Paths ≠ Shortest Paths!

slide-4
SLIDE 4

(Many of the) Existing Approaches

Enumerate Paths → Score Paths → Order/Filter Paths → Output

slide-5
SLIDE 5

(Many of the) Existing Approaches

Enumerate Paths → Score Paths → Order/Filter Paths → Output

slide-6
SLIDE 6

(Many of the) Existing Approaches

Enumerate Paths → Score Paths → Order/Filter Paths → Output
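For contrast with the weighted approach, the enumerate → score → order/filter pipeline on these slides can be sketched in Python. This is a minimal illustration, not any particular system: the adjacency-list graph, the depth bound `max_len`, and the scoring function passed to the hypothetical `top_k` helper are all assumptions. The number of simple paths can grow exponentially with `max_len`, and the pipeline pays that cost before any scoring happens.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> out-neighbours


def enumerate_paths(graph: Graph, source: str, target: str, max_len: int) -> List[List[str]]:
    """All simple source->target paths with at most max_len edges (iterative DFS)."""
    paths: List[List[str]] = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 == max_len:
            continue  # depth bound reached, stop extending this path
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple: no repeated nodes
                stack.append((nxt, path + [nxt]))
    return paths


def top_k(graph: Graph, source: str, target: str, max_len: int,
          score: Callable[[List[str]], float], k: int) -> List[List[str]]:
    """Enumerate, score, order by descending score, then keep the k best."""
    candidates = enumerate_paths(graph, source, target, max_len)
    candidates.sort(key=score, reverse=True)
    return candidates[:k]
```

For example, `top_k(graph, "a", "d", 3, score=lambda p: -len(p), k=1)` keeps a single shortest path; any notion of "interesting" has to be encoded in `score`.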

slide-7
SLIDE 7

Our Approach: Weight Graphs

slide-8
SLIDE 8

Weighting graphs: Nodes

slide-9
SLIDE 9

Node Weights: Length (Baseline)

...
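To make the baseline concrete, here is a small sketch assuming, hypothetically, that a path's cost is the sum of the weights of the nodes it visits. Under the length baseline every node weighs the same, so the cheapest path is simply the one with the fewest nodes. The names `path_cost` and `length_weight` are illustrative, not taken from WiSP.

```python
from typing import Callable, Hashable, Sequence

NodeWeight = Callable[[Hashable], float]


def path_cost(path: Sequence[Hashable], node_weight: NodeWeight) -> float:
    """Hypothetical cost model: sum of the weights of every node the path visits."""
    return sum(node_weight(v) for v in path)


def length_weight(_node: Hashable) -> float:
    """Length baseline: every node costs 1, so a path's cost is its node count."""
    return 1.0


candidates = [["a", "b", "c"], ["a", "x", "y", "c"]]
cheapest = min(candidates, key=lambda p: path_cost(p, length_weight))
```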

slide-10
SLIDE 10

Node Weights: Degree

...
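A degree weighting needs only one pass over the triples. The sketch below counts total degree (occurrences as subject or object); whether the slides use in-, out- or total degree is not stated here, so that choice is an assumption, as are the toy triples.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def degree_weights(triples: Iterable[Triple]) -> Dict[str, int]:
    """Total degree: number of triples in which a node occurs as subject or object."""
    degree: Counter = Counter()
    for subject, _predicate, obj in triples:
        degree[subject] += 1
        degree[obj] += 1
    return dict(degree)


toy = [("a", "p", "hub"), ("b", "p", "hub"), ("c", "q", "hub"), ("a", "q", "b")]
weights = degree_weights(toy)  # "hub" is the highest-degree node
```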

slide-11
SLIDE 11

Node Weights: PageRank

...

slide-12
SLIDE 12

Aside: PageRank / directed graph used

...
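Since the aside notes that PageRank is computed over the directed graph, a power-iteration sketch can treat each triple (s, p, o) as a directed edge s → o. The damping factor 0.85 is the conventional default and the fixed 50 iterations are assumptions, not values taken from the slides; dangling nodes (no out-links) have their rank spread evenly so the scores keep summing to 1.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def pagerank(edges: Iterable[Tuple[Hashable, Hashable]], damping: float = 0.85,
             iterations: int = 50) -> Dict[Hashable, float]:
    """PageRank by power iteration over directed (source, target) edges."""
    out_links: Dict[Hashable, List[Hashable]] = {}
    nodes: Set[Hashable] = set()
    for u, v in edges:
        out_links.setdefault(u, []).append(v)  # repeated edges add extra weight
        nodes.update((u, v))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for u, targets in out_links.items():
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank
```

Edge direction matters here: in the star `[("a", "c"), ("b", "c")]` only `c` accumulates rank, whereas an undirected reading would treat all three nodes as linked.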

slide-13
SLIDE 13

Weighting graphs: Edges

slide-14
SLIDE 14

Weighting with only nodes

slide-15
SLIDE 15

Weighting with only nodes

slide-16
SLIDE 16

Edge Weights: Frequency
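One straightforward reading of "edge frequency" is how many triples use each predicate, with every edge weighted by the frequency of its predicate. The sketch assumes that reading; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def predicate_frequency(triples: Iterable[Triple]) -> Dict[str, int]:
    """Number of triples that use each predicate."""
    return dict(Counter(p for _s, p, _o in triples))


def edge_weight(triple: Triple, freq: Dict[str, int]) -> int:
    """Weight of one edge: the global frequency of its predicate."""
    return freq[triple[1]]


toy = [("a", "p", "hub"), ("b", "p", "hub"), ("c", "q", "hub")]
freq = predicate_frequency(toy)
```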

slide-17
SLIDE 17

Weighting graphs: Nodes + Edges

slide-18
SLIDE 18

Node + Edge Weights: Degree + Frequency

...
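A sketch of how node and edge weights might be combined on one path, assuming a simple additive model over a path written as alternating nodes and predicates. The additive form and the names below are illustrative assumptions. Raw degrees and raw predicate frequencies can differ by orders of magnitude, which is one plausible motivation for the [0,1] normalisation slide.

```python
from typing import Dict, Sequence


def combined_cost(path: Sequence[str], node_w: Dict[str, float],
                  edge_w: Dict[str, float]) -> float:
    """Cost of an alternating path [n0, p1, n1, ..., pk, nk]: node weights + predicate weights."""
    if len(path) % 2 == 0:
        raise ValueError("path must alternate node, predicate, node, ... and end on a node")
    nodes = path[0::2]       # n0, n1, ..., nk
    predicates = path[1::2]  # p1, ..., pk
    return sum(node_w[n] for n in nodes) + sum(edge_w[p] for p in predicates)
```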

slide-19
SLIDE 19

...

Node + Edge Weights: PageRank + Frequency

slide-20
SLIDE 20

...

Node + Edge Weights: PageRank + Frequency

slide-21
SLIDE 21

...

Node + Edge Weights: [0,1] Normalisation
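The slide does not say which normalisation maps weights into [0,1]; min-max scaling is one common choice and is assumed in this sketch. Applied separately to node and edge weights, it puts both on the same scale before they are added.

```python
from typing import Dict, Hashable


def min_max_normalise(weights: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale weights linearly into [0, 1]; a constant map goes to all zeros."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        return {k: 0.0 for k in weights}
    return {k: (w - lo) / (hi - lo) for k, w in weights.items()}
```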

slide-22
SLIDE 22

Hybrid Node Weights

slide-23
SLIDE 23

Node Weights: PageRank

...

Visiting one high-centrality node = Visiting thousands of low-centrality nodes

slide-24
SLIDE 24

Hybrid Node Weights: PageRank + Length

...
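The effect of adding a length term can be shown with toy numbers. This sketch reads the PageRank slide's "one high-centrality node = thousands of low-centrality nodes" as weights that grow with centrality (an assumption), and models the hybrid as normalised centrality plus a constant per-node `length_cost`; both the additive form and that parameter are hypothetical. The compared paths list only intermediate nodes.

```python
from typing import Dict, Sequence


def hybrid_cost(path: Sequence[str], centrality: Dict[str, float], length_cost: float) -> float:
    """Each visited node costs its (normalised) centrality plus a constant length cost."""
    return sum(centrality[v] + length_cost for v in path)


centrality = {"hub": 1.0, **{f"x{i}": 0.01 for i in range(10)}}
via_hub = ["hub"]                       # one high-centrality intermediate node
detour = [f"x{i}" for i in range(10)]   # ten low-centrality intermediate nodes
```

With `length_cost=0.0` the ten-node detour is cheaper (about 0.1 against 1.0); with `length_cost=1.0` the path through the hub wins (2.0 against 10.1), so the length term stops long detours from being rewarded.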

slide-25
SLIDE 25

Implementation
slide-26
SLIDE 26

Weighted Shortest-Path Implementation

  • Dijkstra's algorithm:

– Worst case:

Image source: https://github.com/aakash1104/Graph-Algorithms
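Node weights fit into Dijkstra's algorithm by charging a node's weight when the search enters it, so each step u → v costs the edge weight plus the weight of v. The sketch below uses a binary heap (`heapq`) and assumes all weights are non-negative and that the source's own weight counts towards the total; it is an illustration of the technique, not WiSP's code, and the function name is hypothetical.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple

Adjacency = Dict[Hashable, List[Tuple[Hashable, float]]]  # node -> [(neighbour, edge weight)]


def dijkstra_path(adj: Adjacency, node_w: Dict[Hashable, float],
                  source: Hashable, target: Hashable) -> Optional[Tuple[float, List[Hashable]]]:
    """Cheapest source->target path; step u->v costs edge weight + node_w[v] (all >= 0)."""
    dist: Dict[Hashable, float] = {source: node_w.get(source, 0.0)}
    prev: Dict[Hashable, Hashable] = {}
    heap = [(dist[source], 0, source)]
    counter = 1  # tie-breaker so heapq never has to compare nodes themselves
    settled = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry (lazy deletion)
        settled.add(u)
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w_edge in adj.get(u, []):
            nd = d + w_edge + node_w.get(v, 0.0)
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return None  # target unreachable
```

With lazy deletion and a binary heap this runs in O((|V| + |E|) log |V|) time, and the search stops as soon as the target is settled rather than computing distances to every node.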

slide-27
SLIDE 27

Experiments

slide-28
SLIDE 28

Questions

  • Performance:

– How are the runtimes?
– How is the scalability?

  • Weighting schemes:

– How similar are paths for different weightings?
– Does weighting help find interesting paths?
– Which weighting finds the most interesting paths?

slide-29
SLIDE 29

Dataset: Wikidata

  • Truthy dump: 2017-06-07

– 25 million nodes ( -IRIs only)
– 90 million edges

slide-30
SLIDE 30

Dataset: Wikidata Slices

slide-31
SLIDE 31

Machine

  • 2 × Intel Xeon Quad Core @ 1.9 GHz
  • 32 GB of RAM
slide-32
SLIDE 32

Weighting Schemes

  • Node

– Degree
– PageRank
– Length

  • Node + Edge

– Degree + Edge Frequency
– PageRank + Edge Frequency

  • Hybrid Node + Edge

– Degree + Length + Edge Frequency
– PageRank + Length + Edge Frequency

slide-33
SLIDE 33

Performance

slide-34
SLIDE 34

Queries (Node pairs)

  • Queries: 100 node pairs randomly sampled

– From smallest slice
– From each slice independently

  • Task: Return one (best) path
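Drawing the query workload can be sketched as sampling distinct ordered (source, target) pairs with a fixed seed so runs are repeatable. The seed, the ordered-pair convention and the rejection of duplicate pairs are assumptions; the slide only states that 100 pairs were sampled at random.

```python
import random
from typing import List, Sequence, Set, Tuple


def sample_pairs(nodes: Sequence[str], n_pairs: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Sample n_pairs distinct ordered pairs of distinct nodes, reproducibly."""
    max_pairs = len(nodes) * (len(nodes) - 1)
    if n_pairs > max_pairs:
        raise ValueError("not enough distinct node pairs")
    rng = random.Random(seed)  # private generator: fixed seed, no global state
    seen: Set[Tuple[str, str]] = set()
    pairs: List[Tuple[str, str]] = []
    while len(pairs) < n_pairs:
        s, t = rng.sample(nodes, 2)  # two different nodes
        if (s, t) not in seen:
            seen.add((s, t))
            pairs.append((s, t))
    return pairs
```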
slide-35
SLIDE 35

Performance Results (Full Dataset)

slide-36
SLIDE 36

Performance Results ( | Various Scales)

slide-37
SLIDE 37

Comparison of weighting schemes

slide-38
SLIDE 38

Comparison of path length (full dataset)

slide-39
SLIDE 39

How many pairs give the same path? (full dataset)

slide-40
SLIDE 40

User Study

slide-41
SLIDE 41

Queries: Same type

slide-42
SLIDE 42

Queries: Different types

slide-43
SLIDE 43

User study

  • 10 students
  • 1.6 M dataset
  • Shown all paths for one query together
  • Scores: 1 (very poor) - 7 (very good)
  • 79 complete evaluations

– 4 evaluations per query (node pair)
– 553 scores

slide-44
SLIDE 44

Lowest-rated path

mean score 1.25 ( {1,1,1,2} )

slide-45
SLIDE 45

Highest-rated path

mean score 6.0 ( {5,7} )

slide-46
SLIDE 46

Inter-rater agreement

  • Kendall's τ correlation (ordinal scales)

– τ = 0.201
– Slight, positive agreement

  • Two sets of results

– All

  • τ = 0.201, 20 queries, 79 evaluations

– Concordant

  • Queries with positive τ correlation only
  • τ = 0.552, 8 queries, 27 evaluations
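Kendall's τ between two raters can be computed directly from pairwise comparisons of the items they both scored. Because 1 to 7 scores tie often, this sketch uses the tie-corrected τ-b variant; which variant the study used, and how per-rater values were aggregated per query, are not stated here and are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two raters' scores for the same items (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied for both raters: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only) * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b undefined: a rater gave every item the same score")
    return (concordant - discordant) / denom
```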
slide-47
SLIDE 47

User study: Comparison of weightings

slide-48
SLIDE 48

Demo

http://wisp.dcc.uchile.cl

slide-49
SLIDE 49

WiSP Demo

?

slide-50
SLIDE 50

Conclusions
slide-51
SLIDE 51

Conclusions

  • Performance:

– How are the runtimes?

  • A few seconds (1.6 M) to a few minutes (full dataset)

– How is the scalability?

  • Linear (roughly)
  • Weighting schemes:

– How similar are paths for different weightings?

  • |

similar; others not so much

– Does weighting help find interesting paths?

  • Yes!

– Which weighting finds the most interesting paths?

  • No clear winner (

best in most cases)

slide-52
SLIDE 52

Future work

  • Top-k queries
  • Explore more weightings
  • Normalisation / combinations
  • Performance? (Parallelism? Approximation?)
  • ¡¡¡Evaluation!!!