SLIDE 1 WiSP:
Weighted Shortest Paths for RDF graphs
Gonzalo Tartari, Aidan Hogan
DCC, Universidad de Chile
SLIDE 2
"Interesting" Paths = Shortest Paths?
SLIDE 3
"Interesting" Paths ≠ Shortest Paths!
SLIDE 4
(Many of the) Existing Approaches
Enumerate Paths → Score Paths → Order/Filter Paths → Output
SLIDE 5
(Many of the) Existing Approaches
Enumerate Paths → Score Paths → Order/Filter Paths → Output
SLIDE 6
(Many of the) Existing Approaches
Enumerate Paths → Score Paths → Order/Filter Paths → Output
SLIDE 7
Our Approach: Weight Graphs
SLIDE 8
Weighting Graphs: Nodes
SLIDE 9
Node Weights: Length (Baseline)
...
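When every node carries the same weight, the cheapest path is simply the one with the fewest hops, so the Length baseline can be computed with breadth-first search. This is a minimal sketch, assuming the graph is given as `(subject, predicate, object)` triples and that edges may be followed in either direction; `bfs_path` is a hypothetical helper name, not part of WiSP.

```python
from collections import deque


def bfs_path(triples, source, target):
    """Return one fewest-hop path from source to target, or None.

    Assumption: edges are traversable in both directions.
    """
    adj = {}
    for s, _, o in triples:
        adj.setdefault(s, []).append(o)
        adj.setdefault(o, []).append(s)
    parent = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


triples = [("A", "p", "B"), ("B", "p", "C"), ("A", "q", "D"), ("D", "q", "C")]
print(bfs_path(triples, "A", "C"))
```

Because BFS explores level by level, ties between equally short paths are broken by adjacency order, which is arbitrary from the user's point of view; this is the kind of choice weighting is meant to inform.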
SLIDE 10
Node Weights: Degree
...
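A degree-based node weight only needs one pass over the triples. The sketch below assumes degree means the number of triples a node appears in (as subject or object) and that this count is used directly as the node's weight, so hub nodes become expensive to pass through; `degree_weights` is a hypothetical name.

```python
from collections import Counter


def degree_weights(triples):
    """Map each node to its degree (incoming + outgoing edges).

    Assumption: the raw degree is the node weight, penalising hubs.
    """
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    return dict(degree)


triples = [("Q1", "instanceOf", "Human"), ("Q2", "instanceOf", "Human"),
           ("Q3", "instanceOf", "Human"), ("Q1", "spouse", "Q2")]
print(degree_weights(triples))
```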
SLIDE 11
Node Weights: PageRank
...
SLIDE 12
Aside: PageRank / directed graph used
...
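The aside states that PageRank is computed on the directed graph. A textbook power-iteration version is sketched below under assumed parameters: each triple `(s, p, o)` is an edge `s -> o`, the damping factor is 0.85, 50 iterations are run, and rank held by nodes without out-edges is spread uniformly. These are common defaults, not values taken from the slides.

```python
def pagerank(triples, damping=0.85, iterations=50):
    """Power-iteration PageRank over the directed graph of the triples."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out = {v: [] for v in nodes}
    for s, _, o in triples:
        out[s].append(o)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank of dangling nodes (no out-edges) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


triples = [("Q1", "instanceOf", "Human"), ("Q2", "instanceOf", "Human"),
           ("Q3", "instanceOf", "Human"), ("Q1", "spouse", "Q2")]
ranks = pagerank(triples)
print(max(ranks, key=ranks.get))
```

On a graph the size of Wikidata a sparse-matrix or streaming implementation would be needed; the dictionary version above only illustrates the computation.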
SLIDE 13
Weighting Graphs: Edges
SLIDE 14
Weighting with only nodes
SLIDE 15
Weighting with only nodes
SLIDE 16
Edge Weights: Frequency
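A frequency-based edge weight can be read off a predicate count. The sketch assumes each edge is weighted by how often its predicate occurs in the whole graph, on the assumed rationale that very common predicates (such as instance-of) make for less informative connections; `edge_weights` is a hypothetical name.

```python
from collections import Counter


def edge_weights(triples):
    """Weight each edge (s, p, o) by the global frequency of predicate p."""
    freq = Counter(p for _, p, _ in triples)
    return {(s, p, o): freq[p] for s, p, o in triples}


triples = [("Q1", "instanceOf", "Human"), ("Q2", "instanceOf", "Human"),
           ("Q3", "instanceOf", "Human"), ("Q1", "spouse", "Q2")]
weights = edge_weights(triples)
print(weights[("Q1", "spouse", "Q2")], weights[("Q1", "instanceOf", "Human")])
```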
SLIDE 17
Weighting Graphs: Nodes + Edges
SLIDE 18
Node + Edge Weights: Degree + Frequency
...
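To see how node and edge weights combine, the sketch below scores two candidate paths between the same pair of nodes. It assumes (hypothetically) that a path's cost is the plain sum of its nodes' degrees and its edges' predicate frequencies, with no normalisation; paths are written as alternating lists `[node, edge, node, ...]`. Counting the endpoints only adds the same constant to every path between a given pair, so it does not change which path wins.

```python
from collections import Counter


def degree_plus_frequency_cost(triples, path):
    """Sum of node degrees plus predicate frequencies along a path."""
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    freq = Counter(p for _, p, _ in triples)
    nodes, edges = path[0::2], path[1::2]
    return sum(degree[v] for v in nodes) + sum(freq[p] for _, p, _ in edges)


triples = [("Q1", "instanceOf", "Human"), ("Q2", "instanceOf", "Human"),
           ("Q3", "instanceOf", "Human"), ("Q1", "spouse", "Q2")]
via_hub = ["Q1", ("Q1", "instanceOf", "Human"), "Human",
           ("Q2", "instanceOf", "Human"), "Q2"]
direct = ["Q1", ("Q1", "spouse", "Q2"), "Q2"]
print(degree_plus_frequency_cost(triples, via_hub),
      degree_plus_frequency_cost(triples, direct))
```

The path through the `Human` hub costs 13 and the direct `spouse` edge costs 5, so the more specific connection is preferred.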
SLIDE 19
...
Node + Edge Weights: PageRank + Frequency
SLIDE 20
...
Node + Edge Weights: PageRank + Frequency
SLIDE 21
...
Node + Edge Weights: [0,1] Normalisation
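Raw degrees, PageRank scores and predicate frequencies live on very different scales, so summing them directly lets one dominate; rescaling each into [0,1] first is presumably the point of this slide. Min-max scaling, sketched below, is one common way to do it; the slides do not say which normalisation WiSP uses, and the all-equal case mapping to 0.0 is an assumption made here to avoid dividing by zero.

```python
def min_max(weights):
    """Linearly rescale the values of a {key: number} map into [0, 1]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        return {k: 0.0 for k in weights}
    return {k: (v - lo) / (hi - lo) for k, v in weights.items()}


node_degree = {"Q1": 2, "Human": 3, "Q2": 2, "Q3": 1}
predicate_freq = {"instanceOf": 3, "spouse": 1}
print(min_max(node_degree))
print(min_max(predicate_freq))
```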
SLIDE 22
Hybrid Node Weights
SLIDE 23 Node Weights: PageRank
...
Visiting one high-centrality node = Visiting thousands of low-centrality nodes
SLIDE 24
Hybrid Node Weights: PageRank + Length
...
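The previous slide's problem, one high-centrality node costing as much as thousands of low-centrality ones, can be reproduced with a few numbers. The sketch assumes a hypothetical hybrid formula in which each intermediate node costs its normalised PageRank plus a constant `alpha` for the hop itself; with `alpha = 0` a long detour through obscure nodes beats a single hub, while a positive `alpha` restores a preference for shorter paths.

```python
def hybrid_cost(intermediate_nodes, pagerank_norm, alpha=1.0):
    """Hypothetical hybrid cost: normalised PageRank + alpha per node."""
    return sum(pagerank_norm[v] + alpha for v in intermediate_nodes)


pr = {"Hub": 1.0, "a": 0.01, "b": 0.01, "c": 0.01, "d": 0.01, "e": 0.01}
via_hub = ["Hub"]
detour = ["a", "b", "c", "d", "e"]
for alpha in (0.0, 1.0):
    print(alpha, hybrid_cost(via_hub, pr, alpha), hybrid_cost(detour, pr, alpha))
```

How large `alpha` should be relative to the PageRank term is exactly the kind of balance the slides leave open.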
SLIDE 25 Implementation
SLIDE 26 Weighted Shortest-Path Implementation
– Worst case:
Image source: https://github.com/aakash1104/Graph-Algorithms
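One standard way to compute weighted shortest paths is Dijkstra's algorithm with a priority queue. The sketch below is that textbook algorithm, not necessarily WiSP's implementation, under assumed cost rules: entering a node costs its node weight, crossing an edge costs its edge weight, the source's own weight is not counted, all weights are non-negative, and edges may be followed in either direction. `weighted_shortest_path`, `node_w` and `edge_w` are hypothetical names.

```python
import heapq


def weighted_shortest_path(triples, node_w, edge_w, source, target):
    """Return (cost, path) for the cheapest path, or (inf, None)."""
    adj = {}
    for s, p, o in triples:
        w = edge_w[(s, p, o)]
        adj.setdefault(s, []).append((o, w))
        adj.setdefault(o, []).append((s, w))
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry left over from an earlier relaxation
        settled.add(u)
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        for v, w in adj.get(u, []):
            nd = d + w + node_w[v]  # edge cost + cost of entering v
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None


triples = [("Q1", "instanceOf", "Human"), ("Q2", "instanceOf", "Human"),
           ("Q1", "knows", "Q3"), ("Q3", "knows", "Q4"), ("Q4", "knows", "Q2")]
node_w = {"Q1": 1, "Q2": 1, "Q3": 1, "Q4": 1, "Human": 10}
edge_w = {t: 1 for t in triples}
print(weighted_shortest_path(triples, node_w, edge_w, "Q1", "Q2"))
```

With the `Human` hub weighted at 10, the three-hop route through `Q3` and `Q4` (cost 6) beats the two-hop route through the hub (cost 13).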
SLIDE 27
Experiments
SLIDE 28 Questions
– How are the runtimes?
– How is the scalability?
– How similar are paths for different weightings?
– Does weighting help find interesting paths?
– Which weighting finds the most interesting paths?
SLIDE 29 Dataset: Wikidata
– 25 million nodes (IRIs only)
– 90 million edges
SLIDE 30
Dataset: Wikidata Slices
SLIDE 31 Machine
- 2 × Intel Xeon Quad Core @ 1.9 GHz
- 32 GB of RAM
SLIDE 32 Weighting Schemes
– Degree
– PageRank
– Length
– Degree + Edge Frequency
– PageRank + Edge Frequency
– Degree + Length + Edge Frequency
– PageRank + Length + Edge Frequency
SLIDE 33
Performance
SLIDE 34 Queries (Node pairs)
- Queries: 100 node pairs randomly sampled
– From the smallest slice
– From each slice independently
- Task: Return one (best) path
SLIDE 35
Performance Results (Full Dataset)
SLIDE 36
Performance Results (Various Scales)
SLIDE 37
Comparison of Weighting Schemes
SLIDE 38
Comparison of path length (full dataset)
SLIDE 39
How many pairs give the same path? (full dataset)
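The "same path" comparison amounts to counting, for every pair of weighting schemes, the queries on which both return exactly the same path. A minimal sketch follows; the nested-dictionary layout `{scheme: {query_pair: path}}` and the toy results are made up for illustration and are not data from the experiments.

```python
from itertools import combinations


def same_path_counts(results):
    """For each pair of schemes, count queries with identical paths."""
    counts = {}
    for a, b in combinations(sorted(results), 2):
        shared = set(results[a]) & set(results[b])
        counts[(a, b)] = sum(1 for q in shared if results[a][q] == results[b][q])
    return counts


# Hypothetical toy results, not measurements.
results = {
    "Degree":   {("A", "B"): ["A", "X", "B"], ("C", "D"): ["C", "D"]},
    "PageRank": {("A", "B"): ["A", "X", "B"], ("C", "D"): ["C", "Y", "D"]},
    "Length":   {("A", "B"): ["A", "Z", "B"], ("C", "D"): ["C", "D"]},
}
print(same_path_counts(results))
```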
SLIDE 40
User Study
SLIDE 41
Queries: Same type
SLIDE 42
Queries: Different types
SLIDE 43 User study
- 10 students
- 1.6 M dataset
- Shown all paths for one query together
- Scores: 1 (very poor) - 7 (very good)
- 79 complete evaluations
– 4 evaluations per query (node pair)
– 553 scores
SLIDE 44
Lowest-rated path
mean score 1.25 ( {1,1,1,2} )
SLIDE 45
Highest-rated path
mean score 6.0 ( {5,7} )
SLIDE 46 Inter-rater agreement
- Kendall's τ correlation (ordinal scales)
– All queries: τ = 0.201 (slight, positive agreement); 20 queries, 79 evaluations
– Concordant queries (positive τ correlation only): τ = 0.552; 8 queries, 27 evaluations
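The slides report Kendall's τ but not which variant was used or how the roughly four raters per query were combined. The sketch below computes tau-b, which corrects for ties (frequent on a 1-7 scale), between two raters' scores for the same list of paths; treating agreement as pairwise tau-b is an assumption, and `kendall_tau_b` is a hypothetical helper.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long lists of ordinal scores."""
    if len(x) != len(y):
        raise ValueError("score lists must have equal length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference
        dy = (y[i] > y[j]) - (y[i] < y[j])
        if dx and dy:
            if dx == dy:
                concordant += 1
            else:
                discordant += 1
        elif dx:
            tied_y_only += 1
        elif dy:
            tied_x_only += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


print(kendall_tau_b([1, 2, 2], [1, 2, 3]))
```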
SLIDE 47
User study: Comparison of weightings
SLIDE 48
Demo
http://wisp.dcc.uchile.cl
SLIDE 49
WiSP Demo
SLIDE 50 Conclusion
SLIDE 51 Conclusions
– How are the runtimes?
- A few seconds (1.6 M dataset) to a few minutes (full dataset)
– How is the scalability?
- Linear (roughly)
- Weighting schemes:
– How similar are paths for different weightings?
similar; others not so much
– Does weighting help find interesting paths?
– Which weighting finds the most interesting paths?
best in most cases)
SLIDE 52 Future work
- Top-k queries
- Explore more weightings
- Normalisation / combinations
- Performance? (Parallelism? Approximation?)
- ¡¡¡Evaluation!!!