Mayank Kejriwal
Information Sciences Institute/USC kejriwal@isi.edu http://usc-isi-i2.github.io/kejriwal/
Given one or more attribute-rich graphs and a training set of linked node pairs, how do we avoid evaluating all O(|V|^2) node pairs?
Blocks
[Figure: records from Dataset 1 and Dataset 2 grouped into blocks 1 to 5; candidate pairs marked "?"]
Apply a blocking key, e.g. Tokens(LastName)
Generate the candidate set (7 pairs), then apply the similarity function
'Exhaustive' set: 4 x 6 = 24 pairs
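The blocking step above can be sketched in a few lines of Python. This is only an illustration, assuming a token-based key on a LastName field. The records, the identifiers, and the helper names `tokens` and `candidate_pairs` are hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from itertools import product


def tokens(value):
    """Blocking key: the set of lowercase whitespace-separated tokens."""
    return set(value.lower().split())


def candidate_pairs(dataset1, dataset2, field):
    """Pair up records from the two datasets that share at least one token."""
    # Each block holds the record ids from dataset 1 and from dataset 2.
    blocks = defaultdict(lambda: (set(), set()))
    for rid, rec in dataset1.items():
        for tok in tokens(rec[field]):
            blocks[tok][0].add(rid)
    for rid, rec in dataset2.items():
        for tok in tokens(rec[field]):
            blocks[tok][1].add(rid)
    pairs = set()
    for left, right in blocks.values():
        # Only pairs inside the same block become candidates.
        pairs.update(product(left, right))
    return pairs


# Invented toy data: 4 records against 6 records (24 exhaustive pairs).
dataset1 = {
    "a1": {"LastName": "Smith"},
    "a2": {"LastName": "Jones"},
    "a3": {"LastName": "De Silva"},
    "a4": {"LastName": "Brown"},
}
dataset2 = {
    "b1": {"LastName": "Smith"},
    "b2": {"LastName": "Smith Jones"},
    "b3": {"LastName": "Silva"},
    "b4": {"LastName": "Brown"},
    "b5": {"LastName": "Lee"},
    "b6": {"LastName": "Jones"},
}

pairs = candidate_pairs(dataset1, dataset2, "LastName")
exhaustive = len(dataset1) * len(dataset2)
print(len(pairs), "candidate pairs out of", exhaustive)
```

Only the pairs in the candidate set would then be passed to the (more expensive) similarity function, which is where the saving over the exhaustive set comes from.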
Disjunctive Normal Form (DNF) blocking keys
blocking keys
guarantees
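As a rough illustration of what a DNF blocking key looks like, the sketch below treats the key as a disjunction of conjunctions of pairwise predicates: a pair becomes a candidate if every predicate in at least one conjunction holds. The predicates `common_token` and `same_first_char`, the field names, and the example key are assumptions made for this sketch. How such a key is learned from the training pairs is not shown here.

```python
def common_token(field):
    """Predicate: the two records share at least one token in `field`."""
    def pred(r1, r2):
        t1 = set(r1[field].lower().split())
        t2 = set(r2[field].lower().split())
        return bool(t1 & t2)
    return pred


def same_first_char(field):
    """Predicate: `field` starts with the same character in both records."""
    def pred(r1, r2):
        return r1[field][:1].lower() == r2[field][:1].lower()
    return pred


def dnf_covers(dnf, r1, r2):
    """True if any conjunction in the DNF has all of its predicates satisfied."""
    return any(all(p(r1, r2) for p in conj) for conj in dnf)


# Hypothetical key:
# (CommonToken(LastName) AND SameFirstChar(FirstName)) OR CommonToken(City)
dnf_key = [
    [common_token("LastName"), same_first_char("FirstName")],
    [common_token("City")],
]

r1 = {"FirstName": "Ann", "LastName": "Smith", "City": "Los Angeles"}
r2 = {"FirstName": "Anna", "LastName": "Smith Jones", "City": "Boston"}
r3 = {"FirstName": "Bob", "LastName": "Smith", "City": "Los Gatos"}
r4 = {"FirstName": "Carl", "LastName": "Brown", "City": "Boston"}

print(dnf_covers(dnf_key, r1, r2))  # first conjunction holds
print(dnf_covers(dnf_key, r1, r3))  # second conjunction holds
print(dnf_covers(dnf_key, r1, r4))  # neither conjunction holds
```

Adding a conjunction can only widen the candidate set, while adding a predicate to an existing conjunction can only narrow it, which is what makes the DNF form a convenient handle on the recall and reduction trade-off.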
                 DNF blocking for RDF              Attribute Clustering (AC)
Name             Recall   Reduction   FMeasure     Recall   Reduction   FMeasure
Persons 1        100      99.75       99.88        100      98.86       99.43
Persons 2        99.00    99.79       99.39        99.75    99.02       99.38
Restaurants      100      99.73       99.87        100      95.57       99.79
Eprints-Rexa     98.16    99.28       98.72        99.60    99.37       99.48
IM-Similarity    100      98.14       99.06        100      62.79       77.14
IIMB-059         99.76    93.35       96.45        97.33    73.09       83.49
IIMB-062         47.73    98.11       64.22        77.27    90.80       83.49
Libraries        97.96    99.99       98.96        99.99    99.87       99.93
Parks            95.96    94.41       95.18        99.07    88.27       93.36
Video Game       98.73    99.96       99.34        99.72    99.85       99.79
Average          93.73    98.25       95.11        97.27    91.15       93.53
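The slides do not define the three column metrics, so the sketch below assumes the usual blocking definitions: Recall as the percentage of true matching pairs kept in the candidate set, Reduction as the percentage of exhaustive pairs that are avoided, and FMeasure as the harmonic mean of the two. The harmonic-mean reading agrees, within rounding, with rows such as Persons 2 under DNF blocking (99.00 and 99.79 give 99.39). The function names are hypothetical.

```python
def pairs_completeness(candidates, true_matches):
    """Recall (%): share of true matching pairs present in the candidate set."""
    if not true_matches:
        return 0.0
    return 100.0 * len(candidates & true_matches) / len(true_matches)


def reduction_ratio(num_candidates, num_exhaustive):
    """Reduction (%): share of exhaustive pairs that blocking avoids."""
    return 100.0 * (1.0 - num_candidates / num_exhaustive)


def f_measure(recall, reduction):
    """Harmonic mean of recall and reduction (both given in percent)."""
    if recall + reduction == 0:
        return 0.0
    return 2.0 * recall * reduction / (recall + reduction)


# The blocking illustration earlier: 7 candidate pairs out of 24 exhaustive pairs.
print(round(reduction_ratio(7, 24), 2))

# Persons 2, DNF blocking: recall 99.00, reduction 99.79.
print(round(f_measure(99.00, 99.79), 2))
```

Because the table reports values already rounded to two decimals, recomputing FMeasure from the printed Recall and Reduction may differ from the printed FMeasure in the last digit.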