Improving Link Discovery using context-aware link specifications



SLIDE 1

Improving Link Discovery using context-aware link specifications

PhD candidate: Andrea Cimmino

Supervised by: David Ruiz (University of Seville, Spain) and Carlos R. Rivero (Rochester Institute of Technology, USA)

LDOW 2016

SLIDE 2

Hi! My name is Andrea

Bari, Seville, Rochester

SLIDE 3

Roadmap

  • Problem statement
  • Results
  • Future work

SLIDE 4

To be or not to be … the same

Figure: two datasets, DATASET 1 and DATASET 2. One holds a resource with name: “Wei Wang”; the other holds two resources:
  • name: “Wei Wang”, email: “weiwang@cs.unc.edu”
  • name: “Wei Wang”, email: “wwang@unm.edu”

The same?

SLIDE 5

To be or not to be … the same

Figure: DATASET 1 and DATASET 2 now use different vocabularies. One holds a resource with name: “Wei Wang”; the other holds two resources:
  • full-name: “Wei Wang”, email: “weiwang@cs.unc.edu”
  • full-name: “Wei Wang”, email: “wwang@unm.edu”

The same?

Link Specification (LSAR): Levenshtein(name, full-name) ≤ 0.42
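To make the ambiguity concrete, here is a minimal Python sketch that evaluates LSAR on the records above. It assumes the 0.42 threshold applies to a Levenshtein distance divided by the length of the longer string (the slide does not state the normalisation); the function names and the dict layout are hypothetical.

```python
# Minimal sketch: evaluating LSAR on the "Wei Wang" example.
# ASSUMPTION: the 0.42 threshold applies to Levenshtein distance divided by the
# length of the longer string; the slide does not state the normalisation.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalised_levenshtein(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def lsar(source: dict, target: dict, threshold: float = 0.42) -> bool:
    """Hypothetical encoding of LSAR: Levenshtein(name, full-name) <= threshold."""
    return normalised_levenshtein(source["name"], target["full-name"]) <= threshold


# Records from the slide; the dict layout is illustrative only.
author = {"name": "Wei Wang"}
researchers = [
    {"full-name": "Wei Wang", "email": "weiwang@cs.unc.edu"},
    {"full-name": "Wei Wang", "email": "wwang@unm.edu"},
]

# Identical names give distance 0.0, so LSAR links the author to BOTH researchers.
links = [r["email"] for r in researchers if lsar(author, r)]
print(links)  # ['weiwang@cs.unc.edu', 'wwang@unm.edu']
```

Because both candidates carry the same name, no threshold on the name alone can separate them; this is the gap that context-aware specifications target.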

SLIDE 6

To be or not to be … the same

Figure: resources of type Article, Paper and Award, connected through the relations writes, leads and supports, and a link produced by LSAR. Some publications in common?

SLIDE 7

To be or not to be … the same

Figure: as on slide 6 (Article, Paper and Award; writes, leads, supports; LSAR). Some publications in common?

  • 1. RDF, OWL
SLIDE 8

To be or not to be … the same

Figure: as on slide 6 (Article, Paper and Award; writes, leads, supports; LSAR). Some publications in common?

  • 1. RDF, OWL
  • 2. ≠ Vocabularies
SLIDE 9

To be or not to be … the same

Figure: as on slide 6 (Article, Paper and Award; writes, leads, supports; LSAR). Some publications in common?

  • 1. RDF, OWL
  • 2. ≠ Vocabularies
  • 3. Rule generation
SLIDE 10

To be or not to be … the same

Figure: as on slide 6 (Article, Paper and Award; writes, leads, supports; LSAR). Some publications in common?

  • 1. RDF, OWL
  • 2. ≠ Vocabularies
  • 3. Rule generation
  • 4. Context
SLIDE 11

Overlap Factor

  • owl:sameAs (LSAR)
  • owl:sameAs (LSAP)

Overlap factors: EXISTS, FOR ALL

Context-Aware Link Specification: FOR ALL Levenshtein(name, full-name) ≤ 0.42 AND EXISTS Levenshtein(title, title) < 1.20
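A context-aware specification keeps a candidate link only if a condition over the resources' context (here, their publications) also holds. The Python sketch below is one possible reading of the two overlap factors, not the authors' definition: EXISTS requires at least one pair of context values to satisfy the inner specification, and FOR ALL is assumed to require every source value to match some target value. Exact string equality is a hypothetical stand-in for the Levenshtein conditions, and the assignment of titles to resources is illustrative.

```python
from enum import Enum
from typing import Callable, Iterable


class OverlapFactor(Enum):
    EXISTS = "EXISTS"
    FOR_ALL = "FOR ALL"


def overlap(factor: OverlapFactor, source_ctx: Iterable[str],
            target_ctx: Iterable[str], spec: Callable[[str, str], bool]) -> bool:
    """Quantify an inner specification over two collections of context values."""
    src, trg = list(source_ctx), list(target_ctx)
    if factor is OverlapFactor.EXISTS:
        # At least one (source, target) pair satisfies the inner specification.
        return any(spec(s, t) for s in src for t in trg)
    # ASSUMED reading of FOR ALL: every source value matches some target value
    # (vacuously true when the source side is empty).
    return all(any(spec(s, t) for t in trg) for s in src)


def same_text(a: str, b: str) -> bool:
    # Hypothetical stand-in for the slide's Levenshtein conditions.
    return a == b


def cals(source: dict, target: dict) -> bool:
    """FOR ALL <name spec> AND EXISTS <title spec> (sketch)."""
    names_ok = overlap(OverlapFactor.FOR_ALL, [source["name"]],
                       [target["full-name"]], same_text)
    titles_ok = overlap(OverlapFactor.EXISTS, source["titles"],
                        target["titles"], same_text)
    return names_ok and titles_ok


# Illustrative records; which titles belong to which resource is assumed.
author = {"name": "Wei Wang",
          "titles": ["Efficient computation …", "HolisticTwig…"]}
researcher_a = {"full-name": "Wei Wang", "titles": ["Efficient computation …"]}
researcher_b = {"full-name": "Wei Wang",
                "titles": ["Direct Oxidative Conversion…"]}

print(cals(author, researcher_a), cals(author, researcher_b))  # True False
```

The name condition accepts both researchers; only the shared publication title keeps `researcher_a` and rejects `researcher_b`.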

SLIDE 12

Applying LSAR

Figure: the resource with name: “Wei Wang” and the two resources full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” and full-name: “Wei Wang”, email: “wwang@unm.edu”.

The same?

SLIDE 13

Applying LSAR

Figure: the resource with name: “Wei Wang” and the two resources full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” and full-name: “Wei Wang”, email: “wwang@unm.edu”.

The same?

  • owl:sameAs
  • owl:sameAs

Result: two owl:sameAs links, one labelled “correctly linked” and the other “wrongly linked”.

SLIDE 14

Applying CALS

Figure: the resource with name: “Wei Wang” and the two resources full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” and full-name: “Wei Wang”, email: “wwang@unm.edu”, together with publications:
  • date: “2007”, title: “Efficient computation …”
  • year: “2007”, title: “Direct Oxidative Conversion…”
  • date: “2012”, title: “HolisticTwig…”
  • title: “Efficient computation …”

The same?

SLIDE 15

Applying CALS

Figure: the resource with name: “Wei Wang” and the two resources full-name: “Wei Wang”, email: “weiwang@cs.unc.edu” and full-name: “Wei Wang”, email: “wwang@unm.edu”, together with publications:
  • date: “2007”, title: “Efficient computation …”
  • year: “2007”, title: “Direct Oxidative Conversion…”
  • date: “2012”, title: “HolisticTwig…”
  • title: “Efficient computation …”

Result: one owl:sameAs link; slide labels: “wrongly linked”, “correctly linked”.

SLIDE 16

Roadmap

  • Problem statement
  • Results
  • Future work

SLIDE 17

Experiments

♦ Scenarios

Scenario 1 – DBLP-NSF
  DBLP: Author 764, Article 47,225
  NSF: Researcher 235, Award 235, Paper 6,877
  owl:sameAs links (Author ~ Researcher): 188

Scenario 2 – DBLP-DBLP
  DBLP: Author 58, Article 5,284
  owl:sameAs links (Author ~ Author): 62

SLIDE 18

DBLP-NSF improving precision

Chart: Link Specification (LS1) vs Context-Aware Link Specification (CALS); values shown: 0.83 and 1.00.

LS1: Jaro(name, full-name) < Threshold
CALS: for all BEST(LS1) and exists Jaro(title, title) < Threshold
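LS1 and CALS compare names and titles with the Jaro measure. For reference, a compact Python implementation of the standard Jaro similarity follows. Because the slides write “< Threshold”, they presumably compare a distance, so the sketch also exposes 1 − similarity; that reading and the function names are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order (half-transpositions).
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_distance(s1: str, s2: str) -> float:
    """ASSUMED distance form used with '< Threshold': 1 - similarity."""
    return 1.0 - jaro(s1, s2)


print(round(jaro("MARTHA", "MARHTA"), 4))     # 0.9444
print(jaro_distance("Wei Wang", "Wei Wang"))  # 0.0
```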

SLIDE 19

DBLP-DBLP improving recall

Chart: Link Specification (LS1) vs Context-Aware Link Specification (CALS); values shown: 0.83 and 1.00.

LS1: Jaro(name, name) < Threshold
CALS: for all Jaro(title, title) < Threshold

SLIDE 20

DBLP-NSF GenLink evaluation results

LS for DBLP-NSF
  ID       Examples (+/-)   Link
  LSN1     (+1, -1)         Author ~ Researcher
  LSN5     (+5, -5)         Author ~ Researcher
  LSN10    (+5, -5)         Author ~ Researcher
  LST1     (+1, -1)         Article ~ Paper
  LST5     (+5, -5)         Article ~ Paper
  LST10    (+10, -10)       Article ~ Paper

CALS for DBLP-NSF, link Author ~ Researcher
  ID                                 P      R
  for all LSN1 and exists LST1       0.94   1.0
  for all LSN5 and exists LST5       1.0    0.38
  for all LSN10 and exists LST10     1.0    0.95
  Best improvement: 0.24

LS for DBLP-NSF
  ID       P      R
  LSN1     0.76   1.0
  LSN5     0.76   1.0
  LSN10    0.76   1.0
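P and R in these tables are precision and recall of the discovered links. The Python sketch below shows both measures and one reading of “Best improvement” that reproduces the 0.24 above: the largest per-configuration gain of CALS over LS, pairing the 1-, 5- and 10-example runs. That pairing and the helper names are assumptions; the toy links are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

Link = Tuple[Hashable, Hashable]  # (source resource, target resource)


def precision_recall(found: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision = correct/found, recall = correct/gold (0.0 for empty sets)."""
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def best_improvement(cals_scores: List[float], ls_scores: List[float]) -> float:
    """Largest per-configuration gain of CALS over LS (ASSUMED pairing 1/5/10)."""
    return round(max(cal - base for cal, base in zip(cals_scores, ls_scores)), 2)


# Toy links (hypothetical): one of the two discovered links is correct.
print(precision_recall({("a1", "r1"), ("a1", "r2")}, {("a1", "r1")}))  # (0.5, 1.0)

# Precision values from the DBLP-NSF tables (configurations 1, 5 and 10).
print(best_improvement([0.94, 1.0, 1.0], [0.76, 0.76, 0.76]))  # 0.24
```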

SLIDE 21

DBLP-DBLP GenLink evaluation results

LS for DBLP-DBLP
  ID       Examples (+/-)   Link
  LSN1     (+1, -1)         Author ~ Author
  LSN5     (+5, -5)         Author ~ Author
  LSN10    (+5, -5)         Author ~ Author
  LST1     (+1, -1)         Article ~ Article
  LST5     (+5, -5)         Article ~ Article
  LST10    (+10, -10)       Article ~ Article

CALS for DBLP-DBLP, link Author ~ Author
  ID                P      R
  for all LST1      1.00   0.84
  for all LST5      1.00   0.84
  for all LST10     1.00   0.84
  Best improvement: 0.58

LS
  ID       P      R
  LSN1     1.00   0.26
  LSN5     1.00   0.30
  LSN10    1.00   0.26
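The “Best improvement” values are consistent with taking the largest per-configuration gain of CALS over the plain LS. This is an assumed reading that pairs the k-example CALS with LSNk. Under that reading the recall figure above checks out as follows, and the same computation gives 1.0 − 0.76 = 0.24 for the DBLP-NSF precision:

```latex
\Delta R_{\text{best}}
  = \max_{k \in \{1,5,10\}} \bigl( R_{\mathrm{CALS}_k} - R_{\mathrm{LSN}_k} \bigr)
  = \max(0.84 - 0.26,\; 0.84 - 0.30,\; 0.84 - 0.26)
  = 0.58
```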

SLIDE 22

Roadmap

  • Problem statement
  • Results
  • Future work

SLIDE 23

Current work

Figure: Article, Paper and Award connected by writes, leads and supports, plus co-author and co-leads relations; link specifications LSAR, LSAR1 and LSAR2.

WWW2017, Australia

SLIDE 24

Future work

Figure: DATASETS, SETUP, TECHNIQUES; “Experiments and Analyses track”

SLIDE 25

Future work

Figure: Article, Paper and Award connected by writes, leads and supports, plus co-leads and co-author relations; two owl:sameAs links.
SLIDE 26

Andrea Cimmino cimmino@us.es http://tdg-seville.info/acimmino

THANKS! Queries?

SLIDE 27

Features

♦ R1: Input is RDF, not OWL
♦ R2: Handle different schemas/vocabularies
♦ R3: Rule-based (LS)
♦ R4: Context-aware
♦ R5: Efficient context

SLIDE 28

Related Work

Table: techniques compared against requirements R1 to R5:

  • RiMOM
  • Nikolov et al.
  • AgreementMaker
  • GenLink
  • CODI
  • EAGLE
  • LOGMAP
  • Zhishi.links
  • SLINT+
  • SignoProsik
  • SERIMI
  • Song and Heflin
  • PARIS
  • Hassanzadeh et al.
SLIDE 29

DBLP-NSF GenLink LS

dblp:name, nsf:name
  LSN1: Jaccard ≤ 0.37
  LSN5: Jaccard ≤ 0.37
  LSN10: Jaccard ≤ 0.21

dblp:title, nsf:title
  LST1: Levenshtein ≤ 29.48
  LST5: Levenshtein ≤ 0.59
  LST10: Levenshtein ≤ 7.05
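GenLink's name specifications use a Jaccard threshold. A minimal Python sketch of a token-based Jaccard distance follows. Whitespace tokenisation, lower-casing and treating the threshold as a distance (1 − |A ∩ B| / |A ∪ B|) are assumptions, since the slide gives only the measure and its threshold; the example names other than “Wei Wang” are hypothetical.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over lower-cased whitespace tokens (assumed tokenisation)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def lsn1(dblp_name: str, nsf_name: str) -> bool:
    """Hypothetical reading of LSN1: Jaccard(dblp:name, nsf:name) <= 0.37."""
    return jaccard_distance(dblp_name, nsf_name) <= 0.37


print(jaccard_distance("Wei Wang", "Wang Wei"))     # 0.0 (token order ignored)
print(jaccard_distance("Wei Wang", "Wei W. Wang"))  # about 0.33
print(lsn1("Wei Wang", "Wei W. Wang"))              # True
```

A token-set measure ignores word order, which suits personal names written as “given family” in one dataset and “family given” in another.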

SLIDE 30

DBLP-DBLP GenLink LS

dblp:name, nsf:name
  LSN1: Jaccard ≤ 0.15
  LSN5: Levenshtein ≤ 1.48
  LSN10: Levenshtein ≤ 1.15

dblp:title, nsf:title
  LST1: Levenshtein ≤ 1.76
  LST5: Levenshtein ≤ 1.46
  LST10: Levenshtein ≤ 1.76

SLIDE 31

Link Specification model


SLIDE 32

Link Specification extended (context)

Figure: UML class diagram. Classes: C-ALinkSpecification, C-ACondition, C-ASameAsCondition, ConditionComposite, LeafNode, LinkSpecification. Attributes: F: OverlapFactor; f: Aggregation; source: Class; target: Class; prop: ObjectProperty; dataset: {SRC, TRG}.
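One way to read the extended model is as a tree: a link specification is either a leaf comparison or a composite that aggregates child conditions, and a context-aware condition wraps an inner condition with an overlap factor F. The Python dataclasses below are a hypothetical sketch of that reading. Only some classes are modelled (hyphens dropped, e.g. C-ACondition becomes `CACondition`), and the attribute placement, field names and `render` helper are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class Dataset(Enum):
    SRC = "SRC"
    TRG = "TRG"


class OverlapFactor(Enum):
    EXISTS = "EXISTS"
    FOR_ALL = "FOR ALL"


@dataclass
class LeafNode:
    """Atomic comparison of a source and a target property."""
    measure: str
    source: str
    target: str
    op: str
    threshold: float

    def render(self) -> str:
        return f"{self.measure}({self.source}, {self.target}) {self.op} {self.threshold}"


@dataclass
class ConditionComposite:
    """Combines child conditions with an aggregation f (e.g. AND)."""
    f: str
    children: List["Condition"] = field(default_factory=list)

    def render(self) -> str:
        return f" {self.f} ".join(child.render() for child in self.children)


@dataclass
class CACondition:
    """Context-aware condition: an inner condition quantified by overlap factor F.

    prop/dataset (ASSUMED placement) name the object property followed to reach
    the context resources and the dataset it belongs to.
    """
    F: OverlapFactor
    spec: "Condition"
    prop: Optional[str] = None
    dataset: Optional[Dataset] = None

    def render(self) -> str:
        return f"{self.F.value} {self.spec.render()}"


Condition = Union[LeafNode, ConditionComposite, CACondition]

# The context-aware link specification from the Overlap Factor slide.
cals = ConditionComposite("AND", [
    CACondition(OverlapFactor.FOR_ALL,
                LeafNode("Levenshtein", "name", "full-name", "≤", 0.42)),
    CACondition(OverlapFactor.EXISTS,
                LeafNode("Levenshtein", "title", "title", "<", 1.20)),
])
print(cals.render())
# FOR ALL Levenshtein(name, full-name) ≤ 0.42 AND EXISTS Levenshtein(title, title) < 1.2
```

Python prints the float 1.20 as `1.2`; store thresholds as strings if the original formatting matters.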