Repairing Entities using Star Constraints in Multi-relational Graphs
Peng Lin1 Qi Song1 Yinghui Wu2,3 Jiaxing Pi4
Erroneous entities: how to capture?
Multi-relational graphs: a labeled graph with attributes on nodes
Graph G: a football database
Nodes:
Player (name: VanPersie)
Club (name: MU)
Club (name: AFC)
Player (name: Rooney)
Coach (name: Wenger)
Stadium (name: OT, city: MAN)
Facility (name: AON, city: LD)
Stadium (name: ATC, city: LDN)
Facility (name: EM, city: BZ)
Edge labels: playsFor, playsFor, worksAt, teammate, trainsAt, coachedBy, trainsAt
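A multi-relational graph of this kind can be held in a small typed, attributed adjacency structure. The sketch below is only an illustration: the class name `MultiRelationalGraph` and its methods are made up here, and the edge endpoints are assumptions, because the slide text lists edge labels but not which nodes each edge connects.

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Labeled graph whose nodes carry a type and attribute values."""

    def __init__(self):
        self.nodes = {}                     # node id -> {"type": ..., attributes}
        self.out_edges = defaultdict(list)  # node id -> [(edge label, target id)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def neighbors(self, source, label):
        """Targets reachable from `source` over one edge carrying `label`."""
        return [t for (l, t) in self.out_edges[source] if l == label]


g = MultiRelationalGraph()
g.add_node("VanPersie", "Player", name="VanPersie")
g.add_node("MU", "Club", name="MU")
g.add_node("OT", "Stadium", name="OT", city="MAN")
g.add_node("AON", "Facility", name="AON", city="LD")

# Assumed endpoints (hypothetical): the slide only names the edge labels.
g.add_edge("VanPersie", "playsFor", "MU")
g.add_edge("VanPersie", "trainsAt", "AON")

print(g.neighbors("VanPersie", "playsFor"))
print(g.nodes["AON"]["city"])
```

Keeping attributes on nodes and labels on edges mirrors the slide's definition; repairs then only need to touch `g.nodes[...]` values.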
Graph G: a football database
Nodes:
Player (name: VanPersie)
Club (name: MU)
Club (name: AFC)
Player (name: Rooney)
Coach (name: Wenger)
Stadium (name: OT, city: MAN)
Facility (name: AON, city: LD)
Facility (name: ATC, city: LDN)
Stadium (name: EM, city: BZ)
Edge labels: playsFor, playsFor, worksAt, teammate, trainsAt, coachedBy, trainsAt
Paths from Player to Stadium: R_1 = (playsFor · operates) ∪ (coachedBy · worksAt)
Paths from Player to Facility: R_2 = (playsFor · operates) ∪ (teammate^-1 · trainsAt)
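The two path expressions combine concatenation, union, and an inverse edge. A minimal sketch of how such an expression could be evaluated from one start node follows. It is not the StarRepair implementation: the edge endpoints in `EDGES` are assumptions (the `operates` edge in particular does not appear among the slide's edge labels), and the encoding of an expression as a list of label sequences is a convention chosen for this example.

```python
# Edges as (source, label, target). All endpoints are illustrative assumptions.
EDGES = [
    ("VanPersie", "playsFor", "MU"),
    ("Rooney", "playsFor", "MU"),
    ("Rooney", "teammate", "VanPersie"),
    ("Rooney", "trainsAt", "AON"),
    ("VanPersie", "coachedBy", "Wenger"),
    ("Wenger", "worksAt", "EM"),
    ("MU", "operates", "OT"),  # hypothetical edge, needed by playsFor · operates
]


def step(nodes, label):
    """Follow one label from every node in `nodes`; 'x^-1' walks x edges backwards."""
    if label.endswith("^-1"):
        base = label[:-3]
        return {s for (s, l, t) in EDGES if l == base and t in nodes}
    return {t for (s, l, t) in EDGES if l == label and s in nodes}


def evaluate(start, expression):
    """`expression` is a union (outer list) of concatenations (inner lists of labels)."""
    reached = set()
    for sequence in expression:
        frontier = {start}
        for label in sequence:
            frontier = step(frontier, label)
        reached |= frontier
    return reached


R1 = [["playsFor", "operates"], ["coachedBy", "worksAt"]]
R2 = [["playsFor", "operates"], ["teammate^-1", "trainsAt"]]

print(sorted(evaluate("VanPersie", R1)))
print(sorted(evaluate("VanPersie", R2)))
```

This only computes reachability by edge labels. A full matcher would presumably also check that each reached node has the type the pattern asks for (Stadium for R_1, Facility for R_2).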
StarRepair framework
StarFDs: star functional dependencies, new constraints for graphs
Graph G, StarFDs Σ (G does not satisfy Σ) → Repair G′ (G′ satisfies Σ)
Entity repair problem: minimum editing cost; NP-hard and APX-hard
Feasible framework with provable guarantees whenever possible
Repair workflow:
Error detection, then repair
Is optimal repairable? Yes: optimal solution
Is approximable? Yes: approximation solution
No: heuristic solution
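The repair workflow amounts to a two-question dispatch. The sketch below assumes the optimal check is asked before the approximable check, which the flattened slide text does not confirm; the function name and the returned strings are placeholders invented for this example.

```python
def choose_repair_strategy(optimal_repairable: bool, approximable: bool) -> str:
    """Pick a repair strategy; the order of the two checks is an assumption."""
    if optimal_repairable:
        return "optimal solution"
    if approximable:
        return "approximation solution"
    return "heuristic solution"


for flags in [(True, False), (False, True), (False, False)]:
    print(flags, "->", choose_repair_strategy(*flags))
```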
Star pattern P(v_0): Player --R_1--> Stadium, Player --R_2--> Facility
R_1 = (playsFor · operates) ∪ (coachedBy · worksAt)
R_2 = (playsFor · operates) ∪ (teammate^-1 · trainsAt)
X: v_0.league = EPL, v_1.owner = v_2.owner
Y: v_1.city = v_2.city
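Once a match of the star pattern is found, the dependency reads like an ordinary functional dependency: whenever the X conditions hold, the Y condition must hold too. A minimal sketch of that check on a single match is given below. It assumes v_0 is the Player, v_1 the Stadium, and v_2 the Facility, and the `league` and `owner` values in the sample data are assumed for illustration rather than read off the graph.

```python
def satisfies_starfd(v0, v1, v2):
    """Return True unless X holds while Y fails for this match."""
    x_holds = v0["league"] == "EPL" and v1["owner"] == v2["owner"]
    y_holds = v1["city"] == v2["city"]
    return (not x_holds) or y_holds


# Hypothetical match: the league and the owners are assumed values.
player = {"name": "Rooney", "league": "EPL"}
stadium = {"name": "OT", "city": "MAN", "owner": "MUP"}
facility = {"name": "AON", "city": "LD", "owner": "MUP"}

print(satisfies_starfd(player, stadium, facility))
print(satisfies_starfd(player, stadium, {**facility, "city": "MAN"}))
print(satisfies_starfd(player, stadium, {**facility, "owner": "CFG"}))
```

The three calls print `False`, `True`, `True`: the match is inconsistent as given, and it can be made consistent either by making Y true (changing the city) or by making X false (changing the owner).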
Notations
G: graph; V: nodes; E: edges; φ: a single StarFD; Σ: a set of StarFDs; I: all inconsistencies.
Problem | Description | Hardness | Solution
Satisfiability | Input: Σ. Decide whether there exists a G that satisfies Σ. | NP-complete |
Implication | Input: Σ and φ. Decide whether every G that satisfies Σ also satisfies φ. | coNP-hard |
Error detection (validation) | Input: G and Σ. Output: all inconsistencies I. | PTIME | Evaluate regular path queries and validate values
Repair | Input: Σ and G that does not satisfy Σ. Output: G′ that satisfies Σ with least repair cost. | NP-hard, APX-hard | Approximable cases (PTIME checkable); optimal cases; heuristic cases
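The Repair problem can be read as a constrained minimization. The formula below is only a notational sketch: cost(G, G′) stands for the repair (editing) cost, whose exact definition the slide does not give, and G* is a name introduced here for an optimal repair.

```latex
G^{*} \in \operatorname*{arg\,min}_{G' \,:\, G' \models \Sigma} \operatorname{cost}(G, G'),
\qquad \text{where } G \not\models \Sigma .
```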
Two repairs:
Repair 1 = {(AON.city, LD, MAN), (EM.city, BZ, LDN)}
Repair 2 = {(AON.owner, MUP, CFG), (EM.owner, EM, ENIC)}
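Each repair is a set of triples (node attribute, old value, new value). A minimal sketch of applying such a repair is shown below. Two things are assumptions: the nodes are identified as AON and EM by matching the old values against the graph, and the cost is taken as one unit per changed value, since the slides do not define the editing cost.

```python
def apply_repair(attributes, repair):
    """Return a copy of `attributes` with every (node, attr, old, new) update applied."""
    repaired = {node: dict(values) for node, values in attributes.items()}
    for node, attr, old, new in repair:
        if repaired[node][attr] != old:
            raise ValueError(f"{node}.{attr} is not {old!r}")
        repaired[node][attr] = new
    return repaired


def unit_cost(repair):
    """Assumed cost model: one unit per updated attribute value."""
    return len(repair)


attributes = {
    "AON": {"city": "LD", "owner": "MUP"},
    "EM": {"city": "BZ", "owner": "EM"},
}
repair_1 = [("AON", "city", "LD", "MAN"), ("EM", "city", "BZ", "LDN")]
repair_2 = [("AON", "owner", "MUP", "CFG"), ("EM", "owner", "EM", "ENIC")]

print(apply_repair(attributes, repair_1))
print(apply_repair(attributes, repair_2))
print(unit_cost(repair_1), unit_cost(repair_2))
```

Under the unit-cost assumption both repairs cost the same, so a real cost model would need more information (for example value similarity) to prefer one over the other.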
Optimal case
Isolated CCs have approximate solutions
[Figure: inconsistencies I_1, I_2, I_3 over node attributes v_1.A_1 and v_2.A_1, with their repairs]
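Grouping inconsistencies into connected components (CCs) can be sketched with a union-find structure. The sketch assumes that two inconsistencies belong to the same CC when they involve a common node attribute; the slides do not spell out this definition, and the cells listed for I1, I2, I3 are hypothetical.

```python
def connected_components(inconsistencies):
    """Group inconsistencies that share at least one (node, attribute) cell.

    `inconsistencies` maps a name to the set of cells it involves.
    """
    parent = {name: name for name in inconsistencies}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # cell -> first inconsistency that touched it
    for name, cells in inconsistencies.items():
        for cell in cells:
            if cell in first_seen:
                parent[find(name)] = find(first_seen[cell])
            else:
                first_seen[cell] = name

    groups = {}
    for name in inconsistencies:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=min)


# Hypothetical cells: I1 and I2 overlap on (v2, A1); I3 is isolated.
example = {
    "I1": {("v1", "A1"), ("v2", "A1")},
    "I2": {("v2", "A1"), ("v3", "A2")},
    "I3": {("v4", "A3")},
}
print(connected_components(example))
```

With components in hand, each one can be repaired independently, which is what makes an isolated component easier to handle than one whose inconsistencies interact.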
Approximable case
[Figure: inconsistencies I_1, I_2, I_3 with their candidate repairs]
Example: candidate repairs are combined by union; one candidate is pruned
Candidate repairs shown:
{(AON.owner, MUP, CFG), (EM.owner, EM, ENIC)}
{(AON.owner, MUP, FSG), (EM.owner, EM, ENIC)}
Heuristic case
Repair CC1 consisting of I_1, I_2, and I_3
Two new inconsistencies, I_4 and I_5
Data | Description | # of nodes | # of edges |
Yago | Knowledge graph | 2.1M | 4.0M | 3
DBPedia | Knowledge graph | 2.2M | 7.4M | 4
Yelp | Business reviews | 1.5M | 1.6M | 5
IMDb | Movie network | 5.9M | 3.2M | 3
[Figure: running time in seconds (axis from 0.1 to 1000) of StarRepair, biBFSRepair, and SubIsoRepair on YAGO, Yelp, DBP, and IMDb]
StarRepair outperforms biBFSRepair and SubIsoRepair by 3.4 and 7.1 times, respectively
[Figure: f-score (axis from 0.2 to 1) of StarRepair and SubIsoRepair on YAGO, Yelp, DBP, and IMDb]
StarRepair outperforms SubIsoRepair by 10% in f-score (9% in precision and 14% in recall)
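For readers unfamiliar with the metrics, precision, recall, and f-score for a set of repairs can be computed as below. This assumes the standard definitions (f-score as the harmonic mean of precision and recall) and compares repaired values against a ground truth; the slides do not state exactly how the authors measured them, and the sample repairs are made up.

```python
def precision_recall_f1(predicted, truth):
    """Standard precision, recall, and F1 over two collections of repairs."""
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: 3 proposed repairs, 2 of them among 4 true repairs.
predicted = {("AON.city", "MAN"), ("EM.city", "LDN"), ("OT.city", "LDN")}
truth = {("AON.city", "MAN"), ("EM.city", "LDN"), ("ATC.owner", "AFC"), ("EM.owner", "AFC")}

p, r, f = precision_recall_f1(predicted, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```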
StarFD: If a person v_0 is a politician or president of the U.S. and is married to another person v_1, then v_1's child is v_0's child. We found more than 100 such errors in Yago.
Star pattern: Person, Country, Person; R_1 = marriedTo, R_2 = presidentOf ∪ politicianOf
Example: Person (name: G.W. Bush, child: B. Obama), Country (name: U.S.), Person (name: Laura Bush, child: Barbara Bush); edges: marriedTo, presidentOf
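The case-study rule can be checked directly on the example entities. The sketch below is a hand-written check for this one rule, not the general StarFD validator; the edge directions and the single-valued `child` attribute are assumptions made for the illustration.

```python
persons = {
    "G.W. Bush": {"child": "B. Obama"},
    "Laura Bush": {"child": "Barbara Bush"},
}
# Assumed edge directions: (source, label, target).
edges = [
    ("G.W. Bush", "marriedTo", "Laura Bush"),
    ("G.W. Bush", "presidentOf", "U.S."),
]


def child_violations(persons, edges):
    """Pairs (v0, v1) where v0 holds U.S. office, is married to v1, and the children differ."""
    in_office = {
        s for (s, label, t) in edges
        if label in ("presidentOf", "politicianOf") and t == "U.S."
    }
    violations = []
    for s, label, t in edges:
        if label == "marriedTo" and s in in_office:
            if persons[s]["child"] != persons[t]["child"]:
                violations.append((s, t))
    return violations


print(child_violations(persons, edges))
```

On the example this reports the pair (G.W. Bush, Laura Bush), matching the erroneous `child` value shown on the slide.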
Problem | StarFDs | GFDs
Semantic | star patterns with regex queries | subgraph isomorphism
Satisfiability | NP-complete | coNP-complete
Implication | coNP-hard | NP-complete
Error detection (validation) | PTIME | coNP-complete
Kronos: Lightweight Knowledge-based Event Analysis in Cyber-Physical Data Streams To appear in Demo Session