Staging User Feedback toward Rapid Conflict Resolution in Data Fusion
Romila P Pradhan*, Siarhei Bykau, Sunil Prabhakar*
*Purdue University, Bloomberg L.P.
Fusing data from multiple sources
Data item: claims from sources S1 to S4
Zootopia: Howard, Spencer, Spencer
Kung Fu Panda: Stevenson, Nelson
Inside Out: leFauve, Docter
Finding Dory: Stanton
Minions: Coffin, Renaud
Rio: Jones, Saldanha
Data item: correctness of claims
Zootopia: Howard (0.000), Spencer (1.000)
Kung Fu Panda: Stevenson (0.015), Nelson (0.985)
Inside Out: leFauve (0.001), Docter (0.999)
Finding Dory: Stanton (1.000)
Minions: Coffin (0.921), Renaud (0.079)
Rio: Jones (0.015), Saldanha (0.985)
Source accuracy and claim correctness: iterative computation (ACCU [1])
Source: accuracy
S1: 0.317
S2: 0.027
S3: 0.992
S4: 1.000
[1] Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava. Data Fusion: Resolving Conflicts from Multiple Sources. WAIM 2013.
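The iterative computation between source accuracy and claim correctness can be sketched as below. This is a simplified illustration in the spirit of ACCU [1], not the authors' implementation: the toy claims, the sources A, B, and C, the number of false values `n_false`, the starting accuracy `init_acc`, and the fixed iteration count are all assumptions made for the example.

```python
import math

def accu(claims, n_false=10, iters=20, init_acc=0.8):
    """claims: {item: {source: value}}. Returns (source accuracy, claim correctness)."""
    sources = {s for votes in claims.values() for s in votes}
    acc = {s: init_acc for s in sources}
    prob = {}
    for _ in range(iters):
        # Step 1: claim correctness from source accuracies (vote scores + softmax).
        for item, votes in claims.items():
            score = {}
            for s, v in votes.items():
                a = min(max(acc[s], 0.01), 0.99)  # clip so the log stays finite
                score[v] = score.get(v, 0.0) + math.log(n_false * a / (1 - a))
            top = max(score.values())
            z = sum(math.exp(c - top) for c in score.values())
            prob[item] = {v: math.exp(c - top) / z for v, c in score.items()}
        # Step 2: source accuracy = mean correctness of the claims it makes.
        for s in sources:
            ps = [prob[i][votes[s]] for i, votes in claims.items() if s in votes]
            acc[s] = sum(ps) / len(ps)
    return acc, prob

# Hypothetical toy input: sources A and B agree, source C dissents.
toy_claims = {
    "item1": {"A": "x", "B": "x", "C": "y"},
    "item2": {"A": "p", "B": "p", "C": "q"},
    "item3": {"A": "u", "C": "w"},
}
accuracy, correctness = accu(toy_claims)
```

On the toy input, C ends up with a low accuracy because it disagrees with A and B on two items, which in turn tips the otherwise tied item3 toward the value claimed by A.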
Data item: truth
Zootopia: Howard
Kung Fu Panda: Stevenson
Inside Out: Docter
Finding Dory: Stanton
Minions: Coffin
Rio: Saldanha
[Diagram: Data, Fusion Model, Correctness; Validate data item, Labels; User feedback to fusion model]
4 ranking strategies
Item-level ranking: Query-by-committee (QBC), Uncertainty Sampling (US)
Holistic ranking: Maximum Expected Utility (MEU), Approximate MEU (ApproxMEU)
Feedback errors: non-expert feedback
Evaluation
[Plot: ∆ distance_to_ground_truth (%) against data items validated (%); strategies: Random, QBC, US, ApproxMEU, MEU, GUB]
item-level ranking holistic ranking feedback errors evaluation
Most sources agree: Zootopia (Howard, Spencer, Spencer)
Sources disagree: Rio (Jones, Saldanha)
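One way to turn source disagreement into a ranking is sketched below. It treats the sources as the committee and scores each data item by vote entropy. Vote entropy is a common disagreement measure for query-by-committee and is an assumption here; the slides do not state which measure is used.

```python
import math
from collections import Counter

def vote_entropy(values):
    """Entropy of the votes that sources cast for one data item."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Votes for the two items shown on the slide.
votes = {
    "Zootopia": ["Howard", "Spencer", "Spencer"],
    "Rio": ["Jones", "Saldanha"],
}

# Items with the highest disagreement are validated first.
qbc_ranking = sorted(votes, key=lambda item: vote_entropy(votes[item]), reverse=True)
```

Rio is ranked ahead of Zootopia because its two sources split evenly, while two of the three votes for Zootopia agree.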
Kung Fu Panda: Stevenson (0.015), Nelson (0.985)
Minions: Coffin (0.921), Renaud (0.079)
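Ranking data items by how uncertain the fusion model is about their claims can be sketched as below. The correctness values are the ones shown on the slides; scoring uncertainty with Shannon entropy is an assumption for this example.

```python
import math

# Correctness of claims as reported on the slides.
correctness = {
    "Zootopia": {"Howard": 0.000, "Spencer": 1.000},
    "Kung Fu Panda": {"Stevenson": 0.015, "Nelson": 0.985},
    "Inside Out": {"leFauve": 0.001, "Docter": 0.999},
    "Finding Dory": {"Stanton": 1.000},
    "Minions": {"Coffin": 0.921, "Renaud": 0.079},
    "Rio": {"Jones": 0.015, "Saldanha": 0.985},
}

def entropy(dist):
    """Shannon entropy of one item's claim distribution (0 * log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Most uncertain items first.
us_ranking = sorted(correctness, key=lambda item: entropy(correctness[item]), reverse=True)
```

With these values Minions has the highest entropy, followed by Kung Fu Panda and Rio, which tie.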
[Slide annotations: S2, S3, S4; Renaud, Stanton, leFauve, Saldanha, Nelson, Docter]
[Slide annotation: S4, Spencer]
data fusion model truth function average correctness
truth function ?
entropies of all data items
entropy utility if claim is true
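The expected-utility idea can be sketched as below, assuming that utility is the drop in total entropy over all data items once a claim is confirmed, weighted by the current correctness of that claim. The `refuse` callback is a hypothetical stand-in for re-running the fusion model with one claim pinned as true; `toy_refuse` is not a real fusion model and only pins the validated item.

```python
import math

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def total_entropy(correctness):
    """Sum of the entropies of all data items."""
    return sum(entropy(dist) for dist in correctness.values())

def expected_utility(item, correctness, refuse):
    """Expected drop in total entropy from validating `item`."""
    before = total_entropy(correctness)
    gain = 0.0
    for claim, p in correctness[item].items():
        # Utility if this claim turns out to be true, weighted by its correctness.
        after = total_entropy(refuse(correctness, item, claim))
        gain += p * (before - after)
    return gain

def toy_refuse(correctness, item, claim):
    # Hypothetical stand-in: pins the validated claim and leaves other items
    # unchanged. A real fusion model would also update sources and other items.
    updated = dict(correctness)
    updated[item] = {c: 1.0 if c == claim else 0.0 for c in correctness[item]}
    return updated

correctness = {
    "Minions": {"Coffin": 0.921, "Renaud": 0.079},
    "Rio": {"Jones": 0.015, "Saldanha": 0.985},
}
best_item = max(correctness, key=lambda i: expected_utility(i, correctness, toy_refuse))
```

With `toy_refuse` the score of an item reduces to its own entropy, so the ranking coincides with uncertainty sampling. The holistic effect only appears when `refuse` propagates the validated claim to source accuracies and to the claims of other items.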
No need to fuse for every claim!
correctness of claims of validated data item
accuracies of sources
correctness of claims
items
80% certain about a claim
user is correct 85% of the time
Claim3 Claim1 Claim2
6/10 3/10 1/10
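Erroneous feedback of this kind can be simulated with a short sketch like the one below. It assumes a user who marks the true claim with a fixed probability (0.85, as in the example above) and otherwise picks one of the wrong claims uniformly at random. The function name `simulate_feedback` and the uniform choice among wrong claims are assumptions for illustration.

```python
import random

def simulate_feedback(true_claim, claims, user_accuracy=0.85, rng=random):
    """Return the claim a simulated, possibly mistaken user marks as true."""
    wrong = [c for c in claims if c != true_claim]
    if not wrong or rng.random() < user_accuracy:
        return true_claim
    return rng.choice(wrong)

# Repeated feedback on one data item, seeded for reproducibility.
rng = random.Random(0)
answers = [
    simulate_feedback("Coffin", ["Coffin", "Renaud"], 0.85, rng)
    for _ in range(10_000)
]
observed_accuracy = answers.count("Coffin") / len(answers)
```

Over many trials `observed_accuracy` approaches the configured user accuracy, so the same routine can feed noisy labels to any of the ranking strategies.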
Dataset      Items    Sources  Claims
Books        1263     894      24303
FlightsDay   5836     38       80452
Population   40696    2545     46734
Flights      121567   38       1931701
Feedback Simulation
GUB: ground-truth-utility-based
[Plot, Books: ∆ distance_to_ground_truth (%) against data items validated (%); strategies: Random, QBC, US, ApproxMEU, MEU, GUB]
[Plot, Population: ∆ distance_to_ground_truth (%) against # data items validated; strategies: QBC, US, ApproxMEU, MEU]
[Plot, FlightsDay: ∆ distance_to_ground_truth (%) against data items validated (%); strategies: Random, QBC, US, ApproxMEU, MEU, GUB]
[Plot, Flights: ∆ distance_to_ground_truth (%) against data items validated (%); strategies: QBC, US, ApproxMEU10]