Staging User Feedback toward Rapid Conflict Resolution in Data - PowerPoint PPT Presentation

Staging User Feedback toward Rapid Conflict Resolution in Data Fusion
Romila P Pradhan*, Siarhei Bykau, Sunil Prabhakar*
*Purdue University, Bloomberg L.P.

Fusing data from multiple sources
Claims made by sources S1 to S4 for each data item:
Data item | Claims
Zootopia | Howard, Spencer, Spencer
Kung Fu Panda | Stevenson, Nelson
Inside Out | leFauve, Docter
Finding Dory | Stanton
Minions | Coffin, Renaud
Rio | Jones, Saldanha

Data fusion systems
ACCU [1]: iterative computation of source accuracy and correctness of claims.
Data item | Correctness of claims
Zootopia | Howard (0.000), Spencer (1.000)
Kung Fu Panda | Stevenson (0.015), Nelson (0.985)
Inside Out | leFauve (0.001), Docter (0.999)
Finding Dory | Stanton (1.000)
Minions | Coffin (0.921), Renaud (0.079)
Rio | Jones (0.015), Saldanha (0.985)
Source | Accuracy
S1 | 0.317
S2 | 0.027
S3 | 0.992
S4 | 1.000
[1] Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava. Data Fusion: Resolving Conflicts from Multiple Sources. WAIM 2013.

Comparison with ground truth
ACCU [1] output set against the true value of each data item:
Data item | Truth | Correctness of claims
Zootopia | Howard | Howard (0.000), Spencer (1.000)
Kung Fu Panda | Stevenson | Stevenson (0.015), Nelson (0.985)
Inside Out | Docter | leFauve (0.001), Docter (0.999)
Finding Dory | Stanton | Stanton (1.000)
Minions | Coffin | Coffin (0.921), Renaud (0.079)
Rio | Saldanha | Jones (0.015), Saldanha (0.985)
Source | Accuracy
S1 | 0.317
S2 | 0.027
S3 | 0.992
S4 | 1.000

Involve the User
Data D is fed to the data fusion model, which outputs the correctness of claims. The user validates a data item, and the resulting labels go back to the fusion model as user feedback.

How to be most effective with user feedback?

This talk
o 4 ranking strategies
§ Item-level ranking: Query-by-committee, Uncertainty Sampling
§ Holistic ranking: Maximum Expected Utility, Approximate MEU
o Feedback errors (non-expert feedback): confidence, error-rate, conflicting
o Evaluation
[Plot: ∆ distance_to_ground_truth (%) against data items validated (%) for Random, QBC, US, ApproxMEU, MEU, GUB]

Query-by-committee (QBC)
o Most sources agree: Zootopia (Howard, Spencer, Spencer)
o Sources disagree: Rio (Jones, Saldanha)
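The QBC idea can be sketched as ranking data items by how much their sources disagree. This is a minimal sketch, assuming vote entropy as the disagreement measure (the slide does not name one); the claim lists are the ones shown on the slides, and the function names are illustrative.

```python
import math
from collections import Counter

# Claims per data item as listed on the slides. Vote entropy only needs
# the votes themselves, not which source cast each one.
claims = {
    "Zootopia": ["Howard", "Spencer", "Spencer"],
    "Kung Fu Panda": ["Stevenson", "Nelson"],
    "Inside Out": ["leFauve", "Docter"],
    "Finding Dory": ["Stanton"],
    "Minions": ["Coffin", "Renaud"],
    "Rio": ["Jones", "Saldanha"],
}

def vote_entropy(votes):
    """Shannon entropy (bits) of the vote distribution over distinct claims."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def qbc_ranking(claims):
    """Order data items from most to least disagreement among sources."""
    return sorted(claims, key=lambda item: vote_entropy(claims[item]), reverse=True)

ranking = qbc_ranking(claims)
```

Under this measure Rio (an even split) ranks above Zootopia (a two-to-one majority), and Finding Dory, with a single claim, ranks last.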

Uncertainty Sampling (US)
Data item | Correctness of claims
Zootopia | Howard (0.000), Spencer (1.000)
Kung Fu Panda | Stevenson (0.015), Nelson (0.985)
Inside Out | leFauve (0.001), Docter (0.999)
Finding Dory | Stanton (1.000)
Minions | Coffin (0.921), Renaud (0.079)
Rio | Jones (0.015), Saldanha (0.985)
Highlighted on the slide: Kung Fu Panda and Minions
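Uncertainty sampling can be sketched as ranking data items by the uncertainty of the fusion output. This is a minimal sketch, assuming Shannon entropy over each item's claim correctness as the uncertainty measure (the slide does not state it); the numbers are the correctness values from the slide, and the function names are illustrative.

```python
import math

# Correctness of claims per data item, as reported on the slide.
correctness = {
    "Zootopia": {"Howard": 0.000, "Spencer": 1.000},
    "Kung Fu Panda": {"Stevenson": 0.015, "Nelson": 0.985},
    "Inside Out": {"leFauve": 0.001, "Docter": 0.999},
    "Finding Dory": {"Stanton": 1.000},
    "Minions": {"Coffin": 0.921, "Renaud": 0.079},
    "Rio": {"Jones": 0.015, "Saldanha": 0.985},
}

def item_entropy(probs):
    """Shannon entropy (bits) of one item's claims; 0 * log(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def us_ranking(correctness):
    """Order data items from most to least uncertain."""
    return sorted(correctness, key=lambda item: item_entropy(correctness[item]), reverse=True)

us_order = us_ranking(correctness)
```

With these values Minions is the most uncertain item, followed by Kung Fu Panda and Rio, which tie.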

Implication of a validation
[Figure: the table of claims by sources S1 to S4 from the earlier slide, shown in two animation steps. The first step highlights sources S2, S3, and S4 along with some of their claims; the second highlights S4 and the Zootopia claim Spencer.]

Ideal utility function
o Utility function: average correctness of true claims
o Inputs: the fusion model's output on the data, and the truth function
o The truth function is unknown (?)

Practical utility function
Entropy utility function: computed over the correctness of all claims, from the entropies of all data items
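One way to read the entropy utility is sketched below. The exact formula is not given on the slide, so this assumes the utility is the negative sum of the per-item Shannon entropies, which makes a less uncertain fusion output score higher. The correctness values are the ones from the earlier slides; `entropy_utility` is an illustrative name.

```python
import math

def item_entropy(probs):
    """Shannon entropy (bits) of one item's claims; 0 * log(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def entropy_utility(correctness):
    """Assumed form: negative total entropy over all data items."""
    return -sum(item_entropy(probs) for probs in correctness.values())

# Correctness of claims as reported on the earlier slides.
correctness = {
    "Zootopia": {"Howard": 0.000, "Spencer": 1.000},
    "Kung Fu Panda": {"Stevenson": 0.015, "Nelson": 0.985},
    "Inside Out": {"leFauve": 0.001, "Docter": 0.999},
    "Finding Dory": {"Stanton": 1.000},
    "Minions": {"Coffin": 0.921, "Renaud": 0.079},
    "Rio": {"Jones": 0.015, "Saldanha": 0.985},
}

utility = entropy_utility(correctness)
```

Unlike the ideal utility, this quantity needs no ground truth: it reaches its maximum of 0 when every data item has one claim with correctness 1.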

Maximum Expected Utility (MEU)
§ Value of perfect information: the entropy utility if a claim is true
Best alternative in the absence of ground truth
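The MEU selection step can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: `fuse` is a toy accuracy-weighted voting model standing in for ACCU, the utility is assumed to be negative total entropy, and the source-to-claim assignment in `observations` is hypothetical.

```python
import math

# Hypothetical source-to-claim assignment (not taken from the slides).
observations = {
    "item_a": {"s1": "x", "s2": "y", "s3": "y"},
    "item_b": {"s1": "p", "s3": "q"},
    "item_c": {"s2": "u", "s3": "v"},
}

def fuse(observations, validated, rounds=20):
    """Toy iterative fusion: claim correctness is the accuracy-weighted vote
    share; source accuracy is the mean correctness of the source's claims.
    Validated items are pinned to their confirmed claim."""
    sources = {s for votes in observations.values() for s in votes}
    accuracy = {s: 0.8 for s in sources}
    correctness = {}
    for _ in range(rounds):
        for item, votes in observations.items():
            values = sorted(set(votes.values()))
            if item in validated:
                correctness[item] = {v: float(v == validated[item]) for v in values}
                continue
            weight = {v: sum(accuracy[s] for s, c in votes.items() if c == v) for v in values}
            total = sum(weight.values())
            correctness[item] = {v: w / total for v, w in weight.items()}
        for s in sources:
            mine = [correctness[i][votes[s]] for i, votes in observations.items() if s in votes]
            accuracy[s] = max(sum(mine) / len(mine), 1e-6)  # keep weights positive
    return correctness

def entropy_utility(correctness):
    """Assumed utility: negative total Shannon entropy over all items."""
    return sum(p * math.log2(p) for probs in correctness.values()
               for p in probs.values() if p > 0)

def expected_utility(observations, validated, item, current):
    """Average the post-validation utility over the item's claims, weighting
    each claim by its current correctness."""
    return sum(
        p * entropy_utility(fuse(observations, {**validated, item: claim}))
        for claim, p in current[item].items() if p > 0
    )

def meu_choice(observations, validated):
    """Pick the unvalidated item whose validation has the highest expected utility."""
    current = fuse(observations, validated)
    candidates = [i for i in observations if i not in validated]
    return max(candidates, key=lambda i: expected_utility(observations, validated, i, current))

choice = meu_choice(observations, {})
```

Every candidate claim triggers a full run of `fuse`, so the cost grows with the number of claims times the cost of one fusion run.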

Approximate-MEU
• Key idea: propagation of changes
correctness of claims of validated data item → accuracies of sources → correctness of claims of unvalidated data items
• No need to fuse for every claim!
• Removed bottleneck: iterative computation of MEU
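The propagation chain can be illustrated with a single non-iterative pass. This is a loose sketch of the idea only: the update rules (mean correctness for source accuracy, accuracy-weighted vote share for claim correctness) are assumptions, and the data and the name `propagate_once` are hypothetical.

```python
def propagate_once(observations, correctness, item, true_claim):
    """One propagation pass, with no iteration to a fixed point:
    validated item -> source accuracies -> correctness of the other items."""
    updated = {i: dict(p) for i, p in correctness.items()}

    # Step 1: pin the claims of the validated data item.
    updated[item] = {c: float(c == true_claim) for c in correctness[item]}

    # Step 2: recompute each source's accuracy as the mean correctness of its claims.
    sources = {s for votes in observations.values() for s in votes}
    accuracy = {}
    for s in sources:
        mine = [updated[i][votes[s]] for i, votes in observations.items() if s in votes]
        accuracy[s] = sum(mine) / len(mine)

    # Step 3: recompute correctness of unvalidated items by accuracy-weighted vote share.
    for other, votes in observations.items():
        if other == item:
            continue
        weight = {c: 0.0 for c in correctness[other]}
        for s, c in votes.items():
            weight[c] += accuracy[s]
        total = sum(weight.values())
        if total > 0:
            updated[other] = {c: w / total for c, w in weight.items()}
    return updated

# Hypothetical example: two sources that disagree on both items.
observations = {"a": {"s1": "x", "s2": "y"}, "b": {"s1": "p", "s2": "q"}}
correctness = {"a": {"x": 0.5, "y": 0.5}, "b": {"p": 0.5, "q": 0.5}}
result = propagate_once(observations, correctness, "a", "x")
```

Confirming `x` for item `a` raises the accuracy of `s1` and lowers that of `s2`, which in turn shifts item `b` toward the claim made by `s1`.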

Users can be wrong
o Honest but unsure user: 80% certain about a claim
o Error-rate of user: user is correct 85% of the time
o Conflicting feedback from a crowd of workers: Claim1 6/10, Claim2 3/10, Claim3 1/10
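The three scenarios above can each be turned into a probability distribution over an item's claims instead of a hard label. This is a minimal sketch; splitting the leftover probability evenly across the other claims is an assumption, and the function names are illustrative.

```python
from collections import Counter

def soft_label(claims, chosen, confidence):
    """Distribution over claims when feedback picks `chosen` with the given
    confidence; the remaining mass is split evenly over the other claims
    (an assumption, not stated on the slide)."""
    others = [c for c in claims if c != chosen]
    if not others:
        return {chosen: 1.0}
    rest = (1.0 - confidence) / len(others)
    return {c: (confidence if c == chosen else rest) for c in claims}

def crowd_label(votes):
    """Distribution over claims as the fraction of workers voting for each."""
    counts = Counter(votes)
    return {c: n / len(votes) for c, n in counts.items()}

claims = ["Claim1", "Claim2", "Claim3"]

# Honest but unsure user: 80% certain about Claim1.
unsure = soft_label(claims, "Claim1", 0.80)

# User with a known error rate: correct 85% of the time.
noisy = soft_label(claims, "Claim1", 0.85)

# Conflicting crowd feedback: 6, 3, and 1 votes out of 10.
crowd = crowd_label(["Claim1"] * 6 + ["Claim2"] * 3 + ["Claim3"] * 1)
```

A distribution of this kind could replace the pinned 0/1 correctness used for a validated item when the feedback is not fully trusted.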

Real-world datasets
Dataset | Items | Sources | Claims
Books [1] | 1263 | 894 | 24303
FlightsDay [2] | 5836 | 38 | 80452
Population [3] | 40696 | 2545 | 46734
Flights [2] | 121567 | 38 | 1931701
Feedback simulation
o Books: silver standard provided in [4]
o Flight information: data provided by carrier websites considered ground truth
o Population: manually identified the true claim for data items having multiple claims
1. X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: The role of source dependence. PVLDB, 2009.
2. X. Li, X. L. Dong, K. Lyons, W. Meng, and D. Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 2012.
3. J. Pasternack and D. Roth. Knowing what to believe (when you already know something). COLING, 2010.
4. http://lunadong.com/fusionDataSets.htm

Competing methods
o Item-level ranking methods
§ QBC / US
o Decision-theoretic ranking methods
§ MEU / Approx-MEU
§ Greedy Upper Bound (GUB): ground-truth-utility-based
o Random
§ all data items equally beneficial

Large number of sources, few claims: holistic ranking
[Plot, Books: ∆ distance_to_ground_truth (%) against data items validated (%) for Random, QBC, US, ApproxMEU, MEU, GUB]

Large number of sources, few claims: holistic ranking
[Plot, Population: ∆ distance_to_ground_truth (%) against # data items validated for QBC, US, ApproxMEU, MEU]

Large number of claims, few sources: either QBC/holistic
[Plot, FlightsDay: ∆ distance_to_ground_truth (%) against data items validated (%) for Random, QBC, US, ApproxMEU, MEU, GUB]

Large number of claims, few sources: either QBC/holistic
[Plot, Flights: ∆ distance_to_ground_truth (%) against data items validated (%) for QBC, US, ApproxMEU]

Contributions
o Integrating user feedback to improve the performance of existing data fusion systems
o Designed strategies to generate an effective ordering for validating claims
o Scalable decision-theoretic solution for iterative fusion
o Explored imperfect feedback scenarios
o Evaluation on real-world datasets confirmed that guided feedback rapidly increases the effectiveness of data fusion
