HUMAN-POWERED DATA MANAGEMENT
Aditya Parameswaran
with H. Garcia-Molina, J. Widom, A. Polyzotis, M. Teh
Why should we (DM/DB folks) care?
Reason 1: Most data is unstructured
Unstructured Data
Automated processing: not yet
Structured Data | Unstructured Data: images, videos, text
Data Processing Algorithms
Learning accuracies
Data Processing Systems
Interfaces
Patterns
Incentives
Data Processing Algorithms
Auxiliary Plugins: Quality, Pricing
Data Processing Systems
Filter [SIGMOD12, VLDB14], Max [SIGMOD12], Clean [KDD12, TKDD13], Categorize [VLDB11], Search [ICDE14], Debugging [NIPS12]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12], DataSift [HCOMP13, SIGMOD14], HQuery [CIDR11]
Confidence [KDD13, TR14], Eviction [TR12], Pricing [VLDB15], Quality [HCOMP14]
Non-textual content:
“cables that plug into <img>”
“funny pictures of cats with hats with captions”
Time-consuming:
“find noise canceling headphones where the battery lasts 13 hrs”
“apartments in a nice area around urbana”
How many reformulations should we gather?
How many items should we retrieve at each step?
How do we filter items? How many people do we ask?
How do we optimize the workflow?
How do we guarantee correctness?
How much am I willing to spend?
How long can I wait?
What is my desired quality?
[Figure: Dataset of Items, Boolean Predicate, Filtered Dataset; answers Y, Y, N]
- Human error probability (FP/FN): Pr[Yes | 0]; Pr[No | 1]
- A-priori probability: Pr[0]; Pr[1]
- Expected error < t (say, 5%)
- Cost per item < m (say, 20 questions)
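One way to see how the a-priori probability and the two human error rates combine: Bayes' rule gives the probability that an item truly satisfies the predicate after a given number of Yes and No answers. The sketch below assumes answers are independent given the item's true value. The function and parameter names are illustrative and do not come from the slides.

```python
def posterior_is_one(yes, no, p_one, p_yes_given_0, p_no_given_1):
    """Pr[item is 1 | `yes` Yes answers and `no` No answers].

    Assumption (not stated on the slides): answers are independent
    given the true value of the item.
    p_one          -- a-priori probability Pr[1]
    p_yes_given_0  -- false positive rate Pr[Yes | 0]
    p_no_given_1   -- false negative rate Pr[No | 1]
    """
    # Likelihood of the observed answers under each true value.
    like_1 = (1.0 - p_no_given_1) ** yes * p_no_given_1 ** no
    like_0 = p_yes_given_0 ** yes * (1.0 - p_yes_given_0) ** no
    joint_1 = p_one * like_1
    joint_0 = (1.0 - p_one) * like_0
    # Undefined (division by zero) if the answers are impossible
    # under both true values, e.g. with error rates of exactly 0.
    return joint_1 / (joint_1 + joint_0)


if __name__ == "__main__":
    # Hypothetical numbers: Pr[1] = 0.5, both error rates 0.1.
    for yes, no in [(0, 0), (1, 0), (2, 1), (3, 0)]:
        print(yes, no, posterior_is_one(yes, no, 0.5, 0.1, 0.1))
```

With no answers the posterior equals the prior, and each additional Yes pushes it toward 1 as long as workers are better than random.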
Cost
Error = Pr[reach 1] + Pr[reach 0]
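The cost and error of a filtering strategy can be computed by tracking the probability of reaching each (No count, Yes count) point. The sketch below is one possible reading, not the authors' algorithm. It interprets the error as Pr[reach a Fail decision and the item is 1] + Pr[reach a Pass decision and the item is 0], which is an assumption because the slide text is only partly recoverable. The strategy representation (a hypothetical `decide` callback returning "ask", "pass", or "fail") and all names are illustrative.

```python
def evaluate_strategy(decide, max_questions, p_one, p_yes_given_0, p_no_given_1):
    """Return (expected cost, expected error) of a filtering strategy.

    decide(no, yes) -> "ask" | "pass" | "fail"   (hypothetical interface)
    Assumes answers are independent given the item's true value.
    """
    p_yes = {0: p_yes_given_0, 1: 1.0 - p_no_given_1}
    # reach[(no, yes)][v] = Pr[reach this point AND true value is v]
    reach = {(0, 0): {0: 1.0 - p_one, 1: p_one}}
    cost = 0.0
    error = 0.0
    # Visit points in order of questions asked, so every predecessor
    # has been processed before its successors.
    for total in range(max_questions + 1):
        for no in range(total + 1):
            yes = total - no
            mass = reach.get((no, yes))
            if mass is None:
                continue  # point is never reached
            action = decide(no, yes)
            if action == "ask":
                if total == max_questions:
                    raise ValueError("strategy must stop within max_questions")
                up = reach.setdefault((no, yes + 1), {0: 0.0, 1: 0.0})
                right = reach.setdefault((no + 1, yes), {0: 0.0, 1: 0.0})
                for v in (0, 1):
                    up[v] += mass[v] * p_yes[v]              # got a Yes
                    right[v] += mass[v] * (1.0 - p_yes[v])   # got a No
            else:
                cost += total * (mass[0] + mass[1])
                # Passing a 0 or failing a 1 is an error.
                error += mass[0] if action == "pass" else mass[1]
    return cost, error


def majority_of_three(no, yes):
    """Illustrative strategy: always ask 3 questions, then take the majority."""
    if no + yes < 3:
        return "ask"
    return "pass" if yes > no else "fail"


if __name__ == "__main__":
    # Hypothetical numbers: Pr[1] = 0.5, both error rates 0.1.
    print(evaluate_strategy(majority_of_three, 3, 0.5, 0.1, 0.1))
```

A strategy returned by such an evaluation can then be checked against the stated constraints, for example expected error below t and cost per item below m.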
[Figure labels: A+, A, B+, B-]
Data Processing Algorithms
Auxiliary Plugins: Quality, Pricing
Data Processing Systems
Filter [SIGMOD12, VLDB14], Max [SIGMOD12], Clean [KDD12, TKDD13], Categorize [VLDB11], Search [ICDE14], Debugging [NIPS12]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12], DataSift [HCOMP13, SIGMOD14], HQuery [CIDR11]
Confidence [KDD13, TR14], Eviction [TR12], Pricing [VLDB15], Quality [HCOMP14]
[Figure residue: categories MA, CA, IL, NY; values 50, 10, 10, 30 and 25, 15, 40, 20]
The architecture of our system is shown in Figure 1 below.