CrowdDB: Answering queries with crowdsourcing!
Michael Franklin+, Donald Kossmann*, Tim Kraska+ Sukriti Ramesh*, and Reynold Xin+
+ UC Berkeley * ETH Zurich
UC Berkeley
1
CrowdDB: Answering queries with crowdsourcing ! Michael Franklin + , - - PowerPoint PPT Presentation
UC Berkeley CrowdDB: Answering queries with crowdsourcing ! Michael Franklin + , Donald Kossmann*, Tim Kraska + Sukriti Ramesh*, and Reynold Xin + + UC Berkeley * ETH Zurich 1 ! !"#$%&'()*+ " ,-'%./)*+ # /#0!/+
Michael Franklin+, Donald Kossmann*, Tim Kraska+ Sukriti Ramesh*, and Reynold Xin+
+ UC Berkeley * ETH Zurich
UC Berkeley
1
2
search Watson/IBM
3
search Watson/IBM
4
search Watson/IBM
Crowd DB
5
$%&'()*+,(&-. !//0-11. "(02-3.$('. I##"!/+ I##"!/0!/=*+?&.J+K%/D+C>+ LMNOF.+ P.&!J+F4)%./))+?,-'%./)+ >$(#.6*+1Q+ LMOOF.+ ?%-$#)#R+ S/2(#.2*+T>+ LMUOF.+
6
$%&'()*+,(&-. !//0-11. "(02-3.$('. I##"!/+ I##"!/0!/=*+?&.J+K%/D+C>+ LMNOF.+ P.&!J+F4)%./))+?,-'%./)+ >$(#.6*+1Q+ LMOOF.+ ?%-$#)#R+ S/2(#.2*+T>+ LMUOF.+
7
8
$%&'()*+,(&-. !//0-11. "(02-3.$('. I##"!/+ I##"!/0!/=*+?&.J+K%/D+C>+ LMNOF.+ P.&!J+F4)%./))+?,-'%./)+ >$(#.6*+1Q+ LMOOF.+ ?%-$#)#R+ S/2(#.2*+T>+ LMUOF.+
9
$%&'()*+,(&-. !//0-11. "(02-3.$('. I##"!/+ I##"!/0!/=*+?&.J+K%/D+C>+ LMNOF.+ P.&!J+F4)%./))+?,-'%./)+ >$(#.6*+1Q+ LMOOF.+ ?%-$#)#R+ S/2(#.2*+T>+ LMUOF.+
10
11
12
+
T'/$/+&#+4)/+&'/+-$#D2;+
–" Z,b/+0/#0!/+<.2+2,&,+ –" Z,b/+0/#0!/+2#+`B4YY5a+
T'/$/+%&+2#/).:&+(,6/+)/.)/;+
–" Z,b/+0/#0!/+2#+34%-6+)#$&+#B+)/&)+ [%J/J*+94$6P&\+ –" >.5&'%."+&'/+-#(04&/$+,!$/,25+ 2#/)+D/!!++
Disk 2 Disk 1
Parser Optimizer Statistics
CrowdSQL Results
Executor
Files Access Methods
UI Template Manager UI Editor UI Creation HIT Management
CrowdDB
MetaData
–" dHA)+-!#)/2GD#$!2+,))4(08#.+%)+,+!%/+ –" P.g4/.-/)+34/$5+/=/-48#.+)&$,&/"%/)h#08#.)+ –" >&+&'/+(#(/.&*+D/+#.!5+,!!#D+,+$/)&$%-&/2+)/&+#B+34/$%/)+,",%.)&+
–" I#,!;+V=0!#$/+&'/+#0/.GD#$!2+,)+(4-'+,)+0#))%7!/+[/J"J*+,.)D/$+,+ 34/$5+,)+7/)&+,)+0#))%7!/+"%b/.+,+-/$&,%.+742"/&\++
–" Vb/$5+$/)4!&+B$#(+&'/+-$#D2+%)+)&#$/2+!+9##+/=0/.)%b/+.#&+&#+2#+)#+ –" H4/$%/)+4)/+)&#$/2+$/)4!&)+D'/./b/$+0#))%7!/+ –" >.)D/$)+-',."/+7,)/2+#.+&'/+34/$5+'%)&#$5+,.2+-,-'/+7/',b%#$+ –" I#,!;+]i/$+(#$/+-#.&$#!+#b/$+-,-'%."*+99A*+_+
–" Z4(,.G%.04&G&#!/$,.&+34/$5+0$#-/))%."+ –" >&+&'/+(#(/.&+,+)%(0!/+)/&+#B+'/4$%)8-)++ –" I#,!;+-#.<2/.-/+%.&/$b,!)*+%&/$,8b/+%(0$#b/(/.&*_+
13
SELECT * FROM companies WHERE Name ~ ~ “Big Blue”+
14
CREATE C CROWD TABLE department ( university STRING, department STRING, phone_no STRING) PRIMARY KEY (university, department); CREATE TABLE company ( name STRING PRIMARY KEY, hq_address C CROWD STRING);
:"E.FG3-)1?%)1H..
SELECT p FROM picture WHERE subject = "Golden Gate Bridge" ORDER BY C CROWDORDER(p, "Which pic shows better %subject");
::E.FG3-)1?%)1;
!"#$%#"%&"'()*+,-(+.'/01++*2-34'5%6.78' !+(9:&;1,38''' !+(9:.(1+0*:'0(31<2.''' !+(9:.(1+0*:'-,=3*.'
15
16
17
CREATE C CROWD TABLE department ( name STRING PRIMARY KEY phone_no STRING); CREATE C CROWD TABLE professor ( name STRING PRIMARY KEY e-mail STRING dep STRING REF department(name) );
dVAVC9+l++ kS]?+@S]kVdd]S+0*+EV@>S9?V19+2+ TZVSV+2J.,(/+e+0J2/0+ >1E+0J.,(/+e`?%-',/!+mJ+C,$/5a+
Professor Department
p.dep=d.name
Professor Department
"Carey" p.dep=d.name
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
(b) Logical plan before optimization (c) Logical plan after optimization (d) Physical plan
Department
Professor Department
p.dep=d.name
Professor Department
"Carey" p.dep=d.name
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
(b) Logical plan before optimization (c) Logical plan after optimization (d) Physical plan
Department
18
I>@-.J(1-/.%'K&?L-0.
@/$B#$(,.-/+P.)%"'n4!+H4/$5+A,."4,"/+[@PHA\+&/-'.%34/)++ &#+2/,!+D%&'+#0/.GD#$!2+,))4(08#.+
d%(0!/+)/&+#B+$4!/)+&#+0%-6+&'/+7/)&+0!,.+
d%(0!/+'/4$%)8-)+&#+)/&+&'/+-$#D2+0,$,(/&/$)+++ [/J"J*+$/0!%-,8#.+B,-&#$*+0$%-/+0/$+ZP9*+/&-J\+ +
19
Professor Department
"Carey" p.dep=d.name
Professor Department
Professor Department
Stop After (10)
dVAVC9+l++ kS]?+@S]kVdd]S+0*++ +++EV@>S9?V19+2+ TZVSV+0J2/0+e+2J.,(/+
Professor Department
dVAVC9+l++ kS]?+@S]kVdd]S+0*++ +++EV@>S9?V19+2+ TZVSV+0J2/0+e+2J.,(/+ AP?P9+O*+NO++
10*1
10 10 10
dVAVC9+l++ kS]?+@S]kVdd]S+0*++ +++EV@>S9?V19+2+ TZVSV+0J2/0+e+2J.,(/+ >1E+0J.,(/+e`C,$/5a+ +
Professor Department
p.dep=d.name
1 1 1*1 1
Never Soon Focus Right Now: PK Queries
Professor Department
p.dep=d.name
Professor Department
"Carey" p.dep=d.name
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
(b) Logical plan before optimization (c) Logical plan after optimization (d) Physical plan
Department
20
–" C$#D2+]0/$,&#$);+?9@$#7/*+?9m#%.*+?9k4.-8#.+ –" ]&'/$;+d9]@+>k9VS+[%J/J*+!%(%&\+
21
Please fill out the missing company data! Submit IBM
Name Headquarter address
Please fill out the missing professor data Submit Carey E-Mail Name Department Department Phone
Crowd Column & Crowd Colums w/o foreign keys
Please fill out the missing professor data Submit CS Carey Department name N ame E-Mail
Crowd Column & Crowd Colums with foreign keys Denormalization
"M#0%J- [@$#B/))#$\+ 1,(/eC,$/5+
22
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department Name
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
Department
Are the following entities the same? IBM == Big Blue Yes No Which picture visualizes better "Golden Gate Bridge" Submit
23
Professor Department
p.dep=d.name
Professor Department
"Carey" p.dep=d.name
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
(b) Logical plan before optimization (c) Logical plan after optimization (d) Physical plan
Department
24
Query: SELECT name FROM company WHERE name ~ [a non-uniform name of the company] Data-Size: 100 company names Batching: 10 comparisons per HIT Replication: 3 Assignments per HIT Price: 1 cent per HIT Schema: CREATE TABLE company ( name STRING, headquarter_address CROWD STRING );
Non Uniform Name Query Result Votes Error Examples Bayerische Motoren Werke BMW 3 TATA Group, Gazprom, Boeing, Toyota International Business Machines IBM 2 Samsung, HP Company of Gillette P&G 2 Aviva, AIG, France Telecom Big Blue IBM 2 Microsoft
Which entities are the same as
Big BLUE?
Google HP IBM Facebook NetApp CrowdFlower Yahoo Microsoft Salesforce SAP None of the above
25
Query: SELECT p FROM picture WHERE subject = "Golden Gate Bridge" ORDER BY C CROWDORDER(p, "Which pic shows better %subject"); Data-Size: 30 subject areas, with 8 pictures each Batching: 4 orderings per HIT Replication: 3 Assignments per HIT Price: 1 cent per HIT
(turker-votes, turker-ranking, expert-ranking)
Which picture visualizes better "Golden Gate Bridge"
Submit
–" @$%-/+0/$+ZP9+ –" C#(0!/=%&5h17J+34/)8#.)+0/$+ZP9+ –" 17J+,))%".(/.&)+0/$+ZP9+!+S/0!%-,8#.+ –" 950/+#B+j)/$+P.&/$B,-/+ –" _+
27
Oq+ NOq+ MOq+ rOq+ sOq+ UOq+ tOq+ uOq+ vOq+ wOq+ NOOq+ O+ NO+ MO+ rO+ sO+ UO+ tO+
#-07-)3(C-.%6.P5M1.38(3.8(Q-.(3.@-(13. %)-.(11?C)&-)3.7%&'@-3-/. M?&-.R&?)1S.
LOJON++ LOJOM++ LOJOr++ LOJOs++ 5 Assignments, 100 HITs
28
>?@A,7+ ZP9+I$#40+x+d%(0!/+)&$,%"'&GB#$D,$2+ZP9)*+<.2+&'/+,22$/))+,.2+0'#./+.4(7/$+ B#$+,+"%b/.+74)%./))+%.+,+"%b/.+-%&5J+>!!+ZP9)+-#(0!/&/2+D/$/+,00$#b/2J+@,5+D,)+ 2/-/.&+B#$+,(#4.&+#B+8(/+$/34%$/2*+D'/.+-#(0,$/2+&#+#&'/$+,b,%!,7!/+ZP9)J+ F4&+.#&+D'/.+!##6/2+,&+B$#(+,.+'#4$!5+D,"/+0/$)0/-8b/J+P+D#4!2+2#+
B,%$;U+h+U+++B,)&;U+h+U+++0,5;s+h+U+++-#((;O+h+U+
++++++++++++++++++++++++++++ 9%(+y!,)+y$,)6,+
ZP9+I$#40+x+ +P+$/-/.&!5+2%2+Mww+ZP9)+B#$+&'%)+$/34/)&/$J_+]B+&'/+Mww+ZP9)+P+
&'%)+P+#.!5+',2+Ns+$/c/-8#.)*+,+JMq+$/c/-8#.+$,&/J+P+-4$$/.&!5+',b/+vUMM+ )47(%W/2+ZP9)*+D%&'+,+Jrq+$/c/-8#.+$,&/+,R/$+&'/+$/c/-8#.)+B$#(+&'%)+ $/34/)&/$+[MU+&#&,!+$/c/-8#.)\J+P+',b/+,W/(0&/2+&#+-#.&,-&+&'/+$/34/)&/$+,.2+ D%!!+402,&/+%B+P+$/-/%b/+,+$/)0#.)/J+j.8!+&'/.+7/+b/$5+D,$5+#B+2#%."+
,7#4&+N+%.+/b/$5+Mu+ZP9)+7/%."+)47(%W/2J+++++0#)&/2+75+_+ B,%$;M+h+U++++B,)&;s+h+U++++0,5;M+h+U+++++-#((;O+h+U+
29
Professor Department
p.dep=d.name
(Department first) (Professor first) (De-normalized Probe)
Please fill out the missing professor data Submit Carey E-Mail Name Please fill out the missing department data Submit CS Phone Department Name
MTJoin (Dep) p.dep = d.name MTProbe (Professor) name=Carey
Department Please fill out the missing professor data Submit CS Carey Department name N ame
MTJoin (Professor) p.name = "carey" MTProbe(Dep)
E-Mail Please fill out the missing professor data Submit Carey E-Mail Name
MTProbe (Professor, Dep) name=Carey
Department Department Phone Please fill out the missing department data Submit Phone Department Name
31
O+ UO+ NOO+ NUO+ MOO+ MUO+ N+ MN+ sN+ tN+ vN+
,>&J-0.%6.(11?C)&-)31.1>J&?T-/.
9#&,!+ZP9)+)47(%W/2+ {P.-#$$/-&+,))%".(/.&)+-#(0!/&/2{+ 5 Assignments
32
Oq+ Uq+ NOq+ NUq+ MOq+ MUq+ rOq+ rUq+ O+ U+ NO+ NU+ MO+ MU+ rO+ rU+ sO+ sU+ UO+ UU+ tO+
P5M1.7%&'@-3-/.RUS. M?&-.R&?)1S.
Start of a concurrent experiment
33
$0%9/<P(0/.#0%J@-&1H.
#0%C0(&&?)C.E()C>(C-H.V45.
H4/)8#.+2/)%".*+,(7%"4%8/)*+"$,.4!,$%&5*+_+
"()*W.&()*.2)%J1.3%.3>0).
@$%-/*+$/0!%-,8#.+B,-&#$*+ZP9+"$#40+)%Y/*+/=0%$,8#.+8(/*_J+
$8()C?)C.'@(X%0&.J-8(Q?%0.
P.-$/,)%."+(,$6/&+)%Y/*+./D+0!,n#$(+B/,&4$/)*_+
N>)C@-.%6./?Y-0-)3.3-78)?=>-1.
S/c/-8#.+0#!%-5+[H4#$4(GK#&/*+9/)&Gd/&*_\*++H4,!%&5+-#.&$#!+[H4#$4(*+%&/$,8b/+(#2/!)*_\+
E-(0)?)C.-Y-731.Z.$%&&>)?3*."()(C-&-)3.
_+
M8-.:;<!''0%(78.
:(3(.?)/-'-)/-)7-..
PB+ZT+-',."/)*+,00+.//2+.#&+-',."/+
:;"[.%'K&?L-1.=>-0?-1..
–" E/-%2/+D',&+&#+-$#D2)#4$-/+ –" d&,8)8-)+,7#4&+&'/+(,$6/&+0!,-/*++ 34/)8#.+#$2/$%."*_+
34
>J+?,$-4)*++VJ+T4*+dJ+?,22/.+,.2+SJ+?%!!/$;+ $0%9/1%>07-/.:(3(J(1-1H.D>-0*.#0%7-11?)C.9?38.#-%'@-J+!>%"?'@ABB'
>J+@,$,(/)D,$,.+,.2+1J+@#!5Y#8);+ !)19-0?)C.D>-0?-1.>1?)C.P>&()1W.!@C%0?38&1.()/.:(3(J(1-1J+!>%"?'@ABB'
>J+@,$,(/)D,$,.*++>J+E,)+d,$(,*+ZJ+I,$-%,G?#!%.,*+1J+@#!5Y#8)+,.2+mJ+T%2#(;+ P>&()<(11?13-/.V0('8.[-(78H.53\1.%2(*.3%.(12.=>-1K%)1]J+KAEFC'@ANN+
dJ+>(/$GQ,'%,*+>J+E#,.*+mJ+?J+y!/%.7/$"*+1J+y#42,)*+?J+mJ+k$,.6!%.;++ $0%9/1W.7@%>/1W.()/.(@C%0?38&1H.-G'@%0?)C.38-.8>&().1?/-.%6.^J?C./(3(^. (''@?7(K%)1J+D>EF#%?'@ABA+
@J+E/S#)/*+}J+C',%*+FJ+I,#*+TJ+d'/.*+>J+E#,.*+@J+F#',..#.+,.2+mJ+~'4;++ ;>?@/?)C.$%&&>)?3*.A?2?'-/?(1H.!.P>&()<"(78?)-.!''0%(78*+>!%&?'@AAG++
IJ+A%W!/*+AJ+FJ+C'%!&#.*+SJ+?%!!/$+,.2+?J+I#!2(,.;+ M>0_?3H.M%%@1.6%0.53-0(KQ-.M(121.%)."-78()?7(@.M>02J+H!#FIJAK'
mJ+@J+F%"',(*+CJ+m,5,.&*+ZJ+m%*+IJ+A%W!/*+>J+?%!!/$*+SJ+CJ+?%!!/$*+SJ+?%!!/$*+>J+9,&,$#D%-Y*+ FJ+T'%&/*+dJ+T'%&/*+,.2+9J+Q/';+ `?LA?LH.)-(0@*.0-(@<K&-.()19-01.3%.Q?1>(@.=>-1K%)1J+jPd9?'@ANO+
@J+IJ+P0/%$#8);++ !)(@*L?)C.38-."-78()?7(@.M>02."(02-3'@(7-*+>C?+}SEd*+MONO+ +
35
36