Using CrowdSourcing for Data Analytics
Hector Garcia-Molina
(work with Steven Whang, Peter Lofgren, Aditya Parameswaran and others)
Stanford University
- Big Data Analytics
- CrowdSourcing
Real World Examples:
- Image Matching
- Translation
- Categorizing Images
- Search Relevance
- Data Gathering
Example tasks:
Key Point:
[Figure: data pipeline with System 1 ... System n, cleansing, and analysis]
what matches what??
sim=0.9 sim=0.8
[Figure: products are divided into cameras, CDs, and books; ER is run on each group, giving resolved cameras, resolved CDs, and resolved books]
[Figure: record pairs with similarities 0.45, 0.63, 0.5, 0.9, 0.95, 0.9, 0.7, 0.9, 0.87, 0.7, 0.4, 0.6, 0.5, 0.63]
threshold = 0.7
[Figure: pairs kept at the threshold: 0.9, 0.95, 0.9, 0.7, 0.9, 0.87, 0.7]
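The thresholding step on this slide can be sketched in a few lines. This is only an illustration: the function name `filter_by_threshold` is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption, chosen because the two 0.7 values survive in the slide.

```python
# Pair similarities as listed on the slide.
sims = [0.45, 0.63, 0.5, 0.9, 0.95, 0.9, 0.7, 0.9, 0.87, 0.7, 0.4, 0.6, 0.5, 0.63]
THRESHOLD = 0.7


def filter_by_threshold(values, threshold):
    """Keep only similarities at or above the threshold (inclusive by assumption)."""
    return [v for v in values if v >= threshold]


kept = filter_by_threshold(sims, THRESHOLD)
print(kept)
```

Seven of the fourteen pairs remain, matching the values shown after the threshold is applied.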
critical??
Key Point:
[Figure with labels A, B, C, D, E]
current state: records a, b, c with pairwise similarities 0.9, 0.5, 0.2
ER result: use any given ER algorithm
Q(a,b), Q(b,c), Q(a,c): consider ALL possible questions (three in this example)
[Figure: for each question Q(a,b), Q(b,c), Q(a,c), each possible answer (Y or N) leads to a new state and, from it, an ER result]
example: Q(b,c) answered Y
new state: a, b, c with similarities 0.9, 1.0, 0.2 (the 0.5 becomes 1.0)
ER result: computed from the new state
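The enumeration of questions and answers can be sketched as follows. Several details here are assumptions rather than something the slides state: the mapping of the three values to pairs ((a,b)=0.9, (b,c)=0.5, (a,c)=0.2; only the (b,c) position is confirmed by the example), the rule that an N answer sets the similarity to 0.0, and the helper name `apply_answer`.

```python
from itertools import product

# Current state: pairwise similarities (pair assignment partly assumed).
state = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}


def apply_answer(current, pair, answer):
    """Return a new state where the asked pair is fixed by the crowd's answer.

    Y -> 1.0 follows the slide's example; N -> 0.0 is an assumption.
    """
    new_state = dict(current)
    new_state[pair] = 1.0 if answer == "Y" else 0.0
    return new_state


# Three questions times two answers gives six candidate new states.
new_states = {
    (pair, answer): apply_answer(state, pair, answer)
    for pair, answer in product(state, ["Y", "N"])
}

print(new_states[(("b", "c"), "Y")])
```

Each of the six new states would then be passed to whatever ER algorithm is in use to obtain its ER result.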
score? (each of the six ER results, one per question and Y/N answer, needs a score)
Score: compare the ER result against the gold standard using the F score
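One common way to compute such a score is a pairwise F score, sketched below. The slides do not say which F score variant is used, so the pairwise formulation and the names `matched_pairs` and `pairwise_f_score` are assumptions for illustration.

```python
from itertools import combinations


def matched_pairs(clusters):
    """Return the set of record pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        for x, y in combinations(sorted(cluster), 2):
            pairs.add((x, y))
    return pairs


def pairwise_f_score(result, gold):
    """Harmonic mean of pairwise precision and recall (F1)."""
    r, g = matched_pairs(result), matched_pairs(gold)
    if not r and not g:
        return 1.0  # both clusterings are all singletons: perfect agreement
    true_positives = len(r & g)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(r)
    recall = true_positives / len(g)
    return 2 * precision * recall / (precision + recall)


# ER result merges all three records; the gold standard keeps c apart.
f = pairwise_f_score([{"a", "b", "c"}], [{"a", "b"}, {"c"}])
print(f)
```

In this example the result contains three matched pairs but only one is in the gold standard, so precision is 1/3 and recall is 1.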
sim to prob: similarities 0.9, 0.5, 0.2 over records a, b, c become match probabilities 1.0, 0.6, 0.2
possible worlds: four worlds over a, b, c with probabilities 0.12, 0.48, 0.08, 0.32
possible clusterings (via ER algorithm): two clusterings of a, b, c with probabilities 0.68 and 0.32
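The numbers on these slides can be reproduced with a small sketch, under assumptions the slides do not spell out: that pair matches are independent, that the probabilities map to pairs as (a,b)=1.0, (b,c)=0.6, (a,c)=0.2, and that the ER algorithm is a simple transitive closure. The function names are hypothetical.

```python
from itertools import product

records = ["a", "b", "c"]
# Match probabilities after the sim-to-prob step (pair assignment assumed).
probs = {("a", "b"): 1.0, ("b", "c"): 0.6, ("a", "c"): 0.2}


def cluster(items, edges):
    """Transitive closure via union-find, standing in for the ER algorithm."""
    parent = {r: r for r in items}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for x, y in edges:
        parent[find(x)] = find(y)
    groups = {}
    for r in items:
        groups.setdefault(find(r), set()).add(r)
    return frozenset(frozenset(g) for g in groups.values())


def clustering_distribution(items, pair_probs):
    """Enumerate possible worlds and sum their probability per clustering."""
    pairs = list(pair_probs)
    dist = {}
    for present in product([True, False], repeat=len(pairs)):
        p = 1.0
        for pair, on in zip(pairs, present):
            p *= pair_probs[pair] if on else 1.0 - pair_probs[pair]
        if p == 0.0:
            continue  # impossible world
        edges = [pair for pair, on in zip(pairs, present) if on]
        key = cluster(items, edges)
        dist[key] = dist.get(key, 0.0) + p
    return dist


dist = clustering_distribution(records, probs)
for clustering, p in dist.items():
    print(sorted(sorted(c) for c in clustering), round(p, 2))
```

With (a,b) certain, the four worlds have probabilities 0.6 × 0.2 = 0.12, 0.6 × 0.8 = 0.48, 0.4 × 0.2 = 0.08, and 0.4 × 0.8 = 0.32. The first three all connect c to the others, so the single-cluster outcome has probability 0.68 and the split {a, b}, {c} has 0.32.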
score vs GS? (each of the six ER results, one per question and Y/N answer, is scored against the gold standard, GS)
Example tasks:
Key Point:
End user
what is best price for Nikon DSLR cameras?
model   type   brand
D7100   DSLR   Nikon
7D      DSLR   Canon
P5000   comp   Nikon
Crowd
what is best price for Nikon D7100 camera?
User view
restaurant     rating   cuisine
Chez Panisse   4.9      French
Chez Panisse   4.9      California
Bytes          3.8      California

Anchor
restaurant
Chez Panisse
Bytes

Dependent
restaurant     rating
Chez Panisse   4.8
Chez Panisse   5.0
Chez Panisse   4.9
Bytes          3.6
Bytes          4.0

Dependent
restaurant     cuisine
Chez Panisse   French
Chez Panisse   California
Bytes          California
Bytes          California
[Figure: the same user view, anchor table, and dependent tables, with fetch rules attached to the tables; the values shown being fetched are Bytes, Chez Panisse, and French]
[Figure: the same user view, anchor table, and dependent tables, with resolution rules attached to the two dependent tables]
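A sketch of how resolution rules could turn the dependent tables into the user view is given below. The slides do not say which rules are used; averaging the ratings and de-duplicating the cuisines are assumptions, chosen because they reproduce the user-view values (4.9 and 3.8). All function names are hypothetical.

```python
from collections import defaultdict

anchor = ["Chez Panisse", "Bytes"]
ratings = [("Chez Panisse", 4.8), ("Chez Panisse", 5.0), ("Chez Panisse", 4.9),
           ("Bytes", 3.6), ("Bytes", 4.0)]
cuisines = [("Chez Panisse", "French"), ("Chez Panisse", "California"),
            ("Bytes", "California"), ("Bytes", "California")]


def resolve_rating(rows):
    """Assumed resolution rule: average the fetched ratings per restaurant."""
    grouped = defaultdict(list)
    for name, rating in rows:
        grouped[name].append(rating)
    return {name: round(sum(v) / len(v), 1) for name, v in grouped.items()}


def resolve_cuisine(rows):
    """Assumed resolution rule: keep each distinct cuisine once, in fetch order."""
    grouped = defaultdict(list)
    for name, cuisine in rows:
        if cuisine not in grouped[name]:
            grouped[name].append(cuisine)
    return dict(grouped)


def user_view(anchor_rows, rating_rows, cuisine_rows):
    """Join the resolved dependent tables back onto the anchor table."""
    rating = resolve_rating(rating_rows)
    cuisine = resolve_cuisine(cuisine_rows)
    return [(name, rating[name], c) for name in anchor_rows for c in cuisine[name]]


view = user_view(anchor, ratings, cuisines)
for row in view:
    print(row)
```

Under these assumed rules, Chez Panisse resolves to (4.8 + 5.0 + 4.9) / 3 = 4.9 with two cuisines, and Bytes to (3.6 + 4.0) / 2 = 3.8 with one.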
SELECT n,l,c FROM country WHERE l = 'Spanish' ATLEAST 8
[Figure: query plan for this query, with operators Scan A(n), Scan D1(n,l), Scan D2(n,c), Fetch [n], Fetch [ln], Fetch [ln,c], Fetch [nl], Fetch [nl,c], Resolve[m3], Resolve[d.e], Join, Filter [l='Spanish'], and AtLeast [8]]