Building Mashups by Example
Rattapoom Tuchinda Doctoral Defense
July 22, 2008
What's a Mashup?
A website or application that combines content from more than one source into an integrated experience [Wikipedia].
Examples: a) LA crime map b) zillow.com c) Ski bonk
different counties
Introduction Approach Evaluation Related Work Conclusion
Mashup building stages and their operations (diagrams):
– Data Retrieval: Wrapper
– Calibration: Clean, Attribute
– Integration: Combine, Union, or Join (shown only when two sources are used)
– Display: Customize Display

Configurations shown in the diagrams:
– One source: Wrapper, Clean, Attribute, Customize Display (no Integration step)
– Two sources with Union
– Two sources with Join
– Two sources with Combine
Goal: Create Mashups without Programming
for MS) represents an operation
customize widget can be time consuming
issues and ignore others. Can we come up with a framework that addresses all of the issues while still making the Mashup building process easy?
Our system: Karma
Karma interface: Embedded Browser, Table, Interaction Modes
{Restaurant name, address, phone, Review} {Restaurant name, address, phone, review, Date of Inspection, Score}
{Restaurant name, address, Date of Inspection, Score}
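The three attribute sets above can be read as two sources and their combination. A minimal sketch of that combination, assuming the sources are joined on the attributes they share (restaurant name and address); the function name `join_on` and every cell value below are hypothetical placeholders, not data from the talk:

```python
def join_on(left, right, keys):
    """Join two lists of dict rows on the given key attributes."""
    # Index the right-hand rows by their key tuple for quick lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Merge the two rows; shared key attributes hold identical values.
            joined.append({**row, **match})
    return joined


# Hypothetical rows shaped like the two source schemas.
reviews = [
    {"Restaurant name": "Japon Bistro", "address": "970 E Colora..",
     "phone": "(placeholder)", "Review": "Upscale yet affordabl.."},
    {"Restaurant name": "Hokusai", "address": "8400 Wilshir.",
     "phone": "(placeholder)", "Review": "Chic elegance"},
]
inspections = [
    {"Restaurant name": "Japon Bistro", "address": "970 E Colora..",
     "Date of Inspection": "(placeholder)", "Score": 95},
]

combined = join_on(reviews, inspections, ["Restaurant name", "address"])
```

Only rows present in both sources survive, and each surviving row carries all six attributes.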
Mashups and data tables
DOM tree of the restaurant listing page (figure):
TBODY
  tr: td "1." · td: a "Japon Bistro", br "970 E Colora..", br "Upscale yet affordabl.."
  tr: td "2." · td: a "Hokusai", br "8400 Wilshir.", br "Chic elegance….."
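One way to picture extraction by example on such a tree: the user points at one value, and nodes that sit on the same tag path are collected as the rest of the column. The sketch below is an illustration under that assumption, not Karma's actual extraction code; the markup is a hand-written stand-in for the page, and `tag_path` and `extract_by_example` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed stand-in for the listing page.
PAGE = """
<table><tbody>
<tr><td>1.</td><td><a>Japon Bistro</a><br/>970 E Colora..<br/>Upscale yet affordabl..</td></tr>
<tr><td>2.</td><td><a>Hokusai</a><br/>8400 Wilshir.<br/>Chic elegance</td></tr>
</tbody></table>
"""


def tag_path(node, target_text, path=()):
    """Return the tuple of tags leading to the first node whose text equals target_text."""
    here = path + (node.tag,)
    if (node.text or "").strip() == target_text:
        return here
    for child in node:
        found = tag_path(child, target_text, here)
        if found:
            return found
    return None


def extract_by_example(root, example):
    """Collect the text of every node that shares the example's tag path."""
    path = tag_path(root, example)
    if path is None:
        return []
    if len(path) == 1:
        # The example is the root's own text; there is nothing to generalize.
        return [example]
    # Drop the root tag and query the remaining path relative to the root.
    query = "./" + "/".join(path[1:])
    return [(n.text or "").strip() for n in root.findall(query)]


root = ET.fromstring(PAGE.strip())
names = extract_by_example(root, "Japon Bistro")
```

Here a single example, "Japon Bistro", is enough to pick up "Hokusai" as well, because both sit at `tbody/tr/td/a`.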
Possible Attribute: restaurant name (3), artist name (1)
$\{a \mid \forall a, s : a \in att(s) \wedge (val(a,s) \subset V)\}$
…
Newly extracted data: Japon Bistro, Hokusai, Sushi Sasabune

Zagat
restaurant name | zagat Rating | .. | ..
Sushi Sasabune | 27 | .. | ..
Sushi Roku | 25 | .. | ..
Katana | 23 | .. | ..

Artist Info
artist name | nationality | .. | ..
Hokusai | Japanese | .. | ..
Renoir | French | .. | ..
.. | .. | .. | ..

LA Health Rating
restaurant name | Address | .. | Health Rating
Hokusai | 8400.. | .. | 90
Katana | 8439.. | .. | 99
Japon Bistro | 927 E.. | .. | 95
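The counts next to each possible attribute can be reproduced with a simple overlap tally. This is a sketch under the assumption that the number in parentheses is how many newly extracted values appear under that attribute across the repository; the function name `possible_attributes` and the dictionary layout are illustrative, not taken from Karma.

```python
from collections import Counter

# Repository tables as {source: {attribute: values}}, using the values shown above.
repository = {
    "Zagat": {
        "restaurant name": ["Sushi Sasabune", "Sushi Roku", "Katana"],
        "zagat Rating": [27, 25, 23],
    },
    "Artist Info": {
        "artist name": ["Hokusai", "Renoir"],
        "nationality": ["Japanese", "French"],
    },
    "LA Health Rating": {
        "restaurant name": ["Hokusai", "Katana", "Japon Bistro"],
        "Health Rating": [90, 99, 95],
    },
}

extracted = ["Japon Bistro", "Hokusai", "Sushi Sasabune"]


def possible_attributes(values, sources):
    """Rank attribute names by how many extracted values they already contain."""
    counts = Counter()
    wanted = set(values)
    for table in sources.values():
        for attribute, known in table.items():
            overlap = len(wanted & set(known))
            if overlap:
                counts[attribute] += overlap
    # Highest overlap first, as (attribute, count) pairs.
    return counts.most_common()


ranking = possible_attributes(extracted, repository)
```

With the tables above, "restaurant name" collects three matches (one from Zagat, two from LA Health Rating) and "artist name" collects one (Hokusai).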
Data repository

LA Health Rating
restaurant name | Address | .. | Health Rating
Hokusai | 8400.. | .. | 90
Katana | 8439.. | .. | 99
Japon Bistro | 927 E.. | .. | 95

Zagat
restaurant name | zagat Rating | .. | ..
Sushi Sasabune | 27 | .. | ..
Sushi Roku | 25 | .. | ..
Katana | 23 | .. | ..

Newly extracted data: Japon Bistro, Hokusai, Sushi Sasabune, Sushi Roka
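The newly extracted value "Sushi Roka" differs by one character from "Sushi Roku" in the repository. A minimal sketch of how repository values could serve as reference data for cleaning, assuming approximate string matching from Python's standard `difflib`; the talk does not say which similarity measure Karma uses, and `suggest_cleaning` and the 0.8 cutoff are illustrative choices.

```python
import difflib

# Restaurant names already known to the repository (Zagat and LA Health Rating).
reference_names = ["Sushi Sasabune", "Sushi Roku", "Katana", "Hokusai", "Japon Bistro"]


def suggest_cleaning(value, reference, cutoff=0.8):
    """Return the closest reference value, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = suggest_cleaning("Sushi Roka", reference_names)
```

Values that already match a reference entry come back unchanged, so the same call can be run over the whole column.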
Predefined Rules
. . .
Example: 31 Reviews ⇒ 31
Subset Rule: $(s_1 s_2 \ldots s_k) \Rightarrow (d_1 d_2 \ldots d_t) \wedge (k \le t) \wedge s_i \in \{d_1, d_2, \ldots, d_t\} \wedge d_i \ne d_j$
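A sketch of how a subset-style rule might be learned from one example and reused on other cells. The assumption here, which goes beyond what the slide states, is that the rule is remembered as the positions of the kept tokens; `learn_subset_rule` and `apply_subset_rule` are hypothetical names.

```python
def learn_subset_rule(original, cleaned):
    """Return the token positions of `original` that make up `cleaned`, or None."""
    source_tokens = original.split()
    positions = []
    for token in cleaned.split():
        if token not in source_tokens:
            # The cleaned value is not a subset of the original tokens.
            return None
        positions.append(source_tokens.index(token))
    return positions


def apply_subset_rule(positions, value):
    """Keep only the tokens at the learned positions; leave short values untouched."""
    tokens = value.split()
    if any(p >= len(tokens) for p in positions):
        return value
    return " ".join(tokens[p] for p in positions)


rule = learn_subset_rule("31 Reviews", "31")
cleaned = [apply_subset_rule(rule, v) for v in ["31 Reviews", "12 Reviews", "7 Reviews"]]
```

One corrected cell is enough to clean every cell in the column that has the same shape.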
{a}R = possible new attribute selection for row i.
{x} = set intersection of {a} over all the value rows.
$\{v\} = val(a,s)$, where $a \in \{x\}$ and $s$ is any source where $att(s) \cap \{x\} \neq \{\}$
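The definitions above can be traced on the repository tables. The sketch below is one reading of them, with assumptions labeled: a row's possible attributes are taken to be the attributes of any repository record that contains the row's value, {x} is their intersection across rows, and {v} is every value stored under an attribute in {x}. The function names and the record layout are hypothetical.

```python
# Repository as {source: list of records}, using the values shown in the tables.
repository = {
    "Zagat": [
        {"restaurant name": "Sushi Sasabune", "zagat Rating": 27},
        {"restaurant name": "Sushi Roku", "zagat Rating": 25},
        {"restaurant name": "Katana", "zagat Rating": 23},
    ],
    "LA Health Rating": [
        {"restaurant name": "Hokusai", "Address": "8400..", "Health Rating": 90},
        {"restaurant name": "Katana", "Address": "8439..", "Health Rating": 99},
        {"restaurant name": "Japon Bistro", "Address": "927 E..", "Health Rating": 95},
    ],
}


def row_attributes(row_value, sources):
    """{a} for one row: attributes of every record that contains the row's value."""
    attributes = set()
    for records in sources.values():
        for record in records:
            if row_value in record.values():
                attributes |= set(record)
    return attributes


def shared_attributes(row_values, sources):
    """{x}: attributes that are possible for every row."""
    per_row = [row_attributes(v, sources) for v in row_values]
    return set.intersection(*per_row) if per_row else set()


def candidate_values(attribute, sources):
    """{v}: all values stored under the attribute in any source that has it."""
    return {r[attribute] for records in sources.values() for r in records if attribute in r}


x = shared_attributes(["Japon Bistro", "Katana"], repository)
v = candidate_values("Health Rating", repository)
```

"zagat Rating" drops out of {x} because Japon Bistro has no Zagat record, which is why only attributes every row can fill are suggested.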
more time on Dapper/Pipes in general, then the non-programmer subjects would spend more time on Dapper/Pipes as well if they were to learn how to use those systems.
(2 assignments on DP)
Practice
Karma
Test (3 tasks)
between Karma vs. DP for each task
Karma
video capture software
Using 5 minutes cut off time
Task No. | Mashup Type | Data Extraction | Source Modeling | Data Cleaning | Data Integration
1 | 1 (1 source) | Moderate | Simple | Difficult | N/A
2 | 2,3 (union+form) | Difficult | Simple | Simple | Union (simple)
3 | 4 (join 2 sources) | Simple | Simple | N/A | Join (difficult)
Users with no programming experience can build all four Mashup types.
When the Mashup subtask is difficult, Karma takes less time to complete that subtask.
Overall, the user takes less time to build the same Mashup in Karma than in Dapper/Pipes.
Karma Karma (programmer) Dapper/Pipes
difficult, Dapper/Pipes takes
the task (11% for moderate and 25% for difficult) Dapper/Pipes Karma
tasks 2
attributes
data integration step.
because of union Dapper/Pipes Karma
more subjects are failing in Dapper/Pipes (35% for simple and 83% for hard)
Dapper/Pipes Karma
subjects can specify union indirectly by dropping data into the right cell
step allows Karma to suggest the linking source
case and 95% fail in the join case Dapper/Pipes Karma
Karma
1.13x 4.16x 6.49x 3.54x
Dapper/Pipes Karma
– Early work. Focus on DOM, too basic
– RDF / Manually specify data int
– Mainly focus on extraction / linear
– Widgets
– Fancier UI / more widgets
– Fewer Widgets / Confusion on workflow
– Create points on Map
– Q/A approach / linear / scalability
– Tuple = card. Drawing links for relations
– RoadRunner (exploit HTML structure) [Crescenzi et al., 2001]
– Adel (grammar induction to detect rows) [Lerman+ 2001]
– VisualWeb (OCR technique to detect tables) [Gatterbauer+ 2007]

– WIEN (inductive – less expressive than Stalker) [Kushmerick 1997]
– Stalker (Cotesting) [Muslea+ 1999]
– SoftMealy (finite state transducer) [Hsu 1998]
– WHISK (rigid format, exact delimiter) [Soderland 1998]

– Simile [Huynh+ 2005]
– Dapper
– Interactive Wrapper Generation (ML + prediction on DOM) [Irmak+ 2006]
– PLOW (add natural language) [Allen+ 2007]
– Cards [Dontcheva+ 2007]
– Karma [Tuchinda+ 2008]
– Schema-level match
[Milo+ 98]
[Palopoli+ 99]
[Castano+ 01]
[Clifton+ 97]
– +Instance-based matcher
[Li 00]
[Doan 01]
[Etzioni 95]
[Dhamankar 04]
[Ling 01]
[Carman 07]
– String Similarities [Cohen+ 2003]
– ACR/Data, Migration Architect
[Chaudhuri+ 1997]
cleaning system
– Levenshtein distance
[Needleman+ 70]
– Vector based
[Baeza-Yates+ 99]
– EM
[Ristad+ 98]
– SVM
[Bilenko+ 03]
– Fuzzy Match
[Chaudhuri+ 03]
– Apollo
[Michalowski+ 05]
– Phoebus
[Michelson+ 07]
– Potter’s wheel
[Raman+ 01]
– Gained reference sources through source modeling process – Provided predefined transformations
formulate the query [Ullman 1980, 1988]
not return results
– QBE [Zloof 1975]
partial description
– Helgon [Fischer 1989]
– RABBIT [Williams 1982]
(graphs)
– Gql [Benzi 1998, Haw 1994, Papantonakis 1988]
required.
– Agent Wizard [Tuchinda+ 2004]
required
– Clio [Ling 01]
Source: http://www.w3.org