Learning Semantic Definitions for Information Sources on the Internet



slide-1
SLIDE 1

Doctoral Thesis:

Learning Semantic Definitions for Information Sources on the Internet

Mark James Carman

Advisors:

  • Prof. Paolo Traverso
  • Prof. Craig A. Knoblock
slide-2
SLIDE 2

24 April 2007, Thesis Defense - Mark James Carman

Abundance of Information Sources

Motivation Approach Search Scoring Extensions Experiments Related Work Conclusions

[Figure: a crowd of online information sources: Orbitz Travel Deals, Cheap Flights, Travelocity Airfares, Used Cars for Sale!, Yahoo Classifieds, Google Base Hotels, Tsunami Warnings!, Exchange Rates, Weather Forecasts, Realtime Stock Quote, Weather Conditions, Package Deals, Last Minute Flights, New Cars for Sale!, Classified Listings, Hotel Deals, Earthquake Data, Currency Rates, Stock Quotes, Flight Status]

slide-3
SLIDE 3

Bringing the Data Together

[Figure: the travel-related sources: Orbitz Travel Deals, Cheap Flights, Travelocity Airfares, Google Base Hotels, Exchange Rates, Weather Forecasts, Package Deals, Last Minute Flights, Hotel Deals, Flight Status]

slide-4
SLIDE 4

Bringing the Data Together

[Figure: the same travel-related sources, next animation step: Orbitz Travel Deals, Cheap Flights, Travelocity Airfares, Google Base Hotels, Exchange Rates, Weather Forecasts, Package Deals, Last Minute Flights, Hotel Deals, Flight Status]

slide-5
SLIDE 5

Mediators resolve Heterogeneity

[Figure: a Mediator sitting in front of the travel-related sources: Orbitz Travel Deals, Cheap Flights, Travelocity Airfares, Google Base Hotels, Exchange Rates, Weather Forecasts, Package Deals, Last Minute Flights, Hotel Deals, Flight Status]

slide-6
SLIDE 6

Mediators Require Source Definitions

  • New service => no source definition!
  • Can we discover a definition automatically?

Query: SELECT MIN(price) FROM flight WHERE depart=“LAX” AND arrive=“MXP”

Reformulated Query: lowestFare(“LAX”,“MXP”)
Reformulated Query: calcPrice(“LAX”,“MXP”,“economy”)

[Figure: the Mediator takes the Query and sends Reformulated Queries to the sources Orbitz Flight Search, United Airlines, Qantas Specials and Alitalia]

Source Definitions:

  • Orbitz Flight Search
  • United Airlines
  • Qantas Specials

slide-7
SLIDE 7

Inducing Source Definitions by Example

  • Step 1: classify input & output semantic types

Known Source 1: source1($zip, lat, long) :- centroid(zip, lat, long).
Known Source 2: source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
Known Source 3: source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).

New Source 4: source4($startZip, $endZip, separation)

Semantic types: zipcode, distance

Assume this problem has been solved!

slide-8
SLIDE 8

Inducing Source Definitions - Step 2

  • Step 1: classify input & output semantic types
  • Step 2: generate plausible definitions

Known Source 1: source1($zip, lat, long) :- centroid(zip, lat, long).
Known Source 2: source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
Known Source 3: source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).

New Source 4: source4($zip1, $zip2, dist)

Candidate definition in terms of the known sources:

source4($zip1, $zip2, dist) :-
    source1(zip1, lat1, long1),
    source1(zip2, lat2, long2),
    source2(lat1, long1, lat2, long2, dist2),
    source3(dist2, dist).

The same definition unfolded into domain predicates:

source4($zip1, $zip2, dist) :-
    centroid(zip1, lat1, long1),
    centroid(zip2, lat2, long2),
    greatCircleDist(lat1, long1, lat2, long2, dist2),
    convertKm2Mi(dist2, dist).

slide-9
SLIDE 9

Inducing Source Definitions – Step 3

  • Step 1: classify input & output semantic types
  • Step 2: generate plausible definitions
  • Step 3: invoke service & compare output

source4($zip1, $zip2, dist) :-
    source1(zip1, lat1, long1),
    source1(zip2, lat2, long2),
    source2(lat1, long1, lat2, long2, dist2),
    source3(dist2, dist).

source4($zip1, $zip2, dist) :-
    centroid(zip1, lat1, long1),
    centroid(zip2, lat2, long2),
    greatCircleDist(lat1, long1, lat2, long2, dist2),
    convertKm2Mi(dist2, dist).

$zip1   $zip2   dist (actual)   dist (predicted)
80210   90266   842.37          843.65
60601   15201   410.31          410.83
10005   35555   899.50          899.21

match
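
The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CENTROIDS` table, the haversine formula and the function names are hypothetical stand-ins for the three known sources, and the coordinates are rough values chosen for the example, not data from the thesis. Only the `ROWS` pairs are taken from the table above.

```python
import math

# Hypothetical stand-in for Known Source 1. The coordinates are rough,
# illustrative centroids and are NOT taken from the thesis.
CENTROIDS = {
    "80210": (39.68, -104.96),
    "90266": (33.89, -118.40),
}


def centroid(zip_code):
    """Known Source 1 stand-in: zip code -> (lat, long)."""
    return CENTROIDS[zip_code]


def great_circle_dist(lat1, long1, lat2, long2):
    """Known Source 2 stand-in: haversine distance in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def convert_km2mi(km):
    """Known Source 3 stand-in: kilometres -> miles."""
    return km / 1.609344


def candidate_source4(zip1, zip2):
    """Candidate definition: compose the three known sources."""
    lat1, long1 = centroid(zip1)
    lat2, long2 = centroid(zip2)
    dist_km = great_circle_dist(lat1, long1, lat2, long2)
    return convert_km2mi(dist_km)


def relative_error(predicted, actual):
    return abs(predicted - actual) / abs(actual)


# (actual, predicted) distance pairs copied from the table on the slide.
ROWS = [(842.37, 843.65), (410.31, 410.83), (899.50, 899.21)]

if __name__ == "__main__":
    print(candidate_source4("80210", "90266"))
    for actual, predicted in ROWS:
        print(actual, predicted, relative_error(predicted, actual))
```

All three rows of the table differ by well under 1%, which is the kind of agreement the slide labels a match.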

slide-10
SLIDE 10

Overlapping Data Requirement

  • Assumption: overlap between new & known sources
  • Nonetheless, the technique is widely applicable:
      • Redundancy
      • Scope or Completeness
      • Binding Constraints
      • Composed Functionality
      • Access Time

[Figure: example sources: Bloomberg Currency Rates, Worldwide Hotel Deals, 5* Hotels By State, Distance Between Zipcodes, Government Hotel List, Great Circle Distance, Centroid of Zipcode, Hotels By Zipcode, US Hotel Rates, Yahoo Exchange Rates, Google Hotel Search]

slide-11
SLIDE 11

Searching for Definitions

  • Search space of conjunctive queries:

    target(X) :- source1(X1), source2(X2), …

  • For scalability don’t allow negation or union
  • Perform Top-Down Best-First Search

    Invoke target with set of random inputs;
    Add empty clause to queue;
    while (queue not empty)
      v := best definition from queue;
      forall (v’ in Expand(v))
        if ( Eval(v’) > Eval(v) )
          insert v’ into queue;

  • 1. First sample the New Source
  • 2. Then perform best-first search through space of candidate definitions

Expressive Language: sufficient for modeling most online sources
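
The loop on this slide could be written in Python roughly as follows. This is a minimal sketch, not the thesis implementation: `best_first_search`, the `max_expansions` cap and the toy `expand`/`evaluate` functions are all assumptions made for illustration, and a clause is reduced to a tuple of predicate names.

```python
import heapq
from itertools import count


def best_first_search(empty_clause, expand, evaluate, max_expansions=1000):
    """Best-first search over candidate clauses.

    expand(v) yields specialisations of v; evaluate(v) returns a score
    where higher is better. Returns the best clause seen and its score.
    """
    tie = count()  # tie-breaker so the heap never compares clauses directly
    start_score = evaluate(empty_clause)
    queue = [(-start_score, next(tie), empty_clause)]
    best, best_score = empty_clause, start_score
    expansions = 0
    while queue and expansions < max_expansions:
        neg_score, _order, v = heapq.heappop(queue)  # best definition in queue
        v_score = -neg_score
        expansions += 1
        for child in expand(v):
            child_score = evaluate(child)
            if child_score > v_score:  # keep only strict improvements
                heapq.heappush(queue, (-child_score, next(tie), child))
                if child_score > best_score:
                    best, best_score = child, child_score
    return best, best_score


# Toy problem (hypothetical): find the set of predicates in TARGET.
TARGET = {"a", "b"}
PREDICATES = ("a", "b", "c")


def expand(clause):
    """Add one predicate at a time, up to three literals."""
    if len(clause) < 3:
        for p in PREDICATES:
            yield clause + (p,)


def evaluate(clause):
    """Jaccard overlap between the clause's predicates and TARGET."""
    got = set(clause)
    union = got | TARGET
    return len(got & TARGET) / len(union) if union else 0.0


if __name__ == "__main__":
    print(best_first_search((), expand, evaluate))
```

A priority queue keyed on the negated score gives the "best definition from queue" step without re-sorting the queue on every iteration.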

slide-12
SLIDE 12

Invoking the Target

Invoke source with representative values

  • Try randomly generating input tuples:
      • Combine examples of each type
      • Use distribution if available

New Source 5: source5($zip1, $dist1, zip2, dist2)

Generate Input Tuples: <zip1, dist1>, then Invoke

Randomly Combined Example Values:

Input <zip1, dist1>     Output <zip2, dist2>
<07307, 50.94>          {<07097, 0.26>, <07030, 0.83>, <07310, 1.09>, ...}   Non-empty Result
<60632, 10874.2>        {}                                                   Empty Result

slide-13
SLIDE 13

Invoking the Target

Invoke source with representative values

  • Try randomly generating input tuples:
      • Combine examples of each type
      • Use distribution if available
  • If only empty invocations result
      • Try invoking other sources to generate input
  • Continue until sufficient non-empty invocations result

Generate Input Tuples: <zip1, dist1>, then Invoke

New Source 5: source5($zip1, $dist1, zip2, dist2)
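
The sampling procedure can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the function `sample_inputs`, its parameters, and `fake_source5`, which merely pretends that the service returns nothing for very large distances. The example values are borrowed from the slides. The fallback of invoking other sources to generate inputs is not shown.

```python
import random


def sample_inputs(examples_by_type, input_types, invoke, needed=3,
                  max_tries=200, seed=0):
    """Randomly combine example values of each input type and invoke the
    source until `needed` non-empty results have been collected."""
    rng = random.Random(seed)
    non_empty, empty = [], []
    for _ in range(max_tries):
        if len(non_empty) >= needed:
            break
        candidate = tuple(rng.choice(examples_by_type[t]) for t in input_types)
        output = invoke(candidate)
        if output:
            non_empty.append((candidate, output))
        else:
            empty.append(candidate)
    return non_empty, empty


# Example values per semantic type, borrowed from the slides.
EXAMPLES = {
    "zipcode": ["07307", "60632", "28041"],
    "distance": [50.94, 240.46, 10874.2],
}


def fake_source5(inputs):
    """Hypothetical stand-in for the new source: empty output when the
    requested distance is implausibly large."""
    _zip1, dist1 = inputs
    return [("07097", 0.26)] if dist1 < 1000 else []


if __name__ == "__main__":
    hits, misses = sample_inputs(EXAMPLES, ["zipcode", "distance"], fake_source5)
    print(hits)
    print(misses)
```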

slide-14
SLIDE 14

Top-down Generation of Candidates

Start with empty clause & generate specialisations by

  • Adding one predicate at a time from set of sources
  • Checking that each definition is:
      • Not logically redundant
      • Executable (binding constraints satisfied)

New Source 5: source5($zip1,$dist1,zip2,dist2)

Expand:

source5(_,_,_,_).
source5(zip1,_,_,_) :- source4(zip1,zip1,_).
source5(zip1,_,zip2,dist2) :- source4(zip2,zip1,dist2).
source5(_,dist1,_,dist2) :- <(dist2,dist1).
…

slide-15
SLIDE 15

Best-first Enumeration of Candidates

  • Evaluate each clause produced
  • Then expand best one found so far
  • Expand high-arity predicates incrementally

New Source 5, best clause so far:

source5(zip1,_,zip2,dist2) :- source4(zip2,zip1,dist2).

Expand:

source5(zip1,dist1,zip2,dist2) :- source4(zip2,zip1,dist2), source4(zip1,zip2,dist1).
source5(zip1,dist1,zip2,dist2) :- source4(zip2,zip1,dist2), <(dist2,dist1).
…

slide-16
SLIDE 16

Limiting the Search

  • Extremely Large Search space
      • Constrained by use of Semantic Types
  • Limit search by:
      • Maximum Clause length
      • Maximum Predicate Repetition
      • Maximum Number of Existential Variables
      • Definition must be Executable
      • Maximum Variable Repetition within Literal

[Slide callouts: Standard ILP techniques; Non-standard technique]
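
Two of these limits are simple enough to show as a check in Python. The sketch below is illustrative only: the function name and the representation of a clause body as a list of predicate names are assumptions, while the default values 7 and 2 are the settings quoted on the experimental setup slide.

```python
from collections import Counter


def within_search_bias(body_predicates, max_clause_length=7,
                       max_predicate_repetition=2):
    """Return True if a clause body respects the length and repetition limits.

    body_predicates is the list of predicate names in the clause body,
    e.g. ["source1", "source1", "source2", "source3"].
    """
    if len(body_predicates) > max_clause_length:
        return False
    counts = Counter(body_predicates)
    return all(n <= max_predicate_repetition for n in counts.values())


if __name__ == "__main__":
    # The zip-code distance definition uses source1 twice: allowed.
    print(within_search_bias(["source1", "source1", "source2", "source3"]))
    # Three uses of the same predicate exceed the repetition limit.
    print(within_search_bias(["source1", "source1", "source1"]))
```

Candidates that fail such a check would simply not be added to the search queue.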

slide-17
SLIDE 17

Evaluating Candidates

  • Compare output of clause with that of target.
  • Average the results across different input tuples.

[Figure: an <Input Tuple> is used to invoke (target) New Source 5, giving Target Tuples, and to execute (clause) over the Known Sources, giving Clause Tuples; the two outputs are then compared]

slide-18
SLIDE 18

Evaluating Candidates II

Candidates may return multiple tuples per input

  • Need measure that compares sets of tuples!

Input <$zip1, $dist1>: <28041, 240.46>
    Target Output <zip2, dist2>: {<28072, 1.74>, <28146, 3.41>, <28138, 3.97>, …}
    Clause Output <zip2, dist2>: {<28072, 1.74>, <28146, 3.41>}
    Overlap!

Input <$zip1, $dist1>: <07307, 50.94>
    Target Output <zip2, dist2>: {<07097, 0.26>, <07030, 0.83>, <07310, 1.09>, ...}
    Clause Output <zip2, dist2>: {}
    No Overlap

Input <$zip1, $dist1>: <60632, 874.2>
    Target Output <zip2, dist2>: {}
    Clause Output <zip2, dist2>: {<60629, 2.15>, <60682, 2.27>, <60623, 2.64>, ..}
    No Overlap

slide-19
SLIDE 19

Evaluating Candidates III

PROBLEM: All sources assumed incomplete

  • Even optimal definition may only produce overlap
  • Want definition that best predicts the target’s output
  • Use Jaccard similarity to score candidates

    forall (tuple in InputTuples)
      T_target = invoke(target, tuple)
      T_clause = execute(clause, tuple)
      if not (|T_target| = 0 and |T_clause| = 0)
        fitness = |T_clause ∩ T_target| / |T_clause ∪ T_target|
    return average(fitness)

Notes on the procedure:

  • At least half of input tuples are non-empty invocations of target
  • Similarity metric is Jaccard similarity between the sets
  • Average results only when output is returned
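
The scoring procedure translates almost directly into Python. In this sketch the function name `score_clause` and the dictionary-backed lookups are assumptions for illustration. The sample data reuses only the tuples actually listed on the previous slide; the elided ones ("…") are left out, so the number it produces is illustrative rather than a result from the thesis.

```python
def score_clause(input_tuples, invoke_target, execute_clause):
    """Average Jaccard similarity between target and clause outputs,
    skipping inputs for which both return nothing."""
    fitness = []
    for tup in input_tuples:
        t_target = set(invoke_target(tup))
        t_clause = set(execute_clause(tup))
        if t_target or t_clause:  # not (both empty)
            fitness.append(len(t_target & t_clause) / len(t_target | t_clause))
    return sum(fitness) / len(fitness) if fitness else 0.0


# Only the tuples listed on the slide; elided entries are omitted.
TARGET_OUT = {
    ("28041", 240.46): [("28072", 1.74), ("28146", 3.41), ("28138", 3.97)],
    ("07307", 50.94): [("07097", 0.26), ("07030", 0.83), ("07310", 1.09)],
    ("60632", 874.2): [],
}
CLAUSE_OUT = {
    ("28041", 240.46): [("28072", 1.74), ("28146", 3.41)],
    ("07307", 50.94): [],
    ("60632", 874.2): [("60629", 2.15), ("60682", 2.27), ("60623", 2.64)],
}

if __name__ == "__main__":
    score = score_clause(list(TARGET_OUT), TARGET_OUT.get, CLAUSE_OUT.get)
    print(score)
```

With these three inputs the per-input similarities are 2/3, 0 and 0, so only the first input contributes to the average.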

slide-20
SLIDE 20

Missing Output Attributes

  • Some candidates produce fewer output attributes:
      • Makes comparing them difficult
  • Penalize candidate by number of “negative examples”
      • First candidate doesn’t produce either output, thus:
        Penalty = |{zipcode}| x |{distance}|
      • For numeric types use accuracy to approximate cardinality

Target: source5($zipcode, $distance, zipcode, distance)

  1. source5(zip1,_,_,_) :- source4(zip1,zip1,_).
  2. source5(zip1,_,zip2,dist2) :- source4(zip2,zip1,dist2).
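
As a rough illustration of the penalty term, the Python sketch below multiplies assumed domain sizes for every output attribute a candidate fails to produce. The function name and the `CARDINALITY` table are hypothetical. The zip code figure echoes the "> 40,000 zip codes in US" remark elsewhere in the talk; the figure for distance is one possible reading of "use accuracy to approximate cardinality" (a 1% accuracy treated as about 100 distinguishable values) and should be taken as an assumption, not as the thesis's method.

```python
import math

# Assumed domain sizes, for illustration only.
CARDINALITY = {
    "zipcode": 40_000,   # order of magnitude quoted elsewhere in the talk
    "distance": 100,     # ASSUMPTION: 1% accuracy ~ 100 distinguishable values
}


def missing_output_penalty(target_outputs, clause_outputs,
                           cardinality=CARDINALITY):
    """Product of domain sizes over output types the clause does not produce.

    A value of 1 means no output attribute is missing.
    """
    missing = [t for t in target_outputs if t not in clause_outputs]
    return math.prod(cardinality[t] for t in missing)


if __name__ == "__main__":
    outputs = ["zipcode", "distance"]
    # Candidate 1 produces neither output attribute.
    print(missing_output_penalty(outputs, []))
    # Candidate 2 produces both, so nothing is missing.
    print(missing_output_penalty(outputs, ["zipcode", "distance"]))
```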

Motivation Approach Search Scoring Extensions Experiments Related Work Conclusions

slide-21
SLIDE 21

Different Input Attributes

  • Some clauses take different inputs from target:
      • zip2 is an input parameter for clause but not target
      • Should invoke operation with every possible zip code!
  • Problem: algorithm should return & not get banned!
  • Solution: sample to estimate score for clause:
      • record the scaling factor = |{zipcode}| / # invocations
      • bias search: choose at least half of tuples to be positive

source5($zip1,$dist1,zip2,_) :- source4($zip1,$zip2,dist1).

Target Input: $zip1, $dist1. Clause Input: $zip1, $zip2.

> 40,000 zip codes in US

slide-22
SLIDE 22

Approximating Equality

Allow flexibility in values from different sources

  • Numeric Types like distance
      Error Bounds (e.g. +/- 1%)
      10.6 km ≈ 10.54 km
  • Nominal Types like company
      String Distance Metrics (e.g. JaroWinkler Score > 0.9)
      Google Inc. ≈ Google Incorporated
  • Complex Types like date
      Hand-written equality checking procedures.
      Mon, 31. July 2006 ≈ 7/31/06
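
The three kinds of approximate equality can be sketched in Python as below. This is an illustration under assumptions, not the thesis code: the function names are invented, the Jaro-Winkler routine is a plain textbook version written out here because the standard library has none, the nominal threshold uses the 0.85 value quoted in the experimental setup, and the date procedure only knows the two formats shown on the slide (month and weekday names are parsed in the default English/C locale).

```python
from datetime import datetime


def numeric_equal(a, b, tolerance=0.01):
    """Equal if the values differ by at most 1% of the larger magnitude."""
    return abs(a - b) <= tolerance * max(abs(a), abs(b))


def jaro(s1, s2):
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_weight * (1 - j)


def nominal_equal(a, b, threshold=0.85):
    return jaro_winkler(a, b) > threshold


# Hand-written date check: only the two formats from the slide are known.
DATE_FORMATS = ("%a, %d. %B %Y", "%m/%d/%y")


def parse_date(text):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def date_equal(a, b):
    return parse_date(a) == parse_date(b)


if __name__ == "__main__":
    print(numeric_equal(10.6, 10.54))
    print(jaro_winkler("Google Inc.", "Google Incorporated"))
    print(date_equal("Mon, 31. July 2006", "7/31/06"))
```

Keeping one procedure per semantic type means the comparison of output tuples can dispatch on the type of each attribute.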

slide-23
SLIDE 23

Extensions

Many extensions to basic algorithm are discussed in thesis:

  • Inverse and functional sources
  • Constants in the modeling language
  • Post-processing (tightening) of definitions
  • Search heuristics based on semantic types
  • Caching & determining if source is blocking

slide-24
SLIDE 24

Experiments – Setup

Problems:

  • 25 target predicates involving real services
  • same domain model used for each problem (70 Semantic Types and 37 Predicates)
  • 35 known sources

System Settings:

  • Each target source invoked at least 20 times
  • Time limit of 20 minutes imposed

Inductive search bias:

  • Maximum clause length 7
  • Predicate repetition limit 2
  • Maximum variable level 5
  • Candidate must be executable
  • Only 1 variable occurrence per literal

Equality Approximations:

  • 1% for distance, speed, temperature & price
  • 0.002 degrees for latitude & longitude
  • JaroWinkler > 0.85 for company, hotel & airport
  • hand-written procedure for date.
slide-25
SLIDE 25

Actual Learned Examples

1 GetDistanceBetweenZipCodes($zip0, $zip1, dis2) :-
    GetCentroid(zip0, lat1, lon2),
    GetCentroid(zip1, lat4, lon5),
    GetDistance(lat1, lon2, lat4, lon5, dis10),
    ConvertKm2Mi(dis10, dis2).

2 USGSElevation($lat0, $lon1, dis2) :-
    ConvertFt2M(dis2, dis1),
    Altitude(lat0, lon1, dis1).

3 YahooWeather($zip0, cit1, sta2, _, lat4, lon5, day6, dat7, tem8, tem9, sky10) :-
    WeatherForecast(cit1, sta2, _, lat4, lon5, _, day6, dat7, tem9, tem8, _, _, sky10, _, _, _),
    GetCityState(zip0, cit1, sta2).

4 GetQuote($tic0, pri1, dat2, tim3, pri4, pri5, pri6, pri7, cou8, _, pri10, _, _, pri13, _, com15) :-
    YahooFinance(tic0, pri1, dat2, tim3, pri4, pri5, pri6, pri7, cou8),
    GetCompanyName(tic0, com15, _, _),
    Add(pri5, pri13, pri10),
    Add(pri4, pri10, pri1).

5 YahooAutos($zip0, $mak1, dat2, yea3, mod4, _, _, pri7, _) :-
    GoogleBaseCars(zip0, mak1, _, mod4, pri7, _, _, yea3),
    ConvertTime(dat2, _, dat10, _, _),
    GetCurrentTime(_, _, dat10, _).

Slide callouts:

  • Distinguished forecast from current conditions
  • current price = yesterday’s close + change

slide-26
SLIDE 26

Experimental Results

  • Results for different domains:

Problem Domain   # of Problems   Avg. # of Candidates   Avg. Time (sec)   Attributes Learnt
geospatial       9               136                    303               84%
financial        2               1606                   335               59%
weather          7               368                    693               69%
hotels           4               43                     374               60%
cars             2               68                     940               50%

slide-27
SLIDE 27

Comparison with Other Systems

ILA & Category Translation (Perkowitz & Etzioni 1995)

Learn functions describing operations on internet

  • My system learns more complicated definitions
      • Multiple attributes, Multiple output tuples, etc.

iMAP (Dhamankar et al. 2004)

Discovers complex (many-to-1) mappings between DB schemas

  • My system learns many-to-many mappings
  • My approach is more general (single search algorithm)
  • Deal with problem of invoking sources

slide-28
SLIDE 28

Conclusions

Learning procedure for online information services is:

  1. Automated
  2. Expressive (conjunctive queries)
  3. Efficient (access sources only as required)
  4. Robust (to noisy and incomplete data)
  5. Evolving (improves with # of known sources)
  6. Scalable (for moderate size domain model)

Generate Semantic Metadata for Semantic Web

  • Little motivation for providers to annotate services
  • Instead we generate metadata automatically

slide-29
SLIDE 29
