Craig Knoblock University of Southern California 1
Learning to Optimize Plan Execution in Information Agents
Craig A. Knoblock, University of Southern California
Acknowledgements
Electric Elves
Jose Luis Ambite
Maria Muslea
Hans Chalupsky
Yolanda Gil
Jean Oh
David V. Pynadath
Thomas A. Russ
Milind Tambe
Theseus Agent Execution
Greg Barish
Steve Minton
Maria Muslea
Speculative Execution
Greg Barish
Funding
DARPA
AFOSR
NSF
Sites provide limited capabilities for personalization
Few sites are designed to be integrated with
Build agents that can perform retrieval, integration, and monitoring tasks on any
Elves project goal: Apply agent technology to support human organizations
Applications: Office Elves and Travel Elves
[Figure: Electric Elves architecture: WWW, Agent Proxies for People, Information Agents, Ontology-based Matchmakers, GRID]
Travel Elves created as an application of the Electric Elves
Given travel itinerary, generates set of agents for anticipating travel-related failures and
Price changes
Schedule changes
Flight delays & cancellations
Earlier and close connections
Finding the closest restaurant given GPS coordinates
[Figure: Travel Itinerary; WWW, Agent Proxies for People, Information Agents, Ontology-based Matchmakers, GRID; sources: Flight Prices & Schedules, Weather, Flight Status, Restaurants]
Flight-Status Agent:
Flight delayed message:
Your United Airlines flight 190 has been delayed. It was originally scheduled to depart at 11:45 AM and is now scheduled to depart at 12:30 PM. The new arrival time is 7:59 PM.
Flight cancelled message:
Your Delta Air Lines flight 200 has been cancelled.
Fax to hotel message:
Attention: Registration Desk
I am sending this message on behalf of David Pynadath, who has a reservation at your hotel. David Pynadath is on United Airlines 190, which is now scheduled to arrive at IAD at 7:59 PM. Since the flight will be arriving late, I would like to request that you indicate this in the reservation so that the room is not given away.
Airfare Agent: Airfare dropped message
The airfare for your American Airlines itinerary (IAD - LAX) dropped to $281.
Earlier-Flight Agent: Earlier flights message
The status of your currently scheduled flight is:
# 190 LAX (11:45 AM) - IAD (7:29 PM) 45 minutes Late
If you would like to return earlier, the following United Airlines flights will arrive earlier than your scheduled flights:
# 946 LAX (8:31 AM) - IAD (3:35 PM) 11 minutes Late
# 388 LAX (9:25 AM) - DEN (12:25 PM) 10 minutes Late
# 1534 DEN (1:20 PM) - IAD (6:06 PM) On Time
Information gathering may involve accessing and integrating data from many sources
Total time to execute these plans may be large
Slow remote sources
Unpredictable network latencies
Binding patterns
Source cannot be queried until a previous query has been answered
Result: execution is often I/O-bound
Plan language and execution system for Web-based information integration
Expressive enough for monitoring a variety of sources
Efficient enough for real-time monitoring
Theseus Executor
PLAN myplan {
  INPUT: x
  OUTPUT: y
  BODY { Op (x : y) }
}
[Figure: the Theseus Executor takes a Plan and Input Data]
Examples: Wrapper, Select, etc.
Operators produce and consume data
Operators “fire” upon any input data
Plan: Wrapper, Select, Join, Wrapper
Input relation:
City | State | Max Price
Santa Monica | CA | 200000
Output relation:
Address
100 Main St., Santa Monica, 90292
520 4th St., Santa Monica, 90292
2 Ocean Blvd, Venice, 90292
Operations scheduled by data availability
Independent operations execute in parallel
Maximizes horizontal parallelism
Example: computing (a*b) + (c*d)
[Figure: two MUL operators over inputs a, b and c, d feeding one ADD operator]
Operations emit data as soon as possible
Independent data processed in parallel
Maximizes vertical parallelism
[Figure: Producer and Consumer; MUL, MUL, ADD over inputs a, b, c, d]
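The two kinds of parallelism above can be sketched in a few lines of Python. This is only an illustration, not the Theseus executor: the function names are hypothetical, threads stand in for operators that fire on data availability, and generators stand in for streaming between a producer and a consumer.

```python
from concurrent.futures import ThreadPoolExecutor


def mul(x, y):
    return x * y


def add(x, y):
    return x + y


def sum_of_products(a, b, c, d):
    """Horizontal parallelism: the two MUL operators share no inputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(mul, a, b)    # runs independently of `right`
        right = pool.submit(mul, c, d)
        # ADD fires once both of its inputs are available.
        return add(left.result(), right.result())


def producer(values):
    """Vertical parallelism: emit each result as soon as it is ready."""
    for v in values:
        yield v * 2


def consumer(stream):
    """Starts on the first item before the producer has emitted the last."""
    for v in stream:
        yield v + 1


total = sum_of_products(2, 3, 4, 5)
streamed = list(consumer(producer([1, 2, 3])))
```

Here `total` is (2*3) + (4*5) and `streamed` holds one consumed value per produced value, each handed over item by item rather than as a finished batch.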
Prices of used cars
Safety ratings
Reviews
2002 Midsize coupe/hatchback, $4K-$12K, No Oldsmobiles
meet criteria
Oldsmobiles
reviews for each
reviews of each
Standard von Neumann model
Execute one operation at a time
Each operation processes all of its input before output is used for next operation
Assume: 1000ms per I/O op, 100ms per CPU op
Execution time = 13.4 sec
[Figure: execution timeline, time (seconds) 0 to 15, for Select, Edmunds, NHTSA, CG Search, CG Summary, CG Full; legend: CPU-bound operation, I/O-bound operation]
[Figure: dataflow plan for the used-car example]
Operators: WRAPPER Edmunds Search, SELECT maker != "Oldsmobile", WRAPPER NHTSA Search, WRAPPER ConsumerGuide Search, WRAPPER ConsumerGuide Summary, WRAPPER ConsumerGuide Full Review, JOIN
search criteria: (Midsize coupe/hatchback, $4000 to $12000, 2002)
Car list before SELECT: ((Oldsmobile Alero), (Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))
Car list after SELECT: ((Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))
Intermediate URLs: ((http://cg.com/summ/20812.htm), and ((http://cg.com/full/20812.htm),
Outputs: (safety reports), (car reviews)
Improved, but plan remains I/O-bound (76%)
Main problem: remote source latencies
Meanwhile, local resources are wasted
Complicating factor: binding constraints
Remote queries dependent on other remote queries
Question: How can execution be more efficient?
[Figure: execution timeline, time (seconds) 0 to 15, for Select, Edmunds, CG Search, CG Summary, CG Full, Join]
Execute operators ahead of schedule
Predict data based on past execution
Allows greater degree of parallelism
Solves the problem caused by binding constraints
Can lead to speedups > streaming dataflow
[Figure: execution timeline, time (seconds) 0 to 15, for Select, Edmunds, CG Search, CG Summary, CG Full, Join]
General problem
Means for issuing and confirming predictions
Two new operators
Speculate: Makes predictions based on "hints"
Confirm: Prevents errant results from exiting plan
[Figure: Speculate operator with hints, answers, predictions/additions, and confirmations]
[Figure: Confirm operator with confirmations, probable results, and actual results]
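A minimal sketch of how the two operators could cooperate, assuming predictions are plain lists and results are keyed by the predicted value. The class layout, method names, and the lambda predictor are hypothetical and are not taken from Theseus.

```python
class Speculate:
    """Issues predictions from a hint, then checks them against real answers."""

    def __init__(self, predictor):
        # Hypothetical: any callable mapping a hint to a list of predicted values.
        self.predictor = predictor

    def predict(self, hint):
        return list(self.predictor(hint))

    def verify(self, predictions, answers):
        # Confirmations cover predictions that turned out to be right;
        # additions are real answers that were never predicted.
        confirmed = [p for p in predictions if p in answers]
        additions = [a for a in answers if a not in predictions]
        return confirmed, additions


class Confirm:
    """Lets only results derived from confirmed predictions leave the plan."""

    def release(self, probable_results, confirmed):
        return {k: v for k, v in probable_results.items() if k in confirmed}


spec = Speculate(lambda hint: ["Dodge Stratus", "Honda Accord"])
predictions = spec.predict("2002 Midsize coupe")
# Downstream operators work on the predictions before the real answer arrives.
probable = {car: "review of " + car for car in predictions}
# The slow source finally answers; one prediction was wrong and one car was missed.
confirmed, additions = spec.verify(predictions, ["Dodge Stratus", "Mercury Cougar"])
actual = Confirm().release(probable, confirmed)
```

In this run the result for the wrongly predicted car never leaves the plan, and the missed car appears in `additions` so it can be processed normally.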
BEFORE
Same criteria yields same cars
[Figure: original plan with operators J, S, and five W]
AFTER
Same criteria yields same cars
[Figure: plan with Speculate and Confirm operators added alongside J, S, and W; Speculate is labeled with hints, answers, predictions/additions, and confirmations]
[Figure, repeated at each step: plan with J, S, W, Speculate, and Confirm operators]
Time = 0.0 sec: 2002 Midsize coupe $4000-$12000
Time = 0.1 sec: Oldsmobile Alero (T1), Dodge Stratus (T2), Pontiac Grand Am (T3), Mercury Cougar (T4)
Time = 0.2 sec: Dodge Stratus (T2), Pontiac Grand Am (T3), Mercury Cougar (T4)
Time = 1.0 sec: Oldsmobile Alero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar
Time = 1.1 sec: T1, T2, T3, T4; additions (corrections), if any
Time = 3.2 sec: Dodge Stratus (safety) (review) (T2), Pontiac Grand Am (safety) (review) (T3), Mercury Cougar (safety) (review) (T4)
Time = 3.3 sec: Dodge Stratus (safety) (review), Pontiac Grand Am (safety) (review), Mercury Cougar (safety) (review)
Confirm blocks predictions (and their results) from exiting the plan before verification
CPU
Speculative operations use "speculative threads"
Lower priority threads
Memory and bandwidth
Speculative operations allocate "speculative resources"
Drawn from "speculative pool" of memory / objects
Single speculation allows a max speedup of 2
Time spent either speculating or confirming
Cascading speculation allows arbitrary speedups
Up to the length of the longest plan flow
[Figure: chain of wrapper (W) operators passing values a through j, then the same chain with cascading Speculate (S) operators and a final Confirm (C)]
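The two bounds above can be checked with a small timing model. It is a sketch under strong assumptions that the slides do not state in this form: every prediction is correct, speculation itself costs nothing, and the plan is a single chain of operators with known latencies. The function names are hypothetical.

```python
def sequential_time(latencies):
    """Each operator waits for the one before it."""
    return sum(latencies)


def single_speculation_time(latencies, split):
    """One Speculate before operator `split`: the two segments overlap."""
    return max(sum(latencies[:split]), sum(latencies[split:]))


def cascading_speculation_time(latencies):
    """A Speculate before every operator: all operators overlap."""
    return max(latencies)


chain = [1.0] * 10  # ten wrappers, one second each (illustrative values)
best_single = min(single_speculation_time(chain, k) for k in range(1, len(chain)))
single_speedup = sequential_time(chain) / best_single
cascade_speedup = sequential_time(chain) / cascading_speculation_time(chain)
```

With one speculation the longer of the two segments always takes at least half the total time, so the speedup cannot exceed 2. With cascading speculation on this equal-latency chain the speedup equals the chain length.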
Use predicted cars to speculate about the ConsumerGuide summary and full URLs
Optimistic performance
Execution time: max {1.2, 1.4, 1.5, 1.6} = 1.6 sec
Speedup over streaming dataflow: (4.2/1.6) = 2.63
[Figure: plan with W, J, and S operators plus three SPEC operators and one CONF operator]
Agent plans are automatically modified for speculative execution
Successive runs of the plan benefit
Even with different input data
Optimize only the most expensive path (MEP)
Algorithm
1. Find MEP
2. Find best candidate speculative plan transformation
3. IF no candidate found, THEN exit
4. Transform plan accordingly
5. REPEAT (anytime property)
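The loop above can be sketched as follows. The plan encoding (a cost per operator plus a dependency map), the `predictable` edge set, and the rule "best candidate = lowest resulting MEP cost" are all assumptions made for illustration; the slides do not specify how candidates are scored, and the latencies are made up.

```python
def most_expensive_path(cost, deps):
    """Return (total_cost, path) for the costliest dependency chain in a DAG."""
    best = {}  # op -> (finish time, path ending at op)

    def finish(op):
        if op not in best:
            prev = max((finish(d) for d in deps.get(op, ())), default=(0.0, []))
            best[op] = (prev[0] + cost[op], prev[1] + [op])
        return best[op]

    return max(finish(op) for op in cost)


def optimize(cost, deps, predictable):
    """predictable: set of (consumer, producer) edges whose data can be guessed."""
    deps = {op: set(d) for op, d in deps.items()}  # work on a copy
    while True:
        total, path = most_expensive_path(cost, deps)           # 1. find MEP
        candidates = []
        for producer, consumer in zip(path, path[1:]):
            if (consumer, producer) in predictable:
                trial = {op: set(d) for op, d in deps.items()}
                trial[consumer].discard(producer)  # speculate on this edge
                new_total = most_expensive_path(cost, trial)[0]
                candidates.append((new_total, consumer, producer))
        if not candidates:                                       # 3. none: exit
            return deps
        new_total, consumer, producer = min(candidates)          # 2. best one
        if new_total >= total:
            return deps
        deps[consumer].discard(producer)                         # 4. transform
        # 5. repeat; stopping at any iteration still leaves a valid plan


cost = {"edmunds": 1.0, "select": 0.1, "cg_search": 1.0,
        "cg_summary": 1.0, "cg_full": 1.0, "join": 0.1}
deps = {"select": {"edmunds"}, "cg_search": {"select"},
        "cg_summary": {"cg_search"}, "cg_full": {"cg_summary"},
        "join": {"cg_full"}}
predictable = {("cg_search", "select"), ("cg_summary", "cg_search"),
               ("cg_full", "cg_summary")}
optimized = optimize(cost, deps, predictable)
before = most_expensive_path(cost, deps)[0]
after = most_expensive_path(cost, optimized)[0]
```

In this toy plan the first transformation cuts the chain in the middle, roughly halving the MEP cost; the loop then stops because no further single transformation shortens the new MEP.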
[Figure: plan speedup (0.00 to 4.50) for CarInfo, RepInfo, TheaterLoc, FlightStatus, and StockInfo, with predictions 100% correct and 50% correct]
Better value prediction = better speedups
Prediction capability
Examples:
Edmunds car list from search criteria
2002 Midsize coupe 4K-12K -> Olds Alero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar
ConsumerGuide full review URL from summary URL
http://cg.com/summary/20812.htm -> http://cg.com/full/20812.htm
Category | Hint | Prediction
A | Previously seen | Previously seen
B | Never seen | Previously seen
C | Never seen | Never seen
Hint (H) | Prediction (P)
5K-12K | ?
http://cg.com/summary/12345.htm | ?
Cache: Associate a hint with a predicted value
2002 Midsize coupe 4K-12K -> Olds Alero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar
Decision list: Use features of a hint to predict value
EXAMPLE: Predicting car list from Edmunds
type = SUV : (Nissan Pathfinder, Ford Explorer)
type = Midsize :
:...min <= 10000 : (Olds Alero, Dodge Stratus)
    min > 10000 : (Honda Accord, Toyota Camry)
Year | Type | Min | Max | Car list
2002 | Midsize | 8000 | 15000 | (Oldsmobile Alero, Dodge Stratus)
2002 | Midsize | 7500 | 14500 | (Oldsmobile Alero, Dodge Stratus)
2002 | SUV | 14000 | 20000 | (Nissan Pathfinder, Ford Explorer)
2001 | Midsize | 11000 | 18000 | (Honda Accord, Toyota Camry)
2002 | SUV | 18000 | 22000 | (Nissan Pathfinder, Ford Explorer)
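The difference between the two predictors shows up on a hint that has never been seen. The sketch below hard-codes the decision list from the example rather than learning it, and the function and variable names are hypothetical.

```python
# Training examples from the table above: (year, type, min, max) -> car list.
EXAMPLES = {
    (2002, "Midsize", 8000, 15000): ("Oldsmobile Alero", "Dodge Stratus"),
    (2002, "Midsize", 7500, 14500): ("Oldsmobile Alero", "Dodge Stratus"),
    (2002, "SUV", 14000, 20000): ("Nissan Pathfinder", "Ford Explorer"),
    (2001, "Midsize", 11000, 18000): ("Honda Accord", "Toyota Camry"),
    (2002, "SUV", 18000, 22000): ("Nissan Pathfinder", "Ford Explorer"),
}


def cache_predict(hint):
    """Cache: only an exact, previously seen hint yields a prediction."""
    return EXAMPLES.get(hint)


def classify_predict(hint):
    """Decision list from the example: uses features of the hint instead."""
    _year, car_type, min_price, _max_price = hint
    if car_type == "SUV":
        return ("Nissan Pathfinder", "Ford Explorer")
    if car_type == "Midsize":
        if min_price <= 10000:
            return ("Oldsmobile Alero", "Dodge Stratus")
        return ("Honda Accord", "Toyota Camry")
    return None  # no rule applies


new_hint = (2002, "Midsize", 5000, 12000)  # never seen before
from_cache = cache_predict(new_hint)
from_classifier = classify_predict(new_hint)
```

The cache has nothing to offer for `new_hint`, while the decision list still predicts a car list it has seen before for other hints. The classifier also needs only three rules where the cache stores five rows.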
Transducers are FSMs that translate hints into predictions
http://cg.com/summary/20812.htm -> http://cg.com/full/20812.htm
To create full review URL:
[Figure: transducer with arcs labeled "http://cg.com/summary/", ε : COPY, ".", ε : COPY]
Alternating sequence of STATIC/DYNAMIC elements
State transitions (arcs) = high-level INSERT, CACHE, CLASSIFY, TRANSDUCE
[Figure: hint http://cg.com/summary/20812.htm, prediction http://cg.com/full/20812.htm; states 1, 2, 3 with template STATIC, DYNAMIC, STATIC; DYNAMIC arc: TRANSDUCE]
[Figure: hint Dodge Stratus, prediction http://cg.com/summary/20812.htm; states 1, 2, 3 with template STATIC, DYNAMIC, STATIC; DYNAMIC arc: CACHE or CLASSIFY]
Identify STATIC/DYNAMIC template
Find LCS for the set of predicted values, using technique based on (Hirschberg 1975)
For each STATIC element,
Construct INSERT arc to next automaton state
For each DYNAMIC element,
Construct TRANSDUCE, CLASSIFY, or CACHE arc to next automaton state
Prefer TRANSDUCE and CLASSIFY because
Better predictive capability on average
Better space efficiency on average
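The template step can be sketched with a textbook longest-common-subsequence computation. Two simplifications are assumed here: the quadratic-space dynamic program replaces the linear-space technique of Hirschberg, and only two predicted values are compared instead of a whole set. Splitting the values into word and punctuation tokens is also an assumption, and the function names are hypothetical.

```python
import re


def tokens(s):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|\W", s)


def lcs_mask(a, b):
    """Return booleans marking which elements of `a` lie on one LCS of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    mask, i, j = [False] * n, 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            mask[i] = True
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return mask


def template(value_a, value_b):
    """Group the tokens of value_a into alternating STATIC/DYNAMIC elements."""
    ta, tb = tokens(value_a), tokens(value_b)
    parts = []
    for tok, shared in zip(ta, lcs_mask(ta, tb)):
        kind = "STATIC" if shared else "DYNAMIC"
        if parts and parts[-1][0] == kind:
            parts[-1] = (kind, parts[-1][1] + tok)
        else:
            parts.append((kind, tok))
    return parts


learned = template("http://cg.com/full/20812.htm", "http://cg.com/full/12345.htm")
```

Each STATIC element would then become an INSERT arc, and each DYNAMIC element would need a TRANSDUCE, CLASSIFY, or CACHE arc.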
HINTS: http://cg.com/summary/20812.htm, http://cg.com/summary/12345.htm
ANSWERS: http://cg.com/full/20812.htm, http://cg.com/full/12345.htm
TEMPLATE: INSERT("http://cg.com/full/"), ?? (TRANSDUCE), INSERT(".htm")
INSERT("http://cg.com/full/"), TRANSDUCE(hint), INSERT(".htm")
[Figure: transducer arcs u:ACCEPT, /:ACCEPT, ε:ACCEPT, ε:COPY, .:ACCEPT, ε:ACCEPT, ε:ACCEPT; input hint http://cg.com/summ/20812.htm]
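A hand-written stand-in for the learned predictor shows how INSERT and TRANSDUCE combine. It assumes that ACCEPT consumes input without producing output and that COPY passes input through to the output; the prefix, the stop character, and the function names are hypothetical choices for this one URL family.

```python
def transduce(hint, prefix="http://cg.com/summary/", stop="."):
    """ACCEPT the static prefix, COPY up to `stop`, ACCEPT whatever follows."""
    if not hint.startswith(prefix):
        return None  # the hint does not fit this transducer
    copied = []
    for ch in hint[len(prefix):]:
        if ch == stop:
            break            # remaining input is accepted without output
        copied.append(ch)    # COPY: input character goes to the output
    return "".join(copied)


def predict_full_url(hint):
    """INSERT("http://cg.com/full/"), TRANSDUCE(hint), INSERT(".htm")."""
    dynamic = transduce(hint)
    if dynamic is None:
        return None
    return "http://cg.com/full/" + dynamic + ".htm"


seen = predict_full_url("http://cg.com/summary/20812.htm")
unseen = predict_full_url("http://cg.com/summary/12345.htm")
```

Because the dynamic part is copied from the hint rather than looked up, the predictor also handles a hint and a prediction that have never been seen, which a cache cannot do.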
[Figure: average agent execution time (ms), 1000 to 7000, against number of tuples seen: No spec (0), Spec (1-25), Spec (26-50), Spec (51-75), Spec (76-100), Spec (101-125); series: first tuple, average tuple, last tuple]
[Figure: average agent execution time (ms), 1000 to 10000, against number of tuples seen: No spec (0), Spec (1-20), Spec (21-40), Spec (41-60), Spec (61-80); series: first tuple, average tuple, last tuple]
Speculative execution
Approximate & partial query results
[Hellerstein et al. 1997] [Shanmugasundaram et al. 2000] [Raman and Hellerstein 2001]
Executing anticipated actions in advance
Continual computation [Horvitz 2001], time-critical decision making [Greenwald and Dean 1994]
Other types of speculative execution
File system prefetching [Chang and Gibson 1999], control speculation in workflow processing [Hull et al. 2000]
Network prefetching
Learning value predictors
Predicting commands
Command line prediction [Davison and Hirsh 1998, 2001]
Assisted browsing [Lieberman 1995] [Joachims et al. 1997]
Value prediction as speedup learning
[Fikes et al. 1972], [Mitchell 1983], [Minton 1988]
Transducer learning
Provably correct transducers [Oncina et al. 1993]
Issues: Requires many examples, generalization capability differs
Transducers for data extraction [Hsu and Chang 1999]
URL prediction
[Zukerman et al. 1999], [Su et al. 2000]
Can yield arbitrary speedups
Safe, fair
More accurate & space efficient than strictly caching
Speculative execution is a form of speed-up learning
Two very large search spaces:
Plan transformations for speculative execution
Value prediction for each speculate operator
Both of these are potential opportunities for CBR in information gathering
Could learn finer-grained plan transformations that depend on the request
Could learn more sophisticated value prediction strategies (e.g., speculating on multiple inputs)
Finding the right speculative plan and value predictions can provide significant speedups!