
An Algorithm better than AO*?
Blai Bonet, Universidad Simón Bolívar, Caracas, Venezuela
Héctor Geffner, ICREA and Universitat Pompeu Fabra, Barcelona, Spain
7/2005
An Algorithm Better than AO*? B. Bonet and H. Geffner; 7/05

Motivation
• Heuristic search methods can be efficient but lack a common foundation: IDA*, AO*, Alpha-Beta, ...
• Dynamic programming methods such as Value Iteration are general but not as efficient
• Question: can we get the best of both, i.e., generality and efficiency?
• Answer is yes, combining their key ideas:
  – Admissible heuristics (lower bounds)
  – Learning (value updates as in LRTA*, RTDP, etc.)

What does the proposed integration give us?
An algorithm schema, called LDFS, that is simple, general, and efficient:
• simple because it can be expressed in a few lines of code; indeed LDFS = Depth First Search + Learning
• general because it handles many models: OR Graphs (IDA*), AND/OR Graphs (AO*), Game Trees (Alpha-Beta), MDPs, etc.
• efficient because it reduces to state-of-the-art algorithms in many of these models, while in others it yields new competitive algorithms; e.g.
  – LDFS = IDA* + TT for OR-Graphs
  – LDFS = MTD(−∞) for Game Trees
We also show that LDFS is better than AO* over Max AND/OR Graphs . . .

What does the proposed integration give us? (cont'd)
• Like LRTA*, RTDP, and LAO*, LDFS combines lower bounds with learning, but motivation and goals are slightly different
• By accounting for and generalizing existing algorithms, we aim to uncover the three key computational ideas that underlie them all so that nothing else is left out. These ideas are:
  – Depth First Search
  – Lower Bounds
  – Learning
• It is also useful to know that, say, a new MDP algorithm reduces to well-known and tested algorithms when applied to OR-Graphs or Game Trees

Models
1. a discrete and finite state space $S$,
2. an initial state $s_0 \in S$,
3. a non-empty set of terminal states $S_T \subseteq S$,
4. actions $A(s) \subseteq A$ applicable in each non-terminal state,
5. a function that maps states and actions into sets of states $F(a, s) \subseteq S$,
6. action costs $c(a, s)$ for non-terminal states $s$, and
7. terminal costs $c_T(s)$ for terminal states.

• Deterministic: $|F(a, s)| = 1$,
• Non-Deterministic: $|F(a, s)| \geq 1$,
• MDPs: probabilities $P_a(s' \mid s)$ for $s' \in F(a, s)$ that add up to 1 . . .
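The seven components above map naturally onto a small data structure. The following is a minimal sketch, not the authors' code: the `Model` class, its field names, and the `toy` instance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Illustrative container for the model components (names are assumptions)."""
    states: set          # S
    s0: str              # initial state s_0
    terminals: set       # S_T
    succ: dict           # (a, s) -> tuple of successor states, i.e. F(a, s)
    cost: dict           # (a, s) -> action cost c(a, s)
    terminal_cost: dict  # s -> terminal cost c_T(s)

    def actions(self, s):
        """A(s): actions that have a successor set defined in state s."""
        return [a for (a, t) in self.succ if t == s]

    def is_deterministic(self):
        """True when |F(a, s)| = 1 for every applicable action."""
        return all(len(nxt) == 1 for nxt in self.succ.values())


# A made-up non-deterministic model: action x in s0 may lead to a or b.
toy = Model(
    states={"s0", "a", "b", "g"},
    s0="s0",
    terminals={"g"},
    succ={("x", "s0"): ("a", "b"), ("y", "s0"): ("b",),
          ("go", "a"): ("g",), ("go", "b"): ("g",)},
    cost={("x", "s0"): 1, ("y", "s0"): 5, ("go", "a"): 2, ("go", "b"): 1},
    terminal_cost={"g": 0},
)
```

Keying `succ` and `cost` by `(a, s)` mirrors the argument order of $F(a, s)$ and $c(a, s)$ on the slide.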

Solutions
(Optimal) solutions can all be expressed in terms of a value function $V$ satisfying the Bellman equation:

$$V(s) = \begin{cases} c_T(s) & \text{if } s \text{ is terminal} \\ \min_{a \in A(s)} Q_V(a, s) & \text{otherwise} \end{cases}$$

where $Q_V(a, s)$ stands for the cost-to-go value defined as:

$$Q_V(a, s) = \begin{cases} c(a, s) + V(s'),\ s' \in F(a, s) & \text{for OR Graphs} \\ c(a, s) + \max_{s' \in F(a, s)} V(s') & \text{for Max AND/OR Graphs} \\ c(a, s) + \sum_{s' \in F(a, s)} V(s') & \text{for Add AND/OR Graphs} \\ c(a, s) + \sum_{s' \in F(a, s)} P_a(s' \mid s)\, V(s') & \text{for MDPs} \\ \max_{s' \in F(a, s)} V(s') & \text{for Game Trees} \end{cases}$$

A policy (solution) $\pi$ maps states into actions, must be closed around $s_0$, and is optimal if $\pi(s) = \operatorname{argmin}_{a \in A(s)} Q_V(a, s)$ for $V$ satisfying Bellman.
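The five cases of $Q_V(a, s)$ differ only in how successor values are aggregated. A hedged sketch of that dispatch follows; the function name `q_value`, the `kind` strings, and the sample numbers are assumptions made for illustration, not part of the slides.

```python
def q_value(kind, a, s, V, succ, cost, prob=None):
    """Cost-to-go Q_V(a, s) for the model families listed above.

    kind: one of "or", "max", "add", "mdp", "game" (labels are illustrative).
    succ[(a, s)] is F(a, s); prob[(a, s, t)] is P_a(t | s), used for MDPs only.
    """
    nxt = succ[(a, s)]
    if kind == "or":      # deterministic: exactly one successor
        (t,) = nxt
        return cost[(a, s)] + V[t]
    if kind == "max":     # Max AND/OR: worst-case successor
        return cost[(a, s)] + max(V[t] for t in nxt)
    if kind == "add":     # Add AND/OR: sum over successors
        return cost[(a, s)] + sum(V[t] for t in nxt)
    if kind == "mdp":     # expected value over successors
        return cost[(a, s)] + sum(prob[(a, s, t)] * V[t] for t in nxt)
    if kind == "game":    # game trees: no action cost term
        return max(V[t] for t in nxt)
    raise ValueError(f"unknown model kind: {kind}")


# Made-up data: action x in s0 leads to a or b; action y leads to b only.
succ = {("x", "s0"): ("a", "b"), ("y", "s0"): ("b",)}
cost = {("x", "s0"): 1, ("y", "s0"): 5}
prob = {("x", "s0", "a"): 0.5, ("x", "s0", "b"): 0.5}
V = {"a": 2, "b": 1}

q_max = q_value("max", "x", "s0", V, succ, cost)
q_add = q_value("add", "x", "s0", V, succ, cost)
q_mdp = q_value("mdp", "x", "s0", V, succ, cost, prob)
q_or = q_value("or", "y", "s0", V, succ, cost)
```

Because the rest of the machinery only calls $Q_V$, swapping this one function is enough to move between model families.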

Value Iteration (VI): A general solution method
1. Start with arbitrary cost function $V$
2. Repeat until residual over all $s$ is 0 (i.e., LHS = RHS)
   Update $V(s) := \min_{a \in A(s)} Q_V(a, s)$ for all $s$
3. Return $\pi_V(s) = \operatorname{argmin}_{a \in A(s)} Q_V(a, s)$

• VI is simple and general (models encoded in form of $Q_V$), but also exhaustive (considers all states) and affected by dead-ends ($V^*(s) = \infty$)
• Both problems solvable using initial state $s_0$ and lower bound $V$ . . .
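As a rough illustration of the three steps above, here is a minimal Value Iteration sketch for the Max AND/OR case. The function name, the synchronous update, the zero initial values, and the toy data are assumptions; the sketch also assumes there are no dead-ends.

```python
def value_iteration(states, terminals, succ, cost, c_T):
    """Synchronous VI with Q_V(a, s) = c(a, s) + max over F(a, s) of V."""
    actions = {s: [a for (a, t) in succ if t == s] for s in states}

    def q(a, s, V):
        return cost[(a, s)] + max(V[t] for t in succ[(a, s)])

    # Step 1: arbitrary start (zero here), terminal states fixed at c_T(s).
    V = {s: (c_T[s] if s in terminals else 0) for s in states}
    # Step 2: update every state until the residual is 0.
    while True:
        new_V = {s: V[s] if s in terminals
                 else min(q(a, s, V) for a in actions[s])
                 for s in states}
        residual = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if residual == 0:
            break
    # Step 3: greedy policy with respect to the converged V.
    policy = {s: min(actions[s], key=lambda a: q(a, s, V))
              for s in states if s not in terminals}
    return V, policy


# Made-up acyclic Max AND/OR problem.
states = {"s0", "a", "b", "g"}
terminals = {"g"}
succ = {("x", "s0"): ("a", "b"), ("y", "s0"): ("b",),
        ("go", "a"): ("g",), ("go", "b"): ("g",)}
cost = {("x", "s0"): 1, ("y", "s0"): 5, ("go", "a"): 2, ("go", "b"): 1}
c_T = {"g": 0}

V, policy = value_iteration(states, terminals, succ, cost, c_T)
```

Note that every state is updated in every sweep, which is the exhaustiveness the slide points out.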

Find-and-Revise: Selective VI Schema
Assume $V$ admissible ($V \leq V^*$) and monotonic ($V(s) \leq \min_{a \in A(s)} Q_V(a, s)$)
Define $s$ inconsistent if $V(s) < \min_{a \in A(s)} Q_V(a, s)$
1. Start with a lower bound $V$
2. Repeat until no more states found in a.
   a. Find inconsistent $s$ reachable from $s_0$ and $\pi_V$
   b. Update $V(s)$ to $\min_{a \in A(s)} Q_V(a, s)$
3. Return $\pi_V(s) = \operatorname{argmin}_{a \in A(s)} Q_V(a, s)$

• Find-and-Revise yields optimal $\pi$ in at most $\sum_s V^*(s) - V(s)$ iterations (provided integer costs and no probabilities)
• Proposed LDFS = Find-and-Revise with:
  – Find = DFS that backtracks on inconsistent states, that
  – updates states on backtracks, and
  – labels as Solved the states $s$ with no inconsistencies beneath
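The schema above can be sketched in a few lines for the Max AND/OR case. This is an illustrative reading of the steps, not the authors' implementation: the names, the stack-based Find procedure, and the toy data are assumptions, and the heuristic `h` is assumed admissible and monotonic.

```python
def find_and_revise(s0, terminals, succ, cost, c_T, h):
    """Find-and-Revise with Q_V(a, s) = c(a, s) + max over F(a, s) of V."""
    V = {}

    def value(s):
        # Values are initialised lazily: c_T on terminals, h elsewhere.
        if s not in V:
            V[s] = c_T[s] if s in terminals else h(s)
        return V[s]

    def actions(s):
        return [a for (a, t) in succ if t == s]

    def q(a, s):
        return cost[(a, s)] + max(value(t) for t in succ[(a, s)])

    def greedy(s):
        return min(actions(s), key=lambda a: q(a, s))

    def find_inconsistent():
        # Step 2a: search the states reachable from s0 under the greedy policy.
        stack, seen = [s0], set()
        while stack:
            s = stack.pop()
            if s in seen or s in terminals:
                continue
            seen.add(s)
            a = greedy(s)
            if value(s) < q(a, s):
                return s
            stack.extend(succ[(a, s)])
        return None

    updates = 0
    while (s := find_inconsistent()) is not None:
        V[s] = q(greedy(s), s)   # Step 2b: revise the inconsistent state.
        updates += 1
    policy = {s: greedy(s) for s in list(V) if s not in terminals}
    return V, policy, updates


# Made-up acyclic Max AND/OR problem with integer costs.
terminals = {"g"}
succ = {("x", "s0"): ("a", "b"), ("y", "s0"): ("b",),
        ("go", "a"): ("g",), ("go", "b"): ("g",)}
cost = {("x", "s0"): 1, ("y", "s0"): 5, ("go", "a"): 2, ("go", "b"): 1}
c_T = {"g": 0}

V, policy, updates = find_and_revise("s0", terminals, succ, cost, c_T, lambda s: 0)
```

On this toy problem, $V^*$ is 3, 2, and 1 for `s0`, `a`, and `b`, so with $h = 0$ the bound $\sum_s V^*(s) - V(s)$ evaluates to 6 and the number of updates cannot exceed it.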

Learning in Depth-First Search (LDFS)

    ldfs-driver(s_0)
    begin
        repeat solved := ldfs(s_0) until solved
        return (V, π)
    end

    ldfs(s)
    begin
        if s is solved or terminal then
            if s is terminal then V(s) := c_T(s)
            Mark s as solved
            return true
        flag := false
        foreach a ∈ A(s) do
            if Q_V(a, s) > V(s) then continue
            flag := true
            foreach s' ∈ F(a, s) do
                flag := ldfs(s') & [Q_V(a, s) ≤ V(s)]
                if ¬flag then break
            if flag then break
        if flag then
            π(s) := a
            Mark s as solved
        else
            V(s) := min_{a ∈ A(s)} Q_V(a, s)
        return flag
    end
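For readers who want to run the procedure, the pseudocode can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: it fixes $Q_V$ to the Max AND/OR case, assumes an acyclic state space and an admissible `h`, and uses hypothetical names and toy data throughout.

```python
def ldfs_driver(s0, terminals, succ, cost, c_T, h):
    """Run ldfs from s0 until it reports solved; returns (V, policy)."""
    V, policy, solved = {}, {}, set()

    def value(s):
        # Lazy initialisation: c_T on terminals, heuristic h elsewhere.
        if s not in V:
            V[s] = c_T[s] if s in terminals else h(s)
        return V[s]

    def actions(s):
        return [a for (a, t) in succ if t == s]

    def q(a, s):
        return cost[(a, s)] + max(value(t) for t in succ[(a, s)])

    def ldfs(s):
        if s in solved or s in terminals:
            if s in terminals:
                V[s] = c_T[s]
            solved.add(s)
            return True
        flag, chosen = False, None
        for a in actions(s):
            if q(a, s) > value(s):
                continue                 # a is not greedy with respect to V
            flag = True
            for t in succ[(a, s)]:
                # Recurse, then re-check a because V may have changed below.
                flag = ldfs(t) and q(a, s) <= value(s)
                if not flag:
                    break
            if flag:
                chosen = a
                break
        if flag:
            policy[s] = chosen
            solved.add(s)
        else:
            V[s] = min(q(a, s) for a in actions(s))   # learning step
        return flag

    while not ldfs(s0):
        pass
    return V, policy


# Made-up acyclic Max AND/OR problem.
terminals = {"g"}
succ = {("x", "s0"): ("a", "b"), ("y", "s0"): ("b",),
        ("go", "a"): ("g",), ("go", "b"): ("g",)}
cost = {("x", "s0"): 1, ("y", "s0"): 5, ("go", "a"): 2, ("go", "b"): 1}
c_T = {"g": 0}

V, policy = ldfs_driver("s0", terminals, succ, cost, c_T, lambda s: 0)
```

The `solved` set plays the role of the Solved labels, so subtrees already proven consistent are not searched again on later iterations of the driver loop.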

Properties of LDFS and Bounded LDFS
ldfs computes $\pi^*$ for all models if $V$ is admissible (i.e. $V \leq V^*$)
• For OR-Graphs and monotone $V$, ldfs = ida* + transposition tables
• For Game Trees and $V = -\infty$, bounded ldfs = mtd($-\infty$)
• For Additive models, ldfs = bounded ldfs
• For Max models, ldfs $\neq$ bounded ldfs
LDFS (like VI, AO*, min-max LRTA*, etc.) computes optimal solution graphs where each node is the root of an optimal solution subgraph; over Max models, this isn't needed. Bounded LDFS fixes this, enforcing consistency only where needed.

Empirical Evaluation: Algorithms, Heuristics, Domains
• Algorithms: vi, ao* / cfc_rev*, min-max lrta*, ldfs, bounded ldfs
• Heuristics: h = 0 and two domain-independent heuristics h1 and h2
• Domains
  – Coins: Find counterfeit coin among N coins; N = 10, 20, . . . , 60.
  – Diagnosis: Find true state of system among M states with N binary tests. In one case, N = 10 and M in {10, 20, . . . , 60}; in the second, M = 60 and N in {10, 12, . . . , 28}.
  – Rules: Derivation of atoms in acyclic rule systems with N atoms, and at most R rules per atom and M atoms per rule body. R = M = 50 and N in {5000, 10000, . . . , 20000}.
  – MTS: Predator must catch a prey that moves non-deterministically to a non-blocked adjacent cell in a given random maze of size N × N; N = 15, 20, . . . , 40.

    problem       |S|      V*    N_vi   |A|    |F|   |π*|
    coins-10      43       3     2      172    3     9
    coins-60      1,018    5     2      315K   3     12
    mts-5         625      17    14     4      4     156
    mts-35        1.5M     573   322    4      4     220K
    mts-40        2.5M     684   –      4      4     304K
    diag-60-10    29,738   6     8      10     2     119
    diag-60-28    >15M     6     –      28     2     119
    rules-5000    5,000    156   158    50     50    4,917
    rules-20000   20,000   592   594    50     50    19,889

Empirical Evaluation: Results (1)
[Figure: nine log-scale plots of time in seconds, in three rows of three panels. Rows: coins (x-axis: number of coins), mts (x-axis: size of maze), and rules systems with max rules = 50 and max body = 50 (x-axis: number of atoms). Columns: h = 0, h = h1(#vi/2), h = h2(#vi/2). Curves: Value Iteration, LDFS, Bounded LDFS, AO* (AO*/CFC for mts), Min-Max LRTA*.]
