Heuristic Search Planning With Multi-Objective Probabilistic LTL Constraints


slide-1
SLIDE 1

Peter Baumgartner, Sylvie Thiébaux, Felipe Trevizan Data61/CSIRO and Research School of Computer Science, ANU Australia

Heuristic Search Planning With Multi-Objective Probabilistic LTL Constraints

slide-6
SLIDE 6

Planning Under Uncertainty

[Figure: example environment with a Goal location; transition probabilities 0.9 and 0.5 are shown]

Actions: move left, move right, enter, get Eve, exit
Environment: door possibly jams, …
action ⟹ stochastic environment response

Stochastic Shortest Path Problem (SSP)
Problem: What action to take in what state to reach the goal with minimal cost?
Solution: Stochastic policy: probability distribution on actions, e.g. “When at door 1, enter the room 3 out of 10 times, …”

Add constraints for better expressivity (C-SSP)

  • well-known: “fuel < 5”
  • here: PLTL
slide-8
SLIDE 8

Multi-Objective Probabilistic LTL (MO-PLTL)

Example constraints

  • Eve stays in a room until Eve and Wall-E are together: eve_in_a_room U together (ψ1)
  • Once together, eventually together forever: G (together ⇒ F G together) (ψ2)
  • Wall-E never visits room1 twice: G (wall-E_room1 ⇒ (wall-E_room1 U G ¬wall-E_room1)) (ψ3)

Syntax

ψ := ⊤ | A | ψ ∧ ψ | ψ ∨ ψ | ¬ ψ | X ψ | ψ U ψ | F ψ | G ψ (LTL)
ϕ := P>z ψ | P≥z ψ (PLTL)

Additional Multi-Objective PLTL Constraint

ϕ = P≥0.8 ψ1 ∧ P≥1.0 ψ2 ∧ P≥0.5 ψ3 (MO-PLTL)

“Multi-objective” is not meant here as it is used in “optimisation”.

Task: compute a cost-minimal stochastic policy for reaching the goal (with probability 1) such that ϕ is satisfied
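The conjunction of probability thresholds can be read as a simple check once the probability of each ψi under a candidate policy is known. The following is a minimal sketch of that reading only; the names `PHI` and `satisfies_mo_pltl` and the example probabilities are illustrative assumptions, and computing the probabilities themselves is the hard part addressed by the rest of the talk.

```python
import operator

# The MO-PLTL constraint from the slide, one entry per conjunct:
# (comparison, threshold z, name of the LTL formula). Names are hypothetical.
PHI = [
    (operator.ge, 0.8, "psi1"),
    (operator.ge, 1.0, "psi2"),
    (operator.ge, 0.5, "psi3"),
]


def satisfies_mo_pltl(constraints, probability_of):
    """True iff every conjunct P(compare z) psi holds for the given probabilities."""
    return all(
        compare(probability_of[name], z) for compare, z, name in constraints
    )


# Assumed example values: probabilities of psi1..psi3 under some policy.
good_policy = {"psi1": 0.85, "psi2": 1.0, "psi3": 0.5}
bad_policy = {"psi1": 0.85, "psi2": 0.99, "psi3": 0.5}
print(satisfies_mo_pltl(PHI, good_policy), satisfies_mo_pltl(PHI, bad_policy))
```

A strict conjunct P>z ψ would use `operator.gt` in place of `operator.ge`.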

slide-11
SLIDE 11

Solving MO-PLTL

Methods Based on Probabilistic Verification

  • State of the art method, implemented in the PRISM probabilistic model checker
  • Needs infinite runs:
    (1) add a self-loop at Goal
    (2) add a Goal constraint: ϕ = P1 ψ1 ∧ ⋯ ∧ Pk ψk ∧ P≥1 F Goal
  • Compute the cross-product automaton
    A = DRA(ψ1) × ⋯ × DRA(ψk) × DRA(F Goal) × S (S is the given state transition system, an MDP)
  • Obtain a policy for ϕ as a solution of a certain linear program obtained from A

Complexity

  • |DRA(ψ)| is double exponential in |ψ| (construction: ψ → NBA → DRA)
  • |S| is usually huge for planning problems; we cannot afford to generate it in full
  • Upfront DRA computation and cross-product are problematic even for small examples
  • The verification/synthesis problem is 2EXPTIME-complete
  • Complicated algorithms (see also [De Giacomo & Vardi, IJCAI 2013, IJCAI 2015])

We have a specific problem (all BSCCs are self-loops at goals) and can do better.

slide-13
SLIDE 13

Contributions

                Verification Based         Our Method
General         Yes                        No (requires Goal)
Complexity      Double exponential in ϕ    Single exponential in ϕ for (1)
Heuristics      No                         Yes (i2Dual)
Approach        Automata (DRA)             (1) Formula progression, Tseitin; (2) NBA
State Space     Upfront                    On-the-fly

Related work: Baier & McIlraith, ICAPS 2006: non-stochastic planning with LTL, heuristics, NFA, by compilation

Rest of this talk: approach, complexity, heuristics, experiments

slide-27
SLIDE 27

How to Check a Policy π for Satisfying a PLTL Formula

[Figure: MDP fragment. From s0, action α leads to sa (labelled {A}) with probability 0.6 and to sb with probability 0.4; action β leads to sc (labelled {A}) with probability 0.7 and to sd with probability 0.3.]

Given policy π = s0: [α → 0.6, β → 0.4 ]

Claim: s0 ⊨ P>0.6 F A, i.e. the probability of all paths from s0 satisfying F A is > 0.6

Proof

    s0 ⊨ P>0.6 F A
iff Pr{p | p is a path from s0 and p ⊨ F A} > 0.6    (p ⊨ F A is non-probabilistic LTL; ignore finiteness of paths on this slide)
iff Pr{s0sa, s0sc} > 0.6
iff 0.6 ⋅ 0.6 + 0.4 ⋅ 0.7 = 0.64 > 0.6

It follows s0 ⊨ P>0.6 F A

Synthesize

Replace the fixed action probabilities 0.6 and 0.4 by unknowns: find π(α | s0) and π(β | s0) such that π(α | s0) + π(β | s0) = 1 and s0 ⊨ P>0.6 F A holds.

↝ Quantify over action probabilities and compute a solution

This is a non-linear program in general; we use dual-space LPs instead.
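The check above can be reproduced numerically. The sketch below hard-codes the transition probabilities read off the slide's example; the dictionary layout and the function name `prob_eventually_a` are illustrative assumptions, and the function only handles this one-step example, not general MDPs.

```python
# Transition probabilities P[action][successor] of the slide's example at s0.
P = {
    "alpha": {"sa": 0.6, "sb": 0.4},
    "beta": {"sc": 0.7, "sd": 0.3},
}
# States labelled {A}; reaching one of them satisfies F A.
A_STATES = {"sa", "sc"}


def prob_eventually_a(policy):
    """Probability of F A from s0 under a stochastic policy at s0."""
    return sum(
        policy[action] * p
        for action, successors in P.items()
        for state, p in successors.items()
        if state in A_STATES
    )


# Checking the given policy: 0.6 * 0.6 + 0.4 * 0.7.
given = {"alpha": 0.6, "beta": 0.4}
print(prob_eventually_a(given) > 0.6)

# Synthesis view: treat pi(alpha | s0) as an unknown and test candidates.
for a in (0.0, 0.5, 1.0):
    candidate = {"alpha": a, "beta": 1.0 - a}
    print(a, prob_eventually_a(candidate) > 0.6)
```

In this tiny example the probability is 0.7 − 0.1 ⋅ π(α | s0), so every policy except the one that always plays α meets the strict threshold.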

slide-30
SLIDE 30

How to Check a Policy π for Satisfying a PLTL Formula (continued)

Contributions (next): evaluate the LTL part of the constraints by (1) formula progression, or (2) NBA mode

slide-48
SLIDE 48

Formula Progression [Bacchus & Kabanza 98]

Why? On-the-fly instead of upfront cross-product

[Figure: the same MDP fragment. s0, sb and sd are labelled {}; sa and sc are labelled {A}. Self-loops at goals.]

LTL is defined on infinite runs, hence the self-loops at goals:

  • LTL: s0 sa sa ⋯ ⊨ G X A
  • LTLf: s0 sa ⊨ G X A

Progression: expand and simplify a given LTL formula along a path

    s0 sa sa ⋯ ⊨ F A
iff s0 sa sa ⋯ ⊨ A ∨ X F A    (by expand)
iff s0 sa sa ⋯ ⊨ X F A        (by simplify)
iff sa sa ⋯ ⊨ F A             (by X)
iff sa sa ⋯ ⊨ A               (by self-loop)
iff sa sa ⋯ ⊨ ⊤               (by self-loop)

    s0 sb sb ⋯ ⊨ F A
iff s0 sb sb ⋯ ⊨ A ∨ X F A    (by expand)
iff s0 sb sb ⋯ ⊨ X F A        (by simplify)
iff sb sb ⋯ ⊨ F A             (by X)
iff sb sb ⋯ ⊨ A               (by self-loop)
iff sb sb ⋯ ⊨ ⊥               (by self-loop)

All transitions are “if and only if”.
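The two derivations above can be mechanised. The following is a minimal sketch of textbook-style formula progression with constant simplification, plus an evaluation rule for a state that loops on itself forever; it is not the authors' implementation, and the tuple encoding and the names `progress`, `holds_on_loop` and `satisfies` are assumptions made for illustration.

```python
TOP, BOT = ("top",), ("bot",)


def neg(f):
    return BOT if f == TOP else TOP if f == BOT else ("not", f)


def disj(f, g):
    if TOP in (f, g):
        return TOP
    return g if f == BOT else f if g == BOT else ("or", f, g)


def conj(f, g):
    if BOT in (f, g):
        return BOT
    return g if f == TOP else f if g == TOP else ("and", f, g)


def progress(f, label):
    """Formula that must hold from the next state on, given the current label."""
    op = f[0]
    if op in ("top", "bot"):
        return f
    if op == "ap":
        return TOP if f[1] in label else BOT
    if op == "not":
        return neg(progress(f[1], label))
    if op == "or":
        return disj(progress(f[1], label), progress(f[2], label))
    if op == "and":
        return conj(progress(f[1], label), progress(f[2], label))
    if op == "X":
        return f[1]
    if op == "F":  # F g  ==  g or X F g
        return disj(progress(f[1], label), f)
    if op == "G":  # G g  ==  g and X G g
        return conj(progress(f[1], label), f)
    if op == "U":  # g U h  ==  h or (g and X (g U h))
        return disj(progress(f[2], label), conj(progress(f[1], label), f))
    raise ValueError(op)


def holds_on_loop(f, label):
    """Truth of f on the run that stays forever in one state with this label."""
    op = f[0]
    if op in ("top", "bot"):
        return op == "top"
    if op == "ap":
        return f[1] in label
    if op == "not":
        return not holds_on_loop(f[1], label)
    if op == "or":
        return holds_on_loop(f[1], label) or holds_on_loop(f[2], label)
    if op == "and":
        return holds_on_loop(f[1], label) and holds_on_loop(f[2], label)
    if op in ("X", "F", "G"):  # on a constant run all three collapse
        return holds_on_loop(f[1], label)
    if op == "U":
        return holds_on_loop(f[2], label)
    raise ValueError(op)


def satisfies(f, prefix_labels, loop_label):
    """Progress f through the prefix, then evaluate it on the final self-loop."""
    for label in prefix_labels:
        f = progress(f, label)
    return holds_on_loop(f, loop_label)


F_A = ("F", ("ap", "A"))
print(satisfies(F_A, [set()], {"A"}))  # run s0 sa sa ...
print(satisfies(F_A, [set()], set()))  # run s0 sb sb ...
```

Progressing F A through the empty label of s0 returns F A unchanged, which matches the expand, simplify and strip-X steps shown above.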

slide-50
SLIDE 50

Multi-Objective Progression in the State Space

Compare with the upfront product A = DRA(ψ1) × ⋯ × DRA(ψk) × DRA(F Goal) × S

[Figure: search state ‹ψ1,…,ψk,s0› expanded under action α into successors ‹ψ1’,…,ψk’,s1› and ‹ψ1’,…,ψk’,s2›; a loop and a “Goal” state are marked.]

’ is the progression operator

Questions/Issues

  • Q: Does repeated progression terminate?
    A: It had better, but some rules even increase formula size: F A ↝ A ∨ X F A
  • Q: How to detect a loop ‹ψ,s› ≡ ‹ψ’’…’,s› ?
    A: Check equivalence of LTL formulas. Exponential!
    A: Check equality of canonical representations of LTL formulas. Polynomial!

↝ Tseitin-style progression

slide-52
SLIDE 52

Tseitin Transformation for Classical Logic

  • Earliest polynomial conjunctive normal form (CNF) transformation [Tseitin 1966]
  • Improved versions popular with first-order theorem proving [Azmy&Weidenbach 2013]

How it works

  • Introduce names for complex subformulas before multiplying out
  • Requires polynomially many names, one for each subformula
  • Apply once and for all to the given formula and obtain an equi-satisfiable CNF
  • That CNF is a conjunction of 3-literal clauses (disjunctions)

Example

  • Multiplying out: (A ∧ B) ∨ ψ ↝ (A ∨ ψ) ∧ (B ∨ ψ), which duplicates ψ
  • Tseitin: (A ∧ B) ∨ ψ ↝ ψ_(A ∧ B) ∨ ψ, ¬ψ_(A ∧ B) ∨ A, ¬ψ_(A ∧ B) ∨ B
    where ψ_(A ∧ B) is a name for (A ∧ B) and the last two clauses are the definition of ψ_(A ∧ B)

↝ We need to apply Tseitin CNF to every derived formula: Tseitin-style progression
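The naming idea can be sketched for plain propositional formulas as follows. This is an illustrative sketch under stated assumptions, not the transformation used in the talk: it defines each name in both directions (name ↔ subformula), whereas the slide's example only needs the direction ¬name ∨ …; the tuple encoding, the `_n1, _n2, …` naming scheme and the brute-force `satisfiable` helper are all hypothetical.

```python
import itertools


def negate(literal):
    """Flip a literal; a leading '-' marks negation."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def tseitin(formula):
    """Clauses (lists of at most 3 literals) equisatisfiable with formula.

    Formulas are tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g).
    Variable names must not start with '-' or clash with the '_n<i>' names.
    """
    clauses = []
    names = {}  # compound subformula -> its introduced name

    def literal_for(f):
        kind = f[0]
        if kind == "var":
            return f[1]
        if kind == "not":
            return negate(literal_for(f[1]))
        if f not in names:
            a, b = literal_for(f[1]), literal_for(f[2])
            n = f"_n{len(names) + 1}"
            names[f] = n
            if kind == "and":  # n <-> (a and b)
                clauses.extend(
                    [[negate(n), a], [negate(n), b], [n, negate(a), negate(b)]]
                )
            elif kind == "or":  # n <-> (a or b)
                clauses.extend(
                    [[negate(n), a, b], [n, negate(a)], [n, negate(b)]]
                )
            else:
                raise ValueError(kind)
        return names[f]

    root = literal_for(formula)
    clauses.append([root])  # the formula itself must hold
    return clauses


def satisfiable(clauses, fixed):
    """Brute force: can the variables not in `fixed` be set so all clauses hold?"""
    free = sorted({l.lstrip("-") for c in clauses for l in c} - set(fixed))
    for values in itertools.product([False, True], repeat=len(free)):
        model = {**fixed, **dict(zip(free, values))}
        if all(
            any(model[l.lstrip("-")] != l.startswith("-") for l in c)
            for c in clauses
        ):
            return True
    return False


# (A and B) or C, the shape of the slide's example with C in place of psi.
cnf = tseitin(("or", ("and", ("var", "A"), ("var", "B")), ("var", "C")))
for clause in cnf:
    print(clause)
```

Each compound subformula contributes one name and three clauses, so the result grows linearly with the number of subformulas instead of duplicating ψ.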

slide-57
SLIDE 57

Tseitin-Style Progression

First (?) application to LTL progression

All LTL formulas are now in 3-CNF: { …, { Li1, Li2, Li3 }, … }

  • 3-CNF: ∧-connected set of 3-literal clauses
  • Lik ∈ sub(ψ) ∪ { ¬ϕ, X ϕ, X ¬ϕ | ϕ ∈ sub(ψ) } ∪ “Names” ∪ …, where ψ = initially given formula

Progression

  • Sequence s0 ⊨ {{ ψ }} → s1 ⊨ Γ1 → s2 ⊨ Γ2 → … → si ⊨ Γi
  • Initially s0 ⊨ Γ0 where Γ0 = simplified 3-CNF of {{ ψ }}
  • Step si ⊨ Γi → si+1 ⊨ Γi+1 :
    (1) Eliminate names from Γi and strip X-operators
    (2) Γi+1 = simplified 3-CNF of (1)
  • Stop if sk ⊨ Γk = si ⊨ Γi for some k < i
    Replaces the ≡-test for LTL formulas by a polynomial set equality test!

Complexity

  • Literal signature |Σ| ∈ O(|ψ|^2)
  • O(|Σ|^3) = O(|ψ|^6) different clauses
  • 2^O(|ψ|^6) different clause sets

Theorem: Space and time complexity polynomial in |S| and single exponential in |ψ|

Rules for computing the simplified 3-CNF in a state r (A[…] denotes the name of a subformula):

(Triv)   {{}} ⊎ Γ ⇒r {{}}   if Γ ≠ ∅
(⊤)      {{⊤} ⊎ Ψ} ⊎ Γ ⇒r Γ
(¬⊤)     {{¬⊤} ⊎ Ψ} ⊎ Γ ⇒r {Ψ} ∪ Γ
(Eval1)  {{(u, d)} ⊎ Ψ} ⊎ Γ ⇒r Γ   if (u, d) ∈ AP and r[u] = d
(Eval2)  {{(u, d)} ⊎ Ψ} ⊎ Γ ⇒r {Ψ} ∪ Γ   if (u, d) ∈ AP and r[u] ≠ d
(Eval3)  {{¬(u, d)} ⊎ Ψ} ⊎ Γ ⇒r {Ψ} ∪ Γ   if (u, d) ∈ AP and r[u] = d
(Eval4)  {{¬(u, d)} ⊎ Ψ} ⊎ Γ ⇒r Γ   if (u, d) ∈ AP and r[u] ≠ d
(¬¬)     {{¬¬φ} ⊎ Ψ} ⊎ Γ ⇒r {{φ} ∪ Ψ} ∪ Γ
(∨)      {{φ1 ∨ φ2} ⊎ Ψ} ⊎ Γ ⇒r {{A[φ1 ∨ φ2]} ∪ Ψ, {¬A[φ1 ∨ φ2], φ1, φ2}} ∪ Γ
(¬∨)     {{¬(φ1 ∨ φ2)} ⊎ Ψ} ⊎ Γ ⇒r {{¬A[φ1 ∨ φ2]} ∪ Ψ, {A[φ1 ∨ φ2], ¬φ1}, {A[φ1 ∨ φ2], ¬φ2}} ∪ Γ
(∧)      {{φ1 ∧ φ2} ⊎ Ψ} ⊎ Γ ⇒r {{A[φ1 ∧ φ2]} ∪ Ψ, {¬A[φ1 ∧ φ2], φ1}, {¬A[φ1 ∧ φ2], φ2}} ∪ Γ
(¬∧)     {{¬(φ1 ∧ φ2)} ⊎ Ψ} ⊎ Γ ⇒r {{¬A[φ1 ∧ φ2]} ∪ Ψ, {A[φ1 ∧ φ2], ¬φ1, ¬φ2}} ∪ Γ
(U)      {{φ1 U φ2} ⊎ Ψ} ⊎ Γ ⇒r {{A[φ1 U φ2]} ∪ Ψ, {¬A[φ1 U φ2], φ2, A[φ1 ∧ X (φ1 U φ2)]}, {¬A[φ1 ∧ X (φ1 U φ2)], φ1}, {¬A[φ1 ∧ X (φ1 U φ2)], X (φ1 U φ2)}} ∪ Γ
(¬U)     {{¬(φ1 U φ2)} ⊎ Ψ} ⊎ Γ ⇒r {{¬A[φ1 U φ2]} ∪ Ψ, {A[φ1 U φ2], ¬φ2}, {A[φ1 U φ2], ¬A[φ1 ∧ X (φ1 U φ2)]}, {A[φ1 ∧ X (φ1 U φ2)], ¬φ1, X ¬(φ1 U φ2)}} ∪ Γ
(¬X)     {{¬X φ} ⊎ Ψ} ⊎ Γ ⇒r {{X ¬φ} ∪ Ψ} ∪ Γ

The singled-out literal in the left-hand side of a rule is called the pivot.
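The stop condition of the progression sequence relies on comparing clause sets syntactically instead of testing LTL equivalence. A minimal sketch of that idea, assuming literals are opaque hashable values (plain strings here) and using the hypothetical names `canonical` and `find_loop`:

```python
def canonical(clauses):
    """Order-independent, hashable form of a clause set."""
    return frozenset(frozenset(clause) for clause in clauses)


def find_loop(trace):
    """Return (k, i) with k < i for the first repeated (state, clause set), else None.

    `trace` is an iterable of (state, clauses) pairs, e.g. the sequence
    s0 |= Gamma0 -> s1 |= Gamma1 -> ... from the slide.
    """
    seen = {}
    for i, (state, clauses) in enumerate(trace):
        key = (state, canonical(clauses))
        if key in seen:
            return seen[key], i
        seen[key] = i
    return None


# Literals are placeholder strings; the clause order differs at steps 1 and 2.
trace = [
    ("s0", [["F A"]]),
    ("s1", [["A", "X F A"], ["B"]]),
    ("s1", [["B"], ["X F A", "A"]]),
]
print(find_loop(trace))
```

Hashing frozensets makes each lookup cheap, which is the practical payoff of keeping every formula in a canonical clause-set form.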

slide-63
SLIDE 63

Policy Synthesis by Translation to Linear Program

[Figure: search space from s0 to Goal, with actions α and β available in state si]

k+1 tuple

Policy π: π(α | si) = ?, π(β | si) = ?, …

Linear program computes expected values
  • Primary: e.g. time
  • Secondary: e.g. fuel < 50

Expected number of times α is executed in si:
x(si, α) = Σ π(α | si) × Pr( si )

Expected policy costs:
Cost( ) × Pr( ) + Cost( ) × Pr( ) + Cost( ) × Pr( ) = … + x(si, α) × C(α) + x(si, β) × C(β) + …

Linear Program Solver
Optimal solution of linear program, i.e., values for x(si, α) such that
  • primary cost is minimized, and
  • secondary cost constraints are satisfied
in expectation

Resulting policy: π(α | si) = x(si, α) / (x(si, α) + x(si, β))

Amenable to heuristics
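A minimal sketch of the last step on this slide, assuming a solver has already returned the occupation measures x(s, a). The helper names `policy_from_occupation` and `expected_cost`, the action costs and the numeric values of x are illustrative assumptions, not taken from the talk; only the two formulas are.

```python
from fractions import Fraction


def policy_from_occupation(x):
    """Normalise occupation measures x[(state, action)] into pi(action | state)."""
    totals = {}
    for (state, _action), value in x.items():
        totals[state] = totals.get(state, 0) + value
    # States that are never visited (total 0) get no entry: any choice is fine there.
    return {(s, a): v / totals[s] for (s, a), v in x.items() if totals[s] > 0}


def expected_cost(x, cost):
    """Expected policy cost: sum over (s, a) of x(s, a) * C(a)."""
    return sum(value * cost[action] for (_state, action), value in x.items())


# Hypothetical LP solution for a single state si with two actions.
x = {("si", "alpha"): Fraction(3, 10), ("si", "beta"): Fraction(7, 10)}
cost = {"alpha": 2, "beta": 1}  # assumed action costs C(a)

pi = policy_from_occupation(x)
total = expected_cost(x, cost)
```

Exact fractions are used only to keep the arithmetic readable; an LP solver would normally return floating-point values.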

slide-67
SLIDE 67

Heuristic Search: i-dual and i2-dual

[Figure: current state space explored from s0 towards Goal, with actions α and β at state si and heuristic values H attached to the fringe states s]

  • First heuristic search algorithms for constrained SSPs [Trevizan, Thiébaux, Haslum, Williams, Santana],
    i.e. primary expected cost (“time”) and secondary expected cost constraints (“fuel < 5”)
  • Sound, complete and optimal for admissible heuristics H (H must underestimate expected costs)

Exploring the state space with A*-like heuristic estimation function H
(1) Compute best policy π* for current state space by translation into LP with fringe as artificial goals with costs H; π* minimizes f = g + H
(2) Expand all fringe states reachable under π*
(3) If all reachable fringe states are original goals then stop, else repeat

Search space
  • Over policies, not paths; g(s) may change in each step
  • Policies may become constrained
    E.g. Pr( ) < 0.1 if Hfuel(s) = 50, as otherwise fuel < 5 not achievable

↝ For PLTL constraints
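The loop (1) to (3) can be sketched as follows. This is not the authors' implementation: the LP is replaced by an exact backward recursion that is only valid for acyclic problems without secondary cost constraints, and the toy problem, state names and heuristic values are all hypothetical. It assumes every non-goal state has at least one action.

```python
def solve_partial(s0, expanded, actions, is_goal, H):
    """Stand-in for the LP of step (1): fringe states are artificial goals with cost H(s)."""
    value, policy = {}, {}

    def q(action):
        _name, cost, outcomes = action
        return cost + sum(p * v(t) for p, t in outcomes)

    def v(s):
        if s not in value:
            if is_goal(s):
                value[s] = 0.0
            elif s not in expanded:      # fringe state
                value[s] = H(s)
            else:
                policy[s] = min(actions[s], key=q)
                value[s] = q(policy[s])
        return value[s]

    return policy, v(s0)


def reachable_fringe(s0, expanded, policy):
    """Fringe states reachable from s0 when following the policy."""
    fringe, seen, stack = set(), set(), [s0]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if s in expanded:
            stack.extend(t for _p, t in policy[s][2])
        else:
            fringe.add(s)
    return fringe


def idual_like_search(s0, actions, is_goal, H):
    expanded = set()
    while True:
        policy, f = solve_partial(s0, expanded, actions, is_goal, H)  # step (1)
        todo = {s for s in reachable_fringe(s0, expanded, policy) if not is_goal(s)}
        if not todo:                                                  # step (3)
            return policy, f, expanded
        expanded |= todo                                              # step (2)


# Hypothetical problem: each action is (name, cost, [(probability, successor), ...]).
actions = {
    "s0": [("alpha", 1.0, [(1.0, "a")]),
           ("beta", 1.0, [(0.5, "b"), (0.5, "g")]),
           ("gamma", 5.0, [(1.0, "c")])],
    "a": [("go", 1.0, [(1.0, "g")])],
    "b": [("go", 4.0, [(1.0, "g")])],
    "c": [("go", 1.0, [(1.0, "g")])],
}
h_table = {"s0": 0.0, "a": 1.0, "b": 1.0, "c": 0.0}  # assumed admissible estimates

policy, f, expanded = idual_like_search(
    "s0", actions, lambda s: s == "g", h_table.__getitem__)
```

In this toy run the optimistic estimate for "b" first makes beta look best; once "b" is expanded its real cost shows, the policy switches to alpha, and state "c" is never expanded. This illustrates that g(s) can change between iterations.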

slide-73
SLIDE 73

Heuristic Search for PLTL - PLTL-dual

Find policy π such that s0, π ⊨ P≥0.9 Ψ

[Figure: three actions α at s0 leading towards Goal; edge labels 0.2 and 0.9; heuristic values 1, 0.5 and 0.3]

A universal heuristic for search space pruning

Max among all π* ≤ Heuristic value:
  • Pr { | Ψ } = 0.9 ≤ H( ) = 1
  • Pr { | Ψ } = 0 ≤ H( ) = 0.5
  • Pr { | Ψ } = 0.2 ≤ H( ) = 0.3

Optimal (final) policy π*: π*(α, s0) = 1, π*(α, s0) = 0, π*(α, s0) = 0 (one value per action α at s0)

Entailed feasibility policy constraint: π(α, s0) ≤ 0.2 for the action α with H( ) = 0.5
Otherwise, e.g. with π(α, s0) = 0.21 for that action, the action α with H( ) = 1 would need
0.21 ⋅ 0.5 + π(α, s0) ⋅ 1 ≥ 0.9 ⇒ π(α, s0) ≥ 0.795
But 0.21 + 0.795 = 1.005 > 1

How to compute H( ) with NBAs
1. Ψ’ := Ψ ∧ “finite extension semantics”
2. Compute NBA B for Ψ’
3. Trace B to find - states (overapproximation)
4. Trace B from - states as initial states to Goal
  • using relaxed actions from S consistent with trace
  • as a SSP T
5. Solve T putting 1 unit of flow into - states
6. Get H( ) from flow into Goal
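The entailed constraint π(α, s0) ≤ 0.2 can be reproduced with a few lines of arithmetic. The sketch below assumes the heuristic values 1, 0.5 and 0.3 are upper bounds on the probability of satisfying Ψ after each action, and that the rest of the probability mass is placed on the most promising action. The function name `max_action_probability` is hypothetical.

```python
from fractions import Fraction


def max_action_probability(h, j, threshold):
    """Largest pi_j such that sum_i pi_i * h[i] >= threshold, sum_i pi_i = 1, pi_i >= 0.

    h[i] is an optimistic bound on the probability of satisfying the formula
    after action i. Returns None if even the best action cannot reach the threshold.
    """
    h_max = max(h)
    if h_max < threshold:
        return None
    if h[j] >= threshold:
        return Fraction(1)
    # Put the remaining mass 1 - pi_j on the best action:
    # pi_j * h[j] + (1 - pi_j) * h_max >= threshold
    return (h_max - threshold) / (h_max - h[j])


h = [Fraction(1), Fraction(1, 2), Fraction(3, 10)]  # heuristic values from the slide
threshold = Fraction(9, 10)                         # P >= 0.9

bound = max_action_probability(h, 1, threshold)     # action with H = 0.5

# The slide's counterexample: 0.21 on that action, everything else on the H = 1 action.
p = Fraction(21, 100)
best_case = p * h[1] + (1 - p) * h[0]
```

Here `bound` is 1/5, and `best_case` is 179/200 = 0.895, which falls short of 0.9. This is the same contradiction the slide states as 0.21 + 0.795 > 1.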

slide-74
SLIDE 74

Experiment: Wall-E and Eve

[Figure: hallway with rooms 1, 2, 3, 4, …, n and goal location G]

  • Goal: Wall-E at G
  • Constraints:
    1. Wall-E and Eve must eventually be together (P ≥ 0.5)
    2. Eve must be in a room until they are together (P ≥ 0.8)
    3. Once together, they eventually stay together (P = 1)
    4. Eve must visit the rooms 1, 2, and 3 (P = 1)
    5. Wall-E never visits a room twice (P ≥ 0.8)
slide-76
SLIDE 76

Experiments - Wall-E

  • NBA heur: full heuristics, may yield “many” states
  • NBA heur (100): use trivial heuristics if > 100 states in NBA
  • Good also for progression: violated LTL constraints detected early by simplification

Wall-E never visits room1 twice (ψ3):
G (wall-E_room1 ⇒ (wall-E_room1 U G ¬wall-E_room1))
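To see what the formula for "never visits room1 twice" accepts, here is a small evaluator over finite traces. It is a sketch under an assumed finite-trace semantics (G ranges over the remaining positions of the trace, U is strong until), which need not coincide with the talk's “finite extension semantics”. The tuple encoding of formulas and the function name `holds` are illustrative choices.

```python
def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a finite trace (a list of sets of atoms)."""
    op = formula[0]
    if op == "atom":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "implies":
        return (not holds(formula[1], trace, i)) or holds(formula[2], trace, i)
    if op == "G":  # holds at every remaining position
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "U":  # strong until: right side eventually holds, left side holds before
        return any(
            holds(formula[2], trace, k)
            and all(holds(formula[1], trace, j) for j in range(i, k))
            for k in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op}")


r = ("atom", "wall-E_room1")
# G (wall-E_room1 => (wall-E_room1 U G not wall-E_room1))
psi3 = ("G", ("implies", r, ("U", r, ("G", ("not", r)))))

one_visit = [set(), {"wall-E_room1"}, {"wall-E_room1"}, set(), set()]
two_visits = [{"wall-E_room1"}, set(), {"wall-E_room1"}, set()]

ok = holds(psi3, one_visit)
violated = not holds(psi3, two_visits)
```

A single contiguous stay in room1 satisfies the formula, while leaving and re-entering violates it.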

slide-77
SLIDE 77

Experiments - Factory

slide-78
SLIDE 78

Conclusion

Summary

  • Policy synthesis algorithm for multi-objective PLTL constraints Ψ = P1 ψ1 ∧ ⋯ ∧ Pk ψk
    Resulting history-independent (Markovian) policy over cross-product state space converts to finite-memory policy in the standard way
  • Tseitin-style progression
    Better worst-case complexity: single-exponential (vs double-exponential) in |Ψ|

  • NBA-based A*-like heuristics
  • “Promising experiments”
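For readers unfamiliar with progression, the sketch below shows plain textbook-style formula progression with constant simplification. It is explicitly not the Tseitin-style variant summarised above, and the tuple encoding and function names are assumptions made for illustration. It does show how a violated constraint collapses to false as soon as the offending state is seen.

```python
TRUE, FALSE = ("true",), ("false",)


def neg(f):
    if f == TRUE:
        return FALSE
    if f == FALSE:
        return TRUE
    return ("not", f)


def conj(f, g):
    if FALSE in (f, g):
        return FALSE
    if f == TRUE:
        return g
    if g == TRUE:
        return f
    return ("and", f, g)


def disj(f, g):
    if TRUE in (f, g):
        return TRUE
    if f == FALSE:
        return g
    if g == FALSE:
        return f
    return ("or", f, g)


def progress(f, state):
    """Return the formula that the rest of the run must satisfy after seeing `state`."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "atom":
        return TRUE if f[1] in state else FALSE
    if op == "not":
        return neg(progress(f[1], state))
    if op == "and":
        return conj(progress(f[1], state), progress(f[2], state))
    if op == "or":
        return disj(progress(f[1], state), progress(f[2], state))
    if op == "X":
        return f[1]
    if op == "F":
        return disj(progress(f[1], state), f)
    if op == "G":
        return conj(progress(f[1], state), f)
    if op == "U":
        return disj(progress(f[2], state), conj(progress(f[1], state), f))
    raise ValueError(f"unknown operator: {op}")


room1 = ("atom", "wall-E_room1")
never_room1 = ("G", ("not", room1))  # hypothetical constraint, for illustration only

unchanged = progress(never_room1, set())
violated = progress(never_room1, {"wall-E_room1"})
satisfied = progress(("F", room1), {"wall-E_room1"})
```

Here `unchanged` is the original formula, `violated` is the constant false, and `satisfied` is the constant true.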

Future Work

  • Implement progression in full
  • Heuristics based on progression (vs NBA)
  • Multi-objective PLTL verification (on infinite runs) based on progression
  • Quantification over finite domains. Non-probabilistic case: [Baier & McIlraith 2006]
  • Beyond PLTL, e.g. P>0.8 G (A → P>0.4 F B)
