SLIDE 1

(Near)-optimal policies for Probabilistic IPC 2018 domains

Brikena Çelaj

Department of Mathematics and Computer Science University of Basel

June 2020

Brikena Çelaj (Near)-optimal policies for Probabilistic IPC 2018 domains 1 / 45

SLIDE 2

Introduction

  • The International Planning Competition (IPC) is a competition of state-of-the-art planning systems.
  • The quality of the planners is measured in terms of the IPC score.
  • The evaluation metric is flawed without an optimal upper bound.
  • Thesis aim and motivation: contribute to the IPC evaluation metric by finding near-optimal solutions for two domains:
    • Academic Advising
    • Chromatic Dice

SLIDE 3

Academic Advising

  • Academic Advising Domain
  • Relevance Analysis
  • Mapping to Classical Planning
  • Results

SLIDE 4

Academic Advising Domain

Semester  No.       Title                                   Lecturers                      CP
fs        15731-01  Multimedia Retrieval                    Roger Weber                    6
ss        13548-01  Foundation of Artificial Intelligence   Malte Helmert, Thomas Keller   8
fs        45400-01  Planning and Optimization               Thomas Keller, Gabriele Röger  8
fs        45401-01  Bioinformatics Algorithms               Volker Roth                    4
ss        17165-01  Machine Learning                        Volker Roth                    8
ss        10948-01  Theory of Computer Science              Gabriele Röger                 8

SLIDE 5

Academic Advising Domain

Semester  No.       Title                                   Lecturers                      CP
fs        15731-01  Multimedia Retrieval                    Roger Weber                    6
ss        13548-01  Foundation of Artificial Intelligence   Malte Helmert, Thomas Keller   8
fs        45400-01  Planning and Optimization               Thomas Keller, Gabriele Röger  8
fs        45401-01  Bioinformatics Algorithms               Volker Roth                    4
ss        17165-01  Machine Learning                        Volker Roth                    8
ss        10948-01  Theory of Computer Science              Gabriele Röger                 8

Prerequisite: Theory of Computer Science → Foundation of Artificial Intelligence

SLIDE 6

Academic Advising Domain

  • The smallest instance has more than a trillion states.
  • The hardest instance has around 10^167 states and around 10^12 actions.
  • First step toward a solution: Relevance Analysis!

SLIDE 7

Relevance Analysis

1. An instance is represented by a directed acyclic graph (DAG):
   Nodes → courses; Edges → connect a course to its prerequisites

SLIDE 8

Relevance Analysis

Example: Academic Advising Instance

[DAG figure: courses C03, C04, C00, C10, C12, C01, C02, C13, C11, C21, C31, C20, C32, C30, C22]

SLIDE 9

Relevance Analysis

1. An instance is represented by a directed acyclic graph (DAG):
   Nodes → courses; Edges → connect a course to its prerequisites

2. In each iteration, find the leaves of the graph.

SLIDE 10

Relevance Analysis

1. An instance is represented by a directed acyclic graph (DAG):
   Nodes → courses; Edges → connect a course to its prerequisites

2. In each iteration, find the leaves of the graph.
3. Prune any leaf that is not among the program's required courses.
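The pruning loop above can be sketched in a few lines (a minimal sketch: the dictionary-based graph encoding and the example instance are illustrative assumptions, not the thesis implementation):

```python
def prune_irrelevant(prereqs_of, required):
    """Iteratively drop leaf courses that the program does not require.

    prereqs_of maps each course to the set of its prerequisites;
    a leaf is a course that no remaining course depends on.
    """
    courses = set(prereqs_of)
    while True:
        used = {p for c in courses for p in prereqs_of.get(c, set())}
        removable = (courses - used) - required  # non-required leaves
        if not removable:
            return courses
        courses -= removable

# Illustrative instance: only C30 is required; C04 has no dependents.
prereqs = {"C30": {"C20"}, "C20": {"C11"}, "C11": set(), "C04": set()}
print(sorted(prune_irrelevant(prereqs, {"C30"})))  # ['C11', 'C20', 'C30']
```

Note that prerequisites of a required course are never leaves while that course remains, so everything the program transitively needs survives.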

SLIDE 11

Relevance Analysis

First iteration

[DAG figure: C03, C04, C00, C10, C12, C01, C02, C13, C11, C21, C31, C20, C32, C30, C22]

SLIDE 12

Relevance Analysis

Second iteration

[DAG figure: C03, C04, C00, C01, C02, C13, C11, C21, C20, C30, C22]

SLIDE 13

Relevance Analysis

Third iteration

[DAG figure: C01, C02, C13, C11, C20, C30, C22]

SLIDE 14

Relevance Analysis

[DAG figure: C03, C04, C00, C10, C12, C01, C02, C13, C11, C21, C31, C20, C32, C30, C22]

SLIDE 15

Relevance Analysis

  • After shrinking, we have, on average, half the number of courses.
  • The hardest instance now has around 10^46 states and 10^9 actions.
  • Still too large to find an optimal solution!
  • Next step: Mapping to Classical Planning!

SLIDE 16

Mapping to Classical Planning

In the Academic Advising domain:

  • There are no dead ends.
  • If the horizon h is infinite, any optimal policy will try to reach a state where the program requirement is complete.
  • If the concurrency σ is one, each action has two outcomes (succeed or fail).

SLIDE 17

Mapping to Classical Planning

Assumption: h = ∞, σ = 1.

[Figure: probabilistic action A from the initial state I: success with probability 0.2, failure with probability 0.8, reward −2 per attempt]

SLIDE 18

Mapping to Classical Planning

Assumption: h = ∞, σ = 1.

[Figure: probabilistic action A (success probability 0.2, reward −2 per attempt)]

[Figure: induced deterministic action A with cost 10, since the expected total reward of repeating A until it succeeds is −2/0.2 = −10]
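The conversion replaces a stochastic action with a deterministic one whose cost is the expected total cost of repeating the action until it succeeds. A quick sanity check (a sketch; the function name is ours, not the thesis code):

```python
def induced_cost(cost_per_attempt, p_success):
    """Expected total cost of repeating an action until it succeeds.

    The number of attempts is geometrically distributed with mean
    1 / p_success, so the expected cost is cost_per_attempt / p_success.
    """
    return cost_per_attempt / p_success

print(induced_cost(2, 0.2))  # expected cost 10, matching the figure
```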

SLIDE 19

Mapping to Classical Planning

Academic Advising domain example

[Figure: example instance with courses A, C, and D]

SLIDE 20

Mapping to Classical Planning

Academic Advising domain converted into a classical domain

[Figure: induced classical planning task. States: I, A, C, D, C ∧ D, A ∧ D, A ∧ C, A ∧ C ∧ D. Operators: take-A, take-C, take-D, take-C-given-A, take-D-given-A, take-A-given-C, take-D-given-C, take-D-given-A-and-C]

SLIDE 21

Mapping to Classical Planning

Theorem. For all Academic Advising instances with σ = 1 and h = ∞, and π an optimal plan for the induced classical planning task, we have

V*(s0, ∞) = −cost(π)

SLIDE 22

Mapping to Classical Planning

  • In most of the instances, σ > 1!
  • Question: Why is it not simple to map to Classical Planning when σ > 1?
  • Answer: We no longer have only two outcomes (succeed or fail)!
  • Solution: Ignore that courses can be taken in parallel, and divide the cost of the plan by σ.

SLIDE 23

Example: σ = 2

[Figure: remaining courses C01, C02, C11, C20, C30, C22]

  • Assume we always perform as many actions as the concurrency allows,
  • Assume we take only courses whose prerequisites have all already been passed.

SLIDE 24

Example: σ = 2

[Figure: remaining courses C01, C02, C11, C20, C30, C22]

  • Assume we always perform as many actions as the concurrency allows,
  • Assume we take only courses whose prerequisites have all already been passed.

SLIDE 25

Example: σ = 2

[Figure: remaining courses C01, C02, C11, C20, C30, C22]

  • Assume we always perform as many actions as the concurrency allows,
  • Assume we take only courses whose prerequisites have all already been passed.

SLIDE 26

Mapping to Classical Planning

Theorem. For all Academic Advising instances with σ > 1 and h = ∞, and π an optimal plan for the induced classical planning task, we have

V*(s0, ∞) ≥ −cost(π)/σ

SLIDE 27

Mapping to Classical Planning

  • In practice, the horizon is finite!
  • If we don't expect to achieve the goal in time, it is better to do nothing than to apply an operator.
  • Applying an operator incurs cost.

SLIDE 28

Mapping to Classical Planning

  • Question: Can we also deal with cases where the horizon h is finite?
  • Answer: No, but we can come up with good estimates!
  • Solution: Compare the optimal policy with the noop policy!

SLIDE 29

Mapping to Classical Planning

Result. For all Academic Advising instances with finite horizon h, and π an optimal plan for the induced classical planning task, we have

V*(s0, h) ≈ max(−cost(π)/σ, h · penalty)

SLIDE 30

Results

Instance  Concurrency  Horizon  Our Results  SOGBOFA  PROST-DD
01        1            20       25           48.4     47.13
02        2            20       15           63.13    49.93
03        1            20       20           35.2     37.8
04        1            20       21.87        79.18    39.48
05        2            20       26.63        100.0    90.12
06        1            30       55           82.86    83.46
07        2            30       40.98        150.0    188.96
08        2            30       30.41        150.0    182.84
09        1            30       25           66.53    86.33
10        2            30       42           150.0    200.24
11        3            40       34.09        200.0    —
12        2            40       36.51        200.0    215.2
13        2            40       42.57        200.0    282.48
14        3            40       44.24        200.0    —
15        2            40       53.09        200.0    —
16        3            50       52.79        250.0    —
17        4            50       41.8         250.0    —
18        3            50       44.74        250.0    —
19        4            50       45.59        250.0    —
20        5            50       35.35        250.0    —
SLIDE 31

Chromatic Dice

  • Chromatic Dice Domain
  • Implementation Strategy
  • Chromatic Dice Structure
  • Near-optimal Strategy
  • Results

SLIDE 32

Yahtzee

SLIDE 33

Chromatic Dice Domain

Chromatic Dice is similar to Yahtzee with some differences:

1. Dice are two-dimensional (values and colors).
2. There are more categories.
3. There are two different types of bonuses.

SLIDE 34

Chromatic Dice Structure

The Chromatic Dice structure is as follows:

  • The state space can be structured into rounds
  • Each round consists of 3 roll operators and 1 assign operator
  • Roll operators roll (a subset of) the dice
  • Assign operators select an unassigned category and yield a reward
  • The number of rounds is equal to the number of categories

SLIDE 35

Chromatic Dice Architecture

[Figure: one round over categories C1, C6, C3oak, C2p, Bv, ...: phase 1: roll, phase 2: re-roll, phase 3: re-roll (micro-steps), phase 4: assign (a macro-step)]

SLIDE 36

Macro-steps

[Figure: macro-step transitions. From a scorecard with open categories C1, C6, C3oak, C2p, Bv, each assign operator (assign-C1, assign-C6, assign-C3oak, ...) marks one category as taken and leads to a successor scorecard.]

  • The dice remain the same while we perform Macro-steps.

SLIDE 37

Macro-step State Space

  • A naive representation considers all information on the scorecard:
    Yahtzee: 2^37 states; Chromatic Dice: 2^74 states
  • Computation of an optimal strategy is possible with a much more compact representation based on
    1. which categories are still available (is the category taken or not),
    2. the bonus level of the upper and the middle section.
    Yahtzee: 2^19 states; Chromatic Dice: 2^29 states
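For standard Yahtzee, the compact figure can be sanity-checked directly (assuming 13 categories and an upper-section total capped at 63; these are properties of standard Yahtzee, not taken from the slide):

```python
# Compact macro-step state space for standard Yahtzee:
# one bit per category (taken or not) times the relevant bonus levels.
category_subsets = 2 ** 13   # 13 categories, each taken or not
bonus_levels = 64            # upper-section totals 0..63 matter for the bonus
states = category_subsets * bonus_levels
print(states, states == 2 ** 19)  # 524288 True
```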

SLIDE 38

Micro-steps

[Figure: micro-step transitions within one macro-step. From I, roll operators roll1, roll2, roll3 re-roll a chosen subset of the dice (e.g. {d1}, {d2}, {}), each roll leading to n equally likely outcomes with probability 1/n each.]

  • The categories and the levels of the bonuses remain the same while we perform Micro-steps.

SLIDE 39

Micro-steps

We can reduce the problem by:

  • Shrinking the Micro-step state space: the order of the dice does not matter, so a roll can be stored as a multiset (e.g. three dice of one kind and two of another).
  • Shrinking the Micro-step edges: keeping a die and re-rolling it to the same value lead to the same state, so it suffices to choose how many dice of each kind to re-roll (e.g. 0, 1, 2, or 3 of the first kind and 0, 1, or 2 of the second).
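The effect of treating a roll as a multiset instead of an ordered tuple can be quantified for plain six-sided dice (an illustration only; Chromatic Dice adds colors, which enlarges both counts):

```python
from itertools import combinations_with_replacement, product

# Five ordered six-sided dice vs. the same dice as an unordered multiset.
ordered = sum(1 for _ in product(range(1, 7), repeat=5))                   # 6^5
unordered = sum(1 for _ in combinations_with_replacement(range(1, 7), 5))  # C(10, 5)
print(ordered, unordered)  # 7776 252
```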

SLIDE 40

Micro-step State Space and Edges

  • Naive representation:

              Yahtzee  Chromatic Dice
    States    2^12     2^24
    Edges     2^13     2^24

  • Compact representation:

              Yahtzee  Chromatic Dice
    States    2^10     2^18
    Edges     2^12     2^23

SLIDE 41

Backtracking Method

  • Our state space is a DAG.
  • Benefit of a DAG: we can compute the policy with a backtracking (backward induction) method.
  • How? Initialize the state values in the last layer with 0 and go backward up to the initial state, updating all state values along the way.
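A minimal sketch of this backward pass over a layered DAG (the state and action encoding here is an illustrative assumption, not the thesis implementation):

```python
def backward_induction(layers, actions):
    """Optimal state values, computed from the last layer backward.

    actions maps a state to {action_name: [(prob, reward, successor), ...]};
    states without actions are terminal and get value 0.
    """
    value = {}
    for layer in reversed(layers):
        for state in layer:
            outcomes_per_action = actions.get(state, {})
            if not outcomes_per_action:
                value[state] = 0.0  # last layer / terminal states
            else:
                value[state] = max(
                    sum(p * (r + value[succ]) for p, r, succ in outcomes)
                    for outcomes in outcomes_per_action.values()
                )
    return value

# Tiny example: from s0, action "a" succeeds (p=0.8, reward 1) or fails.
layers = [["s0"], ["goal", "fail"]]
actions = {"s0": {"a": [(0.8, 1.0, "goal"), (0.2, 0.0, "fail")]}}
print(backward_induction(layers, actions)["s0"])  # 0.8
```

Because successors always lie in later layers of the DAG, every value on the right-hand side is already known when a state is processed.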

SLIDE 42

Optimal Results

Instance  Macro-steps  Instance edges  Optimal Results  SOGBOFA  PROST-DD
01        ≈ 2^11       ≈ 2^14          72.51            48.89    37.87
02        ≈ 2^15       ≈ 2^14          160.03           182.17   71.69
03        ≈ 2^16       ≈ 2^14          216.88           142.21   108.61
04        ≈ 2^19       ≈ 2^14          279.42           247.52   119.45
05        ≈ 2^13       ≈ 2^18          154.40           118.31   108.0
06        ≈ 2^21       ≈ 2^18          —                325.84   150.47
07        ≈ 2^25       ≈ 2^18          —                402.53   203.36
08        ≈ 2^14       ≈ 2^21          147.17           120.29   94.53
09        ≈ 2^22       ≈ 2^21          —                313.79   144.49
10        ≈ 2^26       ≈ 2^21          —                370.13   193.27
11        ≈ 2^15       ≈ 2^23          155.48           108.43   86.93
12        ≈ 2^27       ≈ 2^23          —                355.01   162.13
13        ≈ 2^16       ≈ 2^24          —                115.63   86.2
14        ≈ 2^21       ≈ 2^24          —                204.92   74.63
15        ≈ 2^29       ≈ 2^25          —                402.25   159.48
16        ≈ 2^29       ≈ 2^14          —                441.05   227.68
17        ≈ 2^29       ≈ 2^14          —                414.27   236.28
18        ≈ 2^29       ≈ 2^14          —                450.97   211.73
19        ≈ 2^29       ≈ 2^14          —                423.39   205.24
20        ≈ 2^29       ≈ 2^14          —                452.44   208.96

SLIDE 43

Near-optimal Strategy

  • We can find the optimal solution only for small instances, because the state space is large.
  • For harder instances, finding an optimal solution is intractable in practice.
  • Solution: Heuristic Strategy

SLIDE 44

Near-optimal Strategy

We generalize the idea of cost partitioning and apply it to finite-horizon MDPs (FH-MDPs), calling it reward partitioning, as follows:

  • We divide an instance into any number of sub-instances.
  • Each category yields its reward in only one of the sub-instances, while in all others its reward is 0.
  • The sum of the optimal rewards of the sub-instances is an admissible expected reward.
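The partitioning step can be sketched abstractly (a hypothetical sketch: solve_subinstance stands for an optimal solver applied to a copy of the instance in which only the given categories yield their reward):

```python
def reward_partitioning(categories, solve_subinstance, k):
    """Admissible bound: split the categories into k sub-instances,
    solve each one optimally, and sum the optimal expected rewards."""
    chunks = [categories[i::k] for i in range(k)]  # each category in exactly one chunk
    return sum(solve_subinstance(chunk) for chunk in chunks)

# Dummy check that every category is credited exactly once.
values = {"C1": 3.0, "C6": 18.0, "C3oak": 14.0}
solver = lambda cats: sum(values[c] for c in cats)
print(reward_partitioning(list(values), solver, 2))  # 35.0
```

The key invariant is that the chunks partition the categories: no category's reward is counted twice, which is what makes the summed bound admissible.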

SLIDE 45

Near-optimal Strategy in Practice

  • Drawback: The horizon is still the same!
  • Drawback: The size of the MDP is almost the same; therefore, it is hard to compute in practice!
  • Solution: A near-optimal solution without the guarantee of admissibility!
  • Solution: Decrease the horizon to the number of categories that are considered in the sub-instance.

SLIDE 46

Heuristic Results

Instance  |Sub-instances|  Our Results  SOGBOFA  PROST-DD
06        3                389.95       325.84   150.47
07        3                496.29       402.53   203.36
09        3                395.49       313.79   144.49
10        3                489.76       370.13   193.27
12        3                480.70       355.01   162.13
13        3                225.70       115.63   86.2
14        3                297.05       204.92   74.63
15        3                500.50       402.25   159.48
16        2                406.43       441.05   227.68
17        2                409.32       414.27   236.28
18        2                381.33       450.97   211.73
19        2                401.63       423.39   205.24
20        2                430.44       452.44   208.96

SLIDE 47

Thank you!
