Path Finding under Uncertainty through Probabilistic Inference - PowerPoint PPT Presentation



SLIDE 1

Path Finding under Uncertainty through Probabilistic Inference

David Tolpin, Jan Willem van de Meent, Brooks Paige, Frank Wood
University of Oxford
June 8th, 2015
Paper: http://arxiv.org/abs/1502.07314
Slides: http://offtopia.net/ctp-pp-slides.pdf

SLIDE 2

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 3

Intuition

Probabilistic program:

◮ A program with random computations.
◮ Distributions are conditioned by ‘observations’.
◮ Values of certain expressions are ‘predicted’: these form the output.

Can be written in any language (extended by sample and observe).

SLIDE 4

Example: Model Selection

(let [;; Model
      dist (sample (categorical [[normal 1/4] [gamma 1/4]
                                 [uniform-discrete 1/4]
                                 [uniform-continuous 1/4]]))
      a (sample (gamma 1 1))
      b (sample (gamma 1 1))
      d (dist a b)]

  ;; Observations
  (observe d 1)
  (observe d 2)
  (observe d 4)
  (observe d 7)

  ;; Explanation
  (predict :d (type d))
  (predict :a a)
  (predict :b b))

SLIDE 5

Definition

A probabilistic program is a stateful deterministic computation P:

◮ Initially, P expects no arguments.
◮ On every call, P returns
  ◮ a distribution F,
  ◮ a distribution and a value (G, y),
  ◮ a value z,
  ◮ or ⊥.
◮ Upon returning F, P expects x ∼ F.
◮ Upon returning ⊥, P terminates.

A program is run by calling P repeatedly until termination. The probability of each trace is

p_P(x) ∝ ∏_{i=1}^{|x|} p_{F_i}(x_i) ∏_{j=1}^{|y|} p_{G_j}(y_j).
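As an illustration of this trace probability, here is a minimal plain-Python sketch (not the paper's Anglican formalism; `log_trace_prob` and the Bernoulli helper are hypothetical names) that scores a trace as the sum of sampled and observed log-densities:

```python
import math

def log_trace_prob(samples, observes):
    """log p_P(x): sum of sampled log-densities p_Fi(xi)
    plus observed log-densities p_Gj(yj), up to a constant."""
    return (sum(logpdf(x) for logpdf, x in samples) +
            sum(logpdf(y) for logpdf, y in observes))

def bernoulli_logpdf(p):
    """Log-density of a Bernoulli(p) outcome."""
    return lambda x: math.log(p if x else 1 - p)

# A trace with two sampled values and one observation.
samples = [(bernoulli_logpdf(0.5), True), (bernoulli_logpdf(0.3), False)]
observes = [(bernoulli_logpdf(0.9), True)]
lp = log_trace_prob(samples, observes)  # log(0.5 * 0.7 * 0.9)
```

Each (distribution, value) pair plays the role of one (F_i, x_i) or (G_j, y_j) term in the product above.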

SLIDE 6

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 7

Inference Objective

◮ Continuously generate a sequence of samples drawn from the distribution of the output expression, so that someone else can put it to good use (vague but common).

◮ Approximately compute integrals of the form

  Φ = ∫_{−∞}^{∞} φ(x) p(x) dx

◮ Suggest the most probable explanation (MPE): the most likely assignment to all non-evidence variables given the evidence.
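The integral Φ can be approximated by plain Monte Carlo, averaging φ over draws from p. A small illustrative Python sketch (the function names are made up for this example):

```python
import random

def mc_estimate(phi, sampler, n=100_000):
    """Estimate Phi = E_p[phi(x)] by averaging phi over n draws from p."""
    return sum(phi(sampler()) for _ in range(n)) / n

# Example: E[x^2] for x ~ Uniform(0, 1) is exactly 1/3.
random.seed(0)
est = mc_estimate(lambda x: x * x, random.random)
```

Samples generated by a probabilistic programming system can be consumed in exactly this averaged form.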

SLIDE 10

Example: Inference Results

[Figure: bar chart of sampled distribution types (gamma, normal, uniform-discrete, uniform-continuous) vs. sample count, and histograms of a and b vs. sample count.]

[(let [dfreqs (frequencies (map :d predicts))]
   (plot/bar-chart
    (map (comp #(str/replace % #"class embang.runtime.(.*)-distribution" "$1")
               str first)
         dfreqs)
    (map second dfreqs)
    :plot-size 600 :aspect-ratio 4 :y-title "sample count"))
 (plot/histogram (map :a predicts) :x-title "a" :bins 30
                 :plot-size 250 :aspect-ratio 1.5 :y-title "sample count")
 (plot/histogram (map :b predicts) :x-title "b" :bins 30
                 :plot-size 250 :aspect-ratio 1.5)]

SLIDE 11

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 12

Connection between MAP and Shortest Path

Maximizing the (logarithm of) trace probability

log p_P(x) = ∑_{i=1}^{|x|} log p_{F_i}(x_i) + ∑_{j=1}^{|y|} log p_{G_j}(y_j) + C

corresponds to finding the shortest path in a graph G = (V, E):

◮ V = {(F_i, x_i)} ∪ {(G_j, y_j)}.
◮ Edge costs are − log p_{F_i}(x_i) or − log p_{G_j}(y_j).

[Diagram: path (F1, x1) → (G1, y1) → (F2, x2) with edge costs − log p_{F1}(x1), − log p_{G1}(y1), − log p_{F2}(x2).]
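This correspondence can be exercised directly: encode each candidate value as a node, weight edges by − log p, and run any shortest-path algorithm; the cheapest path is then the most probable trace. A hypothetical Python sketch using Dijkstra's algorithm on a toy two-choice trace (the graph and probabilities are invented for illustration):

```python
import heapq
import math

def shortest_path_cost(graph, src, dst):
    """Dijkstra's algorithm; costs -log p are non-negative for p <= 1."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf

# Toy trace: sample x1 in {a, b}, then one observation; edge cost = -log p.
graph = {
    "start": [("x1=a", -math.log(0.8)), ("x1=b", -math.log(0.2))],
    "x1=a": [("end", -math.log(0.5))],
    "x1=b": [("end", -math.log(0.9))],
}
map_prob = math.exp(-shortest_path_cost(graph, "start", "end"))
# Most probable trace goes through x1=a with probability 0.8 * 0.5 = 0.4.
```

Minimizing the summed − log p costs along a path is exactly maximizing the product of probabilities along that trace.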

SLIDE 13

Marginal MAP as Policy Learning

In marginal MAP, an assignment to a part of the trace x_θ is inferred. In a probabilistic program:

◮ x_θ becomes the program output z.
◮ z is marginalized over x ∖ x_θ.
◮ x_θ^MAP = arg max p_P(z).

Determining x_θ^MAP corresponds to learning a policy x_θ which minimizes the expected path length

E_{x ∖ x_θ} [ − ∑_{i=1}^{|x_θ|} log p_{F_i^θ}(x_i^θ) − ∑_{j=1}^{|y|} log p_{G_j}(y_j) ]

SLIDE 14

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 15

Policy Learning through Probabilistic Inference

Require: agent, Instances, Policies
1: instance ← Draw(Instances)
2: policy ← Draw(Policies)
3: cost ← Run(agent, instance, policy)
4: Observe(1, Bernoulli(e^{−cost}))
5: Print(policy)

The log probability of the output policy is

log p_P(policy) = −cost(policy) + log p_{Policies}(policy) + C.

When policies are drawn uniformly,

log p_P(policy) = −cost(policy) + C′.
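As a sketch of how the Observe(1, Bernoulli(e^{−cost})) step reweights policies, here is a self-normalized importance-sampling loop in plain Python (not the paper's inference engine; the policy set and cost function are invented for illustration):

```python
import math
import random

def infer_policy(policies, run_cost, n=10_000):
    """Uniform prior over policies; each draw is weighted by exp(-cost),
    mirroring the Observe(1, Bernoulli(e^-cost)) step."""
    weights = {p: 0.0 for p in policies}
    for _ in range(n):
        p = random.choice(policies)
        weights[p] += math.exp(-run_cost(p))
    return max(weights, key=weights.get)  # policy with highest posterior mass

# Invented example: "fast" costs about 1 per run, "slow" about 3.
random.seed(1)
base = {"fast": 1.0, "slow": 3.0}
best = infer_policy(["fast", "slow"], lambda p: base[p] + random.random())
```

Low-cost policies accumulate exponentially more weight, so the posterior concentrates on short paths, as in the slide's derivation.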

SLIDE 16

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 17

Canadian Traveller Problem

CTP is the problem of finding the shortest travel distance in a graph where some edges may be blocked. Given

◮ an undirected weighted graph G = (V, E),
◮ the initial and final location nodes s and t,
◮ edge weights w : E → R,
◮ traversability probabilities p_o : E → (0, 1],

find the shortest travel distance from s to t, i.e. the sum of the weights of all traversed edges.
SLIDE 18

The Simplest CTP Instance — Two Roads

Given

◮ two roads with probabilities of being open p1 and p2,
◮ costs of each road c1 and c2,
◮ the cost of bumping into a blocked road cb,

learn the optimum policy q.

(defquery tworoads
  (loop []
    (let [o1 (sample (flip p1))
          o2 (sample (flip p2))]
      (if (not (or o1 o2)) (recur)
          (let [q (sample (uniform-continuous 0. 1.))
                s (sample (flip (- 1 q)))]
            (let [distance (if s (if o1 c1 (+ c2 cb))
                                 (if o2 c2 (+ c1 cb)))]
              (observe +factor+ (- distance))
              (predict :q q)))))))
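Outside the inference engine, the effect of the policy parameter q can be checked by direct simulation of the same generative process (a plain-Python sketch with made-up road parameters, not the paper's code):

```python
import random

def expected_distance(q, p1, p2, c1, c2, cb, n=20_000):
    """Monte Carlo estimate of the expected travel distance under policy q,
    conditioned on at least one road being open (the query's recur)."""
    total, kept = 0.0, 0
    for _ in range(n):
        o1, o2 = random.random() < p1, random.random() < p2
        if not (o1 or o2):
            continue  # rejected trace, like (recur)
        s = random.random() < 1 - q  # s true -> try road 1 first
        d = (c1 if o1 else c2 + cb) if s else (c2 if o2 else c1 + cb)
        total += d
        kept += 1
    return total / kept

# With road 1 cheap and usually open, q near 0 (always try road 1) wins.
random.seed(0)
d_low = expected_distance(0.0, p1=0.9, p2=0.9, c1=1.0, c2=5.0, cb=2.0)
d_high = expected_distance(1.0, p1=0.9, p2=0.9, c1=1.0, c2=5.0, cb=2.0)
```

The posterior over q produced by the query concentrates on whichever extreme yields the smaller expected distance in this simulation.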

SLIDE 19

Learning Stochastic Policy for CTP

Depth-first search based policy:

◮ The agent traverses G in depth-first order.
◮ The policy specifies the probabilities of selecting each adjacent edge at every node.

Require: CTP(G, s, t, w, p)
1: for v ∈ V do
2:   policy(v) ← Draw(Dirichlet(1_{deg(v)}))
3: end for
4: repeat
5:   instance ← Draw(CTP(G, w, p))
6:   (reached, distance) ← StDFS(instance, policy)
7: until reached
8: Observe(1, Bernoulli(e^{−distance}))
9: Print(policy)
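The stochastic DFS step can be sketched in plain Python (a simplified, hypothetical StDFS: edge status is assumed visible at each node, and the policy weights are fixed here rather than drawn from a Dirichlet as in the algorithm above):

```python
import random

def edge(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u < v else (v, u)

def stdfs(adj, w, blocked, policy, s, t):
    """Stochastic DFS for CTP: at each node, the next untried open edge is
    drawn with probability proportional to the node's policy weight for it.
    Traversing an edge (and backtracking over it) adds its weight."""
    visited = {s}

    def go(u, dist):
        if u == t:
            return True, dist
        nbrs = [v for v in adj[u]
                if v not in visited and edge(u, v) not in blocked]
        while nbrs:
            ws = [policy[u][v] for v in nbrs]
            v = random.choices(nbrs, weights=ws)[0]
            nbrs.remove(v)
            visited.add(v)
            reached, d = go(v, dist + w[edge(u, v)])
            if reached:
                return True, d
            dist = d + w[edge(u, v)]  # backtrack over the same edge
        return False, dist

    return go(s, 0.0)

# Toy instance: s-a-t and s-b-t, with edge s-b blocked.
adj = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
w = {edge("s", "a"): 1, edge("s", "b"): 1,
     edge("a", "t"): 1, edge("b", "t"): 1}
policy = {u: {v: 1.0 for v in adj[u]} for u in adj}
random.seed(0)
reached, distance = stdfs(adj, w, {edge("s", "b")}, policy, "s", "t")
```

In the full algorithm, the per-node weights would be a Dirichlet draw, and (reached, distance) would feed the Bernoulli observation.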

SLIDE 20

Inference Results — CTP Travel Graphs

Learned policies:

◮ open fraction 1.0
◮ open fraction 0.9
◮ open fraction 0.8
◮ open fraction 0.7
◮ open fraction 0.6

Line widths indicate the frequency of travelling each edge.

SLIDE 21

Outline

◮ Probabilistic Programming
◮ Inference
◮ Path Finding and Probabilistic Inference
◮ Stochastic Policy Learning
◮ Case Study: Canadian Traveller Problem
◮ Summary

SLIDE 22

Summary

◮ Discovery of a bilateral correspondence between probabilistic inference and policy learning for path finding.

◮ A new approach to policy learning based on the established correspondence.

◮ A realization of the approach for the Canadian traveller problem, where improved policies were consistently learned by probabilistic program inference.

SLIDE 23

Thank You