
CS234 Notes - Lecture 2
Making Good Decisions Given a Model of the World

Rahul Sarkar, Emma Brunskill
March 20, 2018

3 Acting in a Markov decision process

We begin this lecture by recalling the definitions of a model, policy and value function for an agent. Let the agent's state and action spaces be denoted by $S$ and $A$ respectively. We then have the following definitions:

  • Model : A model is the mathematical description of the dynamics and rewards of the agent's environment, which includes the transition probabilities $P(s' \mid s, a)$ of being in a successor state $s' \in S$ when starting from a state $s \in S$ and taking an action $a \in A$, and the rewards $R(s, a)$ (either deterministic or stochastic) obtained by taking an action $a \in A$ when in a state $s \in S$.

  • Policy : A policy is a function $\pi : S \to A$ that maps the agent's states to actions. Policies can be stochastic or deterministic.

  • Value function : The value function $V^\pi$ corresponding to a particular policy $\pi$ and for a state $s \in S$, is the expected cumulative sum of future (discounted) rewards obtained by the agent, by starting from the state $s$ and following the policy.

We also recall the notion of the Markov property from the last lecture. Consider a stochastic process $(s_0, s_1, s_2, \dots)$ evolving according to some transition dynamics. We say that the stochastic process has the Markov property if and only if $P(s_i \mid s_0, \dots, s_{i-1}) = P(s_i \mid s_{i-1})$, $\forall\, i = 1, 2, \dots$, i.e. the transition probability of the next state conditioned on the history including the current state is equal to the transition probability of the next state conditioned only on the current state. In such a scenario, the current state is a sufficient statistic of the history of the stochastic process, and we say that "the future is independent of the past given the present."

In this lecture, we will build on these definitions and proceed in order by first defining a Markov process (MP), followed by the definition of a Markov reward process (MRP), and finally build on both of them to define a Markov decision process (MDP). We will finish this lecture by discussing some algorithms which enable us to make good decisions when an MDP is completely known.

3.1 Markov process

In its greatest generality, a Markov process is a stochastic process that satisfies the Markov property, because of which we say that a Markov process is "memoryless". For the purpose of this lecture, we will make two additional assumptions that are very common in the reinforcement learning setting:

  • Finite state space : The state space of the Markov process is finite. This means that for the Markov process $(s_0, s_1, s_2, \dots)$, there is a state space $S$ with $|S| < \infty$, such that for all realizations of the Markov process, we have $s_i \in S$ for all $i = 1, 2, \dots$.

  • Stationary transition probabilities : The transition probabilities are time independent. Mathematically, this means the following:

$$P(s_i = s' \mid s_{i-1} = s) = P(s_j = s' \mid s_{j-1} = s), \quad \forall\, s, s' \in S,\ \forall\, i, j = 1, 2, \dots \tag{1}$$

Unless otherwise specified, we will always assume that these two properties hold for any Markov process that we will encounter in this lecture, including for any Markov reward process and any Markov decision process to be defined later by adding progressively extra structure to the Markov process. Note that a Markov process satisfying these assumptions is also sometimes called a "Markov chain", although the precise definition of a Markov chain varies.

For the Markov process, these assumptions lead to a nice characterization of the transition dynamics in terms of a transition probability matrix $P$ of size $|S| \times |S|$, whose $(i, j)$ entry is given by $P_{ij} = P(j \mid i)$, with $i, j$ referring to the states of $S$ ordered arbitrarily. It should be noted that the matrix $P$ is a non-negative row-stochastic matrix, i.e. the sum of each row equals 1. Henceforth, we will thus define a Markov process by the tuple $(S, P)$, which consists of the following:

  • $S$ : A finite state space.
  • $P$ : A transition probability model that specifies $P(s' \mid s)$.

Exercise 3.1. (a) Prove that $P$ is a row-stochastic matrix. (b) Show that 1 is an eigenvalue of any row-stochastic matrix, and find a corresponding eigenvector. (c) Show that any eigenvalue of a row-stochastic matrix has absolute value at most 1.

Exercise 3.2. The max-norm or infinity-norm of a vector $x \in \mathbb{R}^n$ is denoted by $\|x\|_\infty$, and defined as $\|x\|_\infty = \max_i |x_i|$, i.e. it is the largest absolute value among the components of $x$. For any matrix $A \in \mathbb{R}^{m \times n}$, define the following quantity

$$\|A\|_\infty = \sup_{x \in \mathbb{R}^n,\ x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}. \tag{2}$$

(a) Prove that $\|A\|_\infty$ satisfies all the properties of a norm. The quantity so defined is called the "induced infinity norm" of a matrix.
(b) Prove that

$$\|A\|_\infty = \max_{i = 1, \dots, m} \left( \sum_{j=1}^{n} |A_{ij}| \right). \tag{3}$$

(c) Conclude that if $A$ is row-stochastic, then $\|A\|_\infty = 1$.
(d) Prove that for every $x \in \mathbb{R}^n$, $\|Ax\|_\infty \leq \|A\|_\infty \|x\|_\infty$.

3.1.1 Example of a Markov process : Mars Rover

To practice our understanding, consider the Markov process shown in Figure 1. Our agent is a Mars rover whose state space is given by $S = \{S1, S2, S3, S4, S5, S6, S7\}$. The transition probabilities of the states are indicated in the figure with arrows. So for example if the rover is in the state $S4$ at the current time step, in the next time step it can go to the states $S3, S4, S5$ with probabilities given by $0.4, 0.2, 0.4$ respectively.

Figure 1: Mars Rover Markov process.

Assuming that the rover starts out in state $S4$, some possible episodes of the Markov process could look as follows:

  - $S4, S5, S6, S7, S7, S7, \dots$
  - $S4, S4, S5, S4, S5, S6, \dots$
  - $S4, S3, S2, S1, \dots$

Exercise 3.3. Consider the example of a Markov process given in Figure 1. (a) Write down the transition probability matrix for the Markov process.
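The definition of a Markov process as a tuple $(S, P)$ can be made concrete with a few lines of code. The sketch below is only an illustration: the 3-state matrix is hypothetical (it is not the Mars rover matrix asked for in Exercise 3.3), and the helper names `is_row_stochastic` and `sample_episode` are assumptions introduced here.

```python
import numpy as np

# Hypothetical 3-state Markov process, chosen only for illustration.
# Row i holds P(j | i), so every row must sum to 1.
P = np.array([
    [0.6, 0.4, 0.0],
    [0.4, 0.2, 0.4],
    [0.0, 0.4, 0.6],
])

def is_row_stochastic(P, tol=1e-12):
    """Check that P is non-negative and that each row sums to 1."""
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

def sample_episode(P, start, length, rng):
    """Sample (s_0, ..., s_{length-1}); each step looks only at the current state."""
    states = [start]
    for _ in range(length - 1):
        # Markov property: the next state is drawn from the row of the current state.
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

rng = np.random.default_rng(0)
episode = sample_episode(P, start=1, length=6, rng=rng)
print(is_row_stochastic(P), episode)
```

Stationarity shows up in the fact that the same matrix `P` is used at every time step of the loop.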

3.2 Markov reward process

A Markov reward process is a Markov process, together with the specification of a reward function and a discount factor. It is formally represented using the tuple $(S, P, R, \gamma)$, whose components are listed below:

  • $S$ : A finite state space.
  • $P$ : A transition probability model that specifies $P(s' \mid s)$.
  • $R$ : A reward function that maps states to rewards (real numbers), i.e. $R : S \to \mathbb{R}$.
  • $\gamma$ : Discount factor between $0$ and $1$.

We have already explained the roles played by $S$ and $P$ in the context of a Markov process. We will next explain the concept of the reward function $R$ and the discount factor $\gamma$, which are specific to the Markov reward process. Additionally, we will also define and explain a few quantities which are important in this context, such as the horizon, return and state value function of a Markov reward process.

3.2.1 Reward function

In a Markov reward process, whenever a transition happens from a current state $s$ to a successor state $s'$, a reward is obtained depending on the current state $s$. Thus for the Markov process $(s_0, s_1, s_2, \dots)$, each transition $s_i \to s_{i+1}$ is accompanied by a reward $r_i$ for all $i = 0, 1, \dots$, and so a particular episode of the Markov reward process is represented as $(s_0, r_0, s_1, r_1, s_2, r_2, \dots)$. We should note that these rewards can be either deterministic or stochastic. For a state $s \in S$, we define the expected reward $R(s)$ by:

$$R(s) = E[r_0 \mid s_0 = s], \tag{4}$$

that is, $R(s)$ is the expected reward obtained during the first transition, when the Markov process starts in state $s$. Just like the assumption of stationary transition probabilities, going forward we will also assume the following:

  • Stationary rewards : The rewards in a Markov reward process are stationary, which means that they are time independent. In the deterministic case, mathematically this means that for all realizations of the process we must have that:

$$r_i = r_j \ \text{ whenever } \ s_i = s_j, \quad \forall\, i, j = 0, 1, \dots, \tag{5}$$

while in the case of stochastic rewards we require that the cumulative distribution functions (cdf) of the rewards conditioned on the current state be time independent. This is written mathematically as:

$$F(r_i \mid s_i = s) = F(r_j \mid s_j = s), \quad \forall\, s \in S,\ \forall\, i, j = 0, 1, \dots, \tag{6}$$

where $F(r_i \mid s_i = s)$ denotes the cdf of $r_i$ conditioned on the state $s_i = s$.

Notice that as a consequence of (5) and (6), we furthermore have the following result about the expected rewards:

$$R(s) = E[r_i \mid s_i = s], \quad \forall\, i = 0, 1, \dots \tag{7}$$

We will see that as long as the "stationary rewards" assumption is true about a Markov reward process, only the expected reward $R$ matters in the things that we will be interested in, and we can dispose of the quantities $r_i$ entirely. Hence going forward, the word "reward" will be used interchangeably to mean both $R$ and $r_i$, and should be easily understood from context. Finally notice that $R$ can be represented as a vector of dimension $|S|$, in the case of a finite state space $S$.

Exercise 3.4. (a) Under the assumptions of stationary transition probabilities and rewards, prove equation (7).

3.2.2 Horizon, Return and Value function

We next define the notions of the horizon, return and value function for a Markov reward process.

  • Horizon : The horizon $H$ of a Markov reward process is defined as the number of time steps in each episode (realization) of the process. The horizon can be finite or infinite. If the horizon is finite, then the process is also called a finite Markov reward process.

  • Return : The return $G_t$ of a Markov reward process is defined as the discounted sum of rewards starting at time $t$ up to the horizon $H$, and is given by the following mathematical formula:

$$G_t = \sum_{i=t}^{H-1} \gamma^{i-t} r_i, \quad \forall\, 0 \leq t \leq H - 1. \tag{8}$$

  • State value function : The state value function $V_t(s)$ for a Markov reward process and a state $s \in S$ is defined as the expected return starting from state $s$ at time $t$, and is given by the following expression:

$$V_t(s) = E[G_t \mid s_t = s]. \tag{9}$$

Notice that when the horizon $H$ is infinite, this definition (9) together with the stationary assumptions of the rewards and transition probabilities imply that $V_i(s) = V_j(s)$ for all $i, j = 0, 1, \dots$, and thus in this case we will define:

$$V(s) = V_0(s). \tag{10}$$


Exercise 3.5. (a) If the assumptions of stationary transition probabilities and stationary rewards hold, and if the horizon $H$ is infinite, then using the definitions in (8) and (9) prove that $V_i(s) = V_j(s)$ for all $i, j = 0, 1, \dots$.

3.2.3 Discount factor

Notice that in the definition of return $G_t$ in (8), if the horizon is infinite and $\gamma = 1$, then the return can become infinite even if the rewards are all bounded. If this happens, then the value function $V(s)$ can also become infinite. Such problems cannot then be solved using a computer. To avoid such mathematical difficulties and make the problems computationally tractable we set $\gamma < 1$, which exponentially weighs down the contribution of rewards at future times in the calculation of the return in (8). This quantity $\gamma$ is called the discount factor. Other than for purely computational reasons, it should be noted that humans behave in much the same way: we tend to put more importance on immediate rewards than on rewards obtained at a later time. The interpretation of $\gamma$ is that when $\gamma = 0$, we only care about the immediate reward, while when $\gamma = 1$, we put as much importance on future rewards as on the present one. Finally, notice that if the horizon of the Markov reward process is finite, i.e. $H < \infty$, then we can set $\gamma = 1$, as the returns and value functions are always finite.

Exercise 3.6. Consider a finite horizon Markov reward process, with bounded rewards. Specifically assume that $\exists\, M \in (0, \infty)$ such that $|r_i| \leq M$ $\forall\, i$ and across all episodes (realizations). (a) Show that the return for any episode $G_t$ as defined in (8) is bounded. (b) Can you suggest a bound? Specifically can you find $C(M, \gamma, t, H)$ such that $|G_t| \leq C$ for any episode?

Exercise 3.7. Consider an infinite horizon Markov reward process, with bounded rewards and $\gamma < 1$. (a) Prove that the return for any episode $G_t$ as defined in (8) converges to a finite limit. Hint: Consider the partial sums $S_N = \sum_{i=t}^{N} \gamma^{i-t} r_i$ for $N \geq t$. Show that $\{S_N\}_{N \geq t}$ is a Cauchy sequence.

3.2.4 Example of a Markov reward process : Mars Rover

As an example, consider the Markov reward process in Figure 2. The states and the transition probabilities of this process are exactly the same as in the Mars rover Markov process example of Exercise 3.3. The reward obtained by executing an action from any of the states $\{S2, S3, S4, S5, S6\}$ is $0$, while any moves from states $S1, S7$ yield rewards $1, 10$ respectively. The rewards are stationary and deterministic. Assume $\gamma = 0.5$ in this example.

For illustration, let us again assume that the rover is initially in state $S4$. Consider the case when the horizon is finite: $H = 4$. A few possible episodes in this case, with the return $G_0$ in each case, are given below:

  - $S4, S5, S6, S7, S7$ : $G_0 = 0 + 0.5 \times 0 + 0.5^2 \times 0 + 0.5^3 \times 10 = 1.25$
  - $S4, S4, S5, S4, S5$ : $G_0 = 0 + 0.5 \times 0 + 0.5^2 \times 0 + 0.5^3 \times 0 = 0$
  - $S4, S3, S2, S1, S2$ : $G_0 = 0 + 0.5 \times 0 + 0.5^2 \times 0 + 0.5^3 \times 1 = 0.125$
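The three returns above can be reproduced mechanically from definition (8). The following is a minimal sketch; the function names `discounted_return` and `episode_return` and the dictionary `reward` are illustrative choices, not part of the notes.

```python
def discounted_return(rewards, gamma, t=0):
    """G_t = sum_{i=t}^{H-1} gamma^(i-t) * r_i for a reward list (r_0, ..., r_{H-1})."""
    return sum(gamma ** (i - t) * r for i, r in enumerate(rewards) if i >= t)

# Rewards of the Mars rover MRP as stated in the text: 1 in S1, 10 in S7, 0 elsewhere.
reward = {"S1": 1, "S2": 0, "S3": 0, "S4": 0, "S5": 0, "S6": 0, "S7": 10}

def episode_return(states, gamma, horizon):
    """G_0 of an episode; only the first `horizon` states emit a reward."""
    rewards = [reward[s] for s in states[:horizon]]
    return discounted_return(rewards, gamma)

g1 = episode_return(["S4", "S5", "S6", "S7", "S7"], gamma=0.5, horizon=4)
g2 = episode_return(["S4", "S4", "S5", "S4", "S5"], gamma=0.5, horizon=4)
g3 = episode_return(["S4", "S3", "S2", "S1", "S2"], gamma=0.5, horizon=4)
print(g1, g2, g3)
```

Note that the fifth state of each episode contributes no reward, because with $H = 4$ the sum in (8) stops at $i = 3$.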

3.3 Computing the value function of a Markov reward process

In this section we give three different ways to compute the value function of a Markov reward process:

  • Simulation
  • Analytic solution
  • Iterative solution


Figure 2: Mars Rover Markov reward process.

3.3.1 Monte Carlo simulation

The first method involves generating a large number of episodes using the transition probability model and rewards of the Markov reward process. For each episode, the return can be calculated, and these returns can then be averaged to give the average return. Concentration inequalities bound how quickly the averages concentrate to the mean value. For a Markov reward process $M = (S, P, R, \gamma)$, state $s$, time $t$, and the number of simulation episodes $N$, the pseudo-code of the simulation algorithm is given in Algorithm 1.

Algorithm 1 Monte Carlo simulation to calculate MRP value function

procedure Monte Carlo Evaluation($M, s, t, N$)
    $i \leftarrow 0$
    $G_t \leftarrow 0$
    while $i \neq N$ do
        Generate an episode, starting from state $s$ and time $t$
        Using the generated episode, calculate return $g \leftarrow \sum_{j=t}^{H-1} \gamma^{j-t} r_j$
        $G_t \leftarrow G_t + g$
        $i \leftarrow i + 1$
    $V_t(s) \leftarrow G_t / N$
    return $V_t(s)$

3.3.2 Analytic solution

This method works only for infinite horizon Markov reward processes with $\gamma < 1$. Using (9), the fact that the horizon is infinite, and using the stationary Markov property we have for any state $s \in S$:

$$
\begin{aligned}
V(s) &\overset{(a)}{=} V_0(s) = E[G_0 \mid s_0 = s] = E\left[ \sum_{i=0}^{\infty} \gamma^i r_i \,\middle|\, s_0 = s \right] = E[r_0 \mid s_0 = s] + \sum_{i=1}^{\infty} \gamma^i E[r_i \mid s_0 = s] \\
&\overset{(b)}{=} E[r_0 \mid s_0 = s] + \sum_{i=1}^{\infty} \gamma^i \left( \sum_{s' \in S} P(s_1 = s' \mid s_0 = s)\, E[r_i \mid s_0 = s, s_1 = s'] \right) \\
&\overset{(c)}{=} E[r_0 \mid s_0 = s] + \gamma \sum_{s' \in S} P(s' \mid s)\, E\left[ \sum_{i=0}^{\infty} \gamma^i r_i \,\middle|\, s_0 = s' \right] \\
&\overset{(d)}{=} R(s) + \gamma \sum_{s' \in S} P(s' \mid s)\, V(s'),
\end{aligned}
\tag{11}
$$


where (a) follows from (8), (9), and (10), (b) follows by the law of total expectation, (c) follows from the Markov property and due to stationarity, and (d) follows from (4). There is a nice interpretation of the final result of (11), namely that the first term $R(s)$ is the immediate reward while the second term $\gamma \sum_{s' \in S} P(s' \mid s) V(s')$ is the discounted sum of future rewards. The value function $V(s)$ is the sum of these two quantities. As $|S| < \infty$, it is possible to write this equation in matrix form as:

$$V = R + \gamma P V, \tag{12}$$

where $P$ is the transition probability matrix introduced earlier, and $R$ and $V$ are column vectors of dimension $|S|$ formed by stacking all the values $R(s)$ and $V(s)$ respectively, for all $s \in S$. Equation (12) can be rearranged to give $(I - \gamma P)V = R$, which has an analytical solution $V = (I - \gamma P)^{-1} R$. Notice that as $\gamma < 1$ and $P$ is row-stochastic, $(I - \gamma P)$ is non-singular and hence can be inverted. Thus (12) always has a solution and the solution is unique. However, the computational cost of the analytical method is $O(|S|^3)$, as it involves a matrix inverse, and hence it is completely unsuitable for cases where the state space is very large.

Exercise 3.8. Consider the matrix $(I - \gamma P)$. (a) Show that $1 - \gamma$ is an eigenvalue of this matrix, and find a corresponding eigenvector. (b) For $0 < \gamma < 1$, use the result of Exercise 3.1 to conclude that $(I - \gamma P)$ is non-singular, and thus invertible.

Exercise 3.9. Consider the Markov reward process introduced in the example in section 3.2.4. (a) If the horizon $H$ is infinite, calculate the value function for all the states.

3.3.3 Iterative solution

We now give an iterative solution to evaluate the value function in the infinite horizon case (with $\gamma < 1$) and a dynamic programming based solution for the finite horizon case. Interestingly, the two algorithms look so similar that it is hard to tell them apart. We first consider the finite horizon case. It is easy to prove (by following almost exactly the same proof as that of (11)) that the analog of equation (11) in the finite horizon case is given by:

$$V_t(s) = R(s) + \gamma \sum_{s' \in S} P(s' \mid s)\, V_{t+1}(s'), \quad \forall\, t = 0, \dots, H - 1, \qquad V_H(s) = 0. \tag{13}$$

Exercise 3.10. Prove equations (13) for a finite horizon Markov reward process.

These equations immediately lend themselves to a dynamic programming solution whose pseudo-code is outlined in Algorithm 2. The algorithm takes as input a finite horizon Markov reward process $M = (S, P, R, \gamma)$, and computes the value function for all states and at all times.

Algorithm 2 Dynamic programming algorithm to calculate finite MRP value function

procedure Dynamic Programming Value Function Evaluation($M$)
    For all states $s \in S$, $V_H(s) \leftarrow 0$
    $t \leftarrow H - 1$
    while $t \geq 0$ do
        For all states $s \in S$, $V_t(s) = R(s) + \gamma \sum_{s' \in S} P(s' \mid s)\, V_{t+1}(s')$
        $t \leftarrow t - 1$
    return $V_t(s)$ for all $s \in S$ and $t = 0, \dots, H$

Let us now look at the iterative algorithm for the infinite horizon case with $\gamma < 1$. The pseudo-code for this algorithm is presented in Algorithm 3. The algorithm takes as input a Markov reward process $M = (S, P, R, \gamma)$, and a tolerance $\epsilon$, and computes the value function for all states.
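Before turning to the infinite horizon listing, the two finite horizon procedures seen so far (the Monte Carlo simulation of Algorithm 1 and the dynamic programming recursion of Algorithm 2) can be sketched in Python and checked against each other. This is only a sketch under stated assumptions: the 3-state MRP below is hypothetical, the reward is taken to be the deterministic $R(s)$, and the function names `dp_value` and `monte_carlo_value` are introduced here for illustration.

```python
import numpy as np

def dp_value(P, R, gamma, H):
    """Algorithm 2: V[t, s] for t = 0..H via V_t = R + gamma * P V_{t+1}, with V_H = 0."""
    V = np.zeros((H + 1, len(R)))
    for t in range(H - 1, -1, -1):
        V[t] = R + gamma * P @ V[t + 1]
    return V

def monte_carlo_value(P, R, gamma, H, s, t, N, rng):
    """Algorithm 1: average the returns of N episodes started in state s at time t."""
    total = 0.0
    for _ in range(N):
        state, g = s, 0.0
        for i in range(t, H):
            g += gamma ** (i - t) * R[state]        # reward depends on the current state
            state = rng.choice(len(R), p=P[state])  # sample the successor state
        total += g
    return total / N

# Hypothetical 3-state MRP, chosen only for illustration.
P = np.array([[0.6, 0.4, 0.0],
              [0.4, 0.2, 0.4],
              [0.0, 0.4, 0.6]])
R = np.array([1.0, 0.0, 10.0])
gamma, H = 0.5, 4

V = dp_value(P, R, gamma, H)
rng = np.random.default_rng(0)
v_mc = monte_carlo_value(P, R, gamma, H, s=1, t=0, N=5000, rng=rng)
print(V[0, 1], v_mc)
```

The dynamic programming result is exact, while the Monte Carlo estimate fluctuates around it with an error that shrinks as the number of episodes `N` grows.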


Algorithm 3 Iterative algorithm to calculate MRP value function

procedure Iterative Value Function Evaluation($M, \epsilon$)
    For all states $s \in S$, $V'(s) \leftarrow 0$, $V(s) \leftarrow \infty$
    while $\|V - V'\|_\infty > \epsilon$ do
        $V \leftarrow V'$
        For all states $s \in S$, $V'(s) = R(s) + \gamma \sum_{s' \in S} P(s' \mid s)\, V(s')$
    return $V'(s)$ for all $s \in S$

For both algorithms 2 and 3, the computational cost of each loop is $O(|S|^2)$. This is an improvement over the $O(|S|^3)$ cost of the analytical method in the infinite horizon case; however, one may need quite a few iterations to converge depending on the tolerance level $\epsilon$. While the proof of correctness of algorithm 2 in the finite horizon case is obvious, for the infinite horizon case it is not so clear if algorithm 3 always converges, and if it does, whether it converges to the correct solution $(I - \gamma P)^{-1} R$. The answers to both these questions are affirmative, as is shown by the following theorem.

Theorem 3.1. Algorithm 3 always terminates. Moreover, if the output of the algorithm is $V'$ and we denote the true solution as $V = (I - \gamma P)^{-1} R$, then we have the error estimate $\|V' - V\|_\infty \leq \frac{\epsilon \gamma}{1 - \gamma}$.

Proof. We consider the vector space $\mathbb{R}^{|S|}$ equipped with the $\|\cdot\|_\infty$ norm (see Exercise 3.2), and recall that $\mathbb{R}^{|S|}$ so constructed is a Banach space (see Section A for a discussion on normed vector spaces). We start by noticing that both $V$ and all the iterates of algorithm 3 are elements of $\mathbb{R}^{|S|}$. Define the operator $B : \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ (also known as the "Bellman backup" operator) that acts on an element $U \in \mathbb{R}^{|S|}$ as follows

$$(BU)(s) = R(s) + \gamma \sum_{s' \in S} P(s' \mid s)\, U(s'), \quad \forall\, s \in S, \tag{14}$$

which can be written in compact matrix-vector notation as

$$BU = R + \gamma P U. \tag{15}$$

We first prove that the operator $B$ is a strict contraction (defined in Definition A.3). For every $U_1, U_2 \in \mathbb{R}^{|S|}$, using (15) we have

$$\|BU_1 - BU_2\|_\infty = \gamma \|PU_1 - PU_2\|_\infty = \gamma \|P(U_1 - U_2)\|_\infty \leq \gamma \|P\|_\infty \|U_1 - U_2\|_\infty = \gamma \|U_1 - U_2\|_\infty, \tag{16}$$

where the inequality and the final equality follow from Exercise 3.2, and thus as $0 < \gamma < 1$, we conclude that $B$ is a strict contraction on $\mathbb{R}^{|S|}$. Thus by the contraction mapping theorem (Theorem A.5), we conclude that $B$ has a unique fixed point. From (15) and (12) it also follows that $BV = R + \gamma P V = V$, and hence $V$ is a fixed point of $B$, and by uniqueness it must also be the only fixed point.

We next consider the iterates produced by algorithm 3 (if it is not allowed to terminate) and denote them by $\{V_k\}_{k \geq 1}$. Notice that these iterates satisfy the following relations

$$V_k = \begin{cases} 0 & \text{if } k = 1, \\ B V_{k-1} & \text{if } k > 1. \end{cases} \tag{17}$$

By Theorem A.5, we further conclude that $\{V_k\}_{k \geq 1}$ is a Cauchy sequence, and hence by Definition A.1 we conclude that $\exists\, N \geq 1$ such that $\|V_m - V_n\|_\infty < \epsilon$ for all $m, n > N$. This completes the proof that algorithm 3 terminates. Notice that the contraction mapping theorem (Theorem A.5) also implies that $V_k \to V$ (see Definition A.2 for the exact notion of convergence).

To prove the error bound when the algorithm terminates, let the algorithm terminate after $k$ iterations, so that the last iterate is $V_{k+1}$. We then have $\|V_{k+1} - V_k\|_\infty \leq \epsilon$. Then using the triangle inequality and the fact that $V_{k+1} = BV_k$ we get

$$
\begin{aligned}
\|V_k - V\|_\infty &\leq \|V_k - V_{k+1}\|_\infty + \|V_{k+1} - V\|_\infty = \|V_k - V_{k+1}\|_\infty + \|BV_k - BV\|_\infty \\
&\leq \|V_k - V_{k+1}\|_\infty + \gamma \|V_k - V\|_\infty \leq \epsilon + \gamma \|V_k - V\|_\infty,
\end{aligned}
\tag{18}
$$

and so $\|V_k - V\|_\infty \leq \frac{\epsilon}{1 - \gamma}$. This finally allows us to conclude that

$$\|V_{k+1} - V\|_\infty = \|BV_k - BV\|_\infty \leq \gamma \|V_k - V\|_\infty \leq \frac{\epsilon \gamma}{1 - \gamma}. \tag{19}$$

Exercise 3.11. Suppose that in algorithm 3, the initialization step is changed so that $V'$ is set randomly (all entries finite), instead of $V' \leftarrow 0$. (a) Will the algorithm still converge? (b) Does the algorithm still retain the same error estimate of Theorem 3.1?

Exercise 3.12. Suppose the assumptions of Theorem 3.1 hold. Using the same notation as in the theorem, prove the following: (a) For all $k \geq 1$, $\|V_k - V\|_\infty \leq \gamma^{k-1} \|V\|_\infty$. (b) $\|V_2\|_\infty \leq (1 + \gamma) \|V\|_\infty$. (c) For all $m, n \geq 1$, $\|V_m - V_n\|_\infty \leq (\gamma^{m-1} + \gamma^{n-1}) \|V\|_\infty$.
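The analytic solution of (12) and the iterative scheme of Algorithm 3 can be compared numerically, which also gives an empirical check of the error estimate in Theorem 3.1. The sketch below assumes a hypothetical 3-state MRP and illustrative function names (`analytic_value`, `iterative_value`); it solves the linear system with `numpy.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the notes prescribe.

```python
import numpy as np

def analytic_value(P, R, gamma):
    """Solve (I - gamma P) V = R directly; requires gamma < 1."""
    return np.linalg.solve(np.eye(len(R)) - gamma * P, R)

def iterative_value(P, R, gamma, eps):
    """Algorithm 3: repeat V' = R + gamma P V until ||V - V'||_inf <= eps."""
    V_new = np.zeros(len(R))
    V = np.full(len(R), np.inf)
    iterations = 0
    while np.max(np.abs(V - V_new)) > eps:
        V = V_new
        V_new = R + gamma * P @ V
        iterations += 1
    return V_new, iterations

# Hypothetical 3-state MRP, chosen only for illustration.
P = np.array([[0.6, 0.4, 0.0],
              [0.4, 0.2, 0.4],
              [0.0, 0.4, 0.6]])
R = np.array([1.0, 0.0, 10.0])
gamma, eps = 0.9, 1e-6

V_true = analytic_value(P, R, gamma)
V_iter, n_iter = iterative_value(P, R, gamma, eps)
error = np.max(np.abs(V_iter - V_true))
bound = eps * gamma / (1 - gamma)   # error estimate of Theorem 3.1
print(n_iter, error, bound)
```

Each pass through the loop costs one matrix-vector product, in line with the $O(|S|^2)$ per-iteration cost discussed above, whereas the direct solve scales cubically in $|S|$.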

3.4 Markov decision process

We are now in a position to define a Markov decision process (MDP). An MDP inherits the basic structure of a Markov reward process with some key differences, together with the specification of a set of actions that an agent can take from each state. It is formally represented using the tuple $(S, A, P, R, \gamma)$, whose components are listed below:

  • $S$ : A finite state space.
  • $A$ : A finite set of actions which are available from each state $s$.
  • $P$ : A transition probability model that specifies $P(s' \mid s, a)$.
  • $R$ : A reward function that maps a state-action pair to rewards (real numbers), i.e. $R : S \times A \to \mathbb{R}$.
  • $\gamma$ : Discount factor $\gamma \in [0, 1]$.

Some of these quantities have been explained in the context of a Markov reward process. However in the context of an MDP, there are important differences that we need to mention. The basic model of the dynamics is that there is a state space $S$, and an action space $A$, both of which we will consider to be finite. The agent starts from a state $s_i$ at time $i$, chooses an action $a_i$ from the action space, obtains a reward $r_i$ and then reaches a successor state $s_{i+1}$. An episode of an MDP is thus represented as $(s_0, a_0, r_0, s_1, a_1, r_1, s_2, a_2, r_2, \dots)$.

Unlike in the case of a Markov process or a Markov reward process, where the transition probability was only a function of the successor state and the current state, in the case of an MDP the transition probabilities at time $i$ are a function of the successor state $s_{i+1}$ along with both the current state $s_i$ and the action $a_i$, written as $P(s_{i+1} \mid s_i, a_i)$. We still assume the principle of stationary transition probabilities, which in the context of an MDP is written mathematically as

$$P(s_i = s' \mid s_{i-1} = s, a_{i-1} = a) = P(s_j = s' \mid s_{j-1} = s, a_{j-1} = a), \tag{20}$$

for all $s, s' \in S$, for all $a \in A$, and for all $i, j = 1, 2, \dots$.

The reward $r_i$ at time $i$ depends on both $s_i$ and $a_i$ in the case of an MDP, in contrast to a Markov reward process where it depended only on the current state. These rewards can be stochastic or deterministic, but just like in the case of a Markov reward process, we will assume that the rewards are stationary and the only relevant quantity will be the expected reward, which we will denote by $R(s, a)$ for a fixed state $s$ and action $a$, and which is defined below:

$$R(s, a) = E[r_i \mid s_i = s, a_i = a], \quad \forall\, i = 0, 1, \dots \tag{21}$$

The notions of the discount factor $\gamma$, horizon $H$ and return $G_t$ for an MDP are exactly equivalent to those in the case of a Markov reward process. However the notion of a state value function is slightly modified for an MDP, as explained next.

3.4.1 MDP policies and policy evaluation

Given an MDP, a policy for the MDP specifies what action to take in each state. A policy can either be deterministic or stochastic. To cover both these cases, we will consider a policy to be a probability distribution over actions given the current state. It is important to note that the policy may vary with time, which is especially true in the case of finite horizon MDPs. We will denote a generic policy by the boldface symbol $\boldsymbol{\pi}$, defined as the infinite dimensional tuple $\boldsymbol{\pi} = (\pi_0, \pi_1, \dots)$, where $\pi_t$ refers to the policy at time $t$. We will call policies that do not vary with time "stationary policies", and indicate them as $\pi$, i.e. in this case $\boldsymbol{\pi} = (\pi, \pi, \dots)$. For a stationary policy $\pi$, if at time $t$ the agent is in state $s$, it will choose an action $a$ with probability given by $\pi(a \mid s)$, and this probability does not depend on $t$, while for a non-stationary policy the probability will depend on time $t$ and will be denoted by $\pi_t(a \mid s)$.

Given a policy $\pi$ one can define two quantities: the state value function and the state-action value function for the MDP corresponding to the policy $\pi$, as shown below:

  • State value function : The state value function $V_t^\pi(s)$ for a state $s \in S$ is defined as the expected return starting from the state $s_t = s$ at time $t$ and following policy $\pi$, and is given by the expression $V_t^\pi(s) = E_\pi[G_t \mid s_t = s]$, where $E_\pi$ denotes that the expectation is taken with respect to the policy $\pi$. Frequently we will drop the subscript $\pi$ in the expectation to simplify notation going forward. Thus $E$ will mean expectation with respect to the policy unless specified otherwise, and so we can write

$$V_t^\pi(s) = E[G_t \mid s_t = s]. \tag{22}$$

Notice that when the horizon $H$ is infinite, this definition (22) together with the stationary assumptions of the rewards, transition probabilities and policy imply that for all $s \in S$, $V_i^\pi(s) = V_j^\pi(s)$ for all $i, j = 0, 1, \dots$, and thus in this case we will define, in a manner analogous to the case of a Markov reward process:

$$V^\pi(s) = V_0^\pi(s). \tag{23}$$

  • State-action value function : The state-action value function $Q_t^\pi(s, a)$ for a state $s$ and action $a$ is defined as the expected return starting from the state $s_t = s$ at time $t$ and taking the action $a_t = a$, and then subsequently following the policy $\pi$. It is written mathematically as

$$Q_t^\pi(s, a) = E[G_t \mid s_t = s, a_t = a]. \tag{24}$$

In the infinite horizon case, similar to the state value function, the stationary assumptions about the rewards, transition probabilities and policy imply that for all $s \in S$ and $a \in A$, $Q_i^\pi(s, a) = Q_j^\pi(s, a)$ for all $i, j = 0, 1, \dots$, which motivates the following definition

$$Q^\pi(s, a) = Q_0^\pi(s, a). \tag{25}$$


Exercise 3.13. Consider a stationary policy $\boldsymbol{\pi} = (\pi, \pi, \dots)$. If the assumptions of stationary transition probabilities and stationary rewards hold, and if the horizon $H$ is infinite, then using the definitions in (22) and (24) prove that for all $s \in S$ and $a \in A$, (a) $V_i^\pi(s) = V_j^\pi(s)$, and (b) $Q_i^\pi(s, a) = Q_j^\pi(s, a)$ for all $i, j = 0, 1, \dots$.

In the infinite horizon case, the assumptions about stationary transition probabilities and rewards lead to the following important identity connecting the state value function and the state-action value function for a stationary policy $\pi$:

$$
\begin{aligned}
Q^\pi(s, a) &\overset{(a)}{=} Q_0^\pi(s, a) = E[G_0 \mid s_0 = s, a_0 = a] = E\left[ \sum_{i=0}^{\infty} \gamma^i r_i \,\middle|\, s_0 = s, a_0 = a \right] \\
&= E[r_0 \mid s_0 = s, a_0 = a] + \sum_{i=1}^{\infty} \gamma^i E[r_i \mid s_0 = s, a_0 = a] \\
&\overset{(b)}{=} R(s, a) + \sum_{i=1}^{\infty} \gamma^i \left( \sum_{s' \in S} P(s_1 = s' \mid s_0 = s, a_0 = a)\, E[r_i \mid s_0 = s, a_0 = a, s_1 = s'] \right) \\
&\overset{(c)}{=} R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \left( \sum_{i=1}^{\infty} \gamma^{i-1} E[r_i \mid s_1 = s'] \right) \\
&\overset{(d)}{=} R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^\pi(s'),
\end{aligned}
\tag{26}
$$

for all $s \in S$, $a \in A$, where (a) follows from (24) and (25), (b) is due to the law of total expectation, (c) follows from the Markov property, and (d) follows from Exercise 3.13 and linearity of expectation.

Exercise 3.14. Consider a policy $\pi$, not necessarily stationary. (a) Prove that in this case the analog of equation (26) is given by $Q_t^\pi(s, a) = R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V_{t+1}^\pi(s')$, for all $s \in S$, $a \in A$ and for all $t = 0, 1, \dots$.

An interesting aspect of specifying a stationary policy $\pi$ on an MDP is that evaluating the value function for the policy is equivalent to evaluating the value function on an equivalent Markov reward process. Specifically we define the Markov reward process $M' = (S, P^\pi, R^\pi, \gamma)$, where $P^\pi$ and $R^\pi$ are given by:

$$R^\pi(s) = \sum_{a \in A} \pi(a \mid s)\, R(s, a), \qquad P^\pi(s' \mid s) = \sum_{a \in A} \pi(a \mid s)\, P(s' \mid s, a). \tag{27}$$

Exercise 3.15. Consider a stationary policy $\pi$ for an MDP. (a) Prove that the value function of the policy $V^\pi$ satisfies the identity $V^\pi(s) = R^\pi(s) + \gamma \sum_{s' \in S} P^\pi(s' \mid s)\, V^\pi(s')$ for all states $s \in S$, with $R^\pi$ and $P^\pi$ defined by (27).

The evaluation of the value function corresponding to the policy can then be carried out using the techniques introduced in the context of Markov reward processes. For example, in the infinite horizon case with $\gamma < 1$, the iterative algorithm to calculate the value function corresponding to a stationary policy $\pi$ is given in algorithm 4. The algorithm takes as input a Markov decision process $M = (S, A, P, R, \gamma)$, a stationary policy $\pi$, and a tolerance $\epsilon$, and computes the value function for all the states.

Exercise 3.16. (a) Prove that when $\gamma < 1$, algorithm 4 always converges. Hint: Use Theorem 3.1. (b) Consider a positive sequence of real numbers $\{\epsilon_i\}_{i \geq 1}$ such that $\epsilon_i \to 0$. Suppose algorithm 4 is run to termination for each $\epsilon_i$, and denote each corresponding output of the algorithm as $V_i^\pi$. Prove that the sequence $V_i^\pi \to V^\pi$, where $V^\pi$ is the value of the policy.


Algorithm 4 Iterative algorithm to calculate MDP value function for a stationary policy $\pi$

procedure Policy Evaluation($M, \pi, \epsilon$)
    For all states $s \in S$, define $R^\pi(s) = \sum_{a \in A} \pi(a \mid s)\, R(s, a)$
    For all states $s, s' \in S$, define $P^\pi(s' \mid s) = \sum_{a \in A} \pi(a \mid s)\, P(s' \mid s, a)$
    For all states $s \in S$, $V'(s) \leftarrow 0$, $V(s) \leftarrow \infty$
    while $\|V - V'\|_\infty > \epsilon$ do
        $V \leftarrow V'$
        For all states $s \in S$, $V'(s) = R^\pi(s) + \gamma \sum_{s' \in S} P^\pi(s' \mid s)\, V(s')$
    return $V'(s)$ for all $s \in S$

Figure 3: Mars Rover Markov decision process.

3.4.2 Example of a Markov decision process : Mars Rover

As an example of an MDP, consider the example given in Figure 3. The agent is again a Mars rover whose state space is given by $S = \{S1, S2, S3, S4, S5, S6, S7\}$. The agent has two actions in each state called "try left" and "try right", and so the action space is given by $A = \{TL, TR\}$. Taking an action always succeeds, unless we hit an edge, in which case we stay in the same state. This leads to the two transition probability matrices for each of the two actions as shown in Figure 3. The rewards from each state are the same for all actions, and are $0$ in the states $\{S2, S3, S4, S5, S6\}$, while for the states $S1, S7$ the rewards are $1, 10$ respectively. The discount factor for this MDP is some $\gamma \in [0, 1]$.

Exercise 3.17. Consider the MDP discussed above in Figure 3. Let $\gamma = 0$, and consider a stationary policy $\pi$ which always involves taking the action $TL$ from any state. (a) Calculate the value function of the policy for all states if the horizon is finite. (b) Calculate the value function of the policy when the horizon is infinite. Hint: Use Theorem A.3.
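The Mars rover MDP is specified fully enough in the text to be coded up, which makes it a convenient test bed for Algorithm 4. The sketch below is one possible encoding and rests on assumptions: states $S1, \dots, S7$ are mapped to indices $0, \dots, 6$, the array layouts `P[a, s, s']`, `R[s, a]` and `pi[s, a]` are choices made here, and the always-"try right" policy with $\gamma = 0.5$ is an illustrative setting (different from the one in Exercise 3.17).

```python
import numpy as np

n_states, TL, TR = 7, 0, 1

# P[a, s, s'] = P(s' | s, a): a move always succeeds, except at an edge where the rover stays.
P = np.zeros((2, n_states, n_states))
for s in range(n_states):
    P[TL, s, max(s - 1, 0)] = 1.0
    P[TR, s, min(s + 1, n_states - 1)] = 1.0

# R[s, a]: reward 1 in S1, 10 in S7, 0 elsewhere, identical for both actions.
R = np.zeros((n_states, 2))
R[0, :] = 1.0
R[6, :] = 10.0

def policy_evaluation(P, R, gamma, pi, eps):
    """Algorithm 4 for a stationary policy given as pi[s, a] = pi(a | s)."""
    R_pi = np.sum(pi * R, axis=1)             # R^pi(s) = sum_a pi(a|s) R(s, a)
    P_pi = np.einsum("sa,ast->st", pi, P)     # P^pi(s'|s) = sum_a pi(a|s) P(s'|s, a)
    V_new, V = np.zeros(len(R_pi)), np.full(len(R_pi), np.inf)
    while np.max(np.abs(V - V_new)) > eps:
        V = V_new
        V_new = R_pi + gamma * P_pi @ V
    return V_new, R_pi, P_pi

# Deterministic "always try right" policy written as a one-hot distribution.
pi_right = np.zeros((n_states, 2))
pi_right[:, TR] = 1.0

gamma, eps = 0.5, 1e-8
V_right, R_pi, P_pi = policy_evaluation(P, R, gamma, pi_right, eps)
V_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
print(np.round(V_right, 4))
```

Under this policy the rover ends up staying in $S7$, so the fixed point satisfies $V^\pi(S7) = 10 + \gamma V^\pi(S7)$, i.e. $V^\pi(S7) = 10 / (1 - \gamma) = 20$ for $\gamma = 0.5$, which the iteration reproduces up to the tolerance. A stochastic policy can be passed in the same way by putting non-integer probabilities in the rows of `pi`.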

3.5 Bellman backup operators

In this section, we introduce the concept of the Bellman backup operators and prove some of their properties which will turn out to be extremely useful in the next section when we discuss MDP control. We have already encountered one Bellman backup operator in (14), (15) in the proof of Theorem 3.1. We will now define two other closely related (but not the same!) Bellman backup operators: the Bellman expectation backup operator and the Bellman optimality backup operator.

3.5.1 Bellman expectation backup operator

Suppose we are given a MDP $M = (S, A, P, R, \gamma)$, and a stationary policy $\pi$ which can be deterministic or stochastic. We have already seen in section 3.4.1 that this is equivalent to a MRP $M' = (S, P^\pi, R^\pi, \gamma)$, where $P^\pi$ and $R^\pi$ are defined in (27). The value function of policy $\pi$ evaluated on $M$, and denoted by $V^\pi$, is the same as the value function evaluated on $M'$, where we have used the corresponding definitions of the value function for a MDP and MRP respectively. Note that $V^\pi$ lives in the finite dimensional Banach space $\mathbb{R}^{|S|}$, which we will equip with the infinity norm $\|\cdot\|_\infty$ introduced in Exercise 3.2. Then for every element $U \in \mathbb{R}^{|S|}$ the Bellman expectation backup operator $B^\pi$ for the policy $\pi$ is defined as

$$(B^\pi U)(s) = R^\pi(s) + \gamma \sum_{s' \in S} P^\pi(s'|s) U(s') , \quad \forall\, s \in S . \tag{28}$$

We should note that we have already seen this operator appear once before in algorithm 4. We now prove some properties of this operator.

Theorem 3.2. The operator $B^\pi$ defined in (28) is a contraction map. If $\gamma < 1$ then it is a strict contraction and has a unique fixed point.

Proof. Consider $U_1, U_2 \in \mathbb{R}^{|S|}$. Then for a state $s \in S$, we have from (28) and the triangle inequality

$$\begin{aligned}
|(B^\pi U_1)(s) - (B^\pi U_2)(s)| &= \gamma \left| \sum_{s' \in S} P^\pi(s'|s)\,(U_1(s') - U_2(s')) \right| \\
&\le \gamma \sum_{s' \in S} P^\pi(s'|s)\,|U_1(s') - U_2(s')| \\
&\le \gamma \sum_{s' \in S} P^\pi(s'|s) \max_{s'' \in S} |U_1(s'') - U_2(s'')| \\
&= \gamma \sum_{s' \in S} P^\pi(s'|s)\, \|U_1 - U_2\|_\infty = \gamma\, \|U_1 - U_2\|_\infty .
\end{aligned} \tag{29}$$

As (29) is true for every $s \in S$ we conclude that $\|B^\pi U_1 - B^\pi U_2\|_\infty \le \gamma \|U_1 - U_2\|_\infty$, and hence $B^\pi$ is a contraction map as $\gamma \in [0, 1]$. Considering $\gamma < 1$ in (29), we conclude that in this case $B^\pi$ is a strict contraction, and hence by applying Theorem A.5 it has a unique fixed point.

Corollary 3.2.1. Let $\gamma < 1$. Then for any $U \in \mathbb{R}^{|S|}$ the sequence $\{(B^\pi)^k U\}_{k \ge 0}$ is a Cauchy sequence and converges to the fixed point of $B^\pi$.

Proof. The proof follows directly by applying Theorem 3.2, followed by Theorem A.4 and the contraction mapping theorem (Theorem A.5).

This also implies that for a stationary policy $\pi$, the value function of the policy $V^\pi$ is a fixed point of $B^\pi$, as shown by the following corollary.

Corollary 3.2.2. Let $\pi$ be a policy for an infinite horizon MDP with $\gamma < 1$. Then the value function of the policy $V^\pi$ is a fixed point of $B^\pi$.

Proof. The fact that $(B^\pi V^\pi)(s) = V^\pi(s)$ for all states $s \in S$ follows from the definition (28) of $B^\pi$ and Exercise 3.15.
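The contraction property of Theorem 3.2 and the convergence statement of Corollary 3.2.1 can be checked numerically. The sketch below is illustrative only: the randomly generated `P_pi` and `R_pi` stand in for a policy-induced MRP $(S, P^\pi, R^\pi, \gamma)$, and the comparison value is obtained by solving the linear fixed-point equation $V = R^\pi + \gamma P^\pi V$ directly, which is an assumption about how one would compute $V^\pi$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9

# Hypothetical policy-induced MRP: random row-stochastic P_pi and rewards R_pi.
P_pi = rng.random((n_states, n_states))
P_pi /= P_pi.sum(axis=1, keepdims=True)
R_pi = rng.random(n_states)

def bellman_expectation_backup(U):
    """(B^pi U)(s) = R^pi(s) + gamma * sum_{s'} P^pi(s'|s) U(s'), as in (28)."""
    return R_pi + gamma * P_pi @ U

def sup_norm(x):
    return np.max(np.abs(x))

# Contraction: ||B U1 - B U2||_inf <= gamma * ||U1 - U2||_inf.
U1 = rng.normal(size=n_states)
U2 = rng.normal(size=n_states)
lhs = sup_norm(bellman_expectation_backup(U1) - bellman_expectation_backup(U2))
rhs = gamma * sup_norm(U1 - U2)

# Fixed-point iteration from an arbitrary starting point.
U = U1.copy()
for _ in range(500):
    U = bellman_expectation_backup(U)

# Fixed point computed directly from (I - gamma P_pi) V = R_pi.
V_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
```

Here `lhs` should not exceed `rhs`, and after a few hundred backups `U` should agree with `V_pi` to machine precision, whatever the starting vector.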


The next theorem proves the "monotonicity" property of the Bellman expectation backup operator.

Theorem 3.3. Suppose we have $U_1, U_2 \in \mathbb{R}^{|S|}$ such that for all $s \in S$, $U_1(s) \ge U_2(s)$. Then for every stationary policy $\pi$, we have $(B^\pi U_1)(s) \ge (B^\pi U_2)(s)$ for all $s \in S$. If instead the inequality is strict, i.e. $U_1(s) > U_2(s)$ for all $s \in S$, and $\gamma > 0$, then we have $(B^\pi U_1)(s) > (B^\pi U_2)(s)$ for all $s \in S$.

Proof. When $U_1(s) \ge U_2(s)$ for all $s \in S$, using definition (28) of $B^\pi$ we obtain

$$(B^\pi U_1)(s) - (B^\pi U_2)(s) = \gamma \sum_{s' \in S} P^\pi(s'|s)\,(U_1(s') - U_2(s')) \ge 0 , \tag{30}$$

and when $U_1(s) > U_2(s)$ for all $s \in S$ and $\gamma > 0$, the same steps give $(B^\pi U_1)(s) - (B^\pi U_2)(s) > 0$, for all states $s \in S$.

3.5.2 Bellman optimality backup operator

Suppose we are now given a MDP $M = (S, A, P, R, \gamma)$. We again consider the finite dimensional Banach space $\mathbb{R}^{|S|}$ equipped with the infinity norm $\|\cdot\|_\infty$. Then for every element $U \in \mathbb{R}^{|S|}$ the Bellman optimality backup operator $B^*$ is defined as

$$(B^* U)(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) U(s') \right] , \quad \forall\, s \in S . \tag{31}$$

We next prove properties of this operator that are analogous to the ones for the Bellman expectation backup operator.

Theorem 3.4. For every $U_1, U_2 \in \mathbb{R}^{|S|}$, and for all states $s \in S$ the following inequalities are true:

(a)
$$(B^* U_1)(s) - (B^* U_2)(s) \le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,(U_1(s') - U_2(s')) \right] \le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,|U_1(s') - U_2(s')| \right] , \tag{32}$$

(b)
$$|(B^* U_1)(s) - (B^* U_2)(s)| \le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,|U_1(s') - U_2(s')| \right] \le \gamma\, \|U_1 - U_2\|_\infty . \tag{33}$$

Proof. We first prove part (a). Fix a state $s \in S$. Using (31) and as the action space $A$ is finite, we conclude that there exist $a_1, a_2 \in A$, not necessarily different, such that the following holds:

$$(B^* U_1)(s) = R(s, a_1) + \gamma \sum_{s' \in S} P(s'|s, a_1) U_1(s') , \qquad (B^* U_2)(s) = R(s, a_2) + \gamma \sum_{s' \in S} P(s'|s, a_2) U_2(s') . \tag{34}$$

Then by the definition of maximum in (31), we also have for the action $a_1$ that

$$(B^* U_2)(s) \ge R(s, a_1) + \gamma \sum_{s' \in S} P(s'|s, a_1) U_2(s') . \tag{35}$$


Thus from (34) and (35) we deduce the following

$$(B^* U_1)(s) - (B^* U_2)(s) \le \gamma \sum_{s' \in S} P(s'|s, a_1)\,(U_1(s') - U_2(s')) \le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,(U_1(s') - U_2(s')) \right] , \tag{36}$$

which proves the first inequality of (a). For the second inequality notice that we have for all states $s' \in S$, $U_1(s') - U_2(s') \le |U_1(s') - U_2(s')|$, and so multiplying each of these inequalities by the non-negative numbers $P(s'|s, a)$ for some $a \in A$, and summing over all $s'$ gives

$$\sum_{s' \in S} P(s'|s, a)\,(U_1(s') - U_2(s')) \le \sum_{s' \in S} P(s'|s, a)\,|U_1(s') - U_2(s')| . \tag{37}$$

The result is proved by taking the max over all $a \in A$, using monotonicity of the max function. To prove part (b), notice that by interchanging the roles of $U_1, U_2$, we have from part (a)

$$(B^* U_2)(s) - (B^* U_1)(s) \le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,|U_1(s') - U_2(s')| \right] , \tag{38}$$

and thus combining (38) and (32) we obtain

$$\begin{aligned}
|(B^* U_1)(s) - (B^* U_2)(s)| &\le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\,|U_1(s') - U_2(s')| \right] \\
&\le \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a) \max_{s'' \in S} |U_1(s'') - U_2(s'')| \right] \\
&= \gamma \max_{a \in A} \left[ \sum_{s' \in S} P(s'|s, a)\, \|U_1 - U_2\|_\infty \right] \\
&= \gamma \max_{a \in A} \|U_1 - U_2\|_\infty = \gamma\, \|U_1 - U_2\|_\infty ,
\end{aligned} \tag{39}$$

which proves (b).

Theorem 3.5. The operator $B^*$ defined in (31) is a contraction map. If $\gamma < 1$ then it is a strict contraction and has a unique fixed point.

Proof. The fact that $B^*$ is a contraction follows from Theorem 3.4 by observing that (33) is true for all $s \in S$, and so must be true in particular for $\arg\max_{s \in S} |(B^* U_1)(s) - (B^* U_2)(s)|$, for every $U_1, U_2 \in \mathbb{R}^{|S|}$. Thus $\|B^* U_1 - B^* U_2\|_\infty \le \gamma \|U_1 - U_2\|_\infty$, proving that $B^*$ is a contraction map as $\gamma \in [0, 1]$. Setting $\gamma < 1$ in this inequality proves that $B^*$ is a strict contraction for $\gamma \in [0, 1)$ and thus has a unique fixed point by Theorem A.5.

Corollary 3.5.1. Let $\gamma < 1$. Then for any $U \in \mathbb{R}^{|S|}$ the sequence $\{(B^*)^k U\}_{k \ge 0}$ is a Cauchy sequence and converges to the fixed point of $B^*$.

Proof. The proof follows directly by applying Theorem 3.5, followed by Theorem A.4 and the contraction mapping theorem (Theorem A.5).

The next theorem compares the result of the application of $B^\pi$ versus $B^*$ to some $U \in \mathbb{R}^{|S|}$.

Theorem 3.6. For every stationary policy $\pi$, for every $U \in \mathbb{R}^{|S|}$ and for all $s \in S$, $(B^* U)(s) \ge (B^\pi U)(s)$.


Proof. Fix a stationary policy $\pi$, and let $B^\pi$ be the corresponding Bellman expectation backup operator. Fix some $U \in \mathbb{R}^{|S|}$. Let us also fix some $s \in S$. Then from definition (31) of $B^*$ we have

$$(B^* U)(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) U(s') \right] \ge R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) U(s') , \quad \forall\, a \in A . \tag{40}$$

Multiplying (40) by $\pi(a|s)$ and summing over all $a \in A$ gives

$$\begin{aligned}
(B^* U)(s) = \sum_{a \in A} \pi(a|s)\,(B^* U)(s) &\ge \sum_{a \in A} \pi(a|s) \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) U(s') \right] \\
&= \sum_{a \in A} \pi(a|s) R(s, a) + \gamma \sum_{s' \in S} \left[ \sum_{a \in A} \pi(a|s) P(s'|s, a) \right] U(s') \\
&= R^\pi(s) + \gamma \sum_{s' \in S} P^\pi(s'|s) U(s') = (B^\pi U)(s) ,
\end{aligned} \tag{41}$$

where the last equality follows from definitions (27) and (28) of $R^\pi$, $P^\pi$ and $B^\pi$, thus proving the theorem.
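Theorems 3.4(b) and 3.6 lend themselves to a quick numerical sanity check. The sketch below uses a randomly generated MDP and a random stochastic policy; all names (`q_values`, `optimality_backup`, `expectation_backup`) and the array layout are assumptions made for illustration, and a passing check on one random instance is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 6, 3, 0.95

# Hypothetical random MDP: P[a, s, s2] = P(s2 | s, a), R[s, a] = R(s, a).
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_states, n_actions))

# Hypothetical stochastic stationary policy: pi[s, a] = pi(a | s).
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

def q_values(U):
    """Q[s, a] = R(s, a) + gamma * sum_{s'} P(s'|s, a) U(s')."""
    return R + gamma * np.einsum("asn,n->sa", P, U)

def optimality_backup(U):
    """(B* U)(s): maximum over actions, as in (31)."""
    return q_values(U).max(axis=1)

def expectation_backup(U):
    """(B^pi U)(s): pi-weighted average over actions, equal to (28) by (41)."""
    return np.sum(pi * q_values(U), axis=1)

U1 = rng.normal(size=n_states)
U2 = rng.normal(size=n_states)

# Theorem 3.4(b): gamma * ||U1 - U2||_inf - ||B* U1 - B* U2||_inf >= 0.
contraction_gap = gamma * np.max(np.abs(U1 - U2)) - np.max(
    np.abs(optimality_backup(U1) - optimality_backup(U2)))

# Theorem 3.6: (B* U)(s) - (B^pi U)(s) >= 0 for every state s.
dominance_gap = np.min(optimality_backup(U1) - expectation_backup(U1))
```

Both `contraction_gap` and `dominance_gap` should be non-negative, up to floating-point rounding.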

3.6 MDP control in the infinite horizon setting

We now have all the background necessary to discuss the problem of "MDP control", where we seek to find the best policy (or, since it need not be unique, a best policy) that achieves the greatest value function among the set of all possible policies. In the context of reinforcement learning, this is precisely the objective of the agent. We are going to first discuss the infinite horizon case in this section, and the finite horizon case will be mentioned in the next section. We do it this way because the infinite horizon case is a much harder problem that presents quite a few mathematical challenges which will need to be resolved.

To get started, we need to address the question "what exactly do we mean by finding an optimal policy?". Precisely, we want to know whether there always exists a policy, which we will denote by $\pi^*$, whose value function is at least as good as the value function of any other policy. In other words, we need to ensure that the supremum of the value function is actually attained for some policy! To appreciate the subtlety of this point, consider the example of maximizing the function $f : \mathbb{R} \to \mathbb{R}$ on $(0, 1)$ defined as $f(x) = x$, and note that this problem does not have a solution: $\sup_{x \in (0,1)} f(x) = 1$, although $\nexists\, x \in (0, 1)$ for which this is attained.

We first define precisely what it means for a policy, not necessarily stationary, to be an optimal policy.

Definition 3.1. A policy $\pi^*$ is an optimal policy iff for every policy $\pi$, for all $t = 0, 1, \dots$, and for all states $s \in S$, $V_t^{\pi^*}(s) \ge V_t^{\pi}(s)$.

The next result, which we leave for the reader to prove, states that for an infinite horizon MDP, existence of an optimal policy also implies the existence of a stationary optimal policy. This result is intuitively obvious, and is a very important result as it significantly reduces the universe of policies to consider when searching for an optimal policy, if it exists. In particular, it states that we need only consider policies that are stationary.

Exercise 3.18. (a) Consider an infinite horizon MDP. Let $\pi^*$ be an optimal policy for the MDP. Prove that there exists a stationary policy $\pi$, that is $\pi = (\pi, \pi, \dots)$, which is also optimal.

The next two theorems improve on the conclusion of Exercise 3.18 and show us that we may restrict the search to a finite set of deterministic stationary policies.


Theorem 3.7. The number of deterministic stationary policies is finite, and equals $|A|^{|S|}$.

Proof. Since the policies are stationary and deterministic, each policy can be represented as a function $\pi : S \to A$. The number of such distinct functions is given by $|A|^{|S|}$. This also proves that the set of deterministic stationary policies is finite.

Theorem 3.8. If $\pi$ is a stationary policy for an infinite horizon MDP with $\gamma < 1$, then there exists a deterministic stationary policy $\hat{\pi}$ such that $V^{\hat{\pi}}(s) \ge V^\pi(s)$ for all states $s \in S$. One such policy is given by the stationary policy

$$\hat{\pi}(s) = \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V^\pi(s') \right] , \quad \forall\, s \in S , \tag{42}$$

which satisfies $(B^{\hat{\pi}} V^\pi)(s) = (B^* V^\pi)(s) \ge V^\pi(s)$ for all $s$. Moreover $V^{\hat{\pi}}(s) = V^\pi(s)$ for all $s$, iff $(B^* V^\pi)(s) = V^\pi(s)$ for all $s$.

Proof. We first notice that the policy $\hat{\pi}$ defined in (42) is a stationary policy (by definition), and is also deterministic for every $s \in S$, by the definition of $\arg\max$ with ties broken randomly. As $\hat{\pi}$ is deterministic, we can conclude using (27) that $R^{\hat{\pi}}(s) = R(s, \hat{\pi}(s))$ and $P^{\hat{\pi}}(s'|s) = P(s'|s, \hat{\pi}(s))$ for all $s, s' \in S$, and thus we have

$$(B^{\hat{\pi}} V^\pi)(s) = R(s, \hat{\pi}(s)) + \gamma \sum_{s' \in S} P(s'|s, \hat{\pi}(s)) V^\pi(s') = (B^* V^\pi)(s) , \tag{43}$$

for all states $s \in S$ using (42), and the definitions of the Bellman backup operators in (28) and (31). Next, by Corollary 3.2.2 we have $B^\pi V^\pi = V^\pi$, and by Theorem 3.6 we have $B^* V^\pi \ge B^\pi V^\pi$, and so combining these with (43) we obtain

$$(B^{\hat{\pi}} V^\pi)(s) = (B^* V^\pi)(s) \ge V^\pi(s) , \quad \forall\, s \in S . \tag{44}$$

Next using Theorem 3.3, the monotonicity property of $B^{\hat{\pi}}$ allows us to conclude by repeatedly applying $B^{\hat{\pi}}$ to both sides of (44) that $((B^{\hat{\pi}})^k V^\pi)(s) \ge V^\pi(s)$ for all $k \ge 1$, and for all states $s \in S$. Then using Corollary 3.2.1, and noticing that $V^{\hat{\pi}}$ is the unique fixed point of $B^{\hat{\pi}}$ we obtain by taking limits

$$V^{\hat{\pi}}(s) = (B^{\hat{\pi}} V^{\hat{\pi}})(s) = \lim_{k \to \infty} ((B^{\hat{\pi}})^k V^\pi)(s) \ge V^\pi(s) , \quad \forall\, s \in S . \tag{45}$$

To prove the second part of the theorem, first assume that $B^* V^\pi = V^\pi$. Then by (44) we have $B^{\hat{\pi}} V^\pi = B^* V^\pi = V^\pi$, and so by uniqueness of the fixed point of $B^{\hat{\pi}}$ we get $V^{\hat{\pi}} = B^{\hat{\pi}} V^{\hat{\pi}} = V^\pi$. Next assume that $V^{\hat{\pi}} = V^\pi$. Then again by (44) we have $V^\pi = V^{\hat{\pi}} = B^{\hat{\pi}} V^{\hat{\pi}} = B^{\hat{\pi}} V^\pi = B^* V^\pi \ge V^\pi$, implying that $B^* V^\pi = V^\pi$, thus completing the proof.

Corollary 3.8.1. In the notation of Theorem 3.8, if $\exists\, s \in S$ such that $(B^* V^\pi)(s) > V^\pi(s)$, then $V^{\hat{\pi}}(s) > V^\pi(s)$. In this case, we say that $\hat{\pi}$ is "strictly better" than $\pi$ as a policy.

Proof. The proof follows immediately by noting that the inequality in (44) becomes a strict inequality, and then applying Theorem 3.3.

The consequences of Theorems 3.7 and 3.8 are spectacular, because now the search for an optimal policy has been reduced to the set of only the deterministic stationary policies, which is a finite set, if such a policy exists. The reader is to prove that this is actually the case in the following exercise.

Exercise 3.19. Consider an infinite horizon MDP with $\gamma < 1$. Denote by $\Pi$ the set of all deterministic stationary policies. (a) Prove that $\exists\, \pi^* \in \Pi$, such that for all $\pi \in \Pi$, and for all states $s \in S$, $V^{\pi^*}(s) \ge V^\pi(s)$. (b) Conclude that $\pi^* = (\pi^*, \pi^*, \dots)$ is an optimal policy. Hint: See Theorem 3.10.
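The improvement guarantee of Theorem 3.8 can be illustrated numerically: starting from an arbitrary stochastic stationary policy $\pi$, the greedy deterministic policy $\hat{\pi}$ of (42) should satisfy $V^{\hat{\pi}}(s) \ge V^\pi(s)$ in every state. The sketch below assumes a random MDP and evaluates policies exactly by solving $V = R^\pi + \gamma P^\pi V$ as a linear system; the helper names `evaluate` and `greedy` are hypothetical, and ties in the arg max are broken by taking the first maximizer rather than randomly.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 6, 3, 0.9

# Hypothetical random MDP: P[a, s, s2] = P(s2 | s, a), R[s, a] = R(s, a).
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.normal(size=(n_states, n_actions))

def evaluate(pi):
    """Exact V^pi for a policy pi[s, a] = pi(a | s), via (I - gamma P^pi) V = R^pi."""
    R_pi = np.sum(pi * R, axis=1)
    P_pi = np.einsum("sa,asn->sn", pi, P)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

def greedy(V):
    """Deterministic policy of (42), returned as a one-hot array pi_hat[s, a]."""
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return np.eye(n_actions)[Q.argmax(axis=1)]

# Arbitrary stochastic stationary policy.
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

V_pi = evaluate(pi)
pi_hat = greedy(V_pi)
V_hat = evaluate(pi_hat)
improvement = V_hat - V_pi      # Theorem 3.8 predicts this is >= 0 in every state
```

Every entry of `improvement` should be non-negative up to rounding, and each row of `pi_hat` contains a single 1, reflecting that the improved policy is deterministic.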


We have thus established the existence of an optimal policy and moreover concluded that a deterministic stationary policy suffices. This then allows us to make the following definition:

Definition 3.2. The optimal value function for an infinite horizon MDP is defined as

$$V^*(s) = \max_{\pi \in \Pi} V^\pi(s) , \tag{46}$$

and there exists a stationary deterministic policy $\pi^* \in \Pi$, which is an optimal policy, such that $V^*(s) = V^{\pi^*}(s)$ for all states $s \in S$, where $\Pi$ is the set of all stationary deterministic policies.

We next look at a few algorithms to compute the optimal value function and an optimal policy.

3.6.1 Policy search

Definition 3.2 immediately lends itself to a brute force algorithm called policy search to find the optimal value function $V^*$ and an optimal policy $\pi^*$, as described in pseudo-code in algorithm 5. The algorithm takes as input an infinite horizon MDP $M = (S, A, P, R, \gamma)$ and a tolerance $\epsilon$ for accuracy of policy evaluation, and returns the optimal value function and an optimal policy.

Algorithm 5 Policy search algorithm to calculate optimal value function and find an optimal policy

1: procedure Policy Search($M, \epsilon$)
2:     $\Pi \leftarrow$ All stationary deterministic policies of $M$
3:     $\pi^* \leftarrow$ Randomly choose a policy $\pi \in \Pi$
4:     $V^* \leftarrow$ POLICY EVALUATION($M, \pi^*, \epsilon$)
5:     for $\pi \in \Pi$ do
6:         $V^\pi \leftarrow$ POLICY EVALUATION($M, \pi, \epsilon$)
7:         if $V^\pi(s) \ge V^*(s)$ for all $s \in S$, then
8:             $V^* \leftarrow V^\pi$
9:             $\pi^* \leftarrow \pi$
10:    return $V^*(s), \pi^*(s)$ for all $s \in S$

It is clear that algorithm 5 always terminates as it checks all $|A|^{|S|}$ deterministic stationary policies. Thus the run-time complexity of this algorithm is $O(|A|^{|S|})$ policy evaluations. It is possible to prove correctness of the algorithm when $\epsilon = 0$, i.e. when in each iteration the policy evaluation is done exactly. In practice $\epsilon$ is set to a small number such as $10^{-9}$ to $10^{-12}$.

Theorem 3.9. Algorithm 5 returns the optimal value function and an optimal policy when $\epsilon = 0$.

Proof. Let $\pi^*$ be an optimal policy, and thus $V^{\pi^*}(s) = V^*(s)$ for all states $s \in S$. Since the algorithm checks every policy in $\Pi$, it means that $\pi^*$ must get selected at some iteration of the algorithm. Thus for the policies considered in future iterations the value function can no longer strictly increase. Future iterations may select a different policy with the same optimal value function, thus completing the proof.

Exercise 3.20. Consider the MDP discussed in section 3.4.2, shown in Figure 3. Consider the horizon to be infinite. (a) How many deterministic stationary policies does the agent have? (b) If $\gamma < 1$, is the optimal policy unique? (c) If $\gamma = 1$, is the optimal policy unique?

3.6.2 Policy iteration

We now discuss a more efficient algorithm than policy search called policy iteration. The algorithm is a straightforward application of Theorem 3.8, which states that given any stationary policy $\pi$, we can find a deterministic stationary policy that is no worse than the existing policy. In particular the theorem also applies to deterministic policies. This simple step has a special name, "policy improvement", whose pseudo-code is presented in algorithm 6.

Algorithm 6 Policy improvement algorithm to improve an input policy

1: procedure Policy Improvement($M, V^\pi$)
2:     $\hat{\pi}(s) \leftarrow \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V^\pi(s') \right]$, $\forall\, s \in S$
3:     return $\hat{\pi}(s)$ for all $s \in S$

The output of algorithm 6 is always guaranteed to be at least as good as the policy $\pi$ corresponding to the input value function $V^\pi$, and represents a "greedy" attempt to improve the policy. When performed iteratively with the policy evaluation algorithm (algorithm 4), this gives rise to the policy iteration algorithm. The pseudo-code of policy iteration is outlined in algorithm 7.

Algorithm 7 Policy iteration algorithm to calculate optimal value function and find an optimal policy

1: procedure Policy Iteration($M, \epsilon$)
2:     $\pi \leftarrow$ Randomly choose a policy $\pi \in \Pi$
3:     while true do
4:         $V^\pi \leftarrow$ POLICY EVALUATION($M, \pi, \epsilon$)
5:         $\pi^* \leftarrow$ POLICY IMPROVEMENT($M, V^\pi$)
6:         if $\pi^*(s) = \pi(s)$ for all $s \in S$ then
7:             break
8:         else
9:             $\pi \leftarrow \pi^*$
10:    $V^* \leftarrow V^\pi$
11:    return $V^*(s), \pi^*(s)$ for all $s \in S$

The proof of correctness of algorithm 7 is left to the reader as the next exercise. Note that the algorithm will always terminate as there are a finite number of stationary deterministic policies by Theorem 3.7.

Exercise 3.21. Consider an infinite horizon MDP with $\gamma < 1$. (a) Show that when algorithm 7 is run with $\epsilon = 0$, it finds the optimal value function and an optimal policy. Hint: See Theorem 3.10. (b) Prove that the termination criterion used in the algorithm makes sense: precisely, show that if the policy does not change during a policy improvement step, then the policy cannot improve in future iterations. (c) Show that the value functions corresponding to the policies in each iteration of the algorithm form a non-decreasing sequence for every $s \in S$. (d) What is the worst case run-time complexity of this algorithm?

3.6.3 Value iteration

We now discuss value iteration, which is yet another technique that can be used to compute the optimal value function and an optimal policy, given a MDP. To motivate this method we will need the following theorem:

Theorem 3.10. For a MDP with $\gamma < 1$, let the fixed point of the Bellman optimality backup operator $B^*$ be denoted by $V^* \in \mathbb{R}^{|S|}$. Then the policy given by

$$\pi^*(s) = \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V^*(s') \right] , \quad \forall\, s \in S , \tag{47}$$

is a stationary deterministic policy. The value function of this policy $V^{\pi^*}$ satisfies the identity $V^{\pi^*} = V^*$, and thus $V^*$ is also the fixed point of the operator $B^{\pi^*}$. In particular this implies that there exists a stationary deterministic policy $\pi^*$ whose value function is the fixed point of $B^*$. Moreover, $\pi^*$ is an optimal policy.


Proof. We start by noting that $\pi^*$ as defined in (47) is a stationary deterministic policy, and so we can conclude using (27) that $R^{\pi^*}(s) = R(s, \pi^*(s))$ and $P^{\pi^*}(s'|s) = P(s'|s, \pi^*(s))$ for all $s, s' \in S$. As $V^*$ is the fixed point of $B^*$, we have $B^* V^* = V^*$. So using definition (31) of $B^*$, and (47) we can write

$$\begin{aligned}
V^*(s) &= \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V^*(s') \right] = R(s, \pi^*(s)) + \gamma \sum_{s' \in S} P(s'|s, \pi^*(s)) V^*(s') \\
&= R^{\pi^*}(s) + \gamma \sum_{s' \in S} P^{\pi^*}(s'|s) V^*(s') = (B^{\pi^*} V^*)(s)
\end{aligned} \tag{48}$$

for all $s \in S$. Thus $V^*$ is a fixed point of $B^{\pi^*}$. Since the fixed point of $B^{\pi^*}$ is unique (Theorem 3.2) and equals $V^{\pi^*}$ (Corollary 3.2.2), we get $V^{\pi^*} = V^*$, completing the proof of the first part of the theorem.

To prove that $\pi^*$ is an optimal policy, we show that if an optimal policy exists then its value function must be a fixed point of the operator $B^*$. So assume that an optimal policy exists, which by Theorem 3.8 we can take to be a stationary deterministic policy, and let us denote it as $\mu$ and the corresponding optimal value function as $V^\mu$. Now for the sake of contradiction, suppose $V^\mu$ is not a fixed point of $B^*$. Then there exists $s \in S$ such that $V^\mu(s) \ne (B^* V^\mu)(s)$, which upon combining with (44) in Theorem 3.8 implies that $(B^* V^\mu)(s) > V^\mu(s)$. Then application of Corollary 3.8.1 implies that there exists a policy $\hat{\pi}$ which is strictly better than $\mu$, and so we have a contradiction. This proves that $V^\mu$ must be the unique fixed point of $B^*$. Combining this fact with the first part implies that $V^*$ must be the optimal value function and $\pi^*$ is an optimal policy. This completes the proof.

Theorem 3.10 suggests a straightforward way to calculate the optimal value function $V^*$ and an optimal policy $\pi^*$. The idea is to run fixed point iterations to find the fixed point of $B^*$ using Corollary 3.5.1. Once we have $V^*$, an optimal policy $\pi^*$ can be extracted using (47). The pseudo-code of this algorithm is given in algorithm 8, which takes as input an infinite horizon MDP $M = (S, A, P, R, \gamma)$ and a tolerance $\epsilon$, and returns the optimal value function and an optimal policy.

Algorithm 8 Value iteration algorithm to calculate optimal value function and find an optimal policy

1: procedure Value Iteration($M, \epsilon$)
2:     For all states $s \in S$, $V'(s) \leftarrow 0$, $V(s) \leftarrow \infty$
3:     while $\|V - V'\|_\infty > \epsilon$ do
4:         $V \leftarrow V'$
5:         For all states $s \in S$, $V'(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V(s') \right]$
6:     $V^*(s) \leftarrow V(s)$ for all $s \in S$
7:     $\pi^*(s) \leftarrow \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V^*(s') \right]$, $\forall\, s \in S$
8:     return $V^*(s), \pi^*(s)$ for all $s \in S$

If algorithm 8 is run with $\epsilon = 0$, we can recover the optimal value function and an optimal policy exactly. However in practice, $\epsilon$ is set to be a small number such as $10^{-9}$ to $10^{-12}$.
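Policy iteration (algorithm 7) and value iteration (algorithm 8) can be compared on the Mars rover MDP of Section 3.4.2. The following is a minimal sketch under stated assumptions: the deterministic transitions are reconstructed from the verbal description rather than from Figure 3, $\gamma = 0.9$ is chosen for illustration, policy evaluation inside policy iteration is done exactly by a linear solve instead of by algorithm 4, policy iteration starts from "always try left" rather than a random policy, and value iteration returns the latest iterate $V'$ instead of $V$. Ties in the arg max are broken by taking the first maximizer.

```python
import numpy as np

def mars_rover():
    """Assumed encoding of the Mars rover MDP: action 0 = TL, action 1 = TR."""
    n = 7
    P = np.zeros((2, n, n))                    # P[a, s, s2] = P(s2 | s, a)
    for s in range(n):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, n - 1)] = 1.0
    R = np.zeros((n, 2))                       # R[s, a] = R(s, a)
    R[0, :], R[6, :] = 1.0, 10.0
    return P, R

def q_values(P, R, V, gamma):
    """Q[s, a] = R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')."""
    return R + gamma * np.einsum("asn,n->sa", P, V)

def value_iteration(P, R, gamma, eps=1e-10):
    V_new = np.zeros(R.shape[0])
    while True:
        V = V_new
        V_new = q_values(P, R, V, gamma).max(axis=1)      # apply B*
        if np.max(np.abs(V - V_new)) <= eps:
            break
    return V_new, q_values(P, R, V_new, gamma).argmax(axis=1)

def policy_iteration(P, R, gamma):
    n = R.shape[0]
    pi = np.zeros(n, dtype=int)                # start from "always TL"
    while True:
        # Exact policy evaluation: solve (I - gamma P^pi) V = R^pi.
        P_pi = P[pi, np.arange(n), :]
        R_pi = R[np.arange(n), pi]
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Policy improvement (algorithm 6).
        pi_new = q_values(P, R, V, gamma).argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return V, pi
        pi = pi_new

P, R = mars_rover()
V_vi, pi_vi = value_iteration(P, R, gamma=0.9)
V_pi, pi_pi = policy_iteration(P, R, gamma=0.9)
```

With $\gamma = 0.9$ both routines select "try right" in every state, and the two value functions agree up to the tolerance; for instance the value of $S7$ is $10 / (1 - 0.9) = 100$.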

3.7 MDP control for a finite horizon MDP

We now briefly discuss the MDP control problem for a finite horizon MDP. Having already discussed the control problem for infinite horizon MDPs, we simply state that in the finite horizon case, a deterministic policy can be obtained that is optimal. But the policy is no longer stationary in general, and so the policy may be different at each time $t$. The proof is not too difficult and the reader is asked to derive these facts in the following exercise.


Exercise 3.22. Consider a MDP with finite horizon $H$ and finite rewards. A typical episode of the MDP will look like $(s_0, a_0, s_1, a_1, \dots, s_{H-1}, a_{H-1}, s_H)$. Let a policy for the MDP be denoted by $\pi = (\pi_0, \pi_1, \dots, \pi_{H-1})$. Then prove the following statements: (a) Show that the number of deterministic policies for the MDP is given by $\left(|A|^{|S|}\right)^H = |A|^{H|S|}$. (b) Assuming that an optimal policy $\pi^*$ exists, derive a recurrence relation for the optimal value function $V^{\pi^*} = (V_0^{\pi^*}, \dots, V_H^{\pi^*})$, with $V_H^{\pi^*}(s) = 0$ for all states $s \in S$. Precisely, derive a relationship between $V_t^{\pi^*}$ and $V_{t+1}^{\pi^*}$. (c) Let $\Pi$ be the set of all deterministic policies, i.e. for every $\pi \in \Pi$, $\pi_t$ is a deterministic policy at time $t$ and for all times $t = 0, \dots, H - 1$. Show that for every policy, deterministic or stochastic, there exists a $\pi \in \Pi$ which is no worse. (d) Show that $\Pi$ contains a policy that is optimal.

Because of the conclusion of Exercise 3.22, just like in the infinite horizon case we can restrict our search for an optimal policy to the set of deterministic policies. We present an algorithm, namely value iteration, for this purpose, which is analogous to its counterpart in the infinite horizon case.

Algorithm 9 Value iteration algorithm for finite horizon MDPs

1: procedure Finite Value Iteration($M$)
2:     For all states $s \in S$, $V_H^*(s) \leftarrow 0$
3:     $t \leftarrow H - 1$
4:     while $t \ge 0$ do
5:         For all states $s \in S$, $V_t^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V_{t+1}^*(s') \right]$
6:         For all states $s \in S$, $\pi_t^*(s) = \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s'|s, a) V_{t+1}^*(s') \right]$
7:         $t \leftarrow t - 1$
8:     return For all states $s \in S$, $V_t^*(s)$ for $t = 0, \dots, H$, $\pi_t^*(s)$ for $t = 0, \dots, H - 1$
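The backward recursion of algorithm 9 can be sketched in a few lines of Python. The example below again uses the Mars rover MDP as reconstructed from the verbal description in Section 3.4.2 (an assumption, since Figure 3 is not reproduced here), with the illustrative choices $H = 7$ and $\gamma = 1$; the function name `finite_value_iteration` and the array layout are hypothetical. Ties in the arg max are broken by taking the first maximizer.

```python
import numpy as np

def finite_value_iteration(P, R, gamma, H):
    """Backward induction in the spirit of Algorithm 9.

    Returns V[t, s] for t = 0..H and pi[t, s] for t = 0..H-1.
    """
    n = R.shape[0]
    V = np.zeros((H + 1, n))                   # V[H] = 0 by construction
    pi = np.zeros((H, n), dtype=int)
    for t in range(H - 1, -1, -1):
        Q = R + gamma * np.einsum("asn,n->sa", P, V[t + 1])
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi

# Assumed Mars rover encoding: action 0 = TL, action 1 = TR.
n = 7
P = np.zeros((2, n, n))
for s in range(n):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, n - 1)] = 1.0
R = np.zeros((n, 2))
R[0, :], R[6, :] = 1.0, 10.0

V, pi = finite_value_iteration(P, R, gamma=1.0, H=7)
```

The result illustrates why the optimal finite horizon policy need not be stationary: in state $S2$ the optimal action at $t = 0$ is "try right" (there is enough time to reach $S7$), whereas at $t = 5$, with only two steps left, it is "try left" (only $S1$ is reachable in time).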

The proof of correctness of the algorithm is left to the reader as the next exercise.

Exercise 3.23. (a) Prove the correctness of algorithm 9. Hint: Use results of Exercise 3.22 (b).

The next exercise, which is also not too difficult to prove, establishes a correspondence between value iteration in the finite and infinite horizon cases.

Exercise 3.24. Consider a MDP $M = (S, A, P, R, \gamma)$ with infinite horizon and $\gamma < 1$. Let $V^*$ be the optimal value function of $M$. Define a sequence of finite horizon MDPs $M_k$ with horizon $H_k$, such that $M_k = M$ and $H_k = k$, for all $k = 1, 2, \dots$. Let $\{(V_k)^*\}_{k \ge 1}$ be the sequence of optimal value functions returned by algorithm 9 when run with the input $M_k$, and corresponding to $t = 0$. (a) Prove that $(V_k)^* \to V^*$ as $k \to \infty$.
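The statement of Exercise 3.24 can be observed numerically before it is proved. Unrolling algorithm 9 shows that the time-0 value for horizon $k$ is $k$ applications of $B^*$ to the zero vector, so the sketch below simply iterates the backup. The random MDP, the helper names, and the use of a 2000-step iterate as a stand-in for $V^*$ are all assumptions made for illustration; this is an experiment, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions, gamma = 5, 2, 0.8

# Hypothetical random MDP with strictly positive rewards.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions)) + 0.1

def backup(V):
    """One application of the Bellman optimality backup operator (31)."""
    return (R + gamma * np.einsum("asn,n->sa", P, V)).max(axis=1)

def finite_horizon_value(k):
    """Time-0 optimal value for horizon k: k backups applied to V_H = 0."""
    V = np.zeros(n_states)
    for _ in range(k):
        V = backup(V)
    return V

V_star = finite_horizon_value(2000)            # numerical stand-in for V*
gaps = [np.max(np.abs(finite_horizon_value(k) - V_star))
        for k in (1, 5, 25, 125)]
```

The entries of `gaps` should shrink roughly like $\gamma^k$, consistent with the contraction property of $B^*$ in Theorem 3.5.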


Appendices

A Contraction mapping theorem ¹

In this section, we introduce the notion of contraction maps in a Banach space setting, that we have heavily relied on in the previous section to prove many of our important theorems. The notation used in this section will be completely independent of what was introduced before, and so the reader should read this section in a self-contained fashion. Let $(V, \|\cdot\|)$ be a Banach space, where $V$ is a vector space and $\|\cdot\|$ is the norm defined on the vector space. $V$ may be finite or infinite dimensional. As it is a Banach space, we remind the reader that the space is complete, meaning that all Cauchy sequences (Definition A.1) converge (Definition A.2). We first give a few definitions:

Definition A.1. A sequence $\{v_k\}_{k \ge 1}$ of elements $v_k \in V$, $\forall\, k = 1, 2, \dots$, is called a Cauchy sequence iff for every real number $\epsilon > 0$ there exists an integer $N \ge 1$, such that $\|v_m - v_n\| < \epsilon$ for all $m, n > N$.

Definition A.2. Let $\{v_k\}_{k \ge 1}$ be a sequence of elements of $V$. We say that the sequence converges to an element $v \in V$, iff for every real number $\epsilon > 0$ there exists an integer $N \ge 1$, such that $\|v_k - v\| < \epsilon$ for all $k \ge N$. We write this as $v_k \to v$.

Our first theorem of this section shows that any sequence that is eventually constant is Cauchy.

Theorem A.1. A sequence $\{v_k\}_{k \ge 1}$ in a normed vector space that is eventually constant is Cauchy.

Proof. As the sequence is eventually constant, there exists a positive integer $r$ and $v \in V$ such that for all $k \ge r$, $v_k = v$. Then for any $\epsilon > 0$, one can choose $N = r$ in Definition A.1, giving $0 = \|v_m - v_n\| < \epsilon$ for all $m, n > N$, thus completing the proof.

We can now prove that the limit of a Cauchy sequence is unique.

Theorem A.2. A Cauchy sequence $\{v_k\}_{k \ge 1}$ in a Banach space converges to a unique limit.

Proof. The fact that the Cauchy sequence converges to a limit is true by the definition of a Banach space. We need to show that this limit is unique. We prove it by contradiction. Suppose $\exists\, v, w \in V$, $v \ne w$, such that $v_k \to v$ and $v_k \to w$. Let $\delta = \|v - w\|$, and note that $\delta > 0$ as $v \ne w$. By Definition A.2, there exist positive integers $M, N$ such that $\|v_m - v\| < \delta/2$, $\forall\, m \ge M$ and $\|v_n - w\| < \delta/2$, $\forall\, n \ge N$. Let $l = \max(M, N)$. Then by the triangle inequality we have $\|v - w\| \le \|v - v_l\| + \|v_l - w\| < \delta$, which is a contradiction.

We next define the notion of a "contraction map" on a Banach space, and the notion of a "fixed point" of an operator that maps $V$ to itself.

Definition A.3. A function $T : V \to V$ is called a contraction on $V$ iff for every $v, w \in V$, $\|Tv - Tw\| \le \|v - w\|$. The map is called a strict contraction iff there exists a real number $0 \le \gamma < 1$, such that for every $v, w \in V$, $\|Tv - Tw\| \le \gamma \|v - w\|$. The constant $\gamma$ is called the contraction factor of $T$.

Definition A.4. Consider a function $T : V \to V$. We say that $v \in V$ is a fixed point of $T$ in $V$, iff $Tv = v$.

¹ Additional material that was not covered in class.


We should note that a map $T : V \to V$ may have many fixed points or none. For example, the contraction map $T : \mathbb{R} \to \mathbb{R}$ given by $T(x) = x + 1$ has no fixed points in $\mathbb{R}$. On the other hand the map $T : \mathbb{R} \to \mathbb{R}$ given by $T(x) = x$, which is also a contraction, has infinitely many fixed points in $\mathbb{R}$. Similarly, any linear map from $V$ to itself has $0$ as a fixed point, but may not be a contraction. The $\gamma = 0$ case is special, as shown by the following theorem.

Theorem A.3. Suppose $T$ is a strict contraction on a normed vector space $V$ (not necessarily Banach) with contraction factor $\gamma = 0$. Then $T$ is a constant map.

Proof. Consider an element $v \in V$, and let $c = Tv$. Now for every element $w \in V$, we have $\|Tv - Tw\| \le 0$, which implies $\|Tv - Tw\| = 0$. By the properties of norms this implies that $Tw = Tv = c$.

We next prove a theorem involving repeated application of a strict contraction map.

Theorem A.4. Suppose $T$ is a strict contraction on a normed vector space $V$ (not necessarily Banach) with contraction factor $\gamma$. Then for every element $v \in V$, the sequence $\{v, Tv, T^2 v, \dots\}$ is a Cauchy sequence.

Proof. If $\gamma = 0$, Theorem A.3 implies that the sequence $\{v, Tv, T^2 v, \dots\}$ is a constant sequence, except for the first term, and hence Cauchy by Theorem A.1. So assume that $\gamma \ne 0$. Let $\alpha = \|Tv - v\|$. If $\alpha = 0$ then $v$ is a fixed point of $T$ and the sequence is constant, hence Cauchy by Theorem A.1, so we may also assume $\alpha > 0$. By repeated application of the contraction map we have for all $n \ge 1$ (and trivially for $n = 0$),

$$\|T^{n+1} v - T^n v\| \le \gamma \|T^n v - T^{n-1} v\| \le \cdots \le \gamma^n \|Tv - v\| = \gamma^n \alpha . \tag{49}$$

Then by the triangle inequality and (49) we additionally have for all $m, n$ satisfying $0 \le n \le m$,

$$\|T^m v - T^n v\| = \left\| \sum_{k=n}^{m-1} (T^{k+1} v - T^k v) \right\| \le \sum_{k=n}^{m-1} \|T^{k+1} v - T^k v\| \le \sum_{k=n}^{m-1} \gamma^k \alpha = \alpha\, \frac{\gamma^n - \gamma^m}{1 - \gamma} < \frac{\alpha \gamma^n}{1 - \gamma} . \tag{50}$$

To prove the sequence is Cauchy, we fix an $\epsilon > 0$, and set $N = \max\left( 1, \left\lceil \frac{\log\left( \epsilon (1 - \gamma) / \alpha \right)}{\log \gamma} \right\rceil \right)$. Then for all $m, n$ satisfying $m \ge n > N$, and as a consequence of (50), we have

$$\|T^m v - T^n v\| \le \frac{\alpha \gamma^n}{1 - \gamma} < \frac{\alpha \gamma^N}{1 - \gamma} \le \epsilon , \tag{51}$$

which completes the proof.

We can now prove the main result of this section: "the contraction mapping theorem".

Theorem A.5. Suppose the function $T : V \to V$ is a strict contraction on a Banach space $V$. Then $T$ has a unique fixed point in $V$. Moreover, for every element $v \in V$, the sequence $\{v, Tv, T^2 v, \dots\}$ is Cauchy and converges to the fixed point.

Proof. As $T$ is a strict contraction, let $\gamma \in [0, 1)$ be the contraction factor of $T$. We first prove the uniqueness part by contradiction. Let $v, w \in V$ be fixed points of $T$ and $v \ne w$, so $\|v - w\| > 0$. Then we have that $\|Tv - Tw\| = \|v - w\|$. By the contraction property we also have $\|Tv - Tw\| \le \gamma \|v - w\| < \|v - w\|$. But then this implies $\|v - w\| < \|v - w\|$, a contradiction.


We now prove the existence part. Take any element $v \in V$ and consider the sequence $\{v_k\}_{k \ge 1}$ defined as follows:

$$v_k = \begin{cases} v & \text{if } k = 1 , \\ T v_{k-1} & \text{if } k > 1 . \end{cases} \tag{52}$$

Then by Theorem A.4, $\{v_k\}_{k \ge 1}$ is a Cauchy sequence, and hence as $V$ is a Banach space, the sequence converges to a unique limit $v^* \in V$ by Theorem A.2. We claim that $v^*$ is a fixed point of $T$. To prove this, choose any $\epsilon > 0$ and define $\delta = \epsilon / (1 + \gamma)$. As $v_k \to v^*$, by Definition A.2, $\exists\, N \ge 1$ such that $\|v_k - v^*\| < \delta$, $\forall\, k \ge N$. Then by the triangle inequality we have:

$$\begin{aligned}
\|Tv^* - v^*\| &\le \|Tv^* - v_{N+1}\| + \|v_{N+1} - v^*\| = \|Tv^* - Tv_N\| + \|v_{N+1} - v^*\| \\
&\le \gamma \|v^* - v_N\| + \|v_{N+1} - v^*\| < \gamma \delta + \delta = \epsilon .
\end{aligned} \tag{53}$$

Thus we have proved that $\|Tv^* - v^*\| < \epsilon$ for all $\epsilon > 0$, which implies that $\|Tv^* - v^*\| = 0$. As $V$ is a normed vector space, this finally implies that $Tv^* = v^*$, thus completing the existence proof and also proving the second part of the theorem.
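The fixed point iteration used in the proof can be tried on a one-dimensional example. The sketch below takes the hypothetical strict contraction $T(x) = \gamma x + 1$ on $\mathbb{R}$ with $\gamma = 0.5$, whose fixed point is $1/(1-\gamma) = 2$, and compares the error of the $n$-th iterate with the bound $\alpha \gamma^n / (1 - \gamma)$ obtained from (50) by letting $m \to \infty$. The map, the starting point, and the helper name `iterate` are assumptions chosen purely for illustration.

```python
def iterate(T, v, n_steps):
    """Return the sequence [v, Tv, T^2 v, ..., T^n_steps v]."""
    seq = [v]
    for _ in range(n_steps):
        seq.append(T(seq[-1]))
    return seq

gamma = 0.5

def T(x):
    """A strict contraction on the real line with contraction factor gamma."""
    return gamma * x + 1.0

seq = iterate(T, v=10.0, n_steps=30)
alpha = abs(seq[1] - seq[0])                  # alpha = |Tv - v|
fixed_point = 1.0 / (1.0 - gamma)             # solves x = gamma * x + 1

errors = [abs(x - fixed_point) for x in seq]
bounds = [alpha * gamma ** n / (1.0 - gamma) for n in range(len(seq))]
```

Each entry of `errors` should be no larger than the corresponding entry of `bounds`, and the last iterate should be very close to the fixed point $2$.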

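B Solutions to selected exercises

Exercise 3.3 Solution. With rows and columns ordered as $S1, S2, \dots, S7$, the transition probability matrix is given by:

$$P = \begin{pmatrix}
0.6 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.6
\end{pmatrix}$$

Exercise 3.9 Solution. If the states are ordered as $\{S1, S2, S3, S4, S5, S6, S7\}$, the value function vector can be found by solving (12). The result is $V = [1.53, 0.37, 0.13, 0.22, 0.85, 3.59, 15.31]^T$.

Exercise 3.17 Solution. In both cases the value function of the policy is given by the vector $V^\pi = [1, 0, 0, 0, 0, 0, 10]^T$.

Exercise 3.20 Solution. The agent has $2^7 = 128$ deterministic stationary policies available to it. When $\gamma < 1$ is large enough for the reward of $10$ in $S7$ to dominate from every state (with the deterministic transitions described in section 3.4.2 this means $\gamma^5 > 1/10$), the optimal policy is unique and the action in each state is to "try right"; for smaller $\gamma$ it becomes optimal to "try left" in the states closest to $S1$. If $\gamma = 1$, the optimal policy is not unique. Every policy that reaches $S1$ or $S7$ from every starting state collects infinite reward, and all such policies are optimal.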