Computing Quantiles in Markov Reward Models, Michael Ummels, German Aerospace Center


Computing Quantiles in Markov Reward Models. Michael Ummels, German Aerospace Center, michael.ummels@dlr.de (joint work with Christel Baier, TU Dresden), FOSSACS 2013.


SLIDE 1

Computing Quantiles in Markov Reward Models

Michael Ummels

German Aerospace Center

michael.ummels@dlr.de (Joint work with Christel Baier, TU Dresden) FOSSACS 2013

Michael Ummels – Computing Quantiles in Markov Reward Models 1 / 15

SLIDE 14

Markov Reward Models

Model: Markov decision processes with nonnegative rewards on states.

[Figure: example Markov decision process with rewards on states.]

Accumulated reward: 0 + 1 + 1 + 0 + 1 + 1 + 0 + 1 + 2 = 7

Note: Scheduler resolves nondeterminism.
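As a small check of the accumulated reward above, the sum can be reproduced in a few lines of Python. The list only transcribes the rewards written on the slide; the names `path_rewards` and `accumulated_reward` are illustrative and not taken from the talk.

```python
# State rewards collected along the example path, as listed on the slide.
path_rewards = [0, 1, 1, 0, 1, 1, 0, 1, 2]


def accumulated_reward(rewards):
    """Sum of the rewards collected along a finite path prefix."""
    return sum(rewards)


total = accumulated_reward(path_rewards)
print(total)
```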

SLIDE 19

PRCTL

[Figure: example MDP with initial state s0 and state labels a, b, c.]

Example properties in PRCTL (Andova et al.):

▶ s0 ⊧ P>0.2(a U≤3 b)
▶ s0 ⊧ P=0(a U≤1 b)
▶ s0 ⊭ P≤0.2(a U≤2 b)
▶ s0 ⊭ P>0(a U≤2 c)

SLIDE 20

PRCTL

[Figure: example MDP with initial state s0 and state labels a, b, c.]

Example properties in PRCTL (Andova et al.):

▶ s0 ⊧ ∀P>0.2(a U≤3 b)
▶ s0 ⊧ ∀P=0(a U≤1 b)
▶ s0 ⊧ ∃P>0.2(a U≤2 b)
▶ s0 ⊧ ∃P=0(a U≤2 b)

SLIDE 23

Motivation

Example: Randomised mutual exclusion.

[Figure: MDP over the states nn, wn, nw, cn, ww, nc, cw, wc.]

Question: How many steps may process 1 wait until with 90% chance in critical section?

Compute: least r such that wn ⊧ ∀P≥0.9(true U≤r c1).

SLIDE 26

More Motivation

Example: Resource Consumption.

[Figure: MDP from state s with labels 1,000,000, 1,000, 1, ⋯ and outcomes product and failure.]

Question: How much to invest to successfully produce with 99%?

Compute: least r such that s ⊧ ∃P≥0.99(true U≤r product).

SLIDE 32

Quantile Queries

Quantile query: φ = ∀P⋈p(a U≤? b) or φ = ∃P⋈p(a U≤? b), where

▶ a, b ∈ AP,
▶ p ∈ [0, 1], and
▶ ⋈ ∈ {<, ≤, ≥, >}.

Write φ[r] for the PRCTL formula that results from replacing ? by r.

Define the value of s wrt. φ to be the least/largest r such that s ⊧ φ[r]:

▶ val_φ(s) = inf{r ∈ ℝ : s ⊧ φ[r]} if ⋈ ∈ {≥, >} (minimising query).
▶ val_φ(s) = sup{r ∈ ℝ : s ⊧ φ[r]} if ⋈ ∈ {<, ≤} (maximising query).

Note:

1. val_φ(s) = −∞ or val_φ(s) ≥ 0.
2. s ⊧ φ[val_φ(s)] for minimising queries with finite value.

SLIDE 35

Properties of the value

Two reasons for val_φ(s) = ∞:

[Figure: state s with labels 1, a, a.] s ⊭ ∀P≥1(a U≤∞ b) ⇒ val_φ(s) = ∞.

[Figure: state s with labels 1, a, b.] s ⊭ ∀P≥1(a U≤r b) for all r ∈ ℝ ⇒ val_φ(s) = ∞.

Use classical PCTL model-checking algorithm to decide which is the case.

SLIDE 41

The Dual Query

Question: Do we really need all eight query types?

Define the dual of a query φ = ∀P⋈p(a U≤? b) to be the query φ̄ = ∃P⋈̄p(a U≤? b), where e.g. <̄ = ≥, and vice versa.

Note: φ̄[r] ≡ ¬φ[r] for all r ∈ ℝ ∪ {±∞}.

Proposition: Let M be an MDP and φ a quantile query. Then val_φ(s) = val_φ̄(s) for all states s of M.

Proof: [Figure: number line from −∞ to +∞ divided into the region where s ⊧ φ[r] and the region where s ⊧ φ̄[r].]

Consequence: May restrict to minimising queries.
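The duality above can be sketched in Python. The slide only spells out the case < versus ≥; the remaining pair (≤ versus >) is filled in here on the assumption that the dual comparison is the negated one, which is what φ̄[r] ≡ ¬φ[r] requires. The tuple encoding `(quantifier, comparison, p)` and all function names are hypothetical.

```python
# Negating "P ⋈ p" flips the comparison; negating a quantifier swaps it.
DUAL_CMP = {"<": ">=", ">=": "<", "<=": ">", ">": "<="}
DUAL_QUANT = {"forall": "exists", "exists": "forall"}


def dual(query):
    """Return the dual of a query encoded as (quantifier, comparison, p)."""
    quant, cmp, p = query
    return (DUAL_QUANT[quant], DUAL_CMP[cmp], p)


def is_minimising(query):
    """Minimising queries are those with comparison >= or >."""
    return query[1] in (">=", ">")


maximising = ("forall", "<", 0.5)
print(dual(maximising), is_minimising(dual(maximising)))
```

Every maximising query has a minimising dual in this encoding, which is the reason the talk can restrict attention to minimising queries.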

SLIDE 45

Qualitative Queries

A quantile query ∀P⋈p(a U≤? b) or ∃P⋈p(a U≤? b) is qualitative if p ∈ {0, 1}.

Theorem: Qualitative queries can be evaluated in strongly polynomial time.

Previous result: in P for non-zeno MDPs.

Example: φ = ∀P>0(a U≤? b)

[Figure: state space of M with the subset X.]

R = {0}
X = {s : rew(s) > 0, s ⊧ b}

SLIDE 50

Qualitative Queries

Example: φ = ∀P>0(a U≤? b)

[Figure: state space of M with the subsets X and Y and a state s.]

s ⊧ a ∧ ∀P>0 X(Z U Y): val_φ = r + rew(s)

R = {r < r1 < r2 < . . .}
X = {discovered states}
Y = {s ∈ X : rew(s) > 0, val_φ(s) ≤ r}
Z = {s : s ⊧ a ∧ ¬b, rew(s) = 0}
X ← {s ⊧ a ∧ ∀P>0 X(Z U Y)}
R ← R \ {r}

SLIDE 55

Existential queries

Now let φ = ∃P>p(a U≤? b) with p ∈ (0, 1).

Fact 1: For each r ∈ ℕ the probabilities max_σ Pr^σ_s(a U≤i b), 0 ≤ i ≤ r, can be computed in time poly(r ⋅ |M|).

Fact 2: The probabilities max_σ Pr^σ_s(a U≤r b) converge to max_σ Pr^σ_s(a U b).

Algorithm for computing val_φ(s):

▶ If p ≥ max_σ Pr^σ_s(a U b), then return ∞.
▶ Otherwise compute max_σ Pr^σ_s(a U≤i b) for i = 0, 1, 2, 3, . . . until probability exceeds p; return i.

How big can the value get?
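The iteration in the algorithm above can be sketched in Python under simplifying assumptions that are not part of the talk: rewards are integers, every state satisfying a but not b has reward at least 1 (so each value depends only on strictly smaller budgets), and the reward of a state is charged when the state is left. The initial test against the unbounded probability is replaced by a caller-supplied cap `r_max`. The dictionary encoding of the MDP, the function names and the example are hypothetical.

```python
from fractions import Fraction


def max_bounded_until(mdp, rew, a_states, b_states, r_max):
    """Return table with table[r][s] = max over schedulers of Pr_s(a U<=r b).

    Assumes integer rewards >= 1 on every state satisfying a but not b.
    """
    table = []
    for r in range(r_max + 1):
        row = {}
        for s in mdp:
            if s in b_states:
                row[s] = Fraction(1)
            elif s not in a_states or rew[s] > r:
                row[s] = Fraction(0)
            else:
                prev = table[r - rew[s]]  # budget left after leaving s
                row[s] = max(
                    sum(pr * prev[t] for pr, t in action)
                    for action in mdp[s]
                )
        table.append(row)
    return table


def least_budget(mdp, rew, a_states, b_states, s, p, r_max):
    """Least r <= r_max with max Pr_s(a U<=r b) > p, or None if none exists."""
    table = max_bounded_until(mdp, rew, a_states, b_states, r_max)
    for r, row in enumerate(table):
        if row[s] > p:
            return r
    return None


# Hypothetical four-state example (not the MDP from the slides).
half = Fraction(1, 2)
mdp = {
    "s": [[(half, "t"), (half, "s")], [(Fraction(1), "u")]],
    "u": [[(Fraction(9, 10), "t"), (Fraction(1, 10), "dead")]],
    "t": [[(Fraction(1), "t")]],
    "dead": [[(Fraction(1), "dead")]],
}
rew = {"s": 1, "u": 2, "t": 0, "dead": 0}
a_states, b_states = {"s", "u"}, {"t"}

print(least_budget(mdp, rew, a_states, b_states, "s", Fraction(4, 5), 10))
```

The table has r_max + 1 rows of |M| entries each, in line with the poly(r ⋅ |M|) bound of Fact 1. Zero-reward states would need a fixed-point computation within each budget level, which this sketch deliberately leaves out.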

SLIDE 60

Bounding the Value

Lemma: Let M be an MDP with n states where the denominator of each transition probability is at most m, φ = ∃P>p(a U≤? b), and p < q = max_σ Pr^σ_s(a U b). Then val_φ(s) ≤ −⌊ln(q − p)⌋ ⋅ n ⋅ max reward ⋅ m^n.

Theorem: Queries of the form ∃P>p(a U≤? b) can be evaluated in exponential time.

Question: What about ∃P≥p(a U≤? b)?

Lemma still applies but p = q does not entail val_φ(s) = ∞!

Lemma: Let M be an MDP with n states, φ = ∃P≥p(a U≤? b), and p = max_σ Pr^σ_s(a U b). Then val_φ(s) = ∞ or val_φ(s) ≤ n ⋅ max reward.
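To get a feel for the size of the first bound, it can be evaluated numerically. This is only an arithmetic illustration of the expression −⌊ln(q − p)⌋ ⋅ n ⋅ max reward ⋅ m^n; the function name and the sample numbers are made up.

```python
import math


def value_upper_bound(q, p, n, max_reward, m):
    """Evaluate -floor(ln(q - p)) * n * max_reward * m**n for p < q."""
    if not p < q:
        raise ValueError("the bound requires p < q")
    return -math.floor(math.log(q - p)) * n * max_reward * m ** n


# Three states, rewards up to 2, denominators up to 2, gap q - p = 0.4.
print(value_upper_bound(0.9, 0.5, 3, 2, 2))
```

The factor m^n is what makes the bound, and hence the iteration over all budgets up to it, exponential in the number of states.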

SLIDE 66

Universal Queries

For queries of the form ∀P>p(a U≤? b) we can get the same result.

Fact 1: For each r ∈ ℕ the probabilities min_σ Pr^σ_s(a U≤i b), 0 ≤ i ≤ r, can be computed in time poly(r ⋅ |M|).

Fact 2: The probabilities min_σ Pr^σ_s(a U≤r b) converge to min_σ Pr^σ_s(a U b).

Lemma: Let M be an MDP with n states where the denominator of each transition probability is at most m, φ = ∀P>p(a U≤? b), and p < q = min_σ Pr^σ_s(a U b). Then val_φ(s) ≤ −⌊ln(q − p)⌋ ⋅ n ⋅ max reward ⋅ m^n.

Theorem: Queries of the form ∀P>p(a U≤? b) can be evaluated in exponential time.

Open: Algorithm for evaluating ∀P≥p(a U≤? b).

SLIDE 69

A Counter-Example

Question: Assume φ = ∀P≥p(a U≤? b), where p = min_σ Pr^σ_s(a U b), and val_φ(s) < ∞. Then val_φ(s) ≤ |S| ⋅ max reward?

Answer: No! Let p = 1/2 and 0 < q < 1.

[Figure: MDP with initial state s, states labelled a and b, and transition probabilities 1/2, 1/2, q, 1 − q.]

Note: val_φ(s) = −⌊1/log_2 q⌋.
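Evaluating the closed form from the note for a few choices of q shows why no bound in terms of |S| ⋅ max reward can hold: the model stays the same size while the value grows as q approaches 1. The snippet below only evaluates the formula; it does not rebuild the MDP from the figure, and the function name is illustrative.

```python
import math


def counterexample_value(q):
    """Evaluate -floor(1 / log2(q)) for 0 < q < 1."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    return -math.floor(1 / math.log2(q))


for q in (0.5, 0.9, 0.99):
    print(q, counterexample_value(q))
```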

SLIDE 74

Markov Chains

Reduce MCs with non-negative rewards to MCs with rewards 0 and 1:

[Figure: M with a state of reward 3; M̃ with three states of reward 1 in its place.]

|M̃| ≤ |M| ⋅ max reward

Fact: For a Markov chain with rewards 0 and 1, the probabilities Pr_s(a U≤r b) can be computed in time poly(|M|) ⋅ log r.

Hence: Binary search in the interval [0, −⌊ln(q − p)⌋ ⋅ n ⋅ max reward ⋅ m^n] with q = Pr_s(a U b) can be used to determine val_φ(s) for φ = P>p(a U≤? b).

Note: −⌊ln(q − p)⌋ ≤ poly(|M|) + ∥p∥.

Theorem: Quantile queries can be evaluated in pseudo-polynomial time on Markov chains.
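The combination of fast evaluation and binary search can be sketched in Python for a special case. The assumptions are mine, not the talk's: every state satisfying a but not b has reward exactly 1, so the reward bound coincides with a step bound, and b-states as well as states violating a have already been made absorbing in the transition matrix `P`. Repeated squaring then needs a number of matrix products logarithmic in r. All names and the two-state chain are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, e):
    """A**e by repeated squaring, using O(log e) matrix products."""
    n = len(A)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    while e > 0:
        if e & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        e >>= 1
    return result


def bounded_until(P, b_states, s, r):
    """Pr_s(a U<=r b) under the reward-1 and absorbing-state assumptions."""
    Pr = mat_pow(P, r)
    return sum(Pr[s][t] for t in b_states)


def least_r(P, b_states, s, p, hi):
    """Binary search for the least r in [0, hi] with Pr_s(a U<=r b) > p."""
    if bounded_until(P, b_states, s, hi) <= p:
        return None
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if bounded_until(P, b_states, s, mid) > p:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical chain: state 0 (label a) loops with probability 9/10 and
# moves to the absorbing b-state 1 with probability 1/10.
q = Fraction(9, 10)
P = [[q, 1 - q], [Fraction(0), Fraction(1)]]
print(least_r(P, [1], 0, Fraction(1, 2), 100))
```

Binary search is sound here because the bounded probabilities are non-decreasing in r; the upper end `hi` plays the role of the bound from the lemma.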

SLIDE 76

Conclusion

Results:

▶ Polynomial algorithm for qualitative queries.
▶ Exponential algorithm for quantitative queries.
▶ Pseudo-polynomial algorithm for Markov chains.

Future work:

▶ Queries of the form Q(a U>r b).
▶ Long-run average rewards.
▶ PRCTL with parameters.