SLIDE 1

Extracting semi-Dyck words from fsa using the CYK algorithm

Thomas Ruprecht November 30, 2018

SLIDE 2

Outline

▶ Motivation
▶ Finding appropriate restrictions
▶ CYK algorithm for extraction of semi-Dyck words

SLIDE 3

Motivation: Chomsky-Schützenberger parsing

▶ ChoSchü theorem [CS63]: decompose a context-free language $L$ into
  ▶ a regular language $R$,
  ▶ an alphabetic string homomorphism $h$, and
  ▶ a semi-Dyck language $D$
  such that $L = h(R \cap D)$
▶ ChoSchü parsing [Hul11]:
  ▶ defining $R$ and $D$ via the grammar implies
    ▶ a bijection between $R \cap D$ and derivation trees
    ▶ a bijection between $R \cap D \cap h^{-1}(w)$ and derivation trees for $w$
▶ goal: extract semi-Dyck words from the regular language $R \cap h^{-1}(w)$

SLIDE 6

Motivation: existing algorithm to extract Dyck words [Hul11]

Require: finite state automaton $\mathcal{A} = (Q, \Sigma \cup \bar\Sigma, q_{\mathrm{init}}, q_{\mathrm{fin}}, T)$
Ensure: enumerates the words in $L(\mathcal{A}) \cap D(\Sigma)$

procedure extractDyck($\mathcal{A}$)
    $A, C := \{(p, \sigma\bar\sigma, r) \mid (p, \sigma, q), (q, \bar\sigma, r) \in T\}, \emptyset$
    for $(p, v, q) \in A$ do
        $A \setminus= \{(p, v, q)\}$; $C \cup= \{(p, v, q)\}$
        if $(p, q) = (q_{\mathrm{init}}, q_{\mathrm{fin}})$ then yield $v$
        $A \cup= \{(p, vw, r) \mid (q, w, r) \in C\} \setminus C$
        $A \cup= \{(o, uv, q) \mid (o, u, p) \in C\} \setminus C$
        $A \cup= \{(o, \sigma v \bar\sigma, r) \mid (o, \sigma, p), (q, \bar\sigma, r) \in T\} \setminus C$

▶ relies on the recursive structure of Dyck words: concatenation and bracketing
▶ dynamic programming: store intermediate results (backlinks) for pairs of states
▶ the backlinks are equivalent to a reduced grammar [BPS61]
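The agenda/chart loop above can be tried out with the following minimal Python sketch. It is an illustration, not the implementation from [Hul11]: it assumes the automaton is given as a set of `(p, bracket, q)` triples with single-character brackets and a dict `pairs` from opening to closing brackets, and it adds a hypothetical `max_len` bound (not on the slide) so that the enumeration terminates on automata with cycles.

```python
from collections import deque


def extract_dyck(transitions, q_init, q_fin, pairs, max_len):
    """Yield the semi-Dyck words of length <= max_len read on a run q_init -> q_fin.

    An item (p, v, q) records that some run from p to q reads the semi-Dyck word v.
    """
    agenda, chart, seen = deque(), set(), set()

    def push(item):
        # "... minus C" on the slide: never re-add a known item; also enforce the bound.
        if item not in seen and len(item[1]) <= max_len:
            seen.add(item)
            agenda.append(item)

    def brackets(p, q):
        # All (o, σ, σ̄, r) with (o, σ, p) and (q, σ̄, r) in the transitions.
        for (o, a, p2) in transitions:
            if p2 == p and a in pairs:
                for (q2, b, r) in transitions:
                    if q2 == q and b == pairs[a]:
                        yield o, a, b, r

    states = {s for (p, _, q) in transitions for s in (p, q)}
    for p in states:                       # initial items (o, σσ̄, r)
        for o, a, b, r in brackets(p, p):
            push((o, a + b, r))

    while agenda:
        item = agenda.popleft()
        p, v, q = item
        chart.add(item)
        if (p, q) == (q_init, q_fin):
            yield v
        for (q2, w, r) in list(chart):     # concatenation on the right
            if q2 == q:
                push((p, v + w, r))
        for (o, u, p2) in list(chart):     # concatenation on the left
            if p2 == p:
                push((o, u + v, q))
        for o, a, b, r in brackets(p, q):  # bracketing
            push((o, a + v + b, r))
```

For example, with the transitions (q0, "(", q0), (q0, "[", q1), (q1, "]", q2), (q2, ")", q2), initial state q0, final state q2 and max_len=6, the generator yields [], ([]) and (([])). Storing whole strings in the items keeps the sketch close to the slide, but it is exactly what makes the item set infinite on cyclic automata.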

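SLIDE 11

$n$-centered semi-Dyck languages

▶ example: [()]{([]⟦{}⟧)} is 3-centered
▶ $n$-centered semi-Dyck word: a word of the form $w_0 (_1 )_1 w_1 \dots (_n )_n w_n$ where
  ▶ $w_i \in \bar\Sigma^* \cdot \Sigma^*$ for each $i \in \{0, \dots, n\}$
  ▶ $w_0 (_1 )_1 w_1 \dots (_n )_n w_n \in D(\Sigma)$
▶ $C(\Sigma, n) \subseteq D(\Sigma)$: the set of $n$-centered semi-Dyck words
▶ $C(\Sigma, \le n) = \bigcup_{n' \le n} C(\Sigma, n')$
▶ $C(\Sigma, \le\infty) = \bigcup_{n' \in \mathbb{N}} C(\Sigma, n') = D(\Sigma)$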
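Both conditions can be checked on concrete words. This Python sketch assumes single-character brackets and a hypothetical `PAIRS` table with the four bracket kinds of the example; a center is any opening bracket directly followed by a closing one, so the count equals the number of factors in Σ·Σ̄.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "⟦": "⟧"}  # opening -> closing (assumed alphabet)
CLOSING = set(PAIRS.values())


def is_semi_dyck(word):
    """Membership in D(Σ): each closing bracket matches the last unmatched opening one."""
    stack = []
    for c in word:
        if c in PAIRS:
            stack.append(c)
        elif c in CLOSING:
            if not stack or PAIRS[stack.pop()] != c:
                return False
        else:
            raise ValueError(f"not a bracket: {c!r}")
    return not stack


def centers(word):
    """Number of factors in Σ·Σ̄: an opening bracket directly followed by a closing one."""
    return sum(1 for a, b in zip(word, word[1:]) if a in PAIRS and b in CLOSING)


def is_n_centered(word, n):
    """word ∈ C(Σ, n)."""
    return is_semi_dyck(word) and centers(word) == n
```

With this, the example [()]{([]⟦{}⟧)} has the three centers (), [] and {}, and lies in C(Σ, 3).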

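SLIDE 24

(At most) $n$-centered regular languages

▶ $(\le n)$-centered regular word: a word of the form $w_0 (_1 )_1 w_1 \dots (_m )_m w_m$ where $m \le n$ and no $w_i$ contains a subsequence in $\Sigma \cdot \bar\Sigma$
▶ $\mathcal{A} = (Q, \Sigma \cup \bar\Sigma, q_{\mathrm{init}}, q_{\mathrm{fin}}, T)$ is $(\le n)$-centered if there is a surjective function $f\colon Q \to \{0, \dots, n\}$ with
  $(p, \sigma, q) \in T \implies f(p) \le f(r) - 1$ if $(q, \bar\tau, r) \in T$ for some $\tau \in \Sigma$, and $f(p) = f(q)$ otherwise,
  and vice versa for closing transitions $(p, \bar\sigma, q) \in T$
  ▶ ≈ state partition with ordered cells
▶ $\hat n$ smallest number such that $\mathcal{A}$ is $(\le \hat n)$-centered ⇒ $f$ is surjective

[Figure: example automaton with ordered state cells (start, 1, 2); transition labels (, [, ⟦, ], ), {, }, ⟧]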
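A candidate function f can be checked against these conditions mechanically. The Python sketch below is written under assumptions: transitions are `(p, bracket, q)` triples with single-character brackets (not yet in normal form), crossing a center requires f(p) ≤ f(r) − 1, a transition outside a center requires f(p) = f(q), and the closing case mirrors the opening one. The names `PAIRS` and `is_centering` are hypothetical.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "⟦": "⟧"}  # opening -> closing (assumed alphabet)
CLOSING = set(PAIRS.values())


def is_centering(transitions, f):
    """Check f: Q -> {0, ..., n} (n = max of f) against the centering conditions."""
    if min(f.values()) < 0:
        return False
    for (p, a, q) in transitions:
        if a in PAIRS:
            # states r such that (p, σ, q), (q, τ̄, r) read a center
            succ = [r for (q2, b, r) in transitions if q2 == q and b in CLOSING]
            ok = all(f[p] <= f[r] - 1 for r in succ) if succ else f[p] == f[q]
        else:
            # mirrored condition for a closing transition (p, σ̄, q)
            pred = [o for (o, b, p2) in transitions if p2 == p and b in PAIRS]
            ok = all(f[o] <= f[q] - 1 for o in pred) if pred else f[p] == f[q]
        if not ok:
            return False
    return True
```

If the check succeeds, the automaton is (≤n)-centered for n = max(f.values()). A single state with both a "(" loop and a ")" loop fails for every f, since the loop pair would require f(q) ≤ f(q) − 1.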

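SLIDE 26

Closure properties

Let $L$ be a $(\le \ell)$-centered and $M$ a $(\le m)$-centered regular language over $\Sigma$, for $\ell, m \in \mathbb{N} \cup \{\infty\}$:
▶ $L \cap M$ is $(\le \min(\ell, m))$-centered
▶ $L \cup M$ is $(\le \max(\ell, m))$-centered
▶ the complement $\overline{L}$ is $(\le \infty)$-centered
▶ $L \setminus M$ is $(\le \ell)$-centered
▶ $L \cap D(\Sigma) \subseteq C(\Sigma, \le \ell)$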
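The last property can be sanity-checked by brute force on a toy automaton: enumerate the accepted words up to a length bound and count the centers of those that are semi-Dyck. This is a bounded check, not a proof, and it assumes single-character brackets and transitions given as `(p, bracket, q)` triples; all names are hypothetical.

```python
PAIRS = {"(": ")", "[": "]"}  # opening -> closing (assumed alphabet)
CLOSING = set(PAIRS.values())


def accepted_words(transitions, q_init, q_fin, max_len):
    """Words of length <= max_len with some run from q_init to q_fin (exhaustive search)."""
    found, stack = set(), [(q_init, "")]
    while stack:
        q, w = stack.pop()
        if q == q_fin:
            found.add(w)
        if len(w) < max_len:
            stack.extend((r, w + a) for (p, a, r) in transitions if p == q)
    return found


def is_semi_dyck(word):
    stack = []
    for c in word:
        if c in PAIRS:
            stack.append(c)
        elif not stack or PAIRS[stack.pop()] != c:
            return False
    return not stack


def max_dyck_centers(transitions, q_init, q_fin, max_len):
    """Largest number of centers among accepted semi-Dyck words of length <= max_len."""
    return max(
        (sum(1 for a, b in zip(w, w[1:]) if a in PAIRS and b in CLOSING)
         for w in accepted_words(transitions, q_init, q_fin, max_len)
         if is_semi_dyck(w)),
        default=0,
    )
```

For the transitions (q0, "(", q0), (q0, "[", q1), (q1, "]", q2), (q2, ")", q2), which admit a centering with largest value 1, the result is 1 for every bound of at least 2, in line with L ∩ D(Σ) ⊆ C(Σ, ≤1). For a single state with "(" and ")" loops the maximum keeps growing with the bound.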

SLIDE 33

CYK algorithm for extraction of semi-Dyck words: example

▶ $n$-CYK algorithm applicable to $(\le n)$-centered automata
▶ for each span from $f(o)$ to $f(r)$: fill backlinks for sub-runs accepting semi-Dyck words
  ▶ initial: $(o, r) \to \sigma\bar\sigma$ for $(o, \sigma, p), (p, \bar\sigma, r) \in T$
  ▶ concatenation: $(o, r) \to (o, p)(p, r)$
  ▶ bracketing: $(o, r) \to \sigma\,(p, q)\,\bar\sigma$ for $(o, \sigma, p), (q, \bar\sigma, r) \in T$

[Figure: example automaton with states $q_0$ (start), $q_1$, $q_2$; transition labels (, [, ⟦, [ ], ), { }, }, ], ⟦ ⟧]

Resulting chart:
  $(q_0, q_1)$: [ ], ( $(q_0, q_1)$ )
  $(q_1, q_2)$: { }
  $(q_0, q_2)$: ⟦ ⟧, $(q_0, q_1)(q_1, q_2)$, ( $(q_0, q_2)$ ), [ $(q_0, q_2)$ ]
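The span-by-span chart filling can be sketched in Python on this example. Assumptions: the example automaton is reconstructed from the figure labels and the listed backlinks, it is already in normal form (a center σσ̄ is one transition with a two-character label), f(q0) = 0, f(q1) = 1, f(q2) = 2, and bracketing stays within a span. Backlinks are stored as hypothetical tuples ("init", σσ̄), ("concat", p) and ("bracket", σ, p, q).

```python
from collections import defaultdict

PAIRS = {"(": ")", "[": "]", "{": "}", "⟦": "⟧"}  # opening -> closing (assumed alphabet)


def n_cyk(transitions, f, n):
    """Backlinks chart[(o, r)] for sub-runs o -> r that read semi-Dyck words.

    transitions: (p, label, q); label is one bracket or a center 'σσ̄' (normal form).
    f: centering function Q -> {0, ..., n}.
    """
    chart = defaultdict(set)
    for k in range(1, n + 1):              # span width f(r) - f(o)
        for l in range(0, n - k + 1):      # left end of the span
            cell = set()
            # initial: combined transitions (o, σσ̄, r)
            for (o, a, r) in transitions:
                if len(a) == 2 and f[o] == l and f[r] == l + k:
                    chart[(o, r)].add(("init", a))
                    cell.add((o, r))
            # concatenation: (o, r) -> (o, p)(p, r) with l < f(p) < l + k
            known = list(chart)
            for (o, p) in known:
                if f[o] == l and l < f[p] < l + k:
                    for (p2, r) in known:
                        if p2 == p and f[r] == l + k:
                            chart[(o, r)].add(("concat", p))
                            cell.add((o, r))
            # bracketing closure: (o, r) -> σ (p, q) σ̄
            agenda = list(cell)
            while agenda:
                p, q = agenda.pop()
                for (o, a, p2) in transitions:
                    if p2 != p or a not in PAIRS:
                        continue
                    for (q2, b, r) in transitions:
                        if q2 == q and b == PAIRS[a]:
                            chart[(o, r)].add(("bracket", a, p, q))
                            if (o, r) not in cell:
                                cell.add((o, r))
                                agenda.append((o, r))
    return chart


EXAMPLE_T = {
    ("q0", "(", "q0"), ("q0", "[", "q0"), ("q0", "⟦", "q0"),
    ("q0", "[]", "q1"), ("q1", ")", "q1"), ("q1", "{}", "q2"),
    ("q2", ")", "q2"), ("q2", "}", "q2"), ("q2", "]", "q2"),
    ("q0", "⟦⟧", "q2"),
}
EXAMPLE_F = {"q0": 0, "q1": 1, "q2": 2}
```

With these inputs, n_cyk(EXAMPLE_T, EXAMPLE_F, 2) reproduces the three chart entries of the example: two backlinks for (q0, q1), one for (q1, q2) and four for (q0, q2).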

SLIDE 42

CYK algorithm for extraction of semi-Dyck words

Require: $(\le n)$-centered automaton $\mathcal{A} = (Q, \Sigma \cup \bar\Sigma, q_{\mathrm{init}}, q_{\mathrm{fin}}, T)$
Ensure: enumerates the elements of $\mathcal{L}(\mathcal{A}) \cap D(\Sigma)$

procedure extractDyck($\mathcal{A}$)
    $\mathcal{A}' := \mathrm{normalForm}(\mathcal{A})$   ▷ combine transitions $(o, \sigma, p), (p, \bar\sigma, q)$ to $(o, \sigma\bar\sigma, q)$
    $C := \mathrm{cyk}(\mathcal{A}')$
    enumerate($C$, $q_{\mathrm{init}}$, $q_{\mathrm{fin}}$)   ▷ cf. Huang and Chiang [HC05]

function cyk($\mathcal{A}$)
    for $k \in \{1, \dots, n\}$ do
        for $l \in \{0, \dots, n - k\}$ do
            $S_{l,l+k} := \{(p, q) \mid (p, \sigma\bar\sigma, q) \in T, f(p) = l, f(q) = l + k\}$
            for $m \in \{l + 1, \dots, l + k - 1\}$ do
                $S_{l,l+k} \cup= \{(o, q) \mid (o, p) \in S_{l,m}, (p, q) \in S_{m,l+k}\}$
            $S_{l,l+k} \cup= \bigcup_{(p,q) \in S_{l,l+k}} R_{\mathcal{A}}(p, q)$   ▷ transitively reachable $(o, r)$ via $(o, \sigma, p), (q, \bar\sigma, r) \in T$
    return $(S_{i,j} \mid i \in \{0, \dots, n - 1\}, j \in \{i + 1, \dots, n\})$
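The enumerate step can be illustrated with a much simpler stand-in than the k-best enumeration of Huang and Chiang [HC05]: a memoized, length-bounded expansion of the backlinks. The sketch below is not that algorithm; the backlink tuple format and `EXAMPLE_CHART` (the chart of the example automaton, with states written q0, q1, q2) are assumptions for illustration.

```python
from functools import lru_cache

PAIRS = {"(": ")", "[": "]", "{": "}", "⟦": "⟧"}  # opening -> closing (assumed alphabet)

# Backlinks of the example chart, keyed by the state pair (o, r).
EXAMPLE_CHART = {
    ("q0", "q1"): {("init", "[]"), ("bracket", "(", "q0", "q1")},
    ("q1", "q2"): {("init", "{}")},
    ("q0", "q2"): {("init", "⟦⟧"), ("concat", "q1"),
                   ("bracket", "(", "q0", "q2"), ("bracket", "[", "q0", "q2")},
}


def enumerate_words(chart, o, r, max_len):
    """All semi-Dyck words of length <= max_len derivable from the backlinks of (o, r)."""

    @lru_cache(maxsize=None)
    def gen(o, r, budget):
        if budget < 2:                    # every non-empty semi-Dyck word has length >= 2
            return frozenset()
        out = set()
        for link in chart.get((o, r), ()):
            if link[0] == "init":         # (o, r) -> σσ̄
                if len(link[1]) <= budget:
                    out.add(link[1])
            elif link[0] == "concat":     # (o, r) -> (o, p)(p, r)
                p = link[1]
                for u in gen(o, p, budget - 2):
                    for v in gen(p, r, budget - len(u)):
                        out.add(u + v)
            else:                         # (o, r) -> σ (p, q) σ̄
                _, a, p, q = link
                for u in gen(p, q, budget - 2):
                    out.add(a + u + PAIRS[a])
        return frozenset(out)

    return sorted(gen(o, r, max_len), key=lambda w: (len(w), w))
```

For (q0, q2) and a bound of 4 this yields ⟦⟧, (⟦⟧), []{} and [⟦⟧]. The shrinking budget is what guarantees termination on cyclic backlinks such as (q0, q1) → ( (q0, q1) ).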

SLIDE 43

Conclusion

▶ application to Chomsky-Schützenberger parsing [Hul11; Den17]:
  ▶ $R$ and $h^{-1}(w)$ are $(\le\infty)$-centered
  ▶ $R \cap h^{-1}(w)$ is $(\le |w|)$-centered for $\varepsilon$-free grammars
  ▶ the size of the closure $R_{\mathcal{A}}(p, q)$ depends on the chain rules
▶ CYK parsing of CFGs without binarization
▶ closure properties:
  ▶ parse multiple words at the same time
  ▶ even using different grammars

SLIDE 48

References

[BPS61] Yehoshua Bar-Hillel, Micha Asher Perles, and Eli Shamir. “On Formal Properties of Simple Phrase Structure Grammars”. In: Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14 (1–4 1961), pp. 143–172. issn: 1867-8319. doi: 10.1524/stuf.1961.14.14.143.

[CS63] Noam Chomsky and Marcel Paul Schützenberger. “The algebraic theory of context-free languages”. In: Computer Programming and Formal Systems, Studies in Logic (1963), pp. 118–161. doi: 10.1016/S0049-237X(09)70104-1.

[Den17] Tobias Denkinger. “Chomsky-Schützenberger parsing for weighted multiple context-free languages”. In: Journal of Language Modelling 5.1 (July 2017), p. 3. doi: 10.15398/jlm.v5i1.159.

[HC05] Liang Huang and David Chiang. “Better k-best Parsing”. In: Proceedings of the Ninth International Workshop on Parsing Technology. Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2005, pp. 53–64. url: http://dl.acm.org/citation.cfm?id=1654494.1654500.

[Hul11] Mans Hulden. “Parsing CFGs and PCFGs with a Chomsky-Schützenberger Representation”. In: Human Language Technology. Challenges for Computer Science and Linguistics. Ed. by Zygmunt Vetulani. Vol. 6562. Lecture Notes in Computer Science. 2011, pp. 151–160. isbn: 978-3-642-20094-6. doi: 10.1007/978-3-642-20095-3_14.