SLIDE 1

Lagrangian Based Approaches for Lexicalized Tree Adjoining Grammar Parsing

Caio Corro
Supervision: Adeline Nazarenko & Joseph Le Roux
9 March 2018

1 / 51

SLIDE 2

Syntax: description of structures in natural languages

(Figure: the sentence "She walks the dog" annotated with part-of-speech tags (PRP, VB, DET, NN), constituents (NP, VP, S) and dependency relations (det, subj, root).)

Syntactic analysis

  • Part-of-speech tagging: assign a category to each lexical item
  • Constituency parsing: define a hierarchy of syntactic units
  • Dependency parsing: define bi-lexical relations

2 / 51

SLIDE 17

Syntactic parsing

Parsing problem Compute the best syntactic analysis for a given sentence

  • Input: sentence
  • Output: constituency/dependency structure

Usual algorithmic trade-off

  • Exhaustive search with optimality certificate (dynamic program, . . . )
  • Heuristic without quality certificate (greedy/beam search, . . . )

Lagrangian relaxation

  • Heuristic with quality/optimality certificate
  • Guided exhaustive search

3 / 51

SLIDE 19

Scientific context

Syntactic analysis → Lexicalized Tree Adjoining Grammar

  • (Rich) tags
  • Constituency structure
  • Bi-lexical relations

Weighted grammar

  • Disambiguation
  • Robustness

Parsing complexity

  • O(n^7) with n the sentence length

4 / 51

slide-23
SLIDE 23

Scientific context

Syntactic analysis → Graph theory

Benefits of reduction

  • Alternative approach to problems
  • Bottleneck characterization
  • Substantial literature

Examples

  • Dependency parsing

⇔ Spanning Arborescence [McDonald et al. 2005]

  • Translation

⇔ Travelling Salesman Problem [Zaslavskiy et al. 2009]

5 / 51

SLIDE 24

Scientific context

Syntactic analysis → Graph theory → Integer Linear Programming

Declarative formulation

  • y: syntactic structure
  • f(y): likelihood of the structure
  • g_i(y) ≤ 0: constraints on the structure

Integer Linear Program:
  max_y f(y)
  s.t. g_i(y) ≤ 0 ∀ 1 ≤ i ≤ k

NLP examples: dependency parsing, model combination, semantic parsing, . . . [Rush et al. 2010; Koo et al. 2010; Le Roux et al. 2013; Das et al. 2012]

6 / 51

SLIDE 27

Scientific context

Syntactic analysis → Graph theory → Integer Linear Programming → Lagrangian relaxation

Difficult problem:
  max_y f(y)
  s.t. g_i(y) ≤ 0 ∀ 1 ≤ i ≤ k
       h_i(y) ≤ 0 ∀ 1 ≤ i ≤ l

Intuition: the problem is difficult because of the h_i(y) constraints ⇒ use soft penalties in the objective instead.

What we get

  • Bounds on the original problem
  • Possibly an optimality certificate

7 / 51

SLIDE 29

Scientific context

Syntactic analysis → Graph theory → Integer Linear Programming → Lagrangian relaxation
  • LTAG derivation tree parsing → YRMSA → non-compact program → non-delayed relax-and-cut
  • Joint tagging and parsing → GMSA → compact program → dual decomposition

8 / 51

SLIDE 33

Outline

  • 1. Lexicalized Tree Adjoining Grammar Parsing
  • 2. Efficient parsing with Lagrangian relaxation
  • 3. A dependency-like LTAG parser
  • 4. Joint Tagging and Dependency Parsing
  • 5. Conclusion

9 / 51

SLIDE 34

1. Lexicalized Tree Adjoining Grammar Parsing

SLIDE 35

Lexicalized Tree Adjoining Grammar (LTAG)

Motivations

  • Mildly context-sensitive formalism
  • Linguistically plausible
  • Semantics

Elementary tree: an extended part-of-speech tag with structural constraints, e.g. a verb with a subject on its left side.

(Figure: elementary tree for "walks": S → NP↓ VP, VP → VB, VB → walks; legend: constituents, part-of-speech tag, lexical leaf, substitution site.)

10 / 51

SLIDE 37

Example

Three elementary trees for "walks":
  • walks (verb): a bare VB node
  • walks (intransitive verb): S → NP↓ (sbj) VP, VP → VB walks
  • walks (transitive verb): S → NP↓ (sbj) VP, VP → VB walks NP↓ (obj)

11 / 51

SLIDE 39

Elementary tree combination

  • Substitution: the NP↓ substitution site of the "walks" tree is replaced by the initial tree of "She" (NP → PRP → She).
  • Adjunction: the auxiliary tree of "deliberately" (VP → RB deliberately VP∗) is inserted at a VP node of the "walks" tree.

12 / 51

SLIDE 41

LTAG derivation tree

(Figure: the elementary trees of "She" (PRP), "deliberately" (RB, auxiliary VP tree), "walks" (VBZ, S tree with two NP↓ sites), "the" (DET, auxiliary NP tree) and "dog" (NN) are combined into the phrase structure of "She deliberately walks the dog".)

Bottom-up construction of the syntactic phrase structure

(Figure: the same analysis as a derivation tree: vertices v1 . . . v5 labeled with elementary trees τ1 . . . τ5 over the words, arcs labeled with operation addresses 1, 1.1, 1.2, 1.2.2.)

Alternative representation as a derivation tree [Rambow et al. 1997]

13 / 51

SLIDE 47

Weighted LTAG parsing

Weights

  • Tag weights (elementary tree assignment)
  • Dependency weights (combination operations)

Parsing goal

  • Compute the syntactic structure of maximum weight

Complexity [Eisner et al. 2000]: O(n^6 · max(n, g) · g · t), with n the sentence length, t the maximum tree size and g the maximum ambiguity ⇒ O(n^7) asymptotically w.r.t. the sentence length

14 / 51

SLIDE 49

2. Efficient parsing with Lagrangian relaxation

SLIDE 50

Integer Linear Programming

Integer Linear Program (ILP):
  max_y y⊤w (maximize the weight of the structure y)
  s.t. Ay − b ≤ 0 (easy constraints)
       By − c ≤ 0 (difficult constraints)

Intuition

  • Remove difficult constraints
  • Introduce them as penalties in the objective
  • Solve the reparametrized problem iteratively

15 / 51

SLIDE 53

Example: dependency parsing

(Figure: vertices v0 . . . v4 above "* She walks the dog".)

Reduction: dependency tree ⇔ v0-rooted spanning arborescence, i.e. a connected graph such that:
  • v0: no incoming arc
  • v1 . . . v4: exactly one incoming arc
  • acyclic

16 / 51

SLIDE 56

Example: dependency parsing

(Figure: candidate arcs with weights, e.g. w0,1 and w3,1, between vertices v0 . . . v4 above "* She walks the dog".)

Graph construction
  • 1. Add arc candidates
  • 2. Add arc weights
  • 3. Compute the spanning arborescence of maximum weight

16 / 51

SLIDE 60

Example: dependency parsing

ILP formulation

  • y: arc incidence vector (ya = 1 iff arc a is selected)
  • w: arc weight vector

max_y y⊤w (arc-factored model)
s.t.
  Σ_{a ∈ δin(v0)} ya = 0 (root)
  Σ_{a ∈ δin(v)} ya = 1 ∀ v ∈ V+ (one head per word)
  Σ_{a ∈ δin(W)} ya ≥ 1 ∀ W ⊆ V+ (connectedness)
  y ∈ {0, 1}^A (integrality)

16 / 51

SLIDE 65

Example: dependency parsing

ILP formulation

  • y: arc incidence vector (ya = 1 iff arc a is selected)
  • w: arc weight vector

max_y y⊤w s.t. y ∈ Y

Efficient decoding

  • Generic solver: simplex, interior point method, . . .
  • Specialized algorithm: Maximum Spanning Arborescence in O(n^2) [Edmonds 1967; Schrijver 2003; McDonald et al. 2005]


16 / 51
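The constraint set above can be made concrete on a toy graph. The following is an illustrative sketch, not code from the thesis: it brute-forces the feasible set of the ILP (root, one-head-per-word and connectedness constraints) instead of running the O(n^2) Chu-Liu/Edmonds algorithm cited on the slide, and the function name and the weight matrix are invented for the example.

```python
from itertools import product

def max_spanning_arborescence(w):
    """Brute-force maximum spanning arborescence of a small digraph.

    w[h][m] is the weight of arc h -> m; vertex 0 plays the role of
    the root v0.  The search enumerates the ILP's feasible set:
    - root: vertex 0 has no incoming arc,
    - one head/word: every other vertex picks exactly one head,
    - connectedness: every vertex must reach the root (no cycles).
    """
    n = len(w)
    best_score, best_heads = float("-inf"), None
    # heads[m - 1] is the chosen head of vertex m (0/1 arc variables).
    for heads in product(range(n), repeat=n - 1):
        feasible = True
        for m in range(1, n):
            seen, v = set(), m
            while v != 0:           # follow heads up towards the root
                if v in seen:       # cycle => disconnected from v0
                    feasible = False
                    break
                seen.add(v)
                v = heads[v - 1]
            if not feasible:
                break
        if feasible:
            score = sum(w[heads[m - 1]][m] for m in range(1, n))
            if score > best_score:
                best_score, best_heads = score, heads
    return best_score, best_heads
```

On a 4-vertex example the optimum keeps the chain 0 → 1 → 2 → 3 when those arcs dominate every alternative head choice.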

SLIDE 66

Lagrangian relaxation

Difficult constraints: force each vertex to have at most k outgoing arcs.

ILP formulation:
  max_y y⊤w
  s.t. y ∈ Y (easy constraints)
       Σ_{a ∈ δ+(v)} ya ≤ k ∀ v ∈ V (hard constraints)

Lagrangian relaxation
  • 1. Relax difficult constraints as penalties in the objective
       λ ≥ 0: vector of Lagrangian multipliers

17 / 51

SLIDE 70

Lagrangian relaxation

Difficult constraints: force each vertex to have at most k outgoing arcs.

Lagrangian dual:
  max_y y⊤w − Σ_{v ∈ V} λv (Σ_{a ∈ δ+(v)} ya − k)
  s.t. y ∈ Y (easy constraints)

Lagrangian relaxation
  • 1. Relax difficult constraints as penalties in the objective
       λ ≥ 0: vector of Lagrangian multipliers
       ⇒ Upper bound on the original problem

17 / 51

SLIDE 71

Lagrangian relaxation

Difficult constraints: force each vertex to have at most k outgoing arcs.

Lagrangian dual:
  min_λ max_y y⊤w′ (+ constant term w.r.t. y)
  s.t. y ∈ Y (easy constraints)

Lagrangian relaxation
  • 1. Relax difficult constraints as penalties in the objective
       λ ≥ 0: vector of Lagrangian multipliers
  • 2. Rewrite the objective
  • 3. Minimize over λ

17 / 51

SLIDE 73

Lagrangian optimization

Lagrangian dual:
  min_λ max_y y⊤w′ (+ constant term w.r.t. y)
  s.t. y ∈ Y

Optimization
  • max over y: easy (by assumption) ⇒ MSA
  • min over λ: subgradient descent ⇒ loop over the maximization

Heuristic

  • Quality certificate
  • Possible optimality certificate

Exhaustive search

  • Branch-and-bound
  • Exact pruning

18 / 51
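The subgradient loop described above can be sketched end to end. This is an illustrative sketch under stated assumptions, not the thesis implementation: the MSA oracle is replaced by a brute-force arborescence search, the relaxed constraint is the "at most k outgoing arcs" example from the previous slides, and the step size, iteration budget and function names are my own choices.

```python
from itertools import product

def arborescence_oracle(w):
    """Brute-force max-weight arborescence rooted at vertex 0
    (stands in for the O(n^2) MSA algorithm)."""
    n = len(w)
    best, best_heads = float("-inf"), None
    for heads in product(range(n), repeat=n - 1):
        ok = True
        for m in range(1, n):
            seen, v = set(), m
            while v != 0:
                if v in seen:
                    ok = False
                    break
                seen.add(v)
                v = heads[v - 1]
            if not ok:
                break
        if ok:
            score = sum(w[heads[m - 1]][m] for m in range(1, n))
            if score > best:
                best, best_heads = score, heads
    return best_heads

def relaxed_parse(w, k, iters=100, step=0.5):
    """Lagrangian relaxation of: at most k outgoing arcs per vertex.

    Each multiplier lam[v] >= 0 penalizes arcs leaving v; the oracle
    is called on the reparametrized weights w'.  If the oracle's
    solution is feasible and complementary slackness holds, it is
    provably optimal (the optimality certificate of the slides).
    """
    n = len(w)
    lam = [0.0] * n
    heads = None
    for _ in range(iters):
        wp = [[w[h][m] - lam[h] for m in range(n)] for h in range(n)]
        heads = arborescence_oracle(wp)
        outdeg = [0] * n
        for m in range(1, n):
            outdeg[heads[m - 1]] += 1
        viol = [outdeg[v] - k for v in range(n)]
        if all(x <= 0 for x in viol) and all(lam[v] * viol[v] == 0
                                             for v in range(n)):
            return heads, True       # certified optimal
        # subgradient step on the dual, projected onto lam >= 0
        lam = [max(0.0, lam[v] + step * viol[v]) for v in range(n)]
    return heads, False              # heuristic answer, no certificate
```

On a 3-vertex instance where the unconstrained optimum gives the root two children, the multiplier on the root grows until the oracle switches to a feasible chain, which is then returned with a certificate.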

SLIDE 75

Methodology

Methodology

  • 1. Graph characterization of LTAG-derivations
  • 2. ILP formulation of the problem
  • 3. Lagrangian based decoder

Requirements for the ILP formulation

  • Formulation as linear inequalities

Requirements for the Lagrangian based decoder

  • Relaxation with a "nice" objective function
  • An efficient algorithm that solves the relaxed problem

19 / 51

SLIDE 78

3. A dependency-like LTAG parser

SLIDE 79

Proposed approach

(Figure: an unlabeled dependency tree over "She deliberately walks the dog" is mapped to a derivation tree with elementary trees τ1 . . . τ5 and operation addresses 1, 1.1, 1.2, 1.2.2.)

A dependency-like LTAG parser

  • 1. LTAG compatible dependency parsing [Corro et al. 2016]
  • 2. LTAG derivation tree labeler [Corro et al. 2017b]

Methodology

  • 1. Graph characterization of LTAG-derivations
  • 2. ILP formulation of the problem
  • 3. Lagrangian based decoder

20 / 51

SLIDE 81

Dependency trees

(Figure: dependency tree over "She deliberately walks the dog".)

Structural properties of dependency structures: non-projective, projective, k-bounded block degree, well-nestedness.

LTAG derivation trees are [Bodirsky et al. 2009; Kuhlmann 2010]:
  • 2-Bounded Block Degree (2-BBD)
  • Well-nested (WN)

21 / 51

SLIDE 84

Yield

Yield of a vertex v: the set of all vertices reachable from v ⇒ required in order to define structural properties.

(Figure: dependency tree over words s0 . . . s4.)
  Yield(v0) = {v0, v1, v2, v3, v4}
  Yield(v1) = {v1}
  Yield(v2) = {v1, v2, v3, v4}
  Yield(v3) = {v3}
  Yield(v4) = {v3, v4}

22 / 51

SLIDE 90

Contiguous yield

Block degree of a vertex: the minimum number of intervals needed to describe its yield. A contiguous yield is a yield that can be described with a single interval.

(Figure: dependency tree over words s0 . . . s4.)
  Yield(v0) = [v0 . . . v4]   BD(v0) = 1
  Yield(v1) = [v1] ∪ [v4]    BD(v1) = 2
  Yield(v2) = [v2 . . . v3]   BD(v2) = 1
  Yield(v3) = [v3]           BD(v3) = 1
  Yield(v4) = [v4]           BD(v4) = 1

23 / 51
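The definitions of yield and block degree translate directly into code. A minimal sketch (the helper names and the head-array encoding of trees are my own, not from the thesis):

```python
def analyze(heads):
    """heads[v] = parent of vertex v, or None for the root; vertices
    are identified with their word positions.  Returns a dictionary
    {v: (yield set, block degree)} where the block degree is the
    minimum number of position intervals covering the yield."""
    n = len(heads)
    children = [[] for _ in range(n)]
    for v, h in enumerate(heads):
        if h is not None:
            children[h].append(v)

    def yield_of(v):
        # all vertices reachable from v, v included
        out = {v}
        for c in children[v]:
            out |= yield_of(c)
        return out

    def block_degree(ys):
        s = sorted(ys)
        # one interval, plus one more for every gap between positions
        return 1 + sum(1 for a, b in zip(s, s[1:]) if b - a > 1)

    return {v: (yield_of(v), block_degree(yield_of(v))) for v in range(n)}
```

On the tree of the slide (v0 → v1, v0 → v2, v1 → v4, v2 → v3), the yield of v1 is {v1, v4}, which needs two intervals, so the tree has block degree 2.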

SLIDE 97

Structural properties of dependency trees

Projective dependency tree: arborescence with contiguous yields only.

(Figure: projective dependency tree over words s0 . . . s4.)

Non-projective dependency tree: arborescence with at least one non-contiguous yield.

(Figure: non-projective dependency tree over words s0 . . . s4.)

24 / 51

SLIDE 99

Structural properties (1/2): k-BBD

k-Bounded Block Degree (k-BBD)

  • BD of a tree: the maximal block degree of its vertices
  • k-BBD tree: a tree with BD less than or equal to k

(Figure: tree over words s0 . . . s4 with Yield(v1) = [v1] ∪ [v4], BD(v1) = 2; all other yields are single intervals.)

Tree of block degree 2

25 / 51

SLIDE 100

Structural properties (2/2): WN

Well-nestedness (WN)

  • Interleaving sets I1, I2 with I1 ∩ I2 = ∅: ∃ i, j ∈ I1 and k, l ∈ I2 such that i < k < j < l
  • Well-nested tree: does not contain two vertices whose yields interleave
    ⇒ e.g. a yield cannot be both inside and outside a gap

(Figure: a well-nested tree and a non-well-nested tree over words s0 . . . s4.)

26 / 51
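Well-nestedness can likewise be checked straight from the definition: look for two vertices with disjoint, interleaving yields. An illustrative sketch (my own encoding, quartic per pair of yields, which is fine for small examples):

```python
def well_nested(heads):
    """heads[v] = parent of vertex v (None for the root).  True iff no
    two disjoint yields interleave, i.e. there are no positions
    i < k < j < l with i, j in one yield and k, l in the other."""
    n = len(heads)
    children = [[] for _ in range(n)]
    for v, h in enumerate(heads):
        if h is not None:
            children[h].append(v)

    def yield_of(v):
        out = {v}
        for c in children[v]:
            out |= yield_of(c)
        return out

    yields = [yield_of(v) for v in range(n)]

    def interleave(a, b):
        if a & b:
            return False  # not disjoint, hence not interleaving

        def cross(x, y):
            return any(i < k < j < l
                       for i in x for j in x for k in y for l in y)

        return cross(a, b) or cross(b, a)

    return not any(interleave(yields[u], yields[v])
                   for u in range(n) for v in range(u + 1, n))
```

A chain is trivially well-nested (all yields are nested), while a tree whose yields are {v1, v3} and {v2, v4} is not, since 1 < 2 < 3 < 4 interleaves them.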

SLIDE 101

Parsing algorithms

Complexity of parsing under each property:
  • Non-projective: O(n^2) [McDonald et al. 2005]
  • Projective: O(n^3) [Eisner 2000]
  • WN + 2-BBD: O(n^7) [Gómez-Rodríguez et al. 2009]
  • WN + k-BBD, k ≥ 2: O(n^(5+2(k−1))) [Gómez-Rodríguez et al. 2009]
  • k-BBD, k ≥ 2: NP-complete [Satta 1992]

Remark: same complexity as LTAG parsing :(

Contribution

  • ILP formulation of the problem
  • Solver based on Lagrangian relaxation

27 / 51

SLIDE 102

k-Bounded Block Degree Constraint

Definition: W≥k+1 is the family of vertex subsets whose positions describe at least k + 1 intervals.

Example with k = 2 and [v1] ∪ [v3] ∪ [v5] ∈ W≥3: (figure) a tree in which this subset has a single incoming/outgoing arc is not 2-BBD, while a tree in which it has at least two incoming/outgoing arcs is 2-BBD.

Constraint: ∀ W ∈ W≥k+1 ⇒ at least two incoming/outgoing arcs for W.

28 / 51
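The block-degree computation behind W≥k+1 can be sketched in a few lines (illustrative Python only; the function name is mine, not from the thesis):

```python
# Illustrative sketch: the block degree of a vertex subset is the number of
# maximal intervals of consecutive vertices it splits into. A subset with
# degree >= k + 1 belongs to the family W>=k+1 used in the constraint above.

def block_degree(vertices):
    """Number of maximal intervals of consecutive integers in `vertices`."""
    vs = sorted(vertices)
    if not vs:
        return 0
    # a new block starts at every gap between consecutive selected vertices
    return 1 + sum(1 for a, b in zip(vs, vs[1:]) if b > a + 1)

print(block_degree({1, 3, 5}))  # 3: the subset [v1] + [v3] + [v5] is in W>=3
print(block_degree({1, 2, 3}))  # 1: a single interval
```

For k = 2, the subset {1, 3, 5} has degree 3, so it is exactly the kind of subset the slide's constraint applies to.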

slide-110
SLIDE 110

Well-nestedness constraint

Notation: I is the family of pairs of disjoint interleaving vertex subsets.

Example with ({1, 3}, {2, 4, 5}) ∈ I:
[Figure: two trees over v0 … v5, one not well-nested and one well-nested]

Constraint: for each pair (I1, I2) ∈ I, select at least two incoming/outgoing arcs of I1 or of I2.

  • 3. A dependency-like LTAG parser

29 / 51
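The interleaving test behind I can be sketched similarly (illustrative Python, my naming): two disjoint sets interleave exactly when their merged, sorted vertex sequence switches between the two sets at least three times.

```python
def interleaves(s1, s2):
    """True iff the disjoint vertex sets s1 and s2 interleave, i.e. there
    are a < b < c < d with a, c in one set and b, d in the other -- the
    crossing pattern that well-nestedness forbids."""
    labels = [l for _, l in sorted([(v, 1) for v in s1] + [(v, 2) for v in s2])]
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes >= 3

print(interleaves({1, 3}, {2, 4, 5}))  # True: the slide's example pair
print(interleaves({1, 4}, {2, 3}))     # False: nested, not interleaving
```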

slide-113
SLIDE 113

Full ILP: parsing with k-BBD and WN constraints

max_y   y⊤w                                           (Arc-factored)
s.t.    y ∈ Y                                         (Arborescence)
        Σ_{a ∈ δ(W)} y_a ≥ 2            ∀ W ∈ W≥k+1   (k-BBD)
        Σ_{a ∈ δ(I1)} y_a + Σ_{a ∈ δ(I2)} y_a ≥ 3     ∀ (I1, I2) ∈ I   (WN)

Problem

  • MSA: the k-BBD and WN constraints cannot be integrated
  • Generic solver: exponential number of constraints
  • No efficient algorithm [Gómez-Rodríguez et al. 2009]

Solving the ILP ⇒ Lagrangian Relaxation applied on k-BBD/WN constraints

  • 3. A dependency-like LTAG parser

30 / 51

slide-114
SLIDE 114

Lagrangian Relaxation

Lagrangian Dual Problem:

min_{λ ≥ 0} max_{y ∈ Y} f(y, λ)

Efficient minimization of the dual

  • Max: Maximum Spanning Arborescence
  • Min: Subgradient descent
  • Many relaxed constraints: Non Delayed Relax-and-Cut

Efficient maximization of the primal

  • Branch-and-Bound
  • Problem reduction (exact pruning technique)
  • 3. A dependency-like LTAG parser

31 / 51
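The dual minimization can be sketched as a generic subgradient loop (illustrative Python; for readability the inner argmax enumerates a small candidate set, whereas the thesis calls a Maximum Spanning Arborescence solver and adds violated constraints non-delayed rather than listing them all up front):

```python
def lagrangian_relaxation(candidates, weights, constraints, steps=100, eta=0.5):
    """Minimize the Lagrangian dual of: max w.y  s.t.  sum_{a in C} y_a >= r
    for each (C, r) in `constraints`. Each candidate is a set of arcs."""
    lam = [0.0] * len(constraints)
    best = None
    for _ in range(steps):
        def score(y):
            # original weight plus the penalty of the dualized constraints
            s = sum(weights[a] for a in y)
            for l, (cut, r) in zip(lam, constraints):
                s += l * (len(y & cut) - r)
            return s
        y = max(candidates, key=score)                     # oracle (MSA in practice)
        g = [len(y & cut) - r for cut, r in constraints]   # subgradient
        if all(gi >= 0 for gi in g):                       # y is primal-feasible
            best = y
            if all(l * gi == 0 for l, gi in zip(lam, g)):
                break                                      # complementary slackness
        # projected subgradient step: raise multipliers of violated constraints
        lam = [max(0.0, l - eta * gi) for l, gi in zip(lam, g)]
    return best, lam
```

On a toy instance, the loop raises the multiplier of a violated constraint until the oracle switches to a feasible structure.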

slide-116
SLIDE 116

Experiments

Problem of existing LTAG treebanks

  • Projective derivation trees only
  • Derivation forest

Dependency treebanks: structure of 99% of the trees, per language:

  • English: WN + 2-BBD
  • German: 3-BBD
  • Dutch: WN + 3-BBD
  • Spanish: WN + 2-BBD
  • Portuguese: WN + 3-BBD

⇒ just test on dependency treebanks!

  • 3. A dependency-like LTAG parser

32 / 51

slide-118
SLIDE 118

Experimental setup

Weighting model Feature-based model learned with the perceptron algorithm Goals

  • decoding time?
  • accuracy?

Percentage of valid structures produced by Turboparser [Martins et al. 2013], per weighting order:

             English    German   Dutch      Spanish    Portuguese
             WN+2-BBD   3-BBD    WN+3-BBD   WN+2-BBD   WN+3-BBD
1st order    94.87      98.74    93.26      93.43      94.79
2nd order    99.75      99.28    97.93      98.54      98.96
3rd order    99.75      99.24    97.41      99.64      98.98

  • 3. A dependency-like LTAG parser

33 / 51

slide-119
SLIDE 119

Efficiency: Relative parsing time

[Figure: relative parsing time (baseline = arc-factored non-projective parsing) for English (WN+2-BBD), German (3-BBD), Dutch (WN+3-BBD), Spanish (WN+2-BBD) and Portuguese (WN+3-BBD); bars compare this work with Turbo Parser 2nd order and Turbo Parser 3rd order, on a scale of 10 to 60]

  • 3. A dependency-like LTAG parser

34 / 51

slide-121
SLIDE 121

UAS (ratio of correct arcs)

[Figure: UAS per language, comparing this work (with the per-language k-BBD/WN constraint) against a non-projective arc-factored baseline and Turbo Parser 2nd/3rd order; score groups per language: English 92.4 / 92 / 89.5 / 89.4, German 90.4 / 89.8 / 87.7 / 87.7, Dutch 79.7 / 79.1 / 77.4 / 77.3, Spanish 87.3 / 86.6 / 83.4 / 83.3, Portuguese 88.1 / 87.4 / 83.2 / 83.1]

  • 3. A dependency-like LTAG parser

35 / 51

slide-122
SLIDE 122

Interim conclusion

Our contribution

  • First efficient and flexible algorithm:
  • k-BBD with arbitrary k
  • WN optional
  • First experimental results with k-BBD and WN parsing
  • A linear-time LTAG parse labelling algorithm (see thesis)

Perspectives

  • Applications of the algorithm to other structures (see thesis)

⇒ Yield Restricted Maximum Spanning Arborescence

  • 3. A dependency-like LTAG parser

36 / 51

slide-123
SLIDE 123

Limits of this approach

Pipeline issues

  • Error propagation
  • Possibly infeasible labelling

LTAG limits

  • No dataset
  • Continuous constituents only

Proposal

  • Joint tagging and parsing
  • No LTAG motivated structural constraints
  • 3. A dependency-like LTAG parser

37 / 51

slide-124
SLIDE 124
  • 4. Joint Tagging and Dependency

Parsing

slide-125
SLIDE 125

Discontinuous constituents

[Figure: PTB-style tree of "What does she walk?": the WHNP-1 "What" is co-indexed with an empty trace *T*-1 inside the VP]

Motivation

  • Traces usually ignored
  • 4. Joint Tagging and Dependency Parsing

38 / 51

slide-126
SLIDE 126

Discontinuous constituents

[Figure: LTAG elementary trees for "What does she walk?": an initial tree anchored by "walk" and an auxiliary tree anchored by "does" with foot node SQ∗]

Motivation

  • Traces usually ignored
  • Difficult to automatically extract a LTAG
  • 4. Joint Tagging and Dependency Parsing

38 / 51

slide-127
SLIDE 127

Discontinuous constituents

[Figure: the same sentence as a discontinuous constituent tree: the trace is removed and the WHNP belongs directly to the (discontinuous) VP]

Motivation

  • Traces usually ignored
  • Difficult to automatically extract a LTAG
  • Painless discontinuous transformation [Evang et al. 2011]
  • 4. Joint Tagging and Dependency Parsing

38 / 51

slide-128
SLIDE 128

[Figure: discontinuous constituent tree of "What does she walk?"]

Joint tagging and dependency parsing

Problem

  • 1. Assign one tag per lexical item
  • 2. Assign one head per lexical item with arborescence constraints

Benefits

  • Flexible composition mechanism
  • Guaranteed feasible solution
  • More expressive weighting factors
  • 4. Joint Tagging and Dependency Parsing

39 / 51
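To pin down the search space of the joint problem, here is a brute-force decoder (illustrative Python only; the thesis approach is the GMSA formulation of the following slides, never exhaustive enumeration):

```python
from itertools import product

def joint_decode(n, tags, tag_score, arc_score):
    """Exhaustive joint decoder for a sentence of n words (vertices 1..n,
    0 is the artificial root). `tag_score(m, t)` and `arc_score(h, m)` are
    toy scoring callables standing in for a learned weighting model."""
    def is_arborescence(heads):
        # every word must reach the root 0 without revisiting a vertex
        for m in range(1, n + 1):
            seen, h = set(), m
            while h != 0:
                if h in seen:
                    return False
                seen.add(h)
                h = heads[h - 1]
        return True

    best, best_score = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):  # heads[m-1] = head of word m
        if not is_arborescence(heads):
            continue
        for ts in product(tags, repeat=n):         # one tag per lexical item
            s = sum(tag_score(m, ts[m - 1]) + arc_score(heads[m - 1], m)
                    for m in range(1, n + 1))
            if s > best_score:
                best, best_score = (heads, ts), s
    return best, best_score
```

The arborescence check makes every returned solution feasible by construction, which is the "guaranteed feasible solution" benefit of the joint formulation.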

slide-129
SLIDE 129

Example (1)

[Figure: one vertex vi per word of the sentence "She deliberately walks the dog"]

  • 4. Joint Tagging and Dependency Parsing

40 / 51

slide-131
SLIDE 131

Example (1)

[Figure: the same sentence with one tag τi per word; the dependencies are labelled with Gorn addresses (1.1, 1.2, 1.2.2, 1)]

  • 4. Joint Tagging and Dependency Parsing

40 / 51

slide-132
SLIDE 132

Example (2)

[Figure: cluster graph with root V0 (*) and clusters V1 (She), V2 (walks), V3 (the), V4 (dog), each non-root cluster containing candidate tag vertices τ1 … τ4]

  • 4. Joint Tagging and Dependency Parsing

41 / 51

slide-136
SLIDE 136

Generalized Maximum Spanning Arborescence (GMSA)

Reduction

  • word ⇒ cluster
  • tag ⇒ vertex
  • attachment ⇒ arc

Complexity NP-hard [Myung et al. 1995] Methodology [Corro et al. 2017a]

  • 1. Graph characterization of joint tagging and parsing
  • 2. ILP formulation of the problem [Pop 2009]
  • 3. Lagrangian based decoder (dual decomposition)
  • 4. Joint Tagging and Dependency Parsing

42 / 51
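The reduction on this slide can be sketched as a small graph builder (illustrative Python; function names and data layout are mine):

```python
# Sketch of the reduction: each word becomes a cluster, each candidate tag a
# vertex of that cluster, and each possible attachment between tag vertices
# an arc. `arc_weight` stands in for a learned scoring function.
def build_gmsa_instance(words, tags_for, arc_weight):
    clusters = {0: [("*", None)]}  # root cluster with a single dummy vertex
    for i, w in enumerate(words, start=1):
        clusters[i] = [(w, t) for t in tags_for(w)]
    arcs = {}
    for h, hv in ((h, v) for h in clusters for v in clusters[h]):
        for m, mv in ((m, v) for m in clusters if m != 0 for v in clusters[m]):
            if h != m:  # no arcs inside a cluster, none into the root
                arcs[(h, hv), (m, mv)] = arc_weight(hv, mv)
    return clusters, arcs
```

A GMSA of this graph selects exactly one tag vertex per cluster and one incoming arc per selected non-root vertex, i.e. a joint tagging and parsing solution.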

slide-137
SLIDE 137

Sketch of the algorithm (1)

[Figure: the cluster graph from Example (2): root V0 (*) and clusters V1 (She) … V4 (dog) with tag vertices τ1 … τ4]

  • 4. Joint Tagging and Dependency Parsing

43 / 51

slide-141
SLIDE 141

Sketch of the algorithm (2)

[Figure: the cluster graph from Example (2): root V0 (*) and clusters V1 (She) … V4 (dog) with tag vertices τ1 … τ4]

  • 4. Joint Tagging and Dependency Parsing

44 / 51

slide-143
SLIDE 143

Dual decomposition

Coupled problem (before relaxation):

max_{y1, y2} f(y1) + g(y2)   s.t. y1 = y2

  • 4. Joint Tagging and Dependency Parsing

45 / 51

slide-145
SLIDE 145

Dual decomposition

Lagrangian Dual Problem:

min_{λ1, λ2} max_{y1, y2} f′(y1, λ1) + g′(y2, λ2)

Efficient minimization of the dual

  • Max: 2 subproblems
  • Min: Subgradient descent

Lagrangian enhancement

  • Arc re-weighting
  • Problem reduction (exact pruning technique)
  • 4. Joint Tagging and Dependency Parsing

45 / 51
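The dual decomposition loop can be sketched generically (illustrative Python; `oracle_f` and `oracle_g` stand for the two subproblem solvers, and the arc re-weighting and problem-reduction enhancements are omitted):

```python
def dual_decomposition(oracle_f, oracle_g, n_arcs, steps=50, eta=1.0):
    """Subgradient descent on the dual of: max f(y1) + g(y2) s.t. y1 = y2.
    oracle_f(lam) maximizes f(y) + lam.y; oracle_g(lam) maximizes g(y) - lam.y.
    Solutions are 0/1 arc-indicator tuples of length n_arcs."""
    lam = [0.0] * n_arcs
    y1 = None
    for _ in range(steps):
        y1 = oracle_f(lam)   # e.g. the tagging subproblem
        y2 = oracle_g(lam)   # e.g. the arborescence subproblem
        if y1 == y2:
            return y1        # agreement certifies an optimal primal solution
        # subgradient step on the coupling constraint y1 = y2
        lam = [l - eta * (a - b) for l, a, b in zip(lam, y1, y2)]
    return y1                # no agreement: fall back to one subproblem
```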

slide-146
SLIDE 146

Probabilistic model

P(d, t | s) = Pα(d | s) × Pν(t | d, s) = ∏_{(h,m) ∈ d} Pα(h | m, s) × Pν(t_m | m, d, s)

Independence assumption

  • Pα: head probability
  • Pν: tag probability conditioned on the dependencies

  • Pα: head probability
  • Pν: tag probability conditioned on dependencies

Parameter estimation

  • Neural network
  • Log-likelihood maximization on train data
  • 4. Joint Tagging and Dependency Parsing

46 / 51
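Under the factorization above, the joint log-probability decomposes over dependencies (illustrative Python; `p_head` and `p_tag` stand in for Pα and Pν and are assumed to close over the sentence s):

```python
import math

def joint_log_prob(deps, tags, p_head, p_tag):
    """log P(d, t | s) = sum over (h, m) in d of
    log P_alpha(h | m, s) + log P_nu(t_m | m, d, s)."""
    return sum(math.log(p_head(h, m)) + math.log(p_tag(tags[m], m, deps))
               for h, m in deps)
```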

slide-147
SLIDE 147

Experimental results

Discontinuous PTB (English)         LF      Time (min)
  Short sentences only:
    This work                       89.85   ≈ 4
    van Cranenburgh et al.          87.00   ≈ 180
  Full test set:
    This work                       89.17   ≈ 5.5

TIGER (German)                      LF      Time (min)
  This work                         81.63   ≈ 11
  Coavoux & Crabbé                  81.60   ≈ 2.5

  • 4. Joint Tagging and Dependency Parsing

47 / 51

slide-148
SLIDE 148

Interim conclusion

Problem formulation

  • Joint sequence tagging and non-projective dependency parsing

Contribution

  • A novel approach for discontinuous constituent parsing
  • A novel algorithm for the GMSA

Future work

  • Max-margin training
  • High-order scoring models:
  • bi-gram
  • sibling and grandparent
  • Application to other joint tagging and parsing problems
  • 4. Joint Tagging and Dependency Parsing

48 / 51

slide-149
SLIDE 149
  • 5. Conclusion
slide-150
SLIDE 150

Conclusion: Contributions

Methodology

  • 1. Graph characterization of a NLP problem
  • 2. ILP formulation
  • 3. Lagrangian based decoder

Alternative interpretation of syntactic structures

  • 1. LTAG derivation tree

⇒ Yield Restricted Spanning Arborescence

  • 2. Joint tagging and parsing

⇔ Generalized Spanning Arborescence

  • 5. Conclusion

49 / 51

slide-151
SLIDE 151

Conclusion: Research directions

In progress

  • Joint part-of-speech tagging and dependency parsing
  • High-order GMSA

Lexicalized grammars [Kuhlmann 2010]

  • Lexicalized LCFRS

Applications outside NLP

  • Standard optimization dataset
  • Other applied research areas
  • 5. Conclusion

50 / 51

slide-152
SLIDE 152

Conclusion: Structured latent variables

Motivation

  • Syntactic parsing: (most often) not an end in itself
  • Annotation process: expensive

End-to-end learning

  • Syntactic structure as a layer in a neural network
  • Training for the end goal (e.g. translation)

Deep generative models [Kingma et al. 2014]

  • Semi-supervised/unsupervised structured learning
  • Linguistically motivated priors
  • 5. Conclusion

51 / 51

slide-153
SLIDE 153

References i

References

Bodirsky, Manuel, Marco Kuhlmann, and Mathias Möhl (2009). “Well-nested drawings as models of syntactic structure”. In: Tenth Conference on Formal Grammar and Ninth Meeting on Mathematics of Language,

  • pp. 195–203.
slide-154
SLIDE 154

References ii

Corro, Caio et al. (2016). “Dependency Parsing with Bounded Block Degree and Well-nestedness via Lagrangian Relaxation and Branch-and-Bound”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, pp. 355–366. DOI: 10.18653/v1/P16-1034. URL: http://www.aclweb.org/anthology/P16-1034.

Corro, Caio, Joseph Le Roux, and Mathieu Lacroix (2017a). “Efficient Discontinuous Phrase-Structure Parsing via the Generalized Maximum Spanning Arborescence”. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, pp. 1644–1654. URL: http://aclweb.org/anthology/D17-1172.

slide-155
SLIDE 155

References iii

Corro, Caio and Joseph Le Roux (2017b). “Transforming Dependency Structures to LTAG Derivation Trees”. In: Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms. Umeå, Sweden: Association for Computational Linguistics, pp. 112–121. URL: http://aclweb.org/anthology/W17-6212.

Das, Dipanjan, André F. T. Martins, and Noah A. Smith (2012). “An exact dual decomposition algorithm for shallow semantic parsing with constraints”. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, pp. 209–217.

Edmonds, Jack (1967). “Optimum branchings”. In: Journal of Research of the National Bureau of Standards 71.4, pp. 233–240.
slide-156
SLIDE 156

References iv

Eisner, Jason (2000). “Bilexical grammars and their cubic-time parsing algorithms”. In: Advances in probabilistic and other parsing technologies. Springer, pp. 29–61.

Eisner, Jason and Giorgio Satta (2000). “A faster parsing algorithm for lexicalized tree-adjoining grammars”. In: Proceedings of the 5th Workshop on Tree-Adjoining Grammars and Related Formalisms, pp. 14–19.

Evang, Kilian and Laura Kallmeyer (2011). “PLCFRS Parsing of English Discontinuous Constituents”. In: Proceedings of the 12th International Conference on Parsing Technologies. Dublin, Ireland, pp. 104–116.

slide-157
SLIDE 157

References v

Fernández-González, Daniel and André F. T. Martins (2015). “Parsing as Reduction”. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics. Beijing, China: Association for Computational Linguistics, pp. 1523–1533. DOI: 10.3115/v1/P15-1147. URL: http://www.aclweb.org/anthology/P15-1147.

Gómez-Rodríguez, Carlos, David Weir, and John Carroll (2009). “Parsing mildly non-projective dependency structures”. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 291–299.

slide-158
SLIDE 158

References vi

Kingma, Diederik P. et al. (2014). “Semi-supervised Learning with Deep Generative Models”. In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., pp. 3581–3589. URL: http://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models.pdf.

Kong, Lingpeng, Alexander M. Rush, and Noah A. Smith (2015). “Transforming Dependencies into Phrase Structures”. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics. Denver, Colorado, pp. 788–798.

Koo, Terry et al. (2010). “Dual decomposition for parsing with non-projective head automata”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1288–1298.

slide-159
SLIDE 159

References vii

Kuhlmann, Marco (2010). Dependency Structures and Lexicalized Grammars: An Algebraic Approach. Vol. 6270. Springer.

Le Roux, Joseph, Antoine Rozenknop, and Jennifer Foster (2013). “Combining PCFG-LA models with dual decomposition: A case study with function labels and binarization”. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.

Martins, Andre, Miguel Almeida, and Noah A. Smith (2013). “Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers”. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria: Association for Computational Linguistics, pp. 617–622. URL: http://www.aclweb.org/anthology/P13-2109.

slide-160
SLIDE 160

References viii

McDonald, Ryan et al. (2005). “Non-projective dependency parsing using spanning tree algorithms”. In: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 523–530.

Myung, Young-Soo, Chang-Ho Lee, and Dong-Wan Tcha (1995). “On the generalized minimum spanning tree problem”. In: Networks 26.4, pp. 231–241.

Pop, Petrica Claudiu (2009). “A survey of different integer programming formulations of the generalized minimum spanning tree problem”. In: Carpathian Journal of Mathematics 25.1, pp. 104–118.
slide-161
SLIDE 161

References ix

Rambow, Owen and Aravind Joshi (1997). “A formal look at dependency grammars and phrase-structure grammars, with special consideration of word-order phenomena”. In: Recent trends in meaning-text theory 39, pp. 167–190.

Rush, Alexander M. et al. (2010). “On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, MA: Association for Computational Linguistics, pp. 1–11. URL: http://www.aclweb.org/anthology/D10-1001.

slide-162
SLIDE 162

References x

Satta, Giorgio (1992). “Recognition of Linear Context-Free Rewriting Systems”. In: 30th Annual Meeting of the Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/P92-1012.

Schrijver, A. (2003). Combinatorial Optimization - Polyhedra and Efficiency. Springer.

Zaslavskiy, Mikhail, Marc Dymetman, and Nicola Cancedda (2009). “Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem”. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL. Suntec, Singapore: Association for Computational Linguistics, pp. 333–341. URL: http://www.aclweb.org/anthology/P09-1038.