Using Universal Linguistic Knowledge to Guide Grammar Induction
[Naseem et al., 2010]
Juri Alexander Opitz, June 30, 2016


SLIDE 1

Using Universal Linguistic Knowledge to Guide Grammar Induction

[Naseem et al., 2010] Juri Alexander Opitz June 30, 2016

SLIDE 2

“By a generative grammar I mean simply a system of rules that in some explicit and well-defined way assigns structural descriptions to sentences. Obviously, every speaker of a language has mastered and internalized a generative grammar (...) This is not to say that he is aware of the rules of the grammar or even that he can become aware of them.” Noam Chomsky in Aspects of the Theory of Syntax (1965).

SLIDE 3

Overview

◮ Introduction
◮ The Model
◮ Experiments
◮ Conclusions
◮ Outlook

SLIDE 4

Introduction

SLIDE 5

What Naseem et al. seek to accomplish

Guide (dependency) grammar induction using known linguistic universals.

SLIDE 6

What is Grammar Induction?

◮ Automatic learning of a formal grammar:

  • 1. receive observations
  • 2. construct model which “explains” the observations

SLIDE 7

Why do we need Grammar Induction in NLP?

◮ Observations: spoken/written natural language.
◮ Model: any kind of model which explains how the observations arose (by incorporating underlying deeper structures).

SLIDE 8

Example: Practical Use

◮ Observations: texts (+ trees in the supervised case).
◮ Model: a parser.
◮ Goal: parse new texts.

SLIDE 9

Why Grammar Induction for LRLs?

Successful parsers rely on manually annotated training material, which is:

◮ very costly (especially in this case: a human must annotate the data with trees), and
◮ typically constructed anew for each language.

SLIDE 10

Why Grammar Induction for LRLs?

Hence we need Unsupervised Grammar Induction for LRLs.

SLIDE 11

Common Problem with Unsupervised Learning

Models usually perform much worse than their supervised counterparts: they have no teacher and must learn on their own :-(

SLIDE 12

A possible Cure

Principal Idea of the paper: Exploit universal knowledge to guide the learning process.

SLIDE 13

Linguistic Universals

SLIDE 14

Linguistic Universals - Example Parse

Sentence: Nim Chimsky eats a ripe banana.
PoS tags: Noun Noun Verb Article Adjective Noun

SLIDE 15

Linguistic Universals - Example Parse

Sentence: Nim Chimsky eats a ripe banana.
PoS tags: Noun Noun Verb Article Adjective Noun

Dependency tree (head → dependent): root → eats; eats → Chimsky, banana; Chimsky → Nim; banana → a, ripe
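The example parse above can also be written down in the head-index format that CoNLL-style treebanks use. A minimal Python sketch (the encoding itself is a standard convention, not taken from the slides):

```python
# Encode the example parse as a head-index list, as in CoNLL-style treebanks.
# heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
tokens = ["Nim", "Chimsky", "eats", "a", "ripe", "banana"]
tags   = ["Noun", "Noun", "Verb", "Article", "Adjective", "Noun"]
heads  = [2, 3, 0, 6, 6, 3]  # Nim<-Chimsky, Chimsky<-eats, eats<-root,
                             # a<-banana, ripe<-banana, banana<-eats

for i, (tok, tag) in enumerate(zip(tokens, tags)):
    governor = "ROOT" if heads[i] == 0 else tokens[heads[i] - 1]
    print(f"{tok:8s} {tag:10s} head = {governor}")
```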

SLIDE 16

Grammar induction & Low Resource Languages (LRLs)

Idea: with linguistic universals we can guide grammar induction when we have little or no annotated data at all.

SLIDE 17

The Model, “explaining what we observe”.

SLIDE 18

Model

Naseem et al. use a generative Bayesian Model to describe grammar generation when we observe words x1, x2, ..., xn and corresponding coarse symbols, i.e. PoS-Tags s1, s2, ..., sn.

SLIDE 19

Simplified Model

Naseem et al. use hidden, refined symbols z1, z2, ..., zn. For simplicity, we drop this here, i.e. we set z1, z2, ..., zn = s1, s2, ..., sn.

SLIDE 20

Simplified Model: 2 Facets

  • 1. Generative Process for Model parameters
  • 2. Generative Process for Parses

SLIDE 21

Simplified Model: 2 Facets

  • 1. For each coarse symbol s:
      ◮ Draw a word generation multinomial.
      ◮ For each possible context value c, also draw a child symbol generation multinomial.
  • 2. For each tree node i generated in context c by parent symbol s′:
      ◮ Draw coarse symbol si from the child symbol generation multinomial of the parent.
      ◮ Draw word xi from the word generation multinomial.

SLIDE 22

More formally:

  • 1. For each coarse symbol s:
      ◮ Draw Φs ∼ Dir(Φ0).
      ◮ For each possible context value c, draw θsc ∼ Dir(θ0).
  • 2. For each tree node i generated in context c by parent symbol s′:
      ◮ Draw coarse symbol si ∼ Mult(θs′c).
      ◮ Draw word xi ∼ Mult(Φsi).
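The generative process above can be sketched in a few lines of numpy. The tiny symbol/context/vocabulary inventory and the hyperparameter values below are my own assumptions, chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inventory (assumed, not from the paper).
symbols  = ["Noun", "Verb", "Article"]
contexts = ["left", "right"]
vocab    = ["nim", "eats", "a", "banana"]
phi0, theta0 = 0.1, 1.0  # symmetric Dirichlet hyperparameters (assumed values)

# 1. For each coarse symbol s: draw the word-generation multinomial Phi_s
#    and, for each context c, the child-symbol-generation multinomial theta_sc.
phi   = {s: rng.dirichlet([phi0] * len(vocab)) for s in symbols}
theta = {(s, c): rng.dirichlet([theta0] * len(symbols))
         for s in symbols for c in contexts}

# 2. Generate one tree node in context c under parent symbol s_parent.
def generate_node(s_parent, c):
    s_i = symbols[rng.choice(len(symbols), p=theta[(s_parent, c)])]
    x_i = vocab[rng.choice(len(vocab), p=phi[s_i])]
    return s_i, x_i

s_i, x_i = generate_node("Verb", "right")
```

Each `rng.dirichlet` call returns a probability vector, i.e. a full multinomial distribution, matching the "distribution over multinomials" view on the next slides.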

SLIDE 23

The Dirichlet Distribution...

... is a distribution over multinomial distributions...

SLIDE 24

2 Parameters: K

K: How many discrete events do we have (e.g. number of words in vocab).

SLIDE 25

2 Parameters: Vector α

A K-dimensional “concentration parameter” vector; all αi must be > 0 (e.g. the counts of each word in a text).

SLIDE 26

Example for K=3

SLIDE 27

Example for K=3

α = (6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4), clockwise from top left
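The four settings can be reproduced numerically; a small numpy sketch (seed and sample count are arbitrary). The mean of Dir(α) is α normalized, so larger αi concentrates probability mass on component i:

```python
import numpy as np

rng = np.random.default_rng(0)

for alpha in [(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4)]:
    samples = rng.dirichlet(alpha, size=20000)
    # Every draw is itself a 3-event multinomial: a point on the simplex.
    assert np.allclose(samples.sum(axis=1), 1.0)
    # The empirical mean approaches alpha / sum(alpha).
    print(alpha, samples.mean(axis=0).round(2))
```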

SLIDE 28

Model: Plate Outline

SLIDE 29

Inference with Constraints

Idea: constrain the posterior to satisfy the rules in expectation during inference.

◮ What? We require that a certain percentage of the dependencies in the model expectations instantiate linguistic universals.

◮ Why? This biases the model inference towards linguistically more plausible settings.

◮ Advantage: we require only a certain percentage of the linguistic universals to hold → the percentage can be tuned for every language.
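The expectation constraint can be illustrated with a toy computation. The rule set, the edge-marginal format and the function name below are hypothetical, chosen only to show the idea of "a percentage of universals in expectation":

```python
# Hypothetical universal rules as (head_tag, dependent_tag) pairs.
UNIVERSALS = {("Verb", "Noun"), ("Noun", "Adjective"), ("Noun", "Article")}

def expected_universal_fraction(edge_marginals):
    """edge_marginals: iterable of (head_tag, dependent_tag, posterior_prob).
    Returns the expected fraction of dependencies instantiating a universal rule."""
    total = sum(p for _, _, p in edge_marginals)
    hits  = sum(p for h, d, p in edge_marginals if (h, d) in UNIVERSALS)
    return hits / total

# The constraint then requires this expectation to exceed a tunable
# threshold, e.g. 0.8, during inference.
marginals = [("Verb", "Noun", 0.9), ("Noun", "Verb", 0.1), ("Noun", "Article", 1.0)]
frac = expected_universal_fraction(marginals)
```

Because the constraint acts on expectations rather than on every single parse, the threshold can be set below 100% and tuned per language, as the slide notes.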

SLIDE 30

Inference with Constraints

Method outline:

◮ Maximize a lower bound on the likelihood of the observations (equivalent to minimizing the KL divergence between the true posterior distribution over model parameters and an approximating distribution!)

◮ Implement the constraints in a constrained optimization problem:
    ◮ a certain % of the universals must hold!

SLIDE 31

Experiments

SLIDE 32

Experimental Setup

Languages: English, Danish, Portuguese, Slovene, Spanish, and Swedish

SLIDE 33

Experiments: Setup

Languages: English, Danish, Portuguese, Slovene, Spanish, and Swedish.

◮ English data: dependency modification of the Penn Treebank [Taylor et al., 2003], sentence length < 20.

◮ Other data: 2006 CoNLL-X Shared Task [Buchholz and Marsi, 2006], sentence length < 10.

◮ Each data set provides manually annotated PoS tags.

SLIDE 34

Experiments: Setup

Metric: Dependency Accuracy.

◮ Percentage of words having the correct head.
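The metric is straightforward to compute from head indices; a minimal sketch (the function name is my own):

```python
def dependency_accuracy(gold_heads, pred_heads):
    """Percentage of words whose predicted head index equals the gold head index."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)

# E.g. one wrong head out of six words:
acc = dependency_accuracy([2, 3, 0, 6, 6, 3], [2, 3, 0, 6, 3, 3])
```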

SLIDE 35

Experiments: Results

DMV, PGI: baselines.
No-split: this model without refined subsymbols.
HDP-DEP: this model.

SLIDE 36

Experiments: Ablations

What happens when we exclude certain universal rules?

SLIDE 37

Experiments: Ablations

SLIDE 38

Experiments: Constraints Thresholds

What happens when we increase/decrease the percentage of dependencies which must be in accordance with the universals?

SLIDE 39

Experiments: Constraints Thresholds

SLIDE 40

Experiments: Constraints Thresholds

SLIDE 41

Conclusions

SLIDE 42

Conclusions

◮ It is good to require only a percentage: accuracy is stable between 75% and 90%.

◮ A value of 80% seems to perform well across languages.

◮ Setting the value to the true proportion in the gold labellings (≤ 70% for all languages) does not increase performance.

◮ English performs best.

SLIDE 43

Experiments: Sentence Lengths, Universal Rules

SLIDE 44

Experiments: Sentence Lengths, English Specific Rules

SLIDE 45

Conclusions

◮ Longer sentences are more difficult to parse.

◮ Using no universal rules at all results in “disastrous” performance.

◮ With additional language-specific rules, performance increases by almost 2%.

SLIDE 46

Outlook

SLIDE 47

Another Approach to LR Dependency Parsing

Grammar Induction from Text Using Small Syntactic Prototypes. [Boonkwan and Steedman, 2011]

SLIDE 48

Another Approach to LR Dependency Parsing

[Boonkwan and Steedman, 2011] about [Naseem et al., 2010]:

◮ “method still needs language specific rules to boost accuracy”

SLIDE 49

Another Approach to LR Dependency Parsing

Idea: Use Categorial Grammar rules as prototypes.

SLIDE 50

Example

Words belong to atomic categories, or they are functors from categories to categories.

SLIDE 51

Example

<, > : mark the head direction (head right with left child, head left with right child).
/ : application from the right.
\ : application from the left.

SLIDE 52

Example: Derivation Rules

SLIDE 53

Does anyone want to derive “John eats a delicious sandwich”?
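The derivation can also be mechanized. The lexical category assignments below are my own textbook-style assumptions (not from the slides), and the greedy shift-reduce loop is only sufficient for this one example:

```python
# Toy CCG with function application only. X/Y takes a Y to its right
# (forward application, >); X\Y takes a Y to its left (backward application, <).
LEXICON = {
    "John": "NP",
    "eats": "(S\\NP)/NP",
    "a": "NP/N",
    "delicious": "N/N",
    "sandwich": "N",
}

def split_outer(cat):
    """Split a category at its outermost / or \\ ; None for atomic categories."""
    depth = 0
    for i, ch in enumerate(cat):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return cat[:i], ch, cat[i + 1:]
    return None

def strip_parens(cat):
    """Remove outer parentheses if they enclose the whole category."""
    if cat.startswith("("):
        depth = 0
        for i, ch in enumerate(cat):
            depth += ch == "("
            depth -= ch == ")"
            if depth == 0:
                return cat[1:-1] if i == len(cat) - 1 else cat
    return cat

def apply_cats(left, right):
    """Forward or backward application; None if the pair does not combine."""
    parts = split_outer(left)
    if parts and parts[1] == "/" and strip_parens(parts[2]) == right:
        return strip_parens(parts[0])      # X/Y  Y  =>  X   (>)
    parts = split_outer(right)
    if parts and parts[1] == "\\" and strip_parens(parts[2]) == left:
        return strip_parens(parts[0])      # Y  X\Y  =>  X   (<)
    return None

def parse(words):
    """Greedy shift-reduce: shift a lexical category, reduce whenever possible."""
    stack = []
    for w in words:
        stack.append(LEXICON[w])
        while len(stack) >= 2:
            res = apply_cats(stack[-2], stack[-1])
            if res is None:
                break
            stack[-2:] = [res]
    return stack

result = parse("John eats a delicious sandwich".split())
```

The reductions run N/N + N → N, NP/N + N → NP, (S\NP)/NP + NP → S\NP, and finally NP + S\NP → S, mirroring the derivation on the next slide.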

SLIDE 54

SLIDE 55

Language Parametrization

Ask a non-linguist native speaker about word orders (e.g. subj-verb-obj) and derive rules from that.

SLIDE 56

They manage to improve over Naseem et al.:
  • 1. without language-specific rules (+3% F1), and
  • 2. with language-specific rules (+1% F1).

SLIDE 57

Comparison of Grammar Induction Approaches

Performance:

◮ The [Boonkwan and Steedman, 2011] approach wins.

Abstraction, Universality:

◮ Naseem et al. rely on only a small set of universal rules.
◮ The approach from [Boonkwan and Steedman, 2011] needs the work of a native speaker for each language to be parsed.
◮ The [Naseem et al., 2010] approach seems more universal (to me).

SLIDE 58

Thank you for listening.

SLIDE 59

Literature I

[Boonkwan and Steedman, 2011] Boonkwan, P. and Steedman, M. (2011). Grammar induction from text using small syntactic prototypes. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 438–446.

[Buchholz and Marsi, 2006] Buchholz, S. and Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149–164.

SLIDE 60

Literature II

[Naseem et al., 2010] Naseem, T., Chen, H., Barzilay, R., and Johnson, M. (2010). Using Universal Linguistic Knowledge to Guide Grammar Induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 1234–1244, Stroudsburg, PA, USA. Association for Computational Linguistics.

[Taylor et al., 2003] Taylor, A., Marcus, M., and Santorini, B. (2003). The Penn Treebank: An Overview.