SLIDE 1

To tree or not to tree?

The Quest for Sentence Structure in Natural Language Processing

Zdeněk Žabokrtský

Institute of Formal and Applied Linguistics Charles University in Prague

Prague Gathering of Logicians, February 12-13, 2016

Zdeněk Žabokrtský (ÚFAL MFF UK) To tree or not to tree? PGL 2016 1 / 37

SLIDE 2

I'll be shamelessly borrowing all kinds of materials from my colleagues throughout the talk.

SLIDE 3

Dependency trees – a first glimpse

tree-shaped sentence analysis

◮ familiar to everyone who went through the Czech education system

Credit: http://konecekh.blog.cz

SLIDE 4

Dependency trees – a more modern look

Credit: Prague Dependency Treebank 2.0, sample selection by Jan Hajič

SLIDE 5

To tree or not to tree, that is the question.

A tree is an irresistibly attractive data structure, but…

Formal linguists are not the only ones to face this question.

◮ geneticists hesitate because of horizontal gene transfer (Credit: Nature Publishing Group)
◮ interfaith families hesitate before Christmas (Credit: http://www.frumsatire.net)

SLIDE 6

Outline of the talk

Actually there are more questions to discuss today:

WHAT? What kind of creatures are those dependency trees?
HOW? How can we build such trees automatically?
WHY? Are the trees really useful in NLP applications?

SLIDE 7

Part 1: WHAT? What kind of trees do we search for?

SLIDE 8

Initial thoughts

1. We believe sentences can be reasonably represented by discrete units and relations among them.
2. Some relations among sentence components (such as some word groupings) make more sense than others.
3. In other words, we believe there is a latent but identifiable discrete structure hidden in each sentence.
4. The structure must allow for various kinds of nestedness (“…a já mu řek, že nejsem Řek, abych mu řek, kolik je v Řecku řeckých řek…” – “…and I told him that I'm not a Greek, so that I would tell him how many Greek rivers there are in Greece…”).
5. This resembles recursivity. Recursivity reminds us of trees.
6. Let's try to find such trees that make sense linguistically and can be supported by empirical evidence.
7. Let's hope they'll be useful in developing NLP applications such as Machine Translation.

SLIDE 9

So what kind of trees?

There are two types of trees broadly used:

constituency (phrase-structure) trees
dependency trees

Credit: Wikipedia

Constituency trees simply don't fit languages with freer word order, such as Czech. Let's use dependency trees.

SLIDE 10

How do we know there is a dependency between two words?

There are various clues manifested, such as

◮ word order (juxtaposition): “…přijdu zítra…” (“I'll come tomorrow”)
◮ agreement: “…novými.pl.instr knihami.pl.instr…” (“with new books”)
◮ government: “…slíbil Petrovi.dative…” (“he promised Petr”)

Different languages use different mixtures of morphological strategies to express relations among sentence units.

SLIDE 11

Basic assumptions about building units

If a sentence is to be represented by a dependency tree, then we need to be able to:

identify sentence boundaries
identify word boundaries within a sentence
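These two prerequisites can be approximated very crudely. The sketch below (a naive illustration, not part of the talk) splits on sentence-final punctuation and separates words from punctuation marks; it fails in exactly the cases the next slides discuss (abbreviations, direct speech, languages written without spaces):

```python
import re

def split_sentences(text):
    # Naive rule: a sentence ends at . ! or ? followed by whitespace
    # and an uppercase letter; real segmenters need annotation rules
    # for abbreviations, direct speech, etc.
    return re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())

def tokenize(sentence):
    # Words and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)
```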

SLIDE 12

Basic assumptions about dependencies

If a sentence is to be represented by a dependency tree, then:

there must be a unique parent word for each word in each sentence, except for the root word
no loops are allowed
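A minimal check of these two constraints, assuming the common encoding of a tree as one head index per word (0 for the artificial root; the helper name is hypothetical, not from the talk):

```python
def is_valid_dependency_tree(heads):
    # heads[i] is the head of word i+1 (words are 1-based);
    # 0 denotes the artificial root.  Valid iff every word has exactly
    # one head that is not itself, and every chain of head links reaches
    # the root without revisiting any node (i.e. no loops).
    n = len(heads)
    if any(h == i or not 0 <= h <= n for i, h in enumerate(heads, start=1)):
        return False
    for start in range(1, n + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # loop detected
            seen.add(node)
            node = heads[node - 1]
    return True
```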

SLIDE 13

Even the most basic assumptions are violated

Sometimes sentence boundaries are unclear – generally in speech, but e.g. in written Arabic too, and in some situations even in written Czech (e.g. direct speech).
Sometimes word boundaries are unclear (Chinese, “ins” in German, “abych” in Czech).
Sometimes it's unclear which word should become the parent (a preposition or a noun? an auxiliary verb or a meaningful verb? …).
Sometimes there are too many relations (“Zahlédla ho bosého.” – “She glimpsed him barefoot.”), which implies loops.
Life's hard. Let's ignore it and insist on trees.

SLIDE 14

Counter-examples revisited

If we cannot find linguistically justified decisions, then let's make them at least consistent.

Sometimes sentence boundaries are unclear (generally in speech, but e.g. in written Arabic too…).
◮ OK, so let's introduce annotation rules for sentence segmentation.

Sometimes word boundaries are unclear (Chinese, “ins” in German, “abych” in Czech).
◮ OK, so let's introduce annotation rules for tokenization.

Sometimes it's not clear which word should become the parent (e.g. a preposition or a noun?).
◮ OK, so let's introduce annotation rules for choosing the parent.

Sometimes there are too many relations (“Zahlédla ho bosého.”), which implies loops.
◮ OK, so let's introduce annotation rules for choosing a tree-shaped skeleton.

SLIDE 15

Treebanking

Is our dependency approach viable? Can we check it? Let's start by building the trees manually.

a treebank – a collection of sentences and associated (typically manually annotated) dependency trees
for English: Penn Treebank [Marcus et al., 1993]
for Czech: Prague Dependency Treebank [Hajič et al., 2001]

◮ layered annotation scheme: morphology, surface syntax, deep syntax
◮ dependency trees for about 100,000 sentences

high degree of design freedom and local linguistic tradition bias
different treebanks ⇒ different annotation styles

SLIDE 16

Case study on treebank variability: Coordination

Coordination structures such as “lazy dogs, cats and rats” consist of:

◮ conjuncts
◮ conjunctions
◮ shared modifiers
◮ punctuation

16 different annotation styles identified in 26 treebanks (and many more possible)
different expressivity, limited convertibility, limited comparability
harmonization of annotation styles badly needed!

SLIDE 17

How many treebanks are there out there?

growing interest in dependency treebanks in the last decade or two
existing treebanks for about 50 languages now (but roughly 7,000 languages in the world)
ÚFAL participated in several treebank unification efforts:

◮ 13 languages in CoNLL in 2006
◮ 29 languages in HamleDT in 2011
◮ 37 languages in Universal Dependencies in 2015

SLIDE 18

We don’t do only monolingual data

parallel Czech-English treebank CzEng
15 million sentence pairs in version 1.0 [Bojar, 2012]
annotated fully automatically

SLIDE 19

Conclusion from Part 1

No assumptions can be taken for granted. But we can hopefully live with that, as

◮ dependencies are often manifested in a relatively tangible way,
◮ simplifications can be introduced,
◮ artificial annotation rules for deciding unclear cases can be added,
◮ annotation schemes can be verified by manual annotations,
◮ a massively crosslingual view helps us not to be trapped in a local linguistic tradition.

Nowadays, dependency trees seem to be the most viable syntactic model applicable across languages.

SLIDE 20

Part 2: HOW? How can we build dependency trees automatically?

SLIDE 21

Dependency parsing

Task specification:

Input: a sequence of words (typically also their lemmas and morphological tags)
Output: for each word (except the root word), find its parent word

Evaluation criteria:

Unlabelled attachment score (UAS): percentage of words for which the correct parent was found
Labelled attachment score (LAS): percentage of words for which the correct parent was found and whose dependency label was correct too

Obvious drawback: all types of errors are considered equally important
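Both scores are simple ratios over the words of a sentence. The sketch below (an illustrative helper, not from the talk) assumes gold and predicted analyses are given as one (head, label) pair per word:

```python
def attachment_scores(gold, predicted):
    # gold, predicted: one (head, label) pair per word of the sentence.
    # UAS counts correct heads; LAS additionally requires a correct label.
    assert len(gold) == len(predicted) > 0
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return uas, las
```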

SLIDE 22

Typology of parsers in NLP

rule-based
data-driven

◮ supervised – a big amount of manually annotated trees available
◮ unsupervised – no manually annotated trees available
◮ semi-supervised – something in between

SLIDE 23

Rule-based parsers

more or less obsolete
although hand-coded grammars are immensely successful in computer science…
…it is surprisingly difficult (if not impossible) to design a reliable hand-written grammar for a natural language
the law of diminishing returns applies very quickly

◮ a few of the simplest grammar patterns (such as determiner-adjective-noun) are easy to exploit
◮ but errors start interfering with more complex rules very soon and the system becomes unmaintainable

SLIDE 24

Supervised parsing

Main approaches:

graph-based: we learn a model for scoring graph edges and search for the highest-scoring tree (global optimization, e.g. by the Maximum Spanning Tree algorithm)
transition-based: a shift-reduce parser gradually processing words stored in a queue
CFG-based: a constituency parser is applied first, then the resulting constituency trees are converted to dependencies
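The graph-based idea can be illustrated with brute force in place of the actual Maximum Spanning Tree algorithm. This toy sketch (an assumption-laden illustration, not the parser from the talk) scores every head assignment and keeps the best valid tree:

```python
from itertools import product

def is_tree(heads):
    # heads[i]: head of word i+1; 0 = artificial root; loops forbidden
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(score):
    # score[h][d]: score of attaching dependent d to head h (0 = root).
    # Exhaustive search over all head assignments -- only feasible for
    # toy sentences; real parsers use Chu-Liu/Edmonds instead.
    n = len(score) - 1  # number of words
    candidates = (
        heads for heads in product(range(n + 1), repeat=n)
        if all(h != d for d, h in enumerate(heads, start=1))
        and is_tree(list(heads))
    )
    return max(candidates,
               key=lambda hs: sum(score[h][d] for d, h in enumerate(hs, start=1)))
```

For a two-word sentence this returns the head vector maximizing the summed edge scores while respecting treeness.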

SLIDE 25

Supervised parsing: ensemble parsing

Task:

◮ Input: dependency trees resulting from several parsers
◮ Output: a single dependency tree

Intuition: different parsers are correct in different places.
Greedy argmax parent selection is insufficient.
The treeness constraint is kept e.g. by applying the Maximum Spanning Tree algorithm again [Green-Žabokrtský, 2012].

SLIDE 26

Unsupervised parsing

Treebanks for about 50 languages exist…
…but what about the remaining 6,950 languages?
How can we build parsers from nothing, without having a single hand-annotated tree?
Extremely challenging task!

SLIDE 27

Unsupervised parsing by Gibbs sampling

we can employ the rich-gets-richer principle to amplify detected regularities, for instance by Gibbs sampling [Mareček, 2011]

1. build a probabilistic model (assign a probability to each tree) using e.g.:
⋆ prior knowledge: edge length, node fertility
⋆ sentence fragment reducibility
⋆ word frequency (tendency: frequent ⇒ auxiliary ⇒ leaf)
⋆ above all: prefer repeated patterns
2. initialize trees randomly
3. iterate:
⋆ generate a random small change of some of the trees (sampled proportionally to its probability)
⋆ update the model
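A toy version of such a sampler might look as follows. The model here (a short-edge preference plus counts of head-dependent POS patterns) is a drastically simplified stand-in for the one in [Mareček, 2011]; the rich-gets-richer effect comes from sampling new heads in proportion to the current pattern counts:

```python
import math
import random
from collections import Counter

def is_tree(heads):
    # heads[i]: head of word i+1; 0 = artificial root; loops forbidden
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def edge_score(head, dep, pos, counts):
    head_pos = "ROOT" if head == 0 else pos[head - 1]
    # toy model: prefer short edges, reward already-frequent
    # (head POS, dependent POS) patterns -- rich gets richer
    return -abs(head - dep) + math.log(1 + counts[(head_pos, pos[dep - 1])])

def gibbs_step(heads, pos, counts):
    d = random.randrange(1, len(heads) + 1)      # word to re-attach
    old = heads[d - 1]
    counts[("ROOT" if old == 0 else pos[old - 1], pos[d - 1])] -= 1
    # candidate heads that keep the structure a tree (0 always qualifies)
    options = [h for h in range(len(heads) + 1)
               if h != d and is_tree(heads[:d - 1] + [h] + heads[d:])]
    weights = [math.exp(edge_score(h, d, pos, counts)) for h in options]
    new = random.choices(options, weights)[0]    # sample proportionally
    heads[d - 1] = new
    counts[("ROOT" if new == 0 else pos[new - 1], pos[d - 1])] += 1
    return heads
```

Every step keeps the structure a well-formed tree, so the sampler walks through the space of trees rather than arbitrary graphs.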

SLIDE 28

Semi-supervised parsing

typically an under-resourced scenario: some hand-annotated trees are available…
…but they are not sufficient for a supervised approach, because

◮ the data is too small (sometimes only a few trees)
◮ or no data is available for the particular language, only for some other languages

SLIDE 29

Semi-supervised parsing example: weighted multisource delexicalized parser transfer

parser transfer = we need to parse language A, but have training data only for language B
delexicalized = we ignore words and use only part-of-speech tags (Noun Verb Noun instead of John loves Mary)
multisource = treebanks for more languages (B, C, D, …) are used
weighted = we give different weights to information gained from different languages, according to the similarities A-B, A-C, A-D, …
a possible similarity measure: Kullback-Leibler divergence on distributions of part-of-speech trigrams [Rosa-Žabokrtský, 2015]
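The trigram-based similarity measure is straightforward to sketch (a simplified illustration; the exact smoothing used in [Rosa-Žabokrtský, 2015] may differ):

```python
import math
from collections import Counter

def trigram_dist(tags):
    # relative frequencies of part-of-speech trigrams in a tag corpus
    trigrams = Counter(zip(tags, tags[1:], tags[2:]))
    total = sum(trigrams.values())
    return {t: c / total for t, c in trigrams.items()}

def kl_divergence(p, q, eps=1e-6):
    # smoothed KL(p || q); the lower the divergence between target
    # language A and source language B, the higher the weight B's
    # treebank would receive in the transfer
    return sum(p[t] * math.log(p[t] / q.get(t, eps)) for t in p)
```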

SLIDE 30

Part 3: WHY? Are the trees useful?

SLIDE 31

Golden Rule of Natural Language Processing

Whatever task you try to solve in NLP, you can convincingly argue that it will be useful for Machine Translation…
…but it hardly ever really is.
(but this time there will be a happy ending eventually)

SLIDE 32

TectoMT: a dependency-based machine translation system

developed at ÚFAL
three phases:

1. analysis up to deep-syntactic trees
2. transfer on the deep-syntactic level
3. synthesis down to the sentence-string level

most components are trainable, for instance a Maximum-Entropy-based translation dictionary [Mareček et al., 2010]

[Figure: layered analysis, transfer and synthesis trees for the example sentence pair below]

However, he tried to find refuge in Brazil. Přesto se snažil najít útočiště v Brazílii.

1. Morphological analysis (Morce tagger)
2. Surface-syntax analysis (MST parser)
3. Deep-syntax analysis (rules)
4. Transfer (TM + HMTM)
5. Surface-syntax synthesis (rules)
6. Morphological synthesis (rules + statistics)

SLIDE 33

Hidden Tree Markov Model for MT

inspired by the noisy-channel model
a combination of a translation model and a target-side language model, but this time on dependency trees
global optimum searched for by a tree-modified Viterbi algorithm [Žabokrtský-Popel, 2009]

[Figure: source (Czech) and target (English) dependency trees, with candidate translations and their emission/transition probabilities, spanning the analysis-transfer-synthesis phases]

Source sentence: Strojový překlad by měl být snadný. Target sentence: Machine translation should be easy.

PE(source | target) … emission probabilities … translation model
PT(dependent | governing) … transition probabilities … target-language tree model

SLIDE 34

TectoMT: what about more training trees for parsing?

in fact we have no extra annotated data
but we can downscale the data and try to extrapolate
BLEU (horizontal axis) – an automatic estimate of parsing quality
close-to-log growth ⇒ exponentially growing annotation costs

[Plot: BLEU (roughly 0.05-0.13) against the number of training tokens (100 to 1,000,000, log scale) for the MST and Malt parsers]

SLIDE 35

TectoMT: what about different parsers?

five different parsers plugged into the translation system [Popel et al., 2011]
higher parsing quality does not imply higher translation quality

[Plot: BLEU (roughly 0.10-0.15) against UAS (roughly 0.74-0.92) for the MST, Malt, ZPar, Stanford and CJ parsers]

SLIDE 36

DeepFix: dependency-based post-editing of an MT system's output

Example: “EU criticizes not only the Greek government.”
Google translation: “EU kritizuje nejen řecká vláda.”
intuition: it should be possible to fix such errors if we model the target-language grammar
in this case we model valency frames:

◮ P(nominative | kritizovat, object) = 0.03
◮ P(accusative | kritizovat, object) = 0.80

DeepFix post-edited sentence: “EU kritizuje nejen řeckou vládu.”
dependency trees are needed, e.g., for imposing attribute agreement
improvement of state-of-the-art systems' translation quality

SLIDE 37

Thank you!
