Induction of Treebank-Aligned Lexical Resources – LREC 2008 – PowerPoint PPT Presentation


SLIDE 1

Induction of Treebank-Aligned Lexical Resources

LREC 2008

Tejaswini Deoskar and Mats Rooth, Department of Linguistics, Cornell University

Induction of Treebank-Aligned Lexical Resources – p. 1/2

SLIDE 2

Overview

  • Goal: Induction of probabilistic treebank-aligned lexical resources.

  • Treebank-aligned lexicon: a systematic correspondence between features of a probabilistic lexicon and structural annotation in a treebank.

  • Features:
    ♦ complex subcategorization frames for verbs or nouns;
    ♦ attachment preferences of adverbs.

SLIDE 3

Overview

  • Treebank PCFG and lexicon.
    ♦ Unlexicalised treebank PCFG: clear division between grammar and lexicon.
    ♦ Good performance (Klein and Manning, 2003).

  • Large-scale lexicon: unsupervised acquisition from unlabeled data.

SLIDE 4

Why another Treebank PCFG?

  • PCFGs built from treebanks are reduced representations.
    ♦ Exports which played a key role in fueling growth over the last two years seem to have stalled.

  • More expressive formalisms can represent these phenomena (LFG, HPSG, TAG, CCG, Minimalist grammars).

  • Goal: a sophisticated PCFG that captures the same phenomena as more expressive formalisms.
    ♦ Neutral with respect to linguistic theory.
    ♦ Focus on commonly observed phenomena.

SLIDE 10

Treebank Transformation Framework

  • Treebank transformation: Johnson (1999), Klein and Manning (2003), etc.

  • Training of a PCFG on the transformed treebank.

  • Methodology for transformation based on the addition of linguistically motivated features, and feature-constraint solving.

  • Database of Penn Treebank trees annotated with linguistic features, as a resource.

  • Components usable for transforming existing PTB-style treebanks, and building accurate PCFGs from them.

SLIDE 12

Feature Constraint Framework

  • Bare-bones CFG extracted from the Penn Treebank.
  • A feature-constraint grammar is built by adding constraints on CF rules (YAP, Schmid (2000)).
  • Each treebank tree is converted into a trivial context-free shared forest.
  • Constraints in the shared forest are solved by the YAP constraint solver.
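The constraint-solving step can be pictured as unification of feature values across a rule's mother and daughters. Below is a minimal illustrative sketch in Python, not the actual YAP solver; the feature names follow the auxiliary-verb rule shown later in the talk, and the observed node values are hypothetical.

```python
# Minimal sketch of feature-constraint solving by unification (illustrative,
# not YAP). Categories are flat feature dicts; values starting with "?" are
# variables shared between the mother and daughters of a rule.

def unify(a, b, bindings):
    """Unify two atomic values under the current bindings; None on clash."""
    def walk(v):
        while isinstance(v, str) and v.startswith("?") and v in bindings:
            v = bindings[v]
        return v
    a, b = walk(a), walk(b)
    if a == b:
        return bindings
    if isinstance(a, str) and a.startswith("?"):
        return {**bindings, a: b}
    if isinstance(b, str) and b.startswith("?"):
        return {**bindings, b: a}
    return None  # two distinct atoms: constraint violated

def unify_feats(cat, observed, bindings):
    """Unify a rule category's constraints with an observed node's features."""
    for feat, val in cat.items():
        bindings = unify(val, observed.get(feat, val), bindings)
        if bindings is None:
            return None
    return bindings

# Rule: VP{Vform=base; Slash=sl} -> VB{Val=aux; Vsel=vf} ADVP{} VP{Slash=sl; Vform=vf}
mother   = {"Vform": "base", "Slash": "?sl"}
vb       = {"Val": "aux", "Vsel": "?vf"}
lower_vp = {"Slash": "?sl", "Vform": "?vf"}

# Hypothetical observed features on one treebank tree's nodes:
b = {}
b = unify_feats(vb, {"Val": "aux", "Vsel": "en"}, b)         # aux selects a participle
b = unify_feats(lower_vp, {"Slash": "-", "Vform": "en"}, b)  # Vform=en matches Vsel
b = unify_feats(mother, {"Vform": "base", "Slash": "-"}, b)  # Slash percolates up
print(b)  # {'?vf': 'en', '?sl': '-'}
```

Shared variables like `vf` are how structural information (here, the selected verb form) is threaded through the tree and ultimately projected onto lexical items.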

SLIDE 13

Adding Constraints

Features on auxiliary verbs, added in stages to the bare treebank rule:

VP → VB ADVP VP

VP {Vform = base;} → VB {Val = aux;} ADVP {} VP {}

VP {Vform = base; Slash = sl;} → VB {Val = aux; Vsel = vf;} ADVP {} VP {Slash = sl; Vform = vf;}

VP {Vform = base; Slash = sl;} → VB {Val = aux; Vsel = vf; Prep = -; Prtcl = -; Sbj = -;} ADVP {} VP {Slash = sl; Vform = vf;}

SLIDE 19

Relative Clause

…that has been seen.

SLIDE 20

Verbal Subcategorization Features

Features added in stages to the rule:

VP → VBD +EI-NP+ S

VP {Vform = ns;} → VBD {Val = ns;} +EI-NP+ S {}

VP {Vform = ns;} → VBD {Val = ns; Sbj = x; Vsel = vf;} +EI-NP+ S {Sbj = x; Vform = vf;}

VP {Vform = ns; Slash = sl;} → VBD {Val = ns; Sbj = x; Vsel = vf; Prep = -; Prtcl = -;} +EI-NP+ S {Sbj = x; Vform = vf; Slash = sl;}

SLIDE 24

Verbal Subcategorization

Structural information is projected onto lexical items: verbs, adverbs, nouns.

SLIDE 25

A feature-structure Treebank Tree

The product-design project he heads is scrapped

SLIDE 26

Treebank PCFG

  • Frequencies collected from the feature-annotated treebank database.
  • Rule frequency table and frequency lexicon that can be used by a probabilistic parser.
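The rule frequency table can be turned into PCFG probabilities by relative-frequency estimation, P(A → α) = f(A → α) / f(A). A minimal sketch; the symbols and counts below are made-up toy values in the style of the annotated categories on the next slide, not figures from the paper.

```python
# Sketch: relative-frequency estimation of a PCFG from a rule frequency
# table. Counts and symbols are hypothetical toy values.
from collections import defaultdict

rule_freq = {
    ("VP.fin.-.-", ("VBD.n.-.-", "NP.nvd.base")): 300.0,  # transitive reading
    ("VP.fin.-.-", ("VBD.z.-.-",)): 100.0,                # intransitive reading
}

# Total frequency of each left-hand-side category.
lhs_freq = defaultdict(float)
for (lhs, rhs), f in rule_freq.items():
    lhs_freq[lhs] += f

# P(A -> rhs) = f(A -> rhs) / f(A)
prob = {rule: f / lhs_freq[rule[0]] for rule, f in rule_freq.items()}
print(prob[("VP.fin.-.-", ("VBD.n.-.-", "NP.nvd.base"))])  # 0.75
```

The same relative-frequency step over the frequency lexicon yields the lexical (word-emission) parameters.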

SLIDE 27

Treebank grammar and lexicon

Rule frequency table (excerpt):

29092.0   ROOT → S.fin.-.-.root
14134.0   S.fin.-.-.- → NP-SBJ.nvd.base.-.-.- VP.fin.-.-
13057.0   NP-SBJ.nvd.base.-.-.- → PRP
13050.0   PP.nvd.of.np → IN.of NP.nvd.base.-.-.-.-

Frequency lexicon (excerpt):

tried        VBD.s.e.to.- 32.0  VBN.s.e.to.- 11.0  VBN.n.-.-.- 5.0  VBD.z.-.-.- 1.0  VBD.n.-.-.- 1.0  VBD.s.e.g.- 1.0  VBN.z.-.-.- 1.0
admired      VBD.n.-.- 1.0
admit        VB.z.-.- 1.0  VB.n.-.- 1.0  VB.b.-.- 3.0  VBP.z.-.- 1.0  VBP.p.-.- 1.0  VBP.b.-.- 2.0
admonishing  VBG.s.-.to 1.0

SLIDE 28

Treebank PCFG

  • PCFG of variable granularity, based on attributes incorporated into the PCFG symbols.

Parsing results on Section 23 (PTB baseline vs. models without and with prepositions on verbs and nouns):

                    PTB     No Prep.   Prep.
Labeled Recall      86.5    86.11      85.98
Labeled Precision   86.7    86.50      86.3
Labeled F-score     86.6    86.31      86.14

Number of features on all categories: 19; some structural features, mostly linguistic features.

SLIDE 29

Scarcity of lexical data

In the training sections of the Penn Treebank (∼45,000 sentences):

  • Total verb types: ∼7,450; tokens: ∼125,000.
  • ∼2,830 verb types occur with frequency 1: 38% of all types, 2.37% of all tokens.

Frequency lexicon (excerpt):

admired      VBD.n.-.- 1.0
admit        VB.z.-.- 1.0  VB.n.-.- 1.0  VB.b.-.- 3.0  VBP.z.-.- 1.0  VBP.p.-.- 1.0  VBP.b.-.- 2.0
admonishing  VBG.s.-.to 1.0
adopted      VBN.aux.e.fin 2.0  VBD.n.-.- 15.0  VBD.np.-.- 1.0  VBN.n.-.- 16.0

SLIDE 30

Unsupervised Estimation

  • Inside-outside estimation over an unlabeled corpus.
  • Treebank PCFG as starting model.
  • Focus on learning lexical parameters.
    ♦ Lexical parameters obtained from the re-estimated model and the treebank.
    ♦ Syntactic parameters obtained from the treebank PCFG.
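Inside-outside estimation is built on the inside pass, which computes the probability that a category derives each span of the sentence, bottom-up over a chart. A minimal sketch for a toy CNF PCFG; the grammar, words, and probabilities here are invented for illustration and are far simpler than the paper's feature-annotated grammar.

```python
# Sketch of the inside pass underlying inside-outside estimation, for a toy
# PCFG in Chomsky normal form. chart[(i, j, A)] = P(A derives words i..j-1).
from collections import defaultdict

binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 1.0}   # toy rules
lexical = {("NP", "exports"): 0.5, ("NP", "growth"): 0.5, ("V", "fuel"): 1.0}

def inside(words):
    n = len(words)
    chart = defaultdict(float)
    # Width-1 spans: lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1, a)] += p
    # Wider spans: sum over split points and binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, (b, c)), p in binary.items():
                    chart[(i, j, a)] += p * chart[(i, k, b)] * chart[(k, j, c)]
    return chart

chart = inside(["exports", "fuel", "growth"])
print(chart[(0, 3, "S")])  # sentence probability: 1.0 * 0.5 * (1.0 * 1.0 * 0.5) = 0.25
```

In full inside-outside, these inside scores are combined with outside scores to collect expected rule counts from the unlabeled corpus, which then replace the observed counts in the re-estimation step.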

SLIDE 35

Inside-outside Re-estimation

SLIDE 36

Iterative Re-estimation

SLIDE 37

Lexical Transformation

  • Constraint on re-estimated lexicons.
  • Ensures that re-estimated lexicons are similar to the treebank lexicon.
  • Linear interpolation of the treebank and the re-estimated lexicons:

    d_i(w, τ, ι) = (1 − λ) t(w, τ, ι) + λ c̄_i(w, τ, ι)    (1)

    where w, τ, ι are the word, POS tag, and incorporation sequence.

  • Scaled corpus frequencies:

    c̄_i(w, τ, ι) = ( t(τ, ι) / c_i(τ, ι) ) · c_i(w, τ, ι)
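The lexical transformation of equation (1) can be sketched directly: the corpus frequencies from re-estimation iteration i are first rescaled so the total mass of each (tag, incorporation) pair matches the treebank, then interpolated with the treebank counts. The counts and the λ value below are hypothetical toy numbers, not figures from the paper.

```python
# Sketch of equation (1) and the frequency scaling. Toy counts; keys are
# (word, tag, incorporation).
LAM = 0.5  # interpolation weight lambda (illustrative value)

t = {("tried", "VBD", "n"): 1.0, ("admired", "VBD", "n"): 1.0}     # treebank
c = {("tried", "VBD", "n"): 30.0, ("admired", "VBD", "n"): 10.0}   # corpus, iteration i

def marginal(freqs, tau, iota):
    """Total frequency of a (tag, incorporation) pair, summed over words."""
    return sum(f for (w, t_, i_), f in freqs.items() if (t_, i_) == (tau, iota))

def c_bar(entry):
    """Scaled corpus frequency: corpus mass rescaled to the treebank's."""
    w, tau, iota = entry
    return (marginal(t, tau, iota) / marginal(c, tau, iota)) * c[entry]

def d(entry):
    """Equation (1): interpolate treebank and scaled corpus frequencies."""
    return (1 - LAM) * t.get(entry, 0.0) + LAM * c_bar(entry)

print(d(("tried", "VBD", "n")))    # 0.5*1.0 + 0.5*(2/40)*30 = 1.25
print(d(("admired", "VBD", "n")))  # 0.5*1.0 + 0.5*(2/40)*10 = 0.75
```

Because c̄ is rescaled to the treebank marginals, interpolation shifts frequency between words while keeping each (tag, incorporation) total anchored to the treebank: here the two entries still sum to 2.0.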

SLIDE 38

Initial Model

  • Non-novel words:
    ♦ The word-specific treebank distribution is maintained, but a small frequency is given to all possible incorporations.

  • Novel words:
    ♦ Average treebank distribution for that tag.
    ♦ The re-estimation procedure is expected to acquire a word-specific distribution.

  • Words in the corpus (both novel and non-novel) get all possible incorporation values for the POS tag.

SLIDE 44

Initial Model

  • The unlabelled corpus is tagged with POS tags in Penn Treebank style (TreeTagger), and tokens of words and POS tags are tabulated to obtain a frequency table g(w, τ).

  • Each frequency g(w, τ) is split among the possible incorporations ι in proportion to a ratio of marginal frequencies in t_0:

    g(w, τ, ι) = ( t_0(τ, ι) / t_0(τ) ) · g(w, τ)    (2)

  • The tagged corpus is merged with the treebank corpus:

    t(w, τ, ι) = (1 − λ_{τ,ι}) t_0(w, τ, ι) + λ_{τ,ι} g(w, τ, ι)    (3)
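Equations (2) and (3) can be sketched as a couple of lines of code. The novel verb, the counts, and the constant λ below are hypothetical; in particular, λ is taken constant here although the equations allow it to vary per (τ, ι).

```python
# Sketch of equations (2)-(3): split tagged-corpus frequencies g(w, tau) over
# incorporations in proportion to treebank marginals t0, then merge with the
# treebank counts. Toy numbers throughout.
t0 = {("tried", "VBD", "s.e.to"): 32.0, ("tried", "VBD", "n"): 1.0,
      ("admired", "VBD", "n"): 1.0}
g_wt = {("opined", "VBD"): 12.0}  # hypothetical novel verb, tagged corpus only
LAM = 0.25  # lambda_{tau, iota}, taken constant here for simplicity

def t0_marginal(tau, iota=None):
    return sum(f for (w, t_, i_), f in t0.items()
               if t_ == tau and (iota is None or i_ == iota))

def g(w, tau, iota):
    """Equation (2): split g(w, tau) in proportion to t0 marginals."""
    return (t0_marginal(tau, iota) / t0_marginal(tau)) * g_wt.get((w, tau), 0.0)

def t(w, tau, iota):
    """Equation (3): merge tagged corpus with treebank counts."""
    return (1 - LAM) * t0.get((w, tau, iota), 0.0) + LAM * g(w, tau, iota)

print(g("opined", "VBD", "s.e.to"))  # 12 * 32/34, the bulk of the mass
print(t("opined", "VBD", "s.e.to"))  # novel word: only the corpus term survives
```

Note that the split in equation (2) is mass-preserving: summing g(w, τ, ι) over the incorporations recovers g(w, τ), which is what gives the novel word an "average treebank distribution for that tag".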

SLIDE 45

Experimental Setup

  • Re-estimation: 4 million words of unannotated Wall Street Journal corpus (year 1994), sentence length < 25 words.
  • Each iteration results in a corresponding model.

Evaluation: acquiring subcategorization frames of novel verbs.

  • 1360 tokens of 117 verb types: all occurrences held out from the treebank training data.
  • Tokens of test verbs: preterminal (tag + incorporation sequence) extracted from the Viterbi parse.
  • Gold standard is the transformed treebank.

SLIDE 47

Subcat Frames

  • Fine-grained subcategorization frames (81 subcategories).
  • Intransitive, transitive, ditransitive, clausal, prepositional, etc.
  • For clausal frames, the type and subject of the clause.

SLIDE 48

Subcat. error % for novel verbs:

Iteration i   Interleaved Procedure   Standard Procedure
t0            33.36                   33.36
1             *24.40                  28.69
2             *23.45                  25.56
3             *23.05                  27.86
4             *22.89                  28.41
5             *22.81
6             22.83

  • 10.55% absolute improvement and 31.6% error reduction.
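The quoted improvement follows from the table. A quick check, assuming the baseline is the t0 error (33.36) and the best model is iteration 5 of the interleaved procedure (22.81):

```python
# Verifying the improvement figures on this slide.
baseline, best = 33.36, 22.81  # t0 error vs. best interleaved model (iter 5)

absolute_improvement = baseline - best
relative_error_reduction = 100 * (baseline - best) / baseline

print(round(absolute_improvement, 2))      # 10.55 (absolute, in percentage points)
print(round(relative_error_reduction, 1))  # 31.6 (relative error reduction, %)
```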

SLIDE 49

Evaluation

Overall error reduction: 8.97% (16.8% overall error).

Incorporating prepositions into the frame:

Iteration i   Subcat Error (No Prep.)   Subcat Error (Prep. on verbs)
t0            33.47                     34.98
1             24.40                     *25.52
2             23.45                     *25.04

SLIDE 51

Conclusions

  • Framework for adding features to a treebank PCFG: features of interest can be added.
  • The PCFG formalism is simple, and estimation methods are well defined.
  • Using a treebank-aligned grammar makes standard and reliable evaluations of re-estimated grammars possible.
  • Lexical information is induced for novel items; also useful for low-frequency items.

SLIDE 52

Noun Valence

  • Three valences: s, sbar, p.
  • NN and NNS (common nouns).

Iteration i   Noun valence error
t0            23.13
1             *20.35 (p < 0.0001)
2             21.49

Table 1: Noun valence errors, with 4M words of training data.

SLIDE 53

Labeled Bracketing Evaluation

Iteration i   Interleaved Procedure f-score   Standard Procedure f-score
t0            86.55                           86.55
1             86.83                           86.96
2             *86.93                          85.93
3             *86.92                          84.87
4             *86.92                          83.77
5             86.92
6             86.86
SLIDE 54

Larger Training Data

Iteration i   Subcat Error (4M words)   Subcat Error (8M words)
t0            33.47                     33.47
1             24.40                     24.64
2             23.45                     *22.26 (> 95% conf.)
3             23.05                     22.34
4             22.89                     23.05
5             22.81
6             22.83