Forest-based Algorithms in Natural Language Processing Liang Huang - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Forest-based Algorithms in

Natural Language Processing

includes joint work with David Chiang, Kevin Knight, Aravind Joshi, Haitao Mi and Qun Liu

CMU LTI Seminar, Pittsburgh, PA, May 14, 2009

INSTITUTE OF COMPUTING TECHNOLOGY

Liang Huang

  • overview of Ph.D. work done at Penn (and ISI, ICT)
slide-2
SLIDE 2

Forest Algorithms

NLP is all about ambiguities

  • to middle school kids: what does this sentence mean?

2

Aravind Joshi

I saw her duck.

slide-4
SLIDE 4

Forest Algorithms

NLP is all about ambiguities

3

Aravind Joshi

I eat sushi with tuna.

  • to middle school kids: what does this sentence mean?
slide-5
SLIDE 5

Forest Algorithms

NLP is all about ambiguities

4

I saw her duck.

slide-7
SLIDE 7

Forest Algorithms

NLP is all about ambiguities

  • how about...
  • I saw her duck with a telescope.
  • I saw her duck with a telescope in the garden...

4

... I saw her duck.

slide-8
SLIDE 8

Forest Algorithms

NLP is HARD!

  • exponential explosion of the search space
  • non-local dependencies (context)

5

...

(S (NP (PRP I)) (VP (VBD saw) (NP (PRP$ her) (NN duck)) (PP (IN with) (NP (DT a) (NN telescope)))))

slide-9
SLIDE 9

Forest Algorithms

Ambiguities in Translation

6

zìzhù zhōngduān 自助终端 ("self-help terminal-device", i.e., a self-service kiosk)

needs context to disambiguate!

slide-10
SLIDE 10

Forest Algorithms

Evil Rubbish; Safety Export

7

needs context for fluency!

slide-11
SLIDE 11

Forest Algorithms

Key Problem

8

slide-12
SLIDE 12

Forest Algorithms

Key Problem

  • How to efficiently incorporate non-local information?

8

slide-13
SLIDE 13

Forest Algorithms

Key Problem

  • How to efficiently incorporate non-local information?
  • Solution 1: pipelined reranking / rescoring
  • postpone disambiguation by propagating k-best lists
  • examples: tagging => parsing => semantics
  • (open) need efficient algorithms for k-best search

8

slide-14
SLIDE 14

Forest Algorithms

Key Problem

  • How to efficiently incorporate non-local information?
  • Solution 1: pipelined reranking / rescoring
  • postpone disambiguation by propagating k-best lists
  • examples: tagging => parsing => semantics
  • (open) need efficient algorithms for k-best search
  • Solution 2: exact joint search on a much larger space
  • examples: head/parent annotations; often intractable

8

slide-15
SLIDE 15

Forest Algorithms

Key Problem

  • How to efficiently incorporate non-local information?
  • Solution 1: pipelined reranking / rescoring
  • postpone disambiguation by propagating k-best lists
  • examples: tagging => parsing => semantics
  • (open) need efficient algorithms for k-best search
  • Solution 2: exact joint search on a much larger space
  • examples: head/parent annotations; often intractable
  • Solution 3: approximate joint search (focus of this talk)
  • (open) integrate non-local information on the fly

8

slide-16
SLIDE 16

Forest Algorithms

Outline

  • Forest: Packing Exponential Ambiguities
  • Exact k-best Search in Forest (Solution 1)
  • Approximate Joint Search with

Non-Local Features (Solution 3)

  • Forest Reranking
  • Forest Rescoring
  • Forest-based Translation (Solutions 2+3+1)
  • Tree-based Translation
  • Forest-based Decoding

9

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

slide-17
SLIDE 17

Forest Algorithms

Packed Forests

  • a compact representation of many parses
  • by sharing common sub-derivations
  • polynomial-space encoding of exponentially large set

10

(Klein and Manning, 2001; Huang and Chiang, 2005)

0 I 1 saw 2 him 3 with 4 a 5 mirror 6

nodes hyperedges

a hypergraph
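As a concrete sketch, a packed forest can be stored as a hypergraph in which every node keeps a list of incoming hyperedges, one per alternative sub-derivation. The class and field names below are illustrative, not from the talk:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)          # identity-based hashing: two nodes are distinct objects
class Node:
    label: str                # e.g. "VP[1,6]": category plus span
    incoming: list = field(default_factory=list)  # hyperedges deriving this node

@dataclass
class Hyperedge:
    tails: tuple              # antecedent nodes (u1, ..., u|e|)
    head: Node                # consequent node v
    weight: float             # e.g. log-probability of the grammar rule

def add_edge(head, tails, weight):
    """Attach one more derivation of `head`; sharing tail nodes across
    hyperedges is what makes the forest polynomial-size."""
    e = Hyperedge(tuple(tails), head, weight)
    head.incoming.append(e)
    return e
```

Ambiguity is packed simply by calling `add_edge` more than once on the same head node.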

slide-18
SLIDE 18

Forest Algorithms

Weight Functions

  • Each hyperedge e has a weight function f_e
  • monotonic in each argument
  • e.g. in CKY, f_e(a, b) = a × b × Pr(rule)
  • optimal subproblem property in dynamic programming
  • optimal solutions include optimal sub-solutions

11

monotonicity (from the figure): given antecedents B: b and C: c, the consequent is A: f(b, c); if b′ ≤ b, then f(b′, c) ≤ f(b, c)

slide-19
SLIDE 19

Forest Algorithms

1-best Viterbi on Forest

  • 1. topological sort (assumes acyclicity)
  • 2. visit each node v in sorted order and do updates
  • for each incoming hyperedge e = ((u_1, ..., u_|e|), v, f_e)
  • use the d(u_i)'s to update d(v)
  • key observation: the d(u_i)'s are fixed to optimal at this time
  • time complexity: O(V+E) = O(E); for CKY: O(n³)

12

update rule: d(v) ⊕= f_e(d(u_1), …, d(u_|e|))
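A minimal Python sketch of this 1-best Viterbi pass, using the (max, +) semiring over log-probabilities; the `incoming` map encoding hyperedges as (tails, weight) pairs is a hypothetical representation, not the talk's:

```python
import math

def viterbi(topo_order, incoming):
    """1-best Viterbi over an acyclic hypergraph.

    topo_order: nodes sorted so that every tail precedes its head.
    incoming:   node -> list of (tails, weight) hyperedges; a node with
                no entry is a leaf/axiom with score 0.
    Scores are log-probs, so the semiring is (max, +) and the weight
    function f_e(a1, ..., an) = weight + a1 + ... + an is monotonic
    in each argument.
    """
    d = {}
    for v in topo_order:
        edges = incoming.get(v, [])
        if not edges:                    # leaf / axiom node
            d[v] = 0.0
            continue
        # every d(u) below is already final: that is exactly the
        # optimal-substructure guarantee topological order buys us
        d[v] = max(w + sum(d[u] for u in tails) for tails, w in edges)
    return d
```

Each node and hyperedge is touched once, matching the O(V+E) bound above.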

slide-20
SLIDE 20

Forest Algorithms

Outline

  • Forest: Packing Exponential Ambiguities
  • Exact k-best Search in Forest (Solution 1)
  • Approximate Joint Search with

Non-Local Features (Solution 3)

  • Forest Reranking
  • Forest Rescoring
  • Forest-based Translation (Solutions 2+3)
  • Tree-based Translation
  • Forest-based Decoding

13

TOP S NP PRP I VP VBD saw NP DT the NN boy PP IN with NP DT a NN telescope . .

slide-21
SLIDE 21

Forest Algorithms

k-best Viterbi Algorithm 0

  • straightforward k-best extension
  • a vector of k (sorted) values for each node
  • now what's the result of f_e(a, b)?
  • k × k = k² possibilities! => then choose top k
  • time complexity: O(k² E)

14


slide-22
SLIDE 22

Forest Algorithms

k-best Viterbi Algorithm 1

  • key insight: do not need to enumerate all k² combinations
  • since vectors a and b are sorted
  • and the weight function f_e is monotonic
  • (a_1, b_1) must be the best
  • either (a_2, b_1) or (a_1, b_2) is the 2nd-best
  • use a priority queue for the frontier
  • extract best
  • push two successors
  • time complexity: O(E k log k)

15


slide-26
SLIDE 26

Forest Algorithms

k-best Viterbi Algorithm 2

  • Algorithm 1 works on each hyperedge sequentially
  • O(E k log k) is still too slow for big k
  • Algorithm 2 processes all hyperedges in parallel
  • dramatic speed-up: O(E + V k log k)

19

(figure: node VP1,6 with several incoming hyperedges, each pairing a PP with a smaller VP, e.g. (PP1,3, VP3,6) and (PP1,4, VP4,6))

slide-27
SLIDE 27

Forest Algorithms

k-best Viterbi Algorithm 3

  • Algorithm 2 computes k-best for each node
  • but we are only interested in k-best of the root node
  • Algorithm 3 computes as many as really needed
  • forward-phase
  • same as 1-best Viterbi, but stores the forest

(keeping alternative hyperedges)

  • backward-phase
  • recursively asking “what’s your 2nd-best” top-down
  • asks for more when need more

20

slide-28
SLIDE 28

Forest Algorithms

Summary of Algorithms

  • Algorithms 1 => 2 => 3
  • lazier and lazier (computation on demand)
  • larger and larger locality
  • Algorithm 3 is very fast, but requires storing forest

21

              locality     time                space
Algorithm 1   hyperedge    O(E k log k)        O(k V)
Algorithm 2   node         O(E + V k log k)    O(k V)
Algorithm 3   global       O(E + D k log k)    O(E + k D)

E = # hyperedges: O(n³); V = # nodes: O(n²); D = derivation length: O(n)

slide-29
SLIDE 29

Forest Algorithms

Experiments - Efficiency

  • on state-of-the-art Collins/Bikel parser (Bikel, 2004)
  • average parsing time per sentence using Algs. 0, 1, 3

22

O( E + D k log k )

slide-30
SLIDE 30

Forest Algorithms

Reranking and Oracles

  • oracle - the candidate closest to the correct parse

among the k-best candidates

  • measures the potential of real reranking

23

(plot: oracle Parseval score as a function of k; curves for Collins 2000, which turns off DP, vs. our algorithms)

slide-31
SLIDE 31

Forest Algorithms

Outline

  • Packed Forests and Hypergraph Framework
  • Exact k-best Search in Forest (Solution 1)
  • Approximate Joint Search with

Non-Local Features (Solution 3)

  • Forest Reranking
  • Forest Rescoring
  • Application: Forest-based Translation
  • Tree-based Translation
  • Forest-based Decoding

24

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

slide-32
SLIDE 32

Forest Algorithms

Why not k-best reranking?

  • too few variations (limited scope)
  • 41% of correct parses are not in ~30-best (Collins, 2000)
  • worse for longer sentences
  • too many redundancies
  • 50-best usually encodes only 5-6 binary decisions (2⁵ < 50 < 2⁶)

25

...

slide-34
SLIDE 34

Forest Algorithms

Redundancies in n-best lists

26

(TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (NP (NP (RB Not) (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (RB Not) (NP (NP (DT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (NP (NP (RB Not) (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (NP (NP (RB Not) (RB all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VBP oppose) (NP (DT the) (NNS changes))) (. .))) (TOP (S (RB Not) (NP (NP (PDT all) (DT those)) (SBAR (WHNP (WP who)) (S (VP (VBD wrote))))) (VP (VB oppose) (NP (DT the) (NNS changes))) (. .)))

Not all those who wrote oppose the changes.

packed forest ...

slide-35
SLIDE 35

Forest Algorithms

Reranking on a Forest?

  • with only local features (Solution 2)
  • dynamic programming, exact, tractable (Taskar et al. 2004;

McDonald et al., 2005)

  • with non-local features (Solution 3)
  • on-the-fly reranking at internal nodes
  • top k derivations at each node
  • use as many non-local features

as possible at each node

  • chart parsing + discriminative reranking
  • we use perceptron for simplicity

27

slide-36
SLIDE 36

Forest Algorithms

Features

  • a feature f is a function from a tree y to a real number
  • f₁(y) = log Pr(y) is the log probability from the generative parser
  • every other feature counts the number of times a particular configuration occurs in y

28

instances of the Rule feature: f₁₀₀(y) = f_{S → NP VP .}(y) = 1; f₂₀₀(y) = f_{NP → DT NN}(y) = 2

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

  • our features are from (Charniak & Johnson, 2005) and (Collins, 2000)

slide-37
SLIDE 37

Forest Algorithms

Local vs. Non-Local Features

  • a feature is local iff it can be factored among local productions of a tree (i.e., hyperedges in a forest)
  • local features can be pre-computed on each hyperedge in the forest; non-local features cannot

29

Rule is local; ParentRule is non-local

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

slide-38
SLIDE 38

Forest Algorithms

Local vs. Non-Local: Examples

  • the CoLenPar feature captures the difference in lengths of adjacent conjuncts (Charniak and Johnson, 2005)

30

CoLenPar: 2

local!

slide-39
SLIDE 39

Forest Algorithms

Local vs. Non-Local: Examples

  • CoPar feature captures the depth to which adjacent

conjuncts are isomorphic (Charniak and Johnson, 2005)

31

CoPar: 4

non-local!

(violates DP principle)

slide-40
SLIDE 40

Forest Algorithms

Factorizing non-local features

  • going bottom-up, at each node
  • compute (partial values of) feature instances that

become computable at this level

  • postpone those uncomputable to ancestors

32

unit instance of ParentRule feature at VP node

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

non-local features factor across nodes dynamically; local features factor across hyperedges statically
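As a toy illustration of the local/non-local distinction (the feature-name strings and tuple tree encoding here are hypothetical, not the talk's): a Rule feature depends on one production, i.e. one hyperedge, while a ParentRule feature only becomes computable one level up, when the parent production is built.

```python
from collections import Counter

def rule_features(tree):
    """Count Rule and ParentRule features over a tree given as nested
    tuples, e.g. ("S", ("NP", ("PRP", "I")), ("VP", ...)).

    Each production is one hyperedge, so Rule counts can be precomputed
    per hyperedge (local).  ParentRule also needs the parent label, so
    on a forest it is non-local: its unit instance is postponed until
    the parent production is visited.
    """
    feats = Counter()

    def visit(node, parent_label=None):
        label, *children = node
        if children and isinstance(children[0], tuple):   # internal node
            rhs = " ".join(c[0] for c in children)
            feats[f"Rule: {label} -> {rhs}"] += 1
            if parent_label is not None:                  # computable one level up
                feats[f"ParentRule: {parent_label} | {label} -> {rhs}"] += 1
            for c in children:
                visit(c, label)

    visit(tree)
    return feats
```

Preterminals (e.g. `("PRP", "I")`) contribute no Rule instance in this sketch; only internal productions do.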

slide-43
SLIDE 43

Forest Algorithms

Factorizing non-local features

  • going bottom-up, at each node
  • compute (partial values of) feature instances that

become computable at this level

  • postpone those uncomputable to ancestors

33

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

unit instance of ParentRule feature at S node

non-local features factor across nodes dynamically local features factor across hyperedges statically

slide-46
SLIDE 46

Forest Algorithms

Factorizing non-local features

  • going bottom-up, at each node
  • compute (partial values of) feature instances that

become computable at this level

  • postpone those uncomputable to ancestors

34

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

unit instance of ParentRule feature at TOP node

non-local features factor across nodes dynamically local features factor across hyperedges statically

slide-48
SLIDE 48

Forest Algorithms

(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))

NGramTree (C&J 05)

  • an NGramTree captures the smallest tree fragment

that contains a bigram (two consecutive words)

  • unit instances are boundary words between subtrees

A_{i,k} → B_{i,j} C_{j,k}, where B spans w_i … w_{j−1} and C spans w_j … w_{k−1}

unit instance of node A

35

slide-53
SLIDE 53

Forest Algorithms

Approximate Decoding

  • bottom-up, keeps top k derivations at each node
  • non-monotonic grid due to non-local features

38

A_{i,k} → B_{i,j} C_{j,k}, where B spans w_i … w_{j−1} and C spans w_j … w_{k−1}

combining sorted vectors a = (1.0, 3.0, 8.0) and b = (1.0, 1.1, 3.5); each cell adds a non-local term w · f_N(·) (e.g. 0.5):

2.0 + 0.5    4.0 + 5.0     9.0 + 0.5
2.1 + 0.3    4.1 + 5.4     9.1 + 0.3
4.5 + 0.6    6.5 + 10.5    11.5 + 0.6

slide-55
SLIDE 55

Forest Algorithms

Approximate Decoding

  • bottom-up, keeps top k derivations at each node
  • non-monotonic grid due to non-local features

39

A_{i,k} → B_{i,j} C_{j,k}, where B spans w_i … w_{j−1} and C spans w_j … w_{k−1}

with the non-local terms w · f_N(·) added in, the grid of combined costs for a = (1.0, 3.0, 8.0) and b = (1.0, 1.1, 3.5) is no longer monotonic:

 2.5    9.0    9.5
 2.4    9.5    9.4
 5.1   17.0   12.1

slide-56
SLIDE 56

Forest Algorithms

Algorithm 2 => Cube Pruning

  • bottom-up, keeps top k derivations at each node
  • non-monotonic grid due to non-local features

40

A_{i,k} → B_{i,j} C_{j,k}, where B spans w_i … w_{j−1} and C spans w_j … w_{k−1}

with the non-local terms w · f_N(·) added in, the grid of combined costs for a = (1.0, 3.0, 8.0) and b = (1.0, 1.1, 3.5) is no longer monotonic:

 2.5    9.0    9.5
 2.4    9.5    9.4
 5.1   17.0   12.1

slide-59
SLIDE 59

Forest Algorithms

Algorithm 2 => Cube Pruning

43

(figure: a VP node with several incoming hyperedges, each pairing a PP with a smaller VP, e.g. (PP1,3, VP3,6) and (PP1,4, VP4,6))

  • process all hyperedges simultaneously!

significant savings of computation

there are search errors, but the trade-off is favorable.
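The non-monotonic grid above can be explored with a cube-pruning sketch (illustrative Python; the non-local cost, e.g. an n-gram LM term, is passed in as a function, and lower cost is better):

```python
import heapq

def cube_pruning(a, b, nonlocal_cost, k):
    """Cube-pruning sketch: k-best combination of two sorted cost
    vectors when each cell also pays a non-local cost that breaks
    monotonicity.

    The frontier is explored as in exact lazy k-best, but since
    nonlocal_cost(i, j) is arbitrary, the pop order is only a guess;
    we therefore collect k candidates and re-sort them at the end.
    The result may contain search errors, but the trade-off is
    usually favorable.
    """
    pops, seen = [], {(0, 0)}
    frontier = [(a[0] + b[0] + nonlocal_cost(0, 0), 0, 0)]
    while frontier and len(pops) < k:
        cost, i, j = heapq.heappop(frontier)
        pops.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):   # push the two successors
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                score = a[ni] + b[nj] + nonlocal_cost(ni, nj)
                heapq.heappush(frontier, (score, ni, nj))
    return sorted(pops)
```

On the grid from the slides (a = (1.0, 3.0, 8.0), b = (1.0, 1.1, 3.5) plus the per-cell non-local terms), the true best cell (2.4) is not popped first, but it is still recovered after re-sorting; in general it can be missed.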

slide-60
SLIDE 60

Forest Algorithms

Forest vs. k-best Oracles

  • on top of Charniak parser (modified to dump forest)
  • forests enjoy higher oracle scores than k-best lists
  • with much smaller sizes

44

96.7 97.8 97.2 98.6

slide-62
SLIDE 62

Forest Algorithms

Main Results

baseline: 1-best Charniak parser, F1 = 89.72

approach             training time   F1%     feature-extraction space / time
50-best reranking    4 × 0.3h        91.43   2.4 GB / 19 h
100-best reranking   4 × 0.7h        91.49   5.3 GB / 44 h
forest reranking     4 × 6.1h        91.69   1.2 GB / 2.9 h

  • forest reranking beats 50-best & 100-best reranking
  • can be trained on the whole treebank in ~1 day even

with a pure Python implementation!

  • most previous work only scaled to short sentences

(<=15 words) and local features

45

slide-64
SLIDE 64

Forest Algorithms

Comparison with Others

46

system                          F1%
Collins (2000)                  89.7
Charniak and Johnson (2005)     91.0
updated (2006)                  91.4
Petrov and Klein (2008)         88.3
this work                       91.7
Carreras et al. (2008)          91.1
Bod (2000)                      90.7
Petrov and Klein (2007)         90.1
McClosky et al. (2006)          92.1

(systems grouped in the slide by type: n-best reranking, dynamic programming, and semi-supervised)

best accuracy to date on the Penn Treebank, and fast training

slide-65
SLIDE 65
  • On to Machine Translation...

applying the same ideas of non-locality...

slide-67
SLIDE 67

Forest Algorithms

Translate Server Error

48

clear evidence that MT is used in real life.

slide-72
SLIDE 72

Forest Algorithms

Context in Translation

49

xiaoxin 小心 X <=> be careful not to X
xiaoxin gou 小心 狗 <=> be aware of dog

fluency problem (n-gram): Algorithm 2 => cube pruning
syntax problem (SCFG): 小心 VP <=> be careful not to VP; 小心 NP <=> be careful of NP

slide-75
SLIDE 75

Forest Algorithms

How do people translate?

  • 1. understand the source language sentence
  • 2. generate the target language translation

50

布什    与         沙龙      举行     了       会谈
Bùshí   yǔ         Shālóng   jǔxíng   le       huìtán
Bush    and/with   Sharon    hold     [past]   meeting

“Bush held a meeting with Sharon”

slide-79
SLIDE 79

Forest Algorithms

How do compilers translate?

  • 1. parse high-level language program into a syntax tree
  • 2. generate intermediate or machine code accordingly

51

x3 = y + 3;

LD   R1, id2
ADDF R1, R1, #3.0   // add float
RTOI R2, R1         // real to int
ST   id1, R2

syntax-directed translation (~1960)

slide-80
SLIDE 80

Forest Algorithms

Syntax-Directed Machine Translation

  • get 1-best parse tree; then convert to English

52

Bush hold

and/ with

meeting Sharon [past.] “Bush held a meeting with Sharon”

slide-81
SLIDE 81

Forest Algorithms

Syntax-Directed Machine Translation

53

  • recursive rewrite by pattern-matching

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-84
SLIDE 84

Forest Algorithms

Syntax-Directed Machine Translation

53

  • recursive rewrite by pattern-matching

with

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-85
SLIDE 85

Forest Algorithms

Syntax-Directed Machine Translation

  • recursively solve unfinished subproblems

54

with

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-87
SLIDE 87

Forest Algorithms

Syntax-Directed Machine Translation

  • recursively solve unfinished subproblems

54

with Bush

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-88
SLIDE 88

Forest Algorithms

Syntax-Directed Machine Translation

  • recursively solve unfinished subproblems

54

held with Bush

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-89
SLIDE 89

Forest Algorithms

Syntax-Directed Machine Translation

  • continue pattern-matching

55

Bush held with

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-90
SLIDE 90

Forest Algorithms

Syntax-Directed Machine Translation

  • continue pattern-matching

55

Bush held with a meeting Sharon

(Galley et al. 2004; Liu et al., 2006; Huang, Knight, Joshi 2006)

slide-92
SLIDE 92

Forest Algorithms

Syntax-Directed Machine Translation

  • continue pattern-matching

56

this method is simple, fast, and expressive. but...
crucial difference between PL and NL: ambiguity!
using the 1-best parse causes error propagation!
idea: use k-best parses? use a parse forest!

Bush held with a meeting Sharon

slide-93
SLIDE 93

Forest Algorithms

Forest-based Translation

57

“and” / “with”

slide-94
SLIDE 94

Forest Algorithms

Forest-based Translation

58

pattern-matching on forest

“and” / “with”

slide-95
SLIDE 95

Forest Algorithms

“and”

Forest-based Translation

58

pattern-matching on forest

“and” / “with”

slide-100
SLIDE 100

Forest Algorithms

“and”

Forest-based Translation

58

pattern-matching on forest

“and” / “with”

directed by underspecified syntax

slide-101
SLIDE 101

Forest Algorithms

Translation Forest

59

slide-104
SLIDE 104

Forest Algorithms

Translation Forest

59

“held a meeting”

“Sharon” “Bush”

“Bush held a meeting with Sharon”

slide-106
SLIDE 106

Forest Algorithms

The Whole Pipeline

60

pipeline: input sentence → (parser) → parse forest → (pattern-matching w/ translation rules, exact) → translation forest → (Algorithm 2 => cube pruning, approx.) → translation+LM forest → (Algorithm 3, exact) → 1-best / k-best translations

all intermediate results are packed forests

(Huang and Chiang, 2005; 2007; Chiang, 2007)

slide-107
SLIDE 107

Forest Algorithms

k-best trees vs. forest-based

61

1.7 Bleu improvement over 1-best, 0.8 over 30-best, and even faster!

slide-108
SLIDE 108

Forest Algorithms

forest as virtual ∞-best list

  • how often is the ith-best tree picked by the decoder?

62

32% beyond 100-best; 20% beyond 1000-best

suggested by Mark Johnson

slide-109
SLIDE 109

Forest Algorithms

Larger Decoding Experiments

  • 2.2M sentence pairs (57M Chinese and 62M English words)
  • larger trigram models (1/3 of Xinhua Gigaword)
  • also use bilingual phrases (BP) as flat translation rules
  • phrases that are consistent with syntactic constituents
  • forest enables larger improvement with BP

63

               T2S      T2S+BP
1-best tree    0.2666   0.2939
30-best trees  0.2755   0.3084
forest         0.2839   0.3149
improvement    +1.7     +2.1   (Bleu points)

slide-110
SLIDE 110

Forest Algorithms

Conclusions: Dynamic Programming

  • A general framework of DP on monotonic hypergraphs
  • Exact k-best DP algorithms (monotonic)
  • Approximate DP with non-local features (non-monotonic)
  • Forest Reranking for discriminative parsing
  • Forest Rescoring for MT decoding
  • Forest-based Translation
  • translates a parse forest of millions of trees
  • even faster than translating top-30 trees (and better)
  • Future Directions: even faster search with richer info...

64

slide-111
SLIDE 111

Forest is your friend. Save the forest. Thank you!

slide-112
SLIDE 112

Forest Algorithms

Global Feature - RightBranch

66

  • length of rightmost (non-punctuation) path
  • English has a right-branching tendency

(Charniak and Johnson, 2005)

cannot be factored anywhere; have to wait till the root (whether a token is punctuation is itself ambiguous: ’ can be a possessive marker or a right quote)