Algorithms for NLP
CS 11-711 · Fall 2020
Lecture 14: Graph-based dependency parsing
Emma Strubell

Announcements
■ No recitation on Friday (Tartan Community Day).

Dependency parsing
■ Transition-based (shift-reduce) parsing:
    ■ Greedy choice of local transitions guided by a good classifier.
    ■ Examples: MaltParser [Nivre et al. 2008], Stack LSTM [Dyer et al. 2015]
    [Figure: transition-based parser with an input buffer (w1, w2, ..., wn), a stack (s1, s2, ..., sn), an oracle, and the output dependency relations]
■ Graph-based dependency parsing:
    ■ Given scores for every pair of words, find the (globally) highest scoring set of edges.
    ■ Examples: MSTParser [McDonald et al. 2005], TurboParser [Martins et al. 2009], Deep Biaffine [Dozat et al. 2017]
    [Figure: fully connected directed graph over "root Book that flight" with a score on every edge]
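The greedy transition-based loop described above can be sketched as follows. This is a minimal illustration that assumes an arc-standard transition system (the slide only says "shift-reduce"). The `oracle` argument stands in for the trained classifier; here it is a hypothetical scripted stand-in that replays a fixed action sequence.

```python
def parse(words, oracle):
    """Greedy arc-standard parsing (assumed transition system).

    `oracle(stack, buffer)` is a hypothetical classifier that returns
    "SHIFT", "LEFT-ARC", or "RIGHT-ARC" for the current configuration.
    Returns a list of (head, dependent) index pairs; index 0 is the root.
    """
    stack = [0]                              # artificial root
    buffer = list(range(1, len(words) + 1))  # word indices, left to right
    arcs = []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":           # top of stack heads the item below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":          # item below heads the top of stack
            dep = stack.pop()
            arcs.append((stack[-1], dep))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


def scripted_oracle(actions):
    """Hypothetical oracle that replays a fixed list of actions."""
    it = iter(actions)
    return lambda stack, buffer: next(it)


words = ["Book", "that", "flight"]
actions = ["SHIFT", "SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]
arcs = parse(words, scripted_oracle(actions))
print(arcs)  # [(3, 2), (1, 3), (0, 1)]
```

Each decision is local and never revisited, which is the contrast with the graph-based approach: that one scores all candidate edges first and then searches for the best tree globally.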

Graph-based dependency parsing
■ Edge-factored (or arc-factored) approaches:
    ■ Score of a tree decomposes as sum of edge scores:

$$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \in y} \psi(i \xrightarrow{r} j, w, \theta)$$

■ Start with a fully-connected directed graph
■ How to infer the highest scoring tree?
■ Find a maximum directed spanning tree: Chu and Liu (1965) and Edmonds (1967) algorithm
[Figure: fully connected directed graph over "root Book that flight" with a score on every edge]
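As a small illustration of the edge-factored decomposition, the sketch below scores a candidate tree by summing the scores of its edges. It assumes unlabeled edges (the relation r is dropped). The score table is illustrative only: the assignment of scores to edges in the slide's figure is not recoverable from the text, so these values are an assumption.

```python
# Node indices: 0 = root, 1 = Book, 2 = that, 3 = flight.
# Illustrative (assumed) scores for every directed edge (head, dependent).
EDGE_SCORE = {
    (0, 1): 12, (0, 2): 4, (0, 3): 4,
    (1, 2): 5,  (1, 3): 7,
    (2, 1): 6,  (2, 3): 8,
    (3, 1): 5,  (3, 2): 7,
}


def tree_score(heads, edge_score):
    """Sum of edge scores for a tree given as {dependent: head}."""
    return sum(edge_score[(head, dep)] for dep, head in heads.items())


# root -> Book -> flight -> that
tree_a = {1: 0, 3: 1, 2: 3}
# root -> Book -> that -> flight
tree_b = {1: 0, 2: 1, 3: 2}

print(tree_score(tree_a, EDGE_SCORE))  # 26
print(tree_score(tree_b, EDGE_SCORE))  # 25
```

Because the score is a plain sum over edges, comparing two trees only requires looking up their edges; the hard part is searching over all possible trees, which is what the maximum spanning tree algorithm addresses.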

Chu-Liu-Edmonds algorithm

function MAXSPANNINGTREE(G=(V,E), root, score) returns spanning tree
    F ← []
    T' ← []
    score' ← []
    for each v ∈ V do
        // select best incoming edge for each node
        bestInEdge ← argmax_{e=(u,v) ∈ E} score[e]
        F ← F ∪ bestInEdge
        for each e=(u,v) ∈ E do
            // subtract its score from all incoming edges
            score'[e] ← score[e] − score[bestInEdge]
    if T=(V,F) is a spanning tree then
        return it
    else
        C ← a cycle in F
        G' ← CONTRACT(G, C)
        T' ← MAXSPANNINGTREE(G', root, score')
        T ← EXPAND(T', C)
        return T

function CONTRACT(G, C) returns contracted graph
function EXPAND(T, C) returns expanded graph
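The pseudocode above can be turned into a short recursive Python sketch. This is one possible reading, not a reference implementation. It assumes integer node ids, a graph given as a dictionary `{(head, dependent): score}`, and at least one edge into every non-root node. The helper names (`find_cycle`, `origin`) and the example scores are hypothetical.

```python
def find_cycle(heads):
    """Return the nodes of a cycle in {dependent: head}, or None."""
    for start in heads:
        path, seen, v = [], set(), start
        while v in heads and v not in seen:
            seen.add(v)
            path.append(v)
            v = heads[v]
        if v in seen:                       # walked back onto the current path
            return path[path.index(v):]
    return None


def max_spanning_tree(scores, root=0):
    """Chu-Liu-Edmonds sketch. Returns {dependent: head}."""
    # Select the best incoming edge for each non-root node.
    best = {}
    for (h, d), s in scores.items():
        if d == root or h == d:
            continue
        if d not in best or s > scores[(best[d], d)]:
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best                         # already a spanning tree

    # Contract the cycle into a fresh node c, using the adjusted scores
    # score'[e] = score[e] - score[best incoming edge of e's dependent].
    in_cycle = set(cycle)
    c = max(max(h, d) for h, d in scores) + 1
    new_scores, origin = {}, {}
    for (h, d), s in scores.items():
        if d == root or h == d or (h in in_cycle and d in in_cycle):
            continue
        adjusted = s - scores[(best[d], d)]
        key = (c if h in in_cycle else h, c if d in in_cycle else d)
        if key not in new_scores or adjusted > new_scores[key]:
            new_scores[key] = adjusted
            origin[key] = (h, d)            # remember the original edge

    sub_tree = max_spanning_tree(new_scores, root)

    # Expand: keep the cycle edges, then let the chosen edges override them.
    heads = {d: best[d] for d in cycle}
    for d, h in sub_tree.items():
        orig_head, orig_dep = origin[(h, d)]
        heads[orig_dep] = orig_head         # the edge entering the cycle breaks it
    return heads


# Illustrative (assumed) scores: 0 = root, 1 = Book, 2 = that, 3 = flight.
scores = {
    (0, 1): 12, (0, 2): 4, (0, 3): 4,
    (1, 2): 5,  (1, 3): 7,
    (2, 1): 6,  (2, 3): 8,
    (3, 1): 5,  (3, 2): 7,
}
tree = max_spanning_tree(scores)
print(sorted(tree.items()))  # [(1, 0), (2, 3), (3, 1)]
```

With these scores, the greedy best-incoming-edge step picks that → flight and flight → that, which form a cycle. Contracting the cycle and recursing selects Book → flight as the edge that enters it, which removes that → flight and yields root → Book → flight → that.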
