
Algorithms for NLP CS 11-711 Fall 2020 Lecture 14: Graph-based dependency parsing



  1. Algorithms for NLP CS 11-711 · Fall 2020 Lecture 14: Graph-based dependency parsing Emma Strubell

  2. Announcements
     ■ No recitation on Friday (Tartan Community Day).

  3. Dependency parsing

  9. Dependency parsing
     ■ Transition-based (shift-reduce) parsing:
       ■ Greedy choice of local transitions guided by a good classifier.
       ■ Examples: MaltParser [Nivre et al. 2008], Stack LSTM [Dyer et al. 2015]
       [Figure: transition-based parser with an input buffer (w1, w2, ..., wn), a stack (s1, s2, ..., sn), an oracle, and the output dependency relations]
     ■ Graph-based dependency parsing:
       ■ Given scores for every pair of words, find the (globally) highest-scoring set of edges.
       ■ Examples: MSTParser [McDonald et al. 2005], TurboParser [Martins et al. 2009], Deep Biaffine [Dozat et al. 2017]
       [Figure: directed graph over root, "Book", "that", "flight" with a score on each edge]
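The greedy transition-based loop described on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the arc-standard transition system (the slide does not say which system is meant). The names `greedy_parse`, `scripted`, and the action strings are hypothetical, and `choose_action` stands in for the trained classifier.

```python
from typing import Callable, List, Tuple

Arc = Tuple[int, int]  # (head index, dependent index); index 0 is the artificial root


def greedy_parse(n_words: int,
                 choose_action: Callable[[List[int], List[int]], str]) -> List[Arc]:
    """Parse words 1..n_words by greedily applying one transition at a time."""
    stack: List[int] = [0]                      # starts with the root only
    buffer: List[int] = list(range(1, n_words + 1))
    arcs: List[Arc] = []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)   # local decision, never revised
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) > 2:
            dep = stack.pop(-2)                 # second item becomes a dependent of the top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC" and len(stack) > 1:
            dep = stack.pop()                   # top becomes a dependent of the second item
            arcs.append((stack[-1], dep))
        else:
            raise ValueError(f"invalid action {action!r} for stack={stack}, buffer={buffer}")
    return arcs


def scripted(actions: List[str]) -> Callable[[List[int], List[int]], str]:
    """Hypothetical stand-in for a classifier: replays a fixed action sequence."""
    it = iter(actions)
    return lambda stack, buffer: next(it)


# "Book that flight": 1 = Book, 2 = that, 3 = flight
arcs = greedy_parse(3, scripted(
    ["SHIFT", "SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]))
print(arcs)  # [(3, 2), (1, 3), (0, 1)]
```

Each action is chosen from the current stack and buffer only, which is what makes the procedure greedy: an early wrong transition cannot be undone later.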

  15. Graph-based dependency parsing
      ■ Edge-factored (or arc-factored) approaches:
        ■ Score of a tree decomposes as a sum of edge scores:
          $$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \,\in\, y} \psi(i \xrightarrow{r} j, w, \theta)$$
      ■ Start with a fully connected directed graph.
      ■ How to infer the highest-scoring tree?
      ■ Find a maximum directed spanning tree: Chu and Liu (1965) and Edmonds (1967) algorithm
      [Figure: fully connected directed graph over root, "Book", "that", "flight" with a score on each edge]
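The edge-factored decomposition can be illustrated with a small Python sketch. The score values below are illustrative: which score belongs to which edge is an assumption, not something recovered from the slide's figure. The names `SCORES`, `tree_score`, and `best_incoming` are hypothetical.

```python
from typing import Dict, List

NEG = float("-inf")  # marks edges that are not allowed (into root, self-loops)

# SCORES[h][d] is the (assumed) score of the edge h -> d.
# Index 0 = root, 1 = Book, 2 = that, 3 = flight.
SCORES: List[List[float]] = [
    [NEG, 12, 4, 4],
    [NEG, NEG, 5, 7],
    [NEG, 6, NEG, 8],
    [NEG, 5, 7, NEG],
]


def tree_score(heads: Dict[int, int], scores: List[List[float]]) -> float:
    """Score of a tree given as {dependent: head}: the sum of its edge scores."""
    return sum(scores[h][d] for d, h in heads.items())


def best_incoming(scores: List[List[float]]) -> Dict[int, int]:
    """Pick the highest-scoring head for every non-root word independently."""
    n = len(scores)
    return {d: max(range(n), key=lambda h: scores[h][d]) for d in range(1, n)}


tree = {1: 0, 3: 1, 2: 3}          # root -> Book -> flight -> that
print(tree_score(tree, SCORES))    # 26
print(best_incoming(SCORES))       # {1: 0, 2: 3, 3: 2}
```

With these numbers, choosing the best head for each word independently makes "that" and "flight" head each other. That edge set sums to 27 but is not a tree, which is why a maximum spanning tree algorithm is needed rather than a per-word argmax.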

  18. Chu-Liu-Edmonds algorithm

      function MAXSPANNINGTREE(G=(V,E), root, score) returns spanning tree
        F ← []
        T’ ← []
        score’ ← []
        for each v ∈ V do
          // select best incoming edge for each node
          bestInEdge ← argmax_{e=(u,v) ∈ E} score[e]
          F ← F ∪ bestInEdge
          for each e=(u,v) ∈ E do
            // subtract its score from all incoming edges
            score’[e] ← score[e] − score[bestInEdge]
        if T=(V,F) is a spanning tree then
          return it
        else
          C ← a cycle in F
          G’ ← CONTRACT(G, C)
          T’ ← MAXSPANNINGTREE(G’, root, score’)
          T ← EXPAND(T’, C)
          return T

      function CONTRACT(G, C) returns contracted graph
      function EXPAND(T, C) returns expanded graph
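The pseudocode leaves CONTRACT and EXPAND unspecified. The Python sketch below shows one way to fill them in and is not the lecture's reference implementation. It assumes integer node ids, edge scores stored in a dict keyed by `(head, dependent)`, and that every non-root node is reachable from the root. The names `find_cycle`, `max_spanning_tree`, `enter`, and `leave` are hypothetical, and contraction and expansion are inlined rather than written as separate functions.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (head, dependent)


def find_cycle(heads: Dict[int, int], root: int) -> Optional[List[int]]:
    """Return the nodes of one cycle in a {dependent: head} map, or None."""
    for start in heads:
        path: List[int] = []
        node = start
        while node != root and node not in path:
            path.append(node)
            node = heads[node]
        if node != root:  # the walk re-entered its own path: a cycle
            return path[path.index(node):]
    return None


def max_spanning_tree(scores: Dict[Edge, float], root: int = 0) -> Dict[int, int]:
    """Return {dependent: head} for the maximum spanning tree rooted at `root`."""
    # Select the best incoming edge for each non-root node.
    heads: Dict[int, int] = {}
    for (h, d), s in scores.items():
        if d != root and h != d and (d not in heads or s > scores[(heads[d], d)]):
            heads[d] = h
    cycle = find_cycle(heads, root)
    if cycle is None:
        return heads  # the greedy choice is already a spanning tree

    # Subtract each node's best incoming score from all of its incoming edges.
    reduced = {(h, d): s - scores[(heads[d], d)]
               for (h, d), s in scores.items() if d != root and h != d}

    # Contract the cycle into a fresh node c, remembering which real
    # endpoint each edge into or out of c stands for.
    c = max(max(edge) for edge in scores) + 1
    in_cycle = set(cycle)
    contracted: Dict[Edge, float] = {}
    enter: Dict[int, int] = {}  # outside head -> cycle node it attaches to
    leave: Dict[int, int] = {}  # outside dependent -> cycle node that heads it
    for (h, d), s in reduced.items():
        if h in in_cycle and d in in_cycle:
            continue  # edges inside the cycle disappear
        if d in in_cycle:
            if (h, c) not in contracted or s > contracted[(h, c)]:
                contracted[(h, c)] = s
                enter[h] = d
        elif h in in_cycle:
            if (c, d) not in contracted or s > contracted[(c, d)]:
                contracted[(c, d)] = s
                leave[d] = h
        else:
            contracted[(h, d)] = s

    # Recurse on the contracted graph, then expand c back into the cycle.
    sub = max_spanning_tree(contracted, root)
    tree: Dict[int, int] = {}
    for d, h in sub.items():
        if d == c:
            tree[enter[h]] = h   # the edge entering the cycle
        elif h == c:
            tree[d] = leave[d]   # an edge leaving the cycle
        else:
            tree[d] = h
    for v in cycle:              # keep every cycle edge except the displaced one
        tree.setdefault(v, heads[v])
    return tree


# Illustrative scores (assumed values); 0 = root, 1 = Book, 2 = that, 3 = flight.
EXAMPLE = {(0, 1): 12, (0, 2): 4, (0, 3): 4, (1, 2): 5, (1, 3): 7,
           (2, 1): 6, (2, 3): 8, (3, 1): 5, (3, 2): 7}
print(sorted(max_spanning_tree(EXAMPLE).items()))  # [(1, 0), (2, 3), (3, 1)]
```

With these assumed scores, the greedy step picks the cycle between "that" and "flight". After contraction, the best edge into the cycle comes from "Book" and enters at "flight", so expansion keeps flight → that and drops that → flight.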
