
Training Deterministic Parsers with Non-Deterministic Oracles

by Yoav Goldberg and Joakim Nivre, 2013. Seminar talk by Pius Meinert, July 13, 2018.


  1. “Training Deterministic Parsers with Non-Deterministic Oracles” by Yoav Goldberg and Joakim Nivre, 2013. Seminar talk by Pius Meinert, July 13, 2018.

  2. [Figure: labeled dependency tree for "He wrote her a letter" with arcs PRD(root, wrote), SBJ(wrote, He), IOBJ(wrote, her), DOBJ(wrote, letter), DET(letter, a).]


  4. Transition System
     Definition (Transition System): A transition system for dependency parsing is a quadruple $S = (C, T, c_s, C_t)$, where
       1. $C$ is a set of configurations,
       2. $T$ is a set of transitions, each of which is a (partial) function $t : C \to C$,
       3. $c_s$ is an initialization function, mapping a sentence $w = w_1 w_2 \ldots w_n$ to a configuration $c \in C$,
       4. $C_t \subseteq C$ is a set of terminal configurations.

  5. The initialization function $c_s$ is applied to the sentence $w = \text{He}_1\ \text{wrote}_2\ \text{her}_3\ \text{a}_4\ \text{letter}_5$. [Figure: gold dependency tree of $w$ with arc labels PRD, SBJ, IOBJ, DOBJ, DET.]

  6. Initial configuration: $c_s(w) = ([\text{root}], [\text{He}_1, \text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5], \{\})$

  7. $\text{Shift}$: $([\text{root}], [\text{He}_1, \text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5]) \vdash ([\text{root}, \text{He}_1], [\text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5])$

  8. $\text{Left}_{\text{SBJ}}$: $([\text{root}, \text{He}_1], [\text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5]) \vdash ([\text{root}], [\text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5])$

  9. $\text{Right}_{\text{PRD}}$: $([\text{root}], [\text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5]) \vdash ([\text{root}, \text{wrote}_2], [\text{her}_3, \text{a}_4, \text{letter}_5])$

  10. From $([\text{root}, \text{wrote}_2], [\text{her}_3, \text{a}_4, \text{letter}_5])$, the transitions $\text{Right}_{\text{IOBJ}}, \text{Shift}, \text{Left}_{\text{DET}}, \text{Reduce}, \text{Right}_{\text{DOBJ}}$ lead to $([\text{root}, \text{wrote}_2, \text{letter}_5], [\,]) \in C_t$.
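The derivation on slides 6 to 10 can be replayed with a small arc-eager simulator. The following is a minimal sketch, not the presenter's code: the function names, the tuple representation of configurations, and the use of word strings as node identifiers (which assumes no repeated words) are all assumptions made for illustration. Transition preconditions are not checked.

```python
from typing import List, Set, Tuple

Arc = Tuple[str, str, str]                      # (head, label, dependent)
Config = Tuple[List[str], List[str], Set[Arc]]  # (stack, buffer, arcs)

def initial(words: List[str]) -> Config:
    """c_s(w): root on the stack, all words in the buffer, no arcs."""
    return (["root"], list(words), set())

def shift(c: Config) -> Config:
    # Move the buffer front onto the stack.
    stack, buf, arcs = c
    return (stack + [buf[0]], buf[1:], arcs)

def left(c: Config, label: str) -> Config:
    # Buffer front becomes head of the stack top; the stack top is popped.
    stack, buf, arcs = c
    return (stack[:-1], buf, arcs | {(buf[0], label, stack[-1])})

def right(c: Config, label: str) -> Config:
    # Stack top becomes head of the buffer front, which moves onto the stack.
    stack, buf, arcs = c
    return (stack + [buf[0]], buf[1:], arcs | {(stack[-1], label, buf[0])})

def reduce_(c: Config) -> Config:
    # Pop the stack top (it is assumed to have a head already).
    stack, buf, arcs = c
    return (stack[:-1], buf, arcs)

def is_terminal(c: Config) -> bool:
    return not c[1]  # terminal when the buffer is empty

c = initial(["He", "wrote", "her", "a", "letter"])
steps = [
    shift,
    lambda cfg: left(cfg, "SBJ"),
    lambda cfg: right(cfg, "PRD"),
    lambda cfg: right(cfg, "IOBJ"),
    shift,
    lambda cfg: left(cfg, "DET"),
    reduce_,
    lambda cfg: right(cfg, "DOBJ"),
]
for t in steps:
    c = t(c)
    print(c[0], c[1])  # stack and buffer after each transition

final_stack, final_buffer, final_arcs = c
```

Each printed line corresponds to one configuration in the derivation; the last one has an empty buffer and is therefore terminal.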

  11. Oracle for the arc-eager system, given a gold tree $T$:

         if c = (σ|i, j|β, A) and (j, i) ∈ T then
             t ← Left
         else if c = (σ|i, j|β, A) and (i, j) ∈ T then
             t ← Right
         else if c = (σ|i, j|β, A) and ∃k [k < i ∧ [(k, j) ∈ T ∨ (j, k) ∈ T]] then
             t ← Reduce
         else
             t ← Shift
         return t
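The oracle rule on this slide translates almost line by line into code. The sketch below assumes unlabeled arcs stored as `(head, dependent)` index pairs with the root at index 0; the helper `apply` and all variable names are hypothetical and only serve to run the oracle on the example sentence.

```python
from typing import List, Set, Tuple

def static_oracle(stack: List[int], buf: List[int],
                  gold: Set[Tuple[int, int]]) -> str:
    """Return the one transition the rule prescribes for this configuration."""
    if stack and buf:
        i, j = stack[-1], buf[0]
        if (j, i) in gold:
            return "Left"
        if (i, j) in gold:
            return "Right"
        # Some node k < i is still attached to j in the gold tree.
        if any((k, j) in gold or (j, k) in gold for k in range(i)):
            return "Reduce"
    return "Shift"

def apply(t: str, stack: List[int], buf: List[int],
          arcs: Set[Tuple[int, int]]) -> None:
    """Apply an arc-eager transition in place (preconditions not checked)."""
    if t == "Shift":
        stack.append(buf.pop(0))
    elif t == "Left":
        arcs.add((buf[0], stack.pop()))
    elif t == "Right":
        arcs.add((stack[-1], buf[0]))
        stack.append(buf.pop(0))
    else:  # Reduce
        stack.pop()

# 0 = root, 1 = He, 2 = wrote, 3 = her, 4 = a, 5 = letter
gold = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5)}
stack, buf, arcs = [0], [1, 2, 3, 4, 5], set()
sequence = []
while buf:
    t = static_oracle(stack, buf, gold)
    sequence.append(t)
    apply(t, stack, buf, arcs)
print(sequence)
```

Because the rule always returns exactly one transition, it yields a single canonical sequence per tree, even when other sequences would derive the same tree.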

  12. Greedy Classifier-based Parsing

         c ← c_s(w)
         while c ∉ C_t do
             t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
             c ← t_p(c)
         return A_c
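The greedy loop can be sketched for the unlabeled arc-eager system as follows. The feature function (stack-top word, buffer-front word, transition) and the hand-set weights are purely illustrative assumptions, not something the slides specify; a real parser would learn the weights and use a much richer feature set.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, str, str]  # (stack-top word, buffer-front word, transition)

def legal(stack: List[int], heads: Dict[int, int]) -> List[str]:
    """Arc-eager transitions allowed while the buffer is non-empty."""
    ts = ["Shift", "Right"]
    s0 = stack[-1]
    if s0 != 0 and s0 not in heads:
        ts.append("Left")    # stack top may still receive a head
    if s0 in heads:
        ts.append("Reduce")  # stack top already has a head
    return ts

def greedy_parse(words: List[str], weights: Dict[Feature, float]) -> Dict[int, int]:
    """Return the arc set A_c as a dependent -> head map."""
    tokens = ["root"] + words
    stack, buf, heads = [0], list(range(1, len(tokens))), {}
    while buf:  # terminal configurations have an empty buffer
        s0, b0 = stack[-1], buf[0]
        # Sparse dot product w . phi(c, t) with a single active feature.
        t_p = max(legal(stack, heads),
                  key=lambda t: weights.get((tokens[s0], tokens[b0], t), 0.0))
        if t_p == "Shift":
            stack.append(buf.pop(0))
        elif t_p == "Left":
            heads[stack.pop()] = b0
        elif t_p == "Right":
            heads[b0] = s0
            stack.append(buf.pop(0))
        else:  # Reduce
            stack.pop()
    return heads

# Hand-set (hypothetical) weights that favor one correct derivation.
weights = {f: 1.0 for f in [
    ("root", "He", "Shift"), ("He", "wrote", "Left"), ("root", "wrote", "Right"),
    ("wrote", "her", "Right"), ("her", "a", "Shift"), ("a", "letter", "Left"),
    ("her", "letter", "Reduce"), ("wrote", "letter", "Right"),
]}
heads = greedy_parse(["He", "wrote", "her", "a", "letter"], weights)
print(heads)
```

The parser never revisits a decision, which is why a single wrong transition can affect everything that follows.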

  13. Training algorithm:

         for (w, T) ∈ d do
             c ← c_s(w)
             while c ∉ C_t do
                 t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
                 Correct(c) ← {t | o(t; c, T) = true}
                 t_o ← argmax_{t ∈ Correct(c)} w · φ(c, t)
                 if t_p ∉ Correct(c) then
                     Update(w, φ(c, t_o), φ(c, t_p))
                     c ← t_o(c)
                 else
                     c ← t_p(c)
         return w

  14. Non-deterministic oracle instead of static oracle. [Figure: gold dependency tree for "He wrote her a letter".] One transition sequence that derives the tree:
         $\text{SH}, \text{LA}_{\text{SBJ}}, \text{RA}_{\text{PRD}}, \text{RA}_{\text{IOBJ}}, \text{SH}, \text{LA}_{\text{DET}}, \text{RE}, \text{RA}_{\text{DOBJ}}$

  15. Non-deterministic oracle instead of static oracle. Two transition sequences derive the same tree:
         $\text{SH}, \text{LA}_{\text{SBJ}}, \text{RA}_{\text{PRD}}, \text{RA}_{\text{IOBJ}}, \text{SH}, \text{LA}_{\text{DET}}, \text{RE}, \text{RA}_{\text{DOBJ}}$
         $\text{SH}, \text{LA}_{\text{SBJ}}, \text{RA}_{\text{PRD}}, \text{RA}_{\text{IOBJ}}, \text{RE}, \text{SH}, \text{LA}_{\text{DET}}, \text{RA}_{\text{DOBJ}}$
       → Spurious ambiguity requires a non-deterministic oracle.
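The spurious ambiguity can be checked mechanically: both sequences should produce the same arc set. This is a small sketch under the assumption that transitions are written as strings such as `"LA_SBJ"`; the function `run` is a hypothetical helper and does not validate preconditions.

```python
from typing import List, Set, Tuple

def run(words: List[str], seq: List[str]) -> Set[Tuple[str, str, str]]:
    """Apply arc-eager transitions and return the (head, label, dependent) arcs."""
    stack, buf, arcs = ["root"], list(words), set()
    for t in seq:
        name, _, label = t.partition("_")
        if name == "SH":
            stack.append(buf.pop(0))
        elif name == "LA":
            arcs.add((buf[0], label, stack.pop()))
        elif name == "RA":
            arcs.add((stack[-1], label, buf[0]))
            stack.append(buf.pop(0))
        elif name == "RE":
            stack.pop()
    return arcs

words = ["He", "wrote", "her", "a", "letter"]
seq_a = ["SH", "LA_SBJ", "RA_PRD", "RA_IOBJ", "SH", "LA_DET", "RE", "RA_DOBJ"]
seq_b = ["SH", "LA_SBJ", "RA_PRD", "RA_IOBJ", "RE", "SH", "LA_DET", "RA_DOBJ"]
arcs_a, arcs_b = run(words, seq_a), run(words, seq_b)
print(arcs_a == arcs_b)
```

The two sequences differ only in whether "her" is reduced before or after "a" is shifted, which is exactly the choice a static oracle hides from the learner.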

  16. ... with Non-Deterministic and Complete Oracles. [Figure: dependency tree for "He wrote her a letter" with arc labels PRD, SBJ, DET, DOBJ.]
         From $([\text{root}], [\text{He}_1, \text{wrote}_2, \text{her}_3, \text{a}_4, \text{letter}_5])$, the transitions $\text{SH}, \text{LA}_{\text{SBJ}}, \text{RA}_{\text{PRD}}, \text{SH}$ lead to $([\text{root}, \text{wrote}_2, \text{her}_3], [\text{a}_4, \text{letter}_5])$.

  17. ... with Non-Deterministic and Complete Oracles.
         From $([\text{root}, \text{wrote}_2, \text{her}_3], [\text{a}_4, \text{letter}_5])$, the transitions $\text{SH}, \text{LA}_{\text{DET}}, \text{SH}$ lead to $([\text{root}, \text{wrote}_2, \text{her}_3, \text{letter}_5], [\,]) \in C_t$.
       → Error propagation can be mitigated by a complete oracle.
       → Dynamic oracle: non-deterministic + complete.

  18. Training (Standard)

         for (w, T) ∈ d do
             c ← c_s(w)
             while c ∉ C_t do
                 t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
                 Correct(c) ← {t | o(t; c, T) = true}
                 t_o ← argmax_{t ∈ Correct(c)} w · φ(c, t)
                 if t_p ∉ Correct(c) then
                     Update(w, φ(c, t_o), φ(c, t_p))
                     c ← t_o(c)
                 else
                     c ← t_p(c)
         return w
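The slide leaves `Update` abstract. One natural instantiation, assumed here, is a perceptron-style step that adds the features of the correct transition and subtracts those of the predicted one. The sparse dictionary representation and the feature names below are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

Feature = Tuple[str, ...]

def update(w: DefaultDict[Feature, float],
           phi_o: Dict[Feature, float],
           phi_p: Dict[Feature, float]) -> None:
    """Perceptron-style step (an assumption): w <- w + phi(c, t_o) - phi(c, t_p)."""
    for f, v in phi_o.items():
        w[f] += v
    for f, v in phi_p.items():
        w[f] -= v

w: DefaultDict[Feature, float] = defaultdict(float)
# Hypothetical features of a configuration with "wrote" on the stack and "her" next.
phi_o = {("s0=wrote", "b0=her", "Right"): 1.0}   # correct transition
phi_p = {("s0=wrote", "b0=her", "Shift"): 1.0}   # wrongly predicted transition
update(w, phi_o, phi_p)
print(dict(w))
```

After the step, the correct transition scores higher than the mistaken one for this feature, so the same configuration would now be resolved correctly.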

  19. Training with Exploration

         for (w, T) ∈ d do
             c ← c_s(w)
             while c ∉ C_t do
                 t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
                 Optimal(c) ← {t | o(t; c, T) = true}
                 t_o ← argmax_{t ∈ Optimal(c)} w · φ(c, t)
                 if t_p ∉ Optimal(c) then
                     Update(w, φ(c, t_o), φ(c, t_p))
                     c ← Explore(c, t_o, t_p)
                 else
                     c ← t_p(c)
         return w
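The slide does not define `Explore`. A plausible policy, assumed here purely for illustration, follows the oracle transition during the first `k` training iterations and afterwards follows the model's erroneous prediction with probability `p`. The parameters `iteration`, `k`, `p`, and `rng` are not part of the slide.

```python
import random
from typing import Callable, TypeVar

C = TypeVar("C")

def explore(c: C,
            t_o: Callable[[C], C],
            t_p: Callable[[C], C],
            iteration: int,
            k: int = 1,
            p: float = 0.9,
            rng=random) -> C:
    """Return the next configuration after a wrong prediction (assumed policy)."""
    if iteration > k and rng.random() < p:
        return t_p(c)  # explore: continue from the parser's own mistake
    return t_o(c)      # otherwise stay on an optimal path

# Toy transitions that just record which branch was taken.
follow_oracle = lambda cfg: cfg + ["t_o"]
follow_prediction = lambda cfg: cfg + ["t_p"]

early = explore([], follow_oracle, follow_prediction, iteration=1, k=1, p=1.0)
late = explore([], follow_oracle, follow_prediction, iteration=2, k=1, p=1.0)
never = explore([], follow_oracle, follow_prediction, iteration=5, k=1, p=0.0)
print(early, late, never)
```

Following the prediction exposes the learner to configurations that are not on any gold path, which only makes sense when the oracle is defined for every configuration.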

  20. Optimality / Transition Costs. [Figure: an arc set $A$ and the gold tree $T$ for "He wrote her a letter".]

  21. Optimality / Transition Costs. Cost of the arc set $A$ with respect to $T$: $C(A, T) = 2$

  22. Optimality / Transition Costs. Configuration $c = ([\text{root}, \text{wrote}_2, \text{her}_3], [\text{a}_4, \text{letter}_5])$

  23. Optimality / Transition Costs. Best cost reachable from $c$: $\min_{A : c \rightsquigarrow A} C(A, T) = 0$

  24. Optimality / Transition Costs. From $c = ([\text{root}, \text{wrote}_2, \text{her}_3], [\text{a}_4, \text{letter}_5])$, continue with $\text{SH}, \ldots$

  25. Optimality / Transition Costs. With $t = \text{Shift}$:
         $C(\text{Shift}; c, T) = \min_{A : t(c) \rightsquigarrow A} C(A, T) - \min_{A : c \rightsquigarrow A} C(A, T) = 1$

  26. Optimality / Transition Costs. With $t = \text{Shift}$:
         $C(\text{Shift}; c, T) = \min_{A : t(c) \rightsquigarrow A} C(A, T) - \min_{A : c \rightsquigarrow A} C(A, T) = 1$
         $o_d(c, T) = \{\, t \mid C(t; c, T) = 0 \,\}$
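The cost definition can be evaluated by brute force on a tiny sentence: enumerate every terminal configuration reachable from a configuration and take the minimum loss. The sketch below is only feasible for short inputs and makes several assumptions that are not on the slides: unlabeled arcs, a loss $C(A, T)$ equal to the number of gold arcs missing from $A$, and the function names. The two example configurations are chosen for illustration and are not meant to reproduce the numbers on the slides.

```python
from functools import lru_cache

# 0 = root, 1 = He, 2 = wrote, 3 = her, 4 = a, 5 = letter
GOLD = frozenset({(0, 2), (2, 1), (2, 3), (5, 4), (2, 5)})

def has_head(arcs, d):
    return any(dep == d for _, dep in arcs)

def legal(c):
    """Arc-eager transitions allowed in a non-terminal configuration."""
    stack, buf, arcs = c
    ts = ["Shift", "Right"]
    if stack[-1] != 0 and not has_head(arcs, stack[-1]):
        ts.append("Left")
    if has_head(arcs, stack[-1]):
        ts.append("Reduce")
    return ts

def step(c, t):
    stack, buf, arcs = c
    if t == "Shift":
        return (stack + buf[:1], buf[1:], arcs)
    if t == "Left":
        return (stack[:-1], buf, arcs | {(buf[0], stack[-1])})
    if t == "Right":
        return (stack + buf[:1], buf[1:], arcs | {(stack[-1], buf[0])})
    return (stack[:-1], buf, arcs)  # Reduce

@lru_cache(maxsize=None)
def best_loss(c):
    """min over arc sets A reachable from c of C(A, T), by exhaustive search."""
    stack, buf, arcs = c
    if not buf:  # terminal configuration
        return len(GOLD - arcs)
    return min(best_loss(step(c, t)) for t in legal(c))

def cost(t, c):
    return best_loss(step(c, t)) - best_loss(c)

def dynamic_oracle(c):
    return {t for t in legal(c) if cost(t, c) == 0}

# On a gold path: "her" already has its head "wrote".
c_gold = ((0, 2, 3), (4, 5), frozenset({(2, 1), (0, 2), (2, 3)}))
# After an error: "her" was shifted without a head.
c_err = ((0, 2, 3), (4, 5), frozenset({(2, 1), (0, 2)}))
print(dynamic_oracle(c_gold), dynamic_oracle(c_err))
```

For `c_gold` both Shift and Reduce have zero cost, which mirrors the spurious ambiguity seen earlier. For `c_err` the oracle still returns zero-cost transitions even though the gold tree is no longer reachable, which is what makes it usable after a mistake.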

  27. Arc Decomposition - Definition
     Definition (Tree Consistency): A set of arcs $A$ is said to be tree consistent if there exists a projective dependency tree $T$ such that $A \subseteq T$.
     Definition (Arc Decomposition): A transition system is said to be arc decomposable if, for every tree consistent arc set $A$ and configuration $c$, $c \rightsquigarrow A$ is entailed by $c \rightsquigarrow (h, d)$ for every arc $(h, d) \in A$.

  28. Arc Decomposition - Arc-Standard Counterexample. [Figure: nodes a, b, c.] Configuration $c = ([a, b, c], \beta)$.
     Arc-Standard Transitions:
       $\text{Left}[(\sigma | s_1 | s_0, \beta, A)] = (\sigma | s_0, \beta, A \cup \{(s_0, s_1)\})$
       $\text{Right}[(\sigma | s_1 | s_0, \beta, A)] = (\sigma | s_1, \beta, A \cup \{(s_1, s_0)\})$
       $\text{Shift}[(\sigma, b | \beta, A)] = (\sigma | b, \beta, A)$

  29. Arc Decomposition - Arc-Standard Counterexample. Applying Left: $c = ([a, b, c], \beta) \vdash ([a, c], \beta)$

  30. Arc Decomposition - Arc-Standard Counterexample. Applying Right, then Left: $c = ([a, b, c], \beta) \vdash ([a, b], \beta) \vdash ([b], \beta)$
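The counterexample can be verified by exhaustive search over the arc-standard transitions shown on slide 28. The sketch below assumes a one-word buffer `("d",)` in place of the arbitrary $\beta$, and the function names are hypothetical. It checks the tree consistent arc set $\{(c, b), (b, a)\}$: each arc is reachable on its own, but the two are not reachable together.

```python
def successors(c):
    """All configurations one arc-standard transition away from c."""
    stack, buf, arcs = c
    if buf:                                                  # Shift
        yield (stack + buf[:1], buf[1:], arcs)
    if len(stack) >= 2:
        s1, s0 = stack[-2], stack[-1]
        yield (stack[:-2] + (s0,), buf, arcs | {(s0, s1)})   # Left: s0 heads s1
        yield (stack[:-1], buf, arcs | {(s1, s0)})           # Right: s1 heads s0

def reachable(c, target):
    """True if some configuration derivable from c contains every arc in target."""
    if target <= c[2]:
        return True
    return any(reachable(n, target) for n in successors(c))

start = (("a", "b", "c"), ("d",), frozenset())

only_cb = reachable(start, {("c", "b")})               # via Left
only_ba = reachable(start, {("b", "a")})               # via Right, then Left
both = reachable(start, {("c", "b"), ("b", "a")})
print(only_cb, only_ba, both)
```

Every transition shortens either the stack or the buffer, so the search terminates. Since the arcs are individually reachable but not jointly, arc-standard is not arc decomposable.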

  31. Arc Decomposition - Arc-Eager Proof Sketch
     Given: an arbitrary configuration $c = (\sigma, \beta, A)$ and a tree consistent arc set $A'$ such that all arcs in $A'$ are reachable from $c$.
     To show: $c \rightsquigarrow A'$
       $\overline{B} = \{(h, d) \mid h, d \notin \beta\}$
       $B = \{(h, d) \mid h, d \in \beta\}$
       $B_h = \{(h, d) \mid h \in \beta, d \in \sigma\}$
       $B_d = \{(h, d) \mid d \in \beta, h \in \sigma\}$

  32. Arc Decomposition - Arc-Eager Proof Sketch. [Figure: nodes 0 to 8 divided between the stack $\sigma$ and the buffer $\beta$, illustrating the four arc sets $\overline{B}$, $B$, $B_h$, $B_d$.]

