Learned Prioritization for Trading Off Speed and Accuracy
Jiarong Jiang (University of Maryland, College Park), Adam Teichert (Johns Hopkins University), Hal Daumé III (University of Maryland, College Park), Jason Eisner (Johns Hopkins University)
ICML Workshop on Inferning: Interactions between Inference and Learning


  1. Learned Prioritization for Trading Off Speed and Accuracy. Jiarong Jiang (1), Adam Teichert (2), Hal Daumé III (1), Jason Eisner (2). (1) University of Maryland, College Park; (2) Johns Hopkins University. ICML Workshop on Inferning: Interactions between Inference and Learning.

  2.–5. Introduction
  Fast and accurate structured prediction.
  Manual exploration of the speed/accuracy tradeoff:
    Prioritization heuristics: A* [Klein and Manning, 2003]; Hierarchical A* [Pauls and Klein, 2010]
    Pruning heuristics: coarse-to-fine pruning [Charniak et al., 2006; Petrov and Klein, 2007]; classifier-based pruning [Roark and Hollingshead, 2008]
  Goal: learn a heuristic for your input distribution, grammar, and speed/accuracy needs.
  Objective measure: quality = accuracy − λ × time

  6.–12. Priority-based Inference: Agenda-based Parsing
  [Animated figure: best-first parsing of the example sentence "0 Time 1 flies 2 like 3 an 4 arrow 5" under a weighted grammar (S -> NP VP, S -> Vst NP, S -> S PP, VP -> VP PP, VP -> V NP, NP -> DET N, NP -> NP PP, NP -> NP NP, PP -> P NP). Scored partial parses such as NP, VP, and PP constituents wait on the AGENDA; at each step the highest-priority item is popped into the chart and combined with adjacent chart items via the grammar rules, pushing newly built partial parses back onto the agenda. A minimal sketch of such an agenda parser follows below.]

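To make the agenda loop concrete, here is a minimal, self-contained sketch of weighted agenda-based (best-first) parsing; it is not the authors' system, and the rule and lexical scores below are illustrative stand-ins for the figure's weights (here log-probability-style scores, higher is better). The priority function decides which partial parse to pop next; the default pops by Viterbi inside score, i.e. uniform-cost-style search.

import heapq
from collections import defaultdict

# Binary rules: (left child, right child) -> list of (parent, rule score).
BINARY_RULES = {
    ("NP", "VP"): [("S", -1.0)],
    ("Vst", "NP"): [("S", -6.0)],
    ("S", "PP"): [("S", -2.0)],
    ("VP", "PP"): [("VP", -1.0)],
    ("V", "NP"): [("VP", -2.0)],
    ("DET", "N"): [("NP", -1.0)],
    ("NP", "PP"): [("NP", -2.0)],
    ("NP", "NP"): [("NP", -3.0)],
    ("P", "NP"): [("PP", 0.0)],
}

# Lexical items for "Time flies like an arrow": (label, start, end, score).
LEXICAL_ITEMS = [
    ("N", 0, 1, -8.0), ("NP", 0, 1, -3.0), ("Vst", 0, 1, -3.0),
    ("NP", 1, 2, -4.0), ("VP", 1, 2, -4.0), ("V", 1, 2, -5.0),
    ("P", 2, 3, -2.0),
    ("DET", 3, 4, -1.0),
    ("N", 4, 5, -8.0),
]

def parse(lexical_items, n_words, priority=lambda label, i, j, score: score):
    """Best-first agenda parsing; `priority` decides which partial parse to pop next."""
    chart = {}                         # (label, i, j) -> score of the first (best) pop
    by_start = defaultdict(list)       # i -> chart items starting at position i
    by_end = defaultdict(list)         # j -> chart items ending at position j
    agenda = []                        # max-heap, implemented via negated priorities
    for label, i, j, score in lexical_items:
        heapq.heappush(agenda, (-priority(label, i, j, score), (label, i, j, score)))

    pops = 0
    while agenda:
        _, (label, i, j, score) = heapq.heappop(agenda)
        if (label, i, j) in chart:     # already popped this constituent; skip duplicate
            continue
        chart[(label, i, j)] = score
        by_start[i].append((label, i, j, score))
        by_end[j].append((label, i, j, score))
        pops += 1
        # Combine with adjacent chart items via binary rules, pushing new partial parses.
        for r_label, _, k, r_score in by_start[j]:          # popped item as left child
            for parent, w in BINARY_RULES.get((label, r_label), []):
                new_score = score + r_score + w
                heapq.heappush(agenda, (-priority(parent, i, k, new_score),
                                        (parent, i, k, new_score)))
        for l_label, h, _, l_score in by_end[i]:             # popped item as right child
            for parent, w in BINARY_RULES.get((l_label, label), []):
                new_score = l_score + score + w
                heapq.heappush(agenda, (-priority(parent, h, j, new_score),
                                        (parent, h, j, new_score)))
        if (label, i, j) == ("S", 0, n_words):  # stop once a sentence-spanning S is popped
            break
    return chart.get(("S", 0, n_words)), pops

score, pops = parse(LEXICAL_ITEMS, n_words=5)
print(f"Score of S over the whole sentence: {score}, number of pops: {pops}")
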
  13. Priority-based Inference: Speed vs. Accuracy for Agenda-based Parsing
  All experiments use Penn Treebank WSJ sentences of length ≤ 15. Preliminary-results setup: a Berkeley latent-variable PCFG trained on sections 2-20; the training set is 100 sentences from section 21, and evaluation is on those same 100 sentences.
  Baseline 1, Exhaustive Search: recall 93.3; relative number of pops 3.0x
  Baseline 2, Uniform Cost Search (UC): recall 93.3; relative number of pops 1.0x
  Baseline 3, Pruned Uniform Cost Search: recall 92.0; relative number of pops 0.33x

  14.–15. Priority-based Inference: Agenda-based Parsing as a Markov Decision Process
  State space: the current chart and agenda.
  Action: pop a partial parse from the agenda.
  Transition: given the chosen action, deterministically update the chart and push the newly built partial parses onto the agenda.
  Policy: compute action priorities from extracted features, $\pi_\theta(s) = \arg\max_a \, \theta \cdot \phi(a, s)$.
  (Delayed) Reward: reward = accuracy − λ × time, where accuracy is labeled span recall and time is the number of pops from the agenda.
  Learning a policy = learning a prioritization function (see the sketch below).

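Under this MDP view, the greedy test-time policy is just a priority function: plugged into the agenda parser sketched earlier, learned weights decide which partial parse to pop next. The following is a hedged sketch under assumed names; `phi` here is a tiny placeholder (the talk's actual feature list appears on the features slide further below), and the weights are the quantity to be learned, not hand-set values from the talk.

import numpy as np

# A linear prioritization function theta . phi(a, s) used as the agenda priority,
# so that learning the policy = learning the prioritization function.

def phi(label, i, j, score, sentence_length=5):
    width = j - i
    return np.array([
        score,                                  # Viterbi inside score of the partial parse
        width / sentence_length,                # ratio of width to sentence length
        1.0 if i == 0 else 0.0,                 # touches start of sentence?
        1.0 if j == sentence_length else 0.0,   # touches end of sentence?
    ])

theta = np.array([1.0, 0.5, 0.1, 0.1])          # illustrative weights, to be learned

def learned_priority(label, i, j, score):
    return float(theta @ phi(label, i, j, score))

# e.g. parse(LEXICAL_ITEMS, n_words=5, priority=learned_priority)
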
  16. Priority-based Inference: Decoding as a Markov Decision Process (MDP)
  [Figure: the same agenda-parsing example, except that the priorities of the items on the agenda (the PP and VP partial parses) are now shown as "???": the open question is what priority the policy should assign to each candidate pop.]

  17. Attempt 1: Policy Gradient with Boltzmann Exploration
  Transition at test time: deterministic. Transition at training time: exploration with a stochastic policy $\pi_\theta(a \mid s)$.
  Boltzmann exploration: $\pi_\theta(a \mid s) = \frac{1}{Z(s)} \exp\!\left(\frac{1}{\mathrm{temp}} \, \theta \cdot \phi(a, s)\right)$. As the temperature goes to 0, exploration goes to exploitation (see the sampling sketch below).
  A trajectory is $\tau = \langle s_0, a_0, r_0, s_1, a_1, r_1, \ldots, s_T, a_T, r_T \rangle$, and the expected future reward is $R = \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} r_t\right]$.

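As a concrete illustration of Boltzmann exploration (a sketch under assumed data structures, not the authors' code): scores theta . phi(a, s) over the current agenda are turned into a softmax distribution with temperature temp, and the next pop is sampled from it. As temp approaches 0 this recovers the greedy argmax policy above.

import numpy as np

# Boltzmann (softmax) exploration over candidate agenda pops at training time.
# feature_matrix has one row of features phi(a, s) per candidate action a.

def boltzmann_sample(theta, feature_matrix, temp=1.0, rng=np.random.default_rng(0)):
    scores = (feature_matrix @ theta) / temp
    scores = scores - scores.max()                 # stabilize the softmax numerically
    probs = np.exp(scores) / np.exp(scores).sum()  # pi_theta(a | s) from this slide
    action = int(rng.choice(len(probs), p=probs))  # index of the agenda item to pop
    return action, probs
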
  18. Attempt 1: Policy Gradient with Boltzmann Exploration (Policy Gradient)
  Find parameters that maximize the expected reward with respect to the induced distribution over trajectories. Policy gradient [Sutton et al., 2000] gives the gradient of this objective:
  $\nabla_\theta \, \mathbb{E}_\tau[R(\tau)] = \mathbb{E}_\tau\!\left[ R(\tau) \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]$
  where $\nabla_\theta \log \pi_\theta(a_t \mid s_t) = \frac{1}{\mathrm{temp}} \left( \phi(a_t, s_t) - \sum_{a' \in A} \pi_\theta(a' \mid s_t) \, \phi(a', s_t) \right)$. A one-sample sketch of this estimator follows below.

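A minimal sketch of the corresponding REINFORCE-style gradient estimate. The bookkeeping is assumed: for each pop we record the candidates' feature rows and which candidate was chosen while sampling with the Boltzmann policy, and total_reward is R(tau) = accuracy − λ × time for the finished parse.

import numpy as np

def policy_gradient(theta, trajectory, total_reward, temp=1.0):
    """trajectory: list of (candidate_features, chosen_index) pairs, one per pop."""
    grad = np.zeros_like(theta)
    for candidate_features, chosen in trajectory:
        F = np.asarray(candidate_features)           # one feature row per candidate action
        scores = (F @ theta) / temp
        scores = scores - scores.max()
        probs = np.exp(scores) / np.exp(scores).sum()
        grad += (F[chosen] - probs @ F) / temp       # grad of log pi(a_t | s_t)
    return total_reward * grad                       # one-sample estimate of grad E[R(tau)]

# Stochastic ascent step (illustrative): theta = theta + learning_rate * policy_gradient(...)
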
  19. Attempt 1: Policy Gradient with Boltzmann Exploration (Features)
  1. Width of the partial parse
  2. Viterbi inside score
  3. Touches start of sentence?
  4. Touches end of sentence?
  5. Ratio of width to sentence length
  6. log p(label | prev POS) and log p(label | next POS) (statistics extracted from labeled trees; each word's POS is assumed to be its most frequent tag)
  7. Case pattern of the first word in the partial parse and of the previous/next word
  8. Punctuation pattern in the partial parse (five most frequent patterns)
  (An illustrative feature-extraction sketch follows below.)

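A hedged sketch of a feature extractor computing several of the features listed above. The function signature, the pos_tags input (each word's most-frequent POS tag), the conditional log-probability table, and the exact encodings are assumptions for illustration, not the authors' implementation.

import string

def extract_features(label, i, j, inside_score, words, pos_tags, log_p_label_given_prev_pos):
    width = j - i
    first = words[i]
    case_pattern = ("title" if first.istitle() else
                    "upper" if first.isupper() else
                    "lower" if first.islower() else "other")
    punct = "".join(c for w in words[i:j] for c in w if c in string.punctuation)
    prev_pos = pos_tags[i - 1] if i > 0 else "<s>"
    return {
        "width": float(width),                                # 1. width of partial parse
        "inside_score": inside_score,                         # 2. Viterbi inside score
        "touches_start": float(i == 0),                       # 3. touches start of sentence?
        "touches_end": float(j == len(words)),                # 4. touches end of sentence?
        "width_ratio": width / len(words),                    # 5. width / sentence length
        "log_p_label_given_prev_pos":                         # 6. log p(label | prev POS)
            log_p_label_given_prev_pos.get((label, prev_pos), 0.0),
        "first_word_case=" + case_pattern: 1.0,               # 7. case pattern of first word
        "punct_pattern=" + punct: 1.0,                        # 8. punctuation pattern in span
    }
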
  20.–21. Attempt 1: Policy Gradient with Boltzmann Exploration (Preliminary Results)
  Method                                       Recall   Relative # of pops
  Policy gradient w/ Boltzmann exploration     56.4     0.46x
  Uniform cost search                          93.3     1.0x
  Pruned uniform cost search                   92.0     0.33x
  Main difficulty: which actions were "responsible" for a trajectory's reward?
