Structured Perceptron with Inexact Search


  1. Structured Perceptron with Inexact Search. Liang Huang, Suphan Fayong, Yang Guo. Information Sciences Institute, University of Southern California. NAACL 2012, Montréal, June 2012. [Title slide figure: a binary example (y = +1 vs. y = -1) beside a structured example: "the man bit the dog" tagged DT NN VBD DT NN, with its Chinese translation "那 人 咬 了 狗" ("the man bit the dog").]

  2-5. Structured Perceptron (Collins 02) [Figure: binary classification runs (x, w) through exact inference to get z, and updates weights if y ≠ z, with labels y = +1 / y = -1; structured classification runs the same pipeline with x = "the man bit the dog" and y = DT NN VBD DT NN. Inference is trivial in the binary case (a constant number of classes) but hard in the structured case (exponentially many classes).]
  • challenge: search efficiency (exponentially many classes)
  • often use dynamic programming (DP)
  • but still too slow for repeated use, e.g. parsing is O(n³)
  • and DP can't use non-local features
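The exact DP search mentioned above can be sketched as a first-order Viterbi tagger. This is a minimal illustration only; the weight keys such as ("emit", word, tag) and ("trans", prev, tag) are assumptions for the sketch, not the paper's feature set.

```python
# Minimal Viterbi decoder for sequence tagging (a sketch, assuming a
# first-order model: score = sum of emission + transition weights).
# Weight keys like ("emit", word, tag) are illustrative assumptions.

def viterbi(words, tags, w):
    """Exact search: best tag sequence under sparse weights w (a dict)."""
    n = len(words)
    # best[i][t] = (score of best prefix ending in tag t at position i, backpointer)
    best = [{t: (w.get(("emit", words[0], t), 0.0), None) for t in tags}]
    for i in range(1, n):
        row = {}
        for t in tags:
            emit = w.get(("emit", words[i], t), 0.0)
            prev_score, prev_tag = max(
                (best[i - 1][p][0] + w.get(("trans", p, t), 0.0), p) for p in tags)
            row[t] = (prev_score + emit, prev_tag)
        best.append(row)
    # backtrace from the best final tag
    tag = max(tags, key=lambda t: best[-1][t][0])
    seq = [tag]
    for i in range(n - 1, 0, -1):
        tag = best[i][tag][1]
        seq.append(tag)
    return list(reversed(seq))
```

Exactness comes from keeping the best score for every (position, tag) pair, which is also why the features must stay local to adjacent tags.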

  6-11. Perceptron w/ Inexact Inference [Figure: the same pipeline with exact inference replaced by inexact inference (beam search or greedy search); caption: "does it still work???"]
  • routine use of inexact inference in NLP (e.g. beam search)
  • how does structured perceptron work with inexact search?
  • so far, most structured learning theory assumes exact search
  • would search errors break these learning properties?
  • if so, how do we modify learning to accommodate inexact search?
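Beam search, the inexact inference in the figure above, can be sketched as follows, under the same illustrative scoring as before (the weight keys are assumptions). Truncating the candidate list at each step is exactly where search errors arise.

```python
# A sketch of inexact inference by beam search for tagging.
# Scoring assumes illustrative weight keys ("emit", word, tag) and
# ("trans", prev, tag); this is not the paper's feature set.

def beam_search(words, tags, w, beam_size=2):
    """Keep only the top-k partial tag sequences at each position."""
    beam = [(0.0, [])]  # (score, tag prefix)
    for word in words:
        cands = []
        for score, prefix in beam:
            for t in tags:
                s = score + w.get(("emit", word, t), 0.0)
                if prefix:
                    s += w.get(("trans", prefix[-1], t), 0.0)
                cands.append((s, prefix + [t]))
        cands.sort(key=lambda c: c[0], reverse=True)
        beam = cands[:beam_size]  # search error possible: best path may fall out here
    return beam[0][1]
```

With beam_size = 1 this degenerates to greedy search; with beam_size = |tags|^n it recovers exact search.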

  12-15. Prior work: Early update (Collins/Roark) [Figure: greedy or beam search on x; early update on the prefixes y′, z′.]
  • a partial answer: "early update" (Collins & Roark, 2004)
  • a heuristic for perceptron with greedy or beam search
  • updates on prefixes rather than full sequences
  • works much better than standard update in practice, but...
  • two major problems with early update:
    • there is no theoretical justification: why does it work?
    • it learns too slowly (due to partial examples), e.g. 40 epochs
  • we'll solve both problems in a much larger framework
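Early update can be sketched on top of beam search: decode left to right, and as soon as the gold prefix falls off the beam, update on the prefixes y′ (gold) and z′ (top of the beam) and stop. The featurization below is an illustrative assumption, not Collins & Roark's feature set.

```python
# A hedged sketch of early update (Collins & Roark 2004) on beam search.
# prefix_features uses illustrative emission/transition counts.

def prefix_features(words, tags):
    feats = {}
    for i, t in enumerate(tags):
        feats[("emit", words[i], t)] = feats.get(("emit", words[i], t), 0) + 1
        if i > 0:
            feats[("trans", tags[i - 1], t)] = feats.get(("trans", tags[i - 1], t), 0) + 1
    return feats

def early_update(words, gold, tags, w, beam_size=1):
    beam = [(0.0, [])]
    for i, word in enumerate(words):
        cands = []
        for score, prefix in beam:
            for t in tags:
                s = score + w.get(("emit", word, t), 0.0)
                if prefix:
                    s += w.get(("trans", prefix[-1], t), 0.0)
                cands.append((s, prefix + [t]))
        cands.sort(key=lambda c: c[0], reverse=True)
        beam = cands[:beam_size]
        gold_prefix = gold[:i + 1]
        if all(prefix != gold_prefix for _, prefix in beam):
            # gold fell off the beam: update on the prefixes y', z' and stop early
            for f, v in prefix_features(words[:i + 1], gold_prefix).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in prefix_features(words[:i + 1], beam[0][1]).items():
                w[f] = w.get(f, 0.0) - v
            return w
    if beam[0][1] != gold:  # gold survived but isn't on top: standard full update
        for f, v in prefix_features(words, gold).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in prefix_features(words, beam[0][1]).items():
            w[f] = w.get(f, 0.0) - v
    return w
```

The early stop is why it learns slowly: each update uses only part of the example.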

  16. Our Contributions
  • theory: a framework for perceptron w/ inexact search
    • explains early update (and others) as a special case
  • practice: new update methods within the framework
    • converge faster and better than early update
    • real impact on state-of-the-art parsing and tagging
    • more advantageous when search errors are more severe

  17. In this talk...
  • Motivations: Structured Learning and Search Efficiency
  • Structured Perceptron and Inexact Search
    • perceptron does not converge with inexact search
    • early update (Collins/Roark '04) seems to help; but why?
  • New Perceptron Framework for Inexact Search
    • explains early update as a special case
    • convergence theory with arbitrarily inexact search
    • new update methods within this framework
  • Experiments

  18-21. Structured Perceptron (Collins 02)
  • a simple generalization of the binary/multiclass perceptron
  • online learning: for each example (x, y) in the data
    • inference: find the best output z given the current weights w
    • update weights if y ≠ z
  [Figure: the binary and structured pipelines side by side, as before; the number of classes is a trivial constant in the binary case but exponential in the structured case, which makes exact inference hard.]
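The online learning loop above can be sketched as follows; decode stands for any argmax inference routine (e.g. Viterbi for exact search, beam search for inexact), and the sparse featurization is an illustrative assumption.

```python
# The structured perceptron training loop (Collins 2002), as a sketch.
# `decode` is any inference routine; `features` is an illustrative
# emission/transition featurization, not the paper's feature set.

def features(words, tags):
    feats = {}
    for i, t in enumerate(tags):
        feats[("emit", words[i], t)] = feats.get(("emit", words[i], t), 0) + 1
        if i > 0:
            feats[("trans", tags[i - 1], t)] = feats.get(("trans", tags[i - 1], t), 0) + 1
    return feats

def train(data, tagset, decode, epochs=5):
    w = {}
    for _ in range(epochs):
        for x, y in data:
            z = decode(x, tagset, w)      # inference under current weights
            if z != y:                    # update only on mistakes
                for f, v in features(x, y).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in features(x, z).items():
                    w[f] = w.get(f, 0.0) - v
    return w
```

The update pulls the weights toward the gold features and away from the predicted ones; with exact decoding, the convergence guarantee on the next slides applies.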

  22-25. Convergence with Exact Search
  • linear classification: converges iff the data is separable
  • structured: converges iff the data is separable & search is exact
    • there is an oracle vector that correctly labels all examples
    • one vs. the rest (the correct label scores better than all incorrect labels)
  • theorem: if separable, then # of updates ≤ R²/δ²
  [Figure: geometric picture of the diameter R and margin δ, with the correct output y separated from incorrect outputs z ≠ y; Rosenblatt (1957) for the binary case (y = ±1), extended to the structured case by Collins (2002).]
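The R²/δ² bound follows the classical perceptron convergence argument, adapted to the structured case by Collins (2002); a sketch, writing Φ for the feature map and u for a unit oracle vector (the exact constants match the slide, the notation is an assumption):

```latex
% Assumptions: separability with margin \delta, i.e. a unit vector u with
%   u \cdot (\Phi(x,y) - \Phi(x,z)) \ge \delta \quad \text{for every incorrect } z,
% and R bounding each update: \|\Phi(x,y) - \Phi(x,z)\| \le R.
% Each mistake sets w^{(k+1)} = w^{(k)} + \Phi(x,y) - \Phi(x,z).

% Lower bound: every update gains at least \delta along u:
u \cdot w^{(k)} \ge k\delta .

% Upper bound: the model preferred z, so w^{(k)} \cdot (\Phi(x,y) - \Phi(x,z)) \le 0, hence
\|w^{(k+1)}\|^2 \le \|w^{(k)}\|^2 + \|\Phi(x,y) - \Phi(x,z)\|^2 \le \|w^{(k)}\|^2 + R^2,
% so \|w^{(k)}\|^2 \le kR^2.

% Combining: k\delta \le u \cdot w^{(k)} \le \|w^{(k)}\| \le \sqrt{k}\,R
% gives k \le R^2 / \delta^2.
```

With inexact search the upper-bound step breaks: the model may not have preferred z over y, so the cross term can be positive, which is why convergence can fail and motivates the talk's framework.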
