  1. Training Strategies (CS 6355: Structured Prediction)

  2. So far we saw
     • What is structured output prediction?
     • Different ways for modeling structured prediction
       – Conditional random fields, factor graphs, constraints
     • What we only occasionally touched upon:
       – Algorithms for training and inference
         • Viterbi (inference in sequences)
         • Structured perceptron (training in general)
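
As a quick reminder of the structured perceptron mentioned above, here is a minimal Python sketch. The helpers `joint_features(x, y)` and `predict_best(w, x)` are hypothetical placeholders (the latter would be Viterbi for sequence models); this illustrates the mistake-driven update and is not the course's reference implementation.

    import numpy as np

    def structured_perceptron(data, joint_features, predict_best, dim, epochs=10):
        # data: list of (x, y_gold) pairs; joint_features(x, y) returns a length-dim vector.
        w = np.zeros(dim)
        for _ in range(epochs):
            for x, y_gold in data:
                y_pred = predict_best(w, x)   # inference step, e.g. Viterbi for sequences
                if y_pred != y_gold:          # mistake-driven update
                    w += joint_features(x, y_gold) - joint_features(x, y_pred)
        return w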

  3. Rest of the semester
     • Strategies for training
       – Structural SVM
       – Stochastic gradient descent
       – More on local vs. global training
     • Algorithms for inference
       – Exact inference
       – “Approximate” inference
       – Formulating inference problems in general
     • Latent/hidden variables, representations and such

  4. Up next
     • Structural Support Vector Machine
       – How it naturally extends multiclass SVM
     • Empirical Risk Minimization
       – Or: how structural SVM and CRF are solving very similar problems
     • Training Structural SVM via stochastic gradient descent
       – And some tricks

  5. Where are we? (Same outline as slide 4.)

  6. Recall: Binary and Multiclass SVM
     • Binary SVM
       – Maximize margin
       – Equivalently, minimize the norm of the weights such that the closest points to the hyperplane have a score of ±1
     • Multiclass SVM
       – Each label has a different weight vector (like one-vs-all)
       – Maximize the multiclass margin
       – Equivalently, minimize the total norm of the weights such that the true label is scored at least 1 more than the second-best one

  7. Multiclass SVM in the separable case
     We have a data set D = {(x_i, y_i)}. Recall the hard binary SVM. [Slide shows the formulation; a reconstruction follows after slide 8.]

  8. Multiclass SVM in the separable case
     We have a data set D = {(x_i, y_i)}; recall the hard binary SVM. [Slide annotates the formulation: the objective, the size of the weights, is effectively the regularizer, and the constraints require the score for the true label to be higher than the score for any other label by 1.]
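
For reference, the hard-margin formulations recalled on slides 7 and 8, written out in LaTeX. This is a standard textbook form reconstructed from the slide callouts; the exact presentation on the slides may differ.

    % Hard binary SVM (separable case)
    \min_{\mathbf{w}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2
    \quad \text{s.t.} \quad y_i\,\mathbf{w}^\top \mathbf{x}_i \ge 1 \quad \forall i

    % Multiclass SVM (separable case): one weight vector per label
    \min_{\mathbf{w}_1,\dots,\mathbf{w}_K} \; \frac{1}{2}\sum_{k=1}^{K}\lVert\mathbf{w}_k\rVert^2
    \quad \text{s.t.} \quad
    \mathbf{w}_{y_i}^\top \mathbf{x}_i \ge \mathbf{w}_{y}^\top \mathbf{x}_i + 1
    \quad \forall\, y \ne y_i,\ \forall i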

  9.–15. Structural SVM: First attempt
      Suppose we have some definition of a structure (a factor graph)
        – And feature definitions for each “part” p as Φ_p(x, y_p)
        – Remember: we can talk about the feature vector for the entire structure
          • Features sum over the parts: Φ(x, y) = Σ_{p ∈ parts(x)} Φ_p(x, y_p)
      We also have a data set D = {(x_i, y_i)}.
      What we want from training (following the multiclass idea): for each training example (x_i, y_i), the annotated structure y_i gets the highest score among all structures
        – Or, to be safe, y_i gets a score that is at least one more than that of every other structure:
          wᵀ Φ(x_i, y_i) ≥ wᵀ Φ(x_i, y) + 1   for all y ≠ y_i
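
To make the part-based decomposition and the constraint concrete, here is a small Python sketch. The helpers `parts`, `part_features`, and `candidate_structures`, and the convention that a structure y can be indexed by a part p, are assumptions for illustration; real training would not enumerate all structures explicitly.

    import numpy as np

    def global_features(x, y, parts, part_features, dim):
        # Phi(x, y) = sum over parts p of Phi_p(x, y_p)
        phi = np.zeros(dim)
        for p in parts(x):
            phi += part_features(p, x, y[p])   # y[p]: the piece of the structure at part p
        return phi

    def margin_satisfied(w, x, y_gold, candidate_structures, parts, part_features, dim):
        # Check: w . Phi(x, y_gold) >= w . Phi(x, y) + 1 for every other structure y
        gold_score = w @ global_features(x, y_gold, parts, part_features, dim)
        return all(
            gold_score >= w @ global_features(x, y, parts, part_features, dim) + 1
            for y in candidate_structures(x) if y != y_gold
        )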

  16.–18. Structural SVM: First attempt
      Maximize margin by minimizing the norm of w. [Slides show the formulation, with callouts labeling, for every training example, the input with its gold structure, some other structure, the score for the gold structure, and the score for the other structure; a reconstruction follows below.]
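
Putting slides 16–18 together with the constraint from slides 9–15, the first-attempt training problem can be written as follows. This is a reconstruction in LaTeX, not a verbatim copy of the slide.

    % Structural SVM, first attempt (separable case, fixed margin of 1)
    \min_{\mathbf{w}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2
    \quad \text{s.t.} \quad
    \mathbf{w}^\top \Phi(\mathbf{x}_i, \mathbf{y}_i) \ge
    \mathbf{w}^\top \Phi(\mathbf{x}_i, \mathbf{y}) + 1
    \quad \forall\, \mathbf{y} \ne \mathbf{y}_i,\ \forall i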

  19. Structural SVM: First attempt
      Maximize margin by minimizing the norm of w (same formulation as above). Problem?

  20.–22. Structural SVM: First attempt
      Problem: consider the gold structure, another structure A with only one mistake, and another structure B that is fully incorrect. Structure B is more wrong, but this formulation will be happy if both A and B are scored one less than the gold structure. No partial credit!
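
As a made-up illustration of the problem on slides 20–22, consider a gold tag sequence of length five, a structure A with a single mistake, and a structure B that is wrong everywhere. The Hamming distance distinguishes them, but the first-attempt constraints do not.

    def hamming(y1, y2):
        # Number of positions where two equal-length structures disagree
        return sum(a != b for a, b in zip(y1, y2))

    gold = ["N", "V", "D", "N", "P"]   # hypothetical gold tag sequence
    a    = ["N", "V", "D", "N", "N"]   # structure A: only one mistake
    b    = ["V", "N", "P", "D", "N"]   # structure B: fully incorrect

    print(hamming(gold, a))  # 1
    print(hamming(gold, b))  # 5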

  23.–25. Structural SVM: Second attempt
      Maximize margin by minimizing the norm of w, but now the required margin between the gold structure and another structure involves the Hamming distance between them. [Slides show the formulation with the same callouts as before: input with gold structure, some other structure, score for gold, score for other; a reconstruction follows below.]
      Hamming distance between structures: counts the number of differences between them.
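
The second attempt replaces the fixed margin of 1 with the Hamming distance Δ between the gold structure and the competing structure (margin rescaling). Written out, again as a reconstruction consistent with the callouts rather than a verbatim copy of the slides:

    % Structural SVM, second attempt (separable case, margin rescaling)
    \min_{\mathbf{w}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2
    \quad \text{s.t.} \quad
    \mathbf{w}^\top \Phi(\mathbf{x}_i, \mathbf{y}_i) \ge
    \mathbf{w}^\top \Phi(\mathbf{x}_i, \mathbf{y}) + \Delta(\mathbf{y}_i, \mathbf{y})
    \quad \forall\, \mathbf{y},\ \forall i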

  26.–27. Structural SVM: Second attempt
      Maximize margin by minimizing the norm of w. Intuition:
      • It is okay for a structure that is close (in the Hamming sense) to the true one to get a score that is close to that of the true structure.
      • Structures that are very different from the true structure should get much lower scores.
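
A brute-force sketch of what the second-attempt constraints demand of a weight vector on a single training example. The `score`, `candidates`, and `hamming` arguments are assumed inputs (for instance, `score` could wrap the `global_features` helper sketched earlier); enumerating all candidate structures is only feasible for tiny problems, so this spells out the condition rather than suggesting a training procedure.

    def second_attempt_satisfied(score, y_gold, candidates, hamming):
        # score(y): the model score w . Phi(x, y) for a fixed input x and weights w
        # Require: score(y_gold) >= score(y) + Hamming(y_gold, y) for every candidate y
        gold = score(y_gold)
        return all(gold >= score(y) + hamming(y_gold, y) for y in candidates)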
