Naïve Bayes, Maxent and Neural Models (CMSC 473/673, UMBC)

Naïve Bayes, Maxent and Neural Models. CMSC 473/673, UMBC. Some slides adapted from 3SLP. Outline: Recap: classification (MAP vs. noisy channel) & evaluation; Naïve Bayes (NB) classification; Terminology: bag-of-words; “Naïve” assumption


  1. We need to score the different combinations. Document (label: ATTACK): "Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region."

  2. Score and Combine Our Possibilities. score_1(fatally shot, ATTACK), score_2(seriously wounded, ATTACK), score_3(Shining Path, ATTACK), …, score_k(department, ATTACK) ➔ COMBINE ➔ posterior probability of ATTACK. Are all of these uncorrelated?

  3. Score and Combine Our Possibilities. score_1(fatally shot, ATTACK), score_2(seriously wounded, ATTACK), score_3(Shining Path, ATTACK), … ➔ COMBINE ➔ posterior probability of ATTACK. Q: What are the score and combine functions for Naïve Bayes?
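
One standard answer for Naïve Bayes: each score is a log-probability log p(word | label), and the combine function adds those scores to the log prior log p(label). The sketch below illustrates this; the label OTHER and every probability value are made up for illustration and are not estimated from any data.

```python
import math

# Hypothetical parameters (illustrative numbers only, not learned from data).
log_prior = {"ATTACK": math.log(0.5), "OTHER": math.log(0.5)}
log_likelihood = {
    "ATTACK": {"shot": math.log(0.3), "wounded": math.log(0.3), "region": math.log(0.4)},
    "OTHER": {"shot": math.log(0.1), "wounded": math.log(0.1), "region": math.log(0.8)},
}

def score(word, label):
    """Score one (word, label) combination: log p(word | label)."""
    return log_likelihood[label][word]

def combine(words, label):
    """Combine: log p(label) plus the sum of the per-word scores."""
    return log_prior[label] + sum(score(w, label) for w in words)

def classify(words):
    """Pick the label with the highest combined (unnormalized log-posterior) score."""
    return max(log_prior, key=lambda label: combine(words, label))

prediction = classify(["shot", "wounded"])
```

Summing the scores is only valid under the naïve assumption that words are conditionally independent given the label, which is what the "are all of these uncorrelated?" question is probing.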

  4. Scoring Our Possibilities. score(doc, ATTACK) = score_1(fatally shot, ATTACK), score_2(seriously wounded, ATTACK), score_3(Shining Path, ATTACK), …, where doc is "Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region."

  5. https://www.csee.umbc.edu/courses/undergraduate/473/f18/loglin-tutorial/ https://goo.gl/BQCdH9 Lesson 1

  6. Maxent Modeling. p(ATTACK | doc) ∝ SNAP(score(doc, ATTACK)), where doc is the news passage from slide 1.

  7. What function… operates on any real number? is never less than 0?

  8. What function… operates on any real number? is never less than 0? f(x) = exp(x)

  9. Maxent Modeling. p(ATTACK | doc) ∝ exp(score(doc, ATTACK)), where doc is the news passage from slide 1.

  10. Maxent Modeling. p(ATTACK | doc) ∝ exp(score_1(fatally shot, ATTACK) + score_2(seriously wounded, ATTACK) + score_3(Shining Path, ATTACK) + …)

  11. Maxent Modeling. p(ATTACK | doc) ∝ exp(score_1(fatally shot, ATTACK) + score_2(seriously wounded, ATTACK) + score_3(Shining Path, ATTACK) + …). Learn the scores (but we’ll declare what combinations should be looked at).

  12. Maxent Modeling. p(ATTACK | doc) ∝ exp(weight_1 * occurs_1(fatally shot, ATTACK) + weight_2 * occurs_2(seriously wounded, ATTACK) + weight_3 * occurs_3(Shining Path, ATTACK) + …)

  13. Maxent Modeling: Feature Functions. p(ATTACK | doc) ∝ exp(weight_1 * occurs_1(fatally shot, ATTACK) + weight_2 * occurs_2(seriously wounded, ATTACK) + weight_3 * occurs_3(Shining Path, ATTACK) + …), where doc is the news passage from slide 1. Feature functions help extract useful features (characteristics) of the data. Generally templated. Often binary-valued (0 or 1), but can be real-valued. Binary example: occurs_{target,type}(fatally shot, ATTACK) = 1 if target == fatally shot and type == ATTACK; 0 otherwise.

  14. More on Feature Functions. Feature functions help extract useful features (characteristics) of the data. Generally templated. Often binary-valued (0 or 1), but can be real-valued. Templated, binary: occurs_{target,type}(fatally shot, ATTACK) = 1 if target == fatally shot and type == ATTACK; 0 otherwise. Templated, real-valued: occurs_{target,type}(fatally shot, ATTACK) = log p(fatally shot | ATTACK) + log p(type | ATTACK) + log p(ATTACK | type). Non-templated, real-valued: occurs(fatally shot, ATTACK) = log p(fatally shot | ATTACK). Non-templated, count-valued: ???

  15. More on Feature Functions. Feature functions help extract useful features (characteristics) of the data. Generally templated. Often binary-valued (0 or 1), but can be real-valued. Templated, binary: occurs_{target,type}(fatally shot, ATTACK) = 1 if target == fatally shot and type == ATTACK; 0 otherwise. Templated, real-valued: occurs_{target,type}(fatally shot, ATTACK) = log p(fatally shot | ATTACK) + log p(type | ATTACK) + log p(ATTACK | type). Non-templated, real-valued: occurs(fatally shot, ATTACK) = log p(fatally shot | ATTACK). Non-templated, count-valued: occurs(fatally shot, ATTACK) = count(fatally shot, ATTACK).
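
A templated binary feature function can be sketched as a small factory: one template, instantiated once per (target, type) pair. The name `make_occurs` is hypothetical, and the reading that a feature "fires" when the target phrase appears in the document and the candidate label equals the type is an assumption about how the slide's template is applied to a document.

```python
def make_occurs(target, label_type):
    """Template: build one binary feature function for a (target, type) pair."""
    def occurs(doc, label):
        # 1 if the target phrase is in the document and the label matches, else 0.
        return 1 if (target in doc and label == label_type) else 0
    return occurs

doc = (
    "Three people have been fatally shot, and five people, including a mayor, "
    "were seriously wounded as a result of a Shining Path attack today against "
    "a community in Junin department, central Peruvian mountain region."
)

# Instantiate the template for the three phrases used on the slides.
features = [
    make_occurs(phrase, "ATTACK")
    for phrase in ("fatally shot", "seriously wounded", "Shining Path")
]

values_attack = [f(doc, "ATTACK") for f in features]
values_other = [f(doc, "OTHER") for f in features]  # "OTHER" is a made-up label
```

Because each feature checks the label as well as the text, the same document yields a different feature vector for every candidate label, which is what lets the weights discriminate between labels.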

  16. Maxent Modeling. p(ATTACK | doc) = (1/Z) exp(weight_1 * applies_1(fatally shot, ATTACK) + weight_2 * applies_2(seriously wounded, ATTACK) + weight_3 * applies_3(Shining Path, ATTACK) + …), where doc is the news passage from slide 1. Q: How do we define Z?

  17. Normalization for Classification. Z = Σ_{label x} exp(weight_1 * occurs_1(fatally shot, x) + weight_2 * occurs_2(seriously wounded, x) + weight_3 * occurs_3(Shining Path, x) + …), where the sum runs over every candidate label x. p(x | y) ∝ exp(θ ⋅ f(x, y)): classify doc y with label x in one go.
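
The normalization for classification can be sketched as a sum over candidate labels. In this sketch the weights, the POLITICS label, and the helper names `occurs` and `maxent_posterior` are all hypothetical; real weights would be learned.

```python
import math

def occurs(phrase, label_type):
    """Binary feature: phrase appears in the doc and the label matches."""
    return lambda doc, label: 1 if (phrase in doc and label == label_type) else 0

def maxent_posterior(doc, labels, weights, feature_fns):
    """Return p(label | doc) = exp(theta . f(label, doc)) / Z for every label."""
    unnormalized = {
        label: math.exp(sum(w * f(doc, label) for w, f in zip(weights, feature_fns)))
        for label in labels
    }
    Z = sum(unnormalized.values())  # normalizer: sum over all candidate labels
    return {label: u / Z for label, u in unnormalized.items()}

doc = (
    "Three people have been fatally shot, and five people, including a mayor, "
    "were seriously wounded as a result of a Shining Path attack today."
)
feature_fns = [
    occurs("fatally shot", "ATTACK"),
    occurs("seriously wounded", "ATTACK"),
    occurs("election", "POLITICS"),
]
weights = [1.5, 1.0, 2.0]  # made-up values for illustration

posterior = maxent_posterior(doc, ["ATTACK", "POLITICS"], weights, feature_fns)
```

Here Z only needs one term per label, which is why classification is the easy case for normalization.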

  18. Normalization for Language Model: general class-based (X) language model of doc y.

  19. Normalization for Language Model: general class-based (X) language model of doc y. Can be significantly harder in the general case.

  20. Normalization for Language Model: general class-based (X) language model of doc y. Can be significantly harder in the general case. Simplifying assumption: maxent n-grams!

  21. Understanding Conditioning Is this a good language model?

  23. Understanding Conditioning Is this a good language model? (no)

  24. Understanding Conditioning Is this a good posterior classifier? (no)

  25. https://www.csee.umbc.edu/courses/undergraduate/473/f18/loglin-tutorial/ https://goo.gl/BQCdH9 Lesson 11

  26. Outline. (1) Recap: classification (MAP vs. noisy channel) & evaluation. (2) Naïve Bayes (NB) classification: Terminology: bag-of-words; “Naïve” assumption; Training & performance; NB as a language model. (3) Maximum Entropy classifiers: Defining the model; Defining the objective; Learning: Optimizing the objective; Math: gradient derivation. (4) Neural (language) models.

  27. p_θ(x | y): probabilistic model. Objective (given observations).

  28. Objective = Full Likelihood? Differentiating this product could be a pain. These values can have very small magnitude ➔ underflow.

  29. Logarithms. (0, 1] ➔ (-∞, 0]. Products ➔ Sums: log(ab) = log(a) + log(b); log(a/b) = log(a) - log(b). Inverse of exp: log(exp(x)) = x.
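
The underflow problem and the log-space fix can be seen in a few lines. This is a minimal sketch; the 100 probabilities of 1e-5 are made-up values chosen so that the true product (1e-500) is smaller than any double-precision float.

```python
import math

# Hypothetical per-observation probabilities (illustrative values only).
probs = [1e-5] * 100

# Multiplying directly: the true value 1e-500 is not representable,
# so the running product underflows to exactly 0.0.
product = 1.0
for p in probs:
    product *= p

# Summing logs instead: products become sums and the result stays finite.
log_sum = sum(math.log(p) for p in probs)
```

The log-space value is a large negative number rather than zero, so different parameter settings can still be compared.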

  30. Log-Likelihood. Wide range of (negative) numbers. Sums are more stable. Products ➔ Sums: log(ab) = log(a) + log(b); log(a/b) = log(a) - log(b). Differentiating this becomes nicer (even though Z depends on θ).

  31. Log-Likelihood. Wide range of (negative) numbers. Sums are more stable. Inverse of exp: log(exp(x)) = x. Differentiating this becomes nicer (even though Z depends on θ).

  32. Log-Likelihood. Wide range of (negative) numbers. Sums are more stable. Log-likelihood = F(θ).
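
Applying the two log rules to the maxent model p_θ(x | y) ∝ exp(θ ⋅ f(x, y)) gives one plausible written-out form of F(θ). The index i, the count N of observed (label, doc) pairs, and the notation Z(y_i) for the per-document normalizer are introduced here for illustration and do not appear in the slide text.

```latex
F(\theta)
  = \log \prod_{i=1}^{N} p_\theta(x_i \mid y_i)
  = \sum_{i=1}^{N} \log \frac{\exp\big(\theta \cdot f(x_i, y_i)\big)}{Z(y_i)}
  = \sum_{i=1}^{N} \Big[ \theta \cdot f(x_i, y_i) - \log Z(y_i) \Big],
\qquad
Z(y_i) = \sum_{x'} \exp\big(\theta \cdot f(x', y_i)\big)
```

The product becomes a sum, the division becomes a subtraction, and log cancels exp in the first term; the only remaining dependence on θ that needs care is inside log Z.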

  34. How will we optimize F(θ)? Calculus.

  35. [Plot: F(θ) against θ]

  36. [Plot: F(θ) against θ, with θ* marked]

  37. [Plot: F(θ) against θ, with θ* marked, and F’(θ), the derivative of F wrt θ]

  38. Example. F(x) = -(x - 2)^2. Differentiate: F’(x) = -2x + 4. Solve F’(x) = 0: x = 2.
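
The intermediate steps of this example, written out. The second-derivative check at the end is an addition not on the slide; it confirms that the root is a maximum.

```latex
\begin{aligned}
F(x)   &= -(x-2)^2 = -x^2 + 4x - 4 \\
F'(x)  &= -2(x-2) = -2x + 4 \\
F'(x)  &= 0 \;\Longrightarrow\; x = 2 \\
F''(x) &= -2 < 0 \quad \text{so } x = 2 \text{ is a maximum, with } F(2) = 0
\end{aligned}
```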

  39. Common Derivative Rules

  40. What if you can’t find the roots? Follow the derivative. [Plot: F(θ) against θ, with θ* marked, and F’(θ), the derivative of F wrt θ]

  41. What if you can’t find the roots? Follow the derivative. Set t = 0. Pick a starting value θ_t. Until converged: 1. Get value y_t = F(θ_t). [Plot: θ_0 and y_0 marked]

  42. What if you can’t find the roots? Follow the derivative. Set t = 0. Pick a starting value θ_t. Until converged: 1. Get value y_t = F(θ_t). 2. Get derivative g_t = F’(θ_t). [Plot: θ_0, y_0 and g_0 marked]

  43. What if you can’t find the roots? Follow the derivative. Set t = 0. Pick a starting value θ_t. Until converged: 1. Get value y_t = F(θ_t). 2. Get derivative g_t = F’(θ_t). 3. Get scaling factor ρ_t. 4. Set θ_{t+1} = θ_t + ρ_t * g_t. 5. Set t += 1. [Plot: θ_0, θ_1, y_0 and g_0 marked]

  44. What if you can’t find the roots? Follow the derivative (steps 1 to 5 as on slide 43). [Plot: θ_0, θ_1, θ_2, y_0, y_1, g_0 and g_1 marked]

  45. What if you can’t find the roots? Follow the derivative (steps 1 to 5 as on slide 43). [Plot: θ_0 through θ_3, y_0 through y_3, and g_0 through g_2 marked]

  46. What if you can’t find the roots? Follow the derivative (steps 1 to 5 as on slide 43, with "Until converged" highlighted). [Plot: θ_0 through θ_3, y_0 through y_3, and g_0 through g_2 marked]
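
The five steps can be sketched directly in code, using the earlier example F(x) = -(x - 2)^2 as the objective. The fixed scaling factor `rho`, the tolerance `tol`, and the step cap `max_steps` are assumptions of this sketch; the slides leave the choice of ρ_t and the convergence test open.

```python
def F(theta):
    return -(theta - 2) ** 2

def F_prime(theta):
    return -2 * theta + 4

def gradient_ascent(theta, rho=0.1, tol=1e-8, max_steps=10_000):
    """Follow the derivative uphill until it is (nearly) zero."""
    t = 0                               # Set t = 0
    while t < max_steps:                # Until converged:
        y = F(theta)                    # 1. value (useful for monitoring progress)
        g = F_prime(theta)              # 2. derivative
        if abs(g) < tol:                #    converged: derivative is close to 0
            break
        theta = theta + rho * g         # 3-4. fixed scaling factor, step uphill
        t += 1                          # 5. next iteration
    return theta, t

theta_star, steps = gradient_ascent(theta=-3.0)
```

Starting from -3.0, the iterates approach 2, the same root found analytically. A scaling factor that is too large would overshoot, and one that is too small would converge slowly.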

  47. Gradient = Multi-variable derivative. K-dimensional input, K-dimensional output.

  48. Gradient Ascent
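
With K parameters, the same loop applies with the gradient in place of the derivative: a K-dimensional θ goes in and a K-dimensional gradient comes out. The sketch below uses a made-up concave objective with a known maximum at `TARGET`; the objective, the fixed `rho`, and the fixed step count are all assumptions for illustration.

```python
# Hypothetical objective: F(theta) = -sum_k (theta_k - TARGET_k)^2, maximized at TARGET.
TARGET = (1.0, -2.0, 3.0)

def F(theta):
    return -sum((t - c) ** 2 for t, c in zip(theta, TARGET))

def gradient(theta):
    """K-dimensional input, K-dimensional output: one partial derivative per coordinate."""
    return [-2 * (t - c) for t, c in zip(theta, TARGET)]

def gradient_ascent(theta, rho=0.1, steps=200):
    for _ in range(steps):
        g = gradient(theta)
        # Move every coordinate uphill by its own partial derivative.
        theta = [t + rho * g_k for t, g_k in zip(theta, g)]
    return theta

theta_star = gradient_ascent([0.0, 0.0, 0.0])
```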

  55. Expectations. Number of pieces of candy: 1, 2, 3, 4, 5, 6. 1/6 * 1 + 1/6 * 2 + 1/6 * 3 + 1/6 * 4 + 1/6 * 5 + 1/6 * 6 = 3.5

  56. Expectations. Number of pieces of candy: 1, 2, 3, 4, 5, 6. 1/2 * 1 + 1/10 * 2 + 1/10 * 3 + 1/10 * 4 + 1/10 * 5 + 1/10 * 6 = 2.5
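
Both expectations are probability-weighted sums and can be checked with exact fractions. The helper name `expectation` is introduced here for illustration.

```python
from fractions import Fraction

def expectation(values, probs):
    """Expected value: sum of value * probability over all outcomes."""
    assert sum(probs) == 1, "probabilities must sum to 1"
    return sum(p * v for v, p in zip(values, probs))

candy = [1, 2, 3, 4, 5, 6]

uniform = [Fraction(1, 6)] * 6                     # every count equally likely
skewed = [Fraction(1, 2)] + [Fraction(1, 10)] * 5  # one piece is most likely

e_uniform = expectation(candy, uniform)  # 7/2, i.e. 3.5
e_skewed = expectation(candy, skewed)    # 5/2, i.e. 2.5
```

Shifting probability mass toward the smallest outcome pulls the expectation down from 3.5 to 2.5, even though the set of possible values is unchanged.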
