Maxent Models, Conditional Estimation, and Optimization, Without Magic (That is, With Math!)
Dan Klein and Chris Manning, Stanford University, http://nlp.stanford.edu/
HLT-NAACL 2003 and ACL 2003 Tutorial


slide-1
SLIDE 1

Maxent Models, Conditional Estimation, and Optimization

Without Magic

That is, With Math!

Dan Klein and Chris Manning Stanford University http://nlp.stanford.edu/

HLT-NAACL 2003 and ACL 2003 Tutorial

slide-2
SLIDE 2

Introduction

  • In recent years there has been extensive use of conditional or discriminative probabilistic models in NLP, IR, and Speech

  • Because:

They give high accuracy performance

They make it easy to incorporate lots of linguistically important features

They allow automatic building of language independent, retargetable NLP modules

slide-3
SLIDE 3

Joint vs. Conditional models

  • Joint (generative) models place probabilities over both observed data and the hidden stuff (generate the observed data from hidden stuff): P(c,d)

All the best known StatNLP models:

n-gram models, Naïve Bayes classifiers, hidden Markov models, probabilistic context-free grammars

  • Discriminative (conditional) models take the data as given, and put a probability over hidden structure given the data: P(c|d)

Logistic regression, conditional loglinear models, maximum entropy Markov models, (SVMs, perceptrons)

slide-4
SLIDE 4

Bayes Net/Graphical models

  • Bayes net diagrams draw circles for random variables, and lines for direct dependencies
  • Some variables are observed; some are hidden
  • Each node is a little classifier (conditional probability table) based on incoming arcs

[Diagrams: HMM (classes c1, c2, c3 over data d1, d2, d3) and Naïve Bayes (class c over d1, d2, d3), both generative; Logistic Regression (class c and d1, d2, d3), discriminative]

slide-5
SLIDE 5

Conditional models work well: Word Sense Disambiguation

  • Even with exactly the same features, changing from joint to conditional estimation increases performance
  • That is, we use the same smoothing, and the same word-class features, we just change the numbers (parameters)

Training Set
Objective     Accuracy
Joint Like.   86.8
Cond. Like.   98.5

Test Set
Objective     Accuracy
Joint Like.   73.6
Cond. Like.   76.1

(Klein and Manning 2002, using Senseval-1 Data)

slide-6
SLIDE 6

Overview: HLT Systems

  • Typical Speech/NLP problems involve complex structures (sequences, pipelines, trees, feature structures, signals)
  • Models are decomposed into individual local decision making locations
  • Combining them together is the global inference problem

[Diagram: Sequence Data and Sequence Model; combine little models together via inference]

slide-7
SLIDE 7

Overview: The local level

[Diagram labels. Sequence Level: Sequence Data, Sequence Model, Inference, NLP Issues. Local Level: Local Data, Feature Extraction, Features, Label, Classifier Type, Maximum Entropy Models, Optimization, Conjugate Gradient, Smoothing, Quadratic Penalties.]

slide-8
SLIDE 8

Tutorial Plan

  • 1. Exponential/Maximum entropy models
  • 2. Optimization methods
  • 3. Linguistic issues in using these models
slide-9
SLIDE 9

Part I: Maximum Entropy Models

  • a. Examples of Feature-Based Modeling
  • b. Exponential Models for Classification
  • c. Maximum Entropy Models
  • d. Smoothing

We will use the term “maxent” models, but will introduce them as loglinear or exponential models, deferring the interpretation as “maximum entropy models” until later.

slide-10
SLIDE 10

Features

  • In this tutorial and most maxent work: features are elementary pieces of evidence that link aspects of what we observe, d, with a category c that we want to predict.
  • A feature has a real value: f: C × D → R
  • Usually features are indicator functions of properties of the input and a particular class (every one we present is). They pick out a subset.

fi(c, d) ≡ [Φ(d) ∧ c = ci]    [Value is 0 or 1]

  • We will freely say that Φ(d) is a feature of the data d, when, for each ci, the conjunction Φ(d) ∧ c = ci is a feature of the data-class pair (c, d).

slide-11
SLIDE 11

Features

  • For example:

f1(c, d) ≡ [c = “NN” ∧ islower(w0) ∧ ends(w0, “d”)]

f2(c, d) ≡ [c = “NN” ∧ w-1 = “to” ∧ t-1 = “TO”]

f3(c, d) ≡ [c = “VB” ∧ islower(w0)]

Example contexts (tags over words): TO NN “to aid”; IN JJ “in blue”; TO VB “to aid”; IN NN “in bed”

  • Models will assign each feature a weight
  • Empirical count (expectation) of a feature:

\[ E_{\text{empirical}}(f_i) = \sum_{(c,d) \in \text{observed}(C,D)} f_i(c,d) \]

  • Model expectation of a feature:

\[ E(f_i) = \sum_{(c,d) \in (C,D)} P(c,d)\, f_i(c,d) \]
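
The three indicator features and the two expectations above can be sketched in Python. This is an illustrative sketch only: the tuple layout of a datum (previous word, previous tag, current word), the four-item `observed` list, and the function names are assumptions made for the example, not part of the tutorial.

```python
# Hypothetical layout: a datum d is (previous word, previous tag, current word).
def f1(c, d):
    prev_w, prev_t, w0 = d
    return int(c == "NN" and w0.islower() and w0.endswith("d"))

def f2(c, d):
    prev_w, prev_t, w0 = d
    return int(c == "NN" and prev_w == "to" and prev_t == "TO")

def f3(c, d):
    prev_w, prev_t, w0 = d
    return int(c == "VB" and w0.islower())

def empirical_count(f, observed):
    """Sum of f(c, d) over the observed (class, datum) pairs."""
    return sum(f(c, d) for c, d in observed)

def model_expectation(f, joint):
    """Sum of P(c, d) * f(c, d) over a dict {(c, d): probability}."""
    return sum(p * f(c, d) for (c, d), p in joint.items())

# Toy observations built from the example contexts on this slide.
observed = [
    ("NN", ("to", "TO", "aid")),
    ("VB", ("to", "TO", "aid")),
    ("NN", ("in", "IN", "bed")),
    ("JJ", ("in", "IN", "blue")),
]
counts = [empirical_count(f, observed) for f in (f1, f2, f3)]
```

Each feature fires only when both its class condition and its data condition hold, which is why the same datum contributes to different features under different labels.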

slide-12
SLIDE 12

Feature-Based Models

  • The decision about a data point is based only on the features active at that point.

Text Categorization
  Data: BUSINESS: Stocks hit a yearly low …
  Features: {…, stocks, hit, a, yearly, low, …}
  Label: BUSINESS

Word-Sense Disambiguation
  Data: … to restructure bank:MONEY debt.
  Features: {…, P=restructure, N=debt, L=12, …}
  Label: MONEY

POS Tagging
  Data: DT JJ NN … / The previous fall …
  Features: {W=fall, PT=JJ, PW=previous}
  Label: NN

slide-13
SLIDE 13

Example: Text Categorization

(Zhang and Oles 2001)

  • Features are a word in document and class (they do feature selection to use reliable indicators)
  • Tests on classic Reuters data set (and others)

Naïve Bayes: 77.0% F1

Linear regression: 86.0%

Logistic regression: 86.4%

Support vector machine: 86.5%

  • Emphasizes the importance of regularization (smoothing) for successful use of discriminative methods (not used in most early NLP/IR work)

slide-14
SLIDE 14

Example: NER

(Klein et al. 2003; also, Borthwick 1999, etc.)

  • Sequence model across words
  • Each word classified by local model
  • Features include the word, previous and next words, previous classes, previous, next, and current POS tag, character n-gram features and … of words.
  • Best model had > 800K features
  • High (> 92% on English devtest set) performance comes from combining many informative features.
  • With smoothing / regularization, more features never hurt!

Decision Point: State for Grace

Local Context
        Prev    Cur     Next
Class   Other   ???     ???
Word    at      Grace   Road
Tag     IN      NNP     NNP
Sig     x       Xx      Xx

slide-15
SLIDE 15

Example: NER

(Klein et al. 2003)

Decision Point: State for Grace

Local Context
        Prev    Cur     Next
Class   Other   ???     ???
Word    at      Grace   Road
Tag     IN      NNP     NNP
Sig     x       Xx      Xx

Feature Weights
Feature Type            Feature    PERS    LOC
Previous word           at         -0.73   0.94
Current word            Grace      0.03    0.00
Beginning bigram        <G         0.45    -0.04
Current POS tag         NNP        0.47    0.45
Prev and cur tags       IN NNP     -0.10   0.14
Previous state          Other      -0.70   -0.92
Current signature       Xx         0.80    0.46
Prev state, cur sig     O-Xx       0.68    0.37
Prev-cur-next sig       x-Xx-Xx    -0.69   0.37
P. state - p-cur sig    O-x-Xx     -0.20   0.82
…
Total:                             -0.58   2.68

slide-16
SLIDE 16

Example: Tagging

  • Features can include:

Current, previous, next words in isolation or together.

Previous (or next) one, two, three tags.

Word-internal features: word types, suffixes, dashes, etc.

Local Context (Decision Point at position 0)
Position  -3    -2    -1    0      +1
Tag       DT    NNP   VBD   ???    ???
Word      The   Dow   fell  22.6   %

Features
W0         22.6
W+1        %
W-1        fell
T-1        VBD
T-1-T-2    NNP-VBD
hasDigit?  true
…

(Ratnaparkhi 1996; Toutanova et al. 2003, etc.)

slide-17
SLIDE 17

Other Maxent Examples

  • Sentence boundary detection (Mikheev 2000)

Is period end of sentence or abbreviation?

  • PP attachment (Ratnaparkhi 1998)

Features of head noun, preposition, etc.

  • Language models (Rosenfeld 1996)

P(w0 | w-n, …, w-1). Features are word n-gram features, and trigger features which model repetitions of the same word.

  • Parsing (Ratnaparkhi 1997; Johnson et al. 1999, etc.)

Either: Local classifications decide parser actions or feature counts choose a parse.

slide-18
SLIDE 18

The likelihood of data: CL vs. JL

  • We have some data {(d, c)} and we want to place probability distributions over it.
  • A joint model gives probabilities P(d,c) and tries to maximize this likelihood.

It turns out to be trivial to choose weights: just relative frequencies.

  • A conditional model gives probabilities P(c|d). It takes the data as given and models only the conditional probability of the class.

We seek to maximize conditional likelihood.

Harder to do (as we’ll see…)

More closely related to classification error.

slide-19
SLIDE 19

Feature-Based Classifiers

  • “Linear” classifiers:

Classify from feature sets {fi} to classes {c}.

Assign a weight λi to each feature fi.

For a pair (c,d), features vote with their weights:

vote(c) = Σλifi(c,d)

Example: for “to aid”, the candidate TO NN collects the weights 1.2 and –1.8, and the candidate TO VB collects 0.3.

Choose the class c which maximizes Σλifi(c,d) = VB

  • There are many ways to choose weights

Perceptron: find a currently misclassified example, and nudge weights in the direction of a correct classification
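
The voting rule and the perceptron nudge can be sketched as follows. This is a minimal sketch, not the tutorial's implementation: the dictionary representation (class mapped to the names of its active features), the unit step size, and all function names are assumptions for illustration; the weights 1.2, –1.8 and 0.3 are the ones from the “to aid” example.

```python
def vote(weights, active):
    """Sum the weights of the features active for one (class, datum) pair."""
    return sum(weights[name] for name in active)

def classify(weights, candidates):
    """candidates maps each class to the names of its active features."""
    return max(candidates, key=lambda c: vote(weights, candidates[c]))

def perceptron_update(weights, candidates, gold, step=1.0):
    """If the current guess is wrong, nudge weights toward the gold class."""
    guess = classify(weights, candidates)
    if guess != gold:
        for name in candidates[gold]:
            weights[name] += step      # reward features of the correct class
        for name in candidates[guess]:
            weights[name] -= step      # penalize features of the wrong guess
    return weights

weights = {"f1": 1.2, "f2": -1.8, "f3": 0.3}
candidates = {"NN": ["f1", "f2"], "VB": ["f3"]}
votes = {c: vote(weights, feats) for c, feats in candidates.items()}
best = classify(weights, candidates)
```

With these weights NN receives 1.2 – 1.8 and VB receives 0.3, so VB wins; a single update toward NN (on a copy of the weights) is enough to flip the decision in this toy case.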
slide-20
SLIDE 20

Feature-Based Classifiers

  • Exponential (log-linear, maxent, logistic, Gibbs) models:

Use the linear combination Σλifi(c,d) to produce a probabilistic model:

\[ P(c \mid d, \lambda) = \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)} \]

The exponential in the numerator makes votes positive; the denominator normalizes votes.

P(NN|to, aid, TO) = e^1.2 e^–1.8 / (e^1.2 e^–1.8 + e^0.3) = 0.29

P(VB|to, aid, TO) = e^0.3 / (e^1.2 e^–1.8 + e^0.3) = 0.71

  • The weights are the parameters of the probability model, combined via a “soft max” function
  • Given this model form, we will choose parameters {λi} that maximize the conditional likelihood of the data according to this model.
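
The “soft max” step can be checked numerically. The sketch below assumes the votes for the “to aid” example (1.2 – 1.8 for NN, 0.3 for VB); the function name `softmax_probs` is illustrative.

```python
import math

def softmax_probs(votes):
    """Turn a dict of class votes into probabilities: exponentiate, then normalize."""
    exps = {c: math.exp(v) for c, v in votes.items()}   # makes votes positive
    z = sum(exps.values())                              # normalizer
    return {c: e / z for c, e in exps.items()}

probs = softmax_probs({"NN": 1.2 - 1.8, "VB": 0.3})
```

Rounded to two places, the result reproduces the 0.29 and 0.71 shown on the slide.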
slide-21
SLIDE 21

Other Feature-Based Classifiers

  • The exponential model approach is one way of deciding how to weight features, given data.
  • It constructs not only classifications, but probability distributions over classifications.
  • There are other (good!) ways of discriminating classes: SVMs, boosting, even perceptrons – though these methods are not as trivial to interpret as distributions over classes.
  • We’ll see later what maximizing the conditional likelihood according to the exponential model has to do with entropy.

slide-22
SLIDE 22

Exponential Model Likelihood

  • Maximum Likelihood (Conditional) Models:

Given a model form, choose values of parameters to maximize the (conditional) likelihood of the data.

  • Exponential model form, for a data set (C,D):

\[ \log P(C \mid D, \lambda) = \log \prod_{(c,d) \in (C,D)} P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)} \]
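
A direct transcription of this log-likelihood into Python might look like the sketch below. The data representation is an assumption made for illustration: each training item is a pair (candidates, gold), where `candidates` maps every class to the names of the features active for that class on the datum.

```python
import math

def log_prob(weights, candidates, gold):
    """log P(gold | d, λ) for one datum."""
    scores = {c: sum(weights[n] for n in feats) for c, feats in candidates.items()}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[gold] - log_z

def conditional_log_likelihood(weights, data):
    """Sum of log P(c | d, λ) over all (candidates, gold) pairs."""
    return sum(log_prob(weights, cands, gold) for cands, gold in data)

weights = {"f1": 1.2, "f2": -1.8, "f3": 0.3}
candidates = {"NN": ["f1", "f2"], "VB": ["f3"]}
data = [(candidates, "VB"), (candidates, "NN")]
ll = conditional_log_likelihood(weights, data)
```

For very large votes a log-sum-exp formulation would be numerically safer; the plain form is kept here to mirror the equation.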

slide-23
SLIDE 23

Building a Maxent Model

  • Define features (indicator functions) over data points.

Features represent sets of data points which are distinctive enough to deserve model parameters.

Usually features are added incrementally to “target” errors.

  • For any given feature weights, we want to be able to calculate:

Data (conditional) likelihood

Derivative of the likelihood wrt each feature weight

Use expectations of each feature according to the model

  • Find the optimum feature weights (next part).

slide-24
SLIDE 24

The Likelihood Value

  • The (log) conditional likelihood is a function of the data (C,D) and the parameters λ:

\[ \log P(C \mid D, \lambda) = \log \prod_{(c,d) \in (C,D)} P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda) \]

  • If there aren’t many values of c, it’s easy to calculate:

\[ \log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)} \]

  • We can separate this into two components:

\[ \log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c,d) \; - \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c',d) \]

\[ \log P(C \mid D, \lambda) = N(\lambda) - M(\lambda) \]

  • The derivative is the difference between the derivatives of each component

slide-25
SLIDE 25

The Derivative I: Numerator

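\[ \frac{\partial N(\lambda)}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \log \exp \sum_j \lambda_j f_j(c,d) \]

\[ = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \sum_j \lambda_j f_j(c,d) \]

\[ = \sum_{(c,d) \in (C,D)} \frac{\partial \sum_j \lambda_j f_j(c,d)}{\partial \lambda_i} \]

\[ = \sum_{(c,d) \in (C,D)} f_i(c,d) \]

(The inner sum is indexed by j to keep it distinct from the weight λi being differentiated.)

Derivative of the numerator is: the empirical count(fi, C)
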
slide-26
SLIDE 26

The Derivative II: Denominator

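\[ \frac{\partial M(\lambda)}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_j \lambda_j f_j(c',d) \]

\[ = \sum_{(c,d) \in (C,D)} \frac{1}{\sum_{c''} \exp \sum_j \lambda_j f_j(c'',d)} \; \frac{\partial}{\partial \lambda_i} \sum_{c'} \exp \sum_j \lambda_j f_j(c',d) \]

\[ = \sum_{(c,d) \in (C,D)} \frac{1}{\sum_{c''} \exp \sum_j \lambda_j f_j(c'',d)} \sum_{c'} \exp\Big(\sum_j \lambda_j f_j(c',d)\Big) \frac{\partial \sum_j \lambda_j f_j(c',d)}{\partial \lambda_i} \]

\[ = \sum_{(c,d) \in (C,D)} \sum_{c'} \frac{\exp \sum_j \lambda_j f_j(c',d)}{\sum_{c''} \exp \sum_j \lambda_j f_j(c'',d)} \, f_i(c',d) \]

\[ = \sum_{(c,d) \in (C,D)} \sum_{c'} P(c' \mid d, \lambda) \, f_i(c',d) \]

(The inner sum is indexed by j to keep it distinct from the weight λi being differentiated.)

= predicted count(fi, λ)
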

slide-27
SLIDE 27

The Derivative III

\[ \frac{\partial \log P(C \mid D, \lambda)}{\partial \lambda_i} = \text{actual count}(f_i, C) - \text{predicted count}(f_i, \lambda) \]

  • The optimum parameters are the ones for which each feature’s predicted expectation equals its empirical expectation. The optimum distribution is:

Always unique (but parameters may not be unique)

Always exists (if feature counts are from actual data).

  • Features can have high model expectations (predicted counts) either because they have large weights or because they occur with other features which have large weights.
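
The “actual minus predicted” form of the gradient can be sketched in a few lines. As before, the data layout (a list of (candidates, gold) pairs, with `candidates` mapping each class to its active feature names) is an assumption for illustration and not something the tutorial prescribes.

```python
import math

def gradient(weights, data):
    """d/dλ_i of the conditional log-likelihood: actual minus predicted counts."""
    grad = {name: 0.0 for name in weights}
    for candidates, gold in data:
        scores = {c: sum(weights[n] for n in feats) for c, feats in candidates.items()}
        z = sum(math.exp(s) for s in scores.values())
        for name in candidates[gold]:            # actual count
            grad[name] += 1.0
        for c, feats in candidates.items():      # predicted count
            p = math.exp(scores[c]) / z
            for name in feats:
                grad[name] -= p
    return grad

candidates = {"NN": ["f1", "f2"], "VB": ["f3"]}
zero = {"f1": 0.0, "f2": 0.0, "f3": 0.0}
grad = gradient(zero, [(candidates, "VB")])
```

At zero weights both classes get probability 1/2, so the feature of the observed class has gradient 1 – 1/2 and the features of the other class have gradient 0 – 1/2.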

slide-28
SLIDE 28

Summary so far

  • We have a function to optimize:

\[ \log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)} \]

  • We know the function’s derivatives:

\[ \frac{\partial \log P(C \mid D, \lambda)}{\partial \lambda_i} = \text{actual count}(f_i, C) - \text{predicted count}(f_i, \lambda) \]

  • Perfect situation for general optimization (Part II)
  • But first … what has all this got to do with maximum entropy models?

slide-29
SLIDE 29

Maximum Entropy Models

  • An equivalent approach:

Lots of distributions out there, most of them very spiked, specific, overfit.

We want a distribution which is uniform except in specific ways we require.

Uniformity means high entropy – we can search for distributions which have properties we desire, but also have high entropy.

slide-30
SLIDE 30

(Maximum) Entropy

  • Entropy: the uncertainty of a distribution.
  • Quantifying uncertainty (“surprise”):

Event: x

Probability: px

“Surprise”: log(1/px)

  • Entropy: expected surprise (over p):

\[ H(p) = E_p\left[\log \frac{1}{p_x}\right] = -\sum_x p_x \log p_x \]

A coin-flip is most uncertain for a fair coin.

[Plot: H against pHEADS]
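
The entropy formula is easy to evaluate for a coin. The sketch below uses natural logarithms and treats 0 log 0 as 0; both choices are conventions assumed for the example.

```python
import math

def entropy(p):
    """H(p) = -sum_x p_x log p_x, with 0 log 0 taken as 0 (natural log)."""
    return -sum(px * math.log(px) for px in p if px > 0)

fair = entropy([0.5, 0.5])
biased = entropy([0.3, 0.7])
certain = entropy([1.0, 0.0])
```

The fair coin attains the maximum, log 2; any bias lowers the entropy, and a deterministic coin has none.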

slide-31
SLIDE 31

Maxent Examples I

  • What do we want from a distribution?

Minimize commitment = maximize entropy.

Resemble some reference distribution (data).

  • Solution: maximize entropy H, subject to feature-based constraints:

\[ E_p[f_i] = E_{\hat{p}}[f_i] \iff \sum_{x \in f_i} p_x = C_i \]

  • Adding constraints (features):

Lowers maximum entropy

Raises maximum likelihood of data

Brings the distribution further from uniform

Brings the distribution closer to data

[Plots: unconstrained, max at 0.5; with the constraint that pHEADS = 0.3]

slide-32
SLIDE 32

Maxent Examples II

[Plots: the entropy H(pH, pT); the constraint pH + pT = 1; the further constraint pH = 0.3. Inset: the curve –x log x, which peaks at x = 1/e]

slide-33
SLIDE 33

Maxent Examples III

  • Let’s say we have the following event space:

NN    NNS   NNP   NNPS  VBZ   VBD

  • … and the following empirical data:

3     5     11    13    3     1

  • Maximize H:

1/e   1/e   1/e   1/e   1/e   1/e

  • … want probabilities: E[NN,NNS,NNP,NNPS,VBZ,VBD] = 1

1/6   1/6   1/6   1/6   1/6   1/6

slide-34
SLIDE 34

Maxent Examples IV

  • Too uniform!
  • N* are more common than V*, so we add the feature fN = {NN, NNS, NNP, NNPS}, with E[fN] = 32/36

NN    NNS   NNP   NNPS  VBZ   VBD
8/36  8/36  8/36  8/36  2/36  2/36

  • … and proper nouns are more frequent than common nouns, so we add fP = {NNP, NNPS}, with E[fP] = 24/36

NN    NNS   NNP   NNPS  VBZ   VBD
4/36  4/36  12/36 12/36 2/36  2/36

  • … we could keep refining the models, e.g. by adding a feature to distinguish singular vs. plural nouns, or verb types.
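
The constraints in this running example can be checked with exact fractions. The sketch below takes the empirical counts 3, 5, 11, 13, 3, 1 and the two constrained distributions from the slides; the helper names are illustrative. It confirms that each distribution meets its constraints and that entropy drops as constraints are added.

```python
from fractions import Fraction as F
import math

tags = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
counts = dict(zip(tags, [3, 5, 11, 13, 3, 1]))
total = sum(counts.values())

def expectation(feature, dist):
    """Probability mass a distribution puts on the tags in a feature's set."""
    return sum(dist[t] for t in feature)

def entropy(dist):
    return -sum(float(p) * math.log(float(p)) for p in dist.values() if p > 0)

f_N = {"NN", "NNS", "NNP", "NNPS"}
f_P = {"NNP", "NNPS"}
empirical = {t: F(c, total) for t, c in counts.items()}
uniform = {t: F(1, 6) for t in tags}
with_fN = dict(zip(tags, [F(8, 36)] * 4 + [F(2, 36)] * 2))
with_fN_fP = dict(zip(tags, [F(4, 36)] * 2 + [F(12, 36)] * 2 + [F(2, 36)] * 2))
```

Within each group of events that the features cannot tell apart, the maxent solution spreads the mass uniformly, which is why the tables contain only repeated values.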
slide-35
SLIDE 35

Feature Overlap

  • Maxent models handle overlapping features well.
  • Unlike a NB model, there is no double counting!

Empirical       A     a
           B    2     1
           b    2     1

All = 1         A     a
           B    1/4   1/4
           b    1/4   1/4

A = 2/3         A     a
           B    1/3   1/6
           b    1/3   1/6

A = 2/3         A     a
           B    1/3   1/6
           b    1/3   1/6

Weights: with one feature on A, the A column carries λA; with the same feature added twice, the A column carries λ′A + λ″A, and the distribution is unchanged.

slide-36
SLIDE 36

Example: NER Overlap

[Table: local context and feature weights for the decision point “State for Grace”, as on the earlier NER feature-weights slide]

  • Grace is correlated with PERSON, but does not add much evidence on top of already knowing prefix features.

slide-37
SLIDE 37

Feature Interaction

  • Maxent models handle overlapping features well, but do not automatically model feature interactions.

Empirical       A     a
           B    1     1
           b    1     0

All = 1         A     a
           B    1/4   1/4
           b    1/4   1/4

A = 2/3         A     a
           B    1/3   1/6
           b    1/3   1/6

B = 2/3         A     a
           B    4/9   2/9
           b    2/9   1/9

Weights: 0 everywhere under “All = 1”; λA on the A column under “A = 2/3”; under “B = 2/3”, λA + λB for (A,B), λB for (a,B), λA for (A,b).

slide-38
SLIDE 38

Feature Interaction

  • If you want interaction terms, you have to add them:

Empirical       A     a
           B    1     1
           b    1     0

A = 2/3         A     a
           B    1/3   1/6
           b    1/3   1/6

B = 2/3         A     a
           B    4/9   2/9
           b    2/9   1/9

AB = 1/3        A     a
           B    1/3   1/3
           b    1/3   0

  • A disjunctive feature would also have done it (alone):

                A     a
           B    1/3   1/3
           b    1/3   0
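
Why the two marginal features alone cannot reach the empirical table can be seen with exact arithmetic. The sketch relies on a standard fact that the slides illustrate but do not state in this form: with only marginal constraints on A and on B, the maximum entropy joint is the product of the marginals. The function name is illustrative.

```python
from fractions import Fraction as F

def maxent_from_marginals(p_A, p_B):
    """With only marginal constraints on A and B, the maxent joint factorizes."""
    return {
        ("A", "B"): p_A * p_B,
        ("a", "B"): (1 - p_A) * p_B,
        ("A", "b"): p_A * (1 - p_B),
        ("a", "b"): (1 - p_A) * (1 - p_B),
    }

empirical = {("A", "B"): F(1, 3), ("a", "B"): F(1, 3),
             ("A", "b"): F(1, 3), ("a", "b"): F(0)}
model = maxent_from_marginals(F(2, 3), F(2, 3))
```

The product model leaves mass 1/9 on the cell (a, b) that never occurs in the data; only an extra feature that sees A and B jointly can remove it.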
slide-39
SLIDE 39

Feature Interaction

  • For loglinear/logistic regression models in statistics, it is standard to do a greedy stepwise search over the space of all possible interaction terms.
  • This combinatorial space is exponential in size, but that’s okay as most statistics models only have 4–8 features.
  • In NLP, our models commonly use hundreds of thousands of features, so that’s not okay.
  • Commonly, interaction terms are added by hand based on linguistic intuitions.

slide-40
SLIDE 40

Example: NER Interaction

[Table: local context and feature weights for the decision point “State for Grace”, as on the earlier NER feature-weights slide]

  • Previous-state and current-signature have interactions, e.g. P=PERS-C=Xx indicates C=PERS much more strongly than C=Xx and P=PERS independently.
  • This feature type allows the model to capture this interaction.

slide-41
SLIDE 41

Classification

  • What do these models of P(x) have to do with conditional models P(c,d)?

Think of the pairs (c,d) as the x items.

  • Assume a feature for each data type.

\[ \forall d \in D: \quad P(d) = \hat{P}(d) \]

  • Then, all that can vary is the conditional P(c|d):

\[ P(c,d) = P(c \mid d)\, P(d) = P(c \mid d)\, \hat{P}(d) \]

  • Maximizing joint likelihood and conditional likelihood in this model are equivalent!

slide-42
SLIDE 42

Comparison to Naïve-Bayes

  • Naïve-Bayes is another tool for classification:

We have a bunch of random variables (data features) which we would like to use to predict another variable (the class):

[Diagram: class c with arcs to the features φ1, φ2, φ3]

The Naïve-Bayes likelihood over classes is:

\[ P(c \mid d, \lambda) = \frac{P(c) \prod_i P(\phi_i \mid c)}{\sum_{c'} P(c') \prod_i P(\phi_i \mid c')} \]

\[ = \frac{\exp\left[\log P(c) + \sum_i \log P(\phi_i \mid c)\right]}{\sum_{c'} \exp\left[\log P(c') + \sum_i \log P(\phi_i \mid c')\right]} \]

\[ = \frac{\exp\left[\sum_i \lambda_{ic} f_{ic}(d,c)\right]}{\sum_{c'} \exp\left[\sum_i \lambda_{ic'} f_{ic'}(d,c')\right]} \]

  • Naïve-Bayes is just an exponential model.

SLIDE 43

Comparison to Naïve-Bayes

  • The primary differences between Naïve-Bayes and maxent models are:

Naïve-Bayes: Trained to maximize joint likelihood of data and classes.
Maxent: Trained to maximize the conditional likelihood of classes.

Naïve-Bayes: Features assumed to supply independent evidence.
Maxent: Feature weights take feature dependence into account.

Naïve-Bayes: Feature weights can be set independently.
Maxent: Feature weights must be mutually estimated.

Naïve-Bayes: Features must be of the conjunctive Φ(d) ∧ c = ci form.
Maxent: Features need not be of the conjunctive form (but usually are).

slide-44
SLIDE 44

Example: Sensors

Reality:

Raining              Sunny
P(+,+,r) = 3/8       P(+,+,s) = 1/8
P(–,–,r) = 1/8       P(–,–,s) = 3/8

NB Model: Raining? with sensors M1, M2

NB FACTORS:

  • P(s) = 1/2
  • P(+|s) = 1/4
  • P(+|r) = 3/4

PREDICTIONS:

P(r,+,+) = (½)(¾)(¾)

P(s,+,+) = (½)(¼)(¼)

P(r|+,+) = 9/10

P(s|+,+) = 1/10
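
The gap between the Naïve-Bayes prediction and the true posterior can be computed exactly. The `reality` table below is the one implied by the NB factors on this slide (the two sensors always agree); the function and variable names are assumptions for illustration.

```python
from fractions import Fraction as F

# Reality implied by the slide: both sensors always give the same reading.
reality = {("+", "+", "r"): F(3, 8), ("-", "-", "r"): F(1, 8),
           ("+", "+", "s"): F(1, 8), ("-", "-", "s"): F(3, 8)}

prior = {"r": F(1, 2), "s": F(1, 2)}
p_plus = {"r": F(3, 4), "s": F(1, 4)}   # P(+ | class) for each sensor

def nb_posterior(n_sensors):
    """Naive Bayes posterior over classes when all n sensors read +."""
    joint = {c: prior[c] * p_plus[c] ** n_sensors for c in prior}
    z = sum(joint.values())
    return {c: v / z for c, v in joint.items()}

true_posterior = reality[("+", "+", "r")] / (
    reality[("+", "+", "r")] + reality[("+", "+", "s")])
nb = nb_posterior(2)
```

The true posterior of rain given two positive readings is 3/4, while Naïve Bayes reports 9/10 because it treats the second, redundant sensor as fresh evidence.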

slide-45
SLIDE 45

Example: Sensors

  • Problem: NB multi-counts the evidence.

\[ \frac{P(r \mid +, \ldots, +)}{P(s \mid +, \ldots, +)} = \frac{P(r)}{P(s)} \, \frac{P(+ \mid r)}{P(+ \mid s)} \cdots \frac{P(+ \mid r)}{P(+ \mid s)} \]

  • Maxent behavior:

Take a model over (M1,…Mn,R) with features:

fri: Mi=+, R=r    weight: λri

fsi: Mi=+, R=s    weight: λsi

exp(λri – λsi) is the factor analogous to P(+|r)/P(+|s)

… but instead of being 3, it will be 3^(1/n)

… because if it were 3, E[fri] would be far higher than the target of 3/8!

slide-46
SLIDE 46

Example: Stoplights

Reality:

Lights Working: P(g,r,w) = 3/7, P(r,g,w) = 3/7

Lights Broken: P(r,r,b) = 1/7

NB Model: Working? with lights NS, EW

NB FACTORS:

  • P(w) = 6/7
  • P(r|w) = 1/2
  • P(g|w) = 1/2
  • P(b) = 1/7
  • P(r|b) = 1
  • P(g|b) = 0
slide-47
SLIDE 47

Example: Stoplights

  • What does the model say when both lights are red?

P(b,r,r) = (1/7)(1)(1) = 1/7 = 4/28

P(w,r,r) = (6/7)(1/2)(1/2) = 6/28

P(w|r,r) = 6/10!

  • We’ll guess that (r,r) indicates lights are working!
  • Imagine if P(b) were boosted higher, to 1/2:

P(b,r,r) = (1/2)(1)(1) = 1/2 = 4/8

P(w,r,r) = (1/2)(1/2)(1/2) = 1/8

P(w|r,r) = 1/5!

  • Changing the parameters bought conditional accuracy at the expense of data likelihood!
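
The stoplight arithmetic can be reproduced with exact fractions. The sketch below hard-codes the NB factors from the slide (P(r|b) = 1, P(r|w) = 1/2) and varies only the prior P(b); the function name is illustrative.

```python
from fractions import Fraction as F

def nb_posterior_working(p_broken):
    """Naive Bayes posterior P(w | r, r) for a given prior P(b)."""
    p_working = 1 - p_broken
    joint_b = p_broken * 1 * 1                   # P(r|b) = 1 for both lights
    joint_w = p_working * F(1, 2) * F(1, 2)      # P(r|w) = 1/2 for both lights
    return joint_w / (joint_w + joint_b)

ml_estimate = nb_posterior_working(F(1, 7))   # prior from relative frequency
boosted = nb_posterior_working(F(1, 2))       # prior boosted to 1/2
```

With the relative-frequency prior the model puts 6/10 on “working” for two red lights, which is the wrong call; boosting P(b) to 1/2 drops that to 1/5.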

slide-48
SLIDE 48

Issues of Scale

  • Lots of features:

NLP maxent models can have over 1M features.

Even storing a single array of parameter values can have a substantial memory cost.

  • Lots of sparsity:

Overfitting very easy – need smoothing!

Many features seen in training will never occur again at test time.

  • Optimization problems:

Feature weights can be infinite, and iterative solvers can take a long time to get to those infinities.

slide-49
SLIDE 49

Smoothing: Issues

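  • Assume the following empirical distribution:

Heads   Tails
h       t

  • Features: {Heads}, {Tails}
  • We’ll have the following model distribution:

\[ p_{\text{HEADS}} = \frac{e^{\lambda_H}}{e^{\lambda_H} + e^{\lambda_T}} \qquad p_{\text{TAILS}} = \frac{e^{\lambda_T}}{e^{\lambda_H} + e^{\lambda_T}} \]

  • Really, only one degree of freedom (λ = λH – λT)

\[ p_{\text{HEADS}} = \frac{e^{\lambda_H} e^{-\lambda_T}}{e^{\lambda_H} e^{-\lambda_T} + e^{\lambda_T} e^{-\lambda_T}} = \frac{e^{\lambda}}{e^{\lambda} + e^{0}} \qquad p_{\text{TAILS}} = \frac{e^{0}}{e^{\lambda} + e^{0}} \]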

slide-50
SLIDE 50

Smoothing: Issues

  • The data likelihood in this model is:

\[ \log P(h, t \mid \lambda) = h \log p_{\text{HEADS}} + t \log p_{\text{TAILS}} \]

\[ \log P(h, t \mid \lambda) = h\lambda - (t + h) \log(1 + e^{\lambda}) \]

[Plots of log P against λ for three data sets: Heads 2, Tails 2; Heads 3, Tails 1; Heads 4, Tails 0]
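
The shape of these three curves can be probed numerically. The sketch below evaluates the closed form for log P(h, t | λ) at a few values of λ; the chosen sample points are arbitrary.

```python
import math

def log_likelihood(h, t, lam):
    """log P(h, t | λ) = h·λ − (h + t)·log(1 + e^λ)."""
    return h * lam - (h + t) * math.log1p(math.exp(lam))

balanced = [log_likelihood(2, 2, lam) for lam in (-1.0, 0.0, 1.0)]
all_heads = [log_likelihood(4, 0, lam) for lam in (0.0, 5.0, 10.0)]
```

For 2 heads and 2 tails the likelihood peaks at λ = 0. For 4 heads and 0 tails it keeps rising toward 0 as λ grows, so there is no finite optimum.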

slide-51
SLIDE 51

Smoothing: Early Stopping

  • In the 4/0 case, there were two problems:

The optimal value of λ was ∞, which is a long trip for an optimization procedure.

The learned distribution is just as spiked as the empirical one – no smoothing.

  • One way to solve both issues is to just stop the optimization early, after a few iterations.

The value of λ will be finite (but presumably big).

The optimization won’t take forever (clearly).

Commonly used in early maxent work.

[Tables: Input (Heads 4, Tails 0) and Output; plot against λ]
SLIDE 52

Smoothing: Priors (MAP)

  • What if we had a prior expectation that parameter values wouldn’t be very large?
  • We could then balance evidence suggesting large parameters (or infinite) against our prior.
  • The evidence would never totally defeat the prior, and parameters would be smoothed (and kept finite!).
  • We can do this explicitly by changing the optimization objective to maximum posterior likelihood:

\[ \underbrace{\log P(C, \lambda \mid D)}_{\text{Posterior}} = \underbrace{\log P(\lambda)}_{\text{Prior}} + \underbrace{\log P(C \mid D, \lambda)}_{\text{Evidence}} \]

slide-53
SLIDE 53

Smoothing: Priors

  • Gaussian, or quadratic, priors:

Intuition: parameters shouldn’t be large.

Formalization: prior expectation that each parameter will be distributed according to a gaussian with mean µ and variance σ².

\[ P(\lambda_i) = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left(-\frac{(\lambda_i - \mu_i)^2}{2\sigma_i^2}\right) \]

Penalizes parameters for drifting too far from their mean prior value (usually µ=0).

2σ²=1 works surprisingly well.

“They don’t even capitalize my name anymore…”

[Plots for 2σ² = 1, 2σ² = 10, 2σ² = ∞]
slide-54
SLIDE 54

Smoothing: Priors

  • If we use gaussian priors:

Trade off some expectation-matching for smaller parameters.

When multiple features can be recruited to explain a data point, the more common ones generally receive more weight.

Accuracy generally goes up!

  • Change the objective:

\[ \log P(C, \lambda \mid D) = \log P(C \mid D, \lambda) + \log P(\lambda) \]

\[ \log P(C, \lambda \mid D) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda) - \sum_i \frac{(\lambda_i - \mu_i)^2}{2\sigma_i^2} + k \]

  • Change the derivative:

\[ \frac{\partial \log P(C, \lambda \mid D)}{\partial \lambda_i} = \text{actual}(f_i, C) - \text{predicted}(f_i, \lambda) - \frac{\lambda_i - \mu_i}{\sigma^2} \]

[Plots for 2σ² = 1, 2σ² = 10, 2σ² = ∞]
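
The effect of the prior on the 4 heads / 0 tails coin can be sketched with a few lines of gradient ascent. Several things here are assumptions made for the example rather than choices from the tutorial: the single-parameter coin model from the smoothing slides, µ = 0, σ² = 1, a fixed step size, and plain gradient ascent as a stand-in for the optimizers of Part II.

```python
import math

def penalized_objective(h, t, lam, mu=0.0, sigma2=1.0):
    """Coin log-likelihood plus the log gaussian prior, dropping the constant k."""
    log_lik = h * lam - (h + t) * math.log1p(math.exp(lam))
    return log_lik - (lam - mu) ** 2 / (2 * sigma2)

def penalized_gradient(h, t, lam, mu=0.0, sigma2=1.0):
    """actual count − predicted count − (λ − µ)/σ²."""
    predicted = (h + t) * math.exp(lam) / (1 + math.exp(lam))
    return h - predicted - (lam - mu) / sigma2

def fit(h, t, steps=2000, rate=0.1):
    """Plain gradient ascent from λ = 0 (illustrative optimizer only)."""
    lam = 0.0
    for _ in range(steps):
        lam += rate * penalized_gradient(h, t, lam)
    return lam

lam_hat = fit(4, 0)
```

Without the penalty the optimum for this data is at infinity; with it, the ascent settles at a finite λ slightly above 1, where the pull of the data and the pull of the prior cancel.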

slide-55
SLIDE 55

Example: NER Smoothing

✂✁ ✄ ☎
✆ ✝ ✞ ✟ ✠☛✡ ☞✍✌ ✎ ✏ ✑ ✒ ✓ ✒ ✎✕✔ ✖ ✗ ✌ ✑ ✘✍✙ ✂✁ ✄ ☎ ✟
✆✚ ✡ ✟ ✠☛✡ ✟ ✠☛✡ ☞✍✌ ✎ ✏ ✟ ✖ ✗ ✌ ✟ ✛ ✎ ✡ ✒ ✑ ✘✍✙ ✜✣✢ ✤✥ ✦ ✧ ✢ ★ ✥ ✩✍✪ ✫ ✬ ✭✯✮ ✰ ✂✁ ✝✱ ✟
✟ ✡ ✟ ✠☛✡ ☞ ✁ ✑ ✒ ✓ ✒ ✎ ✟ ✲✳✟ ✖ ✗ ✌ ✑ ✘ ✙ ✂✁ ✴ ✆
✡ ✵ ✗ ✌ ✌ ✎ ✛ ✒ ✑ ✘ ✙ ✛ ✓ ✒ ✗ ✌ ✎ ✟
✚ ✱ ✟
✒ ✶ ✎ ✌ ☞✍✌ ✎ ✏ ✘✍✷ ✗ ✑ ✑ ✒ ✓ ✒ ✎ ✂✁ ✸ ✴ ✟
  • ✹✺
✺ ✺ ☞ ☞✍✌ ✎ ✏ ✓ ✛ ✻ ✖ ✗ ✌ ✒ ✓ ✙ ✑ ✂✁ ✴✼
✴ ☎ ✺ ✺ ☞ ✵ ✗ ✌ ✌ ✎ ✛ ✒ ☞ ✞✽ ✒ ✓ ✙ ✟
✴✼ ✾ ✿ ❀ ✎ ✙ ✘ ✛ ✛ ✘ ✛✙ ❁ ✘✍✙ ✌ ✓ ❂ ✂✁
✿❄❃ ❅❆ ❇ ✵ ✗ ✌ ✌ ✎ ✛ ✒ ❈ ✷ ✌ ✻ ✂✁ ✚ ✴ ✟
☎ ✄ ❅ ❉ ☞✍✌ ✎ ✏ ✘✍✷ ✗ ✑ ❈ ✷ ✌ ✻ ❊ ❋● ❍■ ❏ ❑ ▲✍▼ ✬ ✫ ◆ ❖ ▼ ▲ ▼ ✬ ✫ ◆ ❖ ▼ ✩☛P ◗ ▼ ❘✳❙ ❘✳❙ ❙ ❚ ❯ ❱ ❲ ❲❳ ❲ ❲❳ ❨ ❲ ❩❭❬ ❱ ❪❭❫ ❬ ❴ ❵❜❛ ❬ ❝❞ ❬ ❡ ❢ ❫ ❛ ❴ ❣ ❣ ❣ ❣ ❣ ❣ ❤ ❡ ✐ ❞ ❛ ❚ ❡ ❬ ❡ ❞ ❲ ❞ ❙ ❡ ❥❧❦ ❛ ❳ ❛ ❞ ♠

Local Context Feature Weights

Because of smoothing, the more common prefix and single-tag features have larger weights even though entire-word and tag-pair features are more specific.
slide-56
SLIDE 56

Example: POS Tagging

  • From (Toutanova et al., 2003):

                          Overall Accuracy    Unknown Word Acc
    Without Smoothing          96.54               85.20
    With Smoothing             97.10               88.20

  • Smoothing helps:

– Softens distributions.

– Pushes weight onto more explanatory features.

– Allows many features to be dumped safely into the mix.

– Speeds up convergence (if both are allowed to converge)!
slide-57
SLIDE 57

Smoothing: Virtual Data

  • Another option: smooth the data, not the parameters.
  • Example:

    Original data:   Heads 4   Tails 0
    Smoothed data:   Heads 5   Tails 1

Equivalent to adding two extra data points.

Similar to add-one smoothing for generative models.

  • Hard to know what artificial data to create!
slide-58
SLIDE 58

Part II: Optimization

  • a. Unconstrained optimization methods
  • b. Constrained optimization methods
  • c. Duality of maximum entropy and

exponential models

slide-59
SLIDE 59

Function Optimization

  • To estimate the parameters of a maximum likelihood model, we must find the λ which maximizes:

$$\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c,d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c',d)}$$

  • We'll approach this as a general function optimization problem, though special-purpose methods exist.

  • An advantage of the general-purpose approach is that no modification needs to be made to the algorithm to support smoothing by priors.
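To make the objective concrete, here is a small sketch that evaluates the conditional log-likelihood for an exponential model. The toy tagging task, the two features and all names are hypothetical and chosen only for illustration.

```python
import math

def conditional_log_likelihood(lambdas, data, classes, features):
    """Sum over (c, d) pairs of log P(c | d, lambda) for an exponential model."""
    total = 0.0
    for gold, d in data:
        scores = {c: sum(l * f(c, d) for l, f in zip(lambdas, features)) for c in classes}
        m = max(scores.values())  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        total += scores[gold] - log_z
    return total

# Hypothetical toy task: tag a word as NOUN or VERB from its suffix.
classes = ["NOUN", "VERB"]
features = [
    lambda c, d: 1.0 if c == "NOUN" and d.endswith("tion") else 0.0,
    lambda c, d: 1.0 if c == "VERB" and d.endswith("ize") else 0.0,
]
data = [("NOUN", "nation"), ("VERB", "realize"), ("NOUN", "size")]

uniform = conditional_log_likelihood([0.0, 0.0], data, classes, features)
weighted = conditional_log_likelihood([1.0, 1.0], data, classes, features)
```

With all weights at zero every class is equally likely, so the value is just the number of data points times log(1/2); optimization then searches for the λ that raises this number as far as possible.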

slide-60
SLIDE 60

Notation

  • Assume we have a function f(x) from Rⁿ to R.

  • The gradient ∇f(x) is the n×1 vector of partial derivatives ∂f/∂xi.

$$\nabla f = \begin{pmatrix} \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_n \end{pmatrix}$$

  • The Hessian ∇²f is the n×n matrix of second derivatives ∂²f/∂xi∂xj.

$$\nabla^2 f = \begin{pmatrix} \partial^2 f / \partial x_1 \partial x_1 & \cdots & \partial^2 f / \partial x_1 \partial x_n \\ \vdots & \ddots & \vdots \\ \partial^2 f / \partial x_n \partial x_1 & \cdots & \partial^2 f / \partial x_n \partial x_n \end{pmatrix}$$

slide-61
SLIDE 61

Taylor Approximations

  • Constant (zeroth-order):

$$f_0^{x_0}(x) = f(x_0)$$

  • Linear (first-order):

$$f_1^{x_0}(x) = f(x_0) + \nabla f(x_0)^T (x - x_0)$$

  • Quadratic (second-order):

$$f_2^{x_0}(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2} (x - x_0)^T \nabla^2 f(x_0) (x - x_0)$$
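A quick numerical check shows how each extra order tightens the approximation near the expansion point. The test function below is an arbitrary choice for illustration, with its gradient and Hessian written out by hand.

```python
import math

def f(x, y):
    return math.exp(x) + x * y * y

def grad(x, y):
    return (math.exp(x) + y * y, 2.0 * x * y)

def hessian(x, y):
    return ((math.exp(x), 2.0 * y), (2.0 * y, 2.0 * x))

def taylor(order, x0, x):
    """Zeroth-, first- or second-order Taylor approximation of f around x0, evaluated at x."""
    dx = (x[0] - x0[0], x[1] - x0[1])
    value = f(*x0)
    if order >= 1:
        g = grad(*x0)
        value += g[0] * dx[0] + g[1] * dx[1]
    if order >= 2:
        h = hessian(*x0)
        value += 0.5 * sum(dx[i] * h[i][j] * dx[j] for i in range(2) for j in range(2))
    return value

x0, x = (0.5, 1.0), (0.6, 1.1)
errors = [abs(f(*x) - taylor(order, x0, x)) for order in (0, 1, 2)]
```

The three entries of `errors` shrink as the order grows, which is why methods built on the quadratic model can take better-informed steps than methods built on the linear one.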

slide-62
SLIDE 62

Unconstrained Optimization

  • Problem:

$$x^* = \arg\max_x f(x)$$

  • Questions:

Is there a unique maximum?

How do we find it efficiently?

Does f have a special form?

  • Our situation:

f is convex.

f’s first derivative vector ∇f is known.

f’s second derivative matrix ∇²f is not available.

slide-63
SLIDE 63

Convexity

$$f\left( \sum_i w_i x_i \right) \ge \sum_i w_i f(x_i), \qquad \sum_i w_i = 1$$

[Figure: a convex and a non-convex function, comparing Σ w f(x) with f(Σ w x)]

Convexity guarantees a single, global maximum because any higher points are greedily reachable.

slide-64
SLIDE 64

Optimization Methods

  • Iterative Methods:

Start at some xi.

Repeatedly find a new xi+1 such that f(xi+1) ≥ f(xi).

  • Iterative Line Search Methods:

Improve xi by choosing a search direction si and setting

$$x_{i+1} = \arg\max_{x_i + t s_i} f(x_i + t s_i)$$

  • Gradient Methods:

si is a function of the gradient ∇f at xi.

slide-65
SLIDE 65

Line Search I

  • Choose a start point xi and a search direction si.

  • Search along si to find the line maximizer:

$$x_{i+1} = \arg\max_{x_i + t s_i} f(x_i + t s_i)$$

  • When are we done?

[Figure: the search line from xi to xi+1 along si, with f(xi + tsi) and ∇f⋅si plotted along the line]

slide-66
SLIDE 66

Line Search II

  • One-dimensional line search is much simpler than multidimensional search.

  • Several ways to find the line maximizer:

– Divisive search: narrowing a window containing the max.

– Repeated approximation:
slide-67
SLIDE 67

Gradient Ascent I

  • Gradient Ascent:

Until convergence:

1. Find the derivative ∇f(x).

2. Line search along ∇f(x).

  • Each iteration improves the value of f(x).

  • Guaranteed to find a local optimum (in theory could find a saddle point).

  • Why would you ever want anything else?

– Other methods choose better search directions.

– E.g., ∇f(x) may be maximally “uphill”, but you’d rather be pointed straight at the solution!
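The two-step loop can be sketched as follows. This is an illustrative toy, not the tutorial's implementation: the concave test function, the bisection-based line search (a divisive search on the directional derivative ∇f⋅s) and all names are assumptions.

```python
def grad(x):
    # Gradient of the concave toy function f(x) = -(x0 - 1)^2 - 10 (x1 + 2)^2.
    return [-2.0 * (x[0] - 1.0), -20.0 * (x[1] + 2.0)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def line_maximize(grad_fn, x, s, iters=100):
    """Find t >= 0 maximizing f(x + t s) by bisecting on the directional derivative."""
    def slope(t):
        return dot(grad_fn([xi + t * si for xi, si in zip(x, s)]), s)
    lo, hi = 0.0, 1.0
    while slope(hi) > 0.0:  # grow the window until it contains the line maximizer
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gradient_ascent(grad_fn, x, tol=1e-8, max_iters=1000):
    steps = 0
    while steps < max_iters:
        g = grad_fn(x)
        if dot(g, g) ** 0.5 < tol:
            break
        t = line_maximize(grad_fn, x, g)  # search direction is the gradient itself
        x = [xi + t * gi for xi, gi in zip(x, g)]
        steps += 1
    return x, steps

x_star, steps = gradient_ascent(grad, [0.0, 0.0])
```

On this badly scaled function the method does reach the maximum at (1, -2), but only after many zigzagging steps.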
slide-68
SLIDE 68

Gradient Ascent II

  • The gradient is always perpendicular to the level curves.

  • Along a line, the maximum occurs when the gradient has no component in the line.

  • At that point, the gradient is orthogonal to the search line, so the next direction will be orthogonal to the last.
slide-69
SLIDE 69

What Goes Wrong?

  • Graphically:

– Each new gradient is orthogonal to the previous line search, so we’ll keep making right-angle turns. It’s like being on a city street grid, trying to go along a diagonal – you’ll make a lot of turns.

  • Mathematically:

– We’ve just searched along the old gradient direction si-1 = ∇f(xi-1).

– The new gradient is ∇f(xi) and we know si-1T⋅∇f(xi) = ∇f(xi-1)T⋅∇f(xi) = 0.

– As we move along si = ∇f(xi), the gradient becomes

$$\nabla f(x_i + t s_i) \approx \nabla f(x_i) + t \nabla^2 f(x_i) s_i = \nabla f(x_i) + t \nabla^2 f(x_i) \nabla f(x_i)$$

– What about that old direction si-1?

$$s_{i-1}^T \cdot \left( \nabla f(x_i) + t \nabla^2 f(x_i) \nabla f(x_i) \right)$$

$$= \nabla f(x_{i-1})^T \nabla f(x_i) + t \nabla f(x_{i-1})^T \nabla^2 f(x_i) \nabla f(x_i)$$

$$= 0 + t \nabla f(x_{i-1})^T \nabla^2 f(x_i) \nabla f(x_i)$$

– … so the gradient is regrowing a component in the last direction!
slide-70
SLIDE 70

Conjugacy I

  • Problem: with gradient ascent,

search along si ruined optimization in previous directions.

  • Idea: choose si to keep the gradient

in the previous direction(s) zero.

  • If we choose a direction si, we want:

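∇f(xi+tsi) to stay orthogonal to the previous s:

$$s_{i-1}^T \cdot [\nabla f(x_i + t s_i)] = 0$$

$$s_{i-1}^T \cdot [\nabla f(x_i) + t \nabla^2 f(x_i) s_i] = 0$$

$$s_{i-1}^T \cdot \nabla f(x_i) + s_{i-1}^T \cdot t \nabla^2 f(x_i) s_i = 0$$

$$0 + s_{i-1}^T \cdot t \nabla^2 f(x_i) s_i = 0$$

If ∇²f(x) is constant, then we want: $s_{i-1}^T \nabla^2 f(x) s_i = 0$

[Figure: directions si-1 and si and the gradient ∇f(xi)]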

slide-71
SLIDE 71

Conjugacy II

  • The condition $s_{i-1}^T \nabla^2 f(x_i) s_i = 0$ almost says that the new direction and the last should be orthogonal – it says that they must be ∇²f(xi)-orthogonal, or conjugate.

  • Various ways to operationalize this condition.

  • Basic problems:

– We generally don’t know ∇²f(xi).

– It wouldn’t fit in memory anyway.

[Figure: si-1, si, and ∇f(xi), contrasting orthogonal and conjugate directions]
slide-72
SLIDE 72

Conjugate Gradient Methods

  • The general CG method:

– Until convergence:

1. Find the derivative ∇f(xi).

2. Remove components of ∇f(xi) not conjugate to previous directions.

3. Line search along the remaining, conjugate projection of ∇f(xi).

  • The variations are in step 2.

– If we know ∇²f(xi) and track all previous search directions, we can implement this directly.

– If we do not know ∇²f(xi) – we don’t for maxent modeling – and it isn’t constant (it’s not), there are other (better) ways.

– Sufficient to ensure conjugacy to the single previous direction.

– Can do this with the following recurrences [Fletcher-Reeves]:

$$s_i = \nabla f(x_i) + \beta_i s_{i-1}$$

$$\beta_i = \frac{\nabla f(x_i)^T \nabla f(x_i)}{\nabla f(x_{i-1})^T \nabla f(x_{i-1})}$$
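The Fletcher-Reeves recurrences translate almost directly into code. The sketch below is a toy under stated assumptions: the two-dimensional concave quadratic, the bisection line search and the function names are all made up for illustration and are not the tutorial's software.

```python
def grad(x):
    # Gradient of the concave toy function f(x) = -(x0 - 1)^2 - 10 (x1 + 2)^2.
    return [-2.0 * (x[0] - 1.0), -20.0 * (x[1] + 2.0)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def line_maximize(grad_fn, x, s, iters=100):
    """Find t >= 0 maximizing f(x + t s) by bisecting on the directional derivative."""
    def slope(t):
        return dot(grad_fn([xi + t * si for xi, si in zip(x, s)]), s)
    lo, hi = 0.0, 1.0
    while slope(hi) > 0.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conjugate_gradient_ascent(grad_fn, x, max_iters, tol=1e-12):
    g = grad_fn(x)
    s = list(g)  # the first direction is the plain gradient
    for _ in range(max_iters):
        if dot(g, g) < tol:
            break
        t = line_maximize(grad_fn, x, s)
        x = [xi + t * si for xi, si in zip(x, s)]
        g_new = grad_fn(x)
        beta = dot(g_new, g_new) / dot(g, g)               # Fletcher-Reeves beta_i
        s = [gn + beta * si for gn, si in zip(g_new, s)]   # s_i = grad + beta_i * s_{i-1}
        g = g_new
    return x

x_cg = conjugate_gradient_ascent(grad, [0.0, 0.0], max_iters=2)
```

On a quadratic in two dimensions, two conjugate line searches are enough to land on the maximum at (1, -2), whereas steepest ascent on the same function needs many right-angle turns.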

slide-73
SLIDE 73

Constrained Optimization

  • Goal:

$$x^* = \arg\max_x f(x)$$

subject to the constraints:

$$\forall i: g_i(x) = 0$$

  • Problems:

Have to ensure we satisfy the constraints.

No guarantee that ∇f(x*) = 0, so how to recognize the max?

  • Solution: the method of Lagrange Multipliers

slide-74
SLIDE 74

Lagrange Multipliers I

  • At a global max, ∇f(x*) = 0.

  • Inside a constraint region, ∇f(x*) can be non-zero, but its projection inside the constraint must be zero.

  • In two dimensions, this means that the gradient must be a multiple of the constraint normal:

$$\nabla f(x) = \lambda \nabla g(x)$$

I love this part.

slide-75
SLIDE 75

Lagrange Multipliers II

  • In multiple dimensions, with multiple constraints, the gradient must be in the span of the surface normals:

$$\nabla f(x) = \sum_i \lambda_i \nabla g_i(x)$$

  • Also, we still have constraints on x:

$$\forall i: g_i(x) = 0$$

  • We can capture both requirements by looking for critical points of the Lagrangian:

$$\Lambda(x, \lambda) = f(x) - \sum_i \lambda_i g_i(x)$$

– ∂Λ/∂x = 0 recovers the gradient-in-span property.

– ∂Λ/∂λi = 0 recovers constraint i.
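As a small worked example of these critical-point conditions (the function and constraint are chosen purely for illustration and do not come from the tutorial), consider maximizing f(x, y) = x + y on the unit circle, g(x, y) = x² + y² − 1 = 0:

```latex
\begin{align*}
\Lambda(x, y, \lambda) &= x + y - \lambda (x^2 + y^2 - 1) \\
\partial\Lambda/\partial x = 0 &\;\Rightarrow\; 1 = 2\lambda x, \qquad
\partial\Lambda/\partial y = 0 \;\Rightarrow\; 1 = 2\lambda y \\
\partial\Lambda/\partial\lambda = 0 &\;\Rightarrow\; x^2 + y^2 = 1 \\
x = y = \tfrac{1}{2\lambda} &\;\Rightarrow\; \tfrac{2}{4\lambda^2} = 1
  \;\Rightarrow\; \lambda = \pm\tfrac{1}{\sqrt{2}} \\
\lambda = \tfrac{1}{\sqrt{2}} &\;\Rightarrow\; x^* = y^* = \tfrac{1}{\sqrt{2}},
  \quad f(x^*, y^*) = \sqrt{2}
\end{align*}
```

The other root, λ = −1/√2, gives the constrained minimum; at both points the gradient (1, 1) is a multiple of the constraint normal (2x, 2y).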

slide-76
SLIDE 76

The Lagrangian as an Encoding

  • The Lagrangian:

$$\Lambda(x, \lambda) = f(x) - \sum_i \lambda_i g_i(x)$$

  • Zeroing the xj derivative recovers the jth component of the gradient span condition:

$$\frac{\partial \Lambda(x, \lambda)}{\partial x_j} = \frac{\partial f(x)}{\partial x_j} - \sum_i \lambda_i \frac{\partial g_i(x)}{\partial x_j} \quad\Rightarrow\quad 0 = \nabla f(x) - \sum_i \lambda_i \nabla g_i(x)$$

  • Zeroing the λi derivative recovers the ith constraint:

$$\frac{\partial \Lambda(x, \lambda)}{\partial \lambda_i} = -g_i(x) \quad\Rightarrow\quad 0 = g_i(x)$$

slide-77
SLIDE 77

A Duality Theorem

  • ✁✄✂
☎✆ ✝ ✞ ✟ ✠ ☎✡ ☛ ☞ ✟ ✌ ✠ ☞ ✟

x*

✂ ✍ ✍ ✎ ✞ ✟ ✝ ✍ ✞ ✠ ✝ ✠ ✍ ✟ ✏ ✑ ✂ ✠ ☎ ✝ ✆ ✒

x*,λ*

✓ ✂ ✔

Λ

✕ ✖ ✡ ✞ ✡ ✗
  • 1. x*
✘✚✙ ✛ ✜✣✢ ✤ ✛ ✜ ✥ ✛ ✦ ✘ ✥ ✧ ✥ ✢ ★

Λ

x,λ*

  • 2. λ *
✘ ✙ ✛ ✜ ✢ ✤ ✛ ✜ ✥ ✘✚✫ ✘ ✥ ✧ ✥ ✢ ★

Λ

x*,λ

✞ ✂ ✂ ✔ ✭ ✠ ✝ ✆ ✗ ✮ ✯ ✰ ✱✲ ✳✴ ✵ ✰ ✶ ✱ ✷ ✴ ✸ ✹ ✺ ✱ ✻ ✷ ✺ ✼ ✺

x*

✽ ✾ ✿ ❀ ❀❂❁ ❃❄❅ ❆ ❇❈ ❉ ❄ ❆ ❅

i

❊ ❋ ❅ ❆
  • ❂❍
❅ ❈ ❆ ❉ ❅ ■ ❉ ❍ ❏ ❈ ❆

x*

❑ ✾ ▲ ▼ ❍ ◆ ❇ ❈ ❏ ❉ ❍ ❄ ❆ ❅ ❖ ❈ ❄ ❁ ❃❄ ❏ ❉ ❆ ❉ ❃ ❄ ▼ ❃ ❀ ❏ ❅ ❈ ❆

x*

■ ❃ ❇ ❅ ❃ ❊ ❍

λ

❑ ✮ P ◗ ✳ ✲ ✱ ❘ ✺ ✱ ✻ ✷ ✴

x

❙ ❚ ❯❲❱ ✸ ✲ ❳ ✱ ✴ ❨ ✸

x*

❩ ✵ ❘ ✷ ❨ ❳ ✰ ❘❭❬ ❩ ❱ ❳ ✷ ❘ ✸ ✵ ✰ ✱ ❬ ✷ ✴ ❨ ✷ ✴ ✰ ❳ ✸ ✲ ✳ ✴ ✵ ✰ ✶ ✱ ✷ ✴ ✰ ✶ ✸ ❨ ✷ ✳ ✴ ❩

f(x)

✺ ✼ ✵ ✰ ✹ ✶ ✳❪❴❫ ❵ ✳ ❱ ✸ ❛ ✸ ✶ ❩ ✸ ✱ ✲ ❳

gi(x)

❱ ✷ ❘ ❘ ✵ ✰ ✱ ❬ ❜ ✸ ✶ ✳ ❩ ✵ ✳

Λ

x,λ

❞ ❱ ✷ ❘ ❘ ✹ ✶ ✳ ❪❴❫ ✮ P ◗ ✳ ✲ ✱ ❘ ✺ ✷ ✴ ✷ ✴

λ

❙ ❚ ❯ ❱ ✸ ✲ ❳ ✱ ✴ ❨ ✸

λ*

❩ ✵ ❘ ✷ ❨ ❳ ✰ ❘ ❬ ❩ ✰ ❳ ✸ ✴ ❯ ✷ ✴ ✹ ✰ ❳ ✸

x

❱ ❳ ✷ ✲ ❳ ✺ ✱ ✻ ✷ ✺ ✷ ❜ ✸ ✵

Λ

❩ ✰ ❳ ✸ ✺ ✱ ✻

Λ

✲ ✱ ✴ ✳ ✴ ❘❭❬ ❡ ✸ ❨ ✶ ✸ ✱ ✰ ✸ ✶ ✰ ❳ ✱ ✴ ✰ ❳ ✸ ✳ ❘ ✹ ✳ ✴ ✸ ❩ ❡ ✸ ✲ ✱ ✼ ✵ ✸ ✱ ✰

x* Λ

❢ ✵ ❛ ✱ ❘❭✼ ✸ ✷ ✵ ✷ ✴ ✹ ✸ ❪ ✸ ✴ ✹ ✸ ✴ ✰ ✳ ❯

λ

❩ ✵ ✳ ❱ ✸ ✲ ✱ ✴ ✵ ✰ ✷ ❘ ❘ ❨ ✸ ✰ ✷ ✰ ❫
slide-78
SLIDE 78

Direct Constrained Optimization

  • Many methods for constrained optimization are outgrowths of Lagrange multiplier ideas.

  • Iterative Penalty Methods

– Can add an increasing penalty to the objective for violating constraints.

$$f_{PENALIZED}(x, k) = f(x) - \frac{k}{2} \sum_i g_i(x)^2$$

– This works by itself, though not well, as you increase k.

– For any k, an unconstrained optimization will balance the penalty against gains in function value.

– k may have to be huge to get constraint violations small.

slide-79
SLIDE 79

Direct Constrained Optimization

  • Better method: shift the force exerted by the penalty onto Lagrange multipliers:

– Fix λ=0 and k=k0.

– Each round:

x = arg max Λ(x,λ,k)    (Max over the penalized surface.)

k = α k    (Penalty cost grows each round.)

λi = λi + k gi(x)    (Lagrange multipliers take over the force that the penalty function exerted in the current round.)

– This finds both the optimum x* and λ* at the same time!

$$\Lambda_{PENALIZED}(x, \lambda, k) = f(x) - \sum_i \lambda_i g_i(x) - \frac{k}{2} \sum_i g_i(x)^2$$
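A minimal sketch of this round structure on a toy problem follows. Everything here is an assumption made for illustration: the quadratic f, the single linear constraint, the fixed-step inner gradient ascent, and in particular the ordering in which the multiplier is updated with the same k that was used in the maximization before k is grown.

```python
def f_grad(x):
    # Gradient of f(x) = -(x0 - 2)^2 - (x1 - 2)^2.
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 2.0)]

def g(x):
    # Single constraint g(x) = x0 + x1 - 2 = 0.
    return x[0] + x[1] - 2.0

G_GRAD = [1.0, 1.0]  # gradient of g (constant, since g is linear)

def maximize_penalized(x, lam, k, iters=200):
    """Inner unconstrained step: gradient ascent on f(x) - lam*g(x) - (k/2) g(x)^2."""
    step = 1.0 / (2.0 + 2.0 * k)  # 1 / largest curvature of this particular quadratic
    for _ in range(iters):
        gf = f_grad(x)
        gx = g(x)
        x = [xi + step * (gfi - (lam + k * gx) * ggi)
             for xi, gfi, ggi in zip(x, gf, G_GRAD)]
    return x

def penalty_with_multipliers(x, k0=1.0, alpha=2.0, rounds=10):
    lam, k = 0.0, k0
    for _ in range(rounds):
        x = maximize_penalized(x, lam, k)  # max over the penalized surface
        lam = lam + k * g(x)               # multiplier takes over the penalty's force
        k = alpha * k                      # penalty cost grows each round
    return x, lam

x_opt, lam_opt = penalty_with_multipliers([0.0, 0.0])
```

For this problem the constrained maximum is at (1, 1) with multiplier 2, and the loop recovers both without k ever having to become huge.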
slide-80
SLIDE 80

Maximum Entropy

  • Recall our example of constrained optimization:

$$\text{maximize } H(p) = -\sum_x p_x \log p_x \quad \text{subject to } \forall i: \sum_x p_x f_i(x) = C_{f_i}$$

  • We can build its Lagrangian:

$$\Lambda(p, \lambda) = -\sum_x p_x \log p_x - \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)$$

  • We could optimize this directly to get our maxent models.

slide-81
SLIDE 81

Lagrangian: Max-and-Min

  • Can think of constrained optimization as:
  • Penalty methods work somewhat in this way:
✁ ✂ ✄ ☎ ✆ ✝ ✞ ✄ ✟✡✠ ☛☞ ✞✌ ✄ ✍ ☎ ✝ ✞✠ ✎ ✍ ✠ ✏ ✝ ☞ ✞✒✑ ☞ ✍ ✆ ☞ ✓ ✍ ✔ ✓ ✞ ☛ ✄ ✝ ☞ ✞ ✕ ☎ ✖ ✓ ✠ ✏✠ ✄ ✌ ☛ ✖ ☞ ✗ ✗ ✠ ✍ ✠ ✎ ✗ ✆ ✘ ✠ ✞ ☎ ✖ ✄ ✝ ✠ ✌ ✙ ✚ ✛✢✜ ✣ ✤ ✥ ✦ ✧ ✤✩★ ✦ ✪ ✧ ✫ ✜ ✬ ★ ✭ ★ ✬ ✪ ★ ✦ ✮ ★ ✫ ✬ ✯ ★ ✬ ✥ ✰✱ ✲ ✚ ✛✢✜ ✣ ✤ ✳ ★ ✦ ✮ ✫ ✯ ✪ ✴ ✫ ✬ ✵ ✥ ✰ ✦ ✮ ✥ ✪ ✴ ✣ ✧ ✲ ✶ ✂ ☞ ✖ ✕ ✠ ✄ ✟ ✠ ✷ ☎ ✸ ✝ ✷ ✝✺✹ ☎ ✄ ✝ ☞ ✞ ✔ ☞ ✍ ☎ ✏ ✝ ✕ ✠ ✞ ✌ ✠ ✄ ☞ ✔

λ

✌ ✙ ✶ ✻ ✔ ✄ ✟ ✠ ✌ ✠ ✌ ☞ ✖ ✓ ✄ ✝ ☞ ✞✌ ✑ ✷ ✝ ✞ ✝ ✷ ✝ ✹ ✠ ☞ ✕ ✠ ✍ ✄ ✟✡✠ ✌ ✘ ☎ ☛ ✠ ☞ ✔

λ

✌ ✙

$$\max_x \min_\lambda \Lambda(x, \lambda), \qquad \Lambda(x, \lambda) = f(x) - \sum_i \lambda_i g_i(x)$$

slide-82
SLIDE 82

The Dual Problem

  • For fixed λ, we know that Λ has a maximum where:

$$\frac{\partial \Lambda(p, \lambda)}{\partial p_x} = -\frac{\partial \sum_x p_x \log p_x}{\partial p_x} - \frac{\partial \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)}{\partial p_x} = 0$$

$$\frac{\partial \sum_x p_x \log p_x}{\partial p_x} = 1 + \log p_x$$

and:

$$\frac{\partial \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)}{\partial p_x} = -\sum_i \lambda_i f_i(x)$$

so we know:

$$1 + \log p_x = \sum_i \lambda_i f_i(x)$$

$$p_x \propto \exp \sum_i \lambda_i f_i(x)$$
slide-83
SLIDE 83

The Dual Problem

  • We know the maximum entropy distribution has the exponential form:

$$p_x(\lambda) \propto \exp \sum_i \lambda_i f_i(x)$$

  • By the duality theorem, we want to find the multipliers λ that minimize the Lagrangian:

$$\Lambda(p, \lambda) = -\sum_x p_x \log p_x - \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)$$

  • The Lagrangian is the negative data log-likelihood (next slides), so this is the same as finding the λ which maximize the data likelihood – our original problem in part I.

slide-84
SLIDE 84

The Dual Problem

$$\Lambda(p, \lambda) = -\sum_x p_x \log p_x - \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)$$

$$= -\sum_x p_x \log \frac{\exp \sum_i \lambda_i f_i(x)}{\sum_{x'} \exp \sum_i \lambda_i f_i(x')} - \sum_i \lambda_i \left( C_{f_i} - \sum_x p_x f_i(x) \right)$$

$$= -\sum_x p_x \sum_i \lambda_i f_i(x) + \log \sum_{x'} \exp \sum_i \lambda_i f_i(x') - \sum_i \lambda_i C_{f_i} + \sum_x p_x \sum_i \lambda_i f_i(x)$$

slide-85
SLIDE 85

The Dual Problem

     

$$\Lambda(p, \lambda) = \log \sum_x \exp \sum_i \lambda_i f_i(x) - \sum_i \lambda_i C_{f_i}$$

$$C_{f_i} = \sum_x \hat{p}_x f_i(x)$$

$$\Lambda(p, \lambda) = \log \sum_x \exp \sum_i \lambda_i f_i(x) - \sum_x \hat{p}_x \sum_i \lambda_i f_i(x)$$

$$= \log \sum_x \exp \sum_i \lambda_i f_i(x) - \sum_x \hat{p}_x \log \exp \sum_i \lambda_i f_i(x)$$

$$= -\sum_x \hat{p}_x \log \frac{\exp \sum_i \lambda_i f_i(x)}{\sum_{x'} \exp \sum_i \lambda_i f_i(x')}$$

$$= -\sum_x \hat{p}_x \log p_x$$
SLIDE 86

Iterative Scaling Methods

  • Iterative Scaling methods are an alternative optimization method. (Darroch and Ratcliff, 72)

  • Specialized to the problem of finding maxent models.

  • They are iterative lower bounding methods [so is EM]:

– Construct a lower bound to the function.

– Optimize the bound.

– Problem: lower bound can be loose!

  • People have worked on many variants, but these algorithms are neither simpler to understand, nor empirically more efficient.
slide-87
SLIDE 87

Newton Methods

  • Newton Methods are also iterative approximation algorithms.

– Construct a quadratic approximation.

– Maximize the approximation.

  • Various ways of doing each approximation:

– The pure Newton method constructs the tangent quadratic surface at x, using ∇f(x) and ∇²f(x).

– This involves inverting the ∇²f(x) (slow). Quasi-Newton methods use simple-structured approximations to ∇²f(x).

– If the number of dimensions (number of features) is large, ∇²f(x) is too large to store; limited-memory quasi-Newton methods use the last few gradient values to implicitly approximate ∇²f(x) (CG is a special case of this).

  • Limited-memory quasi-Newton methods like in (Nocedal 1997) are possibly the most efficient way to train maxent models (Malouf 2002).

Don’t really remember this!
slide-88
SLIDE 88

Part III: NLP Issues

  • Sequence Inference
  • Model Structure and Independence

Assumptions

  • Biases of Conditional Models
slide-89
SLIDE 89

Inference in Systems

Sequence Level Local Level

  • ✁✂
✄ ☎ ✆ ✄ ✝ ✄ ✞✠✟ ✡ ☛ ☞ ✌ ✟ ✍✏✎ ☛ ✌ ✡✑ ☛ ✒✏✓ ✔ ✞ ✟ ✡ ☛ ☞ ✌ ✟ ✕ ✖ ✡ ✗ ✟ ✘ ✙✛✚ ✝ ✜ ✢ ✜✤✣ ✄ ✝ ✜ ✁✥ ✦ ✢ ✁ ✁ ✝ ✧ ✜ ✥ ★ ✩ ☎ ✄✪ ✪ ✜ ✫ ✜✭✬ ✮ ✯✱✰ ✚ ✬ ✲✴✳✵ ✶ ✷✸ ✳✹ ✺ ✵ ✻ ✳ ✼ ✦ ✬ ✽ ✾ ✬ ✥ ✂ ✬ ✆ ✄ ✝ ✄ ✿✤❀ ❁ ❂ ❃ ❄ ❃ ❅✤❆ ❇ ❈ ❉❊ ❋ ✿ ❉
  • ■❍
❏■❑ ▲ ❄ ❀
❀ ❇ ❂✭▼ ◆ ❍ ❆ ❀ ❏ ❇ ❂ ❍ ❑ ❖ ❉ ❆ P ❄ ◗ ❀ ❇ ❍ ❘ ❈ ❀
❍ ❆ ❇ ❙ ❍ ❚ ❄ ❍ ❆ ▼ ❍ ✿ ❉
  • ■❍
❏ ❯❱ ◆❲ ❑ ❑ ❄ ❍ ❑ ❲ ❆ ❳ ❍ ❈ ❍ ❆ ▼ ❍ ❱ ❉ ▼ ❀ ❏ ❨ ❀ ❇ ❀ ❱ ❉ ▼ ❀ ❏ ❨ ❀ ❇ ❀
slide-90
SLIDE 90

Beam Inference

  • Beam Inference:

– At each position keep the top k complete sequences.

– Extend each sequence in each local way.

– The extensions compete for the k slots at the next position.

  • Advantages:

– Fast; and beam sizes of 3–5 are as good or almost as good as exact inference in many cases.

– Easy to implement (no dynamic programming required).

  • Disadvantage:

– Inexact: the globally best sequence can fall off the beam.

[Diagram: Sequence Model, Inference, Best Sequence]
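The beam procedure fits in a dozen lines. In this sketch the two-tag model, its probabilities and the `local_score` interface are invented for illustration; a real local classifier would supply the scores.

```python
import math

# Hypothetical two-tag model: start and transition probabilities chosen so that
# the locally best first tag ("A") does not begin the globally best sequence.
TAGS = ["A", "B"]
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.9, "B": 0.1}}

def local_score(i, prev, tag):
    """Log-probability of `tag` at position i given the previous tag (None at i = 0)."""
    p = START[tag] if prev is None else TRANS[prev][tag]
    return math.log(p)

def beam_search(n, tags, score_fn, k):
    beam = [(0.0, [])]  # (total log score, sequence so far)
    for i in range(n):
        candidates = []
        for total, seq in beam:
            prev = seq[-1] if seq else None
            for t in tags:  # extend each sequence in each local way
                candidates.append((total + score_fn(i, prev, t), seq + [t]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]  # extensions compete for the k slots
    return beam[0]

greedy_score, greedy_seq = beam_search(2, TAGS, local_score, k=1)
wide_score, wide_seq = beam_search(2, TAGS, local_score, k=2)
```

With k = 1 the sequence starting with "B" falls off the beam after the first position, so the search settles for probability 0.30; with k = 2 it finds B, A with probability 0.36.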
slide-91
SLIDE 91

Viterbi Inference

  • Viterbi inference:

– Dynamic programming or memoization.

– Requires small window of state influence (e.g., past two states are relevant).

  • Advantage:

– Exact: the global best sequence is returned.

  • Disadvantage:

– Harder to implement long-distance state-state interactions (but beam inference tends not to allow long-distance resurrection of sequences anyway).

[Diagram: Sequence Model, Inference, Best Sequence]
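For a model whose local scores depend only on the previous tag, the dynamic program looks like the sketch below. The toy model and the `local_score` interface are assumptions made for illustration, not part of the tutorial's software.

```python
import math

# Hypothetical two-tag model with a one-state window of influence.
TAGS = ["A", "B"]
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.9, "B": 0.1}}

def local_score(i, prev, tag):
    """Log-probability of `tag` at position i given the previous tag (None at i = 0)."""
    p = START[tag] if prev is None else TRANS[prev][tag]
    return math.log(p)

def viterbi(n, tags, score_fn):
    # best[t] = (score, path) of the best sequence so far that ends in tag t
    best = {t: (score_fn(0, None, t), [t]) for t in tags}
    for i in range(1, n):
        new_best = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + score_fn(i, p, t), best[p][1]) for p in tags),
                key=lambda sp: sp[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])

best_score, best_seq = viterbi(2, TAGS, local_score)
```

Because one best path is kept per ending tag rather than k paths overall, the globally best sequence (here B, A with probability 0.36) cannot be pruned, at the cost of work that grows with the number of states in the window.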
slide-92
SLIDE 92

Independence Assumptions

  • Graphical models describe the conditional

independence assumptions implicit in models.

[Figure: graphical models for an HMM (c1, c2, c3 with d1, d2, d3) and for Naïve-Bayes (c with d1, d2, d3)]

slide-93
SLIDE 93

Causes and Effects

  • Effects

– Children (the wi here) are effects in the model.

– When two arrows exit a node, the children are (independent) effects.

  • Causes

– Parents (the wi here) are causes in the model.

– When two arrows enter a node (a v-structure), the parents are in causal competition.

[Figures: two graphs over c and d1, d2, d3]

slide-94
SLIDE 94

Explaining-Away

  • When nodes are in causal competition, a common interaction is explaining-away.

  • In explaining-away, discovering one cause leads to a lowered belief in other causes.

[Figure: "A is a winner" and "B is a winner" as competing causes of "crazy jig"]

  • Example: I buy lottery tickets A and B. You assume neither is a winner. I then do a crazy jig. You then believe one of my two lottery tickets must be a winner, 50%-50%. If you then find that ticket A did indeed win, you go back to believing that B is probably not a winner.
slide-95
SLIDE 95

Data and Causal Competition

  • ✁✄✂
☎ ✆✝ ✞ ✟ ✠✄✡ ☛☞ ✁ ✠ ✡ ✌ ✞ ✡ ✞ ✂ ✍ ✝✏✎ ✑ ✒✔✓ ✕✖ ✗ ✘✚✙ ✛ ✜ ✖ ✢ ✓ ✙ ✣ ✓ ✤ ✥ ✗✦ ✤ ✖ ✙ ✓ ✘ ✗ ✖★✧ ✩ ✪ ✢ ✫ ✖ ✤ ✗ ✦ ✤ ✖ ✬ ✓ ✭ ✤ ✓ ✙ ✜ ✬ ✓ ✙ ✜ ✬ ✛ ✜ ✘ ✕✮ ✗ ✖ ✓ ✯ ✦ ✛ ✓ ✓ ✥ ✯ ✖ ✦ ✢ ✭ ✤ ✖ ✧ ✰ ✱✄✲ ✳ ✴✵ ✶ ✷✸ ✹ ✴ ✺✼✻ ✽ ✲ ✾ ✴ ✲ ✵ ✿ ✵ ✶ ✴❀ ✴ ✻ ✶ ✿ ✵ ❁❃❂ ❄❅ ❆ ❇ ❈ ❆ ❉❋❊
  • ✏❍
■❏ ❑▼▲ ◆P❖ ◗✄❘ ❙ ❚❯✏❱ ❲ ❘ ❳✼❨ ❩ ◗✄❬ ❖❃❭ ❪ ❫✔❴ ❵ ❛✚❜ ❛✔❝ ❞ ❵ ❛✚❡ ❢❣ ❡ ❞ ❤❥✐ ❞❦ ❦ ❛ ❣ ❢ ❵ ❧♥♠ ♦ ❡ ♣ ♣ ♠ ♦ ❵ ♦ ❤ ❞❦ ❦★q ❪ r ♣ ❡ ♦ ♠ ❦ ❦ ✐ ❞❦ ❦ ❛ ❣ ❢ ❦ ❜ ❡ ♣ ♠ s ♠ ❛ ❣ ❧ ❵ t✉✈ ❤ ❞ ❜ ♠ ✇① ❵ ❡ ② ♠ ❞ ❵ ③ ♣ ♠ ❦ s ❧ ❛ ♦ ❧ ❞ ♣ ♠ ❢♠ ♠ ④ ♠ ④ ❵ ❡ ❣ ♠ ❵ ♦ ❤ ❞❦ ❦ ❛ ② ❛ ♦ ❞ ❵ ❛✚❡ ❢ ❦ ♣ ❛ ❣ ❧ ❵ q ❪ ⑤ ❞ ⑥ ♠ ❢ ❵ ❜ ❡ ④ ♠ ❤ ❦ ♠ ② ② ♠ ♦ ❵ ❛✔⑦ ♠ ❤❥⑧ ❧ ❞ ⑦ ♠ ❵ ❧ ♠ ❦ ❵ ♣ ③ ♦ ❵ ③ ♣ ♠ ❦ ❧ ❡ s ❢★⑨ ❴ ③ ❵ ❵ ❛ ❢ ❣ ② ♠ ❞ ❵ ③ ♣ ♠ ❦ ❛ ❢ ❵ ❡ ♦ ❞ ③ ❦ ❞ ❤ ♦ ❡ ❜ ❴ ♠ ❵ ❛ ❵ ❛ ❡ ❢ q

c

w1 w 2 w 3

slide-96
SLIDE 96

Example WSD Behavior I

✂ ✄ ☎✝✆ ✞ ✟ ✠ ✡ ☛ ✄ ☎ ✁ ✂ ✄ ☎ ☞

A) “thanks anyway, the transatlantic line

died.” B) “… phones with more than one line

, plush robes, exotic flowers, and complimentary wine.”

✍ ✎✑✏ ✒✔✓ ✕✖ ✗✑✘ ✖ ✙ ✚✛ ✛ ✜ ✢✣ ✤ ✗ ✥ ✦✧ ✗ ✏ ✘ ★ ✩ ✪ ✫ ✬ ✭✑✮ ✯✰✲✱ ✳ ✴✑✵ ✶✔✷ ✸✺✹ ✻✺✼ ✵ ✯ ✽ ✰ ✾ ✿ ✼ ❀ ❀ ❁ ❂ ✰ ❃ ✭ ✬ ✻❄ ✭✑✵ ✯❆❅ ❇ ❈ ❇ ✪ ❇ ❈ ✫ ✬ ✭ ✮ ✯ ✰✲✱ ✳ ❉ ✯ ❊❋ ❄ ✭
✸ ✬ ❂❍ ✵ ✰ ❍ ✬ ❄ ❍ ✵ ✬ ✭ ❀ ✿ ✬ ✼ ✭ ✵ ❋ ✭ ❀ ❍ ✬ ✯ ❄ ✭ ✵ ✯ ❅ ✮ ✼ ❂ ✯ ✬ ✻ ❍ ✵ ✸ ■ ❄ ✼ ❃ ✯ ❂ ✰ ✿ ❋ ✼ ✯ ✰✲✱ ✱ ✱
slide-97
SLIDE 97

Example WSD Behavior II

✂ ✄ ☎ ✆ ✂ ✝✟✞ ✠✟✡ ☛ ✡ ✞ ☞✟✌ ✝ ✝ ✂✍ ✞ ✎ ✏ ✡ ✞ ☛ ✝ ✂✑ ✂ ☛ ✍ ✄ ✡ ✆ ✂ ✂ ✄ ☎ ✒ ✍ ✓ ✔

With Naïve-Bayes:

$$\frac{P_{NB}(\text{flowers} \mid 2)}{P_{NB}(\text{flowers} \mid 1)} = 2 \qquad \frac{P_{NB}(\text{transatlantic} \mid 2)}{P_{NB}(\text{transatlantic} \mid 1)} = 2$$

With a word-featured maxent model:

$$\frac{P_{ME}(\text{flowers} \mid 2)}{P_{ME}(\text{flowers} \mid 1)} = 2.05 \qquad \frac{P_{ME}(\text{transatlantic} \mid 2)}{P_{ME}(\text{transatlantic} \mid 1)} = 3.74$$

Of course, “thanks” is just like “transatlantic”!

slide-98
SLIDE 98

Markov Models for POS Tagging

[Figure: Joint HMM and Conditional CMM, each drawn over c1, c2, c3 and w1, w2, w3]

Joint HMM:

  • Need P(c|c-1), P(w|c)

  • Advantage: easy to train.

  • Could be used for language modeling.

Conditional CMM:

  • Need P(c|w,c-1), P(w)

  • Advantage: easy to include features.

  • Typically split P(c|w,c-1)

slide-99
SLIDE 99

WSJ Results

  • ✁✄✂
☎ ☎ ✆✄✝ ☎ ✞ ✟ ✠ ✡☛ ✝ ☞ ☛ ✝ ✌ ☛ ✡✎✍ ✏ ✡ ✆ ✝ ☎✑ ✝ ✒✔✓ ✕✖ ☛ ✗ ✆ ✑ ✏ ✡✙✘ ☞ ✂ ☎ ✂ ✝ ✚ ✌ ✏ ✖ ✖ ☛ ✝ ☞ ✘ ✛ ✑ ✖ ✚ ✜ ☛ ✂ ☞ ✏ ✖ ☛ ✡✎✢
☛ ✖ ✓ ✡ ✆✄✤ ✆ ✒ ✂ ✖ ☛ ✥ ✕ ☛ ✖ ✆ ✤ ☛ ✝ ☞ ☞ ✑ ✦ ✧ ✂ ✜ ✜ ☛ ✖ ☞ ✓ ☛ ☞ ✂ ✒ ✢ ★✩ ✩ ✪ ✫
☛ ☞ ✂ ✆ ✒ ✡ ✭ ✮ ✯✱✰✲ ✳✵✴ ✰✶ ✶ ✷ ✲ ✲ ✸✱✹ ✺ ✻✵✼ ✴ ✴ ✽ ✾✵✿ ✹ ❀ ✽ ✸✱❁ ✼ ✴ ❁ ✿ ✲ ❂ ✼ ✳ ❃ ❄❅ ❆ ❄ ✰ ✰ ✽ ✾ ✼ ✲ ✴ ❁ ✰ ✰ ✽ ✾ ✸ ✹ ✺❈❇ ❉❊●❋ ❍ ❊ ■ ❋ ❍ ❏ ❑ ❑ ▲ ❑ ❑ ▼✄◆ ❖ ❖ P✄◗ ◆ ◆ ❘❚❙ ❖ ❯ ❱ ❲ ❳❩❨ P ◆❬ ❭ ❲ ◆ ❭
slide-100
SLIDE 100

Label Bias

✂☎✄ ✆✞✝ ✟✠ ✡ ✂ ✟☛ ✝ ☞ ✆ ✌ ✡ ✌ ✝ ☞✍ ✎ ✏ ✑ ✑✓✒ ☞ ✆ ✟✔✕ ✟ ✔ ✖ ✝ ✔ ✗ ✡ ✂ ✟ ✘ ✝ ✌ ☞ ✡ ✗✝ ✆ ✟ ✎ ✙ ✚ ✌✓✛ ✟ ☞ ✡ ✂ ✟ ✠ ✍ ✗ ✟ ✖ ✟ ✍ ✡ ✒ ✔ ✟ ✠ ✜
✆ ✟ ✍ ✣ ✎ ✍ ✤ ✟ ✎ ✤ ✌ ✍ ✠ ✥ ✦ ✝ ✡ ✡ ✝ ✒ ✧ ★ ★ ✧ ✩ ✪ ✫ ✬✮✭✯ ✯ ✰ ✯ ✱ ✲ ✳ ✴ ✬✮✵ ✱ ✰ ✶ ✲ ✳ ✰✷ ✳ ✸ ✵ ✹ ✺ ✱ ✲ ✬ ✬✻ ✰ ✹ ✸ ✰ ✼ ✰ ✸ ✸ ✰ ✽✿✾ ❀ ❁ ❂ ✭✯ ✯ ✹ ✸ ✰ ✯ ✰ ✸ ❃ ✭ ✳ ✲❄✵ ✷ ❅❇❆ ✲ ✼ ✭ ❈ ✬✮✭✯ ✯ ✴✮✭✯ ✵ ✷ ✬ ✺ ✵ ✷ ✰ ✰ ✶ ✲ ✳❊❉ ✳ ✴ ✭ ✳ ✰ ✶ ✲ ✳ ✲ ✯ ✳ ✭ ❋ ✰ ✷ ✱ ✲ ✳ ✴ ❈ ✵ ✷ ✽ ✲ ✳ ✲❄✵ ✷ ✭ ✬ ✹ ✸ ✵ ✻ ✭ ✻ ✲ ✬ ✲ ✳ ✺
✸ ✰❍ ✭ ✸ ✽ ✬ ✰ ✯ ✯ ✵ ✼ ✳ ✴ ✰ ✷ ✰ ✶ ✳ ✵ ✻ ✯ ✰ ✸ ❃ ✭ ✳ ✲ ✵ ✷ ✾ ■ ❏✓❑ ▲▼ ◆ ❖✞P ◗ ❀ ❘ ✼ ✱ ✰ ✳ ✭ ❍ ✭ ✱ ✵ ✸ ✽ ✭✯ ✭ ✹ ✸ ✰❚❙ ✽ ✰ ✳ ✰ ✸ ❯ ✲ ✷ ✰ ✸ ❱ ❲ ❳❨ ❩ ❉ ✳ ✴ ✰ ✷ ✳ ✴ ✰ ✷ ✰ ✶ ✳ ✱ ✵ ✸ ✽ ✱ ✲ ✬ ✬ ✭ ✬ ❯✵ ✯ ✳ ✯ ❬ ✸ ✰ ✬ ✺ ✻ ✰ ✭ ✽ ✰ ✳ ✰ ✸ ❯ ✲ ✷ ✰ ✸ ❱ ❳ ❨ ❩ ✾ ❀ ❲ ✸ ✰ ❃ ✲ ✵ ❬ ✯ ❈ ✬✮✭✯ ✯ ✽ ✰ ✳ ✰ ✸ ❯ ✲ ✷ ✰ ✯ ❈ ❬ ✸ ✸ ✰ ✷ ✳ ❈ ✬✮✭✯ ✯ ✸ ✰ ❍ ✭ ✸ ✽ ✬ ✰ ✯ ✯ ✵ ✼ ✱ ✵ ✸ ✽
slide-101
SLIDE 101

States and Causal Competition

✂ ✄ ☎ ✆ ✝ ✞ ✂ ✟ ✠ ✄ ✠ ✞ ✂✡ ☛ ☞ ✞ ✟ ✆ ☛✍✌ ☎ ✞ ✎ ✂✑✏ ✒ ✓ ✔ ✡ ✂ ✟ ✕ ✡✖ ✆ ✝ ✞ ☞✗ ✆ ✄ ✠ ✂✘ ✝ ✡ ✙ ✌ ✆ ✌ ✚ ✞ ✖ ✒ ✛
✡ ✢ ✆ ☛ ✢ ✠ ✡ ✌ ✠ ✌ ✆ ✣ ✗ ☛ ✡ ✠ ✂ ✠ ✂✘✥✤ ✡ ✎ ✡ ✦ ✛ ✧ ★ ✩✫✪ ✬ ✭ ✮ ✪ ✯ ✰ ✱✫✲ ✳✵✴✶ ✬ ✶ ✷ ✸ ✪ ✱ ✱✺✹ ✩ ✲ ✹ ✻ ✳ ✶ ✳✵✼ ✴ ✷✽ ✪ ✾❀✿ ❁ ❂ ❃✍❄ ❅ ❄ ❆ ❄ ❅❇ ❄ ❄ ❈ ❉ ❊✍❋
  • ■❍
  • ■❍
❏✥❑ ❋ ▲ ❋ ▼ ❄ ◆ ◆ ❄ ❖ P❘◗ ❙ ❚ ❯ ❱✫❲❳ ❨ ❩ ❬ ❭ ❪✵❫ ❴ ❱ ❪ ❬ ❲ ❵ ❙ ❛ ❜ ❳ ❝ ❳ ❞ ❡ ❢ ❬ ❪ ❴ ❲ ❣ ❲ ❫ ❤ ❳ ❢ ❢ ❭ ❜ ❬ ❭ ❣ ✐ ❥ ❪ ❲ ❪✵❦ ❴ ❫ ❨ ❳ ❧❀♠ ♥ ♦■♣ qr s q t ♣ q ✉ ♣ ✈ ✇ ♣① ②■③ ♣ s ④ r ⑤ ⑤⑦⑥ ⑧⑩⑨ ① ④ t ♣❶ ♣ ④❘❷ ⑨ ♣ ⑧ ⑧ ♣ q ④ ❶ ❸

c

❹ ❺

c w

slide-102
SLIDE 102

Example: Observation Bias

  • “All” is usually a DT, not a PDT.
  • “The” is virtually always a DT.
  • The CMM is happy with the (rare) DT-DT

sequence, because having “the” explains the second DT.

✁✄✂ ☎ ✆✄✝ ✂ ✞✠✟ ✞ ✡ ☛ ✡ ☞ ✌ ✍ ✎ ✂ ✏ ✑ ✡ ✒ ✎ ✑ ✓ ✑✔ ☞ ✕ ✑ ✖ ☛ ☛ ✗ ✂ ✝ ✎ ✔ ✘ ✙ ✍ ✚ ✘ ✛ ✍ ✜ ✍ ✢✣ ✤ ✥ ✥ ✦ ✤✧ ✤✧ ★✄✩ ✪✫✬ ✬ ✭ ✪ ✮ ✯✄✰ ✱✲ ✳ ✴✶✵ ✷ ✳ ✸ ✵ ✸ ✵ ✹✺ ✻ ✼ ✼ ✽ ✻ ✯ ✾ ✻ ✯ ✿ ✫✬ ✬ ✭ ✪ ✮ ✯✄✰ ✱✲ ✿ ❀ ❀ ❁ ❀ ❀
slide-103
SLIDE 103

Label Bias?

  • Label exit entropy vs. overproposal rate:
✁ ✂ ✄✆☎ ✝ ✞ ✟ ✠ ✂ ✝✡☞☛ ✌✎✍ ✏✒✑ ✓ ✝ ✟ ✔ ✍ ✕ ✞ ✖ ✟ ☎ ✟ ✓ ✖ ☎ ✔ ✓ ✗ ✂ ✖ ✕ ✔ ✓ ✄ ✓ ✔ ✔ ✓ ✗ ✘ ✞ ✟ ✠ ✓ ✙ ✚ ✚ ✛ ✜ ✢✤✣ ✥✧✦ ★ ✥ ✩ ✣ ✪ ✫ ✩✤✬ ✭✯✮ ✰ ✦ ★ ★ ✣ ✱ ✩ ✪ ✦ ✩✤✲ ✫✳ ✴ ✦ ★ ✪ ✰ ✩ ✮ ✭ ✫✳ ✱ ✦ ✵ ✦ ✣ ✮ ✶ ✱ ✦ ✪✸✷ ✳ ✱ ✳ ✥ ✪ ✦ ✱ ✹ ✣ ✮ ✩ ✳ ✲ ✥ ✩ ✣ ✪ ✫ ✩✤✬ ✭✯✮ ✲ ✳ ✮✻✺ ✼ ✽✿✾ ❀✤❁ ❀❂❃ ❄ ✾ ❃ ❅ ❆✿❇ ❈ ❅❉ ❊ ❂ ❇ ❋ ❋ ❉ ❈ ❈ ❂❃
❉ ■ ❂ ❇ ❂ ❊ ❋ ❁ ❏ ✾ ❃ ❑ ❄ ❂ ❉ ❋ ▲ ❃ ❂
❍ ❉ ❋ ▼ ❉ ❇ ❅ ❆ ❋ ❆ ❈ ❉ ❋ ❂ ✾ ◆
  • ❂❃
■ ❉ ❋ ❆ ✾ ❇ ◆ ❆✿❉
  • ☞❖
  • 0.8
  • 0.6
  • 0.4
  • 0.2

0.2 0.4 0.6 0.8 1 2 3 4 HMM CMM

slide-104
SLIDE 104

CRFs

  • Another sequence model: Conditional Random Fields (CRFs) of (Lafferty et al. 2001).

  • A whole-sequence conditional model rather than a chaining of local models.

$$P(c \mid d, \lambda) = \frac{\exp \sum_i \lambda_i f_i(c, d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}$$

  • The space of c’s is now the space of sequences, and hence must be summed over using dynamic programming.

  • Training is very slow, but CRFs avoid causal-competition biases.

slide-105
SLIDE 105

Model Biases

  • ✁✄✂
☎ ✆ ✂ ✝ ✞✟ ✠✡ ☛ ☞ ✌ ☞ ✌ ✟ ✍ ✎ ☛ ☞✑✏ ☛ ☛ ✍ ✒ ✌ ✓ ✓ ☛ ✍ ✔ ✂ ✕ ✌ ✂ ✎ ✝ ☛ ✆ ✆ ☛ ☛ ✠ ✆ ☞ ✟ ✖ ☛ ✍ ☛ ✕ ✂ ✝ ✝✘✗ ✎ ☛ ✒ ✂ ✕ ✠ ✙ ☎ ✝ ✙ ✟ ✕ ✚✛ ✜✣✢ ✤ ✥ ✦★✧ ✩ ✩✪ ✩ ✫ ✩✭✬ ✮ ✯ ✩✪ ✰ ✫ ✧ ✱ ✲ ✮✳ ✩ ✲ ✳ ✱ ✧ ✴ ✴ ✲ ✳ ✴ ✬ ✵ ✶✸✷ ✹ ✱ ✺ ✲ ✳ ✹ ✻ ✱ ✼ ✮ ✰✽ ✲ ✳ ✴ ✰ ✪ ✾ ✻ ✽ ✱ ✲ ✮✳ ✩ ✲ ✳ ✩ ✿ ✲ ✼ ✱❁❀ ✰ ✪ ✾ ✻ ✽ ✪ ✹ ✧ ✰ ✩ ✲ ✳ ✴ ✬ ❂ ❃❅❄ ❆ ❇❈ ❉ ❊❋
❍❏■ ❑ ❄ ❈ ❄ ❈
  • ❏❋
▲ ❄ ▼ ❇ ❉ ▲ ❇ ■ ❇ ◆❅■ ■ ❖ ❇ ■◗P ❘ ❖ ❉❚❙ ✵ ❯ ✿ ✪ ✷ ✮ ✾ ✪ ✦ ❱ ✲ ✱ ✿ ✱ ✿ ✪ ✯ ✪ ✱ ✱ ✪ ✰ ✼ ✪ ✧ ✱ ✻ ✰ ✪ ✩ ✻ ✩ ✻ ✧ ✦ ✦ ✺ ❱ ✲ ✳ ✩ ✬ ✵ ❲ ✧ ❳ ✪ ✳ ✱ ✷ ✮ ✾ ✪ ✦ ✩ ✧ ✰ ✪ ✪ ✧ ✩ ✺ ✱ ✮ ✩ ✱ ✻ ✼ ✼ ✿ ✻ ✴ ✪ ✳ ✻ ✷ ✯ ✪ ✰ ✩ ✮ ✼ ✳ ✮✳ ❀ ✲ ✳ ✾ ✪ ✹ ✪ ✳ ✾ ✪ ✳ ✱ ✼ ✪ ✧ ✱ ✻ ✰ ✪ ✩ ✲ ✳ ✱ ✮ ✬ ✵ ❯ ✿ ✪ ✩✪ ✪ ✼ ✼ ✪ ✽ ✱ ✩ ✩ ✪ ✪ ✷ ✱ ✮ ✯ ✪ ✦ ✪ ✩ ✩ ✱ ✰ ✮ ✻ ✯ ✦ ✪ ✩ ✮ ✷ ✪ ❱ ✿ ✪ ✳ ✺ ✮ ✻ ✲ ✳ ✽ ✦ ✻ ✾ ✪ ✦ ✮ ✱ ✩ ✮ ✼ ✽ ✮✳ ✾ ✲ ✱ ✲ ✮ ✳ ✲ ✳ ✴ ✽ ✮✳ ✱ ✪ ❳ ✱ ✵ ✥ ✧ ✳ ✧ ✫ ✮ ✲ ✾ ✱ ✿ ✪ ✩ ✪ ✯ ✲✸✧ ✩ ✪ ✩ ❱ ✲ ✱ ✿ ✴ ✦ ✮ ✯ ✧ ✦ ✷ ✮ ✾ ✪ ✦ ✩✭❨ ✯❩✻ ✱ ✱ ✿ ✪ ✪ ✼ ✼ ✲ ✽ ✲ ✪ ✳ ✽ ✺ ✽ ✮ ✩ ✱ ✽ ✧ ✳ ✯ ✪ ✿ ✻ ✴ ✪ ✬
slide-106
SLIDE 106

Part IV: Resources

  • Our Software
  • Other Software Resources
  • References
slide-107
SLIDE 107

Classifier Package

  • Our Java software package:

Classifier interface

General linear classifiers

– Maxent classifier factory

– Naïve-Bayes classifier factory

Optimization

– Unconstrained CG Minimizer

– Constrained Penalty Minimizer

Available at:

http://nlp.stanford.edu/downloads/classifier.shtml
SLIDE 108

Other software sources

  • http://maxent.sourceforge.net/

Jason Baldridge et al. Java maxent model library. GIS.
  • http://www-rohan.sdsu.edu/~malouf/pubs.html

Rob Malouf. Frontend maxent package that uses PETSc library for optimization. GIS, IIS, gradient ascent, CG, limited memory variable metric quasi-Newton technique.

  • http://search.cpan.org/author/TERDOEST/

Hugo WL ter Doest. Perl 5. GIS, IIS.

slide-109
SLIDE 109

Other software non-sources

  • ✁✄✂
✂ ☎ ✆ ✝ ✝ ✞ ✞ ✞✠✟ ✡ ☛✌☞ ✟ ✍ ☎✎ ✏ ✏ ✟ ✎ ✑ ✍ ✝✓✒ ✔ ✑ ✞ ✔ ☛ ✂ ✝ ☞ ✂ ✔ ✂ ✏ ✕ ☎ ✟ ✁✄✂ ✖ ✕ ✗ ✘ ✑ ✞ ✔ ☛ ✂ ✙ ✔ ✂ ✏ ✔ ☎ ✔✚ ✛ ✁ ☛ ✟ ✜ ✔ ✢ ✔ ✣✥✤ ✂ ✎ ✡✦ ✑ ✎ ✧ ✦ ✚ ✖ ✔ ★ ✎ ✏ ✂ ✩ ✪✫ ✂ ✔✬ ✬ ✎ ✚ ✔ ✏ ✑ ☞ ✎ ✏ ✂ ✎ ✏ ✡ ✎ ✣ ✦ ✍ ✏ ✑ ✔ ✚ ✤ ✧ ☛ ✏ ✑ ✎ ✚ ✟ ✭ ✮ ✫ ✟
  • ✁✄✂
✂ ☎ ✆ ✝ ✝ ✞ ✞ ✞✠✟ ✡ ☞ ✟ ☎ ✚ ☛ ✏ ✡ ✎ ✂ ✦ ✏ ✟ ✎ ✑ ✍ ✝ ✒ ✚ ☛ ☞ ✂ ✔ ✑ ✝ ✗ ✯ ✚ ☛ ✡ ✙ ☛ ☞ ✂ ✔ ✑ ✦ ✏ ✡ ✎ ✍ ☎ ✦ ✏ ✔ ✂ ☛ ✖ ✎ ✑ ☛✌☞ ✂ ✚ ☛ ✣ ✍ ✂ ✎ ✑ ✔ ✖ ✔ ★ ✎ ✏ ✂ ✂ ✦ ✦ ✕ ✛ ☛ ✂ ✂ ✦ ✔ ✡ ✡✦ ✖ ☎ ✔ ✏ ✤ ✁ ☛✌☞ ✘ ✰ ✱ ✝ ✯ ✘ ✰ ✱ ✲ ✳ ✳ ✴ ✂ ✍ ✂ ✦ ✚ ☛ ✔ ✕ ✵ ✣ ✍ ✂ ✂ ✁ ✔ ✂ ✞ ✔ ☞ ✖ ✔ ✏ ✤ ✖ ✦ ✦ ✏ ☞ ✔ ✬ ✦ ✟ ✭ ✮ ✫ ✟
  • ✁✄✂
✂ ☎ ✆ ✝ ✝ ✞ ✞ ✞✠✟ ✡ ☞ ✟ ✍ ✖ ✔ ☞ ☞ ✟ ✎ ✑ ✍ ✝✓✒ ✖ ✡ ✡ ✔ ✕ ✕ ✍ ✖ ✝ ✖ ✔ ✕ ✕ ✎ ✂ ✝ ✗ ✘ ✏ ✑ ✚ ✎ ✞ ✶ ✡ ✰ ✔ ✕ ✕ ✍ ✖ ✔ ✏ ✏ ✦ ✍ ✏ ✡ ✎ ✑ ✔ ☎ ✔ ✡ ✛ ✔ ✬ ✎ ✔ ✂ ✷ ✮ ✩ ✫ ✸✹ ✹ ✸ ✂ ✁ ✔ ✂ ☛ ✏ ✡ ✕ ✍ ✑ ✎ ☞ ✔ ✖ ✔ ★ ✎ ✏ ✂ ✡ ✕ ✔ ☞ ☞ ☛ ✧ ☛ ✎ ✚ ✔ ✕✺☞ ✦ ✍ ☞ ☛ ✏ ✬ ✔ ✕ ☛ ✖ ☛ ✂ ✎ ✑ ✖ ✎ ✖ ✦ ✚ ✤ ✻ ✍ ✔ ☞ ☛✽✼ ✾✌✿ ❀ ❁ ❂ ❃ ❂ ❄ ❁ ❅✌❆ ❅❈❇ ❉ ❁ ❅ ❂ ❃ ❁ ✿ ❊ ❋ ❃ ❅✌● ❍ ✿❏■ ❑ ❍ ❁ ▲ ✿ ▼ ❅❈◆ ✿ ❖ P ◗ ✿ ✿ ❆ ◗ ❁ ❂ ❋ ❉ ◆ ✿ ❘ ✿ ✿ ❃ ❙ ▲✺✿ ▼ ❉ P ✿ ▲❚ ■
slide-110
SLIDE 110

References: Optimization/Maxent

  • ✁✄✂
☎ ✆✞✝ ✟✠ ✝ ✟☛✡ ☞ ✌ ✝ ✍ ✎ ✝ ✏ ✑ ✝ ✒ ✒✄✂ ✓ ✔ ✝ ✌ ✟ ✂ ✡ ✂ ✏ ✁ ✕ ✔ ✏✖ ✝ ✏ ✌ ✑✞✝ ✒ ✒ ✂ ✓ ✔ ✝ ✌ ✟ ✂☛✗ ✘ ✙ ✙✚ ✗ ✛
  • ☎✂
✜ ✔ ☎ ✢ ☎ ✝ ✏ ✌ ✟✣ ✍ ✤ ✂ ✍ ✍ ✟ ✣ ✂ ✖ ✎ ✌ ✣ ✏✂ ✌ ✢ ✟ ✂ ✒ ✒✄✂ ✏ ✠ ✢ ✂ ✠ ✝ ✍ ✟ ✣ ✖ ✝ ✥ ✥ ✔ ✏ ✠ ✗ ✦ ✧ ✣ ☎ ✍ ✢ ✌ ✂ ✌ ✔ ✣ ✏ ✂ ✒ ★ ✔ ✏ ✠ ✢ ✔ ✥ ✌ ✔ ✖ ✥ ✗ ✩ ✩ ✗ ✪ ✗ ✑ ✂ ✟ ✟ ✣ ✖ ✎ ✂ ✏ ✁ ✑ ✗ ✫ ✂ ✌ ✖ ✒ ✔ ✬ ✬ ✗ ✘ ✙ ✭ ✩ ✗ ✛ ✮ ✝ ✏ ✝ ✟ ✂ ✒ ✔✞✯ ✝ ✁ ✔ ✌ ✝ ✟ ✂ ✌ ✔✞✰ ✝ ✥ ✖ ✂ ✒ ✔ ✏ ✠ ✬ ✣ ✟ ✒ ✣ ✠ ✱ ✒ ✔ ✏ ✝ ✂ ✟ ☎ ✣ ✁ ✝ ✒ ✥ ✗ ✦
✏ ✗ ✲ ✂ ✌ ✎ ✗ ☞ ✌ ✂ ✌ ✔ ✥ ✌ ✔ ✖ ✥ ✡ ✳ ✴✶✵ ✘ ✳ ✭ ✷ ✱ ✘ ✳ ✸ ✷ ✗ ✪ ✣ ✎ ✏ ★ ✂ ✬ ✬ ✝ ✟ ✌ ✤ ✡ ✹✞✝ ✟ ✏ ✂ ✏ ✁ ✣ ✓ ✝ ✟ ✝ ✔ ✟ ✂ ✡ ✂ ✏ ✁
✁ ✟ ✝ ✺ ✲✞✖ ✧ ✂ ✒ ✒ ✢ ☎ ✗ ✩ ✷ ✷ ✘ ✗ ✛ ✧ ✣ ✏ ✁ ✔ ✌ ✔ ✣ ✏✂ ✒ ✟ ✂ ✏ ✁ ✣ ☎ ✬ ✔ ✝ ✒ ✁ ✥ ✵ ✓ ✟ ✣ ✻ ✂ ✻ ✔ ✒ ✔ ✥ ✌ ✔ ✖ ☎ ✣ ✁ ✝ ✒ ✥ ✬ ✣ ✟ ✥ ✝ ✠ ☎ ✝ ✏ ✌ ✔ ✏ ✠ ✂ ✏ ✁ ✒✄✂ ✻✄✝ ✒ ✔ ✏ ✠ ✥ ✝ ✼ ✢ ✝ ✏ ✖ ✝ ✁✄✂ ✌ ✂ ✗ ✦ ✽ ✏ ✓ ✟ ✣ ✖ ✝ ✝ ✁ ✔ ✏ ✠ ✥ ✣ ✬ ✌ ✎ ✝ ✽ ✏ ✌ ✝ ✟ ✏ ✂ ✌ ✔ ✣ ✏✂ ✒ ✧ ✣ ✏ ✬ ✝ ✟ ✝ ✏ ✖ ✝ ✣ ✏ ✲ ✂ ✖ ✎ ✔ ✏ ✝ ★ ✝ ✂ ✟ ✏ ✔ ✏ ✠ ✾ ✽ ✧ ✲ ★ ✱ ✩ ✷ ✷ ✘ ✿ ✗ ✫✞✣ ✻✄✝ ✟ ✌ ✲ ✂ ✒ ✣ ✢ ✬ ✗ ✩ ✷ ✷ ✩ ✗ ❀
✣ ☎ ✍ ✂ ✟ ✔ ✥ ✣ ✏ ✣ ✬ ✂ ✒ ✠ ✣ ✟ ✔ ✌ ✎ ☎ ✥ ✬ ✣ ✟ ☎ ✂ ✜ ✔ ☎ ✢ ☎ ✝ ✏ ✌ ✟✣ ✍ ✤ ✍ ✂ ✟ ✂ ☎ ✝ ✌ ✝ ✟ ✝ ✥ ✌ ✔ ☎✂ ✌ ✔ ✣ ✏ ✗ ❀ ✽ ✏ ✓ ✟ ✣ ✖ ✝ ✝ ✁ ✔ ✏ ✠ ✥ ✣ ✬ ✌ ✎ ✝ ☞ ✔ ✜ ✌ ✎ ✧ ✣ ✏ ✬ ✝ ✟ ✝ ✏ ✖ ✝ ✣ ✏ ❁✞✂ ✌ ✢ ✟ ✂ ✒ ★ ✂ ✏ ✠ ✢ ✂ ✠ ✝ ★ ✝ ✂ ✟ ✏ ✔ ✏ ✠ ✾ ✧ ✣ ❁ ★ ★ ✱ ✩ ✷ ✷ ✩ ✿ ✗ ✓ ✂ ✠ ✝ ✥ ✳ ✙ ✱ ❂ ❂ ✗ ❃ ✎ ✣ ☎✂ ✥ ✓ ✗ ✲ ✔ ✏ ❄ ✂☛✗ ✩ ✷ ✷ ✘ ✗
✠ ✣ ✟ ✔ ✌ ✎ ☎ ✥ ✬ ✣ ✟ ☎✂ ✜ ✔ ☎ ✢ ☎ ✱ ✒ ✔ ❄ ✝ ✒ ✔ ✎ ✣ ✣ ✁ ✒ ✣ ✠ ✔ ✥ ✌ ✔ ✖ ✟ ✝ ✠ ✟ ✝ ✥ ✥ ✔ ✣ ✏ ✗ ☞ ✌ ✂ ✌ ✔ ✥ ✌ ✔ ✖ ✥ ❃ ✝ ✖ ✎ ✫ ✝ ✍ ✣ ✟ ✌ ✭ ❂ ✸ ✡ ✧ ✲❅ ✗ ✪ ✣ ✟ ✠ ✝ ❁ ✣ ✖ ✝ ✁✄✂ ✒ ✗ ✘ ✙ ✙ ✭ ✗ ✛ ★ ✂ ✟ ✠ ✝ ✱ ✥ ✖ ✂ ✒ ✝ ✢ ✏ ✖ ✣ ✏ ✥ ✌ ✟ ✂ ✔ ✏ ✝ ✁ ✣ ✍ ✌ ✔ ☎ ✔✞✯ ✂ ✌ ✔ ✣ ✏ ✗ ✦ ✽ ✏
❆ ✂ ✌ ✥ ✣ ✏ ✂ ✏ ✁ ✽ ✗ ✑ ✢ ✬ ✬ ✡ ✝ ✁ ✥ ✗ ✡ ❃ ✎ ✝ ☞ ✌ ✂ ✌ ✝ ✣ ✬ ✌ ✎ ✝
✌ ✔ ✏ ❁ ✢ ☎ ✝ ✟ ✔ ✖ ✂ ✒
✂ ✒ ✤ ✥ ✔ ✥ ✡ ✍ ✍ ✴ ✘ ✘ ✱ ✴ ✴ ✸ ✗ ❇ ✜ ✬ ✣ ✟ ✁ ❅ ✏ ✔ ✰ ✝ ✟ ✥ ✔ ✌ ✤ ✓ ✟ ✝ ✥ ✥ ✗
slide-111
SLIDE 111

References: Regularization

Stanley Chen and Ronald Rosenfeld. A Survey of Smoothing Techniques for ME Models. IEEE Transactions on Speech and Audio Processing, 8(1), pp. 37-50. January 2000.
M. Johnson, S. Geman, S. Canon, Z. Chi and S. Riezler. 1999. Estimators for Stochastic "Unification-Based" Grammars. Proceedings of ACL 1999.
slide-112
SLIDE 112

References: Named Entity Recognition

Andrew Borthwick. 1999. A Maximum Entropy Approach to Named Entity Recognition. Ph.D. Thesis. New York University.
Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning. 2003. Named Entity Recognition with Character-Level Models. Proceedings of the Seventh Conference on Natural Language Learning (CoNLL 2003).
slide-113
SLIDE 113

References: POS Tagging

James R. Curran and Stephen Clark (2003). Investigating GIS and Smoothing for Maximum Entropy Taggers. Proceedings of the 11th Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL'03), pp. 91-98, Budapest, Hungary.
Adwait Ratnaparkhi. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996. University of Pennsylvania.
Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70. Hong Kong.
Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. HLT-NAACL 2003.
slide-114
SLIDE 114

References: Other Applications

Tong Zhang and Frank J. Oles. 2001. Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval 4: 5-31.
Ronald Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer, Speech and Language 10, 187-228, 1996.
Adwait Ratnaparkhi. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing. Aug. 1-2, 1997. Brown University, Providence, Rhode Island.
Adwait Ratnaparkhi. Unsupervised Statistical Models for Prepositional Phrase Attachment. In Proceedings of the Seventeenth International Conference on Computational Linguistics, Aug. 10-14, 1998. Montreal.
Andrei Mikheev. 2000. Tagging Sentence Boundaries. NAACL 2000, pp. 264-271.
slide-115
SLIDE 115

References: Linguistic Issues

Léon Bottou. 1991. Une approche théorique de l'apprentissage connexionniste; applications à la reconnaissance de la parole. Ph.D. thesis, Université de Paris XI.
Mark Johnson. 2001. Joint and conditional estimation of tagging and parsing models. In ACL 39, pages 314-321.
Dan Klein and Christopher D. Manning. 2002. Conditional Structure versus Conditional Estimation in NLP Models. 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pp. 9-16.
Andrew McCallum, Dayne Freitag and Fernando Pereira. 2000. Maximum Entropy Markov Models for Information Extraction and Segmentation. ICML.
Riezler, S., T. King, R. Kaplan, R. Crouch, J. Maxwell and M. Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.