Algorithms for NLP CS 11-711 Fall 2020 Lecture 9: CRFs, neural sequence labeling



SLIDE 1

Emma Strubell

Algorithms for NLP

CS 11-711 · Fall 2020

Lecture 9: CRFs, neural sequence labeling

SLIDE 2

Announcements

■ Project 2 released today after class: sequence labeling.
■ Due: October 16.
■ You will implement part-of-speech taggers for English and Norwegian:
  ■ HMM, BiLSTM, and BiLSTM-CRF.
■ Friday’s recitation will be an overview of P2.

SLIDE 8

Recap

■ HMMs: Natural extension of Naive Bayes to sequence labeling.
■ Hard to add rich features of the input, e.g. affixes, capitalization, …
■ Would like to train a discriminative model, like logistic regression, to directly model the conditional probability of labels given inputs.
■ Locally normalized logistic regression taggers (MEMMs) suffer from the label bias problem.
■ Solution: linear-chain CRFs.

SLIDE 14

Conditional random fields (CRFs)

■ Linear-chain CRFs: Globally-normalized discriminative sequence labeling models!

$$P(y \mid w) = \frac{\exp(\Psi(w, y))}{\sum_{y' \in \mathcal{Y}(w)} \exp(\Psi(w, y'))}$$

y' is an entire sequence

decompose into local scores

$$\Psi(w, y) = \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$$\Psi(w, y) = \sum_{m=1}^{M+1} \theta \cdot f(w, y_m, y_{m-1}, m)$$
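To make the global normalization concrete, here is a minimal sketch that computes P(y | w) by enumerating every candidate sequence y'. The two-label tag set, the boundary symbols, and the hand-written scores in `local_score` are all assumptions made for illustration; they are not from the lecture. Brute-force enumeration is only feasible for toy inputs, since the number of sequences grows exponentially with sentence length.

```python
import itertools
import math

LABELS = ["N", "V"]           # hypothetical toy label set
START, STOP = "<s>", "</s>"   # hypothetical boundary labels


def local_score(words, y_m, y_prev, m):
    """Toy stand-in for psi(w, y_m, y_{m-1}, m); the numbers are made up."""
    if y_m == STOP:
        return 0.0
    word = words[m - 1]  # positions m are 1-indexed
    score = 0.0
    if y_m == "V" and word.endswith("s"):
        score += 1.0
    if y_prev == "N" and y_m == "V":
        score += 0.5
    if y_prev == START and y_m == "N":
        score += 0.5
    return score


def global_score(words, labels):
    """Psi(w, y): sum of local scores for m = 1 .. M+1."""
    padded = [START] + list(labels) + [STOP]
    return sum(local_score(words, padded[m], padded[m - 1], m)
               for m in range(1, len(padded)))


def crf_probability(words, labels):
    """P(y | w), normalizing over every label sequence y' of the same length."""
    z = sum(math.exp(global_score(words, y_prime))
            for y_prime in itertools.product(LABELS, repeat=len(words)))
    return math.exp(global_score(words, labels)) / z


if __name__ == "__main__":
    sentence = ["Janet", "runs"]
    for y in itertools.product(LABELS, repeat=len(sentence)):
        print(y, round(crf_probability(sentence, y), 3))
```

Because the denominator sums over whole sequences rather than over labels at a single position, the probabilities of all candidate sequences for a sentence sum to one.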

SLIDE 24

Conditional random fields (CRFs)

5

Ψ(w, y) =

M+1

X

m=1

θ · f(w, ym, ym−1, m)

<latexit sha1_base64="6QgzZybjFg9uG/YPephMOPdGJY8=">AHcnicnVTdbts2FZTr+68nzbdXvDLghirV5gdQXWmwJFt4vdNPMAp+kQOgJFUTYdUhJIyolBcO+5J9gL7AFGUrJrOW4CTICp43PO953vHFJMSkalGg7/vrd3v/PFg+7DL3tf3Nt48e7z/5ItKYHKC1aIjwmShNGcnCqGPlYCoJ4wshZcvmLi58tiJC0yMdqWZIJR9OcZhQjZV3x/v1/eodwhpRWJo4ucvAGQCSmHF3HWjmHAaO+NwDkNAVXzgx7hzfTYCYQ1qO+z/C5PhA28NCsYqHZBf8sHfYxMBG0OLUhTXANo1jTV9E5mL3CXSJot64O3ZNmtFqumPkQn/xyw+cW8P5PKwM7Si1j156Gr6mk8GRh+Wy4T0NkXOTxHpu+kWsnFbEyhlao2TFV6gmcgvUtTk2Gy2Oa710Q9qVkzY3F5q+mJtBrfTS/XV6w2a+61w6WPfiaP1xgOS67EM1IwoBiNCgcxBndQG+2agYxyqTt4ghSezKQmHE9J/G7KQ9GtxB6vSOa7Vn2xJHkvbPBmC8Xp8BGxtsFkctBFHjvgTvr0R+sTA0obu7hn0PAdMuL6yQfdemhXbWg73pO9fRMYl1BMwzQicJzNrhmXM3aK5ZR8AHsaPD4bHQ/+Am0bUGAdB84zi/T0B0wJXnOQKMyTleTQs1UQjoShmxPRgJUmJ8CWaknNr5ogTOdH+SjLg0HpSkBXC/nIFvHcToRGXcskTm+lmK7djzrkrdl6p7PVE07ysFMlxXSirGFAFcPcbSKkgWLGlNRAW1GoFeIbsTit7C9qd2igzI2xBVLsRzCdaZr56S1LCTRt8XTfag4Lk5AoXnKM8/UHDHKlinJUMWUO03Zyt41r0G6oKVsRremZETBQtApzRFjJFPQLW23fc0U9GsP/krsBgny3qr+vSQCqUJYJfWHbeyGTeFzd6eY2zJpvs60Zrst7QXYZtxcipLk2tRfBCskgclUFXZEnwD74VaApTZbajzSRtWZ9hTGm2fyZvGh5fH0U/HL/94dfD2XNeHwbPgu+DfhAFPwdvg9+CUXAa4M5JR3VM568H/3afdp93m8O9d6/BfBe0nu7gPxLUizA=</latexit>

[Figure: sparse local feature vector f(w, VB, MD, 3), also written f(w_1, w_2, w_3, w_4, VB, MD), for the sentence "Janet will back the bill" (tags so far: <s> NNP MD, candidate VB?). The vector is drawn as 1 … 1 … … 1 …, with example indicator features w_{m-1} = will, w_{m-1} = my, w_m = back, w_{m+1} = ache, and y_{m-1} = MD over the context window w_{m-2}, w_{m-1}, w_m, w_{m+1}. Size annotations: K + F and (K − 1)(K + F); labels: "# features", "# labels".]

■ Linear-chain CRFs: Globally-normalized discriminative sequence labeling models!
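
The local feature vector f(w, y_m, y_{m-1}, m) from the "Janet will back the bill" example can be sketched as a dictionary of sparse indicator features. This is a minimal illustration only: the function names, the feature-name strings, and the choice of templates are assumptions, and the linear form of the local score (weights dotted with features) is the usual CRF convention rather than something this slide spells out.

```python
def local_features(words, cur_tag, prev_tag, m):
    """Sparse indicator features f(w, y_m, y_{m-1}, m); m is a 1-based position.

    The templates are illustrative assumptions, not the lecture's exact feature set.
    """
    padded = ["<s>"] + list(words) + ["</s>"]  # so that w_{m-1} and w_{m+1} always exist
    return {
        f"w_m={padded[m]}&y_m={cur_tag}": 1.0,
        f"w_m-1={padded[m - 1]}&y_m={cur_tag}": 1.0,
        f"w_m+1={padded[m + 1]}&y_m={cur_tag}": 1.0,
        f"y_m-1={prev_tag}&y_m={cur_tag}": 1.0,
    }


def local_score(weights, words, cur_tag, prev_tag, m):
    """Local score as a dot product between a weight dict and the active features."""
    feats = local_features(words, cur_tag, prev_tag, m)
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


words = "Janet will back the bill".split()
feats = local_features(words, "VB", "MD", 3)  # f(w, VB, MD, 3)
print(sorted(feats))
```

Only the features that fire are stored, so inactive indicators such as w_{m-1} = my never appear in the dictionary.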

slide-25
SLIDE 25

Conditional random fields (CRFs)

6

■ Decoding: Direct application of Viterbi!

$$P(y \mid w) = \frac{\exp(\Psi(w, y))}{\sum_{y' \in \mathcal{Y}(w)} \exp(\Psi(w, y'))}$$
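
To make the global normalization concrete, the probability above can be computed by brute force on a tiny example, enumerating every label sequence in Y(w). This is a sketch under stated assumptions: the tag set, the EMIT and TRANS score tables, and the function names are invented for illustration, and enumeration is only feasible for very short inputs.

```python
import itertools
import math

TAGS = ("NNP", "MD", "VB")  # toy label set (assumption)

# Hypothetical local scores, invented for illustration only.
EMIT = {("Janet", "NNP"): 2.0, ("will", "MD"): 1.5, ("will", "NNP"): 0.5, ("back", "VB"): 1.0}
TRANS = {("NNP", "MD"): 1.0, ("MD", "VB"): 1.2}


def psi(words, tags):
    """Toy global score Psi(w, y): a sum of emission-like and transition-like scores."""
    total = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        total += EMIT.get((word, tag), 0.0) + TRANS.get((prev, tag), 0.0)
        prev = tag
    return total


def crf_probability(words, tags):
    """P(y | w) = exp(Psi(w, y)) / sum over all y' in Y(w) of exp(Psi(w, y'))."""
    all_scores = [psi(words, cand) for cand in itertools.product(TAGS, repeat=len(words))]
    top = max(all_scores)  # log-sum-exp shift for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in all_scores))
    return math.exp(psi(words, tags) - log_z)


words = ["Janet", "will", "back"]
probs = {cand: crf_probability(words, cand) for cand in itertools.product(TAGS, repeat=len(words))}
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
print(round(sum(probs.values()), 6))  # probabilities over whole sequences sum to 1
```

The denominator ranges over whole label sequences rather than over the labels at a single position, which is what "globally normalized" refers to.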
slide-32
SLIDE 32

Conditional random fields (CRFs)

6

■ Decoding: Direct application of Viterbi!

$$P(y \mid w) = \frac{\exp(\Psi(w, y))}{\sum_{y' \in \mathcal{Y}(w)} \exp(\Psi(w, y'))}$$

$$
\begin{aligned}
\hat{y} &= \operatorname*{argmax}_{y} \log P(y \mid w) \\
&= \operatorname*{argmax}_{y} \Psi(w, y) - \underbrace{\log \sum_{y' \in \mathcal{Y}(w)} \exp \Psi(w, y')}_{\text{same for all settings of } y} \\
&= \operatorname*{argmax}_{y} \Psi(w, y) \\
&= \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \underbrace{\psi(w, y_m, y_{m-1}, m)}_{= s_m(y_m, y_{m-1})}
\end{aligned}
$$
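
The key step in this derivation, that the log-normalizer is the same for every y and so drops out of the argmax, can be checked numerically on a toy problem. The sketch below assumes randomly drawn local scores s_m(y_m, y_{m-1}), a three-label tag set, and explicit start and stop tags for the m = 1 and m = M + 1 terms; all names are hypothetical.

```python
import itertools
import math
import random

random.seed(0)
LABELS = ["NNP", "MD", "VB"]  # toy label set (assumption)
M = 4                         # toy sentence length
START, STOP = "<s>", "</s>"

# Hypothetical local scores s_m(y_m, y_{m-1}), drawn at random for illustration.
local = {}
for m in range(1, M + 2):
    prevs = [START] if m == 1 else LABELS
    curs = [STOP] if m == M + 1 else LABELS
    for prev in prevs:
        for cur in curs:
            local[(m, cur, prev)] = random.gauss(0.0, 1.0)


def global_score(y):
    """Psi(w, y) as a sum of M + 1 local scores, padding y with start and stop tags."""
    padded = [START] + list(y) + [STOP]
    return sum(local[(m, padded[m], padded[m - 1])] for m in range(1, M + 2))


candidates = list(itertools.product(LABELS, repeat=M))
# The log-normalizer is computed once: it does not depend on which y is scored.
log_z = math.log(sum(math.exp(global_score(y)) for y in candidates))

by_log_prob = max(candidates, key=lambda y: global_score(y) - log_z)
by_score = max(candidates, key=global_score)
print(by_log_prob == by_score)
```

Decoding therefore never needs the normalizer, which is what makes a direct application of Viterbi possible; the normalizer only matters when actual probabilities are required.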
slide-34
SLIDE 34

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \underbrace{\psi(w, y_m, y_{m-1}, m)}_{= s_m(y_m, y_{m-1})}$$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$
slide-35
SLIDE 35

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$\psi(w, y_m, y_{m-1}, m) = s_m(y_m, y_{m-1})$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M
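
To make the objective concrete, here is a minimal brute-force sketch of the argmax above. It assumes the local scores have already been computed and stored in three hypothetical tables that are not part of the slides: `start[k]` for the first position, `trans[m][k][j]` for tag `k` following tag `j`, and `end[k]` for the final transition into the end symbol. It enumerates every tag sequence, which is only feasible for tiny inputs.

```python
import itertools
import random

def sequence_score(y, start, trans, end):
    """Psi(w, y): the sum of the M+1 local scores s_m(y_m, y_{m-1})."""
    total = start[y[0]]                          # score of the first tag after the start symbol
    for m in range(1, len(y)):
        total += trans[m - 1][y[m]][y[m - 1]]    # score of tag y[m] following tag y[m-1]
    return total + end[y[-1]]                    # score of the end symbol following the last tag

def brute_force_decode(start, trans, end):
    """Enumerate all K**M tag sequences; return (best score, best sequence)."""
    K, M = len(start), len(trans) + 1
    return max((sequence_score(y, start, trans, end), y)
               for y in itertools.product(range(K), repeat=M))

# Toy problem with random scores (illustrative data only): K = 3 tags, M = 4 words.
random.seed(0)
K, M = 3, 4
start = [random.gauss(0, 1) for _ in range(K)]
trans = [[[random.gauss(0, 1) for _ in range(K)] for _ in range(K)]
         for _ in range(M - 1)]
end = [random.gauss(0, 1) for _ in range(K)]
best_score, best_tags = brute_force_decode(start, trans, end)
```

The enumeration costs K^M score evaluations, which is the motivation for replacing it with dynamic programming.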
slide-36
SLIDE 36

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$\psi(w, y_m, y_{m-1}, m) = s_m(y_m, y_{m-1})$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M

$${} = \max_{y_M} s_{M+1}(\langle /s \rangle, y_M) + \max_{y_{1:M-1}} \sum_{m=1}^{M} s_m(y_m, y_{m-1})$$
slide-37
SLIDE 37

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$\psi(w, y_m, y_{m-1}, m) = s_m(y_m, y_{m-1})$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M

$${} = \max_{y_M} s_{M+1}(\langle /s \rangle, y_M) + \max_{y_{1:M-1}} \sum_{m=1}^{M} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M-1
slide-38
SLIDE 38

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$\psi(w, y_m, y_{m-1}, m) = s_m(y_m, y_{m-1})$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M

$${} = \max_{y_M} s_{M+1}(\langle /s \rangle, y_M) + \max_{y_{1:M-1}} \sum_{m=1}^{M} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M-1

score of most probable extension $y_M$

slide-39
SLIDE 39

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

$$\hat{y} = \operatorname*{argmax}_{y} \sum_{m=1}^{M+1} \psi(w, y_m, y_{m-1}, m)$$

$\psi(w, y_m, y_{m-1}, m) = s_m(y_m, y_{m-1})$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{y_{1:M}} \sum_{m=1}^{M+1} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M

$${} = \max_{y_M} s_{M+1}(\langle /s \rangle, y_M) + \max_{y_{1:M-1}} \sum_{m=1}^{M} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M-1

score of most probable extension $y_M$

same subproblem!
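
The repeated subproblem can be written as a recurrence. The Viterbi variable $v_m(k)$ and the start symbol $\langle s \rangle$ below are notation assumed for this sketch, not taken from the slide; $v_m(k)$ denotes the score of the best tag prefix of length $m$ that ends in tag $k$.

$$v_1(k) = s_1(k, \langle s \rangle), \qquad v_m(k) = \max_{k'} \left[ s_m(k, k') + v_{m-1}(k') \right], \quad m = 2, \dots, M$$

$$\max_{y_{1:M}} \Psi(w, y_{1:M}) = \max_{k} \left[ s_{M+1}(\langle /s \rangle, k) + v_M(k) \right]$$

With $K$ tags, each $v_m(k)$ takes $O(K)$ work, so filling the whole table costs $O(MK^2)$ rather than the $O(K^M)$ of exhaustive search.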

slide-40
SLIDE 40

Conditional random fields (CRFs)

7

■ Decoding: Direct application of Viterbi!

= argmax

y M+1

X

m=1

ψ(w, ym, ym−1, m)

<latexit sha1_base64="DlaVeuBnBAV5S8rb4kmojlkndes=">AFvniclVPdbtMwFM4KhRH+NrjkxjBVa9mPmg0JbiZNgBDSNFEk9oPmUjmO01qz48h2tlZWXoSn4RbegLfBTtquWQtiluqcHn/nO985Pg5TRpVut38v1W7drt+5u3zPv/g4aPHK6tPjpXIJCZHWDAhT0OkCKMJOdJUM3KaSoJ4yMhJeP7OnZ9cEKmoSL7oUq6HPUTGlOMtHX1Vmu7fgN2FG3CkJvLfBO47yhvgT0AVcYho5xq1TN8L8i/mcONIHcAqAdEoxAHAldhMT5lGHU424zfCuw/3jLb3SaJSvkNAIlrEgQS4QNJMO0uUhCKzcVCYV73SalCYAc6QFGzHyd5G3lfyFad0w3LjKdQc8X1DzYClrNA7ABPrSA5R4gXcrLc8eJZJ+jYW/qgkz0waQJYLYLfmMRfCJ1NJbq2rVsNy4IVdc61dk/nxSR/zfDeitrLW328UC80YwNta8erYMZMwEjJNGYIaXOgnaquwZJTEjuQ8zRVKEz1GfnFkzQZyorinGOwcN64lALKT9JRoU3tkIg7hSIx5apOuCun7mnIvOzjIdv+kamqSZJgkuE8UZA1oA91ZARCXBmo2sgbCkVivA2SHVtsXZa9Js2AsAuiq4Vg3jUqLrJXJIU8rwYPy0J9KElCLrHgHCXRSwNjxCkbRSRGdPuNcQTe1G/NqMLmqpx6aUjGgoJO3TBDFGYg3dVnXbz0DYvfhe2IvSJDq/pTSiTSQlol5bTk9sL68Dl05r+QNJkirVktyxQCbDGuLyIlicnL2WVCERj2pcjSiuC5+EKoJUCxvYST6phJcJOaXB9JueN453tYHd75/Ortf2343ld9p5L7ymF3ivX3vo9fxjxc+17UftZ+1Xfr8d1XhcltLY0jnqVZ9+AfhCPx+</latexit>

= sm(ym, ym−1)

<latexit sha1_base64="HUJof5rs93v8xuaMgJAQitvYE=">AGNXiclVPbhMxEN02BEq4tfDIi6GKmtCLsgWpCKlSuQghVRVBohdUp5HX602s2uV7W0bWf4hfoBv4YE3xCu/gHc3SbNQNRSvJPxzJnj45kgYVTpVuv73HzlRvXmrYXbtTt3791/sLj08ECJVGKyjwUT8ihAijAak31NSNHiSIB4wcBqdvs/PDMyIVFfFnPUhIh6NeTCOKkXau7tL81odthVtwICbc7sGsu/ANsE2gCrlkFOteoavu3bE7O36tsAOo+0cgCiEOh85TIjhEGXZ5thq/7h9v1urtRoEKOQ1BEZYXiCTCBpKLpDGLQtOaEoXcveK0hAjnQfI2a+jOo27V+AVjKka18ymYievlBjd91vNnbBKnjfBA67j3RBz9oME8keRxfdsQsy0QMjEcCkCrX6rPAR1cGQaibXeo5ybUEusVYuwWZVzZD/X4Ft1eWNSa/DhBNwXeO/2nOg06IXJ7n0Lv5aVbuLy62NVr7AtOEPjWVvuNquyUMBU45iTVmSKljv5XojkFSU8yIrcFUkQThU9Qjx86MESeqY/KpsqDuPCGIhHS/WIPcO5lhEFdqwAMXmYmvrp5lzlnx6mOXnYMjZNUkxgXhaKUAS1ANqIgpJgzQbOQFhSxXgPnKzot0gO50nyvQJOyO6fBHMO0ZFefUSpYDbcvJFcdEalCQm51hwjuLwmYER4pQNQhKhlOlsCKORPUuvtfCMJmo3RiSEQ2FpD0aI8ZIpG2ld3u09cw32vwHXEPJMmeY/0xIRJpIR2Tokete7AefJL1i/1XJI3Hkc4sX8vkBNxlMl1EQmJji5FhQhEY9KRIkxLhqfycqANAkXuGIp6U04oI16X+1Z6cNg42N/znG5ufXizvBn264L32HvqNTzf2/J2vA9e29v3cGWpslXZqbyufqv+qP6s/ipC5+eGOY+80qr+/gO+OCZh</latexit>

max

y1:M Ψ(w, y1:M) = max y1:M M+1

X

m=1

sm(ym, ym−1)

<latexit sha1_base64="cFZeFhMRSODTiOyXYHFD+C/9s=">AGOHiclVPdbtMwFM5WCiP8bcAdN4apWst+1Awk0KRJE5sQ0jRJPaD5hI5jtNas5PIdrZVlh+JF+BNuOMOcsT4CRt16wFaZbqnB5/5zufj8JUkalard/zM3XbtVv31m46967/+Dho8Wlx0cyQmhzhiTgJkCSMxuRQUcXISoI4gEjx8HZbn5+fE6EpEn8WQ1S0uWoF9OIYqSsy1+a/+Y2YEfSJgy4vjBrIP8OTAtsAygzDhnlVElf823PfNUHq57JAVD1iUIGQBwmqgiJzJh4PN803zds/94y210miUr5DQEJaxIEAmENSXaXOWhJbRFQmFe8UmpTGAHKk+Rkx/GeVtmX8QreRMN75kOoGevlBzf91rNfBKnjfApa7j1Qpz5icE4keR5f+2AVZ0gOjIoDJKriNWfCR1MFQal6u9YLlxgW54lq5IpuVNWe+QW2pc+bk+6WCyfofO1tHVjS6aKXJ0XpZ+CviZhK4i8utzfaxQLThjc0lp3h6tgGFzBMcMZJrDBDUp567VR1NRKYkaMCzNJUoTPUI+cWjNGnMiuLgbLgIb1hCBKhP3FChTeyQiNuJQDHlhkXn95/Sx3zjo7zVT0tqtpnGaKxLhMFGUMqATkUwpCKghWbGANhAW1WgHuIzsuys6ybiJNH3CzomqXgTzrpZRkb0iKeCmGnxZXtSFgsTkAiecozh8qWGEOGWDkEQoYyqfw2hkz6rXWnhOUzks3ZiSEQUTQXs0RoyRSMF8q7rtp69gsbtwj9gHEuTAqv6YEoFUIqySsk2NfbAefJ63jPkfksZjpDWr19KFAHuZvC5JSmJtyqlhiSQw6IkSyuCp+ILoZYARfYZSjyphpUI26Xe9Z6cNo42N7xXG5ufXi/vBv264LzHnhNB3PePsOB+cjnPo4NrT2lZt7ZX/17/Wf9V/1C5+eGMU+cyqr/+Qtf4ih5</latexit>

ˆ y

score of best tag sequence of length M

$$= \max_{y_M} \; s_{M+1}(\langle/s\rangle, y_M) + \max_{y_{1:M-1}} \sum_{m=1}^{M} s_m(y_m, y_{m-1})$$

score of best tag sequence of length M-1

score of most probable extension y_M

same subproblem!

ψ includes transition and emission/unary scores
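The recurrence above can be sketched in Python. This is a minimal illustration, not the lecture's or Project 2's reference implementation. It assumes each local score splits as s_m(y_m, y_{m-1}) = trans[y_{m-1}][y_m] + emit[m][y_m], with separate start and stop scores; the names `viterbi`, `sequence_score`, `emit`, `trans`, `start`, and `stop` are hypothetical.

```python
import itertools


def sequence_score(tags, emit, trans, start, stop):
    """Total score of one tag sequence under the assumed decomposition."""
    total = start[tags[0]] + emit[0][tags[0]]
    for m in range(1, len(tags)):
        total += trans[tags[m - 1]][tags[m]] + emit[m][tags[m]]
    return total + stop[tags[-1]]


def viterbi(emit, trans, start, stop):
    """Return (best_score, best_tags) by dynamic programming."""
    num_words, num_tags = len(emit), len(start)
    # v[k]: score of the best tag sequence for the words so far that ends in tag k
    v = [start[k] + emit[0][k] for k in range(num_tags)]
    backpointers = []
    for m in range(1, num_words):
        new_v, ptr = [], []
        for k in range(num_tags):
            # Same subproblem: extend the best shorter sequence by one tag
            best_prev = max(range(num_tags), key=lambda j: v[j] + trans[j][k])
            new_v.append(v[best_prev] + trans[best_prev][k] + emit[m][k])
            ptr.append(best_prev)
        v = new_v
        backpointers.append(ptr)
    # Final step: add the score of moving into the stop symbol
    last = max(range(num_tags), key=lambda k: v[k] + stop[k])
    best_score = v[last] + stop[last]
    # Follow backpointers from the last tag to recover the sequence
    tags = [last]
    for ptr in reversed(backpointers):
        tags.append(ptr[tags[-1]])
    tags.reverse()
    return best_score, tags


# Toy scores (made up for illustration): 3 words, 2 tags
emit = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
trans = [[0.5, -1.0], [-0.1, 0.2]]
start = [0.0, -0.5]
stop = [0.3, 0.0]

best_score, best_tags = viterbi(emit, trans, start, stop)
print(best_score, best_tags)
```

The dynamic program costs O(M K^2) for M words and K tags, whereas scoring every labeling with `sequence_score` would cost O(K^M).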

slide-45
SLIDE 45

Learning in CRFs

8

■ As with logistic regression, weights θ are learned by minimizing negative log likelihood:

$$\ell = -\sum_{i=1}^{N} \log P(y^{(i)} \mid w^{(i)}; \theta)$$

$$= \sum_{i=1}^{N} \left[ -\theta \cdot f(w^{(i)}, y^{(i)}) + \log \sum_{y' \in \mathcal{Y}(w^{(i)})} \exp\left( \theta \cdot f(w^{(i)}, y') \right) \right]$$

(inner sum: sum over all possible labelings)

■ Can be computed efficiently using forward algorithm.
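To make the objective concrete, here is a small Python sketch that evaluates the negative log likelihood of one sentence by literally enumerating every labeling y'. It is an illustration under assumptions: θ · f(w, y) is taken to be a sum of hypothetical start, transition, emission, and stop scores, and the names `sequence_score` and `neg_log_likelihood` are made up for this example.

```python
import itertools
import math


def sequence_score(tags, emit, trans, start, stop):
    """theta . f(w, y), written as a sum of local scores (assumed decomposition)."""
    total = start[tags[0]] + emit[0][tags[0]]
    for m in range(1, len(tags)):
        total += trans[tags[m - 1]][tags[m]] + emit[m][tags[m]]
    return total + stop[tags[-1]]


def neg_log_likelihood(gold_tags, emit, trans, start, stop):
    """-log P(y | w) for one sentence, summing over every possible labeling."""
    num_tags = len(start)
    all_scores = [
        sequence_score(y, emit, trans, start, stop)
        for y in itertools.product(range(num_tags), repeat=len(emit))
    ]
    # log-sum-exp, with the max subtracted for numerical stability
    top = max(all_scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in all_scores))
    return -sequence_score(gold_tags, emit, trans, start, stop) + log_z


# Toy scores (made up for illustration): 3 words, 2 tags
emit = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
trans = [[0.5, -1.0], [-0.1, 0.2]]
start = [0.0, -0.5]
stop = [0.3, 0.0]

print(neg_log_likelihood((0, 1, 0), emit, trans, start, stop))
```

Enumeration visits K^M labelings for K tags and M words, so it is only usable on toy inputs; that cost is the reason an efficient way to compute the log-sum term is needed.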

slide-53
SLIDE 53

Learning in CRFs

9

■ Likelihood can be computed efficiently using forward algorithm. ■ As in decoding / Viterbi, can be decomposed into recursive substructure:

Define:

= X

ym−1

(exp sm(ym, ym−1)) × αm−1(ym−1)

<latexit sha1_base64="S+C2pUnFb9NeQcjrf+w7RtILE54=">AJH3iclVXdbts2FazJfW8v2a73A27IjcNJmVDtjQIkCxDcOAIlsGLG2H0BVoibKJkpRAUmkMQW/RF9jT7G7YbZ9lNzskZceylS0RYJs+PN/3nR8ecVxwps1w+O7Oxnvb27d7X3Q/Cjz/59N72Z891XqEniU5z9XLMdGUM0nPDOcviwUJWLM6Yvx6+/t/osLqjTL5W9mVtCRIBPJMpYQA6Z4e/Ntfxefahbisaje1A+R/Z3VA3SMsC4F5kwo+NKHEf1q+pkP6qtAzZTakiNcJLmxkGyesEwi4X9qsRBP/EoL97GnpWLFiKvJsTyBRJKkwvi7ArhEFdtUJw5j0QZRJhQcw0Ibz6fa47qK8h2rNMt06yWPJeTyh8dhANwmdoH/04QMA9JcaHV9eWk6iJIJfxwoR5PkHzIqDlKvR3u9znoc6aUG25DhzLrQtyxbV3RdalaplvUYFjHYtw2Qw2vMQXV9HjE2Bdr7rfcbXv8F+JokNlFQcoqIV1DjEncsLpVxort3Awq7W/rgVkbUuLUw5h0APvCOzj+3mvmqCtmg1VJveYL8iLh4W3D0P/Pj8fNi+X8ugWu6D81f730D8yeA08yES6pO9HpNmBes2GRqbP6EF1Pi6zKflqsqsfCVdGKuA1pMxRIxIA0hVSLprWCS1Uns5xlqcLe7MgbsJ07KFNfyF7fS9sG6wFbwPldI3JDTgfTFCNmuScNVwcvPjezvBw6B60voiaxU7QPKfx9obCaZ6UgkqTcKL1eTQszKgiyrCE07qPS0LkrwmE3oOS0lAfFS5+6NGu2BJUZYr+EiDnHUZURGh9UyMwdMeNb26Z41de+elyb4dVUwWpaEy8UJZyZHJkb2MUMoUTQyfwYIkikGsKJkSuBUMXFnQ+SWZKeUX1LQTScSo0plTb4U0FnUbfOkT7WNFJX2T5EIQmT6ocEYE47OUZqTkxl432XzdVa+H6QUrdFO6BSWnBucwMEwSbkfNzVvb7MbJD1Uf/0ChQYqeQNS/FQRkyuIxL+Na2jYBN+3L6v6vzyZXHjCsp1W5QKAZGxd8oLKqvYvCJ5riscTlZdFK+A1vAsUCEgGbfD+tA3zHnBKo9Uzub54fnQYPTo8+vXrnafNe1F3wRfBmEQR8EzwNfgpOg7Mg2fxn6/7Wg6393h+9P3t/9f72rht3GsznQevpvfsXiMUzJw=</latexit>

Sum instead of max: Viterbi is a special case of the max-product algorithm, forward is a special case of the sum-product algorithm.

$$
\begin{aligned}
\alpha_m(y_m) &= \sum_{y_{1:m-1}} \exp \sum_{n=1}^{m} s_n(y_n, y_{n-1}) \\
&= \sum_{y_{1:m-1}} \prod_{n=1}^{m} \exp s_n(y_n, y_{n-1}) \\
&= \sum_{y_{m-1}} \bigl(\exp s_m(y_m, y_{m-1})\bigr) \sum_{y_{1:m-2}} \prod_{n=1}^{m-1} \exp s_n(y_n, y_{n-1})
\end{aligned}
$$
slide-54
SLIDE 54

Learning in CRFs

9

■ Likelihood can be computed efficiently using forward algorithm.
■ As in decoding / Viterbi, can be decomposed into recursive substructure:

Define:

$$
\begin{aligned}
\alpha_m(y_m) &= \sum_{y_{1:m-1}} \exp \sum_{n=1}^{m} s_n(y_n, y_{n-1}) \\
&= \sum_{y_{1:m-1}} \prod_{n=1}^{m} \exp s_n(y_n, y_{n-1}) \\
&= \sum_{y_{m-1}} \bigl(\exp s_m(y_m, y_{m-1})\bigr) \sum_{y_{1:m-2}} \prod_{n=1}^{m-1} \exp s_n(y_n, y_{n-1}) \\
&= \sum_{y_{m-1}} \bigl(\exp s_m(y_m, y_{m-1})\bigr) \times \alpha_{m-1}(y_{m-1})
\end{aligned}
$$

Sum instead of max: Viterbi is a special case of the max-product algorithm, forward is a special case of the sum-product algorithm.

$$
v_m(k) = \bigoplus_{k' \in \mathcal{Y}} s_m(k, k') \otimes v_{m-1}(k').
$$
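The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function names (`chain_recursion`, `brute_force`), the score layout, and the toy numbers are all assumptions made for the example. Scores are kept in log space, so the ⊗ operation is addition, and the ⊕ operation is passed in as a parameter: `logsumexp` gives the forward algorithm (log partition function), `max` gives the Viterbi score.

```python
import math
from itertools import product

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def chain_recursion(scores, oplus):
    """Run v_m(k) = oplus_{k'} [ s_m(k, k') + v_{m-1}(k') ] in log space.

    Assumed layout: scores[0][k] is the score of starting in tag k;
    scores[m][k][kp] is s_m(k, kp) for positions m >= 1.
    """
    v = list(scores[0])
    for s_m in scores[1:]:
        v = [oplus([s_m[k][kp] + v[kp] for kp in range(len(v))])
             for k in range(len(s_m))]
    return oplus(v)

def brute_force(scores, oplus):
    """Same quantity by enumerating every tag sequence (exponential time)."""
    K = len(scores[0])
    totals = []
    for ys in product(range(K), repeat=len(scores)):
        total = scores[0][ys[0]]
        for m in range(1, len(scores)):
            total += scores[m][ys[m]][ys[m - 1]]
        totals.append(total)
    return oplus(totals)

# Toy example: 3 positions, 2 tags (numbers are arbitrary).
scores = [
    [0.5, -1.0],
    [[1.0, 0.0], [-0.5, 2.0]],
    [[0.3, 0.7], [1.2, -0.4]],
]
log_z = chain_recursion(scores, logsumexp)   # forward: sum-product
best_score = chain_recursion(scores, max)    # Viterbi: max-product
```

The only difference between the two calls is the ⊕ argument, which is the sense in which forward and Viterbi are the same recursion over different semirings.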
slide-63
SLIDE 63

Learning in CRFs

10

■ As in logistic regression, gradient of the likelihood w.r.t. parameters is difference between observed and expected feature counts:

$$
\frac{d\ell}{d\theta_j} = \sum_{i=1}^{N} E[f_j(w^{(i)}, y)] - f_j(w^{(i)}, y^{(i)})
$$

$f_j(w^{(i)}, y^{(i)})$: count of feature j for token sequence $w^{(i)}$, tag sequence $y^{(i)}$

■ Gradients can be computed by automatic differentiation!
■ In the Olden Days, would use the forward-backward algorithm to compute expected counts.

[Figure: trellis edge from $Y_{m-1} = k'$ to $Y_m = k$, scored as $\alpha_{m-1}(k') \exp s_m(k, k') \beta_m(k)$]

$\alpha_{m-1}(k')$: forward score: sum over all prefixes
$\exp s_m(k, k')$: transition score
$\beta_m(k)$: backward score: sum over all suffixes
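The edge score above, divided by the partition function, gives the pairwise marginal that the expected counts are built from. The following is a minimal pure-Python sketch under assumed conventions: the names `forward_backward` and `expected_transition_counts`, the score layout, and the toy numbers are illustrative, not taken from the lecture. It computes α and β in log space and then the expected number of times each tag transition occurs. With one indicator feature per transition, these would be the expected-count term of the gradient, assuming ℓ is the negative log-likelihood so that the sign is expected minus observed as in the formula.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_backward(scores):
    """Return (log Z, pairwise marginals).

    Assumed layout: scores[0][k] is the score of starting in tag k;
    scores[m][k][kp] is s_m(k, kp) for positions m >= 1.
    pairwise[m][k][kp] = P(Y_m = k, Y_{m-1} = kp | w).
    """
    M, K = len(scores), len(scores[0])
    # Forward scores: log-sum over all prefixes ending in tag k at position m.
    alpha = [list(scores[0])]
    for m in range(1, M):
        alpha.append([
            logsumexp([scores[m][k][kp] + alpha[m - 1][kp] for kp in range(K)])
            for k in range(K)
        ])
    # Backward scores: log-sum over all suffixes following tag kp at position m.
    beta = [[0.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        beta[m] = [
            logsumexp([scores[m + 1][k][kp] + beta[m + 1][k] for k in range(K)])
            for kp in range(K)
        ]
    log_z = logsumexp(alpha[-1])
    pairwise = {
        m: [[math.exp(alpha[m - 1][kp] + scores[m][k][kp] + beta[m][k] - log_z)
             for kp in range(K)] for k in range(K)]
        for m in range(1, M)
    }
    return log_z, pairwise

def expected_transition_counts(pairwise, K):
    """Sum pairwise marginals over positions: E[count of transition kp -> k]."""
    counts = [[0.0] * K for _ in range(K)]
    for table in pairwise.values():
        for k in range(K):
            for kp in range(K):
                counts[k][kp] += table[k][kp]
    return counts

# Toy example: 3 positions, 2 tags (numbers are arbitrary).
scores = [
    [0.5, -1.0],
    [[1.0, 0.0], [-0.5, 2.0]],
    [[0.3, 0.7], [1.2, -0.4]],
]
log_z, pairwise = forward_backward(scores)
expected = expected_transition_counts(pairwise, K=2)
```

Subtracting the observed transition counts of the gold tag sequence from `expected` would give the gradient entries for the transition features under these assumptions.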

slide-66
SLIDE 66

Better features for sequence labeling?

11

■ Until now: hand-engineered features:

Word-level features:
wi contains a particular prefix (from all prefixes of length ≤ 4)
wi contains a particular suffix (from all suffixes of length ≤ 4)
wi contains a number
wi contains an upper-case letter
wi contains a hyphen
wi is all upper case
wi’s word shape
wi’s short word shape
wi is upper case and has a digit and a dash (like CFC-12)
wi is upper case and followed within 3 words by Co., Inc., etc.

Feature templates:
Lexical: f_{i±{0,1,2,3}}, (m_{i−2,i−1}), (m_{i−1,i}), (m_{i−1,i+1}), (m_{i,i+1}), (m_{i+1,i+2}), (m_{i−2,i−1,i}), (m_{i−1,i,i+1}), (m_{i,i+1,i+2}), (m_{i−2,i−1,i+1}), (m_{i−1,i+1,i+2})
POS: p_{i−{3,2,1}}, a_{i+{0,1,2,3}}, (p_{i−2,i−1}), (a_{i+1,i+2}), (p_{i−1}, a_{i+1}), (p_{i−2}, p_{i−1}, a_i), (p_{i−2}, p_{i−1}, a_{i+1}), (p_{i−1}, a_i, a_{i+1}), (p_{i−1}, a_{i+1}, a_{i+2})
Affix: c_{:1}, c_{:2}, c_{:3}, c_{n:}, c_{n−1:}, c_{n−2:}, c_{n−3:}
Binary: initial uppercase, all uppercase/lowercase, contains 1/2+ capital(s) not at the beginning, contains a (period/number/hyphen)

[Figure: sparse indicator feature vector $f(w, y_m, y_{m-1}, m)$ = (1 … 1 … … 1 …) for the sentence "Janet will back the bill ." with entries labeled w_{m-1} = will, w_{m-1} = my, w_m = back, w_{m+1} = ache, y_{m-1} = MD]
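As a rough illustration of what such hand-engineered word-level features look like in code, here is a small Python sketch. The function names and feature-string formats are hypothetical choices for this example, and the context-dependent feature (followed within 3 words by Co., Inc., etc.) is left out because it needs the surrounding tokens.

```python
import re

def word_shape(word):
    """Map upper-case letters to X, lower-case to x, digits to d; keep the rest."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)

def short_word_shape(word):
    """Word shape with runs of the same symbol collapsed to one."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))

def word_features(word, max_affix=4):
    """Return the set of active binary features for a single word."""
    feats = set()
    for n in range(1, min(max_affix, len(word)) + 1):
        feats.add(f"prefix={word[:n]}")
        feats.add(f"suffix={word[-n:]}")
    has_digit = any(ch.isdigit() for ch in word)
    if has_digit:
        feats.add("has-number")
    if any(ch.isupper() for ch in word):
        feats.add("has-upper")
    if "-" in word:
        feats.add("has-hyphen")
    if word.isupper():
        feats.add("all-upper")
    if word.isupper() and has_digit and "-" in word:
        feats.add("upper-digit-dash")
    feats.add(f"shape={word_shape(word)}")
    feats.add(f"short-shape={short_word_shape(word)}")
    return feats

feats_cfc = word_features("CFC-12")
feats_back = word_features("back")
```

Each string in the returned set corresponds to one dimension of a very large, mostly-zero feature vector, which is where the sparsity of this approach comes from.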
slide-69
SLIDE 69

Better features for sequence labeling?

12

■ Until now: hand-engineered features:

[Figure: sparse indicator feature vector $f(w, y_m, y_{m-1}, m)$ = (1 … 1 … … 1 …) for the sentence "Janet will back the bill ." with entries labeled w_{m-1} = will, w_{m-1} = my, w_m = back, w_{m+1} = ache, y_{m-1} = MD]

pros:
■ interpretable, explainable
■ can generalize well
■ fast training and inference
■ channel domain knowledge

cons:
■ can be sparse/high variance
■ lack of shared representations
■ task-specific
■ worse performance
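To make the sparse feature vector concrete, here is a small Python sketch of a local feature function and the resulting linear score θ · f. It is an illustration under stated assumptions: the feature names, the choice to conjoin every indicator with the current tag, the boundary symbols, and the weights in `theta` are all hypothetical.

```python
def local_features(words, y_m, y_prev, m):
    """Sparse f(w, y_m, y_{m-1}, m): a dict holding only the non-zero entries."""
    prev_word = words[m - 1] if m > 0 else "<s>"
    next_word = words[m + 1] if m + 1 < len(words) else "</s>"
    names = [
        f"w[m-1]={prev_word}",
        f"w[m]={words[m]}",
        f"w[m+1]={next_word}",
        f"y[m-1]={y_prev}",
    ]
    # Conjoin each indicator with the current tag (an assumption of this sketch).
    return {f"{name}&y[m]={y_m}": 1.0 for name in names}

def local_score(theta, feats):
    """s_m = theta . f, touching only the features that are active."""
    return sum(theta.get(name, 0.0) * value for name, value in feats.items())

words = "Janet will back the bill .".split()
feats = local_features(words, y_m="VB", y_prev="MD", m=2)

# Made-up weights; features missing from theta contribute zero.
theta = {
    "w[m-1]=will&y[m]=VB": 1.5,
    "y[m-1]=MD&y[m]=VB": 2.0,
    "w[m]=back&y[m]=NN": 0.7,
}
score = local_score(theta, feats)
```

Storing only the active entries keeps scoring cheap, but every distinct word or tag pair gets its own independent weight, which is the lack of shared representations listed among the cons.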

slide-72
SLIDE 72

. . . . . .

Neural sequence labeling

13

■ Parameterize f with a (deep) neural network.

f(w, ym, ym−1, m) =

<latexit sha1_base64="y2cxWKwz3S3xVgA5ofaWNPquJR0=">AKNXicrVXdbts2FbTra69v6a73A23Ii8NJmVDejQwUD2iwFtgxY2g6hK9ASZbMlKYGk0xiCXmgvsGfZxe6G3e4VdkjKjhUra4JNgO3jw3O+7/zpcFxwps1g8PutjdtvHmnc7fbe+vtd959797m/Sc6n6mEniQ5z9WzMdGUM0lPDOcPisUJWLM6dPxy6/t+dMzqjTL5c9mXtCRIBPJMpYQA6p486vW18rFmIx6J8VT1A9nde9dEQYT0TmDPBjI5LMYyq5+XRblRZA2ym1JAK4STNjXPJqiXCPBb2qxR7EfwT/d72cehRsWAp8maOIFMkKTE9L8K2EPpV2QjBqXeAlEmEBTHThPDylwVv7oCaMci3TjJYsV6PaHw8V7UDx+jXfRdHwH2lBgfXlVZTKImgpzHSxXm+QtioBWq9DbjNfhDqvQ7Xl2nMoNy7IBdbOBVgbq0W+QWGOhbhqhp0eAUvLqNHR4C6XnV/4mrfYn8pihaWy37gBbWwxiHmRE4/URj5QTnZrl217kArMnWxoUp5xDonjdk1vCHRjOflyHrN1rqNV8g/4q4eBvu6DXvj/dfFMv/cwlc0X1o/nrvazc/AZxmJlxhdaRXc8L7ghWbTI3Nn/BiSnxdFm/LRXZI+GqaEncgbQZCqRjCQ7SFVIum9bqWqg8XfhZnDbf6wVxHaShd637C9l7u0t768THDQJnN1/JIEBYJqVGfntOHK5PndmFJuiJ3CaiG7HsYv3J7hsOfTpcoOyeqgfXuaxS9ap6o/gnVy1WE9O9cM4X8kfM01gobxva3B/sA9aF2IamErqJ/jeHND4TRPZoJKk3Ci9Wk0KMyoJMqwhNOqh2eaFiR5Sb0FERJoCOj0t2qFdoGTYqyXMFHGuS0qx4lEVrPxRgs7QuoL59ZdvZ6cxkn49KJouZoTLxRNmMI5Mje0WjlCmaGD4HgSKQawomRJohoGLHPqyQjOl/IyaZiKJGJU6c+yNkMaiajqf+0R7WFJXyW5ESmH5c4I4LxeUozMuPGXsLZQm6r14P0jBW6Lt0SklODc1gjTBJuF5DbQk21WzJ+1fTwNxQapOgRP1jQRUxuYJI/B1VQcMm+EO7wqt/s2RyaQliM63SBQDJ2LrkBZVl5dcmzXF4nKZ0Uj4DV/FygAkAza4O1p081bwJRGl2dyXhysB9un/w02dbh1/V83o3+CD4KAiDKHgYHAbfB8fBSZB0NjsPO4edL7u/df/o/tn9y5tu3Kp93g8aT/fvfwDACJWS</latexit>

pros: cons:

■ shared representations ■ channel external knowledge (e.g.

word embeddings)

■ high accuracy

slide-73
SLIDE 73

Neural sequence labeling

■ Parameterize f with a (deep) neural network.

$f(\mathbf{w}, y_m, y_{m-1}, m) =$ [neural network diagram]

pros:

■ shared representations
■ channel external knowledge (e.g. word embeddings)
■ high accuracy

cons:

■ hard to interpret feature meaning, explain predictions
■ optimization/hyperparameters
■ prone to overfitting
■ compute-heavy

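slide-78
SLIDE 78

Neural sequence labeling

[Figure: the tokens "<s> Janet will back the bill </s>" are mapped to word embeddings; a neural network turns these into per-token features, from which each token is tagged: Janet/NNP, will/MD, back/VB?, the/DT?, bill/NN?]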

slide-83
SLIDE 83

Neural sequence labeling

Bidirectional RNNs

[Figure: word embeddings for "<s> Janet will back the bill </s>" feed a forward RNN and a backward RNN; the two states at each token are concatenated into per-token features, and a softmax over those features scores the candidate tags (VB, NN, ADJ, PRP, WH, MD, DT).]
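
The bidirectional tagger in the figure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses a simple tanh RNN rather than an LSTM, random untrained weights, and the hypothetical helper names `rnn_states`, `birnn_tag_probs`, and `rand_matrix`. It shows the data flow only: forward pass, backward pass, concatenate, softmax.

```python
import math
import random

def rnn_states(embs, W_x, W_h, reverse=False):
    """Run a simple tanh RNN over embs and return one hidden state per token."""
    order = range(len(embs) - 1, -1, -1) if reverse else range(len(embs))
    h = [0.0] * len(W_h)
    states = [None] * len(embs)
    for i in order:
        # New state from the current embedding and the previous state in this direction.
        h = [math.tanh(sum(wx * x for wx, x in zip(W_x[j], embs[i]))
                       + sum(wh * hp for wh, hp in zip(W_h[j], h)))
             for j in range(len(W_h))]
        states[i] = h
    return states

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def birnn_tag_probs(embs, fwd, bwd, W_out):
    """Concatenate forward and backward states, then apply a softmax over tags."""
    f = rnn_states(embs, *fwd)
    b = rnn_states(embs, *bwd, reverse=True)
    probs = []
    for hf, hb in zip(f, b):
        feats = hf + hb  # per-token features: [forward state; backward state]
        probs.append(softmax([sum(w * x for w, x in zip(row, feats)) for row in W_out]))
    return probs

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy setup (assumed sizes, untrained random parameters).
rng = random.Random(0)
d_word, d_hid = 4, 3
tags = ["NNP", "MD", "VB", "DT", "NN"]
sentence = ["<s>", "Janet", "will", "back", "the", "bill", "</s>"]
embs = [[rng.uniform(-1, 1) for _ in range(d_word)] for _ in sentence]
fwd = (rand_matrix(d_hid, d_word, rng), rand_matrix(d_hid, d_hid, rng))
bwd = (rand_matrix(d_hid, d_word, rng), rand_matrix(d_hid, d_hid, rng))
W_out = rand_matrix(len(tags), 2 * d_hid, rng)

probs = birnn_tag_probs(embs, fwd, bwd, W_out)  # one tag distribution per token
```

Because the backward RNN starts at `</s>`, the features for "Janet" already depend on every later token, which a forward-only RNN cannot provide.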

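slide-86
SLIDE 86

Neural sequence labeling

Bidirectional RNNs

[Figure: a bidirectional RNN tagger (word embeddings, forward RNN, backward RNN, per-token features) over "<s> Janet will back the bill </s>", predicting Janet/NNP, will/MD, back/VB?, the/DT?, bill/NN?; the previous tag $t_{i-1}$ is marked on the diagram.]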

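slide-90
SLIDE 90

Neural sequence labeling

Bidirectional RNN-CRFs

[Figure: word embeddings for "<s> Janet will back the bill </s>" feed a forward RNN and a backward RNN to produce per-token features; a CRF over the output tags scores the sequence Janet/NNP, will/MD, back/VB?, the/DT?, bill/NN?]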
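
One way to see what the CRF layer adds: decoding picks the best tag sequence under per-token scores plus tag-to-tag transition scores, instead of taking an independent argmax at each token. Below is a minimal Viterbi sketch. The function name `viterbi` and all numbers are made up for illustration; in a real BiRNN-CRF the emission scores would come from the network and the transitions would be learned.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence and its score.

    emissions[m][y]  : score of tag y at position m (e.g. from per-token features)
    transitions[p][y]: score of moving from tag p to tag y
    """
    n_tags = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []
    for em in emissions[1:]:
        new_best, ptrs = [], []
        for y in range(n_tags):
            p = max(range(n_tags), key=lambda q: best[q] + transitions[q][y])
            new_best.append(best[p] + transitions[p][y] + em[y])
            ptrs.append(p)
        best = new_best
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag.
    y = max(range(n_tags), key=lambda q: best[q])
    score = best[y]
    path = [y]
    for ptrs in reversed(backptrs):
        y = ptrs[y]
        path.append(y)
    path.reverse()
    return path, score

tags = ["MD", "VB", "NN"]
# Hypothetical scores for the two tokens "will back".
emissions = [[2.0, 0.0, 1.5],   # "will": MD slightly ahead of NN
             [0.0, 1.0, 1.2]]   # "back": NN slightly ahead of VB
transitions = [[-2.0, 1.0, -1.0],   # from MD: MD->VB is rewarded
               [-1.0, -1.0, 0.5],   # from VB
               [0.5, 0.0, 0.0]]     # from NN

path, score = viterbi(emissions, transitions)
greedy = [tags[max(range(len(tags)), key=lambda y: e[y])] for e in emissions]
print([tags[i] for i in path], score)  # ['MD', 'VB'] 4.0
print(greedy)                          # ['MD', 'NN']
```

The per-token argmax tags "back" as NN, while the sequence-level decode prefers MD VB because the transition score rewards a verb after a modal.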

slide-96
SLIDE 96

Convolutional neural networks

■ In NLP, CNNs merge information across contiguous, fixed-width spans of tokens.
■ Unlike in computer vision, in NLP we use 1D CNNs.
■ For sentence/document classification: apply a pooling function over the token representations.
■ For example: sum, average. Most common: max pooling (over time).

[Figure: the sentence "wait for the video and do n't rent it" as an n x k representation with static and non-static channels → convolutional layer with multiple filter widths and feature maps → max-over-time pooling → fully connected layer with dropout and softmax output.]

Figure from: Yoon Kim. Convolutional Neural Networks for Sentence Classification. EMNLP 2014.

slide-100
SLIDE 100

Convolutional neural networks

■ In NLP, CNNs merge information across contiguous, fixed-width spans of tokens.

[Figure: a convolution with kernel size $k = 3$ over the tokens "<s> Janet will back the bill". Each token embedding has dims $[d_{word}]$; the $k$ embeddings in a window form a vector of dims $[k d_{word}]$, which the filter matrix $\Theta^{(x \to z)}$ of dims $[k d_{word} \times d_z]$ maps to an output of dims $[d_z]$, where $d_z$ is the number of filters.]

slide-105
SLIDE 105

Convolutional neural networks

■ In NLP, CNNs merge information across contiguous, fixed-width spans of tokens.

[Figure: a single filter matrix $\Theta^{(x \to z)}$, dims $[k d_{word} \times d_z]$, is applied at every position of "<s> Janet will back the bill" with stride = 1, mapping each window of dims $[k d_{word}]$ (built from token embeddings of dims $[d_{word}]$) to an output of dims $[d_z]$, where $d_z$ is the number of filters.]
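
The dimensions on this slide can be checked with a small sketch of a 1D convolution over token embeddings. It assumes zero padding at the sentence edges, an odd kernel size, and no bias or nonlinearity; the function name `conv1d` and the toy filter values are illustrative only.

```python
def conv1d(embs, theta, k=3, stride=1):
    """1D convolution over a sequence of token embeddings.

    embs : n vectors of size d_word
    theta: matrix of shape (k * d_word) x d_z, one column per filter
    Returns one d_z-dimensional vector per window.
    """
    d_word = len(embs[0])
    d_z = len(theta[0])
    pad = [[0.0] * d_word] * (k // 2)  # zero padding so output length == input length
    padded = pad + embs + pad
    out = []
    for start in range(0, len(padded) - k + 1, stride):
        # Concatenate the k embeddings in the window: dims [k * d_word].
        window = [x for vec in padded[start:start + k] for x in vec]
        # Multiply by theta: dims [k * d_word] x [k * d_word, d_z] -> [d_z].
        out.append([sum(window[i] * theta[i][j] for i in range(len(window)))
                    for j in range(d_z)])
    return out

# Toy example: d_word = 2, k = 3, d_z = 2 filters.
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Filter 0 sums every entry in the window; filter 1 copies the first
# dimension of the centre token (index 2 of the concatenated window).
theta = [[1.0, 1.0 if i == 2 else 0.0] for i in range(6)]

out = conv1d(embs, theta)
print(out)  # [[2.0, 1.0], [4.0, 0.0], [3.0, 1.0]]
```

Every window is independent of the others, which is why all positions can be computed in parallel.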

slide-117
SLIDE 117

Sequence labeling w/ CNNs

[Figure: a CNN encodes all tokens in parallel and tags them: Nobel/B-ORG committee/I-ORG awards/O Strickland/B-PER who/O advanced/O optics/O]

■ Used for semantic role labeling, with poor results [Collobert et al. 2011].
■ Not enough context: amount of context grows linearly w/ number of layers.

slide-129
SLIDE 129

Sequence labeling w/ dilated CNNs

■ Additional parameter: dilation width δ
■ Context window grows exponentially w/ number of layers.

[Figure: stacked dilated convolutions with δ=1, δ=2, δ=4, then δ=1, over the tokens and tags Nobel/B-ORG committee/I-ORG awards/O Strickland/B-PER who/O advanced/O optics/O]
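
The linear versus exponential growth of context can be checked with a little arithmetic. For stacked width-k convolutions with stride 1, each layer with dilation δ widens the receptive field by (k - 1)·δ tokens. The sketch below assumes k = 3 and dilations that double per layer; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(dilations, k=3):
    """Input tokens visible to one output position after stacked width-k convolutions."""
    return 1 + (k - 1) * sum(dilations)

# Ordinary CNN: dilation 1 in every layer -> linear growth.
plain = [receptive_field([1] * n) for n in range(1, 5)]
# Dilated CNN: dilation doubles per layer -> exponential growth.
dilated = [receptive_field([2 ** i for i in range(n)]) for n in range(1, 5)]

print(plain)    # [3, 5, 7, 9]
print(dilated)  # [3, 7, 15, 31]
print(receptive_field([1, 2, 4, 1]))  # 17, for the delta = 1, 2, 4, 1 stack in the figure
```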

slide-137
SLIDE 137

Sequence labeling w/ dilated CNNs

■ Why use a (dilated) CNN over a (bidirectional) LSTM?
■ Efficiency (on GPUs). Representations for every token in the sequence can be computed in parallel for CNN; linear dependence on sequence length for LSTM.

NER F1          sentence     document
Bi-LSTM-CRF     90.4 ± 0.1   90.6 ± 0.2
ID-CNN (ours)   90.3 ± 0.3   90.7 ± 0.2

Annotations on the table: 14x speed-up, 8x speed-up

slide-143
SLIDE 143

Character embeddings

■ Character-level representations of words help to deal with UNKs.
■ Usually, CNNs + pooling are used to compose characters into word embeddings.

[Figure, CNN + max pooling: the characters of "Playing", with padding on both sides → char embedding → convolution → max pooling → char representation]
[Figure, bidirectional LSTM]

Images from: Ma and Hovy. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. ACL 2016.
Lample et al. Neural Architectures for Named Entity Recognition. NAACL 2016.
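
A rough sketch of the CNN + max pooling route follows. It rests on several assumptions: random untrained character embeddings and filters, zero vectors for padding and for unknown characters, non-empty words, and the hypothetical function name `char_cnn_word_vector`. The point is that any string, including a word never seen in training, gets a fixed-size vector.

```python
import random

def char_cnn_word_vector(word, char_emb, filters, k=3):
    """Embed characters, pad, convolve with width-k filters, max-pool over positions."""
    d_char = len(next(iter(char_emb.values())))
    zero = [0.0] * d_char
    rows = ([zero] * (k // 2)
            + [char_emb.get(c, zero) for c in word]   # unknown chars -> zero vector
            + [zero] * (k // 2))
    # Each window concatenates k neighbouring character embeddings.
    windows = [[x for vec in rows[i:i + k] for x in vec]
               for i in range(len(rows) - k + 1)]
    # One feature per filter: its maximum response over all character positions.
    return [max(sum(w * x for w, x in zip(f, win)) for win in windows)
            for f in filters]

rng = random.Random(0)
d_char, k, n_filters = 4, 3, 5
char_emb = {c: [rng.uniform(-1, 1) for _ in range(d_char)]
            for c in "abcdefghijklmnopqrstuvwxyz"}
filters = [[rng.uniform(-1, 1) for _ in range(k * d_char)] for _ in range(n_filters)]

vec_short = char_cnn_word_vector("play", char_emb, filters)
vec_long = char_cnn_word_vector("playing", char_emb, filters)
```

Both vectors have `n_filters` entries regardless of word length, so they can be concatenated with (or used in place of) a word embedding.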

slide-155
SLIDE 155

Multilingual part-of-speech tagging

■ Many UNKs in morphologically rich languages like Czech, Hungarian, Turkish
■ 250,000 word corpus of Hungarian has > 2x as many types as a similarly sized corpus of English
■ 10 million word corpus of Turkish contains 4x as many types
■ Information coded in morphology

partilerindeydi: "he/she/they(sing) were/was at their(plur) party"
  party + their + in + she/he/they(sing) was
partisindeydiler: "they(plur) were at his/her/their(sing) party"
  party + his/her/their(sing) + in + they(plur) were

1. Yerdeki izin temizlenmesi gerek.
   iz + Noun+A3sg+Pnon+Gen
   "The trace on the floor should be cleaned."

slide-164
SLIDE 164

Multilingual part-of-speech tagging

29

■ In languages written without spaces between words, like Chinese, word segmentation is either applied before tagging or performed jointly.

■ UNKs are difficult: the majority of unknown words are common nouns and verbs, due to compounding.

[Figure: character-based bidirectional RNN-CRF for joint segmentation and POS tagging with IOBES tagging. Input characters 夏 天 太 热 ("summer", "too", "hot") → character representations → forward RNN (GRU) and backward RNN (GRU) → CRF layer predicting B-NT E-NT S-AD S-VA → output 夏天/NT 太/AD 热/VA.]

Figure from: Shao et al. Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. IJCNLP 2017.

姚明进入总决赛 "Yao Ming reaches the finals"

CTB: 姚明 进入 总决赛 (YaoMing reaches finals)

Peking U.: 姚 明 进入 总 决赛 (Yao Ming reaches overall finals)
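
The joint formulation in the figure can be sketched as a conversion between segmented, POS-tagged words and per-character IOBES tags. This is an illustrative sketch only, not the code of Shao et al. The function names are hypothetical, and the decoder assumes a well-formed tag sequence in which every word ends in an E- or S- tag.

```python
def words_to_char_tags(words):
    """Convert [(word, pos), ...] into [(char, 'X-POS'), ...] using IOBES prefixes."""
    tagged = []
    for word, pos in words:
        if len(word) == 1:
            tagged.append((word, f"S-{pos}"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                prefix = "B"                   # first character of a word
            elif i == len(word) - 1:
                prefix = "E"                   # last character of a word
            else:
                prefix = "I"                   # word-internal character
            tagged.append((ch, f"{prefix}-{pos}"))
    return tagged


def char_tags_to_words(tagged_chars):
    """Recover [(word, pos), ...] from per-character tags (assumes valid input)."""
    words, buffer = [], ""
    for ch, tag in tagged_chars:
        prefix, pos = tag.split("-", 1)
        buffer += ch
        if prefix in ("E", "S"):               # a word boundary closes here
            words.append((buffer, pos))
            buffer = ""
    return words


sentence = [("夏天", "NT"), ("太", "AD"), ("热", "VA")]
char_tags = words_to_char_tags(sentence)
print(char_tags)
print(char_tags_to_words(char_tags))
```

Because each character tag carries both a boundary prefix and a POS label, a single sequence labeler such as a BiRNN-CRF can predict segmentation and tags at once.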
slide-166
SLIDE 166

Multilingual part-of-speech tagging

30

■ Universal POS tags [Petrov et al. 2012] provide a cross-lingual tag set.

Language    Source                                                # Tags
Arabic      PADT/CoNLL07 (Hajič et al., 2004)                     21
Basque      Basque3LB/CoNLL07 (Aduriz et al., 2003)               64
Bulgarian   BTB/CoNLL06 (Simov et al., 2002)                      54
Catalan     CESS-ECE/CoNLL07 (Martí et al., 2007)                 54
Chinese     Penn ChineseTreebank 6.0 (Palmer et al., 2007)        34
Chinese     Sinica/CoNLL07 (Chen et al., 2003)                    294
Czech       PDT/CoNLL07 (Böhmová et al., 2003)                    63
Danish      DDT/CoNLL06 (Kromann et al., 2003)                    25
Dutch       Alpino/CoNLL06 (Van der Beek et al., 2002)            12
English     PennTreebank (Marcus et al., 1993)                    45
French      FrenchTreebank (Abeillé et al., 2003)                 30
German      Tiger/CoNLL06 (Brants et al., 2002)                   54
German      Negra (Skut et al., 1997)                             54
Greek       GDT/CoNLL07 (Prokopidis et al., 2005)                 38
Hungarian   Szeged/CoNLL07 (Csendes et al., 2005)                 43
Italian     ISST/CoNLL07 (Montemagni et al., 2003)                28
Japanese    Verbmobil/CoNLL06 (Kawata and Bartels, 2000)          80
Japanese    Kyoto4.0 (Kurohashi and Nagao, 1997)                  42
Korean      Sejong (http://www.sejong.or.kr)                      187
Portuguese  Floresta Sintá(c)tica/CoNLL06 (Afonso et al., 2002)   22
Russian     SynTagRus-RNC (Boguslavsky et al., 2002)              11
Slovene     SDT/CoNLL06 (Džeroski et al., 2006)                   29
Spanish     Ancora-Cast3LB/CoNLL06 (Civit and Martí, 2004)        47
Swedish     Talbanken05/CoNLL06 (Nivre et al., 2006)              41
Turkish     METU-Sabanci/CoNLL07 (Oflazer et al., 2003)           31

Table from: Petrov, Das and McDonald. A Universal Part-of-Speech Tagset. LREC 2012.
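
A cross-lingual tag set is usually applied through a many-to-one mapping from each treebank's own tags to the coarse tags. The sketch below illustrates the idea for a hand-picked subset of Penn Treebank tags. The dictionary is an illustrative assumption, not the full published mapping, and `to_universal` is a hypothetical helper name.

```python
# Illustrative subset only: several fine-grained PTB tags collapse onto one coarse tag.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN",
    "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "DT": "DET",
    "IN": "ADP",   # simplification: treats every IN token as an adposition
    "PRP": "PRON",
    "CD": "NUM",
}


def to_universal(tagged_sentence, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Map [(word, fine_tag), ...] to [(word, coarse_tag), ...].

    Tags missing from the mapping fall back to "X" (the catch-all category).
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged_sentence]


sentence = [("the", "DT"), ("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB")]
print(to_universal(sentence))
```

With one such table per treebank, taggers trained on tag sets of very different sizes (11 tags for Russian, 294 for Sinica Chinese in the table above) can be compared on a shared label space.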

slide-169
SLIDE 169

Multilingual part-of-speech tagging

30

■ Universal POS tags [Petrov et al. 2012] provide a cross-lingual tag set.
■ Coarse-grained: 17 tags
■ Finer-grained analysis split off into morphological tags (case, gender, …)

Open class:   ADJ, ADV, INTJ, NOUN, PROPN, VERB
Closed class: ADP, AUX, CCONJ, DET, NUM, PART, PRON, SCONJ
Other:        PUNCT, SYM, X

[Figure: annotated example sentences in English, Bulgarian, Czech, and Swedish.]

Example from: https://universaldependencies.org/
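
To illustrate how the finer-grained analysis is kept apart from the coarse tag, here is a small sketch that parses a feature string in the `Name=Value|Name=Value` style used by Universal Dependencies files, where `_` marks an empty field. The function name `parse_feats` and the example token are hypothetical, chosen only for illustration.

```python
def parse_feats(feats: str) -> dict:
    """Parse 'Case=Gen|Number=Sing' into {'Case': 'Gen', 'Number': 'Sing'}."""
    if not feats or feats == "_":   # "_" denotes "no features"
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hypothetical token: the coarse tag stays NOUN, the detail lives in the features.
token = {"form": "izin", "upos": "NOUN", "feats": "Case=Gen|Number=Sing"}
features = parse_feats(token["feats"])
print(token["upos"], features)
```

Keeping case, number, gender and similar distinctions in a separate feature map lets the coarse tag set stay small and comparable across languages.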

slide-170
SLIDE 170

Announcements

31

■ Project 2 released today after class: sequence labeling.
■ Due: October 16.
■ You will implement part-of-speech taggers for English and Norwegian:
■ HMM, BiLSTM, and BiLSTM-CRF.
■ Friday’s recitation will be an overview of P2.