Emma Strubell
Algorithms for NLP
CS 11-711 · Fall 2020
Lecture 3: Nonlinear text classification

Announcements
■ Project 1: Text classification will be available after class today, due Friday September 25.
■ Han will lead recitation this … setup (Python, Jupyter, NumPy, PyTorch).
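Representing text as a bag of words
■ Text $w = (w_1, w_2, \ldots, w_T) \in \mathcal{V}^*$; label $y \in \mathcal{Y}$.
■ Example: w = The drinks were strong but the fish tacos were bland
■ The bag-of-words vector x counts how often each vocabulary word occurs in w, ignoring word order: here $x_{\text{the}} = 2$, $x_{\text{were}} = 2$, $x_{\text{but}} = x_{\text{bland}} = 1$, and $x_{\text{aardvark}} = 0$.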
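A minimal sketch of the bag-of-words mapping for the example sentence, assuming lowercasing and whitespace tokenization (real tokenizers also handle punctuation). The variable `vocab` is a hypothetical vocabulary built from this one sentence only.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Map a text to word counts. Assumption: lowercase, split on whitespace."""
    return Counter(text.lower().split())

x = bag_of_words("The drinks were strong but the fish tacos were bland")
print(x["the"], x["were"], x["bland"], x["aardvark"])  # 2 2 1 0 (missing words count as 0)

# Dense vector over a (hypothetical) vocabulary: one entry per word type.
vocab = sorted(x)
vec = [x[w] for w in vocab]
```

A `Counter` is a sparse representation: words that never occur take no space, which matters when |V| is in the hundreds of thousands.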
Linear classification on bag-of-words
■ Predict the label with the highest score: $\hat{y} = \arg\max_{y} \psi(x, y)$.
■ Then:
$$\psi(x, y) = \theta \cdot f(x, y) = \sum_{j} \theta_j \times f_j(x, y),$$
where θ is a vector of weights, and f is a feature function.
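The score and argmax above can be sketched with sparse features. This assumes a hypothetical feature function with one feature per (word, label) pair, and made-up weights; only nonzero features are stored, so the sum $\sum_j \theta_j f_j(x, y)$ skips the zeros.

```python
from collections import Counter

def features(x: Counter, y: str) -> dict:
    # Hypothetical feature function: feature (word, y) has value x_word for label y,
    # and every feature belonging to another label is 0 (so it is simply not stored).
    return {(word, y): count for word, count in x.items()}

def psi(theta: dict, x: Counter, y: str) -> float:
    # psi(x, y) = theta . f(x, y) = sum_j theta_j * f_j(x, y), over nonzero features only
    return sum(theta.get(j, 0.0) * v for j, v in features(x, y).items())

def predict(theta: dict, x: Counter, labels: list) -> str:
    # y_hat = argmax_y psi(x, y)
    return max(labels, key=lambda y: psi(theta, x, y))

theta = {("bland", "negative"): 1.0, ("strong", "positive"): 0.5}  # made-up weights
x = Counter("the drinks were strong but the fish tacos were bland".split())
print(predict(theta, x, ["positive", "negative"]))  # negative (score 1.0 vs 0.5)
```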
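Feature functions
■ A feature function pairs a word count with a label, such as:
$$f_j(x, y) = \begin{cases} x_{\text{bland}} & \text{if } y = \text{negative} \\ 0 & \text{otherwise} \end{cases}$$
■ Collecting one such feature per (word, label) pair places x in the block for label y, with zeros everywhere else:
$$f(x, y=1) = [\, x_0, x_1, \ldots, x_{|V|}, \underbrace{0, \ldots, 0}_{(K-1)\times V} \,]^\top$$
$$f(x, y=2) = [\, \underbrace{0, \ldots, 0}_{V}, x_0, x_1, \ldots, x_{|V|}, \underbrace{0, \ldots, 0}_{(K-2)\times V} \,]^\top$$
$$f(x, y=K) = [\, \underbrace{0, \ldots, 0}_{(K-1)\times V}, x_0, x_1, \ldots, x_{|V|} \,]^\top$$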
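The block structure above can be sketched with NumPy. Two assumptions: labels are 0-based (`y = 0` is the slide's y = 1), and x has exactly V entries. The check shows why the layout is convenient: θ · f(x, y) is just the dot product of x with the y-th block of θ.

```python
import numpy as np

def block_features(x: np.ndarray, y: int, K: int) -> np.ndarray:
    """f(x, y): copy x into the y-th of K blocks (labels 0..K-1), zeros elsewhere."""
    V = x.shape[0]
    f = np.zeros(K * V)
    f[y * V:(y + 1) * V] = x
    return f

K, V = 3, 4
x = np.array([2.0, 0.0, 1.0, 1.0])          # made-up word counts
theta = np.arange(K * V, dtype=float)       # made-up weights

# Scoring every label via f(x, y) ...
scores = np.array([theta @ block_features(x, y, K) for y in range(K)])
# ... equals reshaping theta into a [K, V] matrix and multiplying by x.
scores_fast = theta.reshape(K, V) @ x
```

In practice the reshaped form is what gets computed; the explicit block vector is mainly a way to write every classifier with a single weight vector θ.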
■ The learning problem is to find the right weights θ.
■ Naïve Bayes: set θ equal to the (log) empirical frequencies:
$$\hat{\mu}_y = p(y) = \frac{\text{count}(y)}{\sum_{y'} \text{count}(y')}, \qquad \hat{\phi}_{y,j} = p(x_j \mid y) = \frac{\text{count}(y, j)}{\sum_{j'=1}^{V} \text{count}(y, j')}$$
■ Perceptron update:
$$\theta^{(t+1)} \leftarrow \theta^{(t)} + f(x^{(i)}, y^{(i)}) - f(x^{(i)}, \hat{y}), \quad \text{where } \hat{y} = \arg\max_{y \in \mathcal{Y}} \theta^{(t)} \cdot f(x^{(i)}, y)$$
■ Large-margin update:
$$\theta^{(t+1)} \leftarrow \theta^{(t)} + f(x^{(i)}, y^{(i)}) - f(x^{(i)}, \hat{y}), \quad \text{with } \hat{y} = \arg\max_{y \in \mathcal{Y}} \theta^{(t)} \cdot f(x^{(i)}, y) + c(y^{(i)}, y)$$
■ Logistic regression update:
$$\theta^{(t+1)} \leftarrow \theta^{(t)} + f(x^{(i)}, y^{(i)}) - E_{y \mid x}\big[f(x^{(i)}, y)\big]$$
■ All these methods for supervised learning assume a labeled dataset of N examples: $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$.
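The four learning rules can be sketched side by side with dense block features. Assumptions, all ours: 0-based labels; no smoothing in the Naïve Bayes estimates (which return probabilities; θ would hold their logs); a 0/1 cost c(y, k) = 1 if k ≠ y for the large-margin rule; and, as on the slide, no learning rate in the updates.

```python
import numpy as np

def feat(x, y, K):
    # Block feature function f(x, y): x placed in the y-th block, zeros elsewhere.
    V = len(x)
    f = np.zeros(K * V)
    f[y * V:(y + 1) * V] = x
    return f

def naive_bayes(X, Y, K):
    # mu_y = count(y) / sum_y' count(y'); phi_{y,j} = count(y, j) / sum_j' count(y, j')
    mu = np.bincount(Y, minlength=K) / len(Y)
    counts = np.stack([X[Y == y].sum(axis=0) for y in range(K)])  # [K, V]
    phi = counts / counts.sum(axis=1, keepdims=True)
    return mu, phi

def perceptron_update(theta, x, y, K):
    y_hat = max(range(K), key=lambda k: theta @ feat(x, k, K))
    return theta + feat(x, y, K) - feat(x, y_hat, K)

def large_margin_update(theta, x, y, K, cost=lambda y, k: float(y != k)):
    # Cost-augmented prediction: y_hat = argmax_k theta . f(x, k) + c(y, k)
    y_hat = max(range(K), key=lambda k: theta @ feat(x, k, K) + cost(y, k))
    return theta + feat(x, y, K) - feat(x, y_hat, K)

def logistic_update(theta, x, y, K):
    scores = np.array([theta @ feat(x, k, K) for k in range(K)])
    p = np.exp(scores - scores.max())
    p /= p.sum()                                          # p(k | x)
    expected_f = sum(p[k] * feat(x, k, K) for k in range(K))  # E_{y|x}[f(x, y)]
    return theta + feat(x, y, K) - expected_f

X = np.array([[2, 0, 1], [0, 3, 1], [1, 0, 0]], dtype=float)  # made-up counts
Y = np.array([0, 1, 0])
mu, phi = naive_bayes(X, Y, 2)
theta = np.zeros(6)
theta_p = perceptron_update(theta, X[1], 1, 2)
theta_m = large_margin_update(theta, X[0], 0, 2)
theta_l = logistic_update(theta, X[0], 0, 2)
```

All three online rules share the form "add the true features, subtract some prediction's features"; they differ only in which prediction is subtracted: the argmax, the cost-augmented argmax, or the expectation under the model.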
Nonlinear classification & evaluating classifiers
… $\mathcal{Y} = \{\text{Good}, \text{Bad}, \text{Okay}\}$.
… easy to predict the label, y.
[Figure: feedforward network with input layer x, hidden layer z, and output layer y]
■ Predict each hidden unit $z_k$ from x with logistic regression:
$$\Pr(z_k = 1 \mid x) = \sigma(\theta^{(x\to z)}_k \cdot x) = \frac{1}{1 + e^{-\theta^{(x\to z)}_k \cdot x}},$$
where σ is the logistic function, aka sigmoid.
■ Stacking the weight vectors into a matrix
$$\Theta^{(x\to z)} = [\theta^{(x\to z)}_1, \theta^{(x\to z)}_2, \ldots, \theta^{(x\to z)}_{K_z}]^\top$$
gives $z = \sigma(\Theta^{(x\to z)} x)$, with σ applied element-wise. $\Theta^{(x\to z)} x$ is a matrix-vector product with dims $[K_z, V] \times [V, 1] = [K_z, 1]$.
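A minimal NumPy sketch of the hidden layer, assuming made-up sizes (V = 5, K_z = 3) and random weights. It checks that the element-wise sigmoid of the matrix-vector product matches computing each $\Pr(z_k = 1 \mid x)$ separately.

```python
import numpy as np

def sigmoid(a):
    # Logistic function sigma(a) = 1 / (1 + e^{-a}); NumPy applies it element-wise.
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
V, K_z = 5, 3                                # hypothetical vocabulary size, hidden units
Theta_xz = rng.normal(size=(K_z, V))         # row k is theta_k^(x->z)
x = np.array([2.0, 0.0, 1.0, 0.0, 1.0])     # bag-of-words counts

z = sigmoid(Theta_xz @ x)                    # [K_z, V] @ [V] -> [K_z]
z_loop = np.array([sigmoid(Theta_xz[k] @ x) for k in range(K_z)])  # one unit at a time
```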
■ Predict y from z with multiclass logistic regression:
$$p(y = j \mid z) = \frac{\exp(\theta^{(z\to y)}_j \cdot z + b_j)}{\sum_{j' \in \mathcal{Y}} \exp(\theta^{(z\to y)}_{j'} \cdot z + b_{j'})},$$
where b is an additive bias/offset vector.
■ The vector of probabilities over each possible y is denoted:
$$p(y \mid z) = \text{SoftMax}(\Theta^{(z\to y)} z + b).$$
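SoftMax can be sketched in a few lines. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick (our addition): the shift cancels in the ratio, so the probabilities are unchanged, but `exp` no longer overflows. The weights and bias below are made up.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))   # shift by max(a): same result, no overflow
    return e / e.sum()

Theta_zy = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])  # 3 labels, K_z = 2
b = np.array([0.0, 0.1, -0.2])                               # bias/offset vector
z = np.array([0.8, 0.3])

p = softmax(Theta_zy @ z + b)                                # p(y | z), one entry per label
```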
■ … values for z, we compute it directly from x:
$$p(y \mid z) = \text{SoftMax}(\Theta^{(z\to y)} z + b),$$
where
$$z = \sigma(\Theta^{(x\to z)} x) = \big[\sigma(\theta^{(x\to z)}_1 \cdot x), \sigma(\theta^{(x\to z)}_2 \cdot x), \ldots, \sigma(\theta^{(x\to z)}_{K_z} \cdot x)\big]^\top.$$
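Putting the two layers together gives a forward pass from counts x to label probabilities. This is a sketch with our own function name `forward`; with all-zero parameters every logit is 0, so the output must be uniform, which makes an easy sanity check.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(x, Theta_xz, Theta_zy, b):
    z = sigmoid(Theta_xz @ x)          # z = sigma(Theta^(x->z) x)
    return softmax(Theta_zy @ z + b)   # p(y | z) = SoftMax(Theta^(z->y) z + b)

V, K_z, K_y = 5, 3, 3                  # made-up sizes
x = np.array([2.0, 0.0, 1.0, 0.0, 1.0])
p_uniform = forward(x, np.zeros((K_z, V)), np.zeros((K_y, K_z)), np.zeros(K_y))
```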
■ … in $z = \sigma(\Theta^{(x\to z)} x)$ …
■ … $z = f(\Theta^{(x\to z)} x)$ …
■ … analyze, even further avoids saturation:
$$f(a) = \begin{cases} a, & a \ge 0 \\ .0001a, & \text{otherwise} \end{cases}$$
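The last function above is a leaky ReLU. A small sketch compares its derivative with the sigmoid's to show what "saturation" means: for large |a| the sigmoid's gradient is nearly zero, while the leaky ReLU's gradient is 1 for a ≥ 0 and .0001 otherwise, never exactly zero. The input values are arbitrary.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1 - s)                           # sigma'(a) = sigma(a)(1 - sigma(a))

def leaky_relu(a, slope=1e-4):
    return np.where(a >= 0, a, slope * a)        # f(a) = a if a >= 0 else .0001a

def leaky_relu_grad(a, slope=1e-4):
    return np.where(a >= 0, 1.0, slope)

a = np.array([-20.0, 0.0, 20.0])
g_sig = sigmoid_grad(a)      # about 2e-9 at |a| = 20 (saturated), 0.25 at a = 0
g_leaky = leaky_relu_grad(a) # 0.0001, 1.0, 1.0
```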
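Gradient descent
■ Update the weights $\theta^{(z\to y)}_k$ by gradient descent:
$$\theta^{(z\to y)}_k \leftarrow \theta^{(z\to y)}_k - \eta^{(t)} \nabla_{\theta^{(z\to y)}_k} \ell^{(i)},$$
where $\eta^{(t)}$ is the learning rate at step t and $\nabla_{\theta^{(z\to y)}_k} \ell^{(i)}$ is the gradient of the loss $\ell^{(i)}$ on example i with respect to the weights $\theta^{(z\to y)}_k$:
$$\nabla_{\theta^{(z\to y)}_k} \ell^{(i)} = \left[\frac{\partial \ell^{(i)}}{\partial \theta^{(z\to y)}_{k,1}}, \frac{\partial \ell^{(i)}}{\partial \theta^{(z\to y)}_{k,2}}, \ldots, \frac{\partial \ell^{(i)}}{\partial \theta^{(z\to y)}_{k,K_y}}\right]$$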
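A sketch of one gradient-descent step on the output layer. The loss is not fixed by the slide; we assume the negative log-likelihood $\ell^{(i)} = -\log p(y^{(i)} \mid z)$, whose gradient with respect to the logits is $p - e_{y}$. We also assume our own layout, one row of Θ^(z→y) per label. A finite-difference check confirms the analytic gradient.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def nll(Theta_zy, b, z, y):
    # Assumed loss: l = -log p(y | z)
    return -np.log(softmax(Theta_zy @ z + b)[y])

def grad_output_layer(Theta_zy, b, z, y):
    p = softmax(Theta_zy @ z + b)
    delta = p.copy()
    delta[y] -= 1.0                      # dl/d(Theta z + b) = p - e_y
    return np.outer(delta, z), delta     # gradients w.r.t. Theta^(z->y) and b

def sgd_step(Theta_zy, b, z, y, eta):
    g_Theta, g_b = grad_output_layer(Theta_zy, b, z, y)
    return Theta_zy - eta * g_Theta, b - eta * g_b   # theta <- theta - eta * grad

rng = np.random.default_rng(1)
Theta_zy = rng.normal(size=(3, 4))       # made-up: 3 labels, 4 hidden units
b = np.zeros(3)
z = rng.random(4)                        # hidden activations in [0, 1)
y = 2
new_Theta, new_b = sgd_step(Theta_zy, b, z, y, eta=0.1)
```

With a small enough learning rate, the loss on this example should go down after the step.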
Backpropagation
■ … $\Theta^{(x\to z)}$ …
■ … compute a gradient on all parameters.
■ … with nodes for inputs, outputs, hidden layers, parameters.
[Figure: computation graph with nodes $x^{(i)}$, $z$, $\hat{y}$, $\ell^{(i)}$, $y^{(i)}$, and $\Theta$. Forward values ($v_x$, $v_z$, $v_{\hat{y}}$, $v_y$, $v_\Theta$) flow toward the loss; gradients ($g_{\hat{y}}$, $g_z$, $g_\Theta$) flow back using the chain rule.]
■ … the graph, and let automatic differentiation compute updates for every layer.
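To make the chain rule concrete, here is a hand-written backward pass for the two-layer network, a sketch assuming the negative log-likelihood loss and made-up sizes. The forward pass stores intermediate values (the v's); the backward pass propagates gradients (the g's) from the loss to Θ^(x→z). A finite-difference check verifies one entry. Automatic differentiation (e.g. calling `loss.backward()` in PyTorch) computes these same gradients without writing the backward pass by hand.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def loss_and_grads(x, y, Txz, Tzy, b):
    # Forward pass: keep the intermediate values.
    z = sigmoid(Txz @ x)
    p = softmax(Tzy @ z + b)
    loss = -np.log(p[y])                      # assumed loss: negative log-likelihood
    # Backward pass: chain rule from the loss back to each parameter.
    g_logits = p.copy()
    g_logits[y] -= 1.0                        # dl/d(Tzy z + b) = p - e_y
    g_Tzy = np.outer(g_logits, z)
    g_b = g_logits
    g_z = Tzy.T @ g_logits                    # dl/dz
    g_a = g_z * z * (1 - z)                   # through sigma: sigma'(a) = z(1 - z)
    g_Txz = np.outer(g_a, x)
    return loss, {"Txz": g_Txz, "Tzy": g_Tzy, "b": g_b}

rng = np.random.default_rng(0)
V, Kz, Ky = 5, 3, 4                           # made-up sizes
x = rng.random(V)
y = 1
Txz = rng.normal(size=(Kz, V))
Tzy = rng.normal(size=(Ky, Kz))
b = rng.normal(size=Ky)
loss, grads = loss_and_grads(x, y, Txz, Tzy, b)
```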
Another choice of R: word embeddings
■ … each type, resulting in a matrix …
[Figure: each token of "The drinks were strong but the fish tacos were bland" mapped to a column of real-valued embedding entries (e.g., 1.77, 0.71, 0.11, …).]
■ … resulting in a matrix $X^{(0)} \in \mathbb{R}^{K_e \times M}$.
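An embedding lookup can be sketched as column indexing. The vocabulary, embedding size K_e = 4, and random embedding matrix E are hypothetical; in a trained model E would be learned. Indexing E by the token ids of the example sentence gives $X^{(0)} \in \mathbb{R}^{K_e \times M}$, with M the number of tokens.

```python
import numpy as np

vocab = {"the": 0, "drinks": 1, "were": 2, "strong": 3, "but": 4,
         "fish": 5, "tacos": 6, "bland": 7}          # hypothetical vocabulary
K_e = 4                                              # assumed embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(size=(K_e, len(vocab)))               # one K_e-dim column per word type

tokens = "the drinks were strong but the fish tacos were bland".split()
ids = [vocab[w] for w in tokens]
X0 = E[:, ids]                                       # X^(0): [K_e, M], M = len(tokens)
```

Unlike the bag of words, X^(0) keeps one column per token, so word order is preserved; repeated words such as "the" share the same column values.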
… classifier's performance, because real future data will differ from your test set in ways that you cannot anticipate.
■ Accuracy:
$$\text{acc}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} \delta(y^{(i)} = \hat{y}^{(i)}).$$
■ The problem with accuracy is rare labels, also known as class imbalance.
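A short sketch of why class imbalance breaks accuracy, using hypothetical data in which 1 of 100 examples is Telugu. A classifier that never predicts Telugu still scores 99% accuracy.

```python
def accuracy(gold, pred):
    # acc(y, y_hat) = (1/N) * sum_i delta(y^(i) = y_hat^(i))
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

gold = ["Telugu"] + ["other"] * 99       # hypothetical data: 1% Telugu
never_telugu = ["other"] * 100
print(accuracy(gold, never_telugu))      # 0.99
```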
[Figure: correct labels vs. predicted labels]
■ True positive: the system correctly predicts the label.
■ False positive: the system incorrectly predicts the label.
■ False negative: the system incorrectly fails to predict the label.
■ True negative: the system correctly predicts that the label does not apply to this instance.
■ Recall is the fraction of positive instances that were correctly classified:
$$\text{recall} = \frac{TP}{TP + FN}$$
■ The “never Telugu” classifier has 0 recall.
■ The “always Telugu” classifier has perfect recall.
■ Precision:
$$\text{precision} = \frac{TP}{TP + FP}$$
<latexit sha1_base64="sCHI9VBUq0LibUzZDHZL9e36d8=">AE3XicfVPb9MwFE5GgVF+bXDkYpgmtWxU7UCy6QJEOICFGm/0Fwqx3Fab3Yc2c62yPKRG+LKf8Wd/4MriJekhXUrWEr8r3Pfp8/v0SZ4MZ2u9/DhUuNy1euLl5rXr9x89btpeU7u0blmrIdqoTS+xExTPCU7VhuBdvPNCMyEmwvOnpR5veOmTZcpdu2yNhAklHKE06JBWi4HI5WcSQdtmNmif/oWnat1/YIC5ZYorU6QefSkFyrsMS3yum0RHnbr6OiDtro0fw8HhPrCt9urk4itIkw0SNJToeuQJinCEtix5QI98H7M3URprGy/6jaBj20Vfz9nuyPZe6HBdTIWsDZRFM4G3MP9dbRoa+S9XbDQyjOYzSlkK0dJqBGlEpTShbrv4YGKr96WYHPKA8pL02eR+2X4HBpdvpVgNdDHqTYCWYjP5weWGAY0VzyVJLBTHmoNfN7MARbTkVzDdxblhG6BEZsQMIUyKZGbiqJTxaBSRGidLwpBZV6NkVjkhjChkBszyCOZ8rwXm5g9wmzwaOp1luWUrQkukFWo7C8Uc7DCigICQjUHrYiOCRhioQvB2DNlxkwcMzt7ECoHziRV9RlJkYRvzVJ2QpWUJI0fOpwQyURs4TkwnqHTKN51mzHh/zExcOq1takKfW6w0H/EU7hl6vmr8WRimscXVu4lfMrgLzd6AwHcZ08QqDUrqLvZwNyN8H5fh/5g8/cOEcPZYrhIAhyktUBlLna9/CqEMw9FIqzybEXxhfSUNiAJOF7z2eymgEN2TvfheD3Y1O73Fn4/2Tla3nk9ZcDO4FD4JW0AueBlvB6Af7AQ0/Bb+CH+GvxrDxqfG58aXmroQTtbcDWZG4+tvpRipKQ=</latexit>correctly classified.
■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall.
correct.
21
correct labels predicted labels
recall = TP TP + FN =
<latexit sha1_base64="L312lWJX2ZlV0ijiKCWHOpePN+Y=">AEsXicfVJdTxQxFJ3BVXH9An30pUpIdgXJLphoYkiIGuOLigmfoeva6XR2Cu10naASdM/46/xVd/8N975QFlAm3R65tzb3tPbE+WCGzsY/ApnrnWu37g5e6t7+87de/fn5h/sGFVoyrapEkrvRcQwTO2bkVbC/XjMhIsN3o6E0V3z1m2nCVbdkyZyNJhlPOCUWqPF8+GoR9JhmzJL/BfXs0vDvkdYsMQSrdUJuhCG4FLNJb5XLacVy/t+GZUN6KNnV8dxSqwrfb+72CK0jDRE0lOx65EmGcIS2JTSoTb9/5cXYRprOw/qvZBD+2Vf/b87Es/LiEGnkPctbRGZ2n3EO9ZXTo62Bz3PgQivMYNam1Di2dZiBG1EITajb2vQwoeC7jxU5nlsYrAzqgS6DYQsWgnZsjudnRjhWtJAs1QYw6Gg9yOHNGWU8F8FxeG5YQekQk7AJgRyczI1c/s0SIwMUqUhplZVLPndzgijSlBJmVfnMxVpFXxQ4Km7wcOZ7lhWUZbQolhUBWocozKObQCtKAIRqDloRTQk0xIKzoK/nyqRMHDM7fREqR84kdfUpSZGEf80ydkKVlCSLnzqcEMlFGbOEFMJ6h01yhq9qzXJ8zHPTdum0aVMXvGux0nzCM3g8HFt5mkaltTi+tvFbxm8hWYfQOCnGlilQYljTM9vM0EP8YV/F8mz/5kApy+lqsFwGWqFqicZc43RhfKMBxNtCryKcGX9tdC4QCSQMebfDa9rckAQw4v2u8y2FldGa6trH5+vrDxurXmbPAoeBL0gmHwItgI3gebwXZAw2/h9/BH+LOz1tnvfO1ETepM2O5GEyNztFvrTGYbg=</latexit>precision = TP TP + FP =
<latexit sha1_base64="sCHI9VBUq0LibUzZDHZL9e36d8=">AE3XicfVPb9MwFE5GgVF+bXDkYpgmtWxU7UCy6QJEOICFGm/0Fwqx3Fab3Yc2c62yPKRG+LKf8Wd/4MriJekhXUrWEr8r3Pfp8/v0SZ4MZ2u9/DhUuNy1euLl5rXr9x89btpeU7u0blmrIdqoTS+xExTPCU7VhuBdvPNCMyEmwvOnpR5veOmTZcpdu2yNhAklHKE06JBWi4HI5WcSQdtmNmif/oWnat1/YIC5ZYorU6QefSkFyrsMS3yum0RHnbr6OiDtro0fw8HhPrCt9urk4itIkw0SNJToeuQJinCEtix5QI98H7M3URprGy/6jaBj20Vfz9nuyPZe6HBdTIWsDZRFM4G3MP9dbRoa+S9XbDQyjOYzSlkK0dJqBGlEpTShbrv4YGKr96WYHPKA8pL02eR+2X4HBpdvpVgNdDHqTYCWYjP5weWGAY0VzyVJLBTHmoNfN7MARbTkVzDdxblhG6BEZsQMIUyKZGbiqJTxaBSRGidLwpBZV6NkVjkhjChkBszyCOZ8rwXm5g9wmzwaOp1luWUrQkukFWo7C8Uc7DCigICQjUHrYiOCRhioQvB2DNlxkwcMzt7ECoHziRV9RlJkYRvzVJ2QpWUJI0fOpwQyURs4TkwnqHTKN51mzHh/zExcOq1takKfW6w0H/EU7hl6vmr8WRimscXVu4lfMrgLzd6AwHcZ08QqDUrqLvZwNyN8H5fh/5g8/cOEcPZYrhIAhyktUBlLna9/CqEMw9FIqzybEXxhfSUNiAJOF7z2eymgEN2TvfheD3Y1O73Fn4/2Tla3nk9ZcDO4FD4JW0AueBlvB6Af7AQ0/Bb+CH+GvxrDxqfG58aXmroQTtbcDWZG4+tvpRipKQ=</latexit>correctly classified.
■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall.
correct.
■ The “never Telugu” classifier 0 precision.
[Figure: correct labels vs. predicted labels]
$\text{recall} = \frac{TP}{TP + FN}$: the fraction of positive instances that were correctly classified.
■ The “never Telugu” classifier has 0 recall.
■ The “always Telugu” classifier has perfect recall.
$\text{precision} = \frac{TP}{TP + FP}$: the fraction of positive predictions that were correct.
■ The “never Telugu” classifier has 0 precision (strictly 0/0, which is conventionally treated as 0).
■ The “always Telugu” classifier has 0.003 precision.
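A minimal sketch of the two definitions in Python, with Telugu as the positive class. The toy test set is hypothetical: 3 Telugu documents out of 1,000, which is one way to arrive at the 0.003 precision quoted above. `precision_recall` is an illustrative helper, not a library function, and it follows the convention of returning 0 for 0/0.

```python
def precision_recall(gold, pred, positive):
    """Precision and recall for one class treated as 'positive'."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    # Convention: define 0/0 as 0 (no positive predictions, or no positive instances).
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# Hypothetical toy test set: 3 Telugu documents out of 1000.
gold = ["Telugu"] * 3 + ["other"] * 997
never = ["other"] * len(gold)    # the "never Telugu" classifier
always = ["Telugu"] * len(gold)  # the "always Telugu" classifier

print(precision_recall(gold, never, "Telugu"))   # (0.0, 0.0)
print(precision_recall(gold, always, "Telugu"))  # (0.003, 1.0)
```

Each trivial classifier maximizes one metric at the expense of the other, which is why the two are reported together.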
can be screened out later.
preference for high precision.
and recall.
$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
$\min(\text{precision}, \text{recall}) \le F_1 \le 2 \cdot \min(\text{precision}, \text{recall})$
important as precision:
$F_\beta = (1 + \beta^2) \, \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$
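A short sketch of $F_\beta$ as written above, using the same 0/0 convention; setting $\beta = 1$ recovers $F_1$. `f_beta` is an illustrative helper and the precision/recall values are arbitrary.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily, beta = 1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: define 0/0 as 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 0.5, 1.0
f1 = f_beta(precision, recall)            # 2 * 0.5 * 1.0 / 1.5
f2 = f_beta(precision, recall, beta=2.0)  # 5 * 0.5 * 1.0 / (4 * 0.5 + 1.0)
# F1 lies between min(P, R) and 2 * min(P, R).
assert min(precision, recall) <= f1 <= 2 * min(precision, recall)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")  # F1 = 0.667, F2 = 0.833
```

Because $F_2$ favors recall, the perfect recall here pulls it above $F_1$.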
negative.
all other classes.
→ Weights all classes equally, regardless of frequency.
negatives pooled across all classes, and compute a single F1.
→ Emphasizes performance on high-frequency classes.
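A sketch contrasting the two averages on a hypothetical three-class toy set in which the one rare class is always missed. It uses the identity $F_1 = 2TP / (2TP + FP + FN)$, which follows from substituting the definitions of precision and recall into the harmonic mean; the helper names are illustrative.

```python
from collections import Counter

def per_class_counts(gold, pred):
    """Per-class TP, FP, FN for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # a false positive for the predicted class
            fn[g] += 1  # a false negative for the true class
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    # Equivalent to 2PR / (P + R) with P = TP/(TP+FP), R = TP/(TP+FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def macro_micro_f1(gold, pred):
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = per_class_counts(gold, pred)
    # Macro: one F1 per class (that class vs. all others), then an unweighted mean.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts across all classes, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

gold = ["A"] * 8 + ["B", "C"]
pred = ["A"] * 8 + ["A", "C"]  # the single rare "B" is misclassified as "A"
macro, micro = macro_micro_f1(gold, pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

Here macro-F1 is about 0.647 (class B contributes an F1 of 0), while micro-F1 is 0.9. For single-label multi-class data every error counts as one false positive and one false negative, so micro-F1 equals accuracy.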
problem:
accurate in the future (in the limit of an infinite number of independent evaluations).
set was due only to luck. This is the null hypothesis.
(73% → 82%) becomes vanishingly small unless C1 really is more accurate.
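One way to see why a larger test set helps, under a simplifying assumption of our own: model the null hypothesis as "C1's true accuracy is 73%" with independent test examples, and ask how likely an observed accuracy of at least 82% is. That probability is a binomial upper tail, summed here in log space with `math.lgamma` so large test sets do not overflow. `binom_upper_tail` is an illustrative helper.

```python
import math

def binom_upper_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

# Simplified null hypothesis (an assumption for illustration): C1's true
# accuracy is 73%, and test examples are independent.
for n in (10, 100, 1000):
    k_min = -(-82 * n // 100)  # ceil(0.82 * n) using integer arithmetic
    print(n, binom_upper_tail(n, k_min, 0.73))
```

The printed probability shrinks rapidly as the test set grows: a 73% → 82% jump is plausible luck on 10 examples but essentially impossible on 1,000.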
equally likely to be correct.
region, we reject the null hypothesis with p < 0.05.
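A sketch of the exact binomial (sign) test described above: keep only the test examples on which the two classifiers disagree; under the null hypothesis each is equally likely to be the correct one, so C1's win count follows Binomial(n, 0.5). The helper name is illustrative and `math.comb` needs Python 3.8 or later. SciPy's `scipy.stats.binomtest` (SciPy 1.7 or later) offers an exact binomial test if you prefer a library call.

```python
import math

def sign_test(c1_wins, c2_wins):
    """Two-sided exact binomial (sign) test on disagreements.

    c1_wins: examples where only C1 is correct; c2_wins: only C2 is correct.
    Under the null hypothesis, c1_wins ~ Binomial(c1_wins + c2_wins, 0.5).
    """
    n = c1_wins + c2_wins
    k = max(c1_wins, c2_wins)
    # P(X >= k) under Binomial(n, 0.5), doubled for a two-sided test (capped at 1).
    upper = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)

print(sign_test(6, 4))    # about 0.75: cannot reject the null
print(sign_test(40, 15))  # well below 0.05: reject the null
```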
in closed form using e.g. SciPy or R.
sets, and count how often each hypothesis holds.
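A sketch of one simple paired-bootstrap variant (other formulations exist): draw many resampled test sets with replacement, score both classifiers on the same resampled examples, and count how often C1 fails to beat C2. The per-example correctness vectors are hypothetical, built to reproduce the 73% → 82% example; `paired_bootstrap` is an illustrative helper.

```python
import random

def paired_bootstrap(c1_correct, c2_correct, num_samples=10_000, seed=0):
    """Fraction of resampled test sets on which C1 is not more accurate than C2.

    c1_correct[i] and c2_correct[i] are 1 if that classifier got test example i
    right, else 0. Both classifiers are scored on the same resampled examples.
    """
    rng = random.Random(seed)
    n = len(c1_correct)
    not_better = 0
    for _ in range(num_samples):
        idx = rng.choices(range(n), k=n)  # sample n examples with replacement
        delta = sum(c1_correct[i] - c2_correct[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / num_samples

# Hypothetical per-example results on 100 test items: C2 is right on 73, C1 on 82.
c2 = [1] * 73 + [0] * 27
c1 = [1] * 82 + [0] * 18
print(paired_bootstrap(c1, c2))
```

A small returned fraction (for example, below 0.05) suggests the improvement is unlikely to be an artifact of this particular test set.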
Two recent, state-of-the-art systems for NER are proposed by Ma and Hovy (2016) and by Lample et al. (2016). Lample et al. report an F1-score of 90.94% and Ma and Hovy report an F1-score of 91.21%. Ma and Hovy draw the conclusion that their system achieves a significant improvement
[Figure: F1 scores; labels “Ma & Hovy (reported)” and “Lample et al. (reported)”]
Nils Reimers and Iryna Gurevych. Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. EMNLP 2017.