
Algorithms for NLP, CS 11-711, Fall 2020. Lecture 3: Nonlinear text classification. Emma Strubell. Announcements: Project 1 (Text classification) will be available after class today, due Friday September 25. Han will lead recitation this…


  1. How to obtain θ? ■ The learning problem is to find the right weights θ. ■ Naïve Bayes: set θ equal to the empirical frequencies: φ̂_(y,j) = p(x_j | y) = count(y, j) / Σ_(j′=1)^V count(y, j′), and μ̂_y = p(y) = count(y) / Σ_(y′) count(y′). ■ Perceptron update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ). ■ Large-margin update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ), with ŷ = argmax_(y ∈ Y) θ · f(x^(i), y) + c(y^(i), y)

  2. How to obtain θ? ■ The learning problem is to find the right weights θ. ■ Naïve Bayes: set θ equal to the empirical frequencies: φ̂_(y,j) = p(x_j | y) = count(y, j) / Σ_(j′=1)^V count(y, j′), and μ̂_y = p(y) = count(y) / Σ_(y′) count(y′). ■ Perceptron update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ). ■ Large-margin update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ), with ŷ = argmax_(y ∈ Y) θ · f(x^(i), y) + c(y^(i), y) ■ Logistic regression update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − E_(y|x)[f(x^(i), y)]

  3. How to obtain θ? ■ The learning problem is to find the right weights θ. ■ Naïve Bayes: set θ equal to the empirical frequencies: φ̂_(y,j) = p(x_j | y) = count(y, j) / Σ_(j′=1)^V count(y, j′), and μ̂_y = p(y) = count(y) / Σ_(y′) count(y′). ■ Perceptron update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ). ■ Large-margin update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − f(x^(i), ŷ), with ŷ = argmax_(y ∈ Y) θ · f(x^(i), y) + c(y^(i), y) ■ Logistic regression update: θ^(t+1) ← θ^(t) + f(x^(i), y^(i)) − E_(y|x)[f(x^(i), y)] ■ All these methods for supervised learning assume a labeled dataset of N examples: {(x^(i), y^(i))}_(i=1)^N

  4. Today: Nonlinear classification & evaluating classifiers [timeline labels: engineered features]

  5. Today: Nonlinear classification & evaluating classifiers [timeline labels: engineered features, linear classification]

  6. Today: Nonlinear classification & evaluating classifiers [timeline labels: engineered features, linear classification; learned features, ~mid 2010s]

  7. Today: Nonlinear classification & evaluating classifiers [timeline labels: engineered features, linear classification; learned features, nonlinear classification, ~mid 2010s]

  8. A simple feed-forward architecture

  9. A simple feed-forward architecture ■ Suppose we want to label stories as Y = {Good, Bad, Okay}.

  10. A simple feed-forward architecture ■ Suppose we want to label stories as Y = {Good, Bad, Okay}. ■ What makes a good story?

  11. A simple feed-forward architecture ■ Suppose we want to label stories as Y = {Good, Bad, Okay}. ■ What makes a good story? ■ Exciting plot, compelling characters, interesting setting…

  12. A simple feed-forward architecture ■ Suppose we want to label stories as Y = {Good, Bad, Okay}. ■ What makes a good story? ■ Exciting plot, compelling characters, interesting setting… ■ Let’s call this vector of features z.

  13. A simple feed-forward architecture ■ Suppose we want to label stories as Y = {Good, Bad, Okay}. ■ What makes a good story? ■ Exciting plot, compelling characters, interesting setting… ■ Let’s call this vector of features z. ■ If z is well-chosen, it will be easy to predict from x (the words), and it will make it easy to predict the label, y.

  14. A simple feed-forward architecture [figure: feed-forward network x → z → y]

  15. A simple feed-forward architecture ■ Let’s predict each z_k from x by binary logistic regression: Pr(z_k = 1 | x) = σ(θ_k^(x→z) · x) [figure: feed-forward network x → z → y]

  16. A simple feed-forward architecture ■ Let’s predict each z_k from x by binary logistic regression: Pr(z_k = 1 | x) = σ(θ_k^(x→z) · x), where σ is the logistic fn (aka sigmoid): σ(θ_k^(x→z) · x) = 1 / (1 + e^(−θ_k^(x→z) · x)) [figure: feed-forward network x → z → y]

  18. A simple feed-forward architecture ■ Let’s predict each z_k from x by binary logistic regression: Pr(z_k = 1 | x) = σ(θ_k^(x→z) · x), where σ is the logistic fn (aka sigmoid). ■ The weights can be collected into a matrix, Θ^(x→z) = [θ_1^(x→z), θ_2^(x→z), …, θ_(K_z)^(x→z)]^⊤ [figure: feed-forward network x → z → y]

  19. A simple feed-forward architecture ■ Let’s predict each z_k from x by binary logistic regression: Pr(z_k = 1 | x) = σ(θ_k^(x→z) · x), where σ is the logistic fn (aka sigmoid). ■ The weights can be collected into a matrix, Θ^(x→z) = [θ_1^(x→z), θ_2^(x→z), …, θ_(K_z)^(x→z)]^⊤, ■ so that E[z] = σ(Θ^(x→z) x), where σ is applied element-wise. [figure: feed-forward network x → z → y]

  20. A simple feed-forward architecture ■ Let’s predict each z_k from x by binary logistic regression: Pr(z_k = 1 | x) = σ(θ_k^(x→z) · x), where σ is the logistic fn (aka sigmoid). ■ The weights can be collected into a matrix, Θ^(x→z) = [θ_1^(x→z), θ_2^(x→z), …, θ_(K_z)^(x→z)]^⊤, ■ so that E[z] = σ(Θ^(x→z) x), where σ is applied element-wise (a matrix-vector product; dims: [K_z, V] × [V, 1] = [K_z, 1]). [figure: feed-forward network x → z → y]
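A small numpy sketch of the hidden-layer computation on slides 15-20; the sizes V and K_z, the random weights, and the toy input are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

V, K_z = 1000, 64                                  # hypothetical vocab / hidden sizes
rng = np.random.default_rng(0)
Theta_xz = rng.normal(scale=0.01, size=(K_z, V))   # row k is theta_k^(x->z)
x = rng.integers(0, 3, size=V).astype(float)       # bag-of-words count vector
z = sigmoid(Theta_xz @ x)                          # [K_z, V] @ [V] -> [K_z], sigma elementwise
```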

  21. A simple feed-forward architecture [figure: feed-forward network x → z → y]

  22. A simple feed-forward architecture ■ Next we predict y from z, again using logistic regression (multiclass): Pr(y = j | z) = exp(θ_j^(z→y) · z + b_j) / Σ_(j′ ∈ Y) exp(θ_(j′)^(z→y) · z + b_(j′)). The vector of probabilities over each possible y is denoted: [figure: feed-forward network x → z → y]

  23. A simple feed-forward architecture ■ Next we predict y from z, again using logistic regression (multiclass), where b is an additive bias/offset vector: Pr(y = j | z) = exp(θ_j^(z→y) · z + b_j) / Σ_(j′ ∈ Y) exp(θ_(j′)^(z→y) · z + b_(j′)). The vector of probabilities over each possible y is denoted: [figure: feed-forward network x → z → y]

  24. A simple feed-forward architecture ■ Next we predict y from z, again using logistic regression (multiclass), where b is an additive bias/offset vector: Pr(y = j | z) = exp(θ_j^(z→y) · z + b_j) / Σ_(j′ ∈ Y) exp(θ_(j′)^(z→y) · z + b_(j′)). The vector of probabilities over each possible y is denoted: p(y | z) = SoftMax(Θ^(z→y) z + b). [figure: feed-forward network x → z → y]
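The output layer on slides 22-24 is a softmax over label scores. A sketch under the same toy assumptions (random weights, a stand-in hidden vector):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())            # subtract the max for numerical stability
    return e / e.sum()

K_z, K_y = 64, 3                       # hidden size; |Y| = 3 for {Good, Bad, Okay}
rng = np.random.default_rng(0)
Theta_zy = rng.normal(scale=0.01, size=(K_y, K_z))
b = np.zeros(K_y)                      # additive bias/offset vector
z = rng.uniform(size=K_z)              # stand-in for the hidden layer
p_y = softmax(Theta_zy @ z + b)        # p(y | z); sums to 1 over the labels
```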

  25. A simple feed-forward architecture [figure: feed-forward network x → z → y]

  26. A simple feed-forward architecture ■ In reality, we never observe z; it is a hidden layer. We don’t bother predicting 0/1 values for z; we compute it directly from x. [figure: feed-forward network x → z → y]

  27. A simple feed-forward architecture ■ In reality, we never observe z; it is a hidden layer. We don’t bother predicting 0/1 values for z; we compute it directly from x. ■ This makes p(y | x) a complex, nonlinear function of x. [figure: feed-forward network x → z → y]

  28. A simple feed-forward architecture ■ In reality, we never observe z; it is a hidden layer. We don’t bother predicting 0/1 values for z; we compute it directly from x. ■ This makes p(y | x) a complex, nonlinear function of x. ■ We can have multiple hidden layers z^(1), z^(2), …, adding even more expressiveness. [figure: feed-forward network x → z → y]

  29. A simple feed-forward architecture ■ In reality, we never observe z; it is a hidden layer. We don’t bother predicting 0/1 values for z; we compute it directly from x. ■ This makes p(y | x) a complex, nonlinear function of x. ■ We can have multiple hidden layers z^(1), z^(2), …, adding even more expressiveness. ■ To summarize: z = σ(Θ^(x→z) x), p(y | z) = SoftMax(Θ^(z→y) z + b), where σ(Θ^(x→z) x) = [σ(θ_1^(x→z) · x), σ(θ_2^(x→z) · x), …, σ(θ_(K_z)^(x→z) · x)]^⊤ [figure: feed-forward network x → z → y]

  30. Activation functions

  31. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function.

  32. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function. ■ In general, we can write z = f(Θ^(x→z) x) to indicate an arbitrary activation function.

  33. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function. ■ In general, we can write z = f(Θ^(x→z) x) to indicate an arbitrary activation function. ■ Other choices include:

  34. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function. ■ In general, we can write z = f(Θ^(x→z) x) to indicate an arbitrary activation function. ■ Other choices include: ■ Hyperbolic tangent: tanh, centered at 0, helps avoid saturation

  35. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function. ■ In general, we can write z = f(Θ^(x→z) x) to indicate an arbitrary activation function. ■ Other choices include: ■ Hyperbolic tangent: tanh, centered at 0, helps avoid saturation ■ Rectified linear unit: ReLU(a) = max(0, a), which is fast to evaluate, easy to analyze, even further avoids saturation.

  36. Activation functions ■ The sigmoid in z = σ(Θ^(x→z) x) is called an activation function. ■ In general, we can write z = f(Θ^(x→z) x) to indicate an arbitrary activation function. ■ Other choices include: ■ Hyperbolic tangent: tanh, centered at 0, helps avoid saturation ■ Rectified linear unit: ReLU(a) = max(0, a), which is fast to evaluate, easy to analyze, even further avoids saturation. ■ Leaky ReLU: f(a) = a if a ≥ 0, and 0.0001a otherwise.
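For reference, the activation functions on slides 31-36 written out in numpy (a sketch; the leaky slope 0.0001 follows the slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))          # range (0, 1); saturates at both ends

def tanh(a):
    return np.tanh(a)                        # range (-1, 1); centered at 0

def relu(a):
    return np.maximum(0.0, a)                # ReLU(a) = max(0, a)

def leaky_relu(a):
    return np.where(a >= 0, a, 0.0001 * a)   # small negative slope avoids dead units
```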

  37. Training neural networks: Gradient descent

  38. Training neural networks: Gradient descent ■ In general, neural networks are learned by gradient descent, using minibatches: θ_k^(z→y) ← θ_k^(z→y) − η^(t) ∇_(θ_k^(z→y)) ℓ^(i), where

  39. Training neural networks: Gradient descent ■ In general, neural networks are learned by gradient descent, using minibatches: θ_k^(z→y) ← θ_k^(z→y) − η^(t) ∇_(θ_k^(z→y)) ℓ^(i), where ■ η^(t) is the learning rate at update t

  40. Training neural networks: Gradient descent ■ In general, neural networks are learned by gradient descent, using minibatches: θ_k^(z→y) ← θ_k^(z→y) − η^(t) ∇_(θ_k^(z→y)) ℓ^(i), where ■ η^(t) is the learning rate at update t ■ ℓ^(i) is the loss on instance (minibatch) i

  41. Training neural networks: Gradient descent ■ In general, neural networks are learned by gradient descent, using minibatches: θ_k^(z→y) ← θ_k^(z→y) − η^(t) ∇_(θ_k^(z→y)) ℓ^(i), where ■ η^(t) is the learning rate at update t ■ ℓ^(i) is the loss on instance (minibatch) i ■ ∇_(θ_k^(z→y)) ℓ^(i) is the gradient of the loss with respect to the output weights θ_k^(z→y): ∇_(θ_k^(z→y)) ℓ^(i) = [∂ℓ^(i)/∂θ_(k,1)^(z→y), ∂ℓ^(i)/∂θ_(k,2)^(z→y), …, ∂ℓ^(i)/∂θ_(k,K_y)^(z→y)]
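A generic minibatch gradient-descent loop matching the update on slides 38-41. Here grad_fn is a hypothetical stand-in for whatever computes ∇ℓ^(i) on a minibatch, and the decaying schedule for η^(t) is just one common choice:

```python
def sgd(theta, minibatches, grad_fn, eta0=0.1, decay=0.01):
    # theta <- theta - eta^(t) * gradient of the minibatch loss
    for t, batch in enumerate(minibatches):
        eta_t = eta0 / (1.0 + decay * t)   # learning rate at update t
        theta = theta - eta_t * grad_fn(theta, batch)
    return theta
```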

  42. Training neural networks: Backpropagation

  43. Training neural networks: Backpropagation ■ If we don’t observe z, how can we learn the weights Θ^(x→z)?

  44. Training neural networks: Backpropagation ■ If we don’t observe z, how can we learn the weights Θ^(x→z)? ■ Backpropagation: compute a loss on y, and apply the chain rule from calculus to compute a gradient on all parameters.

  45. Training neural networks: Backpropagation ■ If we don’t observe z, how can we learn the weights Θ^(x→z)? ■ Backpropagation: compute a loss on y, and apply the chain rule from calculus to compute a gradient on all parameters. ■ Backpropagation as an algorithm: construct a directed acyclic computation graph with nodes for inputs, outputs, hidden layers, parameters.

  46. Training neural networks: Backpropagation ■ Backpropagation as an algorithm: construct a directed acyclic computation graph with nodes for inputs, outputs, hidden layers, parameters. [figure: computation graph linking x^(i), z, ŷ, y^(i), and ℓ^(i), with value nodes v_x, v_z, v_ŷ, gradient nodes g_z, g_ŷ, and parameter nodes Θ]

  47. Training neural networks: Backpropagation ■ Backpropagation as an algorithm: construct a directed acyclic computation graph with nodes for inputs, outputs, hidden layers, parameters. ■ Forward pass: values (e.g. v_x) go from parents to children [figure: computation graph linking x^(i), z, ŷ, y^(i), and ℓ^(i), with value nodes v_x, v_z, v_ŷ, gradient nodes g_z, g_ŷ, and parameter nodes Θ]

  48. Training neural networks: Backpropagation ■ Backpropagation as an algorithm: construct a directed acyclic computation graph with nodes for inputs, outputs, hidden layers, parameters. ■ Forward pass: values (e.g. v_x) go from parents to children ■ Backward pass: gradients (e.g. g_z) go from children to parents, implementing the chain rule [figure: computation graph linking x^(i), z, ŷ, y^(i), and ℓ^(i), with value nodes v_x, v_z, v_ŷ, gradient nodes g_z, g_ŷ, and parameter nodes Θ]

  49. Training neural networks: Backpropagation ■ Backpropagation as an algorithm: construct a directed acyclic computation graph with nodes for inputs, outputs, hidden layers, parameters. ■ Forward pass: values (e.g. v_x) go from parents to children ■ Backward pass: gradients (e.g. g_z) go from children to parents, implementing the chain rule ■ As long as the gradient is implemented for a layer/operation, you can add it to the graph, and let automatic differentiation compute updates for every layer. [figure: computation graph linking x^(i), z, ŷ, y^(i), and ℓ^(i), with value nodes v_x, v_z, v_ŷ, gradient nodes g_z, g_ŷ, and parameter nodes Θ]
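To make the forward and backward passes concrete, here is a hand-written sketch of backpropagation through the two-layer network from the earlier slides, assuming a cross-entropy loss on a single example (in practice a framework’s automatic differentiation does this bookkeeping for you):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward_backward(x, y, Theta_xz, Theta_zy, b):
    # Forward pass: values flow from parents to children, ending in the loss.
    z = sigmoid(Theta_xz @ x)
    p = softmax(Theta_zy @ z + b)
    loss = -np.log(p[y])                 # cross-entropy for the true label index y
    # Backward pass: gradients flow from children to parents via the chain rule.
    g_scores = p.copy()
    g_scores[y] -= 1.0                   # d loss / d (Theta_zy z + b)
    g_Theta_zy = np.outer(g_scores, z)
    g_b = g_scores
    g_z = Theta_zy.T @ g_scores          # gradient pushed into the hidden layer
    g_a = g_z * z * (1.0 - z)            # sigmoid'(a) = z (1 - z)
    g_Theta_xz = np.outer(g_a, x)
    return loss, g_Theta_xz, g_Theta_zy, g_b
```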

  50. How to represent text for classification? Another choice of R: word embeddings

  51. How to represent text for classification? Another choice of R: word embeddings ■ Text is naturally viewed as a sequence of tokens w_1, w_2, …, w_T

  52. How to represent text for classification? Another choice of R: word embeddings ■ Text is naturally viewed as a sequence of tokens w_1, w_2, …, w_T ■ Context is lost when this sequence is converted to a bag-of-words.

  53. How to represent text for classification? Another choice of R: word embeddings ■ Text is naturally viewed as a sequence of tokens w_1, w_2, …, w_T ■ Context is lost when this sequence is converted to a bag-of-words. ■ Instead, a lookup layer can compute embeddings (real-valued vectors) for each type, resulting in a matrix X^(0) ∈ ℝ^(K_e × M). [figure: example K_e × M matrix of embedding values, one column per token]
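A lookup layer is just row indexing into an embedding table. A toy sketch (the vocabulary, dimensions, and random table are invented for illustration):

```python
import numpy as np

vocab = {"the": 0, "story": 1, "was": 2, "great": 3}   # hypothetical word types
K_e = 5                                                # embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), K_e))                 # one row per type

tokens = ["the", "story", "was", "great"]              # order w_1 ... w_T is kept
ids = [vocab[w] for w in tokens]
X0 = E[ids].T                                          # X^(0) in R^(K_e x M)
print(X0.shape)                                        # (5, 4): one column per token
```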

  54. Evaluating classifiers

  55. Evaluating your classifier

  56. Evaluating your classifier ■ Want to predict future performance, on unseen data.

  57. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for:

  58. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for: ■ training

  59. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for: ■ training ■ hyperparameter selection

  60. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for: ■ training ■ hyperparameter selection ■ selecting classification model, model structure

  61. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for: ■ training ■ hyperparameter selection ■ selecting classification model, model structure ■ preprocessing decisions, such as vocabulary selection

  62. Evaluating your classifier ■ Want to predict future performance, on unseen data. ■ It’s hard to predict the future. Do not evaluate on data that was already used for: ■ training ■ hyperparameter selection ■ selecting classification model, model structure ■ preprocessing decisions, such as vocabulary selection ■ Even if you follow all these rules, you will probably still over-estimate your classifier’s performance, because real future data will differ from your test set in ways that you cannot anticipate.
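The usual discipline that follows from this slide is to carve the labeled data into train/dev/test once, up front; a minimal sketch:

```python
import numpy as np

# Fit weights on train; tune hyperparameters, model structure, and
# preprocessing on dev; touch test only once, for the final estimate.
rng = np.random.default_rng(0)
N = 1000                                   # hypothetical dataset size
idx = rng.permutation(N)
train_idx, dev_idx, test_idx = idx[:800], idx[800:900], idx[900:]
```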

  63. Accuracy

  64. Accuracy ■ Most basic metric: accuracy. How often is the classifier right? acc(y, ŷ) = (1/N) Σ_(i=1)^N δ(y^(i) = ŷ^(i)). The problem with accuracy is rare labels, also known as class imbalance.

  66. Accuracy ■ Most basic metric: accuracy. How often is the classifier right? acc(y, ŷ) = (1/N) Σ_(i=1)^N δ(y^(i) = ŷ^(i)). The problem with accuracy is rare labels, also known as class imbalance. ■ Consider a system for detecting whether a tweet is written in Telugu.

  67. Accuracy ■ Most basic metric: accuracy. How often is the classifier right? acc(y, ŷ) = (1/N) Σ_(i=1)^N δ(y^(i) = ŷ^(i)). The problem with accuracy is rare labels, also known as class imbalance. ■ Consider a system for detecting whether a tweet is written in Telugu. ■ 0.3% of tweets are written in Telugu [Bergsma et al. 2012].

  68. Accuracy ■ Most basic metric: accuracy. How often is the classifier right? acc(y, ŷ) = (1/N) Σ_(i=1)^N δ(y^(i) = ŷ^(i)). The problem with accuracy is rare labels, also known as class imbalance. ■ Consider a system for detecting whether a tweet is written in Telugu. ■ 0.3% of tweets are written in Telugu [Bergsma et al. 2012]. ■ A system that says ŷ = NotTelugu 100% of the time is 99.7% accurate.
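The Telugu example is easy to reproduce numerically (a sketch with made-up data at the 0.3% rate from the slide):

```python
import numpy as np

def accuracy(y, y_hat):
    # acc(y, y_hat) = (1/N) sum_i delta(y^(i) == y_hat^(i))
    return float(np.mean(np.asarray(y) == np.asarray(y_hat)))

y = np.zeros(10000, dtype=int)
y[:30] = 1                                 # 0.3% of tweets are Telugu
y_hat = np.zeros(10000, dtype=int)         # the "never Telugu" classifier
print(accuracy(y, y_hat))                  # 0.997
```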

  69. Beyond “right” and “wrong” [figure: correct labels vs. predicted labels]

  70. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” [figure: correct labels vs. predicted labels]

  71. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” ■ False positive: the system incorrectly predicts the label. [figure: correct labels vs. predicted labels]

  72. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” ■ False positive: the system incorrectly predicts the label. ■ False negative: the system incorrectly fails to predict the label. [figure: correct labels vs. predicted labels]

  73. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” ■ False positive: the system incorrectly predicts the label. ■ False negative: the system incorrectly fails to predict the label. ■ Similarly, there are two ways to be “right:” [figure: correct labels vs. predicted labels]

  74. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” ■ False positive: the system incorrectly predicts the label. ■ False negative: the system incorrectly fails to predict the label. ■ Similarly, there are two ways to be “right:” ■ True positive: the system correctly predicts the label. [figure: correct labels vs. predicted labels]

  75. Beyond “right” and “wrong” ■ For any label, there are two ways to be “wrong:” ■ False positive: the system incorrectly predicts the label. ■ False negative: the system incorrectly fails to predict the label. ■ Similarly, there are two ways to be “right:” ■ True positive: the system correctly predicts the label. ■ True negative: the system correctly predicts that the label does not apply to this instance. [figure: correct labels vs. predicted labels]

  76. Precision and recall [figure: correct labels vs. predicted labels]

  77. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) [figure: correct labels vs. predicted labels]

  78. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) ■ The “never Telugu” classifier has 0 recall. [figure: correct labels vs. predicted labels]

  79. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) ■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall. [figure: correct labels vs. predicted labels]

  80. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) ■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall. ■ Precision: fraction of positive predictions that were correct. precision = TP / (TP + FP) [figure: correct labels vs. predicted labels]

  81. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) ■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall. ■ Precision: fraction of positive predictions that were correct. precision = TP / (TP + FP) ■ The “never Telugu” classifier has 0 precision. [figure: correct labels vs. predicted labels]

  82. Precision and recall ■ Recall: fraction of positive instances that were correctly classified. recall = TP / (TP + FN) ■ The “never Telugu” classifier has 0 recall. ■ The “always Telugu” classifier has perfect recall. ■ Precision: fraction of positive predictions that were correct. precision = TP / (TP + FP) ■ The “never Telugu” classifier has 0 precision. ■ The “always Telugu” classifier has 0.003 precision. [figure: correct labels vs. predicted labels]
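Counting the four outcomes directly gives both metrics; a sketch that reproduces the “always Telugu” numbers above:

```python
import numpy as np

def precision_recall(y, y_hat, positive=1):
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    tp = int(np.sum((y_hat == positive) & (y == positive)))
    fp = int(np.sum((y_hat == positive) & (y != positive)))
    fn = int(np.sum((y_hat != positive) & (y == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y = np.zeros(10000, dtype=int)
y[:30] = 1                                          # 0.3% Telugu
print(precision_recall(y, np.ones(10000, int)))     # (0.003, 1.0)
```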

  83. Combining precision and recall

  84. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific.

  85. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific. ■ For a preliminary medical diagnosis, we might prefer high recall. False positives can be screened out later.

  86. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific. ■ For a preliminary medical diagnosis, we might prefer high recall. False positives can be screened out later. ■ The “beyond reasonable doubt” standard of U.S. criminal law implies a preference for high precision.

  87. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific. ■ For a preliminary medical diagnosis, we might prefer high recall. False positives can be screened out later. ■ The “beyond reasonable doubt” standard of U.S. criminal law implies a preference for high precision. ■ Most often, we weight them equally using the F1 measure, the harmonic mean of precision and recall: F1 = 2 · precision · recall / (precision + recall)

  88. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific. ■ For a preliminary medical diagnosis, we might prefer high recall. False positives can be screened out later. ■ The “beyond reasonable doubt” standard of U.S. criminal law implies a preference for high precision. ■ Most often, we weight them equally using the F1 measure, the harmonic mean of precision and recall: F1 = 2 · precision · recall / (precision + recall), with min(precision, recall) ≤ F1 ≤ 2 · min(precision, recall)

  89. Combining precision and recall ■ Inherent tradeoff between precision and recall. Choice is problem-specific. ■ For a preliminary medical diagnosis, we might prefer high recall. False positives can be screened out later. ■ The “beyond reasonable doubt” standard of U.S. criminal law implies a preference for high precision. ■ Most often, we weight them equally using the F1 measure, the harmonic mean of precision and recall: F1 = 2 · precision · recall / (precision + recall), with min(precision, recall) ≤ F1 ≤ 2 · min(precision, recall). ■ Can generalize F-measure to adjust the tradeoff, such that recall is β times as important as precision: F_β = (1 + β²) · precision · recall / (β² · precision + recall)
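F_β in code, with β = 1 recovering F1 (a sketch):

```python
def f_beta(precision, recall, beta=1.0):
    # beta = 1 is the harmonic mean (F1); beta > 1 weights recall more heavily.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(f_beta(0.003, 1.0))             # ~0.006: "always Telugu" scores poorly
print(f_beta(0.9, 0.3, beta=2.0))     # recall weighted more than precision
```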

  90. Trading off precision and recall: ROC curve [figure: ROC curve]
