Machine Learning 2 DS 4420 - Spring 2020 Structured prediction, II


  1. Machine Learning 2 DS 4420 - Spring 2020 Structured prediction, II Byron C Wallace

  2. Today • From HMMs to MEMMs to CRFs

  3. Structured output spaces
     [Figure: a label sequence $y_1, y_2, y_3$ predicted for the input tokens $x_1, x_2, x_3$ = “Play Kanye West”]

  4. Structured output spaces Source: http://cocodataset.org/

  5. Space of problems
     Given      Predict                              Type?
     An image   Contains a cat?                      Classification
     An image   Coordinates that outline all cats    Structured prediction
     A tweet    Names in the tweet                   Structured prediction
     A tweet    Sentiment in tweet                   Classification

  6. A generative model of sequences
     $$P(X_1 = x_1, \ldots, X_n = x_n, Y_1 = y_1, \ldots, Y_n = y_n) = \prod_{i=1}^{n+1} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i)$$
     The first product collects the transition probabilities $P(y_i \mid y_{i-1})$; the second collects the emission probabilities $P(x_i \mid y_i)$.
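
The factorization on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's code: the tag set, the `START`/`STOP` boundary symbols (standing in for $y_0$ and $y_{n+1}$), and every probability value are made up.

```python
# Toy HMM: every name and number here is an illustrative assumption.
START, STOP = "<s>", "</s>"

# transition[prev][cur] = P(y_i = cur | y_{i-1} = prev)
transition = {
    START: {"VERB": 0.6, "NAME": 0.4},
    "VERB": {"VERB": 0.1, "NAME": 0.8, STOP: 0.1},
    "NAME": {"VERB": 0.1, "NAME": 0.5, STOP: 0.4},
}

# emission[tag][word] = P(x_i = word | y_i = tag)
emission = {
    "VERB": {"Play": 0.9, "Kanye": 0.05, "West": 0.05},
    "NAME": {"Play": 0.1, "Kanye": 0.5, "West": 0.4},
}

def hmm_joint(words, tags):
    """P(x_1..x_n, y_1..y_n) as a product of transition and emission factors."""
    assert len(words) == len(tags)
    path = [START] + list(tags) + [STOP]  # y_0 ... y_{n+1}
    prob = 1.0
    # n + 1 transition factors, including the final step into STOP
    for prev, cur in zip(path, path[1:]):
        prob *= transition[prev].get(cur, 0.0)
    # n emission factors, one per observed word
    for word, tag in zip(words, tags):
        prob *= emission[tag].get(word, 0.0)
    return prob

p = hmm_joint(["Play", "Kanye", "West"], ["VERB", "NAME", "NAME"])
print(p)
```

Words or transitions missing from the toy tables get probability 0 here; a real model would smooth these estimates.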

  8. Graphical Model (HMMs)
     [Figure: a chain of label nodes $y_0, y_1, \ldots, y_5$, with each $y_i$ for $i = 1, \ldots, 5$ linked to its observation $x_i$]

  9. Limitations to HMMs
     • We are restricted to features that have a coherent “generative story”
     • Why bother “modeling” x? It’s given! What we really care about is p(y | x)

  10. Generative v discriminative
      Generative
      • Model joint distribution P(x,y)
      • Can generate new “examples”
      • To predict y, use Bayes’ rule

  11. Generative v discriminative
      Generative
      • Model joint distribution P(x,y)
      • Can generate new “examples”
      • To predict y, use Bayes’ rule
      Discriminative
      • Model conditional distribution P(y|x)
      • Not as amenable to semi-supervised settings; cannot readily “generate” new samples
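
To make the "use Bayes' rule" step concrete: a generative model gives P(x, y) = P(x | y) P(y), and the conditional follows as P(y | x) = P(x, y) / Σ_{y'} P(x, y'). The sketch below applies this to a made-up joint table; the inputs, labels, and probabilities are all assumptions chosen for illustration.

```python
# Hypothetical joint distribution P(x, y) over two inputs and two labels.
joint = {
    ("great", "pos"): 0.35, ("great", "neg"): 0.05,
    ("awful", "pos"): 0.10, ("awful", "neg"): 0.50,
}
labels = ["pos", "neg"]

def posterior(x):
    """P(y | x) = P(x, y) / sum over y' of P(x, y')."""
    evidence = sum(joint[(x, y)] for y in labels)  # marginal P(x)
    return {y: joint[(x, y)] / evidence for y in labels}

def predict(x):
    """Return the label with the highest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)

print(posterior("great"))
print(predict("awful"))
```

A discriminative model skips the joint table and parameterizes P(y | x) directly, which is why it cannot be used to sample new x.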

  12. Enter Max Entropy Markov Models (MEMMs)
      • These extend standard log-linear models to capture structure in the outputs.
      • A bit like the structured perceptron we introduced last time, but explicitly model conditional probabilities of labels.
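
As a rough sketch of what "explicitly model conditional probabilities of labels" can look like, the code below assumes the standard MEMM factorization p(y_1..y_n | x) = Π_i p(y_i | y_{i-1}, x), where each factor is a log-linear model normalized over the tag set. The slide itself does not spell this out, so the feature function `phi`, the tag set, the `START` symbol, and the weights are hypothetical.

```python
import math

TAGS = ["VERB", "NAME"]  # assumed tag set
START = "<s>"            # assumed start symbol playing the role of y_0

def phi(words, i, prev_tag, tag):
    """Hypothetical local features: (current word, tag) and (previous tag, tag)."""
    return [("word", words[i], tag), ("prev", prev_tag, tag)]

def local_probs(w, words, i, prev_tag):
    """p(y_i | y_{i-1}, x): log-linear scores normalized over the tag set."""
    scores = {t: sum(w.get(f, 0.0) for f in phi(words, i, prev_tag, t)) for t in TAGS}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {t: math.exp(s - m) / z for t, s in scores.items()}

def sequence_prob(w, words, tags):
    """p(y_1..y_n | x) as a product of locally normalized factors."""
    prob, prev = 1.0, START
    for i, tag in enumerate(tags):
        prob *= local_probs(w, words, i, prev)[tag]
        prev = tag
    return prob

w = {("word", "Play", "VERB"): 2.0,
     ("prev", "VERB", "NAME"): 1.5,
     ("prev", "NAME", "NAME"): 1.0}
words = ["Play", "Kanye", "West"]
print(sequence_prob(w, words, ["VERB", "NAME", "NAME"]))
```

Because every factor is normalized on its own, the probabilities of all tag sequences for a fixed input sum to 1 without computing a global partition function.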

  13. Log-Linear Models
      $$p(y \mid x, w) = \frac{\exp(w \cdot \phi(x, y))}{\sum_{y' \in \mathcal{Y}} \exp(w \cdot \phi(x, y'))}$$

  14. Log-Linear Models
      $$p(y \mid x, w) = \frac{\exp(w \cdot \phi(x, y))}{\sum_{y' \in \mathcal{Y}} \exp(w \cdot \phi(x, y'))}$$
      The score $w \cdot \phi(x, y)$ measures plausibility of y given x.
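
The log-linear formula can be sketched directly: score each candidate label with w · φ(x, y), exponentiate, and normalize over the label set. The feature map `phi`, the label set, and the weights below are assumptions made for the example, not taken from the slides.

```python
import math

LABELS = ["pos", "neg"]  # the label set Y (assumed)

def phi(x, y):
    """Hypothetical feature map: one indicator feature per (word, label) pair."""
    return {(word, y): 1.0 for word in x.split()}

def score(w, x, y):
    """Dot product w . phi(x, y), with w stored as a sparse dict."""
    return sum(w.get(feat, 0.0) * val for feat, val in phi(x, y).items())

def p(y, x, w):
    """p(y | x, w) = exp(w . phi(x, y)) / sum over y' of exp(w . phi(x, y'))."""
    scores = {label: score(w, x, label) for label in LABELS}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[y] - m) / z

w = {("great", "pos"): 2.0, ("great", "neg"): -1.0}
probs = {y: p(y, "great movie", w) for y in LABELS}
print(probs)
```

Subtracting the maximum score before exponentiating leaves the ratio unchanged and avoids overflow when scores are large.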

  15. Log-likelihood
      $$p(y \mid x, w) = \frac{\exp(w \cdot \phi(x, y))}{\sum_{y' \in \mathcal{Y}} \exp(w \cdot \phi(x, y'))}$$

  16. Log-likelihood
      $$p(y \mid x, w) = \frac{\exp(w \cdot \phi(x, y))}{\sum_{y' \in \mathcal{Y}} \exp(w \cdot \phi(x, y'))}$$
      $$LL(w) = \sum_i \log p(y_i \mid x_i, w)$$
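
A minimal sketch of evaluating LL(w) on a toy dataset follows. It assumes a simple block feature map (the input vector copied into the slot for label y) and made-up data and weights; only the two formulas on this slide come from the source.

```python
import math

LABELS = [0, 1]  # assumed label set

def phi(x, y):
    """Hypothetical feature map: copy x into the block of features for label y."""
    vec = [0.0] * (len(x) * len(LABELS))
    for j, value in enumerate(x):
        vec[y * len(x) + j] = value
    return vec

def log_p(y, x, w):
    """log p(y | x, w) = w . phi(x, y) - log sum over y' of exp(w . phi(x, y'))."""
    scores = [sum(wj * fj for wj, fj in zip(w, phi(x, label))) for label in LABELS]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return scores[y] - log_z

def log_likelihood(w, data):
    """LL(w) = sum over i of log p(y_i | x_i, w)."""
    return sum(log_p(y, x, w) for x, y in data)

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 1)]  # toy (x_i, y_i) pairs
w_zero = [0.0] * 4
w_good = [1.0, -1.0, -1.0, 1.0]
print(log_likelihood(w_zero, data))
print(log_likelihood(w_good, data))
```

With all-zero weights every label is equally likely, so each example contributes log(1/2); weights that agree with the labels raise LL(w) toward 0, which is what training by maximum likelihood aims for.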
