
Latent Models: Sequence Models Beyond HMMs and Machine Translation



  1. Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment CMSC 473/673 UMBC

  2. Outline
     • Review: EM for HMMs
     • Machine Translation Alignment
     • Limited Sequence Models
         • Maximum Entropy Markov Models
         • Conditional Random Fields
     • Recurrent Neural Networks
         • Basic Definitions
         • Example in PyTorch

  3. Why Do We Need Both the Forward and Backward Algorithms? Compute posteriors.

     α(i, s) * β(i, s) = total probability of paths through state s at step i:

     $$p(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,\beta(i, s)}{\alpha(N+1, \text{END})}$$

     α(i, s) * p(s' | s) * p(obs at i+1 | s') * β(i+1, s') = total probability of paths through the s → s' arc (at time i):

     $$p(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(\text{obs}_{i+1} \mid s')\, \beta(i+1, s')}{\alpha(N+1, \text{END})}$$

  4. EM for HMMs

     0. Assume some value for your parameters p_obs(w | s) and p_trans(s' | s).

     Two-step, iterative algorithm:

     1. E-step: count under uncertainty, assuming these parameters.

        $$p^*(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\,\beta(i, s)}{\alpha(N+1, \text{END})}$$

        $$p^*(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(\text{obs}_{i+1} \mid s')\, \beta(i+1, s')}{\alpha(N+1, \text{END})}$$

     2. M-step: maximize log-likelihood, assuming these uncertain (estimated) counts.

  5. EM for HMMs (Baum-Welch Algorithm)

         α = computeForwards()
         β = computeBackwards()
         L = α[N+1][END]
         for(i = N; i ≥ 0; --i) {
           for(next = 0; next < K*; ++next) {
             c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
             for(state = 0; state < K*; ++state) {
               u = p_obs(obs_{i+1} | next) * p_trans(next | state)
               c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
             }
           }
         }
         update p_obs, p_trans using c_obs, c_trans
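The E-step loop above can be sketched in plain Python. This is a minimal illustration rather than the course's reference implementation: the function name `expected_counts`, the dictionary layout of `p_trans` and `p_obs`, the explicit `"START"`/`"END"` states, and the toy two-state parameters are all assumptions made for the example. Unlike the pseudocode, it skips the emission count for the END step, since END emits no word.

```python
from collections import defaultdict

def expected_counts(obs, states, p_trans, p_obs):
    """One E-step for a single sequence: returns (L, c_obs, c_trans)."""
    N = len(obs)
    # alpha[i][s]: probability of obs_1..obs_i with state s at step i.
    alpha = [dict() for _ in range(N + 2)]
    alpha[0] = {"START": 1.0}
    for i in range(1, N + 1):
        for s in states:
            incoming = sum(a * p_trans[prev][s] for prev, a in alpha[i - 1].items())
            alpha[i][s] = incoming * p_obs[s][obs[i - 1]]
    alpha[N + 1] = {"END": sum(a * p_trans[s]["END"] for s, a in alpha[N].items())}

    # beta[i][s]: probability of obs_{i+1}..obs_N and reaching END, given s at step i.
    beta = [dict() for _ in range(N + 2)]
    beta[N + 1] = {"END": 1.0}
    for s in states:
        beta[N][s] = p_trans[s]["END"]
    for i in range(N - 1, 0, -1):
        for s in states:
            beta[i][s] = sum(
                p_trans[s][nxt] * p_obs[nxt][obs[i]] * beta[i + 1][nxt] for nxt in states
            )

    L = alpha[N + 1]["END"]  # total probability of the observation sequence
    c_obs = defaultdict(float)
    c_trans = defaultdict(float)
    for i in range(N, -1, -1):
        for nxt in alpha[i + 1]:
            if nxt != "END":
                # posterior of being in state nxt at step i+1
                c_obs[(nxt, obs[i])] += alpha[i + 1][nxt] * beta[i + 1][nxt] / L
            emit = 1.0 if nxt == "END" else p_obs[nxt][obs[i]]
            for state in alpha[i]:
                u = emit * p_trans[state][nxt]
                c_trans[(state, nxt)] += alpha[i][state] * u * beta[i + 1][nxt] / L
    return L, c_obs, c_trans

# Toy parameters (hypothetical values, chosen only so each row sums to 1).
states = ["N", "V"]
p_trans = {
    "START": {"N": 0.7, "V": 0.3},
    "N": {"N": 0.3, "V": 0.5, "END": 0.2},
    "V": {"N": 0.5, "V": 0.2, "END": 0.3},
}
p_obs = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.3, "swim": 0.7}}
obs = ["fish", "swim"]

L, c_obs, c_trans = expected_counts(obs, states, p_trans, p_obs)
```

A useful sanity check on any implementation of this step: the expected emission counts sum to the number of observations, and the expected transition counts sum to one more than that (one arc into each word plus the arc into END).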

  6. Semi-Supervised Learning
     • labeled data: human annotated; relatively small/few examples
     • unlabeled data: raw, not annotated; plentiful
     [Figure: EM applied to labeled examples together with unlabeled ("? ? ?") examples]

  7. Semi-Supervised Parameter Estimation for HMMs

     Transition Counts (rows: from state; columns: to state)

              N      V      end
     start    2      0      0
     N        1      2      2
     V        2      1      0

     Emission Counts

              w1     w2     w3     w4
     N        2      0      1      2
     V        0      2      1      0

     Expected Transition Counts

              N      V      end
     start    1.8    .1     .1
     N        1.5    .8     .1
     V        1.4    1.1    .4

     Expected Emission Counts

              w1     w2     w3     w4
     N        .4     .3     .2     .2
     V        .1     .6     .3     .3

     Mixed Transition Counts

              N      V      end
     start    3.8    .1     .1
     N        2.5    2.8    2.1
     V        3.4    2.1    .4

     Mixed Emission Counts

              w1     w2     w3     w4
     N        2.4    .3     1.2    2.2
     V        .1     2.6    1.3    .3

     Each mixed count is the sum of the corresponding count and expected count (for example, start → N: 2 + 1.8 = 3.8).
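The arithmetic on this slide can be checked with a short Python sketch. The transition numbers are copied from the tables; the helper names `mix` and `normalize` are made up for illustration, and normalizing each row into probabilities is an assumption about how the mixed counts would be used in the M-step, not something the slide shows.

```python
# Transition counts from the slide (rows: from state; columns: to state).
counts = {
    "start": {"N": 2, "V": 0, "end": 0},
    "N": {"N": 1, "V": 2, "end": 2},
    "V": {"N": 2, "V": 1, "end": 0},
}
expected = {
    "start": {"N": 1.8, "V": 0.1, "end": 0.1},
    "N": {"N": 1.5, "V": 0.8, "end": 0.1},
    "V": {"N": 1.4, "V": 1.1, "end": 0.4},
}

def mix(a, b):
    """Element-wise sum of two count tables with identical keys."""
    return {row: {col: a[row][col] + b[row][col] for col in a[row]} for row in a}

def normalize(table):
    """Turn each row of counts into a probability distribution."""
    result = {}
    for row, cols in table.items():
        total = sum(cols.values())
        result[row] = {col: value / total for col, value in cols.items()}
    return result

mixed = mix(counts, expected)   # matches the "Mixed Transition Counts" table
p_trans = normalize(mixed)      # assumed M-step: relative frequencies per row
```

With these numbers the start row sums to 4.0, so the re-estimated p_trans(N | start) is about 0.95.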

  8. Outline
     • Review: EM for HMMs
     • Machine Translation Alignment
     • Limited Sequence Models
         • Maximum Entropy Markov Models
         • Conditional Random Fields
     • Recurrent Neural Networks
         • Basic Definitions
         • Example in PyTorch

  9. Warren Weaver's Note
     When I look at an article in Russian, I say "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." (Warren Weaver, 1947)
     http://www.mt-archive.info/Weaver-1949.pdf
     Slides courtesy Rebecca Knowles

  10-12. Noisy Channel Model
     [Figure: text written in (clean) English passes through the channel and is observed as (noisy) Russian text, for example язы́к. Decode: a translation/decode model proposes candidate English words (language, text, word, speak). Rerank: a (clean) language model reranks the candidates to produce English.]
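The decode-then-rerank idea can be illustrated with a tiny Python sketch. Everything numeric here is hypothetical: the candidate list, the translation-model scores p(f | e), and the language-model scores p(e) are invented for the example, and scoring by the product p(f | e) * p(e) is the usual noisy-channel formulation rather than something spelled out on the slide.

```python
import math

# Hypothetical translation-model scores p(f | e) for one observed Russian word.
translation_model = {"language": 0.5, "tongue": 0.4, "speech": 0.1}

# Hypothetical language-model scores p(e) for each English candidate in context.
language_model = {"language": 0.6, "tongue": 0.1, "speech": 0.3}

def decode(tm):
    """Propose English candidates, best translation-model score first."""
    return sorted(tm, key=tm.get, reverse=True)

def rerank(candidates, tm, lm):
    """Rescore candidates by log p(f | e) + log p(e), best first."""
    def score(e):
        return math.log(tm[e]) + math.log(lm[e])
    return sorted(candidates, key=score, reverse=True)

candidates = decode(translation_model)
ranked = rerank(candidates, translation_model, language_model)
best = ranked[0]
```

Working in log space avoids underflow once scores for whole sentences, rather than single words, are multiplied together.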

  13-15. Translation
     Translate French (observed) into English:
     Le chat est sur la chaise.
     The cat is on the chair.

  16. Alignment
     Le chat est sur la chaise.
     ?
     The cat is on the chair.
     [Figure: the same sentence pair with word-alignment links drawn between the French and English words]

  17. Parallel Texts

     English (http://www.un.org/en/universal-declaration-human-rights/):

     Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,

     Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,

     Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,

     Whereas it is essential to promote the development of friendly relations between nations, ...

     Translation (http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn):

     Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj.

     Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli.

     Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan.

     Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.

     ...

  18. Preprocessing
     • Sentence align
     • Clean corpus
     • Tokenize
     • Handle case
     • Word segmentation (morphological, BPE, etc.)
     • Language-specific preprocessing (example: pre-reordering)
     • ...
     [Background text: the same two excerpts of the Universal Declaration of Human Rights shown on slide 17]
     http://www.un.org/en/universal-declaration-human-rights/
     http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn
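A few of the preprocessing steps listed above (clean corpus, tokenize, handle case) can be sketched in Python. This is only an illustration under simple assumptions: the regular-expression tokenizer, lowercasing as the way to "handle case", and the length-ratio filter with its `max_ratio` threshold are choices made for the example, not the pipeline used in the course. Sentence alignment and subword segmentation are not covered.

```python
import re

def tokenize(sentence):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def preprocess(sentence):
    """Tokenize, then lowercase every token."""
    return [token.lower() for token in tokenize(sentence)]

def keep_pair(src_tokens, tgt_tokens, max_ratio=3.0):
    """Corpus cleaning: drop empty pairs and pairs with very different lengths."""
    if not src_tokens or not tgt_tokens:
        return False
    longer = max(len(src_tokens), len(tgt_tokens))
    shorter = min(len(src_tokens), len(tgt_tokens))
    return longer / shorter <= max_ratio

raw_pairs = [
    ("Le chat est sur la chaise.", "The cat is on the chair."),
    ("", "An untranslated line."),
]

corpus = []
for french, english in raw_pairs:
    f_tokens, e_tokens = preprocess(french), preprocess(english)
    if keep_pair(f_tokens, e_tokens):
        corpus.append((f_tokens, e_tokens))
```

Lowercasing is the bluntest way to handle case; truecasing would be a gentler alternative.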

  19. Alignments
     If we had word-aligned text, we could easily estimate P( f | e ). But we don't usually have word alignments, and they are expensive to produce by hand...
     If we had P( f | e ), we could produce alignments automatically.
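Both directions of this chicken-and-egg relationship are easy to write down once one side is given. The Python sketch below assumes a tiny hand-aligned corpus (hypothetical data) and uses relative-frequency counting for P( f | e ); the function names `estimate_translation_probs` and `align` are made up for illustration, and picking the single most probable English word for each French word is just one simple way to read alignments off the table.

```python
from collections import Counter

# Hypothetical word-aligned data: (English words, French words, links),
# where each link is (english_index, french_index).
aligned_corpus = [
    (["the", "cat"], ["le", "chat"], [(0, 0), (1, 1)]),
    (["the", "chair"], ["la", "chaise"], [(0, 0), (1, 1)]),
]

def estimate_translation_probs(corpus):
    """P(f | e) by relative frequency over the aligned links."""
    pair_counts = Counter()
    e_counts = Counter()
    for e_words, f_words, links in corpus:
        for e_index, f_index in links:
            pair_counts[(f_words[f_index], e_words[e_index])] += 1
            e_counts[e_words[e_index]] += 1
    return {(f, e): count / e_counts[e] for (f, e), count in pair_counts.items()}

def align(e_words, f_words, probs):
    """Link each French word to the English word with the highest P(f | e)."""
    links = []
    for f_index, f in enumerate(f_words):
        best_e = max(range(len(e_words)), key=lambda j: probs.get((f, e_words[j]), 0.0))
        links.append((f_index, best_e))
    return links

probs = estimate_translation_probs(aligned_corpus)
links = align(["the", "cat"], ["le", "chat"], probs)
```

Here "the" is linked once to "le" and once to "la", so each gets probability 0.5, while P( chat | cat ) is 1.0.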

  20. IBM Model 1 (1993)
     • Lexical Translation Model
     • Word Alignment Model
     • The simplest of the original IBM models
     • For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003
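As a rough illustration of how a lexical translation model and a word alignment model can be learned together, here is a compact EM sketch in the style usually presented for IBM Model 1. It is written from the textbook description rather than taken from these slides: the function name `train_ibm1`, the `"NULL"` source token, the uniform initialization, the fixed iteration count, and the three-sentence toy corpus are all assumptions.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(f | e) from sentence pairs without word alignments."""
    f_vocab = {f for _, f_words in pairs for f in f_words}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference among f words

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links out of e
        # E-step: spread each French word's one link over the English words.
        for e_words, f_words in pairs:
            sources = ["NULL"] + e_words
            for f in f_words:
                z = sum(t[(f, e)] for e in sources)
                for e in sources:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize the expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy corpus: (English words, French words).
pairs = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
    (["a", "dog"], ["un", "chien"]),
]

t = train_ibm1(pairs)
```

Because "le" co-occurs with "the" in two sentences while "chat" and "chien" each co-occur with it once, the estimate for P( le | the ) pulls ahead of the others as the iterations proceed.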
