SLIDE 1

NLU lecture 6: Compositional character representations

Adam Lopez (alopez@inf.ed.ac.uk). Credits: Clara Vania. 2 Feb 2018.

SLIDES 2-3

Let’s revisit an assumption in language modeling (& word2vec).

When does this assumption make sense for language modeling?

SLIDE 4

But words are not a finite set!

  • Bengio et al.: "Rare words with frequency ≤ 3 were merged into a single symbol, reducing the vocabulary size to |V| = 16,383."

  • Bahdanau et al.: "we use a shortlist of 30,000 most frequent words in each language to train our models. Any word not included in the shortlist is mapped to a special token ([UNK])."

  Src | ⽇本 の 主要 作物 は ⽶ で あ る 。
  Ref | the main crop of japan is rice .
  Hyp | the _UNK is popular of _UNK . _EOS

SLIDES 5-6

What if we could scale softmax to the training data vocabulary? Would that help?

SOFTMAX ALL THE WORDS

SLIDE 7

Idea: scale by partitioning

  • Partition the vocabulary into smaller pieces (class-based LM):

    p(w_i | h_i) = p(c_i | h_i) · p(w_i | c_i, h_i)
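To make the factorization concrete, here is a minimal sketch of a class-factored output layer (module and parameter names are my own, and I assume equal-sized classes for simplicity; an illustration, not the lecture's code):

```python
import torch
import torch.nn as nn

class ClassFactoredSoftmax(nn.Module):
    """p(w | h) = p(c | h) * p(w | c, h): two small softmaxes instead of one huge one."""

    def __init__(self, hidden_dim, num_classes, words_per_class):
        super().__init__()
        self.class_layer = nn.Linear(hidden_dim, num_classes)
        # Per-class output embeddings (equal-sized classes, for simplicity).
        self.word_weights = nn.Parameter(
            torch.randn(num_classes, words_per_class, hidden_dim) * 0.01)

    def log_prob(self, h, class_id, word_in_class):
        # p(c | h): normalize over the small set of classes.
        log_p_class = torch.log_softmax(self.class_layer(h), dim=-1)[class_id]
        # p(w | c, h): normalize over only the words in class c
        # -- O(|V| / num_classes) work instead of O(|V|).
        scores = self.word_weights[class_id] @ h
        log_p_word = torch.log_softmax(scores, dim=-1)[word_in_class]
        return log_p_class + log_p_word
```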

SLIDE 8

Idea: scale by partitioning

  • Partition the vocabulary into smaller pieces hierarchically (hierarchical softmax).
  • Brown clustering: hard clustering based on mutual information.
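In the hierarchical version, a word's probability becomes a product of binary decisions along its root-to-leaf path in the class tree. A sketch under an assumed path encoding of my own (a list of (node_id, sign) pairs):

```python
import torch
import torch.nn.functional as F

def hierarchical_log_prob(h, node_vectors, path):
    """log p(w | h) as a sum of log-sigmoid branch decisions down the class tree.

    node_vectors: (num_internal_nodes, hidden_dim) parameters, one per inner node;
    path: the word's root-to-leaf path as (node_id, sign) pairs,
          sign = +1 for a left branch and -1 for a right branch.
    Cost is O(log |V|) per word instead of O(|V|).
    """
    logp = h.new_zeros(())
    for node_id, sign in path:
        logp = logp + F.logsigmoid(sign * (node_vectors[node_id] @ h))
    return logp
```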

SLIDE 9

Idea: scale by partitioning

  • Differentiated softmax: assign more parameters to more frequent words, fewer to less frequent words.

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015

SLIDE 10

Partitioning helps

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015

SLIDE 11

Partitioning helps… but could be better

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015

SLIDE 12

Partitioning helps… but could be better

Noise contrastive estimation

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015
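Sketch of the NCE idea referenced here, under a toy setup of my own (not the paper's code): rather than normalize over V, train a binary classifier to tell the observed word apart from k samples drawn from a noise distribution, using only unnormalized scores.

```python
import math
import torch
import torch.nn.functional as F

def nce_loss(score_fn, h, target, noise_dist, k=10):
    """NCE surrogate loss: classify data word vs. k noise words with unnormalized scores.

    score_fn(h, word_ids) -> unnormalized log-scores s(w, h), vectorized over word_ids;
    noise_dist: a torch.distributions.Categorical over the vocabulary;
    target: scalar LongTensor holding the observed word id.
    No term here ever sums over the whole vocabulary.
    """
    noise = noise_dist.sample((k,))   # k negative samples
    # Logit that a word came from the data rather than the noise
    # distribution: s(w, h) - log(k * p_noise(w)).
    pos_logit = score_fn(h, target) - (math.log(k) + noise_dist.log_prob(target))
    neg_logits = score_fn(h, noise) - (math.log(k) + noise_dist.log_prob(noise))
    return -(F.logsigmoid(pos_logit) + F.logsigmoid(-neg_logits).sum())
```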

SLIDE 13

Partitioning helps… but could be better

Skip the normalization step altogether

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015

SLIDE 14

Partitioning helps… but could be better

Room for improvement

Source: Strategies for training large vocabulary language models. Chen, Auli, and Grangier, 2015

SLIDE 15

V is not finite

  • Practical problem: softmax computation is linear in vocabulary size.
  • Theorem: the vocabulary of word types is infinite.
    Proof 1: productive morphology, loanwords, “fleek”.
    Proof 2: 1, 2, 3, 4, …
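To see the practical problem in code (toy sizes of my own choosing): every next-word distribution touches every row of the output embedding matrix.

```python
import torch

V, d = 100_000, 512                  # vocabulary size, hidden size (toy numbers)
W = torch.randn(V, d) * 0.01         # output embeddings: one row per word type
h = torch.randn(d)                   # context vector from the LM

logits = W @ h                       # O(|V| * d): touches every word in the vocabulary
p = torch.softmax(logits, dim=-1)    # the normalizer sums over all |V| logits
```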

SLIDES 16-20

What set is finite?

Characters. More precisely, Unicode code points. Are you sure? 🤸

Not all characters are the same, because not all languages have alphabets. Some have syllabaries (e.g. Japanese kana) and/or logographies (Chinese hànzì).
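A quick illustration, in plain Python, of why even "characters" are slippery: what a reader perceives as one character may be one code point, two code points, or a single code point that names a whole syllable or morpheme. The example strings are my own.

```python
import unicodedata

# One user-perceived character, two different code-point sequences:
composed = "é"                      # a single code point, U+00E9
decomposed = "e\u0301"              # 'e' + COMBINING ACUTE ACCENT: two code points
print(len(composed), len(decomposed))                        # 1 2
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Some scripts pack a syllable (kana) or a morpheme (hànzì/kanji) into one code point:
for ch in "米は":
    print(ch, hex(ord(ch)), unicodedata.name(ch, "?"))
```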

SLIDE 21

Rather than look up word representations…

Source: Finding function in form: compositional character models for open vocabulary word representation. Ling et al., 2015

SLIDE 22

Compose character representations into word representations with LSTMs

Source: Finding function in form: compositional character models for open vocabulary word representation. Ling et al., 2015
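A minimal sketch of the Ling et al. idea (shapes and names are my own; the paper combines the two final LSTM states with a learned linear layer, whereas this sketch simply concatenates them):

```python
import torch
import torch.nn as nn

class CharBiLSTMWordEmbedder(nn.Module):
    """Word vector = final states of a biLSTM run over the word's characters."""

    def __init__(self, num_chars, char_dim=32, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, word_dim // 2,
                              batch_first=True, bidirectional=True)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len) integer character indices.
        x = self.char_emb(char_ids)        # (batch, len, char_dim)
        _, (h_n, _) = self.bilstm(x)       # h_n: (2, batch, word_dim // 2)
        # Concatenate the last forward state and the last backward state.
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, word_dim)
```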
SLIDE 23

Compose character representations into word representations with CNNs

Source: Character-aware neural language models. Kim et al., 2015
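The CNN variant in sketch form (again with my own shapes, not Kim et al.'s code; the paper additionally feeds the pooled vector through a highway network): convolve filters of several widths over the character embeddings and max-pool each filter over time.

```python
import torch
import torch.nn as nn

class CharCNNWordEmbedder(nn.Module):
    """Word vector = max-over-time pooling of character convolutions (several widths)."""

    def __init__(self, num_chars, char_dim=16, widths=(2, 3, 4), filters_per_width=32):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, filters_per_width, kernel_size=w) for w in widths)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len), padded to at least the widest filter.
        # Embed, then move to Conv1d's (batch, channels, len) layout.
        x = self.char_emb(char_ids).transpose(1, 2)
        # Each filter fires on character n-grams of its width; keep its strongest match.
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)   # (batch, len(widths) * filters_per_width)
```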

SLIDES 24-26

Character models actually work. Train them long enough, they generate words

anterest artifactive capacited capitaling compensive dermitories despertator dividement extremilated faxemary follect hamburgo identimity ipoteca nightmale orience patholicism pinguenas sammitment tasteman understrumental wisholver

“Wow, the disconversated vocabulations of their system are fantastics!” —Sharon Goldwater

Source: Finding function in form: compositional character models for open vocabulary word representation. Ling et al., 2015

SLIDES 27-28

How good are character-level NLP models?

Implied(?): character-level neural models learn everything they need to know about language.

SLIDE 29

Word embeddings have obvious limitations

  • Closed vocabulary assumption
  • Cannot exploit functional relationships in learning

SLIDE 30

And we know a lot about linguistic structure

Morpheme: the smallest meaningful unit of language

  “loves” = love + s
  root/stem: love
  affix: -s
  morph. analysis: 3rd.SG.PRES

SLIDE 31

The ratio of morphemes to words varies by language

Analytic languages: one morpheme per word.
Synthetic languages: many morphemes per word.

  Vietnamese … English … Turkish … West Greenlandic
  (increasingly synthetic)

SLIDE 32

Morphology can change the syntax or semantics of a word

“love” (VB)
  Inflectional morphology: love (VB), loves (VB), loving (VB), loved (VB)
  Derivational morphology: lover (NN), lovely (ADJ), lovable (ADJ)

SLIDE 33

Morphemes can represent one or more features

Fusional languages: many features per morpheme.
  (English) read-s  read-3SG.SG  ‘reads’
Agglutinative languages: one feature per morpheme.
  (Turkish) oku-r-sa-m  read-AOR.COND.1SG  ‘If I read …’

SLIDE 34

Words can have more than one stem

Affixation: one stem per word.
  (English) studying = study + ing
Compounding: many stems per word.
  (German) Rettungshubschraubernotlandeplatz = Rettung + s + hubschrauber + not + lande + platz
           rescue + LNK + helicopter + emergency + landing + place
           ‘Rescue helicopter emergency landing pad’

SLIDE 35

Inflection is not limited to affixation

Base modification: (English) drink, drank, drunk
Root & pattern: (Arabic) k(a)t(a)b(a)  write-PST.3SG.M  ‘he wrote’
Reduplication: (Indonesian) kemerah~merahan  red-ADJ  ‘reddish’

SLIDE 36

There are many different ways to compute word representations from subwords

Basic units of representation:
  • Characters (Ling et al., 2015; Kim et al., 2016; Lee et al., 2016)
  • Character n-grams (Sperr et al., 2013; Wieting et al., 2016; Bojanowski et al., 2016)
  • Morphemes (Luong et al., 2013; Botha & Blunsom, 2014; Sennrich et al., 2016)
  • Morphological analyses (Cotterell & Schütze, 2015; Kann & Schütze, 2016)

Compositional functions: addition, bidirectional LSTMs, convolutional NNs, … (one combination is sketched below)
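As one concrete instance of "character n-grams + addition" (roughly the Bojanowski et al. / fastText recipe; bucket count, dimensions, and the use of Python's built-in hash are my own simplifications):

```python
import torch
import torch.nn as nn

def char_ngrams(word, n_min=3, n_max=5):
    """Boundary-marked character n-grams: 'where' -> '^wh', 'whe', ..., 're$', ..."""
    marked = f"^{word}$"
    return [marked[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

class NgramAdditionEmbedder(nn.Module):
    """Word vector = sum of its n-gram embeddings (composition function: addition)."""

    def __init__(self, num_buckets=100_000, dim=100):
        super().__init__()
        self.emb = nn.Embedding(num_buckets, dim)
        self.num_buckets = num_buckets

    def forward(self, word):
        # Hash n-grams into a fixed table (real fastText uses FNV hashing;
        # Python's hash() is salted per process, which is fine for a sketch).
        ids = torch.tensor([hash(g) % self.num_buckets for g in char_ngrams(word)])
        return self.emb(ids).sum(dim=0)
```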

SLIDES 37-41

We’ve revised morphology, so we have some questions about character models

  • How do representations based on morphemes compare with those based on characters?
  • What is the best way to compose subword representations?
  • Do character-level models have the same predictive utility as models with knowledge of morphology?
  • How do different representations interact with languages of different morphological typologies?

SLIDE 42

Prediction problem: neural language modeling

What we vary is the input side: open-vocabulary history, closed-vocabulary prediction. Open-vocabulary prediction is interesting, but our goal is to understand representations, not build a better neural LM.

SLIDE 43

Variable: Subword Unit

  Unit          Examples
  Morfessor     ^want, s$
  BPE           ^w, ants$
  char-trigram  ^wa, wan, ant, nts, ts$
  character     ^, w, a, n, t, s, $
  analysis      want+VB, +3rd, +SG, +Pres

The first four are approximations to morphology; the last row is annotated morphology, part of an oracle experiment: suppose you had an oracle that could tell you the true morphology. In this case, the oracle is a human annotator.
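The unsupervised segmentations, sketched in code (the BPE merge list here is hand-picked for illustration, not learned from corpus statistics as real BPE merges would be):

```python
def character(word):
    return list(f"^{word}$")

def char_trigrams(word):
    marked = f"^{word}$"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def bpe_segment(word, merges):
    """Apply merges in priority order, the way learned BPE merges are applied."""
    syms = list(f"^{word}$")
    for a, b in merges:
        i = 0
        while i < len(syms) - 1:
            if (syms[i], syms[i + 1]) == (a, b):
                syms[i:i + 2] = [a + b]
            else:
                i += 1
    return syms

print(character("wants"))       # ['^', 'w', 'a', 'n', 't', 's', '$']
print(char_trigrams("wants"))   # ['^wa', 'wan', 'ant', 'nts', 'ts$']
print(bpe_segment("wants",      # illustrative merges only
                  [("a", "n"), ("an", "t"), ("ant", "s"), ("ants", "$"), ("^", "w")]))
# ['^w', 'ants$']
```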

SLIDE 44

Variable: Composition Function

  • Vector addition (except for characters)
  • Bidirectional LSTMs
  • Convolutional NN
SLIDE 45

Variable: Language Typology

  Fusional (English): read-s  read-3SG.SG  ‘reads’
  Agglutinative (Turkish): oku-r-sa-m  read-AOR.COND.1SG  ‘If I read …’
  Root & pattern (Arabic): k(a)t(a)b(a)  write-PST.3SG.M  ‘he wrote’
  Reduplication (Indonesian): anak~anak  child-PL  ‘children’

SLIDES 46-49

Summary of perplexity: use bi-LSTMs over character trigrams

  Language    | word  | character      | char-trigrams   | BPE             | Morfessor       | %imp
              |       | biLSTM   CNN   | add     biLSTM  | add     biLSTM  | add     biLSTM  |
  Czech       | 41.46 | 34.25   36.60  |  42.73   33.59  |  49.96   33.74  |  47.74   36.87  | 18.98
  English     | 46.40 | 43.53   44.67  |  45.41   42.97  |  47.51   43.30  |  49.72   49.72  |  7.39
  Russian     | 34.93 | 28.44   29.47  |  35.15   27.72  |  40.10   28.52  |  39.60   31.31  | 20.64
  Finnish     | 24.21 | 20.05   20.29  |  24.89   18.62  |  26.77   19.08  |  27.79   22.45  | 23.09
  Japanese    | 98.14 | 98.14   91.63  | 101.99  101.09  | 126.53   96.80  | 111.97   99.23  |  6.63
  Turkish     | 66.97 | 54.46   55.07  |  50.07   54.23  |  59.49   57.32  |  62.20   62.70  | 25.24
  Arabic      | 48.20 | 42.02   43.17  |  50.85   39.87  |  50.85   42.79  |  52.88   45.46  | 17.28
  Hebrew      | 38.23 | 31.63   33.19  |  39.67   30.40  |  44.15   32.91  |  44.94   34.28  | 20.48
  Indonesian  | 46.07 | 45.47   46.60  |  58.51   45.96  |  59.17   43.37  |  59.33   44.86  |  5.86
  Malay       | 54.67 | 53.01   50.56  |  68.51   50.74  |  68.99   51.21  |  68.20   52.50  |  7.52

  (%imp: relative improvement of the best subword model over the word-level baseline.)

Still lots of work to do on unsupervised morphology…

SLIDES 50-51

Do character-level models have the predictive utility of models with access to actual morphology?

  no morphology:     (^, r, e, a, d, s, $)
  actual morphology: (read, VB, 3rd, SG, Present)

[Bar chart: perplexity on Czech and Russian for character, char-trigram, and morph. analysis models. The morph. analysis model is lowest on both (Czech 26.4 vs. 27.7 and 28.4; Russian 30.1 vs. 33.6 and 34.3).]

NO

SLIDES 52-53

Can we close that gap by training character-level models on far more data?

[Line chart: perplexity vs. training-data size (1M, 5M, 10M tokens) for word, char-trigram, and char-CNN models, compared against a morph. analysis model trained on ~1M tokens (perplexity 28.8). Even at 10M tokens, none of the character-level models reaches it.]

NO

SLIDE 54

How do we know that it’s the morphological annotations that make the difference?

  • Measure targeted perplexity: perplexity on a specific subset of words in the test data.
  • Analyze perplexities when the inflected words of interest (nouns and verbs) are in the most recent history:

  Green tea or white tea ? The sushi is great , and they have a great selection .
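Targeted perplexity in sketch form (my own minimal formulation of the definition above): exponentiate the average negative log-probability over only the selected positions.

```python
import math

def targeted_perplexity(log_probs, is_target):
    """Perplexity over a chosen subset of test positions only.

    log_probs: per-token natural-log probabilities assigned by the LM;
    is_target: booleans marking the positions of interest, e.g. tokens whose
               recent history contains one of the inflected nouns/verbs.
    """
    selected = [lp for lp, t in zip(log_probs, is_target) if t]
    return math.exp(-sum(selected) / len(selected))
```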

SLIDE 55

Targeted perplexity of Czech nouns is lower when we use morphology

[Bar chart: targeted perplexity of Czech nouns (all / frequent / rare) for word, character, char-trigram, BPE, and morph. analysis models; the morph. analysis model is lowest in every group.]
SLIDE 56

Targeted perplexity of Czech verbs is lower when we use morphology

[Bar chart: targeted perplexity of Czech verbs (all / frequent / rare) for the same five models; again the morph. analysis model is lowest.]

SLIDE 57

Character models are good at reduplication (no oracle, though)

Percentage of full reduplication in the training data:

  Language    type-level (%)  token-level (%)
  Indonesian  1.1             2.6
  Malay       1.3             2.9

[Bar chart: targeted perplexity of reduplicated words (all / frequent / rare) for word, character, and BPE models; the character model is lowest.]

SLIDES 58-61

Different representations make different neighbors

Query words — Frequent: man, including; Rare: unconditional, hydroplane; OOV: uploading, foodism. Three nearest neighbors per query:

  word:                 man → person, anyone, children; including → like, featuring, include; unconditional → nazi, fairly, joints; hydroplane → molybdenum, your, imperial; uploading, foodism → (none: out of vocabulary)
  BPE bi-LSTM:          man → ii, hill, text; including → called, involve, like; unconditional → unintentional, ungenerous, unanimous; hydroplane → emphasize, heartbeat, hybridized; uploading → upbeat, uprising, handling; foodism → vigilantism, pyrethrum, pausanias
  char-trigram bi-LSTM: man → mak, vill, cow; including → include, includes, undermining; unconditional → unconstitutional, constitutional, unimolecular; hydroplane → selenocysteine, guerrillas, scrofula; uploading → drifted, affected, conflicted; foodism → tuaregs, quft, subjectivism
  char bi-LSTM:         man → mayr, many, may; including → inclusion, insularity, include; unconditional → relates, unmyelinated, uncoordinated; hydroplane → hydrolyzed, hydraulics, hysterotomy; uploading → musagte, mutualism, mutualist; foodism → formulas, formally, fecal
  char CNN:             man → mtn, mann, nun; including → include, includes, excluding; unconditional → unconventional, unintentional, unconstitutional; hydroplane → hydroxyproline, hydrate, hydrangea; uploading → unloading, loading, upgrading; foodism → fordham, dadaism, popism

Good at frequent words. Maybe they learn “word classes”?

SLIDE 62

Character NLMs learn word boundaries.

Source: Yova Kementchedjhieva, Morpho-syntactic awareness in a character-level language model. 2017 Informatics M.Sc. thesis

SLIDES 63-64

…and memorize POS tags

Source: Yova Kementchedjhieva, Morpho-syntactic awareness in a character-level language model. 2017 Informatics M.Sc. thesis

SLIDE 65

What do NLMs learn about morphology?

  • Character-level NLMs are great! Across typologies, but especially for agglutinative morphology.
  • However, they do not match the predictive accuracy of models with explicit knowledge of morphology (or POS).
  • Qualitative analyses suggest that they learn orthographic similarity of affixes, and forget the meaning of root morphemes.
  • More generally, they appear to memorize frequent subpatterns.

SLIDE 66

What do we know about what NNs know about language?

  • Still very little.
  • Evidence suggests: nothing surprising. Lots of memorization, local generalization.
  • NNs are great for simplicity of specification and end-to-end learning.
  • But these things are not magic! We still don’t have enough data, and these models could be better if they knew about morphology.
  • But how do we do that?