Character-level Language Models With Word-level Learning
Arvid Frydenlund
March 16, 2018
Character-level Language models
◮ Want language models with an open vocabulary
◮ Character-level models give this for free
◮ Treat the probability of a word as the product of character probabilities (sketched in code at the end of this slide):

$$P_w(w = c_1, \ldots, c_m \mid h_i) = \prod_{j=0}^{m} \frac{e^{s_c(c_{j+1},\, j)}}{\sum_{c' \in V_c} e^{s_c(c',\, j)}} \tag{1}$$
◮ Where $V_c$ is the character ‘vocabulary’
◮ Models are trained to minimize per-character cross entropy
◮ Issue: Training focuses on how words look and not what they mean
◮ Solution: Do not define the probability of a word as the product of character probabilities
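For concreteness, here is a minimal sketch of the locally normalized formulation in Eq. (1). `char_scores` (the per-step projection over $V_c$) and `char_vocab` (a character-to-index map) are hypothetical stand-ins, not names from the talk:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a vector of character scores."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def word_log_prob(chars, char_scores, char_vocab):
    """log P_w(w = c_1..c_m | h_i) as the sum of per-step log-softmax terms (Eq. 1)."""
    log_p = 0.0
    for j, c in enumerate(chars):
        probs = softmax(char_scores(j))        # distribution over V_c at step j
        log_p += np.log(probs[char_vocab[c]])  # log-probability of the observed character
    return log_p
```

Minimizing `-word_log_prob(...)` per character is exactly the per-character cross-entropy objective mentioned above.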
Globally normalized word probabilities
◮ Conditional Random Field objective:

$$P_w(w = c_1, \ldots, c_m \mid h_i) = \frac{e^{s_w(w = c_1, \ldots, c_m,\, h_i)}}{\sum_{w' \in V} e^{s_w(w',\, h_i)}} \tag{2}$$
◮ Normalizing partition function over all words in the (open) vocabulary
◮ Issue: Partition function is intractable
◮ Solution: Use beam search to limit the scope of the elements comprising the partition function
◮ This can be seen as approximating P(w) by normalizing over the top most probable candidate words (sketched below)
◮ Issue: Elements of the partition are words of different lengths
◮ The score function and beam search need to be length-agnostic
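A sketch of the beam-approximated normalization in Eq. (2), assuming a word scorer `s_w(w, h)` and a candidate list `beam_words` produced by the beam search (both hypothetical names):

```python
import numpy as np

def approx_word_log_prob(target, beam_words, s_w, h):
    """log P(w | h) ~ s_w(w, h) - log sum over the beam of exp(s_w(w', h))."""
    candidates = list(beam_words)
    if target not in candidates:   # the gold word must be part of the partition
        candidates.append(target)
    scores = np.array([s_w(w, h) for w in candidates])
    # log-sum-exp over the beam stands in for the intractable sum over V
    m = scores.max()
    log_Z = m + np.log(np.exp(scores - m).sum())
    return s_w(target, h) - log_Z
```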
Figure: Predicting the next word in the sequence ‘the cat’. The beam search uses two beams over three steps and produces the words ‘sat’ and ‘sot’ in the top beams, scored as $s_w(w = \text{‘sat’}, h_{i=2})$ and $s_w(w = \text{‘sot’}, h_{i=2})$.
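The search in the figure, roughly: expand each beam prefix by every character in $V_c$, keep the top-scoring prefixes, and emit a completed word when the end-of-word symbol is selected. A hedged sketch, where `char_scores(prefix)` (one score per character in `char_vocab`) and the `</w>` symbol are assumptions rather than details from the talk:

```python
import heapq

END = "</w>"  # assumed end-of-word symbol

def beam_search_words(char_scores, char_vocab, beam_size=2, max_len=10):
    """Top-scoring candidate words found by character-level beam search."""
    beams = [(0.0, "")]            # (cumulative score, prefix)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            for c, s in zip(char_vocab, char_scores(prefix)):
                if c == END:
                    finished.append((score + s, prefix))   # word is complete
                else:
                    candidates.append((score + s, prefix + c))
        beams = heapq.nlargest(beam_size, candidates)      # prune to the top beams
    return heapq.nlargest(beam_size, finished + beams)
```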
◮ Beam search is used in the backward pass as well, giving the training objective:
$$J = \sum_{i=1}^{n} \Big[ -s_w(w_i, h_i) + \sum_{w' \in B_{\text{top}}(i)} s_w(w', h_i) \Big] \tag{3}$$
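A direct transcription of Eq. (3): raise the score of the gold word $w_i$ and lower the scores of the words surviving in the top beams. `s_w` and the per-position inputs are illustrative stand-ins:

```python
def sequence_loss(gold_words, hidden_states, top_beams, s_w):
    """J = sum_i [ -s_w(w_i, h_i) + sum over the top beams of s_w(w', h_i) ]."""
    J = 0.0
    for w_i, h_i, beam in zip(gold_words, hidden_states, top_beams):
        J += -s_w(w_i, h_i) + sum(s_w(w, h_i) for w in beam)
    return J
```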
Experiments
◮ Toy problem of generating word-forms given word embeddings
◮ Compare to an LSTM baseline
◮ Test accuracy across different score functions: average character score, average character probability, hidden-state score (sketched at the end of this list)
◮ Test accuracy across different beam-sizes
◮ Eventually a full language model
◮ This model has a dynamic vocabulary at every step
◮ New evaluation metric for open-vocabulary language models
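One plausible reading of the three length-agnostic score functions compared above, assuming per-character scores, per-character probabilities, and the decoder's final hidden state are available (all names here are hypothetical):

```python
import numpy as np

def avg_char_score(char_scores):
    """Length-agnostic: mean of the raw character scores along the word."""
    return float(np.mean(char_scores))

def avg_char_prob(char_probs):
    """Length-agnostic: mean per-character probability."""
    return float(np.mean(char_probs))

def hidden_state_score(final_hidden, v):
    """Score the whole word from the final decoder hidden state, e.g. v . h."""
    return float(np.dot(v, final_hidden))
```

Averaging makes the first two comparable across words of different lengths, which is the length-agnosticism the beam-based partition requires.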