Neural Encoding with Structured Decoding


  1. Neural Encoding with Structured Decoding. Pushpendre Rastogi, 3rd-year CS PhD student, pushpendre@jhu.edu, Johns Hopkins University. CLSP Student Seminar, Spring 2016.

  2. Outline: (1) Introduction; (2) Best of Both Worlds: Neural Encoding with Structured Decoding; (3) Acknowledgements and References.

  3. Introduction: Two Themes. Theme 1: Improving Neural Network Architectures.

  4. Background: What is the task? String transduction: convert an input string to an output string. Examples:
     • Morphological transduction: convert an imperative German word to its past-participle form, e.g. abreibt → abgerieben.
     • Lemmatization: lemmatize a Tagalog word, e.g. binawalan → bawal.
     • Annotate a string: Bob is a builder → Noun Verb Det Noun.
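Concretely, the task is to learn a string-to-string function from paired examples. A minimal sketch of this setup in Python, assuming exact-match accuracy as the metric and a hypothetical `model` callable (neither interface is specified in the slide):

```python
# Illustrative only: string transduction as supervised learning over
# (input, output) string pairs, scored by exact-match accuracy.
train_pairs = [
    ("abreibt", "abgerieben"),   # German morphological transduction
    ("binawalan", "bawal"),      # Tagalog lemmatization
]

def exact_match_accuracy(model, pairs):
    """`model` is any callable str -> str (hypothetical interface)."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)

print(exact_match_accuracy(lambda x: x, train_pairs))  # copy baseline: 0.0
```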

  5. What do we offer? Figure: accuracy (75 to 100%) of four methods (WFST, BiLSTM, Seq2Seq, Attention) on four transduction tasks, one panel per task: 13SIA, 2PIE, 2PKE, rP.

  6. The Idea. Use a Neural Sequence Encoder to weight the arcs of a Weighted FST.
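A minimal sketch of this idea (the parameterization below is an assumption, not the paper's exact model): score each edit arc taken at input position t from the encoder's representation of the whole input at t, instead of from hand-engineered local features.

```python
import random

random.seed(0)
DIM = 4
# Stand-in encoder states for the 3 characters of "say"; a real model
# would produce these with a bidirectional LSTM.
enc = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
# One parameter vector per edit-arc label, e.g. the copy arc s:s
# (hypothetical parameter layout).
W = {("s", "s"): [random.gauss(0, 1) for _ in range(DIM)]}

def arc_weight(arc_label, t):
    """Log-weight of edit arc `arc_label` taken at input position t."""
    return sum(w * h for w, h in zip(W[arc_label], enc[t]))

print(arc_weight(("s", "s"), 0))
```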

  7. Background. Weighted Finite State Transducers: Deterministic. Figure: a deterministic transducer with states 0, 1, 2, 3 and arcs s:s, a:a, y:y. What is a State? The states of an FST/WFST are its memory. Previous work weights this transducer.
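A minimal sketch of the figure's machine as code: determinism means at most one arc per (state, input) pair, so a dict transition table suffices, and the integer state is the machine's entire memory.

```python
# Deterministic transducer from the figure: states 0..3, arcs s:s, a:a, y:y.
DELTA = {  # (state, input symbol) -> (output symbol, next state)
    (0, "s"): ("s", 1),
    (1, "a"): ("a", 2),
    (2, "y"): ("y", 3),
}
FINAL = {3}

def transduce(s, state=0):
    out = []
    for ch in s:
        if (state, ch) not in DELTA:
            return None                 # no arc: the input is rejected
        o, state = DELTA[(state, ch)]
        out.append(o)
    return "".join(out) if state in FINAL else None

assert transduce("say") == "say"
assert transduce("sad") is None
```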

  8. Background. Weighted Finite State Transducers: Non-Deterministic. Figure: a non-deterministic edit transducer with states s, a, y, $ and edit arcs such as s:s, y:y, s:a, a:s, y:s, ε:s, ε:a, d:y, i:y, d:ε; only a few of the possible states and edit arcs are shown. What's in a Path? A path is an alignment:
     • (ε:s s:a a:s y:s) → say:sass
     • (ε:s s:a a:ε y:y) → say:say
     • (ε:ε s:s a:a y:y) → say:say
     • (ε:s s:a a:s y:y) → say:sasy
     Previous work weights this transducer.
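Because a path is an alignment, the paths that map one string to another can be enumerated recursively over the three edit operations. A minimal, exhaustive sketch (exponential, for illustration only) that reproduces alignments like the ones above:

```python
# Enumerate edit alignments: substitution/copy x:y, insertion ε:y,
# deletion x:ε, mirroring the slide's say -> sass paths.
EPS = "ε"

def alignments(x, y):
    if not x and not y:
        yield ()
        return
    if x and y:                                  # substitution or copy
        for rest in alignments(x[1:], y[1:]):
            yield ((x[0], y[0]),) + rest
    if y:                                        # insertion ε:y[0]
        for rest in alignments(x, y[1:]):
            yield ((EPS, y[0]),) + rest
    if x:                                        # deletion x[0]:ε
        for rest in alignments(x[1:], y):
            yield ((x[0], EPS),) + rest

for path in alignments("say", "sass"):
    print(" ".join(f"{a}:{b}" for a, b in path))
```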

  9. Background. Neural Bi-Directional Sequence Encoder. Figure: a bidirectional encoder over the character embeddings e_s, e_a, e_y of the input; forward states α0, α1, α2 are computed left to right, e.g. α1 = f(α0, e_s), and backward states β1, β2, β3 right to left.
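A minimal sketch of the recurrences in the figure, with a toy transition function standing in for a real LSTM/GRU cell (the embeddings and f below are illustrative): forward states α are built left to right, backward states β right to left, so the pair (α_t, β_t) summarizes the whole string around position t.

```python
DIM = 3
EMB = {"s": [1.0, 0.0, 0.0], "a": [0.0, 1.0, 0.0], "y": [0.0, 0.0, 1.0]}

def f(state, emb):                 # toy stand-in for an LSTM cell
    return [0.5 * s + e for s, e in zip(state, emb)]

def encode(word):
    alpha = [[0.0] * DIM]          # alpha[t+1] = f(alpha[t], emb of char t)
    for ch in word:
        alpha.append(f(alpha[-1], EMB[ch]))
    beta = [[0.0] * DIM]           # same recurrence, right to left
    for ch in reversed(word):
        beta.append(f(beta[-1], EMB[ch]))
    return alpha, list(reversed(beta))

alpha, beta = encode("say")
print(len(alpha), len(beta))       # 4 4: one state per string position
```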

  10. Background: Existing models.
     Weighted Finite State Transducers [Moh97, Eis02]
     Pros: The states in an FST can be tailored to the task. Can compute the probability of a string.
     Cons: Traditionally, arc weights are linear functionals of arc features, so: the ROI on feature engineering may be low; the model may become slow if there are too many features; the local features may not be expressive enough.
     Neural Encoders and Decoders [SVL14]
     Pros: Produce reasonable results with zero feature engineering.
     Cons: Require a lot of training data for good performance. Cannot return the probability of a string.
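To ground the WFST "pro" above: when the arc weights form a probability model, the probability of a string is a sum over all of its accepting paths, computable in one dynamic-programming sweep (the forward algorithm). A minimal sketch over a toy weighted acceptor; the states and weights are made up.

```python
from collections import defaultdict

ARCS = {  # (state, symbol) -> list of (next state, arc probability)
    (0, "s"): [(1, 1.0)],
    (1, "a"): [(2, 0.7), (1, 0.3)],   # non-determinism: two 'a' arcs
    (2, "y"): [(3, 1.0)],
}
FINAL = {3: 1.0}                       # final-state weight

def string_prob(s):
    fwd = defaultdict(float)           # forward weight of each state
    fwd[0] = 1.0
    for ch in s:
        nxt = defaultdict(float)
        for q, p in fwd.items():
            for q2, w in ARCS.get((q, ch), []):
                nxt[q2] += p * w       # sum, not max: all paths count
        fwd = nxt
    return sum(p * FINAL.get(q, 0.0) for q, p in fwd.items())

print(string_prob("say"))    # 0.7
print(string_prob("saay"))   # 0.3 * 0.7 = 0.21
```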

  11. Neural Encoding with Structured Decoding.
     Figure: the automaton I encoding say (states 0 to 3, arcs s:s, a:a, y:y).
     Figure: the edit transducer F; only a few of the possible states and edit arcs are shown.
     Previous work weights these transducers.
     Figure: the composed transducer G = I ∘ F, whose states are pairs such as (0, s), (1, a), (3, s); only a few states, but all arcs between them, are shown.
     Our work weights this transducer, with arc weights computed from the bidirectional encoder states α_t, β_t.
     Why do we do this?
     • Weighting F ≡ weighting edits per type; weighting G ≡ weighting edits per token.
     • Neural features encode the entire sentence.
     • We get a context-dependent output-side language model.
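A minimal sketch of why composing I with F buys per-token weights (a naive composition with illustrative state naming, not the paper's implementation): each state of G = I ∘ F is a pair (t, q) of input position and F-state, so every arc of G knows which input token it consumes, and its weight can condition on the encoder states (α_t, β_t) at that position.

```python
def compose(word, edit_arcs):
    """States of G are pairs (t, q): input position t, F-state q.
    edit_arcs: dict q -> list of (in symbol, out symbol, next q),
    where in symbol 'ε' consumes no input (an insertion)."""
    arcs = []
    for t in range(len(word) + 1):
        for q, outs in edit_arcs.items():
            for a, b, q2 in outs:
                if a == "ε":                        # insertion: t stays put
                    arcs.append(((t, q), b, (t, q2)))
                elif t < len(word) and a == word[t]:
                    arcs.append(((t, q), b, (t + 1, q2)))
    return arcs

# A few edit arcs of F, all on one F-state, as in the figure's sketch.
F = {"q": [("s", "s", "q"), ("a", "a", "q"), ("y", "y", "q"),
           ("ε", "s", "q"), ("a", "ε", "q")]}
for arc in compose("say", F):
    print(arc)   # e.g. ((0, 'q'), 's', (1, 'q')): the s:s arc at token 0
```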
