Probing RNN Encoder-Decoder Generalization of Subregular Functions

Probing RNN Encoder-Decoder Generalization of Subregular Functions Using Reduplication
Max Nelson, Hossep Dolatian, Jonathan Rawski, Brandon Prickett
University of Massachusetts Amherst, Stony Brook University
January 5, 2020

Outline: Introduction · Computational Properties of Reduplication · Methods · Results · Discussion · References · Appendix

Talk in a Nutshell
● Formal languages/automata:
  ▸ Necessary and sufficient conditions on computable functions
  ▸ Provide target function classes for generalization/learning
  ▸ Transparent, analytical guarantees independent of the machine
● Recurrent neural network / finite-state connections
● What is the generalization capacity of RNN encoder-decoders?

Encoder-decoders and subregular reduplication
● Reduplication: variable-length subregular copy functions
● Vanilla encoder-decoders struggle to capture generalizable reduplication; networks with attention reliably succeed
● Attention weights mirror subregular 2-way FST processing, suggesting that the networks approximate 2-way FSTs

RNNs and regular languages
● Language: does string w belong to stringset (language) L?
  ▸ Computed by different classes of grammars (acceptors)
● How expressive are RNNs?
  ▸ Turing complete: infinite precision + time (Siegelmann, 2012)
  ▸ ⊆ counter languages: LSTM/ReLU (Weiss et al., 2018)
  ▸ Regular: SRNN/GRU (Weiss et al., 2018); asymptotic acceptance (Merrill, 2019)
  ▸ Weighted FSA: linear 2nd-order RNN (Rabusseau et al., 2019)
  ▸ Subregular: LSTM problems (Avcu et al., 2017)
[Figure: hierarchy of language classes; picture credit: Casey 1996]

RNN Encoder-Decoder and Transducers
● Function: given string w, generate f(w) = v
  ▸ A function = accepted pairs of input and output strings
  ▸ Computed by different classes of grammars (transducers)
● A recurrent encoder maps a sequence to a vector v ∈ ℝⁿ; a recurrent decoder is a language model conditioned on v (Sutskever et al., 2014)
● How expressive are they?

Brief typology of reduplication
● Reduplication is typologically common (Moravcsik, 1978; Rubino, 2013)
● Basic division: partial vs. total reduplication
(1) Partial reduplication = bounded copy
  a. CV: guyon → gu∼guyon, ‘to jest’ → ‘to jest repeatedly’ (Sundanese)
  b. Foot: (gindal)ba → gindal∼gindalba, ‘lizard sp.’ → ‘lizards’ (Yidin)
  c. Syllable: vam.se → vam∼vamse, ‘hurry’ → ‘hurry (habitual)’ (Yaqui)
(2) Total reduplication = unbounded copy
  a. wanita → wanita∼wanita, ‘woman’ → ‘women’ (Indonesian)
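The bounded/unbounded contrast can be made concrete with a minimal Python sketch. The function names, the five-vowel inventory, and the ASCII `~` separator are illustrative assumptions, not part of the talk; only the CV pattern in (1a) and total copying in (2a) are modeled.

```python
VOWELS = set("aeiou")  # assumption: a toy vowel inventory for illustration


def partial_cv_reduplicate(word: str, sep: str = "~") -> str:
    """Bounded copy: prefix a copy of the initial consonant + vowel."""
    if len(word) < 2 or word[0] in VOWELS or word[1] not in VOWELS:
        raise ValueError("expected a word that starts with consonant + vowel")
    # The copied material is always exactly two segments long.
    return word[:2] + sep + word


def total_reduplicate(word: str, sep: str = "~") -> str:
    """Unbounded copy: the copy grows with the length of the input."""
    return word + sep + word


print(partial_cv_reduplicate("guyon"))  # gu~guyon
print(total_reduplicate("wanita"))      # wanita~wanita
```

The copy in the first function has a fixed size, while the copy in the second is as long as the input, which is the property that separates the two types computationally.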

Subregular computing of reduplication
● Why reduplication (Red)?
  ▸ Inhabits subclasses of regular string-to-string functions
  ▸ Computed by restricted types of finite-state transducers
1. 1-way FST: reads the input once, in one direction
  ∼ computes Rational functions, e.g., Sequential functions like partial Red
2. 2-way FST: reads the input multiple times, moving back and forth
  ∼ computes Regular functions, e.g., Concatenated-Sequential functions like partial & total Red
[Figure: nested function classes: 2-way FST = Regular; 1-way FST = Rational; C-Sequential; Sequential]

Partial reduplication with 1-way FSTs
● Working example: pat → [pa∼pat]
● Input tape: ⋊ p a t ⋉
[Figure: 1-way FST with states q0 to q5 (start state q0) and transitions ⋊:λ, p:p, a:a∼pa, t:t, a:a∼ta, Σ:Σ, ⋉:λ]
● Output after reading each input symbol:
  ▸ ⋊: (empty)
  ▸ p: p
  ▸ a: pa∼pa
  ▸ t: pa∼pat
  ▸ ⋉: pa∼pat
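The walk-through above can be reproduced with a small simulation. This is a sketch under assumptions: the toy alphabet (consonants p and t, vowel a), the state names, and the ASCII `~` separator are chosen for illustration and are not taken from the slides. The key point it shows is that the machine needs one state per consonant so that the copy can be emitted from memory when the vowel is read.

```python
LEFT, RIGHT = "⋊", "⋉"          # word-boundary symbols, as on the slide
CONSONANTS, VOWELS = "pt", "a"  # assumption: toy alphabet from the example


def build_one_way_fst():
    """Transition table: (state, input symbol) -> (next state, output)."""
    delta = {("q0", LEFT): ("q1", "")}
    for c in CONSONANTS:
        # One dedicated state per consonant: this is the "memorizing".
        delta[("q1", c)] = (f"saw_{c}", c)
        for v in VOWELS:
            # On the vowel, finish the reduplicant and restart the base.
            delta[(f"saw_{c}", v)] = ("q_rest", v + "~" + c + v)
    for s in CONSONANTS + VOWELS:
        delta[("q_rest", s)] = ("q_rest", s)  # copy the rest unchanged
    delta[("q_rest", RIGHT)] = ("q_final", "")
    return delta


def run_one_way(delta, word):
    """Read the tape once, left to right, concatenating outputs."""
    state, out = "q0", []
    for sym in LEFT + word + RIGHT:
        state, emitted = delta[(state, sym)]
        out.append(emitted)
    if state != "q_final":
        raise ValueError("input rejected")
    return "".join(out)


fst = build_one_way_fst()
print(run_one_way(fst, "pat"))  # pa~pat
print(run_one_way(fst, "tap"))  # ta~tap
```

Adding a consonant or vowel to the alphabet adds states and transitions to the table, since every possible reduplicant has its own path.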

1-way FST Limitations
● How does a 1-way FST handle reduplication? → It memorizes all possible reduplicants
● Many limitations:
1. State explosion:
  ▸ Scaling problems as the size of the reduplicant and of the alphabet increases
  ▸ Unwieldy machines (Roark and Sproat, 2007:54)
2. Limited expressivity:
  ▸ Can do partial reduplication but not total reduplication
  ▸ Total reduplication puts no bound on how big the copies are
3. Segment alignment:
  ▸ Memorizes, doesn’t ‘copy’
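The scaling problem can be illustrated by counting how many distinct reduplicants a memorizing machine has to hard-code. The helper below is a hypothetical illustration, and the count of reduplicant strings is only a proxy for machine size, not an exact state count; the 20-consonant, 5-vowel inventory is an assumption.

```python
from itertools import product


def memorized_reduplicants(consonants: str, vowels: str, template: str):
    """List every reduplicant matching a C/V template such as 'CV'."""
    slots = [consonants if t == "C" else vowels for t in template]
    return ["".join(segs) for segs in product(*slots)]


consonants = "bcdfghjklmnpqrstvwxz"  # assumption: 20 consonants
vowels = "aeiou"                     # assumption: 5 vowels

print(len(memorized_reduplicants("pt", "a", "CV")))          # 2
print(len(memorized_reduplicants(consonants, vowels, "CV")))   # 100
print(len(memorized_reduplicants(consonants, vowels, "CVC")))  # 2000
```

The count is the product of the inventory sizes over the template, so it grows with both alphabet size and reduplicant length. For total reduplication the template has no fixed length, so no finite list exists.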

Partial reduplication with 2-way FSTs
● Working example: pat → [pa∼pat]
● Input tape: ⋊ p a t ⋉
[Figure: 2-way FST with start state q0; each transition is labeled input:output:direction, where +1 moves the read head right and −1 moves it left. Transitions: ⋊:λ:+1, C:C:+1, V:V:−1, Σ:λ:−1, ⋊:∼:+1, Σ:Σ:+1, ⋉:λ:+1]
● Output built up step by step: (empty), then p, then pa
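A 2-way machine of this kind can be simulated in a few lines. This is a sketch under assumptions: the transition labels are taken from the slide, but the wiring between states q0 to q5, the five-vowel inventory, the ASCII `~` separator, and the step limit are reconstructions for illustration.

```python
LEFT, RIGHT = "⋊", "⋉"
VOWELS = set("aeiou")  # assumption: toy vowel inventory
COPY = object()        # marker meaning "output the symbol just read"

# (state, input label) -> (next state, output, head movement)
DELTA = {
    ("q0", LEFT): ("q1", "", +1),     # ⋊:λ:+1
    ("q1", "C"): ("q2", COPY, +1),    # C:C:+1
    ("q2", "V"): ("q3", COPY, -1),    # V:V:-1, start rewinding
    ("q3", "Σ"): ("q3", "", -1),      # Σ:λ:-1, rewind silently
    ("q3", LEFT): ("q4", "~", +1),    # ⋊:∼:+1, emit the boundary
    ("q4", "Σ"): ("q4", COPY, +1),    # Σ:Σ:+1, output the base
    ("q4", RIGHT): ("q5", "", +1),    # ⋉:λ:+1
}


def labels(sym):
    """Labels a tape symbol can match, most specific first."""
    if sym in (LEFT, RIGHT):
        return [sym]
    return ["V" if sym in VOWELS else "C", "Σ"]


def run_two_way(word, max_steps=10_000):
    tape, state, pos, out = LEFT + word + RIGHT, "q0", 0, []
    for _ in range(max_steps):
        if state == "q5":  # final state reached
            return "".join(out)
        sym = tape[pos]
        for label in labels(sym):
            if (state, label) in DELTA:
                state, emit, move = DELTA[(state, label)]
                out.append(sym if emit is COPY else emit)
                pos += move
                break
        else:
            raise ValueError(f"no transition from {state} on {sym!r}")
    raise RuntimeError("step limit exceeded")


print(run_two_way("pat"))    # pa~pat
print(run_two_way("guyon"))  # gu~guyon
```

The same seven transitions handle any consonant-vowel onset because the machine rereads the input instead of storing it, in contrast to a 1-way machine that memorizes every possible reduplicant.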
