Understanding Machine Learning with Language and Tensors
Jon Rawski
Linguistics Department, Institute for Advanced Computational Science
Stony Brook University
Thinking Like A Linguist
1. Language, like physics, is not just data you throw at a machine.
2. Language is a fundamentally computational process, uniquely learned by humans from small data.
3. We can use core properties of language to understand how other systems generalize, learn, and perform inference.
The Zipf Problem (Yang 2013)
A Recipe for Machine Learning
1. Given training data: {(x_i, y_i)}, i = 1, ..., N
2. Choose each of these:
◮ Decision function: ŷ = f_θ(x_i)
◮ Loss function: ℓ(ŷ, y_i) ∈ ℝ
3. Define the goal: θ* = argmin_θ ∑_{i=1}^{N} ℓ(f_θ(x_i), y_i)
4. Train (take small steps opposite the gradient): θ^(t+1) = θ^(t) − η_t ∇ℓ(f_θ(x_i), y_i)
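To make the recipe concrete, here is a minimal sketch in numpy, assuming a one-parameter linear decision function f_θ(x) = θx and squared loss (neither is fixed by the recipe itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data {(x_i, y_i)}: y = 3x plus a little noise
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

theta = 0.0  # parameter of the decision function f_theta(x) = theta * x
eta = 0.05   # step size eta_t (held constant here)
for t in range(500):
    i = rng.integers(len(x))          # pick one training example
    y_hat = theta * x[i]              # decision function
    grad = 2 * (y_hat - y[i]) * x[i]  # gradient of the squared loss w.r.t. theta
    theta -= eta * grad               # small step opposite the gradient

print(theta)  # ends up near 3.0
```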
“Neural” Networks & Automatic Differentiation
p.c. Matt Gormley
Recurrent Neural Networks (RNN)
Acceptor: Read in a sequence. Predict from the end state. Backprop the error all the way back.
p.c. Yoav Goldberg
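A bare-bones acceptor in numpy (untrained random weights and my own sizes; actual training would backpropagate the prediction error through every time step):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"a": 0, "b": 1}
d = 4                                 # hidden state size (arbitrary choice)
E = rng.normal(size=(len(vocab), d))  # symbol embeddings
W = rng.normal(size=(d, d))           # recurrent weights
w_out = rng.normal(size=d)            # prediction weights on the end state

def accept_score(string):
    h = np.zeros(d)                       # initial state
    for ch in string:                     # read in the sequence
        h = np.tanh(E[vocab[ch]] + W @ h)
    return 1 / (1 + np.exp(-w_out @ h))   # predict from the end state

print(accept_score("baba"))
```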
What is a function for language?
Alphabet: Σ = {a,b,c,...}
◮ Examples: letters, DNA nucleotides, words, map directions, etc.
Σ∗: all possible sequences (strings) over the alphabet
◮ Examples: aaaaaaaaa, baba, bcabaca,...
Languages: Subsets of Σ∗ following some pattern
◮ Examples:
◮ {ba, baba, bababa, bababababa, ...}: 1 or more ba
◮ {ab, aabb, aaabbb, aaaaaabbbbbb, ...}: aⁿbⁿ
◮ {aa, aab, aba, aabbaabbaa, ...}: even number of a’s
What is a function for language?
◮ Grammar/Automaton: a computational device that decides whether a string is in a language (says yes/no)
◮ Functional perspective: f : Σ∗ → {0,1}
p.c. Casey 1996
Regular Languages & Finite-State Automata
Regular Language: the memory required is finite w.r.t. the input
◮ (ba)*: {ba, baba, bababa, ...}
◮ b(a*): {b, ba, baaaaaa, ...}
(two-state automaton diagrams for each: q0 start, q1)
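Written as a program, such an automaton is just a two-state loop computing f : Σ∗ → {0,1}; a sketch for (ba)*:

```python
# Two-state DFA for (ba)*; note it also accepts the empty string.
def accepts_ba_star(string):
    state = "q0"                      # q0: expecting b (accepting state)
    for ch in string:
        if state == "q0" and ch == "b":
            state = "q1"              # q1: expecting a
        elif state == "q1" and ch == "a":
            state = "q0"
        else:
            return 0                  # no available transition: reject
    return 1 if state == "q0" else 0

print(accepts_ba_star("baba"))  # 1
print(accepts_ba_star("bab"))   # 0
```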
Regular Languages & Finite-State Automata
f : Σ∗ → ℝ
p.c. B. Balle, X. Carreras, A. Quattoni - EMNLP’14 tutorial
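A weighted automaton computes f : Σ∗ → ℝ with one matrix product per input symbol. A small sketch (this particular automaton, which counts a's, is my own example, not one from the tutorial):

```python
import numpy as np

alpha = np.array([1.0, 0.0])    # initial weight vector
A = {
    "a": np.array([[1.0, 1.0],  # transition matrix for "a"
                   [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0],  # transition matrix for "b"
                   [0.0, 1.0]]),
}
omega = np.array([0.0, 1.0])    # final weight vector

def f(string):
    v = alpha
    for ch in string:
        v = v @ A[ch]           # one matrix product per symbol
    return v @ omega

print(f("abaa"))  # 3.0: the number of a's
```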
Supra-Regularity in Natural Language
Chomsky Hierarchy
Computably Enumerable ⊃ Context-Sensitive ⊃ Mildly Context-Sensitive ⊃ Context-Free ⊃ Regular ⊃ Finite, with attested patterns at each level:
◮ Yoruba copying (Kobele 2006)
◮ Swiss German cross-serial dependencies (Shieber 1985)
◮ English nested embedding (Chomsky 1957)
◮ English consonant clusters (Clements and Keyser 1983)
◮ Kwakiutl stress (Bach 1975)
◮ Chumash sibilant harmony (Applegate 1972)
p.c. Rawski & Heinz 2019
Tensors: Quick and Dirty Overview
◮ Order 1 (vector): v ∈ A, with v = ∑_i C^v_i a_i
◮ Order 2 (matrix): M ∈ A ⊗ B, with M = ∑_ij C^M_ij a_i ⊗ b_j
◮ Order 3 (cuboid): R ∈ A ⊗ B ⊗ C, with R = ∑_ijk C^R_ijk a_i ⊗ b_j ⊗ c_k
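In numpy, the outer product ⊗ builds these objects directly (toy vectors of my own choosing):

```python
import numpy as np

a = np.array([1.0, 0.0])       # vector in A
b = np.array([0.0, 1.0, 0.0])  # vector in B
c = np.array([1.0, 1.0])       # vector in C

v = 2.0 * a                           # order 1: a vector
M = np.outer(a, b)                    # order 2: a ⊗ b, shape (2, 3)
R = np.einsum("i,j,k->ijk", a, b, c)  # order 3: a ⊗ b ⊗ c, shape (2, 3, 2)

print(M.shape, R.shape)  # (2, 3) (2, 3, 2)
```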
Tensor Networks (Penrose Notation?)
(T ×1 A ×2 B ×3 C)_{i1,i2,i3} = ∑_{k1,k2,k3} T_{k1 k2 k3} A_{i1 k1} B_{i2 k2} C_{i3 k3}
p.c. Guillaume Rabusseau
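The same contraction in numpy's einsum, with random tensors just to check shapes (the sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 6))
A = rng.normal(size=(2, 4))  # contracts mode 1 of T
B = rng.normal(size=(3, 5))  # contracts mode 2 of T
C = rng.normal(size=(7, 6))  # contracts mode 3 of T

# (T ×1 A ×2 B ×3 C)_{i1,i2,i3}: sum over k1, k2, k3
out = np.einsum("klm,ik,jl,nm->ijn", T, A, B, C)
print(out.shape)  # (2, 3, 7)
```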
Second-Order RNN
The hidden state is computed by h_t = g(W ×2 x_t ×3 h_{t−1}).
The computation of a finite-state machine is very similar, using a transition tensor A ∈ ℝ^{n×|Σ|×n} defined by A_{:,σ,:} = A^σ.
p.c. Guillaume Rabusseau
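One step of this recurrence in numpy (random weights, one-hot inputs, sizes of my own choosing). With g linear, each step reduces to a matrix product selected by the current symbol, which is exactly the weighted-automaton computation above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sym = 3, 2                     # hidden size, alphabet size
W = rng.normal(size=(n, n_sym, n))  # order-3 weight tensor

def step(x_t, h_prev, g=np.tanh):
    # h_t = g(W ×2 x_t ×3 h_{t-1}): contract the symbol mode with x_t
    # and the previous-state mode with h_{t-1}
    return g(np.einsum("isk,s,k->i", W, x_t, h_prev))

h = np.ones(n) / n                  # initial hidden state
for symbol in [0, 1, 1]:            # e.g. the string "abb"
    x = np.eye(n_sym)[symbol]       # one-hot encoding of the symbol
    h = step(x, h)
print(h)
```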
Theorem (Rabusseau et al. 2019). Weighted FSAs are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols.

Theorem (Merrill 2019). RNNs asymptotically accept exactly the regular languages.

Theorem (Casey 1996). A finite-dimensional RNN can robustly perform only finite-state computations.
Theorem (Casey 1996). An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.
Analyzing Specific Neuron Dynamics
◮ An RNN with only 2 neurons in its hidden state, trained on the “Even-A” language
◮ Input: a stream of strings separated by a $ symbol
◮ Neuron 1: fires on all even a’s, and on the $ symbol after a rejected string
◮ Neuron 2: fires on all b’s following an even number of a’s, and on the $ after an accepted string
p.c. Oliva & Lago-Fernández 2019
But...Translation Needs an Output!
f : Σ∗ → ∆∗
p.c. Bahdanau et al. 2014
RNN Encoder-Decoder
p.c. Chris Dyer
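A minimal untrained sketch of the architecture (all weights random; the sizes, names, and greedy stopping rule are my own choices): the encoder folds the whole input string into one vector, and the decoder emits output symbols from that vector until it predicts an end symbol:

```python
import numpy as np

rng = np.random.default_rng(0)
d, v_in, v_out = 8, 4, 5             # hidden size, input/output vocab sizes
E = rng.normal(size=(v_in, d))       # input symbol embeddings
W_enc = 0.1 * rng.normal(size=(d, d))
W_dec = 0.1 * rng.normal(size=(d, d))
W_out = rng.normal(size=(v_out, d))  # output projection

def encode(tokens):
    h = np.zeros(d)
    for t in tokens:                 # read the whole input...
        h = np.tanh(E[t] + W_enc @ h)
    return h                         # ...into a single state vector

def decode(h, max_len=10, eos=0):
    out = []
    for _ in range(max_len):
        h = np.tanh(W_dec @ h)
        y = int(np.argmax(W_out @ h))  # greedy choice of the next symbol
        if y == eos:                   # stop at the end-of-sequence symbol
            break
        out.append(y)
    return out

print(decode(encode([1, 2, 3, 1])))
```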
Our idea: Use functions that copy!
(1) Total reduplication = unbounded copy (∼83%)
a. wanita → wanita∼wanita, ‘woman’ → ‘women’ (Indonesian)
(2) Partial reduplication = bounded copy (∼75%)
a. C: gen → g∼gen, ‘to sleep’ → ‘to be sleeping’ (Shilh)
b. CV: guyon → gu∼guyon, ‘to jest’ → ‘to jest repeatedly’ (Sundanese)
c. CVC: takki → tak∼takki, ‘leg’ → ‘legs’ (Agta)
d. CVCV: banagañu → bana∼banagañu, ‘return’ (Dyirbal)
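Both patterns are easy to state as string functions; a sketch (the ∼ separator follows the slide's notation, and treating the CV template as a plain prefix length is a simplification):

```python
def total_reduplication(word):
    # unbounded copy: repeat the whole word, however long it is
    return f"{word}~{word}"

def partial_reduplication(word, shape="CV"):
    # bounded copy: copy a fixed-size prefix (C, CV, CVC, CVCV, ...)
    return f"{word[:len(shape)]}~{word}"

print(total_reduplication("wanita"))          # wanita~wanita
print(partial_reduplication("gen", "C"))      # g~gen
print(partial_reduplication("guyon", "CV"))   # gu~guyon
print(partial_reduplication("takki", "CVC"))  # tak~takki
```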
1-way and 2-way Finite-State Transducers
Finite-state transducers with origin information, mapping e.g. pat → pa∼pat:
◮ 1-way FST (a.i, a.ii): states q0 (start), q1, q2, q3, q4, qf; transitions (⋊:⋊), (t:t), (p:p), (a:a∼ta), (a:a∼pa), (Σ:Σ), (⋉:⋉)
◮ 2-way FST (b.i, b.ii): states q0 (start), q1, q2, q3, q4, qf; transitions (⋊:λ:+1), (C:C:+1), (V:V:-1), (Σ:Σ:-1), (⋊:∼:+1), (Σ:Σ:+1), (⋉:λ:+1)
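The difference is the read head: a 1-way machine scans the input once, left to right, while a 2-way machine may move back (-1) and forth (+1). A sketch of the 2-way idea, hard-coded for CV∼ copying (a simplification of the slide's transducer):

```python
# Simulate a 2-way head over the input tape: emit the first CV, walk
# back to the left boundary, emit "~", then copy the entire word.
def cv_copy_two_way(word):
    tape = ["<"] + list(word) + [">"]   # "<" and ">" stand in for the boundary markers
    out, pos, second_pass = [], 1, False
    while True:
        ch = tape[pos]
        if not second_pass:
            if len(out) < 2 and ch not in "<>":
                out.append(ch)          # emit the first C and V (+1 moves)
                pos += 1
            elif ch == "<":
                out.append("~")         # back at the left edge: emit the separator
                second_pass = True
                pos += 1
            else:
                pos -= 1                # walk the head leftward (-1 moves)
        else:
            if ch == ">":
                return "".join(out)     # right edge reached: done
            out.append(ch)              # second pass copies everything
            pos += 1

print(cv_copy_two_way("guyon"))  # gu~guyon
```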
Encoder-Decoder = 1-way or 2-way FST?
Main Points
1. Language is not just data you throw at a machine.
2. Language is a fundamentally computational process, uniquely learned by humans.
3. We can use core properties of language to understand how other systems learn.
Want More?
◮ Mathematical Linguistics Reading Group
◮ Fridays, 12pm-1pm, SBS N250
◮ Website: complab-stonybrook.github.io/mlrg/
◮ IACS Machine Learning and Statistical Inference Working Group
◮ Every other week; contact me for details