Understanding Machine Learning with Language and Tensors

Understanding Machine Learning with Language and Tensors Jon Rawski - PowerPoint PPT Presentation



  1. Understanding Machine Learning with Language and Tensors
     Jon Rawski, Linguistics Department, Institute for Advanced Computational Science, Stony Brook University

  2. Thinking Like A Linguist
     1. Language, like physics, is not just data you throw at a machine.
     2. Language is a fundamentally computational process, uniquely learned by humans from small, sparse, impoverished data.
     3. We can use core properties of language to understand how other systems generalize, learn, and perform inference.

  3. Gaps Between Wet and Dry Brains
     Data gap
     ◮ Modern ML is training-data hungry, requiring orders of magnitude more training data than biological brains.
     ◮ Biological brains have species-specific, adaptively evolved prior structure, encoded in the species' genome and reflected in mesoscale brain connectivity.
     Energy gap
     ◮ Modern computational infrastructure is energy-hungry, consuming orders of magnitude more power than biological brains. The IT sector is a growing contributor to climate destruction.


  5. The Zipf Problem (Yang 2013)

  6. A Recipe for Machine Learning
     1. Given training data: {x_i, y_i}, i = 1, ..., N
     2. Choose each of these:
        ◮ Decision function: ŷ = f_θ(x_i)
        ◮ Loss function: ℓ(ŷ, y_i) ∈ R
     3. Define goal: θ* = argmin_θ ∑_{i=1}^{N} ℓ(f_θ(x_i), y_i)
     4. Train (take small steps opposite the gradient): θ^(t+1) = θ^(t) − η_t ∇ℓ(f_θ(x_i), y_i)
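  To make the recipe concrete, here is a minimal sketch in Python/NumPy (my own illustration, not from the slides): a linear decision function, a squared-error loss, and plain gradient-descent updates on made-up toy data.

      import numpy as np

      # Toy training data {x_i, y_i}, i = 1..N (made-up numbers for illustration)
      X = np.array([[0.0], [1.0], [2.0], [3.0]])
      y = np.array([1.0, 3.0, 5.0, 7.0])           # roughly y = 2x + 1

      theta = np.zeros(2)                          # parameters [slope, intercept]

      def f_theta(x, theta):
          # Decision function: y_hat = f_theta(x)
          return theta[0] * x[:, 0] + theta[1]

      def loss(y_hat, y):
          # Loss function: squared error, a real number
          return np.mean((y_hat - y) ** 2)

      eta = 0.05                                   # step size eta_t (kept constant here)
      for t in range(500):
          y_hat = f_theta(X, theta)
          # Gradient of the loss with respect to theta, computed by hand
          grad = np.array([
              np.mean(2 * (y_hat - y) * X[:, 0]),  # d loss / d slope
              np.mean(2 * (y_hat - y)),            # d loss / d intercept
          ])
          theta = theta - eta * grad               # step opposite the gradient

      print(theta)                                 # approaches [2.0, 1.0]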

  7. "Neural" Networks & Automatic Differentiation (p.c. Matt Gormley)

  8. Recurrent Neural Networks (RNN)
     Acceptor: read in a sequence, predict from the end state, and backprop the error all the way back. (p.c. Yoav Goldberg)

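  A minimal NumPy sketch of the acceptor idea (my own illustration, not Goldberg's code): an Elman-style RNN reads a sequence of one-hot symbols and predicts accept/reject from its final hidden state. In practice the weights below would be trained by backpropagating the error from that final prediction back through every time step.

      import numpy as np

      rng = np.random.default_rng(0)
      alphabet = {"a": 0, "b": 1}                  # hypothetical two-symbol alphabet
      n_hidden = 4

      # Randomly initialized parameters; training would adjust these by backprop.
      W_xh = rng.normal(size=(n_hidden, len(alphabet)))
      W_hh = rng.normal(size=(n_hidden, n_hidden))
      w_out = rng.normal(size=n_hidden)

      def accept_score(string):
          """Read the string symbol by symbol, then predict from the end state."""
          h = np.zeros(n_hidden)
          for symbol in string:
              x = np.zeros(len(alphabet))
              x[alphabet[symbol]] = 1.0            # one-hot encoding of the symbol
              h = np.tanh(W_xh @ x + W_hh @ h)     # recurrent state update
          # Probability of "accept", read off the final hidden state
          return 1.0 / (1.0 + np.exp(-(w_out @ h)))

      print(accept_score("baba"))                  # untrained, so an arbitrary value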

  10. Expected behavior in machine learning: multiple presentations of the same training samples should yield a better fit. Zebra finches exhibit the opposite behavior: when presented the same song multiple times, imitation accuracy decreases (Tchernichovsky et al., PNAS 1999).

  11. "When we consider it carefully, it is clear that no system — computer program or human — has any basis to reliably classify new examples that go beyond those it has already seen during training, unless that system has some additional prior knowledge or assumptions that go beyond the training examples. In short, there is no free lunch — no way to generalize beyond the specific training examples, unless the learner commits to some additional assumptions." (Tom Mitchell, Machine Learning, 2nd ed.)
      Don't "confuse ignorance of biases with absence of biases" (Rawski and Heinz 2019).

  12. What is a function for language?
      ◮ Alphabet: Σ = {a, b, c, ...}. Examples: letters, DNA, peptides, words, map directions, etc.
      ◮ Σ*: all possible sequences (strings) over the alphabet. Examples: aaaaaaaaa, baba, bcabaca, ...
      ◮ Languages: subsets of Σ* following some pattern. Examples:
        ◮ {ba, baba, bababa, bababababa, ...}: one or more ba
        ◮ {ab, aabb, aaabbb, aaaaaabbbbbb, ...}: a^n b^n
        ◮ {aa, aab, aba, aabbaabbaa, ...}: even number of a's

  13. What is a function for language?
      ◮ Grammar/automaton: a computational device that decides whether a string is in a language (says yes/no).
      ◮ Functional perspective: f : Σ* → {0, 1}
      (p.c. Casey 1996)

  14. Regular Languages & Finite-State Automata
      Regular language: the memory required is finite with respect to the input.
      ◮ (ba)*: {ba, baba, bababa, ...}. Automaton: states q0 (start) and q1; b takes q0 to q1, a takes q1 back to q0.
      ◮ b(a*): {b, ba, baaaaaa, ...}. Automaton: states q0 (start) and q1; b takes q0 to q1, a loops on q1.
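  The (ba)* automaton can be written directly as a transition table. The sketch below (illustrative Python, not part of the slides) decides membership with memory that stays finite no matter how long the input is.

      # States for the (ba)* automaton: q0 (start, accepting), q1 (just read a "b")
      TRANSITIONS = {
          ("q0", "b"): "q1",
          ("q1", "a"): "q0",
      }
      ACCEPTING = {"q0"}

      def in_ba_star(string):
          state = "q0"
          for symbol in string:
              state = TRANSITIONS.get((state, symbol))
              if state is None:                    # no transition defined: reject
                  return False
          return state in ACCEPTING

      print(in_ba_star("bababa"))   # True
      print(in_ba_star("bab"))      # False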

  15. Regular Languages & Finite-State Automata: f : Σ* → R
      (p.c. B. Balle, X. Carreras, A. Quattoni, EMNLP'14 tutorial)

  16. Finite-State Automata & Representation Learning
      ◮ An FSA induces a mapping φ : Σ* → R^n
      ◮ The mapping φ is compositional
      ◮ The output f_A(x) = ⟨φ(x), ω⟩ is linear in φ(x)
      (p.c. Guillaume Rabusseau)
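  Concretely, a weighted FSA scores a string by multiplying one matrix per symbol between an initial and a final weight vector. The NumPy sketch below uses made-up weights purely for illustration; it shows the compositional map φ and the linear read-out ⟨φ(x), ω⟩ from the slide.

      import numpy as np

      # A tiny weighted FSA over {a, b} with 2 states; weights are made up.
      alpha = np.array([1.0, 0.0])                 # initial weights
      omega = np.array([0.0, 1.0])                 # final weights
      A = {
          "a": np.array([[0.5, 0.5],
                         [0.0, 1.0]]),
          "b": np.array([[1.0, 0.0],
                         [0.2, 0.8]]),
      }

      def phi(string):
          """Compositional representation: phi(x) = alpha^T A_{x1} ... A_{xn}."""
          v = alpha
          for symbol in string:
              v = v @ A[symbol]
          return v

      def f_A(string):
          # The output is linear in phi(x): an inner product with omega
          return phi(string) @ omega

      print(f_A("ab"), f_A("ba"))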

  17. Supra-Regularity in Natural Language

  18. The Chomsky Hierarchy (figure: nested classes Finite ⊂ Regular ⊂ Context-Free ⊂ Mildly Context-Sensitive ⊂ Context-Sensitive ⊂ Computably Enumerable, with natural-language phenomena placed in the hierarchy: English consonant clusters (Clements and Keyser 1983), Kwakiutl stress (Bach 1975), Chumash sibilant harmony (Applegate 1972), English nested embedding (Chomsky 1957), Swiss German (Shieber 1985), Yoruba copying (Kobele 2006)). p.c. Rawski & Heinz 2019


  20. RNNs and Regular Languages
      Language: does string w belong to stringset (language) L?
      ◮ Computed by different classes of grammars (acceptors)
      How expressive are RNNs?
      ◮ Turing complete: with infinite precision and time (Siegelmann 2012)
      ◮ ⊆ counter languages: LSTM/ReLU (Weiss et al. 2018)
      ◮ Regular: SRNN/GRU (Weiss et al. 2018); asymptotic acceptance (Merrill 2019)
      ◮ Weighted FSA: linear 2nd-order RNN (Rabusseau et al. 2019)
      ◮ Subregular: LSTM problems (Avcu et al. 2017)
      (pic credit: Casey 1996)

  21. Tensors: Quick and Dirty Overview
      ◮ Order 1 (vector): v ∈ A = ∑_i C^v_i a_i
      ◮ Order 2 (matrix): M ∈ A ⊗ B = ∑_ij C^M_ij a_i ⊗ b_j
      ◮ Order 3 (cuboid): R ∈ A ⊗ B ⊗ C = ∑_ijk C^R_ijk a_i ⊗ b_j ⊗ c_k
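  In NumPy terms (a sketch of the same idea, not from the slides), an order-1 tensor is a 1-D array, an order-2 tensor a 2-D array, and an order-3 tensor a 3-D array; each can be rebuilt as a sum of outer products of basis vectors weighted by its components.

      import numpy as np

      v = np.array([1.0, 2.0, 3.0])                       # order 1: vector, shape (3,)
      M = np.arange(6.0).reshape(2, 3)                    # order 2: matrix, shape (2, 3)
      R = np.arange(24.0).reshape(2, 3, 4)                # order 3: "cuboid", shape (2, 3, 4)

      # A matrix is a sum of outer products of basis vectors weighted by its entries:
      # M = sum_ij M_ij (a_i outer b_j)
      basis_a = np.eye(2)
      basis_b = np.eye(3)
      rebuilt = sum(M[i, j] * np.outer(basis_a[i], basis_b[j])
                    for i in range(2) for j in range(3))
      print(np.allclose(M, rebuilt))                      # True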

  22. Tensor Networks (Penrose Notation)
      (T ×_1 A ×_2 B ×_3 C)_{i1 i2 i3} = ∑_{k1 k2 k3} T_{k1 k2 k3} A_{i1 k1} B_{i2 k2} C_{i3 k3}
      (p.c. Guillaume Rabusseau)
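  The mode products in this contraction can be written as a single einsum call. The sketch below (illustrative NumPy with random tensors; the sizes are arbitrary) implements the formula directly.

      import numpy as np

      rng = np.random.default_rng(0)
      T = rng.normal(size=(2, 3, 4))               # core tensor, indexed by k1, k2, k3
      A = rng.normal(size=(5, 2))                  # A[i1, k1]
      B = rng.normal(size=(6, 3))                  # B[i2, k2]
      C = rng.normal(size=(7, 4))                  # C[i3, k3]

      # (T x_1 A x_2 B x_3 C)[i1,i2,i3] = sum_{k1,k2,k3} T[k1,k2,k3] A[i1,k1] B[i2,k2] C[i3,k3]
      result = np.einsum("abc,ia,jb,kc->ijk", T, A, B, C)
      print(result.shape)                          # (5, 6, 7)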

  23. Second-Order RNN
      The hidden state is computed by h_t = g(W ×_2 x_t ×_3 h_{t−1}).
      The computation of a finite-state machine is very similar, with A ∈ R^{n × |Σ| × n} defined by A_{:,σ,:} = A_σ.
      (p.c. Guillaume Rabusseau)
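  One step of this second-order update is a single tensor contraction. Below is a minimal NumPy sketch (my own illustration; the shapes follow the slide's W ∈ R^{n × |Σ| × n}), including how fixing a one-hot symbol slices out the per-symbol matrix A_σ that a weighted FSA would use.

      import numpy as np

      rng = np.random.default_rng(0)
      n, sigma_size = 3, 2                         # hidden size n, alphabet size |Sigma|
      W = rng.normal(size=(n, sigma_size, n))      # W in R^{n x |Sigma| x n}

      def step(x_t, h_prev, g=np.tanh):
          # h_t = g(W x_2 x_t x_3 h_{t-1}): contract mode 2 with x_t, mode 3 with h_{t-1}
          return g(np.einsum("ijk,j,k->i", W, x_t, h_prev))

      h = np.ones(n)                               # initial hidden state (cf. a WFA's initial weights)
      for symbol_index in [0, 1, 0]:               # e.g. the string "aba"
          x = np.zeros(sigma_size)
          x[symbol_index] = 1.0                    # one-hot input
          h = step(x, h)

      # For a one-hot x_t, contracting W with x_t just selects the slice W[:, sigma, :],
      # which plays the role of the per-symbol transition matrix A_sigma of a weighted FSA.
      A_a = W[:, 0, :]
      print(h.shape, A_a.shape)                    # (3,) (3, 3)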

  24. Theorem (Rabusseau et al. 2019): Weighted FSA are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols.
      Theorem (Merrill 2019): RNNs asymptotically accept exactly the regular languages.
      Theorem (Casey 1996): A finite-dimensional RNN can robustly perform only finite-state computations.

  25. Theorem (Casey 1996): An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.

  26. Analyzing Specific Neuron Dynamics
      ◮ An RNN with only 2 neurons in its hidden state, trained on the "Even-A" language.
      ◮ Input: a stream of strings separated by a $ symbol.
      ◮ Neuron 1: all even a's, and the $ symbol after a rejected string.
      ◮ Neuron 2: all b's following an even number of a's, and the $ after an accepted string.
      (p.c. Oliva & Lago-Fernández 2019)
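  A sketch of how such an analysis is typically set up (my own illustration, not the authors' code): run the RNN over a $-separated stream and record the two-dimensional hidden state after every symbol, so each neuron's activity can be plotted against the automaton's states. The dummy_step function below is only a stand-in for the trained two-neuron network.

      import numpy as np

      def record_hidden_states(rnn_step, stream, n_hidden=2):
          """rnn_step(symbol, h) -> new h; returns one 2-D point per input symbol."""
          h = np.zeros(n_hidden)
          trajectory = []
          for symbol in stream:                    # e.g. "aab$ab$aa$..."
              h = rnn_step(symbol, h)
              trajectory.append(h.copy())          # one (neuron-1, neuron-2) point
          return np.array(trajectory)

      # Placeholder step function standing in for the trained 2-neuron RNN.
      def dummy_step(symbol, h, W={"a": 0.9, "b": -0.4, "$": 0.1}):
          return np.tanh(W[symbol] + 0.5 * h)

      points = record_hidden_states(dummy_step, "aab$ab$")
      print(points.shape)                          # (7, 2)

      # With a trained network, scatter-plotting these points reveals the disjoint
      # regions (one per state of the minimal FSA) discussed on the previous slide.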

  27. RNN Encoder-Decoder and Transducers
      ◮ Function: given a string w, generate f(w) = v, i.e. accepted pairs of input and output strings.
      ◮ Computed by different classes of grammars (transducers).
      ◮ A recurrent encoder maps a sequence to v ∈ R^n; a recurrent decoder is a language model conditioned on v (Sutskever et al. 2014).
      ◮ How expressive are they?
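  A minimal untrained sketch of that setup in NumPy (my own illustration of the encoder-decoder idea; all sizes and weights are made up): a recurrent encoder compresses the input string into a vector v ∈ R^n, and a recurrent decoder generates output symbols as a language model conditioned on v.

      import numpy as np

      rng = np.random.default_rng(0)
      in_vocab, out_vocab, n = 3, 4, 5             # made-up sizes for illustration

      # Encoder and decoder parameters (random here; training would fit them).
      E_x, E_h = rng.normal(size=(n, in_vocab)), rng.normal(size=(n, n))
      D_y, D_h = rng.normal(size=(n, out_vocab)), rng.normal(size=(n, n))
      D_out = rng.normal(size=(out_vocab, n))

      def encode(symbols):
          """Recurrent encoder: map the input sequence to a single vector v in R^n."""
          h = np.zeros(n)
          for s in symbols:
              x = np.eye(in_vocab)[s]
              h = np.tanh(E_x @ x + E_h @ h)
          return h

      def decode(v, max_len=5):
          """Recurrent decoder: a language model over outputs, conditioned on v."""
          h, y_prev, output = v, 0, []             # start from the encoder state
          for _ in range(max_len):
              y_vec = np.eye(out_vocab)[y_prev]
              h = np.tanh(D_y @ y_vec + D_h @ h)
              probs = np.exp(D_out @ h); probs /= probs.sum()
              y_prev = int(np.argmax(probs))       # greedy choice of next symbol
              output.append(y_prev)
          return output

      print(decode(encode([0, 2, 1])))             # arbitrary output: untrained weights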
