Understanding Machine Learning with Language and Tensors

SLIDE 1

Understanding Machine Learning with Language and Tensors

Jon Rawski Linguistics Department Institute for Advanced Computational Science Stony Brook University

SLIDE 2

Thinking Like A Linguist

1 Language, like physics, is not just data you throw at a machine.

2 Language is a fundamentally computational process, uniquely learned by humans from small, sparse, impoverished data.

3 We can use core properties of language to understand how other systems generalize, learn, and perform inference.

SLIDE 3

Gaps Between Wet and Dry Brains

Data Gap

◮ Modern ML is training-data hungry, requiring orders of magnitude more training data than biological brains
◮ Biological brains have species-specific, adaptively-evolved prior structure, encoded in the species genome and reflected in mesoscale brain connectivity

Energy Gap

◮ Modern computational infrastructure is energy-hungry, consuming orders of magnitude more power than biological brains. The IT sector is a growing contributor to climate destruction

SLIDE 4

SLIDE 5

The Zipf Problem (Yang 2013)

SLIDE 6

A Recipe for Machine Learning

1 Given training data: {(x_i, y_i)}_{i=1}^N

2 Choose each of these:

◮ Decision function: ŷ = f_θ(x_i)
◮ Loss function: ℓ(ŷ, y_i) ∈ ℝ

3 Define goal:

θ* = argmin_θ ∑_{i=1}^N ℓ(f_θ(x_i), y_i)

4 Train (take small steps opposite the gradient):

θ^(t+1) = θ^(t) − η_t ∇ℓ(f_θ(x_i), y_i)
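As a rough illustration of this recipe (not code from the talk), a NumPy sketch that assumes a linear decision function and a squared-error loss, both illustrative choices:

```python
import numpy as np

# Toy training data {(x_i, y_i)}: 2-dim inputs, scalar targets
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

def f(theta, x):
    # Decision function: a linear model with a bias term (illustrative choice)
    return x @ theta[:-1] + theta[-1]

def loss(y_hat, y_true):
    # Loss function l(y_hat, y), here squared error
    return 0.5 * (y_hat - y_true) ** 2

def grad(theta, x, y_true):
    # Gradient of the loss w.r.t. theta for a single example
    err = f(theta, x) - y_true
    return np.append(err * x, err)

theta, eta = np.zeros(3), 0.1
for t in range(2000):
    i = rng.integers(len(X))                        # pick one training example
    theta = theta - eta * grad(theta, X[i], y[i])   # step opposite the gradient

print(theta)                                        # approaches [2.0, -1.0, 0.5]
print(np.mean([loss(f(theta, x_i), y_i) for x_i, y_i in zip(X, y)]))
```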

SLIDE 7

“Neural” Networks & Automatic Differentiation

p.c. Matt Gormley

SLIDE 8

Recurrent Neural Networks (RNN)

Acceptor: Read in a sequence. Predict from the end state. Backprop the error all the way back.

p.c. Yoav Goldberg
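A minimal sketch of such an acceptor in NumPy (the weights, sizes, and sigmoid readout are illustrative assumptions; training by backpropagation is omitted):

```python
import numpy as np

def rnn_acceptor(seq, E, W_x, W_h, w_out, b_out):
    """Acceptor: read the whole sequence, then predict accept/reject
    from the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for sym in seq:
        # Elman-style update from the current input embedding and previous state
        h = np.tanh(W_x @ E[sym] + W_h @ h)
    logit = w_out @ h + b_out
    return 1.0 / (1.0 + np.exp(-logit))   # probability of acceptance

# Toy setup: alphabet {a, b}, 2-dim embeddings, 4 hidden units
rng = np.random.default_rng(1)
E = {"a": rng.normal(size=2), "b": rng.normal(size=2)}
W_x, W_h = rng.normal(size=(4, 2)), rng.normal(size=(4, 4))
w_out, b_out = rng.normal(size=4), 0.0

print(rnn_acceptor("baba", E, W_x, W_h, w_out, b_out))
```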

SLIDE 10

Expected behavior in machine learning: multiple presentations should yield a better fit to training samples.

Zebra finches exhibit the opposite behavior: when presented the same song multiple times, imitation accuracy decreases (Tchernichovsky et al., PNAS 1999).

SLIDE 11

When we consider it carefully, it is clear that no system — computer program or human — has any basis to reliably classify new examples that go beyond those it has already seen during training, unless that system has some additional prior knowledge or assumptions that go beyond the training examples. In short, there is no free lunch — no way to generalize beyond the specific training examples, unless the learner commits to some additional assumptions. (Tom Mitchell, Machine Learning, 2nd ed.)

Don’t “confuse ignorance of biases with absence of biases” (Rawski and Heinz 2019)

SLIDE 12

What is a function for language?

Alphabet: Σ = {a, b, c, ...}

◮ Examples: letters, DNA peptides, words, map directions, etc.

Σ∗: all possible sequences (strings) using the alphabet

◮ Examples: aaaaaaaaa, baba, bcabaca, ...

Languages: subsets of Σ∗ following some pattern

◮ Examples:
  ◮ {ba, baba, bababa, bababababa, ...}: 1 or more ba
  ◮ {ab, aabb, aaabbb, aaaaaabbbbbb, ...}: a^n b^n
  ◮ {aa, aab, aba, aabbaabbaa, ...}: even # of a’s
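Two of these languages as membership tests, in an illustrative Python sketch:

```python
import re

def in_ba_plus(s: str) -> bool:
    # {ba, baba, bababa, ...}: one or more copies of "ba"
    return re.fullmatch(r"(ba)+", s) is not None

def in_anbn(s: str) -> bool:
    # {ab, aabb, aaabbb, ...}: n a's followed by n b's, n >= 1
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

print(in_ba_plus("bababa"), in_anbn("aaabbb"), in_anbn("aabbb"))  # True True False
```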

SLIDE 13

What is a function for language?

◮ Grammar/Automaton: a computational device that decides whether a string is in a language (says yes/no)
◮ Functional perspective: f : Σ∗ → {0, 1}

p.c. Casey 1996

SLIDE 14

Regular Languages & Finite-State Automata

Regular language: the memory required is finite w.r.t. the input

◮ (ba)*: {ba, baba, bababa, ...}
◮ b(a*): {b, ba, baaaaaa, ...}

[Figure: two-state automata (states q0, q1) accepting (ba)* and b(a*)]
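A sketch of the (ba)* automaton as a transition table (Python assumed; state names follow the q0/q1 labels in the figure):

```python
def accepts_ba_star(s: str) -> bool:
    """DFA for (ba)*: q0 --b--> q1, q1 --a--> q0; only q0 accepts."""
    delta = {("q0", "b"): "q1", ("q1", "a"): "q0"}
    state = "q0"
    for ch in s:
        if (state, ch) not in delta:
            return False              # no transition defined: reject
        state = delta[(state, ch)]
    return state == "q0"              # finite memory: just the current state

print(accepts_ba_star("baba"), accepts_ba_star("bab"))  # True False
```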

SLIDE 15

Regular Languages & Finite-State Automata

f : Σ∗ → R

p.c. B. Balle, X. Carreras, A. Quattoni - EMNLP’14 tutorial

SLIDE 16

Finite-State Automata & Representation Learning

◮ An FSA induces a mapping φ : Σ∗ → ℝ^n
◮ The mapping φ is compositional
◮ The output f_A(x) = ⟨φ(x), ω⟩ is linear in φ(x)

p.c. Guillaume Rabusseau
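A sketch of this linear view for a small weighted FSA (Python/NumPy assumed; the weights are made up): φ(x) multiplies the initial weight vector through one transition matrix per symbol, and the output is an inner product with the final-weight vector ω.

```python
import numpy as np

# A 2-state weighted FSA over {a, b}: one transition matrix per symbol
alpha = np.array([1.0, 0.0])                       # initial weights
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),      # transition weights for 'a'
     "b": np.array([[1.0, 0.0], [0.3, 0.7]])}      # transition weights for 'b'
omega = np.array([0.0, 1.0])                       # final weights

def phi(x):
    # Compositional mapping: phi(x) = alpha^T A_{x_1} ... A_{x_k}
    v = alpha
    for sym in x:
        v = v @ A[sym]
    return v

def f_A(x):
    # Output is linear in phi(x): f_A(x) = <phi(x), omega>
    return phi(x) @ omega

print(f_A("ab"), f_A("ba"))
```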

SLIDE 17

Supra-Regularity in Natural Language

SLIDE 18

Chomsky Hierarchy

Computably Enumerable ⊋ Context-Sensitive ⊋ Mildly Context-Sensitive ⊋ Context-Free ⊋ Regular ⊋ Finite

Example phenomena placed in the hierarchy:

◮ Yoruba copying (Kobele 2006)
◮ Swiss German (Shieber 1985)
◮ English nested embedding (Chomsky 1957)
◮ English consonant clusters (Clements and Keyser 1983)
◮ Kwakiutl stress (Bach 1975)
◮ Chumash sibilant harmony (Applegate 1972)

p.c. Rawski & Heinz 2019

SLIDE 20

RNN and regular languages

Language: does string w belong to stringset (language) L?

◮ Computed by different classes of grammars (acceptors)

How expressive are RNNs?

◮ Turing complete: infinite precision + time (Siegelmann 2012)
◮ ⊆ Counter languages: LSTM/ReLU (Weiss et al. 2018)
◮ Regular: sRNN/GRU (Weiss et al. 2018); asymptotic acceptance (Merrill 2019)
◮ Weighted FSA: linear 2nd-order RNN (Rabusseau et al. 2019)
◮ Subregular: LSTM problems (Avcu et al. 2017)

pic credit: Casey 1996

SLIDE 21

Tensors: Quick and Dirty Overview

◮ Order 1 — vector: v ∈ A = ∑_i C^v_i a_i

◮ Order 2 — matrix: M ∈ A ⊗ B = ∑_{ij} C^M_{ij} a_i ⊗ b_j

◮ Order 3 — cuboid: R ∈ A ⊗ B ⊗ C = ∑_{ijk} C^R_{ijk} a_i ⊗ b_j ⊗ c_k
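The same three orders in a NumPy sketch, assuming standard basis vectors for the a_i, b_j, c_k (an illustrative choice):

```python
import numpy as np

# Basis vectors: standard bases of R^2, R^3, R^2 (illustrative choice)
a, b, c = np.eye(2), np.eye(3), np.eye(2)

# Order 1: vector as a weighted sum of basis vectors
C_v = np.array([1.0, 2.0])
v = sum(C_v[i] * a[i] for i in range(2))

# Order 2: matrix as a sum of tensor products a_i (x) b_j
C_M = np.arange(6.0).reshape(2, 3)
M = sum(C_M[i, j] * np.outer(a[i], b[j]) for i in range(2) for j in range(3))

# Order 3: cuboid as a sum of triple tensor products a_i (x) b_j (x) c_k
C_R = np.arange(12.0).reshape(2, 3, 2)
R = sum(C_R[i, j, k] * np.einsum("i,j,k->ijk", a[i], b[j], c[k])
        for i in range(2) for j in range(3) for k in range(2))

print(v.shape, M.shape, R.shape)  # (2,) (2, 3) (2, 3, 2)
```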

SLIDE 22

Tensor Networks (Penrose Notation?)

(T ×_1 A ×_2 B ×_3 C)_{i_1 i_2 i_3} = ∑_{k_1 k_2 k_3} T_{k_1 k_2 k_3} A_{i_1 k_1} B_{i_2 k_2} C_{i_3 k_3}

p.c. Guillaume Rabusseau
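The same contraction written with einsum (an illustrative NumPy sketch; the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(2, 3, 4))   # core tensor
A = rng.normal(size=(5, 2))      # mode-1 factor
B = rng.normal(size=(6, 3))      # mode-2 factor
C = rng.normal(size=(7, 4))      # mode-3 factor

# (T x_1 A x_2 B x_3 C)_{i1 i2 i3} = sum_{k1 k2 k3} T_{k1 k2 k3} A_{i1 k1} B_{i2 k2} C_{i3 k3}
out = np.einsum("abc,ia,jb,kc->ijk", T, A, B, C)
print(out.shape)  # (5, 6, 7)
```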

SLIDE 23

Second-Order RNN

The hidden state is computed by h_t = g(W ×_2 x_t ×_3 h_{t−1}).

The computation of a finite-state machine is very similar, with the weight tensor replaced by a transition tensor A ∈ ℝ^{n×|Σ|×n} defined by A_{:,σ,:} = A_σ.

p.c. Guillaume Rabusseau
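A sketch of the second-order update in NumPy, assuming one-hot symbol inputs and tanh for g (illustrative choices). With the identity in place of g and a transition tensor in place of W, the same contraction is the weighted-automaton update:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sym = 4, 3                         # hidden size, alphabet size
W = rng.normal(size=(n, n_sym, n))      # second-order weight tensor in R^{n x |Sigma| x n}

def step(h_prev, x_onehot, g=np.tanh):
    # h_t = g(W x_2 x_t x_3 h_{t-1}): contract the symbol mode and the previous-state mode
    return g(np.einsum("isj,s,j->i", W, x_onehot, h_prev))

h = rng.normal(size=n)
x = np.eye(n_sym)[1]                    # one-hot encoding of symbol sigma

# With g = identity, this multiplies h by the slice W[:, sigma, :], i.e. the matrix A_sigma
h_lin = np.einsum("isj,s,j->i", W, x, h)
assert np.allclose(h_lin, W[:, 1, :] @ h)
print(step(h, x))
```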

SLIDE 24

Theorem (Rabusseau et al. 2019). Weighted FSAs are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols.

Theorem (Merrill 2019). RNNs asymptotically accept exactly the regular languages.

Theorem (Casey 1996). A finite-dimensional RNN can robustly perform only finite-state computations.

SLIDE 25

Theorem (Casey 1996). An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.

SLIDE 26

Analyzing Specific Neuron Dynamics

◮ An RNN with only 2 neurons in its hidden state, trained on the “Even-A” language
◮ Input: a stream of strings separated by the $ symbol
◮ Neuron A: all even a’s, and the $ symbol after a rejected string
◮ Neuron B: all b’s following an even number of a’s, and the $ after an accepted string

p.c. Oliva & Lago-Fernàndez 2019

SLIDE 27

RNN Encoder-Decoder and Transducers

◮ Function: given string w, generate f(w) = v (the accepted pairs of input and output strings)
◮ Computed by different classes of grammars (transducers)
◮ A recurrent encoder maps a sequence to v ∈ ℝ^n; a recurrent decoder is a language model conditioned on v (Sutskever et al. 2014)
◮ How expressive are they?

SLIDE 28

Our idea: Use functions that copy!

(1) Total reduplication = unbounded copy (∼83%)
  a. wanita → wanita∼wanita ‘woman’ → ‘women’ (Indonesian)

(2) Partial reduplication = bounded copy (∼75%)
  a. C: gen → g∼gen ‘to sleep’ → ‘to be sleeping’ (Shilh)
  b. CV: guyon → gu∼guyon ‘to jest’ → ‘to jest repeatedly’ (Sundanese)
  c. CVC: takki → tak∼takki ‘leg’ → ‘legs’ (Agta)
  d. CVCV: banagañu → bana∼banagañu ‘return’ (Dyirbal)
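These two copy types as plain string functions, in an illustrative Python sketch (the crude vowel-based CV detection is an assumption for the example, not the talk's formalism):

```python
VOWELS = set("aeiou")

def total_red(w: str) -> str:
    # Unbounded copy: repeat the whole base, e.g. wanita -> wanita~wanita
    return f"{w}~{w}"

def cv_red(w: str) -> str:
    # Bounded copy: repeat the initial CV, e.g. guyon -> gu~guyon.
    # Crude assumption: the reduplicant ends at the first vowel.
    for i, ch in enumerate(w):
        if ch in VOWELS:
            return f"{w[:i + 1]}~{w}"
    return f"{w}~{w}"   # no vowel found: fall back to a total copy

print(total_red("wanita"))  # wanita~wanita
print(cv_red("guyon"))      # gu~guyon
```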

SLIDE 29

Subregular computing of reduplication

◮ Why reduplication (Red)?
  ◮ inhabits subclasses of regular string-to-string functions
  ◮ computed by restricted types of finite-state transducers

1 1-way FST: reads the input once, in one direction
  ∼ computes the Rational functions, e.g., Sequential functions like partial Red

2 2-way FST: reads the input multiple times, moving back and forth
  ∼ computes the Regular functions, e.g., Concatenated-Sequential functions like partial & total Red

[Figure: hierarchy of function classes relating Regular (2-way FST), Rational (1-way FST), Sequential, and C-Sequential functions]

SLIDE 30

1-way and 2-way Finite-State Transducers

[Figure: a 1-way FST (a.i) and a 2-way FST (b.i) computing initial-CV reduplication on the input pat, each paired with its origin information (a.ii, b.ii). The 1-way machine uses transitions such as (⋊:⋊), (t:t), (p:p), (a:a∼ta), (a:a∼pa), (Σ:Σ), (⋉:⋉); the 2-way machine uses transitions such as (⋊:λ:+1), (C:C:+1), (V:V:-1), (Σ:Σ:-1), (⋊:∼:+1), (Σ:Σ:+1), (⋉:λ:+1), where +1/-1 mark the direction of head movement.]

SLIDE 31

Learning Reduplication

Reduplication is provably learnable in polynomial time and data (Chandlee et al. 2015; Dolatian and Heinz 2018).

RNNs with segmental inputs cannot be trained as reduplication acceptors (Gasser 1993; Marcus et al. 1999).

◮ Recognizing reduplication requires the comparison of static subsequences, which is difficult for an RNN to store

Encoder-Decoders learn reduplication with a fixed-size reduplicant in a small toy language (Prickett et al. 2018).

◮ Generalizable to novel segments and sequences
◮ Generalization to novel lengths not tested; computable by a 1-way FST that uses featural representations
SLIDE 32

Recurrence

◮ Recurrence relation: the function relating hidden states in the encoder and decoder RNNs; it affects the practical expressivity of the network
◮ Two types of recurrence tested (see the sketch after this list):
  ◮ sRNN: the t-th state is a nonlinear function of the t-th input and state t−1 (Elman 1990)
  ◮ GRU: the t-th state is a linear function of three functions (gates) of the t-th input and state t−1 (Cho et al. 2014)
◮ Saturating nonlinearities (tanh): sRNNs and GRUs cannot count with finite precision (Weiss et al. 2018)
◮ LSTMs are supra-regular; we are testing necessary properties of sRNNs and GRUs, which are finite-state (Merrill 2019)
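An illustrative NumPy sketch of the two recurrences (weight shapes and random initialization are assumptions, not the trained models from the experiments):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srnn_step(x, h_prev, p):
    # sRNN: nonlinear function of the t-th input and state t-1 (Elman-style)
    return np.tanh(p["Wx"] @ x + p["Wh"] @ h_prev + p["b"])

def gru_step(x, h_prev, p):
    # GRU: update gate z and reset gate r mix the previous state with a candidate
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)
    h_tilde = np.tanh(p["Wc"] @ x + p["Uc"] @ (r * h_prev))
    return (1 - z) * h_prev + z * h_tilde

# Toy dimensions: 3-dim inputs, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
srnn_p = {"Wx": rng.normal(size=(d_h, d_in)),
          "Wh": rng.normal(size=(d_h, d_h)),
          "b": np.zeros(d_h)}
gru_p = {k: rng.normal(size=(d_h, d_in)) for k in ("Wz", "Wr", "Wc")}
gru_p.update({k: rng.normal(size=(d_h, d_h)) for k in ("Uz", "Ur", "Uc")})

h, x = np.zeros(d_h), rng.normal(size=d_in)
print(srnn_step(x, h, srnn_p))
print(gru_step(x, h, gru_p))
```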

SLIDE 33

Attention

◮ In a standard Encoder-Decoder, the encoded representation is the only link between the encoder and decoder
◮ Global attention allows the decoder to selectively pull information from the hidden states of the encoder (Bahdanau et al. 2014)
◮ FLT analog: a 2-way FST has full access to the input by moving back and forth
SLIDE 34

Test data

◮ Input-output mappings generated with 2-way FSTs from the RedTyp database (Dolatian and Heinz 2019; also available on GitHub)

1 Initial-CV: tasgati → ta∼tasgati (fixed-size reduplicant)
2 Initial two-syllable (C*VC*V): tasgati → tasga∼tasgati (onset-maximizing, fixed over vowels)
3 Total: tasgati → tasgati∼tasgati (variably sized reduplicant)

◮ 10,000 strings generated for each language, 70/30 train/test split
◮ Minimum string length 3; maximum string length varied
◮ Alphabet of 10, 16, or 26 characters
◮ Boundary symbols (∼) are not present
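A rough sketch of how such input-output pairs could be generated (Python assumed; the random word generator and the fixed CV prefix are illustrative simplifications, not the RedTyp 2-way FSTs):

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bdgkt"   # together with the vowels, a 10-character alphabet (an assumption)

def random_word(min_len=3, max_len=9):
    # Alternate C and V positions so CV-style prefixes exist (a simplification)
    n = random.randint(min_len, max_len)
    return "".join(random.choice(CONSONANTS if i % 2 == 0 else VOWELS)
                   for i in range(n))

def initial_cv(w):
    return w[:2] + w       # ta + tasgati; no ~ boundary symbol in the data

def total(w):
    return w + w           # tasgati + tasgati

def make_dataset(redup, n=10_000, split=0.7):
    pairs = [(w, redup(w)) for w in (random_word() for _ in range(n))]
    cut = int(n * split)
    return pairs[:cut], pairs[cut:]    # 70/30 train/test split

random.seed(0)
train, test = make_dataset(initial_cv)
print(train[0], len(train), len(test))
```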

SLIDE 35

Experiment 1

◮ Interaction between reduplication type, recurrence, and attention
◮ Total and partial (two-syllable) reduplication
◮ sRNN and GRU, with and without attention
◮ Max string length: 9
◮ Alphabet of 10 symbols

Attention should improve function generalization across reduplication types and recurrence relations

SLIDE 36

Experiment 1

SLIDE 37

Experiment 2

◮ Effects of alphabet size and range of permitted string lengths
◮ CV reduplication only
◮ sRNN/GRU × attention/non-attention × 3 alphabet sizes × 7 length ranges

Network generalization while learning a general reduplication function should be invariant to language composition

SLIDE 38

Experiment 2

SLIDE 40

Discussion

◮ Networks with global attention learn and generalize all types of reduplication and seem robust to string length and alphabet size
◮ sRNNs without attention show slightly better generalization of partial reduplication than total reduplication
  ◮ Confound with less attested reduplicant lengths, or a bias preferring the regular pattern?
◮ GRUs perform better than sRNNs across all conditions
  ◮ Without attention they are not robust to length/alphabet size, likely learning heuristics that capture most data rather than a general function

Networks that cannot see material in the input multiple times cannot learn generalizable reduplication

SLIDE 41

Attention and Origin Semantics

[Figure: origin information for the 1-way and 2-way FSTs on the input pat and its reduplicated output]

SLIDE 42

SLIDE 43

Summary

1 Why use reduplication functions?
  ◮ Their properties define fine-grained subregular function classes
  ◮ They allow us to test the generalization capacity of neural nets

2 Expressivity of attention
  ◮ Attention is necessary and sufficient for robustly learning and generalizing reduplication functions using Encoder-Decoders

3 FST approximations
  ◮ Non-attention networks are limited to a single input pass, approximating a 1-way FST
  ◮ Attention networks can read the input again during decoding, approximating a 2-way FST

4 Attention weights and origin information
  ◮ Evidence for the approximation comes from attention weights
  ◮ Input-output correspondence relations mirror the origin semantics of a 2-way FST

5 Next step: trying more copying and non-copying functions

SLIDE 44

Main Points

1 Language is not just data you throw at a machine.

2 Language is a fundamentally computational process, uniquely learned by humans.

3 We can use core properties of language to understand how other systems learn.

SLIDE 45

References I

Avcu, Enes, Chihiro Shibata, and Jeffrey Heinz. 2017. Subregular complexity and deep learning. In CLASP Papers in Computational Linguistics: Proceedings of the Conference on Logic and Machine Learning in Natural Language (LaML 2017), Gothenburg, 12–13 June, ed. Simon Dobnik and Shalom Lappin, 20–33.

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Chandlee, Jane, Rémi Eyraud, and Jeffrey Heinz. 2015. Output strictly local functions. In Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015), 112–125. Chicago, USA.

Cho, Kyunghyun, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.

Dolatian, Hossep, and Jeffrey Heinz. 2018. Learning reduplication with 2-way finite-state transducers. In Proceedings of Machine Learning Research: International Conference on Grammatical Inference, ed. Olgierd Unold, Witold Dyrka, and Wojciech Wieczorek, volume 93 of Proceedings of Machine Learning Research, 67–80. Wroclaw, Poland.

Dolatian, Hossep, and Jeffrey Heinz. 2019. RedTyp: A database of reduplication with computational models. In Proceedings of the Society for Computation in Linguistics, volume 2. Article 3.

SLIDE 46

References II

Elman, Jeffrey L. 1990. Finding structure in time. Cognitive Science 14:179–211.

Gasser, Michael. 1993. Learning words in time: Towards a modular connectionist account of the acquisition of receptive morphology. Indiana University, Department of Computer Science.

Marcus, Gary F., Sugumaran Vijayan, S. Bandi Rao, and Peter M. Vishton. 1999. Rule learning by seven-month-old infants. Science 283:77–80.

Merrill, William. 2019. Sequential neural networks as automata. In Proceedings of the Deep Learning and Formal Languages Workshop at ACL 2019.

Prickett, Brandon, Aaron Traylor, and Joe Pater. 2018. Seq2seq models with dropout can learn generalizable reduplication. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, 93–100.

Rabusseau, Guillaume, Tianyu Li, and Doina Precup. 2019. Connecting weighted automata and recurrent neural networks through spectral learning. In AISTATS.

Rawski, Jonathan, and Jeffrey Heinz. 2019. No free lunch in linguistics or machine learning: Response to Pater. Language 94:1.

Siegelmann, Hava T. 2012. Neural networks and analog computation: Beyond the Turing limit. Springer Science & Business Media.

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR abs/1409.3215. URL http://arxiv.org/abs/1409.3215.

Weiss, Gail, Yoav Goldberg, and Eran Yahav. 2018. On the practical computational power of finite precision RNNs for language recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 740–745.