Understanding Machine Learning with Language and Tensors


SLIDE 1

Understanding Machine Learning with Language and Tensors

Jon Rawski, Linguistics Department, Institute for Advanced Computational Science, Stony Brook University

SLIDE 2

Thinking Like A Linguist

1. Language, like physics, is not just data you throw at a machine.

2. Language is a fundamentally computational process, uniquely learned by humans from small data.

3. We can use core properties of language to understand how other systems generalize, learn, and perform inference.

SLIDE 3

[Image-only slide]

SLIDE 4

The Zipf Problem (Yang 2013)


SLIDE 5

A Recipe for Machine Learning

1. Given training data: $\{x_i, y_i\}_{i=1}^{N}$

2. Choose each of these:

◮ Decision function: $\hat{y} = f_\theta(x_i)$
◮ Loss function: $\ell(\hat{y}, y_i) \in \mathbb{R}$

3. Define the goal: $\theta^* = \operatorname{argmin}_\theta \sum_{i=1}^{N} \ell(f_\theta(x_i), y_i)$

4. Train (take small steps opposite the gradient): $\theta^{(t+1)} = \theta^{(t)} - \eta_t \nabla \ell(f_\theta(x_i), y_i)$
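
A minimal sketch of this recipe in NumPy, assuming a linear decision function and squared loss (all names and values here are illustrative, not from the slides):

```python
import numpy as np

# Toy training data {x_i, y_i}, i = 1..N (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # N = 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a hidden linear rule

theta = np.zeros(3)                     # parameters of f_theta
eta = 0.01                              # step size eta_t (held constant here)

for t in range(1000):
    i = rng.integers(len(X))            # pick one example (x_i, y_i)
    y_hat = X[i] @ theta                # decision function: y_hat = f_theta(x_i)
    grad = 2 * (y_hat - y[i]) * X[i]    # gradient of the squared loss
    theta -= eta * grad                 # small step opposite the gradient

print(theta)  # approaches the hidden rule [1.0, -2.0, 0.5]
```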


SLIDE 6

“Neural” Networks & Automatic Differentiation

p.c. Matt Gormley

SLIDE 7

Recurrent Neural Networks (RNN)

Acceptor: Read in a sequence. Predict from the end state. Backprop the error all the way back.

p.c. Yoav Goldberg
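
A sketch of the acceptor in NumPy, with hypothetical weight names; in training, the classification error at the final state is backpropagated through every time step to update W, U, b, and w:

```python
import numpy as np

def rnn_acceptor(seq, W, U, b, w):
    """Read a sequence of input vectors; predict from the end state only."""
    h = np.zeros(W.shape[0])              # initial hidden state h_0
    for x in seq:
        h = np.tanh(W @ h + U @ x + b)    # recurrent state update
    score = float(w @ h)                  # read out the final state
    return 1 / (1 + np.exp(-score))       # P(accept) via a sigmoid
```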


SLIDE 9

What is a function for language?

Alphabet: $\Sigma = \{a, b, c, \ldots\}$

◮ Examples: letters, DNA, peptides, words, map directions, etc.

$\Sigma^*$: all possible sequences (strings) over the alphabet

◮ Examples: aaaaaaaaa, baba, bcabaca, ...

Languages: subsets of $\Sigma^*$ following some pattern

◮ Examples (each written as a membership function in the sketch below):

◮ {ba, baba, bababa, bababababa, ...}: 1 or more ba
◮ {ab, aabb, aaabbb, aaaaaabbbbbb, ...}: $a^n b^n$
◮ {aa, aab, aba, aabbaabbaa, ...}: even number of a's
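
A sketch of those three languages as Python membership functions (the function names are mine):

```python
import re

def one_or_more_ba(s):        # {ba, baba, bababa, ...}
    return re.fullmatch(r"(ba)+", s) is not None

def a_n_b_n(s):               # {ab, aabb, aaabbb, ...}: a^n b^n, n >= 1
    n = s.count("a")
    return n >= 1 and len(s) == 2 * n and s == "a" * n + "b" * n

def even_as(s):               # strings over {a, b} with an even number of a's
    return set(s) <= {"a", "b"} and s.count("a") % 2 == 0
```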


SLIDE 10

What is a function for language?

◮ Grammar/Automaton: a computational device that decides whether a string is in a language (says yes/no)

◮ Functional perspective: $f : \Sigma^* \to \{0, 1\}$

p.c. Casey 1996

SLIDE 11

Regular Languages & Finite-State Automata

Regular Language: the memory required is finite with respect to the input.

◮ (ba)*: {ba, baba, bababa, ...}
◮ b(a*): {b, ba, baaaaaa, ...}

[Figure: a two-state automaton (q0 start, q1) for each language; a sketch of the first follows below]
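
A minimal sketch of the first automaton as a Python transition table (state names follow the slide's q0/q1):

```python
def accepts_ba_star(s):
    """DFA for (ba)*: q0 (start, accepting) -b-> q1, q1 -a-> q0."""
    delta = {("q0", "b"): "q1", ("q1", "a"): "q0"}
    state = "q0"
    for ch in s:
        if (state, ch) not in delta:   # missing transition: reject
            return False
        state = delta[(state, ch)]
    return state == "q0"               # accept only back at the start state

print(accepts_ba_star("baba"), accepts_ba_star("bab"))  # True False
```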


SLIDE 12

Regular Languages & Finite-State Automata

$f : \Sigma^* \to \mathbb{R}$

p.c. B. Balle, X. Carreras, A. Quattoni, EMNLP'14 tutorial
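
A weighted FSA assigns every string a real value via matrix products. A sketch under the standard linear representation $f(w) = \alpha^\top A^{w_1} \cdots A^{w_n} \beta$, with made-up weights:

```python
import numpy as np

# Linear representation: initial vector alpha, one transition
# matrix per symbol, final vector beta (illustrative values).
alpha = np.array([1.0, 0.0])
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0], [0.2, 0.3]])}
beta = np.array([0.0, 1.0])

def f(word):
    """f(w) = alpha^T A_{w1} ... A_{wn} beta, one real value per string."""
    v = alpha
    for sigma in word:
        v = v @ A[sigma]
    return float(v @ beta)

print(f("ab"), f("ba"))
```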

SLIDE 13

Supra-Regularity in Natural Language


SLIDE 14

Chomsky Hierarchy

[Figure: the nested hierarchy Finite ⊂ Regular ⊂ Context-Free ⊂ Mildly Context-Sensitive ⊂ Context-Sensitive ⊂ Computably Enumerable, with natural-language phenomena placed by level:]

◮ Finite: English consonant clusters (Clements and Keyser 1983)
◮ Regular: Kwakiutl stress (Bach 1975); Chumash sibilant harmony (Applegate 1972)
◮ Context-Free: English nested embedding (Chomsky 1957)
◮ Mildly Context-Sensitive: Swiss German (Shieber 1985); Yoruba copying (Kobele 2006)

p.c. Rawski & Heinz 2019


SLIDE 16

Tensors: Quick and Dirty Overview

◮ Order 1 (vector): $\vec{v} \in A$, with $\vec{v} = \sum_i C^{v}_{i}\,\vec{a}_i$

◮ Order 2 (matrix): $M \in A \otimes B$, with $M = \sum_{ij} C^{M}_{ij}\,\vec{a}_i \otimes \vec{b}_j$

◮ Order 3 (cuboid): $R \in A \otimes B \otimes C$, with $R = \sum_{ijk} C^{R}_{ijk}\,\vec{a}_i \otimes \vec{b}_j \otimes \vec{c}_k$
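
These sums of outer products are direct einsum calls in NumPy (all dimensions are arbitrary):

```python
import numpy as np

dA, dB, dC = 3, 3, 3
a = np.random.rand(4, dA)   # vectors a_i as rows (4 of them, illustrative)
b = np.random.rand(5, dB)
c = np.random.rand(6, dC)

# Order 2: M = sum_ij C^M_ij  a_i (x) b_j
C_M = np.random.rand(4, 5)
M = np.einsum("ij,id,je->de", C_M, a, b)          # shape (dA, dB)

# Order 3: R = sum_ijk C^R_ijk  a_i (x) b_j (x) c_k
C_R = np.random.rand(4, 5, 6)
R = np.einsum("ijk,id,je,kf->def", C_R, a, b, c)  # shape (dA, dB, dC)
```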


SLIDE 17

Tensor Networks (Penrose Notation?)

$(T \times_1 A \times_2 B \times_3 C)_{i_1, i_2, i_3} = \sum_{k_1, k_2, k_3} T_{k_1 k_2 k_3} A_{i_1 k_1} B_{i_2 k_2} C_{i_3 k_3}$

p.c. Guillaume Rabusseau
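
The same contraction is a one-line np.einsum (shapes are illustrative):

```python
import numpy as np

T = np.random.rand(3, 4, 5)     # core tensor T_{k1 k2 k3}
A = np.random.rand(6, 3)        # factor A_{i1 k1}
B = np.random.rand(7, 4)        # factor B_{i2 k2}
C = np.random.rand(8, 5)        # factor C_{i3 k3}

# (T x_1 A x_2 B x_3 C)_{i1,i2,i3} = sum_k T_{k1k2k3} A_{i1k1} B_{i2k2} C_{i3k3}
out = np.einsum("abc,ia,jb,kc->ijk", T, A, B, C)
print(out.shape)  # (6, 7, 8)
```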

SLIDE 18

Second-Order RNN

The hidden state is computed by $h_t = g(W \times_2 x_t \times_3 h_{t-1})$. The computation of a finite-state machine is very similar, where $A \in \mathbb{R}^{n \times |\Sigma| \times n}$ is defined by $A_{:,\sigma,:} = A^{\sigma}$.

p.c. Guillaume Rabusseau
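
A sketch of both claims in NumPy: the second-order update as an einsum, and a toy FSA run as a linear 2-RNN by stacking its transition matrices (the two-state machine below tracks the parity of a's):

```python
import numpy as np

def step(W, x, h, g=np.tanh):
    """Second-order update: h_t = g(W x_2 x_t x_3 h_{t-1})."""
    return g(np.einsum("ijk,j,k->i", W, x, h))

# Running an FSA as a linear 2-RNN: stack one n x n transition
# matrix per symbol into A[:, sigma, :] and use g = identity.
n, alphabet = 2, "ab"
A = np.zeros((n, len(alphabet), n))
A[:, 0, :] = np.array([[0.0, 1.0], [1.0, 0.0]])  # A^a swaps states (toy machine)
A[:, 1, :] = np.eye(n)                           # A^b keeps the state

h = np.array([1.0, 0.0])                         # start state, one-hot
for ch in "abba":
    x = np.eye(len(alphabet))[alphabet.index(ch)]  # one-hot symbol vector
    h = step(A, x, h, g=lambda z: z)               # linear step, no nonlinearity
print(h)  # one-hot encoding of the FSA's state after reading "abba"
```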

SLIDE 19

Theorem (Rabusseau et al. 2019). Weighted FSAs are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols.

Theorem (Merrill 2019). RNNs asymptotically accept exactly the regular languages.

Theorem (Casey 1996). A finite-dimensional RNN can robustly perform only finite-state computations.


SLIDE 20

Theorem (Casey 1996). An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.


SLIDE 21

Analyzing Specific Neuron Dynamics

◮ An RNN with only 2 neurons in its hidden state, trained on the "Even-A" language.

◮ Input: a stream of strings separated by the $ symbol.

◮ Neuron 1 fires on even a's, and on the $ symbol after a rejected string.

◮ Neuron 2 fires on b's following an even number of a's, and on $ after an accepted string.

p.c. Oliva & Lago-Fernández 2019

SLIDE 22

But...Translation Needs an Output!

$f : \Sigma^* \to \Delta^*$

p.c. Bahdanau et al. 2014

SLIDE 23

RNN Encoder-Decoder

p.c. Chris Dyer
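
A compressed sketch of the encoder-decoder with greedy decoding; all weight names here are hypothetical stand-ins for a trained model:

```python
import numpy as np

def encode(src, Wex, Weh):
    """Run the encoder RNN; the final hidden state summarizes the input."""
    h = np.zeros(Weh.shape[0])
    for x in src:
        h = np.tanh(Wex @ x + Weh @ h)
    return h

def decode(h, Wdh, Wdy, embed, eos, max_len=20):
    """Generate output symbols one at a time, feeding each back in."""
    out, y = [], eos                     # start from an end-of-sequence token
    for _ in range(max_len):
        h = np.tanh(Wdh @ h + embed[y])  # condition on the previous output
        y = int(np.argmax(Wdy @ h))      # greedy: pick the best next symbol
        if y == eos:
            break
        out.append(y)
    return out
```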


SLIDE 29

Our idea: Use functions that copy!

(1) Total reduplication = unbounded copy (∼83%)
    a. wanita → wanita∼wanita, 'woman' → 'women' (Indonesian)

(2) Partial reduplication = bounded copy (∼75%)
    a. C: gen → g∼gen, 'to sleep' → 'to be sleeping' (Shilh)
    b. CV: guyon → gu∼guyon, 'to jest' → 'to jest repeatedly' (Sundanese)
    c. CVC: takki → tak∼takki, 'leg' → 'legs' (Agta)
    d. CVCV: banagañu → bana∼banagañu, 'return' (Dyirbal)
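
As string functions, both patterns are short (a sketch; segments are treated as single characters for simplicity):

```python
def total_redup(w):
    """Unbounded copy: wanita -> wanita~wanita."""
    return w + "~" + w

def partial_redup(w, k):
    """Bounded copy of the first k segments: takki, k=3 -> tak~takki."""
    return w[:k] + "~" + w

print(total_redup("wanita"))      # wanita~wanita
print(partial_redup("guyon", 2))  # gu~guyon  (CV)
print(partial_redup("takki", 3))  # tak~takki (CVC)
```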


SLIDE 30

1-way and 2-way Finite-State Transducers

[Figure: a 1-way FST (a.i) and a 2-way FST (b.i) computing partial reduplication on the input pat → pa∼pat, each shown with its origin information (a.ii, b.ii). The 1-way machine bakes the copy into its output labels, e.g. (a : a∼ta) and (a : a∼pa); the 2-way machine instead moves its head back over the input, with transitions like (C:C:+1), (V:V:−1), and (⋊:∼:+1).]
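
A sketch of the 2-way idea in Python, simplified to total reduplication: the head emits the input once, walks back to the left end marker, emits the separator, and copies again. A 1-way machine cannot do this with bounded memory on unbounded inputs:

```python
def two_way_copy(w):
    """Simulate a 2-way FST for total reduplication: emit the input,
    rewind the head to the left boundary, then emit it again."""
    tape = "<" + w + ">"          # end markers (the slide's boundary symbols)
    out, pos, state = [], 0, "first"
    while state != "done":
        ch = tape[pos]
        if state == "first":
            if ch == ">":              # reached the right edge: rewind
                state, pos = "rewind", pos - 1
            else:
                if ch != "<":
                    out.append(ch)     # copy symbol, move right
                pos += 1
        elif state == "rewind":
            if ch == "<":              # back at the left edge: emit separator
                out.append("~")
                state, pos = "second", pos + 1
            else:
                pos -= 1               # keep moving left
        elif state == "second":
            if ch == ">":
                state = "done"
            else:
                out.append(ch)         # second copy, moving right again
                pos += 1
    return "".join(out)

print(two_way_copy("pat"))   # pat~pat
```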


SLIDE 31

Encoder-Decoder = 1-way or 2-way FST?



SLIDE 36

Main Points

1. Language is not just data you throw at a machine.

2. Language is a fundamentally computational process, uniquely learned by humans.

3. We can use core properties of language to understand how other systems learn.

Want More?

◮ Mathematical Linguistics Reading Group
  ◮ Fridays, 12pm-1pm, SBS N250
  ◮ Website: complab-stonybrook.github.io/mlrg/

◮ IACS Machine Learning and Statistical Inference Working Group
  ◮ Every other week, contact me for details
