Understanding Machine Learning with Language and Tensors


SLIDE 1

Understanding Machine Learning with Language and Tensors

Jon Rawski, Linguistics Department, Institute for Advanced Computational Science, Stony Brook University

SLIDE 2

Thinking Like A Linguist

1. Language, like physics, is not just data you throw at a machine.

2. Language is a fundamentally computational process, uniquely learned by humans from small data.

3. We can use core properties of language to understand how other systems generalize, learn, and perform inference.

SLIDE 3

[Image-only slide]

SLIDE 4

The Zipf Problem (Yang 2013)


SLIDE 5

A Recipe for Machine Learning

1. Given training data: $\{x_i, y_i\}_{i=1}^{N}$

2. Choose each of these:

◮ Decision function: $\hat{y} = f_\theta(x_i)$
◮ Loss function: $\ell(\hat{y}, y_i) \in \mathbb{R}$

3. Define the goal: $\theta^* = \operatorname{argmin}_\theta \sum_{i=1}^{N} \ell(f_\theta(x_i), y_i)$

4. Train (take small steps opposite the gradient): $\theta^{(t+1)} = \theta^{(t)} - \eta_t \nabla \ell(f_\theta(x_i), y_i)$
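
A minimal sketch of this recipe in NumPy, assuming a linear decision function and squared loss (all names and values here are illustrative, not from the slides):

```python
import numpy as np

# Toy training data {x_i, y_i}, i = 1..N (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # N = 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a hidden linear rule

theta = np.zeros(3)                     # parameters of f_theta
eta = 0.01                              # step size eta_t (held constant here)

for t in range(1000):
    i = rng.integers(len(X))            # pick one example (x_i, y_i)
    y_hat = X[i] @ theta                # decision function: y_hat = f_theta(x_i)
    grad = 2 * (y_hat - y[i]) * X[i]    # gradient of the squared loss
    theta -= eta * grad                 # small step opposite the gradient

print(theta)  # approaches the hidden rule [1.0, -2.0, 0.5]
```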


SLIDE 6

“Neural” Networks & Automatic Differentiation

p.c. Matt Gormley

SLIDE 7

Recurrent Neural Networks (RNN)

Acceptor: Read in a sequence. Predict from the end state. Backprop the error all the way back.

p.c. Yoav Goldberg
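
A sketch of the acceptor in NumPy, with hypothetical weight names; in training, the classification error at the final state is backpropagated through every time step to update W, U, b, and w:

```python
import numpy as np

def rnn_acceptor(seq, W, U, b, w):
    """Read a sequence of input vectors; predict from the end state only."""
    h = np.zeros(W.shape[0])              # initial hidden state h_0
    for x in seq:
        h = np.tanh(W @ h + U @ x + b)    # recurrent state update
    score = float(w @ h)                  # read out the final state
    return 1 / (1 + np.exp(-score))       # P(accept) via a sigmoid
```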


SLIDE 9

What is a function for language?

Alphabet: $\Sigma = \{a, b, c, \ldots\}$

◮ Examples: letters, DNA, peptides, words, map directions, etc.

$\Sigma^*$: all possible sequences (strings) over the alphabet

◮ Examples: aaaaaaaaa, baba, bcabaca, ...

Languages: subsets of $\Sigma^*$ following some pattern

◮ Examples (each written as a membership function in the sketch below):

◮ {ba, baba, bababa, bababababa, ...}: 1 or more ba
◮ {ab, aabb, aaabbb, aaaaaabbbbbb, ...}: $a^n b^n$
◮ {aa, aab, aba, aabbaabbaa, ...}: even number of a's
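
A sketch of those three languages as Python membership functions (the function names are mine):

```python
import re

def one_or_more_ba(s):        # {ba, baba, bababa, ...}
    return re.fullmatch(r"(ba)+", s) is not None

def a_n_b_n(s):               # {ab, aabb, aaabbb, ...}: a^n b^n, n >= 1
    n = s.count("a")
    return n >= 1 and len(s) == 2 * n and s == "a" * n + "b" * n

def even_as(s):               # strings over {a, b} with an even number of a's
    return set(s) <= {"a", "b"} and s.count("a") % 2 == 0
```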


SLIDE 10

What is a function for language?

◮ Grammar/Automaton: a computational device that decides whether a string is in a language (says yes/no)

◮ Functional perspective: $f : \Sigma^* \to \{0, 1\}$

p.c. Casey 1996

SLIDE 11

Regular Languages & Finite-State Automata

Regular Language: the memory required is finite with respect to the input.

◮ (ba)*: {ba, baba, bababa, ...}
◮ b(a*): {b, ba, baaaaaa, ...}

[Figure: a two-state automaton (q0 start, q1) for each language; a sketch of the first follows below]
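
A minimal sketch of the first automaton as a Python transition table (state names follow the slide's q0/q1):

```python
def accepts_ba_star(s):
    """DFA for (ba)*: q0 (start, accepting) -b-> q1, q1 -a-> q0."""
    delta = {("q0", "b"): "q1", ("q1", "a"): "q0"}
    state = "q0"
    for ch in s:
        if (state, ch) not in delta:   # missing transition: reject
            return False
        state = delta[(state, ch)]
    return state == "q0"               # accept only back at the start state

print(accepts_ba_star("baba"), accepts_ba_star("bab"))  # True False
```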


SLIDE 12

Regular Languages & Finite-State Automata

$f : \Sigma^* \to \mathbb{R}$

p.c. B. Balle, X. Carreras, A. Quattoni, EMNLP'14 tutorial
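
A weighted FSA assigns every string a real value via matrix products. A sketch under the standard linear representation $f(w) = \alpha^\top A^{w_1} \cdots A^{w_n} \beta$, with made-up weights:

```python
import numpy as np

# Linear representation: initial vector alpha, one transition
# matrix per symbol, final vector beta (illustrative values).
alpha = np.array([1.0, 0.0])
A = {"a": np.array([[0.5, 0.5], [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0], [0.2, 0.3]])}
beta = np.array([0.0, 1.0])

def f(word):
    """f(w) = alpha^T A_{w1} ... A_{wn} beta, one real value per string."""
    v = alpha
    for sigma in word:
        v = v @ A[sigma]
    return float(v @ beta)

print(f("ab"), f("ba"))
```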

SLIDE 13

Supra-Regularity in Natural Language


SLIDE 14

Chomsky Hierarchy

[Figure: the nested hierarchy Finite ⊂ Regular ⊂ Context-Free ⊂ Mildly Context-Sensitive ⊂ Context-Sensitive ⊂ Computably Enumerable, with natural-language phenomena placed by level:]

◮ Finite: English consonant clusters (Clements and Keyser 1983)
◮ Regular: Kwakiutl stress (Bach 1975); Chumash sibilant harmony (Applegate 1972)
◮ Context-Free: English nested embedding (Chomsky 1957)
◮ Mildly Context-Sensitive: Swiss German (Shieber 1985); Yoruba copying (Kobele 2006)

p.c. Rawski & Heinz 2019


SLIDE 16

Tensors: Quick and Dirty Overview

◮ Order 1 (vector): $\vec{v} \in A$, with $\vec{v} = \sum_i C^{v}_{i}\,\vec{a}_i$

◮ Order 2 (matrix): $M \in A \otimes B$, with $M = \sum_{ij} C^{M}_{ij}\,\vec{a}_i \otimes \vec{b}_j$

◮ Order 3 (cuboid): $R \in A \otimes B \otimes C$, with $R = \sum_{ijk} C^{R}_{ijk}\,\vec{a}_i \otimes \vec{b}_j \otimes \vec{c}_k$
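
These sums of outer products are direct einsum calls in NumPy (all dimensions are arbitrary):

```python
import numpy as np

dA, dB, dC = 3, 3, 3
a = np.random.rand(4, dA)   # vectors a_i as rows (4 of them, illustrative)
b = np.random.rand(5, dB)
c = np.random.rand(6, dC)

# Order 2: M = sum_ij C^M_ij  a_i (x) b_j
C_M = np.random.rand(4, 5)
M = np.einsum("ij,id,je->de", C_M, a, b)          # shape (dA, dB)

# Order 3: R = sum_ijk C^R_ijk  a_i (x) b_j (x) c_k
C_R = np.random.rand(4, 5, 6)
R = np.einsum("ijk,id,je,kf->def", C_R, a, b, c)  # shape (dA, dB, dC)
```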


SLIDE 17

Tensor Networks (Penrose Notation?)

$(T \times_1 A \times_2 B \times_3 C)_{i_1, i_2, i_3} = \sum_{k_1, k_2, k_3} T_{k_1 k_2 k_3} A_{i_1 k_1} B_{i_2 k_2} C_{i_3 k_3}$

p.c. Guillaume Rabusseau
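
The same contraction is a one-line np.einsum (shapes are illustrative):

```python
import numpy as np

T = np.random.rand(3, 4, 5)     # core tensor T_{k1 k2 k3}
A = np.random.rand(6, 3)        # factor A_{i1 k1}
B = np.random.rand(7, 4)        # factor B_{i2 k2}
C = np.random.rand(8, 5)        # factor C_{i3 k3}

# (T x_1 A x_2 B x_3 C)_{i1,i2,i3} = sum_k T_{k1k2k3} A_{i1k1} B_{i2k2} C_{i3k3}
out = np.einsum("abc,ia,jb,kc->ijk", T, A, B, C)
print(out.shape)  # (6, 7, 8)
```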

SLIDE 18

Second-Order RNN

The hidden state is computed by $h_t = g(W \times_2 x_t \times_3 h_{t-1})$. The computation of a finite-state machine is very similar, where $A \in \mathbb{R}^{n \times |\Sigma| \times n}$ is defined by $A_{:,\sigma,:} = A^{\sigma}$.

p.c. Guillaume Rabusseau
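
A sketch of both claims in NumPy: the second-order update as an einsum, and a toy FSA run as a linear 2-RNN by stacking its transition matrices (the two-state machine below tracks the parity of a's):

```python
import numpy as np

def step(W, x, h, g=np.tanh):
    """Second-order update: h_t = g(W x_2 x_t x_3 h_{t-1})."""
    return g(np.einsum("ijk,j,k->i", W, x, h))

# Running an FSA as a linear 2-RNN: stack one n x n transition
# matrix per symbol into A[:, sigma, :] and use g = identity.
n, alphabet = 2, "ab"
A = np.zeros((n, len(alphabet), n))
A[:, 0, :] = np.array([[0.0, 1.0], [1.0, 0.0]])  # A^a swaps states (toy machine)
A[:, 1, :] = np.eye(n)                           # A^b keeps the state

h = np.array([1.0, 0.0])                         # start state, one-hot
for ch in "abba":
    x = np.eye(len(alphabet))[alphabet.index(ch)]  # one-hot symbol vector
    h = step(A, x, h, g=lambda z: z)               # linear step, no nonlinearity
print(h)  # one-hot encoding of the FSA's state after reading "abba"
```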

SLIDE 19

Theorem (Rabusseau et al. 2019). Weighted FSAs are expressively equivalent to second-order linear RNNs (linear 2-RNNs) for computing functions over sequences of discrete symbols.

Theorem (Merrill 2019). RNNs asymptotically accept exactly the regular languages.

Theorem (Casey 1996). A finite-dimensional RNN can robustly perform only finite-state computations.


SLIDE 20

Theorem (Casey 1996). An RNN with finite-state behavior necessarily partitions its state space into disjoint regions that correspond to the states of the minimal FSA.


SLIDE 21

Analyzing Specific Neuron Dynamics

◮ An RNN with only 2 neurons in its hidden state, trained on the "Even-A" language.

◮ Input: a stream of strings separated by the $ symbol.

◮ Neuron 1 fires on even a's, and on the $ symbol after a rejected string.

◮ Neuron 2 fires on b's following an even number of a's, and on $ after an accepted string.

p.c. Oliva & Lago-Fernández 2019

SLIDE 22

But...Translation Needs an Output!

$f : \Sigma^* \to \Delta^*$

p.c. Bahdanau et al. 2014

SLIDE 23

RNN Encoder-Decoder

p.c. Chris Dyer
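
A compressed sketch of the encoder-decoder with greedy decoding; all weight names here are hypothetical stand-ins for a trained model:

```python
import numpy as np

def encode(src, Wex, Weh):
    """Run the encoder RNN; the final hidden state summarizes the input."""
    h = np.zeros(Weh.shape[0])
    for x in src:
        h = np.tanh(Wex @ x + Weh @ h)
    return h

def decode(h, Wdh, Wdy, embed, eos, max_len=20):
    """Generate output symbols one at a time, feeding each back in."""
    out, y = [], eos                     # start from an end-of-sequence token
    for _ in range(max_len):
        h = np.tanh(Wdh @ h + embed[y])  # condition on the previous output
        y = int(np.argmax(Wdy @ h))      # greedy: pick the best next symbol
        if y == eos:
            break
        out.append(y)
    return out
```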


SLIDE 29

Our idea: Use functions that copy!

(1) Total reduplication = unbounded copy (∼83%)
    a. wanita → wanita∼wanita, 'woman' → 'women' (Indonesian)

(2) Partial reduplication = bounded copy (∼75%)
    a. C: gen → g∼gen, 'to sleep' → 'to be sleeping' (Shilh)
    b. CV: guyon → gu∼guyon, 'to jest' → 'to jest repeatedly' (Sundanese)
    c. CVC: takki → tak∼takki, 'leg' → 'legs' (Agta)
    d. CVCV: banagañu → bana∼banagañu, 'return' (Dyirbal)
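
As string functions, both patterns are short (a sketch; segments are treated as single characters for simplicity):

```python
def total_redup(w):
    """Unbounded copy: wanita -> wanita~wanita."""
    return w + "~" + w

def partial_redup(w, k):
    """Bounded copy of the first k segments: takki, k=3 -> tak~takki."""
    return w[:k] + "~" + w

print(total_redup("wanita"))      # wanita~wanita
print(partial_redup("guyon", 2))  # gu~guyon  (CV)
print(partial_redup("takki", 3))  # tak~takki (CVC)
```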


SLIDE 30

1-way and 2-way Finite-State Transducers

[Figure: a 1-way FST (a.i) and a 2-way FST (b.i) computing partial reduplication on the input pat → pa∼pat, each shown with its origin information (a.ii, b.ii). The 1-way machine bakes the copy into its output labels, e.g. (a : a∼ta) and (a : a∼pa); the 2-way machine instead moves its head back over the input, with transitions like (C:C:+1), (V:V:−1), and (⋊:∼:+1).]
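
A sketch of the 2-way idea in Python, simplified to total reduplication: the head emits the input once, walks back to the left end marker, emits the separator, and copies again. A 1-way machine cannot do this with bounded memory on unbounded inputs:

```python
def two_way_copy(w):
    """Simulate a 2-way FST for total reduplication: emit the input,
    rewind the head to the left boundary, then emit it again."""
    tape = "<" + w + ">"          # end markers (the slide's boundary symbols)
    out, pos, state = [], 0, "first"
    while state != "done":
        ch = tape[pos]
        if state == "first":
            if ch == ">":              # reached the right edge: rewind
                state, pos = "rewind", pos - 1
            else:
                if ch != "<":
                    out.append(ch)     # copy symbol, move right
                pos += 1
        elif state == "rewind":
            if ch == "<":              # back at the left edge: emit separator
                out.append("~")
                state, pos = "second", pos + 1
            else:
                pos -= 1               # keep moving left
        elif state == "second":
            if ch == ">":
                state = "done"
            else:
                out.append(ch)         # second copy, moving right again
                pos += 1
    return "".join(out)

print(two_way_copy("pat"))   # pat~pat
```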


SLIDE 31

Encoder-Decoder = 1-way or 2-way FST?



SLIDE 36

Main Points

1. Language is not just data you throw at a machine.

2. Language is a fundamentally computational process, uniquely learned by humans.

3. We can use core properties of language to understand how other systems learn.

Want More?

◮ Mathematical Linguistics Reading Group
  ◮ Fridays, 12pm-1pm, SBS N250
  ◮ Website: complab-stonybrook.github.io/mlrg/

◮ IACS Machine Learning and Statistical Inference Working Group
  ◮ Every other week, contact me for details
