SLIDE 1

Sequential neural networks as automata

William Merrill

Advised by Dana Angluin and Robert Frank

SLIDE 2

Neural Networks

SLIDE 3

Modern Artificial Intelligence

  • Most recent advances in AI use neural networks
  • Especially true for language (NLP)
SLIDE 4

What is a Neural Network?

  • A network of artificial cells that send information to each other
  • The weights on the connections between cells are learned from data

[Figure: an image classifier outputting 97% “cat”]

SLIDE 5

Sequential Neural Networks

  • For language, we use networks that can read variable-length sequences

[Figure: a sequential network reading “Hello world !” one word at a time]

SLIDE 6

Interpretability of Neural Networks

  • Neural networks are good at translation, classification, summarization, etc.
  • But how and why they work is still an open question
  • Cell connections must encode some kind of grammar
SLIDE 7

Why is Interpretability Important?

  • Guiding research
  • Social accountability
  • Intellectual value
SLIDE 8

My Method

  • Build on formal language theory
  • Prove what kinds of linguistic structure neural networks can model
SLIDE 9

Formal Language Theory

SLIDE 10

Formal Languages

  • Potentially infinite sets of valid sentences

english = {“I am Will.”, “I like AI!”, ..}
íslenska (Icelandic) = {“Ég heiti Will.” (“My name is Will.”), “Mér líkar við gervigreind!” (“I like AI!”), ..}
palindromes = {“aa”, “aba”, “abba”, ..}

SLIDE 11

Automata

  • Grammar/Automaton: a computational device that decides whether a sentence is in a language (says yes/no); a minimal sketch follows below

[Diagram: sentence x → Automaton → “Yes, if x is valid” / “No, if x is not valid”]
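To make this concrete, here is a minimal sketch (mine, not from the slides) of a finite automaton in Python for a simple regular language: strings over {a, b} with an even number of a's. The automaton's entire memory is which of its two states it is in.

```python
# Minimal sketch (not from the slides): a deterministic finite automaton
# deciding the regular language "strings over {a, b} with an even number
# of a's". Reading one symbol at a time, the automaton's only memory is
# its current state.

TRANSITIONS = {
    ("even", "a"): "odd",
    ("odd", "a"): "even",
    ("even", "b"): "even",
    ("odd", "b"): "odd",
}

def accepts(sentence: str) -> bool:
    """Return True iff `sentence` (over alphabet {a, b}) is in the language."""
    state = "even"  # start state, also the only accepting state
    for symbol in sentence:
        state = TRANSITIONS[(state, symbol)]
    return state == "even"

print(accepts("abba"))  # True: two a's
print(accepts("ab"))    # False: one a
```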

SLIDE 12

Types of Automata

  • More computationally complex automata can accept more languages

[Diagram: language classes in order of increasing complexity: regular languages, counter languages, context-free languages, Turing-acceptable languages]

SLIDE 13

Formal and Natural Languages

  • Formal languages and automata are well studied (since the 1930s)
  • Formal languages model structures in natural language

[Diagram relating Formal Languages, Natural Languages, and Neural Networks; Chomsky, Parikh, etc.; Turing, Church, Fischer, etc.]

SLIDE 14

Research Questions

1. What kinds of formal languages can neural networks accept?
2. How do these languages relate to formal models of natural language?

SLIDE 15

My Contributions

1. Definitions

a. Language acceptance for neural networks
b. Measure of a network’s memory

2. Results

a. SRNs
b. LSTMs
c. Attention
d. CNNs

3. Experiments

SLIDE 16

Definitions

SLIDE 17

Asymptotic Acceptance

  • Networks output a probability (not yes/no)
  • We need a way to make the network say yes or no; one formalization is sketched below
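One way to formalize this, a sketch in the spirit of the thesis's asymptotic acceptance, with notation of my own choosing: scale the network's weights toward infinity so that its squashing activations saturate, and require the output to converge to the indicator of the language L.

```latex
% Sketch of asymptotic acceptance (notation mine).
% f_{N\theta}(x): output of the network on string x with weights \theta
% scaled by N; scaling pushes sigmoid/tanh activations to their limits.
\lim_{N \to \infty} f_{N\theta}(x) =
  \begin{cases}
    1 & \text{if } x \in L \\
    0 & \text{otherwise}
  \end{cases}
```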
SLIDE 18

State Complexity

  • A measure of the network’s memory (as a function of sentence length)
  • How many states can the network be in after reading n words?

memory = log₂(state complexity)
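Concretely (notation mine): if h(x) is the network's hidden configuration after reading x, state complexity counts the distinct configurations reachable over all inputs of length n.

```latex
% Sketch of state complexity (notation mine).
% h(x): hidden configuration after reading string x over alphabet \Sigma.
\mathrm{states}(n) = \bigl|\{\, h(x) : x \in \Sigma^{n} \,\}\bigr|,
\qquad
\mathrm{memory}(n) = \log_{2} \mathrm{states}(n)
```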

SLIDE 19

Theoretical Results

SLIDE 20

Simple Recurrent Networks (SRNs)

  • Simplest architecture for recurrent neural networks
  • Turing-complete under an unconstrained definition of acceptance

(Siegelmann, 1995)

[Figure: an SRN reading the sequence “Hello world !”]

SLIDE 21

SRNs as Automata

  • Thm 2.1.2: SRNs accept exactly the regular languages
  • State complexity: O(1) (constant); a saturated-SRN sketch follows below
  • This reduced characterization is more accurate than Siegelmann’s (1995)
  • A similar result holds for gated recurrent units (GRUs)

[Diagram: regular languages (RL) as a subset of Turing-acceptable languages (TL)]
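For intuition about why saturation gives finite-state behavior, here is a hypothetical NumPy sketch (the weights and the language are mine, for illustration): with a large weight scale, the single sigmoid hidden unit saturates to 0 or 1, so the SRN ranges over finitely many states and implements the two-state DFA for “contains at least one a”.

```python
import numpy as np

# Hypothetical hand-set SRN (weights mine, for illustration): with a large
# weight scale N, the sigmoid hidden unit saturates to 0 or 1, so the
# network has finitely many reachable states -- a 2-state DFA for the
# regular language "strings over {a, b} containing at least one a".

N = 50.0  # large scale drives sigmoid outputs to (nearly) 0 or 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

VOCAB = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}  # one-hot

w_h = 1.0                  # recurrent weight on the hidden unit
u = np.array([1.0, 0.0])   # input weights: only 'a' excites the unit
b = -0.5                   # bias: threshold for the saturated OR

def srn_accepts(sentence: str) -> bool:
    h = 0.0  # initial state: no 'a' seen yet
    for ch in sentence:
        # In the saturated limit this computes h' = h OR (ch == 'a').
        h = sigmoid(N * (w_h * h + u @ VOCAB[ch] + b))
    return h > 0.5  # accept iff the unit is on

print(srn_accepts("bbab"))  # True: contains an 'a'
print(srn_accepts("bbbb"))  # False: no 'a'
```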

SLIDE 22

Long Short-Term Memory Networks (LSTMs)

  • More complicated recurrent neural network
  • Used for machine translation and other tasks requiring syntax
SLIDE 23

LSTMs as Automata

  • Thm 2.2.2: LSTMs accept a subclass of the counter languages
  • State complexity: O(n^k) (polynomial in sentence length)

  • More powerful than other recurrent networks
  • But not powerful enough to model complex tree structure (see the counter sketch below)
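For intuition, here is a minimal sketch (mine, not from the thesis) of the counter computation the theorem points to: a single counter updated by ±1 per step, the way an LSTM can update a cell unit, suffices to accept a^n b^n.

```python
# Minimal sketch (mine, not from the thesis): a one-counter automaton
# accepting { a^n b^n : n >= 1 }. An LSTM can realize this kind of
# computation by using a cell unit as the counter, adding +1 or -1 per
# step (cf. Weiss et al., 2018).

def counter_accepts(sentence: str) -> bool:
    count = 0
    seen_b = False
    for ch in sentence:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is invalid
                return False
            count += 1          # increment on 'a'
        elif ch == "b":
            seen_b = True
            count -= 1          # decrement on 'b'
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False        # symbol outside the alphabet
    return seen_b and count == 0

print(counter_accepts("aaabbb"))  # True
print(counter_accepts("aabbb"))   # False
```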
SLIDE 24

Attention

  • Modern machine translation uses attention
  • Focus on specific input words at different steps

[Diagram: LSTM encoder-decoder with attention translating the Spanish “Hace mucho frío aquí” into “It’s very cold here”]

SLIDE 25

Attention Results

  • State complexity: 2^O(n) (exponential in sentence length)

  • Additional memory allows:

○ Copying a sequence (primitive translation)
○ More complex hierarchical representations

  • Supports the claim that “attention is all you need” (Vaswani et al., 2017); the counting argument is sketched below
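The counting argument behind the copying claim, sketched with notation of my own: to reproduce an arbitrary length-n input, the network's configuration after reading the input must distinguish all |Σ|^n possible inputs.

```latex
% Sketch (notation mine): copying forces exponential state complexity.
\mathrm{states}(n) \;\ge\; |\Sigma|^{n}
  \;=\; 2^{\,n \log_{2} |\Sigma|}
  \;=\; 2^{O(n)}
```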
SLIDE 26

Convolutional Neural Networks (CNNs)

  • CNNs model words at the character level
  • Deal with phonology, morphology

○ pain versus pains

SLIDE 27

CNNs as Automata

  • Thm 3.1.1: CNNs accept the strictly local languages
  • Explains success of character-level CNNs
  • Strictly local languages* are a good model of phonological grammar (Heinz et al., 2011); a sliding-window sketch follows below

*Tier-based strictly local languages
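A strictly local language is decided by checking every fixed-width window of the string (with boundary markers) against an allowed set, which is the same bounded-window test a character-level convolution filter computes. A hypothetical sketch (mine):

```python
# Sketch (mine, not from the thesis): a strictly 2-local acceptor.
# A string is accepted iff every adjacent pair of symbols, including the
# boundary marker '#', is in an allowed set -- the same bounded-window
# test a character-level CNN filter can compute.

def strictly_local_accepts(word: str, allowed: set) -> bool:
    padded = "#" + word + "#"  # mark the word boundaries
    return all(
        (padded[i], padded[i + 1]) in allowed
        for i in range(len(padded) - 1)
    )

# Toy phonotactics over {a, b}: 'b' may never double.
ALLOWED = {("#", "a"), ("#", "b"), ("a", "a"), ("a", "b"),
           ("b", "a"), ("a", "#"), ("b", "#")}

print(strictly_local_accepts("abab", ALLOWED))  # True
print(strictly_local_accepts("abba", ALLOWED))  # False: contains "bb"
```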

SLIDE 28

State Complexity Hierarchy

CNN, SRN, GRU: O(1)
LSTM: O(n^k)
Attention: 2^O(n)

SLIDE 29

Experiments

SLIDE 30

LSTMs as Counter Automata

  • Prediction: LSTMs are equivalent to counter machines
  • LSTMs use their memory to “count” (Weiss et al., 2018)

[Plot: LSTM cell values vs. position in sequence on a^n b^n]

SLIDE 31

Memory Constraints of LSTMs

  • Prediction: LSTMs don’t have enough memory to reverse sentences
  • Indeed, LSTMs cannot reverse long sentences!
SLIDE 32

Validating State Complexity

  • Counting requires O(n^k) complexity
  • Copying requires 2^O(n) complexity

CNN, SRN, GRU: O(1)
LSTM: O(n^k) (can count!)
Attention: 2^O(n) (can copy!)

SLIDE 33

Summary

  • Theoretical tools

○ Language acceptance
○ Formalizing memory

  • Results about types of networks
  • Experiments
SLIDE 34

Conclusion

  • Step towards understanding the “black box” of neural networks

○ Extendable to other architectures

  • Related neural networks to mental grammar

○ LSTM can’t do complex trees
○ CNN can do phonology

SLIDE 35

Acknowledgements

  • My advisors, Bob and Dana
  • Computational Linguistics at Yale:

○ Yiding, David, Noah, Andrew, Annie, Yong, Simon, Aarohi, Yi Chern, Sarah, Rachel

  • Linguistics Senior Seminar:

○ Anelisa, James, Jay, Jisu, Magda, Noah, Rose, Hadas, Raffaella

  • Advanced Natural Language Processing Seminar:

○ Michi, Suyi, Davey, John, Yavuz, Gaurav, Tianwei, Tomoe, Rui, Danny, Angus, Brian, Yong, Garrett, Noah, Alex, Talley, Ishita, Bo, Jack, Tao, Yi Chern, Irene, Drago

  • Vidur Joshi and others @ Allen Institute for Artificial Intelligence