
Sentence Processing in a Vectorial Model of Working Memory
William Schuler
Department of Linguistics, The Ohio State University
June 29, 2014

Introduction
I’m envious of my computational cognitive neuroscience colleagues; they define...
◮ associative memory in terms of neural activation (vector product model) [Marr, 1971, Anderson et al., 1977, Murdock, 1982, McClelland et al., 1995, Howard and Kahana, 2002]:
  ◮ one (possibly superposed) activation-based state: cortex as vector
  ◮ a set of weight-based cued associations: hippocampus as matrix
◮ neural activation in terms of ligands, receptors, chemistry, physics.
I’d like to define parsing in terms of (vectorial) associative memory models!
But existing sentence processing models either don’t do parsing or don’t connect to vector memory:
◮ connectionist models don’t explain why syntactic probability is so predictive (subjacency, gap propagation to modifiers, ...) [Fossum and Levy, 2012, van Schijndel et al., 2013b, van Schijndel et al., 2014].
◮ ACT-R is a good candidate, but it is serial (ditto garden path, construal, race). Vector state can easily be superposed, so why not in sentence processing?
◮ fully parallel surprisal accounts don’t explain center embedding effects. Superposing distinct analyses requires huge tensors, and then all analyses are available.

Introduction
So I’ll build a model based on our earlier symbolic parallel model:
◮ builds ‘incomplete categories’ in a left-corner parse [Schuler et al., 2010]:
  ◮ top-down for right children, to build the ‘awaited’ category: S/VP V → S/NP
  ◮ bottom-up for left children, to build the ‘active’ category: NP/N N → S/VP
◮ unlike earlier work, syntactic category states are superposed in a vector:
  ◮ constraints on ‘awaited’ categories are multiplied in at right children
  ◮ constraints on ‘active’ categories are reconstructed at left children
Results:
◮ seems to work; theoretically justifies the parallel left-corner parsing model
◮ predicts processing difficulty in center embedding:
  ◮ a result of noise in reconstruction after multiplied-in constraints
(Warning: these are ‘existence proof’ results, not a state-of-the-art parser.)

Previous Work: Left-corner Parsing
In a left-corner parse [van Schijndel et al., 2013a], either do a fork or don’t.
[Figure: tree fragments for the no-fork (–F) and fork (+F) cases, showing nodes a, b, a′ and the word x_t.]
Build a complete category (triangle):
$\dfrac{a/b \quad x_t}{a} \quad b \to x_t$  (–F)
$\dfrac{a/b \quad x_t}{a/b \quad a'} \quad b \xrightarrow{+} a' \ldots ;\ a' \to x_t$  (+F)

Previous Work: Left-corner Parsing
Then, either do a join or don’t (incrementally build top-down or bottom-up).
[Figure: tree fragments for the join (+J) and no-join (–J) cases, showing nodes a, b, a′, a′′ and b′′.]
Build an incomplete category (trapezoid) out of a complete category (triangle):
$\dfrac{a/b \quad a''}{a/b''} \quad b \to a''\ b''$  (+J)
$\dfrac{a/b \quad a''}{a/b \quad a'/b''} \quad b \xrightarrow{+} a' \ldots ;\ a' \to a''\ b''$  (–J)

Previous Work: Vectorial Memory
Model connections in associative memory with a matrix [Anderson et al., 1977]:
$v = M u$  (1)
$(M u)_{[i]} \overset{\text{def}}{=} \sum_{j=1}^{J} M_{[i,j]} \cdot u_{[j]}$  (1′)
Build cued associations using the outer product:
$M_t = M_{t-1} + v \otimes u$  (2)
$(v \otimes u)_{[i,j]} \overset{\text{def}}{=} v_{[i]} \cdot u_{[j]}$  (2′)
Combine cued associations using the pointwise / diagonal product:
$w = \mathrm{diag}(u)\, v$  (3)
$(\mathrm{diag}(u)\, v)_{[i]} \overset{\text{def}}{=} u_{[i]} \cdot v_{[i]}$  (3′)
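
The three operations on this slide can be sketched in a few lines of plain Python. This is only an illustration: the one-hot cue and target vectors, and all function names, are assumptions made for the example and do not come from the talk.

```python
def matvec(M, u):
    """Cued retrieval, eq. (1): v[i] = sum_j M[i][j] * u[j]."""
    return [sum(m_ij * u_j for m_ij, u_j in zip(row, u)) for row in M]

def outer(v, u):
    """Outer product, eq. (2'): (v (x) u)[i][j] = v[i] * u[j]."""
    return [[v_i * u_j for u_j in u] for v_i in v]

def add(M, N):
    """Elementwise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

def pointwise(u, v):
    """Pointwise / diagonal product, eq. (3): w[i] = u[i] * v[i]."""
    return [a * b for a, b in zip(u, v)]

# Hypothetical one-hot cues and targets (illustrative values only).
cue_a, cue_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
tgt_a, tgt_b = [0.0, 1.0], [1.0, 0.0]

# Store two cued associations, eq. (2): M_t = M_{t-1} + v (x) u.
M = [[0.0] * 3 for _ in range(2)]
M = add(M, outer(tgt_a, cue_a))
M = add(M, outer(tgt_b, cue_b))

retrieved = matvec(M, cue_a)              # a single cue retrieves its target
superposed = matvec(M, [1.0, 1.0, 0.0])   # a superposed cue retrieves both targets
filtered = pointwise(superposed, tgt_a)   # pointwise product keeps shared components
```

Because the cues here are orthonormal, retrieval is exact. With the random, non-orthogonal vectors the model actually uses, retrieval would be noisy.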

Vectorial Parser
We can implement the two left-corner parser phases using these operations. Here’s what we need:
Permanent ‘procedural’ associations (separate matrices, for simplicity):
◮ associative store for preterminal category given observation:
  $P = \sum_i p_i \otimes x_i$
◮ associative store for grammar rule given parent / left child / right child:
  $G = \sum_i g_i \otimes c_i$ ;  $G' = \sum_i g_i \otimes c'_i$ ;  $G'' = \sum_i g_i \otimes c''_i$
◮ associative store for left descendant category given ancestor category:
  $D'_0 \leftarrow \mathrm{diag}(\mathbf{1})$ ;  $D_0 \leftarrow \mathrm{diag}(\mathbf{0})$ ;  $D'_k \leftarrow G'^{\top} G\, D'_{k-1}$ ;  $D_k \leftarrow D_{k-1} + D'_k$
◮ associative store for right descendant category given ancestor category:
  $E'_0 \leftarrow \mathrm{diag}(\mathbf{1})$ ;  $E_0 \leftarrow \mathrm{diag}(\mathbf{0})$ ;  $E'_k \leftarrow G''^{\top} G\, E'_{k-1}$ ;  $E_k \leftarrow E_{k-1} + E'_k$
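
As a rough illustration of how the left-descendant store D accumulates, the sketch below builds G and G′ from one-hot vectors and iterates the update D′_k ← G′ᵀ G D′_{k−1}, D_k ← D_{k−1} + D′_k. The toy grammar, the one-hot encoding, the iteration bound K and all function names are assumptions made for this example. The talk's model uses random vectors, and its own grammar is not binary.

```python
# Hypothetical toy grammar (not the grammar from the talk): (parent, left child, right child).
CATS = ["S", "NP", "VP", "D", "N", "V"]
RULES = [("S", "NP", "VP"), ("NP", "D", "N"), ("VP", "V", "NP")]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def outer(v, u):
    return [[v_i * u_j for u_j in u] for v_i in v]

def add(M, N):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def store(position):
    """Sum over rules i of g_i (x) c_i, where c_i is the category at `position` in rule i."""
    M = zeros(len(RULES), len(CATS))
    for i, rule in enumerate(RULES):
        g = one_hot(i, len(RULES))
        c = one_hot(CATS.index(rule[position]), len(CATS))
        M = add(M, outer(g, c))
    return M

G = store(0)        # rule given parent category
G_left = store(1)   # rule given left-child category (G' on the slide)

step = matmul(transpose(G_left), G)          # one-step left child given parent
n = len(CATS)
D_prime = [one_hot(i, n) for i in range(n)]  # D'_0 = diag(1)
D = zeros(n, n)                              # D_0 = diag(0)
K = n                                        # assumed bound on the number of iterations
for _ in range(K):
    D_prime = matmul(step, D_prime)          # D'_k = G'^T G D'_{k-1}
    D = add(D, D_prime)                      # D_k = D_{k-1} + D'_k

def left_descendants(cat):
    """Categories with nonzero activation when D is cued with the one-hot vector for `cat`."""
    column = [row[CATS.index(cat)] for row in D]
    return {CATS[i] for i, x in enumerate(column) if x > 0}
```

In this toy grammar, cueing D with S activates NP (one step down the left edge) and D (two steps), which is the sense in which D maps an ancestor to its left descendants. The right-descendant store E could be built the same way from `store(2)`.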

Vectorial Parser
We’ll also need:
Temporary state vector ‘working memory’:
◮ lowest awaited node: b (can be superposed, of course)
◮ observations: x (word token)
Temporary associations (separate matrices, for simplicity):
◮ associative store for ‘active’ node above ‘awaited’ node: A
◮ associative store for ‘awaited’ node above ‘active’ node: B
◮ associative store for category type of node: C

Vectorial Parser - ‘fork’ phase
[Figure: no-fork (–F) case with a_{t−1} (= a′′_t) above b_{t−1} above x_t; fork (+F) case with a_{t−1} above b_{t−1}, linked by B to a′_{t−.5} (= a′′_t) above x_t.]
$c^-_t = \mathrm{diag}(P x_t)\, C_{t-1}\, b_{t-1}$  (no-fork preterminal category combines x, b)
$c^+_t = \mathrm{diag}(P x_t)\, D\, C_{t-1}\, b_{t-1}$  (forked preterminal category goes through D)
$a_{t-.5},\ a'_{t-.5} \sim \mathrm{Exp}$, in $\mathbb{R}^{20}$  (100 of 10 [...] −150 to be sparse, avoid over-/underflow)
$a_{t-1} = A_{t-1}\, b_{t-1}$  (define a)
$B_{t-.5} = B_{t-1} + b_{t-1} \otimes a'_{t-.5} + B_{t-1}\, a_{t-1} \otimes a_{t-.5}$  (update B for new nodes)
$C_{t-.5} = C_{t-1} + c^+_t \otimes a'_{t-.5} + \mathrm{diag}(C_{t-1}\, a_{t-1})\, E^{\top} c^-_t \otimes a_{t-.5}$  (reconstruct via E)

Vectorial Parser - ‘join’ phase
[Figure: join (+J) case with a_{t−.5} linked by A to b_{t−.5}, above a′′_t and b′′_t; no-join (–J) case with a_{t−.5} above b_{t−.5}, linked by B to a′_t, which is linked by A to b′′_t, above a′′_t and b′′_t.]
$g^+_t = \mathrm{diag}(G'\, C_{t-.5}\, a''_t)\, G\, C_{t-.5}\, b_{t-.5}$  (join rule combines categories of a′′, b)
$g^-_t = \mathrm{diag}(G'\, C_{t-.5}\, a''_t)\, G\, D\, C_{t-.5}\, b_{t-.5}$  (no-join rule goes through D)
$a'_t,\ b''_t \sim \mathrm{Exp}$, in $\mathbb{R}^{20}$  (100 of 10 [...] −150 to be sparse, avoid over-/underflow)
$A_t = A_{t-1} + \dfrac{A_{t-1}\, b_{t-.5}\, \|g^+_t\| + a'_t\, \|g^-_t\|}{\big\| A_{t-1}\, b_{t-.5}\, \|g^+_t\| + a'_t\, \|g^-_t\| \big\|} \otimes b''_t$  (update A with weighted average)
$B_t = B_{t-.5} + b_{t-.5} \otimes a'_t$  (define B for a′)
$C_t = C_{t-.5} + G^{\top} g^-_t \otimes a'_t + \dfrac{G''^{\top} g^+_t + G''^{\top} g^-_t}{\| G''^{\top} g^+_t + G''^{\top} g^-_t \|} \otimes b''_t$  (update C with weighted average)

Vectorial Grammar
Parser accepts PCFGs (note that this grammar can be center-embedded):
P(T → S T) = 1.0
P(S → NP VP) = 0.5
P(S → IF S THEN S) = 0.25
P(S → EITHER S OR S) = 0.25
P(IF → if) = 1.0
P(THEN → then) = 1.0
P(EITHER → either) = 1.0
P(OR → or) = 1.0
P(NP → kim) = 0.5
P(NP → pat) = 0.5
P(VP → leaves) = 0.5
P(VP → stays) = 0.5
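
To make the grammar concrete, the sketch below encodes the rules as a Python dictionary, checks that the probabilities for each left-hand side sum to one, and scores the S subtree of a right-branching and a center-embedded sentence. The tree encoding and the helper names (`tree_prob`, `clause`, `if_then`, `either_or`) are assumptions made for this example.

```python
from collections import defaultdict

# Rule probabilities copied from the slide: (parent, right-hand side) -> probability.
RULES = {
    ("T", ("S", "T")): 1.0,
    ("S", ("NP", "VP")): 0.5,
    ("S", ("IF", "S", "THEN", "S")): 0.25,
    ("S", ("EITHER", "S", "OR", "S")): 0.25,
    ("IF", ("if",)): 1.0,
    ("THEN", ("then",)): 1.0,
    ("EITHER", ("either",)): 1.0,
    ("OR", ("or",)): 1.0,
    ("NP", ("kim",)): 0.5,
    ("NP", ("pat",)): 0.5,
    ("VP", ("leaves",)): 0.5,
    ("VP", ("stays",)): 0.5,
}

# Total probability mass per left-hand side; each should be 1.0 in a proper PCFG.
totals = defaultdict(float)
for (parent, _rhs), prob in RULES.items():
    totals[parent] += prob

def tree_prob(tree):
    """Probability of a tree given as (label, child, ...), where leaves are word strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULES[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob

# Hypothetical helpers for building trees.
def clause(subject, verb):
    return ("S", ("NP", subject), ("VP", verb))

def if_then(condition, consequence):
    return ("S", ("IF", "if"), condition, ("THEN", "then"), consequence)

def either_or(first, second):
    return ("S", ("EITHER", "either"), first, ("OR", "or"), second)

right_branching = if_then(clause("kim", "stays"),
                          if_then(clause("kim", "leaves"), clause("pat", "leaves")))
center_embedded = if_then(either_or(clause("kim", "stays"), clause("kim", "leaves")),
                          clause("pat", "leaves"))

p_right = tree_prob(right_branching)
p_center = tree_prob(center_embedded)
```

Under these rule probabilities both S subtrees come out at 2⁻¹³, so the grammar itself assigns the two sentence types the same probability.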

Predictions
This parser can process short sentences using a simple associative store (meaning it usually predicts a top-level category at the correct position):
condition         sentence                                              correct   incorrect
right-branching   If Kim stays then if Kim leaves then Pat leaves.      297       203
center-embedded   If either Kim stays or Kim leaves then Pat leaves.    231*      269
And it also predicts difficulty at center-embedded constructions (* p < .001)!
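
The slide does not say which significance test produced the reported p-value. As a plausibility check only, the sketch below runs a pooled two-proportion z-test on the counts from the table, taking 500 trials per condition (correct plus incorrect). The choice of test and the function name are assumptions made for this example.

```python
import math

def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value under a normal approximation)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Counts from the table: right-branching 297/500 correct, center-embedded 231/500 correct.
z, p_value = two_proportion_z(297, 500, 231, 500)
```

Under this assumed test the accuracy difference (59.4% versus 46.2%) gives a z statistic a little above 4, which is consistent with the reported p < .001.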

Predictions
Why is center embedding difficult for this model?
◮ traversal to a right child multiplies in constraints on b, which eliminates hypotheses.
  e.g. if b is S or NP (say after ‘know’), then after the word ‘the’, b′′ must be N.
  [Figure: +J configuration with a linked by A to b, above a′′ and b′′.]
◮ traversal from a left child reconstructs constraints on a using b′′, but this is lossy.
  e.g. if a was S or NP, then after ‘the dog’: b′′ is N, and the reconstructed a is S or NP.
◮ longer right traversals mean more constraints are ignored, and so more distortion.

Scalability
Flaw: why is accuracy on both types of sentences so low?
◮ vectors are short
◮ vectors are only positive
◮ reconstruction is not done as cleverly as possible
◮ outer products could be added using Howard-Kahana norming
◮ ...
Maybe someday this could be broad-coverage, but we don’t need that today.

Conclusion
This talk defined parsing in terms of (vectorial) associative memory models [Marr, 1971, Anderson et al., 1977, Murdock, 1982, McClelland et al., 1995, Howard and Kahana, 2002]:
◮ one (possibly superposed) activation-based state: cortex as vector
◮ a set of weight-based cued associations: hippocampus as matrix
Model provides algorithmic-level justification for parallel left-corner parsing.
Model provides algorithmic-level justification for the PCFG model.
Model rightly predicts that center-embedded sentences are harder to parse.
Model provides an explanatory account of center embedding difficulty:
◮ it is due to the need to reconstruct the active category after constraints on the awaited category.
Thank you!
