

SLIDE 1

Compositionality in Recursive Neural Networks

Martha Lewis

ILLC, University of Amsterdam

SYCO3, March 2019, Oxford, UK

SLIDE 2

Outline

  • Compositional distributional semantics
  • Pregroup grammars and how to map them to vector spaces
  • Recursive neural networks (TreeRNNs)
  • Mapping pregroup grammars to TreeRNNs
  • Implications

SLIDE 3

Compositional Distributional Semantics

Frege’s principle of compositionality: the meaning of a complex expression is determined by the meanings of its parts and the rules used for combining them.

SLIDE 4

Compositional Distributional Semantics

Frege’s principle of compositionality: the meaning of a complex expression is determined by the meanings of its parts and the rules used for combining them.

Distributional hypothesis: words that occur in similar contexts have similar meanings [Harris, 1954].

SLIDE 5

Symbolic Structure

A pregroup algebra is a partially ordered monoid in which each element p has a left adjoint pˡ and a right adjoint pʳ such that:

p · pʳ ≤ 1 ≤ pʳ · p    and    pˡ · p ≤ 1 ≤ p · pˡ

  • Elements of the pregroup are basic (atomic) grammatical types, e.g. B = {n, s}.
  • Atomic grammatical types can be combined to form types of higher order (e.g. n · nˡ or nʳ · s · nˡ).
  • A sentence w₁w₂…wₙ (with word wᵢ of type tᵢ) is grammatical whenever t₁ · t₂ · … · tₙ ≤ s.

SLIDE 6

Pregroup derivation: example

Recall: p · pʳ ≤ 1 ≤ pʳ · p and pˡ · p ≤ 1 ≤ p · pˡ.

Parse: [S [NP [Adj trembling] [N shadows]] [VP [V play] [N hide-and-seek]]]

Type assignment: trembling : n · nˡ,  shadows : n,  play : nʳ · s · nˡ,  hide-and-seek : n

n · nˡ · n · nʳ · s · nˡ · n ≤ n · 1 · nʳ · s · 1 = n · nʳ · s ≤ 1 · s = s

(A mechanical check of this reduction is sketched below.)
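The check can be done in a few lines of code. This is a minimal illustrative sketch, not from the talk: each atomic type is a pair (base, k), where k counts adjoints (0 for n, +1 for nʳ, −1 for nˡ), and adjacent pairs are cancelled greedily.

```python
# Minimal sketch (not from the talk): checking a pregroup reduction.
# An atomic type is (base, k): k = 0 for n, k = 1 for n^r, k = -1 for n^l.
# Both axioms p . p^r <= 1 and p^l . p <= 1 cancel an adjacent pair whose
# adjoint counts are (k, k+1) for the same base.

def reduces_to(types, target):
    """Greedy stack-based cancellation; sufficient for this example,
    but not a complete pregroup parser in general."""
    stack = []
    for base, k in types:
        if stack and stack[-1] == (base, k - 1):
            stack.pop()          # cancel p . p^r or p^l . p
        else:
            stack.append((base, k))
    return stack == [target]

# trembling : n.n^l, shadows : n, play : n^r.s.n^l, hide-and-seek : n
sentence = [("n", 0), ("n", -1), ("n", 0),
            ("n", 1), ("s", 0), ("n", -1), ("n", 0)]
print(reduces_to(sentence, ("s", 0)))   # True: the string reduces to s
```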

SLIDE 7

Distributional Semantics

Words are represented as vectors. Entries of the vector record how often the target word co-occurs with each context word:

            cuddly  smelly  scaly  teeth  cute
  iguana       1      10      15     7     2

[Figure: "iguana" and "Wilbur" plotted as vectors along the scaly, cuddly, and smelly axes.]

Similarity is given by the cosine of the angle between vectors:

sim(v, w) = cos(θ_{v,w}) = ⟨v, w⟩ / (‖v‖ ‖w‖)

(A quick computation of this follows.)
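As a quick illustration: the counts for "iguana" are from the table above; the counts for "Wilbur" are made up purely for the example.

```python
import numpy as np

# context words:    cuddly smelly scaly teeth cute
iguana = np.array([1, 10, 15, 7, 2])
wilbur = np.array([9, 1, 0, 2, 10])   # hypothetical counts, for illustration only

def sim(v, w):
    """Cosine similarity: <v, w> / (||v|| ||w||)."""
    return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

print(f"sim(iguana, wilbur) = {sim(iguana, wilbur):.2f}")
```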

SLIDE 8

The role of compositionality

Compositional distributional models: we can produce a sentence vector by composing the vectors of the words in that sentence,

s = f(w₁, w₂, …, wₙ)

where s and the wᵢ are vectors. Three generic classes of CDMs:

  • Vector mixture models [Mitchell and Lapata (2010)] (a minimal sketch follows below)
  • Tensor-based models [Coecke, Sadrzadeh, Clark (2010); Baroni and Zamparelli (2010)]
  • Neural models [Socher et al. (2012); Kalchbrenner et al. (2014)]
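The first class is the simplest. A minimal sketch of the two standard vector mixture operations, addition and elementwise multiplication, following Mitchell and Lapata:

```python
import numpy as np

def additive(*word_vectors):
    """Additive vector mixture: s = w1 + w2 + ... + wn."""
    return np.sum(word_vectors, axis=0)

def multiplicative(*word_vectors):
    """Multiplicative vector mixture: elementwise product of the word vectors."""
    return np.prod(word_vectors, axis=0)

# Note: both operations are commutative, so word order and
# grammatical structure play no role in this class of models.
```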

SLIDE 9

A multi-linear model

The grammatical type of a word defines the vector space in which the word lives:

  • Nouns are vectors in N.
  • Adjectives are linear maps N → N, i.e. elements of N ⊗ N.
  • Intransitive verbs are linear maps N → S, i.e. elements of N ⊗ S.
  • Transitive verbs are bilinear maps N ⊗ N → S, i.e. elements of N ⊗ S ⊗ N.

The composition operation is tensor contraction, i.e. elimination of matching dimensions by application of the inner product. [Coecke, Sadrzadeh, Clark (2010)] (A code sketch follows.)
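A sketch of this model in code, using the running example; dimensions and values are made up, and np.einsum performs the tensor contraction:

```python
import numpy as np

N_DIM, S_DIM = 4, 3                          # toy dimensions for N and S
shadows = np.random.rand(N_DIM)              # noun: vector in N
trembling = np.random.rand(N_DIM, N_DIM)     # adjective: element of N (x) N
play = np.random.rand(N_DIM, S_DIM, N_DIM)   # transitive verb: N (x) S (x) N
hide_and_seek = np.random.rand(N_DIM)        # noun: vector in N

# Adjective applied to noun: contract the matching N dimension.
trembling_shadows = np.einsum("ij,j->i", trembling, shadows)

# Sentence vector in S: contract the verb with subject and object.
sentence = np.einsum("i,isj,j->s", trembling_shadows, play, hide_and_seek)
```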

SLIDE 10

Diagrammatic calculus: Summary

[Diagram: the graphical calculus, with morphisms drawn as boxes on wires typed by A, B, V, W, Z; tensors as boxes with multiple wires; the ε-maps drawn as cups and the η-maps as caps.]

The cups and caps satisfy the snake equation:

(εʳ_A ⊗ 1_A) ∘ (1_A ⊗ ηʳ_A) = 1_A
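A concrete instance, standard in FdVect though not spelled out on the slide: with a basis {eᵢ} of A, take ηʳ_A : 1 ↦ Σᵢ eᵢ ⊗ eᵢ and εʳ_A : v ⊗ w ↦ ⟨v, w⟩. The snake equation is then just the basis expansion of the identity:

```latex
(1_A \otimes \eta^r_A)(v) = \sum_i v \otimes e_i \otimes e_i, \qquad
(\epsilon^r_A \otimes 1_A)\Big(\sum_i v \otimes e_i \otimes e_i\Big)
  = \sum_i \langle v, e_i \rangle\, e_i = v .
```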

SLIDE 11

Diagrammatic calculus: example

[Diagram: the functor F sends the parse tree of "trembling shadows play hide-and-seek" (Adj N under NP, V N under VP, under S) to a string diagram over wires labelled N Nˡ N Nʳ S Nˡ N, with cups contracting the matching wires.]

The sentence meaning is obtained by applying F(α), for α the pregroup reduction, to the tensor product of the word vectors:

F(α)(trembling ⊗ shadows ⊗ play ⊗ hide-and-seek)

i.e. the word vectors wᵢ are tensored together as ⊗ᵢ wᵢ and F(α) then contracts the matching dimensions.

SLIDE 12

Recursive Neural Networks

A TreeRNN composes word vectors pairwise up the parse tree; for "Clowns tell jokes" (writing each word for its vector):

p₁ = g(tell, jokes)    p₂ = g(Clowns, p₁)

Two standard choices of composition function:

g_RNN : ℝⁿ × ℝⁿ → ℝⁿ :: (v₁, v₂) ↦ f₁(M · [v₁; v₂])

g_RNTN : ℝⁿ × ℝⁿ → ℝⁿ :: (v₁, v₂) ↦ g_RNN(v₁, v₂) + f₂(v₁ᵀ · T · v₂)

where [v₁; v₂] is the concatenation of the two input vectors, M is a weight matrix, T is an order-3 tensor, and f₁, f₂ are nonlinearities. (A sketch in code follows.)
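A minimal sketch of these two functions; the weights are random and f₁, f₂ are taken to be tanh, which is a typical but here assumed choice:

```python
import numpy as np

n = 4
M = np.random.rand(n, 2 * n)    # TreeRNN weight matrix
T = np.random.rand(n, n, n)     # RNTN tensor: one n x n bilinear form per output unit

def g_rnn(v1, v2):
    """g_RNN(v1, v2) = f1(M . [v1; v2]), with f1 = tanh (assumed)."""
    return np.tanh(M @ np.concatenate([v1, v2]))

def g_rntn(v1, v2):
    """g_RNTN(v1, v2) = g_RNN(v1, v2) + f2(v1^T . T . v2), with f2 = tanh (assumed)."""
    return g_rnn(v1, v2) + np.tanh(np.einsum("i,kij,j->k", v1, T, v2))

clowns, tell, jokes = (np.random.rand(n) for _ in range(3))
p1 = g_rnn(tell, jokes)     # p1 = g(tell, jokes)
p2 = g_rnn(clowns, p1)      # p2 = g(Clowns, p1)
```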

SLIDE 13

How compositional is this?

  • Successful
  • Some element of grammatical structure
  • The compositionality function has to do everything
  • Does that help us understand what’s going on?

SLIDE 14

Information-routing words

[Tree diagram: "Clowns who tell jokes", with every word, including the relative pronoun "who", assigned its own vector.]

SLIDE 15

Information-routing words

[Tree diagram: "John introduces himself", with the reflexive pronoun "himself" assigned its own vector.]

SLIDE 16

Can we map pregroup grammar onto TreeRNNs?

[Tree diagram: "Clowns tell jokes" with a generic composition function g at each internal node, each word standing for its vector.]

p₁ = g(tell, jokes)    p₂ = g(Clowns, p₁)

SLIDE 17

Can we map pregroup grammar onto TreeRNNs?

[Tree diagram: "Clowns tell jokes", composed with the linear tensor function g_LinTen at each internal node.]

p₁ = g_LinTen(tell, jokes)    p₂ = g_LinTen(Clowns, p₁)

(One possible form of g_LinTen is sketched below.)
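Under one natural reading of g_LinTen, assumed here since the extracted slide does not spell it out: keep only the bilinear tensor term of g_RNTN and drop the nonlinearities, so that each composition step is a pure tensor contraction, as in the multi-linear model.

```python
import numpy as np

n = 4
T = np.random.rand(n, n, n)   # order-3 composition tensor

def g_lin_ten(v1, v2):
    """Assumed form: g_LinTen(v1, v2) = v1^T . T . v2, linear in each argument."""
    return np.einsum("i,kij,j->k", v1, T, v2)

clowns, tell, jokes = (np.random.rand(n) for _ in range(3))
p1 = g_lin_ten(tell, jokes)    # p1 = g_LinTen(tell, jokes)
p2 = g_lin_ten(clowns, p1)     # p2 = g_LinTen(Clowns, p1)
```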

SLIDE 18

Can we map pregroup grammar onto TreeRNNs?

[Tree diagram: as on the previous slide, "Clowns tell jokes" composed with g_LinTen at each internal node.]

SLIDE 19

Why?

  • Opens up more possibilities to use tools from formal semantics in computational linguistics.
  • We can immediately see possibilities for building alternative networks, perhaps with different compositionality functions for different parts of speech.
  • Decomposing the tensors for functional words into repeated applications of a compositionality function gives options for learning representations.

SLIDE 20

Why?

who : nʳ · n · sˡ · s

[Diagram: "dragons who breathe fire" = "dragons breathe fire": the tensor for "who" only routes information, so the clause reduces to the meaning of the plain sentence. The reduction is spelled out below.]
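Spelling out the pregroup reduction for this type, a routine check not shown on the slide: with dragons : n, who : nʳ·n·sˡ·s, breathe : nʳ·s·nˡ, fire : n,

```latex
n \cdot (n^r\, n\, s^l\, s) \cdot (n^r\, s\, n^l) \cdot n
  \;\leq\; n\, s^l\, s\, n^r\, s\, n^l\, n   % cancel n . n^r
  \;\leq\; n\, n^r\, s\, n^l\, n             % cancel s^l . s
  \;\leq\; s\, n^l\, n                        % cancel n . n^r
  \;\leq\; s                                  % cancel n^l . n
```

so the clause carries the sentence type s, matching the equality on the slide.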

SLIDE 21

Why?

himself : n · sʳ · nʳʳ · nʳ · s

[Diagram: "John loves himself": the subject John is routed into both argument wires of "loves"; "himself" contributes no vector of its own. The reduction is spelled out below.]
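The corresponding reduction for "John loves himself", again a routine check not shown on the slide: with John : n, loves : nʳ·s·nˡ,

```latex
n \cdot (n^r\, s\, n^l) \cdot (n\, s^r\, n^{rr}\, n^r\, s)
  \;\leq\; n\, n^r\, s\, s^r\, n^{rr}\, n^r\, s   % cancel n^l . n
  \;\leq\; n\, n^r\, n^{rr}\, n^r\, s             % cancel s . s^r
  \;\leq\; n\, n^r\, s                             % cancel n^r . n^{rr}
  \;\leq\; s                                       % cancel n . n^r
```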

SLIDE 22

Experiments?

Not yet. But there are a number of avenues for exploration:

  • Comparing the performance of this kind of model against standard categorical compositional distributional models.
  • Trying different compositionality functions for different word types.
  • Testing the performance of TreeRNNs with formally analyzed information-routing words.
  • Investigating the effects of switching between word types.
  • Investigating meanings of logical words and quantifiers.
  • Extending the analysis to other types of recurrent neural network, such as long short-term memory networks or gated recurrent units.

SLIDE 23

Summary

  • We have shown how to interpret a simplification of recursive neural networks within a formal semantics framework.
  • We can then analyze ‘information-routing’ words such as pronouns as specific functions rather than as vectors.
  • This also provides a simplification of tensor-based vector composition architectures, reducing the number of higher-order tensors to be learnt and making representations more flexible and reusable.
  • Plenty of work to do on both the experimental and the theoretical side!

SLIDE 24

Thanks!

NWO Veni grant ‘Metaphorical Meanings for Artificial Agents’

SLIDE 25

Category-Theoretic Background

The category of pregroups, Preg, and the category of finite-dimensional vector spaces, FdVect, are both compact closed. This means that they share a common structure:

  • Both have a tensor product ⊗ with a unit 1.
  • Both have adjoints Aʳ, Aˡ.
  • Both have special morphisms εʳ : A ⊗ Aʳ → 1, εˡ : Aˡ ⊗ A → 1 and ηʳ : 1 → Aʳ ⊗ A, ηˡ : 1 → A ⊗ Aˡ.
  • These morphisms interact via the snake equations (slide 10).

In Preg, the ε and η maps are the order relations p · pʳ ≤ 1 ≤ pʳ · p and pˡ · p ≤ 1 ≤ p · pˡ.

SLIDE 26

A functor from syntax to semantics

We define a functor F : Preg → FdVect such that:

  • F(p) = P for all p ∈ B
  • F(1) = ℝ
  • F(p · q) = F(p) ⊗ F(q)
  • F(pʳ) = F(pˡ) = F(p)
  • F(p ≤ q) = a linear map F(p) → F(q)
  • F(εʳ) = F(εˡ) = the inner product in FdVect
  • F(ηʳ) = F(ηˡ) = identity maps in FdVect

[Kartsaklis, Sadrzadeh, Pulman and Coecke, 2016]
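A worked instance of these clauses, not on the slide but following directly from them: the transitive-verb type nʳ · s · nˡ is sent to

```latex
F(n^r \cdot s \cdot n^l) = F(n^r) \otimes F(s) \otimes F(n^l) = N \otimes S \otimes N ,
```

recovering the earlier statement that transitive verbs live in N ⊗ S ⊗ N.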

SLIDE 27

References I
