

slide-1
SLIDE 1

Fodor & Pylyshyn 1988 Lake & Baroni 2018

Jacob Andreas / MIT 6.884 / Fall 2020

slide-2
SLIDE 2

Today

  • 1. F&P: Are there fundamental differences between symbolist / classical accounts of information processing and connectionist / neural ones?

  • 2. How much progress have neural models made towards addressing the concerns raised by F&P?

slide-3
SLIDE 3

The research program

F&P: “The architecture of the cognitive system consists of the set of basic operations, resources, functions, principles, etc (generally the sorts of properties that would be described in a “user’s manual” for that architecture if it were available on a computer), whose domain and range are the representational states of the organism. It follows that, if you want to make good the Connectionist theory as a theory of cognitive architecture, you have to show that the processes which operate on the representational states of an organism are those which are specified by a Connectionist architecture.”

slide-4
SLIDE 4

Historical context

Smolensky 1988: “Higher-level analyses [of] connectionist models reveal subtle relations to symbolic models. […] At the lower level, computation has the character of massively parallel satisfaction of soft numerical constraints; at the higher level, this can lead to competence characterizable by hard rules. Performance will typically deviate from this competence since behavior is achieved not by interpreting hard rules but by satisfying soft constraints.”

slide-5
SLIDE 5

Historical context

Rumelhart & McClelland 1985: “Children are typically said to pass through a three-phase acquisition process in which they first learn past tense by rote, then learn the past tense rule and overregularize, and then finally learn the exceptions to the rule. We show that the acquisition data can be accounted for in more detail by dispensing with the assumption that the child learns rules and substituting in its place a simple homogeneous learning procedure. We show how ‘rule-like’ behavior can emerge from the interactions among a network of units encoding the root form to past tense mapping.”

slide-6
SLIDE 6

The research program

F&P: “Not so fast! Specific aspects of human mental representations and information processing seem poorly captured by current connectionist models.”

slide-7
SLIDE 7

F&P's argument

  • 1a. Classical representations have combinatorial syntax & semantics; connectionist ones cannot.

slide-8
SLIDE 8

F&P's argument

  • 1b. Classical information processing operations are sensitive to structure; connectionist ones are not.
  • 1a. Classical representations have combinatorial syntax & semantics; connectionist ones cannot.

slide-9
SLIDE 9

F&P's argument

  • 2a. Human language (& thought?) are productive, which requires structure sensitivity and combinatoriality.

slide-10
SLIDE 10

F&P's argument

  • 2b. Ditto for systematicity rather than productivity.
  • 2a. Human language (& thought?) are productive, which requires structure sensitivity and combinatoriality.

slide-11
SLIDE 11

F&P's argument

∴ Connectionist models cannot model human language (/ thought). (But classical models probably can.)

slide-12
SLIDE 12

Discussion

slide-13
SLIDE 13

Combinatorial structure

slide-14
SLIDE 14

Sample task

The cat is on the mat.

[https://www.amazon.in/Feline-Yogi-Original-Yoga-Cat]

True

slide-15
SLIDE 15

Sample task

The fox is in a box.


False

slide-16
SLIDE 16

A classical implementation

The cat is on the mat.


[[The cat] [is [on the mat]]]

slide-17
SLIDE 17

The cat is on the mat.


cat(x), mat(y), on(x, y)

A classical implementation

[[The cat] [is [on the mat]]]

slide-18
SLIDE 18

The cat is on the mat.


cat(x)  mat(y)  red(y)  on(x, y)  …

cat(x), mat(y), on(x, y)

A classical implementation

[[The cat] [is [on the mat]]]

slide-19
SLIDE 19

The cat is on the mat.


True

cat(x), mat(y), on(x, y)

A classical implementation

[[The cat] [is [on the mat]]]

cat(x)  mat(y)  red(y)  on(x, y)
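The classical pipeline on these slides (parse → logical form → evaluation against a world) can be sketched in a few lines. The entity names and the toy world below are illustrative assumptions, not anything from F&P:

```python
# Toy "classical implementation": a logical form is a list of atoms,
# and a sentence is True iff some assignment of entities to variables
# satisfies every atom in the world model.
from itertools import product

WORLD = {("cat", ("c1",)), ("mat", ("m1",)), ("on", ("c1", "m1"))}

# Logical form for "The cat is on the mat."
LF = [("cat", ("x",)), ("mat", ("y",)), ("on", ("x", "y"))]

def evaluate(lf, world, entities=("c1", "m1")):
    """True iff some variable assignment satisfies every atom."""
    variables = sorted({v for _, args in lf for v in args})
    for values in product(entities, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if all((pred, tuple(binding[a] for a in args)) in world
               for pred, args in lf):
            return True
    return False

print(evaluate(LF, WORLD))  # True
```

Evaluating the logical form for “the mat is on the cat” (i.e. on(y, x)) against the same world returns False, which is the structure sensitivity the later slides rely on.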

slide-20
SLIDE 20

A connectionist implementation

The cat is on the mat.

slide-21
SLIDE 21

A connectionist implementation

The cat is on the mat.

slide-22
SLIDE 22

A connectionist implementation

The cat is on the mat.

slide-23
SLIDE 23

A connectionist implementation

The cat is on the mat.

slide-24
SLIDE 24

A connectionist implementation

The cat is on the mat.

on(cat, mat)
on(cat, mat)

slide-25
SLIDE 25

A modern neural implementation

The cat is on the mat.


True

slide-26
SLIDE 26

The cat is on the mat and the fox is in a box.


True

[[[The cat] [is [on the mat]]] [and [the fox [is [in a box]]]]]

cat(x)  mat(y)  red(y)  on(x, y)  …

cat(x), mat(y), box(z), on(x, y), …

A classical implementation

slide-27
SLIDE 27

A connectionist implementation

on(cat, mat)

The cat is on the mat and the fox is in a box.

in(fox, box) ???

slide-28
SLIDE 28

A connectionist implementation

on1(., cat)

The cat is on the mat and the fox is in a box.

on2(., mat) ???
slide-29
SLIDE 29

A modern neural implementation


The cat is on the mat and the fox is in a box. → True

slide-30
SLIDE 30

Classical representations contain their constituents

[[[The cat] [is [on the mat]]] [and [the fox [is [in a box]]]]]
[[The cat] [is [on the mat]]]

slide-31
SLIDE 31

Classical representations contain their constituents

[[[The cat] [is [on the mat]]] [and [the fox [is [in a box]]]]]
[[The cat] [is [on the mat]]]

slide-32
SLIDE 32

Constituents of connectionist representations?

The cat is on the mat and the fox is in the box.
The cat is on the mat.

slide-33
SLIDE 33

Algebraic structure

[[The cat] [is [on the mat]]] * [the fox [is [in a box]]] = [[[The cat] [is [on the mat]]] [and [the fox [is [in a box]]]]]

slide-34
SLIDE 34

Combinatorial structure

on(cat, mat) * in(fox, box) = and(in(fox, box), on(cat, mat))

slide-35
SLIDE 35

Combinatorial structure

* =

slide-36
SLIDE 36

Combinatorial structure

* =

???

slide-37
SLIDE 37

Algebraic structure

[[The cat] [is [on the mat]]] * [the fox [is [in a box]]] = [[[The cat] [is [on the mat]]] [and [the fox [is [in a box]]]]]

slide-38
SLIDE 38

Algebraic structure

α * β = (α * β)
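One concrete candidate for the composition operator “*” on this slide is tensor-product binding in the spirit of Smolensky: bind each constituent vector to a role vector, then superpose. The dimensionality and the role scheme below are assumptions for illustration:

```python
# Sketch of a vector composition operator "*" via tensor-product
# binding: each filler is bound to a role by an outer product and the
# bindings are superposed. Orthonormal roles make unbinding exact.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Filler vectors for the two constituents (random, illustrative).
fillers = {name: rng.standard_normal(DIM)
           for name in ["on(cat,mat)", "in(fox,box)"]}
roles = {"left": np.eye(DIM)[0], "right": np.eye(DIM)[1]}

def compose(a, b):
    """alpha * beta: bind each filler to its role, then superpose."""
    return (np.outer(fillers[a], roles["left"])
            + np.outer(fillers[b], roles["right"]))

def unbind(composite, role):
    """Recover the filler bound to `role` (exact for orthonormal roles)."""
    return composite @ roles[role]

pair = compose("on(cat,mat)", "in(fox,box)")
# The composite "contains" its constituents: unbinding recovers them.
assert np.allclose(unbind(pair, "left"), fillers["on(cat,mat)"])
```

With random (non-orthogonal) role vectors the recovery is only approximate, which is one place the algebraic story and the connectionist implementation come apart.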

slide-39
SLIDE 39

Discussion

slide-40
SLIDE 40

Structure-sensitive processing

The cat is on the mat. → True

[[The cat] [is [on the mat]]]

cat(x)  mat(y)  red(y)  on(x, y)

cat(x), mat(y), on(x, y)

slide-41
SLIDE 41

Structure-sensitive processing

α ∧ β → β

and(in(fox, box), on(cat, mat)) → on(cat, mat)

True

slide-42
SLIDE 42

Structure-sensitive processing

True

[[.]]

αβ → [[α]] ∧ [[β]]

red cat → and(cat(x), red(x))

slide-43
SLIDE 43

Structure-sensitive processing

True

αβ → [[α]] ∧ [[β]]

fake gun → and(fake(x), gun(x))

[[.]]
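The intersective rule αβ → [[α]] ∧ [[β]] can be made concrete with set-valued denotations; the toy lexicon and entities below are assumptions. It also shows exactly where the rule breaks on non-intersective modifiers like “fake”:

```python
# The intersective composition rule from the slides, with set-valued
# word denotations (toy lexicon; the entities are made up).
DENOTATIONS = {
    "red":  {"e1", "e3"},   # the red things
    "cat":  {"e1", "e2"},   # the cats
    "fake": {"e4"},         # the things that are fakes of something
    "gun":  {"e5"},         # the actual guns
}

def denote(phrase):
    """[[alpha beta]] = [[alpha]] ∧ [[beta]], i.e. set intersection."""
    alpha, beta = phrase.split()
    return DENOTATIONS[alpha] & DENOTATIONS[beta]

print(denote("red cat"))   # {'e1'}: exactly the red cats
print(denote("fake gun"))  # set(): a fake gun is not in [[gun]], so the
                           # purely intersective rule wrongly denotes nothing
```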

slide-44
SLIDE 44

Structure-sensitive processing

The cat is on the mat.

slide-45
SLIDE 45

Structure-sensitive processing

The cat is on the mat.

slide-46
SLIDE 46

Structure-sensitive processing

The cat is on the mat. → True

slide-47
SLIDE 47

Structure-sensitive processing

The cat is on the mat. → True

slide-48
SLIDE 48

Discussion

slide-49
SLIDE 49

Break

slide-50
SLIDE 50

Linguistic productivity

“Infinite use of finite means” (W. von Humboldt)

this is the dog that chased the cat that ate the rat that lived in the house that Jack built…

slide-51
SLIDE 51

The competence/performance distinction

Chomsky 1965: Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech community, who knows its (the speech community's) language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance.

Linguistic competence (including claims about productivity of language) concerns this idealized speaker.

slide-52
SLIDE 52

The competence/performance distinction?

Labov 1971: It is now evident to many linguists that the primary purpose of the [performance/competence] distinction has been to help the linguist exclude data which he finds inconvenient to handle.

slide-53
SLIDE 53

Productivity in classical models

The cat is on the mat. → True

[[The cat] [is [on the mat]]]

cat(x)  mat(y)  red(y)  on(x, y)

cat(x), mat(y), on(x, y)

Claim: like humans, the classical model can interpret arbitrarily complex sentences:

slide-54
SLIDE 54

Productivity in classical models

The cat is on the mat. → True

[[The cat] [is [on the mat]]]

cat(x)  mat(y)  red(y)  on(x, y)

cat(x), mat(y), on(x, y)

Claim: like humans, the classical model can interpret arbitrarily complex sentences. Need more processing power? Just add RAM!
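The “just add RAM” claim is that one fixed recursive evaluator handles arbitrarily deep logical forms, with memory as the only bound. A minimal sketch (the term syntax and fact set are illustrative assumptions):

```python
# Sketch of the productivity claim: one fixed recursive evaluator
# handles arbitrarily nested logical forms with no change to the
# program; only memory (stack depth) limits it.
FACTS = {"on(cat,mat)", "in(fox,box)", "in(cub,tub)"}

def holds(term):
    """Atoms are looked up in FACTS; and(a, b) recurses on both conjuncts."""
    if term.startswith("and("):
        left, right = split_top_level(term[4:-1])
        return holds(left) and holds(right)
    return term in FACTS

def split_top_level(s):
    """Split 'a, b' at the comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return s[:i], s[i + 2:]
    raise ValueError(s)

print(holds("and(on(cat,mat), and(in(fox,box), in(cub,tub)))"))  # True
```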

slide-55
SLIDE 55

Productivity in connectionist models

on(cat, mat)

and(on(cat, mat), in(fox, box))

and(on(cat, mat), and(in(fox, box), in(cub, tub)))

slide-56
SLIDE 56

You can’t cram the meaning of a whole %&!$# sentence into a single $&!#* vector! [Ray Mooney, ca. 2014]

slide-57
SLIDE 57

Productivity in neural models

[Bahdanau 2015]

slide-58
SLIDE 58

Productivity in neural models

[Bahdanau 2015]

Need more processing power? Just add steps/layers/precision!

slide-59
SLIDE 59

Logical labels for neurons

Unit 439: bakery OR bank vault OR shopfront (IoU 0.08)
Unit 314: operating room OR castle OR bathroom (IoU 0.05)
(d) polysemanticity

[Mu and Andreas 2020; c.f. Bau et al. 2017, Dalvi et al. 2018]

slide-60
SLIDE 60
Logical labels for neurons

[Mu and Andreas 2020; c.f. Bau et al. 2017, Dalvi et al. 2018]

slide-61
SLIDE 61

Discussion

slide-62
SLIDE 62

Systematicity

F&P: What we mean when we say that linguistic capacities are systematic is that the ability to produce / understand some sentences is intrinsically connected to the ability to produce / understand certain others.

the cat is on the mat the mat is on the cat

slide-63
SLIDE 63

Systematicity

S → NP V PP    PP → on NP    NP → the cat | the mat

⇝ the cat sat on the mat

slide-64
SLIDE 64

Systematicity

S → NP V PP    PP → on NP    NP → the cat | the mat

⇝ the cat sat on the mat  ⇒  ⇝ the mat sat on the cat
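The point can be made concrete with a toy grammar: any generator for these rules that produces “the cat sat on the mat” necessarily also produces “the mat sat on the cat”. The rule set below is an illustrative reconstruction, not the slide's exact grammar:

```python
# A finite toy CFG: generating one sentence entails generating its
# argument-swapped counterpart, since both use the same rules.
from itertools import product

GRAMMAR = {
    "S":  [["NP", "V", "PP"]],
    "PP": [["on", "NP"]],
    "NP": [["the cat"], ["the mat"]],
    "V":  [["sat"]],
}

def generate(symbol):
    """All strings derivable from `symbol` in this finite grammar."""
    if symbol not in GRAMMAR:          # terminal
        return [symbol]
    out = []
    for rhs in GRAMMAR[symbol]:
        for parts in product(*(generate(s) for s in rhs)):
            out.append(" ".join(parts))
    return out

sentences = generate("S")
assert "the cat sat on the mat" in sentences
assert "the mat sat on the cat" in sentences
```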

slide-65
SLIDE 65

Connectionist models permit non-systematicity

The cat is on the mat.

on(cat, mat)
on(mat, cat)
slide-66
SLIDE 66

(but so do classical ones)

S → NP1 V PP    PP → on NP2    NP → the cat | the mat

⇝ the cat sat on the mat  ⇏  ⇝ the mat sat on the cat

slide-67
SLIDE 67

(but so do classical ones)

S → NP1 V PP    PP → on NP2

⇝ the cat sat on itself  ⇏  ⇝ *itself sat on the cat

slide-68
SLIDE 68

Takeaway

Systematicity is a property of a parameterization, not just a model class!

slide-69
SLIDE 69

Discussion

slide-70
SLIDE 70

F&P's conclusions

F&P: By contrast, since the Connectionist architecture recognizes no combinatorial structure in mental representations, gaps in cognitive competence should proliferate arbitrarily. It’s not just that you’d expect to get them from time to time; it’s that, on the ‘no-structure’ story, gaps are the unmarked case. It’s the systematic competence that the theory is required to treat as an embarrassment. But, as a matter of fact, inferential competences are blatantly systematic. So there must be something deeply wrong with Connectionist architecture. […but] we have no objection at all to networks as potential implementation models, nor do we suppose that any of the arguments we’ve given are incompatible with this proposal.

slide-71
SLIDE 71

F&P's conclusions

F&P: By contrast, since the Connectionist architecture recognizes no combinatorial structure in mental representations, gaps in cognitive competence should proliferate arbitrarily. It’s not just that you’d expect to get them from time to time; it’s that, on the ‘no-structure’ story, gaps are the unmarked case. It’s the systematic competence that the theory is required to treat as an embarrassment. But, as a matter of fact, inferential competences are blatantly systematic. So there must be something deeply wrong with Connectionist architecture. […but] we have no objection at all to networks as potential implementation models, nor do we suppose that any of the arguments we’ve given are incompatible with this proposal.

slide-72
SLIDE 72

The worst RNN in the world

[0.00100…] [0.00101…] [0.01101…]
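One way to read this slide: a single scalar state can store the whole input losslessly by shifting each new bit into its binary expansion, which is syntactically a recurrent network but clearly not a plausible one. A sketch, where the bit encoding of the input is an assumption:

```python
# Sketch of the "worst RNN": one scalar state that stores the whole
# input losslessly. Each step shifts the state and appends the new
# bit, so after n steps the state encodes b1...bn exactly
# (equivalently 0.b1...bn after dividing by 2^n, as on the slide).
from fractions import Fraction

def rnn_step(h, bit):
    """Recurrent update: h <- 2h + bit (shift left, append bit)."""
    return 2 * h + bit

def encode(bits):
    h = 0
    for b in bits:
        h = rnn_step(h, b)
    return Fraction(h, 2 ** len(bits))   # 0.b1 b2 ... bn in binary

assert encode([0, 0, 1, 0, 1]) == Fraction(5, 32)   # binary 0.00101
# Lossless: distinct inputs of the same length get distinct states.
assert encode([0, 0, 1, 0, 0]) != encode([0, 0, 1, 0, 1])
```

The catch, of course, is that this needs unbounded precision: a fixed-width float runs out of bits, which is exactly why it is the "worst" RNN rather than a counterexample to F&P.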

slide-73
SLIDE 73

More realistic connectionist symbol processing?

[Kaiser & Sutskever 2015]

slide-74
SLIDE 74

Discussion

slide-75
SLIDE 75

Empirical results

L&B: connectionist models can be made systematic in principle, but are they systematic in practice?

slide-76
SLIDE 76

Operationalizing systematicity

jump ⇒ JUMP
jump left ⇒ LTURN JUMP
jump around right ⇒ RTURN JUMP RTURN JUMP RTURN JUMP RTURN JUMP
turn left twice ⇒ LTURN LTURN
jump thrice ⇒ JUMP JUMP JUMP
jump opposite left and walk thrice ⇒ LTURN LTURN JUMP WALK WALK WALK
jump opposite left after walk around left ⇒ LTURN WALK LTURN WALK LTURN WALK LTURN WALK LTURN LTURN JUMP

slide-77
SLIDE 77

Operationalizing systematicity

jump ⇒ JUMP
jump left ⇒ LTURN JUMP
jump around right ⇒ RTURN JUMP RTURN JUMP RTURN JUMP RTURN JUMP
turn left twice ⇒ LTURN LTURN
jump thrice ⇒ JUMP JUMP JUMP
jump opposite left and walk thrice ⇒ LTURN LTURN JUMP WALK WALK WALK
jump opposite left after walk around left ⇒ LTURN WALK LTURN WALK LTURN WALK LTURN WALK LTURN LTURN JUMP
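The ground-truth mapping is itself a small compositional rule system; a hand-written interpreter covering the commands above might look like this (a fragment written for illustration, not Lake & Baroni's exact grammar):

```python
# A hand-written interpreter for a fragment of SCAN-style commands:
# exactly the kind of rule system the seq2seq models must induce.
PRIM = {"jump": ["JUMP"], "walk": ["WALK"], "run": ["RUN"],
        "look": ["LOOK"], "turn": []}
TURN = {"left": ["LTURN"], "right": ["RTURN"]}

def interpret(cmd):
    words = cmd.split()
    if "after" in words:                 # "x after y" -> do y, then x
        i = words.index("after")
        return (interpret(" ".join(words[i + 1:]))
                + interpret(" ".join(words[:i])))
    if "and" in words:                   # "x and y" -> do x, then y
        i = words.index("and")
        return (interpret(" ".join(words[:i]))
                + interpret(" ".join(words[i + 1:])))
    if words[-1] == "twice":
        return 2 * interpret(" ".join(words[:-1]))
    if words[-1] == "thrice":
        return 3 * interpret(" ".join(words[:-1]))
    verb, rest = words[0], words[1:]
    if not rest:
        return PRIM[verb]
    turn = TURN[rest[-1]]
    if "around" in rest:                 # four quarter-turns with action
        return 4 * (turn + PRIM[verb])
    if "opposite" in rest:               # two turns, then the action
        return turn + turn + PRIM[verb]
    return turn + PRIM[verb]

assert interpret("jump left") == ["LTURN", "JUMP"]
assert interpret("jump around right") == ["RTURN", "JUMP"] * 4
assert interpret("turn left twice") == ["LTURN", "LTURN"]
```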

slide-78
SLIDE 78

Empirical results

[Figure: accuracy on new commands (%), 20–100, vs. percent of commands used for training (1%–64%)]

slide-79
SLIDE 79

Empirical results

slide-80
SLIDE 80

Empirical results

“turn left”: 90.3%    “jump”: 1.2%

slide-81
SLIDE 81

Conclusions

L&B: Given the astounding successes of seq2seq models in challenging tasks such as machine translation, one might argue that failure to generalize by systematic composition indicates that neural networks are poor models of some aspects of human cognition, but it is of little practical import. However, systematicity is an extremely efficient way to generalize […] this ability is still beyond the grasp of state-of-the-art neural networks, likely contributing to their striking need for very large training sets. These results give us hope that neural networks capable of systematic compositionality could greatly benefit machine translation, language modeling, and other applications.

slide-82
SLIDE 82

Discussion

slide-83
SLIDE 83

See you next week!