SLIDE 1

A computational model of S-selection

Aaron Steven White (1,2) and Kyle Rawlins (1)
Semantics and Linguistic Theory 26, University of Texas at Austin, 14th May 2016

Johns Hopkins University
(1) Department of Cognitive Science
(2) Center for Language and Speech Processing; Science of Learning Institute

SLIDE 2

Slides available at aswhite.net

SLIDE 3

Introduction

SLIDE 4

Introduction

Preliminary: Traditional distributional analyses have had tremendous success in helping us understand S(emantic)-selection.
S-selection: What type signatures does a predicate's denotation have?
Challenge: These analyses can be difficult to scale to an entire lexicon.


SLIDE 7

Introduction

Goals
  1. Demonstrate a combined experimental-computational method for scaling distributional analysis
  2. Show that this method provides insight into general principles governing lexical semantic structure

Basic idea
  1. Formalize S(emantic)-selection, projection rules, and lexical idiosyncrasy at Marr's (1982) computational level
  2. Collect data on ∼1000 verbs' syntactic distributions
  3. Given these syntactic distribution data, use computational techniques to automate inference of projection rules and verbs' semantic types, controlling for lexical idiosyncrasy


SLIDE 12

Introduction

Focus: Clause-embedding predicates (∼1000 in English)
Case study: Responsive predicates, which take both interrogative and declarative complements
(1) John knows {that, whether} it's raining.
Importance: Deep literature on the S-selection properties of responsives: do they take questions, propositions, or both? (Karttunen 1977, Groenendijk & Stokhof 1984, Heim 1994, Ginzburg 1995, Lahiri 2002, George 2011, Rawlins 2013, Spector & Egré 2015, Uegaki 2015)


SLIDE 15

Outline
  • Introduction
  • Selection and clausal embedding
  • The MegaAttitude data set
  • Model fitting and results
  • Conclusions and future directions
  • Appendix

SLIDE 16

Selection and clausal embedding

SLIDE 17

Multiplicity

Many verbs are syntactically multiplicitous:
(2) a. John knows {that, whether} it's raining.
    b. John wants {it to rain, rain}.
Syntactic multiplicity does not imply semantic multiplicity:
(3) a. John knows [what the answer is]S.
    b. John knows [the answer]NP.
(3b) = (3a) suggests it is possible for type(NP) = type(S).


SLIDE 20

Projection

What do the projection rules look like? How are a verb's semantic type signatures projected onto its syntactic type signatures (subcategorization frames)?
(Gruber 1965, Jackendoff 1972, Carter 1976, Grimshaw 1979, 1990, Chomsky 1981, Pesetsky 1982, 1991, Pinker 1984, 1989, Levin 1993)

[Diagram: a semantic type, written [+Q] in Grimshaw's notation or ⟨⟨⟨s,t⟩,t⟩,t⟩ in Montagovian notation, is mapped by projection onto syntactic types such as [__ S] and [__ NP].]


SLIDE 24

A model of S-selection and projection

Semantic Type --(Projection Rules)--> Idealized Syntactic Distribution --(Lexical Noise)--> Observed Syntactic Distribution --(Noise Model)--> Acceptability Judgment Data

SLIDE 25

Lexical idiosyncrasy

Lexical idiosyncrasy: Observed syntactic distributions are not a perfect reflection of semantic type + projection rules.
Example: Some Q(uestion)-selecting verbs allow concealed questions...
(4) a. Mary asked what time it was.
    b. Mary asked the time.
...others do not (Grimshaw 1979, Pesetsky 1982, 1991, Nathan 2006, Frana 2010, a.o.):
(5) a. Mary wondered what time it was.
    b. *Mary wondered the time.


SLIDE 27

Two kinds of lexical idiosyncrasy

Grimshaw (1979): Verbs are related to semantic type signatures (S-selection) and to syntactic type signatures (C-selection).
Pesetsky (1982, 1991): Verbs are related to semantic type signatures (S-selection); C-selection is an epiphenomenon of verbs' abstract case.
Shared core: Lexical noise (idiosyncrasy) alters verbs' idealized syntactic distributions.


SLIDE 31

Specifying the model

Question: How do we represent each object in the model?
A minimalistic answer: Every object is a matrix of boolean values.
Strategy
  1. Give the model in terms of sets and functions
  2. Convert this model into a boolean matrix model


SLIDE 36

A boolean model of S-selection

know → {[+P], [+Q]}    think → {[+P]}    wonder → {[+Q]}

S =          [+P]  [+Q]  ...
    think      1     0
    know       1     1
    wonder     0     1
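The S matrix above can be written down directly. A minimal sketch, using only the slide's three-verb example (the function name is ours):

```python
# Minimal sketch of the boolean S-selection matrix (verb x semantic type);
# the rows and the [+P]/[+Q] labels follow the slide's example.
S = {
    "think":  {"[+P]": 1, "[+Q]": 0},
    "know":   {"[+P]": 1, "[+Q]": 1},
    "wonder": {"[+P]": 0, "[+Q]": 1},
}

def s_selects(verb, sem_type):
    """True iff the verb S-selects the given semantic type signature."""
    return bool(S[verb][sem_type])
```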


SLIDE 40

A boolean model of projection

[+P] → {[__ that S], [__ NP], ...}    [+Q] → {[__ whether S], [__ NP], ...}

Π =        [__ that S]  [__ whether S]  [__ NP]  ...
    [+P]        1              0           1
    [+Q]        0              1           1
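The projection matrix Π admits the same treatment. A sketch mirroring the slide's two rules (the function name is ours):

```python
# Sketch of the boolean projection matrix: semantic type signature ->
# syntactic type signatures (subcategorization frames), as on the slide.
Pi = {
    "[+P]": {"[__ that S]": 1, "[__ whether S]": 0, "[__ NP]": 1},
    "[+Q]": {"[__ that S]": 0, "[__ whether S]": 1, "[__ NP]": 1},
}

def projects_to(sem_type, frame):
    """True iff the semantic type projects onto the syntactic frame."""
    return bool(Pi[sem_type][frame])
```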


SLIDE 42

A boolean model of idealized syntactic distribution

D̂(VERB, SYNTYPE) = ⋁_{t ∈ SEMTYPES} S(VERB, t) ∧ Π(t, SYNTYPE)

D̂(know, [__ that S]) = ⋁_{t ∈ {[+P],[+Q],...}} S(know, t) ∧ Π(t, [__ that S])
D̂(wonder, [__ NP]) = ⋁_{t ∈ {[+P],[+Q],...}} S(wonder, t) ∧ Π(t, [__ NP])

Probabilistic relaxation (noisy-or):

D̂(know, [__ that S]) = 1 − ∏_{t ∈ {[+P],[+Q],...}} (1 − S(know, t) × Π(t, [__ that S]))

Boolean matrices and their probabilistic relaxations:

S:          [+P]  [+Q]        S (prob.):       [+P]  [+Q]
  think       1     0           think          0.94  0.03
  know        1     1           know           0.97  0.91
  wonder      0     1           wonder         0.17  0.93

Π:        [__ that S]  [__ whether S]  [__ NP]     Π (prob.):  [__ that S]  [__ whether S]
  [+P]         1              0           1          [+P]         0.99          0.12
  [+Q]         0              1           1          [+Q]         0.07          0.98

D̂:        [__ that S]  [__ whether S]  [__ NP]     D̂ (prob.):  [__ that S]  [__ whether S]
  think        1              0           1          think         0.97          0.14
  know         1              1           1          know          0.95          0.99
  wonder       0              1           1          wonder        0.12          0.99
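Both the boolean rule and its noisy-or relaxation are easy to state in code. A minimal sketch reusing the slide's example S and Π; the real-valued entries below are the slide's illustrative probabilities:

```python
# D_hat(v, f) = OR over types t of S(v, t) AND Pi(t, f), and its noisy-or
# relaxation D_hat(v, f) = 1 - prod_t (1 - S(v, t) * Pi(t, f)).
S = {"think":  {"[+P]": 1, "[+Q]": 0},
     "know":   {"[+P]": 1, "[+Q]": 1},
     "wonder": {"[+P]": 0, "[+Q]": 1}}
Pi = {"[+P]": {"[__ that S]": 1, "[__ whether S]": 0},
      "[+Q]": {"[__ that S]": 0, "[__ whether S]": 1}}

def d_hat(verb, frame):
    """Boolean idealized distribution."""
    return int(any(S[verb][t] and Pi[t][frame] for t in Pi))

# Probabilistic (noisy-or) version over real-valued S and Pi, using the
# slide's illustrative values for know / [__ that S]:
S_p = {"know": {"[+P]": 0.97, "[+Q]": 0.91}}
Pi_p = {"[+P]": {"[__ that S]": 0.99}, "[+Q]": {"[__ that S]": 0.07}}

def d_hat_noisy_or(verb, frame):
    prod = 1.0
    for t in Pi_p:
        prod *= 1 - S_p[verb][t] * Pi_p[t][frame]
    return 1 - prod
```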


SLIDE 49

A boolean model of observed syntactic distribution

∀t ∈ SYNTYPES: D(wonder, t) = D̂(wonder, t) ∧ N(wonder, t)

D̂:        [__ that S]  [__ whether S]  [__ NP]
  think        1              0           1
  know         1              1           1
  wonder       0              1           1

N:        [__ that S]  [__ whether S]  [__ NP]
  think        1              1           1
  know         1              1           1
  wonder       1              1           0

D = D̂ ∧ N:  [__ that S]  [__ whether S]  [__ NP]
  think          1              0           0
  know           1              1           1
  wonder         0              1           0
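The masking step can be sketched directly: lexical noise N can only remove frames from the idealized distribution, never add them (the wonder row follows the slides' concealed-question example, cf. *"Mary wondered the time"):

```python
# D(v, f) = D_hat(v, f) AND N(v, f): elementwise conjunction of the
# idealized distribution with the lexical-noise mask.
def observe(d_hat_row, noise_row):
    return {f: d_hat_row[f] & noise_row[f] for f in d_hat_row}

# wonder's question type licenses [__ NP], but lexical noise masks it:
wonder_hat   = {"[__ that S]": 0, "[__ whether S]": 1, "[__ NP]": 1}
wonder_noise = {"[__ that S]": 1, "[__ whether S]": 1, "[__ NP]": 0}
wonder_obs = observe(wonder_hat, wonder_noise)
```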


SLIDE 53

Animating abstractions

Question: What is this model useful for?
Answer: In conjunction with modern computational techniques, this model allows us to scale distributional analysis to an entire lexicon.
Basic idea: Distributional analysis corresponds to reversing the model's arrows.


SLIDE 56

The MegaAttitude data set

SLIDE 57

MegaAttitude materials

Ordinal (1-7 scale) acceptability ratings for 1,000 clause-embedding verbs × 50 syntactic frames


SLIDE 59

Verb selection


SLIDE 61

Sentence construction

Challenge: Automate construction of a very large set of frames in a way that generalizes to many verbs.
Solution: Construct semantically bleached frames using indefinites.
(6) Examples with responsives
    a. know + NP V {that, whether} S
       Someone knew {that, whether} something happened.
    b. tell + NP V NP {that, whether} S
       Someone told someone {that, whether} something happened.
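The bleaching recipe in (6) amounts to simple template filling. A sketch; the template strings and names are ours:

```python
# Bleached sentence construction: indefinites fill every argument slot and
# the complementizer varies, matching examples (6a-b).
FRAMES = {
    "NP V COMP S":    "Someone {v} {comp} something happened.",
    "NP V NP COMP S": "Someone {v} someone {comp} something happened.",
}

def bleach(verb_past, frame, comp):
    """Instantiate a semantically bleached frame for one verb."""
    return FRAMES[frame].format(v=verb_past, comp=comp)
```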

SLIDE 62

Frame construction

Frames are built by crossing syntactic features:
  • Syntactic type (NP, PP, S), yielding subcategorization frames [__ NP], [__ PP], [__ S], [__ NP S], [__ NP PP], [__ PP S]
  • Voice: ACTIVE, PASSIVE
  • COMP: that, for, ∅; [+Q]: whether, which NP
  • TENSE: [+FIN] -ed, would; [-FIN] to, ∅, -ing
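One way to see where the 50 frames come from is to cross the features above. A rough sketch only: the feature inventories are our reading of the slide, and the real frame set is a grammatically constrained subset of the full product, not all of it:

```python
# Candidate frames from crossing subcategorization, complementizer, and
# tense features; the actual 50 frames keep only grammatical combinations.
from itertools import product

SUBCAT = ["[__ NP]", "[__ PP]", "[__ S]", "[__ NP S]", "[__ NP PP]", "[__ PP S]"]
COMP   = ["that", "for", "whether", "which NP", ""]   # "" = null complementizer
TENSE  = ["-ed", "would", "to", "-ing", ""]           # [+FIN] and [-FIN] options

candidates = list(product(SUBCAT, COMP, TENSE))
```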



SLIDE 76

Data collection
  • 1,000 verbs × 50 syntactic frames = 50,000 sentences
  • 1,000 lists of 50 items each
  • Each verb appears only once per list
  • Each frame appears only once per list
  • 727 unique Mechanical Turk participants
  • Annotators allowed to do multiple lists, but never the same list twice
  • 5 judgments per item
  • No annotator sees the same sentence more than once
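The list design above (each verb and each frame at most once per list, every sentence covered) can be realized with a simple rotation. This Latin-square-style construction is our sketch, not necessarily the authors' exact randomization:

```python
# 1,000 lists of 50 items: list i pairs frame j with verb (i + j) mod 1000,
# so every verb-frame pair occurs exactly once across all lists.
N_VERBS, N_FRAMES = 1000, 50

def make_list(i):
    """Return list i as (verb_index, frame_index) pairs."""
    return [((i + j) % N_VERBS, j) for j in range(N_FRAMES)]

lists = [make_list(i) for i in range(N_VERBS)]
```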


slide-84
SLIDE 84

Task

Turktools (Erlewine & Kotek 2015)

32

slide-85
SLIDE 85

Validating the data

Interannotator agreement
Spearman rank correlation calculated by list on a pilot of 30 verbs

Pilot verb selection
Same verbs used by White (2015) and White et al. (2015), selected based on Hacquard & Wellwood's (2012) attitude verb classification

  • 1. Linguist-to-linguist

median: 0.70, 95% CI: [0.62, 0.78]

  • 2. Linguist-to-annotator

median: 0.55, 95% CI: [0.52, 0.58]

  • 3. Annotator-to-annotator

median: 0.56, 95% CI: [0.53, 0.59]

33
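Spearman rank correlation, used above for the agreement figures, is just Pearson correlation computed on ranks. A pure-stdlib sketch with average ranks for ties (real analyses would typically call `scipy.stats.spearmanr`):

```python
# Spearman rank correlation: rank both variables (average ranks for ties),
# then compute the Pearson correlation of the ranks.
from statistics import mean

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only ranks matter, any monotone agreement between two annotators scores 1.0 even if they use the 1-to-7 scale very differently, which is why it suits this setting.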

slide-86
SLIDE 86

Results

[Figure: distributions of 1-to-7 acceptability judgments for the frames "NP V S" and "NP V whether S"]

34

slide-87
SLIDE 87

Results

[Figure: distributions of 1-to-7 acceptability judgments for know, think, want, and wonder in the frames "NP V S" and "NP V whether S"]

35

slide-88
SLIDE 88

Model fitting and results

slide-89
SLIDE 89

A model of S-selection and projection

[Diagram: Semantic Type Signature → (Projection Rules) → Idealized Syntactic Distribution → (Lexical Noise) → Observed Syntactic Distribution → (Noise Model) → Acceptability Judgment Data]

37


slide-92
SLIDE 92

Fitting the model

Goal
Find representations of verbs' semantic type signatures and projection rules that best explain the acceptability judgments

Challenges

  • 1. Infeasible to search over 2^(1000T) × 2^(50T) possible configurations (T = # of type signatures)
  • 2. Finding the best boolean model fails to capture uncertainty inherent in judgment data

38


slide-94
SLIDE 94

Fitting the model

Solution
Search probability distributions over verbs' semantic type signatures and projection rules

Going probabilistic
Wrap boolean expressions in probability measures

39


slide-96
SLIDE 96

A boolean model of idealized syntactic distribution

D̂(VERB, SYNTYPE) = ⋁_{t ∈ SEMTYPES} S(VERB, t) ∧ Π(t, SYNTYPE)

D̂(know, [ that S]) = ⋁_{t ∈ {[ P], [ Q], ...}} S(know, t) ∧ Π(t, [ that S])

D̂(wonder, [ NP]) = ⋁_{t ∈ {[ P], [ Q], ...}} S(wonder, t) ∧ Π(t, [ NP])

Probabilistic relaxation (noisy-or):

D̂(know, [ that S]) = 1 − ∏_{t ∈ {[ P], [ Q], ...}} (1 − S(know, t) × Π(t, [ that S]))

S (verbs × type signatures), boolean and probabilistic:

            [ P]   [ Q]   · · ·        [ P]   [ Q]   · · ·
  think      1      ·                  0.94   0.03
  know       1      1                  0.97   0.91
  wonder     ·      1                  0.17   0.93

Π (type signatures × frames), boolean and probabilistic:

         [ that S]   [ whether S]   · · ·      [ that S]   [ whether S]   · · ·
  [ P]       1             ·                      0.99          0.12
  [ Q]       ·             1                      0.07          0.98

D̂ (verbs × frames), boolean and probabilistic:

         [ that S]   [ whether S]   · · ·      [ that S]   [ whether S]   · · ·
  think      1             ·                      0.97          0.14
  know       1             1                      0.95          0.99
  wonder     ·             1                      0.12          0.99

40
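The two passes above can be sketched directly: a boolean disjunction over type signatures, and its probabilistic relaxation as a noisy-or. The matrix values below are the toy think/know/wonder numbers from the slide, not fitted parameters:

```python
# Idealized distribution D(verb, frame): boolean OR over type signatures t of
# S(verb, t) AND Pi(t, frame), plus its noisy-or probabilistic relaxation.
S_bool = {"think": {"P": 1, "Q": 0}, "know": {"P": 1, "Q": 1}, "wonder": {"P": 0, "Q": 1}}
Pi_bool = {"P": {"that S": 1, "whether S": 0}, "Q": {"that S": 0, "whether S": 1}}

def d_bool(verb, frame):
    # boolean model: verb takes frame iff some type it selects projects to the frame
    return int(any(S_bool[verb][t] and Pi_bool[t][frame] for t in ("P", "Q")))

S_prob = {"think": {"P": 0.94, "Q": 0.03}, "know": {"P": 0.97, "Q": 0.91},
          "wonder": {"P": 0.17, "Q": 0.93}}
Pi_prob = {"P": {"that S": 0.99, "whether S": 0.12}, "Q": {"that S": 0.07, "whether S": 0.98}}

def d_prob(verb, frame):
    # noisy-or: P(OR_t A_t) = 1 - prod_t (1 - P(A_t)) under independence
    prod = 1.0
    for t in ("P", "Q"):
        prod *= 1.0 - S_prob[verb][t] * Pi_prob[t][frame]
    return 1.0 - prod
```

With these values, the probabilistic model recovers the boolean pattern softly: e.g. `d_prob("think", "whether S")` comes out around 0.14, matching the slide's D̂ matrix.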


slide-98
SLIDE 98

Wrapping with probabilities

P(S[VERB, t] ∧ Π[t, SYNTYPE]) = P(S[VERB, t]) P(Π[t, SYNTYPE] | S[VERB, t])
                              = P(S[VERB, t]) P(Π[t, SYNTYPE])            (by independence)

P(⋁_t S[VERB, t] ∧ Π[t, SYNTYPE])
  = P(¬ ⋀_t ¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
  = 1 − P(⋀_t ¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
  = 1 − ∏_t P(¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
  = 1 − ∏_t (1 − P(S[VERB, t] ∧ Π[t, SYNTYPE]))
  = 1 − ∏_t (1 − P(S[VERB, t]) P(Π[t, SYNTYPE]))

41
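The noisy-or identity derived above can be checked by exact enumeration: for independent events, summing the probability of every outcome with at least one success agrees with the closed form 1 − ∏_t (1 − p_t).

```python
# Exact check of the noisy-or identity: for independent events A_t with
# P(A_t) = p_t, P(OR_t A_t) = 1 - prod_t (1 - p_t).
from itertools import product

def p_or_enumerate(ps):
    # sum the probability of every joint outcome in which some event occurs
    total = 0.0
    for outcome in product([0, 1], repeat=len(ps)):
        if any(outcome):
            prob = 1.0
            for bit, p in zip(outcome, ps):
                prob *= p if bit else (1.0 - p)
            total += prob
    return total

def p_or_closed_form(ps):
    prod = 1.0
    for p in ps:
        prod *= 1.0 - p
    return 1.0 - prod
```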

slide-99
SLIDE 99

Fitting the model

Algorithm
Projected gradient descent with adaptive gradient (Duchi et al. 2011)

Remaining challenge
Don't know the number of type signatures T

Standard solution
Fit the model with many type signatures and compare using an information criterion, e.g., the Akaike Information Criterion (AIC)

42
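A minimal sketch of one projected AdaGrad update for parameters constrained to [0, 1], as the probability parameters here are. The quadratic loss and learning rate are placeholders for illustration, not the authors' actual objective or settings:

```python
# Projected gradient descent with AdaGrad (Duchi et al. 2011), sketched for
# parameters that must stay in [0, 1] (probabilities). Toy quadratic loss.
def adagrad_step(theta, grad, hist, lr=0.5, eps=1e-8):
    new_theta, new_hist = [], []
    for th, g, h in zip(theta, grad, hist):
        h = h + g * g                         # accumulate squared gradients
        th = th - lr * g / (h ** 0.5 + eps)   # adaptive per-coordinate step
        th = min(1.0, max(0.0, th))           # project back onto [0, 1]
        new_theta.append(th)
        new_hist.append(h)
    return new_theta, new_hist

# minimize (theta - target)^2 for a target inside the box
target = [0.2, 0.9]
theta, hist = [0.5, 0.5], [0.0, 0.0]
for _ in range(500):
    grad = [2 * (th - t) for th, t in zip(theta, target)]
    theta, hist = adagrad_step(theta, grad, hist)
```

The projection step is what keeps the probability interpretation intact: a raw gradient step can leave [0, 1], and clipping back is the simplest Euclidean projection onto that box.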


slide-102
SLIDE 102

Akaike Information Criterion

High-level idea
Measures the information-theoretic "distance" to the true model from the best model with T type signatures (Akaike 1974)

Low-level idea (cf. Gelman et al. 2013)
For each datapoint...

  • 1. ...remove that datapoint from the dataset
  • 2. ...fit the model to the remaining data
  • 3. ...predict the held-out datapoint

In the limit, model with lowest error on step 3 has lowest AIC

43
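AIC itself is cheap to compute once a model is fitted: AIC = 2k − 2 ln L̂, with k free parameters and L̂ the maximized likelihood. A sketch of using it to compare candidate values of T; the log-likelihoods below are made up for illustration, not the study's fits:

```python
# Model selection by AIC: lower is better; the 2k term penalizes models
# that buy fit with extra parameters.
from math import inf

def aic(log_likelihood, n_params):
    # Akaike (1974): AIC = 2k - 2 ln(L-hat)
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(fits):
    # fits: list of (T, log_likelihood, n_params); return T minimizing AIC
    best_t, best_score = None, inf
    for t, ll, k in fits:
        score = aic(ll, k)
        if score < best_score:
            best_t, best_score = t, score
    return best_t
```

In this model each added type signature costs roughly one parameter per verb plus one per frame, so the penalty grows quickly with T, which is what lets AIC pick out an intermediate value like 12.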


slide-104
SLIDE 104

Fitting the model

Result
12 is the optimal number of type signatures according to AIC

Reporting findings
Remainder of talk: best model with 12 type signatures

44

slide-105
SLIDE 105

Findings

Three findings

  • 1. Cognitive predicates

    1.1 Two distinct type signatures [ P] and [ Q]
    1.2 Coercion of [ P] to [ Q] and [ Q] to [ P]

  • 2. Communicative predicates

    2.1 Two unified type signatures [ (Ent) P⊕Q] (optional recipient) and [ Ent P⊕Q] (obligatory recipient)

45

slide-106
SLIDE 106

Findings

[Diagram: type signatures [ P], [ Q], [ (Ent) P⊕Q], [ Ent P⊕Q] linked to the frames [ that S], [ whether S], [ to NP that S], [ to NP whether S], [ NP that S], [ NP whether S]]

46


slide-113
SLIDE 113

Hybrid types

Question
What do we mean by P⊕Q?

Example
Structures with potentially both informative and inquisitive content (Groenendijk & Roelofsen 2009, a.o.)

  • S-selectional behavior of responsive predicates on some accounts (Uegaki 2012; Rawlins 2013)
  • Some attitudes whose content is a hybrid Lewisian (1988) subject matter (Rawlins 2013 on think v. think about)

51

slide-114
SLIDE 114

Projection

[Figure: projection-rule estimates across all 50 syntactic frames, from "NP Ved to VP[eventive]" through "NP Ved that S"]

52

slide-115
SLIDE 115

Projection

[Figure: projection probabilities (0 to 1) under the [ P] and [ Q] type signatures for frames including "NP Ved that S", "NP Ved whether S", "NP Ved whichNP S", and "NP Ved whether to VP"]

53

slide-116
SLIDE 116

Projection

[Figure: verbs plotted by estimated probability of s-selecting P vs. s-selecting Q]

54

slide-117
SLIDE 117

[Figure: probabilities of s-selecting [ P] vs. [ Q] for verbs including accept, acknowledge, admit, affirm, agree, announce, assume, attest, believe, decide, detect, expect, figure out, find out, guarantee, hope, swear, wish]

55

slide-118
SLIDE 118

[Figure: probabilities of s-selecting [ P] vs. [ Q] for verbs including accept, analyze, assume, brainstorm, clarify, contemplate, decide, detect, figure out, find out, miss, outline, query, question]

56


slide-120
SLIDE 120

Projection

[Figure: projection probabilities (0 to 1) under the [ P] and [ (Ent) P⊕Q] type signatures for frames including "NP Ved that S", "NP Ved to NP that S", and "NP Ved to NP whether S"]

58

slide-121
SLIDE 121

S-selection

[Figure: probabilities of s-selecting [ P] vs. [ (Ent) P⊕Q] for verbs including acknowledge, advertise, announce, babble, chat, claim, complain, confirm, deny, explain, fax, lie, repeat, reveal, say, share, signal, write]

59


slide-123
SLIDE 123

Projection

[Figure: projection probabilities (0 to 1) under the [ Ent P] and [ Ent P⊕Q] type signatures for frames including "NP Ved NP that S", "NP Ved NP whether S", and "NP was Ved whether S"]

61

slide-124
SLIDE 124

S-selection

[Figure: verbs plotted by probability of s-selecting [ Ent P] vs. [ Ent P⊕Q]]

62

slide-125
SLIDE 125

S-selection

[Figure: probabilities of s-selecting [ Ent P] vs. [ Ent P⊕Q] for verbs including advise, alert, ask, bet, email, fax, notify, remind, tell]

63


slide-127
SLIDE 127

Discussion

What we conclude
Proposition and question types live alongside hybrid types, and the presence of a hybrid type correlates with communicativity

What we can exclude
Accounts that reduce (or unify) declarative and interrogative selection solely to S-selection of a single type + coercion

Methodological point
Coercion can have measurable effects

65


slide-130
SLIDE 130

Conclusions and future directions

slide-131
SLIDE 131

Conclusions

Goals

  • 1. Demonstrate a combined experimental-computational method for scaling distributional analysis
  • 2. Show that this method provides insight into general principles governing lexical semantic structure

Basic idea

  • 1. Formalize S(emantic)-selection, projection rule, and lexical idiosyncrasy at Marr's (1982) computational level
  • 2. Collect data on ∼1000 verbs' syntactic distributions
  • 3. Given syntactic distribution data, use computational techniques to automate inference of projection rules and verbs' semantic type, controlling for lexical idiosyncrasy

67


slide-136
SLIDE 136

Conclusions

Focus
Clause-embedding predicates (∼1000 in English)

Case study
Responsive predicates and the features that underlie their selectional behavior

(7) John knows {that, whether} it's raining.

By looking at such a large data set, we can discover the relevant s-selectional features, and get an angle on the problem at the scale of the entire lexicon.

68


slide-138
SLIDE 138

Future directions

Further investigation of type signatures
Seven other type signatures that are also remarkably coherent

Example
Many nonfinite-taking verbs

69

slide-139
SLIDE 139

Future directions

Atomic v. structured type signatures
Currently treating type signatures as atomic, but type signatures have rich structure

Example
Preliminary experiments with models that represent type structure suggest that our glosses for the types are correct

70

slide-140
SLIDE 140

Future directions

Homophony v. regular polysemy v. underspecification
Patterns in how semantic type signatures distribute across verbs may reflect regular polysemy rules

Example
Preliminary experiments with a more elaborated model suggest responsive predicates display a regular polysemy (cf. George 2011)

71

slide-141
SLIDE 141

Thanks

We are grateful to audiences at Johns Hopkins University for discussion of this work. We would like to thank Shevaun Lewis and Drew Reisinger in particular for useful comments on this talk. This work was funded by NSF DDRIG-1456013 (Doctoral Dissertation Research: Learning attitude verb meanings), NSF INSPIRE BCS-1344269 (Gradient symbolic computation), and the JHU Science of Learning Institute.

72

slide-142
SLIDE 142

Bibliography I

Akaike, Hirotugu. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6). 716–723.

Carter, Richard. 1976. Some linking regularities. In On Linking: Papers by Richard Carter. Cambridge, MA: Center for Cognitive Science, MIT (Lexicon Project Working Papers No. 25).

Chomsky, Noam. 1981. Lectures on Government and Binding: The Pisa Lectures. Walter de Gruyter.

Duchi, John, Elad Hazan & Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12. 2121–2159.

73

slide-143
SLIDE 143

Bibliography II

Erlewine, Michael Yoshitaka & Hadas Kotek. 2015. A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory 1–15. doi:10.1007/s11049-015-9305-9.

Frana, Ilaria. 2010. Concealed questions: In search of answers. University of Massachusetts, Amherst Ph.D. dissertation.

Gelman, Andrew, Jessica Hwang & Aki Vehtari. 2013. Understanding predictive information criteria for Bayesian models. Statistics and Computing 1–20.

George, Benjamin Ross. 2011. Question embedding and the semantics of answers. University of California, Los Angeles dissertation.

Ginzburg, Jonathan. 1995. Resolving questions, II. Linguistics and Philosophy 18(6). 567–609.

74

slide-144
SLIDE 144

Bibliography III

Grimshaw, Jane. 1979. Complement selection and the lexicon. Linguistic Inquiry 10(2). 279–326.

Grimshaw, Jane. 1990. Argument structure. Cambridge, MA: MIT Press.

Groenendijk, Jeroen & Floris Roelofsen. 2009. Inquisitive semantics and pragmatics. Paper presented at the Stanford workshop on Language, Communication, and Rational Agency.

Groenendijk, Jeroen & Martin Stokhof. 1984. On the semantics of questions and the pragmatics of answers. Varieties of Formal Semantics 3. 143–170.

Gruber, Jeffrey Steven. 1965. Studies in lexical relations. Massachusetts Institute of Technology dissertation.

Hacquard, Valentine & Alexis Wellwood. 2012. Embedding epistemic modals in English: A corpus-based study. Semantics and Pragmatics 5(4). 1–29.

75

slide-145
SLIDE 145

Bibliography IV

Heim, Irene. 1994. Interrogative semantics and Karttunen's semantics for know. In Proceedings of IATL, vol. 1, 128–144.

Jackendoff, Ray. 1972. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.

Karttunen, Lauri. 1977. Syntax and semantics of questions. Linguistics and Philosophy 1(1). 3–44.

Lahiri, Utpal. 2002. Questions and answers in embedded contexts. Oxford University Press.

Levin, Beth. 1993. English verb classes and alternations: A preliminary investigation. University of Chicago Press.

Lewis, David. 1988. Relevant implication. Theoria 54(3). 161–174.

76

slide-146
SLIDE 146

Bibliography V

Marr, David. 1982. Vision: A computational investigation into the human representation and processing of visual information. Henry Holt and Co.

Nathan, Lance Edward. 2006. On the interpretation of concealed questions. Massachusetts Institute of Technology dissertation.

Pesetsky, David. 1982. Paths and categories. MIT dissertation.

Pesetsky, David. 1991. Zero syntax, vol. 2: Infinitives.

Pinker, Steven. 1984. Language learnability and language development. Harvard University Press.

Pinker, Steven. 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Rawlins, Kyle. 2013. About 'about'. In Semantics and Linguistic Theory, vol. 23, 336–357.

77

slide-147
SLIDE 147

Bibliography VI

Spector, Benjamin & Paul Egré. 2015. A uniform semantics for embedded interrogatives: An answer, not necessarily the answer. Synthese 192(6). 1729–1784.

Uegaki, Wataru. 2012. Content nouns and the semantics of question-embedding predicates. In Ana Aguilar-Guevara, Anna Chernilovskaya & Rick Nouwen (eds.), Proceedings of Sinn und Bedeutung 16.

Uegaki, Wataru. 2015. Interpreting questions under attitudes. MIT dissertation.

White, Aaron Steven. 2015. Information and incrementality in syntactic bootstrapping. University of Maryland dissertation.

White, Aaron Steven, Valentine Hacquard & Jeffrey Lidz. 2015. Projecting attitudes.

78

slide-148
SLIDE 148

Appendix

slide-149
SLIDE 149

The response model

Two functions

  • 1. Normalize participants' judgments so they are comparable
  • 2. Control for lexicosyntactic noise

80


slide-151
SLIDE 151

The response model

Why normalize judgments? Necessary to control for differences in participants’ use of scale

[Figure: five participants' distributions of responses on the 1-to-7 scale, illustrating differences in scale usage]

81
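One standard way to make 1-to-7 ratings comparable across participants is to z-score each participant's responses. This is a generic sketch of that idea, not necessarily the exact transformation in the authors' response model:

```python
# Per-participant z-scoring of 1-7 ratings: subtract each participant's mean
# and divide by their standard deviation, so scale-usage differences wash out.
from statistics import mean, pstdev

def zscore_by_participant(ratings):
    # ratings: dict mapping participant -> list of raw 1-7 judgments
    out = {}
    for p, xs in ratings.items():
        m, s = mean(xs), pstdev(xs)
        # guard against a participant who gave the same rating every time
        out[p] = [(x - m) / s if s > 0 else 0.0 for x in xs]
    return out
```

After the transform, a participant who only uses 3-5 and one who uses the full 1-7 range both contribute judgments on a common standardized scale.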


slide-153
SLIDE 153

The response model

[Figure: mapping of the 1-to-7 response scale onto cutpoints on the real line R, from −3 to 3]

82


slide-157
SLIDE 157

The response model

[Figure: normalized judgments on the real line for the frames "NP V S" and "NP V whether S"]

83


slide-159
SLIDE 159

The response model

[Figure: response-model fit to the 1-to-7 judgment distributions for know, think, want, and wonder in the frames "NP V S" and "NP V whether S"]

85



slide-166
SLIDE 166

The response model

[Figure: predicted acceptability probabilities (0 to 1) for the frames "NP V S" and "NP V whether S"]

91

slide-167
SLIDE 167

Fitting the model

Subgoal
Find the optimal number T of type signatures

Goodness of T ↔ model's ability to...
...fit observed judgments
...predict unobserved judgments

  • T too small → bad fit, bad prediction
  • T too large → good fit, bad prediction

Measure
Akaike Information Criterion (AIC) trades off fit to observed data and prediction of unobserved data

92


slide-171
SLIDE 171

Fitting the model

Number of type signatures: 1, 2, 3, ...

Low extreme
All verbs' syntactic distributions explained by a single rule

High extreme
# types ≥ # frames: every syntactic frame has a separate rule

93


slide-175
SLIDE 175

Model comparison

[Figure: Akaike Information Criterion (y-axis, ∼620,000–680,000) as a function of the number of semantic type signatures, 1–15]

95

slide-176
SLIDE 176

Model comparison

[Figure: zoomed view of the Akaike Information Criterion (∼621,500–623,500) for 8–15 semantic type signatures; the minimum falls at 12]

96