A computational model of S-selection Aaron Steven White 1 2 Kyle - PowerPoint PPT Presentation

Lexical idiosyncrasy Lexical idiosyncrasy Observed syntactic distributions are not a perfect reflection of semantic type + projection rules Example Some Q(uestion)-selecting verbs allow concealed questions... (4) a. Mary asked what time it was. b. Mary asked the time. ...others do not (Grimshaw 1979, Pesetsky 1982, 1991, Nathan 2006, Frana 2010, a.o.) (5) a. Mary wondered what time it was. b. *Mary wondered the time. 12

Pesetsky (1982, 1991) Verbs are related to semantic type signatures ( S-selection ); C- selection is an epiphenomenon of verbs’ abstract case Shared core Lexical noise (idiosyncrasy) alters verbs’ idealized syntactic distributions Two kinds of lexical idiosyncrasy Grimshaw (1979) Verbs are related to semantic type signatures ( S-selection ) and syntactic type signatures ( C-selection ) 13

Shared core Lexical noise (idiosyncrasy) alters verbs’ idealized syntactic distributions Two kinds of lexical idiosyncrasy Grimshaw (1979) Verbs are related to semantic type signatures ( S-selection ) and syntactic type signatures ( C-selection ) Pesetsky (1982, 1991) Verbs are related to semantic type signatures ( S-selection ); C- selection is an epiphenomenon of verbs’ abstract case 13

Two kinds of lexical idiosyncrasy Grimshaw (1979) Verbs are related to semantic type signatures ( S-selection ) and syntactic type signatures ( C-selection ) Pesetsky (1982, 1991) Verbs are related to semantic type signatures ( S-selection ); C- selection is an epiphenomenon of verbs’ abstract case Shared core Lexical noise (idiosyncrasy) alters verbs’ idealized syntactic distributions 13

A model of S-selection and projection Semantic Type Projection Rules Syntactic Idealized Distribution Syntactic Distribution Lexical Noise Model Noise Acceptability Observed Judgment Syntactic 14 Distribution Data

A minimalistic answer Every object is a matrix of boolean values Strategy 1. Give model in terms of sets and functions 2. Convert this model into a boolean matrix model Specifying the model Question How do we represent each object in the model? 15

Strategy 1. Give model in terms of sets and functions 2. Convert this model into a boolean matrix model Specifying the model Question How do we represent each object in the model? A minimalistic answer Every object is a matrix of boolean values 15

2. Convert this model into a boolean matrix model Specifying the model Question How do we represent each object in the model? A minimalistic answer Every object is a matrix of boolean values Strategy 1. Give model in terms of sets and functions 15

Specifying the model Question How do we represent each object in the model? A minimalistic answer Every object is a matrix of boolean values Strategy 1. Give model in terms of sets and functions 2. Convert this model into a boolean matrix model 15

A boolean model of S-selection think → {[ P]} know → {[ P], [ Q]} wonder → {[ Q]} [ P] [ Q] · · · think 1 0   · · · know 1 1 · · ·   S =   wonder 0 1 · · ·   . . ...   . . . . · · · 17

A boolean model of projection [ P] → {[ that S], [ NP], ...} [ Q] → {[ whether S], [ NP], ...} [ that S] [ whether S] [ NP] · · · [ P] 1 0 1  · · ·  [ Q] 0 1 1 · · · Π =   . . .  ...  . . . . . . · · · 18

A boolean model of idealized syntactic distribution D ( know, [ ˆ D ( wonder, [ ˆ ˆ ˆ that S] ) = ∨ NP] ) = ∨ Q] ,... } S ( know , t ) ∧ Π ( t , [ Q] ,... } S ( wonder , t ) ∧ Π ( t , [ that S] ) NP] ) ˆ D ( know, [ D ( VERB, SYNTYPE ) = ∨ D ( VERB, SYNTYPE ) = ∨ that S] ) = 1 − ∏ t ∈ SEMTYPES S ( VERB , t ) ∧ Π ( t , SYNTYPE ) t ∈ SEMTYPES S ( VERB , t ) ∧ Π ( t , SYNTYPE ) Q] ,... } 1 − S ( know , t ) × Π ( t , [ that S] ) t ∈{ [ t ∈{ [ t ∈{ [ P] , [ P] , [ P] , [ [ [ P] P] [ [ Q] Q] [ [ that S] that S] [ [ whether S] whether S] [ NP] · · · · · · · · · · · · think think   0 . 94 1 0 . 03 0   [ [ P] P]   0 . 99 1 0 . 12 0 1   · · · · · · · · · · · · know know 0 . 97 1 0 . 91 1 [ [ Q] Q] 0 . 07 0 0 . 98 1 1 · · · · · · · · · · · ·                 . . . . . ... ... wonder wonder 0 . 17 0 0 . 93 1     · · · · · ·   . . . .  .      . . . . . . . . . ... ... · · · · · ·   . . . .   . . . . · · · · · · [ that S] [ that S] [ whether S] [ whether S] [ NP] · · · · · · think think   1 0 . 97 0 0 . 14 1   · · · · · · know know 1 0 . 95 1 0 . 99 1 · · · · · ·         wonder wonder 0 0 . 12 1 0 . 99 1     · · · · · ·     . . . . . ... ...   . . . . .   . . . . . · · · · · · 19

A boolean model of observed syntactic distribution ∀ t ∈ SYNTYPE : D ( wonder , t ) = ˆ D ( wonder , t ) ∧ N ( wonder , t ) [ that S] [ whether S] [ NP] [ that S] [ whether S] [ NP] · · · · · · think 1 0 1 think 1 1 1 · · · · · ·     know 1 1 1 know 1 1 1 · · · · · ·     wonder  0 1 1  wonder  1 1 0  · · · · · ·         . . . . . . ... ...     . . . . . . . . . . . . · · · · · · [ that S] [ whether S] [ NP] · · · think 1 0 1 · · ·   know 1 1 1 · · ·   wonder  0 1 0  · · ·    . . .  ...  . . .  . . . · · · 21

Animating abstractions Question What is this model useful for? Answer In conjunction with modern computational techniques, this model allow us to scale distributional analysis to an entire lexicon Basic idea Distributional analysis corresponds to reversing model arrows 22

The MegaAttitude data set

for 1000 clause-embedding verbs 50 syntactic frames MegaAttitude materials Ordinal (1-7 scale) acceptability ratings 25

50 syntactic frames MegaAttitude materials Ordinal (1-7 scale) acceptability ratings for 1000 clause-embedding verbs 25

Verb selection 26

MegaAttitude materials Ordinal (1-7 scale) acceptability ratings for 1000 clause-embedding verbs × 50 syntactic frames 27

Solution Construct semantically bleached frames using indefinites (6) Examples of responsives a. know + NP V {that, whether} S Someone knew {that, whether} something happened. b. tell + NP V NP {that, whether} S Someone told someone {that, whether} something happened. Sentence construction Challenge Automate construction of a very large set of frames in a way that is sufficiently general to many verbs 28

Frame construction Syntactic type NP PP S ACTIVE PASSIVE COMP TENSE that for [+Q] [+FIN] [-FIN] ∅ [ NP] [ NP PP] [ PP] [ PP S] [ NP S] [ S] to ∅ whether which NP -ed would -ing 29

(6) Examples of responsives a. know + NP V {that, whether} S Someone knew {that, whether} something happened. b. tell + NP V NP {that, whether} S Someone told someone {that, whether} something happened. Sentence construction Challenge Automate construction of a very large set of frames in a way that is sufficiently general to many verbs Solution Construct semantically bleached frames using indefinites 30

b. tell + NP V NP {that, whether} S Someone told someone {that, whether} something happened. Sentence construction Challenge Automate construction of a very large set of frames in a way that is sufficiently general to many verbs Solution Construct semantically bleached frames using indefinites (6) Examples of responsives a. know + NP V {that, whether} S Someone knew {that, whether} something happened. 30

Sentence construction Challenge Automate construction of a very large set of frames in a way that is sufficiently general to many verbs Solution Construct semantically bleached frames using indefinites (6) Examples of responsives a. know + NP V {that, whether} S Someone knew {that, whether} something happened. b. tell + NP V NP {that, whether} S Someone told someone {that, whether} something happened. 30

• 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences 31

• Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each 31

• Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list 31

• 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list 31

• Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants 31

• 5 judgments per item • No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice 31

• No annotator sees the same sentence more than once Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item 31

Data collection • 1,000 verbs × 50 syntactic frames = 50,000 sentences • 1,000 lists of 50 items each • Each verb only once per list • Each frame only once per list • 727 unique Mechanical Turk participants • Annotators allowed to do multiple lists, but never the same list twice • 5 judgments per item • No annotator sees the same sentence more than once 31

Task Turktools (Erlewine & Kotek 2015) 32

Validating the data Interannotator agreement Spearman rank correlation calculated by list on a pilot 30 verbs Pilot verb selection Same verbs used by White (2015), White et al. (2015), selected based on Hacquard & Wellwood’s (2012) attitude verb classifi- cation 1. Linguist-to-linguist median: 0.70, 95% CI: [0.62, 0.78] 2. Linguist-to-annotator median: 0.55, 95% CI: [0.52, 0.58] 3. Annotator-to-annotator median: 0.56, 95% CI: [0.53, 0.59] 33

Results 7 6 NP V whether S 5 4 3 2 1 1 2 3 4 5 6 7 NP V S 34

Results 7 wonder know 6 NP V whether S 5 4 think 3 2 1 want 1 2 3 4 5 6 7 NP V S 35

Model fitting and results

Challenges 2 50 T possible 1. Infeasible to search over 2 1000 T configurations ( T # of type signatures) 2. Finding the best boolean model fails to capture uncertainty inherent in judgment data Fitting the model Goal Find representations of verbs’ semantic type signatures and projection rules that best explain the acceptability judgments 38

Fitting the model Goal Find representations of verbs’ semantic type signatures and projection rules that best explain the acceptability judgments Challenges 1. Infeasible to search over 2 1000 T × 2 50 T possible configurations ( T = # of type signatures) 2. Finding the best boolean model fails to capture uncertainty inherent in judgment data 38

Going probabilistic Wrap boolean expressions in probability measures Fitting the model Solution Search probability distributions over verbs’ semantic type signatures and projection rules 39

Fitting the model Solution Search probability distributions over verbs’ semantic type signatures and projection rules Going probabilistic Wrap boolean expressions in probability measures 39

A boolean model of idealized syntactic distribution D ( know, [ ˆ D ( wonder, [ ˆ ˆ ˆ that S] ) = ∨ NP] ) = ∨ Q] ,... } S ( know , t ) ∧ Π ( t , [ Q] ,... } S ( wonder , t ) ∧ Π ( t , [ that S] ) NP] ) ˆ D ( know, [ D ( VERB, SYNTYPE ) = ∨ D ( VERB, SYNTYPE ) = ∨ that S] ) = 1 − ∏ t ∈ SEMTYPES S ( VERB , t ) ∧ Π ( t , SYNTYPE ) t ∈ SEMTYPES S ( VERB , t ) ∧ Π ( t , SYNTYPE ) Q] ,... } 1 − S ( know , t ) × Π ( t , [ that S] ) t ∈{ [ t ∈{ [ t ∈{ [ P] , [ P] , [ P] , [ [ [ P] P] [ [ Q] Q] [ [ that S] that S] [ [ whether S] whether S] [ NP] · · · · · · · · · · · · think think   0 . 94 1 0 . 03 0   [ [ P] P]   0 . 99 1 0 . 12 0 1   · · · · · · · · · · · · know know 0 . 97 1 0 . 91 1 [ [ Q] Q] 0 . 07 0 0 . 98 1 1 · · · · · · · · · · · ·                 . . . . . ... ... wonder wonder 0 . 17 0 0 . 93 1     · · · · · ·   . . . .  .      . . . . . . . . . ... ... · · · · · ·   . . . .   . . . . · · · · · · [ that S] [ that S] [ whether S] [ whether S] [ NP] · · · · · · think think   1 0 . 97 0 0 . 14 1   · · · · · · know know 1 0 . 95 1 0 . 99 1 · · · · · ·         wonder wonder 0 0 . 12 1 0 . 99 1     · · · · · ·     . . . . . ... ...   . . . . .   . . . . . · · · · · · 40

Wrapping with probabilities P ( S [ VERB , t ] ∧ Π [ t , SYNTYPE ]) = P ( S [ VERB , t ]) P ( Π [ t , SYNTYPE ] | S [ VERB , t ]) = P ( S [ VERB , t ]) P ( Π [ t , SYNTYPE ]) (∨ ) ( ) ∧ S [ VERB , t ] ∧ Π [ t , SYNTYPE ] ¬ ( S [ VERB , t ] ∧ Π [ t , SYNTYPE ]) = P ¬ P t t (∧ ) = 1 − P ¬ ( S [ VERB , t ] ∧ Π [ t , SYNTYPE ]) t ∏ = 1 − P ( ¬ ( S [ VERB , t ] ∧ Π [ t , SYNTYPE ])) t = 1 − ∏ 1 − P ( S [ VERB , t ] ∧ Π [ t , SYNTYPE ]) t = 1 − ∏ 1 − P ( S [ VERB , t ]) P ( Π [ t , SYNTYPE ]) t 41

Remaining challenge Don’t know the number of type signatures T Standard solution Fit the model with many type signatures and compare using an information criterion, e.g., the Akaike Information Criterion (AIC) Fitting the model Algorithm Projected gradient descent with adaptive gradient (Duchi et al. 2011) 42

Standard solution Fit the model with many type signatures and compare using an information criterion, e.g., the Akaike Information Criterion (AIC) Fitting the model Algorithm Projected gradient descent with adaptive gradient (Duchi et al. 2011) Remaining challenge Don’t know the number of type signatures T 42

A computational model of S-selection Aaron Steven White 1 2 Kyle - PowerPoint PPT Presentation

A computational model of S-selection Aaron Steven White 1 2 Kyle Rawlins 1 Semantics and Linguistic Theory 26 University of Texas, Austin 14 th May, 2016 Johns Hopkins University 1 Department of Cognitive Science 2 Center for Language and Speech

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?

STAT 213 Model Selection II Colin Reimer Dawson Oberlin College March 30, 2018 1 / 13 Outline

SECONDHAND SELECTION Sales Price - 275,000.00 EU SECONDHAND SELECTION INTERNAL VIEWS SECONDHAND

Variable selection bias Bias in Ensemble Bias in Ensemble Methods Methods Variable selection

SELECTION Deterministic Stochastic Proportionate selection: Roulette Wheel Selection

Selection 2 Selection Selection given a set of (distinct) elements, finding the element larger

MODEL SELECTION AND REGULARISATION MODEL SELECTION ESTIMATING THE ACCURACY OF THE MODEL We

Model Selection and Assumptions November 15, 2019 November 15, 2019 1 / 32 Forward Selection

STAT 213 Multicollinearity and Model Selection Colin Reimer Dawson Oberlin College 7 April 2016

Demo (Step 1, Selection) Demo (Step 1, Optimization) Demo (Step 2, Selection) Demo (Step 2,

Conference Site Selection Stephanie Sabal Program Coordinator: Site Selection sabal@acm.org

Selection Sort Section 10.2 Code for Selection Sort (cont.) Code for an Array Sort Code for an

Selection Rules: Selection Rules Each of the spectroscopies have associated selection

Bayesian Model Selection and Averaging Nonlinear Models Bayes factors Example Families FFX

Component selection 1 (c) 2020 A.J.M. Montagne Component selection + - + - + - 2 (c)

Class of 2024 1 Course selection worksheet 1 Course selection online directions for

The Status of Phoneme Inventories: The Role of Contrastive Feature Hierarchies B. Elan Dresher

General Philosophy General Philosophy Dr Peter Millican Millican, Hertford College , Hertford

GNU epsilon an extensible programming language Luca Saiu <positron@gnu.org> GNU Hackers

Crazy Ideas June 2015 Consciousness and Rationality Explained John Rushby Computer Science

Nixon, Light Switches and King Ludwig of Bavaria: How to Model Counterfactual Reasoning Katrin

Queens University Myself As a mathematician I am an artist. As a teacher my job is to be an

Making Sense of Word Sense 24 February, 2011 Deutschen Gesellschaft fr Sprachwissenschaft (DGfS)

Essencial: Adding value to healthcare through discontinuation of low-value practices Anna

A computational model of S-selection Aaron Steven White 1 2 Kyle - PowerPoint PPT Presentation

A computational model of S-selection Aaron Steven White 1 2 Kyle Rawlins 1 Semantics and Linguistic Theory 26 University of Texas, Austin 14 th May, 2016 Johns Hopkins University 1 Department of Cognitive Science 2 Center for Language and Speech

ERP Selection KIRTANE &amp; PANDIT Suhas Deshpande Why ERP Selection is important ?

STAT 213 Model Selection II Colin Reimer Dawson Oberlin College March 30, 2018 1 / 13 Outline

SECONDHAND SELECTION Sales Price - 275,000.00 EU SECONDHAND SELECTION INTERNAL VIEWS SECONDHAND

Variable selection bias Bias in Ensemble Bias in Ensemble Methods Methods Variable selection

SELECTION Deterministic Stochastic Proportionate selection: Roulette Wheel Selection

Selection 2 Selection Selection given a set of (distinct) elements, finding the element larger

MODEL SELECTION AND REGULARISATION MODEL SELECTION ESTIMATING THE ACCURACY OF THE MODEL We

Model Selection and Assumptions November 15, 2019 November 15, 2019 1 / 32 Forward Selection

STAT 213 Multicollinearity and Model Selection Colin Reimer Dawson Oberlin College 7 April 2016

Demo (Step 1, Selection) Demo (Step 1, Optimization) Demo (Step 2, Selection) Demo (Step 2,

Conference Site Selection Stephanie Sabal Program Coordinator: Site Selection sabal@acm.org

Selection Sort Section 10.2 Code for Selection Sort (cont.) Code for an Array Sort Code for an

Selection Rules: Selection Rules Each of the spectroscopies have associated selection

Bayesian Model Selection and Averaging Nonlinear Models Bayes factors Example Families FFX

Component selection 1 (c) 2020 A.J.M. Montagne Component selection + - + - + - 2 (c)

Class of 2024 1 Course selection worksheet 1 Course selection online directions for

The Status of Phoneme Inventories: The Role of Contrastive Feature Hierarchies B. Elan Dresher

General Philosophy General Philosophy Dr Peter Millican Millican, Hertford College , Hertford

GNU epsilon an extensible programming language Luca Saiu &lt;positron@gnu.org&gt; GNU Hackers

Crazy Ideas June 2015 Consciousness and Rationality Explained John Rushby Computer Science

Nixon, Light Switches and King Ludwig of Bavaria: How to Model Counterfactual Reasoning Katrin

Queens University Myself As a mathematician I am an artist. As a teacher my job is to be an

Making Sense of Word Sense 24 February, 2011 Deutschen Gesellschaft fr Sprachwissenschaft (DGfS)

Essencial: Adding value to healthcare through discontinuation of low-value practices Anna

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?

GNU epsilon an extensible programming language Luca Saiu <positron@gnu.org> GNU Hackers