  1. Online Entropy-based Model of Lexical Category Acquisition
  Grzegorz Chrupała, Afra Alishahi
  Spoken Language Systems and Department of Computational Linguistics, Saarland University
  CoNLL 2010

  2. Outline
  1. Lexical category acquisition in humans
  2. Online information-theoretic model
  3. Task-based evaluation

  3. Outline
  1. Lexical category acquisition in humans
  2. Online information-theoretic model
  3. Task-based evaluation

  4. Human category acquisition
  Humans incrementally learn lexical categories from exposure to language
  ◮ Children form robust lexical categories early on [Gelman and Taylor, 1984; Kemp et al., 2005]
  The distributional properties of a word provide cues about its category
  ◮ Children are sensitive to co-occurrence statistics [Aslin et al., 1998]
  ◮ Child-directed speech provides contextual evidence for learning categories [Redington et al., 1998; Mintz, 2002]

  5. Unsupervised category induction
  Many unsupervised models use distributional information to learn categories [Brown et al., 1992; Clark, 2003; Goldwater and Griffiths, 2007]
  ◮ But most are not cognitively plausible, because they:
    ◮ process data in batch mode
    ◮ categorize word types instead of word tokens
    ◮ pre-define the number of categories

  6. Online category induction
  A few online models of category induction have been proposed [Cartwright and Brent, 1997; Parisien et al., 2008]
  ◮ More cognitively motivated
  ◮ But they may require large amounts of training and be over-sensitive to context variation
  We propose:
  ◮ A simple algorithm which incrementally learns an unbounded number of categories
  ◮ A task-based approach to evaluating human categorization models

  7. Outline
  1. Lexical category acquisition in humans
  2. Online information-theoretic model
  3. Task-based evaluation

  8. Informativeness versus parsimony
  A good categorization model partitions words into discrete categories such that:
  ◮ The number and distribution of categories are as simple as possible
  ◮ The categories are highly informative about their members
  In other words, it trades off parsimony against informativeness (goodness of fit)

  9. Joint entropy criterion
  Parsimony:
  H(Y) = -\sum_{i=1}^{N} P(Y = y_i) \log_2 P(Y = y_i)   (1)
  Informativeness:
  H(X|Y) = \sum_{i=1}^{N} P(Y = y_i) H(X | Y = y_i)   (2)
  Minimizing the joint entropy minimizes the sum of both:
  H(X, Y) = H(Y) + H(X|Y)   (3)
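A minimal Python sketch of equations (1)–(3), assuming the observations arrive as (feature value, category) pairs; the representation and function name are illustrative, not the authors' implementation:

```python
from collections import Counter
from math import log2

def entropies(pairs):
    """H(Y), H(X|Y) and H(X,Y) estimated from a list of (x, y)
    observations, where y is a category and x a feature value."""
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (x, y)
    y_counts = Counter(y for _, y in pairs)     # counts of y

    # Parsimony, eq. (1): H(Y) = -sum_i P(y_i) log2 P(y_i)
    h_y = -sum((c / n) * log2(c / n) for c in y_counts.values())

    # Informativeness, eq. (2): H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y)
    h_x_given_y = -sum((c / n) * log2(c / y_counts[y])
                       for (x, y), c in joint.items())

    # Joint entropy, eq. (3): H(X,Y) = H(Y) + H(X|Y)
    return h_y, h_x_given_y, h_y + h_x_given_y
```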

  10. Joint minimization for multiple variables
  Optimize simultaneously for all M features:
  \sum_{j=1}^{M} H(X_j, Y) = \sum_{j=1}^{M} [H(X_j | Y) + H(Y)] = \sum_{j=1}^{M} H(X_j | Y) + M \times H(Y)   (4)

  11. Incremental updates
  At time t, find the best category assignment \hat{y} for the current token:
  \hat{y} = y_{N+1}   if \forall y_n [\Delta H^t_{y_{N+1}} \le \Delta H^t_{y_n}]
  \hat{y} = \operatorname{argmin}_{y \in \{y_i\}_{i=1}^{N}} \Delta H^t_y   otherwise   (5)
  where
  \Delta H^t_y = \sum_{j=1}^{M} [H^t_y(X_j, Y) - H^{t-1}(X_j, Y)]   (6)
  H^t(X_j, Y) can be computed incrementally.
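The assignment rule in equations (5)–(6) can be sketched as follows, under stated assumptions: each token is a tuple of M feature values (e.g. its left and right context words), ties favour a new category as in eq. (5), and the joint entropies are recomputed from counts for clarity, whereas the paper computes H^t(X_j, Y) incrementally; all names here are hypothetical.

```python
from collections import Counter
from math import log2

class OnlineCategorizer:
    """Toy sketch of the online assignment step (eq. 5-6): each token,
    a tuple of M feature values, joins the category that increases
    sum_j H(X_j, Y) the least, or a brand-new category on ties."""

    def __init__(self, n_features):
        self.n = 0                                   # tokens seen
        self.y_counts = Counter()                    # category sizes
        self.xy_counts = [Counter() for _ in range(n_features)]

    def _joint_entropy_sum(self):
        # sum_j H(X_j, Y) = -sum_j sum_{x,y} P(x,y) log2 P(x,y)
        return -sum((c / self.n) * log2(c / self.n)
                    for table in self.xy_counts for c in table.values())

    def _update(self, features, y, delta):
        self.n += delta
        self.y_counts[y] += delta
        if self.y_counts[y] == 0:
            del self.y_counts[y]
        for table, x in zip(self.xy_counts, features):
            table[(x, y)] += delta
            if table[(x, y)] == 0:
                del table[(x, y)]

    def assign(self, features):
        new_y = max(self.y_counts, default=-1) + 1
        best_y, best_h = None, None
        # try the new category first so that, on ties, it wins (eq. 5)
        for y in [new_y] + list(self.y_counts):
            self._update(features, y, +1)            # tentative add
            h = self._joint_entropy_sum()
            self._update(features, y, -1)            # undo
            if best_h is None or h < best_h:
                best_y, best_h = y, h
        self._update(features, best_y, +1)           # commit
        return best_y
```

A real implementation would maintain the entropy deltas incrementally rather than rescanning the count tables for every candidate category.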

  12. Outline
  1. Lexical category acquisition in humans
  2. Online information-theoretic model
  3. Task-based evaluation

  13. Data
  Manchester portion of CHILDES, mothers' turns; one-word sentences and punctuation are discarded.
  Data Set      Sessions  #Sentences  #Words
  Training      26–28     22,491      125,339
  Development   29–30     15,193      85,361
  Test          32–33     14,940      84,130

  14. Labeling with categories
  ∆H: categories induced from the training set (features drawn from the word in context, e.g. "want to try them on")
  PoS: POS tags from the Manchester corpus
  Words: word types
  Parisien: categories induced from the training set by the Bayesian model of [Parisien et al., 2008]

  15. Example clusters

  16. How to evaluate induced categories?
  Against gold POS tags
  ◮ Arbitrary choice of granularity and/or criteria for membership
  Task-based evaluation
  ◮ Different tasks may call for different category representations
  Proposal: evaluate on a number of tasks, simulating key aspects of human language processing

  17. Evaluation against POS labels
  Variation of Information: VI(X, X') = H(X) + H(X') - 2 I(X, X')
  Adjusted Rand Index (ARI)
  [bar charts: VI (scale 0–5) and ARI (scale 0–100) for the Gold, Words, Parisien, and ∆H clusterings]
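A small sketch of the VI computation, assuming the two clusterings are given as equal-length lists of per-token labels (ARI is omitted; scikit-learn's adjusted_rand_score is one existing implementation):

```python
from collections import Counter
from math import log2

def variation_of_information(a, b):
    """VI(X, X') = H(X) + H(X') - 2 I(X, X') for two clusterings of
    the same tokens, given as equal-length lists of cluster labels."""
    n = len(a)
    def entropy(labels):
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())
    h_a, h_b = entropy(a), entropy(b)
    h_ab = entropy(list(zip(a, b)))     # joint entropy H(X, X')
    i_ab = h_a + h_b - h_ab             # mutual information I(X, X')
    return h_a + h_b - 2 * i_ab         # lower is better; 0 = identical
```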

  18. Task-based evaluation
  Word prediction
  ◮ Guess a missing word based on its sentential context
  Semantic feature prediction
  ◮ Predict the semantic properties of a novel word based on context
  Grammaticality judgement
  ◮ Assess the syntactic well-formedness of a sentence based on the category labels assigned to its words

  19. Word prediction
  Human subjects are remarkably accurate at guessing words from context, e.g. in a Cloze test:
  "Petroleum, or crude oil, is one of the world's (1) ___ natural resources. Plastics, synthetic fibres, and (2) ___ chemicals are produced from petroleum. It is also used to make lubricants and waxes. (3) ___, its most important use is as a fuel for heating, for (4) ___ electricity, and (5) ___ for powering vehicles."
  A. as important  B. most important  C. so importantly  D. less importantly  E. too important

  20. Word prediction: reciprocal rank
  Example: for the context "want to ___ them on", the model predicts category y_123 = (make, take, put, get, sit, eat, let). The correct word "put" is at rank 3 within y_123, so rank^{-1} = 1/3.
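As a toy illustration of the example above (the ordering of category members is assumed to reflect the model's ranking):

```python
def reciprocal_rank(word, ranked_members):
    """1/rank of the target word among a category's ranked members,
    or 0 if the word is not in the category."""
    return (1.0 / (ranked_members.index(word) + 1)
            if word in ranked_members else 0.0)

# "want to ___ them on": y_123 ranks "put" third, so RR = 1/3
print(reciprocal_rank("put", ["make", "take", "put", "get", "sit", "eat", "let"]))
```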

  21. Word prediction: variants
  ∆H_max: P(w|h) = P(w | \operatorname{argmax}_{y_i} R(y_i|h)^{-1})
  ∆H_Σ: P(w|h) = \frac{\sum_{i=1}^{N} P(w|y_i) R(y_i|h)^{-1}}{\sum_{i=1}^{N} R(y_i|h)^{-1}}
  where R(y_i|h) is the rank of category y_i given the context h
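A sketch of the ∆H_Σ variant; the dictionary-based interfaces for R(y_i|h) and P(w|y_i) are assumptions for illustration:

```python
def predict_word_prob(word, category_ranks, word_given_cat):
    """P(w|h) under Delta-H-Sigma: a mixture of per-category word
    probabilities P(w|y_i), each weighted by the reciprocal rank
    R(y_i|h)^-1 of the category for this context.

    category_ranks: {category: rank R(y_i|h) given the context h}
    word_given_cat: {category: {word: P(word | category)}}
    """
    weights = {y: 1.0 / r for y, r in category_ranks.items()}
    norm = sum(weights.values())
    return sum(w * word_given_cat[y].get(word, 0.0)
               for y, w in weights.items()) / norm
```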

  22. Word prediction: results
  [bar chart: MRR (scale 0–35) for Gold POS, Parisien, ∆H_max, and ∆H_Σ]

  23. Comparison to n-gram language models
  [bar chart: MRR (scale 0–35) for Gold, n-gram language models of order 1–5 (LM1–LM5), and ∆H_Σ]

  24. Predicting semantic properties
  "Look, this is a zav! Point to the zav." vs. "Look, this is Zav! Point to Zav."
  [Gelman and Taylor, 1984]: 2-year-olds treat words preceded by a determiner ("the zav") as common nouns, and interpret them as category members (a block-like toy).

  25. Predicting semantic properties
  "Look, this is Zav! Point to Zav."
  [Gelman and Taylor, 1984]: 2-year-olds treat words not preceded by a determiner ("Zav") as proper nouns, and interpret them as individuals (an animal-like toy).

  26. Semantic features from WordNet and VerbNet
  The semantic profile of a category is the multiset union of the semantic feature sets of its members.
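One way to read this in code, treating the multiset union as a feature-count sum over members (the toy lexicon is hypothetical, standing in for WordNet/VerbNet feature sets):

```python
from collections import Counter

def semantic_profile(members, features_of):
    """Multiset union of the members' semantic feature sets: each
    feature occurs once per member that carries it, so features shared
    by many members dominate the category's profile."""
    profile = Counter()
    for word in members:
        profile.update(features_of.get(word, set()))
    return profile

# hypothetical feature sets standing in for WordNet entries
lexicon = {"cake":  {"entity", "substance", "food", "solid"},
           "bread": {"entity", "substance", "food"}}
print(semantic_profile(["cake", "bread"], lexicon))
# Counter({'entity': 2, 'substance': 2, 'food': 2, 'solid': 1})
```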

  27. Semantic feature prediction task
  Example: for the context "I had ___ for lunch", the model predicts category y_123 with ranked semantic profile F = (entity, substance, matter, food, solid, ...); the reference features of the target word "cake" are R = {cake, baked goods, food, edible substance, ...}.
  AP(F, R) = \frac{1}{|R|} \sum_{r=1}^{|F|} P(r) \times 1_R(F_r)   (7)
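Equation (7) in code, a minimal sketch assuming F is the category's semantic profile ranked from most to least frequent feature and R the reference feature set of the target word:

```python
def average_precision(ranked_profile, reference):
    """AP(F, R) = (1/|R|) * sum_r P(r) * 1_R(F_r), eq. (7): walk down
    the ranked profile F and, at every rank r whose feature is in the
    reference set R, add the precision P(r) at that rank."""
    hits, ap = 0, 0.0
    for r, feature in enumerate(ranked_profile, start=1):
        if feature in reference:
            hits += 1
            ap += hits / r                 # precision at rank r
    return ap / len(reference)

profile = ["entity", "substance", "matter", "food", "solid"]
cake = {"cake", "baked goods", "food", "edible substance"}
print(average_precision(profile, cake))    # only "food" at rank 4: 0.0625
```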

  28. Predicting semantic properties: results
  [bar chart: MAP (scale 0–35) for Gold POS, Parisien, and ∆H]

  29. Grammaticality judgement
  Both children and adults have a reliable concept of what is grammatical [Theakston, 2004]:
  "She gave the book me": Is it ok, or is it a bit silly? → Silly
  "She gave me the book": Is it ok, or is it a bit silly? → OK
