Informativeness: A review of work by Regier and colleagues (and a response)


  1. Informativeness: A review of work by Regier and colleagues (and a response). Jon W. Carr, Centre for Language Evolution, School of Philosophy, Psychology and Language Sciences, University of Edinburgh

  2. What shapes language? [Diagram: language is shaped by two processes; learning is associated with compressibility and simplicity, communication with expressivity and informativeness]

  3. How do learning and communication shape the structure of semantic categories?

  4. How do learning and communication shape the structure of semantic categories? Learning exerts a pressure for simplicity; communication exerts a pressure for informativeness.

  5. Kinship terms are simple and informative (Kemp & Regier, 2012). [Figure: attested kinship systems, including English and Northern Paiute, plotted in a trade-off space; x-axis (0-100): ⬅ Simple; y-axis (0-4): ⬅ Informative]

  6. Learning and communication in the CLE framework. [Figure: the same ⬅ Simple / ⬅ Informative trade-off space]

  7. Learning and communication in the CLE framework. Learning alone (Kirby, Cornish, & Smith, 2008) yields a simple system in which a handful of signals covers the whole meaning space: tuge (×9), tupim (×3), miniku (×3), tupin (×3), poi (×9). [Figure: this system plotted in the ⬅ Simple / ⬅ Informative space]

  8. Learning and communication in the CLE framework. Learning alone (Kirby, Cornish, & Smith, 2008) yields a simple system with few distinct signals: tuge (×9), tupim (×3), miniku (×3), tupin (×3), poi (×9). Communication alone (Kirby, Tamariz, Cornish, & Smith, 2015) yields an expressive but holistic system with a distinct signal for each meaning: pihino, nemone, piga, kawake, kapa, gakho, wuwele, nepi, newhomo, kamone, gaku, hokako. [Figure: both systems plotted in the ⬅ Simple / ⬅ Informative space]

  9. Learning and communication in the CLE framework. Learning alone (Kirby, Cornish, & Smith, 2008): tuge (×9), tupim (×3), miniku (×3), tupin (×3), poi (×9). Communication alone (Kirby, Tamariz, Cornish, & Smith, 2015): pihino, nemone, piga, kawake, kapa, gakho, wuwele, nepi, newhomo, kamone, gaku, hokako. Learning and communication together (Kirby, Tamariz, Cornish, & Smith, 2015) yield a structured system:
     egewawu, egewawa, egewuwu, ege
     mega, megawawa, megawuwu, wulagi
     gamenewawu, gamenewawa, gamenewuwu, gamene
     [Figure: all three systems plotted in the ⬅ Simple / ⬅ Informative space]

  10. Summary
     Pressure from learning:
     - Compressibility (CLE): To what extent can the language be compressed? Measure: MDL, gzip, entropy.
     - Simplicity (Regier): How many words does an individual need to remember? Measure: number of words, number of rules.
     Pressure from communication:
     - Expressivity (CLE): How many meaning distinctions does the language allow? Measure: number of words.
     - Informativeness (Regier): How effectively can a meaning be transmitted? Measure: communicative cost.

  11. Summary
     Pressure from learning (Compressibility): To what extent can the language be compressed? Measure: MDL, gzip, entropy, i.e. the bits required to represent the language.
     Pressure from communication (Informativeness): How effectively can a meaning be transmitted? Measure: communicative cost, i.e. the bits lost during communication.
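     As a rough illustration of the compressibility measure, "bits required to represent the language" can be proxied by compressed size. A minimal sketch using Python's zlib; the pairing of example systems is my own, with the signals taken from slides 7 and 8 below:

        import zlib

        def compressed_bits(signals):
            """Bits in the zlib-compressed concatenation of a list of signals."""
            return 8 * len(zlib.compress(" ".join(signals).encode("utf-8")))

        # A degenerate, highly learnable system: few distinct signals (cf. slide 7)
        simple = ["tuge"] * 6 + ["poi"] * 6
        # A holistic, highly expressive system: every signal distinct (cf. slide 8)
        holistic = ["pihino", "nemone", "piga", "kawake", "kapa", "gakho",
                    "wuwele", "nepi", "newhomo", "kamone", "gaku", "hokako"]

        print(compressed_bits(simple))    # fewer bits: repetition compresses well
        print(compressed_bits(holistic))  # more bits: no redundancy to exploit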

  12. Communicative cost

  13. Communicative cost: High-level overview

  14. Communicative cost: Low-level details
     To compute the cost of a category partition, we start by considering an individual target meaning t and compute how much error would be incurred in trying to reconstruct that target. Reconstruction error is defined as the Kullback-Leibler divergence between the speaker distribution s and the listener distribution l:

     $$D_{KL}(s \| l) = \sum_{i \in U} s(i) \log_2 \frac{s(i)}{l(i)} = \log_2 \frac{1}{l(t)}$$

     (the simplification holds because the speaker distribution places all its probability mass on the target t). Summing the divergences for all targets, weighted by their need probabilities, yields the communicative cost of the partition:

     $$k = \sum_{t \in U} p(t) \, D_{KL}(s \| l) = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)}$$
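     Read as code, the two formulas amount to the following; a minimal sketch (function and variable names are my own):

        import math

        def kl_divergence(s, l):
            """D_KL(s || l) in bits; terms with s(i) = 0 contribute nothing."""
            return sum(si * math.log2(si / li) for si, li in zip(s, l) if si > 0)

        def communicative_cost(p, speakers, listeners):
            """k = sum over targets t of p(t) * D_KL(s_t || l_t), where l_t is the
            listener distribution for the category the speaker uses for t."""
            return sum(pt * kl_divergence(s, l)
                       for pt, s, l in zip(p, speakers, listeners))

     With a point-mass speaker distribution, kl_divergence reduces to log2(1/l(t)), matching the simplification above.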

  15. Communicative cost: Example of a discrete categorizer
     universe: $U = \{i_1, i_2, \ldots, i_{16}\}$
     category partition: $P = \{C_1, C_2, C_3, C_4\} = \{\{i_1, \ldots, i_4\}, \{i_5, \ldots, i_8\}, \{i_9, \ldots, i_{12}\}, \{i_{13}, \ldots, i_{16}\}\}$
     speaker's lexicon: $S = \{C_1 \to w_1, C_2 \to w_2, C_3 \to w_3, C_4 \to w_4\}$ (one signal per category)
     listener's lexicon: $L = \{w_1 \to C_1, w_2 \to C_2, w_3 \to C_3, w_4 \to C_4\}$
     need probabilities: $p = [\tfrac{1}{16}, \tfrac{1}{16}, \ldots, \tfrac{1}{16}]$ (uniform)
     speaker distributions (one per meaning): $s_1 = [1, 0, \ldots, 0]$, $s_2 = [0, 1, 0, \ldots, 0]$, ..., $s_{16} = [0, \ldots, 0, 1]$
     listener distributions (one per category): $l_{C_1} = [\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, 0, \ldots, 0]$, and likewise $l_{C_2}$, $l_{C_3}$, $l_{C_4}$ put probability $\tfrac{1}{4}$ on each of their own four meanings.

     $$k = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)} = \sum_{t \in U} \frac{1}{16} \log_2 \frac{1}{1/4} = 16 \left( \frac{1}{16} \log_2 4 \right) = 2 \text{ bits}$$
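     Plugging the slide's numbers into the cost formula reproduces the 2-bit result; a minimal self-contained check:

        import math

        # Discrete categorizer: 16 equiprobable meanings, 4 categories of 4,
        # so the listener assigns every target probability 1/4.
        p, l = 1 / 16, 1 / 4
        k = sum(p * math.log2(1 / l) for _ in range(16))
        print(k)  # 2.0 bits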

  16. Communicative cost: Example of a discrete categorizer
     Why 2 bits? With the same setup as above, an ideal system would give each of the 16 meanings its own signal, which requires 4-bit signals (0000, 0001, 0010, ..., 1111). The actual system distinguishes only the four categories, which requires just 2-bit signals (00, 01, 10, 11); the pressure from learning prefers this more compressed system. The result is a loss of information on every communicative episode: 4 bits − 2 bits = 2 bits.
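     The same 2 bits can be read off as a difference of signal entropies (a restatement of the slide's arithmetic):

        import math

        ideal = math.log2(16)   # 4 bits: a distinct signal for each of 16 meanings
        actual = math.log2(4)   # 2 bits: a distinct signal for each of 4 categories
        print(ideal - actual)   # 2.0 bits lost on every communicative episode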

  17. Communicative cost: Listener distributions
     Humans aren't discrete categorizers; in human cognition, we see two effects: (a) within-category prototypicality and (b) across-category fuzziness. Instead, the listener distributions can be modelled as Gaussians:

     $$l_C(i) \propto \sum_{j \in C} e^{\gamma \, d(i,j)}$$

     where γ allows you to model various types of categorizer. [Figure: example listener distributions for a discrete categorizer, a fuzzy categorizer, and a non-categorizer, plotted over the meanings 1-4, 5-8, 9-12, 13-16]
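     A minimal sketch of this listener model (function names are mine; a circular distance over the 16 meanings and γ ≤ 0 are assumptions, with γ → −∞ approaching the discrete categorizer and γ = 0 the non-categorizer):

        import math

        def listener_distribution(category, gamma, n=16):
            """l_C(i) proportional to the summed similarity of i to C's members."""
            def d(i, j):
                # assumed: circular distance over the n meanings
                return min(abs(i - j), n - abs(i - j))
            weights = [sum(math.exp(gamma * d(i, j)) for j in category)
                       for i in range(n)]
            total = sum(weights)
            return [w / total for w in weights]

        C1 = [0, 1, 2, 3]
        print(listener_distribution(C1, gamma=-50))   # ~discrete: all mass on C1
        print(listener_distribution(C1, gamma=-0.1))  # fuzzy: graded over all meanings
        print(listener_distribution(C1, gamma=0))     # non-categorizer: uniform 1/16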

  18. Communicative cost: Example of a fuzzy categorizer
     Same universe, category partition, lexicons, need probabilities, and speaker distributions as in the discrete example, but the listener distributions are now fuzzy:
     $l_{C_1} = [.079, .082, .082, .079, .071, .064, .058, .053, .048, .045, .045, .048, .053, .058, .064, .071]$
     $l_{C_2} = [.053, .058, .064, .071, .079, .082, .082, .079, .071, .064, .058, .053, .048, .045, .045, .048]$
     $l_{C_3} = [.048, .045, .045, .048, .053, .058, .064, .071, .079, .082, .082, .079, .071, .064, .058, .053]$
     $l_{C_4} = [.071, .064, .058, .053, .048, .045, .045, .048, .053, .058, .064, .071, .079, .082, .082, .079]$

     $$k = \sum_{t \in U} p(t) \log_2 \frac{1}{l(t)} = 3.636 \text{ bits}$$
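     Using the (rounded) listener values from the slide, the cost comes out at roughly the stated figure; a quick check:

        import math

        p = 1 / 16
        # By symmetry, every category contributes the same four within-category
        # listener values, so only those four are needed (taken from the slide).
        within = [0.079, 0.082, 0.082, 0.079]
        k = sum(p * math.log2(1 / l) for l in within * 4)
        print(round(k, 3))  # 3.635 (the slide's 3.636 was computed before rounding)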

  19. Communicative cost: Six predictions
     Expressivity: A system of many categories is more informative than a system of few categories.
     Balanced categories: A system of equally sized categories is more informative than a system of unequally sized categories.
     Dimensionality: A system that uses many dimensions is less (?) informative than a system that uses few dimensions.
     Convexity: A system of convex categories is more informative than a system of nonconvex categories.
     Discreteness: A system of discrete categories is more informative than a system of fuzzy categories.
     Compactness: A system of compact categories is more informative than a system of noncompact categories.
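     For instance, the balanced-categories prediction falls straight out of the discrete cost formula from slide 15; a toy check (the partition sizes are my own example):

        import math

        def discrete_cost(sizes, n=16):
            """Cost of a discrete partition under uniform need probabilities:
            a target in a category of size m has l(t) = 1/m, so the m targets
            of that category together contribute (m/n) * log2(m)."""
            return sum((m / n) * math.log2(m) for m in sizes)

        print(discrete_cost([4, 4, 4, 4]))   # 2.0 bits (balanced)
        print(discrete_cost([1, 1, 1, 13]))  # ~3.01 bits (unbalanced costs more)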
