
Formal Modeling in Cognitive Science

Lecture 27: Application of Mutual Information; Codes

Frank Keller
School of Informatics, University of Edinburgh
keller@inf.ed.ac.uk

March 12, 2006


1 Application: Discovering Collocations
    What are Collocations?
    The Naive Approach
    Using Mutual Information

2 Codes
    Source Codes
    Properties of Codes


Discovering Collocations

Remember collocations from Informatics 1B? Collocations
    are sequences of words that occur together;
    correspond to conventionalized, habitual ways of saying things;
    are often highly frequent in the language;
    contrast with other expressions that are near-synonyms but not conventionalized (strong tea vs. powerful tea; strong car vs. powerful car).

Task: automatically identify collocations in a large corpus.


Discovering Collocations

(1) He spoke English with a/an . . . French accent.
    a. average   b. careless   c. widespread   d. pronounced   e. chronic



Discovering Collocations

(2) He gave us a . . . account of all that you had achieved over there.
    a. ready   b. yellow   c. careless   d. luxury   e. glowing


Discovering Collocations

(3) Could you please give me a/an . . . account?
    a. itemized   b. dreadful   c. great   d. luxury   e. glowing


Discovering Collocations

(4) Kim and Sandy made . . . after the argument.
    a. with   b. about   c. off   d. up   e. for


Discovering Collocations

Why do we care about collocations?

In cognitive science:
    speakers of a language have strong intuitions about collocations (see previous slides); where do these intuitions come from?
    can collocational knowledge be learned from exposure?
    is simple co-occurrence frequency enough to learn them?

Engineering applications:
    collocations are different for different text types: discover them automatically to create dictionaries;
    translation systems have to replace a collocation in the source language with a valid collocation in the target language.

Can we discover collocations in corpora (large collections of text)?


The Naive Approach

The simplest way of finding collocations is counting. If two words occur together a lot, they form a collocation:
    go to a corpus;
    look for two-word combinations (bigrams);
    count their frequency;
    select the most frequent combinations;
    assume these are collocations.
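A minimal Python sketch of this counting step (the function name bigram_counts and the toy corpus are invented for illustration; a real corpus would contain millions of tokens):

    from collections import Counter

    def bigram_counts(tokens):
        # Count adjacent word pairs (bigrams) in a tokenized corpus.
        return Counter(zip(tokens, tokens[1:]))

    # Hypothetical toy corpus; in practice this would be a large text collection.
    tokens = "he made up with her and then she made up with him".split()
    for (w1, w2), count in bigram_counts(tokens).most_common(3):
        print(count, w1, w2)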


The Naive Approach

The most frequent bigrams in the corpus:

    c(w1, w2)   w1      w2
    89871       of      the
    58841       in      the
    26430       to      the
    21842       on      the
    21839       for     the
    18568       and     the
    16121       that    the
    15630       at      the
    15494       to      be
    . . .       . . .   . . .
    11428       New     York


Pointwise Mutual Information

As the previous example shows, the fact that two words co-occur a lot in a corpus does not mean that they form a collocation. If we have a set of candidate collocations (e.g., all co-occurrences of tea), we can use χ2 to filter them (see Informatics 1B); however, this doesn't work so well for discovering collocations from scratch. Instead, we use pointwise mutual information: intuitively, MI tells us how informative the occurrence of one word is about the occurrence of another word, and words that are highly informative about each other form a collocation.


Pointwise Mutual Information

Bigrams ranked by pointwise mutual information:

    I(w1; w2)   c(w1)   c(w2)   c(w1, w2)   w1              w2
    18.38       42      20      20          Ayatollah       Ruhollah
    17.98       41      27      20          Bette           Midler
    16.31       30      117     20          Agatha          Christie
    15.94       77      59      20          videocassette   recorder
    15.19       24      320     20          unsalted        butter
    1.09        14907   9017    20          first           made
    1.01        13484   10570   20          over            many
    0.53        14734   13487   20          into            them
    0.46        14093   14776   20          like            people
    0.29        15019   15629   20          time            last


Pointwise Mutual Information

Example
Take an entry from the table (logs are to base 2, so the result is in bits):

    I(x; y) = log [ f(x, y) / (f(x) f(y)) ] = log [ (c(x, y)/N) / ((c(x)/N) · (c(y)/N)) ]

    I(unsalted; butter) = log [ (20/14307668) / ((24/14307668) · (320/14307668)) ] = 15.19

This means: the amount of information we have about unsalted at position i increases by 15.19 bits if we are told that butter occurs at position i + 1 (i.e., uncertainty is reduced by 15.19 bits).
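A short Python check of this calculation, using the counts from the table above (the function name pmi is just for illustration):

    from math import log2

    def pmi(c_xy, c_x, c_y, n):
        # Pointwise mutual information in bits:
        # I(x; y) = log2( (c_xy/n) / ((c_x/n) * (c_y/n)) )
        return log2((c_xy / n) / ((c_x / n) * (c_y / n)))

    # c(unsalted) = 24, c(butter) = 320, c(unsalted, butter) = 20, corpus size N = 14307668
    print(round(pmi(20, 24, 320, 14307668), 2))   # -> 15.19, as in the table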


Source Codes

Definition: Source Code
A source code C for a random variable X is a mapping from x ∈ X to {0, 1}∗. Let C(x) denote the code word for x and l(x) denote the length of C(x). Here, {0, 1}∗ is the set of all finite binary strings (we will only consider binary codes).

Definition: Expected Length
The expected length L(C) of a source code C(x) for a random variable with probability distribution f(x) is:

    L(C) = Σ_{x ∈ X} f(x) l(x)


Source Codes

Example
Let X be a random variable with the following distribution and code word assignment:

    x      a     b     c     d
    f(x)   1/2   1/4   1/8   1/8
    C(x)   0     10    110   111

The expected code length of X is:

    L(C) = Σ_{x ∈ X} f(x) l(x) = 1/2 · 1 + 1/4 · 2 + 1/8 · 3 + 1/8 · 3 = 1.75
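A few lines of Python reproduce this calculation (the dictionary and function names are illustrative):

    def expected_length(probs, code):
        # L(C) = sum over x of f(x) * l(x), where l(x) = len(C(x))
        return sum(probs[x] * len(code[x]) for x in probs)

    probs = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
    code = {"a": "0", "b": "10", "c": "110", "d": "111"}
    print(expected_length(probs, code))   # -> 1.75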


Properties of Codes

Definition: Non-singular Code
A code is called non-singular if every x ∈ X maps into a different string in {0, 1}∗.

If a code is non-singular, then we can transmit a value of X unambiguously. However, what happens if we want to transmit several values of X in a row? We could use a special symbol to separate the code words, but this is not an efficient use of the special symbol; instead we use self-punctuating codes (prefix codes).


Properties of Codes

Definition: Extension
The extension C∗ of a code C is:

    C∗(x1 x2 . . . xn) = C(x1) C(x2) . . . C(xn)

where C(x1) C(x2) . . . C(xn) denotes the concatenation of the corresponding code words.

Definition: Uniquely Decodable
A code is called uniquely decodable if its extension is non-singular.

If a code is uniquely decodable, then for each coded string there is only one source string that could have produced it; however, we may have to look at the whole string to do the decoding.


Properties of Codes

Definition: Prefix Code
A code is called a prefix code (instantaneous code) if no code word is a prefix of another code word. We don't have to wait for the whole string to be able to decode it; the end of a code word can be recognized instantaneously.

Example
The code in the previous example is a prefix code. Take the following sequence: 01011111010. The first symbol, 0, tells us we have an a; the next two symbols, 10, have to correspond to b; the next three symbols, 111, have to correspond to d, and so on. The decoded sequence is abdcb.
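A small Python sketch of this instantaneous decoding procedure (the function and variable names are invented for illustration):

    def decode_prefix(bits, code):
        # Decode with a prefix code: emit a symbol as soon as the
        # accumulated bits match a code word -- no lookahead is needed.
        inverse = {codeword: symbol for symbol, codeword in code.items()}
        decoded, buffer = [], ""
        for bit in bits:
            buffer += bit
            if buffer in inverse:
                decoded.append(inverse[buffer])
                buffer = ""
        if buffer:
            raise ValueError("leftover bits do not form a complete code word")
        return "".join(decoded)

    code = {"a": "0", "b": "10", "c": "110", "d": "111"}
    print(decode_prefix("01011111010", code))   # -> abdcb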


Properties of Codes

Example
The following table illustrates the different classes of codes:

    x    Singular   Non-singular, not    Uniquely decodable,   Instantaneous
                    uniquely decodable   not instantaneous
    a    0          0                    10                    0
    b    0          010                  00                    10
    c    0          01                   11                    110
    d    0          10                   110                   111
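The prefix property of the last two columns can be checked mechanically; a possible Python sketch (the function name is illustrative):

    def is_prefix_code(codewords):
        # True iff no code word is a proper prefix of another code word.
        return not any(
            u != v and v.startswith(u)
            for u in codewords for v in codewords
        )

    print(is_prefix_code(["0", "10", "110", "111"]))   # True:  the instantaneous code
    print(is_prefix_code(["10", "00", "11", "110"]))   # False: "11" is a prefix of "110"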


Summary

Collocations are sequences of words that occur together; simple co-occurrence frequency in a corpus is not enough to discover collocations; instead, use the pointwise mutual information of the two words. A code is uniquely decodable if there is only one possible source sequence for every code sequence; a code is instantaneous (a prefix code) if no code word is a prefix of another code word.
