
Probabilistic Models of Human Sentence Processing

Cognitive Modeling Guest Lecture 2 Frank Keller

School of Informatics University of Edinburgh keller@inf.ed.ac.uk

November 9, 2006

Outline

1. From Sentence to Text
   • Entropy Rate Principle
   • Predictions

2. Experiment 1: Entropy and Sentence Length
   • Method, Results, Discussion

3. Experiment 2: Entropy in Context
   • Method, Results, Discussion

4. Experiment 3: Entropy out of Context
   • Method, Results, Discussion

From Sentence to Text

• We have successfully modeled the processing of individual sentences using probabilistic models.
• Can the probabilistic approach be extended to text, i.e., to connected sequences of sentences?
• Idea: use notions from information theory to formalize the relationship between processing effort for sentences and processing effort for text.


From Sentence to Text

Entropy Rate Principle: speakers produce language whose entropy rate is on average constant (Genzel and Charniak 2002, 2003; henceforth G&C).

Motivation from information theory:
• the most efficient way of transmitting information through a noisy channel is at a constant rate;
• if human communication has evolved to be optimal, then humans should produce text and speech with constant entropy;
• there is some evidence for this in speech (Aylett 1999).


Entropy Rate Principle

Applying the Entropy Rate Principle (ERP) to text:

• entropy is constant, but the amount of context available to the speaker increases with sentence position;
• prediction: if we measure entropy out of context (i.e., based on the probability of isolated sentences), then entropy should increase with sentence position;
• G&C show that this is true for both function and content words, and for a range of languages and genres;
• entropy can be estimated using a language model or a probabilistic parser.
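One way to spell out the reasoning behind this prediction (a sketch of the standard information-theoretic argument; the decomposition is not given explicitly on the slides): the out-of-context entropy of the i-th sentence $X_i$ splits into its in-context entropy plus the mutual information between the sentence and its context $C_i$:

  $H(X_i) = H(X_i \mid C_i) + I(X_i; C_i)$

If the ERP keeps the in-context term $H(X_i \mid C_i)$ constant, and the shared information $I(X_i; C_i)$ grows as context accumulates, then the out-of-context entropy $H(X_i)$ must increase with sentence position.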


Entropy Rate Principle

Sentences in context:

  1. a a a a a a a
  2. b b b b b b b
  3. c c c c c c c
  4. d d d d d d d
  5. e e e e e e e

Sentences out of context:

  a a a a a a a   H = 0.4
  e e e e e e e   H = 0.7


Predictions for Human Language Processing

Out-of-context predictions:

• out-of-context entropy increases with sentence position; tested extensively by G&C (replicated in Exp. 1);
• out-of-context processing effort increases with sentence position; reading time serves as an index of processing effort; prediction: out-of-context reading time is correlated with sentence position (Exp. 3).


Predictions for Human Language Processing

In-context predictions:

• in-context entropy is on average the same for all sentences; prediction: in-context reading time is not correlated with sentence position (Exp. 2);
• processing effort increases with entropy; reading time serves as an index of processing effort; prediction: reading time is correlated with entropy (Exp. 2).


Experiment 1: Method

Replication of G&C’s results:

• Wall Street Journal corpus (1M words), divided into a training and a test set;
• treat each article as a separate text; compute sentence position by counting from the beginning of the text (positions 1–149);
• compute per-word entropy using an n-gram language model:

  \hat{H}(X) = -\frac{1}{|X|} \sum_{x_i \in X} \log P(x_i \mid x_{i-(n-1)} \ldots x_{i-1})

Extension of G&C’s results:

• correlation computed on raw data or on binned data (averaged by position);
• baseline model: sentence length |X|.
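To make the estimate concrete, here is a minimal Python sketch of the per-word entropy computation for a trigram model (n = 3). The <s>/</s> padding, the function names, and the add-one smoothing are illustrative assumptions; the slides do not specify the smoothing scheme.

```python
import math
from collections import defaultdict

def train_trigram(corpus):
    """Collect trigram and bigram-history counts from tokenized sentences."""
    tri, bi, vocab = defaultdict(int), defaultdict(int), set()
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
            bi[(toks[i - 2], toks[i - 1])] += 1
    return tri, bi, len(vocab)

def per_word_entropy(sent, tri, bi, vocab_size):
    """H^(X) = -(1/|X|) sum_i log2 P(x_i | x_{i-2} x_{i-1}), add-one smoothed."""
    toks = ["<s>", "<s>"] + sent + ["</s>"]
    log_prob = 0.0
    for i in range(2, len(toks)):
        hist = (toks[i - 2], toks[i - 1])
        # Add-one smoothing over the vocabulary.
        p = (tri[(hist[0], hist[1], toks[i])] + 1) / (bi[hist] + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / (len(toks) - 2)  # average over predicted tokens

# Toy usage with a two-sentence "corpus":
tri, bi, V = train_trigram([["the", "market", "fell"],
                            ["the", "market", "rose", "sharply"]])
print(per_word_entropy(["the", "market", "fell"], tri, bi, V))
```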


Results

Correlation of sentence entropy and sentence position (c = 25):

                      Binned data   Raw data
  Entropy (3-gram)       0.6387**     0.0598**
  Sentence length       -0.4607*     -0.0635**

Significance level: *p < .05, **p < .01
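The binned vs. raw distinction might be computed as in the sketch below, which assumes c = 25 is a cutoff on sentence position (the slides do not define c) and uses scipy's pearsonr for the coefficients and p-values.

```python
import numpy as np
from scipy.stats import pearsonr

def position_correlations(positions, values, c=25):
    """Correlate a per-sentence measure (entropy or length) with sentence
    position, both on the raw data and on data binned by position."""
    pos = np.asarray(positions)
    val = np.asarray(values, dtype=float)
    keep = pos <= c                 # assumed meaning of the cutoff c
    pos, val = pos[keep], val[keep]
    raw = pearsonr(pos, val)        # (r, p) on raw data
    bins = np.unique(pos)
    means = np.array([val[pos == b].mean() for b in bins])
    binned = pearsonr(bins, means)  # (r, p) on position-averaged data
    return raw, binned
```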


Results

[Figure: entropy (bits) against sentence position, binned data]

[Figure: sentence length against sentence position, binned data]


Results

We need to disconfound entropy and sentence length. Compute the correlation of entropy and of sentence length with sentence position, with the other factor partialled out (c = 25):

                      Binned data   Raw data
  Entropy (3-gram)       0.6708**     0.0784**
  Sentence length       -0.7435**    -0.0983**
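Partialling out a factor amounts to correlating regression residuals. Below is a minimal sketch of first-order partial correlation (the standard technique, not the original analysis code):

```python
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, z):
    """Correlation of x and y with z partialled out: regress each of x
    and y on z (simple linear fit) and correlate the residuals."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    def residuals(a):
        slope, intercept = np.polyfit(z, a, 1)
        return a - (slope * z + intercept)
    return pearsonr(residuals(x), residuals(y))

# E.g., entropy vs. position with sentence length partialled out:
# r, p = partial_corr(entropy, position, length)
```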


Discussion

• Results of Exp. 1 confirm G&C’s main finding: entropy increases with sentence position.
• However, there is a significant negative correlation between sentence position and sentence length: longer sentences tend to occur earlier in the text.
• Further analyses show that entropy rate is a significant independent predictor, even when sentence length is controlled for.
• G&C’s effect holds even without binning: important for evaluation against human data (where binning is not allowed).


Aims of Experiment 2

This experiment tests the psycholinguistic predictions of the ERP in context:

• entropy is predicted to correlate with processing effort; test this using a corpus of newspaper text annotated with eye-tracking data;
• eye-tracking measures of reading time reflect processing effort for words and sentences;
• sentence position is predicted not to correlate with processing effort for in-context sentences.


Method

• Test set: Embra eye-tracking corpus (McDonald and Shillcock 2003); 2,262 words of text from UK newspapers.
• Regression used to control for confounding factors: word length and word frequency (Lorch and Myers 1990); see the sketch after this list.
• Training and development set: broadsheet newspaper section of the BNC; training: 6.7M words, development: 0.7M words.
• Sentence position: 1–24 in the test set, 1–206 in the development set.
• Entropy computed as in Experiment 1.
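Controlling for word length and frequency by regression might look like the sketch below: fit a linear model on the confounds and keep the residuals. This is a simplified stand-in for the Lorch and Myers (1990) procedure, which fits repeated-measures regressions per participant.

```python
import numpy as np

def residual_reading_times(rts, word_len, word_freq):
    """Regress reading times on word length and log frequency; return
    the residuals as confound-corrected reading times."""
    rts = np.asarray(rts, dtype=float)
    X = np.column_stack([np.ones(len(rts)),
                         word_len,
                         np.log(word_freq)])   # frequencies must be > 0
    beta, *_ = np.linalg.lstsq(X, rts, rcond=None)
    return rts - X @ beta
```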


Results

Correlation of entropy and position on the Embra corpus:

                      Binned data   Raw data
  Entropy (3-gram)      -0.5512**    -0.1674
  Sentence length        0.3902       0.0885

Correlation of reading times with entropy and sentence position:

  Entropy (3-gram)       0.1646**
  Sentence position     -0.0266


Results

[Figure: entropy (bits) plotted against reading time (ms), Embra corpus]


Discussion

Results on the Embra corpus show:

• a significant correlation between entropy and sentence position;
• a significant correlation between entropy and reading time (with word length and frequency partialled out); this confirms the ERP assumption: sentences with higher entropy are harder to process;
• no significant correlation between position and reading time; this confirms the ERP prediction: entropy is constant in connected text.


Aims of Experiment 3

This experiment further investigates the psycholinguistic predictions of the ERP out of context:

• entropy is predicted to correlate with processing effort; test this using out-of-context sentences;
• self-paced reading time reflects processing effort for words and sentences;
• sentence position is predicted to correlate with processing effort for out-of-context sentences.


Method

• 60 sentences sampled randomly from the Embra corpus; 5 sentences for each of positions 1–12.
• Sentences presented out of context in random order, with 24 filler sentences interspersed.
• 32 native speakers read the sentences in a word-by-word self-paced reading paradigm.
• Measure of processing effort: total reading time for a sentence, normalized by sentence length (sketched below).
• Regression used to control for confounding factors: word length, word frequency (as in Exp. 2).
• Entropy computed as in Exps. 1 and 2.
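The length-normalized reading-time measure and its correlation with position can be assembled as below; the reading times are hypothetical numbers, purely for illustration.

```python
from scipy.stats import pearsonr

def normalized_rt(word_rts):
    """Total reading time of a sentence divided by its length in words."""
    return sum(word_rts) / len(word_rts)

# Hypothetical per-word reading times (ms) for four stimulus sentences,
# with each sentence's position in its source text:
sentence_rts = [[310, 295, 402, 388],
                [288, 350, 310],
                [301, 327, 296, 305, 344],
                [322, 301, 339, 318]]
positions = [1, 4, 8, 12]
rts = [normalized_rt(r) for r in sentence_rts]
print(pearsonr(positions, rts))  # (r, p) of position vs. normalized RT
```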


Results

Correlation of entropy and position in the stimulus set:

                      Binned data   Raw data
  Entropy (3-gram)       0.1201      -0.0366
  Sentence length       -0.1023      -0.0464

Correlation of reading times with entropy and sentence position:

  Entropy (3-gram)       0.0523
  Sentence position      0.0504**


Discussion

• Significant correlation between sentence position and reading time for sentences presented out of context; confirms the ERP prediction: out-of-context entropy increases with sentence position.
• However: no significant correlation between entropy and sentence position, and no significant correlation between entropy and reading time; probably due to the small data set compared to Exp. 2.


Summary

• Probabilistic models extended from sentence to text using the entropy rate principle.
• Confirmed the in-context predictions of the ERP using reading time data for connected text:
  ⇒ correlation between entropy and processing effort (i.e., reading time);
  ⇒ no correlation between position and processing effort.
• Confirmed the out-of-context predictions of the ERP using reading time data for isolated sentences:
  ⇒ correlation between sentence position and processing effort.


Further Work

Sentence Processing Models:

• Measures: replace prefix probability with more realistic measures of processing difficulty (probability ratio, Jurafsky 1996; entropy, Hale 2003).
• Eye-tracking corpora: test the parallelism model on corpora of naturally occurring text (garden variety data).
• Garden paths: test predictions for ambiguous sentences (experimental preference for parallel resolutions).

Text Processing Models:

• Integration: combine with sentence processing models; use probabilistic grammars instead of language models.


References

Aylett, Matthew. 1999. Stochastic suprasegmentals: Relationships between redundancy, prosodic structure and syllabic duration. In Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco.

Genzel, Dmitriy and Eugene Charniak. 2002. Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, pages 199–206.

Genzel, Dmitriy and Eugene Charniak. 2003. Variation of entropy and parse trees of sentences as a function of the sentence number. In Michael Collins and Mark Steedman, editors, Proceedings of the Conference on Empirical Methods in Natural Language Processing. Sapporo, pages 65–72.

Hale, John. 2003. The information conveyed by words. Journal of Psycholinguistic Research 32:101–122.

Jurafsky, Daniel. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science 20(2):137–194.

Lorch, Robert F. and Jerome L. Myers. 1990. Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, and Cognition 16(1):149–157.

McDonald, Scott A. and Richard C. Shillcock. 2003. Low-level predictive inference in reading: The influence of transitional probabilities on eye movements. Vision Research 43:1735–1751.