1
Probabilistic approaches to language and language learning
John Goldsmith
The University of Chicago
2
This work is based on the work of too many people to name them all directly. Nonetheless, I must specifically acknowledge Jorma Rissanen (MDL), Michael Brent and Carl de Marcken (applying MDL to word discovery), and Yu Hu, Colin Sprague, Jason Riggle, and Aris Xanthos, at the University of Chicago.
3
How can it be innovative, much less subversive, to propose to use statistical and probabilistic methods in a scientific analysis in the year 2006 Anno Domini?
4
1. Rationalism and empiricism, and modern science.
2. The mystery of the synthetic a priori is still lurking.
3. Universal grammar is a fine scientific hypothesis, but not a good synthetic a priori.
4. Grammar construction as maximum a posteriori probability.
5
1. The development of modern science
The surprising effectiveness of mathematics in understanding the universe. The reasonable effectiveness of understanding the universe by observing it carefully.
6
Rationalism: the effectiveness of mathematical models of the universe, and the mind's ability to develop abstract models and make predictions from them. Trust the mind.
Empiricism: the effectiveness of observing the universe even when what we see is not what we expected. Especially then. Trust the senses.
7
Francis Bacon
Those who have handled sciences have been either men of experiment or men of dogmas. The men of experiment are like the ant, they only collect and use; the reasoners resemble spiders, who make cobwebs out of their own substance. But the bee takes a middle course: it gathers its material from the flowers of the garden and of the field, but transforms and digests it by a power of its own.
Not unlike this is the true business of philosophy; for it neither relies solely or chiefly on the powers of the mind, nor does it take the matter which it gathers from natural history and mechanical experiments and lay it up in the memory whole, as it finds it, but lays it up in the understanding altered and digested.
8
The collision of rationalism and empiricism
Kant's synthetic a priori: the proposal that there exist contentful truths knowable independent of experience. They are accessible because the very possibility of mind presupposes them. Space, time, causality, induction.
9
2. The synthetic a priori
The problem is still lurking. Efforts to dissolve it have been many. One method, in both linguistics and psychology, is to naturalize it: to view it as a scientific problem.
“The problem lies in the object of study: the human brain.”
10
The synthetic a priori
The mind's construction of the world is its best understanding of what the senses provide it with. The real world is the one which is most probable, given our observations:

$$World = \arg\max_{world_i \,\in\, \text{possible worlds}} pr(world_i \mid observations)$$

Bayesian, maximum a posteriori reasoning
11
Bayes’ Rule
$$pr(H \mid D) = \frac{pr(D \mid H)\, pr(H)}{pr(D)}$$
D = Data H = Hypothesis
12
Definition: $pr(A \mid B) = pr(A \,\&\, B)/pr(B)$. Hence
$$pr(H \mid D)\, pr(D) = pr(D \text{ and } H) = pr(D \mid H)\, pr(H),$$
and dividing through by $pr(D)$ recovers Bayes' Rule:
$$pr(H \mid D) = \frac{pr(D \mid H)\, pr(H)}{pr(D)}$$
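To make the argmax concrete, here is a minimal Python sketch of MAP hypothesis selection. The priors below are invented for illustration; the two likelihoods anticipate the letter-model and word-model figures computed later in this deck.

```python
# Minimal sketch of maximum a posteriori (MAP) hypothesis selection.
# The priors are invented; the likelihoods anticipate the letter-model
# and word-model figures computed later in the deck.

hypotheses = {
    # name: (prior pr(H), likelihood pr(D|H))
    "letter model": (0.7, 2.04e-33),
    "word model":   (0.3, 5.74e-8),
}

def map_hypothesis(hyps):
    """Return the hypothesis maximizing pr(D|H) * pr(H).

    pr(D) is constant across hypotheses, so by Bayes' Rule this
    also maximizes the posterior pr(H|D)."""
    return max(hyps, key=lambda h: hyps[h][0] * hyps[h][1])

print(map_hypothesis(hypotheses))   # -> word model
```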
16
If reality is the most probable hypothesis, given the evidence...
we must find the hypothesis for which the following is a maximum:
$$pr(D \mid H)\, pr(H)$$
How do we calculate the probability of our observations, given our understanding of reality? (empiricism)
How do we calculate the probability of our hypothesis about what reality is? (rationalism)
17
How do we calculate the probability of our observations, given our understanding of reality? Insist that your grammars be probabilistic: they assign a probability to their generated output.
How do we calculate the probability of our hypothesis about what reality is? Assign a ("prior") probability to all hypotheses, based on their coherence. Measure the coherence. Call it an evaluation metric.
18
Generative grammar
Construct an evaluation metric: choose the grammar which best satisfies the evaluation metric, as long as it somehow matches up with the data.
Generative grammar satisfies the rationalist need. It fails to say anything at all about the empiricist need.
20
Assigning probability to algorithms
after Solomonoff, Chaitin, Kolmogorov
The probability of an algorithm is related to the length of its most compact expression:
$$\log pr(A) = -\text{length}(A), \quad\text{i.e.}\quad pr(A) = 2^{-\text{length}(A)}$$
The promise of this approach is that it offers an a priori measure of complexity expressed in the language of probability.
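A small sketch of this prior, assuming (crudely) that a grammar's "most compact expression" can be approximated by zlib-compressing its text; the grammar strings are invented.

```python
# Sketch of the Solomonoff-style prior pr(A) = 2 ** (-length(A)).
# Compressed byte length is a crude stand-in for the length of the
# most compact expression of the grammar.
import zlib

def description_length_bits(text: str) -> int:
    """Bits in a zlib-compressed encoding of `text` (a rough proxy)."""
    return 8 * len(zlib.compress(text.encode("utf-8")))

def prior(text: str) -> float:
    """pr(A) = 2 ** (-length(A)): shorter descriptions are more probable."""
    return 2.0 ** (-description_length_bits(text))

short_grammar = "S -> NP VP"
repetitive_grammar = "S -> NP VP\n" * 50   # longer, but compresses well
print(prior(short_grammar) > prior(repetitive_grammar))   # True
```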
24
Let’s get to work and write some grammars.
We will make sure they all assign probabilities to our observations. We will make sure we can calculate their length. Then we know how to rationally pick the best one...
25
The real challenge for the linguist is to see if this methodology will lead to the automatic discovery of structure that we already know is there.
26
To maximize $pr(Grammar) \cdot pr(Data \mid Grammar)$, we maximize
$$\log pr(Grammar) + \log pr(Data \mid Grammar),$$
or minimize
$$-\log pr(Grammar) - \log pr(Data \mid Grammar),$$
or minimize
$$\text{Length}(Grammar) - \log pr(Data \mid Grammar).$$
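The two terms trade off: a longer grammar can still win if it assigns the data a much higher probability. A toy comparison in Python, with invented grammar lengths; the likelihoods reuse the letter- and word-model figures from this deck.

```python
import math

def mdl_cost(grammar_length_bits, prob_data_given_grammar):
    """MDL score to minimize: Length(Grammar) - log2 pr(Data|Grammar)."""
    return grammar_length_bits - math.log2(prob_data_given_grammar)

# Invented lengths: the word-like grammar is three times longer, but its
# far better fit to the data gives it the lower (better) total cost.
print(mdl_cost(20, 2.04e-33))   # letter-like model: ~128.6 bits
print(mdl_cost(60, 5.74e-8))    # word-like model:   ~84.1 bits
```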
27
An observation:
thedogsawthecatandthecatsawthedog
29
An observation:
thedogsawthecatandthecatsawthedog What is its probability? Its probability depends on the model we propose. The mind is active. The mind chooses.
30
An observation:
thedogsawthecatandthecatsawthedog What is its probability? If we only know that the language has phonemes, we can calculate the probability based on phonemes.
31
Phonological structure
(1) The probability of a phoneme can be calculated independent of context; or (2) we can calculate a phoneme's probability conditioned on the phoneme that precedes it.
To make life simple for now, we choose (1).
33
Probability of our observation:
$pr(t) \cdot pr(h) \cdot pr(e) \cdots pr(g)$: multiply the probabilities of all 33 letters of
thedogsawthecatandthecatsawthedog
$$= 2.04 \times 10^{-33}$$
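The calculation, reproduced in a few lines of Python, with each letter's probability taken as its relative frequency in the observation itself:

```python
from collections import Counter

s = "thedogsawthecatandthecatsawthedog"   # 33 letters

counts = Counter(s)                        # empirical letter frequencies
n = len(s)

prob = 1.0
for ch in s:                               # one factor per letter
    prob *= counts[ch] / n

print(f"{prob:.3g}")                       # 2.04e-33, the figure above
```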
34
We have pr(D|H): probability of the data given the phoneme hypothesis. What is the probability of the phoneme hypothesis: pr(H)?
$$pr(D \mid H)\, pr(H)$$
36
$$pr(D \mid H)\, pr(H)$$
We interpret that as the question: What is the probability of a system with 11 distinct phonemes?
$$\Pi(11) = \text{Prob}[\text{Phoneme Inventory}(Lg) = 11]$$
37
And is there a better hypothesis available, anyway?
Yes, there is.
38
There is a vocabulary in this language:
The word hypothesis:
the dog saw cat and
39
The words have frequencies:
the 4/11, dog 2/11, saw 2/11, cat 2/11, and 1/11
and the observation's probability is the product of 11 probabilities…
40
the dog saw the cat and the cat saw the dog
$$\text{probability} = \tfrac{4}{11}\cdot\tfrac{2}{11}\cdot\tfrac{2}{11}\cdot\tfrac{4}{11}\cdot\tfrac{2}{11}\cdot\tfrac{1}{11}\cdot\tfrac{4}{11}\cdot\tfrac{2}{11}\cdot\tfrac{2}{11}\cdot\tfrac{4}{11}\cdot\tfrac{2}{11} = 5.74 \times 10^{-8},$$
which is much, much bigger than $2.04 \times 10^{-33}$.
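The same calculation under the word hypothesis, one factor per word token:

```python
words = "the dog saw the cat and the cat saw the dog".split()

n = len(words)                             # 11 word tokens
freq = {w: words.count(w) for w in set(words)}

prob = 1.0
for w in words:                            # one factor per word
    prob *= freq[w] / n

print(f"{prob:.3g}")                       # 5.74e-08, vastly above 2.04e-33
```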
41
We need to calculate:
$$pr(D \mid H)\, pr(H)$$
We just calculated $pr(D \mid H)$, so now we need to calculate $pr(H)$: the probability of the lexicon in the word model.
42
The probability of this lexicon: the dog saw cat and
generated by this alphabet:
a 0.15, c 0.05, d 0.1, e 0.05, g 0.05, h 0.05, n 0.05, o 0.05, s 0.05, t 0.1, w 0.05, # 0.25
Probability = $1.29 \times 10^{-20}$
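This figure can be reproduced if we read the slide as spelling out each lexical entry letter by letter, closed by the boundary symbol #; where exactly the five # symbols go is our assumption.

```python
letter_prob = {
    'a': 0.15, 'c': 0.05, 'd': 0.10, 'e': 0.05, 'g': 0.05, 'h': 0.05,
    'n': 0.05, 'o': 0.05, 's': 0.05, 't': 0.10, 'w': 0.05, '#': 0.25,
}
lexicon = ["the", "dog", "saw", "cat", "and"]

# pr(lexicon): the product over every letter of every entry, with a '#'
# boundary symbol closing each entry (our reading of the slide).
prob = 1.0
for word in lexicon:
    for ch in word + "#":
        prob *= letter_prob[ch]

print(f"{prob:.3g}")   # 1.29e-20, the slide's figure
```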
43
We need to calculate:
$$pr(D \mid H)\, pr(H)$$
We just calculated $pr(D \mid H)$, the probability of the data given the lexicon: $5.74 \times 10^{-8}$. And $pr(H)$, the probability of the lexicon: $1.29 \times 10^{-20}$.
44
Product = $7.39 \times 10^{-28}$: probability of the data under the lexicon hypothesis.
Compare $2.04 \times 10^{-33}$: probability of the data under the letter hypothesis.
46
How do we scale up to grammar?
- 0. Word discovery: Brent, de Marcken
- 1. Morpheme discovery
- 2. Phonology discovery
- 3. Word-category discovery
- 4. Grammar discovery
47
How do we scale up to grammar?
- 0. Word discovery
- 1. Morpheme discovery
http://linguistica.uchicago.edu
- 2. Phonology discovery
- 3. Word-category discovery
- 4. Grammar discovery
48
Very high-level overview of calculating the complexity of a morphology
A morphology is a finite-state device, and transitions between states are labeled by morphemes. Its length is much smaller than that of a corresponding word list (= lexicon).
49
Capturing redundancies shortens grammars
Stems walk, jump with suffixes NULL, ed, ing: length = 14
Word list jump, jumped, jumping, walk, walked, walking: length = 34
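The letter-count arithmetic behind the comparison, assuming the NULL suffix costs one symbol:

```python
# Plain word list: 34 letters.
word_list = ["jump", "jumped", "jumping", "walk", "walked", "walking"]
print(sum(len(w) for w in word_list))                     # 34

# Stems + suffixes, with NULL costing one symbol (our assumption): 14.
stems = ["walk", "jump"]
suffixes = ["NULL", "ed", "ing"]
print(sum(len(t) for t in stems)
      + sum(1 if f == "NULL" else len(f) for f in suffixes))   # 14
```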
50
Calculating the size of the morphology
Suffix list: $\sum_{f \in \text{Suffixes}} \left( \lambda \cdot |f| + \log \frac{[W]}{[f]} \right)$
Stem list: $\sum_{t \in \text{Stems}} \left( \lambda \cdot |t| + \log \frac{[W]}{[t]} \right)$
Number of letters, plus pointer structure, plus the signatures, which we'll get to on the next slide.
51
Information contained in the signature component
List of pointers to signatures: $\sum_{\sigma \in \text{Signatures}} \log \frac{[W]}{[\sigma]}$
$+ \sum_{\sigma \in \text{Signatures}} \left( \log \langle \text{stems}(\sigma) \rangle + \log \langle \text{suffixes}(\sigma) \rangle \right)$
$+ \sum_{\sigma \in \text{Signatures}} \left( \sum_{t \in \text{Stems}(\sigma)} \log \frac{[W]}{[t]} + \sum_{f \in \text{Suffixes}(\sigma)} \log \frac{[\sigma]}{[f \in \sigma]} \right)$
$\langle X \rangle$ indicates the number of distinct elements in X.
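A toy rendering of these two slides' formulas in Python. The corpus counts and the per-letter cost λ are invented; the one-signature morphology is the walk/jump example from above.

```python
from math import log2

lambda_ = 5.0                     # assumed cost in bits per letter
W = 600                           # [W]: total word tokens (invented)
stem_count = {"walk": 300, "jump": 300}               # [t]
suffix_count = {"NULL": 200, "ed": 200, "ing": 200}   # [f]

# One signature sigma = ({walk, jump}, {NULL, ed, ing}) covering all tokens.
signatures = [{
    "stems": ["walk", "jump"],
    "suffixes": ["NULL", "ed", "ing"],
    "count": 600,                                            # [sigma]
    "suffix_in_sig": {"NULL": 200, "ed": 200, "ing": 200},   # [f in sigma]
}]

def letters(m):
    return 0 if m == "NULL" else len(m)   # NULL is the empty suffix

# Suffix list: sum of lambda*|f| + log([W]/[f]).
suffix_list = sum(lambda_ * letters(f) + log2(W / c)
                  for f, c in suffix_count.items())

# Stem list: sum of lambda*|t| + log([W]/[t]).
stem_list = sum(lambda_ * len(t) + log2(W / c)
                for t, c in stem_count.items())

# Signature component: pointer to each signature, the sizes of its stem
# and suffix lists, and pointers to its stems and suffixes.
sig_component = 0.0
for sig in signatures:
    sig_component += log2(W / sig["count"])
    sig_component += log2(len(sig["stems"])) + log2(len(sig["suffixes"]))
    sig_component += sum(log2(W / stem_count[t]) for t in sig["stems"])
    sig_component += sum(log2(sig["count"] / sig["suffix_in_sig"][f])
                         for f in sig["suffixes"])

print(f"{suffix_list + stem_list + sig_component:.1f} bits")
```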
52
How do we scale up to grammar?
[0. Word discovery]
- 1. Morpheme discovery
- 2. Phonology discovery
- 3. Word-category discovery
- 4. Grammar discovery
53
Capturing phonological regularities increases the probability of the data.
54
Let’s look at mutual information graphically
Every pair of adjacent phonemes is attracted to every one of its neighbors: 2.392 (down from 4.642).
[Figure: the word "stations". The green bars are the phones' plogs; the blue bars are the mutual information (the stickiness) between adjacent phones.]
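In code, the plogs and the stickiness of the figure are simple corpus statistics. A sketch over a toy corpus (the deck's cat-and-dog string), where pointwise mutual information between adjacent phones plays the role of the blue bars:

```python
from collections import Counter
from math import log2

corpus = "thedogsawthecatandthecatsawthedog"   # toy corpus

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def plog(ch):
    """Positive log probability of a phone: -log2 pr(ch) (green bars)."""
    return -log2(unigrams[ch] / n_uni)

def stickiness(a, b):
    """Pointwise mutual information of an adjacent pair (blue bars);
    negative when the pair is rarer than chance predicts."""
    p_pair = bigrams[(a, b)] / n_bi
    return log2(p_pair / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

print(plog('t'), stickiness('t', 'h'))   # 'th' is stickier than chance
```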
55
Example with negative mutual information:
The mutual information can be negative, if the frequency of the phone pair is less than would occur by chance.
[Figure: the word "HUNTSVILLE".]
56
Transition probabilities (Finnish), learned by an HMM: consonants and vowels tend to alternate.
C → C .25, C → V .75
V → C .75, V → V .25
57
Vowels vs. consonants
[Figure: each phone plotted by $\log \frac{prob(\text{Phone}:\text{state 1})}{prob(\text{Phone}:\text{state 2})}$, which separates the vowels from the consonants.]
58
Vowel harmony
back → back .97, back → front .03
front → front .89, front → back .11
59
Find the best two-state Markov model to generate Finnish vowels
The HMM divides up the vowels like this:
State 1 (back vowels): a 0.353305, i 0.215194, u 0.158578, e 0.139881, o 0.133042, y 7.71E-15, ö 1.60E-18, ä 1.51E-18
State 2 (front vowels): i 0.266105, ä 0.255554, e 0.254647, y 0.157373, ö 0.050579, o 0.014794, a 0.000647, u 0.000302
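A sketch of scoring vowel strings with this two-state HMM, using rounded numbers from the slides; the uniform initial-state distribution is an assumption (the slides don't give one).

```python
# Forward algorithm over the two-state vowel-harmony HMM, with rounded
# transition/emission numbers from the slides.  Initial probabilities
# are assumed uniform; near-zero emissions are dropped.
states = ("back", "front")
init = {"back": 0.5, "front": 0.5}
trans = {"back":  {"back": 0.97, "front": 0.03},
         "front": {"back": 0.11, "front": 0.89}}
emit = {"back":  {"a": 0.353, "i": 0.215, "u": 0.159, "e": 0.140, "o": 0.133},
        "front": {"i": 0.266, "ä": 0.256, "e": 0.255, "y": 0.157,
                  "ö": 0.051, "o": 0.015, "a": 0.001}}

def forward(vowels):
    """pr(vowel sequence) under the HMM; unlisted emissions count as 0."""
    alpha = {s: init[s] * emit[s].get(vowels[0], 0.0) for s in states}
    for v in vowels[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states)
                    * emit[s].get(v, 0.0)
                 for s in states}
    return sum(alpha.values())

# A harmonic (all-back) word outscores a disharmonic one:
print(forward("aua"), forward("aya"))
```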
60
Phonological models
They need not be "local"; they can be structural, and "distant," in the sense of autosegmental and metrical phonology.
61
How do we scale up to grammar?
[0. Word discovery]
- 1. Morpheme discovery
- 2. Phonology discovery
- 3. Word-category discovery
- 4. Grammar discovery
62
Category induction
Much of it in the context of hidden Markov models and statistical machine translation.
The first classic study is by Brown et al., the IBM statistical translation group:
63
Examples of categories induced by distribution (Brown et al.):
plan, letter, request, memo, case, question, charge, statement, draft
day, year, week, month, quarter, half
64
How do we scale up to grammar?
[0. Word discovery]
- 1. Morpheme discovery
- 2. Phonology discovery
- 3. Word-category discovery
- 4. Grammar discovery
65
Much work here in the last 20 years
Much of it under the rubric of language modeling; some as grammar induction. This is hard (but so is the rest). Part of the problem is inducing phrase structure; part is dealing with the syntax of grammatical agreement patterns.
66
How can it be innovative, much less subversive, to propose to use statistical and probabilistic methods in a scientific analysis in the year 2006 Anno Domini?
67
Answer:
It is innovative and subversive: not because we use probability, but because it allows in a new synthetic a priori, MAP (maximum a posteriori probability). We can reject the false dilemma: either linguistics is psychology, or linguistics is a (silly) game. Linguistics is a science of language data with one right, and many wrong, answers.
68
Conclusion
The linguistic question: can we use the principle:
Maximize the probability of the data
as our sole scientific maxim? Can we thus dispense with the need for a substantive Universal Grammar? (Yes.) What are the consequences for psychologists if this is so?
69
The End
70
Shift from generative grammar
Chomsky, Language and Mind (Future):
p. 76: No one who has given any serious thought to the problem of formalizing inductive procedures or "heuristic methods" is likely to set much store by the hope that such a system as a generative grammar can be constructed by methods of any generality.
71
pp. 76-77: A third task is that of determining just what it means for a hypothesis about the generative grammar of a language to be "consistent" with the data of sense. Notice that it is a great oversimplification to suppose that a child must discover a generative grammar that accounts for all the linguistic data that has been presented to him and that "projects" such data to an infinite range of potential sound-meaning relations…. The third subtask, then, is to study what we might think of as the problem of "confirmation"—in this context, the problem of what relation must hold between a potential grammar and a set of data for this grammar to be confirmed as the actual theory of the language in question.