Grammars, graphs and automata (Probabilistic) finite state machines - PowerPoint PPT Presentation

Grammars, graphs and automata
Mark Johnson
Brown University
ESSLLI 2005
slides available from http://cog.brown.edu/~mj

High-level overview
• Probability distributions and graphical models
• (Probabilistic) finite state machines and context-free grammars
  – computation (dynamic programming)
  – estimation
• Log-linear models
  – stochastic unification-based grammars
  – reranking parsing
• Weighted CFGs and proper PCFGs

Topics
• Graphical models and Bayes networks
• (Hidden) Markov models
• (Probabilistic) context-free grammars and finite-state machines
• Computation with and estimation of PCFGs
• Lexicalized and bi-lexicalized PCFGs
• Non-local dependencies and log-linear models
• Features in reranking parsing
• Stochastic unification-based grammar
• Weighted CFGs and proper PCFGs

What is computational linguistics?
Computational linguistics studies the computational processes involved in language production, comprehension and acquisition.
• assumption that language is inherently computational
• scientific side:
  – modeling human performance (computational psycholinguistics)
  – understanding how it can be done at all
• technological applications:
  – speech recognition
  – information extraction (who did what to whom) and question answering
  – machine translation (translation by computer)

(Some of the) problems in modeling language
+ Language is a product of the human mind
  ⇒ any structure we observe is a product of the mind
− Language involves a transduction between form and meaning, but we don't know much about the way meanings are represented
+/− We have (reasonable?) guesses about some of the computational processes involved in language
− We don't know very much about the cognitive processes that language interacts with
− We know little about the anatomical layout of language in the brain
− We know little about neural networks that might support linguistic computations

Aspects of linguistic structure
• Phonetics: the production and perception of speech sounds
• Phonology: the organization and regularities of speech sounds
• Morphology: the structure and organization of words
• Syntax: the way words combine to form phrases and sentences
• Semantics: the way meaning is associated with sentences
• Pragmatics: how language can be used to do things
In general the further we get from speech, the less well we understand what's going on!

Aspects of syntactic and semantic structure
[Figure: phrase-structure tree for "Most people hate baked beans But the students promised to eat them": (S (S (NP (DT Most) (NN people)) (VP (VB hate) (NP (VBD baked) (NNS beans)))) (CONJ But) (S (NP (DT the) (NNS students)) (VP (VBD promised) (S (NP PRO) (VP (TO to) (VP (VB eat) (NP (PRP them)))))))))]
• Anaphora: it refers to baked beans
• Predicate-argument structure: the students is agent of eat
• Discourse structure: second clause is contrasted with first
These all refer to phrase structure entities! Parsing is the process of recovering these entities.

A very brief history
(Antiquity) Birth of linguistics, logic, rhetoric
(1900s) Structuralist linguistics (phrase structure)
(1900s) Mathematical logic
(1900s) Probability and statistics
(1940s) Behaviorism (discovery procedures, corpus linguistics)
(1940s) Ciphers and codes
(1950s) Information theory
(1950s) Automata theory
(1960s) Context-free grammars
(1960s) Generative grammar dominates (US) linguistics (Chomsky)
(1980s) "Neural networks" (learning as parameter estimation)
(1980s) Graphical models (Bayes nets, Markov Random Fields)
(1980s) Statistical models dominate speech recognition
(1980s) Probabilistic grammars
(1990s) Statistical methods dominate computational linguistics
(1990s) Computational learning theory

Topics
• Graphical models and Bayes networks
• (Hidden) Markov models
• (Probabilistic) context-free grammars
• (Probabilistic) finite-state machines
• Computation with PCFGs
• Estimation of PCFGs
• Lexicalized and bi-lexicalized PCFGs
• Non-local dependencies and log-linear models
• Stochastic unification-based grammars

Probability distributions
• A probability distribution over a countable set $\Omega$ is a function $\mathrm{P} : \Omega \to [0, 1]$ which satisfies $1 = \sum_{\omega \in \Omega} \mathrm{P}(\omega)$.
• A random variable is a function $X : \Omega \to \mathcal{X}$.
  $\mathrm{P}(X = x) = \sum_{\omega : X(\omega) = x} \mathrm{P}(\omega)$
• If there are several random variables $X_1, \ldots, X_n$, then:
  – $\mathrm{P}(X_1, \ldots, X_n)$ is the joint distribution
  – $\mathrm{P}(X_i)$ is the marginal distribution of $X_i$
• $X_1, \ldots, X_n$ are independent iff $\mathrm{P}(X_1, \ldots, X_n) = \mathrm{P}(X_1) \cdots \mathrm{P}(X_n)$, i.e., the joint is the product of the marginals
• The conditional distribution of $X$ given $Y$ is $\mathrm{P}(X \mid Y) = \mathrm{P}(X, Y) / \mathrm{P}(Y)$, so $\mathrm{P}(X, Y) = \mathrm{P}(Y)\,\mathrm{P}(X \mid Y) = \mathrm{P}(X)\,\mathrm{P}(Y \mid X)$ (Bayes rule)
• $X_1, \ldots, X_n$ are conditionally independent given $Y$ iff $\mathrm{P}(X_1, \ldots, X_n \mid Y) = \mathrm{P}(X_1 \mid Y) \cdots \mathrm{P}(X_n \mid Y)$

Bayes inversion and the noisy channel model
Given an acoustic signal $a$, find words $w^\star(a)$ most likely to correspond to $a$
$$w^\star(a) = \arg\max_w \mathrm{P}(W = w \mid A = a)$$
$$\mathrm{P}(A)\,\mathrm{P}(W \mid A) = \mathrm{P}(W, A) = \mathrm{P}(W)\,\mathrm{P}(A \mid W)$$
$$\mathrm{P}(W \mid A) = \frac{\mathrm{P}(W)\,\mathrm{P}(A \mid W)}{\mathrm{P}(A)}$$
$$w^\star(a) = \arg\max_w \frac{\mathrm{P}(W = w)\,\mathrm{P}(A = a \mid W = w)}{\mathrm{P}(A = a)} = \arg\max_w \mathrm{P}(W = w)\,\mathrm{P}(A = a \mid W = w)$$
[Figure: noisy channel. Language model P(W) → acoustic model P(A | W) → acoustic signal A]
Advantages of noisy channel model:
• $\mathrm{P}(W \mid A)$ is hard to construct directly; $\mathrm{P}(A \mid W)$ is easier
• noisy channel also exploits language model $\mathrm{P}(W)$

Why graphical models?
• Graphical models depict factorizations of probability distributions
• Statistical and computational properties depend on the factorization
  – complexity of dynamic programming is size of a certain cut in the graphical model
• Two different (but related) graphical representations
  – Bayes nets (directed graphs; products of conditionals)
  – Markov Random Fields (undirected graphs; products of arbitrary terms)
• Each random variable $X_i$ is represented by a node
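
As a rough illustration of the noisy channel argmax above, the following Python sketch decodes a toy problem with two candidate word sequences and two acoustic signals. The tables `language_model` and `acoustic_model`, the signal names `a1` and `a2`, and every number in them are made-up assumptions for illustration and are not taken from the slides. The sketch also checks that dropping the constant P(A = a) does not change the argmax.

```python
# Toy noisy-channel decoder. All probabilities below are made-up
# illustrative numbers, not estimates from any real data.

# Language model P(W): prior probability of each candidate word sequence.
language_model = {
    "recognize speech": 0.6,
    "wreck a nice beach": 0.4,
}

# Acoustic model P(A | W): probability of each acoustic signal given the words.
acoustic_model = {
    "recognize speech": {"a1": 0.3, "a2": 0.7},
    "wreck a nice beach": {"a1": 0.8, "a2": 0.2},
}


def evidence(a):
    """P(A = a), obtained by summing the joint P(W = w, A = a) over w."""
    return sum(language_model[w] * acoustic_model[w][a] for w in language_model)


def posterior(w, a):
    """P(W = w | A = a) via Bayes rule."""
    return language_model[w] * acoustic_model[w][a] / evidence(a)


def decode(a):
    """w*(a) = argmax_w P(W = w) P(A = a | W = w); P(A = a) is constant in w."""
    return max(language_model, key=lambda w: language_model[w] * acoustic_model[w][a])


def decode_via_posterior(a):
    """Same argmax computed from the normalized posterior, for comparison."""
    return max(language_model, key=lambda w: posterior(w, a))
```

With these toy numbers, `decode("a1")` picks "wreck a nice beach" (0.4 × 0.8 beats 0.6 × 0.3) even though the language model alone prefers the other candidate, which is the interaction between the two models that the slide describes.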

Bayes nets (directed graph)
• Factorize joint $\mathrm{P}(X_1, \ldots, X_n)$ into product of conditionals
$$\mathrm{P}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \mathrm{P}(X_i \mid X_{\mathrm{Pa}(i)})$$
  where $\mathrm{Pa}(i) \subseteq (X_1, \ldots, X_{i-1})$
• The Bayes net contains an arc from each $j \in \mathrm{Pa}(i)$ to $i$
$$\mathrm{P}(X_1, X_2, X_3, X_4) = \mathrm{P}(X_1)\,\mathrm{P}(X_2)\,\mathrm{P}(X_3 \mid X_1, X_2)\,\mathrm{P}(X_4 \mid X_3)$$
[Figure: directed graph with arcs X1 → X3, X2 → X3, X3 → X4]

Markov Random Field (undirected)
• Factorize $\mathrm{P}(X_1, \ldots, X_n)$ into product of potentials $g_c(X_c)$, where $c \subseteq (1, \ldots, n)$ and $c \in \mathcal{C}$ (a set of tuples of indices)
$$\mathrm{P}(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{c \in \mathcal{C}} g_c(X_c)$$
• If $i, j \in c \in \mathcal{C}$, then an edge connects $i$ and $j$
$$\mathcal{C} = \{(1, 2, 3), (3, 4)\}$$
$$\mathrm{P}(X_1, X_2, X_3, X_4) = \frac{1}{Z}\, g_{123}(X_1, X_2, X_3)\, g_{34}(X_3, X_4)$$
[Figure: undirected graph with edges X1–X2, X1–X3, X2–X3, X3–X4]

A rose by any other name ...
• MRFs have the same form as Maximum Entropy models, Exponential models, Log-linear models, Harmony models, ...
$$\mathrm{P}(X) = \frac{1}{Z} \prod_{c \in \mathcal{C}} g_c(X_c)$$
$$= \frac{1}{Z} \prod_{c \in \mathcal{C},\, x_c \in \mathcal{X}_c} (\theta_{X_c = x_c})^{[\![X_c = x_c]\!]}, \quad \text{where } \theta_{X_c = x_c} = g_c(x_c)$$
$$= \frac{1}{Z} \exp \sum_{c \in \mathcal{C},\, x_c \in \mathcal{X}_c} [\![X_c = x_c]\!]\, \phi_{X_c = x_c}, \quad \text{where } \phi_{X_c = x_c} = \log g_c(x_c)$$
For the example above:
$$\mathrm{P}(X) = \frac{1}{Z}\, g_{123}(X_1, X_2, X_3)\, g_{34}(X_3, X_4)$$
$$= \frac{1}{Z} \exp \Big( [\![X_{123} = 000]\!]\, \phi_{000} + [\![X_{123} = 001]\!]\, \phi_{001} + \ldots + [\![X_{34} = 00]\!]\, \phi_{00} + [\![X_{34} = 01]\!]\, \phi_{01} + \ldots \Big)$$

Bayes nets and MRFs
• MRFs are more general than Bayes nets
• It's easy to find the MRF representation of a Bayes net
$$\mathrm{P}(X_1, X_2, X_3, X_4) = \underbrace{\mathrm{P}(X_1)\,\mathrm{P}(X_2)\,\mathrm{P}(X_3 \mid X_1, X_2)}_{g_{123}(X_1, X_2, X_3)}\; \underbrace{\mathrm{P}(X_4 \mid X_3)}_{g_{34}(X_3, X_4)}$$
• Moralization, i.e., "marry the parents"
[Figure: the Bayes net (arcs X1 → X3, X2 → X3, X3 → X4) next to its moralized MRF, which adds the edge X1–X2]
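
The Bayes-net-to-MRF conversion can be checked numerically. The sketch below assumes binary variables and hypothetical conditional probability tables (`p_x1`, `p_x2`, `p_x3_is_1`, `p_x4_is_1`, with illustrative numbers that do not come from the slides). It groups the conditionals into the potentials g_123 and g_34 as in the example, confirms by brute-force enumeration that the resulting partition function Z is 1, and derives the undirected edges from the clique set C.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables;
# the numbers are illustrative assumptions only.
p_x1 = {0: 0.7, 1: 0.3}          # P(X1)
p_x2 = {0: 0.4, 1: 0.6}          # P(X2)
p_x3_is_1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(X3 = 1 | X1, X2)
p_x4_is_1 = {0: 0.2, 1: 0.8}     # P(X4 = 1 | X3)


def p_x3(x3, x1, x2):
    """P(X3 = x3 | X1 = x1, X2 = x2)."""
    q = p_x3_is_1[(x1, x2)]
    return q if x3 == 1 else 1.0 - q


def p_x4(x4, x3):
    """P(X4 = x4 | X3 = x3)."""
    q = p_x4_is_1[x3]
    return q if x4 == 1 else 1.0 - q


def g123(x1, x2, x3):
    """Potential on clique (1, 2, 3): P(X1) P(X2) P(X3 | X1, X2)."""
    return p_x1[x1] * p_x2[x2] * p_x3(x3, x1, x2)


def g34(x3, x4):
    """Potential on clique (3, 4): P(X4 | X3)."""
    return p_x4(x4, x3)


def partition_function():
    """Z: sum of the product of potentials over all 2^4 assignments."""
    return sum(
        g123(x1, x2, x3) * g34(x3, x4)
        for x1, x2, x3, x4 in product((0, 1), repeat=4)
    )


def mrf_edges(cliques):
    """An edge connects i and j whenever both occur in some clique c."""
    edges = set()
    for c in cliques:
        for i in c:
            for j in c:
                if i < j:
                    edges.add((i, j))
    return sorted(edges)
```

Because the potentials here are built from properly normalized conditionals, Z comes out as 1; for arbitrary potentials it generally does not, and computing it is where the cost lies. `mrf_edges([(1, 2, 3), (3, 4)])` includes the edge (1, 2), which is the "marry the parents" edge added by moralization.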

Conditionalization in MRFs
• Conditionalization is fixing the value of certain variables
• To get a MRF representation of the conditional distribution, delete nodes whose values are fixed and arcs connected to them
$$\mathrm{P}(X_1, X_2, X_4 \mid X_3 = v) = \frac{1}{Z\, \mathrm{P}(X_3 = v)}\, g_{123}(X_1, X_2, v)\, g_{34}(v, X_4) = \frac{1}{Z'(v)}\, g'_{12}(X_1, X_2)\, g'_{4}(X_4)$$
[Figure: the MRF before and after fixing X3 = v; X3 and its edges are deleted, leaving X1–X2 and an isolated X4]

Marginalization in MRFs
• Marginalization is summing over all possible values of certain variables
• To get a MRF representation of the marginal distribution, delete the marginalized nodes and interconnect all of their neighbours
$$\mathrm{P}(X_1, X_2, X_4) = \sum_{X_3} \mathrm{P}(X_1, X_2, X_3, X_4) = \frac{1}{Z} \sum_{X_3} g_{123}(X_1, X_2, X_3)\, g_{34}(X_3, X_4) = \frac{1}{Z}\, g'_{124}(X_1, X_2, X_4)$$
[Figure: the MRF before and after summing out X3; X3 is deleted and its neighbours X1, X2, X4 are fully interconnected]

Classification
• Given value of $X$, predict value of $Y$
• Given a probabilistic model $\mathrm{P}(Y \mid X)$, predict:
$$y^\star(x) = \arg\max_y \mathrm{P}(y \mid x)$$
• Learn $\mathrm{P}(Y \mid X)$ from data $D = ((x_1, y_1), \ldots, (x_n, y_n))$
• Restrict attention to a parametric model class $\mathrm{P}_\theta$ parameterized by parameter vector $\theta$
  – learning is estimating $\theta$ from $D$

ML and CML Estimation
• Maximum likelihood estimation (MLE) picks the $\theta$ that makes the data $D = (x, y)$ as likely as possible
$$\hat{\theta} = \arg\max_\theta \mathrm{P}_\theta(x, y)$$
• Conditional maximum likelihood estimation (CMLE) picks the $\theta$ that maximizes conditional likelihood of the data $D = (x, y)$
$$\hat{\theta}' = \arg\max_\theta \mathrm{P}_\theta(y \mid x)$$
• $\mathrm{P}(X, Y) = \mathrm{P}(X)\,\mathrm{P}(Y \mid X)$, so CMLE ignores $\mathrm{P}(X)$
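
To make the MLE/CMLE contrast concrete, here is a minimal Python sketch for the simplest possible model class: an unrestricted table over discrete (x, y) pairs, with the data assumed to be independent draws. Under those assumptions the maximum likelihood estimate of the joint is the relative frequency of each pair, and the conditional maximum likelihood estimate is count(x, y) / count(x). The function names and the toy data in the usage note are hypothetical; the slides do not prescribe a model class.

```python
from collections import Counter


def mle_joint(data):
    """MLE of P(x, y) for an unrestricted table: relative frequency of each pair."""
    counts = Counter(data)
    n = len(data)
    return {xy: c / n for xy, c in counts.items()}


def cmle_conditional(data):
    """CMLE of P(y | x) for an unrestricted table: count(x, y) / count(x).

    The distribution of x itself is never estimated, which mirrors the
    observation that CMLE ignores P(X).
    """
    pair_counts = Counter(data)
    x_counts = Counter(x for x, _ in data)
    return {(x, y): c / x_counts[x] for (x, y), c in pair_counts.items()}


def classify(conditional, x):
    """y*(x) = argmax_y P(y | x), restricted to labels seen with this x."""
    candidates = {y: p for (x_seen, y), p in conditional.items() if x_seen == x}
    return max(candidates, key=candidates.get)
```

For example, with `data = [("a", "N"), ("a", "N"), ("a", "V"), ("b", "V")]`, the joint estimate assigns 0.5 to ("a", "N"), the conditional estimate assigns 2/3 to "N" given "a", and `classify(cmle_conditional(data), "a")` returns "N". In this unrestricted tabular case the two estimators agree on P(y | x); they can differ once the model class is constrained.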
