Conditional Random Fields: LING 572 Advanced Statistical Methods in NLP (PowerPoint presentation)


SLIDE 1

Conditional Random Fields

LING 572 Advanced Statistical Methods in NLP February 11, 2020

SLIDE 2

Announcements

  • HW4 grades out: 93.1 mean
  • HW6 posted later today
  • Implement beam search
  • Note: pay attention to data format + feature vectors (at test time)
  • Reading #2 posted!
  • Due Feb 18 at 11AM

SLIDE 3

Highlights

  • CRF is a form of undirected graphical model
  • Proposed by Lafferty, McCallum and Pereira in 2001
  • Used in many NLP tasks: e.g., Named-entity detection
  • Often conjoined with neural models, e.g. LSTM + CRF
  • Types:
  • Linear-chain CRF
  • Skip-chain CRF
  • General CRF

SLIDE 4

Outline

  • Graphical models
  • Linear-chain CRF
  • Skip-chain CRF

SLIDE 5

Graphical models

SLIDE 6

Graphical model

  • A graphical model is a probabilistic model for which a graph denotes the

conditional independence structure between random variables:

  • Nodes: random variables
  • Edges: dependency relation between random variables
  • Types of graphical models:
  • Bayesian network: directed acyclic graph (DAG)
  • Markov random fields: undirected graph

SLIDE 7

Bayesian network

SLIDE 8

Bayesian network

  • Graph: directed acyclic graph (DAG)
  • Nodes: random variables
  • Edges: conditional dependencies
  • Each node X is associated with a probability function P(X | parents(X))
  • Learning and inference: efficient algorithms exist.

SLIDE 9

An example


(from http://en.wikipedia.org/wiki/Bayesian_network)

[Figure: the rain/sprinkler/grassWet Bayesian network]

P(grassWet, sprinkler, rain) = P(grassWet | sprinkler, rain) P(sprinkler | rain) P(rain)
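The chain-rule factorization for the sprinkler network can be checked numerically. A minimal Python sketch; the CPT values below are made up for illustration, not taken from the slide:

```python
# Chain-rule joint for the sprinkler network:
#   P(grassWet, sprinkler, rain) = P(grassWet|sprinkler, rain) * P(sprinkler|rain) * P(rain)
# CPT numbers below are illustrative, not from the slide.

p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},   # rain -> sprinkler -> prob
                          False: {True: 0.4, False: 0.6}}
p_wet_true = {(True, True): 0.99, (True, False): 0.9,        # (sprinkler, rain) -> P(wet)
              (False, True): 0.8, (False, False): 0.0}

def joint(wet, sprinkler, rain):
    p_w = p_wet_true[(sprinkler, rain)] if wet else 1.0 - p_wet_true[(sprinkler, rain)]
    return p_w * p_sprinkler_given_rain[rain][sprinkler] * p_rain[rain]

# a proper joint distribution sums to 1 over all eight assignments
total = sum(joint(w, s, r) for w in (True, False)
            for s in (True, False) for r in (True, False))
print(round(total, 10))  # 1.0
```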

SLIDE 10

Another example

[Figure: a Bayesian network over nodes A, B, C, D, E]

P(A, B, C, D, E) = P(A | B, E) P(B) P(E) P(D | E) P(C | A)

SLIDE 11

Bayesian network: properties

SLIDE 12

[Figure: the example Bayesian network over nodes A, B, C, D, E]
SLIDE 13

Naïve Bayes Model

[Figure: Naïve Bayes as a directed graphical model: class node Y with arrows to feature nodes f1, f2, …, fn]
SLIDE 14

HMM

  • State sequence: X1:n+1
  • Output sequence: O1:n

[Figure: chain X1 → X2 → X3 → … → Xn+1, emitting outputs O1, O2, …, On]

P(O1:n, X1:n+1) = π(X1) Π_{i=1..n} P(Xi+1 | Xi) P(Oi | Xi+1)
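The factored HMM joint can be computed directly by multiplying along the chain. A minimal Python sketch; the state/output sets and all probabilities below are made up for illustration:

```python
# Joint probability of an HMM, following the factorization:
#   P(O_1:n, X_1:n+1) = pi(X_1) * prod_{i=1..n} P(X_{i+1}|X_i) * P(O_i|X_{i+1})
# (each output is emitted by the state just entered). All parameters are made up.

pi = {"A": 0.6, "B": 0.4}
trans = {("A", "A"): 0.7, ("A", "B"): 0.3,
         ("B", "A"): 0.4, ("B", "B"): 0.6}
emit = {("A", "x"): 0.5, ("A", "y"): 0.5,
        ("B", "x"): 0.1, ("B", "y"): 0.9}

def hmm_joint(states, outputs):
    """states has length n+1; outputs has length n."""
    assert len(states) == len(outputs) + 1
    p = pi[states[0]]
    for i, o in enumerate(outputs):
        p *= trans[(states[i], states[i + 1])] * emit[(states[i + 1], o)]
    return p

# probabilities over all length-3 state paths and length-2 output strings sum to 1
total = sum(hmm_joint([s0, s1, s2], [o0, o1])
            for s0 in "AB" for s1 in "AB" for s2 in "AB"
            for o0 in "xy" for o1 in "xy")
print(round(total, 10))  # 1.0
```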

SLIDE 15

Generative model

  • A directed graphical model in which the output (i.e., what to predict) topologically precedes the input (i.e., what is given as observation).

  • Naïve Bayes and HMM are generative models.

SLIDE 16

Markov Random Field

SLIDE 17

Markov random field

  • Also called “Markov network”
  • A graphical model in which a set of random variables has a Markov property:
  • Local Markov property: a variable is conditionally independent of all other variables given its neighbors.

SLIDE 18

Cliques

  • A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge.

  • A maximal clique is a clique that cannot be extended by adding one more vertex.
  • A maximum clique is a clique of the largest possible size in a given graph.

[Figure: example graph over vertices A, B, C, D, E, with a clique, a maximal clique, and a maximum clique highlighted]
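The three clique notions can be checked mechanically by brute force. A small Python sketch; the edge set below is made up, since the slide's figure did not survive extraction:

```python
from itertools import combinations

# Brute-force check of clique / maximal clique / maximum clique on a small graph.
# The edge set is invented for illustration.
vertices = {"A", "B", "C", "D", "E"}
edges = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"),
                                ("C", "D"), ("D", "E")]}

def is_clique(s):
    # every two vertices in s must be connected by an edge
    return all(frozenset(p) in edges for p in combinations(s, 2))

cliques = [set(s) for k in range(1, len(vertices) + 1)
           for s in combinations(sorted(vertices), k) if is_clique(s)]

# maximal: cannot be extended by adding one more vertex
maximal = [c for c in cliques
           if not any(is_clique(c | {v}) for v in vertices - c)]
# maximum: largest possible size in the graph
maximum = [c for c in cliques if len(c) == max(map(len, cliques))]

print(sorted(tuple(sorted(c)) for c in maximal))  # [('A', 'B', 'C'), ('C', 'D'), ('D', 'E')]
print([tuple(sorted(c)) for c in maximum])        # [('A', 'B', 'C')]
```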

SLIDE 19

Clique factorization

[Figure: the example graph over vertices A, B, C, D, E]
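The factorization formula itself did not survive extraction; the standard clique factorization of a Markov random field, which this slide presumably presents, is:

```latex
P(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C),
\qquad
Z = \sum_{x} \prod_{C \in \mathcal{C}} \psi_C(x_C)
```

where C ranges over the (maximal) cliques of the graph, the ψ_C ≥ 0 are potential functions, and Z is the partition function that normalizes the product into a distribution.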

SLIDE 20

Conditional Random Field


A CRF is a random field globally conditioned on the observation X.

SLIDE 21

Linear-chain CRF

SLIDE 22

Motivation

  • Sequence labeling problem: e.g., POS tagging
  • HMM: Find best sequence, but cannot use rich features
  • MaxEnt: Use rich features, but may not find the best sequence
  • Linear-chain CRF: HMM + MaxEnt

SLIDE 23

Relations between NB, MaxEnt, HMM, and CRF

SLIDE 24

Most Basic Linear-chain CRF

SLIDE 25

Linear-chain CRF (**)
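The defining formula did not survive extraction; the standard linear-chain CRF form, with feature functions f_j and weights λ_j, is:

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{i=1}^{n} \sum_{j} \lambda_j \, f_j(y_{i-1}, y_i, x, i)\Big),
\qquad
Z(x) = \sum_{y'} \exp\Big(\sum_{i=1}^{n} \sum_{j} \lambda_j \, f_j(y'_{i-1}, y'_i, x, i)\Big)
```

Note that Z(x) is conditioned only on the observation x (it sums over all label sequences y'), matching the earlier definition of a CRF as a random field globally conditioned on X.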

SLIDE 26

Training and decoding

  • Training: estimate the weights λj
  • similar to the one used for MaxEnt
  • Ex: L-BFGS
  • Decoding: find the best sequence y
  • similar to the one used for HMM
  • Viterbi algorithm
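Viterbi decoding for a linear-chain CRF maximizes the summed per-position scores over tag sequences. A minimal Python sketch, where score() stands in for Σ_j λ_j f_j(y_{i-1}, y_i, x, i); the tag set and feature weights are made up for illustration:

```python
# Viterbi decoding for a linear-chain CRF (sketch, log-space so scores add).
# score(prev, tag, x, i) stands in for sum_j lambda_j * f_j(prev, tag, x, i);
# the tags and weights below are toy values invented for illustration.

TAGS = ["N", "V"]

def score(prev, tag, x, i):
    s = 0.0
    if x[i].endswith("s") and tag == "V":   # toy observation feature
        s += 1.0
    if prev == "N" and tag == "V":          # toy transition feature
        s += 0.5
    return s

def viterbi(x):
    # delta[t]: best score of any tag prefix for x[:i+1] ending in tag t
    delta = {t: score(None, t, x, 0) for t in TAGS}
    backptrs = []
    for i in range(1, len(x)):
        new_delta, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: delta[p] + score(p, t, x, i))
            new_delta[t] = delta[best_prev] + score(best_prev, t, x, i)
            ptr[t] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # recover the best sequence by following back-pointers from the best final tag
    best = max(TAGS, key=lambda t: delta[t])
    seq = [best]
    for ptr in reversed(backptrs):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

print(viterbi(["dogs", "runs"]))  # ['V', 'V'] under these toy weights
```

The same dynamic program underlies HMM decoding; only the local score changes, which is why the slide pairs CRF decoding with the Viterbi algorithm.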

SLIDE 27

Skip-chain CRF

SLIDE 28

Motivation

  • Sometimes, we need to handle long-distance dependency, which is not allowed by a linear-chain CRF

  • An example: NE detection
  • “Senator John Green … Green ran …”

SLIDE 29

[Figure: a linear-chain CRF vs. a skip-chain CRF, which adds long-distance edges between related labels (e.g., the two mentions of "Green")]
SLIDE 30

CRFs in Larger Models

SLIDE 31

CRFs in Larger Models

SLIDE 32


Source: NLP Progress

SLIDE 33

Summary

  • Graphical models:
  • Bayesian network (BN)
  • Markov random field (MRF)
  • CRF is a variant of MRF:
  • Linear-chain CRF: HMM + MaxEnt
  • Skip-chain CRF: can handle long-distance dependency
  • General CRF
  • Pros and cons of CRF:
  • Pros: higher accuracy than HMM and MaxEnt
  • Cons: training and inference can be very slow
