SLIDE 1

Introduction to Probability and Statistics

Machine Translation Lecture 2
Instructor: Chris Callison-Burch
TAs: Mitchell Stern, Justin Chiu
Website: mt-class.org/penn

SLIDE 2

Last time ...

1) Formulate a model of pairs of sentences.
2) Learn an instance of the model from data.
3) Use it to infer translations of new inputs.

SLIDE 3

Why Probability?

  • Probability formalizes ...
    • the concept of models
    • the concept of data
    • the concept of learning
    • the concept of inference (prediction)

Probability is expectation founded upon partial knowledge.

SLIDE 4

p(x | partial knowledge)

“Partial knowledge” is an apt description of what we know about language and translation!

SLIDE 5

Probability Models

  • Key components of a probability model:
    • The space of events (Ω or 𝒳)
    • The assumptions about conditional independence / dependence among events
    • Functions assigning probability (density) to events
  • We will assume discrete distributions.

SLIDE 6

Events and Random Variables

Ω = {1, 2, 3, 4, 5, 6}

X(ω) = ω

p_X(x) = 1/6 if x ∈ {1, 2, 3, 4, 5, 6}, 0 otherwise

A random variable is a function of a random event drawn from a set of possible outcomes (Ω); a probability distribution (p) is a function from outcomes to probabilities.

SLIDE 7

Events and Random Variables

Ω = {1, 2, 3, 4, 5, 6}

Y(ω) = 1 if ω ∈ {2, 4, 6}, 0 otherwise

p_Y(y) = 1/2 if y ∈ {0, 1}, 0 otherwise

A random variable is a function of a random event drawn from a set of possible outcomes (Ω); a probability distribution (p) is a function from outcomes to probabilities.
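
To make the definition concrete, here is a minimal Python sketch of these two random variables (the function and variable names are ours, not the slides'): each variable's distribution is obtained by pushing the outcome probabilities through the function.

    from collections import defaultdict

    # Sample space: one roll of a fair six-sided die.
    omega = [1, 2, 3, 4, 5, 6]
    p_omega = {w: 1 / 6 for w in omega}           # uniform over outcomes

    # Random variables are just functions of the outcome.
    def X(w): return w                            # the face value itself
    def Y(w): return 1 if w in {2, 4, 6} else 0   # indicator of an even roll

    def distribution(rv):
        """p_rv(v) = sum of p(omega) over outcomes omega with rv(omega) = v."""
        p = defaultdict(float)
        for w in omega:
            p[rv(w)] += p_omega[w]
        return dict(p)

    print(distribution(X))   # {1: 1/6, 2: 1/6, ..., 6: 1/6}
    print(distribution(Y))   # {0: 0.5, 1: 0.5}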

SLIDE 8

What is our event space? What are our random variables?

SLIDE 9

Probability Distributions

A probability distribution (p_X) assigns probabilities to the values of a random variable (X).

Σ_{x ∈ 𝒳} p_X(x) = 1        p_X(x) ≥ 0 ∀x ∈ 𝒳

There are a couple of philosophically different ways to define probabilities; here we give only these invariants in terms of random variables. The probability distribution of a random variable may be specified in a number of ways.

SLIDE 10

Specifying Distributions

  • Engineering/mathematical convenience
  • Important techniques in this course:
    • Probability mass functions
    • Tables (“stupid multinomials”)
    • Log-linear parameterizations (maximum entropy, random field, multinomial logistic regression); see the sketch after this list
    • Constructing random variables from other r.v.’s with known distributions
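
As an illustration of the table and log-linear options, here is a minimal, hypothetical Python sketch; the vocabulary, feature function, and weights are invented purely for the example. It writes a distribution over three words first as an explicit table and then as a log-linear (softmax) parameterization.

    import math

    # Option 1: a table ("stupid multinomial"): one parameter per outcome.
    p_table = {"the": 0.5, "cat": 0.3, "sat": 0.2}

    # Option 2: a log-linear parameterization: p(x) proportional to exp(w . f(x)).
    def features(x):
        # Hypothetical binary/real features of the outcome.
        return {"is_function_word": 1.0 if x == "the" else 0.0,
                "length": float(len(x))}

    weights = {"is_function_word": 1.2, "length": -0.1}   # invented weights

    def p_loglinear(x, support):
        score = lambda v: math.exp(sum(weights[k] * f for k, f in features(v).items()))
        return score(x) / sum(score(v) for v in support)  # normalize over the support

    vocab = ["the", "cat", "sat"]
    assert abs(sum(p_loglinear(v, vocab) for v in vocab) - 1.0) < 1e-9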

SLIDE 11

Sampling Notation

x = 4 × z + 1.7        (variable = expression)
y ∼ Distribution(θ)    (random variable ∼ Distribution(parameter))

SLIDE 12

Sampling Notation

x = 4 × z + 1.7
y ∼ Distribution(θ)

SLIDE 13

Sampling Notation

x = 4 × z + 1.7
y ∼ Distribution(θ)
y′ = y × x
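
In code, the distinction is between deterministic assignment and sampling. A minimal sketch using numpy; the choice of a Poisson as the stand-in “Distribution” and θ = 3.0 are our assumptions, purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    theta = 3.0                   # a parameter
    z = rng.standard_normal()     # z is sampled from a distribution
    x = 4 * z + 1.7               # x is a deterministic expression of z
    y = rng.poisson(theta)        # y ~ Distribution(theta), here a Poisson
    y_prime = y * x               # y' is a deterministic function of two r.v.'s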

SLIDE 14

Multivariate r.v.’s

Probability theory is particularly useful because it lets
 us reason about (cor)related and dependent events.

Z = X(ω) Y (ω)

  • A joint probability distribution is a probability


distribution over r.v.’s with the following form:

X

x∈X,y∈Y

ρZ ✓x y ◆ = 1 ρZ ✓x y ◆ ≥ 0 ∀x ∈ X, y ∈ Y

SLIDE 15

As before, a single die has Ω = {1, 2, 3, 4, 5, 6} and X(ω) = ω. Now roll two dice:

Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

X(ω) = ω₁        Y(ω) = ω₂

p_{X,Y}(x, y) = 1/36 if (x, y) ∈ Ω, 0 otherwise

SLIDE 16

The same two-dice sample space Ω and random variables X(ω) = ω₁, Y(ω) = ω₂, but with a non-uniform joint distribution:

p_{X,Y}(x, y) = (x + y)/252 if (x, y) ∈ Ω, 0 otherwise

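
A quick sketch in Python (our own construction, just to check the algebra): build both joint tables and verify that each sums to one.

    from fractions import Fraction

    # All 36 ordered outcomes of rolling two dice.
    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

    # Slide 15: the uniform joint distribution.
    p_uniform = {(x, y): Fraction(1, 36) for (x, y) in omega}

    # Slide 16: the non-uniform joint distribution.
    p_skewed = {(x, y): Fraction(x + y, 252) for (x, y) in omega}

    assert sum(p_uniform.values()) == 1
    assert sum(p_skewed.values()) == 1   # the (x + y) values sum to 252
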
SLIDE 17

Marginal Probability

p(X = x, Y = y) = p_{X,Y}(x, y)

p(X = x) = Σ_{y′ ∈ 𝒴} p(X = x, Y = y′)

p(Y = y) = Σ_{x′ ∈ 𝒳} p(X = x′, Y = y)

With Ω the 36 ordered pairs of two dice, as before:

p(X = 4) = Σ_{y′ ∈ [1, 6]} p(X = 4, Y = y′)

p(Y = 3) = Σ_{x′ ∈ [1, 6]} p(X = x′, Y = 3)

SLIDE 18

Under the uniform joint, p_{X,Y}(x, y) = 1/36 if (x, y) ∈ Ω (0 otherwise):

p(X = 4) = 6/36 = 1/6

Under the non-uniform joint, p_{X,Y}(x, y) = (x + y)/252 if (x, y) ∈ Ω (0 otherwise):

p(X = 4) = ((4 + 1) + (4 + 2) + (4 + 3) + (4 + 4) + (4 + 5) + (4 + 6)) / 252 = 45/252
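
The same computation done numerically, continuing our earlier sketch (the variable names are ours):

    from fractions import Fraction

    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
    p_uniform = {(x, y): Fraction(1, 36) for (x, y) in omega}
    p_skewed = {(x, y): Fraction(x + y, 252) for (x, y) in omega}

    def marginal_x(joint, x):
        # p(X = x) = sum over y' of p(X = x, Y = y')
        return sum(joint[(x, y)] for y in range(1, 7))

    print(marginal_x(p_uniform, 4))   # 1/6
    print(marginal_x(p_skewed, 4))    # 45/252, printed in lowest terms as 5/28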

SLIDE 19

Conditional Probability

The conditional probability of one random variable given another is defined as follows:

p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y) = joint probability / marginal

(defined given that p(y) ≠ 0)

Conditional probability distributions are useful for specifying joint distributions, since:

p(x | y) p(y) = p(x, y) = p(y | x) p(x)

Why might this be useful?
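
Continuing the dice sketch, a conditional distribution computed straight from the definition:

    from fractions import Fraction

    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
    p_skewed = {(x, y): Fraction(x + y, 252) for (x, y) in omega}

    def conditional_x_given_y(joint, y):
        # p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y), defined when p(Y = y) > 0
        p_y = sum(joint[(x, y)] for x in range(1, 7))   # marginal p(Y = y)
        return {x: joint[(x, y)] / p_y for x in range(1, 7)}

    cond = conditional_x_given_y(p_skewed, 3)
    print(cond[1], cond[6])        # 4/39 and 3/13; note p(x | y = 3) is proportional to x + 3
    assert sum(cond.values()) == 1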

SLIDE 20

Conditional Probability Distributions

A conditional probability distribution is a probability distribution over one r.v. X for each fixed value y of another r.v. Y, written p_{X|Y=y}(x), with:

Σ_{x ∈ 𝒳} p_{X|Y=y}(x) = 1 ∀y ∈ 𝒴

SLIDE 21

Chain rule

The chain rule is derived from repeated application of the definition of conditional probability:

p(a, b, c, d) = p(a | b, c, d) p(b, c, d)
             = p(a | b, c, d) p(b | c, d) p(c, d)
             = p(a | b, c, d) p(b | c, d) p(c | d) p(d)

Use as many times as necessary!

SLIDE 22

Bayes’ Rule

p(x | y) p(y) = p(x, y) = p(y | x) p(x)

p(x | y) = p(y | x) p(x) / p(y)
         = p(y | x) p(x) / Σ_{x′} p(y | x′) p(x′)

posterior = likelihood × prior / evidence
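
A worked numeric instance in Python (the words and numbers are invented purely for illustration): inferring which of two source words produced an observed translation.

    # Hypothetical priors p(x) and likelihoods p(y = "home" | x).
    prior = {"casa": 0.7, "hogar": 0.3}
    likelihood = {"casa": 0.2, "hogar": 0.5}

    # Bayes' rule: posterior = likelihood * prior / evidence.
    evidence = sum(likelihood[x] * prior[x] for x in prior)            # p(y) = 0.29
    posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}

    print(posterior)   # {'casa': 0.14/0.29 ~ 0.483, 'hogar': 0.15/0.29 ~ 0.517}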

SLIDE 23

Independence

Two random variables are independent iff:

p(X = x, Y = y) = p(X = x) p(Y = y)

Equivalently (use the definition of conditional probability to prove it):

p(X = x | Y = y) = p(X = x)

Equivalently again:

p(Y = y | X = x) = p(Y = y)

“Knowing about X doesn’t tell me about Y.”

SLIDE 24

The two joint distributions over pairs of dice again, with Ω the 36 ordered pairs as before:

p_{X,Y}(x, y) = 1/36 if (x, y) ∈ Ω, 0 otherwise

versus:

p_{X,Y}(x, y) = (x + y)/252 if (x, y) ∈ Ω, 0 otherwise

Are X and Y independent under each?

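
A quick check from the definition, in Python (our own sanity check): the uniform joint factorizes into its marginals, while the (x + y)/252 joint does not.

    from fractions import Fraction

    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

    def is_independent(joint):
        # X, Y independent iff p(x, y) = p(x) p(y) for every (x, y).
        px = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
        py = {y: sum(joint[(x, y)] for x in range(1, 7)) for y in range(1, 7)}
        return all(joint[(x, y)] == px[x] * py[y] for (x, y) in omega)

    print(is_independent({xy: Fraction(1, 36) for xy in omega}))               # True
    print(is_independent({(x, y): Fraction(x + y, 252) for (x, y) in omega}))  # False
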
SLIDE 25

Independence

Independence has practical benefits. Think about how many parameters you need for a naive parameterization of p_{X,Y}(x, y) versus p_X(x) and p_Y(y): O(|𝒳| · |𝒴|) versus O(|𝒳| + |𝒴|). For the two dice that is 36 versus 12 parameters; for two 50,000-word vocabularies it is 2.5 billion versus 100,000.

SLIDE 26

Conditional Independence

Two equivalent statements of conditional independence:

p(a, c | b) = p(a | b) p(c | b)

and:

p(a | b, c) = p(a | b)

“If I know B, then C doesn’t tell me about A.”

SLIDE 27

Conditional Independence

p(a, b, c) = p(a | b, c) p(b, c)
           = p(a | b, c) p(b | c) p(c)

If p(a | b, c) = p(a | b) (“if I know B, then C doesn’t tell me about A”), then:

p(a, b, c) = p(a | b) p(b | c) p(c)

Do we need more parameters or fewer under conditional independence?

SLIDE 28

Independence

  • Some variables are independent in nature
    • How do we know?
  • Some variables we pretend are independent for computational convenience
    • Examples?
  • Assuming independence is equivalent to letting our model “forget” something that happened in its past
    • What should we forget in language?

SLIDE 29

A Word About Data

  • When we formulate our models there will be two kinds of random variables: observed and latent
  • Observed: words, sentences(?), parallel corpora, web pages, formatting...
  • Latent: parameters, syntax, “meaning”, word alignments, translation dictionaries...

SLIDE 30

[Figure: the sentence pair “In der Innenstadt explodierte eine Autobombe” / “A car bomb exploded downtown”, shown with word alignments and analyzed at increasingly abstract levels: a German semantic frame (explodieren :arg0 Bombe :arg1 Auto :loc Innenstadt :tempus imperf), an English one (detonate :arg0 bomb :arg1 car :loc downtown :time past), and an interlingua “meaning” (report_event[ factivity=true explode(e, bomb, car) loc(e, downtown) ]). All of these levels of analysis are hidden.]

SLIDE 31

the clients and the associates are enemies . / los clientes y los asociados son enemigos .
the company has three groups . / la empresa tiene tres grupos .
its groups are in Europe . / sus grupos estan en Europa .
the modern groups sell strong pharmaceuticals . / los grupos modernos venden medicinas fuertes .
the groups do not sell zanzanine . / los grupos no venden zanzanina .
the small groups are not modern . / los grupos pequenos no son modernos .
Garcia and associates . / Garcia y asociados .
Carlos Garcia has three associates . / Carlos Garcia tiene tres asociados .
his associates are not strong . / sus asociados no son fuertes .
Garcia has a company also . / Garcia tambien tiene una empresa .
its clients are angry . / sus clientes estan enfadados .
the associates are also angry . / los asociados tambien estan enfadados .

la empresa tiene enemigos fuertes en Europa . / the company has strong enemies in Europe .

Observed

SLIDE 32

[The same parallel corpus as on the previous slide.]

Hidden

SLIDE 33

Learning

  • Let’s say we have formulated a model of a phenomenon:
    • Made independence assumptions
    • Figured out what kinds of parameters we want
  • Let’s say we have collected data we assume to be generated by this model:
    • E.g. some parallel data

What do we do now?

SLIDE 34

Parameter Estimation

  • Inputs:
    • Given a model with unspecified parameters
    • Given some data
  • Goal: learn the model parameters
  • How?
    • Find parameters that make the model’s predictions look like the data
  • What do we mean by “look like the data”?
    • Probability (other options: accuracy, moment matching)

SLIDE 35

Strategies

  • Maximum likelihood estimation
    • What is the probability of generating the data?
  • Accuracy
    • Using an auxiliary similarity function, find parameters that maximize the (expected?) accuracy of the data
  • Bayesian techniques

SLIDE 36

[Figure: a coin; its two outcomes have probability p(heads) and 1 − p(heads).]

SLIDE 37

p(data) = p(heads)⁷ × p(tails)³
        = p(heads)⁷ × [1 − p(heads)]³

p(heads) = ?

slide-38
SLIDE 38

[Plot: p(data) as a function of p(heads) ∈ [0, 1].]

SLIDE 39

[The same plot, with the maximum marked at p(heads) = 0.7.]
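
A minimal numerical check of that picture (our own sketch): evaluate p(data) on a grid of p(heads) values and take the argmax.

    # Likelihood of 7 heads and 3 tails as a function of p = p(heads).
    likelihood = lambda p: p**7 * (1 - p)**3

    grid = [i / 1000 for i in range(1001)]
    p_hat = max(grid, key=likelihood)
    print(p_hat)   # 0.7, matching the count ratio 7 / (7 + 3)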

SLIDE 40

Optimization

  • For the most part, we will be working with maximum likelihood estimation
  • The general recipe is:
    • Come up with an expression for the likelihood under your probability model, as a function of the data and the model parameters
    • Set the parameters to maximize the likelihood (a worked coin example follows below)
  • This optimization is generally difficult:
    • You must respect any constraints on the parameters (> 0, sum to 1, etc.)
    • There may not be an analytical solution (log-linear models)

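For the coin, the optimization can be done analytically; a short worked derivation (maximizing the log-likelihood, which has the same argmax as the likelihood):

    \begin{align*}
    \log p(\text{data}) &= 7 \log p + 3 \log(1 - p)\\
    \frac{d}{dp}\,\log p(\text{data}) &= \frac{7}{p} - \frac{3}{1 - p} = 0\\
    7(1 - p) &= 3p\\
    \hat{p} &= \frac{7}{10} = 0.7
    \end{align*}
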
SLIDE 41

Probability lets us ...

1) Formulate a model of pairs of sentences.
2) Learn an instance of the model from data.
3) Use it to infer translations of new inputs.

SLIDE 42

Key Concepts

  • Joint probabilities
  • Marginal probabilities
  • Conditional probabilities
  • Chain rule
  • Bayes’ rule
  • Independence
  • Latent versus observed variables
  • Maximum likelihood estimation

SLIDE 43

Supplemental Reading

  • If this was unfamiliar to you, then please read Chapter 3 from the textbook “Statistical Machine Translation” by Philipp Koehn

SLIDE 44

Announcements

  • HW 0 has been posted on the web site.
  • It’s a setup assignment to make sure that you can upload results, have them scored, and that they correctly appear on the leaderboard.