

SLIDE 1

Basic Statistics and Probability Theory

Based on “Foundations of Statistical NLP”

  • C. Manning & H. Schütze, ch. 2, MIT Press, 2002

“Probability theory is nothing but common sense reduced to calculation.” Pierre Simon, Marquis de Laplace (1749-1827)

SLIDE 2

PLAN

  • 1. Elementary Probability Notions:
  • Event Space and Probability Function
  • Conditional Probability
  • Bayes’ Theorem
  • Independence of Probabilistic Events
  • 2. Random Variables:
  • Discrete Variables and Continuous Variables
  • Mean, Variance and Standard Deviation
  • Standard Distributions
  • Joint, Marginal and Conditional Distributions
  • Independence of Random Variables
  • 3. Limit Theorems
  • 4. Estimating the parameters of probabilistic models from data
  • 5. Elementary Information Theory

SLIDE 3

  • 1. Elementary Probability Notions
  • sample space: Ω (either discrete or continuous)
  • event: A ⊆ Ω
    – the certain event: Ω
    – the impossible event: ∅
  • event space: F = 2^Ω (or a subspace of 2^Ω that contains ∅ and is closed under complement and countable union)
  • probability function/distribution: P : F → [0, 1] such that:
    – P(Ω) = 1
    – the “countable additivity” property: for all A1, . . ., Ak, . . . disjoint events, P(∪i Ai) = Σi P(Ai)

Consequence: for a uniform distribution over a finite sample space, P(A) = #favorable outcomes / #all outcomes.

SLIDE 4

Conditional Probability

  • P(A | B) = P(A ∩ B) / P(B)

Note: P(A | B) is called the a posteriori probability of A, given B.

  • The “multiplication” rule:

P(A ∩ B) = P(A | B)P(B) = P(B | A)P(A)

  • The “chain” rule:

P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1)P(A2 | A1)P(A3 | A1, A2) . . . P(An | A1, A2, . . . , An−1)

SLIDE 5

  • The “total probability” formula:

P(A) = P(A | B)P(B) + P(A | ¬B)P(¬B)

More generally: if A ⊆ ∪i Bi and Bi ∩ Bj = ∅ for all i ≠ j, then P(A) = Σi P(A | Bi)P(Bi)

  • Bayes’ Theorem:

P(B | A) = P(A | B)P(B) / P(A)

  • or: P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | ¬B)P(¬B)]

  • or . . .
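A minimal sketch (not part of the original slides) of Bayes’ theorem with a binary partition {B, ¬B}, mirroring the expanded formula above; the numbers are illustrative only:

```python
# Bayes' theorem over a binary partition {B, not-B}; values are made up.
def bayes_binary(p_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Return P(B | A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|not B)P(not B)]."""
    numerator = p_a_given_b * p_b
    evidence = numerator + p_a_given_not_b * (1.0 - p_b)  # total probability P(A)
    return numerator / evidence

# Example: rare event B with prior 0.002; P(A|B) = 1, P(A|not B) = 0.01
print(bayes_binary(0.002, 1.0, 0.01))   # ≈ 0.167
```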

SLIDE 6

Independence of Probabilistic Events

  • Independent events: P(A ∩ B) = P(A)P(B)

Note: When P(B) ≠ 0, the above definition is equivalent to P(A | B) = P(A).

  • Conditionally independent events:

P(A ∩ B | C) = P(A | C)P(B | C), assuming, of course, that P(C) ≠ 0.

Note: When P(B ∩ C) ≠ 0, the above definition is equivalent to P(A | B, C) = P(A | C).

SLIDE 7

  • 2. Random Variables

2.1 Basic Definitions

Let Ω be a sample space, and P : 2^Ω → [0, 1] a probability function.

  • A random variable of distribution P is a function

X : Ω → R^n

  • For now, let us consider n = 1.
  • The cumulative distribution function of X is F : R → [0, 1] defined by

F(x) = P(X ≤ x) = P({ω ∈ Ω | X(ω) ≤ x})

SLIDE 8

2.2 Discrete Random Variables

Definition: Let P : 2^Ω → [0, 1] be a probability function, and X be a random variable of distribution P.

  • If Image(X) is either finite or countably infinite, then X is called a discrete random variable.
  • For such a variable we define the probability mass function (pmf) p : R → [0, 1] as

p(x) def.= P(X = x) = P({ω ∈ Ω | X(ω) = x}).

(Obviously, it follows that Σ_{xi ∈ Image(X)} p(xi) = 1.)

Mean, Variance, and Standard Deviation:

  • Expectation / mean of X: E(X) not.= E[X] = Σx x p(x) if X is a discrete random variable.
  • Variance of X: Var(X) not.= Var[X] = E((X − E(X))²).
  • Standard deviation: σ = √Var(X).

Covariance of X and Y, two random variables of distribution P:

  • Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
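A small illustration (mine, not from the slides): computing the mean, variance and standard deviation of a discrete random variable directly from its pmf, here a fair die:

```python
import math

pmf = {x: 1/6 for x in range(1, 7)}                     # p(x) for x in Image(X)

mean = sum(x * p for x, p in pmf.items())               # E[X] = Σx x p(x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X − E[X])²]
sigma = math.sqrt(var)                                  # σ = √Var(X)

print(mean, var, sigma)   # 3.5, 2.9166..., 1.7078...
```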

SLIDE 9

Exemplification:

  • the Binomial distribution: b(r; n, p) = C(n, r) p^r (1 − p)^(n−r) (0 ≤ r ≤ n)

mean: np, variance: np(1 − p)

  • the Bernoulli distribution: b(r; 1, p)

[Figure: the probability mass function and the cumulative distribution function of the Binomial distribution]
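A hedged sketch of the binomial pmf b(r; n, p) = C(n, r) p^r (1 − p)^(n−r), with a numerical check of mean = np and variance = np(1 − p); n and p are arbitrary:

```python
from math import comb

def binom_pmf(r: int, n: int, p: float) -> float:
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3
mean = sum(r * binom_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binom_pmf(r, n, p) for r in range(n + 1))
print(mean, var)   # 3.0, 2.1  (= np and np(1−p))
```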

SLIDE 10

2.3 Continuous Random Variables

Definitions: Let P : 2^Ω → [0, 1] be a probability function, and X : Ω → R be a random variable of distribution P.

  • If Image(X) is an uncountably infinite set, and F, the cumulative distribution function of X, is continuous, then X is called a continuous random variable. (It follows, naturally, that P(X = x) = 0 for all x ∈ R.)
  • If there exists p : R → [0, ∞) such that F(x) = ∫_{−∞}^{x} p(t) dt, then X is called absolutely continuous. In such a case, p is called the probability density function (pdf) of X.
  • For B ⊆ R for which ∫_B p(x) dx exists,

Pr(B) def.= P({ω ∈ Ω | X(ω) ∈ B}) = ∫_B p(x) dx.

  • In particular, ∫_{−∞}^{+∞} p(x) dx = 1.
  • Expectation / mean of X: E(X) not.= E[X] = ∫ x p(x) dx.

SLIDE 11

Exemplification:

  • Normal (Gaussian) distribution: N(x; µ, σ) = 1/(√(2π) σ) · e^{−(x − µ)²/(2σ²)}

mean: µ, variance: σ²

  • Standard Normal distribution: N(x; 0, 1)
  • Remark:

For n, p such that np(1 − p) > 5, the Binomial distributions can be approximated by Normal distributions.
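A quick numerical check (my own illustration, with arbitrary n and p) of the remark above: when np(1 − p) > 5, b(r; n, p) is close to a normal density with µ = np and σ² = np(1 − p):

```python
from math import comb, exp, pi, sqrt

n, p = 50, 0.4                        # np(1−p) = 12 > 5
mu, sigma = n * p, sqrt(n * p * (1 - p))

for r in (15, 20, 25):
    binom = comb(n, r) * p**r * (1 - p)**(n - r)
    normal = exp(-(r - mu) ** 2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)
    print(r, round(binom, 4), round(normal, 4))   # the two columns nearly agree
```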

SLIDE 12

[Figure: the Normal distribution, its probability density function and its cumulative distribution function]

SLIDE 13

2.4 Basic Properties of Random Variables

Let P : 2^Ω → [0, 1] be a probability function, X : Ω → R^n be a discrete/continuous random variable of distribution P.

  • If g : R^n → R^m is a function, then g(X) is a random variable.

If g(X) is discrete, then E(g(X)) = Σx g(x) p(x).
If g(X) is continuous, then E(g(X)) = ∫ g(x) p(x) dx.

  • E(aX + b) = aE(X) + b.
  • If g is non-linear, then in general E(g(X)) ≠ g(E(X)).
  • E(X + Y) = E(X) + E(Y).
  • Var(X) = E(X²) − E²(X).
  • Var(aX) = a²Var(X).
  • Cov(X, Y) = E[XY] − E[X]E[Y].
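A minimal Monte Carlo sanity check (illustration only, with invented parameters) of two of the properties above, E(aX + b) = aE(X) + b and Var(aX) = a² Var(X):

```python
import random

xs = [random.gauss(2.0, 3.0) for _ in range(200_000)]    # X ~ N(2, 3²)
a, b = 5.0, -1.0

def mean(v):
    return sum(v) / len(v)

def var(v):                           # Var = E(X²) − E²(X)
    return mean([u * u for u in v]) - mean(v) ** 2

print(mean([a * x + b for x in xs]), a * mean(xs) + b)   # both ≈ 9
print(var([a * x for x in xs]), a ** 2 * var(xs))        # both ≈ 225
```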

SLIDE 14

2.5 Joint, Marginal and Conditional Distributions

Exemplification for the bi-variate case: Let Ω be a sample space, P : 2^Ω → [0, 1] a probability function, and V : Ω → R² be a random variable of distribution P. One can naturally see V as a pair of two random variables X : Ω → R and Y : Ω → R. (More precisely, V(ω) = (x, y) = (X(ω), Y(ω)).)

  • the joint pmf/pdf of X and Y is defined by

p(x, y) not.= pX,Y(x, y) = P(X = x, Y = y) = P({ω ∈ Ω | X(ω) = x, Y(ω) = y}).

  • the marginal pmf/pdf functions of X and Y are:

for the discrete case: pX(x) = Σy p(x, y), pY(y) = Σx p(x, y)
for the continuous case: pX(x) = ∫ p(x, y) dy, pY(y) = ∫ p(x, y) dx

  • the conditional pmf/pdf of X given Y is:

pX|Y(x | y) = pX,Y(x, y) / pY(y)
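A sketch (values invented for illustration) of marginal and conditional pmfs computed from a small joint pmf table, following the definitions above:

```python
from collections import defaultdict

joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}   # pX,Y(x, y)

pX, pY = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p        # pX(x) = Σy p(x, y)
    pY[y] += p        # pY(y) = Σx p(x, y)

def cond_x_given_y(x, y):
    return joint[(x, y)] / pY[y]      # pX|Y(x | y) = pX,Y(x, y) / pY(y)

print(dict(pX), dict(pY), cond_x_given_y(0, 1))   # {0: 0.5, 1: 0.5} ... 0.2/0.6
```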

SLIDE 15

2.6 Independence of Random Variables

Definitions:

  • Let X, Y be random variables of the same type (i.e. either discrete or continuous), and pX,Y their joint pmf/pdf. X and Y are said to be independent if pX,Y(x, y) = pX(x) · pY(y) for all possible values x and y of X and Y respectively.
  • Similarly, let X, Y and Z be random variables of the same type, and p their joint pmf/pdf. X and Y are conditionally independent given Z if pX,Y|Z(x, y | z) = pX|Z(x | z) · pY|Z(y | z) for all possible values x, y and z of X, Y and Z respectively.

SLIDE 16

Properties of random variables pertaining to independence

  • If X, Y are independent, then Var(X + Y) = Var(X) + Var(Y).
  • If X, Y are independent, then E(XY) = E(X)E(Y), i.e. Cov(X, Y) = 0.
  • Cov(X, Y) = 0 does NOT imply that X, Y are independent.
  • The covariance matrix corresponding to a vector of random variables is symmetric and positive semi-definite.
  • If the covariance matrix of a multi-variate Gaussian distribution is diagonal, then the marginal distributions are independent.

SLIDE 17

  • 3. Limit Theorems

[Sheldon Ross, A First Course in Probability, 5th ed., 1998] “The most important results in probability theory are limit theorems. Of these, the most important are laws of large numbers, concerned with stating conditions under which the average of a sequence of random variables converges (in some sense) to the expected average; [and] central limit theorems, concerned with determining the conditions under which the sum of a large number of random variables has a probability distribution that is approximately normal.”

SLIDE 18

Two basic inequalities and the weak law of large numbers

Markov’s inequality:

If X is a random variable that takes only non-negative values, then for any value a > 0, P(X ≥ a) ≤ E[X]/a.

Chebyshev’s inequality:

If X is a random variable with finite mean µ and variance σ², then for any value k > 0, P(|X − µ| ≥ k) ≤ σ²/k².

The weak law of large numbers (Bernoulli; Khintchine):

Let X1, X2, . . . , Xn be a sequence of independent and identically distributed random variables, each having a finite mean E[Xi] = µ. Then, for any value ε > 0,

P(|(X1 + . . . + Xn)/n − µ| ≥ ε) → 0 as n → ∞
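An illustration (mine, not from the slides): the running average of i.i.d. Bernoulli(p) variables drifts toward µ = p as n grows, as the weak law of large numbers states:

```python
import random

p = 0.3
for n in (10, 100, 10_000, 1_000_000):
    avg = sum(random.random() < p for _ in range(n)) / n
    print(n, avg)    # the averages approach 0.3 as n grows
```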

SLIDE 19

The central limit theorem for i.i.d. random variables

[Pierre Simon, Marquis de Laplace; Liapunoff in 1901-1902] Let X1, X2, . . . , Xn be a sequence of independent and identically distributed random variables, each having mean µ and variance σ². Then the distribution of

(X1 + . . . + Xn − nµ) / (σ√n)

tends to the standard normal (Gaussian) as n → ∞. That is, for −∞ < a < ∞,

P((X1 + . . . + Xn − nµ) / (σ√n) ≤ a) → (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx as n → ∞
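A sketch (mine, with arbitrary choices of n and trial count): standardized sums of i.i.d. uniform variables look standard normal, so the empirical P(Zn ≤ a) tracks Φ(a):

```python
import random
from statistics import NormalDist

n, trials = 30, 20_000
mu, sigma = 0.5, (1 / 12) ** 0.5          # mean and std of Uniform(0, 1)

zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * n**0.5)
      for _ in range(trials)]

for a in (-1.0, 0.0, 1.0):
    empirical = sum(z <= a for z in zs) / trials
    print(a, round(empirical, 3), round(NormalDist().cdf(a), 3))
```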

SLIDE 20

The central limit theorem for independent random variables

Let X1, X2, . . ., Xn be a sequence of independent random variables having respective means µi and variances σi².

If (a) the variables Xi are uniformly bounded, i.e. for some M ∈ R+, P(|Xi| < M) = 1 for all i, and (b) Σ_{i=1}^{∞} σi² = ∞, then

P( Σ_{i=1}^{n} (Xi − µi) / √(Σ_{i=1}^{n} σi²) ≤ a ) → Φ(a) as n → ∞

where Φ is the cumulative distribution function of the standard normal (Gaussian) distribution.

SLIDE 21

The strong law of large numbers

Let X1, X2, . . . , Xn be a sequence of independent and identically distributed random variables, each having a finite mean E[Xi] = µ. Then, with probability 1,

(X1 + . . . + Xn)/n → µ as n → ∞

That is, P( lim_{n→∞} (X1 + . . . + Xn)/n = µ ) = 1

SLIDE 22

Other inequalities

One-sided Chebyshev inequality: If X is a random variable with mean 0 and finite variance σ², then for any a > 0, P(X ≥ a) ≤ σ²/(σ² + a²).

Corollary: If E[X] = µ and Var(X) = σ², then for a > 0,
P(X ≥ µ + a) ≤ σ²/(σ² + a²)
P(X ≤ µ − a) ≤ σ²/(σ² + a²)

Chernoff bounds: Let M(t) not.= E[e^{tX}]. Then
P(X ≥ a) ≤ e^{−ta}M(t) for all t > 0
P(X ≤ a) ≤ e^{−ta}M(t) for all t < 0

SLIDE 23

  • 4. Estimation/inference of the parameters of probabilistic models from data

(based on [Durbin et al., Biological Sequence Analysis, 1998], pp. 311-313, 319-321)

A probabilistic model can be anything from a simple distribution to a complex stochastic grammar with many implicit probability distributions. Once the type of the model is chosen, the parameters have to be inferred from data. We will first consider the case of the multinomial distribution, and then we will present the different strategies that can be used in general.

SLIDE 24

A case study: Estimation of the parameters of a multinomial distribution from data

Assume that the observations (for example, when rolling a die about which we don’t know whether it is fair or not, or when counting the number of times the amino acid i occurs in a column of a multiple sequence alignment) can be expressed as counts ni for each outcome i (i = 1, . . . , K), and we want to estimate the probabilities θi of the underlying distribution.

Case 1:

When we have plenty of data, it is natural to use the maximum likelihood (ML) solution, i.e. the observed frequency θML_i = ni / Σj nj not.= ni/N.

Note: it is easy to show that indeed P(n | θML) > P(n | θ) for any θ ≠ θML:

log [P(n | θML) / P(n | θ)] = log [Πi (θML_i)^{ni} / Πi θi^{ni}] = Σi ni log(θML_i / θi) = N Σi θML_i log(θML_i / θi) > 0

The inequality follows from the fact that the relative entropy is always positive except when the two distributions are identical.
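A minimal sketch of the ML estimate for a multinomial, i.e. the observed frequencies; the counts below are invented (for a die, K = 6):

```python
counts = [3, 7, 5, 4, 6, 5]            # n_i
N = sum(counts)
theta_ml = [n / N for n in counts]     # θ^ML_i = n_i / N
print(theta_ml)
```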

SLIDE 25

Case 2:

When the data is scarce, it is not clear what the best estimate is. In general, we should use prior knowledge, via Bayesian statistics. For instance, one can use the Dirichlet distribution with parameters α:

P(θ | n) = P(n | θ)D(θ | α) / P(n)

It can be shown (see the calculation in R. Durbin et al.’s BSA book, p. 320) that the posterior mean estimate (PME) of the parameters is

θPME_i def.= ∫ θi P(θ | n) dθ = (ni + αi) / (N + Σj αj)

The α’s are like pseudocounts added to the real counts. (If we think of the α’s as extra observations added to the real ones, this is precisely the ML estimate!) This makes the Dirichlet regulariser very intuitive.

How to use the pseudocounts: If it is fairly obvious that a certain residue, let’s say i, is very common, then we should give it a very high pseudocount αi; if the residue j is generally rare, we should give it a low pseudocount.
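A sketch of the posterior mean estimate with Dirichlet pseudocounts, θPME_i = (ni + αi)/(N + Σj αj); the counts and α values are illustrative:

```python
counts = [0, 2, 1, 0, 0, 1]            # scarce data: only 4 die rolls observed
alphas = [1.0] * 6                     # uniform pseudocounts (Laplace smoothing)

N, A = sum(counts), sum(alphas)
theta_pme = [(n + a) / (N + A) for n, a in zip(counts, alphas)]
print(theta_pme)   # no outcome gets probability 0, unlike the ML estimate
```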

SLIDE 26

Strategies to be used in the general case

  • A. The Maximum Likelihood (ML) Estimate

When we wish to infer the parameters θ = (θi) for a model M from a set of data D, the most obvious strategy is to maximise P(D | θ, M) over all possible values of θ. Formally:

θML = argmax_θ P(D | θ, M)

Note: Generally speaking, when we treat P(x | y) as a function of x (and y is fixed), we refer to it as a probability. When we treat P(x | y) as a function of y (and x is fixed), we call it a likelihood. Note that a likelihood is not a probability distribution or density; it is simply a function of the variable y.

A serious drawback of maximum likelihood is that it gives poor results when data is scarce. The solution then is to introduce more prior knowledge, using Bayes’ theorem. (In the Bayesian framework, the parameters are themselves seen as random variables!)

SLIDE 27

  • B. The Maximum A Posteriori Probability (MAP) Estimate

θMAP def.= argmax_θ P(θ | D, M) = argmax_θ P(D | θ, M)P(θ | M) / P(D | M) = argmax_θ P(D | θ, M)P(θ | M)

The prior probability P(θ | M) has to be chosen in some reasonable manner, and this is the art of Bayesian estimation (although this freedom to choose a prior has made Bayesian statistics controversial at times...).

  • C. The Posterior Mean Estimator (PME)

θPME = ∫ θ P(θ | D, M) dθ

where the integral is over all probability vectors, i.e. all those that sum to one.

  • D. Yet another solution is to use the posterior probability P(θ | D, M) to sample from it (see [Durbin et al., 1998], section 11.4) and thereby locate regions of high probability for the model parameters.

SLIDE 28

  • 5. Elementary Information Theory

Definitions:

Let X and Y be discrete random variables.

  • Entropy: H(X) def.= Σx p(x) log(1/p(x)) = −Σx p(x) log p(x) = Ep[−log p(X)].
  • Specific conditional entropy: H(Y | X = x) def.= −Σ_{y∈Y} p(y | x) log p(y | x).
  • Average conditional entropy: H(Y | X) def.= Σ_{x∈X} p(x) H(Y | X = x) immed.= −Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y | x).
  • Joint entropy: H(X, Y) def.= −Σx Σy p(x, y) log p(x, y) dem.= H(X) + H(Y | X) dem.= H(Y) + H(X | Y).
  • Mutual information (or: Information gain): IG(X; Y) def.= H(X) − H(X | Y) immed.= H(Y) − H(Y | X) immed.= H(X, Y) − H(X | Y) − H(Y | X).
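A sketch computing entropy, conditional entropy and mutual information from the small joint pmf used earlier (values invented), with logarithms in base 2:

```python
from math import log2

joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def H(pmf_values):                            # H = −Σ p log p
    return -sum(p * log2(p) for p in pmf_values if p > 0)

pX = [0.5, 0.5]                               # marginals of the table above
pY = [0.4, 0.6]
H_XY = H(joint.values())
H_Y_given_X = H_XY - H(pX)                    # H(Y|X) = H(X,Y) − H(X)
IG = H(pX) + H(pY) - H_XY                     # IG(X;Y) = H(X) + H(Y) − H(X,Y)
print(H_XY, H_Y_given_X, IG)
```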

SLIDE 29

Basic properties of Entropy, Conditional Entropy, Joint Entropy and Mutual Information / Information Gain

  • 0 ≤ H(p1, . . . , pn) ≤ H(1/n, . . . , 1/n);

H(X) = 0 iff X is a constant random variable.

  • H(X, Y) ≤ H(X) + H(Y);

H(X, Y) = H(X) + H(Y) iff X and Y are independent;
H(X | Y) = H(X) iff X and Y are independent.

  • IG(X; Y) ≥ 0;

IG(X; Y) = 0 iff X and Y are independent.

SLIDE 30

The Relationship between Entropy, Conditional Entropy, Joint Entropy and Mutual Information

[Figure: Venn-style diagram with regions labelled I(X,Y), H(X|Y), H(Y|X), H(X,Y), H(X), H(Y)]

SLIDE 31

Other definitions

Let X and Y be discrete random variables, and p and q their respective pmf’s.

  • Relative entropy (or, Kullback-Leibler divergence):

KL(p || q) = −Σ_{x∈X} p(x) log(q(x)/p(x)) = Ep[log(p(X)/q(X))]

  • Cross-entropy:

CH(X, q) = −Σ_{x∈X} p(x) log q(x) = Ep[log(1/q(X))]
SLIDE 32

Basic properties of relative entropy and cross-entropy

  • KL(p || q) ≥ 0 for all p and q;

KL(p || q) = 0 iff p and q are identical.

  • KL is NOT a distance metric (because it is not symmetric)!!
  • The quantity

d(X, Y) def.= H(X, Y) − IG(X; Y) = H(X) + H(Y) − 2 IG(X; Y) = H(X | Y) + H(Y | X),

known as variation of information, is a distance metric.

  • IG(X; Y) = KL(pXY || pX pY) = Σx Σy p(x, y) log [p(x, y) / (p(x)p(y))].
  • If X is a discrete random variable, p its pmf and q another pmf (usually a model of p), then CH(X, q) = H(X) + KL(p || q), and therefore CH(X, q) ≥ H(X) ≥ 0.
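A sketch of KL divergence and cross-entropy for two small pmfs (made-up values), checking the identity CH(X, q) = H(X) + KL(p || q) from the last bullet:

```python
from math import log2

p = [0.5, 0.25, 0.25]      # true pmf
q = [0.4, 0.4, 0.2]        # model pmf

H_p = -sum(pi * log2(pi) for pi in p)
KL = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))
CH = -sum(pi * log2(qi) for pi, qi in zip(p, q))
print(KL >= 0, abs(CH - (H_p + KL)) < 1e-12)   # True True
```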

SLIDE 33

  • 6. Recommended Exercises
  • From [Manning & Schütze, 2002], ch. 2:

Examples 1, 2, 4, 5, 7, 8, 9; Exercises 2.1, 2.3, 2.4, 2.5

  • From [Sheldon Ross, 1998], ch. 8:

Examples 2a, 2b, 3a, 3b, 3c, 5a, 5b

SLIDE 34

Addenda

  • A. Other Examples of Probability Distributions

SLIDE 35

Multinomial distribution:

generalises the binomial distribution to the case where there are K independent outcomes with probabilities θi, i = 1, . . . , K. The probability of getting ni occurrences of outcome i is given by

P(n | θ) = (n! / Πi ni!) Π_{i=1}^{K} θi^{ni}

where n = n1 + . . . + nK, θ = (θ1, . . . , θK).

Example: The outcome of rolling a die N times is described by a multinomial. The probabilities of each of the 6 outcomes are θ1, . . . , θ6. For a fair die, θ1 = . . . = θ6, and the probability of rolling it 12 times and getting each outcome twice is:

12!/(2!)⁶ · (1/6)¹² = 3.4 × 10⁻³

Note: The particular case n = 1 represents the categorical distribution. This is a generalisation of the Bernoulli distribution.
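A sketch of the multinomial pmf P(n | θ) = (n!/Πi ni!) Πi θi^{ni}, reproducing the fair-die example on the slide:

```python
from math import factorial, prod

def multinomial_pmf(counts, thetas):
    n = sum(counts)
    coef = factorial(n) / prod(factorial(c) for c in counts)
    return coef * prod(t**c for t, c in zip(thetas, counts))

print(multinomial_pmf([2] * 6, [1/6] * 6))   # ≈ 3.4e-3, as on the slide
```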

SLIDE 36

Poisson distribution (or, Poisson law of small numbers):

p(k) = (λ^k / k!) · e^{−λ}, with k ∈ N and parameter λ > 0. Mean = variance = λ.

[Figure: the probability mass function and the cumulative distribution function]

SLIDE 37

Exponential distribution (a.k.a. the negative exponential distribution):

p(x) = λe^{−λx} for x ≥ 0 and parameter λ > 0. Mean = λ⁻¹, variance = λ⁻².

[Figure: the probability density function and the cumulative distribution function]

SLIDE 38

Gamma distribution:

p(x) = x^(k−1) e^{−x/θ} / (Γ(k) θ^k) for x ≥ 0 and parameters k > 0 (shape) and θ > 0 (scale). Mean = kθ, variance = kθ².

The gamma function is a generalisation of the factorial function to real values. For any positive real number x, Γ(x + 1) = xΓ(x). (Thus, for integers, Γ(n) = (n − 1)!.)

[Figure: the probability density function and the cumulative distribution function]

SLIDE 39

Student’s t distribution:

p(x) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + x²/ν)^{−(ν+1)/2}

for x ∈ R and the parameter ν > 0 (the degrees of freedom).

Mean = 0 for ν > 1, otherwise undefined. Variance = ν/(ν−2) for ν > 2, ∞ for 1 < ν ≤ 2, otherwise undefined.

[Figure: the probability density function and the cumulative distribution function]

Note [from Wikipedia]: The t-distribution is symmetric and bell-shaped, like the normal distribution, but it has heavier tails, meaning that it is more prone to producing values that fall far from its mean.

SLIDE 40

Dirichlet distribution:

D(θ | α) = (1/Z(α)) Π_{i=1}^{K} θi^{αi−1} δ(Σ_{i=1}^{K} θi − 1)

where α = α1, . . . , αK with αi > 0 are the parameters, and the θi satisfy 0 ≤ θi ≤ 1 and sum to 1 (this being indicated by the delta function term δ(Σi θi − 1)). The normalising factor can be expressed in terms of the gamma function:

Z(α) = ∫ Π_{i=1}^{K} θi^{αi−1} δ(Σi θi − 1) dθ = Πi Γ(αi) / Γ(Σi αi)

Mean of θi: αi / Σj αj.

For K = 2, the Dirichlet distribution reduces to the beta distribution, and the normalising constant is the beta function.

SLIDE 41

Remark:

Concerning the multinomial and Dirichlet distributions:

The algebraic expression for the parameters θi is similar in the two distributions. However, the multinomial is a distribution over its exponents ni, whereas the Dirichlet is a distribution over the numbers θi that are exponentiated. The two distributions are said to be conjugate distributions, and their close formal relationship leads to a harmonious interplay in many estimation problems. Similarly, the gamma distribution is the conjugate of the Poisson distribution.

SLIDE 42

Addenda

  • B. Some Proofs

SLIDE 43

E[X + Y] = E[X] + E[Y]

where X and Y are random variables of the same type (i.e. either discrete or continuous).

The discrete case:

E[X + Y] = Σ_{ω∈Ω} (X(ω) + Y(ω)) · P(ω) = Σω X(ω) · P(ω) + Σω Y(ω) · P(ω) = E[X] + E[Y]

The continuous case:

E[X + Y] = ∫x ∫y (x + y) pXY(x, y) dy dx
= ∫x ∫y x pXY(x, y) dy dx + ∫x ∫y y pXY(x, y) dy dx
= ∫x x [∫y pXY(x, y) dy] dx + ∫y y [∫x pXY(x, y) dx] dy
= ∫x x pX(x) dx + ∫y y pY(y) dy = E[X] + E[Y]

SLIDE 44

X and Y are independent ⇒ E[XY] = E[X] · E[Y],

X and Y being random variables of the same type (i.e. either discrete or continuous).

The discrete case:

E[XY] = Σ_{x∈Val(X)} Σ_{y∈Val(Y)} xy P(X = x, Y = y)
= Σ_{x∈Val(X)} Σ_{y∈Val(Y)} xy P(X = x) · P(Y = y)
= Σ_{x∈Val(X)} [x P(X = x) Σ_{y∈Val(Y)} y P(Y = y)]
= Σ_{x∈Val(X)} x P(X = x) E[Y] = E[X] · E[Y]

The continuous case:

E[XY] = ∫x ∫y xy p(X = x, Y = y) dy dx
= ∫x ∫y xy p(X = x) · p(Y = y) dy dx
= ∫x x p(X = x) [∫y y p(Y = y) dy] dx
= ∫x x p(X = x) E[Y] dx = E[Y] · ∫x x p(X = x) dx = E[X] · E[Y]
slide-45
SLIDE 45

Binomial distribution: b(r; n, p)

def.

= Cr

n pr(1 − p)n−r Significance: b(r; n, p) is the number of heads in n independent flips of a coin having the head probability p. b(r; n, p) indeed represents a probability distribution:

  • b(r; n, p) = Cr

n pr(1 − p)n−r ≥ 0 for all p ∈ [0, 1], n ∈ N and r ∈ {0, 1, . . ., n},

  • n

r=0 b(r; n, p) = 1:

(1 − p)n + C1

np(1 − p)n−1 + · · · + Cn−1 n

pn−1(1 − p) + pn = [p + (1 − p)]n = 1

44.

SLIDE 46

Binomial distribution: calculating the mean

E[b(r; n, p)] def.= Σ_{r=0}^{n} r · b(r; n, p)
= 1 · C(n, 1) p(1 − p)^(n−1) + 2 · C(n, 2) p²(1 − p)^(n−2) + · · · + (n−1) · C(n, n−1) p^(n−1)(1 − p) + n · p^n
= p [C(n, 1)(1 − p)^(n−1) + 2 · C(n, 2) p(1 − p)^(n−2) + · · · + (n−1) · C(n, n−1) p^(n−2)(1 − p) + n · p^(n−1)]
= np [(1 − p)^(n−1) + C(n−1, 1) p(1 − p)^(n−2) + · · · + C(n−1, n−2) p^(n−2)(1 − p) + C(n−1, n−1) p^(n−1)]
= np [p + (1 − p)]^(n−1) = np

SLIDE 47

Binomial distribution: calculating the variance

following www.proofwiki.org/wiki/Variance_of_Binomial_Distribution, which cites “Probability: An Introduction”, by Geoffrey Grimmett and Dominic Welsh, Oxford Science Publications, 1986

We will make use of the formula Var[X] = E[X²] − E²[X]. By denoting q = 1 − p, it follows:

E[b²(r; n, p)] def.= Σ_{r=0}^{n} r² C(n, r) p^r q^(n−r)
= Σ_{r=0}^{n} r² [n(n−1) . . . (n−r+1) / r!] p^r q^(n−r)
= Σ_{r=1}^{n} r [n(n−1) . . . (n−r+1) / (r−1)!] p^r q^(n−r)
= Σ_{r=1}^{n} r n C(n−1, r−1) p^r q^(n−r)
= np Σ_{r=1}^{n} r C(n−1, r−1) p^(r−1) q^((n−1)−(r−1))

SLIDE 48

Binomial distribution: calculating the variance (cont’d)

By denoting j = r − 1 and m = n − 1, we get:

E[b²(r; n, p)] = np Σ_{j=0}^{m} (j + 1) C(m, j) p^j q^(m−j)
= np [Σ_{j=0}^{m} j C(m, j) p^j q^(m−j) + Σ_{j=0}^{m} C(m, j) p^j q^(m−j)]
= np [Σ_{j=0}^{m} j (m · . . . · (m−j+1) / j!) p^j q^(m−j) + (p + q)^m]   (note that p + q = 1)
= np [Σ_{j=1}^{m} m C(m−1, j−1) p^j q^(m−j) + 1]
= np [mp Σ_{j=1}^{m} C(m−1, j−1) p^(j−1) q^((m−1)−(j−1)) + 1]
= np [(n − 1)p (p + q)^(m−1) + 1] = np [(n − 1)p + 1] = n²p² − np² + np

Finally,

Var[X] = E[b²(r; n, p)] − (E[b(r; n, p)])² = n²p² − np² + np − n²p² = np(1 − p)

SLIDE 49

Binomial distribution: calculating the variance (another solution)

  • it can be shown relatively easily that any random variable following the binomial distribution b(r; n, p) can be seen as a sum of n independent variables, each following the Bernoulli distribution with parameter p;*
  • we know (or it can be proved immediately) that the variance of the Bernoulli distribution with parameter p is p(1 − p);
  • taking into account the additivity of variances (Var[X1 + X2 + . . . + Xn] = Var[X1] + Var[X2] + . . . + Var[Xn] if X1, X2, . . ., Xn are independent variables), it follows that Var[X] = np(1 − p).

*See www.proofwiki.org/wiki/Bernoulli_Process_as_Binomial_Distribution, which also cites “Probability: An Introduction” by Geoffrey Grimmett and Dominic Welsh, Oxford Science Publications, 1986.

SLIDE 50

The Gaussian distribution: p(X = x) = 1/(√(2π) σ) · e^{−(x − µ)²/(2σ²)}

Calculating the mean

E[N_{µ,σ}(x)] def.= ∫_{−∞}^{∞} x p(x) dx = 1/(√(2π) σ) ∫_{−∞}^{∞} x · e^{−(x − µ)²/(2σ²)} dx

Using the variable transformation v = (x − µ)/σ, which implies x = σv + µ and dx = σ dv:

E[X] = 1/(√(2π) σ) ∫_{−∞}^{∞} (σv + µ) e^{−v²/2} (σ dv)
= 1/√(2π) [σ ∫_{−∞}^{∞} v e^{−v²/2} dv + µ ∫_{−∞}^{∞} e^{−v²/2} dv]
= 1/√(2π) [−σ e^{−v²/2} |_{−∞}^{∞} + µ ∫_{−∞}^{∞} e^{−v²/2} dv]   (the first term is 0)
= µ/√(2π) ∫_{−∞}^{∞} e^{−v²/2} dv.

The last integral is computed as shown on the next slide.

SLIDE 51

The Gaussian distribution: calculating the mean (cont’d)

(∫_{v=−∞}^{∞} e^{−v²/2} dv)² = (∫_{x=−∞}^{∞} e^{−x²/2} dx) · (∫_{y=−∞}^{∞} e^{−y²/2} dy) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} e^{−(x² + y²)/2} dy dx = ∫∫_{R²} e^{−(x² + y²)/2} dy dx

By switching from x, y to polar coordinates r, θ, it follows:

(∫_{v=−∞}^{∞} e^{−v²/2} dv)² = ∫_{r=0}^{∞} ∫_{θ=0}^{2π} e^{−r²/2} (r dr dθ) = ∫_{r=0}^{∞} r e^{−r²/2} (∫_{θ=0}^{2π} dθ) dr = 2π ∫_{r=0}^{∞} r e^{−r²/2} dr = 2π (−e^{−r²/2}) |_{0}^{∞} = 2π(0 − (−1)) = 2π

Note: x = r cos θ and y = r sin θ, with r ≥ 0 and θ ∈ [0, 2π). Therefore x² + y² = r², and the Jacobian determinant is

∂(x, y)/∂(r, θ) = | cos θ  −r sin θ ; sin θ  r cos θ | = r cos²θ + r sin²θ = r ≥ 0.

So, dx dy = r dr dθ.

SLIDE 52

The Gaussian distribution: calculating the variance

We will make use of the formula Var[X] = E[X²] − E²[X].

E[X²] = ∫_{−∞}^{∞} x² p(x) dx = 1/(√(2π) σ) ∫_{−∞}^{∞} x² · e^{−(x − µ)²/(2σ²)} dx

Again, using v = (x − µ)/σ, which implies x = σv + µ and dx = σ dv, we get:

E[X²] = 1/(√(2π) σ) ∫_{−∞}^{∞} (σv + µ)² e^{−v²/2} (σ dv)
= 1/√(2π) ∫_{−∞}^{∞} (σ²v² + 2σµv + µ²) e^{−v²/2} dv
= 1/√(2π) [σ² ∫_{−∞}^{∞} v² e^{−v²/2} dv + 2σµ ∫_{−∞}^{∞} v e^{−v²/2} dv + µ² ∫_{−∞}^{∞} e^{−v²/2} dv]

Note that we have already computed ∫_{−∞}^{∞} v e^{−v²/2} dv = 0 and ∫_{−∞}^{∞} e^{−v²/2} dv = √(2π).

SLIDE 53

The Gaussian distribution: calculating the variance (cont’d)

Therefore, we only need to compute

∫_{−∞}^{∞} v² e^{−v²/2} dv = ∫_{−∞}^{∞} (−v)(−v e^{−v²/2}) dv = ∫_{−∞}^{∞} (−v)(e^{−v²/2})′ dv
= (−v) e^{−v²/2} |_{−∞}^{∞} − ∫_{−∞}^{∞} (−1) e^{−v²/2} dv = 0 + ∫_{−∞}^{∞} e^{−v²/2} dv = √(2π).

So, E[X²] = 1/√(2π) [σ²√(2π) + 2σµ · 0 + µ²√(2π)] = σ² + µ².

Finally, Var[X] = E[X²] − (E[X])² = (σ² + µ²) − µ² = σ².

SLIDE 54

The covariance matrix Σ corresponding to a vector X made of n random variables is symmetric and positive semi-definite

  • a. Cov(X)_{i,j} def.= Cov(Xi, Xj) for all i, j ∈ {1, . . . , n}, and

Cov(Xi, Xj) def.= E[(Xi − E[Xi])(Xj − E[Xj])] = Cov(Xj, Xi),

therefore Cov(X) is a symmetric matrix.

  • b. We will show that zᵀΣz ≥ 0 for any z ∈ R^n:

zᵀΣz = Σ_{i=1}^{n} Σ_{j=1}^{n} (zi Σij zj) = Σ_{i=1}^{n} Σ_{j=1}^{n} (zi Cov[Xi, Xj] zj)
= Σ_{i=1}^{n} Σ_{j=1}^{n} (zi E[(Xi − E[Xi])(Xj − E[Xj])] zj)
= Σ_{i=1}^{n} Σ_{j=1}^{n} (E[(Xi − E[Xi])(Xj − E[Xj])] zi zj)
= E[Σ_{i=1}^{n} Σ_{j=1}^{n} (Xi − E[Xi])(Xj − E[Xj]) zi zj]
= E[((X − E[X])ᵀ · z)²] ≥ 0

SLIDE 55

If the covariance matrix of a multi-variate Gaussian distribution is diagonal, then its density is equal to the product of independent univariate Gaussian densities

Let us consider X = [X1 . . . Xn]ᵀ, µ ∈ R^n and Σ ∈ S^n₊, where S^n₊ is the set of symmetric positive definite matrices (which implies |Σ| ≠ 0 and (x − µ)ᵀΣ⁻¹(x − µ) > 0, therefore −(1/2)(x − µ)ᵀΣ⁻¹(x − µ) < 0).

The probability density function of a multi-variate Gaussian distribution with parameters µ and Σ is:

p(x; µ, Σ) = 1/((2π)^(n/2) |Σ|^(1/2)) exp(−(1/2)(x − µ)ᵀΣ⁻¹(x − µ))

Notation: X ∼ N(µ, Σ)

We will make the proof for n = 2:

x = [x1; x2], µ = [µ1; µ2], Σ = [σ1², 0; 0, σ2²]
SLIDE 56

A property of multi-variate Gaussians whose covariance matrices are diagonal (cont’d)

p(x; µ, Σ) = 1/(2π (σ1² σ2²)^(1/2)) exp(−(1/2) [x1 − µ1; x2 − µ2]ᵀ [σ1², 0; 0, σ2²]⁻¹ [x1 − µ1; x2 − µ2])
= 1/(2π σ1 σ2) exp(−(1/2) [x1 − µ1; x2 − µ2]ᵀ [1/σ1², 0; 0, 1/σ2²] [x1 − µ1; x2 − µ2])
= 1/(2π σ1 σ2) exp(−(1/2) [x1 − µ1; x2 − µ2]ᵀ [(x1 − µ1)/σ1²; (x2 − µ2)/σ2²])
= 1/(2π σ1 σ2) exp(−(1/(2σ1²))(x1 − µ1)² − (1/(2σ2²))(x2 − µ2)²)
= p(x1; µ1, σ1²) · p(x2; µ2, σ2²).

SLIDE 57

Derivation of the entropy definition, starting from a set of desirable properties

CMU, 2005 fall, T. Mitchell, A. Moore, HW1, pr. 2.2

SLIDE 58

Remark:

The definition Hn(X) = −Σi pi log pi is not very intuitive.

Theorem:

If ψn(p1, . . . , pn) satisfies the following axioms:

  • A1. Hn should be continuous in the pi and symmetric in its arguments;
  • A2. if pi = 1/n, then Hn should be a monotonically increasing function of n;

(If all events are equally likely, then having more events means being more uncertain.)

  • A3. if a choice among N events is broken down into successive choices, then entropy should be the weighted sum of the entropy at each stage;

then ψn(p1, . . ., pn) = −K Σi pi log pi, where K is a positive constant.

Note: We will restrict the proof to the case p1, . . ., pn ∈ Q.

SLIDE 59

Example for axiom A3:

[Figure: two encodings of the choice among (a, b, c) with probabilities 1/2, 1/3, 1/6; Encoding 1 chooses among a, b, c directly, while Encoding 2 first chooses between a (1/2) and (b, c) (1/2), then between b (2/3) and c (1/3)]

H(1/2, 1/3, 1/6) = (1/2) log 2 + (1/3) log 3 + (1/6) log 6 = (1/2 + 1/6) log 2 + (1/3 + 1/6) log 3 = 2/3 + (1/2) log 3

H(1/2, 1/2) + (1/2) H(2/3, 1/3) = 1 + (1/2) [(2/3) log(3/2) + (1/3) log 3] = 1 + (1/2) [log 3 − 2/3] = 2/3 + (1/2) log 3

The next 3 slides:

Case 1: pi = 1/n for i = 1, . . ., n; proof steps

SLIDE 60

  • a. A(n) not.= ψn(1/n, 1/n, . . ., 1/n) implies A(s^m) = m A(s) for any s, m ∈ N*. (1)
  • b. If s, m ∈ N* (fixed), s ≠ 1, and t, n ∈ N* are such that s^m ≤ t^n ≤ s^(m+1), then

|m/n − log t / log s| ≤ 1/n. (2)

  • c. For s^m ≤ t^n ≤ s^(m+1) as above, it follows (immediately) that

ψ_{s^m}(1/s^m, . . . , 1/s^m) ≤ ψ_{t^n}(1/t^n, . . . , 1/t^n) ≤ ψ_{s^(m+1)}(1/s^(m+1), . . . , 1/s^(m+1)),

i.e. A(s^m) ≤ A(t^n) ≤ A(s^(m+1))

  • c′. Show that |m/n − A(t)/A(s)| ≤ 1/n for s ≠ 1. (3)
  • d. Combining (2) + (3) immediately gives |A(t)/A(s) − log t / log s| ≤ 2/n for s ≠ 1. (4)
  • d′. Show that this inequality implies A(t) = K log t with K > 0 (due to A2). (5)

SLIDE 61

Proof

a.

[Figure: a tree of depth m in which each node branches into s equally likely choices (1/s each); level 1, level 2, . . ., level m, with s branches at each node]

Applying axiom A3 to the encoding above gives:

A(s^m) = A(s) + s · (1/s) A(s) + s² · (1/s²) A(s) + . . . + s^(m−1) · (1/s^(m−1)) A(s) = A(s) + A(s) + A(s) + . . . + A(s)  (m times)  = m A(s)

SLIDE 62

Proof (cont’d)

b. s^m ≤ t^n ≤ s^(m+1) ⇒ m log s ≤ n log t ≤ (m + 1) log s ⇒ m/n ≤ log t / log s ≤ m/n + 1/n ⇒ 0 ≤ log t / log s − m/n ≤ 1/n ⇒ |log t / log s − m/n| ≤ 1/n

c. A(s^m) ≤ A(t^n) ≤ A(s^(m+1)) ⇒ (by (1)) m A(s) ≤ n A(t) ≤ (m + 1) A(s) ⇒ (s ≠ 1) m/n ≤ A(t)/A(s) ≤ m/n + 1/n ⇒ 0 ≤ A(t)/A(s) − m/n ≤ 1/n ⇒ |A(t)/A(s) − m/n| ≤ 1/n

d. Consider again s^m ≤ t^n ≤ s^(m+1) with s, t fixed. If m → ∞ then n → ∞, and from |A(t)/A(s) − log t / log s| ≤ 2/n it follows that |A(t)/A(s) − log t / log s| → 0. Therefore A(t)/A(s) − log t / log s = 0, and so A(t)/A(s) = log t / log s. Finally, A(t) = (A(s)/log s) log t = K log t, where K = A(s)/log s > 0 (if s ≠ 1).

SLIDE 63

Case 2: pi ∈ Q for i = 1, . . ., n

Let us consider a set of N equiprobable random events, and P = (S1, S2, . . . , Sk) a partition of this set. Let us denote pi = |Si|/N. A “natural” two-step encoding (as shown in the figure) leads to

A(N) = ψk(p1, . . . , pk) + Σi pi A(|Si|),

based on axiom A3. Finally, using the result A(t) = K log t gives:

K log N = ψk(p1, . . . , pk) + K Σi pi log |Si|

[Figure: two-level encoding; level 1 chooses a class Si with probability |Si|/N, level 2 chooses an element within Si with probability 1/|Si|]

⇒ ψk(p1, . . . , pk) = K [log N − Σi pi log |Si|] = K [log N · Σi pi − Σi pi log |Si|] = −K Σi pi log(|Si|/N) = −K Σi pi log pi

SLIDE 64

Addenda

  • C. Some Examples

SLIDE 65

Exemplifying the computation of expected values for random variables and the [use of] sensitivity of a test in a real-world application

CMU, 2009 fall, Geoff Gordon, HW1, pr. 2

SLIDE 66

There is a disease which affects 1 in 500 people. A 100.00 dollar blood test can help reveal whether a person has the disease. A positive outcome indicates that the person may have the disease. The test has perfect sensitivity (true positive rate), i.e., a person who has the disease tests positive 100% of the time. However, the test has 99% specificity (true negative rate), i.e., a healthy person tests positive 1% of the time.

  • a. A randomly selected individual is tested and the result is positive. What is the probability of the individual having the disease?
  • b. There is a second, more expensive test which costs 10,000.00 dollars but is exact, with 100% sensitivity and specificity. If we require all people who test positive with the less expensive test to be tested with the more expensive test, what is the expected cost to check whether an individual has the disease?
  • c. A pharmaceutical company is attempting to decrease the cost of the second (perfect) test. How much would it have to make the second test cost, so that the first test is no longer needed? That is, at what cost is it cheaper simply to use the perfect test alone, instead of screening with the cheaper test as described in part b?

SLIDE 67

Random variables:

B: 1/true for persons having the disease, 0/false otherwise;
T1: the result of the first test: + (indicating disease) or −;
T2: the result of the second test: again + or −.

Known facts:

P(B) = 1/500
P(T1 = + | B) = 1, P(T1 = + | ¬B) = 1/100
P(T2 = + | B) = 1, P(T2 = + | ¬B) = 0

a.

P(B | T1 = +) = P(T1 = + | B) · P(B) / [P(T1 = + | B) · P(B) + P(T1 = + | ¬B) · P(¬B)] = (1 · 1/500) / (1 · 1/500 + (1/100) · (499/500)) = 100/599 ≈ 0.1669

SLIDE 68

b. C = c1 if the person is tested only with the first test; C = c1 + c2 if the person is tested with both tests
⇒ P(C = c1) = P(T1 = −) and P(C = c1 + c2) = P(T1 = +)

P(T1 = +) = P(T1 = + | B) · P(B) + P(T1 = + | ¬B) · P(¬B) = 1 · 1/500 + (1/100) · (499/500) = 599/50000 = 0.01198

⇒ E[C] = c1 · (1 − P(T1 = +)) + (c1 + c2) · P(T1 = +) = c1 − c1 · P(T1 = +) + c1 · P(T1 = +) + c2 · P(T1 = +) = c1 + c2 · P(T1 = +) = 100 + 10000 · 599/50000 = 219.8 ≈ 220 dollars

SLIDE 69

c. Let cn be the new price of the second test (T′2). Using the perfect test alone is cheaper when

cn ≤ E[C′] = c1 · P(C′ = c1) + (c1 + cn) · P(C′ = c1 + cn) = c1 + cn · P(T1 = +) = 100 + cn · 599/50000

At the break-even point, cn = 100 + cn · 0.01198 ⇒ cn = 100/0.98802 ≈ 101.2125.
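A sketch reproducing the three answers above, with all values taken from the problem statement:

```python
p_b = 1 / 500                     # disease prevalence
p_pos_healthy = 1 / 100           # 1 − specificity of the cheap test
c1, c2 = 100.0, 10_000.0

p_pos = 1.0 * p_b + p_pos_healthy * (1 - p_b)       # P(T1 = +)
posterior = 1.0 * p_b / p_pos                       # a. P(B | T1 = +) ≈ 0.1669
expected_cost = c1 + c2 * p_pos                     # b. ≈ 219.8 dollars
break_even = c1 / (1 - p_pos)                       # c. ≈ 101.21 dollars
print(posterior, expected_cost, break_even)
```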

SLIDE 70

Using the Central Limit Theorem (the i.i.d. version) to compute the real error of a classifier

CMU, 2008 fall, Eric Xing, HW3, pr. 3.3

SLIDE 71

Chris recently adopted a new (binary) classifier to filter email spam. He wants to quantitatively evaluate how good the classifier is. He has a small dataset of 100 emails on hand which, you can assume, are randomly drawn from all emails. He tests the classifier on the 100 emails and gets 83 classified correctly, so the error rate on the small dataset is 17%. However, the error rate measured on 100 samples could be either higher or lower than the real error rate just by chance. With a confidence level of 95%, what is likely to be the range of the real error rate? Please write down all important steps. (Hint: You need some approximation in this problem.)

SLIDE 72

Notations:

Let Xi, i = 1, . . . , n = 100 be defined as: Xi = 1 if email i was incorrectly classified, and 0 otherwise;

E[Xi] not.= µ not.= e_real;  Var(Xi) not.= σ²

e_sample not.= (X1 + . . . + Xn)/n = 0.17

Zn = (X1 + . . . + Xn − nµ) / (√n σ)  (the standardized form of X1 + . . . + Xn)

Key insight:

Calculating the real error of the classifier (more exactly, a symmetric interval around the real error p not.= µ) with a “confidence” of 95% amounts to finding a > 0 such that P(|Zn| ≤ a) ≥ 0.95.

SLIDE 73

Calculus:

|Zn| ≤ a ⇔ |X1 + . . . + Xn − nµ| / (√n σ) ≤ a ⇔ |X1 + . . . + Xn − nµ| ≤ aσ√n ⇔ |(X1 + . . . + Xn − nµ)/n| ≤ aσ/√n ⇔ |(X1 + . . . + Xn)/n − µ| ≤ aσ/√n ⇔ |e_sample − e_real| ≤ aσ/√n ⇔ |e_real − e_sample| ≤ aσ/√n ⇔ −aσ/√n ≤ e_real − e_sample ≤ aσ/√n ⇔ e_sample − aσ/√n ≤ e_real ≤ e_sample + aσ/√n ⇔ e_real ∈ [e_sample − aσ/√n, e_sample + aσ/√n]
SLIDE 74

Important facts:

The Central Limit Theorem: Zn → N(0; 1). Therefore, P(|Zn| ≤ a) ≈ P(|X| ≤ a) = Φ(a) − Φ(−a), where X ∼ N(0; 1) and Φ is the cumulative distribution function of N(0; 1).

Calculus:

Φ(−a) + Φ(a) = 1 ⇒ P(|Zn| ≤ a) = Φ(a) − Φ(−a) = 2Φ(a) − 1

P(|Zn| ≤ a) = 0.95 ⇔ 2Φ(a) − 1 = 0.95 ⇔ Φ(a) = 0.975 ⇔ a ≅ 1.96 (see the Φ table)

Finally: σ² not.= Var_real ≈ Var_sample due to the above theorem, and Var_sample = e_sample(1 − e_sample) because the Xi are Bernoulli variables.

⇒ aσ/√n = 1.96 · √(0.17 · (1 − 0.17)) / √100 ≅ 0.07

|e_real − e_sample| ≤ 0.07 ⇔ |e_real − 0.17| ≤ 0.07 ⇔ −0.07 ≤ e_real − 0.17 ≤ 0.07 ⇔ e_real ∈ [0.10, 0.24]
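A sketch reproducing the interval above: a normal-approximation confidence interval for a binomial proportion, using the 0.975 quantile of N(0, 1):

```python
from statistics import NormalDist
from math import sqrt

n, e_sample = 100, 0.17
a = NormalDist().inv_cdf(0.975)                        # ≈ 1.96
half_width = a * sqrt(e_sample * (1 - e_sample) / n)   # aσ/√n ≈ 0.07
print(e_sample - half_width, e_sample + half_width)    # ≈ [0.10, 0.24]
```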