Chapter II.2: Basic Probability Theory and Statistics
1. What is a probability?
  1.1. Probability spaces, events, and random variables
2. Distributions
  2.1. Discrete distributions
  2.2. Continuous distributions
3. Moments, independence, and Bayes’ rule
  3.1. Expectation, variance, and higher moments
  3.2. Independence
  3.3. Bayes’ rule
4. Bounds and convergence
5. Statistical inference
Wasserman, Ch. 1–5
IR&DM, WS'13/14, 24 October 2013

What is a probability?
• “If I throw a die, I will probably get 4 or less”
• “I’ll probably go running after this lecture”
• The term “probability” here means different things
  – The outcome of a repeatable experiment
  – My personal belief

Views on probability
• In the classical definition, probability is shared equally among all outcomes, provided the outcomes are equally likely
  – “Equally likely” is decided based on physical symmetries or the like
• In frequentism, a probability is the frequency with which something happens over repeated experiments
  – Requires an infinite number of repetitions
• In subjectivism (Bayesianism), probability refers to my subjective “degree of belief”
  – But everybody’s belief is different

Axiomatic approach: sample spaces and events
• A sample space Ω is the set of all possible outcomes of an experiment
  – An element e ∈ Ω is a sample outcome or realization
• Subsets E ⊆ Ω are events
• Examples:
  – If we toss a coin twice, Ω = {HH, HT, TH, TT}
    • Event “Second toss is tails” is A = {HT, TT}
  – If we toss a coin until we get tails, Ω = {T, HT, HHT, HHHT, HHHHT, HHHHHT, …}
  – If we measure a temperature in kelvins, Ω = {x ∈ ℝ : x ≥ 0}
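The coin-toss examples above can be written down directly as Python sets. This is only an illustrative sketch; the names `omega`, `second_is_tails`, and `first_outcomes` are made up for this example and do not come from the lecture.

```python
from itertools import product

# Sample space for two coin tosses: every ordered pair of H/T.
omega = {"".join(p) for p in product("HT", repeat=2)}

# The event "second toss is tails" is simply a subset of omega.
second_is_tails = {e for e in omega if e[1] == "T"}

def first_outcomes(n):
    """First n outcomes of "toss until tails": T, HT, HHT, ...

    The full sample space is countably infinite, so only a finite
    prefix can be listed.
    """
    return ["H" * i + "T" for i in range(n)]

print(sorted(omega))
print(sorted(second_is_tails))
print(first_outcomes(4))
```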

Axiomatic approach: probability measures
• A collection 𝒜 ⊆ 2^Ω is a σ-algebra of Ω if
  – Ω ∈ 𝒜
  – If A ∈ 𝒜, then (Ω \ A) ∈ 𝒜
  – If A_1, A_2, A_3, … ∈ 𝒜, then (∪_i A_i) ∈ 𝒜
• A function Pr: 𝒜 → [0, 1] is a probability measure if
  – Axiom 1: Pr[A] ≥ 0 for every A ∈ 𝒜
  – Axiom 2: Pr[Ω] = 1
  – Axiom 3: If A_1, A_2, … are pairwise disjoint, then Pr[∪_i A_i] = ∑_i Pr[A_i] (for countably many A_i)
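For a finite sample space, the three σ-algebra conditions can be checked mechanically. The sketch below assumes the collection is given as a list of `frozenset`s; the helper name `is_sigma_algebra` is hypothetical. On a finite collection, closure under pairwise unions is enough to give closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra conditions for a finite collection of frozensets."""
    full = frozenset(omega)
    sets = set(collection)
    # Condition 1: the whole sample space belongs to the collection.
    if full not in sets:
        return False
    # Condition 2: closed under complement.
    if any(full - a not in sets for a in sets):
        return False
    # Condition 3: closed under union (pairwise suffices when finite).
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
good = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)]
bad = [frozenset(), frozenset({1}), frozenset(omega)]  # complement of {1} is missing
print(is_sigma_algebra(omega, good), is_sigma_algebra(omega, bad))
```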

Intermission: some combinatorics
• The power set of a set A, 2^A (or 𝒫(A)), is the collection of all subsets of A
  – If A = {1, 2, 3}, then 2^A = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
  – The size of the power set is 2^|A|
    • If A is finite, this is a natural number
    • If A = ℕ, this is the same cardinality as the real numbers
    • If A = ℝ, this is a strictly larger cardinal number
• The number of size-k subsets of A is
  $\binom{|A|}{k} = \frac{|A|!}{k!\,(|A|-k)!}$
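Both counts can be verified on the small example A = {1, 2, 3}. The following is a minimal sketch using only the standard library; `power_set` is a helper name chosen for this illustration.

```python
from itertools import chain, combinations
from math import comb

def power_set(a):
    """Return all subsets of a finite set as a list of sets."""
    items = list(a)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [set(s) for s in subsets]

A = {1, 2, 3}
subsets = power_set(A)
print(len(subsets))                       # size of the power set, 2^|A|
print(sum(len(s) == 2 for s in subsets))  # number of size-2 subsets
print(comb(len(A), 2))                    # binomial coefficient |A| choose 2
```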

Axiomatic approach: probability spaces and further properties
• A probability space is a triple (Ω, 𝒜, Pr)
  – 𝒜 contains all the events to which we can assign a probability
    • If Ω is finite or countably infinite, we can have 𝒜 = 2^Ω
    • If Ω is uncountable, 2^Ω can contain sets that cannot be assigned a probability (unmeasurable sets)
• From the axioms we can derive that
  – Pr[∅] = 0
  – If A ⊆ B, then Pr[A] ≤ Pr[B]
  – Pr[Ω \ A] = 1 – Pr[A]
  – Pr[A ∪ B] = Pr[A] + Pr[B] – Pr[A ∩ B]
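As one worked example of such a derivation, the last property follows from additivity over disjoint events. The slide does not spell out the steps; the sketch below is one standard route.

```latex
% A \cup B splits into the disjoint pieces A and B \setminus A,
% and B splits into the disjoint pieces A \cap B and B \setminus A.
\begin{align*}
\Pr[A \cup B] &= \Pr[A] + \Pr[B \setminus A]
  && \text{(Axiom 3, disjoint union)} \\
\Pr[B] &= \Pr[A \cap B] + \Pr[B \setminus A]
  && \text{(Axiom 3, disjoint union)} \\
\Pr[A \cup B] &= \Pr[A] + \Pr[B] - \Pr[A \cap B]
  && \text{(substitute for } \Pr[B \setminus A]\text{)}
\end{align*}
```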

Axiomatic approach: random variables
• A random variable (r.v.) is a function X: Ω → ℝ such that {e ∈ Ω : X(e) ≤ r} ∈ 𝒜 for all r ∈ ℝ
  – This is needed to define probabilities like Pr[a ≤ X ≤ b]
  – Pr[X = x] is shorthand for Pr[{e ∈ Ω : X(e) = x}]
• An r.v. is discrete if it takes at most countably many distinct values
  – None of the complexities apply
• An r.v. is continuous if it varies continuously in one or more intervals
  – These are the ones that cause problems

Example r.v.’s
• Indicator variable 𝟙_E or χ_E for event E ∈ 𝒜
  – 𝟙_E(x) = 1 if x ∈ E and 𝟙_E(x) = 0 otherwise
  – Pr[E] = Pr[𝟙_E = 1]
• Let r.v. X be the number of heads in 10 coin flips
  – If e = HTTTTTHHTT, then X(e) = 3
  – Discrete r.v.
• Let r.v. Y be the room temperature of my kitchen (in Celsius)
  – If e = “00:22 on 22 Oct”, then Y(e) = 22.7
  – Continuous r.v.
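The "number of heads" example makes the point that a random variable is just a function on outcomes. The sketch below assumes a fair coin, so that all 2^10 outcomes are equally likely; that assumption, and the helper names `pr` and `pr_X_equals`, are not part of the slide.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of 10 coin flips (1024 outcomes).
omega = ["".join(p) for p in product("HT", repeat=10)]

def X(e):
    """Random variable: number of heads in the outcome e."""
    return e.count("H")

def pr(event):
    """Probability of an event under the ASSUMED uniform measure (fair coin)."""
    return Fraction(len(event), len(omega))

def pr_X_equals(x):
    """Pr[X = x], i.e. the probability of the pre-image {e : X(e) = x}."""
    return pr([e for e in omega if X(e) == x])

print(X("HTTTTTHHTT"))
print(pr_X_equals(3))
```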

Some diagrams (1)
• The Venn diagram is a way to visualize the combinatorial relationships of three sets
[Figure: Venn diagram of three sets A, B, and C, with the regions A ∩ B, A ∩ C, B ∩ C, and A ∩ B ∩ C labeled]
• The inclusion–exclusion principle for three sets:
  Pr[A ∪ B ∪ C] = Pr[A] + Pr[B] + Pr[C] – Pr[A ∩ B] – Pr[A ∩ C] – Pr[B ∩ C] + Pr[A ∩ B ∩ C]
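The inclusion–exclusion identity is easy to check numerically. The example below is an assumption made for illustration: a number is drawn uniformly from 1 to 30, and A, B, C are the events "divisible by 2", "divisible by 3", and "divisible by 5".

```python
from fractions import Fraction

omega = set(range(1, 31))
A = {n for n in omega if n % 2 == 0}
B = {n for n in omega if n % 3 == 0}
C = {n for n in omega if n % 5 == 0}

def pr(event):
    """Uniform probability on omega (assumed for this example)."""
    return Fraction(len(event), len(omega))

# Left-hand side: probability of the union, computed directly.
lhs = pr(A | B | C)
# Right-hand side: the inclusion-exclusion sum for three sets.
rhs = (pr(A) + pr(B) + pr(C)
       - pr(A & B) - pr(A & C) - pr(B & C)
       + pr(A & B & C))
print(lhs, rhs)
```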

Some diagrams (2)
• An r.v. X that takes a finite number of values partitions the sample space into finitely many sets (the pre-images of the values of X)
  – If X is the roll of a die, we have E_1 = {e ∈ Ω : X(e) = 1} = X^{–1}(1), and similarly for E_2, E_3, …, E_6
  – If Y is the indicator variable for “X ≥ 2”, we get
[Figure: the cells E_1, …, E_6 of the partition, with E_1 marked Y = 0 and E_2, …, E_6 marked Y = 1]

Distributions
• The cumulative distribution function (cdf) of r.v. X is a function F_X: ℝ → [0, 1], F_X(x) = Pr[X ≤ x]
• If X is discrete, the probability mass function (pmf) of X is f_X(x) = Pr[X = x]
• If X is continuous, the probability density function (pdf) of X is a function f_X for which
  – f_X(x) ≥ 0 for all x
  – $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
  – We have that $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
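For a discrete r.v., the cdf is obtained from the pmf by summing over all values up to x. The sketch below uses a fair six-sided die as an assumed example; `pmf` and `cdf` are illustrative names.

```python
from fractions import Fraction

# pmf of a fair six-sided die (assumed example).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F_X(x) = Pr[X <= x]: sum the pmf over all values not exceeding x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

for x in (0, 3, 3.5, 6):
    print(x, cdf(x))
```

The cdf is a step function here: it is constant between the integer values and jumps by 1/6 at each of them.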

Example of a CDF and PDF
[Figure: a CDF (vertical axis from 0 to 1) and a PDF (vertical axis from 0 to 0.5), each plotted for x from –5 to 5]

Some discrete distributions
• Uniform distribution over {1, 2, …, m}
  – Pr[X = k] = 1/m for 1 ≤ k ≤ m
• Bernoulli distribution with parameter p
  – Binary, single coin toss
  – Pr[X = k] = p^k (1 – p)^{1–k} for k ∈ {0, 1}
• Binomial distribution with parameters p and n
  – n repeated Bernoulli experiments with parameter p
  – $\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$ for 0 ≤ k ≤ n
• Geometric distribution with parameter p
  – Pr[X = k] = (1 – p)^k p for k ≥ 0
• Poisson distribution with rate parameter λ
  – Pr[X = k] = e^{–λ} λ^k / k!
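The pmfs above translate almost symbol for symbol into code. This is a minimal standard-library sketch, with function names chosen for this example; passing `Fraction` values for p keeps the binomial and geometric results exact.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Pr[X = k] for n Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """Pr[X = k] = (1 - p)^k p, with k >= 0 as in the slide."""
    return (1 - p)**k * p

def poisson_pmf(k, lam):
    """Pr[X = k] for a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

half = Fraction(1, 2)
print(binomial_pmf(3, 10, half))                         # exact fraction
print(sum(binomial_pmf(k, 10, half) for k in range(11))) # pmf sums to 1
print(sum(poisson_pmf(k, 3.0) for k in range(60)))       # close to 1
```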

Some continuous distributions
• Uniform distribution in the interval [a, b]
  – $f_X(x) = \frac{1}{b-a}$ for x ∈ [a, b]
• Exponential distribution with rate λ
  – Time between two events in a Poisson process
  – $f_X(x) = \lambda e^{-\lambda x}$ for x ≥ 0
• t-distribution with ν degrees of freedom
  – Typical distribution for test statistics
  – $f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
• χ² distribution with k degrees of freedom
  – $f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}$
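A quick sanity check on these densities is that each integrates to 1. The sketch below implements three of them with `math.gamma` and approximates the integrals with a simple midpoint rule over a truncated range. The helper `integrate`, the truncation bounds, and the parameter values are choices made for this illustration, and the χ² density is taken to be 0 for x ≤ 0.

```python
from math import exp, gamma, pi, sqrt

def exponential_pdf(x, lam):
    """Exponential density with rate lam; zero for negative x."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def t_pdf(x, nu):
    """Student t density with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

print(integrate(lambda x: exponential_pdf(x, 2.0), 0, 50))
print(integrate(lambda x: t_pdf(x, 5), -200, 200))
print(integrate(lambda x: chi2_pdf(x, 4), 0, 100))
```

Integrating the exponential density only up to x gives the cdf, which has the closed form 1 − e^{−λx}.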

Normal (Gaussian) distribution
• Two parameters, µ (mean) and σ² (variance)
  – $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• For the standard normal distribution, µ = 0 and σ² = 1
• Many, many applications
[Figure: two plots for x from –5 to 5, with vertical axes from 0 to 1 and from 0 to 0.5]
• An r.v. X is log-normally distributed if its logarithm is normally distributed
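The normal density can be coded directly from its two parameters and compared against Python's built-in `statistics.NormalDist`. Note that `NormalDist` takes the standard deviation σ, whereas the function below takes the variance σ², matching the slide; the name `normal_pdf` is chosen for this sketch.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Normal density with mean mu and variance sigma2."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

# Standard normal density at 0 equals 1 / sqrt(2 * pi).
print(normal_pdf(0.0))
# Same point evaluated with the standard library (sigma = sqrt(4.0) = 2.0).
print(normal_pdf(1.5, mu=1.0, sigma2=4.0), NormalDist(1.0, 2.0).pdf(1.5))
```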

Multivariate distributions
• If X and Y are two discrete variables, their joint mass function is f_{X,Y}(x, y) = Pr[X = x, Y = y]
  – For continuous variables, the joint density f_{X,Y} is a non-negative function s.t.
    • $\int_{\mathbb{R}} \int_{\mathbb{R}} f_{X,Y}(x, y)\,dx\,dy = 1$
    • for any A ⊆ ℝ × ℝ, $\Pr[(X, Y) \in A] = \iint_A f_{X,Y}(x, y)\,dx\,dy$
• The marginal distribution (mass function) for X is
  – for discrete X: $f_X(x) = \Pr[X = x] = \sum_y f_{X,Y}(x, y)$
  – for continuous X: $f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dy$
• All these concepts extend naturally to more than two variables
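In the discrete case, marginalization is just summing the joint mass function over the other variable. The joint table below is invented purely for illustration, as is the helper name `marginal`.

```python
from collections import defaultdict
from fractions import Fraction

# A made-up joint pmf of two binary variables, keyed by (x, y).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal(joint_pmf, axis):
    """Sum the joint pmf over the other variable (axis 0 gives X, axis 1 gives Y)."""
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for values, p in joint_pmf.items():
        out[values[axis]] += p
    return dict(out)

print(marginal(joint, 0))  # marginal of X
print(marginal(joint, 1))  # marginal of Y
```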

Multivariate normal distribution
• A.k.a. multidimensional Gaussian distribution
• Two parameters, vector µ and matrix Σ
  – For n variables, µ ∈ ℝ^n and Σ ∈ ℝ^{n×n}
• The density function is
  $f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}$
• In the standard multivariate normal distribution, µ is all-zeros and Σ is the identity, giving
  $f(x) = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2} x^T x\right\}$
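For two variables the density can be evaluated without a linear-algebra library, because the determinant and inverse of a 2 × 2 matrix have closed forms. The function `mvn_pdf_2d` below is a sketch written for this illustration and assumes Σ is symmetric positive definite; with a diagonal Σ the result should equal the product of two univariate normal densities.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def mvn_pdf_2d(x, mu, sigma):
    """Bivariate normal density; sigma is a 2x2 covariance given as nested tuples."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                      # |Sigma|
    inv = ((d / det, -b / det),              # Sigma^{-1} for a 2x2 matrix
           (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # (2 pi)^{n/2} with n = 2 is simply 2 pi.
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))

identity = ((1.0, 0.0), (0.0, 1.0))
print(mvn_pdf_2d((0.0, 0.0), (0.0, 0.0), identity))  # 1 / (2 pi)

diag = ((4.0, 0.0), (0.0, 9.0))
print(mvn_pdf_2d((1.0, 2.0), (0.0, 0.0), diag))
print(NormalDist(0.0, 2.0).pdf(1.0) * NormalDist(0.0, 3.0).pdf(2.0))
```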

Bivariate normal distribution
[Figure: plot of a bivariate normal distribution]

Independence, moments & Bayes’
• Two events A and B are independent if Pr[A ∩ B] = Pr[A] Pr[B]
• Two r.v.’s X and Y are independent if f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y
• The conditional probability of A given B is Pr[A | B] = Pr[A ∩ B] / Pr[B]
  – Assumes Pr[B] > 0
  – If A and B are independent, Pr[A | B] = Pr[A]
• The conditional pmf/pdf is f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
  – For independent X and Y, f_{X|Y}(x | y) = f_X(x)
• A and B are conditionally independent given C if Pr[A ∩ B | C] = Pr[A | C] Pr[B | C]
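The definitions of independence and conditional probability can be tried out on two fair dice. The events below are chosen for illustration and are not from the lecture: "first die is even" turns out to be independent of "the sum is 7" but not of "the sum is 8".

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice (uniform measure assumed).
omega = set(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(len(event), len(omega))

def pr_given(a, b):
    """Pr[A | B] = Pr[A and B] / Pr[B]; requires Pr[B] > 0."""
    return pr(a & b) / pr(b)

def independent(a, b):
    """Check the product rule Pr[A and B] = Pr[A] Pr[B]."""
    return pr(a & b) == pr(a) * pr(b)

first_even = {e for e in omega if e[0] % 2 == 0}
sum_is_7 = {e for e in omega if sum(e) == 7}
sum_is_8 = {e for e in omega if sum(e) == 8}

print(independent(first_even, sum_is_7), pr_given(first_even, sum_is_7))
print(independent(first_even, sum_is_8), pr_given(first_even, sum_is_8))
```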
