Review of Probability (PowerPoint PPT Presentation)



SLIDE 1

Review of Probability

SLIDE 2

Probability Theory:

 Many techniques in speech processing require the manipulation of probabilities and statistics.
 The two principal application areas we will encounter are:
  • Statistical pattern recognition.
  • Modeling of linear systems.

SLIDE 3

Events:

 It is customary to refer to the probability of an event.
 An event is a certain set of possible outcomes of an experiment or trial.
 Outcomes are assumed to be mutually exclusive and, taken together, to cover all possibilities.

SLIDE 4

Axioms of Probability:

 To any event A we can assign a number, P(A), which satisfies the following axioms:
  • P(A) ≥ 0.
  • P(S) = 1.
  • If A and B are mutually exclusive, then P(A+B) = P(A) + P(B).
 The number P(A) is called the probability of A.
SLIDE 5

Axioms of Probability (some consequences):

 Some immediate consequences:
  • If Ā is the complement of A, then P(Ā) = 1 - P(A).
  • P(∅), the probability of the impossible event, is 0.
  • P(A) ≤ 1.
 If two events A and B are not mutually exclusive, we can show that P(A+B) = P(A) + P(B) - P(AB).

SLIDE 6

Conditional Probability:

 The conditional probability of an event A, given that event B has occurred, is defined as:
  P(A|B) = P(AB) / P(B)
 We can infer P(B|A) by means of Bayes' theorem:
  P(B|A) = P(A|B) P(B) / P(A)
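A quick numeric sketch of Bayes' theorem; the probabilities below are made-up illustrative values, not taken from the slides:

```python
# Hypothetical values for illustration only.
P_A = 0.3          # prior P(A)
P_B_given_A = 0.8  # P(B|A)
P_B = 0.5          # P(B)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B
print(P_A_given_B)  # ~0.48
```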

SLIDE 7

Independence:

 Events A and B may have nothing to do with each other; they are then said to be independent.
 Two events are independent if P(AB) = P(A)P(B).
 From the definition of conditional probability:
  P(A|B) = P(A)
  P(B|A) = P(B)
  P(A+B) = P(A) + P(B) - P(A)P(B)

SLIDE 8

Independence:

 Three events A, B and C are independent only if:
  P(AB) = P(A)P(B)
  P(AC) = P(A)P(C)
  P(BC) = P(B)P(C)
  P(ABC) = P(A)P(B)P(C)

SLIDE 9

Random Variables:

 A random variable is a number chosen at random as the outcome of an experiment.
 Random variables may be real or complex and may be discrete or continuous.
  • In speech processing, the random variables encountered are most often real and discrete.
 We can characterize a random variable by its probability distribution or by its probability density function (pdf).

SLIDE 10

Random Variables (distribution function):

 The distribution function for a random variable y is the probability that y does not exceed some value u:
  F_y(u) = P(y ≤ u)
 and
  P(u < y ≤ v) = F_y(v) - F_y(u)

SLIDE 11

Random Variables (probability density function):

 The probability density function is the derivative of the distribution:
  f_y(u) = (d/du) F_y(u)
 and
  P(u < y ≤ v) = ∫_u^v f_y(y) dy
  F_y(∞) = 1
  ∫_{-∞}^{∞} f_y(y) dy = 1

SLIDE 12

Random Variables (expected value):

 We can also characterize a random variable by its statistics.
 The expected value of g(x) is written E{g(x)} or <g(x)> and defined as:
  • Continuous random variable: <g(x)> = ∫_{-∞}^{∞} g(x) f(x) dx
  • Discrete random variable: <g(x)> = Σ_x g(x) p(x)
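The discrete expectation can be sketched directly as a weighted sum; the fair six-sided die below is a hypothetical pmf chosen for illustration:

```python
# <g(x)> = sum_x g(x) p(x) for a discrete random variable.
pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die, hypothetical example

def expect(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)        # first moment, 3.5
second = expect(lambda x: x * x, pmf)  # second moment, 91/6
print(mean, second)
```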

SLIDE 13

Random Variables (moments):

 The statistics of greatest interest are the moments of X.
 The kth moment of X is the expected value of X^k.
 For a discrete random variable:
  m_k = <X^k> = Σ_x x^k p(x)

SLIDE 14

Random Variables (mean & variance):

 The first moment, m_1, is the mean of x.
  • Discrete: m_1 = X̄ = <X> = Σ_x x p(x)
  • Continuous: X̄ = ∫_{-∞}^{∞} x f(x) dx
 The second central moment, also known as the variance of p(x), is given by:
  σ² = Σ_x (x - X̄)² p(x) = <X²> - m_1²

SLIDE 15

Random Variables …:

 To estimate the statistics of a random variable, we repeat the experiment which generates the variable a large number of times.
 If the experiment is run N times, then each value x will occur Np(x) times; thus
  x̂ = (1/N) Σ_{i=1}^{N} x_i
  m̂_k = (1/N) Σ_{i=1}^{N} x_i^k
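A minimal Monte Carlo sketch of these estimators, using N draws from a U(0,1) experiment (the distribution and seed are arbitrary choices, not from the slides):

```python
import random

random.seed(0)
N = 100_000
samples = [random.random() for _ in range(N)]  # U(0,1) trials

x_hat = sum(samples) / N                  # x_hat = (1/N) sum x_i
m2_hat = sum(x * x for x in samples) / N  # m_hat_2 = (1/N) sum x_i^2

# True values for U(0,1): mean = 1/2, second moment = 1/3.
print(x_hat, m2_hat)
```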

SLIDE 16

Random Variables (Uniform density):

 A random variable has a uniform density on the interval (a, b) if:
  f_X(x) = 1/(b - a), a ≤ x ≤ b; 0 otherwise
  F_X(x) = 0, x < a; (x - a)/(b - a), a ≤ x ≤ b; 1, x > b
  σ² = (1/12)(b - a)²
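The variance formula σ² = (b - a)²/12 can be checked by simulation; a, b, and the seed below are arbitrary choices for this sketch:

```python
import random

random.seed(1)
a, b = 2.0, 5.0
N = 200_000
xs = [random.uniform(a, b) for _ in range(N)]

mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N  # sample variance
print(var, (b - a) ** 2 / 12)  # both should be close to 0.75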

SLIDE 17

Random Variables (Gaussian density):

 The Gaussian, or normal, density function is given by:
  n(x; x̄, σ) = (1/(σ√(2π))) e^{-(x - x̄)²/2σ²}

SLIDE 18

Random Variables (…Gaussian density):

 The distribution function of a normal variable is:
  N(x; x̄, σ) = ∫_{-∞}^{x} n(u; x̄, σ) du
 If we define the error function as
  erf(x) = (1/√(2π)) ∫_{-∞}^{x} e^{-u²/2} du
 then
  N(x; x̄, σ) = erf((x - x̄)/σ)
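Note that the "erf" defined on this slide is the standard normal CDF, which differs from the conventional error function (Python's math.erf) by a change of scale. A minimal sketch relating the two:

```python
import math

def phi(x):
    # The slide's "erf": standard normal CDF.
    # Related to the library erf by Phi(x) = (1 + erf(x/sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mean, sigma):
    # N(x; mean, sigma) = Phi((x - mean) / sigma)
    return phi((x - mean) / sigma)

print(normal_cdf(0.0, 0.0, 1.0))  # 0.5
```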

SLIDE 19

Two Random Variables:

 If two random variables x and y are to be considered together, they can be described in terms of their joint probability density f(x, y) or, for discrete variables, p(x, y).
 Two random variables are independent if
  p(x, y) = p(x) p(y)

SLIDE 20

Two Random Variables (continued):

 Given a function g(x, y), its expected value is defined as:
  • Continuous: <g(x, y)> = ∫∫ g(x, y) f(x, y) dx dy
  • Discrete: <g(x, y)> = Σ_{x,y} g(x, y) p(x, y)
 The joint moment of two discrete random variables is:
  m_ij = Σ_{x,y} x^i y^j p(x, y)

SLIDE 21

Two Random Variables (continued):

 Moments are estimated in practice by averaging repeated measurements:
  m̂_ij = (1/N) Σ_{n=1}^{N} x_n^i y_n^j
 A measure of the dependence of two random variables is their correlation, and the correlation of two variables is their joint second moment:
  m_11 = <xy> = Σ_{x,y} x y p(x, y)

SLIDE 22

Two Random Variables (continued):

 The joint second central moment of x, y is their covariance:
  λ_xy = <(x - x̄)(y - ȳ)> = m_11 - x̄ ȳ
 If x and y are independent, then their covariance is zero.
 The correlation coefficient of x and y is their covariance normalized to their standard deviations:
  r_xy = λ_xy / (σ_x σ_y)

SLIDE 23

Two Random Variables (…Gaussian random variables):

 Two random variables x and y (taken here as zero-mean) are jointly Gaussian if their density function is:
  n(x, y) = (1/(2π σ_x σ_y √(1 - r²))) exp{ -(1/(2(1 - r²))) [ x²/σ_x² - 2 r x y/(σ_x σ_y) + y²/σ_y² ] }
 where
  r = λ_xy / (σ_x σ_y)

SLIDE 24

Two Random Variables (…sum of random variables):

 The expected value of the sum of two random variables is:
  <x + y> = <x> + <y>
 This is true whether x and y are independent or not.
 We also have:
  <Σ_i x_i> = Σ_i <x_i>
  <cx> = c<x>

SLIDE 25

Two Random Variables (…sum of random variables):

 The variance of the sum of two independent random variables is:
  σ²_{x+y} = σ_x² + σ_y²
 If two random variables are independent, the probability density of their sum is the convolution of the densities of the individual variables:
  • Continuous: f_{x+y}(z) = ∫_{-∞}^{∞} f_x(u) f_y(z - u) du
  • Discrete: p_{x+y}(z) = Σ_u p_x(u) p_y(z - u)
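The discrete convolution rule can be sketched directly; the two fair dice below are a hypothetical example, not from the slides:

```python
from collections import defaultdict

# p_{x+y}(z) = sum_u p_x(u) p_y(z - u) for independent x, y.
def convolve_pmf(px, py):
    pz = defaultdict(float)
    for u, pu in px.items():
        for v, pv in py.items():
            pz[u + v] += pu * pv
    return dict(pz)

die = {k: 1 / 6 for k in range(1, 7)}
two_dice = convolve_pmf(die, die)
print(two_dice[7])  # 6/36, the most likely total of two dice
```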

SLIDE 26

Central Limit Theorem:

 Central Limit Theorem (informal paraphrase): If many independent random variables are summed, the probability density function (pdf) of the sum tends toward the Gaussian density, no matter what their individual densities are.
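A small simulation sketch of the theorem: sums of 30 i.i.d. U(0,1) variables should have mean n/2 and variance n/12, and their histogram looks roughly Gaussian (the term count, trial count, and seed are arbitrary choices):

```python
import random
import statistics

random.seed(2)
n_terms, n_trials = 30, 20_000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_trials)]

# Expected for a sum of 30 U(0,1) terms: mean = 15, variance = 2.5.
print(statistics.mean(sums), statistics.pvariance(sums))
```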

SLIDE 27

Multivariate Normal Density:

 The normal density function can be generalized to any number of random variables.
 Let X be the random vector X = Col[X_1, X_2, ..., X_n]. Then
  N(x) = (2π)^{-n/2} |R|^{-1/2} exp{ -(1/2) Q(x) }
 where
  Q(x) = (x - x̄)^T R^{-1} (x - x̄)
  R = <(x - x̄)(x - x̄)^T>
 The matrix R is the covariance matrix of X (R is positive-definite).
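A sketch of this density for the special case of a diagonal covariance matrix R, where |R| and R^{-1} reduce to products and reciprocals of the per-component variances (the case and values are chosen for illustration):

```python
import math

def mvn_density(x, mean, var):
    # var holds the diagonal of R; off-diagonal terms are assumed zero.
    n = len(x)
    det = 1.0
    Q = 0.0  # Q(x) = (x - xbar)^T R^{-1} (x - xbar)
    for xi, mi, vi in zip(x, mean, var):
        det *= vi
        Q += (xi - mi) ** 2 / vi
    return (2 * math.pi) ** (-n / 2) * det ** -0.5 * math.exp(-0.5 * Q)

# At the mean with unit variances, the density is (2*pi)^{-n/2}.
print(mvn_density([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))  # ~0.159
```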

SLIDE 28

Random Functions:

 A random function is one arising as the outcome of an experiment.
 Random functions do not need to be functions of time, but in all cases of interest to us they will be.
 A discrete stochastic process is characterized by many probability density functions of the form
  p(x_1, x_2, x_3, ..., x_n; t_1, t_2, t_3, ..., t_n)

SLIDE 29

Random Functions:

 If the individual values of the random signal are independent, then
  p(x_1, x_2, ..., x_n; t_1, t_2, ..., t_n) = p(x_1, t_1) p(x_2, t_2) ... p(x_n, t_n)
 If these individual probability densities are all the same, then we have a sequence of independent, identically distributed (i.i.d.) samples.

SLIDE 30

mean & autocorrelation

 The mean is the expected value of x(t):
  x̄(t) = <x(t)> = Σ_x x p(x, t)
 The autocorrelation function is the expected value of the product x(t_1) x(t_2):
  r(t_1, t_2) = <x(t_1) x(t_2)> = Σ_{x_1, x_2} x_1 x_2 p(x_1, x_2; t_1, t_2)

SLIDE 31

ensemble & time average

 Mean and autocorrelation can be determined in two ways:
  • The experiment can be repeated many times and the average taken over all these functions. Such an average is called an ensemble average.
  • Take any one of these functions as being representative of the ensemble and find the average from a number of samples of this one function. This is called a time average.
SLIDE 32

ergodicity & stationarity

 If the time average and ensemble average of a random function are the same, it is said to be ergodic.
 A random function is said to be stationary if its statistics do not change as a function of time.
  • This is also called strict-sense stationarity (vs. wide-sense stationarity).
 Any ergodic function is also stationary.

SLIDE 33

ergodicity & stationarity

 For a stationary signal we have:
  x̄(t) = x̄
 Stationarity is defined as:
  p(x_1, x_2; t_1, t_2) = p(x_1, x_2; τ)
 where
  τ = t_2 - t_1
 And the autocorrelation function is:
  r(τ) = Σ_{x_1, x_2} x_1 x_2 p(x_1, x_2; τ)

SLIDE 34

ergodicity & stationarity

 When x(t) is ergodic, its mean and autocorrelation are:
  x̄ = lim_{N→∞} (1/(2N+1)) Σ_{t=-N}^{N} x(t)
  r(τ) = <x(t) x(t+τ)> = lim_{N→∞} (1/(2N+1)) Σ_{t=-N}^{N} x(t) x(t+τ)
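A time-average sketch of these formulas: estimate r(τ) from a single long realization of a hypothetical signal, here a cosine plus small noise (signal, length, and seed are arbitrary choices):

```python
import math
import random

random.seed(3)
N, omega = 50_000, 0.5
x = [math.cos(omega * t) + random.gauss(0.0, 0.1) for t in range(N)]

def r_hat(tau):
    # Finite-sample time average of x(t) * x(t + tau).
    M = N - tau
    return sum(x[t] * x[t + tau] for t in range(M)) / M

# For the cosine part alone, r(tau) ~ 0.5 * cos(omega * tau).
print(r_hat(0), r_hat(4))
```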

SLIDE 35

cross-correlation

 The cross-correlation of two ergodic random functions is:
  r_xy(τ) = <x(t) y(t+τ)> = lim_{N→∞} (1/(2N+1)) Σ_{t=-N}^{N} x(t) y(t+τ)
 The subscript xy indicates a cross-correlation.

SLIDE 36

Random Functions (power & cross-spectral density):

 The Fourier transform of r(τ) (the autocorrelation function of an ergodic random function) is called the power spectral density of x(t):
  S(ω) = Σ_τ r(τ) e^{-jωτ}
 The cross-spectral density of two ergodic random functions is:
  S_xy(ω) = Σ_τ r_xy(τ) e^{-jωτ}

SLIDE 37

Random Functions (…power density):

 For an ergodic signal x(t), r(τ) can be written as:
  r(τ) = x(τ) ⋆ x(-τ)   (assuming x(t) is real)
 Then, from elementary Fourier transform properties,
  S(ω) = X(ω) X(-ω) = X(ω) X*(ω) = |X(ω)|²

SLIDE 38

Random Functions (White Noise):

 If all values of a random signal are uncorrelated,
  r(τ) = σ² δ(τ)
 then this random function is called white noise.
 The power spectrum of white noise is constant:
  S(ω) = σ²
 White noise is a mixture of all frequencies.
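The defining property r(τ) = σ² δ(τ) can be checked on simulated i.i.d. samples: the estimated autocorrelation should be near σ² at lag 0 and near 0 elsewhere (σ and the seed are arbitrary choices):

```python
import random

random.seed(4)
sigma, N = 2.0, 100_000
w = [random.gauss(0.0, sigma) for _ in range(N)]  # i.i.d. = white noise

def r_hat(tau):
    M = N - tau
    return sum(w[t] * w[t + tau] for t in range(M)) / M

print(r_hat(0), r_hat(5))  # ~sigma^2 = 4.0 and ~0.0
```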

SLIDE 39

Random Signals in Linear Systems:

 Let T[·] represent the linear operation; then
  <T[x(t)]> = T[<x(t)>]
 Given a system with impulse response h(n),
  y(n) = x(n) ⋆ h(n),  ȳ(n) = x̄(n) ⋆ h(n)
 A stationary signal applied to a linear system yields a stationary output:
  r_yy(τ) = r_xx(τ) ⋆ h(τ) ⋆ h(-τ)
  S_yy(ω) = |H(ω)|² S_xx(ω)
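A simulation sketch of the output statistics: unit-variance white noise through a short FIR filter. For a white input, r_xx(τ) = δ(τ), so r_yy(0) = Σ_k h(k)²; the filter h and seed below are hypothetical choices:

```python
import random

random.seed(5)
N = 100_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]  # unit-variance white noise
h = [0.5, 0.3, 0.2]                             # hypothetical impulse response

# y(n) = sum_k h(k) x(n - k), skipping the filter warm-up samples
y = [sum(h[k] * x[n - k] for k in range(len(h))) for n in range(len(h), N)]

r0 = sum(v * v for v in y) / len(y)  # estimated r_yy(0)
print(r0, sum(c * c for c in h))     # both ~0.38
```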