Machine Learning for Signal Processing: Linear Gaussian Models
Class 21, 12 Nov 2013. Instructor: Bhiksha Raj
Administrivia
– HW3 is up
– Projects: please send us an update
Estimating an unknown Y from an observation x: pick the most probable value,
$$\hat{y} = \arg\max_Y P(Y \mid x)$$
Consider two jointly Gaussian variables x and y, stacked into a single vector:
$$z = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad \mu_z = E[z] = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}$$
$$C_{zz} = \mathrm{Var}(z) = E[(z-\mu_z)(z-\mu_z)^T] = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}, \qquad C_{xy} = E[(x-\mu_x)(y-\mu_y)^T] = C_{yx}^T$$
$$P(z) = N(z;\ \mu_z, C_{zz}) = \frac{1}{\sqrt{(2\pi)^d\, |C_{zz}|}} \exp\left(-0.5\,(z-\mu_z)^T C_{zz}^{-1} (z-\mu_z)\right)$$
[Figure F1: the joint Gaussian density P(x, y) over the X and Y axes]
Slicing the joint density at an observed value x = x0 gives the conditional P(y | x0):
– The slice in the figure is Gaussian
– Uncertainty about y is reduced
The conditional distribution of y given x is also Gaussian:
$$P(y \mid x) = N\!\left(y;\ \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}\right)$$
$$E[y \mid x] = \mu_{y|x} = \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x)$$
$$\mathrm{Var}(y \mid x) = C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}$$
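These block formulas translate directly into code; a minimal numpy sketch (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def gaussian_condition(mu_x, mu_y, Cxx, Cxy, Cyy, x0):
    """Mean and covariance of P(y | x = x0) for jointly Gaussian (x, y)."""
    gain = Cxy.T @ np.linalg.inv(Cxx)   # C_yx C_xx^{-1} (np.linalg.solve is
                                        # preferable numerically)
    mean = mu_y + gain @ (x0 - mu_x)    # E[y | x0]
    cov = Cyy - gain @ Cxy              # Var(y | x0)
    return mean, cov
```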
[Figure: the conditional P(y | x0); the most likely value of y is at the peak of this Gaussian]
The estimate of y given x is the most likely value:
$$\hat{y} = \arg\max_y P(y \mid x) = E[y \mid x] = \mu_{y|x}$$
Why the mean? Consider the expected squared error of an estimate ŷ:
$$\mathrm{Err} = E[(y - \hat{y})^T (y - \hat{y}) \mid x] = E[y^T y - 2\hat{y}^T y + \hat{y}^T \hat{y} \mid x] = E[y^T y \mid x] - 2\hat{y}^T E[y \mid x] + \hat{y}^T \hat{y}$$
Minimizing with respect to ŷ:
$$\frac{d\,\mathrm{Err}}{d\hat{y}} = 2\hat{y} - 2E[y \mid x] = 0 \quad\Rightarrow\quad \hat{y} = E[y \mid x]$$
The MMSE estimate is the mean of the distribution.
For a Gaussian, the most likely value is also the MEAN value: the MAP and MMSE estimates coincide.
Let P(y|x) be a mixture density:
$$P(y \mid x) = \sum_k P(k \mid x)\, P(y \mid k, x)$$
The MMSE estimate of y is then
$$E[y \mid x] = \int y\, P(y \mid x)\, dy = \sum_k P(k \mid x) \int y\, P(y \mid k, x)\, dy = \sum_k P(k \mid x)\, E[y \mid k, x]$$
Just a weighted combination of the MMSE estimates from the component distributions.
Let P(x, y) be a Gaussian mixture over the stacked vector z = [x; y]:
$$P(z) = P(x, y) = \sum_k P(k)\, N(z;\ \mu_{z,k},\ C_{zz,k})$$
Then P(y|x) is also a Gaussian mixture:
$$P(y \mid x) = \frac{P(x, y)}{P(x)} = \sum_k \frac{P(k)\, P(x \mid k)\, P(y \mid x, k)}{P(x)} = \sum_k P(k \mid x)\, P(y \mid x, k)$$
Each component of P(y|x) comes from the corresponding joint Gaussian:
$$P(x, y \mid k) = N\!\left(z;\ \begin{bmatrix}\mu_{x,k}\\ \mu_{y,k}\end{bmatrix},\ \begin{bmatrix} C_{xx,k} & C_{xy,k} \\ C_{yx,k} & C_{yy,k} \end{bmatrix}\right)$$
$$P(y \mid x, k) = N\!\left(y;\ \mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k}),\ C_{yy,k} - C_{yx,k} C_{xx,k}^{-1} C_{xy,k}\right)$$
$$P(y \mid x) = \sum_k P(k \mid x)\, P(y \mid x, k)$$
E[y|x] is also a mixture: since P(y|x) is a mixture Gaussian density,
$$E[y \mid x] = \sum_k P(k \mid x)\, E[y \mid x, k] = \sum_k P(k \mid x)\left(\mu_{y,k} + C_{yx,k} C_{xx,k}^{-1}(x - \mu_{x,k})\right)$$
A weighted combination of the component MMSE estimates. The weight P(k|x) is easily computed too:
$$P(k \mid x) = \frac{P(k)\, P(x \mid k)}{P(x)} = \frac{P(k)\, N(x;\ \mu_{x,k},\ C_{xx,k})}{\sum_{k'} P(k')\, N(x;\ \mu_{x,k'},\ C_{xx,k'})}$$
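Putting the mixture pieces together, a sketch of the full GMM-based MMSE estimator, assuming scipy is available (all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_mmse_estimate(priors, mu_x, mu_y, Cxx, Cxy, Cyy, x0):
    """MMSE estimate of y from x0 under a Gaussian-mixture joint P(x, y).

    priors: (K,) mixture weights P(k)
    mu_x:   (K, dx) per-component means of x; mu_y: (K, dy) means of y
    Cxx:    (K, dx, dx); Cxy: (K, dx, dy); Cyy: (K, dy, dy)
    """
    K = len(priors)
    # P(k | x0) is proportional to P(k) N(x0; mu_xk, Cxx_k)
    lik = np.array([priors[k] * multivariate_normal.pdf(x0, mu_x[k], Cxx[k])
                    for k in range(K)])
    post = lik / lik.sum()
    # Weighted combination of per-component conditional means
    y_hat = np.zeros(mu_y.shape[1])
    for k in range(K):
        gain = Cxy[k].T @ np.linalg.inv(Cxx[k])   # C_yx,k C_xx,k^{-1}
        y_hat += post[k] * (mu_y[k] + gain @ (x0 - mu_x[k]))
    return y_hat
```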
The result is a mixture of estimates from the individual Gaussians
– Example input: a cepstral vector sequence (speech features)
Linear Gaussian models: model an observed variable through hidden variables and parameters related to it
– The RVs are Gaussian
– Other probability densities may also be used
12 Nov 2013 11755/18797 21
best explain the given data
BC is minimum
– While constraining that the columns of B are orthonormal
12 Nov 2013 11755/18797 22
Every face f is a weighted combination of k basis faces ("eigenfaces") V:
$$f = w_{f,1} V_1 + w_{f,2} V_2 + w_{f,3} V_3 + \dots + w_{f,k} V_k$$
Eigenvectors of the correlation matrix:
– Principal directions of the tightest ellipse centered on the origin
– Directions that retain maximum energy
Eigenvectors of the covariance matrix:
– Principal directions of the tightest ellipse centered on the data
– Directions that retain maximum variance
– Assume data centered at origin for simplicity, so the two coincide
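A minimal numpy sketch of this eigen analysis (illustrative, not course code); the `center` flag toggles the correlation-vs-covariance distinction above:

```python
import numpy as np

def principal_directions(X, k, center=True):
    """Top-k principal directions of data X (rows are data points).

    center=True  -> eigenvectors of the covariance (ellipse on the data)
    center=False -> eigenvectors of the correlation / second-moment matrix
                    (ellipse centered on the origin)
    """
    if center:
        X = X - X.mean(axis=0)
    S = (X.T @ X) / len(X)               # second-moment / covariance matrix
    evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]  # keep the k largest
    return evecs[:, order], evals[order]
```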
With error, every face is a weighted combination of the eigenfaces plus a residual (see the illustration below):
$$f = w_{f,1} V_1 + w_{f,2} V_2 + \dots + w_{f,k} V_k + e_f$$
– Error is orthogonal to representation
– Weight and error are specific to the data instance
[Figure: illustration assuming 3D space. A data point is approximated by w1·V1 along the first eigenface; the error e is at 90° to the eigenface.]
– All data with the same representation wV1 lie on a plane orthogonal to wV1
– Error is orthogonal to representation
– Weight and error are specific to the data instance
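Given an orthonormal basis, the per-instance weight and error are simple projections; a minimal sketch (illustrative names):

```python
import numpy as np

def project_onto_bases(x, V):
    """Weights w and residual e for x = V w + e, V with orthonormal columns."""
    w = V.T @ x      # least-squares weights, since V^T V = I
    e = x - V @ w    # residual; orthogonal to every column of V
    return w, e
```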
[Figure: with two eigenfaces V1 and V2, each data point x_i is approximated by w_{1,i}V1 + w_{2,i}V2; the error e_i is at 90° to both eigenfaces.]
$$e_i = x_i - w_{1,i} V_1 - w_{2,i} V_2$$
– Error is orthogonal to representation
– Weight and error are specific to the data instance
– Variance in the remaining subspace is minimal
Properties of the eigen decomposition:
– e^T V = 0: the error is orthogonal to the bases
– Average w^T w = diagonal: eigen representations are uncorrelated
– Determinant of the average e^T e = minimum: error variance is minimum
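These properties are easy to check numerically; a small self-contained sketch on synthetic data (illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
X -= X.mean(axis=0)

# Basis: top-2 eigenvectors of the covariance matrix
evals, evecs = np.linalg.eigh(np.cov(X.T))
V = evecs[:, np.argsort(evals)[::-1][:2]]

W = X @ V                                # weights (projections)
Err = X - W @ V.T                        # residual errors
print(np.allclose(Err @ V, 0))           # e^T V = 0: error orthogonal to bases
print(np.round(W.T @ W / len(X), 3))     # average outer product: near-diagonal
```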
[Figure repeated: the same two-eigenface decomposition, viewed through the data covariance]
[Figure repeated] A statistical view: fit the decomposition to best explain the data
– In the process also estimate B and E
The linear Gaussian model: x = μ + Vw + e, where w and e are Gaussian random variables
– A "weight" variable w
– An "error" variable e
– Error not correlated to weight: E[e^T w] = 0
Learning the model: estimate its parameters given instances of x
– This is the problem of learning the distribution of a Gaussian RV
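To make the generative story concrete, a minimal sampler sketch, assuming x = μ + Vw + e with w ~ N(0, I) (the constraint adopted below) and e ~ N(0, E):

```python
import numpy as np

def sample_linear_gaussian(mu, V, E, n, rng=None):
    """Draw n samples of x = mu + V w + e, w ~ N(0, I), e ~ N(0, E)."""
    if rng is None:
        rng = np.random.default_rng()
    D, K = V.shape
    w = rng.standard_normal((n, K))                       # latent weights
    e = rng.multivariate_normal(np.zeros(D), E, size=n)   # observation noise
    return mu + w @ V.T + e
```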
If w ~ N(0, B) and e ~ N(0, E), the observation x = μ + Vw + e is itself Gaussian:
$$P(x) = N(x;\ \mu,\ VBV^T + E) = \frac{1}{\sqrt{(2\pi)^D\, |VBV^T + E|}} \exp\left(-0.5\,(x-\mu)^T (VBV^T + E)^{-1} (x-\mu)\right)$$
– The variables are μ, V, B and E
The model as written is not identifiable:
– Vw = VCC⁻¹w = (VC)(C⁻¹w) for any invertible C
– We need extra constraints to make the solution unique
– Constraint: the variance of w is an identity matrix, w ~ N(0, I)
With this constraint, x = μ + Vw + e and
$$P(x) = N(x;\ \mu,\ VV^T + E)$$
– The variables are μ, V, and E
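Given the parameters, the marginal likelihood of data follows directly; a small sketch using scipy (illustrative), handy later for monitoring EM:

```python
import numpy as np
from scipy.stats import multivariate_normal

def marginal_loglik(X, mu, V, E):
    """Total log P(x) under x ~ N(mu, V V^T + E), for the rows of X."""
    return multivariate_normal.logpdf(X, mu, V @ V.T + E).sum()
```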
Estimating μ: the maximum-likelihood estimate of μ is the mean of the data,
$$\mu = \frac{1}{N}\sum_i x_i$$
– Estimate the mean of the data
– Subtract it from the data
– Henceforth assume μ = 0
– The remaining variables are V and E
Learning V and E from training data
– x1, x2, .., xN
Assume for the moment that the weights w_i are known. Since e = x − Vw:
$$P(e) = N(e;\ 0, E) \quad\Rightarrow\quad P(x \mid w) = N(x;\ Vw, E) = \frac{1}{\sqrt{(2\pi)^D\, |E|}} \exp\left(-0.5\,(x - Vw)^T E^{-1} (x - Vw)\right)$$
The log likelihood of the training set:
$$\log P(x_1 \dots x_N \mid w_1 \dots w_N) = -0.5\,N \log|E| - 0.5 \sum_i (x_i - Vw_i)^T E^{-1} (x_i - Vw_i) + \text{const}$$
Maximize the log likelihood
$$LL = -0.5\,N \log|E| - 0.5 \sum_i (x_i - Vw_i)^T E^{-1} (x_i - Vw_i)$$
Setting the derivative with respect to V to zero, $\sum_i E^{-1}(x_i - Vw_i) w_i^T = 0$, gives
$$V = \left(\sum_i x_i w_i^T\right)\left(\sum_i w_i w_i^T\right)^{-1}$$
Setting the derivative with respect to E to zero gives
$$E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N}\, V \sum_i w_i x_i^T$$
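If the w_i were observed, both updates are one-liners; a numpy sketch (illustrative names):

```python
import numpy as np

def m_step_complete(X, W):
    """Closed-form V, E given data X (N x D) and known weights W (N x K)."""
    V = (X.T @ W) @ np.linalg.inv(W.T @ W)   # (sum x w^T)(sum w w^T)^{-1}
    E = (X.T @ X) / len(X) - V @ (W.T @ X) / len(X)
    return V, E
```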
– But in reality the w_i are unknown. So how to deal with this?
The weights are missing information. EM: fill in the missing information with its posterior probability and count as usual
– Every possible value of the missing variable z is counted, weighted by its a posteriori probability P(z|x)
[Figure: the two-dice illustration of EM. Rolls come from a blue die or a red die, but which die produced each instance is unknown; each observed number (e.g. a 6) is entered into both the "blue" and "red" collections, weighted by the posterior probability of its die.]
Applying this to the linear Gaussian model: replace terms involving the unknown w_i by their expectations under the posterior P(w|x):
$$V = \left(\sum_i x_i\, E_{w|x}[w_i]^T\right)\left(\sum_i E_{w|x}[w_i w_i^T]\right)^{-1}$$
$$E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N}\, V \sum_i E_{w|x}[w_i]\, x_i^T$$
To compute these expectations we need the posterior P(w|x). Note that:
– x is Gaussian
– w is Gaussian
– They are linearly related
So x and w are jointly Gaussian, and P(w|x) is itself a Gaussian.
$$P(w) = N(w;\ 0, I), \qquad P(e) = N(e;\ 0, E), \qquad P(x) = N(x;\ \mu,\ VV^T + E)$$
Stack the observation and the weights:
$$z = \begin{bmatrix} x \\ w \end{bmatrix}, \qquad P(z) = N(z;\ \mu_z, C_{zz})$$
$$C_{zz} = \begin{bmatrix} C_{xx} & C_{xw} \\ C_{wx} & C_{ww} \end{bmatrix}, \qquad C_{xw} = E[(x - \mu) w^T] = E[(Vw + e) w^T] = V$$
$$\Rightarrow\quad C_{zz} = \begin{bmatrix} VV^T + E & V \\ V^T & I \end{bmatrix}$$
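This block structure can be confirmed empirically by sampling the model; a small self-contained sketch (illustrative):

```python
import numpy as np

# Empirical check of the block structure of C_zz
rng = np.random.default_rng(1)
D, K, n = 4, 2, 200_000
V = rng.standard_normal((D, K))
E = 0.5 * np.eye(D)
w = rng.standard_normal((n, K))                           # w ~ N(0, I)
x = w @ V.T + rng.multivariate_normal(np.zeros(D), E, n)  # x = V w + e
z = np.hstack([x, w])
Czz = np.cov(z.T)                    # approximates [[VV^T + E, V], [V^T, I]]
print(np.round(Czz[:D, D:] - V, 2))  # cross block C_xw should be close to V
```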
Apply the conditional Gaussian rule to z = [x; w]:
$$P(w \mid x) = N\!\left(w;\ V^T (VV^T + E)^{-1} (x - \mu),\ I - V^T (VV^T + E)^{-1} V\right)$$
With μ = 0, the required posterior moments are:
$$E_{w|x}[w_i] = V^T (VV^T + E)^{-1} x_i$$
$$E_{w|x}[w_i w_i^T] = \mathrm{Var}(w \mid x_i) + E_{w|x}[w_i]\, E_{w|x}[w_i]^T = I - V^T (VV^T + E)^{-1} V + E_{w|x}[w_i]\, E_{w|x}[w_i]^T$$
The complete EM algorithm iterates between the two steps:
E-step: with the current V and E, compute
$$E_{w|x}[w_i] = V^T (VV^T + E)^{-1} x_i$$
$$E_{w|x}[w_i w_i^T] = I - V^T (VV^T + E)^{-1} V + E_{w|x}[w_i]\, E_{w|x}[w_i]^T$$
M-step: re-estimate the parameters
$$V = \left(\sum_i x_i\, E_{w|x}[w_i]^T\right)\left(\sum_i E_{w|x}[w_i w_i^T]\right)^{-1}$$
$$E = \frac{1}{N}\sum_i x_i x_i^T - \frac{1}{N}\, V \sum_i E_{w|x}[w_i]\, x_i^T$$
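A compact numpy sketch of the whole EM loop (illustrative; assumes centered data and a full error covariance E, per the derivation above):

```python
import numpy as np

def em_linear_gaussian(X, K, n_iter=100, seed=0):
    """EM for x = V w + e, w ~ N(0, I), e ~ N(0, E); X is N x D, centered."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    V = rng.standard_normal((D, K))
    E = np.cov(X.T) + 1e-6 * np.eye(D)
    S = (X.T @ X) / N                         # data second moment (fixed)
    for _ in range(n_iter):
        # E-step: posterior moments of w given each x_i
        G = V.T @ np.linalg.inv(V @ V.T + E)  # V^T (VV^T + E)^{-1}
        Ew = X @ G.T                          # rows are E[w_i | x_i]
        cov_w = np.eye(K) - G @ V             # shared posterior covariance
        sum_wwT = N * cov_w + Ew.T @ Ew       # sum_i E[w_i w_i^T]
        # M-step: closed-form updates
        V = (X.T @ Ew) @ np.linalg.inv(sum_wwT)
        E = S - V @ (Ew.T @ X) / N
    return V, E
```

Each iteration can be monitored with the marginal log-likelihood sketched earlier; EM guarantees it never decreases.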
The linear Gaussian model is thus a structured Gaussian PDF for the variable x: P(x) = N(x; μ, VV^T + E). Constraining the error covariance E recovers familiar special cases:
– PCA: E = εI, isotropic with ε → 0
– Factor Analysis: E is diagonal
[Figure: a full-covariance Gaussian fit, for comparison]