Machine Learning for Signal Processing
Expectation Maximization Mixture Models
Bhiksha Raj 27 Oct 2016
11755/18797
Learning Distributions for Data
Problem: Given a collection of examples from some data, estimate its distribution
– Figure out what the probabilities of the various numbers are for the dice
– Estimate the distribution that makes the observed sequence of numbers most probable
6 3 1 5 4 1 2 4 …
Let n1, …, n6 be the counts of each face and p1, …, p6 the face probabilities to be estimated.
Maximum likelihood estimation assumes the data are representative of the generating process
– I.e. the generating process draws from the distribution
– The data you have observed are very typical of the process generating the observed data
– Not necessarily true
The estimate should assign higher probability to values observed frequently in the data
– Should assign lower probability to less frequent observations and vice versa
– log() is a monotonic function: argmax_x f(x) = argmax_x log(f(x))
– Requires constrained optimization to ensure probabilities sum to 1
$$P(n_1, n_2, n_3, n_4, n_5, n_6) = \mathrm{Const}\prod_i p_i^{n_i}$$
$$\log P(n_1, n_2, n_3, n_4, n_5, n_6) = \log(\mathrm{Const}) + \sum_i n_i \log p_i$$
Maximizing subject to $\sum_i p_i = 1$ gives
$$p_i = \frac{n_i}{\sum_j n_j}$$
EVENTUALLY IT'S JUST COUNTING!
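The counting estimate above can be sketched in a few lines (the observed sequence is the one from the slides; the function name is illustrative):

```python
from collections import Counter

def ml_multinomial(rolls, faces=6):
    """Maximum-likelihood estimate of die-face probabilities: p_i = n_i / N."""
    counts = Counter(rolls)
    n = len(rolls)
    return [counts.get(face, 0) / n for face in range(1, faces + 1)]

rolls = [6, 3, 1, 5, 4, 1, 2, 4]  # observed sequence from the slides
probs = ml_multinomial(rolls)     # each p_i is just the normalized count
```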
The Gaussian:
$$P(X) = N(X; \mu, \Theta) = \frac{1}{\sqrt{(2\pi)^d |\Theta|}} \exp\left(-0.5\,(X-\mu)^T \Theta^{-1} (X-\mu)\right)$$
Given a collection of observations (X1, X2, …), estimate μ and Θ
$$P(X_1, X_2, \dots) = \prod_i \frac{1}{\sqrt{(2\pi)^d |\Theta|}} \exp\left(-0.5\,(X_i-\mu)^T \Theta^{-1} (X_i-\mu)\right)$$
$$\log P(X_1, X_2, \dots) = C - 0.5 \sum_i \left(\log|\Theta| + (X_i-\mu)^T \Theta^{-1} (X_i-\mu)\right)$$
The maximum likelihood estimates:
$$\mu = \frac{1}{N} \sum_i X_i \qquad \Theta = \frac{1}{N} \sum_i (X_i-\mu)(X_i-\mu)^T$$
IT'S STILL JUST COUNTING!
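A minimal sketch of these Gaussian ML estimates (the data matrix here is illustrative, not from the slides):

```python
import numpy as np

def ml_gaussian(X):
    """ML estimates for a multivariate Gaussian: sample mean and
    the (biased, 1/N) sample covariance, matching the formulas above."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    theta = centered.T @ centered / len(X)
    return mu, theta

X = np.array([[6.1, 1.0], [5.3, 0.8], [4.2, 1.2], [4.9, 0.9]])  # illustrative data
mu, theta = ml_gaussian(X)
```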
The Laplacian:
$$P(x) = L(x; \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x-\mu|}{b}\right)$$
Given a collection of observations (x1, x2, …), estimate μ and b
$$\log P(x_1, x_2, \dots) = C - N \log(b) - \sum_i \frac{|x_i - \mu|}{b}$$
The maximum likelihood estimates:
$$\mu = \mathrm{median}(\{x_i\}) \qquad b = \frac{1}{N} \sum_i |x_i - \mu|$$
Still just counting
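The Laplacian estimates are equally direct: the median, and the mean absolute deviation about it (the sample values below are illustrative):

```python
import statistics

def ml_laplacian(xs):
    """ML estimates for a Laplacian: mu is the sample median,
    b is the mean absolute deviation about the median."""
    mu = statistics.median(xs)
    b = sum(abs(x - mu) for x in xs) / len(xs)
    return mu, b

mu, b = ml_laplacian([1.0, 2.0, 4.0, 7.0, 11.0])
# mu is the median 4.0; b = (3 + 2 + 0 + 3 + 7) / 5 = 3.0
```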
The Dirichlet distribution
– Parameters α determine mode and curvature
– Defined over probability vectors: X = [x1 x2 .. xK], Σi xi = 1, xi >= 0 for all i
[Figure: K=3. Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4) (from Wikipedia)]
[Figure: log of the density as we change α from α=(0.3, 0.3, 0.3) to (2.0, 2.0, 2.0), keeping all the individual αi's equal to each other]
$$P(X) = D(X; \alpha) = \frac{1}{B(\alpha)} \prod_i x_i^{\alpha_i - 1}$$
Given a collection of observations (X1, X2, …), estimate α
$$\log P(X_1, X_2, \dots) = \sum_j \sum_i (\alpha_i - 1) \log x_{j,i} - N \log B(\alpha)$$
– No closed-form solution; needs gradient ascent
– The dice are differently loaded for the two of them
6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …
We observe the numbers from the two dice
– As indicated by the colors, we know who rolled what number
6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6…
numbers from the two dice
– As indicated by the colors, we know who rolled what number
6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6…
Collection of “blue” numbers: 6 5 2 4 2 1 3 6 1..
Collection of “red” numbers: 4 1 3 5 2 4 4 2 6..
We observe the numbers from the two dice – As indicated by the colors, we know who rolled what number
From each collection, estimate the probabilities for each of the 6 possible outcomes from the counts of rolls:
$$P(\text{number}) = \frac{\text{no. of times the number was rolled}}{\text{total no. of rolls}}$$
[Figure: the segregated “blue” and “red” collections and the two estimated histograms over outcomes 1–6]
Now a caller calls out the rolled numbers, and you cannot see the dice yourself
– 40% of the time he calls out the number from the left shooter, and 60% of the time, the one from the right (and you know this)
6 4 1 5 3 2 2 2 …
6 3 1 5 4 1 2 4 … 4 4 1 6 3 2 1 2 …
– He selects “RED”, and the Red die rolls the number X – OR – He selects “BLUE” and the Blue die rolls the number X
– E.g. P(6) = P(Red)P(6|Red) + P(Blue)P(6|Blue)
– The probability distribution of the called-out numbers is a mixture multinomial:
$$P(X) = \sum_Z P(Z)\,P(X \mid Z)$$
where P(Z) are the mixture weights and P(X | Z) the component multinomials
In general:
$$P(X) = \sum_Z P(Z)\,P(X \mid Z)$$
where P(Z) are the mixture weights and P(X | Z) the component distributions
– Component distributions may be of varied type
Mixture Gaussian:
$$P(X) = \sum_Z P(Z)\,N(X; \mu_Z, \Theta_Z)$$
Mixture of Gaussians and Laplacians:
$$P(X) = \sum_{Z \in \text{Gaussians}} P(Z)\,N(X; \mu_Z, \Theta_Z) + \sum_{Z \in \text{Laplacians}} P(Z)\,L(X; \mu_Z, b_Z)$$
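A mixture density of this kind is just a weighted sum of component densities, whatever their types. A minimal 1-D sketch (the parameter values are illustrative):

```python
import math

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def laplacian_pdf(x, mu, b):
    """1-D Laplacian density L(x; mu, b)."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def mixture_pdf(x, weights, components):
    """P(x) = sum_Z P(Z) P(x|Z): weights are P(Z), components are callables P(x|Z)."""
    return sum(w * comp(x) for w, comp in zip(weights, components))

# One Gaussian and one Laplacian component, with illustrative parameters
p = mixture_pdf(
    0.0,
    weights=[0.6, 0.4],
    components=[lambda x: gaussian_pdf(x, 0.0, 1.0),
                lambda x: laplacian_pdf(x, 1.0, 2.0)],
)
```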
The probability of the collection of called-out numbers:
$$P(n_1, n_2, n_3, n_4, n_5, n_6) = \mathrm{Const} \prod_X P(X)^{n_X} = \mathrm{Const} \prod_X \Big(\sum_Z P(Z)\,P(X \mid Z)\Big)^{n_X}$$
$$\log P(n_1, n_2, n_3, n_4, n_5, n_6) = \log(\mathrm{Const}) + \sum_X n_X \log \sum_Z P(Z)\,P(X \mid Z)$$
Expectation Maximization (or EM) algorithm
– Dempster, Laird and Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 1977
– The same principle had been used in various forms prior to the landmark paper, however.
– Dice: The identity of the dice whose number has been called out
– By adding the observation to the right bin
[Figure: Collections of “blue” and “red” numbers. A called-out 6 whose dice is known (an instance from the blue dice, or an instance from the red dice) goes entirely into the corresponding collection; a 6 whose dice is unknown is split into fragments that are added to both collections]
– At each time there is a current estimate of parameters
– The a posteriori probabilities of the various values of Z are computed using Bayes’ rule:
$$P(Z \mid X) = \frac{P(X \mid Z)\,P(Z)}{P(X)} = C\,P(X \mid Z)\,P(Z)$$
Initialize the probability distributions for the two sets of dice (somehow)
Initialize the probability with which the caller calls out the two shooters (somehow)
For example:
– P(blue) = P(red) = 0.5
– P(4 | blue) = 0.1, P(4 | red) = 0.05
For an observed 4:
$$P(\text{red} \mid X{=}4) = C\,P(X{=}4 \mid \text{red})\,P(\text{red}) = C \times 0.05 \times 0.5 = 0.025C$$
$$P(\text{blue} \mid X{=}4) = C\,P(X{=}4 \mid \text{blue})\,P(\text{blue}) = C \times 0.1 \times 0.5 = 0.05C$$
Normalizing:
$$P(\text{red} \mid X{=}4) = \frac{0.025C}{0.025C + 0.05C} = 0.33$$
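This normalization step is plain Bayes' rule, and the unknown constant C cancels. A small sketch using the initialization from the slides (the function name is illustrative):

```python
def posterior(x, priors, likelihoods):
    """Bayes' rule: P(Z|x) = P(x|Z)P(Z) / sum_Z' P(x|Z')P(Z')."""
    joint = {z: priors[z] * likelihoods[z][x] for z in priors}
    total = sum(joint.values())
    return {z: p / total for z, p in joint.items()}

# Initialization from the slides: uniform priors, P(4|blue)=0.1, P(4|red)=0.05
post = posterior(4,
                 priors={"red": 0.5, "blue": 0.5},
                 likelihoods={"red": {4: 0.05}, "blue": {4: 0.1}})
# post["red"] = 0.025 / (0.025 + 0.05) = 1/3, matching the 0.33 above
```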
Each observation is now split into fragments and contributes to both “Red” and “Blue”, with fragment sizes equal to the posteriors. For the called sequence
6 4 5 1 2 3 4 5 2 2 1 4 3 4 6 2 1 6
the red fragments are: 6 (0.8), 4 (0.33), 5 (0.33), 1 (0.57), 2 (0.14), 3 (0.33), 4 (0.33), 5 (0.33), 2 (0.14), 2 (0.14), 1 (0.57), 4 (0.33), 3 (0.33), 4 (0.33), 6 (0.8), 2 (0.14), 1 (0.57), 6 (0.8)
and the blue fragments are: 6 (0.2), 4 (0.67), 5 (0.67), 1 (0.43), 2 (0.86), 3 (0.67), 4 (0.67), 5 (0.67), 2 (0.86), 2 (0.86), 1 (0.43), 4 (0.67), 3 (0.67), 4 (0.67), 6 (0.2), 2 (0.86), 1 (0.43), 6 (0.2)
Total weight in the red column: 7.31
Total weight in the blue column: 10.69
– Note: 10.69 + 7.31 = 18 = the total number of instances
Called  P(red|X)  P(blue|X)
6       .8        .2
4       .33       .67
5       .33       .67
1       .57       .43
2       .14       .86
3       .33       .67
4       .33       .67
5       .33       .67
2       .14       .86
2       .14       .86
1       .57       .43
4       .33       .67
3       .33       .67
4       .33       .67
6       .8        .2
2       .14       .86
1       .57       .43
6       .8        .2
Total   7.31      10.69
Re-estimate P(X | red) from the weighted counts in the red column:
– Total count for 1: 1.71
– Total count for 2: 0.56
– Total count for 3: 0.66
– Total count for 4: 1.32
– Total count for 5: 0.66
– Total count for 6: 2.4
Normalizing by the total red weight of 7.31:
– P(1 | Red) = 1.71/7.31 = 0.234
– P(2 | Red) = 0.56/7.31 = 0.077
– P(3 | Red) = 0.66/7.31 = 0.090
– P(4 | Red) = 1.32/7.31 = 0.181
– P(5 | Red) = 0.66/7.31 = 0.090
– P(6 | Red) = 2.40/7.31 = 0.328
Re-estimate P(X | blue) from the weighted counts in the blue column:
– Total count for 1: 1.29
– Total count for 2: 3.44
– Total count for 3: 1.34
– Total count for 4: 2.68
– Total count for 5: 1.34
– Total count for 6: 0.6
Normalizing by the total blue weight of 10.69:
– P(1 | Blue) = 1.29/10.69 = 0.121
– P(2 | Blue) = 3.44/10.69 = 0.322
– P(3 | Blue) = 1.34/10.69 = 0.125
– P(4 | Blue) = 2.68/10.69 = 0.251
– P(5 | Blue) = 1.34/10.69 = 0.125
– P(6 | Blue) = 0.60/10.69 = 0.056
– Note: 7.31 + 10.69 = 18
We also re-estimate the probability that the caller calls out Red or Blue
– i.e. the fraction of times that he calls Red and the fraction of times he calls Blue
– P(Z = Red) = 7.31/18 = 0.406
– P(Z = Blue) = 10.69/18 = 0.594
Probability of Blue dice:
P(1 | Blue) = 1.29/10.69 = 0.121
P(2 | Blue) = 3.44/10.69 = 0.322
P(3 | Blue) = 1.34/10.69 = 0.125
P(4 | Blue) = 2.68/10.69 = 0.251
P(5 | Blue) = 1.34/10.69 = 0.125
P(6 | Blue) = 0.60/10.69 = 0.056
Probability of Red dice:
P(1 | Red) = 1.71/7.31 = 0.234
P(2 | Red) = 0.56/7.31 = 0.077
P(3 | Red) = 0.66/7.31 = 0.090
P(4 | Red) = 1.32/7.31 = 0.181
P(5 | Red) = 0.66/7.31 = 0.090
P(6 | Red) = 2.40/7.31 = 0.328
THE UPDATED VALUES CAN BE USED TO REPEAT THE PROCESS
1. Initialize P(Z), P(X | Z)
2. Estimate P(Z | X) for each Z, for each called-out number
3. Re-estimate P(X | Z) for every value of X and Z
4. Re-estimate P(Z)
5. If not converged, return to 2
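The five steps above can be sketched directly for the two-dice example (the function name and the initial P(X | Z) values are illustrative; the called sequence is the one from the slides):

```python
from collections import Counter

def em_two_dice(calls, p_z, p_x_given_z, iters=10):
    """EM for a mixture of two multinomial dice, following steps 1-5.
    p_z: dict Z -> P(Z); p_x_given_z: dict Z -> dict X -> P(X|Z)."""
    for _ in range(iters):
        # E-step: posterior fragment sizes P(Z|X) for each called-out number
        frag = {z: Counter() for z in p_z}
        for x in calls:
            total = sum(p_z[z] * p_x_given_z[z][x] for z in p_z)
            for z in p_z:
                frag[z][x] += p_z[z] * p_x_given_z[z][x] / total
        # M-step: re-estimate P(X|Z) and P(Z) from the fragment counts
        for z in p_z:
            weight = sum(frag[z].values())
            p_x_given_z[z] = {x: frag[z][x] / weight for x in range(1, 7)}
            p_z[z] = weight / len(calls)
    return p_z, p_x_given_z

calls = [6, 4, 5, 1, 2, 3, 4, 5, 2, 2, 1, 4, 3, 4, 6, 2, 1, 6]
p_z, p_x = em_two_dice(calls, {"red": 0.5, "blue": 0.5},
                       {"red": {1: .1, 2: .1, 3: .1, 4: .1, 5: .1, 6: .5},
                        "blue": {x: 1 / 6 for x in range(1, 7)}})
```

Note that the two components must be initialized differently; if both start identical, the posteriors stay equal and the components never separate.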
Each iteration computes, for every observation X and every Z:
$$P(Z \mid X) = \frac{P(Z)\,P(X \mid Z)}{\sum_{Z'} P(Z')\,P(X \mid Z')}$$
and re-estimates the parameters from the weighted counts:
$$P(X \mid Z) = \frac{N_X\,P(Z \mid X)}{\sum_{X'} N_{X'}\,P(Z \mid X')} \qquad P(Z) = \frac{\sum_X N_X\,P(Z \mid X)}{\sum_{Z'} \sum_X N_X\,P(Z' \mid X)}$$
such that $\sum_X P(X \mid Z) = 1$ and $\sum_Z P(Z) = 1$
– The solution is not unique. E.g., the probability of 6 being called out:
$$P(6) = \alpha_r\,P(6 \mid \text{red}) + \alpha_b\,P(6 \mid \text{blue})$$
– The following too is a valid solution:
$$P(6) = \alpha_r \times 1.0 + \alpha_b \times \text{anything}$$
i.e. one component can explain the 6s entirely, with the other contributing arbitrarily.
– Mixtures of Gaussians can represent distributions far better than a simple Gaussian
– The component identity is an unknown random variable
– Data poorly modeled by a simple Gaussian may be well modeled by a mixture of two Gaussians
– EM can estimate the parameters, given the number of Gaussians in the mixture
The mixture Gaussian density:
$$P(X) = \sum_k P(k)\,N(X; \mu_k, \Theta_k) = \sum_k P(k)\,\frac{1}{\sqrt{(2\pi)^d |\Theta_k|}} \exp\left(-0.5\,(X-\mu_k)^T \Theta_k^{-1} (X-\mu_k)\right)$$
$$P(X) = \sum_k P(k)\,N(X; \mu_k, \Theta_k)$$
6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5
We observe numbers drawn from a mixture
– As indicated by the colors, we know which Gaussian generated what number
From each color's collection, estimate the parameters for that Gaussian:
$$P(\text{red}) = \frac{N_{\text{red}}}{N}$$
6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5 …
Red collection: 6.1 5.3 4.2 4.9 ..  Blue collection: 1.4 1.9 2.2 0.5 ..
$$\mu_{\text{red}} = \frac{1}{N_{\text{red}}} \sum_{i \in \text{red}} X_i \qquad \Theta_{\text{red}} = \frac{1}{N_{\text{red}}} \sum_{i \in \text{red}} (X_i - \mu_{\text{red}})(X_i - \mu_{\text{red}})^T$$
– Now the color information is missing
$$P(X) = \sum_k P(k)\,N(X; \mu_k, \Theta_k)$$
6.1 1.4 5.3 1.9 4.2 2.2 4.9 0.5
[Figure: an observation 4.2 of unknown Gaussian is split into fragments, one added to the “blue” collection and one to the “red” collection]
The fragment sizes are the a posteriori probabilities:
$$P(k \mid X) = \frac{P(k)\,P(X \mid k)}{\sum_{k'} P(k')\,P(X \mid k')} = \frac{P(k)\,N(X; \mu_k, \Theta_k)}{\sum_{k'} P(k')\,N(X; \mu_{k'}, \Theta_{k'})}$$
Initialize P(k), μk, Θk for the Gaussians
– Important how we do this
– Typical solution: initialize means randomly, Θk as the global covariance of the data, and P(k) uniformly
Compute the fragment size for each Gaussian, for each observation:
Number  P(red|X)  P(blue|X)
6.1     .81       .19
1.4     .33       .67
5.3     .75       .25
1.9     .41       .59
4.2     .64       .36
2.2     .43       .57
4.9     .66       .34
0.5     .05       .95
Each observation contributes as much as its fragment size to each statistic. The mean of the red Gaussian:
mean(red) = (6.1×0.81 + 1.4×0.33 + 5.3×0.75 + 1.9×0.41 + 4.2×0.64 + 2.2×0.43 + 4.9×0.66 + 0.5×0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05) = 17.05 / 4.08 = 4.18
Total fragment weights: red 4.08, blue 3.92
The variance, using the same fragment weights:
Var(red) = ((6.1-4.18)²×0.81 + (1.4-4.18)²×0.33 + (5.3-4.18)²×0.75 + (1.9-4.18)²×0.41 + (4.2-4.18)²×0.64 + (2.2-4.18)²×0.43 + (4.9-4.18)²×0.66 + (0.5-4.18)²×0.05) / (0.81 + 0.33 + 0.75 + 0.41 + 0.64 + 0.43 + 0.66 + 0.05)
And the mixture weight:
$$P(\text{red}) = \frac{4.08}{8}$$
In general, the update rules:
$$\mu_k = \frac{\sum_X P(k \mid X)\,X}{\sum_X P(k \mid X)}$$
$$\Theta_k = \frac{\sum_X P(k \mid X)\,(X - \mu_k)(X - \mu_k)^T}{\sum_X P(k \mid X)}$$
$$P(k) = \frac{1}{N} \sum_X P(k \mid X)$$
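These update rules can be sketched as one EM iteration for a 1-D, two-component mixture (the data are the eight numbers from the slides; the initialization is illustrative, not the one that produced the fragment table above):

```python
import numpy as np

def em_step_gmm(x, pi, mu, var):
    """One EM iteration for a 1-D Gaussian mixture, implementing the
    update rules above. x: (N,) data; pi, mu, var: (K,) parameters."""
    x = np.asarray(x, dtype=float)
    # E-step: fragment sizes P(k|x) for every observation
    dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    post = dens / dens.sum(axis=1, keepdims=True)   # (N, K)
    # M-step: weighted counting
    weight = post.sum(axis=0)                       # sum_X P(k|X)
    mu_new = (post * x[:, None]).sum(axis=0) / weight
    var_new = (post * (x[:, None] - mu_new) ** 2).sum(axis=0) / weight
    pi_new = weight / len(x)
    return pi_new, mu_new, var_new

x = [6.1, 1.4, 5.3, 1.9, 4.2, 2.2, 4.9, 0.5]        # the slides' data
pi, mu, var = np.array([0.5, 0.5]), np.array([5.0, 1.0]), np.array([2.0, 2.0])
for _ in range(20):
    pi, mu, var = em_step_gmm(x, pi, mu, var)
```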
[Figure: histogram of 4000 instances of randomly generated data, the individual component distributions, and the two-Gaussian mixture estimated by EM]
The procedure extends to mixtures of other distributions: Gaussians use the Gaussian update rules, Laplacians use the Laplacian rule:
$$\mu_k = \text{weighted median of } \{x\} \text{ under weights } P(k \mid x) \qquad b_k = \frac{\sum_x P(k \mid x)\,|x - \mu_k|}{\sum_x P(k \mid x)}$$
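The only non-obvious piece of the Laplacian rule is the weighted median; a minimal sketch (function names are illustrative):

```python
def weighted_median(xs, ws):
    """Weighted median: the point where cumulative weight first reaches
    half the total weight (used as the Laplacian-mixture location update)."""
    pairs = sorted(zip(xs, ws))
    half = sum(ws) / 2.0
    acc = 0.0
    for x, w in pairs:
        acc += w
        if acc >= half:
            return x

def laplacian_mixture_update(xs, post_k):
    """mu_k = weighted median; b_k = weighted mean absolute deviation."""
    mu = weighted_median(xs, post_k)
    b = sum(w * abs(x - mu) for x, w in zip(xs, post_k)) / sum(post_k)
    return mu, b
```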
EM applies whenever explaining a phenomenon requires the knowledge of a hidden or missing variable (or a set of hidden/missing variables)
– The hidden variable is often called a “latent” variable
Examples:
– Estimating mixtures of distributions, where the component parameters and the assignment of observations to components must both be learnt
– Estimating the distribution of data when some attributes are missing
– Estimating the dynamics of a system based only on observations that may be a complex function of system state
[Worked example: a coin observation whose “Heads or Tails” identity is unknown contributes a fractional count to both the “Heads” count and the “Tails” count]
Another example: only the total of two dice is observed. A total of 4 can arise as (3,1), (2,2) or (1,3)
– Work this out
This is a mixture of mixtures: each component k is itself a mixture over Z:
$$P(X) = \sum_k P(k) \sum_Z P(Z \mid k)\,P(X \mid Z, k)$$
[Figure: components k1, k2 with sub-components Z1, Z2, Z3, Z4]
– Determine when speaker changes have occurred in the speech signal
– Group together speech segments from the same speaker
[Figure: a speech signal with segments from Speaker A and Speaker B]
Which segments are from the same speaker? Where are speaker changes?
520-412/520-612
Clustering