ECE 417, Lecture 6: Discrete Cosine Transform
Mark Hasegawa-Johnson 9/6/2019
Outline: the discrete cosine transform (DCT); KNN classification; how to draw the contour plots of a multivariate Gaussian pdf.

Last time: PCA. Why it's useful: the principal components are the directions of greatest variance, so you can keep just the top N (for N << D) and still get a pretty good nearest-neighbor classifier. Why it's a problem: you can't compute the principal components until you've collected the whole dataset. What if we want to define the features before we have the whole dataset? For example, what are the PC axes for the set of "all natural images"? On average, over all natural images, the statistics are stationary: the correlation between two pixels depends only on the displacement between them, not on their color or their location in the image.
Define the 2D DFT, $Y[k_1,k_2]$, as
$$Y[k_1,k_2] = \sum_{o_1=0}^{O_1-1}\sum_{o_2=0}^{O_2-1} y[o_1,o_2]\, e^{-j2\pi k_1 o_1/O_1}\, e^{-j2\pi k_2 o_2/O_2}$$
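As a sanity check, here is a minimal numpy sketch (the function and variable names are mine, not from the slides) that computes the 2D DFT directly from this definition and compares it against np.fft.fft2:

```python
import numpy as np

def dft2_direct(y):
    """2D DFT computed directly from the definition (slow, for checking)."""
    O1, O2 = y.shape
    o1 = np.arange(O1)
    o2 = np.arange(O2)
    # F1[k1, o1] = exp(-j 2 pi k1 o1 / O1), and similarly F2 for the second axis
    F1 = np.exp(-2j * np.pi * np.outer(o1, o1) / O1)
    F2 = np.exp(-2j * np.pi * np.outer(o2, o2) / O2)
    return F1 @ y @ F2.T

y = np.random.rand(8, 8)                 # a random stand-in "image"
assert np.allclose(dft2_direct(y), np.fft.fft2(y))
```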
It turns out that the pixels, $y[o_1,o_2]$, are highly correlated with one another (often exactly the same!). But on average, as the number of images → ∞, the DFT coefficients $Y[k_1,k_2]$ become uncorrelated with one another (because the statistics of natural images are approximately stationary, and complex exponentials are the eigenvectors of a stationary covariance). In that sense, the DFT approximates the PCA of the set of all natural images.
Suppose we unwrap each image into a vector, so that
$$\vec{y} = \begin{bmatrix} y[0,0] \\ y[0,1] \\ \vdots \\ y[O_1-1,\,O_2-1] \end{bmatrix}$$
Suppose we also unwrap the DFT coefficients into some one-dimensional order; for example, it could be in diagonal order: $0\!:\!(0,0)$, $1\!:\!(1,0)$, $2\!:\!(0,1)$, $3\!:\!(2,0)$, $4\!:\!(1,1)$, $5\!:\!(0,2)$, $6\!:\!(3,0)$, … Then the features are $z_k = \vec{y}^T \vec{w}_k$, with basis vectors
$$\vec{w}_k = \begin{bmatrix} w_{0k} \\ \vdots \\ w_{O_1O_2-1,\,k} \end{bmatrix}, \qquad w_{ok} = e^{-j2\pi k_1 o_1/O_1}\, e^{-j2\pi k_2 o_2/O_2}$$
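A short sketch of one way to generate the diagonal (zigzag) ordering above; the function name is hypothetical:

```python
def diagonal_order(O1, O2):
    """List (k1, k2) pairs in diagonal order: 0:(0,0), 1:(1,0), 2:(0,1), ..."""
    pairs = []
    for s in range(O1 + O2 - 1):                  # s = k1 + k2, anti-diagonal index
        for k1 in range(min(s, O1 - 1), -1, -1):
            k2 = s - k1
            if k2 < O2:
                pairs.append((k1, k2))
    return pairs

print(diagonal_order(4, 4)[:7])   # [(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0)]
```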
The problem with the DFT is that it's complex-valued! That makes it hard to do some types of statistical analysis and machine learning (some types of derivatives, for example, are not defined if the variable is complex-valued).
$$\vec{w}_k = \begin{bmatrix} w_{0k} \\ \vdots \\ w_{O_1O_2-1,\,k} \end{bmatrix}, \qquad w_{ok} = e^{-j2\pi k_1 o_1/O_1}\, e^{-j2\pi k_2 o_2/O_2}$$
The DFT of a real, symmetric sequence is real and symmetric:
$$y[o] = y^*[O-o] \;\leftrightarrow\; \mathrm{Im}\{Y[k]\} = 0$$
$$\mathrm{Im}\{y[o]\} = 0 \;\leftrightarrow\; Y[k] = Y^*[O-k]$$
So we can make the transform of a real image real-valued if we pretend that the observed image is just ¼ of a larger, mirrored image.
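A quick numpy illustration of the symmetry property: symmetrizing a real sequence (a construction I've added for illustration, not from the slides) makes its DFT real-valued:

```python
import numpy as np

O = 8
y = np.random.rand(O)                    # a real sequence
# Symmetrize it so that y_sym[o] = y_sym[(O - o) % O]; a real, symmetric
# sequence has a real (and symmetric) DFT.
y_sym = 0.5 * (y + np.roll(y[::-1], 1))  # np.roll(y[::-1], 1)[o] == y[(O - o) % O]
Y = np.fft.fft(y_sym)
print(np.max(np.abs(Y.imag)))            # ~0: the DFT is real-valued
```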
Define
$$t[n] = \begin{cases} y\!\left[n - \tfrac12\right], & n = \tfrac12, \tfrac32, \ldots, O - \tfrac12 \\[4pt] y\!\left[2O - n - \tfrac12\right], & n = O + \tfrac12,\; O + \tfrac32, \ldots, 2O - \tfrac12 \end{cases}$$
Then:
$$T[k] = \sum_{n=\frac12,\frac32,\ldots}^{2O-\frac12} t[n]\, e^{-j\frac{2\pi k n}{2O}} = \sum_{n=\frac12,\frac32,\ldots}^{O-\frac12} t[n]\; 2\cos\!\left(\frac{\pi k n}{O}\right) = \sum_{o=0}^{O-1} 2\, y[o] \cos\!\left(\frac{\pi k \left(o + \frac12\right)}{O}\right)$$
where the second step pairs each $n$ with $2O - n$, using
$$e^{j\frac{\pi k n}{O}} + e^{-j\frac{\pi k n}{O}} = 2\cos\!\left(\frac{\pi k n}{O}\right), \qquad n = o + \tfrac12$$
This real-valued transform of a real-valued signal is the discrete cosine transform (DCT).
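scipy's type-2 DCT uses exactly this formula, so the derivation can be checked numerically; a minimal sketch (variable names are mine):

```python
import numpy as np
from scipy.fft import dct

O = 8
y = np.random.rand(O)
k = np.arange(O)[:, None]                  # column: frequency index
o = np.arange(O)[None, :]                  # row: sample index
# T[k] = sum_o 2 y[o] cos(pi k (o + 1/2) / O), as derived above
T = (2 * np.cos(np.pi * k * (o + 0.5) / O) * y).sum(axis=1)
assert np.allclose(T, dct(y, type=2))      # matches scipy's DCT-II exactly
```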
2D DCT: assume that you have some reasonable mapping from $o$ to $(o_1,o_2)$, and from $k$ to $(k_1,k_2)$. Then $\vec{z} = W^T \vec{y}$, where $W = [\vec{w}_0, \cdots, \vec{w}_{O_1O_2-1}]$, and
$$\vec{w}_k = \begin{bmatrix} w_{0k} \\ \vdots \\ w_{O_1O_2-1,\,k} \end{bmatrix}, \qquad w_{ok} = \cos\!\left(\frac{\pi k_1 \left(o_1+\frac12\right)}{O_1}\right)\cos\!\left(\frac{\pi k_2 \left(o_2+\frac12\right)}{O_2}\right)$$
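A sketch of how a truncated DCT feature vector (like the 36th-order one used later in the lecture) might be computed, using scipy's separable 2D DCT; the image size and names are my assumptions:

```python
import numpy as np
from scipy.fft import dctn

O1, O2 = 16, 16
y = np.random.rand(O1, O2)        # a hypothetical 16x16 image
Z = dctn(y, type=2)               # 2D DCT-II: the separable cosine transform above
z = Z[:6, :6].ravel()             # 36th-order features: keep k1 <= 5 and k2 <= 5
print(z.shape)                    # (36,)
```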
Interpretation of the low-order DCT coefficients: the coefficient at $(k_1,k_2)=(0,0)$ measures the average intensity of all pixels in the image. The coefficients at $(1,0)$ and $(0,1)$ measure the brightness gradient from top to bottom, or from left to right, respectively. The coefficients at $k_d = 2$ (a full cosine period) measure the difference in pixel intensity between the center vs. the edges of the image.
This image shows the four nearest neighbors of "Image 0" (Arnold Schwarzenegger) and "Image 47" (Jiang Zemin), calculated using a 9th-order DCT. Neighbors of "Image 0" are dark on the right-hand side, and in the lower-left corner. Neighbors of "Image 47" are darker overall. Neither of these features captures person identity very well…
With a 36th-order DCT (up to $k_1=5$, $k_2=5$), we can get a bit more detail about the image. The 36th-order DCT is, at least, capturing the face orientation: most of the nearest neighbors are at least looking in the same direction. Jiang Zemin seems to be correctly identified (2 of the 4 neighbors are the same person), but Arnold Schwarzenegger isn't (each of the 4 "similar" images shows a different person!)
PCA is like DCT in some ways. In this example, $\vec{w}_0$ might be measuring average brightness, $\vec{w}_1$ the left-to-right gradient, and $\vec{w}_2$ the center-vs-edges difference. But PCA can also learn whatever is important to represent the sample covariance of the given data: for example, particular principal components respond to eyeglasses, short vs. long noses, and narrow vs. wide faces.
For these two test images, 9th-order PCA has managed to identify both people. Two of the four neighbors of "Image 0" are Arnold Schwarzenegger. Three of the four neighbors of "Image 47" are Jiang Zemin.
It is not always true that PCA outperforms DCT. Especially for higher-dimensional feature vectors, PCA might just learn random variation in the training dataset, which might not be useful for identifying person identity.

PCA vs. DCT, in summary: PCA concentrates more of the data's variance into the first few dimensions, improving nearest-neighbor calculations, allowing it to capture more about person identity. PCA is matched to the data, because it models the particular problem under study (human faces) rather than a theoretical model of all natural images. But PCA can also learn random quirks of the training data, which are usually not useful for person identification. DCT is a bit more robust (maybe because it's like using M → ∞ training images).
K-nearest-neighbors (KNN) classification: given a test token, find the K training tokens whose feature vectors are closest. Use the most common person ID among those K neighbors as the hypothesis for the test token.
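A minimal numpy sketch of this classifier (function and variable names are hypothetical, and Euclidean distance is assumed):

```python
import numpy as np

def knn_classify(train_feats, train_ids, test_feat, K=4):
    """Hypothesize a person ID for one test token by majority vote
    of its K nearest training tokens in feature space."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    neighbors = np.argsort(dists)[:K]              # indices of the K closest tokens
    ids, counts = np.unique(train_ids[neighbors], return_counts=True)
    return ids[np.argmax(counts)]                  # most common neighbor ID
```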
The confusion matrix: rows are indexed by the reference (the true person ID, $s = 0, 1, 2, 3$), columns by the hypothesis (the classifier's output, $h = 0, 1, 2, 3$). Entry $C(h|s)$ counts the number of times that reference person $s$ was classified as person $h$. For example, $C(0|0)$ is the number of times that person 0 was classified correctly, and $C(1|0)$ is the number of times that person 0 was classified as person 1.
Accuracy:
$$A = \frac{\sum_{s=0}^{3} C(s|s)}{\sum_{s=0}^{3}\sum_{h=0}^{3} C(h|s)} = \frac{\#\,\text{correct}}{\#\,\text{data}}$$
Recall:
$$R = \frac{1}{4}\sum_{s=0}^{3} \frac{C(s|s)}{\sum_{h=0}^{3} C(h|s)} = \frac{1}{4}\sum_{s=0}^{3} \frac{\#\,\text{times } s \text{ correctly recognized}}{\#\,\text{times } s \text{ presented}}$$
Precision:
$$P = \frac{1}{4}\sum_{h=0}^{3} \frac{C(h|h)}{\sum_{s=0}^{3} C(h|s)} = \frac{1}{4}\sum_{h=0}^{3} \frac{\#\,\text{times } h \text{ correctly recognized}}{\#\,\text{times } h \text{ guessed}}$$
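All three metrics can be computed directly from the confusion matrix; a minimal numpy sketch, with a hypothetical 4-person confusion matrix as the example:

```python
import numpy as np

def metrics(C):
    """Accuracy, recall, precision from confusion matrix C, where
    C[s, h] = # times reference person s was classified as person h."""
    accuracy = np.trace(C) / C.sum()                  # correct / all data
    recall = np.mean(np.diag(C) / C.sum(axis=1))      # average over reference s
    precision = np.mean(np.diag(C) / C.sum(axis=0))   # average over hypothesis h
    return accuracy, recall, precision

C = np.array([[8, 1, 0, 1],      # hypothetical counts, rows = reference
              [0, 9, 1, 0],
              [2, 0, 7, 1],
              [0, 0, 1, 9]])
print(metrics(C))
```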
If the dimensions of $\vec{y}$ are jointly Gaussian, then we can write their joint probability density function (pdf) as
$$f(\vec{y}) = \mathcal{N}(\vec{y};\, \vec\mu, \Sigma) = \frac{1}{|2\pi\Sigma|^{1/2}}\, e^{-\frac12 (\vec{y} - \vec\mu)^T \Sigma^{-1} (\vec{y} - \vec\mu)}$$
The exponent is sometimes called the Mahalanobis distance (with weight matrix $\Sigma$) between $\vec{y}$ and $\vec\mu$ (named after Prasanta Chandra Mahalanobis, 1893-1972):
$$d_\Sigma^2(\vec{y}, \vec\mu) = (\vec{y} - \vec\mu)^T \Sigma^{-1} (\vec{y} - \vec\mu)$$
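A minimal numpy sketch of the squared Mahalanobis distance, using the example mean and covariance that appear later in the lecture:

```python
import numpy as np

def mahalanobis_sq(y, mu, Sigma):
    """Squared Mahalanobis distance: (y - mu)^T Sigma^{-1} (y - mu)."""
    diff = y - mu
    return diff @ np.linalg.solve(Sigma, diff)   # solve() avoids an explicit inverse

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0], [1.0, 4.0]])
print(mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma))   # 1.0: on the d^2 = 1 contour
```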
The contour lines of a Gaussian pdf are the lines of constant Mahalanobis distance between $\vec{y}$ and $\vec\mu$. For example:
$$\frac{f(\vec{y})}{f(\vec\mu)} = e^{-\frac12 d_\Sigma^2(\vec{y},\,\vec\mu)}$$
So $\frac{f(\vec{y})}{f(\vec\mu)} = e^{-1/2}$, which happens when
$$1 = d_\Sigma^2(\vec{y}, \vec\mu) = (\vec{y} - \vec\mu)^T \Sigma^{-1} (\vec{y} - \vec\mu)$$
and $\frac{f(\vec{y})}{f(\vec\mu)} = e^{-2}$, which happens when
$$4 = d_\Sigma^2(\vec{y}, \vec\mu) = (\vec{y} - \vec\mu)^T \Sigma^{-1} (\vec{y} - \vec\mu)$$
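A sketch of how these two contour lines could be drawn with matplotlib, using the example Gaussian from later in the lecture (the grid ranges are my choice):

```python
import numpy as np
import matplotlib.pyplot as plt

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0], [1.0, 4.0]])
Sinv = np.linalg.inv(Sigma)

# Evaluate d^2(y, mu) on a grid, then draw the d^2 = 1 and d^2 = 4 contours
y1, y2 = np.meshgrid(np.linspace(-4, 6, 200), np.linspace(-8, 6, 200))
d1, d2 = y1 - mu[0], y2 - mu[1]
dsq = Sinv[0, 0] * d1**2 + 2 * Sinv[0, 1] * d1 * d2 + Sinv[1, 1] * d2**2
plt.contour(y1, y2, dsq, levels=[1.0, 4.0])
plt.gca().set_aspect('equal')
plt.xlabel('y1'); plt.ylabel('y2')
plt.show()
```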
The inverse of a positive definite matrix $\Sigma = W \Lambda W^T$ is $\Sigma^{-1} = W \Lambda^{-1} W^T$. Proof:
$$\Sigma\, \Sigma^{-1} = W \Lambda W^T W \Lambda^{-1} W^T = W \Lambda \Lambda^{-1} W^T = W W^T = I$$
So
$$d_\Sigma^2(\vec{y}, \vec\mu) = (\vec{y} - \vec\mu)^T \Sigma^{-1} (\vec{y} - \vec\mu) = (\vec{y} - \vec\mu)^T W \Lambda^{-1} W^T (\vec{y} - \vec\mu) = \vec{z}^T \Lambda^{-1} \vec{z}$$
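A quick numpy verification of this identity on the example covariance matrix (np.linalg.eigh returns the diagonal of Λ and the columns of W):

```python
import numpy as np

Sigma = np.array([[1.0, 1.0], [1.0, 4.0]])
lam, W = np.linalg.eigh(Sigma)             # Sigma = W diag(lam) W^T
Sinv = W @ np.diag(1.0 / lam) @ W.T        # inverse via the eigendecomposition
assert np.allclose(Sinv @ Sigma, np.eye(2))
```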
The formula $1 = \vec{z}^T \Lambda^{-1} \vec{z}$, where $\vec{z} = W^T(\vec{y} - \vec\mu)$, or equivalently
$$1 = \frac{z_0^2}{\lambda_0} + \cdots + \frac{z_{D-1}^2}{\lambda_{D-1}}, \qquad z_d = \vec{w}_d^T (\vec{y} - \vec\mu)$$
is the formula for an ellipsoid (in 2D, an ellipse). The principal axes of the ellipsoid are in the directions $\vec{w}_0, \vec{w}_1, \ldots$, and the variance in the $\vec{w}_d$ direction is $\lambda_d$.
Example: suppose that $y_1$ and $y_2$ are linearly correlated Gaussians with means 1 and -1, respectively, with variances 1 and 4, and covariance 1:
$$\vec\mu = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Remember the definitions of variance and covariance:
$$\sigma_1^2 = E\!\left[(y_1-\mu_1)^2\right] = 1, \quad \sigma_2^2 = E\!\left[(y_2-\mu_2)^2\right] = 4, \quad \rho_{12} = \rho_{21} = E\!\left[(y_1-\mu_1)(y_2-\mu_2)\right] = 1$$
$$\Sigma = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}$$
We have
$$\Sigma = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}$$
We get the eigenvalues from the determinant equation:
$$|\Sigma - \lambda I| = (1-\lambda)(4-\lambda) - 1 = \lambda^2 - 5\lambda + 3$$
which equals zero for $\lambda = \frac{5 \pm \sqrt{13}}{2}$. We get the eigenvectors by solving $\lambda \vec{w} = \Sigma \vec{w}$, which gives
$$\vec{w}_1 \propto \begin{bmatrix} 1 \\ \frac{3+\sqrt{13}}{2} \end{bmatrix}, \qquad \vec{w}_2 \propto \begin{bmatrix} 1 \\ \frac{3-\sqrt{13}}{2} \end{bmatrix}$$
So the principal axes of the ellipse are in the directions $\vec{w}_1$ and $\vec{w}_2$.
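The same eigendecomposition, checked in numpy (np.linalg.eigh returns eigenvalues in ascending order, so index 1 corresponds to the slide's $\vec{w}_1$):

```python
import numpy as np

Sigma = np.array([[1.0, 1.0], [1.0, 4.0]])
lam, W = np.linalg.eigh(Sigma)                      # ascending: lam[0] < lam[1]
print(lam)                                          # [(5 - sqrt(13))/2, (5 + sqrt(13))/2]
print((5 - np.sqrt(13)) / 2, (5 + np.sqrt(13)) / 2)
print(W[:, 1] / W[0, 1])                            # ~ [1, (3 + sqrt(13))/2]
```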
In fact, it's useful to talk about $\Sigma$ in this way. The first component, $z_1 = \vec{w}_1^T(\vec{y} - \vec\mu)$, is the part of $\vec{y} - \vec\mu$ that's in the $\vec{w}_1$ direction; it has a variance of $\lambda_1$. The second component, $z_2 = \vec{w}_2^T(\vec{y} - \vec\mu)$, is the part of $\vec{y} - \vec\mu$ that's in the $\vec{w}_2$ direction; it has a variance of $\lambda_2$. $z_1$ and $z_2$ are uncorrelated with each other. If $\vec{y}$ is Gaussian, then $z_1$ and $z_2$ are independent Gaussian random variables.
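A quick simulation to confirm the claim: samples of $\vec{z} = W^T(\vec{y} - \vec\mu)$ have (approximately) diagonal covariance $\Lambda$ (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0], [1.0, 4.0]])
lam, W = np.linalg.eigh(Sigma)

y = rng.multivariate_normal(mu, Sigma, size=100000)   # Gaussian samples
z = (y - mu) @ W                                      # z = W^T (y - mu), per sample
print(np.cov(z.T))    # ~ diag(lam): z1 and z2 are uncorrelated, variances lam
```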