Machine Learning for Signal Processing
Representing Signals: Images and Sounds
Class 4. 10 Sep 2013 Instructor: Bhiksha Raj
10 Sep 2013 11-755/18-797 1
To learning much inclined, Who went to see the elephant, (Though all of them were blind), That each by observation Might satisfy his mind.
And happening to fall Against his broad and sturdy side, At once began to bawl: "God bless me! But the elephant Is very like a wall!"
Cried: "Ho! What have we here, So very round and smooth and sharp? To me 'tis very clear, This wonder of an elephant Is very like a spear!"
And happening to take The squirming trunk within his hands, Thus boldly up and spake: "I see," quoth he, "the elephant Is very like a snake!"
And felt about the knee. "What most this wondrous beast is like Is mighty plain," quoth he; "'Tis clear enough the elephant Is very like a tree."
Said: "E'en the blindest man Can tell what this resembles most: Deny the fact who can, This marvel of an elephant Is very like a fan."
About the beast to grope, Than seizing on the swinging tail That fell within his scope, "I see," quoth he, "the elephant Is very like a rope."
Disputed loud and long, Each in his own opinion Exceeding stiff and strong. Though each was partly right, All were in the wrong.
How do you describe them?
– Which leads to “natural sounds are blobs”
look like blobs”
– Which won't get us anywhere
– E.g. a pixel-wise description of the two images here will be completely different
– Or rather large regions of relatively featureless shading – Uniform sequences of numbers
– Dumb approximation – an image is a block of uniform shade
– Represent the images as vectors and compute the projection of the image on the “basis”
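\[ \text{Image} = \begin{bmatrix} \text{pixel}_1 \\ \text{pixel}_2 \\ \vdots \\ \text{pixel}_N \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \]
\[ \text{Image} \approx BW, \qquad W = \mathrm{pinv}(B)\,\text{Image} \]
\[ \text{PROJECTION} = BW = B\,(B^T B)^{-1} B^T\,\text{Image} \]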
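The projection onto a single uniform-shade basis can be sketched in a few lines. This is a minimal illustration, not the course's own code: the 4x4 `image` is a made-up example, and NumPy's `np.linalg.pinv` stands in for pinv().

```python
import numpy as np

# Hypothetical 4x4 "image", flattened into a column vector of N = 16 pixels.
image = np.arange(16, dtype=float).reshape(4, 4)
x = image.reshape(-1, 1)

# Single basis: a block of uniform shade (all ones).
B = np.ones((x.shape[0], 1))

W = np.linalg.pinv(B) @ x        # weight of the basis
projection = B @ W               # best uniform-shade approximation of the image

# The same projection written out as B (B^T B)^-1 B^T Image.
explicit = B @ np.linalg.inv(B.T @ B) @ B.T @ x
```

For the all-ones basis the weight comes out as the mean pixel value, which is why this is only a "dumb approximation".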
– Dramatic changes – Add a second picture that has very fast changes
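\[ B = [\,B_1 \;\; B_2\,], \qquad W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad \text{Image} \approx w_1 B_1 + w_2 B_2 = BW \]
\[ W = \mathrm{pinv}(B)\,\text{Image}, \qquad \text{PROJECTION} = BW = B\,(B^T B)^{-1} B^T\,\text{Image} \]
[Figure: bases B1 and B2]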
\[ \text{Image} \approx w_1 B_1 + w_2 B_2 + w_3 B_3 + \cdots, \qquad W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \end{bmatrix}, \qquad B = [\,B_1 \;\; B_2 \;\; B_3 \;\cdots\,] \]
[Figure: bases B1 to B6. Getting closer at 625 bases!]
– Checker boards are the same regardless of the picture you’re trying to describe
describe trees.
– Not perfectly though
\[ \text{Signal} \approx w_1 B_1 + w_2 B_2 + w_3 B_3 + \cdots, \qquad W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \end{bmatrix}, \qquad B = [\,B_1 \;\; B_2 \;\; B_3 \;\cdots\,] \]
[Figure: bases B1, B2, B3]
– E.g. checkerboards – We will call these “bases”
– X = w1 B1 + w2 B2 + w3 B3 + …
– I.e. the error between X and $\sum_i w_i B_i$ is minimized
– The error is generally chosen to be $\| X - \sum_i w_i B_i \|^2$
– Since the bases are known beforehand – Knowing the weights is sufficient to reconstruct the data
\[ \Big\| X - \sum_i w_i B_i \Big\|^2, \qquad B = [\,B_1 \;\; B_2 \;\; B_3\,], \qquad W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \]
– The two are orthogonal to one another!
– Joint decomposition with multiple bases gives the same result as separate decomposition with each – This never holds true if one basis can explain another
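The claim about joint versus separate decomposition can be checked numerically. A minimal sketch, assuming three made-up bases: `b1` (uniform), `b2` (alternating, orthogonal to `b1`) and `b3` (deliberately not orthogonal to `b1`). The helper names `joint` and `separate` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, for illustration only
x = rng.standard_normal(8)            # hypothetical data vector

b1 = np.ones(8)                                     # uniform basis
b2 = np.array([1, -1, 1, -1, 1, -1, 1, -1.0])       # orthogonal to b1
b3 = np.array([1, 1, 1, 1, 1, 1, 1, -1.0])          # NOT orthogonal to b1

def joint(bases, x):
    # Weights from one decomposition using all bases at once.
    B = np.column_stack(bases)
    return np.linalg.pinv(B) @ x

def separate(bases, x):
    # Weights from decomposing with each basis on its own.
    return np.array([np.linalg.pinv(b[:, None]) @ x for b in bases]).ravel()

same = np.allclose(joint([b1, b2], x), separate([b1, b2], x))
differ = not np.allclose(joint([b1, b3], x), separate([b1, b3], x))
```

With the orthogonal pair the two sets of weights agree; with the overlapping pair they do not, because part of `b3` is already explained by `b1`.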
\[ B = [\,B_1 \;\; B_2\,], \qquad \text{Image} \approx w_1 B_1 + w_2 B_2, \qquad W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \]
\[ \text{Error} = \Big\| \text{Image} - \sum_i w_i B_i \Big\|^2 \]
[Figure: bases B1 and B2]
– Unfortunately, they cannot represent sharp corners
– DC
– The entire length of the signal is
– The entire length of the signal is two periods.
– F(n) = sin(2πkn/N)
samples
– X) periods
– With sign inversion
– Red curve = sine with 9 cycles (in a 20 point sequence)
– Green curve = sine with 11 cycles in 20 points
– The blue lines show the actual samples obtained
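The red and green curves can be reproduced directly. This sketch only evaluates the two sines at the 20 sample points and compares them; the variable names `red` and `green` follow the figure colors.

```python
import numpy as np

L = 20
n = np.arange(L)

red = np.sin(2 * np.pi * 9 * n / L)      # 9 cycles in a 20 point sequence
green = np.sin(2 * np.pi * 11 * n / L)   # 11 cycles in 20 points

# At the sample points the two are identical up to a sign inversion.
is_sign_inverted_copy = np.allclose(green, -red)
```

The identity follows from sin(2π·11·n/20) = sin(2πn − 2π·9·n/20) = −sin(2π·9·n/20) for integer n.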
– Pinv() will do the trick as usual
\[ \text{Signal} \approx w_1 B_1 + w_2 B_2 + w_3 B_3, \qquad W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \qquad B = [\,B_1 \;\; B_2 \;\; B_3\,] \]
\[ W = \mathrm{pinv}(B)\,\text{Signal} = (B^T B)^{-1} B^T\,\text{Signal} \]
[Figure: bases B1, B2, B3]
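\[ \text{Signal} = w_1 B_1 + w_2 B_2 + w_3 B_3 + \cdots, \qquad \text{Signal} = \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} \]
\[ \begin{bmatrix}
\sin(2\pi \cdot 0 \cdot 0 / L) & \sin(2\pi \cdot 1 \cdot 0 / L) & \cdots & \sin(2\pi \cdot (L/2) \cdot 0 / L) \\
\sin(2\pi \cdot 0 \cdot 1 / L) & \sin(2\pi \cdot 1 \cdot 1 / L) & \cdots & \sin(2\pi \cdot (L/2) \cdot 1 / L) \\
\vdots & \vdots & & \vdots \\
\sin(2\pi \cdot 0 \cdot (L-1) / L) & \sin(2\pi \cdot 1 \cdot (L-1) / L) & \cdots & \sin(2\pi \cdot (L/2) \cdot (L-1) / L)
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{L/2} \end{bmatrix}
=
\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} \]
L/2 columns only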
– The amplitude is the weight of the sinusoid
– Can never represent a signal that is non-zero in the first sample!
– If the first sample is non-zero, the signal cannot be represented!
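The limitation is easy to see numerically: every sine basis is 0 at n = 0, so any weighted sum of them is 0 there too. A minimal sketch, assuming a 16-point window and a cosine test signal chosen only because its first sample is 1.

```python
import numpy as np

L = 16
n = np.arange(L)

# Sine bases sin(2*pi*k*n/L). Only k = 1 .. L/2 - 1 are used here,
# since k = 0 and k = L/2 give all-zero columns at integer n.
B = np.column_stack([np.sin(2 * np.pi * k * n / L) for k in range(1, L // 2)])

signal = np.cos(2 * np.pi * 2 * n / L)   # hypothetical signal with s[0] = 1

W = np.linalg.pinv(B) @ signal           # best weights in the least-squares sense
approx = B @ W

first_sample_of_approx = approx[0]       # always 0, whatever the weights
residual = np.linalg.norm(signal - approx)
```

Here the least-squares fit cannot move the first sample away from zero, and the residual stays large.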
Sines are shifted: do not start with value = 0
– Find the combination of amplitude and phase that results in the lowest squared error
– The sinusoids are still orthogonal to one another
– The “basis matrix” depends on the unknown phase
– We can only (pseudo) invert a known matrix
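\[ \begin{bmatrix}
\sin(2\pi \cdot 0 \cdot 0 / L + \phi_1) & \sin(2\pi \cdot 1 \cdot 0 / L + \phi_2) & \cdots & \sin(2\pi \cdot (L/2) \cdot 0 / L + \phi_{L/2}) \\
\sin(2\pi \cdot 0 \cdot 1 / L + \phi_1) & \sin(2\pi \cdot 1 \cdot 1 / L + \phi_2) & \cdots & \sin(2\pi \cdot (L/2) \cdot 1 / L + \phi_{L/2}) \\
\vdots & \vdots & & \vdots \\
\sin(2\pi \cdot 0 \cdot (L-1) / L + \phi_1) & \sin(2\pi \cdot 1 \cdot (L-1) / L + \phi_2) & \cdots & \sin(2\pi \cdot (L/2) \cdot (L-1) / L + \phi_{L/2})
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_{L/2} \end{bmatrix}
=
\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} \]
L/2 columns only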
– The sine is the imaginary part
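\[ b[n] = \sin(\mathrm{freq} \cdot n) \]
\[ b[n] = \exp(j \cdot \mathrm{freq} \cdot n) = \cos(\mathrm{freq} \cdot n) + j \sin(\mathrm{freq} \cdot n), \qquad j = \sqrt{-1} \]
\[ \exp(j \cdot \mathrm{freq} \cdot n + \phi) = \exp(j \cdot \mathrm{freq} \cdot n)\,\exp(\phi) = \exp(\phi)\,\big(\cos(\mathrm{freq} \cdot n) + j \sin(\mathrm{freq} \cdot n)\big) \]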
– They are orthogonal
– Can even model complex data!
– exp(j x ) + exp(-j x) is real
– is real
\[ \exp\!\big(j 2\pi (L/2 - x)\, n / L\big) \qquad \text{and} \qquad \exp\!\big(j 2\pi (L/2 + x)\, n / L\big) \]
– The complex exponentials with frequencies equally spaced from L/2 are complex conjugates
– Is also real – If the two exponentials are multiplied by numbers that are conjugates of one another the result is real
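Both statements can be verified with a short numerical check. A minimal sketch, assuming L = 16, an offset x = 3 and an arbitrary complex weight `w`; none of these values come from the slides.

```python
import numpy as np

L = 16
n = np.arange(L)
x = 3                                            # offset from L/2 (arbitrary)

lo = np.exp(2j * np.pi * (L / 2 - x) * n / L)    # frequency L/2 - x
hi = np.exp(2j * np.pi * (L / 2 + x) * n / L)    # frequency L/2 + x

w = 0.7 - 1.2j                                   # arbitrary complex weight
combo = w * lo + np.conj(w) * hi                 # conjugate weights on conjugate bases

are_conjugates = np.allclose(hi, np.conj(lo))
sum_is_real = np.allclose((lo + hi).imag, 0.0)
combo_is_real = np.allclose(combo.imag, 0.0)
```

The pair is conjugate because exp(jπn) = (−1)^n is real, so only the ±x part of the frequency contributes an imaginary component.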
be complex conjugates, to make the result real
– Because we are dealing with real data
automatically; there is no need to impose the constraint externally
[Figure: bases b_0, b_1, ..., b_{L/2} and their weights]
Complex conjugates:
\[ w_{L/2+k} = \mathrm{conjugate}(w_{L/2-k}) \]
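\[ \begin{bmatrix}
\exp(j2\pi \cdot 0 \cdot 0 / L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot 0 / L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot 0 / L) \\
\exp(j2\pi \cdot 0 \cdot 1 / L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot 1 / L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot 1 / L) \\
\vdots & & \vdots & & \vdots \\
\exp(j2\pi \cdot 0 \cdot (L-1) / L) & \cdots & \exp(j2\pi \cdot (L/2) \cdot (L-1) / L) & \cdots & \exp(j2\pi \cdot (L-1) \cdot (L-1) / L)
\end{bmatrix}
\begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix}
=
\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} \]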
– X X^T = X^T X = I
– The inverse of X is its own transpose
– XH = Complex conjugate of XT
– XXH = XH X = I – The inverse of a complex orthonormal matrix is its own Hermitian
\[ \mathbf{W} = \big[\, W_L^{k,n} \,\big]_{k,n = 0}^{L-1}, \qquad W_L^{k,n} = \frac{1}{\sqrt{L}} \exp(j 2\pi k n / L) \]
\[ \mathbf{W}^H = \big[\, W_L^{-k,n} \,\big]_{k,n = 0}^{L-1}, \qquad W_L^{-k,n} = \frac{1}{\sqrt{L}} \exp(-j 2\pi k n / L) \]
The complex exponential basis is orthogonal
Its inverse is its own Hermitian: W^{-1} = W^H
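The orthogonality of the complex exponential basis can be confirmed by building the matrix explicitly. A minimal sketch for a 32-point signal, assuming the 1/sqrt(L) scaling; the comparison against `np.fft.fft(..., norm="ortho")` relies on NumPy's convention of using exp(−j2πkn/L) in the forward transform.

```python
import numpy as np

L = 32
n = np.arange(L)

# W[n, k] = exp(j 2 pi k n / L) / sqrt(L): each column is one complex exponential basis.
W = np.exp(2j * np.pi * np.outer(n, n) / L) / np.sqrt(L)
WH = W.conj().T                      # Hermitian (conjugate transpose)

rng = np.random.default_rng(0)       # arbitrary seed, illustration only
s = rng.standard_normal(L)           # hypothetical real signal

S = WH @ s                           # analysis:  S = W^H s
s_back = W @ S                       # synthesis: s = W S
```

No pseudo-inverse is needed here: multiplying by the Hermitian is enough to get the weights.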
\[ \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} = \mathbf{W} \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix}, \qquad \mathbf{W} = \big[\, W_L^{k,n} \,\big] \]
\[ \begin{bmatrix} S_0 \\ \vdots \\ S_{L/2} \\ \vdots \\ S_{L-1} \end{bmatrix} = \mathbf{W}^H \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix}, \qquad \mathbf{W}^H = \big[\, W_L^{-k,n} \,\big] \]
– For a 32-point signal
DISCRETE FOURIER TRANSFORM (DFT)
the symmetry of the matrix to perform the matrix multiplication really fast
– Is much faster if the length of the signal can be expressed as 2^N
– Have both a magnitude and a phase
– DFT(A + B) = DFT(A) + DFT(B)
– The DFT of the sum of two signals is the sum of their DFTs
– Magnitude(DFT(A+B)) = Magnitude(DFT(A)) + Magnitude(DFT(B)) – Utterly wrong – Absurdly useful
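Both properties can be checked on random data. A minimal sketch using `np.fft.fft`; the signals `A` and `B` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, illustration only
A = rng.standard_normal(64)
B = rng.standard_normal(64)

dft_of_sum = np.fft.fft(A + B)
sum_of_dfts = np.fft.fft(A) + np.fft.fft(B)

mag_of_sum = np.abs(dft_of_sum)
sum_of_mags = np.abs(np.fft.fft(A)) + np.abs(np.fft.fft(B))

linear = np.allclose(dft_of_sum, sum_of_dfts)         # exact property
mags_add = np.allclose(mag_of_sum, sum_of_mags)       # not true in general
```

The magnitudes only obey the triangle inequality, |DFT(A+B)| ≤ |DFT(A)| + |DFT(B)|, which is why treating them as additive is an approximation.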
– A(L/2-k) * exp(-j *f*(L/2-k)) + A(L/2+k) * exp(-j*f*(L/2+k)) is always real if
A(L/2-k) = conjugate(A(L/2+k))
– We can pair up samples around the center all the way; the final summation term is always real
– If the signal is real, the FT is (conjugate) symmetric
– If the signal is (conjugate) symmetric, the FT is real
– If the signal is real and symmetric, the FT is real and symmetric
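These symmetry properties can be checked with `np.fft.fft`. A minimal sketch, assuming "symmetric" means s[n] = s[L−n] (indices taken modulo L), which mirrors the sequence around L/2.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, illustration only
L = 16
idx = (-np.arange(L)) % L            # maps n to L - n (and 0 to 0)

s = rng.standard_normal(L)           # real signal
S = np.fft.fft(s)
conj_symmetric = np.allclose(S, np.conj(S[idx]))      # S[k] = conj(S[L-k])

sym = s + s[idx]                     # real AND symmetric signal
S_sym = np.fft.fft(sym)
ft_is_real = np.allclose(S_sym.imag, 0.0, atol=1e-9)
ft_is_symmetric = np.allclose(S_sym.real, S_sym.real[idx])
```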
Contributions from points equidistant from L/2 combine to cancel out imaginary terms
– Images would be symmetric in two dimensions
– Since the FT is symmetric, sufficient to store only half the coefficients (quarter for an image)
– Enough to compute an L-sized cosine transform – Taking advantage of the symmetry of the problem
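\[ \begin{bmatrix}
\cos(2\pi (0.5) \cdot 0 / 2L) & \cos(2\pi (1 + 0.5) \cdot 0 / 2L) & \cdots & \cos(2\pi (L - 0.5) \cdot 0 / 2L) \\
\cos(2\pi (0.5) \cdot 1 / 2L) & \cos(2\pi (1 + 0.5) \cdot 1 / 2L) & \cdots & \cos(2\pi (L - 0.5) \cdot 1 / 2L) \\
\vdots & \vdots & & \vdots \\
\cos(2\pi (0.5) \cdot (L-1) / 2L) & \cos(2\pi (1 + 0.5) \cdot (L-1) / 2L) & \cdots & \cos(2\pi (L - 0.5) \cdot (L-1) / 2L)
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_{L-1} \end{bmatrix}
=
\begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} \]
L columns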
– Degree of quantization = degree of compression
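The link between quantization and compression can be illustrated with a small cosine transform. This is a sketch under stated assumptions: `dct_matrix` builds the standard orthonormal DCT-II matrix, which may differ in indexing and scaling from the matrix on the slide, and the test signal and step sizes are made up.

```python
import numpy as np

def dct_matrix(L):
    """Orthonormal DCT-II matrix: row k, column n is c_k * cos(pi * (n + 0.5) * k / L)."""
    n = np.arange(L)
    k = n[:, None]
    C = np.cos(np.pi * (n + 0.5) * k / L) * np.sqrt(2.0 / L)
    C[0, :] /= np.sqrt(2.0)          # scale the DC row so that C @ C.T = I
    return C

L = 64
C = dct_matrix(L)
t = np.arange(L) / L
s = np.sin(2 * np.pi * 1.5 * t) + 0.5 * t     # hypothetical smooth signal

w = C @ s                                     # L real DCT coefficients

def reconstruct(step):
    # Quantize the coefficients to multiples of `step`, then invert with C^T.
    return C.T @ (np.round(w / step) * step)

err_fine = np.linalg.norm(s - reconstruct(0.01))
err_coarse = np.linalg.norm(s - reconstruct(1.0))
```

A coarser step leaves fewer distinct coefficient values to store, at the cost of a larger reconstruction error.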
[Figure: DCT: multiply by DCT matrix]
– It's just a formula
– But computing these terms before 0 or beyond L-1 tells us what the signal composed by the DFT looks like outside our narrow window
\[ s[n] = \sum_{k=0}^{L-1} S_k \exp(j 2\pi k n / L) \]
[Figure: s[n] and its DFT; axis ticks at 31 and 63]
properties of the periodic signal shown below
– Which extends from –infinity to +infinity – The period of this signal is 32 samples in this example
The DFT of one period of the sinusoid shown in the figure computes the spectrum of the entire sinusoid from –infinity to +infinity
The DFT of a real sinusoid has only one non zero frequency
The second peak in the figure also represents the same frequency as an effect of aliasing
The second peak in the figure is the "reflection" around L/2 (for real signals)
[Figure: magnitude spectrum]
The DFT of any sequence computes the spectrum for an infinite repetition of that sequence
The DFT of a partial segment of a sinusoid computes the spectrum of an infinite repetition of that segment, and not of the entire sinusoid
This will not give us the DFT of the sinusoid itself!
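The effect of cutting a sinusoid at a non-integer number of periods can be seen by counting the DFT bins that carry energy. A minimal sketch, assuming a 32-point window; `significant_bins` and its tolerance are illustrative helpers, not part of any library.

```python
import numpy as np

L = 32
n = np.arange(L)

whole = np.sin(2 * np.pi * 4 * n / L)       # exactly 4 periods in the window
partial = np.sin(2 * np.pi * 4.5 * n / L)   # 4.5 periods: repetition is discontinuous

def significant_bins(x, tol=1e-6):
    # Number of DFT bins whose magnitude is not negligible.
    return int(np.sum(np.abs(np.fft.fft(x)) > tol))

bins_whole = significant_bins(whole)        # one frequency plus its reflection
bins_partial = significant_bins(partial)    # energy spread over many bins
```

The whole-period case gives one frequency and its reflection around L/2; the partial segment spreads energy across the spectrum.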
[Figure: magnitude spectrum of the segment; magnitude spectrum of the complete sine wave]
The difference occurs for two reasons:
– The transform cannot know what the signal actually looks like
– The implicit repetition of the observed signal introduces large discontinuities at the points of repetition
This distorts even our measurement of what happens at the boundaries of what has been reliably observed
These are not part of the underlying signal
We only want to characterize the underlying signal
The discontinuity is an irrelevant detail
While we can never know what the signal looks like outside the window, we can try to minimize the discontinuities at the boundaries
We do this by multiplying the signal with a window function
– We call this procedure windowing
– We refer to the resulting signal as a "windowed" signal
Windowing attempts to do the following:
– Keep the windowed signal similar to the original in the central regions
– Reduce or eliminate the discontinuities in the implicit periodic signal
[Figure: magnitude spectrum of windowed signal; magnitude spectrum of complete sine wave; magnitude spectrum of original segment]
Cosine windows:
– Window length is M
– Index begins at 0
Hamming: w[n] = 0.54 – 0.46 cos(2πn/M)
Hanning: w[n] = 0.5 – 0.5 cos(2πn/M)
Blackman: w[n] = 0.42 – 0.5 cos(2πn/M) + 0.08 cos(4πn/M)
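The three cosine windows can be written out directly from the formulas. A minimal sketch, assuming M = 32 and n = 0 .. M−1. NumPy's built-in `np.hamming`, `np.hanning` and `np.blackman` divide by (length − 1) instead of M, so the formulas here match the first M samples of the NumPy window of length M + 1.

```python
import numpy as np

M = 32                   # window length (arbitrary choice)
n = np.arange(M)         # index begins at 0

hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / M)
hanning = 0.5 - 0.5 * np.cos(2 * np.pi * n / M)
blackman = (0.42 - 0.5 * np.cos(2 * np.pi * n / M)
            + 0.08 * np.cos(4 * np.pi * n / M))

# Windowing a hypothetical signal segment is an element-wise product.
segment = np.sin(2 * np.pi * 4.5 * n / M)
windowed = segment * hanning
```

All three taper toward the edges of the segment, which is what suppresses the discontinuity in the implicit periodic repetition.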
Geometric windows:
Rectangular (boxcar): Triangular (Bartlett): Trapezoid:
– Useful if the FFT (or any other algorithm we use) requires signals of a specified length
– E.g. Radix 2 FFTs require signals of length 2^n, i.e., some power of 2. We must zero pad the signal to increase its length to the appropriate number
whose Fourier spectrum is being computed by the DFT
between
– It does not contain any additional information over the original DFT – It also does not contain less information
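The statement that zero padding neither adds nor removes information can be checked numerically. A minimal sketch, assuming a random 128-point signal padded to 1024 points; `np.fft.fft(x, n=1024)` pads with trailing zeros.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, illustration only
x = rng.standard_normal(128)             # hypothetical 128-sample signal

X = np.fft.fft(x)                        # 128-point DFT
X_padded = np.fft.fft(x, n=1024)         # DFT after zero padding to 1024 points

# Every 8th point of the padded DFT is a point of the original DFT:
# the extra points only fill in values in between.
same_samples = np.allclose(X_padded[::8], X)

# Nothing is lost either: inverting the padded DFT returns the signal plus zeros.
recovered = np.fft.ifft(X_padded)
x_back = recovered[:128].real
```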
[Figures: magnitude spectrum; windowed signal; magnitude spectra]
[Figure, three panels]
– 128 samples from a speech signal sampled at 16000 Hz (axis: time)
– The first 65 points of a 128 point DFT. Plot shows log of the magnitude spectrum (axis: frequency, up to 8000 Hz)
– The first 513 points of a 1024 point DFT. Plot shows log of the magnitude spectrum (axis: frequency, up to 8000 Hz)
[Figure: FT and Inverse FT]
– Because the properties of audio signals change quickly
– They are "stationary" only very briefly
Each segment is typically 25-64 milliseconds wide
Audio signals typically do not change significantly within this short time interval
Segments shift every 10-16 milliseconds
Each segment is windowed and a DFT is computed from it
[Figure: windowing; complex spectrum vs. frequency (Hz)]
Compute Fourier Spectra of segments of audio and stack them side-by-side
The Fourier spectrum of each window can be inverted to get back the signal. Hence the spectrogram can be inverted to obtain a time-domain signal.
In this example each segment was 25 ms long and adjacent segments overlapped by 15 ms.
64ms wide.
– Adjacent segments overlap by 48 ms.
– 1024 points (16000 samples a second).
– 2048 point DFT
– 1024 points of zero padding.
– Only 1025 points of each DFT are shown
values
– Most of our analysis / operations are performed on the magnitude
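The settings listed above (64 ms segments, 48 ms overlap, 16000 samples a second, 2048 point DFT) translate into a short spectrogram computation. This is a sketch, not the course's implementation: the 1 kHz test tone and the choice of a Hanning window are assumptions made for illustration.

```python
import numpy as np

fs = 16000
frame_len = 1024          # 64 ms at 16000 samples a second
hop = 256                 # 16 ms shift, i.e. adjacent segments overlap by 48 ms
n_fft = 2048              # 1024 signal points plus 1024 points of zero padding

# Hypothetical test signal: 1 second of a 1 kHz tone.
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)

window = np.hanning(frame_len)        # window choice is an assumption
starts = range(0, len(signal) - frame_len + 1, hop)
frames = np.stack([signal[s:s + frame_len] * window for s in starts])

# rfft keeps the first n_fft/2 + 1 = 1025 points; the rest are reflections.
spectrogram = np.fft.rfft(frames, n=n_fft, axis=1).T   # frequency x time, complex
magnitude = np.abs(spectrogram)
peak_bin = int(np.argmax(magnitude[:, 0]))
```

Each column is the complex spectrum of one segment; `magnitude` is the part most analysis operates on. The tone at 1000 Hz should peak near bin 1000 · 2048 / 16000 = 128.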
– Where does that come from?
\[ \begin{bmatrix} s[0] \\ s[1] \\ \vdots \\ s[L-1] \end{bmatrix} = \mathbf{W} \begin{bmatrix} S_0 \\ \vdots \\ S_k \\ \vdots \\ S_{L-1} \end{bmatrix} \]
\[ S_k = |S_k| \cdot \exp\!\big(j \cdot \mathrm{phase}(S_k)\big) \]
– Sft = DFT(signal)
– Phase = phase(Sft)
– Smagnitude = magnitude(Sft)
– ProcessedSpectrum = Process(Smagnitude)
– NewSft = ProcessedSpectrum * exp(j*Phase)
– Recover signal from NewSft
– Compute the FT of a different signal of the same length – Use the phase from that signal
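The option of reusing the original phase can be sketched for a single frame. The `process` function here is a placeholder (it simply halves every magnitude) chosen so the result is easy to verify; any real processing step would replace it.

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed, illustration only
signal = rng.standard_normal(256)          # hypothetical signal frame

Sft = np.fft.fft(signal)
phase = np.angle(Sft)                      # phase of each DFT coefficient
Smagnitude = np.abs(Sft)                   # magnitude of each DFT coefficient

def process(mag):
    # Placeholder processing step (assumed): scale all magnitudes by 0.5.
    return 0.5 * mag

new_Sft = process(Smagnitude) * np.exp(1j * phase)   # reattach the original phase
recovered_complex = np.fft.ifft(new_Sft)
recovered = recovered_complex.real
```

Because the complete FT (including the reflected half) is kept and the processing preserves conjugate symmetry, the inverse transform is real up to rounding error.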
– Make sure to have the complete FT (including the reflected portion)
during analysis
– E.g. If a 48ms (768 sample) overlap was used during analysis, overlap adjacent segments by 768 samples
Actually a matrix of complex numbers 16ms (256 samples)
magnitude spectrogram
– By computing the log of each entry in the spectrogram matrix – After processing, the entry is exponentiated to get back the magnitude spectrum
a signal
by a dimensionality reducing matrix
– Usually a DCT matrix
[Figure: Log() x DCT (24x1025)]
– 8x8 – Each image becomes a matrix of DCT vectors
– Gabor wavelets
– Eigen faces, SIFT
[Figure: DCT; Npixels / 64 columns]
[Figure: analog signal, antialiasing filter, sampling, digital signal]