GCT535: Sound Technology for Multimedia
Pitch Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam
Outlines
  Introduction
  Definition of Pitch
  Information in Pitch
  Monophonic Pitch Detection Algorithms
  Time-Domain
*Inharmonicity in Piano Vibraphone
[From Klapuri’s slides]
[Figure: waveform (amplitude vs. time [ms]) and spectrum (magnitude [dB] vs. frequency [Hz])]
Auto-correlation function (ACF) of the frame x_t(n) = x(t+n):
\[ r_t(l) = \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l) \]
Singing Voice
(Sondhi 1967)
[Figure: waveform (time [sample]) and its auto-correlation (lag [sample])]
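A minimal sketch of ACF-based pitch detection, assuming the frame is a plain Python list of samples. It picks the lag with the largest ACF value inside an allowed range and converts that lag to a frequency. The function names `acf` and `acf_pitch`, the 80 to 1000 Hz search range, and the synthetic test tone are illustrative assumptions, not taken from the slides; only the standard library is used so the example stays self-contained.

```python
import math


def acf(x, max_lag):
    """r(l) = sum_{n=0}^{N-1-l} x[n] * x[n+l] for l = 0 .. max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def acf_pitch(x, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate F0 in Hz from the largest ACF value in the allowed lag range."""
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    r = acf(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag


# Synthetic test tone (assumption): 200 Hz fundamental plus a weaker 2nd harmonic.
sample_rate = 8000
x = [math.sin(2 * math.pi * 200 * n / sample_rate)
     + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
     for n in range(1000)]
f0_estimate = acf_pitch(x, sample_rate)
print(f0_estimate)
```

Restricting the lag range excludes the trivial maximum at lag 0 and bounds the cost of the search. The resolution is limited to integer lags; interpolation around the peak would be a natural refinement.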
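\[ r_{\text{biased},t}(l) = \frac{1}{N} \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l) \]
\[ r_{\text{unbiased},t}(l) = \frac{1}{N-l} \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l) \]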
[Figure: auto-correlation vs. lag [sample]]
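The difference between the biased and unbiased ACF can be checked numerically. The sketch below assumes the usual normalizations (division by N for the biased form, by N - l for the unbiased form); the helper names and the 40-sample-period test sine are hypothetical choices for the demo.

```python
import math


def acf_sum(x, lag):
    """Raw ACF sum over the N - lag overlapping sample pairs."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def acf_biased(x, lag):
    """Divide by N: values shrink toward zero as the lag grows."""
    return acf_sum(x, lag) / len(x)


def acf_unbiased(x, lag):
    """Divide by N - lag: compensates for the shrinking overlap."""
    return acf_sum(x, lag) / (len(x) - lag)


# Unit-amplitude sine with a 40-sample period (assumption for the demo).
x = [math.sin(2 * math.pi * n / 40) for n in range(400)]
for lag in (40, 120, 200):
    print(lag, round(acf_biased(x, lag), 3), round(acf_unbiased(x, lag), 3))
```

At lags that are multiples of the period, the biased value tapers off linearly with the lag while the unbiased value stays near the signal power of 0.5. The taper of the biased form favors shorter periods, which can help against octave errors; the unbiased form is noisier at large lags because fewer sample pairs are averaged.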
[Figure: pitch tracks from the spectrogram (tracking max values) and from the ACF (tracking max values)]
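\[ r_t(l) = \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l) = \mathrm{IDFT}\left( |X_t(k)|^2 \right) = \mathrm{IDFT}\left( |\mathrm{DFT}(x_t(n))|^2 \right) \]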
\[ r_t(l) = \frac{1}{K} \sum_{k=0}^{K-1} |X_t(k)|^2 \cos\left( \frac{2\pi l k}{K} \right) \]
[Figure: power spectrogram and weight (magnitude vs. frequency [bin])]
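The ACF can also be read as a cosine-weighted sum of the power spectrum (the Wiener-Khinchin relation). The sketch below checks this numerically against the time-domain sum. It assumes the frame is zero-padded to twice its length so that circular lags do not wrap around, and it uses a naive O(K^2) DFT purely to stay self-contained; all function names are hypothetical.

```python
import cmath
import math


def acf_direct(x):
    """Time-domain ACF: r(l) = sum_{n=0}^{N-1-l} x[n] * x[n+l]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]


def power_spectrum(x):
    """|X(k)|^2 from a naive O(K^2) DFT (kept simple on purpose)."""
    k_len = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / k_len)
                    for n in range(k_len))) ** 2
            for k in range(k_len)]


def acf_from_power_spectrum(x):
    """ACF as a cosine-weighted sum of the power spectrum."""
    n = len(x)
    padded = list(x) + [0.0] * n  # zero-pad so circular lags do not wrap around
    k_len = len(padded)
    power = power_spectrum(padded)
    return [sum(power[k] * math.cos(2 * math.pi * k * lag / k_len)
                for k in range(k_len)) / k_len
            for lag in range(n)]


# Arbitrary short test signal (assumption for the demo).
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
max_error = max(abs(a - b)
                for a, b in zip(acf_direct(x), acf_from_power_spectrum(x)))
print(max_error)
```

Because the power spectrum of a real signal is real and even, the inverse DFT reduces to the cosine sum used here. In practice an FFT would replace the naive DFT.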
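Difference function:
\[ d_t(l) = \sum_{n=0}^{N-1-l} \left| x_t(n) - x_t(n+l) \right|^p \]
Cumulative mean normalized difference function:
\[ d'_t(l) = \frac{d_t(l)}{\frac{1}{l} \sum_{u=1}^{l} d_t(u)} \]
For p = 2 (energy terms taken over the summation range):
\[ d_t(l) = \sum_{n=0}^{N-1-l} \left( x_t(n) - x_t(n+l) \right)^2 = \sum_{n=0}^{N-1-l} \left( x_t(n)^2 - 2\, x_t(n)\, x_t(n+l) + x_t(n+l)^2 \right) = r_t(0) - 2 r_t(l) + r_{t+l}(0) \]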
Minimize the negative ACF plus a lag-dependent term (de Cheveigné & Kawahara, 2002)
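A simplified sketch of a pitch detector built on the squared difference function and its cumulative mean normalization, in the spirit of de Cheveigné & Kawahara (2002). It is not a full implementation of their method: it omits parabolic interpolation and the best-local-estimate step, and it sums over the shrinking range n = 0 .. N-1-l. The function names, the 80 Hz lower bound, and the 0.1 threshold are assumptions chosen for illustration.

```python
import math


def difference_function(x, max_lag):
    """d(l) = sum_{n=0}^{N-1-l} (x[n] - x[n+l])^2 for l = 0 .. max_lag."""
    n = len(x)
    return [sum((x[i] - x[i + lag]) ** 2 for i in range(n - lag))
            for lag in range(max_lag + 1)]


def cumulative_mean_normalized(d):
    """d'(l) = d(l) / ((1/l) * sum_{u=1}^{l} d(u)), with d'(0) = 1."""
    normalized = [1.0]
    running_sum = 0.0
    for lag in range(1, len(d)):
        running_sum += d[lag]
        normalized.append(d[lag] * lag / running_sum if running_sum > 0 else 1.0)
    return normalized


def yin_pitch(x, sample_rate, f_min=80.0, threshold=0.1):
    """Return F0 in Hz from the first dip of d'(l) below the threshold."""
    max_lag = int(sample_rate / f_min)
    d_norm = cumulative_mean_normalized(difference_function(x, max_lag))
    for lag in range(1, max_lag + 1):
        if d_norm[lag] < threshold:
            # Walk down to the local minimum of this dip.
            while lag < max_lag and d_norm[lag + 1] < d_norm[lag]:
                lag += 1
            return sample_rate / lag
    # No dip below the threshold: fall back to the global minimum.
    best_lag = min(range(1, max_lag + 1), key=lambda lag: d_norm[lag])
    return sample_rate / best_lag


# Synthetic test tone (assumption): 200 Hz fundamental plus a weaker 2nd harmonic.
sample_rate = 8000
x = [math.sin(2 * math.pi * 200 * n / sample_rate)
     + 0.5 * math.sin(2 * math.pi * 400 * n / sample_rate)
     for n in range(1000)]
f0_estimate = yin_pitch(x, sample_rate)
print(f0_estimate)
```

The normalization starts at 1 for small lags instead of 0, so no lower lag limit is needed to avoid the trivial minimum at lag 0. Taking the first dip below the threshold, rather than the global minimum, favors the shortest plausible period.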
[Figure: AMDF and normalized AMDF]
(Puckette et al. 1998)
[From Ellis’ e4896 course slides]
[Figure: magnitude spectrum (magnitude [dB] vs. frequency [Hz]) and cepstrum (vs. quefrency)]
Liftering
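A minimal sketch of cepstrum-based pitch detection, assuming the real cepstrum is defined as the inverse DFT of the log-magnitude spectrum. Searching only quefrencies above a lower bound acts as a simple high-time lifter that discards the slowly varying spectral envelope. The function names, the small floor added before the logarithm, the 120 to 400 Hz search range, and the synthetic decaying-pulse test signal are all assumptions for illustration; a naive DFT is used to avoid external dependencies.

```python
import cmath
import math


def log_magnitude_spectrum(x, floor=1e-6):
    """log(|X(k)| + floor) from a naive DFT; the floor avoids log(0)."""
    n = len(x)
    return [math.log(abs(sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                             for i in range(n))) + floor)
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log-magnitude spectrum.

    The log-magnitude of a real signal is real and even, so the inverse
    DFT reduces to a cosine sum.
    """
    log_mag = log_magnitude_spectrum(x)
    n = len(log_mag)
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                for k in range(n)) / n
            for q in range(n)]


def cepstrum_pitch(x, sample_rate, f_min, f_max):
    """Estimate F0 in Hz from the largest cepstral value in the quefrency range."""
    c = real_cepstrum(x)
    q_min = int(sample_rate / f_max)  # lower bound: skips the envelope region
    q_max = int(sample_rate / f_min)
    best_q = max(range(q_min, q_max + 1), key=lambda q: c[q])
    return sample_rate / best_q


# Test signal (assumption): a decaying pulse repeated every 40 samples.
sample_rate = 8000
x = [0.8 ** (n % 40) for n in range(240)]
f0_estimate = cepstrum_pitch(x, sample_rate, f_min=120.0, f_max=400.0)
print(f0_estimate)
```

The upper quefrency bound matters here: peaks also appear at multiples of the period, so a search range wide enough to include them can produce sub-octave errors.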
m=1 M
Cochlear Filter banks
[Diagram: input → cochlear filter bank (oval window; high-frequency to low-frequency channels) → hair cells (HC) → per-channel auto-correlation functions (ACF) → correlogram → stabilize & combine → summary ACF]
\[ g(t) = a\, t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \phi)\, u(t) \]
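An impulse response of the form a t^(n-1) e^(-2πbt) cos(2πft + φ) u(t) is the gammatone shape commonly used to model cochlear filter channels. The sketch below samples it directly. The function names are hypothetical, and the parameter values (order 4, 1000 Hz center frequency, 125 Hz bandwidth, 16 kHz sample rate) are assumptions picked only to make the example concrete.

```python
import math


def gammatone_envelope(t, bandwidth, order=4, amplitude=1.0):
    """Gamma-shaped envelope a * t^(n-1) * exp(-2*pi*b*t) for t >= 0."""
    return amplitude * t ** (order - 1) * math.exp(-2 * math.pi * bandwidth * t)


def gammatone_ir(center_freq, bandwidth, sample_rate, num_samples,
                 order=4, amplitude=1.0, phase=0.0):
    """Sampled impulse response: envelope times a cosine carrier at center_freq."""
    response = []
    for i in range(num_samples):
        t = i / sample_rate  # u(t) is implicit: only t >= 0 is generated
        carrier = math.cos(2 * math.pi * center_freq * t + phase)
        response.append(gammatone_envelope(t, bandwidth, order, amplitude) * carrier)
    return response


sample_rate = 16000
ir = gammatone_ir(center_freq=1000.0, bandwidth=125.0,
                  sample_rate=sample_rate, num_samples=400)
envelope = [gammatone_envelope(i / sample_rate, 125.0) for i in range(400)]
peak_time = envelope.index(max(envelope)) / sample_rate
print(peak_time, 3 / (2 * math.pi * 125.0))
```

The envelope peaks at t = (n-1) / (2πb), so a larger bandwidth b gives a shorter, earlier-peaking response. A filter bank would repeat this for a set of center frequencies, each followed by a hair-cell stage and a per-channel ACF.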