Improving melody extraction using Probabilistic Latent Component Analysis

SLIDE 1

Improving melody extraction using Probabilistic Latent Component Analysis

Jinyu Han (1), Ching-Wei Chen (2)

(1) Interactive Audio Lab, Northwestern University, USA

(2) Media Technology Lab, Gracenote, Inc.

May 19, 2011

Jinyu Han (Gracenote, Inc) Melody extraction by PLCA May 19, 2011 1 / 15

SLIDE 2

Agenda

1. Introduction

2. Modeling the Spectrogram
   Multinomial Model
   Probabilistic Latent Component Analysis

3. System Description

4. Experiment Results
   Illustration Example
   System Comparison

5. Conclusion

SLIDE 3

Introduction

Pick only the singing voice as the Melody

SLIDE 4

Introduction

System Overview

Figure: system block diagram. Block labels: Audio Signal; Non-vocal Segments; Accompaniment Model; Singing Voice Detection; Accompaniment Model Training; Accompaniment Reduction; Accompaniment-suppressed Signal; Pitch Estimation; Melody; Vocal Segments.

SLIDE 5

Modeling the Spectrogram Multinomial Model

Multinomial Distribution for Spectrogram

Figure: Probability distribution underlying the t-th spectrum

Figure labels: Magnitude Spectrogram; Spectrum of frame t; axes: Time, Frequency, Amplitude.

Treat the spectrum in each time slice as a histogram.

Treat the histogram as a probability distribution.
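The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `spectrogram_to_distributions`, the `eps` guard, and the toy numbers are assumptions made for the example.

```python
import numpy as np

def spectrogram_to_distributions(mag, eps=1e-12):
    """Normalize each time frame (column) of a magnitude spectrogram to sum to 1.

    Each column is read as a histogram over frequency bins, then scaled
    so that it can be treated as a probability distribution over frequency.
    """
    mag = np.asarray(mag, dtype=float)
    col_sums = mag.sum(axis=0, keepdims=True)
    # eps only guards against division by zero in silent frames.
    return mag / np.maximum(col_sums, eps)

# Toy magnitude spectrogram: 3 frequency bins x 2 frames (made-up values).
mag = np.array([[2.0, 0.0],
                [1.0, 3.0],
                [1.0, 1.0]])
P = spectrogram_to_distributions(mag)
print(P)
```

Each column of `P` sums to 1 and plays the role of the distribution underlying one frame.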

SLIDE 6

Modeling the Spectrogram Multinomial Model

Multinomial Distribution for Spectrogram

Figure labels: Magnitude Spectrogram; Spectrum of different frames; axes: Time, Frequency, Amplitude.

SLIDE 7

Modeling the Spectrogram Probabilistic Latent Component Analysis

Figure labels:

Spectrum of frame t (observed data in the spectrogram)

Latent components z1, z2, z3, z4, each with a mixture weight

P(f|z1), P(f|z2), P(f|z3), P(f|z4): a dictionary of spectral vectors

Estimated by the Expectation-Maximization algorithm

SLIDE 8

System Description

System Overview

Figure: system block diagram. Block labels: Audio Signal; Non-vocal Segments; Accompaniment Model; Singing Voice Detection; Accompaniment Model Training; Accompaniment Reduction; Accompaniment-suppressed Signal; Pitch Estimation; Melody; Vocal Segments.

SLIDE 9

System Description

Train Pnv(f |z) from the non-vocal segment

Figure labels: Identified Non-vocal Segment (axes: Frequency, Time); PLCA Training; Spectral Vectors for Accompaniment.

SLIDE 10

System Description

Extract singing voice in the mixture

Figure labels: Identified Vocal Segment (axes: Frequency, Time); PLCA Training with Pnv(f|z) fixed; newly trained Pv(f|z); Fixed Pnv(f|z).
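One way to read "PLCA training with Pnv(f|z) fixed" is sketched below, assuming the usual EM updates with the accompaniment columns of the dictionary frozen and only the vocal columns and the mixture weights re-estimated. The function name `plca_fixed_accompaniment`, the parameter `n_vocal`, and the random toy data are assumptions for illustration; the slides do not give component counts.

```python
import numpy as np

def plca_fixed_accompaniment(V, W_nv, n_vocal, n_iter=100, seed=0, eps=1e-12):
    """Learn vocal spectral vectors W_v ~ Pv(f|z) while W_nv ~ Pnv(f|z) stays fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_nv = W_nv.shape[1]
    W_v = rng.random((n_freq, n_vocal)) + eps
    W_v /= W_v.sum(axis=0, keepdims=True)
    H = rng.random((n_nv + n_vocal, n_frames)) + eps
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = np.hstack([W_nv, W_v])          # fixed columns first, vocal columns last
        R = V / np.maximum(W @ H, eps)
        W_v_new = W_v * (R @ H[n_nv:].T)    # update the vocal columns only
        H_new = H * (W.T @ R)               # all mixture weights are re-estimated
        W_v = W_v_new / np.maximum(W_v_new.sum(axis=0, keepdims=True), eps)
        H = H_new / np.maximum(H_new.sum(axis=0, keepdims=True), eps)
    return W_v, H[:n_nv], H[n_nv:]

# Made-up data: a "vocal segment" spectrogram and a pre-trained accompaniment dictionary.
rng = np.random.default_rng(1)
V = rng.random((6, 8))
W_nv = rng.random((6, 2))
W_nv /= W_nv.sum(axis=0, keepdims=True)
W_nv_before = W_nv.copy()
W_v, H_nv, H_v = plca_fixed_accompaniment(V, W_nv, n_vocal=3)
```

Because `W_nv` is never written to, whatever the mixture contains that the accompaniment dictionary cannot explain is pushed into the newly trained `W_v`.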

SLIDE 11

System Description

Extract singing voice in the mixture

Figure labels: newly trained Pv(f|z) (axes: Frequency, Time); Fixed Pnv(f|z) (axes: Frequency, Time).
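The slides do not say how the accompaniment-suppressed signal is rebuilt from the two dictionaries. A common choice, assumed here purely for illustration, is a soft mask that keeps the share of each time-frequency bin explained by the vocal components. The function name `suppress_accompaniment` and the toy matrices are hypothetical.

```python
import numpy as np

def suppress_accompaniment(V, W_nv, H_nv, W_v, H_v, eps=1e-12):
    """Scale each bin of V by the fraction of the model attributed to the vocal part."""
    vocal = W_v @ H_v        # vocal part of the model, per bin
    accomp = W_nv @ H_nv     # accompaniment part of the model, per bin
    mask = vocal / np.maximum(vocal + accomp, eps)
    return mask * V, mask

# Toy case: bin 0 is pure accompaniment, bin 1 is pure voice.
W_nv = np.array([[1.0], [0.0]])
W_v = np.array([[0.0], [1.0]])
H_nv = np.array([[0.5, 0.2]])
H_v = np.array([[0.5, 0.8]])
V = np.array([[4.0, 2.0],
              [4.0, 8.0]])
V_voice, mask = suppress_accompaniment(V, W_nv, H_nv, W_v, H_v)
print(V_voice)
```

In this toy case the mask is 0 in the accompaniment bin and 1 in the vocal bin, so only the second row of `V` survives.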

SLIDE 12

Experiment Results Illustration Example

!"#$%&#'()+,-+,-)./+&') 0$+-+,%1)+,-+,-)./+&') 2'1/(3)4+,')

5)678*)&1+9)/:);<+=91')2%,>)?3)43,3$()<@3,3$())

A+=') B$'CD',&3)

Mixture Extracted Voice Clean Voice

SLIDE 13

Experiment Results System Comparison

Compare our system to DHP [1] and LW [2]

System     Precision  Recall  F-measure  Accuracy
DHP        0.52       0.48    0.50       0.48
LW         0.09       0.086   0.09       0.19
Proposed   0.43       0.80    0.55       0.61

Parts of the MIREX 2005 dataset: 9 recordings, totalling about 270 seconds of audio.
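The slide does not define the F-measure. Assuming it is the usual harmonic mean of precision and recall, the column can be recomputed from the other two as sketched below. The helper name `f_measure` is an illustrative assumption. Because the slide reports rounded precision and recall, recomputed values may differ from the table in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (returns 0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported on the slide.
results = {
    "DHP": (0.52, 0.48),
    "LW": (0.09, 0.086),
    "Proposed": (0.43, 0.80),
}
for name, (p, r) in results.items():
    print(f"{name}: F = {f_measure(p, r):.3f}")
```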

[1] Z. Duan, J. Han, and B. Pardo, “Harmonically informed pitch tracking”, in Proc. ISMIR, 2009.

[2] Y. Li and D. Wang, “Separation of singing voice from music accompaniment for monaural recordings”, IEEE Trans. Audio, Speech, and Language

SLIDE 14

Conclusion

The Probabilistic Latent Variable Model is introduced to model the accompaniment and lead vocal adaptively.

Experimental results show that the melody of the singing voice in mixture audio is successfully extracted to some extent.

Future directions include improving the vocal/non-vocal segmentation module and the pitch estimation algorithm.

SLIDE 15

Conclusion

Acknowledgement

The first author performed this work with Ching-Wei Chen while at the Gracenote Media Technology Lab. We thank Markus Cremer, Bob Coover, Phillip Popp, Trista Chen, and Peter Dunker for enlightening discussions. The authors would like to thank the reviewers for their comments, which helped improve the paper. We also want to thank Bryan Pardo, David Little, Zhiyao Duan, Zafar Rafii, and Mark Cartwright for their suggestions, which improved the presentation.
