End-to-End Probabilistic Inference for Nonstationary Audio Analysis



  1. End-to-End Probabilistic Inference for Nonstationary Audio Analysis (or how to apply Spectral Mixture GPs to audio). William Wilkinson, Michael Riis Andersen, Josh Reiss, Dan Stowell, Arno Solin. June 12, 2019. Queen Mary University of London / Aalto University / Technical University of Denmark.

  2. Probabilistic time-frequency analysis. We previously showed that a spectral mixture Gaussian process is equivalent to a probabilistic filter bank, i.e. a filter bank that adapts to the signal and can make predictions / generate new data. [Figure: filter response (dB) against frequency (Hz) for a standard filter bank and for a probabilistic / adaptive filter bank.]
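
To make the filter-bank view concrete: if each kernel component has the form $\sigma_d^2 \exp(-|\tau|/\ell_d)\cos(\omega_d \tau)$, its spectral density is a pair of Lorentzian (Cauchy-shaped) peaks at $\pm\omega_d$. Each component therefore behaves like a band-pass filter centred at $\omega_d$, with a half-width of $1/\ell_d$ rad/s. The sketch below evaluates these responses in dB. It is a minimal illustration, not the authors' code. The following are assumptions: the Fourier convention $S(\omega) = \int k(\tau) e^{-i\omega\tau}\,d\tau$, the helper name `sm_component_psd`, and all hyperparameter values.

```python
import numpy as np

def sm_component_psd(omega, sigma2, ell, omega_d):
    """Spectral density of k(tau) = sigma2 * exp(-|tau| / ell) * cos(omega_d * tau).

    With S(omega) = int k(tau) exp(-i omega tau) dtau this is a pair of
    Lorentzian peaks of half-width 1/ell centred at +omega_d and -omega_d.
    """
    def lorentzian(w):
        return ell / (1.0 + (ell * w) ** 2)
    return sigma2 * (lorentzian(omega - omega_d) + lorentzian(omega + omega_d))

# Illustrative three-filter bank (all values are assumptions).
centres_hz = np.array([440.0, 880.0, 1320.0])   # centre frequencies in Hz
ells = np.array([0.01, 0.01, 0.005])            # lengthscales in seconds
sigma2s = np.array([1.0, 0.5, 0.25])            # component variances

freqs_hz = np.linspace(0.0, 2000.0, 2001)       # 1 Hz grid
omega = 2.0 * np.pi * freqs_hz                  # angular frequency in rad/s
responses_db = np.stack([
    10.0 * np.log10(sm_component_psd(omega, s2, ell, 2.0 * np.pi * fc))
    for s2, ell, fc in zip(sigma2s, ells, centres_hz)
])
# Each row is one "filter"; its peak sits at that component's centre frequency.
peak_hz = freqs_hz[np.argmax(responses_db, axis=1)]
print(peak_hz)
```

A shorter lengthscale gives a wider filter, which is the trade-off the hyperparameters $\ell_d$ control when the filter bank adapts to the signal.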

  3. Probabilistic time-frequency analysis: the spectral mixture GP model.

     [Prior] $f(t) \sim \mathcal{GP}\Big(0, \sum_{d=1}^{D} \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos\big(\omega_d (t - t')\big)\Big)$,
     [Likelihood] $y_k = f(t_k) + \sigma_y \varepsilon_k$.
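
The sketch below is a minimal NumPy version of this prior and likelihood. It assumes $\varepsilon_k \sim \mathcal{N}(0, 1)$ and uses illustrative hyperparameters: two components, at 440 Hz and 1 kHz, with 16 kHz sampling. It builds the kernel matrix, draws $f$ from the GP prior using a jittered Cholesky factor, and adds observation noise. The helper name `sm_kernel` is hypothetical.

```python
import numpy as np

def sm_kernel(t1, t2, sigma2s, ells, omegas):
    """Spectral mixture kernel with exponential (Matern-1/2) envelopes:
    k(t, t') = sum_d sigma2_d * exp(-|t - t'| / ell_d) * cos(omega_d * (t - t')).
    """
    tau = t1[:, None] - t2[None, :]
    K = np.zeros_like(tau)
    for s2, ell, om in zip(sigma2s, ells, omegas):
        K += s2 * np.exp(-np.abs(tau) / ell) * np.cos(om * tau)
    return K

rng = np.random.default_rng(0)
fs = 16000.0
t = np.arange(400) / fs                          # 25 ms of audio at 16 kHz
sigma2s = [1.0, 0.5]                             # illustrative hyperparameters
ells = [0.005, 0.01]
omegas = [2.0 * np.pi * 440.0, 2.0 * np.pi * 1000.0]
sigma_y = 0.1

K = sm_kernel(t, t, sigma2s, ells, omegas)
# Draw f ~ GP(0, K) via a jittered Cholesky factor, then add Gaussian noise.
chol = np.linalg.cholesky(K + 1e-8 * np.eye(len(t)))
f = chol @ rng.standard_normal(len(t))
y = f + sigma_y * rng.standard_normal(len(t))
```

This direct approach costs $O(T^3)$ in the number of samples, which is why the state-space formulation matters for audio-length signals.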

  4. End-to-End probabilistic time-frequency analysis. The next step in the signal processing chain is often to analyse the dependencies in the spectrogram, e.g. with non-negative matrix factorisation (NMF).
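
For reference, the conventional two-stage approach factorises a nonnegative spectrogram $V \approx WH$. The slides do not say which NMF variant is meant. The minimal sketch below assumes the standard Lee and Seung multiplicative updates for the squared Euclidean error. The toy spectrogram and the function name `nmf` are illustrative.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-10, seed=0):
    """Factorise a nonnegative matrix V (freq x time) as V ~ W @ H.

    Lee & Seung multiplicative updates for the squared Euclidean error;
    W holds spectral templates, H their activations over time.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy "spectrogram": two spectral templates switching on and off.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = np.zeros((2, 100))
H_true[0, :60] = 1.0
H_true[1, 40:] = 1.0
V = W_true @ H_true
W, H, errors = nmf(V, n_components=2)
```

In the end-to-end model, the NMF weights play the role of $W$, and the positive modulator GPs replace the activations $H$. The factorisation is then learned jointly with the time-domain signal instead of from a fixed spectrogram.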

  5. End-to-End probabilistic time-frequency analysis. [Figure: audio signal $y_k$ against time (sampled at 16 kHz).]

  6. End-to-End probabilistic time-frequency analysis. [Figure: the audio signal $y_k$ (time sampled at 16 kHz) shown as a spectrogram (frequency in Hz) × the GP carrier subbands $f_d(t)$.]

  7. End-to-End probabilistic time-frequency analysis. [Figure: signal $y_k$ = GP spectrogram × GP carrier subbands $f_d(t)$, where the GP spectrogram = NMF weights ($W$) × positive modulator GPs ($g_n(t)$); frequency in Hz, time sampled at 16 kHz.]

  8. The model. GP prior:

     $f_d(t) \sim \mathcal{GP}\Big(0, \sigma_d^2 \exp(-|t - t'|/\ell_d) \cos\big(\omega_d (t - t')\big)\Big), \quad d = 1, 2, \dots, D,$
     $g_n(t) \sim \mathcal{GP}\big(0, \kappa_g^{(n)}(t, t')\big), \quad n = 1, 2, \dots, N.$

  9. The model. Likelihood model:

     $y_k = \sum_d a_d(t_k) f_d(t_k) + \sigma_y \varepsilon_k,$

     where the squared amplitudes (the magnitude spectrogram) are

     $a_d^2(t_k) = \sum_n W_{d,n} \, \mathrm{softplus}\big(g_n(t_k)\big).$
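
A sketch of drawing one sample from this generative model, to make the roles of $f_d$, $g_n$ and $W$ concrete. The slide does not specify the modulator kernel $\kappa_g^{(n)}$, so an exponential (Matérn-1/2) kernel with a longer lengthscale is assumed here. All hyperparameters, the noise model $\varepsilon_k \sim \mathcal{N}(0, 1)$ and the helper names are likewise assumptions. The softplus keeps $a_d^2(t)$ positive, and $a_d(t)$ is taken as its positive square root.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def exp_cos_kernel(t, sigma2, ell, omega):
    # sigma2 * exp(-|t - t'| / ell) * cos(omega * (t - t')); omega = 0 gives a plain exponential kernel.
    tau = t[:, None] - t[None, :]
    return sigma2 * np.exp(-np.abs(tau) / ell) * np.cos(omega * tau)

def sample_gp(K, rng, jitter=1e-8):
    # One draw from N(0, K) via a jittered Cholesky factor.
    chol = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    return chol @ rng.standard_normal(K.shape[0])

rng = np.random.default_rng(0)
fs = 16000.0
t = np.arange(400) / fs                               # 25 ms at 16 kHz
D, N = 3, 2                                           # subbands, modulators
omegas = 2.0 * np.pi * np.array([300.0, 600.0, 900.0])
W = rng.random((D, N))                                # nonnegative weights W[d, n]
sigma_y = 0.05

# Carrier subbands f_d(t) ~ GP(0, exp-cos kernel), d = 1..D.
f = np.stack([sample_gp(exp_cos_kernel(t, 1.0, 0.01, om), rng) for om in omegas])
# Modulators g_n(t): assumed exponential kernel with a longer lengthscale.
g = np.stack([sample_gp(exp_cos_kernel(t, 1.0, 0.02, 0.0), rng) for _ in range(N)])

a2 = W @ softplus(g)              # a_d^2(t_k) = sum_n W[d, n] * softplus(g_n(t_k))
a = np.sqrt(a2)                   # amplitudes a_d(t_k), positive square root
y = np.sum(a * f, axis=0) + sigma_y * rng.standard_normal(len(t))
```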

  10. The model: this is a nonstationary spectral mixture GP.
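
One way to see the nonstationarity: treat the subband GPs $f_d$ as independent and condition on the amplitudes. The signal covariance is then $\mathrm{Cov}[y(t), y(t')] = \sum_d a_d(t)\, a_d(t')\, k_d(t, t') + \sigma_y^2 \delta_{tt'}$. This depends on $t$ and $t'$ separately, not only on the lag $t - t'$. The sketch below builds that covariance for two hypothetical envelopes, one fading in and one fading out. It then compares two entries that share the same one-sample lag. The helper names and all values are illustrative assumptions.

```python
import numpy as np

def exp_cos_kernel(t, sigma2, ell, omega):
    tau = t[:, None] - t[None, :]
    return sigma2 * np.exp(-np.abs(tau) / ell) * np.cos(omega * tau)

def nonstationary_sm_cov(t, amps, sigma2s, ells, omegas, sigma_y):
    """Cov[y(t), y(t')] given amplitudes amps[d, k] = a_d(t_k).

    Assumes the subband GPs f_d are mutually independent.
    """
    C = sigma_y ** 2 * np.eye(len(t))
    for a, s2, ell, om in zip(amps, sigma2s, ells, omegas):
        C += np.outer(a, a) * exp_cos_kernel(t, s2, ell, om)
    return C

fs = 16000.0
t = np.arange(200) / fs
# Hypothetical envelopes: subband 0 fades in, subband 1 fades out.
ramp = np.linspace(0.1, 1.0, len(t))
amps = np.stack([ramp, ramp[::-1]])
C = nonstationary_sm_cov(t, amps, [1.0, 1.0], [0.01, 0.01],
                         2.0 * np.pi * np.array([500.0, 1500.0]), sigma_y=0.1)

# Same lag (one sample), different absolute times -> different covariance.
print(C[0, 1], C[150, 151])
```

With constant amplitudes, the same construction reduces to an ordinary stationary spectral mixture kernel.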

  11. Inference. We show how to write the model as a stochastic differential equation:

     $\frac{d\tilde{\mathbf{f}}(t)}{dt} = \mathbf{F}\tilde{\mathbf{f}}(t) + \mathbf{L}\mathbf{w}(t), \qquad y_k = H\big(\tilde{\mathbf{f}}(t_k)\big) + \sigma_y \varepsilon_k,$

     such that inference can proceed via Kalman filtering and smoothing.
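
The slides do not give $\mathbf F$, $\mathbf L$ or $H$, so this sketch assumes a standard state-space realisation for one component $\sigma_d^2 \exp(-|\tau|/\ell_d)\cos(\omega_d \tau)$. It uses a 2-D state with:

- $\mathbf F = \begin{pmatrix} -\lambda & -\omega_d \\ \omega_d & -\lambda \end{pmatrix}$, where $\lambda = 1/\ell_d$;
- stationary covariance $\sigma_d^2 \mathbf I$;
- measurement vector $(1, 0)$.

The model is discretised exactly on a 16 kHz grid. A Kalman filter plus Rauch-Tung-Striebel smoother then runs on noisy observations of that single subband. This is only the linear-Gaussian special case (no amplitude modulation, so $H$ is linear). In that case the smoothed mean should match direct GP regression, at a cost linear in $T$ rather than $O(T^3)$. The function names are hypothetical.

```python
import numpy as np

def discretise(sigma2, ell, omega, dt):
    """Exact discretisation of the 2-D state-space form of
    k(tau) = sigma2 * exp(-|tau| / ell) * cos(omega * tau).

    Continuous model: F = [[-lam, -omega], [omega, -lam]] with lam = 1/ell,
    stationary covariance Pinf = sigma2 * I, measurement H = [1, 0].
    """
    lam = 1.0 / ell
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = np.exp(-lam * dt) * np.array([[c, -s], [s, c]])   # expm(F * dt)
    Pinf = sigma2 * np.eye(2)
    Q = Pinf - A @ Pinf @ A.T                             # process noise over dt
    return A, Q, Pinf

def kalman_rts_means(y, A, Q, Pinf, H, noise_var):
    """Kalman filter then Rauch-Tung-Striebel smoother; returns smoothed state means."""
    m, P = np.zeros(A.shape[0]), Pinf.copy()
    mf, Pf, mp, Pp = [], [], [], []
    for k, yk in enumerate(y):
        if k > 0:                                         # predict step
            m, P = A @ m, A @ P @ A.T + Q
        mp.append(m)
        Pp.append(P)
        S = H @ P @ H.T + noise_var                       # innovation variance (1x1)
        K = P @ H.T / S                                   # Kalman gain (Mx1)
        m = m + (K * (yk - H @ m)).ravel()                # update step
        P = P - K @ H @ P
        mf.append(m)
        Pf.append(P)
    ms = mf[-1]
    means = [ms]
    for k in range(len(y) - 2, -1, -1):                   # backward pass
        G = Pf[k] @ A.T @ np.linalg.inv(Pp[k + 1])
        ms = mf[k] + G @ (ms - mp[k + 1])
        means.append(ms)
    return np.array(means[::-1])

rng = np.random.default_rng(0)
fs, sigma2, ell, omega, noise_var = 16000.0, 1.0, 0.005, 2.0 * np.pi * 440.0, 0.1
t = np.arange(300) / fs
A, Q, Pinf = discretise(sigma2, ell, omega, 1.0 / fs)
H = np.array([[1.0, 0.0]])

# Simulate noisy observations of one subband from the GP prior.
tau = t[:, None] - t[None, :]
K_f = sigma2 * np.exp(-np.abs(tau) / ell) * np.cos(omega * tau)
f = np.linalg.cholesky(K_f + 1e-9 * np.eye(len(t))) @ rng.standard_normal(len(t))
y = f + np.sqrt(noise_var) * rng.standard_normal(len(t))

smoothed = kalman_rts_means(y, A, Q, Pinf, H, noise_var)[:, 0]          # linear in T
direct = K_f @ np.linalg.solve(K_f + noise_var * np.eye(len(t)), y)     # O(T^3)
```

To cover all $D$ subbands, stack the component blocks block-diagonally and sum the first state coordinate of each block.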

  12. Inference. Usually the nonlinear $H(\cdot)$ is dealt with via linearisation (the extended Kalman filter, EKF), but we implement full Expectation Propagation (EP) in the Kalman smoother, as well as the infinite-horizon solution (IHGP in the results), which scales as $O(M^2 T)$.
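
Full EP is too long for a short example. The infinite-horizon idea can, however, be sketched in the linear-Gaussian case. For a stationary model on a regular time grid, the Kalman filter's predicted covariance converges to a fixed point of the discrete Riccati recursion. Once that steady-state gain is precomputed, each step needs only matrix-vector products, $O(M^2)$ rather than $O(M^3)$, giving $O(M^2 T)$ overall.

The sketch below compares a steady-state filter against the standard Kalman filter; once the transient dies out, their means agree. Three caveats apply:

- The Riccati equation is solved by plain fixed-point iteration, an assumption rather than the authors' solver.
- The state-space model is the same assumed 2-D exp-cos realisation as before.
- The nonlinear likelihood, which the slide handles with EP, is not covered.

```python
import numpy as np

def discretise(sigma2, ell, omega, dt):
    # Exact discretisation of sigma2*exp(-|tau|/ell)*cos(omega*tau) (2-D state).
    lam = 1.0 / ell
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = np.exp(-lam * dt) * np.array([[c, -s], [s, c]])
    Pinf = sigma2 * np.eye(2)
    return A, Pinf - A @ Pinf @ A.T, Pinf

def steady_state_gain(A, Q, H, noise_var, n_iter=10000, tol=1e-12):
    """Iterate the predicted-covariance Riccati recursion to its fixed point."""
    P = Q.copy()
    for _ in range(n_iter):
        S = H @ P @ H.T + noise_var
        P_next = A @ (P - P @ H.T @ H @ P / S) @ A.T + Q
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    S = H @ P @ H.T + noise_var
    return P @ H.T / S                      # (M, 1) steady-state Kalman gain

def kalman_filter_means(y, A, Q, Pinf, H, noise_var):
    """Standard Kalman filter: O(M^3) covariance update at every step."""
    m, P, out = np.zeros(A.shape[0]), Pinf.copy(), []
    for k, yk in enumerate(y):
        if k > 0:
            m, P = A @ m, A @ P @ A.T + Q
        S = H @ P @ H.T + noise_var
        K = P @ H.T / S
        m = m + (K * (yk - H @ m)).ravel()
        P = P - K @ H @ P
        out.append(m[0])
    return np.array(out)

def steady_state_filter_means(y, A, H, K):
    """Fixed-gain filter: only matrix-vector products per step, O(M^2)."""
    m, out = np.zeros(A.shape[0]), []
    for k, yk in enumerate(y):
        if k > 0:
            m = A @ m
        m = m + (K * (yk - H @ m)).ravel()
        out.append(m[0])
    return np.array(out)

rng = np.random.default_rng(0)
A, Q, Pinf = discretise(1.0, 0.005, 2.0 * np.pi * 440.0, 1.0 / 16000.0)
H, noise_var = np.array([[1.0, 0.0]]), 0.1
# Simulate directly from the state-space model.
x, y = rng.multivariate_normal(np.zeros(2), Pinf), []
for k in range(2000):
    if k > 0:
        x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    y.append(x[0] + np.sqrt(noise_var) * rng.standard_normal())
y = np.array(y)

K_ss = steady_state_gain(A, Q, H, noise_var)
kf = kalman_filter_means(y, A, Q, Pinf, H, noise_var)
ss = steady_state_filter_means(y, A, H, K_ss)
```

The price of the fixed gain is a slightly suboptimal estimate during the initial transient, before the standard filter's gain has converged.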

  13. Applications and Results. The fully probabilistic model can, without modification, be applied to missing data, denoising, source separation and synthesis.

  14. Applications and Results: missing data and synthesis. [Figure panels: Missing Data, comparing the signal with the EP, IHGP and EKF predictions against time (ms); Synthesis.]

  15. Applications and Results: denoising. [Figure panel: Denoising, SNR (dB) against corrupting noise variance (0.01 to 0.5) for EP 1, EP 20, IHGP 1, IHGP 20, EKF 1, EKF 20 and spectral subtraction (SpecSub).]

  16. Applications and Results: source separation. [Figure panel: Source Separation, the input audio $y$ (time 1 to 6 s) separated into three sources: piano notes C, E and G.]

  17. Thanks for listening! Poster: 6:30pm Weds, Pacific Ballroom #217. Contact: william.wilkinson@aalto.fi
