GCT634@KAIST Invited lecture: Sound Source Separation, 7 June 2018



SLIDE 1

GCT634@KAIST Invited lecture: Sound Source Separation

7 June 2018 Keunwoo Choi at QMUL.uk, Spotify.us, groovo.io

SLIDE 2

Sound Source Separation

  • “Cocktail party effects”


...as if we were simulating the human brain (as if we knew what’s going on in there)

Let’s isolate the “target” audio signal!

SLIDE 3

Sound Source Separation

Input                                    Target         Noise
Speech, Ambience                         Speech         Noise
Mixture of speech                        Speaker i      all j != i
Music (Vocal, Drum, Guitar, Bass + ..)   Instrument i   all j != i

problem = f(assumptions)
assumptions = {environments: {dry, wet, ..}, signal: {ch: {mono, stereo, ..}, content: {speech, music}}, target: {...}}

SLIDE 4

SSS Applications

  • KA-RA-O-KE!
  • Transcription
  • DJing/Mixing
  • Many other MIR tasks
  • It was once called a “chicken-and-egg” problem: solving SSS would make many other tasks much easier

SLIDE 5

BIG ASSUMPTION FOR A LONG WHILE

  • Work on |STFT| (or CQT) magnitudes
  • 1 time-frequency bin, 1 instrument -- aka “W-disjoint orthogonality”
  • Phase doesn’t matter much
  • This used to apply to (almost) all research
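
The “1 bin, 1 instrument” assumption can be made concrete with a binary mask. The sketch below is only an illustration under stated assumptions: the two magnitude spectrograms are random, sparse, made-up data (`mag_a`, `mag_b` are hypothetical names), and magnitudes are assumed to add, which is the “phase doesn’t matter much” simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitude spectrograms (freq bins x frames) of two sources.
# Each source is sparse, so most active bins belong to only one of them.
n_freq, n_frames = 64, 20
mag_a = rng.random((n_freq, n_frames)) * (rng.random((n_freq, n_frames)) < 0.2)
mag_b = rng.random((n_freq, n_frames)) * (rng.random((n_freq, n_frames)) < 0.2)

# Simplification: ignore phase and let magnitudes add.
mag_mix = mag_a + mag_b

# Ideal binary mask: hand each bin entirely to the louder source.
mask_a = mag_a > mag_b
est_a = mag_mix * mask_a
est_b = mag_mix * ~mask_a

# How W-disjoint is this toy mixture? Fraction of active bins that
# contain exactly one source.
active = mag_mix > 0
disjoint = ((mag_a > 0) ^ (mag_b > 0)) & active
disjoint_ratio = disjoint.sum() / active.sum()
print(f"disjoint ratio: {disjoint_ratio:.2f}")
```

The mask here is “ideal” because it peeks at the true sources; a real system has to estimate it from the mixture alone.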
SLIDE 6

Problem config 1 - mixing matrix A

  • We assume there is a mixing matrix A, and we estimate its inverse.

s : source (instruments)
a_xx : amplitude mixing coeffs
x : stereo signal input
w : estimated (un)mixing coeffs
y : estimated source (instruments)
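
A minimal numeric sketch of this configuration, using the same symbols. The values in `A` are made up, and `W` is computed from the true `A` only to show what a separation method is trying to estimate; in practice `A` is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# s: two sources (rows), e.g. two instruments. Random stand-ins here.
s = rng.standard_normal((2, 1000))

# A: hypothetical amplitude mixing coefficients a_11 .. a_22.
A = np.array([[0.9, 0.3],
              [0.4, 0.8]])

# x: the stereo input, an instantaneous (delay-free) mix of the sources.
x = A @ s

# W: the unmixing matrix. Here we cheat and invert the known A.
W = np.linalg.inv(A)

# y: estimated sources.
y = W @ x
print(np.allclose(y, s))
```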

SLIDE 7

ICA

  • Independent Component Analysis (ICASSP ’98)
  • Based on some stats -- independence, (non-)Gaussianity
  • Not directly about audio but a general technique
  • Example: http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html

Further study: https://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf
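
To show how independence and non-Gaussianity can be used to estimate the unmixing matrix, here is a compact sketch of a FastICA-style fixed-point iteration (tanh nonlinearity, symmetric decorrelation). It is a toy under stated assumptions, not a reference implementation: the sources are synthetic non-Gaussian noise, the mixing matrix is made up, and all function names are hypothetical.

```python
import numpy as np

def whiten(x):
    """Centre x and rotate/scale it so its covariance is the identity."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ x

def sym_decorrelate(w):
    """W <- (W W^T)^(-1/2) W, which keeps the rows of W orthonormal."""
    eigval, eigvec = np.linalg.eigh(w @ w.T)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ w

def fast_ica(x, n_iter=200, seed=0):
    z = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)                 # nonlinearity g(w z)
        g_prime = 1.0 - g ** 2             # its derivative
        # Fixed-point update per row: E[g(w z) z^T] - E[g'(w z)] w
        w_new = (g @ z.T) / t - g_prime.mean(axis=1, keepdims=True) * w
        w = sym_decorrelate(w_new)
    return w @ z

# Toy data: two independent, non-Gaussian sources and a made-up mixing matrix.
rng = np.random.default_rng(0)
n_samples = 5000
s = np.vstack([rng.uniform(-1, 1, n_samples), rng.laplace(size=n_samples)])
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])
x = A @ s

y = fast_ica(x)

# ICA recovers sources only up to order, sign and scale, so compare
# with absolute correlations instead of raw values.
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print(corr.round(2))
```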

SLIDE 8

ADRESS

  • Azimuth Discrimination and Resynthesis (DAFx 2004)
  • 1-dim clustering; for stereo sound source separation
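
A heavily simplified sketch of the azimuth idea, not the full published algorithm. Assumptions: a single frame, sources that occupy disjoint frequency bins, intensity panning only (each source satisfies left = g * right with g in [0, 1]), and a hand-picked split point instead of real clustering. All names and numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 256

# Two hypothetical sources in disjoint frequency bins (one complex frame each).
s1 = np.zeros(n_bins, dtype=complex)
s2 = np.zeros(n_bins, dtype=complex)
s1[:128] = rng.standard_normal(128) + 1j * rng.standard_normal(128)
s2[128:] = rng.standard_normal(128) + 1j * rng.standard_normal(128)

# Intensity panning: both channels carry the same signals with different gains.
left = 0.3 * s1 + 0.8 * s2
right = 1.0 * s1 + 1.0 * s2

# Frequency-azimuth plane: |L(k) - g * R(k)| for every bin k and candidate gain g.
gains = np.linspace(0.0, 1.0, 101)
plane = np.abs(left[:, None] - gains[None, :] * right[:, None])

# Each bin cancels (has a null) at the gain ratio of the source that owns it.
est_gain = gains[plane.argmin(axis=1)]

# 1-dim "clustering": the histogram of null positions peaks once per source.
hist, edges = np.histogram(est_gain, bins=20, range=(0.0, 1.0))

# Hand-picked split between the two peaks (an assumption, not estimated).
mask1 = est_gain < 0.55
est_s1 = right * mask1      # resynthesise source 1 from the right channel
est_s2 = right * ~mask1
```

With real music the peaks are smeared, because bins are shared between sources and panning is rarely this clean.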
SLIDE 9

Problem config 2 - mixing matrix and delay

  • Sources are at different angles and distances


→ The mixing matrix A now also encodes time delays

SLIDE 10

DUET

  • Location = {angle, distance}
  • Each location, each 2D cluster, each instrument
  • DOA (Direction Of Arrival) estimation
  • Something similar is in your phone (with 2+ microphones) to suppress non-speech sounds (but perhaps not in your earphones/headphones)
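
A sketch of the two features behind the 2D clusters: relative attenuation and relative delay between the two channels, read off per frequency bin. Assumptions: one source only, one frame, a circular delay (so the maths is exact), and only low-frequency bins to avoid phase wrapping. The attenuation and delay values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
s = rng.standard_normal(n)

# Hypothetical geometry: microphone 2 hears a quieter, later copy of the source.
true_atten, true_delay = 0.7, 3            # delay in samples
x1 = s
x2 = true_atten * np.roll(s, true_delay)   # circular delay keeps the example exact

X1 = np.fft.rfft(x1)
X2 = np.fft.rfft(x2)
omega = 2 * np.pi * np.arange(X1.size) / n  # bin frequency in rad/sample

# Skip DC and stay where omega * delay < pi, so the phase does not wrap.
k = np.arange(1, 100)
ratio = X2[k] / X1[k]

atten = np.abs(ratio)                 # attenuation estimate, one per bin
delay = -np.angle(ratio) / omega[k]   # delay estimate, one per bin

print(atten.mean().round(3), delay.mean().round(3))
```

With several sources, each time-frequency bin yields one (attenuation, delay) point, and a 2D histogram of those points shows one cluster per source location.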

SLIDE 11

Problem config 3 - music - spectra of instruments

http://www.physics.usyd.edu.au/teach_res/hsp/sp/mod31/m31_strings.htm

SLIDE 12

NMF

  • Assumptions of using NMF for SSS:
  • The spectral shapes of musical instruments are known.
  • NMF would separate each note (aka basis)!
  • Many applications for drum separation (it works)

https://www.slideshare.net/DaichiKitamura/robust-music-signal-separation-based-on-supervised-nonnegative-matrix-factorization-with-prevention-of-basis-sharing
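
A minimal NMF sketch using the standard multiplicative updates for the Euclidean cost. It is a toy: the “spectrogram” is built from two made-up note templates, and the function name and parameters are hypothetical. Each column of `W` plays the role of a spectral basis (a note) and each row of `H` is its activation over time.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factorise a non-negative matrix V (freq x time) as W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy magnitude spectrogram: two notes with different harmonic templates.
n_freq, n_frames = 32, 30
W_true = np.zeros((n_freq, 2))
W_true[[5, 10, 15], 0] = [1.0, 0.5, 0.25]   # note 0 partials
W_true[[7, 14, 21], 1] = [1.0, 0.5, 0.25]   # note 1 partials
H_true = np.zeros((2, n_frames))
H_true[0, :15] = 1.0                        # note 0 plays first
H_true[1, 10:] = 0.8                        # note 1 enters later, with overlap
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# One separated "note": a single basis times its activation.
note_0 = W[:, :1] @ H[:1, :]
print(f"relative error: {rel_err:.4f}")
```

When the spectral shapes are known in advance, as the assumption above states, `W` can be fixed to those shapes and only `H` updated.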

SLIDE 13

Problem config 4 - music - repeats

  • “Instrumental parts repeat!” (↔ vocal)
  • “Drums/beats repeat!” (↔ harmonic instruments)
  • A valid assumption for modern popular music
  • E.g., REPET (IEEE 2013), KAM (D. Fano Yela, ICASSP 2017), ...
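
A sketch of the “repeats” idea in the spirit of REPET: chop the magnitude spectrogram into segments of one period, take the element-wise median as the repeating model, and turn it into a soft mask. Assumptions: the period is given (estimating it is a separate step not shown here) and the data is a made-up toy. `repeating_mask` is a hypothetical name.

```python
import numpy as np

def repeating_mask(V, period):
    """Soft mask for the repeating part of a magnitude spectrogram V."""
    n_freq, n_frames = V.shape
    n_seg = n_frames // period
    V_cut = V[:, :n_seg * period]
    # Stack the segments and take the median across them: whatever repeats
    # survives, whatever appears only occasionally is rejected.
    segments = V_cut.reshape(n_freq, n_seg, period)
    model = np.median(segments, axis=1)
    tiled = np.tile(model, (1, n_seg))
    # The repeating part cannot be larger than the mixture itself.
    repeating = np.minimum(tiled, V_cut)
    return repeating / (V_cut + 1e-12)

# Toy mixture: a background pattern repeated 8 times, plus a foreground
# event (think: a vocal phrase) that appears in one segment only.
rng = np.random.default_rng(0)
n_freq, period, n_seg = 16, 4, 8
pattern = rng.random((n_freq, period)) + 0.1
background = np.tile(pattern, (1, n_seg))
foreground = np.zeros_like(background)
foreground[:, 8:12] = rng.random((n_freq, period))
V = background + foreground

mask = repeating_mask(V, period)
est_background = mask * V
est_foreground = (1.0 - mask) * V
```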

SLIDE 14

Problem config 5 - music - some musical cases

  • “Central” (~= vocal) source separation
  • Because - main vocals are almost always at the centre (and we all love karaoke)
  • Harmonic/percussive source separation
  • Because - they are (almost) completely different in spectral/temporal axes
  • Median filtering for drum separation (D. Fitzgerald, DAFx)
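
A sketch of harmonic/percussive separation by median filtering. Harmonic sounds form horizontal ridges in the spectrogram (stable over time) and percussive sounds form vertical ridges (broadband, short), so a median filter along time keeps the former and a median filter along frequency keeps the latter. The filter length, the soft-mask form and the toy input are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(V, size, axis):
    """Median filter of odd length `size` along one axis of a 2-D array."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(V, pad_width, mode="reflect")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss(V, size=9):
    """Split a magnitude spectrogram V (freq x time) into harmonic and percussive parts."""
    harm = median_filter_1d(V, size, axis=1)   # along time: keeps horizontal ridges
    perc = median_filter_1d(V, size, axis=0)   # along frequency: keeps vertical ridges
    mask_h = harm ** 2 / (harm ** 2 + perc ** 2 + 1e-12)
    return V * mask_h, V * (1.0 - mask_h)

# Toy spectrogram: one sustained tone (row 10) and one drum hit (column 20).
V = np.zeros((32, 32))
V[10, :] += 1.0
V[:, 20] += 1.0

H, P = hpss(V)
print(H[10, 5].round(3), P[5, 20].round(3))
```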

“Gaussian mixture model for singing voice separation from stereophonic music”, M Kim et al, 2011
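
The “vocals sit at the centre” observation has a classic time-domain shortcut, sketched below. This is not the GMM method of the cited paper: subtracting the channels only removes centre-panned content (the karaoke trick) and does not extract the vocal. The signals and pan gains are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Random stand-ins for three tracks.
vocal = rng.standard_normal(n)
guitar = rng.standard_normal(n)
keys = rng.standard_normal(n)

# Hypothetical stereo mix: vocal dead centre, guitar left, keys right.
left = vocal + 0.9 * guitar + 0.2 * keys
right = vocal + 0.2 * guitar + 0.9 * keys

# Anything identical in both channels cancels in the difference signal.
side = 0.5 * (left - right)   # vocal-free, but mono and with altered balance
mid = 0.5 * (left + right)    # vocal plus an equal share of guitar and keys
```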

SLIDE 15

Problem config 6 - music - ‘informed’ source separation

  • Exploiting the score as side information

“Score-Informed Source Separation for Musical Audio Recordings”, S Ewert et al., 2013

SLIDE 16

History so far...

As time goes by: less generality, stronger assumptions

SLIDE 17

DEEP! LEARNING!!

SLIDE 18

DL and SSS

  • Fewer assumptions (let’s think about further papers!)
  • Data-related; trained models do NOT extrapolate. E.g., a model trained on speech probably wouldn’t work on music.
  • Model-related; e.g., frame-based? Context-free? Does it estimate the phase? Stereo input?

SLIDE 19

Frame-based DL-SS

  • Because vocals are distinguishable in a frame (or frames)

“Deep Learning For Monaural Speech Separation”, Po-sen Huang et al, 2014
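
A common ingredient of frame-based systems is a soft time-frequency mask computed from two network outputs. The sketch below shows only that forward pass and is not the model of the cited paper: the network is one hidden layer with untrained random weights, and the sizes and names (`W1`, `W2`, `separate_frame`) are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_hidden = 513, 64

# Untrained random weights with hypothetical sizes; a real model would be
# trained on pairs of mixtures and clean sources.
W1 = rng.standard_normal((n_hidden, n_freq)) * 0.01
W2 = rng.standard_normal((2 * n_freq, n_hidden)) * 0.01

def separate_frame(x_mag):
    """Map one magnitude-spectrum frame to two source estimates."""
    h = np.maximum(0.0, W1 @ x_mag)          # ReLU hidden layer
    out = np.maximum(0.0, W2 @ h)            # two non-negative spectra, stacked
    y1, y2 = out[:n_freq], out[n_freq:]
    # Soft mask: each bin is shared in proportion to the two outputs.
    mask1 = (y1 + 1e-8) / (y1 + y2 + 2e-8)
    return mask1 * x_mag, (1.0 - mask1) * x_mag

x_mag = rng.random(n_freq)                   # one frame of a mixture |STFT|
est1, est2 = separate_frame(x_mag)
```

Because of the mask, the two estimates always sum to the input frame, whatever the weights are.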

SLIDE 20

U-Net and SS

  • Because vocals are distinguishable in the |STFT| image

“Singing Voice Separation With Deep U-Net Convolutional Networks”, A Jansson et al., ISMIR 2017
“U-Net: Convolutional Networks for Biomedical Image Segmentation”, O Ronneberger et al., 2015

SLIDE 21

A practical limitation

  • Supervised learning requires a *paired* dataset
  • for such a system: x: [mixtures], y: [instrumental mixtures; vocal tracks]
  • → not sustainable

[Figure: paired dataset -- (Inst, Vocal 1), (Inst, Vocal 2), (Inst, Vocal 3)]
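
What “paired” means in array terms, as a tiny sketch with random stand-ins for audio (track count, length and names are made up): every training input x needs its own separated stems as the target y, which is why such data is expensive to collect.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tracks, n_samples = 3, 16000

# Random stand-ins for isolated stems of each song.
inst = rng.standard_normal((n_tracks, n_samples))
vocal = rng.standard_normal((n_tracks, n_samples))

# x: what the model hears. y: what it must output, per song.
x = inst + vocal                       # (n_tracks, n_samples)
y = np.stack([inst, vocal], axis=1)    # (n_tracks, 2, n_samples)
```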

SLIDE 22

GANs and SS

  • Weakly labelled dataset: {many instrumental tracks} (aka Real) + {many vocal + instrumental tracks} (input for producing the Fake)
  • We alternately show a GAN-based model {real instrumental / vocal-separated (fake) instrumental} and let the model learn, simultaneously,
  • i) to classify real/fake
  • ii) to fake an instrumental track = to remove the vocal

[Figure: unpaired dataset (inst tracks, mix tracks) vs. paired dataset ((Inst, Vocal 1), (Inst, Vocal 2), (Inst, Vocal 3))]

“Adversarial Semi-Supervised Audio Source Separation applied to Singing Voice Extraction”, D Stoller et al., ICASSP 2018

SLIDE 23

Further study

  • A great SS tutorial: http://ismir2010.ismir.net/proceedings/tutorial_1_Vincent-Ono.pdf

SLIDE 24

Further me

  • keunwoochoi.wordpress.com
  • keunwoochoi.blogspot.com
  • groovo.io
  • spotify.com
  • http://c4dm.eecs.qmul.ac.uk