Perceptual Evaluation of Source Separation for Remixing Music


slide-1
SLIDE 1

Perceptual Evaluation of Source Separation for Remixing Music

  • H. Wierstorf 1
  • D. Ward 1
  • E. M. Grais 1
  • M. D. Plumbley 1
  • R. Mason 2
  • C. Hummersone 2
1 Centre for Vision, Speech and Signal Processing, University of Surrey
2 Institute of Sound Recording, University of Surrey
143rd AES Convention, 20.10.2017, CC BY 4.0
slide-2
SLIDE 2

Source separation for music

[Diagram: reference (vocals, others) → mixture → source separation (vocals, others)]

How to talk about source separation?
  • Sound quality: artifacts and distortion added
  • Interference: not perfect separation achieved
slide-3
SLIDE 3

Source separation for music

How to evaluate source separation?
  • BSS eval: signal decomposition and energy ratios [1]
  • PEASS: signal decomposition and auditory model [2]

Open questions
  • Correlation with perception has been questioned [3]

[1] Vincent, et al. (2006), IEEE TASLP, doi: 10.1109/TSA.2005.858005
[2] Emiya, et al. (2011), IEEE TASLP, doi: 10.1109/TASL.2011.2109381
[3] e.g. Gupta, et al. (2015), WASPAA, doi: 10.1109/WASPAA.2015.7336923
slide-4
SLIDE 4

BSS eval

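Decompose signal into different components

$$s_\text{estimated} = s_\text{original} + e_\text{interferer} + e_\text{artifacts}$$

$$\text{SAR} = 10 \log_{10} \frac{\lVert s_\text{original} + e_\text{interferer} \rVert^2}{\lVert e_\text{artifacts} \rVert^2}$$

$$\text{SIR} = 10 \log_{10} \frac{\lVert s_\text{original} \rVert^2}{\lVert e_\text{interferer} \rVert^2}$$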
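
The two energy ratios on this slide can be sketched in a few lines of Python. This is a minimal sketch that assumes the three components of the decomposition are already available as equal-length lists of samples. How BSS eval obtains that decomposition is not shown here. The names `energy`, `sir_db` and `sar_db` and the toy values are illustrative assumptions, not part of any toolbox.

```python
import math

def energy(signal):
    """Squared L2 norm: the sum of squared samples."""
    return sum(x * x for x in signal)

def sir_db(s_original, e_interferer):
    """SIR: energy of the target relative to the interference, in dB."""
    return 10.0 * math.log10(energy(s_original) / energy(e_interferer))

def sar_db(s_original, e_interferer, e_artifacts):
    """SAR: energy of target plus interference relative to the artifacts, in dB."""
    target_plus_interference = [s + e for s, e in zip(s_original, e_interferer)]
    return 10.0 * math.log10(energy(target_plus_interference) / energy(e_artifacts))

# Hypothetical toy components, chosen only to make the arithmetic easy to follow.
s_original = [3.0, 0.0, 0.0]    # energy 9
e_interferer = [0.0, 4.0, 0.0]  # energy 16
e_artifacts = [0.0, 0.0, 0.5]   # energy 0.25

print(f"SIR = {sir_db(s_original, e_interferer):.1f} dB")
print(f"SAR = {sar_db(s_original, e_interferer, e_artifacts):.1f} dB")
```

In this toy case the interference carries more energy than the target, so the SIR is negative, while the small artifact energy gives a high SAR.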
slide-5
SLIDE 5

Source separation for music

[Diagram: reference (vocals, others) → mixture → source separation (vocals, others)]

How to talk about source separation?
  • Sound quality: artifacts and distortion added
  • Interference: not perfect separation achieved
slide-6
SLIDE 6

Source separation for music

[Diagram: reference (vocals, others) → mixture → source separation (vocals, others) → mixture]

How to talk about source separation?
  • Sound quality: artifacts and distortion added
  • Interference: not perfect separation achieved
slide-7
SLIDE 7

Remixing using source separation

  • Modify component levels [4]
  • Change positions (upmix) [5]
  • Change frequency content [6]
  • Add effects [7]
  • Mashups

[4] Itoyama, et al. (2009), ISMIR, pp. 133–138
[5] Cobos, et al. (2008), ISCCSP, doi: 10.1109/ISCCSP.2008.4537423
[6] Yoshii, et al. (2005), WASPAA, doi: 10.1049/ic.2005.0733
[7] Woodruff, et al. (2006), ISMIR, pp. 314–319
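
The first remix type, modifying component levels, can be sketched as follows. This is a minimal sketch under stated assumptions: the separated vocals and the remaining instruments are available as equal-length mono sample lists, and a level change in dB maps to an amplitude gain of 10^(dB/20). The function names and sample values are hypothetical.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)

def remix(vocals, accompaniment, vocal_level_db):
    """Scale the vocal estimate by the requested level and add the accompaniment."""
    gain = db_to_gain(vocal_level_db)
    return [gain * v + a for v, a in zip(vocals, accompaniment)]

# Hypothetical separated signals (a few samples only).
vocals = [0.10, -0.20, 0.05]
accompaniment = [0.05, 0.05, -0.10]

# Vocal level changes of 0 dB, 6 dB and 12 dB.
for level_db in (0.0, 6.0, 12.0):
    print(level_db, remix(vocals, accompaniment, level_db))
```

With imperfect separation, the vocal estimate also contains interference and artifacts, so raising its level raises those as well.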
slide-8
SLIDE 8

Evaluation of remixes

  • Evaluate the actual remix
  • Problem if only asked for preference or naturalness [8]
  • Allow adjustment by listeners [9]
  • Trade-off between artifacts and level increase [10]
  • Predictions with BSS eval?

[8] Gillet and Richard (2005), WASPAA, doi: 10.1109/ASPAA.2005.1540232
[9] Yoshii, et al. (2005), WASPAA, doi: 10.1049/ic.2005.0733
[10] Pons, et al. (2016), JASA, doi: 10.1121/1.4971424
slide-9
SLIDE 9

Experiment

  • Start with reference mix
  • Introduce changes in level of vocals
  • Rate sound quality and loudness balance
  • Look for correlations with SAR and SIR
slide-10
SLIDE 10

Experiment

Loudness balance describes the relation of the overall loudness of the vocals to the overall loudness of the remaining instruments. It does not include short and abrupt changes in loudness that you might experience for some test sounds. It is more concerned with the general balance of the vocals and the accompanying instruments.
slide-11
SLIDE 11

Experiment

MUSHRA-inspired experiment using the Web Audio Evaluation Tool [11]

[11] Jillings, et al. (2015), SMC, github: BrechtDeMan/WebAudioEvaluationTool
slide-12
SLIDE 12

Experiment

  • 2 tasks: sound quality and loudness balance
  • 5 source separation algorithms
  • 6 songs (converted to mono)
  • 3 remixes, level of vocal (0 dB, 6 dB, 12 dB)
  • 3 anchors and references for every task
  • Loudness anchor: vocals −14 dB
  • Quality anchor: artifacts, distortions, 3.5 kHz low pass
  • 15 participants
slide-13
SLIDE 13

Stimuli

  • Signal separation evaluation campaign (SiSEC) [12]
  • The MUS task includes 23 algorithms and 100 mixed songs [13]

Vocal       UHL3   NUG3   OZE   GRA3   KON
SAR / dB     7.7    6.1   2.8    6.3  −3.4
SIR / dB    10.2   11.1   8.8    6.2   7.0

[12] Liutkus, et al. (2017), LVA/ICA, doi: 10.1007/978-3-319-53547-0_31
[13] https://www.sisec17.audiolabs-erlangen.de
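
As a small illustration of how such a table could be used to compare algorithms, the sketch below ranks the five systems by one BSS eval metric. It assumes that the SAR and SIR values on this slide are listed in the same order as the algorithm names. The dictionary layout and the helper `best_by` are hypothetical.

```python
# SAR and SIR values in dB for the vocal estimates, as listed on the slide.
# Assumption: the values appear in the same order as the algorithm names.
bss_eval = {
    "UHL3": {"SAR": 7.7, "SIR": 10.2},
    "NUG3": {"SAR": 6.1, "SIR": 11.1},
    "OZE": {"SAR": 2.8, "SIR": 8.8},
    "GRA3": {"SAR": 6.3, "SIR": 6.2},
    "KON": {"SAR": -3.4, "SIR": 7.0},
}

def best_by(metric):
    """Return the algorithm with the highest value of the given metric."""
    return max(bss_eval, key=lambda name: bss_eval[name][metric])

def ranking(metric):
    """Return the algorithm names sorted from highest to lowest metric value."""
    return sorted(bss_eval, key=lambda name: bss_eval[name][metric], reverse=True)

print("Highest SAR:", best_by("SAR"))
print("Highest SIR:", best_by("SIR"))
print("SAR ranking:", ranking("SAR"))
```

Whether the highest-scoring system is also the one listeners prefer is the question the listening experiment addresses.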
slide-14
SLIDE 14

Results

Average across medians of every song

[Figure: sound quality (same to worse) and loudness balance (same to different) against level / dB (6, 12) for UHL3, NUG3, OZE, GRA3 and KON]
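
The aggregation named on this slide, an average across the per-song medians, can be sketched as below. This is a minimal sketch assuming that, for one algorithm and one level, the ratings of all participants are grouped by song. The song labels and rating values are hypothetical and do not come from the experiment.

```python
from statistics import mean, median

def average_of_song_medians(ratings_by_song):
    """Take the median rating per song, then average those medians."""
    song_medians = [median(ratings) for ratings in ratings_by_song.values()]
    return mean(song_medians)

# Hypothetical ratings for one algorithm at one vocal level.
ratings_by_song = {
    "song A": [60, 70, 80],
    "song B": [20, 40, 90],
}

print(average_of_song_medians(ratings_by_song))
```

Taking the median within a song limits the influence of single outlying ratings before the songs are combined.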
slide-15
SLIDE 15

Influence of song

[Figure: sound quality (same to worse) and loudness balance (same to different) per system (Ref, UHL3, NUG3, OZE, GRA3, KON, Anchor) for Song 30 and Song 48 at 0 dB]
slide-16
SLIDE 16

Influence of song

[Figure: sound quality (same to worse) and loudness balance (same to different) per system (Ref, UHL3, NUG3, OZE, GRA3, KON, Anchor) for Song 30 and Song 48 at 6 dB]
slide-17
SLIDE 17

Influence of song

[Figure: sound quality (same to worse) and loudness balance (same to different) per system (Ref, UHL3, NUG3, OZE, GRA3, KON, Anchor) for Song 30 and Song 48 at 12 dB]
slide-18
SLIDE 18

Influence of song

Connected to level balance of original mix?
  • Song 30, level balance: 1.7 dB
  • Song 48, level balance: −5.7 dB
  • Weak correlation with both results for 12 dB
  • Two songs were worse in level balance than song 48
slide-19
SLIDE 19

BSS eval and remixes

Correlation for 12 dB conditions

[Figure: loudness balance (same to different) against SIR / dB]
r = 0.75, rs = 0.79
slide-20
SLIDE 20

BSS eval and remixes

Correlation for 12 dB conditions

[Figure: sound quality (same to worse) against SAR / dB]
r = 0.68, rs = 0.67
slide-21
SLIDE 21

BSS eval and remixes

Correlation for all conditions [14]

[Figure: sound quality (same to worse) against SARmix / dB]
r = 0.50, rs = 0.83

[14] Liu et al. (2015), EUSIPCO, doi: 10.1109/EUSIPCO.2015.7362551
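
The correlation slides report two coefficients per plot. Assuming, as the notation suggests, that r is Pearson's correlation coefficient and rs is Spearman's rank correlation coefficient, both can be sketched in plain Python as below. The data points are hypothetical and only illustrate why a monotonic but non-linear relation yields a higher rs than r.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: ratings saturate as the metric grows (monotonic, non-linear).
metric_db = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0]
rating = [10.0, 60.0, 80.0, 90.0, 95.0, 97.0]

print(f"r  = {pearson(metric_db, rating):.2f}")
print(f"rs = {spearman(metric_db, rating):.2f}")
```

A pattern like the one on this slide, with rs well above r, is consistent with a relation that is ordered but far from a straight line.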
slide-22
SLIDE 22

Conclusions

  • Source separation methods suitable for level remixing
  • Trade-off between achievable level and sound quality
  • Maximum reachable level
  • BSS eval can be used to pick algorithm
  • Connection to adjustment experiments?

https://hagenw.github.io
slide-23
SLIDE 23

http://cvssp.org/events/lva-ica-2018
