Music/Voice Separation using the Similarity Matrix

Zafar Rafii & Bryan Pardo


  1. Music/Voice Separation using the Similarity Matrix Zafar Rafii & Bryan Pardo

  2. Introduction
     • Musical pieces are often characterized by an underlying repeating structure over which varying elements are superimposed
     [Figure: waveform of "Propellerheads - History Repeating"; amplitude from -1 to 1 against time (s), 0 to 12 s]
     10/12/12 Zafar Rafii & Bryan Pardo

  3. Introduction
     • The REpeating Pattern Extraction Technique (REPET) was proposed to extract the repeating structure from the non-repeating structure
     [Figure: block diagram; REPET splits the Mixture into a Repeating Structure and a Non-repeating Structure]

  4. REPET
     [Figure: overview of REPET in three steps. Step 1: Mixture Signal x, Mixture Spectrogram V, Beat Spectrum b (period p). Step 2: V, Median, Repeating Segment S (segments marked 1p, 2p). Step 3: V, S, min, Repeating Spectrogram W, Time-Frequency Mask M.]

  5. Adaptive REPET
     [Figure: overview of adaptive REPET in three steps. Step 1: Mixture Signal x, Mixture Spectrogram V, Beat Spectrogram B (period p_i at frame i). Step 2: V, Median (frames marked i-1p_i, i, i+1p_i), Repeating Spectrogram U. Step 3: V, U, min, Repeating Spectrogram W, Time-Frequency Mask M.]

  6. Limitations
     • Both the original and the adaptive REPET assume periodically repeating patterns
     [Figure: Mixture with a periodically repeating background; beat spectrogram; period finder]
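The periodicity assumption can be made concrete with a small sketch. It assumes a single period p measured in frames, so that frame i is modeled from the frames at the same position in every period. The function name `periodic_neighbors` and the toy sizes are hypothetical, and this is an illustration rather than the authors' code.

```python
from typing import List


def periodic_neighbors(n_frames: int, period: int) -> List[List[int]]:
    """For each frame i, list the frames that share its position within the period.

    Under a strictly periodic background, these are the frames that are
    expected to repeat the content of frame i.
    """
    return [list(range(i % period, n_frames, period)) for i in range(n_frames)]


# Toy example: 6 frames, background repeating every 2 frames.
neighbors = periodic_neighbors(6, 2)
for i, frames in enumerate(neighbors):
    print(i, frames)
```

If the background repeats intermittently, no single value of `period` yields the right index sets, which is the limitation described on the following slides.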

  7. Limitations
     • Repetitions can also happen intermittently or without a global (or local) period
     [Figure: Mixture with a non-periodically repeating background; beat spectrogram; period finder]

  8. Limitations
     • Instead of looking for periodicities, we can look for similarities, using a similarity matrix
     [Figure: Mixture with a non-periodically repeating background; similarity matrix with a scale from +dissimilar to +similar]

  9. Similarity Matrix
     • The similarity matrix is a matrix where each entry measures the (dis)similarity between two elements of a sequence under a given metric
     [Figure: a sequence with elements i_1 and i_2, a metric, and the resulting similarity matrix indexed by i_1 and i_2; scale from +dissimilar to +similar]
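A minimal sketch of this definition, assuming the sequence is any Python sequence and the metric is any function of two elements. The names `similarity_matrix` and `same_symbol` are illustrative and do not come from the slides.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def similarity_matrix(seq: Sequence[T], metric: Callable[[T, T], float]) -> List[List[float]]:
    """Return S with S[a][b] = metric(seq[a], seq[b]) for every pair of elements."""
    return [[metric(x, y) for y in seq] for x in seq]


def same_symbol(x: str, y: str) -> float:
    """Toy metric: 1.0 when the two symbols are equal, 0.0 otherwise."""
    return 1.0 if x == y else 0.0


# Toy sequence with a repeating pattern: "a" recurs at positions 0 and 2.
S = similarity_matrix("abab", same_symbol)
for row in S:
    print(row)
```

Repeated elements show up as high off-diagonal entries, here at positions (0, 2) and (1, 3).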

  10. Similarity Matrix
      • In audio, the similarity matrix (SM) can help to visualize the time structure and find repeating/similar patterns
      [Figure: Spectrogram (frequency (kHz) against time (s)), compared frame by frame with the cosine metric, giving a Similarity Matrix (time (s) against time (s)); scale from 0 (+dissimilar) to 1 (+similar)]

  11. Assumptions
      • Given a mixture of music + voice:
        – The repeating background is dense & low-rank
        – The non-repeating foreground is sparse & varied
      [Figure: Mixture Spectrogram, Background Spectrogram, and Foreground Spectrogram; frequency (kHz) against time (s)]

  12. Assumptions
      • The SM of a mixture is then likely to reveal the structure of the repeating background
      [Figure: Mixture Spectrogram and Background Spectrogram (frequency (kHz) against time (s)) next to the Similarity Matrix (time (s) against time (s))]

  13. REPET-SIM
      • REPET with Similarity Matrix!
        1. Identify the repeating/similar elements
        2. Derive a repeating model
        3. Extract the repeating structure
      [Figure: block diagram; REPET-SIM splits the Mixture Signal into a Repeating Structure and a Non-repeating Structure]

  14. REPET-SIM
      • Advantages compared with REPET:
        – Can handle intermittent repeating elements
        – Can handle fast-varying repeating structures
        – Can handle full-track songs
      [Figure: block diagram; REPET-SIM splits the Mixture Signal into a Repeating Structure and a Non-repeating Structure]

  15. Interests
      • Practical Interests
        – Audio post processing
        – Melody extraction
        – Karaoke gaming
      • Intellectual Interests
        – Music perception
        – Music understanding
        – Simply based on self-similarity!

  16. REPET-SIM
      [Figure: overview of REPET-SIM in three steps. Step 1: Mixture Signal x, Mixture Spectrogram V, Similarity Matrix S (frame i with similar frames j_1, j_2 = i, j_3). Step 2: V, Median (frames j_1, j_2, j_3), Repeating Spectrogram U. Step 3: V, U, min, Repeating Spectrogram W, Time-Frequency Mask M.]

  17. 1. Repeating Elements
      [Figure: REPET-SIM overview, same diagram as slide 16]

  18. 1. Repeating Elements
      • We take the cosine similarity between every pair of columns of the mixture spectrogram and get a similarity matrix
      [Figure: Mixture Spectrogram (frequency (kHz) against time (s)) with columns i_1 and i_2 compared by cosine; Similarity Matrix (time (s) against time (s)) with the entry at (i_1, i_2)]
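A short sketch of this step, assuming the magnitude spectrogram is stored as a list of frames where each frame is the list of magnitudes in one column. The function names and the toy values are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two frames; 0.0 if either frame is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def cosine_similarity_matrix(frames: List[List[float]]) -> List[List[float]]:
    """S[a][b] is the cosine similarity between frame a and frame b."""
    n = len(frames)
    S = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            # The measure is symmetric, so each pair is computed once.
            S[a][b] = S[b][a] = cosine_similarity(frames[a], frames[b])
    return S


# Toy spectrogram: 3 frames, 3 frequency bins.
frames = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [2.0, 0.0, 4.0],  # scaled copy of frame 0
]
S = cosine_similarity_matrix(frames)
```

Because the cosine measure ignores overall scale, frame 2 is maximally similar to frame 0 even though it is twice as loud.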

  19. 1. Repeating Elements
      • The SM reveals, for every frame i, the frames j_k that are the most similar to frame i
      [Figure: Mixture Spectrogram (frequency (kHz) against time (s)) with frame i and its similar frames j_1, j_2, j_3; Similarity Matrix (time (s) against time (s)) with the entries for i at j_1, j_2, j_3]
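One way to read the frames j_k off the similarity matrix is to keep, for each frame i, the k entries with the highest similarity in row i. Plain top-k selection is an assumption made here, since the slides do not state the selection rule, and the name `most_similar_frames` is hypothetical.

```python
from typing import List


def most_similar_frames(S: List[List[float]], i: int, k: int) -> List[int]:
    """Return, in time order, the indices of the k frames most similar to frame i.

    Frame i itself is normally among them, since S[i][i] is the row maximum.
    """
    by_similarity = sorted(range(len(S[i])), key=lambda j: S[i][j], reverse=True)
    return sorted(by_similarity[:k])


# Toy similarity matrix for 4 frames: frames 0 and 2 are alike, as are 1 and 3.
S = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.8],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.8, 0.2, 1.0],
]
neighbors = [most_similar_frames(S, i, k=2) for i in range(len(S))]
print(neighbors)
```

Unlike a period-based index set, these neighbors need not be evenly spaced in time.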

  20. 1. Repeating Elements
      [Figure: REPET-SIM overview, same diagram as slide 16]
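The remaining two steps of the overview can be sketched from its panel labels (Median, min, Time-Frequency Mask). The sketch assumes that U is the per-bin median of V over the similar frames, that W is the element-wise minimum of U and V, and that M is W divided by V. The function name, the data layout (a list of frames, each a list of magnitudes) and the toy values are hypothetical.

```python
from statistics import median
from typing import List


def repeating_mask(frames: List[List[float]], neighbors: List[List[int]]) -> List[List[float]]:
    """Build a soft time-frequency mask for the repeating background.

    frames[i][f] is the magnitude V at frame i and frequency bin f;
    neighbors[i] lists the frames considered similar to frame i.
    """
    n_bins = len(frames[0])
    mask = []
    for i, similar in enumerate(neighbors):
        v = frames[i]
        # Step 2: repeating model U, the median over the similar frames, bin by bin.
        u = [median(frames[j][f] for j in similar) for f in range(n_bins)]
        # Step 3: the repeating part cannot exceed the mixture, so W = min(U, V).
        w = [min(u[f], v[f]) for f in range(n_bins)]
        # Mask M = W / V, with 0.0 where the mixture bin is empty.
        mask.append([w[f] / v[f] if v[f] > 0 else 0.0 for f in range(n_bins)])
    return mask


# Toy mixture: a constant background [1, 1]; frame 1 has extra energy in bin 0.
frames = [[1.0, 1.0], [4.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1, 2]] * 3
M = repeating_mask(frames, neighbors)
print(M)
```

The median discards the outlying energy in frame 1, so the mask is below 1 only where the foreground is present. Multiplying the mask with the mixture gives a background estimate, and the remainder gives the foreground.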
