SLIDE 1

Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization

ISMIR 2010

Ron Weiss, Juan Bello {ronw,jpbello}@nyu.edu

Music and Audio Research Lab New York University

August 10, 2010

Ron Weiss, Juan Bello (MARL, NYU) Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Fact August 10, 2010 1 / 17

SLIDE 2

Repetitive patterns in music

Repetition is ubiquitous in music
  • long-term verse-chorus structure
  • repeated motifs

Can we identify this structure directly from audio?

What about the repeated units?

SLIDE 3

Proposed approach

Treat song as concatenation of short, repeated template patterns
Inspired by source separation / text topic modeling

Convolutive Non-negative Matrix Factorization (NMF)

SLIDE 4

Beat-synchronous chroma features [Ellis and Poliner, 2007]

[Figure: beat-synchronous chroma of "Day Tripper"; time (beats) vs. pitch class A to G, color scale 0 to 1]

Summarize energy at each pitch class during each beat
Normalize frame energy to ignore dynamics
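A minimal NumPy sketch of the two steps above, assuming frame-level chroma (12 x N) and beat onset frame indices were already produced by a separate chroma extractor and beat tracker (neither is shown). The name `beat_sync_chroma` is hypothetical, and normalizing each beat by its maximum is an assumption; the slide does not say which norm is used.

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames, eps=1e-9):
    """Summarize frame-level chroma (12 x N) over each beat interval.

    beat_frames: increasing frame indices of beat onsets, assumed to come
    from a separate beat tracker. Frames before the first beat and after
    the last beat are dropped in this sketch.
    """
    beats = np.asarray(beat_frames, dtype=int)
    cols = [chroma[:, s:e].sum(axis=1)      # energy per pitch class in this beat
            for s, e in zip(beats[:-1], beats[1:]) if e > s]
    B = np.stack(cols, axis=1)              # shape (12, number of beats)
    # Scale each beat so its strongest pitch class is 1, discarding loudness
    return B / (B.max(axis=0, keepdims=True) + eps)
```

Because every beat column is rescaled afterwards, summing or averaging the frames inside a beat gives the same result.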

SLIDE 5

SI-PLCA [Smaragdis and Raj, 2007]

Shift-invariant Probabilistic Latent Component Analysis

i.e. probabilistic convolutive NMF

$V \approx \sum_k z_k \, (W_k \ast h_k^T)$

Decompose matrix V into weighted (by Z) sum of latent components

each component is the convolution of a basis pattern W_k with its activations h_k (a row of H)
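The model above can be written out per entry as $\hat V[f,t] = \sum_k z_k \sum_\tau W_k[f,\tau]\, h_k[t-\tau]$. A minimal NumPy sketch of that reconstruction, assuming (our choice, not the slides') the array layout W: (K, F, L) with K components, F chroma bins and patterns L beats long, Z: (K,), H: (K, T), and truncating the convolution at the last beat. `reconstruct` is a hypothetical name, not the authors' released code.

```python
import numpy as np

def reconstruct(W, Z, H):
    """V_hat[f, t] = sum_k Z[k] * sum_tau W[k, f, tau] * H[k, t - tau]."""
    K, F, L = W.shape
    T = H.shape[1]
    V_hat = np.zeros((F, T))
    for k in range(K):
        for tau in range(min(L, T)):
            # Column tau of pattern k lands tau beats after each activation
            V_hat[:, tau:] += Z[k] * np.outer(W[k, :, tau], H[k, :T - tau])
    return V_hat
```

A single nonzero activation at beat t copies the whole pattern W_k into beats t, t+1, ..., t+L-1, which is how short-term structure lives in W and long-term structure in H.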

Short-term structure in W, long-term structure in H
Must specify number, length of patterns
Iterative EM learning algorithm
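The slide only names the learning algorithm. Below is a minimal sketch of one EM iteration without any priors, written in the multiplicative form commonly used for PLCA-style models: the E-step posterior $z_k W_k[f,\tau] h_k[t-\tau] / \hat V[f,t]$ is folded into the ratio R = V / V_hat, and the M-step renormalizes Z, each W_k and each h_k to sum to one. Function names and array layout (W: (K, F, L), Z: (K,), H: (K, T)) are assumptions; this is not the authors' implementation.

```python
import numpy as np

def reconstruct(W, Z, H):
    """V_hat[f, t] = sum_k Z[k] * sum_tau W[k, f, tau] * H[k, t - tau]."""
    K, F, L = W.shape
    T = H.shape[1]
    V_hat = np.zeros((F, T))
    for k in range(K):
        for tau in range(min(L, T)):
            V_hat[:, tau:] += Z[k] * np.outer(W[k, :, tau], H[k, :T - tau])
    return V_hat

def em_step(V, W, Z, H, eps=1e-12):
    """One EM iteration for SI-PLCA (multiplicative form, no priors)."""
    K, F, L = W.shape
    T = V.shape[1]
    R = V / (reconstruct(W, Z, H) + eps)   # data / model ratio
    W_new, H_new, Z_new = np.empty_like(W), np.empty_like(H), np.empty_like(Z)
    for k in range(K):
        w_acc = np.zeros((F, L))
        h_acc = np.zeros(T)
        for tau in range(min(L, T)):
            # Correlate the ratio with the activations / pattern, shifted by tau
            w_acc[:, tau] = R[:, tau:] @ H[k, :T - tau]
            h_acc[:T - tau] += W[k, :, tau] @ R[:, tau:]
        w_acc *= W[k]   # expected counts for W_k (up to the constant Z[k])
        h_acc *= H[k]   # expected counts for h_k (up to the constant Z[k])
        Z_new[k] = Z[k] * w_acc.sum()
        W_new[k] = w_acc / (w_acc.sum() + eps)
        H_new[k] = h_acc / (h_acc.sum() + eps)
    Z_new /= Z_new.sum()
    return W_new, Z_new, H_new
```

Calling `em_step` repeatedly from a random positive initialization (the examples in this talk show iteration 199) should never decrease the data log-likelihood $\sum_{f,t} V[f,t] \log \hat V[f,t]$.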

SLIDE 6

Learning algorithm example – Initialization

SLIDE 7

Learning algorithm example – Converged

[Figure: V (iteration 199), reconstruction, and per-basis reconstructions for bases 0 to 3; mixing weights Z; patterns W0 to W3, each convolved (∗) with activations H0 to H3]

SLIDE 8

Sparsity

Encourage sparse (mostly zero) parameters using prior distributions
Use entropic prior over activations H [Smaragdis et al., 2008]

low entropy ⇒ less uniform

Leads to more meaningful patterns

but reduces temporal information in activations: sparse H ⇒ dense W
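To see why low entropy favors sparse activations, the sketch below scores two activation vectors under an entropic prior of the form $P(\theta) \propto e^{-\beta H(\theta)}$, where $H(\theta)$ is the Shannon entropy. It only evaluates the unnormalized log-prior; the MAP update used inside EM is not reproduced here. The helper names and the value of `beta` are illustrative assumptions.

```python
import numpy as np

def entropy(theta, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / theta.sum()
    return float(-np.sum(theta * np.log(theta + eps)))

def entropic_log_prior(theta, beta=1.0):
    """Unnormalized log entropic prior: log P(theta) = -beta * H(theta) + const."""
    return -beta * entropy(theta)

uniform = np.full(8, 1 / 8)               # activation spread evenly over 8 beats
peaked = np.array([0.93] + [0.01] * 7)    # activation concentrated on one beat
print(entropic_log_prior(uniform), entropic_log_prior(peaked))
```

The peaked vector receives the higher log-prior, so adding this term to the EM objective pushes each h_k toward a few large spikes.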

SLIDE 9

Automatic relevance determination [Tan and Févotte, 2009]

Avoid having to specify number of patterns in advance

Initialize decomposition with large number of patterns
Sparse Dirichlet distribution over mixing weights Z
Discard unused patterns

[Figure: effective rank (K) versus iteration]
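One common way to realize a sparse Dirichlet prior on Z, assumed here rather than taken from the slides, is the clipped MAP update below: with concentration α < 1, every component's expected count is reduced by 1 − α, and components whose count falls below that are set to exactly zero and can then be discarded. Both helpers are hypothetical; the counts are on the scale of V, so how strongly the prior prunes depends on that scale.

```python
import numpy as np

def sparse_z_update(counts, alpha):
    """Clipped MAP update of Z under a symmetric Dirichlet(alpha) prior.

    counts: expected number of observations explained by each component
    (the E-step statistic, i.e. Z[k] * w_acc.sum() before normalization).
    """
    z = np.maximum(counts + alpha - 1.0, 0.0)   # alpha < 1 drives small counts to 0
    return z / z.sum()

def prune(W, Z, H, tol=0.0):
    """Discard components whose mixing weight is (numerically) zero."""
    keep = Z > tol
    return W[keep], Z[keep] / Z[keep].sum(), H[keep]
```

The effective rank K plotted above is then simply the number of components that survive pruning.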

SLIDE 10

Sparse learning example – Initialization

[Figure: V (iteration 0), reconstruction, and per-basis reconstructions for bases 0 to 15; mixing weights Z; patterns W0 to W15, each convolved (∗) with activations H0 to H15]

SLIDE 11

Sparse learning example – Converged

[Figure: V (iteration 199), reconstruction, and per-basis reconstructions for the 4 remaining bases (0 to 3); mixing weights Z; patterns W0 to W3, each convolved (∗) with activations H0 to H3]

SLIDE 12

Applications: Riff identification / Thumbnailing

Reconstruct song using a single pattern

Sparse activations
Riff length known in advance (for now)
Thumbnail corresponds to largest activation in H

[Figure: two examples, each showing a single learned pattern (time in beats vs. pitch class A to G) and its activation over the song (time in beats)]
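Given a decomposition with a single pattern of known length L (in beats) and its activation vector h, the thumbnail rule above amounts to taking the L-beat excerpt that starts where h peaks, since an activation at beat t places the pattern at beats t through t+L-1. A minimal sketch; `thumbnail` is a hypothetical helper, and mapping the beat index back to seconds would need the beat times from feature extraction.

```python
import numpy as np

def thumbnail(V, h, L):
    """Return (start_beat, excerpt) for the L-beat excerpt of beat-chroma V
    starting at the largest activation in h."""
    start = int(np.argmax(h))
    # Clamp so the excerpt stays inside the song
    start = max(0, min(start, V.shape[1] - L))
    return start, V[:, start:start + L]
```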

SLIDE 13

Applications: Structure segmentation

Identify long-term song structure (verse, chorus, bridge, etc.)
Assume one-to-one mapping between chroma patterns and segments
Use SI-PLCA decomposition with longer patterns

no prior on activations
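The slides do not spell out how the decomposition becomes segment boundaries. One simple rule, assumed here, labels each beat with the component whose reconstruction carries the most energy at that beat and then merges consecutive beats with the same label into segments. Both helpers are hypothetical and use the same (K, F, L) / (K,) / (K, T) layout as the reconstruction sketch.

```python
import numpy as np

def segment_labels(W, Z, H):
    """Per-beat label = index of the component contributing most energy there."""
    K, F, L = W.shape
    T = H.shape[1]
    contrib = np.zeros((K, T))
    for k in range(K):
        col_energy = W[k].sum(axis=0)   # energy of each pattern column, shape (L,)
        for tau in range(min(L, T)):
            # Column tau of pattern k lands tau beats after each activation
            contrib[k, tau:] += Z[k] * col_energy[tau] * H[k, :T - tau]
    return contrib.argmax(axis=0)

def label_runs(labels):
    """Collapse a per-beat label sequence into (start, end, label) segments."""
    runs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((start, t, int(labels[start])))
            start = t
    return runs
```

Under the one-to-one assumption each run is one segment, and runs sharing a label are repetitions of the same section type.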

SLIDE 14

Structure segmentation example

Estimated:    intro | refrain | verse | refrain | verse | refrain | verse | refrain | refrain | outro
Ground truth: intro | refrain | verse | refrain | vs/break | refrain | verse | refrain | refrain | outro

SLIDE 15

Structure segmentation example 2

segments tend to be broken into multiple motifs

Est: verse1 | verse2 | verse1 | verse2 | refrain. | verse1 | verse2 | refrain. | verse1 | outro. | verse1 | refrain. | verse1 | outro

GT: verse verse refrain. verse refrain. 1 2 verse inst. 1 2 verse refrain. outro

SLIDE 16

Experiments

Evaluate on 180 songs from The Beatles catalog

System                     f-meas  prec  recall  over-seg  under-seg
[Mauch et al., 2009]       0.66    0.61  0.77    0.76      0.64
SI-PLCA (sparse Z)         0.60    0.58  0.68    0.61      0.56
SI-PLCA (rank=4)           0.58    0.60  0.59    0.56      0.59
[Levy and Sandler, 2008]   0.54    0.58  0.53    0.50      0.57
Random                     0.30    0.36  0.26    0.07      0.24
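The slide does not define its metrics. Assuming the pairwise frame-clustering precision and recall commonly used for structural segmentation (a pair of beats is a hit when both the estimate and the ground truth put them in the same segment type), a per-song score could be computed as below. `pairwise_prf` is a hypothetical helper; the over- and under-segmentation scores are not sketched.

```python
import numpy as np

def pairwise_prf(est, ref):
    """Pairwise precision, recall and F-measure between two per-beat labelings."""
    est, ref = np.asarray(est), np.asarray(ref)
    iu = np.triu_indices(len(est), k=1)          # each unordered pair of beats once
    same_est = (est[:, None] == est[None, :])[iu]
    same_ref = (ref[:, None] == ref[None, :])[iu]
    hits = np.sum(same_est & same_ref)
    precision = hits / max(int(same_est.sum()), 1)
    recall = hits / max(int(same_ref.sum()), 1)
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return float(precision), float(recall), float(f)
```

Only co-membership matters, so the metric is invariant to how segment labels are named: an estimate that splits one ground-truth section into several motifs loses recall but keeps full precision.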

Compare to systems based on self-similarity and HMM clustering

Middle-of-the-pack performance; sparse Z gives ∼10% improvement in recall over fixed rank (0.68 vs. 0.59)

Needs better post-processing?

SLIDE 17

Summary

Novel algorithm for identifying repeated harmonic patterns in music
Use sparsity to minimize number of fixed parameters, control structure
Applications to thumbnailing and structure segmentation
Future work:
  • adaptive model of pattern length, better downbeat alignment
  • 2D convolution to compensate for key changes
  • time-warp invariance (beat-tracking errors, fixed hop size)

Open source Python/Matlab implementation available: http://ronw.github.com/siplca-segmentation

SLIDE 18

References

Ellis, D. and Poliner, G. (2007). Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. In Proc. ICASSP, pages IV–1429–1432.

Levy, M. and Sandler, M. (2008). Structural Segmentation of Musical Audio by Constrained Clustering. IEEE Trans. Audio, Speech, and Language Processing, 16(2).

Mauch, M., Noland, K. C., and Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In Proc. ISMIR, pages 231–236.

Smaragdis, P. and Raj, B. (2007). Shift-Invariant Probabilistic Latent Component Analysis. Technical Report TR2007-009, MERL.

Smaragdis, P., Raj, B., and Shashanka, M. (2008). Sparse and shift-invariant feature extraction from non-negative data. In Proc. ICASSP, pages 2069–2072.

Tan, V. and Févotte, C. (2009). Automatic Relevance Determination in Nonnegative Matrix Factorization. In Proc. SPARS.