Applying Text-Based IR Techniques to Cover Song Identification

SLIDE 1

Overview Methodology Implementation Experimental results Rhythm Conclusion - Future

Applying Text-Based IR Techniques to Cover Song Identification

Nicola Montecchio

nicola.montecchio@dei.unipd.it

Department of Information Engineering, University of Padova

IRCAM, September 29th, 2010
joint work with Emanuele Di Buccio and Nicola Orio - University of Padova

SLIDE 2

Introduction

Characterization of the problem

Content-based music identification in a Query By Example paradigm: retrieving music pieces that are relevant w.r.t. a musical query, given as an audio recording, without using any metadata information.

In this case, relevant = cover song: a rendition of a previously recorded song in genres such as rock and pop. Cover songs can be either live or studio recordings, possibly by other musicians, and may have a completely different arrangement.

Useful for: intellectual property rights management, recommendation systems, ...

SLIDE 3

An example

Sweet Home Alabama
  • reference – Lynyrd Skynyrd
  • live – Lynyrd Skynyrd
  • cover – The Outlaws

SLIDE 4

An example

Sweet Home Alabama
  • reference – Lynyrd Skynyrd
  • live – Lynyrd Skynyrd
  • cover – The Outlaws
  • cover – Jewel [in a different key]

SLIDE 5

An example

Sweet Home Alabama
  • reference – Lynyrd Skynyrd
  • live – Lynyrd Skynyrd
  • cover – The Outlaws
  • cover – Jewel [in a different key]
  • live – Lynyrd Skynyrd [in another different key]
  • reference with added noise

SLIDE 6

Related work

SLIDE 7

Why another approach?

Motivation:
  • some of the existing methods yield a very high identification accuracy (e.g., Serrà, Zanin, Andrzejak at MIREX 2009) but are computationally intensive;
  • we propose a fast approach for selecting a small set of candidate matches, on which accuracy can be refined using slower techniques;
  • we adapt techniques from text-based Information Retrieval to the music domain, in order to achieve speed.

SLIDE 8

Overview of the system

SLIDE 9

Assumptions

  • a song is represented as a sequence of excerpts, and the order of the excerpts is not relevant
  • each excerpt is represented as a sequence of chroma features, and again the order of chroma features is not taken into account

A song is thus represented in a bag-of-bag-of-words fashion.

While ordering information is not considered, temporal information is not completely discarded, as it is loosely preserved by the grouping of chroma features into excerpts.

SLIDE 10

Chroma features

The perceived quality of a chord depends only partially on the octaves in which the individual notes are played; what seems to be relevant is the pitch class of the notes that form the chord.

Extraction steps:
  • windowing (46ms)
  • spectral processing
  • frequency axis “folding”

1 minute → 1292 chroma features
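
The folding step can be illustrated with a small sketch that maps each spectral bin to one of the 12 pitch classes and sums the magnitudes. The function names, the 440 Hz tuning reference and the frequency limits are assumptions made for this example, not details taken from the slides. The last lines check the frame count under the further assumption of non-overlapping 2048-sample windows at 44.1 kHz (about 46 ms each).

```python
import math

def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C), ignoring the octave."""
    midi = 69 + 12 * math.log2(freq_hz / ref_a4)  # fractional MIDI note number
    return int(round(midi)) % 12

def fold_spectrum(magnitudes, sample_rate, fmin=55.0, fmax=5000.0):
    """Fold a magnitude spectrum (bins 0..N/2) into a 12-bin chroma vector."""
    n_fft = 2 * (len(magnitudes) - 1)
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / n_fft
        if fmin <= freq <= fmax:  # assumed limits; skips DC and very high bins
            chroma[pitch_class(freq)] += mag
    return chroma

# Assumed framing: 2048-sample windows at 44.1 kHz, no overlap.
window_s = 2048 / 44100
frames_per_minute = round(60 / window_s)
```

Under these assumptions `frames_per_minute` comes out at 1292, which is consistent with the figure quoted on the slide.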

SLIDE 11

Quantization – excerpt similarity

Hashing of chroma vectors by rank-based quantization:
  • Chroma vector: $c = (c_1, \dots, c_{12})$
  • Rank vector: $r = (r_1, \dots, r_{12})$, where $r_k$ is the index of the $k$-th largest value in $c$
  • Hash: combines the ranks $r_k$ for $k = 1, \dots, K$

The similarity of two excerpts $q_i$, $d_j$ is measured by counting (with repetitions) the number of hashes they have in common:

$$\mathrm{sim}(q_i, d_j) = |q_i \cap d_j|$$
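
A minimal sketch of rank-based hashing and of the excerpt similarity follows. The way the K ranks are packed into one integer (base-12 digits), the default `k=3` and the tie-breaking rule are all assumptions chosen for illustration; the slide does not specify them. The similarity is the size of the multiset intersection, which is one reading of "counting with repetitions".

```python
from collections import Counter

def rank_hash(chroma, k=3):
    """Hash a 12-bin chroma vector from the indices of its k largest bins.

    The base-12 packing below is an assumption made for illustration.
    """
    # Bin indices sorted by decreasing energy; ties broken by lower index.
    ranks = sorted(range(len(chroma)), key=lambda i: (-chroma[i], i))
    h = 0
    for r in ranks[:k]:
        h = h * 12 + r
    return h

def excerpt_similarity(q_hashes, d_hashes):
    """Count common hashes with repetitions (multiset intersection size)."""
    common = Counter(q_hashes) & Counter(d_hashes)
    return sum(common.values())
```

Because only the ordering of the bins matters, two frames with the same dominant pitch classes get the same hash even if their absolute energies differ.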

SLIDE 12

Segmentation – song similarity

A song is composed of overlapping excerpts of about 15s.

The similarity score $s_{q,d}$ for a query–document pair $(q, d)$, with $q = (q_1, \dots, q_{N_q})$ and $d = (d_1, \dots, d_{N_d})$, is computed as:

$$s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \dots N_d} \mathrm{sim}(q_i, d_j)}$$

where $\mathrm{sim}(q_i, d_j)$ is the local similarity of excerpts $q_i$, $d_j$.
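
The score can be sketched directly from its definition: for each query excerpt take the best-matching document excerpt, then take the geometric mean of those maxima. Excerpts are represented here as plain lists of integer hashes, and the log-domain averaging is an implementation choice of this sketch rather than something stated on the slides.

```python
import math
from collections import Counter

def excerpt_similarity(q, d):
    """Number of hashes two excerpts share, counted with repetitions."""
    return sum((Counter(q) & Counter(d)).values())

def song_similarity(query_excerpts, doc_excerpts):
    """Geometric mean, over query excerpts, of the best match in the document."""
    best = [max(excerpt_similarity(q, d) for d in doc_excerpts)
            for q in query_excerpts]
    if any(b == 0 for b in best):
        return 0.0  # a single zero factor makes the whole product zero
    # Averaging logarithms is equivalent to the N_q-th root of the product.
    return math.exp(sum(math.log(b) for b in best) / len(best))
```

One consequence of the geometric mean is that a query excerpt with no match at all drives the score to zero, whereas an arithmetic mean would only lower it.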

SLIDE 13

Matching songs in different keys

  • Cover versions of a song are often performed in a different key.
  • A brute-force approach consists of trying all 12 possible rotations of the chroma vectors and keeping the best match among the transposed versions.
  • Alternatively, the most likely key(s) can be estimated, and only a subset of transposed matches is computed (in our case, 3).
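
Transposition by a number of semitones corresponds to a circular shift of the 12-bin chroma vector, so the brute-force search can be sketched as below. `score_fn` is a hypothetical callable standing in for the whole matching pipeline; passing a shorter `shifts` list mimics the variant that only tries a few estimated keys (how those keys are estimated is not covered here).

```python
def rotate(chroma, shift):
    """Transpose a chroma vector up by `shift` semitones (circular shift)."""
    shift %= len(chroma)
    return chroma[-shift:] + chroma[:-shift] if shift else list(chroma)

def best_transposition(query_frames, score_fn, shifts=range(12)):
    """Score the query under each candidate transposition and keep the best.

    score_fn is a hypothetical callable mapping a list of chroma vectors
    to a similarity score against one document. Returns (score, shift).
    """
    scored = ((score_fn([rotate(c, s) for c in query_frames]), s)
              for s in shifts)
    return max(scored)
```

Trying all 12 shifts multiplies the matching cost by 12, which is the motivation for restricting the search to a few likely keys.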

SLIDE 14

Similarity computation

Algorithmic formulation:

$$s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \dots N_d} \mathrm{sim}(q_i, d_j)}$$

    for all songs in the collection do
        for all excerpts of the query do
            for all excerpts of the song do
                compute similarity
            end for
            retain max score among song excerpts
        end for
        compute geometric mean among scores
    end for

SLIDE 15

Similarity computation

Algorithmic formulation:

$$s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \dots N_d} \mathrm{sim}(q_i, d_j)}$$

    for all songs in the collection do
        for all excerpts of the query do
            for all excerpts of the song do
                compute similarity
            end for
            retain max among song excerpts
        end for
        compute geometric mean among scores
    end for

Actual implementation:

    for all excerpts of the query do
        for all distinct hashes of the excerpt do
            find excerpts of any song that have such hash
            for all found excerpts do
                accumulate partial scores
            end for
        end for
        retain max among song excerpts (group by song)
    end for
    compute geometric mean among scores (group by song)
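
The index-driven loop can be sketched with an in-memory inverted index from hash to postings. This is a plain-Python illustration of the loop structure only: it does not use Lucene, and the data layout (`{song_id: [excerpt, ...]}` with excerpts as lists of integer hashes) and all function names are assumptions of the example.

```python
import math
from collections import Counter, defaultdict

def build_index(collection):
    """collection: {song_id: [excerpt, ...]}, each excerpt a list of hashes.

    Returns {hash: [(song_id, excerpt_no, count), ...]} (postings lists).
    """
    index = defaultdict(list)
    for song_id, excerpts in collection.items():
        for j, excerpt in enumerate(excerpts):
            for h, count in Counter(excerpt).items():
                index[h].append((song_id, j, count))
    return index

def rank_songs(query_excerpts, index):
    """Rank songs by the geometric mean of per-excerpt best scores."""
    log_sums = defaultdict(float)  # song_id -> sum of log(best score)
    hits = Counter()               # song_id -> query excerpts matched
    for q in query_excerpts:
        partial = Counter()        # (song_id, excerpt_no) -> accumulated score
        for h, q_count in Counter(q).items():
            for song_id, j, d_count in index.get(h, ()):
                partial[(song_id, j)] += min(q_count, d_count)
        best = {}                  # song_id -> max over its excerpts
        for (song_id, _), score in partial.items():
            best[song_id] = max(best.get(song_id, 0), score)
        for song_id, score in best.items():
            log_sums[song_id] += math.log(score)
            hits[song_id] += 1
    n = len(query_excerpts)
    # A song missing any query excerpt has a zero factor, hence score 0.
    scores = {s: math.exp(log_sums[s] / n) for s in hits if hits[s] == n}
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Only excerpts that share at least one hash with the query are ever touched, which is where the speed-up over the nested-loop formulation comes from.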

SLIDE 16

Optimization

  • caching helps reduce the time spent on score accumulation
  • the computational load is mostly due to index access:
        for all distinct hashes of the excerpt do
  • solution: consider only a subset of the hashes:
        for some distinct hashes of the excerpt do

Pruning algorithm
  • based on simple, precomputed collection-wise statistics for each hash
  • trained by randomized hill climbing
  • objective function which privileges speed while maintaining sufficient accuracy

Results:

[Figure: MRR as a function of the fraction of pruned hashes]

SLIDE 17

FALCON

FALCON is an open source, pure Java implementation of the proposed approach, based on the popular Apache Lucene search engine library. Full source code, along with binary distribution and a test dataset, is available at: http://ims.dei.unipd.it/falcon

(a demo will follow ...)

SLIDE 18

Test collection

Base collection:
  • 500 pop songs in the database
  • 70 corresponding queries (with a single match)
  • 20 queries are played in a different key from their counterpart
  • personal collection of the authors, a “real” usage scenario

Extension of the collection to 10000 songs

SLIDE 19

Evaluation measures

The output of our system for a query is a rank list, i.e., a list of possible responses ordered by probability of correctness. We evaluate our system with N queries.

SLIDE 20

Evaluation measures

The output of our system for a query is a rank list, i.e., a list of possible responses ordered by probability of correctness. We evaluate our system with N queries.

MRR - Mean Reciprocal Rank
  • assumption: exactly one relevant document for each query
  • $r_n$ = rank of the relevant doc. for query $n$

$$\mathrm{MRR} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{r_n}$$

SLIDE 21

Evaluation measures

The output of our system for a query is a rank list, i.e., a list of possible responses ordered by probability of correctness. We evaluate our system with N queries.

MRR - Mean Reciprocal Rank
  • assumption: exactly one relevant document for each query
  • $r_n$ = rank of the relevant doc. for query $n$

$$\mathrm{MRR} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{r_n}$$

Precision: fraction of the documents retrieved that are relevant

MAP - Mean Average Precision
  • Average Precision for a query is computed as the average of the precision values at each of the relevant documents in the ranked sequence
  • let $r(j) = \mathbb{1}[\text{$j$-th doc. is relevant}]$

$$\mathrm{MAP} = \frac{1}{N} \sum_{n=1}^{N} \frac{\sum_j P(j)\, r(j)}{\sum_j r(j)}$$
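
Both measures are short enough to write out. In this sketch a query's result is given either as the 1-based rank of its single relevant document (for MRR) or as a list of 0/1 relevance flags in ranked order (for MAP); these input conventions and the function names are choices made for the example.

```python
def mean_reciprocal_rank(ranks):
    """ranks[n] = 1-based rank of the single relevant document for query n."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def average_precision(relevance):
    """relevance[j-1] is 1 if the document at rank j is relevant, else 0."""
    hits, total = 0, 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / j  # P(j): precision of the top j documents
    return total / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return (sum(average_precision(r) for r in relevance_lists)
            / len(relevance_lists))
```

When a query has exactly one relevant document, its average precision reduces to the reciprocal of that document's rank.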

SLIDE 22

Accuracy

number of songs    MRR−    MRR+    MAP
500                .615    .615    .615
1000               .545    .552    .550
2500               .504    .516    .493
10000              .385    .411    .323

SLIDE 23

Accuracy - configuration selection

Each cell lists MRR−, MRR+, MAP.

length / overlap   0%                 25%                50%                75%
11.61s             .299, .345, .262   .333, .368, .283   .338, .372, .298   .325, .360, .283
14.51s             .385, .411, .323   .377, .404, .319   .391, .417, .330   .397, .423, .330
17.41s             .313, .346, .265   .334, .370, .290   .344, .382, .302   .346, .384, .306
20.32s             .342, .381, .293   .353, .391, .303   .368, .407, .316   .371, .406, .321
23.22s             .311, .354, .264   .329, .371, .281   .332, .370, .288   .333, .372, .288

Longer excerpt duration + less overlap = higher speed. Small differences in accuracy do not justify higher computing times.

SLIDE 24

Efficiency

number of songs   index size   indexing time   query time
500               36MB         33s             2.75s
1000              70MB         71s             4.79s
2500              158MB        169s            11.25s
10000             645MB        1124s           44.20s

Table: Efficiency results - 3 GHz CPU (1 core used), 3 GB RAM, 7200 RPM disk

SLIDE 25

Demo

SLIDE 26

Rhythmic facet

Preliminary work shows that the rhythmic facet can be successfully employed to increase the accuracy of the system with very low computational load.

Extraction steps:
  • Windowing (6s)
  • FFT processing: spectrogram
  • Psychoacoustic transformations: Bark scale frequency warping (24 critical bands), loudness eq. (Phon, Sone)
  • FFT processing: rhythmic structure of each band
  • Histogram of rhythmic energy per modulation frequency
  • Median of all the 6s features

1 minute/excerpt/song → 1 RH feature

Similarity computation: cosine similarity (dot product)
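
The last two steps can be sketched as follows: the per-window histograms are reduced to one feature by a component-wise median, and two such features are compared by the cosine of the angle between them. The histograms are treated as plain lists of numbers; the psychoacoustic processing that produces them is not reproduced here, and the function names are assumptions.

```python
import math
from statistics import median

def median_feature(frames):
    """Component-wise median of per-window rhythm histograms."""
    return [median(column) for column in zip(*frames)]

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For histograms with non-negative entries the result lies between 0 and 1, with 1 meaning the two histograms have the same shape.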

SLIDE 27

Rhythmic facet - early results

Testing on the base collection, only with the query songs played in the same key as their counterpart:
  • Chroma features: MRR = 0.784
  • Rhythm Histogram features: MRR = 0.340; MRR = 0.285 (without segmentation)

SLIDE 28

Rhythmic facet - early results

[Figure: rhythm histogram similarity plotted against chroma similarity, one panel per song: Smoke On The Water, Sweet Child o’ Mine, Sweet Home Alabama, You Shook Me All Night Long, All Along The Watchtower]

[Figure: MAP for linear combination and weighted product]

SLIDE 29

Rhythmic facet - early results

[Figure: rhythm histogram similarity plotted against chroma similarity, one panel per song: Smoke On The Water, Sweet Child o’ Mine, Sweet Home Alabama, You Shook Me All Night Long, All Along The Watchtower]

Linear combination of chroma and rhythm histogram features:

[Figure: MAP as a function of α, for linear combination and weighted product]

For the optimal λ value, MRR = 0.820
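
The slides name two ways of fusing the chroma and rhythm scores but do not give their formulas. The sketch below shows one common form of each, with a single mixing weight in [0, 1]; the exact expressions, the direction of the weight and the assumption that both scores are on comparable scales are all hypothetical.

```python
def linear_combination(chroma_sim, rhythm_sim, weight):
    """Weighted sum; weight = 1 uses only chroma (assumed convention)."""
    return weight * chroma_sim + (1.0 - weight) * rhythm_sim

def weighted_product(chroma_sim, rhythm_sim, weight):
    """Weighted geometric combination with the same weight convention."""
    return (chroma_sim ** weight) * (rhythm_sim ** (1.0 - weight))
```

In practice the weight would be chosen by sweeping it over [0, 1] and keeping the value that maximizes the evaluation measure on held-out queries.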

SLIDE 30

Conclusion

A system for cover song identification was presented which:
  • can be successfully used to retrieve a small number of candidate cover songs from a large collection
  • can match songs played in different keys
  • is scalable w.r.t. collection size: the small disk space required allows the index to be loaded directly in memory, and the matching algorithm is embarrassingly parallel

SLIDE 31

Open issues

Some songs are very difficult to identify using the presented techniques:
  • extensive differences from the original

SLIDE 32

Open issues

Some songs are very difficult to identify using the presented techniques:
  • extensive differences from the original
  • static harmonic aspect

SLIDE 33

Open issues

Some songs are very difficult to identify using the presented techniques:
  • extensive differences from the original
  • static harmonic aspect
  • a whole genre can be difficult to handle using harmonic content-based retrieval!

SLIDE 34

Future research directions

among others:
  • index pruning

SLIDE 35

Future research directions

among others:
  • index pruning
  • melodic aspect

SLIDE 36

Future research directions

among others:
  • index pruning
  • melodic aspect
  • lyrics

SLIDE 37

THANK YOU! Questions?