Applying Text-Based IR Techniques to Cover Song Identification

Overview Methodology Implementation Experimental results Rhythm Conclusion - Future
Applying Text-Based IR Techniques to Cover Song Identification
Nicola Montecchio (nicola.montecchio@dei.unipd.it)
Department of Information Engineering, University of Padova
IRCAM, September 29th, 2010
Joint work with Emanuele Di Buccio and Nicola Orio, University of Padova

Introduction
Characterization of the problem
Content-based music identification in a Query By Example paradigm: retrieving music pieces that are relevant w.r.t. a musical query, given as an audio recording, without using any metadata information.
In this case, relevant = cover song: a rendition of a previously recorded song in genres such as rock and pop. Cover songs can be either live or studio recordings, possibly by other musicians, and may have a completely different arrangement.
Useful for: intellectual property rights management, recommendation systems, ...

An example: Sweet Home Alabama
reference - Lynyrd Skynyrd
live - Lynyrd Skynyrd
cover - The Outlaws
cover - Jewel [in a different key]
live - Lynyrd Skynyrd [in another different key]
reference with added noise

Related work

Why another approach?
Motivation:
some of the existing methods yield a very high identification accuracy (e.g., Serrà, Zanin, Andrzejak at MIREX 2009) but are computationally intensive;
we propose a fast approach for selecting a small set of candidate matches, on which accuracy can be refined using slower techniques.
We adapt techniques from text-based Information Retrieval to the music domain, in order to achieve speed.

Overview of the system

Assumptions
a song is represented as a sequence of excerpts, and the order of the excerpts is not relevant;
each excerpt is represented as a sequence of chroma features, and again the order of chroma features is not taken into account.
A song is thus represented in a bag-of-bag-of-words fashion.
While ordering information is not considered, temporal information is not completely discarded, as it is loosely preserved by the grouping of chroma features into excerpts.

Chroma features
The perceived quality of a chord depends only partially on the octaves in which the individual notes are played; what seems to be relevant is the pitch class of the notes that form the chord.
Extraction steps:
windowing (46 ms)
spectral processing
frequency axis "folding"
1 minute → 1292 chroma features
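The folding step and the frame count can be illustrated with a minimal sketch. It is not the extraction code used in the talk: the 44.1 kHz sampling rate, the non-overlapping 2048-sample window (about 46 ms), the A4 = 440 Hz reference, and all function names are assumptions made for illustration.

```python
import math

SAMPLE_RATE = 44100   # assumed sampling rate in Hz
WINDOW_SIZE = 2048    # assumed window length in samples (about 46 ms)

def frames_per_minute(sample_rate=SAMPLE_RATE, window_size=WINDOW_SIZE):
    """Number of non-overlapping windows in 60 seconds of audio."""
    return round(60 * sample_rate / window_size)

def pitch_class(freq_hz, ref_hz=440.0):
    """Map a frequency to one of 12 pitch classes (0 = C, ..., 9 = A)."""
    midi = round(69 + 12 * math.log2(freq_hz / ref_hz))
    return midi % 12

def fold_spectrum(magnitudes, sample_rate=SAMPLE_RATE, window_size=WINDOW_SIZE):
    """Fold a magnitude spectrum (one value per FFT bin) into 12 chroma bins."""
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # the DC bin has no pitch
        freq = k * sample_rate / window_size
        chroma[pitch_class(freq)] += mag  # all octaves of a note share one bin
    return chroma
```

Under these assumptions, one minute of audio yields 1292 frames, which agrees with the figure quoted on the slide.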

Quantization - excerpt similarity
Hashing of chroma vectors by rank-based quantization:
Chroma vector: $c = (c_1, \ldots, c_{12})$
Rank vector: $r = (r_1, \ldots, r_{12})$, where $r_k$ is the index of the $k$-th largest value in $c$
Hash: $\sum_{k=1}^{K} r_k$
The similarity of two excerpts $q_i$, $d_j$ is measured by counting (with repetitions) the number of hashes they have in common:
$\mathrm{sim}(q_i, d_j) = |q_i \cap d_j|$
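A minimal sketch of rank-based hashing and of the multiset intersection used as excerpt similarity. The value K = 3 and the base-12 positional weighting of the ranks are assumptions: the hash formula on the slide shows only a sum over the first K ranks, and the weighting is added here so that different rank orders map to different hashes. Function names are hypothetical.

```python
from collections import Counter

def rank_hash(chroma, K=3, base=12):
    """Hash a chroma vector from the indices of its K largest bins."""
    # ranks[k] is the index of the (k+1)-th largest value in the vector
    ranks = sorted(range(len(chroma)), key=lambda i: chroma[i], reverse=True)
    # ASSUMPTION: positional (base-12) weighting, not confirmed by the slide
    return sum(r * base ** k for k, r in enumerate(ranks[:K]))

def excerpt_similarity(q_hashes, d_hashes):
    """Count the hashes two excerpts share, with repetitions."""
    common = Counter(q_hashes) & Counter(d_hashes)  # multiset intersection
    return sum(common.values())
```

Treating each excerpt as a multiset of hashes is what allows the same machinery as term matching in text retrieval to be reused.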

Segmentation - song similarity
A song is composed of overlapping excerpts of about 15 s.
The similarity score $s_{q,d}$ for a query-document pair $(q, d)$, with $q = (q_1, \ldots, q_{N_q})$ and $d = (d_1, \ldots, d_{N_d})$, is computed as:
$s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1,\ldots,N_d} \mathrm{sim}(q_i, d_j)}$
where $\mathrm{sim}(q_i, d_j)$ is the local similarity of excerpts $q_i$, $d_j$.
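The score for a single query-document pair can be sketched directly from the formula: for every query excerpt take the best-matching document excerpt, then take the geometric mean of those maxima. The function names and the representation of an excerpt as a list of integer hashes are assumptions for illustration.

```python
import math
from collections import Counter

def sim(q_excerpt, d_excerpt):
    """Local similarity: number of shared hashes, counted with repetitions."""
    return sum((Counter(q_excerpt) & Counter(d_excerpt)).values())

def song_similarity(query_excerpts, doc_excerpts):
    """Geometric mean over query excerpts of the best local similarity."""
    if not query_excerpts or not doc_excerpts:
        return 0.0
    best = [max(sim(q, d) for d in doc_excerpts) for q in query_excerpts]
    if min(best) == 0:
        return 0.0  # a single unmatched excerpt zeroes the geometric mean
    # compute the mean in the log domain to avoid overflow of the product
    return math.exp(sum(math.log(b) for b in best) / len(best))
```

Because the mean is geometric, a query excerpt with no match at all drives the whole score to zero, which penalizes documents that match only part of the query.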

Matching songs in different keys
Cover versions of a song are often performed in a different key.
A brute-force approach consists of trying all 12 possible rotations of the chroma vectors and keeping the best match among the transposed versions.
Alternatively, the most likely key(s) can be estimated, and only a subset of transposed matches is computed (in our case, 3).
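Transposition amounts to a circular rotation of each 12-bin chroma vector, so the brute-force strategy can be sketched as below. The `score` callable stands in for whatever matching function is used and is hypothetical, as is the `shifts` parameter, which could be restricted to a few candidate transpositions supplied by a key estimator (not shown).

```python
def rotate(chroma, shift):
    """Transpose a chroma vector up by `shift` semitones (circular rotation)."""
    c = list(chroma)
    shift %= 12
    return c[-shift:] + c[:-shift] if shift else c

def best_transposition(query, score, shifts=range(12)):
    """Try each shift on every chroma vector of the query.

    Returns (best_score, best_shift); ties go to the larger shift.
    """
    return max((score([rotate(c, s) for c in query]), s) for s in shifts)
```

Passing, for example, `shifts=[0, 5, 7]` evaluates only three transpositions instead of twelve; the particular values here are placeholders, not the ones used by the authors.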

Similarity computation
Algorithmic formulation:
$s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1,\ldots,N_d} \mathrm{sim}(q_i, d_j)}$
for all songs in the collection do
    for all excerpts of the query do
        for all excerpts of the song do
            compute similarity
        end for
        retain max score among song excerpts
    end for
    compute geometric mean among scores
end for

Similarity computation
Actual implementation:
for all excerpts of the query do
    for all distinct hashes of the excerpt do
        find excerpts of any song that have such hash
        for all found excerpts do
            accumulate partial scores
        end for
    end for
    retain max among song excerpts (group by song)
end for
compute geometric mean among scores (group by song)
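The index-driven loop can be sketched with a plain in-memory inverted index. This is only an illustration of the control flow, not the Lucene-based implementation: the index layout, the function names, and the choice to drop songs that fail to match some query excerpt (their geometric mean would be zero) are assumptions.

```python
import math
from collections import Counter, defaultdict

def build_index(collection):
    """collection: {song_id: [excerpt, ...]}, each excerpt a list of hashes.

    Returns {hash: [(song_id, excerpt_no, count), ...]}.
    """
    index = defaultdict(list)
    for song_id, excerpts in collection.items():
        for n, excerpt in enumerate(excerpts):
            for h, count in Counter(excerpt).items():
                index[h].append((song_id, n, count))
    return index

def rank_songs(query_excerpts, index):
    """Return [(song_id, score), ...] sorted by decreasing score."""
    log_sums = defaultdict(float)  # per song: sum of log(best excerpt score)
    hits = Counter()               # per song: query excerpts with a match
    for q in query_excerpts:
        partial = Counter()        # (song, excerpt_no) -> accumulated score
        for h, q_count in Counter(q).items():
            for song_id, n, d_count in index.get(h, ()):
                partial[(song_id, n)] += min(q_count, d_count)
        best = {}                  # retain max among song excerpts, by song
        for (song_id, _), score in partial.items():
            best[song_id] = max(best.get(song_id, 0), score)
        for song_id, score in best.items():
            log_sums[song_id] += math.log(score)
            hits[song_id] += 1
    n_q = len(query_excerpts)
    # geometric mean per song; songs missing any query excerpt score zero
    scores = {s: math.exp(log_sums[s] / n_q) for s in hits if hits[s] == n_q}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Only excerpts that share at least one hash with the query are ever touched, which is where the speed-up over the nested loops of the algorithmic formulation comes from.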

Optimization
caching helps reduce the time spent on score accumulation
the computational load is mostly due to index access:
    for all distinct hashes of the excerpt do
solution: consider only a subset of the hashes:
    for some distinct hashes of the excerpt do
Pruning algorithm:
based on simple, precomputed collection-wise statistics for each hash
trained by randomized hill climbing
objective function which favors speed while maintaining sufficient accuracy
[Figure: results, showing the fraction of pruned hashes and MRR; axis values not recoverable]
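The slide does not say which collection-wise statistics drive the pruning. As one possible reading, the sketch below drops hashes that occur in too many excerpts of the collection, in the spirit of stop-word removal in text retrieval. The document-frequency criterion, the threshold, and the function name are all assumptions; in the talk the pruning parameters are trained by randomized hill climbing, which is not reproduced here.

```python
def select_hashes(excerpt_hashes, doc_freq, max_doc_freq):
    """Keep the distinct hashes whose collection frequency is low enough.

    doc_freq: {hash: number of indexed excerpts containing it} (assumed
    statistic); max_doc_freq: hypothetical threshold to be tuned.
    """
    return sorted(h for h in set(excerpt_hashes)
                  if doc_freq.get(h, 0) <= max_doc_freq)
```

Very frequent hashes produce long posting lists and discriminate poorly between songs, so skipping them reduces index access at a limited cost in accuracy.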

FALCON
FALCON is an open source, pure Java implementation of the proposed approach, based on the popular Apache Lucene search engine library.
Full source code, along with a binary distribution and a test dataset, is available at: http://ims.dei.unipd.it/falcon
(a demo will follow ...)

Test collection
Base collection:
500 pop songs in the database
70 corresponding queries (with a single match)
20 queries are played in a different key from their counterpart
personal collection of the authors, a "real" usage scenario
Extension of the collection to 10000 songs

Evaluation measures
The output of our system for a query is a rank list, i.e., a list of possible responses ordered by probability of correctness. We evaluate our system with N queries.
MRR - Mean Reciprocal Rank
assumption: exactly one relevant document for each query
$\mathrm{MRR} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{r_n}$
where $r_n$ is the rank of the relevant document for query $n$.
Precision: fraction of the documents retrieved that are relevant
MAP - Mean Average Precision
Average Precision for a query is computed as the average of the precision values at each of the relevant documents in the ranked sequence:
$\mathrm{MAP} = \frac{1}{N} \sum_{n=1}^{N} \frac{\sum_j P(j)\, r(j)}{\sum_j r(j)}$
where $r(j) = 1$ if the $j$-th document is relevant and $0$ otherwise, and $P(j)$ is the precision over the first $j$ documents.
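Both measures are short to compute. The sketch below follows the definitions above; the function names and the input conventions (a list of 1-based ranks for MRR, one list of 0/1 relevance flags per query for MAP) are assumptions.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the single relevant document, one per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def average_precision(relevance):
    """relevance: 0/1 flags of the ranked list, best-ranked document first."""
    hits = 0
    total = 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / j  # precision over the first j documents
    return total / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)
```

When every query has exactly one relevant document, the average precision of a query reduces to the reciprocal rank of that document, so MAP and MRR coincide.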

Accuracy
number of songs   MRR−   MRR+   MAP
500               .615   .615   .615
1000              .545   .552   .550
2500              .504   .516   .493
10000             .385   .411   .323
