
Extracting and Using Music Audio Information
Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio (LabROSA)
Dept. of Electrical Engineering, Columbia University, NY, USA
http://labrosa.ee.columbia.edu/


  1. Extracting and Using Music Audio Information
Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio
Dept. of Electrical Engineering, Columbia University, NY, USA
http://labrosa.ee.columbia.edu/
Outline: 1. Motivation: Music Collections • 2. Music Information • 3. Music Similarity • 4. Music Structure Discovery
Music Audio Information - Ellis 2007-11-02

  2. LabROSA Overview
[Diagram: information extraction across Speech, Music, and Environment audio, spanning recognition, separation, and retrieval, built on signal processing and machine learning]

  3. 1. Managing Music Collections
• A lot of music data is available: e.g. 60 GB of MP3 ≈ 1000 hr of audio, or 15k tracks
• Management challenge: how can computers help?
• Application scenarios: personal music collections, discovering new music, “music placement”

  4. Learning from Music
• What can we infer from 1000 h of music? common patterns: sounds, melodies, chords, form; what is and what isn’t music
• Data-driven musicology?
• Applications: modeling/description/coding, computer-generated music, curiosity...
[Figure: scatter of PCA components 3:6 of 12x16 beat-chroma patches]

  5. The Big Picture
[Diagram: music audio → low-level features → melody and notes, key and chords, tempo and beat → music structure discovery (work so far), feeding applications in browsing, discovery, production, classification, similarity, and music modeling/generation, plus curiosity]

  6. 2. Music Information
• How to represent music audio?
• Audio features: spectrogram, MFCCs, bases
• Musical elements: notes, beats, chords, phrases; requires transcription
• Or something in between, optimized for a certain task?
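The spectrogram feature mentioned above can be sketched with a plain-numpy STFT; the frame length and hop size here are illustrative choices, not the settings used in the talk:

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed STFT (numpy only)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    # rows = frequency bins, columns = time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

A pure tone shows up as a single bright row: with a 1 kHz sine at an 8 kHz sampling rate and n_fft=512, the energy lands in bin 64 (= 1000 / (8000/512)).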

  7. Transcription as Classification (Poliner & Ellis ’05, ’06, ’07)
• Exchange signal models for data: treat transcription as a pure classification problem (feature representation → feature vector → classification → posteriors → HMM smoothing)
• Training data and features: MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 minutes of training audio); normalized magnitude STFT
• Classification: N binary SVMs (one for each note); independent frame-level classification on a 10 ms grid; distance to the class boundary used as a posterior
• Temporal smoothing: a two-state (on/off) independent HMM for each note, with parameters learned from the training data; find the Viterbi sequence for each note
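The smoothing step can be sketched for one note's posterior track; `p_self` is an assumed self-transition probability (the slide says the real parameters were learned from training data), and the posteriors would come from that note's SVM:

```python
import numpy as np

def smooth_posteriors(post, p_self=0.9):
    """Viterbi decoding of a two-state (off/on) HMM over frame posteriors.

    post: per-frame P(note on); returns 0/1 per frame after smoothing."""
    eps = 1e-10
    n = len(post)
    logB = np.log(np.vstack([1 - post, post]).T.clip(eps, 1.0))  # (n, 2)
    logA = np.log(np.array([[p_self, 1 - p_self],
                            [1 - p_self, p_self]]))
    delta = logB[0].copy()
    psi = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        trans = delta[:, None] + logA      # trans[i, j]: from state i to j
        psi[t] = np.argmax(trans, axis=0)
        delta = trans[psi[t], [0, 1]] + logB[t]
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n - 1, 0, -1):          # backtrace
        path[t - 1] = psi[t, path[t]]
    return path                            # 1 = note on, 0 = note off
```

The point of the HMM is visible on a noisy track: a single low-posterior frame inside a run of high posteriors stays "on" after smoothing, because two state switches cost more than one weak emission.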

  8. Polyphonic Transcription
• Real music excerpts + ground truth: MIREX 2007
• Frame-level transcription: estimate the fundamental frequency of all notes present on a 10 ms grid [bar chart: Precision, Recall, Acc, Etot, Esubs, Emiss, Efa]
• Note-level transcription: group frame-level predictions into notes by estimating onsets/offsets [bar chart: Precision, Recall, Ave. F-measure, Ave. Overlap]

  9. Beat Tracking (Ellis ’06, ’07)
• Goal: one feature vector per ‘beat’ (tatum), for tempo normalization and efficiency
• “Onset strength envelope”: O(t) = Σ_f max(0, diff_t(log |X(t, f)|)), summed over mel frequency bands
• Autocorrelation + weighting window → global tempo estimate [figure: onset envelope over time; autocorrelation peak at 168.5 BPM]
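The envelope formula and the autocorrelation tempo estimate can be sketched directly; the perceptual weighting window applied to the autocorrelation in the full system is omitted here:

```python
import numpy as np

def onset_strength(logmel):
    """Onset strength envelope: sum over frequency of the half-wave-
    rectified first time difference of a log-magnitude (mel) spectrogram
    of shape (bands, frames)."""
    diff = np.diff(logmel, axis=1)             # diff_t(log|X(t, f)|)
    env = np.maximum(0.0, diff).sum(axis=0)    # sum_f max(0, .)
    return np.concatenate([[0.0], env])        # pad to keep frame count

def tempo_period(env):
    """Rough global tempo estimate: lag of the autocorrelation peak
    (excluding lag 0); no perceptual weighting."""
    ac = np.correlate(env, env, mode='full')[len(env) - 1:]
    return int(np.argmax(ac[1:]) + 1)
```

On a synthetic envelope with an impulse every 10 frames, the autocorrelation peak falls at lag 10, i.e. the beat period in frames.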

  10. Beat Tracking
• Dynamic programming finds beat times {t_i}: it optimizes Σ_i O(t_i) + α Σ_i W((t_{i+1} − t_i − τ_p)/β)
  where O(t) is the onset strength envelope (local score), W(·) is a log-Gaussian window (transition cost), and τ_p is the default beat period from the measured tempo
• Incrementally find the best predecessor at every time, then backtrace from the largest final score to get the beats:
  C*(t) = γ O(t) + (1 − γ) max_τ { W((t − τ − τ_p)/β) C*(τ) }
  P(t) = argmax_τ { W((t − τ − τ_p)/β) C*(τ) }
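The recursion above fits in a few lines; the γ and β values here are illustrative, and the fixed half-to-double-period search window is a simplification of the published tracker:

```python
import numpy as np

def track_beats(onset, tau_p, gamma=0.1, beta=0.7):
    """DP beat tracker sketch: onset is the onset-strength envelope
    (one value per frame), tau_p the beat period in frames."""
    n = len(onset)
    C = np.zeros(n)              # C*(t): best score with a beat at frame t
    P = np.arange(n)             # P(t): best predecessor beat frame
    lags = np.arange(max(1, int(tau_p / 2)), int(tau_p * 2) + 1)
    # log-Gaussian transition window over the inter-beat interval
    W = np.exp(-0.5 * (np.log(lags / tau_p) / beta) ** 2)
    for t in range(n):
        prev = t - lags
        ok = prev >= 0
        if ok.any():
            scores = W[ok] * C[prev[ok]]
            best = int(np.argmax(scores))
            C[t] = gamma * onset[t] + (1 - gamma) * scores[best]
            P[t] = prev[ok][best]
        else:
            C[t] = gamma * onset[t]
    beats = [int(np.argmax(C))]  # backtrace from the largest final score
    while P[beats[-1]] != beats[-1]:
        beats.append(int(P[beats[-1]]))
    return beats[::-1]
```

On a clean impulse train with period 10 the tracker returns beats exactly 10 frames apart; the log-Gaussian window is what keeps it locked near τ_p when onsets are missing.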

  11. Beat Tracking
• DP will bridge gaps (non-causal): there is always a best path [figure: Alanis Morissette, “All I Want”, spectrogram with a gap, plus tracked beats]
• 2nd place in MIREX 2006 Beat Tracking, compared to the McKinney & Moelants human tapping data [figure: test 2 (Bragg), spectrogram and per-subject tap times]

  12. Chroma Features
• Chroma features convert spectral energy into musical weights in a canonical octave, i.e. 12 semitone bins [figure: piano chromatic scale as spectrogram and as IF-based chroma]
• Can resynthesize as “Shepard tones”: all octaves at once [figure: Shepard tone spectra; Shepard-tone resynthesis of the chroma]
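Folding spectrogram bins into 12 semitone classes can be sketched as follows; the instantaneous-frequency refinement shown in the figure is omitted, and `fmin` (a reference A at 55 Hz) is an assumed choice:

```python
import numpy as np

def chroma(spec, sr, n_fft, fmin=55.0):
    """Fold STFT magnitude rows into 12 semitone classes relative to
    fmin; spec has shape (bins, frames)."""
    freqs = np.arange(spec.shape[0]) * sr / n_fft
    out = np.zeros((12, spec.shape[1]))
    for i, f in enumerate(freqs):
        if f < fmin:
            continue                  # skip DC and sub-bass bins
        pc = int(round(12 * np.log2(f / fmin))) % 12
        out[pc] += spec[i]
    return out
```

With sr=8000 and n_fft=1024, a component in bin 56 (437.5 Hz, near A4) folds into pitch class 0, the same class as every other octave of A.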

  13. Key Estimation (Ellis ICASSP ’07)
• Covariance of chroma reflects key
• Normalize by transposing each piece for the best fit: fit a single Gaussian model to one piece; find the ML rotation against a global model; model all transposed pieces together; iterate until convergence
[Figure: aligned chroma covariances for Beatles tracks (Taxman, Eleanor Rigby, I’m Only Sleeping, Love You To, Yellow Submarine, She Said She Said, Good Day Sunshine, And Your Bird Can Sing) alongside the global model]
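The find-the-ML-rotation step can be illustrated with a simple dot-product score over the 12 circular transpositions (the actual system rotates full Gaussian models of chroma, not single profiles):

```python
import numpy as np

def best_rotation(chroma_vec, reference):
    """Return the circular transposition (0-11 semitones) of a 12-d
    chroma profile that best matches a reference profile."""
    scores = [np.dot(np.roll(chroma_vec, -r), reference)
              for r in range(12)]
    return int(np.argmax(scores))
```

A profile that is the reference transposed up three semitones scores best at rotation 3, which is exactly the normalization applied before pooling pieces into the global model.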

  14. Chord Transcription (Sheh & Ellis ’03)
• “Real Books” give chord transcriptions but no exact timing, just like speech transcripts
  e.g. The Beatles, “A Hard Day’s Night”: G Cadd9 G F6 G Cadd9 G F6 G C D G C9 G ...
• Use EM to simultaneously learn and align chord models, as in speech forced alignment:
  uniform initialization of the parameters Θ_init;
  E-step: compute probabilities p(q | X, Θ_old) of the unknown alignments;
  M-step: maximize over Θ: max E[log p(X, Q | Θ)];
  repeat until convergence

  15. Chord Transcription
• Frame-level accuracy (random ≈ 3%):
  Feature    Recog.   Alignment
  MFCC        8.7%     22.0%
  PCP_ROT    21.7%     76.0%
• MFCCs are poor (they can overtrain); PCPs are better (ROT helps generalization)
[Figure: The Beatles, “Eight Days a Week” (4096-pt), pitch-class intensity vs. time; true: E G D Bm G; align: E G D Bm G; recog: E G Bm Am Em7 Bm Em7]
• Needed more training data...

  16. 3. Music Similarity
• The most central problem... it motivates extracting musical information and supports real applications (playlists, discovery)
• But do we need content-based similarity? it has to compete with collaborative filtering, and with fingerprinting + metadata
• Maybe... for the Future of Music: connecting listeners directly to musicians

  17. Discriminative Classification (Mandel & Ellis ’05)
• Classification as a proxy for similarity
• Distribution models: train a GMM on each artist’s MFCC frames; label a test song with the artist whose model is at minimum KL divergence
• vs. SVM: compute song-level features and classify with a DAG of pairwise artist SVMs
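The minimum-KL decision rule can be illustrated with single Gaussians in place of GMMs (for GMMs the KL divergence has no closed form and is usually approximated); the Gaussian case is standard:

```python
import numpy as np

def gauss_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

A test song's frame statistics would be fit the same way, and the artist label chosen as `argmin` of this divergence over the artist models.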

  18. Segment-Level Features (Mandel & Ellis ’07)
• Statistics of spectra and envelope define a point in feature space, for SVM classification or Euclidean similarity...

  19. MIREX ’07 Results
• One system for both similarity and classification
[Bar charts: Audio Music Similarity scores (Greater0, Psum, Fine, WCsum, SDsum, Greater1) per team PS, GT, LB, CB1, TL1, ME, TL2, CB2, CB3, BK1, PC, BK2; Audio Classification accuracy (Genre ID, Hierarchical Genre ID, Raw, Mood ID, Composer ID, Artist ID) per team IM svm, IM knn, ME spec, ME, TL, GT, KL, CL, GH]
Similarity teams: PS = Pohle, Schnitzer; GT = George Tzanetakis; LB = Barrington, Turnbull, Torres, Lanckriet; CB = Christoph Bastuck; TL = Lidy, Rauber, Pertusa, Iñesta; ME = Mandel, Ellis; BK = Bosteels, Kerre; PC = Paradzinets, Chen
Classification teams: IM = IMIRSEL M2K; ME = Mandel, Ellis; TL = Lidy, Rauber, Pertusa, Iñesta; GT = George Tzanetakis; KL = Kyogu Lee; CL = Laurier, Herrera; GH = Guaus, Herrera
