Music Classification Using Constant-Q Based Features: A Library for Mobile Devices
Lena Brüder, January 5, 2013

Outline
1 Introduction
2 Music Signal Processing: The Constant-Q Transform, Feature Extraction, Gaussian Mixture Models
3 Classification
4 Results: Demonstration
5 Appendix: Dynamic Range, Tempo, Timbre, Key-Invariant Chroma
6 Bibliography


Feature Extraction

Different features are extracted:
- Length of the piece
- Dynamic range (how the loud parts relate to the quieter ones)
- Tempo in BPM (not used for classification)
- Timbre (via the Constant-Q cepstrum)
- Key-invariant chroma (map all octaves to one, remove the key)

Note:
- Timbre and chroma are multi-dimensional features; the others are scalar values.
- Timbre and chroma are calculated every 10-20 ms; the others are calculated once per recording.
- But classifiers expect features to be uniform, or at least comparable.
- Solution: transform the many multi-dimensional feature vectors into one scalar value each (a dimensionality and data count reduction); see the sketch below.

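To make the mismatch concrete, here is a minimal sketch of what the extractor produces per recording. All names, shapes, and values are illustrative assumptions, not the library's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-recording output of the feature extractor.
n_frames = 18_000                         # ~3 min at one frame every 10 ms
timbre = rng.normal(size=(n_frames, 12))  # multi-dimensional, one row per frame
chroma = rng.random(size=(n_frames, 12))  # multi-dimensional, one row per frame

length_s = 180.0                          # scalar, once per recording
dynamic_range = 0.42                      # scalar, once per recording

# The classifier wants ONE comparable vector per recording, so the
# ~18,000 timbre/chroma rows must first be reduced to scalar values.
```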

Gaussian Mixture Models

Data count reduction works as follows (see the sketch after this list):
- Take all feature vectors of one feature.
- Model their probability distribution.
- Forget about the original feature vectors.
- This step brings the data count reduction: one model of fixed size instead of an arbitrary number of feature vectors.
- Do this with the feature vectors of one recording: you get a model for the recording.
- Do this with the feature vectors of all recordings from a category: you get a model for the category.

(Figure: feature vectors as points in the (x1, x2) plane.)

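A minimal sketch of this reduction step using scikit-learn. This is an assumption for illustration; the talk's mobile library presumably ships its own GMM training code, and the component count is a free parameter:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def model_recording(frames: np.ndarray, n_components: int = 8) -> GaussianMixture:
    """Replace an arbitrary number of feature vectors (one per 10-20 ms
    frame) by one probability model of fixed size."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(frames)   # model the distribution of the frames ...
    return gmm        # ... after which the original frames can be discarded

# Category model: the same call, applied to the stacked frames of all
# recordings in the category, e.g. model_recording(np.vstack(category_frames)).
```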

GMM: Dimensionality reduction

Dimensionality reduction works through comparison of the models.

(Figure: two pairs of models in the (x1, x2) plane; one pair is quite similar, d(a, b) ≈ 0.9, the other not that similar, d(a, b) ≈ 30.)

For the comparison, the Kullback-Leibler divergence is used, estimated via Monte Carlo integration; see the sketch below.

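The KL divergence between two GMMs has no closed form, which is why Monte Carlo integration is needed. A sketch under the same scikit-learn assumption as above (the sample count is an illustrative choice):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def kl_divergence_mc(p: GaussianMixture, q: GaussianMixture,
                     n_samples: int = 5000) -> float:
    """Monte Carlo estimate of D_KL(p || q) = E_p[log p(x) - log q(x)]."""
    x, _ = p.sample(n_samples)      # draw samples from p
    return float(np.mean(p.score_samples(x) - q.score_samples(x)))

# Similar models give a small divergence (the slide's d(a, b) ~ 0.9),
# dissimilar ones a large value (~ 30).
```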

GMM: Applied to recordings and categories

How does it work?
- Build a model for every recording.
- Build a model for every category.
- Compare the recording model to the category model.
- Combine the resulting scalar values for timbre and chroma with the other scalar values into a new "all-feature vector" (assembled in the sketch below):

  feature vector = (timbre similarity to the category model,
                    chroma similarity to the category model,
                    dynamic range,
                    length of the recording)

- There is one such feature vector per recording.

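Assembling the per-recording vector is then a one-liner; a minimal sketch (function name and argument order are illustrative assumptions):

```python
import numpy as np

def all_feature_vector(timbre_sim: float, chroma_sim: float,
                       dynamic_range: float, length_s: float) -> np.ndarray:
    """One four-dimensional vector per recording: the two model-comparison
    scalars plus the two directly scalar features."""
    return np.array([timbre_sim, chroma_sim, dynamic_range, length_s])
```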

Plan
1 Introduction
2 Music Signal Processing: The Constant-Q Transform, Feature Extraction, Gaussian Mixture Models
3 Classification
4 Results: Demonstration
5 Appendix: Dynamic Range, Tempo, Timbre, Key-Invariant Chroma
6 Bibliography

Classification: Classical approaches
- Classical approaches use categories.
- Decision: does a recording belong to a category, or not?
- The score is binary, e.g. -1 or 1.
- Positive and negative examples are needed for training (ideally many).
- Approaches exist that only need positive examples.
- Examples of binary classifiers: LDA, SVM, (ANN); see the sketch after this list.

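For contrast, a minimal sketch of such a binary classifier, using a scikit-learn SVM. This is purely illustrative (the talk only names LDA/SVM/ANN as examples; the data values are made up):

```python
import numpy as np
from sklearn.svm import SVC

# X: one all-feature vector per recording; y: +1 (in category) / -1 (not).
X = np.array([[0.9, 0.8, 0.4, 180.0],
              [0.2, 0.1, 0.7,  95.0]])
y = np.array([1, -1])

clf = SVC(kernel="rbf").fit(X, y)   # needs positive AND negative examples
print(clf.predict(X))               # binary output: -1 or 1, no ranking
```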

Classification: Approach used
- Different approach here: recordings get a score from [-1, 1] for a category.
- This gives a ranking rather than a classification.
- Positive and negative examples can be used, but there is no need for both.
- Only a few examples are needed (it works from a single feature vector; 5-10 is ideal).

Pros and cons:
+ Better matches are shown first.
+ No need for both positive and negative examples.
+ Flexible approach that fits the user's needs.
− There is no decision about which recordings definitely do not match.


Classification: How does it work? (1/2)
- The four-dimensional recording feature vectors are used.
- Calculate the distribution of the positive-example feature vectors (→ covariance matrix).
- This yields a Gaussian model (no mixture!) of the positive-example feature vectors.
- Calculate the Mahalanobis distance of any other feature vector:

  d_\Sigma(x, \mu) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}

- This gives a distance value in [0, ∞[.

(Figure: sectional drawing of the feature vectors along the timbre-similarity, chroma-similarity, and dynamic-range axes; the same vector is shown in both sections. A numpy sketch follows below.)

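A minimal numpy sketch of the positive model and its Mahalanobis distance. The small ridge term is my addition: a handful of example vectors rarely yields an invertible covariance matrix:

```python
import numpy as np

def fit_positive_model(examples: np.ndarray, ridge: float = 1e-6):
    """Gaussian model (no mixture!) of the positive example vectors:
    mean and inverse covariance, regularized for few examples."""
    mu = examples.mean(axis=0)
    sigma = np.cov(examples, rowvar=False) + ridge * np.eye(examples.shape[1])
    return mu, np.linalg.inv(sigma)

def mahalanobis(x: np.ndarray, mu: np.ndarray, sigma_inv: np.ndarray) -> float:
    """d_Sigma(x, mu) = sqrt((x - mu)^T Sigma^-1 (x - mu)), in [0, inf[."""
    d = x - mu
    return float(np.sqrt(d @ sigma_inv @ d))
```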

Classification: How does it work? (2/2)
- Transform from [0, ∞[ to [0, 1] via T_p(x) = \frac{1}{1+x}.
- (Figure: plot of T_p(x) for x = 1 ... 12.)
- Up to now: the positive model only.
- Negative model: a second model, mapped to [-1, 0] via T_n(x) = -\frac{1}{1+x}.
- Sum both parts: scores from [-1, 1]. (See the sketch below.)

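Putting the two models together, a minimal sketch of the final score (the function name is illustrative):

```python
from typing import Optional

def score(d_pos: float, d_neg: Optional[float] = None) -> float:
    """Map Mahalanobis distances to a ranking score in [-1, 1]:
    T_p(x) = 1/(1+x) for the positive model, T_n(x) = -1/(1+x) for the
    optional negative model; both parts are summed."""
    s = 1.0 / (1.0 + d_pos)          # positive part, in ]0, 1]
    if d_neg is not None:
        s += -1.0 / (1.0 + d_neg)    # negative part, in [-1, 0[
    return s

# score(0.0)      -> 1.0   (lies exactly on the positive model)
# score(9.0, 0.0) -> -0.9  (far from positive, centred on negative)
```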


Results

Testing procedure: train the classifier with positive and negative examples, take the 100 best matches, and count the same-category matches.
- Classical: three positives, three negatives → 94% matches, first "false positive" at rank 57
- Jazz/RnB: two positives, two negatives → 89% matches, first "false positive" at rank 41
- Pop/Rock: two positives, one negative → 87% matches, first "false positive" at rank 13


Demonstration

Questions?
Any questions left?
(Appendix: Dynamic range, Tempo, Timbre, Chroma)


Dynamic range
- Intuition: we want to define a measure of how the loud parts of a musical piece relate to the quieter ones.
- The measure should be small if most of the signal is at one volume, and it should increase with the amount of volume change during the recording.

Within the context of music comparison, we define the dynamic range of an audio signal as the root of the mean energy of the continuous input signal x_c(t):

  dyn_{cRMS} = \sqrt{\frac{1}{T_c} \int_0^{T_c} x_c^2(t)\, dt}   (4)

with T_c being the last point in time of the signal.


Dynamic range

This definition is changed slightly for the implementation:

  dyn_{dRMS} = 1 - \sqrt{\frac{1}{N} \sum_{n=0}^{N} nsum_{CQ}^2(X_{CQ}, t_n)}   (5)

with

  nsum_{CQ}(X_{CQ}, t_n) = \frac{1}{R} \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (6)

and

  R = \max_{t_n} \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (7)

Remark: here we are talking about discrete points in time; every t_n refers to the continuous time interval [t_n, t_{n+1}]. (A transcription of these equations follows below.)

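A direct numpy transcription of Eqs. (5)-(7), assuming X_cq is the constant-Q spectrogram with shape (bins, frames) and at least one non-silent frame:

```python
import numpy as np

def dynamic_range(X_cq: np.ndarray) -> float:
    """Eqs. (5)-(7): 1 minus the RMS of the per-frame bin sums, normalized
    by the loudest frame. Near 0 for constant volume, larger for more
    volume variation. X_cq: constant-Q spectrogram, shape (bins, frames)."""
    frame_sums = np.abs(X_cq).sum(axis=0)   # inner sums of Eqs. (6)/(7)
    R = frame_sums.max()                    # Eq. (7): the loudest frame
    nsum = frame_sums / R                   # Eq. (6): normalized to [0, 1]
    return float(1.0 - np.sqrt(np.mean(nsum ** 2)))   # Eq. (5)
```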

Tempo in BPM (beats per minute)
- Intuition: the speed at which humans tap when listening to a song.
- Problem: that speed is not well defined; some people tap at quarters, some at halves, ...

The procedure (sketched in code below):
1. Take the sum of the constant-Q bins, sum_{CQ}(X_{CQ}, t_n).
2. Calculate the difference vector d_{CQ}(X_{CQ}, t_n).
3. Calculate the autocorrelation of the difference vector.
4. Find recurring peaks in the autocorrelation function.

  sum_{CQ}(X_{CQ}, t_n) = \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (8)

  d_{CQ}(X_{CQ}, t_n) = sum_{CQ}(t_n) - sum_{CQ}(t_{n+1})   (9)

  a_{CQ}(d_{CQ}, \tau) = \sum_n d_{CQ}(t_n) \cdot d_{CQ}(t_n - \tau), \quad \tau = 0, \dots, \tau_{max}   (10)

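A minimal sketch of steps 1-4. The hop size and BPM bounds are assumptions, and the recurring-peak search of step 4 is simplified to a single argmax over a plausible lag range:

```python
import numpy as np

def tempo_bpm(X_cq: np.ndarray, hop_s: float = 0.01,
              bpm_min: float = 60.0, bpm_max: float = 200.0) -> float:
    """Estimate tempo from a constant-Q spectrogram (bins, frames):
    bin sums, difference vector, autocorrelation, strongest peak."""
    s = np.abs(X_cq).sum(axis=0)                       # 1. Eq. (8)
    d = np.diff(s)                                     # 2. Eq. (9)
    a = np.correlate(d, d, mode="full")[d.size - 1:]   # 3. Eq. (10)
    lag_min = int(60.0 / (bpm_max * hop_s))   # fastest tempo -> smallest lag
    lag_max = int(60.0 / (bpm_min * hop_s))   # slowest tempo -> largest lag
    lag = lag_min + int(np.argmax(a[lag_min:lag_max])) # 4. (simplified)
    return 60.0 / (lag * hop_s)
```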

Tempo: Find recurring peaks

(Figure: autocorrelation functions for four examples: a metronome at 80 bpm; drums with hi-hat on 8ths at 80 bpm; drums with hi-hat on 16ths at 80 bpm; and the test file "dead_rocks.mp3" at 103 bpm. The unit of the abscissa is 10 µs; the ordinate has no unit.)

Timbre
- The timbre of a signal is "the way it sounds".
- It is a multi-dimensional feature.
- In many publications, the Mel Frequency Cepstrum (MFC) is used.
- The lower (e.g. 8-16) coefficients describe the timbre.
- Short-time feature: typically one vector every 10-50 ms.
- The MFC is not based on the Constant-Q transform, but similar features can be derived from the Constant-Q transform (see [11]).
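A sketch of such a Constant-Q cepstrum, mirroring the DCT-of-log-magnitude construction of the MFC. Whether [11] uses exactly this recipe is an assumption; the coefficient count is illustrative:

```python
import numpy as np
from scipy.fft import dct

def cq_cepstrum(X_cq: np.ndarray, n_coeffs: int = 12) -> np.ndarray:
    """Timbre vectors from a constant-Q spectrogram: log magnitude per
    frame, DCT along the frequency axis, keep the lower coefficients.
    X_cq: shape (bins, frames) -> returns (frames, n_coeffs)."""
    log_mag = np.log(np.abs(X_cq) + 1e-10)       # epsilon avoids log(0)
    coeffs = dct(log_mag, axis=0, norm="ortho")  # DCT over frequency bins
    return coeffs[:n_coeffs].T                   # one timbre vector per frame
```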
