Music Classification Using Constant-Q Based Features: A Library for Mobile Devices
Lena Brüder, January 5, 2013

Outline
1 Introduction
2 Music Signal Processing: The Constant-Q Transform, Feature Extraction, Gaussian Mixture Models
3 Classification
4 Results: Demonstration
5 Appendix: Dynamic Range, Tempo, Timbre, Key-Invariant Chroma
6 Bibliography


Feature Extraction

Different features are extracted:
- Length of the piece
- Dynamic range (how the loud parts relate to the quieter ones)
- Tempo in BPM (not used for classification)
- Timbre (via the Constant-Q cepstrum)
- Key-invariant chroma (map all octaves to one, remove the key)

Note:
- Timbre and chroma are multi-dimensional features; the others are scalar values.
- Timbre and chroma are calculated every 10-20 ms; the others are calculated once per recording.
- But classifiers expect features to be uniform, or at least comparable.
- Solution: transform the many multi-dimensional feature vectors into one scalar value each (a dimensionality and data count reduction); see the sketch below.

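To make the mismatch concrete, here is a minimal sketch of what the extractor produces per recording. All names, shapes, and values are illustrative assumptions, not the library's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-recording output of the feature extractor.
n_frames = 18_000                         # ~3 min at one frame every 10 ms
timbre = rng.normal(size=(n_frames, 12))  # multi-dimensional, one row per frame
chroma = rng.random(size=(n_frames, 12))  # multi-dimensional, one row per frame

length_s = 180.0                          # scalar, once per recording
dynamic_range = 0.42                      # scalar, once per recording

# The classifier wants ONE comparable vector per recording, so the
# ~18,000 timbre/chroma rows must first be reduced to scalar values.
```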

Gaussian Mixture Models

Data count reduction works as follows (see the sketch after this list):
- Take all feature vectors of one feature.
- Model their probability distribution.
- Forget about the original feature vectors.
- This step brings the data count reduction: one model of fixed size instead of an arbitrary number of feature vectors.
- Do this with the feature vectors of one recording: you get a model for the recording.
- Do this with the feature vectors of all recordings from a category: you get a model for the category.

(Figure: feature vectors as points in the (x1, x2) plane.)

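A minimal sketch of this reduction step using scikit-learn. This is an assumption for illustration; the talk's mobile library presumably ships its own GMM training code, and the component count is a free parameter:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def model_recording(frames: np.ndarray, n_components: int = 8) -> GaussianMixture:
    """Replace an arbitrary number of feature vectors (one per 10-20 ms
    frame) by one probability model of fixed size."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(frames)   # model the distribution of the frames ...
    return gmm        # ... after which the original frames can be discarded

# Category model: the same call, applied to the stacked frames of all
# recordings in the category, e.g. model_recording(np.vstack(category_frames)).
```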

GMM: Dimensionality reduction

Dimensionality reduction works through comparison of the models.

(Figure: two pairs of models in the (x1, x2) plane; one pair is quite similar, d(a, b) ≈ 0.9, the other not that similar, d(a, b) ≈ 30.)

For the comparison, the Kullback-Leibler divergence is used, estimated via Monte Carlo integration; see the sketch below.

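The KL divergence between two GMMs has no closed form, which is why Monte Carlo integration is needed. A sketch under the same scikit-learn assumption as above (the sample count is an illustrative choice):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def kl_divergence_mc(p: GaussianMixture, q: GaussianMixture,
                     n_samples: int = 5000) -> float:
    """Monte Carlo estimate of D_KL(p || q) = E_p[log p(x) - log q(x)]."""
    x, _ = p.sample(n_samples)      # draw samples from p
    return float(np.mean(p.score_samples(x) - q.score_samples(x)))

# Similar models give a small divergence (the slide's d(a, b) ~ 0.9),
# dissimilar ones a large value (~ 30).
```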

GMM: Applied to recordings and categories

How does it work?
- Build a model for every recording.
- Build a model for every category.
- Compare the recording model to the category model.
- Combine the resulting scalar values for timbre and chroma with the other scalar values into a new "all-feature vector" (assembled in the sketch below):

  feature vector = (timbre similarity to the category model,
                    chroma similarity to the category model,
                    dynamic range,
                    length of the recording)

- There is one such feature vector per recording.

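Assembling the per-recording vector is then a one-liner; a minimal sketch (function name and argument order are illustrative assumptions):

```python
import numpy as np

def all_feature_vector(timbre_sim: float, chroma_sim: float,
                       dynamic_range: float, length_s: float) -> np.ndarray:
    """One four-dimensional vector per recording: the two model-comparison
    scalars plus the two directly scalar features."""
    return np.array([timbre_sim, chroma_sim, dynamic_range, length_s])
```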

Plan
1 Introduction
2 Music Signal Processing: The Constant-Q Transform, Feature Extraction, Gaussian Mixture Models
3 Classification
4 Results: Demonstration
5 Appendix: Dynamic Range, Tempo, Timbre, Key-Invariant Chroma
6 Bibliography

Classification: Classical approaches
- Classical approaches use categories.
- Decision: does a recording belong to a category, or not?
- The score is binary, e.g. -1 or 1.
- Positive and negative examples are needed for training (ideally many).
- Approaches exist that only need positive examples.
- Examples of binary classifiers: LDA, SVM, (ANN); see the sketch after this list.

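For contrast, a minimal sketch of such a binary classifier, using a scikit-learn SVM. This is purely illustrative (the talk only names LDA/SVM/ANN as examples; the data values are made up):

```python
import numpy as np
from sklearn.svm import SVC

# X: one all-feature vector per recording; y: +1 (in category) / -1 (not).
X = np.array([[0.9, 0.8, 0.4, 180.0],
              [0.2, 0.1, 0.7,  95.0]])
y = np.array([1, -1])

clf = SVC(kernel="rbf").fit(X, y)   # needs positive AND negative examples
print(clf.predict(X))               # binary output: -1 or 1, no ranking
```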

Classification: Approach used
- Different approach here: recordings get a score from [-1, 1] for a category.
- This gives a ranking rather than a classification.
- Positive and negative examples can be used, but there is no need for both.
- Only a few examples are needed (it works from a single feature vector; 5-10 is ideal).

Pros and cons:
+ Better matches are shown first.
+ No need for both positive and negative examples.
+ Flexible approach that fits the user's needs.
− There is no decision about which recordings definitely do not match.


Classification: How does it work? (1/2)
- The four-dimensional recording feature vectors are used.
- Calculate the distribution of the positive-example feature vectors (→ covariance matrix).
- This yields a Gaussian model (no mixture!) of the positive-example feature vectors.
- Calculate the Mahalanobis distance of any other feature vector:

  d_\Sigma(x, \mu) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}

- This gives a distance value in [0, ∞[.

(Figure: sectional drawing of the feature vectors along the timbre-similarity, chroma-similarity, and dynamic-range axes; the same vector is shown in both sections. A numpy sketch follows below.)

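A minimal numpy sketch of the positive model and its Mahalanobis distance. The small ridge term is my addition: a handful of example vectors rarely yields an invertible covariance matrix:

```python
import numpy as np

def fit_positive_model(examples: np.ndarray, ridge: float = 1e-6):
    """Gaussian model (no mixture!) of the positive example vectors:
    mean and inverse covariance, regularized for few examples."""
    mu = examples.mean(axis=0)
    sigma = np.cov(examples, rowvar=False) + ridge * np.eye(examples.shape[1])
    return mu, np.linalg.inv(sigma)

def mahalanobis(x: np.ndarray, mu: np.ndarray, sigma_inv: np.ndarray) -> float:
    """d_Sigma(x, mu) = sqrt((x - mu)^T Sigma^-1 (x - mu)), in [0, inf[."""
    d = x - mu
    return float(np.sqrt(d @ sigma_inv @ d))
```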

Classification: How does it work? (2/2)
- Transform from [0, ∞[ to [0, 1] via T_p(x) = \frac{1}{1+x}.
- (Figure: plot of T_p(x) for x = 1 ... 12.)
- Up to now: the positive model only.
- Negative model: a second model, mapped to [-1, 0] via T_n(x) = -\frac{1}{1+x}.
- Sum both parts: scores from [-1, 1]. (See the sketch below.)

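Putting the two models together, a minimal sketch of the final score (the function name is illustrative):

```python
from typing import Optional

def score(d_pos: float, d_neg: Optional[float] = None) -> float:
    """Map Mahalanobis distances to a ranking score in [-1, 1]:
    T_p(x) = 1/(1+x) for the positive model, T_n(x) = -1/(1+x) for the
    optional negative model; both parts are summed."""
    s = 1.0 / (1.0 + d_pos)          # positive part, in ]0, 1]
    if d_neg is not None:
        s += -1.0 / (1.0 + d_neg)    # negative part, in [-1, 0[
    return s

# score(0.0)      -> 1.0   (lies exactly on the positive model)
# score(9.0, 0.0) -> -0.9  (far from positive, centred on negative)
```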


Results

Testing procedure: train the classifier with positive and negative examples, take the 100 best matches, and count the same-category matches.
- Classical: three positives, three negatives → 94% matches, first "false positive" at rank 57
- Jazz/RnB: two positives, two negatives → 89% matches, first "false positive" at rank 41
- Pop/Rock: two positives, one negative → 87% matches, first "false positive" at rank 13


Demonstration

Questions?
Any questions left?
(Appendix: Dynamic range, Tempo, Timbre, Chroma)


Dynamic range
- Intuition: we want to define a measure of how the loud parts of a musical piece relate to the quieter ones.
- The measure should be small if most of the signal is at one volume, and it should increase with the amount of volume change during the recording.

Within the context of music comparison, we define the dynamic range of an audio signal as the root of the mean energy of the continuous input signal x_c(t):

  dyn_{cRMS} = \sqrt{\frac{1}{T_c} \int_0^{T_c} x_c^2(t)\, dt}   (4)

with T_c being the last point in time of the signal.


Dynamic range

This definition is changed slightly for the implementation:

  dyn_{dRMS} = 1 - \sqrt{\frac{1}{N} \sum_{n=0}^{N} nsum_{CQ}^2(X_{CQ}, t_n)}   (5)

with

  nsum_{CQ}(X_{CQ}, t_n) = \frac{1}{R} \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (6)

and

  R = \max_{t_n} \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (7)

Remark: here we are talking about discrete points in time; every t_n refers to the continuous time interval [t_n, t_{n+1}]. (A transcription of these equations follows below.)

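A direct numpy transcription of Eqs. (5)-(7), assuming X_cq is the constant-Q spectrogram with shape (bins, frames) and at least one non-silent frame:

```python
import numpy as np

def dynamic_range(X_cq: np.ndarray) -> float:
    """Eqs. (5)-(7): 1 minus the RMS of the per-frame bin sums, normalized
    by the loudest frame. Near 0 for constant volume, larger for more
    volume variation. X_cq: constant-Q spectrogram, shape (bins, frames)."""
    frame_sums = np.abs(X_cq).sum(axis=0)   # inner sums of Eqs. (6)/(7)
    R = frame_sums.max()                    # Eq. (7): the loudest frame
    nsum = frame_sums / R                   # Eq. (6): normalized to [0, 1]
    return float(1.0 - np.sqrt(np.mean(nsum ** 2)))   # Eq. (5)
```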

Tempo in BPM (beats per minute)
- Intuition: the speed at which humans tap when listening to a song.
- Problem: that speed is not well defined; some people tap at quarters, some at halves, ...

The procedure (sketched in code below):
1. Take the sum of the constant-Q bins, sum_{CQ}(X_{CQ}, t_n).
2. Calculate the difference vector d_{CQ}(X_{CQ}, t_n).
3. Calculate the autocorrelation of the difference vector.
4. Find recurring peaks in the autocorrelation function.

  sum_{CQ}(X_{CQ}, t_n) = \sum_{b=0}^{B} |X_{CQ}(b, t_n)|   (8)

  d_{CQ}(X_{CQ}, t_n) = sum_{CQ}(t_n) - sum_{CQ}(t_{n+1})   (9)

  a_{CQ}(d_{CQ}, \tau) = \sum_n d_{CQ}(t_n) \cdot d_{CQ}(t_n - \tau), \quad \tau = 0, \dots, \tau_{max}   (10)

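A minimal sketch of steps 1-4. The hop size and BPM bounds are assumptions, and the recurring-peak search of step 4 is simplified to a single argmax over a plausible lag range:

```python
import numpy as np

def tempo_bpm(X_cq: np.ndarray, hop_s: float = 0.01,
              bpm_min: float = 60.0, bpm_max: float = 200.0) -> float:
    """Estimate tempo from a constant-Q spectrogram (bins, frames):
    bin sums, difference vector, autocorrelation, strongest peak."""
    s = np.abs(X_cq).sum(axis=0)                       # 1. Eq. (8)
    d = np.diff(s)                                     # 2. Eq. (9)
    a = np.correlate(d, d, mode="full")[d.size - 1:]   # 3. Eq. (10)
    lag_min = int(60.0 / (bpm_max * hop_s))   # fastest tempo -> smallest lag
    lag_max = int(60.0 / (bpm_min * hop_s))   # slowest tempo -> largest lag
    lag = lag_min + int(np.argmax(a[lag_min:lag_max])) # 4. (simplified)
    return 60.0 / (lag * hop_s)
```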

Tempo: Find recurring peaks

(Figure: autocorrelation functions for four examples: a metronome at 80 bpm; drums with hi-hat on 8ths at 80 bpm; drums with hi-hat on 16ths at 80 bpm; and the test file "dead_rocks.mp3" at 103 bpm. The unit of the abscissa is 10 µs; the ordinate has no unit.)

Timbre
- The timbre of a signal is "the way it sounds".
- It is a multi-dimensional feature.
- In many publications, the Mel Frequency Cepstrum (MFC) is used.
- The lower (e.g. 8-16) coefficients describe the timbre.
- Short-time feature: typically one vector every 10-50 ms.
- The MFC is not based on the Constant-Q transform, but similar features can be derived from the Constant-Q transform (see [11]).
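A sketch of such a Constant-Q cepstrum, mirroring the DCT-of-log-magnitude construction of the MFC. Whether [11] uses exactly this recipe is an assumption; the coefficient count is illustrative:

```python
import numpy as np
from scipy.fft import dct

def cq_cepstrum(X_cq: np.ndarray, n_coeffs: int = 12) -> np.ndarray:
    """Timbre vectors from a constant-Q spectrogram: log magnitude per
    frame, DCT along the frequency axis, keep the lower coefficients.
    X_cq: shape (bins, frames) -> returns (frames, n_coeffs)."""
    log_mag = np.log(np.abs(X_cq) + 1e-10)       # epsilon avoids log(0)
    coeffs = dct(log_mag, axis=0, norm="ortho")  # DCT over frequency bins
    return coeffs[:n_coeffs].T                   # one timbre vector per frame
```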
