
  1. Features for Audio and Music Classification
     Martin F. McKinney and Jeroen Breebaart
     Auditory and Multisensory Perception, Digital Signal Processing Group
     Philips Research Laboratories, Eindhoven, The Netherlands

  2. Introduction
     • Wanted: automatic audio and music classifier
     • Previous work:
       – Typical method: Feature extraction followed by classification
       – Specific method of classification is not always crucial
         • i.e., features are the limiting factor
       – Temporal properties of audio are important for classification and summarization
     • Our focus here is on features for audio classification and their temporal properties
     Features for Audio and Music Classification, M.F. McKinney, ISMIR2003, Baltimore, MD

  3. Method: General
     • Compare classification performance of four feature sets:
       – “Standard” low-level signal parameters
       – Mel-frequency cepstral coefficients (MFCC)
       – Psychoacoustic features
       – Auditory filterbank temporal envelope
     • Include statistics of feature temporal behavior as additional features
     • Evaluate classification using a multivariate Gaussian framework (Quadratic Discriminant Analysis, QDA)
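
As an illustration of the multivariate Gaussian framework named on this slide, the following is a minimal sketch of QDA: one Gaussian with its own mean and full covariance per class, with each sample assigned to the class of highest log posterior. The function names `fit_qda` and `predict_qda`, the synthetic two-dimensional data, and the use of class frequencies as priors are assumptions for the example; the slides do not describe the authors' implementation. The sketch assumes at least two features per sample.

```python
import numpy as np

def fit_qda(X, y):
    """Fit one Gaussian (mean, covariance, prior) per class."""
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        model[label] = (
            Xc.mean(axis=0),
            np.cov(Xc, rowvar=False),   # full covariance, estimated per class
            len(Xc) / len(X),           # prior from class frequency (assumption)
        )
    return model

def predict_qda(model, X):
    """Assign each row of X to the class with the highest log posterior."""
    labels = list(model)
    scores = []
    for label in labels:
        mean, cov, prior = model[label]
        diff = X - mean
        # Squared Mahalanobis distance of every row to the class mean.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return np.array(labels)[np.argmax(scores, axis=0)]

# Synthetic stand-in for two audio classes (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 2.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = fit_qda(X, y)
accuracy = np.mean(predict_qda(model, X) == y)
```

Because each class keeps its own covariance matrix, the decision boundary between two classes is quadratic rather than linear, which is where the method gets its name.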

  4. Method: Feature extraction
     [Flow diagram]
     743-ms analysis frame, divided into 23-ms subframes
     → Feature extraction
     → Subframe feature vectors
     → Spectral feature modeling (bands: 0 Hz, 1-2 Hz, 3-15 Hz, 20-43 Hz)
     → Feature model
     → Feature selection (9 best for maximum prediction training data)
     → Final feature vector
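
The spectral feature modeling step in the diagram can be sketched as follows: take the sequence of one feature's subframe values within an analysis frame, compute its power spectrum, and sum the power in each of the four bands. This is a hedged reading of the diagram, not the authors' code. The function name, the power normalization, and the feature rate of 86 values per second are assumptions (the slide does not give the subframe hop; 86 Hz is chosen only so that the spectrum reaches 43 Hz).

```python
import numpy as np

# Bands named on the slide; (0, 0) stands for the DC component.
BANDS_HZ = [(0.0, 0.0), (1.0, 2.0), (3.0, 15.0), (20.0, 43.0)]

def modulation_band_powers(trajectory, feature_rate_hz, bands=BANDS_HZ):
    """Sum the power spectrum of one feature's subframe trajectory per band."""
    trajectory = np.asarray(trajectory, dtype=float)
    power = np.abs(np.fft.rfft(trajectory)) ** 2 / len(trajectory)
    freqs = np.fft.rfftfreq(len(trajectory), d=1.0 / feature_rate_hz)
    return [power[(freqs >= lo) & (freqs <= hi)].sum() for lo, hi in bands]

feature_rate_hz = 86.0                    # assumed; not stated on the slide
t = np.arange(64) / feature_rate_hz       # about 743 ms of subframe values
# Hypothetical feature trajectory: constant offset plus a 5.375 Hz modulation.
trajectory = 2.0 + np.sin(2 * np.pi * 5.375 * t)
dc, slow, mid, fast = modulation_band_powers(trajectory, feature_rate_hz)
```

For this test trajectory the offset lands in the DC band and the 5.375 Hz modulation in the 3-15 Hz band, while the other two bands stay near zero.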

  5. Method: Classification
     • Classification tasks
       – Five class general audio classification
         • Classical music (35), popular music (188), speech (31), background noise (25), crowd noise (31)
       – Seven class music genre classification
         • Jazz (38), Folk (23), Electronica (27), R&B (43), Rock (37), Reggae (11), Vocal (9)
     • QDA training and cross-validation with the .632+ bootstrap method
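
The .632+ bootstrap mentioned above blends the error on the training data with the error on samples left out of each bootstrap draw, shifting weight toward the left-out error as overfitting grows. Below is a minimal sketch of that combining rule as published by Efron and Tibshirani; the function and argument names are chosen for this example, and the error rates passed in are assumed to have been measured elsewhere (for instance by repeatedly training QDA on bootstrap samples).

```python
from collections import Counter

def no_information_rate(y_true, y_pred):
    """Error expected if predictions were independent of the true labels."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    return sum(
        (true_counts[k] / n) * (1.0 - pred_counts[k] / n) for k in true_counts
    )

def bootstrap_632_plus(train_err, oob_err, no_info_err):
    """Combine training error and left-out (out-of-bag) error with the .632+ rule."""
    oob_err = min(oob_err, no_info_err)          # cap at the no-information rate
    if no_info_err > train_err and oob_err > train_err:
        overfit = (oob_err - train_err) / (no_info_err - train_err)
    else:
        overfit = 0.0                            # no measurable overfitting
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * train_err + weight * oob_err

# Hypothetical error rates, for illustration only.
gamma = no_information_rate([0, 0, 1, 1], [0, 1, 0, 1])
estimate = bootstrap_632_plus(train_err=0.0, oob_err=0.2, no_info_err=gamma)
```

With no overfitting the weight is 0.632, which is the plain .632 estimator; with maximal overfitting the weight reaches 1 and the estimate equals the left-out error.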

  6. Results: Standard Low Level features
     Feature ranking (General Audio, Music Genre); band columns: DC, 1-2 Hz, 3-15 Hz, 20-43 Hz
     Rank entries per feature, left to right:
     1. RMS level: 3, 3 | 8 | 7, 9
     2. Spectral centroid: (no entry)
     3. Bandwidth: 6, 7
     4. Zero crossing rate: 4
     5. Spectral roll-off freq: 1, 2
     6. Band energy ratio: 2, 6 | 4, 1
     7. Delta spectrum mag.: (no entry)
     8. “Pitch”: 5, 5 | 8
     9. “Pitch” strength: 9
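
Several of the low-level parameters listed on this slide have common textbook definitions, sketched below for a single subframe. The slides do not state which definitions were used, so the choices here are assumptions: the power spectrum as the weighting for the centroid, an 85% energy threshold for the roll-off frequency, and a 22.05 kHz sample rate for the test tone. The function name `low_level_features` is hypothetical.

```python
import numpy as np

def low_level_features(frame, sample_rate, rolloff_fraction=0.85):
    """Compute RMS level, zero crossing rate, spectral centroid, and roll-off."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    rms = np.sqrt(np.mean(frame ** 2))
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
    # Power-weighted mean frequency.
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    # Lowest frequency below which the chosen fraction of power lies.
    cumulative = np.cumsum(spectrum)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])]
    return {"rms": rms, "zcr": zcr, "centroid_hz": centroid, "rolloff_hz": rolloff}

sample_rate = 22050                       # assumed; not given on the slides
n = 512                                   # about 23 ms at 22.05 kHz
tone_hz = 32 * sample_rate / n            # 1378.125 Hz, centered on an FFT bin
frame = np.sin(2 * np.pi * tone_hz * np.arange(n) / sample_rate + 0.1)
features = low_level_features(frame, sample_rate)
```

For a pure tone, the centroid and the roll-off frequency both sit at the tone frequency, which makes it a convenient sanity check before applying the function to real audio.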

  7. Results: Standard Low Level features
     Classification with 9 best features (plot axes: Real Class vs. Classification Result)
     General Audio (86 ± 4%): Clas 0.98 ± 0.02, Pop 0.83 ± 0.03, Spch 0.94 ± 0.04, Nse 0.6 ± 0.12, Crwd 0.97 ± 0.02
     Music Genre (61 ± 11%): Jazz 0.64 ± 0.1, Folk 0.8 ± 0.09, Elct 0.51 ± 0.15, R&B 0.49 ± 0.08, Rock 0.76 ± 0.07, Regg 0.57 ± 0.17, Vocl 0.52 ± 0.22

  8. Results: MFCC features
     Feature ranking (General Audio, Music Genre); band columns: DC, 1-2 Hz, 3-15 Hz, 20-43 Hz
     Rank entries per feature, left to right:
     1. MFCC 0: 3, 2 | 2, 6 | 1
     2. MFCC 1: 1, 4
     3. MFCC 2: 5, 7
     4. MFCC 3: 3
     5. MFCC 4: 6
     6. MFCC 5: 5
     7. MFCC 6: 9
     8. MFCC 7: (no entry)
     9. MFCC 8: 7
     10. MFCC 9: 4
     11. MFCC 10: 8, 8
     12. MFCC 11: (no entry)
     13. MFCC 12: 9

  9. Results: MFCC features
     Classification with 9 best features (plot axes: Real Class vs. Classification Result)
     General Audio (92 ± 3%): Clas 0.89 ± 0.05, Pop 0.92 ± 0.01, Spch 0.97 ± 0.02, Nse 0.82 ± 0.07, Crwd 0.97 ± 0.02
     Music Genre (65 ± 10%): Jazz 0.68 ± 0.08, Folk 0.83 ± 0.07, Elct 0.53 ± 0.13, R&B 0.46 ± 0.09, Rock 0.78 ± 0.05, Regg 0.54 ± 0.16, Vocl 0.73 ± 0.2

  10. Results: Psychoacoustic features
      Feature ranking (General Audio, Music Genre)
                                DC      1-2 Hz   3-15 Hz   20-43 Hz
      1. Roughness              3, 2    N/A      N/A       N/A
      2. Roughness Std. Dev.    7       N/A      N/A       N/A
      3. Loudness               4, 5    8        6, 6      5, 4
      4. Sharpness              2, 1    9, 7     1, 3      8, 9

  11. Results: Psychoacoustic features
      Classification with 9 best features (plot axes: Real Class vs. Classification Result)
      General Audio (92 ± 3%): Clas 0.94 ± 0.02, Pop 0.85 ± 0.02, Spch 1 ± 0, Nse 0.89 ± 0.05, Crwd 0.9 ± 0.03
      Music Genre (62 ± 10%): Jazz 0.63 ± 0.08, Folk 0.72 ± 0.09, Elct 0.71 ± 0.09, R&B 0.52 ± 0.09, Rock 0.69 ± 0.08, Regg 0.55 ± 0.18, Vocl 0.5 ± 0.2

  12. Results: AFTE features
      Feature ranking (General Audio, Music Genre); band columns: DC, 3-15 Hz, 20-150 Hz, 150-1000 Hz
      Rank entries per feature, left to right:
      1. AFTE 1 (Fc = 26 Hz): 7, 6 | N/A | N/A
      2. AFTE 2 (Fc = 88 Hz): 1 | 7 | N/A | N/A
      3. AFTE 3 (Fc = 164 Hz): 1, 3 | N/A
      4. AFTE 4 (Fc = 258 Hz): 8 | N/A
      7. AFTE 7 (Fc = 703 Hz): 5 | 6 | N/A
      8. AFTE 8 (Fc = 927 Hz): N/A
      9. AFTE 9 (Fc = 1206 Hz): 4 | 9
      12. AFTE 12 (Fc = 2514 Hz): 8 | 9
      16. AFTE 16 (Fc = 6279 Hz): 5
      17. AFTE 17 (Fc = 7848 Hz): (no entry)
      18. AFTE 18 (Fc = 9795 Hz): 3, 2 | 4 | 2

  13. Results: AFTE features
      Classification with 9 best features (plot axes: Real Class vs. Classification Result)
      General Audio (93 ± 2%): Clas 0.94 ± 0.01, Pop 0.95 ± 0.01, Spch 0.97 ± 0.02, Nse 0.85 ± 0.06, Crwd 0.91 ± 0.03
      Music Genre (74 ± 9%): Jazz 0.81 ± 0.05, Folk 0.84 ± 0.06, Elct 0.71 ± 0.11, R&B 0.68 ± 0.07, Rock 0.77 ± 0.07, Regg 0.61 ± 0.17, Vocl 0.76 ± 0.16

  14. Results Summary
                       SLL         MFCC        PA          AFTE
      General Audio    86 ± 4%     92 ± 3%     92 ± 3%     93 ± 2%
      Music Genre      61 ± 11%    65 ± 10%    62 ± 10%    74 ± 9%

  15. Conclusions
      • Classification based on features from an auditory model (AFTE) is better than that from other standard feature sets.
      • Temporal modulations of features are important for audio and music classification.
      • Feature development can improve audio and music classification.
