SLIDE 1

Features for Audio and Music Classification

Martin F. McKinney and Jeroen Breebaart

Auditory and Multisensory Perception, Digital Signal Processing Group
Philips Research Laboratories
Eindhoven, The Netherlands

SLIDE 2

Features for Audio and Music Classification, M.F. McKinney, ISMIR2003, Baltimore, MD

Introduction

  • Wanted: automatic audio and music classifier
  • Previous work:
    – Typical method: feature extraction followed by classification
    – Specific method of classification is not always crucial
      • i.e., features are the limiting factor
    – Temporal properties of audio are important for classification and summarization
  • Our focus here is on features for audio classification and their temporal properties

SLIDE 3

Method: General

  • Compare classification performance of four feature sets:
    – “Standard” low-level signal parameters
    – Mel-frequency cepstral coefficients (MFCC)
    – Psychoacoustic features
    – Auditory filterbank temporal envelope (AFTE)
  • Include statistics of feature temporal behavior as additional features
  • Evaluate classification using a multivariate Gaussian framework (Quadratic Discriminant Analysis, QDA)
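The multivariate Gaussian framework named above can be sketched in a few lines: fit one full-covariance Gaussian per class and assign each feature vector to the class with the highest log posterior. This is a minimal illustration, not the authors' implementation. The class name `GaussianQDA` and the small ridge term `reg` are assumptions added here for numerical stability.

```python
import numpy as np

class GaussianQDA:
    """Quadratic discriminant analysis: one full-covariance Gaussian per class."""

    def __init__(self, reg=1e-6):
        self.reg = reg  # hypothetical ridge term that keeps covariances invertible

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_, self.covs_, self.log_priors_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.means_.append(Xc.mean(axis=0))
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            self.covs_.append(cov + self.reg * np.eye(X.shape[1]))
            self.log_priors_.append(np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, cov, log_prior in zip(self.means_, self.covs_, self.log_priors_):
            diff = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
            _, logdet = np.linalg.slogdet(cov)
            # Log posterior up to a constant shared by all classes.
            scores.append(log_prior - 0.5 * (logdet + maha))
        return self.classes_[np.argmax(np.array(scores), axis=0)]
```

Because each class has its own covariance matrix, the decision boundaries are quadratic rather than linear, which is what distinguishes QDA from linear discriminant analysis.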

SLIDE 4

Method: Feature extraction

Processing chain, from audio to final feature vector:

  • 743-ms analysis frame
  • 23-ms subframes
  • Feature extraction, giving subframe feature vectors
  • Spectral feature modeling, giving a spectral feature model in four modulation bands: 0 Hz, 1-2 Hz, 3-15 Hz, 20-43 Hz
  • Feature selection (9 best for maximum prediction training data)
  • Final feature vector
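The spectral feature modeling step can be sketched as follows: take the sequence of values one feature assumes across the subframes of an analysis frame, compute its power spectrum, and sum the power in each modulation band. This is an illustration under stated assumptions, not the authors' code. The function name and the `feature_rate_hz` parameter are hypothetical. The slide gives the subframe length but not the hop between subframes, so the 87 Hz feature rate in the usage lines (roughly a half-subframe hop) is an assumption chosen so that the 20-43 Hz band lies below the Nyquist frequency.

```python
import numpy as np

# Modulation bands named on the slide, in Hz (low, high); (0, 0) is the DC term.
BANDS_HZ = [(0.0, 0.0), (1.0, 2.0), (3.0, 15.0), (20.0, 43.0)]

def modulation_band_energies(feature_track, feature_rate_hz, bands_hz=BANDS_HZ):
    """Sum the power spectrum of one feature's subframe values within each band."""
    x = np.asarray(feature_track, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / feature_rate_hz)
    energies = []
    for lo, hi in bands_hz:
        mask = (freqs >= lo) & (freqs <= hi)
        energies.append(float(power[mask].sum()))
    return np.array(energies)

# Usage: a feature that fluctuates at 8 Hz around a constant level.
rate = 87.0                      # assumed feature sampling rate in Hz
t = np.arange(64) / rate         # about 0.74 s of subframe values
track = 1.0 + 0.5 * np.sin(2 * np.pi * 8.0 * t)
print(modulation_band_energies(track, rate))
```

In this example the largest non-DC energy lands in the 3-15 Hz band, which is where an 8 Hz fluctuation belongs.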

SLIDE 5

Method: Classification

  • Classification tasks
    – Five-class general audio classification
      • Classical music (35), popular music (188), speech (31), background noise (25), crowd noise (31)
    – Seven-class music genre classification
      • Jazz (38), Folk (23), Electronica (27), R&B (43), Rock (37), Reggae (11), Vocal (9)
  • QDA training and cross-validation with the .632+ bootstrap method
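A sketch of the .632+ bootstrap error estimate, following the commonly cited formulation by Efron and Tibshirani, is given below. It blends the resubstitution error with the out-of-bag bootstrap error, using a weight that grows with the estimated degree of overfitting. The function names, the number of bootstrap rounds, and the nearest-mean stand-in classifier are assumptions made for illustration. A QDA classifier would be plugged in through the same `fit_predict` callable.

```python
import numpy as np

def nearest_mean(X_train, y_train, X_test):
    """Stand-in classifier (hypothetical): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

def bootstrap_632_plus(fit_predict, X, y, n_boot=50, seed=0):
    """Estimate the prediction error rate with the .632+ bootstrap."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    rng = np.random.default_rng(seed)

    # Resubstitution error: train and test on the full data set.
    pred_all = fit_predict(X, y, X)
    err_resub = float(np.mean(pred_all != y))

    # Out-of-bag error: each sample is scored only by models whose
    # bootstrap training set did not contain it.
    wrong = np.zeros(n)
    counted = np.zeros(n)
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)
        out = np.setdiff1d(np.arange(n), train)
        if out.size == 0:
            continue
        pred = fit_predict(X[train], y[train], X[out])
        wrong[out] += pred != y[out]
        counted[out] += 1
    seen = counted > 0
    err_oob = float(np.mean(wrong[seen] / counted[seen]))

    # No-information error rate: expected error if labels and predictions were independent.
    classes = np.unique(y)
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(pred_all == c) for c in classes])
    gamma = float(np.sum(p * (1.0 - q)))

    err_oob = min(err_oob, gamma)
    if err_oob > err_resub and gamma > err_resub:
        overfit = (err_oob - err_resub) / (gamma - err_resub)
    else:
        overfit = 0.0
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * err_resub + weight * err_oob
```

With no overfitting the weight is 0.632, which recovers the plain .632 estimator. With maximal overfitting the weight reaches 1 and the estimate equals the out-of-bag error.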

SLIDE 6

Results: Standard Low Level features

Feature ranking: General Audio, Music Genre

[Table: rank of each feature (rows) in each modulation band (columns: DC, 1-2 Hz, 3-15 Hz, 20-43 Hz), given as "General Audio, Music Genre"]

  • 1. RMS level
  • 2. Spectral centroid
  • 3. Bandwidth
  • 4. Zero crossing rate
  • 5. Spectral roll-off freq
  • 6. Band energy ratio
  • 7. Delta spectrum mag.
  • 8. “Pitch”
  • 9. “Pitch” strength
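Several of the low-level features listed on this slide have widely used textbook definitions, sketched below for a single subframe. The exact definitions used by the authors are not given on the slide, so the formulas, the Hann window, and the 85% roll-off fraction are assumptions. Band energy ratio, delta spectrum magnitude, and the pitch features are left out.

```python
import numpy as np

def low_level_features(subframe, sample_rate, rolloff_fraction=0.85):
    """Compute a few standard low-level signal parameters for one subframe."""
    x = np.asarray(subframe, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    total = spectrum.sum() + 1e-12  # guard against division by zero on silence

    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = float((freqs * spectrum).sum() / total)
    # Bandwidth: magnitude-weighted spread around the centroid.
    bandwidth = float(np.sqrt((((freqs - centroid) ** 2) * spectrum).sum() / total))
    # Roll-off: frequency below which the chosen fraction of the magnitude lies.
    cumulative = np.cumsum(spectrum)
    rolloff = float(freqs[np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])])
    # Zero crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(x)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    # RMS level of the time-domain signal.
    rms = float(np.sqrt(np.mean(x ** 2)))

    return {"rms": rms, "centroid": centroid, "bandwidth": bandwidth,
            "zcr": zcr, "rolloff": rolloff}

# Usage: a 1 kHz tone in a 512-sample subframe (about 23 ms at 22.05 kHz, an assumed rate).
sr = 22050
tone = np.sin(2 * np.pi * 1000.0 * np.arange(512) / sr)
print(low_level_features(tone, sr))
```

Running this on successive subframes gives one value per feature per subframe, which is the kind of sequence whose temporal behavior the band columns describe.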

SLIDE 7

Results: Standard Low Level features

Classification with 9 best features

General Audio (86±4%), fraction of each real class classified correctly:

  Clas        Pop         Spch        Nse         Crwd
  0.98±0.02   0.83±0.03   0.94±0.04   0.6±0.12    0.97±0.02

Music Genre (61±11%), fraction of each real class classified correctly:

  Jazz        Folk        Elct        R&B         Rock        Regg        Vocl
  0.64±0.1    0.8±0.09    0.51±0.15   0.49±0.08   0.76±0.07   0.57±0.17   0.52±0.22

[Figure: confusion matrices, Real Class vs. Classification Result]
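The slide does not say how the headline accuracies are computed. One reading that is consistent with the numbers on this slide is that each is the unweighted mean of the per-class rates, so that small classes count as much as large ones. The sketch below checks that arithmetic. The pairing of values with class labels assumes they appear in the same order as the axis labels.

```python
# Per-class correct-classification rates as read from the slide
# (class order assumed to follow the axis labels).
general_audio = {"Clas": 0.98, "Pop": 0.83, "Spch": 0.94, "Nse": 0.6, "Crwd": 0.97}
music_genre = {"Jazz": 0.64, "Folk": 0.8, "Elct": 0.51, "R&B": 0.49,
               "Rock": 0.76, "Regg": 0.57, "Vocl": 0.52}

def mean_rate_percent(rates):
    """Unweighted mean of the per-class rates, as a percentage."""
    return 100.0 * sum(rates.values()) / len(rates)

print(round(mean_rate_percent(general_audio)))  # 86
print(round(mean_rate_percent(music_genre)))    # 61
```

Because the per-class values are themselves rounded, a check of this kind can only be expected to agree to within about a percentage point.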

SLIDE 8

Results: MFCC features

Feature ranking: General Audio, Music Genre

[Table: rank of each feature (rows) in each modulation band (columns: DC, 1-2 Hz, 3-15 Hz, 20-43 Hz), given as "General Audio, Music Genre"]

  • 1. MFCC 0
  • 2. MFCC 1
  • 3. MFCC 2
  • 4. MFCC 3
  • 5. MFCC 4
  • 6. MFCC 5
  • 7. MFCC 6
  • 8. MFCC 7
  • 9. MFCC 8
  • 10. MFCC 9
  • 11. MFCC 10
  • 12. MFCC 11
  • 13. MFCC 12
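For readers unfamiliar with the 13 coefficients MFCC 0 to MFCC 12, a textbook computation for one subframe is sketched below: power spectrum, triangular mel-spaced filterbank, logarithm, then a DCT-II. This is a generic illustration rather than the authors' configuration. The 26-filter bank, the Hann window, the mel formula, and the unnormalized DCT are all assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(subframe, sample_rate, n_coeffs=13, n_filters=26):
    """Return MFCC 0 .. n_coeffs-1 for a single subframe."""
    x = np.asarray(subframe, dtype=float) * np.hanning(len(subframe))
    power = np.abs(np.fft.rfft(x)) ** 2
    # Small offset avoids log(0) for empty bands.
    log_energy = np.log(mel_filterbank(n_filters, len(x), sample_rate) @ power + 1e-12)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_filters)  # unnormalized DCT-II
    return dct_basis @ log_energy
```

MFCC 0 is the sum of the log band energies and so tracks overall level, while higher coefficients describe progressively finer detail of the spectral envelope.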

SLIDE 9

Results: MFCC features

Classification with 9 best features

General Audio (92±3%), fraction of each real class classified correctly:

  Clas        Pop         Spch        Nse         Crwd
  0.89±0.05   0.92±0.01   0.97±0.02   0.82±0.07   0.97±0.02

Music Genre (65±10%), fraction of each real class classified correctly:

  Jazz        Folk        Elct        R&B         Rock        Regg        Vocl
  0.68±0.08   0.83±0.07   0.53±0.13   0.46±0.09   0.78±0.05   0.54±0.16   0.73±0.2

[Figure: confusion matrices, Real Class vs. Classification Result]

SLIDE 10

Results: Psychoacoustic features

Feature ranking: General Audio, Music Genre

[Table: rank of each feature (rows) in each modulation band (columns: DC, 1-2 Hz, 3-15 Hz, 20-43 Hz), given as "General Audio, Music Genre"; some cells are marked N/A]

  • 1. Roughness
  • 2. Roughness Std. Dev.
  • 3. Loudness
  • 4. Sharpness

SLIDE 11

Results: Psychoacoustic features

Classification with 9 best features

General Audio (92±3%), fraction of each real class classified correctly:

  Clas        Pop         Spch        Nse         Crwd
  0.94±0.02   0.85±0.02   1±0         0.89±0.05   0.9±0.03

Music Genre (62±10%), fraction of each real class classified correctly:

  Jazz        Folk        Elct        R&B         Rock        Regg        Vocl
  0.63±0.08   0.72±0.09   0.71±0.09   0.52±0.09   0.69±0.08   0.55±0.18   0.5±0.2

[Figure: confusion matrices, Real Class vs. Classification Result]

SLIDE 12

Results: AFTE features

Feature ranking: General Audio, Music Genre

[Table: rank of each feature (rows) in each modulation band (columns: DC, 3-15 Hz, 20-150 Hz, 150-1000 Hz), given as "General Audio, Music Genre"; some cells are marked N/A]

  • 1. AFTE 1 (Fc = 26 Hz)
  • 2. AFTE 2 (Fc = 88 Hz)
  • 3. AFTE 3 (Fc = 164 Hz)
  • 4. AFTE 4 (Fc = 258 Hz)
  • 7. AFTE 7 (Fc = 703 Hz)
  • 8. AFTE 8 (Fc = 927 Hz)
  • 9. AFTE 9 (Fc = 1206 Hz)
  • 12. AFTE 12 (Fc = 2514 Hz)
  • 16. AFTE 16 (Fc = 6279 Hz)
  • 17. AFTE 17 (Fc = 7848 Hz)
  • 18. AFTE 18 (Fc = 9795 Hz)

SLIDE 13

Results: AFTE features

Classification with 9 best features

General Audio (93±2%), fraction of each real class classified correctly:

  Clas        Pop         Spch        Nse         Crwd
  0.94±0.01   0.95±0.01   0.97±0.02   0.85±0.06   0.91±0.03

Music Genre (74±9%), fraction of each real class classified correctly:

  Jazz        Folk        Elct        R&B         Rock        Regg        Vocl
  0.81±0.05   0.84±0.06   0.71±0.11   0.68±0.07   0.77±0.07   0.61±0.17   0.76±0.16

[Figure: confusion matrices, Real Class vs. Classification Result]

SLIDE 14

Results Summary

  Feature set                  General Audio   Music Genre
  Standard low-level (SLL)     86±4%           61±11%
  MFCC                         92±3%           65±10%
  Psychoacoustic (PA)          92±3%           62±10%
  AFTE                         93±2%           74±9%

SLIDE 15

Conclusions

  • Classification based on features from an auditory model (AFTE) is better than that from other standard feature sets.
  • Temporal modulations of features are important for audio and music classification.
  • Feature development can improve audio and music classification.

SLIDE 16