
GCT535 - Sound Technology for Multimedia: Timbre Analysis



  1. GCT535 - Sound Technology for Multimedia: Timbre Analysis
     Graduate School of Culture Technology, KAIST
     Juhan Nam

  2. Outlines
     § Timbre Analysis
       – Definition of Timbre
       – Timbre Features
         • Zero-crossing rate
         • Spectral summary features
         • Mel-Frequency Cepstral Coefficient (MFCC)

  3. What is timbre?
     § Definition
       – The attribute of auditory sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar (ANSI)
       – The tone color or quality that characterizes a particular sound
     § Associated with classifying or identifying sound sources
       – Class: piano, guitar, singing voice, engine sound
       – Identity: Steinway Model D, Fender Stratocaster, Michael Jackson, Harley-Davidson
     § Also used to describe polyphonic sounds holistically
       – For example, music or environmental sounds
       – Associated with genre, mood, or other high-level descriptions

  4. What is timbre?
     § Timbre is a vague concept
       – Unlike loudness or pitch, it has no single quantitative scale.
       – It comprises multiple attributes.
     § Different aspects of this multiplicity
       – Acoustic attributes: temporal and spectral factors
       – Timbre space: perceptual similarity/dissimilarity
       – Semantic attributes: verbal descriptions

  5. Acoustic Attributes in Timbre Perception
     § Acoustic attributes (Schouten, 1968)
       – Harmonicity: the range between tonal and noise-like character
       – Time envelope (ADSR: attack, decay, sustain, release)
       – Spectral envelope
       – Changes of the spectral envelope and fundamental frequency over time
       – The onset of a sound, which differs notably from the sustained vibration
     [Figures: ADSR envelope; changes of spectral envelope]
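The ADSR time envelope listed above can be made concrete with a small generator. This is a minimal sketch, assuming straight-line segments (many synthesizers use exponential curves instead); the function `adsr_envelope` and its parameters are illustrative and not taken from the slides.

```python
import math


def adsr_envelope(attack, decay, sustain_level, sustain_time, release, sr):
    """Piecewise-linear ADSR amplitude envelope (assumed linear segments).

    Times are in seconds, levels in [0, 1]; returns one gain value per sample.
    """
    def ramp(start, end, seconds):
        n = int(round(seconds * sr))
        # n samples moving from `start` toward `end` (the end value itself is excluded)
        return [start + (end - start) * i / n for i in range(n)]

    return (ramp(0.0, 1.0, attack)                             # attack: silence -> peak
            + ramp(1.0, sustain_level, decay)                  # decay: peak -> sustain level
            + [sustain_level] * int(round(sustain_time * sr))  # sustain: hold the level
            + ramp(sustain_level, 0.0, release))               # release: sustain -> silence


sr = 1000  # low sample rate keeps the example small
env = adsr_envelope(0.01, 0.02, 0.5, 0.1, 0.05, sr)
# Apply the envelope to a 110 Hz sine oscillator
tone = [g * math.sin(2 * math.pi * 110 * i / sr) for i, g in enumerate(env)]
```

Keeping the oscillator fixed and changing only the envelope parameters isolates the time-envelope attribute from the spectral ones.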

  6. Acoustic Attributes in Timbre Perception
     § Sound design problem?

  7. Timbre Space
     § Perceptual multidimensional attributes derived from similarity judgments
       – Listeners hear pairs of sounds and rate the degree of similarity of each pair.
       – The resulting similarity matrix is processed with multidimensional scaling (MDS), a dimensionality-reduction algorithm that determines the timbre space.
     § Acoustic correlates of the three (reduced) dimensions (Grey, 1977)
       – Spectral energy distribution
       – Attack and decay time
       – Amount of inharmonic sound in the attack
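The MDS step can be sketched as follows. The slides do not say which MDS variant is used; this swaps in classical (Torgerson) MDS, computed with power iteration so that it needs only the standard library. The function `classical_mds` is hypothetical, and the 4×4 dissimilarity matrix is a made-up example (distances between the corners of a 3×4 rectangle), not listening-test data.

```python
import math


def classical_mds(dissim, n_dims=2, iters=1000):
    """Classical (Torgerson) MDS: place items so Euclidean distances approximate `dissim`."""
    n = len(dissim)
    sq = [[d * d for d in row] for row in dissim]
    row_means = [sum(row) / n for row in sq]        # equal to column means (symmetric input)
    grand_mean = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of the centered configuration
    B = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * n_dims for _ in range(n)]
    for dim in range(n_dims):
        # Gershgorin shift makes B + shift*I positive semidefinite, so power iteration finds
        # the most positive eigenvalue even when noisy ratings make B indefinite.
        shift = max(sum(abs(v) for v in row) for row in B)
        vec = [1.0 + 0.1 * i for i in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * vec[j] for j in range(n)) + shift * vec[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break                               # B is all zeros: nothing left to embed
            vec = [x / norm for x in w]
        eigval = sum(vec[i] * sum(B[i][j] * vec[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))         # negative eigenvalues give no axis
        for i in range(n):
            coords[i][dim] = vec[i] * scale
        # Deflation: remove the component just found before searching for the next axis
        B = [[B[i][j] - eigval * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dissim = [[math.dist(p, q) for q in points] for p in points]
space = classical_mds(dissim, n_dims=2)
```

Because these dissimilarities are exact Euclidean distances, the recovered two-dimensional space reproduces them up to rotation and reflection; real similarity ratings would only be approximated.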

  8. Semantic Attributes
     § Verbally describing different characteristics of timbre with word pairs rated on bipolar scales
       – Dull ___|___ Sharp
       – Dull ___|___ Brilliant
       – Compact ___|___ Scattered
       – Cold ___|___ Warm
       – Full ___|___ Empty
       – Pure ___|___ Rich
       – Colorful ___|___ Colorless
     (Pratt and Doak, 1976; von Bismarck, 1974; from T. Rossing's music150 slides)

  9. Timbre Feature Extraction
     § Extracting acoustic features from signals
     § Low-level acoustic features
       – Zero-crossing rate
       – Spectral summaries
       – Spectral envelope: MFCC

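  10. Zero-Crossing Rate (ZCR)
     § The rate at which the signal changes sign within a frame
     § ZCR is low for harmonic (voiced) sounds and high for noisy (unvoiced) sounds
     § For simple periodic signals, it is related to the fundamental frequency (F0): a pure tone crosses zero twice per period
     [Figure: voiced vs. unvoiced waveform examples]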
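A minimal sketch of ZCR follows, assuming it is normalized as crossings per adjacent sample pair (other normalizations, such as crossings per second, are also common). The 16 kHz sample rate, 100 ms frame, and 200 Hz test tone are illustrative choices, not values from the slides.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0.0 to 1.0)."""
    # Treat 0 as positive so exact zeros are not double-counted as crossings
    signs = [x >= 0 for x in frame]
    crossings = sum(a != b for a, b in zip(signs, signs[1:]))
    return crossings / (len(frame) - 1)


sr = 16000   # assumed sample rate
n = 1600     # 100 ms frame
f0 = 200.0
# Small phase offset keeps samples away from exact zeros
tone = [math.sin(2 * math.pi * f0 * i / sr + 0.1) for i in range(n)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

zcr_tone = zero_crossing_rate(tone)     # low: roughly 2 * f0 / sr
zcr_noise = zero_crossing_rate(noise)   # high: roughly 0.5 for white noise
f0_estimate = zcr_tone * sr / 2         # a pure tone crosses zero twice per period
```

The F0 estimate only works for a clean periodic signal like this one; harmonics and noise add extra crossings, which is why ZCR separates voiced from unvoiced sounds rather than serving as a pitch tracker.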

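  11. Spectral Summary Features
     § Spectral centroid (SC): the “center of gravity” of the spectrum
       – Associated with the brightness of sounds
         $$SC(t) = \frac{\sum_k f_k X_t(k)}{\sum_k X_t(k)}$$
       – X_t(k): magnitude of frequency bin k in frame t; f_k: center frequency of bin k
     § Spectral roll-off: the frequency below which 85% (or 95%) of the spectral energy is concentrated; for the 85% case, the roll-off bin R_t satisfies
         $$\sum_{k=1}^{R_t} X_t(k) = 0.85 \sum_{k=1}^{N} X_t(k)$$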
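The two formulas above translate directly into code. This sketch follows the slide in weighting by the magnitude X_t(k) (some toolkits use the power spectrum instead), and because the roll-off equation rarely holds with exact equality on a discrete spectrum, it returns the first bin at which the cumulative sum reaches the threshold. The 9-bin spectrum is a made-up example.

```python
def spectral_centroid(mags, freqs):
    """SC = sum_k f_k X(k) / sum_k X(k) for one frame's magnitude spectrum."""
    total = sum(mags)
    if total == 0:
        return 0.0                      # silent frame: centroid undefined, report 0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_rolloff(mags, freqs, fraction=0.85):
    """Lowest bin frequency at which the cumulative sum reaches `fraction` of the total."""
    threshold = fraction * sum(mags)
    cumulative = 0.0
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= threshold:
            return f
    return freqs[-1]


# Hypothetical 9-bin magnitude spectrum (bins 500 Hz apart) with two equal peaks
freqs = [500.0 * k for k in range(9)]
mags = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
sc = spectral_centroid(mags, freqs)        # midway between the peaks at 1000 and 3000 Hz
rolloff = spectral_rolloff(mags, freqs)    # 85% is reached only at the upper peak
```

Moving energy from the upper peak to the lower one would lower both values, matching the association of the centroid with brightness.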

  12. Spectral Summary Features
     § Spectral spread (SS): a measure of the bandwidth of the spectrum
         $$SS(t) = \frac{\sum_k (f_k - SC(t))^2 X_t(k)}{\sum_k X_t(k)}$$
     § Spectral flatness (SF): a measure of the noisiness of the spectrum
       – The ratio between the geometric and arithmetic means of the spectrum over its K bins
       – Examples: white noise → close to 1, pure tone → close to 0
         $$SF(t) = \frac{\left(\prod_k X_t(k)\right)^{1/K}}{\frac{1}{K}\sum_k X_t(k)}$$
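A sketch of both measures on made-up spectra follows. Spread is computed exactly as written above, so it is in Hz² (its square root gives a bandwidth in Hz); flatness computes the geometric mean through the mean of logarithms, with a small `eps` (an assumption, not from the slides) so that empty bins do not produce log(0).

```python
import math


def spectral_spread(mags, freqs):
    """SS = sum_k (f_k - SC)^2 X(k) / sum_k X(k), in Hz^2 as written on the slide."""
    total = sum(mags)
    if total == 0:
        return 0.0
    sc = sum(f * m for f, m in zip(freqs, mags)) / total
    return sum((f - sc) ** 2 * m for f, m in zip(freqs, mags)) / total


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean / arithmetic mean of the magnitude spectrum."""
    K = len(mags)
    # Geometric mean via the mean of logs; eps keeps log() finite for empty bins
    geometric = math.exp(sum(math.log(m + eps) for m in mags) / K)
    arithmetic = sum(mags) / K
    return geometric / (arithmetic + eps)


freqs = [500.0 * k for k in range(9)]
two_peaks = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
flat = [1.0] * 9                        # idealized white-noise-like spectrum
ss = spectral_spread(two_peaks, freqs)  # peaks 1000 Hz either side of SC = 2000 Hz
```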

  13. Examples of Spectral Centroids
     [Figure: spectral centroid (frequency [Hz], 0 to 10000) over time [sec], 0 to 4.5; left: Classical, “Beethoven String Quartet”; right: Pop, “Video Killed the Radio Star”]
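Curves like these come from computing the centroid frame by frame. The sketch below assumes a 256-sample Hann window with a 128-sample hop and uses a naive DFT to stay within the standard library (an FFT would be used in practice); the "dark then bright" test signal is synthetic, not the recordings in the figure.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|DFT| for bins 0..N/2 via a naive O(N^2) DFT (use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def centroid_track(signal, sr, frame_len=256, hop=128):
    """Spectral centroid (Hz) of each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    freqs = [k * sr / frame_len for k in range(frame_len // 2 + 1)]
    track = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        mags = magnitude_spectrum(frame)
        total = sum(mags)
        track.append(sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0)
    return track


sr = 8000
dark = [math.sin(2 * math.pi * 500 * i / sr) for i in range(2048)]
bright = [math.sin(2 * math.pi * 3000 * i / sr) for i in range(2048, 4096)]
track = centroid_track(dark + bright, sr)   # jumps from about 500 Hz to about 3000 Hz
```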

  14. Mel-Frequency Cepstral Coefficient (MFCC)
     § The most widely used audio feature for extracting the spectral envelope of an audio frame
       – Standard audio feature in speech recognition
       – Introduced to the music domain by Logan (2000)
     § Computation steps
       – Audio frame → DFT → map frequency scale to mel → log magnitude → DCT
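The four computation steps can be chained in a short, standard-library-only sketch. Several details are assumptions rather than the slides' choices: the HTK-style mel formula, 40 triangular mel filters, 13 kept coefficients, a naive DFT instead of an FFT, and no analysis window, pre-emphasis, or liftering (which real implementations often add). `mfcc_frame` and `mel_filterbank` are hypothetical names.

```python
import cmath
import math
import random


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)     # HTK-style mel formula (assumed)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale; one weight row per band."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for lo, center, hi in zip(edges, edges[1:], edges[2:]):
        bank.append([(f - lo) / (center - lo) if lo <= f <= center
                     else (hi - f) / (hi - center) if center < f <= hi
                     else 0.0
                     for f in bin_freqs])
    return bank


def mfcc_frame(frame, sr, n_mels=40, n_mfcc=13, eps=1e-10):
    n = len(frame)
    # 1) DFT magnitude for bins 0..n/2 (naive O(n^2) DFT; use an FFT in practice)
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]
    # 2) Map the linear frequency axis to mel bands
    mel = [sum(w * m for w, m in zip(row, mags)) for row in mel_filterbank(n_mels, n, sr)]
    # 3) Log magnitude (eps avoids log(0) for empty bands)
    log_mel = [math.log(e + eps) for e in mel]
    # 4) DCT (slide formula with n = i + 1), keeping the first n_mfcc coefficients
    return [math.sqrt(2.0 / n_mels)
            * sum(x * math.cos(math.pi * k / n_mels * (i + 0.5)) for i, x in enumerate(log_mel))
            for k in range(n_mfcc)]


rng = random.Random(0)
frame = [rng.gauss(0.0, 1.0) for _ in range(512)]   # stand-in for a 32 ms frame at 16 kHz
coeffs = mfcc_frame(frame, sr=16000)
```

Doubling the input gain adds a constant to every log-mel value, and a constant lands entirely in the k = 0 DCT term, so only the first coefficient changes; this is one reason c0 is sometimes dropped or treated as an energy term.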

  15. Mel-Frequency Spectrogram
     § Convert the linear frequency axis to the mel scale
     § Usually reduces the dimensionality of the spectrum (e.g., 512 linear-frequency bins to 60 mel bins)
     [Figure: spectrum vs. mel-scaled spectrum]
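A sketch of the frequency-to-mel conversion follows, assuming the HTK-style formula m = 2595 log10(1 + f/700); other variants exist (Slaney's, for example, is linear below 1 kHz). Placing band edges uniformly on the mel scale makes low-frequency bands narrow and high-frequency bands wide, which is how many linear bins collapse into a few dozen mel bins. The 8 kHz upper limit is an assumption.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style formula; other variants (e.g. Slaney's) are linear below 1 kHz
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 band-edge frequencies (Hz), uniformly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(60, 0.0, 8000.0)             # 60 mel bands up to an assumed 8 kHz
widths = [b - a for a, b in zip(edges, edges[1:])]  # spacing grows with frequency
```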

  16. Discrete Cosine Transform
     § A real-valued transform similar to the DFT
       – Decorrelates the mel-scaled log spectrum and reduces the dimensionality again, by keeping only the first few coefficients (e.g., 13)
         $$X_{DCT}(k) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} x(n) \cos\left(\frac{\pi k}{N}(n - 0.5)\right)$$
     [Figure: mel-scaled spectrum → MFCC]
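To see why keeping only the first coefficients loses little, the DCT formula above can be applied to a smooth, made-up 60-band log-mel vector. This is an illustrative sketch (direct O(N²) evaluation, hypothetical data); for a smooth spectral envelope, almost all of the energy lands in the low-order coefficients.

```python
import math


def dct(x):
    """X_DCT(k) = sqrt(2/N) * sum_{n=1}^{N} x(n) cos(pi*k/N * (n - 0.5)), k = 0..N-1."""
    N = len(x)
    return [math.sqrt(2.0 / N)
            * sum(x[n - 1] * math.cos(math.pi * k / N * (n - 0.5)) for n in range(1, N + 1))
            for k in range(N)]


# Hypothetical smooth log-mel spectrum over 60 bands: a downward tilt plus one broad bump
N = 60
log_mel = [-0.05 * i + 2.0 * math.exp(-((i - 15) / 8.0) ** 2) for i in range(N)]
energy = [c * c for c in dct(log_mel)]
share_first_13 = sum(energy[:13]) / sum(energy)   # nearly all energy sits in low-order terms
```

A constant input produces only a k = 0 term, since each cosine with k ≥ 1 sums to zero over the N bands.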

  17. Reconstructed Frequency Spectrum from MFCC
     [Figure: frequency spectrum (512 bins) → mel-scaled spectrum (60 bins) → MFCC (13 dims); reconstructed mel-scaled spectrum; reconstructed frequency spectrum]
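A sketch of the reconstruction path under stated assumptions: the 13 MFCCs are zero-padded to 60 coefficients, inverted with the inverse of the slide-16 DCT scaling, and exponentiated to undo the log. Mapping the result back to 512 linear-frequency bins would additionally need an approximate inverse of the mel filterbank (for example a pseudo-inverse), which is omitted. The log-mel frame is made up: a smooth envelope plus fast "fine structure" chosen to be exactly one high-order DCT basis vector, so truncation removes it completely.

```python
import math


def dct(x):
    """Slide formula: X(k) = sqrt(2/N) * sum_{n=1}^{N} x(n) cos(pi*k/N * (n - 0.5))."""
    N = len(x)
    return [math.sqrt(2.0 / N)
            * sum(x[n - 1] * math.cos(math.pi * k / N * (n - 0.5)) for n in range(1, N + 1))
            for k in range(N)]


def idct(coeffs, N):
    """Inverse of dct() for N outputs; coefficients not supplied are treated as zero."""
    return [math.sqrt(2.0 / N)
            * (coeffs[0] / 2.0
               + sum(coeffs[k] * math.cos(math.pi * k / N * (n - 0.5))
                     for k in range(1, len(coeffs))))
            for n in range(1, N + 1)]


N = 60  # mel bands, as in the slide's example
# Hypothetical log-mel frame: smooth envelope plus fast fine structure
envelope = [-0.05 * i + 2.0 * math.exp(-((i - 15) / 8.0) ** 2) for i in range(N)]
ripple = [0.5 * math.cos(math.pi * 40 / N * (i + 0.5)) for i in range(N)]  # DCT basis k = 40
log_mel = [e + r for e, r in zip(envelope, ripple)]

mfcc = dct(log_mel)[:13]                                 # keep 13 coefficients
smoothed_log_mel = idct(mfcc, N)                         # zero-padded inverse DCT
smoothed_mel = [math.exp(v) for v in smoothed_log_mel]   # undo the log
```

The reconstruction keeps the envelope and drops the fine structure, which is why the reconstructed spectra in the figure look smoothed.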

  18. Comparison of Spectrogram and MFCC
     [Figure panels: spectrogram; mel-frequency spectrogram; MFCC; spectrogram reconstructed from MFCC]

  19. Sound Examples of MFCC
     § Original [audio example]
     § MFCC reconstruction, using white noise as the source [audio example]
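The slides do not describe how the noise-excited reconstruction was produced. One plausible frame-level sketch, assuming the MFCCs have already been turned back into a linear-frequency magnitude envelope (as on slide 17), multiplies the spectrum of a white-noise frame by that envelope and inverts it; a full resynthesis would repeat this per frame with windowed overlap-add. `noise_excited_frame` is a hypothetical helper and the low-pass envelope is made up.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT over all N bins (use an FFT in practice)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]


def idft_real(X):
    """Inverse DFT, keeping the real part (imaginary parts are rounding noise here)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)) / n).real
            for i in range(n)]


def noise_excited_frame(envelope, rng):
    """Shape one frame of white noise by a spectral envelope given for bins 0..N/2."""
    n = 2 * (len(envelope) - 1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spectrum = dft(noise)
    # Scale each bin; bin k and its mirror n - k share a gain so the result stays real
    shaped = [spectrum[k] * envelope[min(k, n - k)] for k in range(n)]
    return idft_real(shaped)


# Hypothetical envelope for a 64-sample frame: pass bins 0..8, silence the rest
envelope = [1.0 if k <= 8 else 0.0 for k in range(33)]
frame = noise_excited_frame(envelope, random.Random(0))
```

Because the noise supplies a flat but random spectrum, the output carries the envelope's coarse shape with no pitch, which matches the whispery character typical of such reconstructions.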

  20. Post-processing
     § Adding temporal dynamics
       – Short-term dynamics of features are characterized with delta and double-delta features:
         $$\Delta x(n) = \frac{x(n) - x(n-h)}{h}, \qquad \Delta\Delta x(n) = \frac{\Delta x(n) - \Delta x(n-h)}{h}$$
       – 39-dimensional features in speech recognition: 13 MFCCs + 13 deltas + 13 double-deltas
     § Normalization
       – Cepstral mean subtraction (CMS): subtract the mean computed over surrounding frames
       – Standardization: subtract the mean and divide by the standard deviation
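A sketch of these post-processing steps on a frames × coefficients list follows. It uses the backward difference exactly as written above (many toolkits, e.g. HTK, use a regression over several neighboring frames instead), repeats frame 0 at the start of the sequence, and assumes a CMS window of ±50 frames; all helper names and the toy MFCC sequence are hypothetical.

```python
import math


def delta(frames, h=1):
    """Backward difference (x(n) - x(n-h)) / h per dimension; frames before 0 reuse frame 0."""
    return [[(cur - prev) / h for cur, prev in zip(frames[n], frames[max(n - h, 0)])]
            for n in range(len(frames))]


def add_dynamics(frames, h=1):
    """Concatenate static, delta, and double-delta features (13 -> 39 dims for 13 MFCCs)."""
    d = delta(frames, h)
    dd = delta(d, h)
    return [x + dx + ddx for x, dx, ddx in zip(frames, d, dd)]


def cepstral_mean_subtraction(frames, half_width=50):
    """Subtract the per-dimension mean over frames n - half_width .. n + half_width."""
    out = []
    for n in range(len(frames)):
        window = frames[max(0, n - half_width): n + half_width + 1]
        means = [sum(col) / len(window) for col in zip(*window)]
        out.append([v - m for v, m in zip(frames[n], means)])
    return out


def standardize(frames, eps=1e-12):
    """Per-dimension (x - mean) / standard deviation over all frames."""
    cols = list(zip(*frames))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / (s + eps) for v, m, s in zip(row, means, stds)] for row in frames]


# Hypothetical MFCC sequence: 10 frames x 13 coefficients rising by 1 per frame
mfccs = [[float(n + d) for d in range(13)] for n in range(10)]
features = add_dynamics(mfccs)          # 39 values per frame
normalized = standardize(features)
```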

  21. Applications
     § Music
       – Musical instrument classification
       – Music genre/mood classification
       – Similarity-based audio retrieval
     § Speech
       – Speech recognition
       – Speaker recognition

  22. References
     § J. Grey, “Multidimensional Perceptual Scaling of Musical Timbres,” 1977.
     § D. Wessel, “Timbre Space as a Musical Control Structure,” 1979.
     § S. Donnadieu, “Mental Representation of the Timbre of Complex Sounds,” Chapter 8 in Analysis, Synthesis, and Perception of Musical Sounds, J. Beauchamp (ed.), 2007.
     § B. Logan, “Mel Frequency Cepstral Coefficients for Music Modeling,” 2000.
