
GCT535 - Sound Technology for Multimedia: Timbre Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam

Outline
§ Timbre Analysis
– Definition of Timbre
– Timbre Features
• Zero-crossing rate
• Spectral summary features
• Mel-Frequency Cepstral Coefficients (MFCC)

What is timbre?
§ Definition
– The attribute of auditory sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar (ANSI)
– The tone color or quality that characterizes a particular sound
§ Associated with classifying or identifying sound sources
– Class: piano, guitar, singing voice, engine sound
– Identity: Steinway Model D, Fender Stratocaster, Michael Jackson, Harley-Davidson
§ Also used to describe polyphonic sounds holistically
– For example, music or environmental sounds
– Associated with genre, mood, or other high-level descriptions

What is timbre?
§ Timbre is a vague concept
– Unlike loudness or pitch, it has no single quantitative scale.
– Instead, it comprises multiple attributes.
§ Different views of this multidimensionality
– Acoustic attributes: temporal or spectral factors
– Timbre space: perceptual similarity/dissimilarity
– Semantic attributes: verbal descriptions

Acoustic Attributes in Timbre Perception
§ Acoustic attributes (Schouten, 1968)
– Harmonicity: the range between tonal and noise-like character
– Time envelope (ADSR: attack, decay, sustain, release)
– Spectral envelope
– Changes of the spectral envelope and fundamental frequency
– The onset of a sound, which differs notably from the sustained vibration
[Figures: ADSR envelope; changes of spectral envelope]

Acoustic Attributes in Timbre Perception
§ Sound design problem?

Timbre Space
§ Perceptual multidimensional attributes derived from similarity measurements
– Listeners hear pairs of sounds and rate their degree of similarity.
– The resulting similarity matrix is processed with multidimensional scaling (MDS), a dimensionality reduction algorithm that determines the timbre space.
§ Acoustic correlates of the three (reduced) dimensions (Grey, 1977)
– Spectral energy distribution
– Attack and decay time
– Amount of inharmonic sound in the attack

Semantic Attributes
§ Characteristics of timbre are described verbally, using bipolar word scales:
– Dull ↔ Sharp
– Dull ↔ Brilliant
– Compact ↔ Scattered
– Cold ↔ Warm
– Full ↔ Empty
– Pure ↔ Rich
– Colorful ↔ Colorless
(Pratt and Doak, 1976; von Bismarck, 1974; T. Rossing’s music150 slides)

Timbre Feature Extraction
§ Extracting acoustic features from signals
§ Low-level acoustic features
– Zero-crossing rate
– Spectral summaries
– Spectral envelope: MFCC

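Zero-Crossing Rate (ZCR)
§ The rate at which the signal changes sign
§ ZCR is low for harmonic (voiced) sounds and high for noisy (unvoiced) sounds.
§ For simple periodic signals, it is related to F0: a sinusoid crosses zero twice per period.
[Figure: ZCR of voiced vs. unvoiced speech]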
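A minimal sketch of ZCR using only the Python standard library. It counts sign changes between adjacent samples and divides by the number of sample pairs; treating exact zeros as positive, and the two synthetic test signals, are assumptions made for illustration. Because a sinusoid crosses zero twice per period, F0 can be estimated roughly as ZCR · fs / 2.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = 0
    for prev, cur in zip(frame[:-1], frame[1:]):
        # Exact zeros are treated as non-negative (an assumption).
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings / (len(frame) - 1)


sr = 16000  # sample rate in Hz
f0 = 200.0
# Harmonic ("voiced-like") test signal: a 200 Hz sine, one second long.
tone = [math.sin(2 * math.pi * f0 * n / sr + 0.1) for n in range(sr)]
# Noisy ("unvoiced-like") test signal: uniform white noise.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(sr)]

zcr_tone = zero_crossing_rate(tone)
zcr_noise = zero_crossing_rate(noise)
# A sinusoid crosses zero twice per period, so F0 is roughly ZCR * sr / 2.
f0_estimate = zcr_tone * sr / 2
print(zcr_tone, zcr_noise, f0_estimate)
```

The tone yields a ZCR of about 0.025 and an F0 estimate close to 200 Hz, while the noise yields a ZCR near 0.5. The F0 estimate is only reliable for near-sinusoidal signals; strong harmonics add extra crossings.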

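Spectral Summary Features
§ Spectral Centroid (SC): the “center of gravity” of the spectrum
– Associated with the brightness of sounds
$$SC(t) = \frac{\sum_k f_k X_t(k)}{\sum_k X_t(k)}$$
where X_t(k) is the magnitude of frequency bin k in frame t and f_k is the center frequency of that bin.
§ Spectral Roll-off: the frequency below which 85% (or 95%) of the spectral energy is concentrated; the roll-off bin R_t satisfies
$$\sum_{k=1}^{R_t} X_t(k) = 0.85 \sum_{k=1}^{N} X_t(k)$$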
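The two formulas can be sketched in Python as follows, assuming a frame is given as a list of bin magnitudes `mag` with matching bin frequencies `freqs` in Hz (both names, and the toy values, are hypothetical). Roll-off is returned as the first bin frequency at which the cumulative sum reaches the chosen ratio.

```python
def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency of one frame (SC above)."""
    total = sum(mag)
    if total == 0:
        return 0.0  # silent frame: centroid undefined, return 0 by convention
    return sum(f * m for f, m in zip(freqs, mag)) / total


def spectral_rolloff(mag, freqs, ratio=0.85):
    """Lowest bin frequency at which the cumulative sum reaches ratio * total."""
    threshold = ratio * sum(mag)
    cumulative = 0.0
    for f, m in zip(freqs, mag):
        cumulative += m
        if cumulative >= threshold:
            return f
    return freqs[-1]


# Toy 10-bin frame with 100 Hz bin spacing (hypothetical values).
freqs = [100.0 * k for k in range(10)]
mag = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0]
print(spectral_centroid(mag, freqs))  # (200*1 + 800*3) / 4 = 650.0
print(spectral_rolloff(mag, freqs))   # 800.0
```

Like the formulas, this sums magnitudes; passing squared magnitudes (a power spectrum) instead makes roll-off literally an energy percentage. Use `ratio=0.95` for the 95% variant.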

Spectral Summary Features
§ Spectral Spread (SS): a measure of the bandwidth of the spectrum
$$SS(t) = \frac{\sum_k (f_k - SC(t))^2 X_t(k)}{\sum_k X_t(k)}$$
§ Spectral Flatness (SF): a measure of the noisiness of the spectrum
– The ratio between the geometric and arithmetic means of the spectrum over its K bins
– Examples: white noise → 1, pure tone → 0
$$SF(t) = \frac{\left(\prod_k X_t(k)\right)^{1/K}}{\frac{1}{K}\sum_k X_t(k)}$$
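A sketch of spread and flatness under the same assumptions (a frame as lists of magnitudes and bin frequencies). The geometric mean is computed in the log domain to avoid underflow of the product, and a small `eps` (an assumption, not part of the formula) keeps the logarithm finite for empty bins. The "white noise" and "pure tone" inputs are idealized spectra, not measured ones.

```python
import math


def spectral_spread(mag, freqs):
    """Magnitude-weighted variance of frequency around the centroid (SS above)."""
    total = sum(mag)
    if total == 0:
        return 0.0
    centroid = sum(f * m for f, m in zip(freqs, mag)) / total
    return sum((f - centroid) ** 2 * m for f, m in zip(freqs, mag)) / total


def spectral_flatness(mag, eps=1e-12):
    """Geometric mean / arithmetic mean; eps guards log(0) and division by zero."""
    k = len(mag)
    # Geometric mean via the mean of logs, then exponentiate.
    geometric = math.exp(sum(math.log(m + eps) for m in mag) / k)
    arithmetic = sum(mag) / k + eps
    return geometric / arithmetic


freqs = [0.0, 100.0, 200.0]
print(spectral_spread([1.0, 0.0, 1.0], freqs))  # 10000.0 (Hz^2)

flat = [1.0] * 64        # idealized white-noise spectrum: all bins equal
peaky = [0.0] * 64
peaky[10] = 1.0          # idealized pure tone: a single non-zero bin
print(spectral_flatness(flat), spectral_flatness(peaky))
```

As written above, SS is a variance and therefore in Hz²; taking its square root gives a bandwidth in Hz.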

Examples of Spectral Centroids
[Figure: spectral centroid (frequency [Hz], 0 to 10000) over time [sec] (0 to 4.5) for two excerpts. Classical: “Beethoven String Quartet”. Pop: “Video Killed the Radio Star”.]

Mel-Frequency Cepstral Coefficients (MFCC)
§ The most widely used audio feature for summarizing the spectral envelope of an audio frame
– Standard audio feature in speech recognition
– Introduced to the music domain by Logan (2000)
§ Computation steps
– Audio frame → DFT → Mapping the frequency scale to mel → Log magnitude → DCT → MFCC
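The first stage of this pipeline, the DFT magnitude of one audio frame, can be sketched with the standard library alone. Applying a Hann window before the DFT is an assumption (the steps above only say DFT), and the naive O(N²) loop stands in for an FFT purely to keep the example self-contained.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window (windowing is an assumption here)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def magnitude_spectrum(frame):
    """|DFT| of a windowed frame for bins 0..N/2, via a naive O(N^2) DFT."""
    n_len = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n_len))]
    mags = []
    for k in range(n_len // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
                  for n, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


sr = 8000      # sample rate in Hz
n_len = 256    # frame length in samples
frame = [math.sin(2 * math.pi * 1000.0 * n / sr) for n in range(n_len)]
mag = magnitude_spectrum(frame)
peak_bin = max(range(len(mag)), key=mag.__getitem__)
print(len(mag), peak_bin * sr / n_len)  # 129 bins; peak at 1000.0 Hz
```

Only the N/2 + 1 non-negative-frequency bins are kept, since the magnitude spectrum of a real signal is symmetric. These magnitudes are what the mel mapping, log, and DCT stages then consume.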

Mel-Frequency Spectrogram
§ Converts the linear frequency axis to the mel scale
§ Usually also reduces the dimensionality of the spectrum
[Figure: spectrum and mel-scaled spectrum]
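One common way to perform this conversion is a bank of triangular filters whose centers are equally spaced on the mel scale; the sketch below assumes that design and the widely used formula mel = 2595 · log10(1 + f/700). Other mel variants and filter normalizations exist, and all function names here are hypothetical. The parameters roughly mirror a 512-bin to 60-band reduction (an FFT size of 1024 actually yields 513 bins including DC and Nyquist).

```python
import math


def hz_to_mel(f):
    # One common mel formula; other variants (e.g. Slaney's) exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters with centers equally spaced on the mel scale."""
    fmax = sr / 2 if fmax is None else fmax
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies: filter i spans edges[i] .. edges[i + 2].
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, mag):
    """Weighted sum of linear-frequency magnitudes under each mel filter."""
    return [sum(w * m for w, m in zip(row, mag)) for row in bank]


bank = mel_filterbank(n_mels=60, n_fft=1024, sr=16000)
mel_spec = apply_filterbank(bank, [1.0] * 513)
print(len(bank[0]), "->", len(mel_spec))  # 513 -> 60
```

Low-frequency filters are narrow and high-frequency filters are wide, so the mel spectrum keeps detail where pitch perception is finer and compresses the upper range.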

Discrete Cosine Transform
§ A real-valued transform similar to the DFT
– De-correlates the mel-scaled log spectrum and reduces the dimensionality again
$$X_{DCT}(k) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} x(n) \cos\left(\frac{\pi k}{N}(n - 0.5)\right)$$
[Figure: mel-scaled spectrum and the resulting MFCC]
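A sketch of the last two pipeline stages, log compression followed by the DCT above, keeping the first 13 coefficients. The natural log and the floor value are assumptions (changing the log base only rescales the coefficients). The usage line shows the de-correlation idea in its simplest form: a flat log-mel spectrum has no shape, so all of its content lands in coefficient 0.

```python
import math


def dct(x):
    """DCT following the formula above: sqrt(2/N) * sum_n x(n) cos(pi k (n - 0.5) / N)."""
    n_len = len(x)
    scale = math.sqrt(2.0 / n_len)
    # n runs 1..N as in the formula; x is a 0-based Python list, hence x[n - 1].
    return [scale * sum(x[n - 1] * math.cos(math.pi * k / n_len * (n - 0.5))
                        for n in range(1, n_len + 1))
            for k in range(n_len)]


def mfcc_from_mel(mel_spec, n_mfcc=13, floor=1e-10):
    """Log-compress a mel spectrum, apply the DCT, and keep the first n_mfcc terms."""
    # The floor (an assumption) keeps log() finite for empty mel bands.
    log_mel = [math.log(max(m, floor)) for m in mel_spec]
    return dct(log_mel)[:n_mfcc]


# A flat mel spectrum has no spectral shape: all content lands in coefficient 0.
coeffs = mfcc_from_mel([math.e] * 60)
print(len(coeffs), coeffs[0])  # 13, about 10.954 (= sqrt(2/60) * 60)
```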

Reconstructed Frequency Spectrum from MFCC
[Figure: frequency spectrum (512 bins) → frequency spectrum (mel-scaled, 60 bins) → MFCC (13 dim) → reconstructed frequency spectrum (mel-scaled) → reconstructed frequency spectrum]
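The 60-band to 13-coefficient truncation described above can be imitated in the log-mel domain: take the DCT of a 60-band log-mel frame, zero all but the first 13 coefficients, and invert. The sketch below uses the DCT scaling given earlier (with 0-based n) and its exact inverse; the test frame is hypothetical, and mapping the result back onto the linear-frequency bins is not shown.

```python
import math


def dct(x):
    """sqrt(2/N) * sum_n x(n) cos(pi k (n + 0.5) / N), with 0-based n."""
    n_len = len(x)
    scale = math.sqrt(2.0 / n_len)
    return [scale * sum(x[n] * math.cos(math.pi * k * (n + 0.5) / n_len)
                        for n in range(n_len))
            for k in range(n_len)]


def inverse_dct(coeffs, n_len):
    """Invert dct(); truncated (missing) coefficients are treated as zero."""
    c = list(coeffs) + [0.0] * (n_len - len(coeffs))
    out = []
    for n in range(n_len):
        # With sqrt(2/N) applied to every k, the k = 0 term needs weight 1/sqrt(2N).
        acc = c[0] / math.sqrt(2.0 * n_len)
        acc += math.sqrt(2.0 / n_len) * sum(
            c[k] * math.cos(math.pi * k * (n + 0.5) / n_len) for k in range(1, n_len))
        out.append(acc)
    return out


# Hypothetical 60-band log-mel frame: a smooth downward tilt plus fast ripple.
n_mels = 60
log_mel = [-0.05 * i + 0.5 * math.cos(0.9 * math.pi * i) for i in range(n_mels)]
coeffs = dct(log_mel)
exact = inverse_dct(coeffs, n_mels)        # all 60 coefficients: lossless
smooth = inverse_dct(coeffs[:13], n_mels)  # first 13 (the MFCCs): smoothed envelope
print(max(abs(a - b) for a, b in zip(exact, log_mel)))
print(max(abs(a - b) for a, b in zip(smooth, log_mel)))
```

Keeping every coefficient reproduces the frame up to rounding error, while keeping 13 retains the slow tilt and discards the fast ripple. This is why the reconstruction looks like a smoothed spectral envelope.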

Comparison of Spectrogram and MFCC
[Figure panels: spectrogram; mel-frequency spectrogram; MFCC; spectrogram reconstructed from MFCC]

Sound Examples of MFCC
§ Original: [audio example]
§ MFCC reconstruction (using white noise as the source): [audio example]

Post-processing
§ Adding temporal dynamics
– Short-term dynamics of the features are characterized by delta and double-delta features:
$$\Delta x(n) = \frac{x(n) - x(n-h)}{h}, \qquad \Delta\Delta x(n) = \frac{\Delta x(n) - \Delta x(n-h)}{h}$$
– 39-dimensional MFCC features in speech recognition: 13 MFCCs + 13 deltas + 13 double-deltas
§ Normalization
– Cepstral Mean Subtraction (CMS): subtract the mean over surrounding frames
– Standardization: subtract the mean and divide by the standard deviation
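A sketch of these post-processing steps on a feature track stored as a list of frames (each frame a list of 13 coefficients). Padding the start of the track with frame 0 is an edge-handling assumption, the 5-frame track is hypothetical, and CMS here uses the mean over whatever frames are passed in (for example, a window around the current frame). Many toolkits compute deltas with a regression over several neighboring frames instead of this simple backward difference.

```python
import math


def delta(frames, h=1):
    """Per-dimension backward difference (x(n) - x(n - h)) / h over a list of frames.
    Frames before the start are replaced by frame 0 (an edge-handling assumption)."""
    return [[(cur - prev) / h for cur, prev in zip(frames[n], frames[max(n - h, 0)])]
            for n in range(len(frames))]


def stack_with_deltas(frames, h=1):
    """Concatenate static, delta, and double-delta features per frame."""
    d = delta(frames, h)
    dd = delta(d, h)
    return [s + a + b for s, a, b in zip(frames, d, dd)]


def column_means(frames):
    return [sum(col) / len(frames) for col in zip(*frames)]


def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over the given frames."""
    means = column_means(frames)
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def standardize(frames, eps=1e-12):
    """Zero mean and unit standard deviation per dimension; eps avoids division by zero."""
    means = column_means(frames)
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(frames))
            for col, m in zip(zip(*frames), means)]
    return [[(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
            for frame in frames]


# Hypothetical 13-dimensional MFCC track over 5 frames.
mfccs = [[float(t + d) for d in range(13)] for t in range(5)]
features = stack_with_deltas(mfccs)
print(len(features[0]))  # 39 = 13 MFCCs + 13 deltas + 13 double-deltas
normalized = standardize(mfccs)
```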

Applications
§ Music
– Musical instrument classification
– Music genre/mood classification
– Similarity-based audio retrieval
§ Speech
– Speech recognition
– Speaker recognition

References
§ J. Grey, “Multidimensional Perceptual Scaling of Musical Timbres”, 1977
§ D. Wessel, “Timbre Space as a Musical Control Structure”, 1979
§ S. Donnadieu, “Mental Representation of the Timbre of Complex Sounds”, ch. 8 in “Analysis, Synthesis and Perception of Musical Sounds”, ed. J. Beauchamp, 2007
§ B. Logan, “Mel Frequency Cepstral Coefficients for Music Modeling”, 2000
