
GCT535 - Sound Technology for Multimedia: Tonal Analysis



  1. GCT535 - Sound Technology for Multimedia: Tonal Analysis
     Graduate School of Culture Technology, KAIST
     Juhan Nam

  2. Outline
     § Pitch Perception
       – Perceptual Pitch Scale
       – Log-Scaled Spectrum
     § Tonal Analysis
       – Chroma Feature
       – Key Estimation
       – Chord Recognition

  3. Frequency Scale in Spectrogram
     § Linear frequency scale
       – Great for seeing the harmonic structure of a single tone.
       – However, it is not the most intuitive way to visualize musical signals.
     [Figure: linear-frequency spectrograms (frequency in Hz vs. time in seconds) of Beatles “Hey Jude” and of a piano chromatic scale]

  4. Human Pitch Perception
     § Human ears are sensitive to frequency changes on a log scale
       – Pitch resolution: the just noticeable difference (JND) increases as the frequency goes up
       – Place theory: resonance position along the basilar membrane in the cochlea
     [Figure: response of the basilar membrane to a pair of tones. From CCRMA Music 150 slides (Thomas Rossing)]

  5. Critical Bandwidth
     § Frequency bandwidth within which one tone interferes with the perception of another tone by auditory masking
       – Roughly constant at low frequencies but increasing linearly with frequency at high frequencies
     From CCRMA Music 150 slides (Thomas Rossing)

  6. Psychoacoustical Pitch Scales
     § Mel scale
       – Based on pitch ratio of tones (mel from “melody”)
         $m = 2595 \log_{10}(1 + f/700)$
     § Bark scale
       – Critical band measurement by masking
         $\mathrm{Bark} = 13 \arctan(0.00075 f) + 3.5 \arctan((f/7500)^2)$
     § Equivalent Rectangular Bandwidth (ERB) rate
       – Critical band measurement using the notched-noise method
         $\mathrm{ERBS} = 21.4 \log_{10}(1 + 0.00437 f)$
     [Figure: Comparison of Pitch Scales (normalized ERB, Mel, and Bark scales vs. frequency in Hz, 0 to 2.5 × 10^4). Using Matlab code from https://www.speech.kth.se/~giampi/auditoryscales/]
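
The three scale formulas on this slide can be evaluated directly. The sketch below is a minimal illustration, not code from the lecture; the function names are made up for this example. The Bark coefficient defaults to the value printed on the slide (0.00075), although the approximation is often quoted with 0.00076.

```python
import math

def hz_to_mel(f):
    """Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f, a=0.00075):
    """Bark scale: 13 * atan(a * f) + 3.5 * atan((f / 7500)^2).

    `a` defaults to the coefficient shown on the slide; the formula is
    often quoted with 0.00076 instead.
    """
    return 13.0 * math.atan(a * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erbs(f):
    """ERB rate: 21.4 * log10(1 + 0.00437 * f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

# Print the three scales at a few frequencies to see the log-like compression.
for f in (100.0, 1000.0, 10000.0):
    print(f, hz_to_mel(f), hz_to_bark(f), hz_to_erbs(f))
```

All three functions are zero at 0 Hz and grow more slowly as frequency increases, which is the shared behaviour the comparison plot illustrates.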

  7. Musical Pitch Scale
     § Equal temperament
       – $1 : 2^{1/12}$ frequency ratio between two adjacent notes
       – Music note ($m$) and frequency ($f$) in Hz
         $m = 12 \log_{2}(f/440) + 69, \quad f = 440 \cdot 2^{(m - 69)/12}$
     https://newt.phys.unsw.edu.au/jw/notes.html
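
As a quick check of the equal-temperament conversion (note number 69 at 440 Hz, 12 notes per octave), here is a minimal sketch. The function names are chosen for this example only.

```python
import math

def hz_to_midi(f):
    """Note number from frequency: m = 12 * log2(f / 440) + 69."""
    return 12.0 * math.log2(f / 440.0) + 69.0

def midi_to_hz(m):
    """Frequency from note number: f = 440 * 2^((m - 69) / 12)."""
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)

# One semitone up multiplies the frequency by 2^(1/12); twelve of them double it.
print(midi_to_hz(70) / midi_to_hz(69))
print(midi_to_hz(81) / midi_to_hz(69))
```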

  8. Frequency Mapping Using Spectrogram
     § Mapping the linear scale to a perceptual (log-like) scale
       – Locate center frequencies according to the frequency mapping
       – Apply linear interpolation around each center frequency, with the corresponding bandwidth as the skirt
     [Figure: Linear-Frequency Spectrogram (frequency in Hz vs. time in seconds) and Log-Frequency Spectrogram (MIDI note number vs. time in seconds), with center frequency and bandwidth marked]

  9. Frequency Mapping Using Spectrogram
     § The mapping can be formed as a matrix multiplication
       – Each column of the mapping matrix contains the interpolation coefficients
         $Y = M \cdot X$ ($M$: mapping matrix, $X$: spectrogram, $Y$: scaled spectrogram)
     [Figure: mapping matrix × linear-frequency spectrogram (frequency in Hz vs. time in seconds) = log-frequency spectrogram (MIDI note number vs. time in seconds)]
     § Limitation
       – Simple, but the time and frequency resolutions are still constrained by the STFT
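
The mapping-as-matrix idea can be sketched with triangular interpolation weights, one band per MIDI note. This is only an illustration under assumptions not stated on the slides: the band skirts reach to the neighboring notes' center frequencies, and the names `log_freq_mapping` and `apply_mapping`, the note range, and the FFT size are all hypothetical.

```python
def midi_to_hz(m):
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)

def log_freq_mapping(n_fft, sr, midi_lo=36, midi_hi=96):
    """Build M, where M[k][j] is the weight of FFT bin j in the band of note midi_lo + k.

    Assumption: each band is a triangle that peaks at the note's center
    frequency and falls to zero at the centers of the neighboring notes.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = sr / n_fft
    M = []
    for m in range(midi_lo, midi_hi + 1):
        lo, ctr, hi = midi_to_hz(m - 1), midi_to_hz(m), midi_to_hz(m + 1)
        row = []
        for j in range(n_bins):
            f = j * bin_hz
            if lo < f <= ctr:
                w = (f - lo) / (ctr - lo)      # rising skirt
            elif ctr < f < hi:
                w = (hi - f) / (hi - ctr)      # falling skirt
            else:
                w = 0.0
            row.append(w)
        M.append(row)
    return M

def apply_mapping(M, X):
    """Y = M . X, with X given as a list of rows (frequency bins x frames)."""
    n_frames = len(X[0])
    return [[sum(w * X[j][t] for j, w in enumerate(row) if w)
             for t in range(n_frames)] for row in M]
```

With a short FFT, the lowest bands can be narrower than the bin spacing and end up with no non-zero weights at all, which is one concrete form of the STFT resolution limitation.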

  10. Mel-Frequency Spectrogram
      § The Mel scale is a popular choice
        – Example: MFCC
      [Figure: Linear-Frequency Spectrogram (frequency in Hz vs. time in seconds) and Mel-Frequency Spectrogram (Mel bin vs. time in seconds)]

  11. Constant-Q transform
      § Use a set of sinusoidal kernels with:
        – Logarithmically spaced frequencies
        – Constant Q = frequency/bandwidth
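
To make the two kernel properties concrete, the sketch below computes log-spaced center frequencies, the resulting constant Q, and the per-bin window lengths, then evaluates a single coefficient. It assumes the bandwidth equals the spacing to the next bin, which gives Q = 1 / (2^(1/b) − 1) for b bins per octave, and it uses a Hann window. Both are common choices, not something the slide specifies, and the function names are hypothetical.

```python
import cmath
import math

def cqt_parameters(f_min, n_bins, sr, bins_per_octave=12):
    """Return Q and a list of (center frequency, bandwidth, window length)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bands = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # logarithmic spacing
        n_k = math.ceil(q * sr / f_k)               # window spans about Q periods of f_k
        bands.append((f_k, f_k / q, n_k))
    return q, bands

def cqt_bin(x, f_k, n_k, sr):
    """Magnitude of one constant-Q coefficient taken at the start of x."""
    acc, norm = 0j, 0.0
    for n in range(n_k):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)  # Hann window
        acc += x[n] * w * cmath.exp(-2j * math.pi * f_k * n / sr)
        norm += w
    return abs(acc) / norm
```

Because the window length shrinks as the center frequency rises, low notes get fine frequency resolution and high notes get fine time resolution, unlike a fixed-window STFT.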

  12. Comparison of Different Time-Frequency Representations
      [Figure: time-frequency tilings (frequency vs. time) for Spectrogram (short window), Spectrogram (long window), Constant-Q transform, and Mel Spectrogram]

  13. Example of Constant-Q transform
      [Figure: Log-Frequency Spectrogram (mapping) and Log-Frequency Spectrogram (Constant-Q transform), MIDI note number vs. time in seconds]

  14. Chord Recognition in MIR
      § Identifying the chord progression of tonal music
      § It is a challenging task (even for humans)
        – Chords are not explicit in music
        – Non-chord notes or passing notes
        – Key change and chromaticism: requires in-depth knowledge of music theory
        – In audio, multiple musical instruments are mixed
          • Relevant: harmonically arranged notes
          • Irrelevant: percussive sounds (but they can help detect chord changes)
      § What kind of audio features can be extracted to recognize chords in a robust way?

  15. Pitch Helix
      § The basic assumption in tonal harmony is that notes an octave apart belong to the same pitch class
        – No dissonance among them
        – As a result, there are 12 pitch classes
      § Shepard represented octave equivalence with the “pitch helix”
        – Chroma: represents the inherent circularity of pitch organization
        – Height: increases naturally, rising one octave per rotation
      [Figure: Pitch Helix and Chroma (Shepard, 2001)]

  16. Chroma
      § Chroma is independent of the height
        – Shepard tone: harmonics contain a single pitch class
        – Seems to rise or fall constantly
      [Figure: Shepard tone (https://vimeo.com/34749558) and optical-illusion stairs]
      § Chroma captures the relative distribution of pitch classes, whereas pitch height acts as noisy variation in chord recognition
        – Thus, chroma is considered to be well-suited for analyzing harmony.

  17. Chroma Features
      § Chroma features are audio feature vectors that contain the chroma characteristics
        – Ideally, they would be obtained by polyphonic note transcription, but that is too expensive
        – In addition, as notes are more harmonized, separating polyphonic notes becomes harder
      § In practice, chroma features are obtained by projecting all time-frequency energy onto 12 pitch classes
      § Used not only for chord recognition but also for key estimation, segmentation, synchronization, and cover-song detection

  18. Chroma Features: FFT-based approach
      § Compute the spectrogram and a mapping matrix
        – Convert frequency to the musical pitch scale and get the pitch class
        – Set the entry for the corresponding pitch class to one and all others to zero
        – Adjust the non-zero values so that low-frequency content receives more weight
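
The steps above can be sketched as a 12-row mapping matrix applied to one magnitude-spectrum frame. The frequency range and the specific low-frequency weighting (a weight that decays with the number of octaves above `f_lo`) are assumptions made for this example; the slide only says that low-frequency content should receive more weight. Pitch class 0 is C, following MIDI note numbering.

```python
import math

def chroma_mapping(n_fft, sr, f_lo=55.0, f_hi=4000.0):
    """Build C (12 x n_bins), where C[p][j] > 0 only if FFT bin j maps to pitch class p."""
    n_bins = n_fft // 2 + 1
    C = [[0.0] * n_bins for _ in range(12)]
    m_lo = 12.0 * math.log2(f_lo / 440.0) + 69.0
    for j in range(1, n_bins):
        f = j * sr / n_fft
        if not f_lo <= f <= f_hi:
            continue
        m = 12.0 * math.log2(f / 440.0) + 69.0   # frequency -> musical pitch
        p = int(round(m)) % 12                   # nearest note -> pitch class
        # Hypothetical weighting: 1 at f_lo, 1/2 one octave above, 1/3 two octaves above, ...
        C[p][j] = 1.0 / (1.0 + (m - m_lo) / 12.0)
    return C

def chroma_frame(C, mag):
    """Project one magnitude-spectrum frame onto the 12 pitch classes."""
    return [sum(w * mag[j] for j, w in enumerate(row) if w) for row in C]
```

Applying `chroma_frame` to every column of a spectrogram gives a 12 x frames chromagram.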

  19. Improvements
      § Blurring
        – Intrinsic problem with the STFT
        – Solution: find amplitude peaks and use only them
      § De-tuning
        – Notes can deviate from the reference tuning
        – Compute 36-bin chroma features: add two neighboring bins to each pitch class
        – Use only the peak value among the three bins per pitch class
      § Normalization
        – Divide the frame chroma features by the local maximum or mean to compensate for volume changes
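
Each of the three improvements is small enough to sketch in a few lines. The layout of the 36-bin vector (bin 3p + 1 as the in-tune center of pitch class p, with its flat and sharp neighbors on either side) and the choice of per-frame maximum for normalization are assumptions for this example; the helper names are hypothetical.

```python
def keep_peaks(mag):
    """Blurring: keep only local maxima of a magnitude spectrum, zero elsewhere."""
    out = [0.0] * len(mag)
    for j in range(1, len(mag) - 1):
        if mag[j] > mag[j - 1] and mag[j] >= mag[j + 1]:
            out[j] = mag[j]
    return out

def fold_36_to_12(chroma36):
    """De-tuning: take the peak of the three bins around each pitch class."""
    return [max(chroma36[3 * p: 3 * p + 3]) for p in range(12)]

def normalize_max(chroma, eps=1e-12):
    """Normalization: divide a frame by its maximum (silent frames are left as-is)."""
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > eps else list(chroma)
```

Dividing by the mean instead of the maximum only changes the last helper; either way the result no longer depends on the overall volume of the frame.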

  20. Chroma Features: Filter-bank approach
      § Alternatively, a filter bank can be used to get a log-scale time-frequency representation
        – Center frequencies are arranged over the 88 piano notes
        – Bandwidths are set to give a constant Q and to be robust to ±25 cents of detuning
      § The outputs that belong to the same pitch class are wrapped and summed. (Muller, 2011)
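
The band layout and the wrapping step can be sketched without designing the filters themselves. Here the 88 piano notes are taken as MIDI notes 21 to 108, and "robust to ±25 cents" is interpreted as a passband that spans ±25 cents around each center frequency. That interpretation, and the function names, are assumptions for this example.

```python
def piano_bands(detune_cents=25.0):
    """Return (midi note, center Hz, lower edge Hz, upper edge Hz) for the 88 piano notes."""
    bands = []
    for m in range(21, 109):
        fc = 440.0 * 2.0 ** ((m - 69) / 12.0)
        lo = fc * 2.0 ** (-detune_cents / 1200.0)
        hi = fc * 2.0 ** (detune_cents / 1200.0)
        bands.append((m, fc, lo, hi))
    return bands

def wrap_to_chroma(band_energy, midi_lo=21):
    """Sum the energies of all bands that share a pitch class (0 = C)."""
    chroma = [0.0] * 12
    for i, e in enumerate(band_energy):
        chroma[(midi_lo + i) % 12] += e
    return chroma
```

Because every band edge is a fixed ratio of its center frequency, center frequency divided by bandwidth is the same for all 88 bands, which is the constant-Q property.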

  21. Beat-Synchronous Chroma Features
      § Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
      (From Ellis’ slides)
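
One simple way to make chroma homogeneous within a beat is to average all frames between consecutive beat positions. Averaging is an assumption here (a median would be another option), and the beat positions are taken as given frame indices; the function name is hypothetical.

```python
def beat_sync_chroma(chroma_frames, beat_frames):
    """Average chroma frames over each inter-beat segment.

    chroma_frames: list of equal-length chroma vectors, one per analysis frame.
    beat_frames:   sorted frame indices at which beats occur.
    Returns one averaged chroma vector per non-empty segment.
    """
    n = len(chroma_frames)
    bounds = [0] + [b for b in beat_frames if 0 < b < n] + [n]
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = chroma_frames[start:end]
        if seg:
            out.append([sum(col) / len(seg) for col in zip(*seg)])
    return out
```

The output has one vector per beat interval, so later stages such as chord matching operate on beats rather than on individual frames.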

  22. Key Estimation Overview
      § Estimate the musical key from music data
        – One of 24 keys: 12 pitch classes (C, C#, D, ..., B) + major/minor
      § General Framework (Gomez, 2006)
        Chroma Features → Similarity Measure (against Key Template) → Average Key Strength → Key (e.g., G major)

  23. Key Template
      § Probe tone profile (Krumhansl and Kessler, 1982)
        – Relative stability or weight of tones
        – Listeners rated which tones best completed the first seven notes of a major scale.
          • For example, in C major key, C, D, E, F, G, A, B, … what?
      [Figure: Probe Tone Profile - Relative Pitch Ranking]

  24. Key Estimation
      § Similarity by cross-correlation between chroma features and templates
      § Find the key that produces the maximum correlation
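
A minimal sketch of this procedure correlates a time-averaged chroma vector with a key profile rotated to each of the 12 tonics, for both modes, and picks the maximum. The profile values below are the Krumhansl-Kessler ratings as commonly quoted. Pearson correlation as the similarity measure and the function names are assumptions for this example.

```python
import math

# Krumhansl-Kessler probe-tone profiles as commonly quoted (index 0 = tonic).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def estimate_key(chroma):
    """Return (key name, correlation) for the best of the 24 rotated templates."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its tonic weight lands on pitch class `tonic`.
            template = [profile[(p - tonic) % 12] for p in range(12)]
            r = pearson(chroma, template)
            if best is None or r > best[0]:
                best = (r, NOTES[tonic] + " " + mode)
    return best[1], best[0]
```

The input is expected to be a chroma vector averaged over the whole excerpt (pitch class 0 = C), since a key is a property of a passage rather than of a single frame.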

  25. Chord Recognition
      § Estimate chords from music data
        – Typically, one of 24 chords: 12 pitch classes + major/minor
        – Often, diminished chords are added (36 chords)
      § General Framework
        Audio → Transform → Chroma Features → Decision Making (Template Matching, HMM, SVM; using Chord Templates or Models) → Chords

  26. Template-Based Approach
      § Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
      § Chord Templates (from Bello’s Slides)

  27. Template-Based Approach
      § Compute the cross-correlation between chroma features and chord templates and select chords that have maximum values
      (from Bello’s Slides)
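
Template matching can be sketched with binary triad templates for the 24 major and minor chords, scored against a chroma frame. The normalized inner product used as the score and the names below are choices made for this example, not necessarily what the cited systems use.

```python
import math

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-d templates for 12 major and 12 minor triads (index 0 = C)."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            t = [0.0] * 12
            for i in intervals:
                t[(root + i) % 12] = 1.0
            templates[NOTES[root] + suffix] = t
    return templates

def recognize_chord(chroma, templates):
    """Return the name of the template with the highest normalized inner product."""
    def score(t):
        norm = math.sqrt(sum(c * c for c in chroma)) * math.sqrt(sum(x * x for x in t))
        return sum(c * x for c, x in zip(chroma, t)) / norm if norm else 0.0
    return max(templates, key=lambda name: score(templates[name]))
```

Running `recognize_chord` on every frame (or every beat-synchronous frame) yields a chord label sequence.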

  28. Limitations
      § The template approach is too simplistic
        – The binary templates are hard assignments
      § Temporal dependency of chords is not considered
        – Most tonal music follows certain types of chord progressions
      § The recognized chords are not smooth
        – Some post-processing (smoothing) is necessary
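
As an illustration of the smoothing step, the sketch below applies a sliding majority vote to a sequence of frame-wise chord labels. This is just one simple post-processing choice assumed for the example; the window width and function name are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, width=5):
    """Replace each label by the most frequent label in a window centered on it."""
    half = width // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out

# A single-frame "G" inside a run of "C" is removed by a 3-frame window.
print(smooth_labels(["C", "C", "G", "C", "C", "Am", "Am", "Am"], width=3))
```

A majority filter removes isolated misclassifications but knows nothing about which chord progressions are likely, so it addresses only the smoothness limitation.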

  29. Demo
      § Chordify: https://chordify.net
