SLIDE 1

GCT535- Sound Technology for Multimedia Tonal Analysis

Graduate School of Culture Technology KAIST Juhan Nam

SLIDE 2

Outline

§ Pitch Perception

– Perceptual Pitch Scale
– Log-Scaled Spectrum

§ Tonal Analysis

– Chroma Feature
– Key Estimation
– Chord Recognition

SLIDE 3

Frequency Scale in Spectrogram

§ Linear frequency scale

– Great for seeing the harmonic structure of a single tone
– However, not the most intuitive way to visualize musical signals


Piano (Chromatic Scale) Beatles “Hey Jude”

[Figure: linear-frequency spectrograms, time in seconds vs. frequency in Hz (piano up to 4 kHz; "Hey Jude" up to 10 kHz)]

slide-4
SLIDE 4

Human Pitch Perception

§ Human ears are sensitive to frequency changes in a log scale

– Pitch resolution: the just noticeable difference (JND) increases as frequency goes up
– Place theory: resonance position along the basilar membrane in the cochlea


Response of the basilar membrane to a pair of tones

From CCRMA Music 150 slides (Thomas Rossing)

SLIDE 5

Critical Bandwidth

§ The frequency bandwidth within which one tone interferes with the perception of another tone through auditory masking

– Roughly constant at low frequencies but growing linearly with frequency at high frequencies

From CCRMA Music 150 slides (Thomas Rossing)

SLIDE 6

[Figure: normalized ERB, Mel, and Bark scales plotted against frequency up to 2.5 × 10^4 Hz]

Psychoacoustical Pitch Scales

§ Mel scale

– Based on the pitch ratio of tones (mel from "melody")

§ Bark scale

– Critical-band measurement by auditory masking

§ Equivalent Rectangular Bandwidth (ERB) rate

– Critical-band measurement using the notched-noise method

m = 2595 · log10(1 + f / 700)

Bark = 13 · arctan(0.00076 f) + 3.5 · arctan((f / 7500)^2)

ERBS = 21.4 · log10(1 + 0.00437 f)

Comparison of Pitch Scales: plotted using the Matlab code from https://www.speech.kth.se/~giampi/auditoryscales/
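As a rough illustration (not part of the original slides), the three scale formulas can be sketched in Python; the Bark constant 0.00076 follows the commonly quoted approximation:

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale: based on perceived pitch ratios of pure tones."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark scale: critical-band rate from masking experiments."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def hz_to_erbs(f):
    """ERB-rate scale: critical bands from the notched-noise method."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

# Normalize each scale to [0, 1] for a comparison like the slide's plot
f = np.linspace(0.0, 25000.0, 1000)
curves = {name: fn(f) / fn(f[-1])
          for name, fn in [("Mel", hz_to_mel), ("Bark", hz_to_bark),
                           ("ERB", hz_to_erbs)]}
```

All three grow roughly linearly at low frequencies and logarithmically above, which is why they look similar when normalized.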

SLIDE 7

Musical Pitch Scale

§ Equal temperament

– 1 : 2^(1/12) frequency ratio between two adjacent notes
– Music note number (m) and frequency (f) in Hz:

f = 440 · 2^((m − 69) / 12)

m = 12 · log2(f / 440) + 69

https://newt.phys.unsw.edu.au/jw/notes.html
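The two conversion formulas above can be sketched directly (a minimal Python illustration, not from the slides):

```python
import numpy as np

def note_to_hz(m):
    """MIDI note number -> frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def hz_to_note(f):
    """Frequency in Hz -> (fractional) MIDI note number."""
    return 12.0 * np.log2(f / 440.0) + 69.0
```

For example, middle C (note 60) maps to about 261.63 Hz, and 880 Hz maps back to note 81 (A5).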

SLIDE 8


Frequency Mapping Using Spectrogram

§ Mapping linear scale to a perceptual (log-like) scale

– Locate center frequencies according to the frequency mapping
– Linearly interpolate around each center frequency with the corresponding bandwidth skirt


Log-Frequency Spectrogram vs. Linear-Frequency Spectrogram

SLIDE 9

§ The mapping can be formed as matrix multiplication

– Each column of the mapping matrix contains the interpolation coefficients

§ Limitation

– Simple, but the time-frequency resolution is still constrained by the STFT


Frequency Mapping Using Spectrogram


Y = M ⋅ X

(M: mapping matrix, X: spectrogram, Y: scaled spectrogram)

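A minimal sketch of building such a mapping matrix M (parameter values and the triangular one-semitone skirt are assumptions for illustration, not prescribed by the slides):

```python
import numpy as np

def note_mapping_matrix(sr=22050, n_fft=2048, note_min=21, note_max=108):
    """Build M so that Y = M @ X maps a linear-frequency spectrogram X
    onto MIDI-note bins. Each row holds interpolation coefficients: a
    triangular skirt (one semitone wide on each side) centered on the
    note's center frequency."""
    fft_freqs = np.arange(1 + n_fft // 2) * sr / n_fft
    notes = np.arange(note_min, note_max + 1)
    centers = 440.0 * 2.0 ** ((notes - 69) / 12.0)
    lo = centers * 2.0 ** (-1.0 / 12.0)   # lower skirt edge
    hi = centers * 2.0 ** (1.0 / 12.0)    # upper skirt edge
    rise = (fft_freqs[None, :] - lo[:, None]) / (centers - lo)[:, None]
    fall = (hi[:, None] - fft_freqs[None, :]) / (hi - centers)[:, None]
    return np.maximum(0.0, np.minimum(rise, fall))

M = note_mapping_matrix()
# Y = M @ X   # X: (n_fft//2 + 1, frames) -> Y: (88 notes, frames)
```

Note that at low frequencies several adjacent notes share the same few FFT bins, which is exactly the resolution limitation mentioned above.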

SLIDE 10

Mel-Frequency Spectrogram

§ The Mel scale is a popular choice

– Example: MFCC


Linear-Frequency Spectrogram Mel-Frequency Spectrogram


SLIDE 11

Constant-Q transform

§ Use a set of sinusoidal kernels with:

– Logarithmically spaced center frequencies
– Constant Q = frequency / bandwidth

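A minimal sketch of such kernels (the window choice, fmin, and bin count are illustrative assumptions; practical CQT implementations add many refinements):

```python
import numpy as np

def cqt_kernels(sr=22050, fmin=55.0, n_bins=48, bins_per_octave=12):
    """Sinusoidal analysis kernels with logarithmically spaced center
    frequencies and constant Q = frequency / bandwidth, so the kernel
    (window) length shrinks as the center frequency rises."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    kernels = []
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)
        n = int(round(Q * sr / fk))              # ~Q cycles per kernel
        t = np.arange(n) / sr
        kernels.append(np.hanning(n) * np.exp(2j * np.pi * fk * t) / n)
    return kernels

# One CQT frame = inner product of the signal with each kernel
```

Because every kernel spans the same number of cycles, frequency resolution is constant on a log scale, unlike the fixed-window STFT.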

SLIDE 12

Comparison of Different Time-Frequency Representations

§ Spectrogram (short window)
§ Spectrogram (long window)
§ Mel Spectrogram
§ Constant-Q Transform

[Figure: the four time-frequency representations, time on the horizontal axis and frequency on the vertical axis]

SLIDE 13

Example of Constant-Q transform


Log-Frequency Spectrogram (mapping) Log-Frequency Spectrogram (Constant-Q transform)


SLIDE 14

Chord Recognition in MIR

§ Identifying the chord progression of tonal music
§ A challenging task (even for humans)

– Chords are not explicit in the music
– Non-chord notes or passing notes
– Key changes and chromaticism require in-depth knowledge of music theory
– In audio, multiple musical instruments are mixed

  • Relevant: harmonically arranged notes
  • Irrelevant: percussive sounds (though they can help in detecting chord changes)

§ What kind of audio features can be extracted to recognize chords in a robust way?


SLIDE 15

Pitch Helix

§ The basic assumption in tonal harmony is that octave-distance notes belong to the same pitch class

– No dissonance among them
– As a result, there are 12 pitch classes

§ Shepard represented the octave equivalence with “pitch helix”

– Chroma: represents the inherent circularity of pitch organization
– Height: increases naturally, one octave per rotation of the helix


Pitch Helix and Chroma (Shepard, 2001)

SLIDE 16

Chroma

§ Chroma is independent of the height

– Shepard tone: a single pitch class across octave-spaced harmonics
– Creates the illusion of constantly rising or falling pitch

§ Chroma captures the relative distribution of pitch classes, while pitch height is a noisy variation for chord recognition

– Thus, chroma is considered to be well-suited for analyzing harmony.


Optical illusion stairs / Shepard tone: https://vimeo.com/34749558

SLIDE 17

Chroma Features

§ Chroma features are audio feature vectors that contain the chroma characteristics

– Ideally obtained by polyphonic note transcription, but that is too expensive
– In addition, as notes are more harmonized, separating polyphonic notes becomes harder

§ In practice, chroma features are obtained by projecting all time-frequency energy onto 12 pitch classes
§ Used not only for chord recognition but also for key estimation, segmentation, synchronization, and cover-song detection


SLIDE 18

Chroma Features: FFT-based approach

§ Compute spectrogram and mapping matrix

– Convert frequency to the musical pitch scale and take the pitch class
– Set one at the corresponding pitch class and zero elsewhere
– Adjust the non-zero values so that low-frequency content has more weight
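The steps above can be sketched as follows (a minimal illustration; the 1/f weighting is one assumed way to emphasize low-frequency content):

```python
import numpy as np

def chroma_from_spectrogram(S, sr, n_fft):
    """Project spectrogram energy onto 12 pitch classes.
    S: magnitude spectrogram, shape (1 + n_fft // 2, n_frames)."""
    freqs = np.arange(1, 1 + n_fft // 2) * sr / n_fft     # skip DC bin
    notes = np.round(12 * np.log2(freqs / 440.0) + 69).astype(int)
    pitch_class = np.mod(notes, 12)
    w = 1.0 / freqs          # low-frequency bins get more weight
    C = np.zeros((12, S.shape[1]))
    for pc in range(12):
        mask = pitch_class == pc
        C[pc] = (w[mask, None] * S[1:, :][mask]).sum(axis=0)
    return C
```

Feeding in a spectrogram of a pure A4 tone (~440 Hz) should concentrate the energy in pitch class 9 (A).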


SLIDE 19

Improvements

§ Blurring

– An intrinsic problem with the STFT
– Solution: find amplitude peaks and use only them

§ De-tuning

– Notes can deviate from the reference tuning
– Compute 36-bin chroma features: add two neighboring bins to each pitch class
– Use only the peak value among the three bins per pitch class

§ Normalization

– Divide each frame's chroma features by the local maximum or mean to regularize volume changes
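The de-tuning and normalization steps can be sketched as follows (the bin ordering of the 36-bin chroma is an assumption for illustration):

```python
import numpy as np

def fold_36_to_12(C36):
    """36-bin chroma (assumed ordered as 3 adjacent tuning bins per
    pitch class) -> 12 bins, keeping only the peak of each triple so
    a slightly detuned note still lands on its pitch class."""
    return C36.reshape(12, 3, -1).max(axis=1)

def normalize_chroma(C, eps=1e-8):
    """Divide each frame by its maximum to regularize volume changes."""
    return C / (C.max(axis=0, keepdims=True) + eps)
```

Taking the per-triple peak rather than the sum is what makes the feature robust to a consistent tuning offset.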


SLIDE 20

Chroma Features: Filter-bank approach

§ Alternatively, a filter-bank can be used to get a log-scale time-frequency representation

– Center frequencies are arranged over the 88 piano notes
– Bandwidths are set to have constant Q and to be robust to ±25 cents of detuning

§ The outputs that belong to the same pitch class are wrapped and summed.


(Müller, 2011)

SLIDE 21

Beat-Synchronous Chroma Features

§ Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
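Given beat positions (from a beat tracker), beat synchronization amounts to averaging the chroma frames within each beat interval; a minimal sketch:

```python
import numpy as np

def beat_sync_chroma(C, beat_frames):
    """Average chroma frames within each beat interval so every beat
    gets one homogeneous chroma vector.
    C: (12, n_frames); beat_frames: frame indices of beat boundaries."""
    edges = np.concatenate(([0], beat_frames, [C.shape[1]]))
    return np.stack([C[:, a:b].mean(axis=1)
                     for a, b in zip(edges[:-1], edges[1:]) if b > a],
                    axis=1)
```

Besides smoothing out transients, this also makes the feature sequence tempo-invariant, which helps the matching tasks listed earlier.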


(From Ellis’ slides)

SLIDE 22

Key Estimation Overview

§ Estimate music key from music data

– One of 24 keys: 12 pitch classes (C, C#, D, …, B) × major/minor

§ General framework (Gómez, 2006)

[Diagram: Chroma Features → Average → Similarity Measure against each Key Template → Key Strength → estimated key (e.g., G major)]

SLIDE 23

Key Template

§ Probe tone profile (Krumhansl and Kessler, 1982)

– Relative stability or weight of tones
– Listeners rated which tones best completed the first seven notes of a major scale

  • For example, in C major key, C, D, E, F, G, A, B, … what?


Probe Tone Profile - Relative Pitch Ranking

SLIDE 24

Key Estimation

§ Compute the similarity by cross-correlating the chroma features with each key template
§ Find the key that produces the maximum correlation
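A minimal sketch of this procedure, using the Krumhansl–Kessler probe-tone profiles as key templates (the profile values are the commonly quoted ones, included here as an assumption):

```python
import numpy as np

# Krumhansl–Kessler probe-tone profiles (major / minor)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma_avg):
    """Correlate the averaged chroma with all 24 rotated key templates
    and return (tonic_pitch_class, 'major'|'minor') of the best match."""
    best, best_r = None, -np.inf
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        for tonic in range(12):
            template = np.roll(profile, tonic)   # rotate to each tonic
            r = np.corrcoef(chroma_avg, template)[0, 1]
            if r > best_r:
                best, best_r = (tonic, mode), r
    return best
```

Using correlation (rather than a raw dot product) makes the measure invariant to the overall loudness of the piece.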


SLIDE 25

Chord Recognition

§ Estimate chords from music data

– Typically one of 24 chords: 12 root pitch classes + major/minor
– Often, diminished chords are added (36 chords)

§ General Framework

[Diagram: Audio → Transform → Chroma Features → Decision Making → Chords, where decision making uses either Chord Templates (template matching) or Models (HMM, SVM)]

SLIDE 26

Template-Based Approach

§ Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
§ Chord Templates


(from Bello’s Slides)

SLIDE 27

Template-Based Approach

§ Compute the cross-correlation between the chroma features and the chord templates, and select the chord with the maximum value in each frame


(from Bello’s Slides)
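A minimal sketch of the template-based approach with binary major/minor templates (normalized dot products stand in for cross-correlation; the naming scheme is an illustrative choice):

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary templates: major = (root, +4, +7), minor = (root, +3, +7)."""
    names, T = [], []
    for root in range(12):
        for quality, ivs in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in ivs]] = 1.0
            names.append(NOTE_NAMES[root] + quality)
            T.append(t)
    return names, np.array(T)

def recognize(chroma_frames):
    """Pick, per frame, the template with the highest similarity."""
    names, T = chord_templates()
    Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
    Cn = chroma_frames / (np.linalg.norm(chroma_frames, axis=0,
                                         keepdims=True) + 1e-8)
    scores = Tn @ Cn                      # (24 chords, n_frames)
    return [names[i] for i in scores.argmax(axis=0)]
```

A frame containing only C, E, and G matches the C-major template perfectly, while A-minor and E-minor (which share two notes) score lower, which previews why the hard binary assignment discussed next is fragile.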

SLIDE 28

Limitations

§ The template approach is too simplistic

– The binary templates are hard assignments

§ Temporal dependency of chords is not considered

– The majority of tonal music has certain types of chord progressions

§ The recognized chords are not smooth

– Some post-processing (smoothing) is necessary


SLIDE 29

Demo

§ Chordify: https://chordify.net


SLIDE 30

References

§ P. R. Cook (Ed.), "Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics", 2001
§ C. Krumhansl, "Cognitive Foundations of Musical Pitch", 1990
§ M. A. Bartsch and G. H. Wakefield, "To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing", 2001
§ E. Gómez and P. Herrera, "Estimating the Tonality of Polyphonic Audio Files: Cognitive versus Machine Learning Modelling Strategies", 2004
§ M. Müller and S. Ewert, "Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features", 2011
§ T. Fujishima, "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music", 1999
