GCT634: Musical Applications of Machine Learning
Tonal Analysis and Hidden Markov Model
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
- Introduction
- Tonality
- Perceptual Distance of Two Tones
- Chords and Scales
- Tonal Analysis
- Key Estimation
- Chord Recognition
- Hidden Markov Model
Introduction
Examples of tonal music: Bach’s chorale harmonizations, jazz “Real Book” standards, pop music
Tonality
- Tonal music has a tonal center called the key
- 12 keys (C, C#, D, …, B)
- Tonal music uses a major or minor scale built on the key, and the notes
have different roles
- Notes in tonal music are harmonized by chords
(C major scale)
Tonality
- A sequence of notes or a chord progression provides a certain
degree of stability or instability
- E.g., cadence (V-I, IV-I), tension (sus2, sus4)
- How is tonality formed?
- In other words, how do we perceive different degrees of stability or
tension from notes?
Tonality
- Consonance and dissonance
- If two sinusoidal tones are within 3 semitones (a minor 3rd) in frequency,
they sound dissonant
- Most dissonant when they are about one quarter of the critical band apart
- Critical bands become relatively wider below 500 Hz, so two low notes can
sound dissonant (e.g. two piano notes in the low register)
- Consonance of two harmonic tones
- Determined by how many closely-located overtones the two tones have
within the same critical bands
Consonance Rating of Intervals in Music
- The perceptual distance between two notes is different from the semitone
distance between them
Chords
- The basic units of tonal harmony
- Triads, 7ths, 9ths, 11ths, …
- Triads are formed by choosing three notes that make the most
consonant (or “most harmonized”) sound
- This amounts to stacking major or minor 3rds
- 7th and 9th chords are obtained by stacking additional 3rds
- The quality of consonance becomes more sophisticated as more
notes are added
- Music theory is largely about how to create tension and resolve it with
different qualities of consonance
Scales in Tonal Harmony
- Major Scale
- Formed by spreading notes from three major chords
- Minor scale
- Formed by spreading notes from three minor chords (natural minor scale)
- Harmonic or melodic minor scale can be formed by using both minor and
major chords
Automatic Chord Recognition
- Identifying chord progression of tonal music
- It is a challenging task (even for humans)
- Chords are not explicit in music
- Non-chord tones or passing notes
- Key changes and chromaticism require in-depth knowledge of music
theory
- In audio, multiple musical instruments are mixed
- Relevant: harmonically arranged notes
- Irrelevant: percussive sounds (though they can help detect chord changes)
- What kind of audio features can be extracted to recognize
chords in a robust way?
Chroma Features: FFT-based approach
- Compute a spectrogram and a mapping matrix
- Convert each frequency bin to the musical pitch scale and take its pitch class
- Set the entry of the corresponding pitch class to one, and all others to zero
- Adjust the non-zero values so that low-frequency content has more weight
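The steps above can be sketched in Python with NumPy (a minimal illustration; the function name, FFT size, and frequency range are arbitrary choices, and the low-frequency weighting step is omitted for brevity):

```python
import numpy as np

def chroma_fft(x, sr, n_fft=4096, hop=1024, fmin=55.0, fmax=2000.0):
    """Minimal FFT-based chroma sketch: map each spectrogram bin to a
    pitch class with a binary mapping matrix, then fold and normalize."""
    # Magnitude spectrogram (frames x bins)
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    S = np.abs(np.array([np.fft.rfft(window * x[i*hop:i*hop+n_fft])
                         for i in range(n_frames)]))
    freqs = np.fft.rfftfreq(n_fft, d=1.0/sr)

    # Binary mapping matrix: one at the bin's pitch class, zero otherwise
    M = np.zeros((12, len(freqs)))
    valid = (freqs >= fmin) & (freqs <= fmax)
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)   # frequency -> MIDI pitch
    M[np.round(midi).astype(int) % 12, np.where(valid)[0]] = 1.0

    chroma = S @ M.T        # fold spectral energy into 12 pitch classes
    norm = np.linalg.norm(chroma, axis=1, keepdims=True)
    return chroma / np.maximum(norm, 1e-12)
```

A pure 440 Hz tone should produce chroma vectors peaking at pitch class A (index 9).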
Chroma Features: Filter-bank approach
- A filter bank can be used to obtain a log-scale
time-frequency representation
- Center frequencies are aligned with the 88 piano notes
- Bandwidths are set for constant-Q resolution and
robustness to +/- 25 cent detuning
- The outputs that belong to the same pitch class are
wrapped and summed
(Müller, 2011)
Beat-Synchronous Chroma Features
- Make chroma features homogeneous within a beat
(Bartsch and Wakefield, 2001)
(From Ellis’ slides)
Key Estimation Overview
- Estimate music key from music data
- One of 24 keys: 12 pitch classes (C, C#, D, …, B) + major/minor
- General Framework (Gomez, 2006)
(Pipeline: Chroma Features → Average → Similarity Measure with Key Template → Key Strength → estimated key, e.g. G major)
Key Template
- Probe tone profile (Krumhansl and Kessler, 1982)
- Relative stability or weight of tones
- Listeners rated which tones best completed the first seven notes of a major
scale
- For example, in the key of C major: C, D, E, F, G, A, B, … what?
Probe Tone Profile - Relative Pitch Ranking
Key Estimation
- Similarity by cross-correlation between chroma features and
templates
- Find the key that produces the maximum correlation
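This procedure can be sketched as follows, using the Krumhansl-Kessler profile values as they are commonly reported (function and variable names are illustrative):

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles (values as commonly reported)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']

def estimate_key(chroma_avg):
    """Correlate the averaged chroma vector with all 24 rotated key
    templates and return the key with the maximum correlation."""
    best_key, best_r = None, -np.inf
    for tonic in range(12):
        for profile, mode in ((MAJOR, 'major'), (MINOR, 'minor')):
            template = np.roll(profile, tonic)  # shift profile to this tonic
            r = np.corrcoef(chroma_avg, template)[0, 1]
            if r > best_r:
                best_key, best_r = (PITCH_CLASSES[tonic], mode), r
    return best_key, best_r
```

A chroma vector that matches the G-major template exactly should be identified as G major with correlation 1.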
Chord Recognition
- Estimate chords from music data
- Typically one of 24 chords: 12 pitch classes + major/minor
- Often, diminished chords are added (36 chords)
- General Framework
(Pipeline: Audio → Transform → Chroma Features → Decision Making → Chords;
decision making uses either chord templates (template matching) or models
(HMM, SVM))
Template-Based Approach
- Use chord templates (Fujishima, 1999; Harte and Sandler, 2005)
and find the best matches
- Chord Templates
(from Bello’s Slides)
Template-Based Approach
- Compute the cross-correlation between chroma features and
chord templates and select chords that have maximum values
(from Bello’s Slides)
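A minimal sketch of binary-template matching over the 24 major and minor triads (names are illustrative; Fujishima’s and Harte and Sandler’s systems differ in detail):

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_templates():
    """Binary templates for 12 major and 12 minor triads."""
    templates, labels = [], []
    for root in range(12):
        for name, intervals in (('maj', (0, 4, 7)), ('min', (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates.append(t / np.linalg.norm(t))   # unit-norm template
            labels.append(PITCH_CLASSES[root] + ('' if name == 'maj' else 'm'))
    return np.array(templates), labels

def recognize(chroma):
    """Per-frame chord label by maximum inner product with the templates."""
    T, labels = chord_templates()
    scores = chroma @ T.T                 # (frames x 24) similarity scores
    return [labels[i] for i in np.argmax(scores, axis=1)]
```

A frame containing C-E-G should match the C major template, and A-C-E the A minor template.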
Review
- The template approach is straightforward but limited
- The binary templates are hard assignments
- We can use a multi-class classifier instead
- The output is one of the target chords
- However, the local estimation tends not to be temporally smooth
- We need an algorithm that considers the temporal
dependency between chords
- The majority of tonal music follows certain types of chord progressions
Hidden Markov Model (HMM)
- A probabilistic model for time series data
- Speech, gesture, DNA sequence, financial data, weather data, …
- Assumes that the time series data are generated from hidden
states and the hidden states follow a Markov model
- Learning-based approach
- Need training data annotated with labels
- The labels usually correspond to hidden states
Markov Model
- A random variable s has N states (S_1, S_2, …, S_N) and, at each time
step, one of the states is randomly chosen: s_t ∈ {S_1, S_2, …, S_N}
- The probability distribution of the current state is determined by
the previous state(s)
- First-order: P(s_t | s_1, s_2, …, s_(t-1)) = P(s_t | s_(t-1))
- Second-order: P(s_t | s_1, s_2, …, s_(t-1)) = P(s_t | s_(t-1), s_(t-2))
- The first-order Markov model is widely used for simplicity
Markov Model
- Example: chord progression
- s_t ∈ {C, F, G}
- The transition probability matrix is 3 by 3
(State diagram: Start and End nodes connected to states C, F, G)
P(s_t = C | s_(t-1) = C) = 0.7   P(s_t = F | s_(t-1) = C) = 0.1   P(s_t = G | s_(t-1) = C) = 0.2
P(s_t = C | s_(t-1) = F) = 0.2   P(s_t = F | s_(t-1) = F) = 0.6   P(s_t = G | s_(t-1) = F) = 0.2
P(s_t = C | s_(t-1) = G) = 0.3   P(s_t = F | s_(t-1) = G) = 0.1   P(s_t = G | s_(t-1) = G) = 0.6
Markov Model
- The joint probability of a sequence of states is simple under the
Markov model
P(s_1, s_2, …, s_t) = P(s_1, s_2, …, s_(t-1)) P(s_t | s_1, s_2, …, s_(t-1))
                    = P(s_1, s_2, …, s_(t-1)) P(s_t | s_(t-1))
                    = P(s_1, s_2, …, s_(t-2)) P(s_(t-1) | s_(t-2)) P(s_t | s_(t-1))
                    = P(s_1) P(s_2 | s_1) … P(s_(t-1) | s_(t-2)) P(s_t | s_(t-1))
What Can We Do with the Markov Model?
- Generate a chord sequence
- e.g. C – C – C – C – F – F – C – C – G – G – C – C – …
- We can also generate a melody if we define the transition probability matrix
over notes
- Evaluate whether a specific chord progression is more likely than
others
- For example, C-G-C is more likely than C-F-C (assuming P(s_1 = C) = 1)
P(C, G, C) = P(s_1 = C) P(s_2 = G | s_1 = C) P(s_3 = C | s_2 = G) = 0.2 × 0.3 = 0.06
P(C, F, C) = P(s_1 = C) P(s_2 = F | s_1 = C) P(s_3 = C | s_2 = F) = 0.1 × 0.2 = 0.02
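Both uses, generation and evaluation, can be sketched with the example transition matrix from these slides (function names are illustrative):

```python
import numpy as np

STATES = ['C', 'F', 'G']
# Transition matrix from the example: rows = previous chord, cols = next chord
A = np.array([[0.7, 0.1, 0.2],   # from C
              [0.2, 0.6, 0.2],   # from F
              [0.3, 0.1, 0.6]])  # from G

def generate(n, start='C', rng=None):
    """Sample a chord sequence of length n from the Markov chain."""
    rng = rng or np.random.default_rng(0)
    seq, s = [start], STATES.index(start)
    for _ in range(n - 1):
        s = rng.choice(3, p=A[s])      # draw next state from row of A
        seq.append(STATES[s])
    return seq

def sequence_prob(seq, p_start=1.0):
    """Joint probability of a chord sequence under the first-order model."""
    p = p_start
    for prev, cur in zip(seq, seq[1:]):
        p *= A[STATES.index(prev), STATES.index(cur)]
    return p
```

With P(s_1 = C) = 1, this reproduces the numbers above: C-G-C gives 0.06 and C-F-C gives 0.02.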
What Can We Do with a Markov Model?
- Compute the probability that the chord at time T is C (or F or G)
- Naive method: sum over all paths that have a C chord at time T: exponential!
- Clever method: use recursive induction
P(s_T = C) = P(s_T = C | s_(T-1) = C) P(s_(T-1) = C)
           + P(s_T = C | s_(T-1) = F) P(s_(T-1) = F)
           + P(s_T = C | s_(T-1) = G) P(s_(T-1) = G)
- Repeat this for P(s_j = C), P(s_j = F), P(s_j = G) for j = T−1, T−2, …, 1
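The recursion amounts to a repeated vector-matrix product, again using the example transition matrix (a sketch; the function name is illustrative):

```python
import numpy as np

# Transition matrix from the example: rows = previous chord, cols = next chord
A = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.1, 0.6]])

def state_marginal(T, p0=None):
    """P(s_T = j) for each state j (order: C, F, G), computed recursively:
    p_t[k] = sum_j p_(t-1)[j] * A[j, k], i.e. one vector-matrix product
    per time step instead of a sum over exponentially many paths."""
    p = np.array([1.0, 0.0, 0.0]) if p0 is None else np.asarray(p0)
    for _ in range(T - 1):
        p = p @ A
    return p
```

Starting from P(s_1 = C) = 1, one step gives exactly the first row of A, and the marginal stays a valid distribution at every step.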
Chord Recognition from Audio
- What we observe are not chords but audio features (e.g. chroma)
- We want to infer a chord sequence from audio feature sequences
(Hidden chord sequence s_1, s_2, …, s_T and observed feature sequence o_1, o_2, …, o_T)
Hidden Markov Model (HMM)
- The hidden states follow the Markov model
- Given a state, the corresponding observation distribution is
independent of previous states or observations
- Each state has an emission distribution
(Graphical model: … → s_(t-1) → s_t → s_(t+1) → …, each emitting an
observation o_(t-1), o_t, o_(t+1); each chord state has its own emission
distribution P(o | s_t = C), P(o | s_t = F), P(o | s_t = G))
Hidden Markov Model (HMM)
- Model parameters
- Initial state probabilities: P(s_1 = S_j) → π_j
- Transition probability matrix: P(s_t = S_j | s_(t-1) = S_i) → a_ij
- Observation distribution given a state: P(o | s_t = S_j) → b_j (e.g. Gaussian)
- How can we learn the parameters from data?
Training HMM for Chord Recognition
- If chord labels are aligned with audio, estimate the parameters
directly from the data
- Initial state probabilities and transition probability matrix: count chord
occurrences and chord-to-chord transitions
- Observation distribution: fit a Gaussian model to the audio features
separately for each chord
- Easy to train, but obtaining time-aligned data is very expensive
- If chord labels are not aligned with audio, we should do
maximum-likelihood estimation
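The aligned-label case can be sketched as counting plus per-chord Gaussian fitting (a single-sequence sketch with diagonal covariances; the function name and numeric chord ids are illustrative):

```python
import numpy as np

def train_supervised(features, labels, n_states):
    """Estimate HMM parameters from time-aligned (feature, chord-id) pairs:
    count the initial state and transitions, fit a Gaussian per chord."""
    pi = np.zeros(n_states)
    A = np.zeros((n_states, n_states))
    pi[labels[0]] += 1                      # initial state count
    for prev, cur in zip(labels, labels[1:]):
        A[prev, cur] += 1                   # transition counts
    pi /= pi.sum()
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

    means, variances = [], []
    for s in range(n_states):
        X = features[np.asarray(labels) == s]
        means.append(X.mean(axis=0))
        variances.append(X.var(axis=0) + 1e-6)  # diagonal covariance
    return pi, A, np.array(means), np.array(variances)
```

With multiple annotated songs, the initial-state and transition counts would simply be accumulated across sequences before normalizing.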
Training HMM: EM algorithm
- If chord labels are not aligned with audio, use the EM
algorithm (the Baum-Welch method)
- E-step: evaluate the probability of transitioning from state S_j at
time t to state S_k at time t + 1, given the observations
- Then, the probability of being in state S_j at time t can also be derived
ξ_t(j, k) = P(s_t = S_j, s_(t+1) = S_k | O, λ)
γ_t(j) = P(s_t = S_j | O, λ) = Σ_(k=1..N) ξ_t(j, k)
Training HMM: EM algorithm
- M-step: update the parameters so that they maximize the
log-likelihood given the E-step quantities
- Σ_(t=1..T−1) γ_t(j): expected number of transitions out of S_j
(i.e. how many times state S_j is visited from t = 1 to T−1)
- Σ_(t=1..T−1) ξ_t(j, k): expected number of transitions from S_j to S_k
- We can use the labels to constrain the model (e.g. for initialization)
π_j = γ_1(j)
    = expected frequency of state S_j at time t = 1
a_jk = Σ_(t=1..T−1) ξ_t(j, k) / Σ_(t=1..T−1) γ_t(j)
     = expected number of transitions from S_j to S_k / expected number of
       transitions from S_j
b_j(l) = Σ_(t: o_t = v_l) γ_t(j) / Σ_(t=1..T) γ_t(j)
       = expected number of times in state S_j and observing v_l / expected
         number of times in state S_j
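Given arrays of the E-step posteriors, the π and A updates can be written compactly (a sketch; gamma is assumed to have shape (T, N) and xi shape (T−1, N, N)):

```python
import numpy as np

def m_step(gamma, xi):
    """Re-estimate initial and transition probabilities from the E-step:
    gamma[t, j] = P(s_t = S_j | O), xi[t, j, k] = P(s_t = S_j, s_(t+1) = S_k | O)."""
    pi = gamma[0]                                  # expected state occupancy at t = 1
    A = xi.sum(axis=0) / np.maximum(
        gamma[:-1].sum(axis=0)[:, None], 1e-12)    # expected j->k / expected visits to j
    return pi, A
```

When gamma and xi are consistent (each gamma row sums the corresponding xi slice), the updated A is a proper stochastic matrix.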
Evaluating HMM for Chord Recognition
- Find the most likely sequence of hidden states given the
observations and the HMM model parameters
- Viterbi algorithm
- Define a probability variable:
δ_t(j) = max over s_1, s_2, …, s_(t-1) of P(s_1, s_2, …, s_t = S_j, o_1, o_2, …, o_t | λ)
- Initialization (from the start state):
δ_1(j) = π_j b_j(o_1),  ψ_1(j) = 0,  1 ≤ j ≤ N
- Recursion:
δ_t(k) = max_(1≤j≤N) [δ_(t-1)(j) a_jk] b_k(o_t)
ψ_t(k) = argmax_(1≤j≤N) [δ_(t-1)(j) a_jk],  2 ≤ t ≤ T, 1 ≤ k ≤ N
- Termination (to the end state):
P* = max_(1≤j≤N) δ_T(j)
s_T* = argmax_(1≤j≤N) δ_T(j)
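The algorithm above can be sketched in the log domain to avoid numerical underflow (a sketch; emission likelihoods are assumed precomputed into a matrix B[t, j]):

```python
import numpy as np

def viterbi(pi, A, B):
    """Most likely hidden-state path. pi: initial probabilities (N,),
    A: transition matrix (N, N), B[t, j] = P(o_t | s_t = S_j), shape (T, N)."""
    T, N = B.shape
    logA = np.log(A + 1e-300)
    delta = np.log(pi + 1e-300) + np.log(B[0] + 1e-300)   # initialization
    psi = np.zeros((T, N), dtype=int)                     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA        # scores[j, k]: from state j to k
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + np.log(B[t] + 1e-300)
    path = np.zeros(T, dtype=int)             # backtracking
    path[-1] = np.argmax(delta)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path
```

With the example transition matrix (states C, F, G) and emissions that strongly indicate C, G, C, the decoded path is C-G-C.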
The Viterbi Trellis
- Recall dynamic programming!
(Trellis: states C, F, G at every time step t = 1, 2, 3, …, T−1, T, plus start
and end nodes; the variables δ_1(j), δ_2(j), …, δ_T(j) are filled in from left
to right by dynamic programming)
Chord Recognition Result
- The HMM provides a smoother chord recognition output
(From Ellis’ E4896 practicals)
References
- P. R. Cook (Ed.), “Music, Cognition, and Computerized Sound: An Introduction to
Psychoacoustics”, 2001
- C. Krumhansl, “Cognitive Foundations of Musical Pitch”, 1990
- M. A. Bartsch and G. H. Wakefield, “To Catch a Chorus: Using Chroma-Based
Representations for Audio Thumbnailing”, 2001
- E. Gómez and P. Herrera, “Estimating the Tonality of Polyphonic Audio Files: Cognitive
Versus Machine Learning Modelling Strategies”, 2004
- M. Müller and S. Ewert, “Chroma Toolbox: MATLAB Implementations for Extracting
Variants of Chroma-Based Audio Features”, 2011
- T. Fujishima, “Real-Time Chord Recognition of Musical Sound: A System Using Common
Lisp Music”, 1999
- A. Sheh and D. Ellis, “Chord Segmentation and Recognition Using EM-Trained Hidden
Markov Models”, 2003
- L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition”, 1989