SLIDE 1

GCT634: Musical Applications of Machine Learning

Tonal Analysis Hidden Markov Model

Graduate School of Culture Technology, KAIST Juhan Nam

SLIDE 2

Outline

  • Introduction
  • Tonality
  • Perceptual Distance of Two Tones
  • Chords and Scales
  • Tonal Analysis
  • Key Estimation
  • Chord Recognition
  • Hidden Markov Model
SLIDE 3

Introduction

Examples: Bach’s chorale harmonizations, the jazz “Real Book”, pop music

SLIDE 4

Tonality

  • Tonal music has a tonal center, called the key
  • 12 keys (C, C#, D, …, B)
  • Tonal music has a major or minor scale on the key, and the notes have different roles
  • Notes in tonal music are harmonized by chords

(Figure: C major scale)

SLIDE 5

Tonality

  • A sequence of notes or a chord progression provides a certain degree of stability or instability
  • E.g., cadence (V-I, IV-I), tension (sus2, sus4)
  • How is tonality formed?
  • In other words, how do we perceive different degrees of stability or tension from notes?

SLIDE 6

Tonality

  • Consonance and Dissonance
  • If two sinusoidal tones are within 3 ST (a minor 3rd) in frequency, they become dissonant
  • They are most dissonant when they are about one quarter of the critical band apart
  • Critical bands become wider below 500 Hz; two low notes can sound dissonant (e.g., two piano notes in the lower registers)
  • Consonance of two harmonic tones
  • Determined by how many closely-located overtones the two tones have within critical bands

SLIDE 7

Consonance Rating of Intervals in Music

  • The perceptual distance between two notes is different from the semitone distance between them

SLIDE 8

Chords

  • The basic units of tonal harmony
  • Triads, 7ths, 9ths, 11ths, …
  • Triads are formed by choosing three notes that make the most consonant (or “most harmonized”) sound
  • This ends up stacking major or minor 3rds
  • 7ths and 9ths are obtained by stacking up more 3rds
  • The quality of consonance becomes more sophisticated as more notes are added
  • Music theory is basically about how to create tension and resolve it with different qualities of consonance
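The stacking idea can be sketched numerically, treating notes as pitch classes (semitone offsets, with C = 0); this toy code is an illustration, not from the slides:

```python
# Pitch classes as semitone offsets from C; a hypothetical sketch of how
# triads and 7th chords arise from stacking major (4 ST) and minor (3 ST) 3rds.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def stack_thirds(root, thirds):
    """Stack third intervals (in semitones) on a root pitch class."""
    notes = [root]
    for third in thirds:
        notes.append((notes[-1] + third) % 12)
    return [NOTE_NAMES[n] for n in notes]

# Major triad: major 3rd then minor 3rd; minor triad: the reverse order.
print(stack_thirds(0, [4, 3]))      # C major triad -> ['C', 'E', 'G']
print(stack_thirds(9, [3, 4]))      # A minor triad -> ['A', 'C', 'E']
print(stack_thirds(7, [4, 3, 3]))   # G7: one more minor 3rd -> ['G', 'B', 'D', 'F']
```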

SLIDE 9

Scales in Tonal Harmony

  • Major scale
  • Formed by spreading the notes of three major chords
  • Minor scale
  • Formed by spreading the notes of three minor chords (natural minor scale)
  • Harmonic and melodic minor scales can be formed by using both minor and major chords
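A small sketch (not from the slides) of how a major scale’s pitch classes arise as the union of the three major triads on the tonic, subdominant, and dominant:

```python
# Build a major scale by "spreading" the notes of the I, IV, and V major
# triads, with pitch classes as semitone offsets from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def major_triad(root):
    """Pitch-class set of a major triad: root, major 3rd, perfect 5th."""
    return {(root + i) % 12 for i in (0, 4, 7)}

def major_scale(tonic):
    """Union of the tonic, subdominant (+5 ST), and dominant (+7 ST) triads."""
    pcs = (major_triad(tonic)
           | major_triad((tonic + 5) % 12)
           | major_triad((tonic + 7) % 12))
    return [NOTE_NAMES[p] for p in sorted(pcs)]

print(major_scale(0))  # C major -> ['C', 'D', 'E', 'F', 'G', 'A', 'B']
```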

SLIDE 10

Automatic Chord Recognition

  • Identifying the chord progression of tonal music
  • It is a challenging task (even for humans)
  • Chords are not explicit in music
  • Non-chord notes or passing notes
  • Key changes and chromaticism require in-depth knowledge of music theory
  • In audio, multiple musical instruments are mixed
  • Relevant: harmonically arranged notes
  • Irrelevant: percussive sounds (but they can help detect chord changes)
  • What kind of audio features can be extracted to recognize chords in a robust way?

SLIDE 11

Chroma Features: FFT-based approach

  • Compute the spectrogram and a mapping matrix
  • Convert each frequency to the musical pitch scale and get its pitch class
  • Set one at the corresponding pitch class and zero otherwise
  • Adjust the non-zero values so that low-frequency content gets more weight
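The steps above can be sketched as follows; the sampling rate, FFT size, frequency range, and 1/f weighting are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def chroma_mapping_matrix(sr=22050, n_fft=2048, fmin=55.0, fmax=8000.0):
    """Binary frequency-to-pitch-class mapping, weighted by 1/f so that
    low-frequency content gets more weight (assumed parameter values)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    M = np.zeros((12, len(freqs)))
    for i, f in enumerate(freqs):
        if f < fmin or f > fmax:
            continue                            # skip bins outside the musical range
        midi = 69 + 12 * np.log2(f / 440.0)     # frequency -> MIDI pitch number
        pc = int(round(midi)) % 12              # pitch class 0..11 (C = 0)
        M[pc, i] = 1.0 / f                      # non-zero value, low freqs weighted up
    return M

def chroma_from_spectrogram(S, M):
    """S: (n_bins, n_frames) magnitude spectrogram -> (12, n_frames) chroma."""
    C = M @ S
    norm = np.maximum(C.max(axis=0, keepdims=True), 1e-12)
    return C / norm                             # normalize each frame to max 1
```

Feeding it one frame of a 440 Hz sine should give a chroma vector peaking at pitch class 9 (A).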

SLIDE 12

Chroma Features: Filter-bank approach

  • A filter bank can be used to get a log-scale time-frequency representation
  • Center frequencies are arranged over the 88 piano notes
  • Bandwidths are set to have constant Q and to be robust to +/- 25 cents of detuning
  • The outputs that belong to the same pitch class are wrapped and summed

(Müller, 2011)

SLIDE 13

Beat-Synchronous Chroma Features

  • Make chroma features homogeneous within a beat

(Bartsch and Wakefield, 2001)

(From Ellis’ slides)
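A minimal sketch of beat-synchronous averaging, assuming beat positions are already given as frame indices (the beat tracking itself is not shown):

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames):
    """Average chroma frames within each beat interval so that the features
    are homogeneous per beat.

    chroma: (12, n_frames) array; beat_frames: sorted frame indices of beats.
    Returns a (12, n_beats+1) array, one column per inter-beat segment."""
    bounds = [0] + list(beat_frames) + [chroma.shape[1]]
    segments = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if b > a:                                   # skip empty segments
            segments.append(chroma[:, a:b].mean(axis=1))
    return np.stack(segments, axis=1)
```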

SLIDE 14

Key Estimation Overview

  • Estimate the musical key from music data
  • One of 24 keys: 12 pitch classes (C, C#, D, …, B) + major/minor
  • General framework (Gómez, 2006)

(Diagram: chroma features → average → similarity measure with key templates → key strength → estimated key, e.g., G major)

SLIDE 15

Key Template

  • Probe tone profile (Krumhansl and Kessler, 1982)
  • Relative stability or weight of tones
  • Listeners rated which tone best completed the first seven notes of a major scale
  • For example, in the C major key: C, D, E, F, G, A, B, … what?

(Figure: probe tone profile - relative pitch ratings)

SLIDE 16

Key Estimation

  • Compute the similarity by cross-correlation between the chroma features and the templates
  • Find the key that produces the maximum correlation
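A sketch of the whole pipeline, using the Krumhansl-Kessler probe-tone profiles (values as commonly reported) as key templates; the function name and normalization details are my own choices, not the lecture’s:

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles (major and minor), tonic at index 0.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(chroma):
    """chroma: (12, n_frames) -> best-matching key name, e.g. 'G major'."""
    v = chroma.mean(axis=1)                       # time-averaged chroma
    v = (v - v.mean()) / (v.std() + 1e-12)        # standardize for correlation
    best, best_r = None, -np.inf
    for mode, profile in (("major", MAJOR), ("minor", MINOR)):
        p = (profile - profile.mean()) / profile.std()
        for k in range(12):                       # rotate the template to each tonic
            r = np.dot(v, np.roll(p, k)) / 12     # Pearson correlation
            if r > best_r:
                best, best_r = f"{NOTE_NAMES[k]} {mode}", r
    return best
```

A chroma vector shaped like the G-major-rotated profile should come back as "G major".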
SLIDE 17

Chord Recognition

  • Estimate chords from music data
  • Typically, one of 24 chords: 12 pitch classes + major/minor
  • Often, diminished chords are added (36 chords)
  • General framework

(Diagram: audio → transform → chroma features → decision making → chords, using chord templates (template matching) or models (HMM, SVM))

SLIDE 18

Template-Based Approach

  • Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
  • Chord templates

(from Bello’s slides)

SLIDE 19

Template-Based Approach

  • Compute the cross-correlation between the chroma features and the chord templates, and select the chords with the maximum values

(from Bello’s Slides)
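A sketch of binary-template matching over major and minor triads; the template set and the label naming scheme are illustrative assumptions:

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """24 binary triad templates (12 roots x major/minor), unit-normalized."""
    names, T = [], []
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            names.append(f"{NOTE_NAMES[root]}:{quality}")
            T.append(t / np.linalg.norm(t))
    return names, np.stack(T)

def recognize(chroma):
    """chroma: (12, n_frames) -> one chord label per frame by max correlation."""
    names, T = chord_templates()
    norms = np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-12
    scores = T @ (chroma / norms)        # cosine similarity per (chord, frame)
    return [names[i] for i in scores.argmax(axis=0)]
```

A frame containing only C, E, G should be labeled "C:maj"; one with A, C, E should be "A:min".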

SLIDE 20

Review

  • The template approach is too straightforward
  • The binary templates are hard assignments
  • We can use a multi-class classifier instead
  • The output is one of the target chords
  • However, the local estimation tends not to be temporally smooth
  • We need an algorithm that considers the temporal dependency between chords
  • The majority of tonal music has certain types of chord progressions
SLIDE 21

Hidden Markov Model (HMM)

  • A probabilistic model for time-series data
  • Speech, gestures, DNA sequences, financial data, weather data, …
  • Assumes that the time-series data are generated from hidden states, and that the hidden states follow a Markov model
  • Learning-based approach
  • Needs training data annotated with labels
  • The labels usually correspond to the hidden states
SLIDE 22

Markov Model

  • A random variable s has N states (S_1, S_2, …, S_N) and, at each time step, one of the states is randomly chosen: s_t ∈ {S_1, S_2, …, S_N}
  • The probability distribution of the current state is determined by the previous state(s)
  • First-order: P(s_t | s_1, s_2, …, s_{t-1}) = P(s_t | s_{t-1})
  • Second-order: P(s_t | s_1, s_2, …, s_{t-1}) = P(s_t | s_{t-1}, s_{t-2})
  • The first-order Markov model is widely used for simplicity
SLIDE 23

Markov Model

  • Example: chord progression
  • s_t ∈ {C, F, G}
  • The transition probability matrix is 3 by 3 (plus start and end states)

(Diagram: state graph over C, F, G with start and end states)

    P(s_t = C | s_{t-1} = C) = 0.7    P(s_t = F | s_{t-1} = C) = 0.1    P(s_t = G | s_{t-1} = C) = 0.2
    P(s_t = C | s_{t-1} = F) = 0.2    P(s_t = F | s_{t-1} = F) = 0.6    P(s_t = G | s_{t-1} = F) = 0.2
    P(s_t = C | s_{t-1} = G) = 0.3    P(s_t = F | s_{t-1} = G) = 0.1    P(s_t = G | s_{t-1} = G) = 0.6
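The transition matrix above can be used to sample chord sequences; this sketch (with an assumed start chord and random seed) is illustrative, not lecture code:

```python
import random

# The 3-by-3 transition probability matrix from the slide, as a nested dict.
A = {
    "C": {"C": 0.7, "F": 0.1, "G": 0.2},
    "F": {"C": 0.2, "F": 0.6, "G": 0.2},
    "G": {"C": 0.3, "F": 0.1, "G": 0.6},
}

def generate(start="C", length=8, seed=0):
    """Sample a chord sequence from the first-order Markov model."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length - 1):
        probs = A[seq[-1]]                      # distribution given the previous chord
        nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        seq.append(nxt)
    return seq

print(generate("C", 12))   # e.g. a C/F/G progression of length 12
```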

SLIDE 24

Markov Model

  • The joint probability of a sequence of states is simple under the Markov model

    P(s_1, s_2, …, s_t) = P(s_1, s_2, …, s_{t-1}) P(s_t | s_1, s_2, …, s_{t-1})
                        = P(s_1, s_2, …, s_{t-1}) P(s_t | s_{t-1})
                        = P(s_1, s_2, …, s_{t-2}) P(s_{t-1} | s_1, s_2, …, s_{t-2}) P(s_t | s_{t-1})
                        = P(s_1, s_2, …, s_{t-2}) P(s_{t-1} | s_{t-2}) P(s_t | s_{t-1})
                        = P(s_1) P(s_2 | s_1) … P(s_{t-1} | s_{t-2}) P(s_t | s_{t-1})

SLIDE 25

What Can We Do with the Markov Model?

  • Generate a chord sequence
  • e.g., C - C - C - C - F - F - C - C - G - G - C - C - …
  • We can also generate a melody if we define the transition probability matrix among notes
  • Evaluate whether a specific chord progression is more likely than others
  • For example, C-G-C is more likely than C-F-C (assuming P(s_1 = C) = 1)

    P(s = C, G, C) = P(s_1 = C) P(s_2 = G | s_1 = C) P(s_3 = C | s_2 = G) = 0.2 * 0.3 = 0.06
    P(s = C, F, C) = P(s_1 = C) P(s_2 = F | s_1 = C) P(s_3 = C | s_2 = F) = 0.1 * 0.2 = 0.02

SLIDE 26

What Can We Do with a Markov Model?

  • Compute the probability that the chord at time t is C (or F or G)
  • Naive method: count all paths that have the C chord at time t: exponential!
  • Clever method: use a recursive induction

    P(s_t = C) = P(s_t = C | s_{t-1} = C) P(s_{t-1} = C)
               + P(s_t = C | s_{t-1} = F) P(s_{t-1} = F)
               + P(s_t = C | s_{t-1} = G) P(s_{t-1} = G)

  • Repeat this for P(s_j = C), P(s_j = F), P(s_j = G) for j = t − 1, t − 2, …, 1
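The recursive induction can equivalently be run forward from t = 1, propagating the state distribution one step at a time; a sketch using the transition matrix from the earlier C/F/G example:

```python
import numpy as np

STATES = ["C", "F", "G"]
A = np.array([[0.7, 0.1, 0.2],    # transitions from C
              [0.2, 0.6, 0.2],    # transitions from F
              [0.3, 0.1, 0.6]])   # transitions from G

def state_marginal(t, p1=np.array([1.0, 0.0, 0.0])):
    """P(s_t = each state), assuming P(s_1 = C) = 1 as in the earlier example.
    Each step is one application of the induction: O(N^2) instead of
    enumerating an exponential number of paths."""
    p = p1
    for _ in range(t - 1):
        p = p @ A
    return dict(zip(STATES, p))

print(state_marginal(2))   # one step: {'C': 0.7, 'F': 0.1, 'G': 0.2}
```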
SLIDE 27

Chord Recognition from Audio

  • What we observe are not chords but audio features (e.g., chroma)
  • We want to infer the chord sequence from the audio feature sequence

    s_1, s_2, …, s_T   (chords)
    x_1, x_2, …, x_T   (audio features)

SLIDE 28

Hidden Markov Model (HMM)

  • The hidden states follow the Markov model
  • Given a state, the corresponding observation distribution is independent of previous states or observations
  • Each state has an emission distribution

(Diagram: hidden state chain … s_{t-1} → s_t → s_{t+1} …, each state emitting an observation x_{t-1}, x_t, x_{t+1}; emission distributions P(x | s_t = C), P(x | s_t = F), P(x | s_t = G))

SLIDE 29

Hidden Markov Model (HMM)

  • Model parameters
  • Initial state probabilities: P(s_1) → π_j
  • Transition probability matrix: P(s_t | s_{t-1}) → a_jk
  • Observation distribution given a state: P(x | s_k) → b_k (e.g., Gaussian)
  • How can we learn the parameters from data?
SLIDE 30

Training HMM for Chord Recognition

  • If chord labels are time-aligned with the audio, estimate the parameters directly from the data
  • Initial state probabilities and transition probability matrix: count chords and chord-to-chord transitions
  • Observation distribution: fit a Gaussian model to the audio features separately for each chord
  • Easy to train, but the time-aligned data are very expensive to obtain
  • If chord labels are not aligned with the audio, we should use maximum-likelihood estimation
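A sketch of the aligned-label case: the parameters fall out of counting and per-chord Gaussian fits. The array layout (frame-level chord ids plus matching feature rows) is an assumption:

```python
import numpy as np

def fit_hmm(labels, feats, n_states):
    """labels: frame-level chord ids (length T); feats: (T, D) aligned features.
    Returns initial probs, transition matrix, and per-chord Gaussian params."""
    labels = np.asarray(labels)
    # Initial state: from the first frame of this (single) training sequence.
    pi = np.bincount(labels[:1], minlength=n_states).astype(float)
    # Transition matrix: count chord-to-chord transitions, then normalize rows
    # (with a tiny additive smoothing so unseen transitions are not exactly 0).
    A = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        A[a, b] += 1.0
    A = (A + 1e-6) / (A + 1e-6).sum(axis=1, keepdims=True)
    # Emissions: fit a diagonal Gaussian to the frames of each chord.
    means = np.stack([feats[labels == k].mean(axis=0) for k in range(n_states)])
    variances = np.stack([feats[labels == k].var(axis=0) + 1e-6
                          for k in range(n_states)])
    return pi, A, means, variances
```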

SLIDE 31

Training HMM: EM algorithm

  • If chord labels are not aligned with the audio, use the EM algorithm (the Baum-Welch method)
  • E-step: evaluate the probability of transitioning from state S_j at time t to state S_k at time t + 1, given the observations

    ξ_t(j, k) = P(s_t = S_j, s_{t+1} = S_k | X, λ)

  • Then the probability of being in state S_j at time t can also be derived

    γ_t(j) = P(s_t = S_j | X, λ) = Σ_{k=1}^{N} ξ_t(j, k)

SLIDE 32

Training HMM: EM algorithm

  • M-step: update the parameters so that they maximize the log-likelihood given the E-step quantities

    Σ_{t=1}^{T-1} γ_t(j) : expected number of transitions from S_j (i.e., how many times state S_j is visited from t = 1 to T − 1)
    Σ_{t=1}^{T-1} ξ_t(j, k) : expected number of transitions from S_j to S_k

    π_j = γ_1(j)
        = expected frequency in state S_j at time t = 1

    a_jk = Σ_{t=1}^{T-1} ξ_t(j, k) / Σ_{t=1}^{T-1} γ_t(j)
         = (expected number of transitions from S_j to S_k) / (expected number of transitions from S_j)

    b_k(l) = Σ_{t=1, s.t. x_t = v_l}^{T} γ_t(k) / Σ_{t=1}^{T} γ_t(k)
           = (expected number of times in state S_k observing v_l) / (expected number of times in state S_k)

  • We can use the labels to constrain the model (e.g., for initialization)
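Given the E-step quantities γ and ξ as arrays, the π and transition-matrix updates are one line each; the emission update (Gaussian or discrete) is omitted here for brevity:

```python
import numpy as np

def m_step(gamma, xi):
    """M-step updates from the E-step quantities.

    gamma: (T, N), gamma[t, j] = P(s_t = S_j | X, lambda)
    xi:    (T-1, N, N), xi[t, j, k] = P(s_t = S_j, s_{t+1} = S_k | X, lambda)
    """
    pi = gamma[0]                                   # expected state at t = 1
    # a_jk = expected transitions j->k / expected transitions out of j
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    return pi, A
```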

SLIDE 33

Evaluating HMM for Chord Recognition

  • Find the most likely sequence of hidden states given the observations and the HMM model parameters
  • Viterbi algorithm
  • Define a probability variable:

    δ_t(j) = max_{s_1, s_2, …, s_{t-1}} P(s_1, s_2, …, s_t = S_j, x_1, x_2, …, x_t | λ)

  • Initialization (from the start state):

    δ_1(j) = π_j b_j(x_1),   ψ_1(j) = 0,   1 ≤ j ≤ N

  • Recursion:

    δ_t(k) = max_{1 ≤ j ≤ N} [δ_{t-1}(j) a_jk] b_k(x_t)
    ψ_t(k) = argmax_{1 ≤ j ≤ N} [δ_{t-1}(j) a_jk],   2 ≤ t ≤ T, 1 ≤ k ≤ N

  • Termination (to the end state):

    P* = max_{1 ≤ j ≤ N} δ_T(j)
    s_T* = argmax_{1 ≤ j ≤ N} δ_T(j)
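A log-space sketch of the Viterbi recursion above (log probabilities avoid underflow on long sequences; backtracking uses the backpointers ψ):

```python
import numpy as np

def viterbi(pi, A, logB):
    """pi: (N,) initial probs; A: (N, N) transition matrix;
    logB: (T, N) per-frame log observation likelihoods (e.g., from Gaussians).
    Returns the most likely state index sequence of length T."""
    T, N = logB.shape
    logpi = np.log(np.asarray(pi) + 1e-300)
    logA = np.log(np.asarray(A) + 1e-300)
    delta = logpi + logB[0]                    # initialization: delta_1(j)
    psi = np.zeros((T, N), dtype=int)          # backpointers psi_t(k)
    for t in range(1, T):
        scores = delta[:, None] + logA         # scores[j, k]: come from j into k
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]   # recursion: delta_t(k)
    path = [int(delta.argmax())]               # termination: best final state
    for t in range(T - 1, 0, -1):              # backtrack through psi
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```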

SLIDE 34

The Viterbi Trellis

  • Recall dynamic programming!

(Diagram: Viterbi trellis over states C, F, G, from the start state at t = 1 to the end state at t = T, with values δ_1(j), δ_2(j), δ_3(j), …, δ_{T-1}(j), δ_T(j))

SLIDE 35

Chord Recognition Result

  • The HMM produces a temporally smoother chord recognition output

(From Ellis’ E4896 practicals)

SLIDE 36

References

  • P. R. Cook (Ed.), “Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics”, 2001
  • C. Krumhansl, “Cognitive Foundations of Musical Pitch”, 1990
  • M. A. Bartsch and G. H. Wakefield, “To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing”, 2001
  • E. Gómez and P. Herrera, “Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modeling Strategies”, 2004
  • M. Müller and S. Ewert, “Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features”, 2011
  • T. Fujishima, “Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music”, 1999
  • A. Sheh and D. Ellis, “Chord Segmentation and Recognition Using EM-Trained Hidden Markov Models”, 2003
  • L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, 1989