Unsupervised Music Understanding based on Nonparametric Bayesian Models
Kazuyoshi Yoshii, Masataka Goto (AIST, Japan)
Why “Unsupervised Understanding”?
- Why can we recognize music as music?
- Is this because we are taught music theory? Definitely no!
- Even musically‐untrained people can intuitively understand and enjoy music
– Examples:
- People can notice that multiple sounds (musical notes) having different F0s are contained in music
– Even if they do not know the number of notes in advance
- People can distinguish whether given sound mixtures (chords) are harmonic or inharmonic
– Even if they are not taught labels (chord names)
– Structural patterns over musical notes can be discovered by listening to a large amount of music
- Simultaneous and temporal patterns (chords and progressions)
Supervised Approach
- The most popular approach in previous studies
– Examples:
- Music transcription
– The number of musical notes is given in advance
- Chord recognition
– A vocabulary of chord labels (e.g., maj, min, dim, aug) is defined
– Training phase: train a model of each label by using labeled audio signals
– Decoding phase: output the most likely labels for unlabeled audio signals
- Out‐of‐vocabulary problem! Chords and chord progressions outside the vocabulary cannot be recognized
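As a toy illustration of the training/decoding pipeline above (not any of the systems cited here), a nearest-template classifier over chroma vectors shows how a fixed label vocabulary forces every input onto a known label. All templates and vectors below are invented for the sketch.

```python
import numpy as np

# Hypothetical "trained" models: one 12-dim chroma template per chord label.
# The fixed label vocabulary is the source of the out-of-vocabulary problem.
train = {
    "C:maj": np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float),
    "F:maj": np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0], dtype=float),
    "G:maj": np.array([0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1], dtype=float),
}

def decode(chroma):
    """Return the most likely label from the fixed vocabulary."""
    return min(train, key=lambda lab: np.linalg.norm(train[lab] - chroma))

# A C major triad decodes to its own label (distance zero to the template)...
print(decode(np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0.0])))
# ...but an unusual cluster such as C+D+E is still forced onto some
# in-vocabulary label, however poorly it fits.
print(decode(np.array([1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0.0])))
```

The second call makes the out-of-vocabulary problem concrete: the decoder has no way to say "this is a chord I have no label for".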
Unsupervised Approach
- Our goal is to find musical notes and discover their latent structural patterns at the same time
– No finite vocabularies are defined
- The number of musical notes is not given
- No chord labels (e.g., maj, min, dim, aug) are given
– Only polyphonic audio signals are available
- We aim to find an appropriate number of musical notes
- We aim to form chords directly from musical notes
– No out‐of‐vocabulary problem!
– “Understanding” is to infer the most likely hierarchical organization of music audio signals
- A probabilistic framework is promising
The Big Picture
- Integration of language and acoustic models
– Hierarchical Bayesian formulation
[Diagram: Prior → structural patterns S (note combinations/chords, chord progressions, etc.; language model, chord modeling) → musical notes Z (acoustic model, music transcription) → music audio signals X]
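A toy sketch of this hierarchical generative story: a hypothetical chord "language model" produces structural patterns S, each chord token yields its notes Z, and the acoustic model would synthesize audio X from the notes (stubbed out here). All transition entries are invented.

```python
import random

random.seed(1)

# S: a hypothetical chord-to-chord transition table (language model).
transitions = {
    "C+E+G": ["F+A+C", "G+B+D"],
    "F+A+C": ["G+B+D", "C+E+G"],
    "G+B+D": ["C+E+G"],
}

chord = "C+E+G"
sequence = []
for _ in range(8):
    sequence.append(chord)
    chord = random.choice(transitions[chord])

# Z: musical notes are read directly off each chord token,
# with no chord-label vocabulary in between.
notes = [set(c.split("+")) for c in sequence]
# X would then be synthesized from the notes by the acoustic model.
```

The point of the hierarchy is that "C+E+G" is just a set of notes; any other set would be an equally valid token.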
A Key Feature
- Joint unsupervised learning of both models
– Find the most likely latent structures in a data‐driven way
- What we usually call “chords” could be statistically defined as typical note combinations by maximizing the evidence p(X)
[Diagram: the same hierarchy (Prior → structural patterns S → musical notes Z → music audio signals X); both chord modeling and music transcription are self‐organized]
Model Selection
- How to determine appropriate numbers of notes and chords so that p(X) is maximized?
– Naïve combinatorial complexity control (grid search) is computationally prohibitive!

Log-evidence for each combination (columns: number of musical notes; rows: number of types of chords):

        10        20       30       40    …
10   -20,000   -9,000   -8,000   -8,000
20   -10,000   -8,500   -7,000   -7,500
30    -9,000   -8,300   -7,500   -7,900
40    -8,800   -8,400   -8,000   -8,500
…

The log-evidence is maximized at -7,000 (30 notes, 20 chord types)
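The combinatorial cost can be sketched in a few lines. The log_evidence function below is a made-up stand-in for a full training-and-scoring run; the sketch only shows that a naive grid search needs one complete model fit per grid point.

```python
import itertools

# Hypothetical stand-in for the log-evidence log p(X) of a model with
# K note types and L chord types.  In reality, evaluating one grid point
# means training a full model from scratch.
def log_evidence(K, L):
    return -((K - 30) ** 2 + (L - 20) ** 2) / 10.0 - 7000.0

note_counts = range(10, 51, 10)   # candidate numbers of note types
chord_counts = range(10, 51, 10)  # candidate numbers of chord types

# Naive grid search: one full fit per (K, L) combination; the cost is
# multiplicative in the number of axes.
grid = list(itertools.product(note_counts, chord_counts))
best = max(grid, key=lambda kl: log_evidence(*kl))
print(len(grid), best)  # 25 full training runs for just two 5-value axes
```

With realistic grids (and more than two structural choices) the number of required fits explodes, which is the motivation for the nonparametric approach below.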
Why Nonparametric Bayes?
- Principled approach to structure learning
– “L0‐regularized” sparse learning in an infinite space
- Infinite types of musical units (e.g., notes & chords) would be required to represent the variety of the whole music data in the universe
- Only limited types of musical units are actually instantiated for explaining available finite data
– No model selection
- Effective model complexities (the numbers of musical
units required) can be inferred at the same time
[Diagram: the nonparametric Bayesian (infinite) model covers the infinite data in the universe, while conventional finite models cover only the observed finite data]
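The claim that only finitely many units are instantiated for finite data can be illustrated with a toy Chinese-restaurant-process simulation (the sequential view of the Dirichlet process): with N observations, roughly alpha·log(N) units end up used. The concentration alpha is an arbitrary choice here.

```python
import random

random.seed(0)
alpha = 2.0
table_sizes = []  # instantiated units ("tables") and their usage counts

for n in range(10_000):
    # A new unit is created with probability alpha/(n+alpha);
    # otherwise an existing unit is reused proportionally to its count.
    if random.random() < alpha / (n + alpha):
        table_sizes.append(1)
    else:
        r = random.random() * n
        acc = 0
        for t, size in enumerate(table_sizes):
            acc += size
            if r < acc:
                table_sizes[t] += 1
                break

print(len(table_sizes))  # far fewer than 10,000 units are actually used
```

Even though infinitely many units are allowed a priori, the rich-get-richer dynamics keep the effective model complexity small and data-driven.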
Latest Achievements
- We have developed nonparametric Bayesian acoustic and language models
– Multipitch analysis for music audio signals
- Infinite latent harmonic allocation [Yoshii ISMIR2010]
– Infinite number of musical notes allowed
– Chord progression modeling for musical notes
- Vocabulary‐free infinity‐gram model [Yoshii ISMIR2011]
– Infinite kinds of note combinations allowed
[Table: model families and their nonparametric Bayesian instances;
Acoustic: mixture model (e.g., PLCA) → Yoshii ISMIR2010; factorial model (e.g., NMF) → Nakano WASPAA2011, Nakano ICASSP2012;
Language: chain‐structured model (e.g., n‐gram model) → Yoshii ISMIR2011; tree‐structured model (e.g., PCFG) → ?]
The Big Picture
- Integration of language and acoustic models
– Hierarchical Bayesian formulation
[Diagram: Prior → structural patterns S (note combinations/chords, chord progressions, etc.) → musical notes Z → music audio signals X; the language model p(S) p(Z|S) performs grammar induction, the acoustic model performs music transcription]
Nonparametric Bayesian Acoustic Modeling
- Objective: Multipitch analysis
– Detect multiple fundamental frequencies (F0s) at each frame from polyphonic audio signals
- Unknown number of musical notes
- Unknown number of harmonic partials
[Figure: observed data: wavelet spectrogram (logarithmic frequency × frame) of Chopin, Mazurka Op. 33‐2]
Conventional Finite Modeling
- A nested Gaussian mixture model [Goto 1999]
– A spectrum at each frame is assumed to consist of K harmonic structures (GMMs)
– Each harmonic structure is assumed to contain M harmonic partials (Gaussians)
[Figure: probability density over frequency [cent]: sharp Gaussians at the harmonic partials above the F0 of the k‐th structure; the weights are those of the m‐th harmonic partial in the k‐th structure and of the k‐th harmonic structure in the d‐th frame]
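A minimal sketch of this nested GMM on a log-frequency (cent) axis. The mixture weights, F0s, and Gaussian width below are illustrative values, not parameters from [Goto 1999].

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def harmonic_structure(freq_cents, f0_cents, partial_weights, sigma=20.0):
    """GMM over log-frequency: the m-th sharp Gaussian sits
    1200*log2(m) cents above the structure's F0."""
    partials = f0_cents + 1200.0 * np.log2(np.arange(1, len(partial_weights) + 1))
    return sum(w * gaussian(freq_cents, mu, sigma)
               for w, mu in zip(partial_weights, partials))

freq = np.linspace(3000, 9000, 2000)          # log-frequency axis [cents]
partial_w = np.array([0.5, 0.25, 0.15, 0.1])  # M = 4 partials per structure
note_w = np.array([0.6, 0.4])                 # K = 2 structures in this frame
f0s = np.array([4000.0, 4700.0])              # F0s of the two notes [cents]

# The frame's spectrum: a mixture of K harmonic structures, each itself a
# GMM over its partials; it integrates to ~1 over frequency by construction.
spectrum = sum(wk * harmonic_structure(freq, f0, partial_w)
               for wk, f0 in zip(note_w, f0s))
```

Fitting this model to an observed spectrum estimates the per-frame structure weights (which notes sound) and the per-structure partial weights (their timbre), which is exactly where the choices of K and M enter.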
Conventional Model Selection
- We need to model polyphonic mixtures
– How many musical notes (K)?
- We need to model harmonic structures
– How many harmonic partials (M)?

Log-evidence for each combination (columns: number of musical notes K; rows: number of harmonic partials M):

        10        20       30       40    …
 5   -20,000   -9,000   -8,000   -8,000
10   -10,000   -8,500   -7,000   -7,500
15    -9,000   -8,300   -7,500   -7,900
20    -8,800   -8,400   -8,000   -8,500
…

Testing every combination is computationally prohibitive!
Taking the Infinite Limit
- Infinite latent harmonic allocation (iLHA) [Yoshii ISMIR2010]
– We do not need to specify model complexities
- Unknown number of musical notes (K)
- Unknown number of harmonic partials (M)
Only limited numbers of musical notes and harmonic partials are actually contained in a finite amount of observed data
[Diagram: finite nested GMM → infinite nested GMM (let K & M diverge to infinity)]
Sparse Learning
- Incorporate Dirichlet process (DP) prior
– An infinite number of exponentially‐decayed mixture weights can be stochastically generated
- All weights sum to unity
- Almost all weights are very close to zero
[Figure: weights vs. note id and weights vs. partial id: only an effective number of notes and partials receive non-negligible weights; almost all others are almost zero]
- Posterior inference is feasible (variational Bayes or MCMC)
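The behavior of the DP prior can be illustrated with the standard stick-breaking construction, truncated at a finite length for display. The concentration parameter alpha below is an arbitrary choice, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0        # DP concentration (assumed value, for illustration)
truncation = 1000  # display truncation of the infinite weight sequence

# Stick-breaking: break a unit-length stick with Beta(1, alpha) proportions.
betas = rng.beta(1.0, alpha, size=truncation)
remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
weights = betas * remaining  # exponentially decaying on average

# All weights sum to (almost) unity, yet almost all of them are nearly
# zero, so only a handful of mixture components are effectively used.
effective = int(np.sum(weights > 1e-3))
print(weights.sum(), effective)
```

This is the mechanism behind the "effective number of notes/partials" in the figure: the prior itself pushes unused components toward zero weight.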
The Big Picture
- Integration of language and acoustic models
– Hierarchical Bayesian formulation
[Diagram: Prior → structural patterns S (note combinations/chords, chord progressions, etc.) → musical notes Z → music audio signals X; the acoustic model p(X|Z) performs music transcription, the language model performs chord modeling]
Nonparametric Bayesian Language Modeling
- Objective: Chord progression modeling
– Learn an n‐gram model directly from musical notes
- Without using a vocabulary of conventional chord labels
- Without specifying the value of n (a vocab. of chord patterns)
[Example: with conventional labels, … C:maj F:maj G:maj C:maj C:maj F:maj G:maj … is modeled with fixed 3‐grams; with the proposed approach, … C+E+G F+A+C G+B+D C+E+G C+E+G F+A+C G+B+D … is modeled with variable-length n‐grams (3‐gram, 4‐gram, 2‐gram, …): any chord patterns and any note combinations are allowed]
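A minimal sketch of the vocabulary-free idea: if each token is simply the set of notes sounding together, any combination is a valid token and n-gram counting proceeds without a predefined chord vocabulary. The toy sequence and plain counts below stand in for the Bayesian model.

```python
from collections import Counter

# Each "chord" token is just a frozenset of note names, so C+D+E would be
# as valid a token as C+E+G -- no chord-label vocabulary is involved.
sequence = [
    frozenset({"C", "E", "G"}), frozenset({"F", "A", "C"}),
    frozenset({"G", "B", "D"}), frozenset({"C", "E", "G"}),
    frozenset({"C", "E", "G"}), frozenset({"F", "A", "C"}),
    frozenset({"G", "B", "D"}),
]

# Ordinary bigram counts over these vocabulary-free tokens.
bigrams = Counter(zip(sequence, sequence[1:]))
key = (frozenset({"C", "E", "G"}), frozenset({"F", "A", "C"}))
print(bigrams[key])  # the transition C+E+G -> F+A+C occurs twice here
```

Because tokens are raw note sets, a previously unseen combination simply becomes a new token rather than an out-of-vocabulary error.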
Conventional Model Selection
- We need to formulate a variable‐order model
– How to determine a context length (n) for each chord?
[Figure: for each chord in … C:maj F:maj G:maj C:maj C:maj F:maj G:maj …, the context length n could be 1, 2, 3, 4, …]
Testing all combinations is infeasible!
Is 3‐gram always the best?
Taking the Infinite Limit
- Vocabulary‐free infinity‐gram model [Yoshii ISMIR2011]
– Hierarchical Bayesian smoothing
- Based on generative model of n‐grams [Teh 2006]
- Pitman‐Yor process (PY)
– Let the value of n diverge to infinity
- All possibilities of n are considered [Mochihashi 2007]
- Dirichlet process (DP)
– Allow any note combinations to form “chords”
- No out‐of‐vocabulary problem!
– C + E + G is a chord
– C + D + E is another chord (no corresponding conventional label)
[Example: … C+E+G F+A+C G+B+D C+E+G C+E+G F+A+C G+B+D …]
- We consider a generative model of note combinations and integrate it into the n‐gram model
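The "all orders at once" idea behind the infinity-gram can be sketched with plain counts: register each symbol under every suffix of its history, so statistics for all context lengths n coexist. The real model replaces these raw counts with hierarchical Pitman-Yor smoothing; this sketch only shows the bookkeeping.

```python
from collections import defaultdict

# counts[context][symbol]: how often `symbol` followed `context`,
# maintained simultaneously for contexts of every length.
counts = defaultdict(lambda: defaultdict(int))

def observe(sequence):
    for i, sym in enumerate(sequence):
        history = tuple(sequence[:i])
        # Register sym under every suffix context of its history,
        # from the empty context (unigram) up to the full history.
        for n in range(len(history) + 1):
            counts[history[len(history) - n:]][sym] += 1

observe(["C:maj", "G:maj", "A:min", "F:maj", "C:maj", "G:maj", "F:maj"])

# The empty context () holds unigram counts; ("G:maj",) holds bigram counts.
print(counts[()]["C:maj"], counts[("G:maj",)]["A:min"])
```

A smoothed predictor can then interpolate across all context lengths instead of committing to a single n in advance.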
Inference Results
- Discover chord patterns from Beatles songs
- “Let It Be” (Intro, Verse): the progression C:maj G:maj A:min A:min F:maj7 F:maj6 C:maj G:maj F:maj C:maj repeats; a context length n (1–10) is inferred for each chord
- Stochastically coherent chord patterns discovered (in the C major scale):

Prob.   n   Pattern
0.701   3   C:7 F:7 C:7
0.682   3   B:maj F:maj G:maj
0.656   3   A:min C:7 F:maj
0.647   3   F:min G:maj C:maj
0.645   4   F:maj F:maj G:maj C:maj
0.632   3   E:min C:7 F:maj
0.630   3   C:maj7 D:min7 E:min7
0.623   4   B:maj F:maj G:maj C:maj
0.622   3   D:min7 G:sus4 G:maj
0.620   5   D:min G:maj C:maj F:maj C:maj

(represented by conventional chord labels for readability)
Conclusion
- Unsupervised music understanding
– A new research framework has been proposed
- Integration of language and acoustic models
- Hierarchical nonparametric Bayesian modeling
– Current progress:
- Nonparametric Bayesian acoustic modeling
– Infinite latent harmonic allocation [Yoshii ISMIR2010]
- Nonparametric Bayesian language modeling
– Vocabulary‐free infinity‐gram model [Yoshii ISMIR2011]
– Remaining issues:
- Joint learning of both models
- Bridging the gap between multipitch analysis and chord progression modeling