GCT634: Musical Applications of Machine Learning
Music Classification: Overview and Audio Features
Graduate School of Culture Technology, KAIST
Juhan Nam
Outlines
- Definition of Music Classification Tasks
- Overview of Music Classification Systems
- Audio Features
Definition
- Categorizing input audio into labels
- Labels can be anything, even note, chord, or beat notations
- However, we limit them to semantic words such as genre, mood, instrument, era, and other word-based descriptions
Types of Music Classification Tasks
- Genre/Mood classification
- Classify music clips into a category
- Single-label classification
- Instrument Identification
- Can be recast as a classification problem
- Polyphonic cases: predominant instrument detection (single-label classification) or multiple-instrument detection (multi-label classification)
- Music Auto-Tagging
- Labels can be anything (e.g. genre, mood, instrument, era, vocal quality)
- Multi-label classification
Music Genre
- Numerous genres and their sub-genres
- http://research.google.com/bigpicture/music/
- http://en.wikipedia.org/wiki/List_of_popular_music_genres
- Evolutionary and influence-based
- https://frananddavesmusicaladventure.wordpress.com/the-music-tree/
- http://www.historyshots.com/rockmusic/
- http://techno.org/electronic-music-guide/
- Based on cultural context
- Many cultural communities (or countries with a homogeneous culture) have different genre distributions
- Unique genres (e.g. trot) and different popularity (e.g. metal)
Genre Categories in MIREX
- US Pop Genre Classification: Blues, Jazz, Country/Western, Baroque, Classical, Romantic, Electronica, Hip-Hop, Rock, Hard Rock/Metal
- Latin Genre Classification: Axe, Bachata, Bolero, Forro, Gaucha, Merengue, Pagode, Salsa, Sertaneja, Tango
- K-pop Classification: Ballad, Dance, Folk, Hip-hop, R&B, Rock, Trot
http://www.music-ir.org/mirex/wiki/2017:Audio_Classification_(Train/Test)_Tasks
- MIREX (Music Information Retrieval Evaluation eXchange)
- Community-based algorithm evaluation framework and events
Music Mood
- Russell’s circumplex model of affect
- “Arousal-Valence” 2D space
(Figure: Russell’s circumplex model, a dimensional model from music psychology)
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39: 1161-1178.
Music Mood
- Mood clustering
- Using mood labels for songs (from allmusic.com)
- Song-by-mood matrix → mood-by-mood correlation matrix → clustering
(Figure: mood-label matrices for albums and songs, clustered into C1-C5)
Hu, X., & Downie, J. S. (2007). Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata. In Proceedings of ISMIR.
Mood Categories in MIREX
- The five clusters are used in the MIREX mood classification task
Mood Classification
- Cluster_1: passionate, rousing, confident, boisterous, rowdy
- Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
- Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
- Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
- Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral
http://www.music-ir.org/mirex/wiki/2017:Audio_Classification_(Train/Test)_Tasks
Overview of Music Classification Systems
(Pipeline: Audio Representations → Feature Extraction → Classifier → “Metal” / “Jazz” / “Classical” (?))
Overview of Music Classification Systems
- Audio representations
- Low-level representation of audio
- Preserve the majority of information in input data
- e.g. waveform, spectrogram, mel-spectrogram
Overview of Music Classification Systems
- Feature extraction
- Summary of acoustic or musical patterns that explain the characteristics of the audio representations
- e.g. MFCC, chroma, learning-based feature representations
Overview of Music Classification Systems
- Classifiers
- Determine the category based on the extracted features
- A learning algorithm is necessary: e.g. SVM, GMM, NN
- Training and Testing
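As a toy illustration of the training/testing split, here is a minimal nearest-centroid classifier in Python standing in for SVM/GMM/NN; the 2-D feature vectors and class labels are made up for the example.

```python
def train_centroids(features, labels):
    """'Training': compute one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        s = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """'Testing': assign x to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda y: dist2(centroids[y], x))

# Hypothetical 2-D features (e.g. mean ZCR, mean spectral centroid)
train_x = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
train_y = ["classical", "classical", "metal", "metal"]
model = train_centroids(train_x, train_y)
```

Real systems would use stronger classifiers, but the train/predict interface is the same.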
It is important to extract good audio features!
(Figure: the same classes in feature space with good features (well separated) vs. bad features (overlapping))
Let’s listen to examples
- What is the genre of the music?
- What is the mood of the music?
- What are the features of the music that explain your answers?
Human Knowledge to Explain Music
- Acoustic Level
- Loudness
- Pitch
- Timbre
- Musical Level
- Instrumentation
- Rhythm
- Key and scale
- Chord and melodic pattern
- Lyrics, structure, singing style, …
Two Approaches in Music Classification
- Feature engineering
- Features are designed based on domain knowledge and heuristics
- Traditional approach: e.g. MFCC+GMM model
- Feature learning
- Features are learned using optimization algorithms
- Recent approach: e.g. deep neural networks
Classifier Feature Extraction Audio Representations
Let’s focus on the feature engineering approach first!
Feature Engineering Model
- Feature extraction is divided into several steps
(Pipeline: Frame-Level Audio Features → Temporal Summarization → Normalization)
(G. Tzanetakis)
(Frame-Level) Audio Features
- Loudness
- Root-Mean-Squares (RMS) of audio frames
- Timbre features
- Zero-crossing rate
- MFCC (w/ delta or double-delta): spectral envelope
- Spectral summary: centroid, roll-off, …
- Pitch/Harmony features
- Chroma
- Rhythm features (this is not frame-level)
- Beat histogram, Tempogram
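The simplest of these, RMS loudness, can be sketched in a few lines of Python (a minimal example on a raw list of samples, without windowing):

```python
import math

def rms(frame):
    """Root-mean-square level of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

level = rms([1.0, -1.0, 1.0, -1.0])  # a full-scale square wave: RMS = 1.0
```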
Zero-Crossing Rate (ZCR)
- ZCR is low for harmonic (voiced) sounds and high for noisy
(unvoiced) sounds
- Useful to classify different drum sounds (e.g. bass, snare, high-hat)
- For narrow-band periodic signals, it is related to the F0
(Figure: voiced vs. unvoiced waveform examples)
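A minimal sketch of ZCR in Python; the test signals (a low-frequency sine vs. a sign-alternating "noise-like" stand-in) are assumptions for the demo:

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Harmonic signal: a 100 Hz sine at 8 kHz crosses zero rarely
sine = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
# Noise-like stand-in: the sign flips every sample, so ZCR is near 1
noisy = [(-1) ** n * 0.5 for n in range(800)]
```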
Spectral Statistics
- Spectral Centroid: “Center of gravity” of the spectrum
- Associated with the brightness of sounds
- Spectral Roll-off: the frequency below which 85% (or 95%) of the spectral energy is concentrated

$SC(t) = \dfrac{\sum_k f_k\, X_t(k)}{\sum_k X_t(k)}$

$\sum_{k=1}^{R_t} X_t(k) = 0.85 \sum_{k=1}^{N} X_t(k)$
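The two definitions above translate directly into Python; the toy spectrum (magnitudes per frequency bin) is made up for the example:

```python
def spectral_centroid(mags, freqs):
    """'Center of gravity' of a magnitude spectrum."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

def spectral_rolloff(mags, freqs, ratio=0.85):
    """Frequency below which `ratio` of the spectral energy is concentrated."""
    target = ratio * sum(mags)
    cum = 0.0
    for f, m in zip(freqs, mags):
        cum += m
        if cum >= target:
            return f
    return freqs[-1]

# Toy spectrum with all energy in the 100 Hz bin
freqs = [0.0, 100.0, 200.0, 300.0]
mags = [0.0, 1.0, 0.0, 0.0]
```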
Examples of Spectral Centroids
time [sec] frequency [Hz]
0.5 1 1.5 2 2.5 3 3.5 4 4.5 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Classical: “Beethoven String Quartet” Pop: “Video killed the radio star”
time [sec] frequency [Hz]
0.5 1 1.5 2 2.5 3 3.5 4 4.5 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Spectral Statistics
- Spectral Spread (SS): a measure of the bandwidth of the spectrum
- Spectral flatness (SF): a measure of the noisiness of the
spectrum
- The ratio between the geometric and arithmetic means
$SS(t) = \sqrt{\dfrac{\sum_k (f_k - SC(t))^2\, X_t(k)}{\sum_k X_t(k)}}$

$SF(t) = \dfrac{\left(\prod_k X_t(k)\right)^{1/K}}{\frac{1}{K}\sum_k X_t(k)}$
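Both measures in Python, again on made-up toy spectra (a flat, noise-like spectrum vs. a single-peak, tonal one):

```python
import math

def spectral_spread(mags, freqs):
    """Energy-weighted standard deviation around the spectral centroid."""
    centroid = sum(f * m for f, m in zip(freqs, mags)) / sum(mags)
    var = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / sum(mags)
    return math.sqrt(var)

def spectral_flatness(mags):
    """Ratio of geometric to arithmetic mean of the magnitude spectrum."""
    K = len(mags)
    geo = math.exp(sum(math.log(m) for m in mags) / K)
    return geo / (sum(mags) / K)

flat = [1.0, 1.0, 1.0, 1.0]        # noise-like: flatness = 1
peaky = [1e-6, 1.0, 1e-6, 1e-6]    # tonal: flatness near 0
```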
Mel-Frequency Cepstral Coefficient (MFCC)
- Most popularly used audio feature for timbre feature extraction
- Extracts the spectral envelope from an audio frame
- Standard audio feature in speech recognition
- Introduced in music domain by Logan in 2000
- Computation Steps
(audio frame) → DFT → mapping freq. scale to mel → log magnitude → DCT
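The chain above can be sketched end to end in plain Python. This is a toy, not a production MFCC: a naive O(N²) DFT, a crude triangular mel filterbank, and no windowing or liftering; the frame size and sample rate in the demo are arbitrary choices.

```python
import math

def mfcc_frame(frame, sr=8000, n_mels=20, n_mfcc=13):
    """Toy MFCC for one frame: |DFT| -> mel filterbank -> log -> DCT-II."""
    N = len(frame)
    n_bins = N // 2 + 1
    # 1) Magnitude spectrum via a naive DFT
    mags = []
    for k in range(n_bins):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = sum(frame[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        mags.append(math.hypot(re, im))
    # 2) Triangular filters equally spaced on the mel scale
    def hz2mel(f): return 2595.0 * math.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_max = hz2mel(sr / 2.0)
    bins = [int(round(mel2hz(mel_max * i / (n_mels + 1)) * N / sr))
            for i in range(n_mels + 2)]
    log_mel = []
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        e = 0.0
        for k in range(lo, hi):
            w = (k - lo) / (c - lo) if k < c else (hi - k) / (hi - c)
            if 0 <= k < n_bins:
                e += w * mags[k]
        # 3) Log compression
        log_mel.append(math.log(e + 1e-10))
    # 4) DCT-II de-correlates; keep the first n_mfcc coefficients
    return [sum(log_mel[n] * math.cos(math.pi * k / n_mels * (n + 0.5))
                for n in range(n_mels))
            for k in range(n_mfcc)]

coeffs = mfcc_frame([math.sin(2 * math.pi * 400 * n / 8000) for n in range(256)])
```

In practice one would use an optimized library implementation; the point here is only the order of the four steps.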
Mel-Frequency Spectrogram
- Convert linear frequency to mel scale
- Usually reduce the dimensionality of spectrum
(Figure: linear-frequency spectrum vs. mel-scaled spectrum)
Discrete Cosine Transform
- Real-valued transform: similar to DFT
- De-correlate the mel-scaled log spectrum and reduce the dimensionality
again
(Figure: mel-scaled spectrum → MFCC)

$X_{DCT}(k) = \sqrt{\dfrac{2}{N}} \sum_{n=1}^{N} x(n)\,\cos\!\left(\dfrac{\pi k}{N}\,(n - 0.5)\right)$
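A direct transcription of this formula (with the sum over n = 1…N written 0-indexed):

```python
import math

def dct2(x):
    """DCT-II with the sqrt(2/N) scaling from the slide (0-indexed input)."""
    N = len(x)
    return [math.sqrt(2.0 / N) *
            sum(x[n] * math.cos(math.pi * k / N * (n + 0.5)) for n in range(N))
            for k in range(N)]

# A constant signal has all its energy in the k = 0 coefficient
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
```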
Reconstructed Frequency Spectrum from MFCC
(Figure: frequency spectrum (512 bins) → mel-scaled spectrum (60 bins) → MFCC (13 dim) → reconstructed mel-scaled spectrum → reconstructed frequency spectrum)
Comparison of Spectrogram and MFCC
(Figure: spectrogram, mel-frequency spectrogram, MFCC, and the spectrogram reconstructed from MFCC)
Sound Examples of MFCC
- Original:
- MFCC reconstruction (using white-noise as a source):
Post-processing
- Adding temporal dynamics
- Short-term dynamics of features are characterized with delta or double-delta
- 39 MFCCs in speech recognition: 13 MFCCs + 13 delta + 13 double-delta
$\Delta x(n) = \dfrac{x(n) - x(n-h)}{h} \qquad \Delta\Delta x(n) = \dfrac{\Delta x(n) - \Delta x(n-h)}{h}$
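The delta operator in Python; the coefficient trajectory is made up for the example, and applying `delta` twice gives the double-delta:

```python
def delta(x, h=1):
    """First-order difference (x(n) - x(n - h)) / h, defined for n >= h."""
    return [(x[n] - x[n - h]) / h for n in range(h, len(x))]

mfcc_c0 = [0.0, 1.0, 3.0, 6.0]   # made-up trajectory of one coefficient
d = delta(mfcc_c0)               # [1.0, 2.0, 3.0]
dd = delta(d)                    # [1.0, 1.0]
```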
Pitch and Chroma
- The basic assumption in tonal harmony is
that octave-distance notes belong to the same pitch class
- No dissonance among them
- As a result, there are 12 pitch classes
- Shepard represented the octave
equivalence with “pitch helix”
- Chroma: represents the inherent circularity of pitch organization
- Height: increases continuously, with one octave per full rotation of the helix
Pitch Helix and Chroma (Shepard, 2001)
Pitch and Chroma
- Chroma is independent of the height
- Shepard tone: a single pitch class built from octave-spaced harmonics
- Creates the illusion of constantly rising or falling pitch
(Examples: optical-illusion stairs; Shepard tone: https://vimeo.com/34749558)
Chroma Audio Features
- Chroma features are audio feature vectors that contain the relative distribution of pitch classes in the audio
- Ideally, they can be obtained by polyphonic note transcription
- In practice, chroma features are obtained by projecting all time-frequency
energy onto 12 pitch classes
- Mainly used for chord recognition, key estimation, music/audio synchronization, and other “score-level” tasks
- Often used for music classification as well, but not as effective as MFCC
Chroma Features: FFT-based approach
- Compute spectrogram and mapping matrix
- Convert frequency to musical pitch scale and get the pitch class
- Set the entry of the corresponding pitch class to one, and zero otherwise
- Adjust non-zero values so that low-frequency content has more weight
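A simplified sketch of the projection in Python (it folds each spectral bin onto its nearest pitch class; the weighting step from the slide is omitted, and the two-bin toy spectrum is made up):

```python
import math

def pitch_class(freq, ref=440.0):
    """Map a frequency to one of 12 pitch classes (A4 = 440 Hz -> class 9, 'A')."""
    midi = 69 + 12 * math.log2(freq / ref)
    return round(midi) % 12

def chroma_from_spectrum(mags, freqs):
    """Project spectral energy onto the 12 pitch classes and normalize."""
    chroma = [0.0] * 12
    for m, f in zip(mags, freqs):
        if f > 0 and m > 0:
            chroma[pitch_class(f)] += m
    total = sum(chroma) or 1.0
    return [c / total for c in chroma]

# Energy at 440 Hz and 880 Hz (A4 and A5) folds onto the same pitch class
ch = chroma_from_spectrum([1.0, 1.0], [440.0, 880.0])
```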
Chroma Features: Filter-bank approach
- A filter-bank can be used to obtain a log-scale time-frequency representation
- Center frequencies are arranged over the 88 piano notes
- Bandwidths are set to be constant-Q and robust to ±25-cent detuning
- The outputs that belong to the same pitch class are wrapped and summed
(Müller, 2011)
Sound Examples of Chroma
- Original:
- MFCC reconstruction (using white noise as a source): pitch-invariance
- Chroma reconstruction: timbre-invariance
Feature Summarization and Normalization
- Summarization
- Summary statistics
- Temporal pooling: mean, variance, min, max over a context window
- Temporal modulation: DFT of time-trajectory over sub-bands
- Code-book approach
- Vector quantization: create codebook by K-means clustering
- Accumulate the code-book indices as a histogram
- Normalization
- Standardization
- Zero mean: subtract the mean of each dimension
- Unit variance: divide by the standard deviation of each dimension
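Mean/variance pooling and standardization can be sketched as follows; the tiny two-frame input is made up for the example:

```python
import math

def temporal_pool(frames):
    """Mean and variance of each feature dimension over a window of frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n
                 for d in range(dims)]
    return means + variances  # one clip-level vector

def standardize(vectors):
    """Zero mean, unit variance per dimension across a dataset."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
            for d in range(dims)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]

pooled = temporal_pool([[0.0, 2.0], [2.0, 4.0]])   # [1.0, 3.0, 1.0, 1.0]
```

The pooled vector concatenates the per-dimension means and variances, turning a variable-length frame sequence into one fixed-size clip-level feature.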