CTP431 - Music and Audio Computing: Music Information Retrieval (PowerPoint PPT Presentation)


SLIDE 1

CTP431- Music and Audio Computing Music Information Retrieval

Juhan Nam, Graduate School of Culture Technology, KAIST

SLIDE 2

Introduction

✓ Instrument: Piano
✓ Genre: Classical
✓ Composer: Chopin
✓ Key: E-minor
✓ Mood: Melancholy, Sad, …
✓ Songs with similar melody:

  • ELO “After all”
  • Radiohead “Exit Music”

✓ Can you transcribe the song into a music score?

SLIDE 3

Information in Music

§ Factual Information

– track, artist, year, composer

§ Musical Information

– Music score: instrument, notes, meter, expressions
– Melody, rhythm, chords, structure

§ Semantic Information

– genre, mood, text descriptions

SLIDE 4

Music Understanding by Human

[Figure source: http://www.slideshare.net/Daritsetseg/brainstem-auditory-evoked-responses-baer-or-abr-45762118]

SLIDE 5

Music Understanding by Computer

§ Music Information Retrieval (MIR)

– An area of research that aims to infer various types of information from music using computers

SLIDE 6

Applications of MIR

§ Music listening

– Music identification, search and recommendation

§ Music Performance

– Interactive music performance
– Musical instrument learning

§ Music composition

– Automatic composition and arrangement

§ Entertainment

– Singing evaluation, game

§ Sound production

– Sound sample search in sound libraries
– Automatic segmentation and digital audio effects

SLIDE 7

Background

§ Scale and diversity of music contents

– Commercial music tracks

  • Spotify: 30M+ songs (2015)
  • Bugs music: 10M+ songs (2017)

– User contents

  • YouTube: 300h+ video uploaded per min (2015)
  • SoundCloud: 12h+ audio uploaded per minute (2014)

– User data

  • Profile, play history, ratings, …
  • Spotify: 24M+ active users (as of Jan 2014)
  • YouTube: 1B+ unique users visit each month (as of Dec 2014)

§ All this music content is readily accessible.

– How can we find music that matches my taste?
– Can we have a Google for music?

SLIDE 8

Music Identification

§ Query by music

– Search for the single unique song identified by the query
– Audio fingerprinting

Example: Shazam (audio fingerprinting; http://labrosa.ee.columbia.edu/matlab/fingerprint/)
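Shazam-style audio fingerprinting can be sketched in a few lines: keep the strongest spectral peaks per frame, hash pairs of nearby peaks, and count hash collisions between a query and the database. A toy numpy sketch; the frame sizes, fan-out, and test signals below are all invented for illustration:

```python
import numpy as np

def spectral_peaks(x, n_fft=256, hop=128, top_k=3):
    """Toy landmark extraction: keep the top_k strongest frequency bins per frame."""
    peaks = []
    for t, i in enumerate(range(0, len(x) - n_fft, hop)):
        mag = np.abs(np.fft.rfft(x[i:i + n_fft] * np.hanning(n_fft)))
        for f in np.argsort(mag)[-top_k:]:
            peaks.append((t, int(f)))
    return peaks

def hash_landmarks(peaks, fan_out=3):
    """Pair each peak with a few later peaks; (f1, f2, dt) is the fingerprint hash."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes

# A noisy excerpt of a recording shares many hashes with it; an unrelated
# signal shares almost none. That collision count is the "match".
sr = 8000
t = np.arange(sr) / sr
song = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
query = song[2000:6000] + 0.01 * np.random.default_rng(0).standard_normal(4000)
other = np.sin(2 * np.pi * 523 * t)

h_song = hash_landmarks(spectral_peaks(song))
h_query = hash_landmarks(spectral_peaks(query))
h_other = hash_landmarks(spectral_peaks(other))
match = len(h_song & h_query)
nonmatch = len(h_song & h_other)
```

Real systems (e.g. the labrosa implementation linked above) quantize the peaks more carefully and verify time-offset consistency of the matched hashes before declaring a hit.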

SLIDE 9

Music Identification

§ Query by humming

– Hum the melody and find the closest matches
– Melody-based matching

Example: SoundHound (melody extraction)

SLIDE 10

Music Search and Recommendation

§ Music Recommendation

– Playlist generation: personalized internet radio
– Matching songs to users

  • Song information: genre, years, artist, audio
  • User information: profile, play history, rating, context (places)

– Commercial music services: Google, Apple, Pandora, Spotify, Melon, Bugs, …

Examples: Pandora, iTunes Music

SLIDE 11

Current Approaches

§ Manual Curation
§ Human Expert Analysis
§ Collaborative Filtering
§ Content-based Analysis (by computers)

SLIDE 12

Manual Curation

§ Playlist generation by music experts (or users)

– Traditional: AM/FM radio
– The majority of current music services are based on this approach

§ Advantages

– Effective for usage-based music services (workout, study, driving, or prenatal education)
– Good for music discovery
– Often with storytelling

§ Limitations

– No personalization
– Not scalable

[www.soribada.com]

SLIDE 13

Human Expert Analysis

§ Pandora: music genome project (1999)

– Musicologists analyze each song for about 450 musical attributes in various categories
– Big success as a music service

§ Advantages

– High-quality analysis – Good for music discovery

§ Limitations

– Expensive: it takes 20–30 minutes to analyze a song
– Not scalable: only covers commercial tracks

SLIDE 14

Collaborative Filtering (CF)

§ Basic idea

– Person A: I like songs A, B, C and D.
– Person B: I like songs A, B, C and E.
– Person A: Really? You should check out song D.
– Person B: Wow, you also should check out song E.

§ Formulation

– Matrix factorization (or matrix completion) problem

– Example: the user (“Juhan”) has a latent vector x_u; the song (“Gangnam Style”) has a latent vector y_s

– Song preference: p_us = x_u^T y_s
– User similarity: q_{u1,u2} = x_{u1}^T x_{u2}
– Song similarity: r_{s1,s2} = y_{s1}^T y_{s2}
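The formulation can be sketched with a truncated SVD of a toy play-count matrix. This is an illustration only: the matrix is invented, and production systems fit the factors with ALS or SGD over the observed entries rather than a full SVD.

```python
import numpy as np

# Hypothetical play-count matrix R (rows: users, columns: songs).
# Users 0 and 1 like the same songs; user 2 has a different taste.
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [0., 1., 5., 4.]])

# Rank-2 factorization R ~ X @ Y.T via truncated SVD.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
X = U[:, :k] * np.sqrt(s[:k])       # user latent vectors x_u (one per row)
Y = Vt[:k, :].T * np.sqrt(s[:k])    # song latent vectors y_s (one per row)

p = X @ Y.T   # song preference  p_us      = x_u^T  y_s
q = X @ X.T   # user similarity  q_{u1,u2} = x_u1^T x_u2
r = Y @ Y.T   # song similarity  r_{s1,s2} = y_s1^T y_s2
```

Here q[0, 1] comes out larger than q[0, 2], matching the intuition that users 0 and 1 share a taste.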

SLIDE 15

Collaborative Filtering

§ Advantages

– Captures the semantics of music as perceived by humans
– Enables personalized recommendation (by nature)

§ Limitations

– The cold-start problem: what if a song has never been played by anyone?
– Popularity bias: likely to recommend already well-known songs, or songs from the same musician or album

SLIDE 16

Collaborative Filtering

§ Bad examples

Can you find songs similar to this musician? [Oord et al., 2013]

SLIDE 17

Content-Based Analysis

§ An intelligent approach that makes computers listen to music and predict descriptive words from audio tracks

– Tags: genre, mood, instrument, voice quality, usage
– Features: spectrogram, MFCC, …
– Algorithms: GMM, SVM, neural networks

[Pipeline: audio files → audio features → algorithms]
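As a minimal stand-in for the audio-files-to-audio-features step, here is a log-magnitude STFT extractor in plain numpy; a real front end would use a mel filterbank or MFCCs (e.g. via librosa, not used here to keep the sketch dependency-free):

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=256):
    """Log-magnitude STFT: the simplest 'audio feature' matrix,
    shaped (n_frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft, hop)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

# A 1 kHz test tone should light up FFT bin 1000 * n_fft / sr = 32.
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
S = log_spectrogram(tone)
peak_bin = int(S.mean(axis=0).argmax())
```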

SLIDE 18

SLIDE 19

Text-based Music Retrieval by Auto-tagging

§ Sort songs by the probability of the query tag and choose the top-N

– Like text-based Google search

§ We also can compute similarity between songs using the estimated tag probabilities

– E.g. cosine distance between two tag probability vectors
– Applicable to query by audio

Query word: “Female Lead Vocals”

Top 5 ranked songs:
1. Norah Jones – Don’t Know Why
2. Dido – Here with Me
3. Sheryl Crow – I Shall Believe
4. No Doubt – Simple Kind of Life
5. Carpenters – Rainy Days and Mondays
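The similarity computation described above, comparing estimated tag-probability vectors, can be sketched as follows; the tag vocabulary and the probabilities are invented for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two tag-probability vectors (1 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical auto-tagger outputs over tags [rock, jazz, female vocal, piano].
song_a = np.array([0.1, 0.7, 0.1, 0.8])   # jazz piano
song_b = np.array([0.2, 0.6, 0.2, 0.7])   # also jazz piano
song_c = np.array([0.9, 0.1, 0.1, 0.1])   # rock

sim_ab = cosine_similarity(song_a, song_b)
sim_ac = cosine_similarity(song_a, song_c)
```

For query-by-text, one would instead sort all songs by the probability of the query tag and return the top N.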

SLIDE 20

Demo: Music Galaxy Hitchhiker

(b) Search by Song mode with highlighted search results

SLIDE 21

Content-based Music Recommendation

§ Blending audio and user data

– Replace the text-based tags with the latent vector of a song

[Diagram: the audio track of “Gangnam Style” is fed to a model that predicts the song’s latent vector, obtained from matrix factorization in collaborative filtering]

[Oord et al., 2013]
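The blending step can be viewed as regression from audio features to the CF latent vectors, so that unplayed songs (the cold-start case) also get a vector. Oord et al. use a deep convolutional network; the sketch below substitutes a linear least-squares fit on synthetic data just to show the shape of the problem (all sizes and data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_songs, n_feat, k = 200, 10, 4

# Synthetic stand-ins: audio features per song, and CF latent vectors that
# happen to depend (near-)linearly on them.
A = rng.standard_normal((n_songs, n_feat))
W_true = rng.standard_normal((n_feat, k))
Y = A @ W_true + 0.01 * rng.standard_normal((n_songs, k))

# Fit the audio-to-latent map (the paper's convnet, reduced to least squares).
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# A brand-new song nobody has played still gets a latent vector from its audio.
new_song_audio = rng.standard_normal(n_feat)
predicted_latent = new_song_audio @ W
err = np.linalg.norm(W - W_true) / np.linalg.norm(W_true)
```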

SLIDE 22

Music Retrieval Results

[Retrieval results: collaborative filtering only vs. collaborative filtering + audio content]

[Oord et al., 2013]

SLIDE 23

Content-Based Analysis

§ Advantages

– Free of cold-start and popularity bias
– Highly scalable using high-performance computing
– Also works for music in other media and for user content
– Can be combined with other approaches

§ Limitations

– Social context is also important: indie, idol, affiliation
– Does not account for musical quality (e.g. level of performance), especially for user content

SLIDE 24

Automatic Music Transcription (AMT)

§ Predict score information from audio

– Note information: note onset, duration, velocity
– Rhythm: tempo, beat, down-beat
– Chord
– Structure

SLIDE 25

Zenph’s Re-performance

SLIDE 26

Zenph’s Re-performance

SLIDE 27

Entertainment / Education


Yousician

SLIDE 28

Score-Audio Alignment

§ Temporally align audio and score

– Dynamic time warping, using AMT results as audio features

§ Applications

– Score following
– Automatic page turning
– Auto-accompaniment
– Performance analysis
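The alignment core is ordinary dynamic time warping. A minimal version that aligns a score-derived pitch sequence to a performance in which notes are held longer (the MIDI pitch sequences are invented):

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping: returns the total alignment cost and the
    warping path as (index in a, index in b) pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack the optimal path from (n, m) down to (0, 0).
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: D[c])
    return float(D[n, m]), path[::-1]

# Score pitches vs. a performance that sustains some notes (MIDI numbers).
score = [60, 62, 64, 65]
performance = [60, 60, 62, 64, 64, 64, 65]
cost, path = dtw(score, performance)
```

For score following the same recursion is run incrementally (online DTW), since the end of the performance is not yet known.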

SLIDE 29

Automatic Page Turner (JKU, Austria)

SLIDE 30

The Piano Music Companion (JKU, Austria)

SLIDE 31

Sonation’s Cadenza

SLIDE 32

Music Production

https://www.youtube.com/watch?v=RmT6MDOD3uc

SLIDE 33

Music Production

§ Adaptive Audio Effects: automatic effect control

– Loudness

  • Compressor

– Pitch

  • Pitch correction (e.g. auto-tune)
  • Harmonizer

– Timbre

  • Genre-based automatic EQ

Antares Auto-tune
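For the loudness case, a compressor reduces gain once the signal level exceeds a threshold. The sketch below shows the static compressor curve only; a real compressor adds attack/release envelope smoothing, and the parameter values here are arbitrary:

```python
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0):
    """Static compression: above the threshold, level growth in dB is
    divided by `ratio`; below it, the signal passes unchanged."""
    eps = 1e-12
    level_db = 20.0 * np.log10(np.abs(x) + eps)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return x * 10.0 ** (gain_db / 20.0)

loud = np.array([1.0])    # 0 dBFS: 20 dB over threshold -> 15 dB of gain reduction
quiet = np.array([0.05])  # about -26 dBFS: below threshold, left untouched
y_loud = compress(loud)
y_quiet = compress(quiet)
```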

SLIDE 34

Music Production

§ Singing Expression Transfer

– Given two renditions of the same piece of music
– Transfer singing expressions from one voice to the other

Note timing, Pitch, Dynamics

SLIDE 35

Singing Expression Transfer

[Pipeline diagram: the source and target singing voices go through feature extraction (DTW with smoothing, HPSS, an envelope detector, and a pitch detector) to estimate a stretching ratio, a pitch ratio, and a gain ratio; time-scale modification, pitch shifting, and gain are then applied to the source's harmonic signal to produce the modified singing voice, achieving temporal alignment, pitch alignment, and dynamics alignment]

SLIDE 36

Singing Expression Transfer: Demo Examples

[Audio demos (source, target, and modified source) for 벚꽃엔딩 (“Cherry Blossom Ending”), Let it go, and 취중진담 (“Drunken Truth”)]

SLIDE 37

Music Production

§ Sound Sample search

– Imagine Research’s MediaMind: search for sound-effect samples for media production (e.g. film, drama)
– iZotope’s BreakTweaker: search for drum sounds with similar timbre

SLIDE 38

Automatic Music Composition

§ Algorithmic Composition

– An area of generative art

§ Types of Algorithms

– Generative Grammar
– Transition Network
– Markov Model
– Genetic Algorithms
– Neural Networks
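The Markov-model approach, the simplest on this list, fits in a few lines: count note-to-note transitions in a corpus, then random-walk through the table. The training melodies below are made up:

```python
import random
from collections import defaultdict

def train_markov(melodies):
    """First-order Markov model: count note-to-note transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for a, b in zip(melody, melody[1:]):
            counts[a][b] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample a melody by walking the transition table (style imitation in miniature)."""
    note, out = start, [start]
    for _ in range(length - 1):
        options, weights = zip(*counts[note].items())
        note = rng.choices(options, weights=weights)[0]
        out.append(note)
    return out

# Tiny hypothetical corpus of MIDI note numbers (C major noodling).
corpus = [[60, 62, 64, 62, 60], [64, 62, 60, 62, 64], [60, 64, 62, 60]]
model = train_markov(corpus)
melody = generate(model, start=60, length=16, rng=random.Random(7))
```

Every note in this corpus has at least one outgoing transition, so the walk never dead-ends; a real system would also handle terminal states and longer contexts.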

SLIDE 39

Automatic Music Composition

§ David Cope’s EMI (Experiments in Music Intelligence) (1980s)

– Based on Style Imitation

Augmented Transition Networks

SLIDE 40

Recent Work: Automatic Music Composition

§ Flow Machines

– Style Imitation based on Markov Model – http://www.flow-machines.com/

§ Magenta

– Python library based on deep neural networks (TensorFlow)

SLIDE 41

“Daddy’s car”: Sony CSL Lab’s Flow Machines

SLIDE 42

Automatic Music Composition

§ Background Music Generation: www.jukedeck.com

SLIDE 43

Automatic Music Arrangement

쿨잼(Cool Jamm) – Hum On

SLIDE 44

Musical “Process” and “Data”

[Diagram: the musical “process” (composer → performer → instrument → source sound → sound field/room → listener → perception/cognition) and its “data” (symbolic representation, temporal control, source sound), grounded in a “musical” and a “physical” knowledge base]

SLIDE 45

Music Technology: The Present

[Same “process” and “data” diagram as on slide 44]

SLIDE 46

Music Technology: The Future

[Same “process” and “data” diagram as on slide 44]