Course Introduction Juhan Nam Who We Are Instructor: Juhan Nam - - PowerPoint PPT Presentation



SLIDE 1

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020)

Course Introduction

Juhan Nam

SLIDE 2

Who We Are

  • Instructor: Juhan Nam

○ Associate Professor, Graduate School of Culture Technology (GSCT)
○ Affiliated Professor, Graduate School of Artificial Intelligence (GSAI)
○ Music and Audio Computing Lab: https://mac.kaist.ac.kr/

  • TAs

○ Taegyun Kwon, PhD student, GSCT
○ Taejun Kim, PhD student, GSCT
○ Wonil Kim, PhD student, GSCT
○ Seunghyun Lee, MS student, GSAI

SLIDE 3

Music

  • Music is one of the most widely enjoyed forms of cultural content and activity

Music in KAIST

Source: http://times.kaist.ac.kr/news/articleView.html?idxno=3835, http://times.kaist.ac.kr/news/articleView.html?idxno=3185

SLIDE 4

Music and Computer

  • Computers are now essential in musical activities

○ Music listening: download music tracks as compressed audio files and decompress them into waveforms

SLIDE 5

Music and Computer

  • Computers are now essential in musical activities

○ Music performance: musical instruments, karaoke machines

SLIDE 6

Music and Computer

  • Computers are now essential in musical activities

○ Music composition and production: recording, MIDI, processing, mixing

SLIDE 7

Data and Processing

  • The role of the computer is to represent musical data in digital form and process it according to the target task

○ Data: audio, MIDI, text (metadata)
○ Processing: audio compression and decompression, sound synthesis, recording, digital audio effects, editing, and mixing

[Figure: input data and output data]

SLIDE 8

Data and Processing

  • The role of the computer is to represent musical data in digital form and process it according to the target task

[Figure: input data and output data]

In such music systems, each processing step is hand-designed and programmed by humans based on domain knowledge such as digital signal processing, acoustics, and music theory.
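
As an illustration of a hand-designed processing step, the following sketch applies a gain and a single echo to a synthesized tone. It is not taken from the course material: the names `SAMPLE_RATE`, `sine_wave`, `apply_gain`, and `add_echo`, and all parameter values, are hypothetical choices made for this example. The point is that every parameter is picked by a person rather than learned.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Synthesize a sine tone as a list of float samples in [-1, 1]."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def apply_gain(samples, gain):
    """Scale every sample by a fixed, hand-chosen gain."""
    return [gain * s for s in samples]


def add_echo(samples, delay_samples, decay):
    """Mix in one delayed, attenuated copy of the signal (a single echo)."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += decay * s
    return out


# Input data -> hand-designed processing -> output data
dry = sine_wave(440.0, 0.01)  # 10 ms of a 440 Hz tone
wet = add_echo(apply_gain(dry, 0.5), delay_samples=20, decay=0.6)
```

The output is 20 samples longer than the input because the echo tail extends past the end of the original tone.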

SLIDE 9

Machine Learning for Music?

  • Machine learning (ML) is a method of teaching computers to make accurate predictions using data

  • Why do we need machine learning for music?

[Figure: input data and output data]

In ML-based systems, each processing step is learned from data through a learning algorithm.
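
A minimal sketch of what "learned from data" can mean, under simplifying assumptions: instead of a person choosing a gain value, the gain is estimated from input/output example pairs by least squares. The function `fit_gain`, the synthetic data, and the noise level are all hypothetical and chosen only for illustration. Real ML-based music systems learn far more complex mappings.

```python
import math
import random


def fit_gain(inputs, outputs):
    """Least-squares estimate of g in the model: output = g * input."""
    numerator = sum(x * y for x, y in zip(inputs, outputs))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator


random.seed(0)  # fixed seed so the sketch is reproducible

true_gain = 0.5  # unknown to the learner; used only to create example data
inputs = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(800)]
outputs = [true_gain * x + random.gauss(0, 0.01) for x in inputs]

# The "learning algorithm" recovers the processing parameter from examples.
learned_gain = fit_gain(inputs, outputs)
```

Here the model has a single parameter, so the learning step has a closed-form solution. Deep learning models are typically trained iteratively with gradient-based optimization instead.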

SLIDE 10

Music Listening

  • Music at scale

○ Spotify: 60M tracks, 40K new tracks per day, and 4B playlists (2020)
○ SoundCloud: 200M tracks (2019)
○ YouTube: 500 min of video uploaded per min (2020), including music tracks, music videos, and performance videos

  • Content organization, search, and recommendation become important

○ Metadata is not enough to explain the “content” of music
○ Richer descriptions are needed

Source: https://newsroom.spotify.com/company-info/

Naver VIBE

SLIDE 11

Music Listening

  • Pandora’s Music Genome Project (1999)

○ Annotate a track with about 450 music attributes

■ Genre, instruments, timbre, vocal quality, …

○ Playlists are generated using the similarity of music attribute vectors
○ Not biased by popularity

  • Problems

○ A music expert needs 20-30 minutes to annotate a single track
○ The dictionary size of music attributes is fixed

Pandora Internet Radio
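
The idea of ranking tracks by the similarity of their attribute vectors can be sketched as follows. This is not Pandora's actual method: cosine similarity is one common choice assumed here, and the four attributes, track names, and scores are invented for the example (the real project uses about 450 attributes).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical attribute order: [distorted guitar, acoustic piano, fast tempo, soft vocals]
catalog = {
    "track_a": [0.9, 0.1, 0.8, 0.2],
    "track_b": [0.8, 0.2, 0.7, 0.3],
    "track_c": [0.1, 0.9, 0.2, 0.8],
}


def most_similar(seed, tracks):
    """Rank all other tracks by similarity to the seed track, best first."""
    seed_vec = tracks[seed]
    scored = [
        (name, cosine_similarity(seed_vec, vec))
        for name, vec in tracks.items()
        if name != seed
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("track_a", catalog)
```

Because the ranking depends only on the attribute vectors, play counts never enter the computation, which is why such a playlist is not biased by popularity.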

SLIDE 12

Music Listening

  • We need to teach computers how to describe music with natural language as humans do

○ Associate music not only with musical terms (e.g., genre, instrument, timbre) but also with listening contexts (e.g., mood, time of day, location, and activity)

[Figure: input data and output data]

SLIDE 13

Google for Music (KAIST MAC Lab)

SLIDE 14

Music Performance

  • Mobile apps have become popular in music education and entertainment

○ Music scores
○ Karaoke
○ Instrument learning games

  • Emergence of “smart” features

○ Performance evaluation
○ Score following and page turning
○ Auto-accompaniment

Source: https://promusicianhub.com/yousician-review/, https://musescore.org, https://www.smule.com/

[Figure: MuseScore, Smule, Yousician]

SLIDE 15

Music Performance

  • Extract music score information from audio

○ Identify and separate sound sources in adverse acoustic conditions: microphone, reverberation, and interfering sources
○ Detect multiple pitches from polyphonic musical instruments, e.g., piano, guitar

Source: http://jameasy.com/ko/company.html, https://magenta.tensorflow.org/onsets-frames

SLIDE 16

Music Performance

  • We need to teach computers how to separate individual sources and extract musical information from complex auditory scenes

○ Separate sources from mixed audio
○ Transcribe polyphonic music into a music score or MIDI

[Figure: input data and output data]
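
One small piece of transcribing audio into MIDI is mapping a detected pitch in Hz to a MIDI note number. The sketch below shows only that mapping, assuming 12-tone equal temperament with A4 = 440 Hz = MIDI note 69 and the convention that MIDI note 60 is named C4. The helper names are hypothetical, and pitch detection itself, the hard part, is not shown.

```python
import math

A4_MIDI = 69    # MIDI note number of A4
A4_HZ = 440.0   # assumed tuning reference

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_hz(note):
    """Frequency of a MIDI note: each semitone is a factor of 2**(1/12)."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a detected frequency."""
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def midi_to_name(note):
    """Note name with octave, using the convention that MIDI 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


# A slightly sharp middle C (262 Hz) still maps to MIDI note 60.
detected_note = hz_to_midi(262.0)
detected_name = midi_to_name(detected_note)
```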

SLIDE 17

Re-Performance by Polyphonic Piano Transcription (KAIST MAC Lab)

SLIDE 18

Music Composition

  • Automatic music composition has been a dream project since the birth of the computer

○ Illiac Suite (String Quartet No. 4) (1957)

■ The first music score composed by an electronic computer
■ Composed using a Markov model
■ https://www.youtube.com/watch?v=n0njBFLQSk8

○ Experiments in Musical Intelligence (EMI) (1980s)

■ Style imitation using pre-composed patterns (recombinant)
■ https://www.youtube.com/watch?v=t6WeiyvAiYQ&t=52s

○ Numerous approaches in “algorithmic composition”

■ Audio/MIDI programming languages: Music-N, Csound, Max/Pd, Common Music, SuperCollider, ChucK
■ Rule-based or statistical models

Source: http://www.moz.ac.at/sem/lehre/lib/es/ems/hist/battisti.html

[Figure: EMI: recombinant music (David Cope); The Experimental Music Studio (Lejaren Hiller and Leonard Isaacson)]
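
To give a feel for Markov-model composition, here is a minimal first-order Markov chain over MIDI note numbers. It is a generic sketch and does not reproduce the procedure used for the Illiac Suite. The training melody, function names, and seed are hypothetical.

```python
import random
from collections import defaultdict


def train_markov(melody):
    """Collect, for each note, the list of notes that followed it."""
    transitions = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Sample a melody by repeatedly picking an observed successor note."""
    note = start
    melody = [note]
    for _ in range(length - 1):
        choices = transitions.get(note)
        if not choices:  # no observed successor: stop early
            break
        note = rng.choice(choices)
        melody.append(note)
    return melody


# Hypothetical training melody as MIDI note numbers (60 = C4).
training_melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 64, 60]
model = train_markov(training_melody)
new_melody = generate(model, start=60, length=16, rng=random.Random(0))
```

Successors that occur more often in the training melody appear more often in a note's list, so `rng.choice` samples them in proportion to their observed frequency.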

SLIDE 19

Music Composition

  • Recent advances in machine learning

○ Learn the sequential order of music data in a highly data-driven way
○ MIDI or audio generation

■ Flow Machines (Sony): https://www.flow-machines.com/
■ Music Transformer (Google): https://magenta.tensorflow.org/music-transformer
■ Jukebox (OpenAI): https://openai.com/blog/jukebox/

[Figure: Music Transformer, Jukebox, Flow Machines (“Hello World”: AI-composed album)]

SLIDE 20

Music Composition

  • We need to teach computers how to learn the distribution of high-dimensional, long-term sequential data and generate music from conditions given by humans

○ The conditions can be semantic descriptions, artist, lyrics, score, audio, or even preference
○ Is it possible to create novel pieces (in collaboration with humans)?

[Figure: input data and output data]

SLIDE 21
Machine Learning for Music

  • A powerful means to teach computers how to listen to, perform, and compose music

[Figure: Learning Model with Audio, Score (MIDI), and Text]

SLIDE 22

Deep Learning

  • The key element in recent AI technology, developed mainly in the computer vision, speech processing, and natural language processing communities

○ Each of them handles a different modality: image, audio, text (i.e., symbol)
○ Because the approach is data-driven and relies less on domain knowledge, advances in deep learning have been readily applied to a wide variety of domains that use image, audio, or text as an input or output data form
○ Music is one of the domains that has benefited greatly from these advances

  • Deep learning is representation learning

○ Transforms data into a more meaningful vector space (i.e., feature space)
○ Vector spaces from different data modalities are associated with each other through their correspondence

SLIDE 23

Deep Learning for Music

  • Modality-agnostic representation learning

[Figure: Learning Model with Audio, Score (MIDI), Text, and Image]
SLIDE 24

Objectives of This Course

  • Understanding machine learning and deep learning
  • Learning how to apply them to various tasks in the music domain
  • Gaining hands-on experience with Python and machine learning libraries through homework
  • Gaining experience with the full cycle of research through the final project
SLIDE 25

Course Format

  • This course is offered in a 100% online format
  • There are two types of online sessions

○ Pre-recorded videos

■ Cover the lecture part
■ Uploaded weekly to KLMS (YouTube link)
■ Students must watch the videos before the weekly Zoom meeting

○ Weekly Zoom meeting

■ Focuses on review, interactive Q&A, and hands-on practice
■ Thursdays starting in Week 2, 2:30-3:45 PM (hopefully less than 60 minutes)

SLIDE 26

Schedules

  • Week 1

○ Course introduction

  • Week 2

○ Audio data representations

  • Week 3

○ Machine learning review: supervised learning

  • Week 4

○ Machine learning review: unsupervised learning

SLIDE 27

Schedules

  • Week 5

○ Chuseok (no class)

  • Week 6

○ Convolutional neural network (CNN): music classification and tagging

  • Week 7

○ Recurrent neural network (RNN): automatic music transcription

  • Week 8

○ Break (no class)

SLIDE 28

Schedules

  • Week 9

○ Auto-encoder, U-net: source separation

  • Week 10

○ Variational auto-encoder (VAE), generative adversarial network (GAN): music generation and sound synthesis

  • Week 11

○ Auto-regressive models: music generation and sound synthesis

  • Week 12

○ Transformer: music transcription and generation

SLIDE 29

Schedules

  • Week 13

○ Invited talk or advanced topics (TBD)

  • Week 14

○ Invited talk or advanced topics (TBD)

  • Week 15

○ TBD

  • Week 16

○ Final project presentations

SLIDE 30

Prerequisites

  • Linear Algebra
  • Probability and Statistics
  • Basic understanding of machine learning and deep learning
  • Digital Signal Processing: digital filters, discrete Fourier transform, and spectral analysis

  • Programming Language: Python
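
As a quick self-check on the signal processing prerequisite, the sketch below computes a discrete Fourier transform directly from its definition and locates the spectral peak of a test tone. The signal parameters and variable names are hypothetical, and the O(N²) loop is for clarity only. Practical code would use an FFT routine from a numerical library.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X[k] = sum over t of x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


sample_rate = 800   # Hz (assumed)
num_samples = 80    # bin spacing = sample_rate / num_samples = 10 Hz
tone_hz = 100.0

signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(num_samples)]
spectrum = dft(signal)

# Spectral analysis: look at magnitudes up to the Nyquist frequency.
magnitudes = [abs(c) for c in spectrum[: num_samples // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / num_samples
```

The tone frequency was chosen to fall exactly on a DFT bin, so the peak is sharp. A tone between bins would spread its energy across neighboring bins (spectral leakage).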
SLIDE 31

Software

  • Audio processing: Librosa
  • Machine learning and deep learning: Scikit-learn, PyTorch
  • And more…
SLIDE 32

Grading

  • 4 assignments: 50%
  • Final project (paper review, presentation and report): 50%
SLIDE 33

Course Website

  • https://mac.kaist.ac.kr/~juhan/gct634