GCT634/AI613: Musical Applications of Machine Learning (Fall 2020)
Course Introduction
Juhan Nam
Who We Are
- Instructor: Juhan Nam
○ Associate Professor, Graduate School of Culture Technology (GSCT)
○ Affiliated Professor, Graduate School of Artificial Intelligence (GSAI)
○ Music and Audio Computing Lab: https://mac.kaist.ac.kr/
- TAs
○ Taegyun Kwon, PhD student, GSCT
○ Taejun Kim, PhD student, GSCT
○ Wonil Kim, PhD student, GSCT
○ Seunghyun Lee, MS student, GSAI
Music
- One of the most widely enjoyed forms of cultural content and activity
Music in KAIST
Source: http://times.kaist.ac.kr/news/articleView.html?idxno=3835, http://times.kaist.ac.kr/news/articleView.html?idxno=3185
Music and Computer
- Computers are now essential in musical activities
○ Music listening: tracks are downloaded as compressed audio files and decompressed back into waveforms
○ Music performance: musical instrument, karaoke machine
○ Music composition and production: recording, MIDI, processing, mixing
Data and Processing
- The role of the computer is to represent musical data in digital form and process it according to the target task
○ Data: audio, MIDI, text (metadata)
○ Processing: audio compression and decompression, sound synthesis, recording, digital audio effects, editing, and mixing
[Figure: input data → output data]
- In such music systems, each processing step is hand-designed and programmed by humans based on domain knowledge such as digital signal processing, acoustics, and music theory
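As a small illustration of a hand-designed processing step, the sketch below encodes domain knowledge directly as code: equal-temperament tuning (MIDI note 69 = A4 = 440 Hz) maps a note number to a frequency, and a sine oscillator renders it as a waveform. The function names, sample rate, and duration are illustrative assumptions, not part of the course material.

```python
import math

def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament rule: each semitone multiplies frequency by 2**(1/12)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)

def sine_tone(freq_hz: float, duration_s: float = 0.5, sr: int = 16000) -> list[float]:
    """Hand-designed synthesis: sample a sine wave at sampling rate `sr` (Hz)."""
    n_samples = int(duration_s * sr)
    return [math.sin(2.0 * math.pi * freq_hz * n / sr) for n in range(n_samples)]

if __name__ == "__main__":
    print(round(midi_to_hz(69), 2))  # A4 -> 440.0
    print(round(midi_to_hz(60), 2))  # C4 (middle C) -> 261.63
    print(len(sine_tone(midi_to_hz(60))))  # 0.5 s at 16 kHz -> 8000 samples
```

Every rule here (the tuning formula, the oscillator) was written by a person from music theory and DSP; nothing is learned from data.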
Machine Learning for Music?
- Machine learning (ML) is a method of teaching computers to make accurate predictions from data
- Why do we need machine learning for music?
[Figure: input data → output data]
- In ML-based systems, each processing step is learned from data through a learning algorithm
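To contrast learning with hand-writing a rule, the minimal sketch below fits a straight line by ordinary least squares to example pairs of (MIDI note, log2 frequency). The training pairs are synthetic and assumed for illustration; the learner sees only the examples and recovers the pitch-frequency relationship (a slope of 1/12 octave per semitone) from them.

```python
import math

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form ordinary least squares for y ≈ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Synthetic training examples (assumed): (MIDI note, log2 of frequency in Hz).
notes = [48.0, 55.0, 60.0, 64.0, 67.0, 72.0]
log2_freqs = [math.log2(440.0 * 2 ** ((m - 69) / 12)) for m in notes]

# The "learning algorithm": estimate w and b from the data alone.
w, b = fit_line(notes, log2_freqs)
print(round(w, 4))                  # learned slope, close to 1/12 -> 0.0833
print(round(2 ** (w * 69 + b), 1))  # predicted frequency for MIDI 69 -> 440.0
```

Real music tasks replace the line with far more flexible models (e.g., neural networks), but the principle is the same: parameters are estimated from data rather than written by hand.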
Music Listening
- Music at scale
○ Spotify: 60M tracks, 40K new tracks per day, and 4B playlists (2020)
○ SoundCloud: 200M tracks (2019)
○ YouTube: 500 hours of video uploaded per minute (2020), including music tracks, music videos, and performance videos
- Content organization, search, and recommendation have become important
○ Metadata alone cannot describe the “content” of music
○ Richer descriptions are needed
Source: https://newsroom.spotify.com/company-info/
Naver VIBE
Music Listening
- Pandora’s Music Genome Project (1999)
○ Each track is annotated with about 450 music attributes
■ Genre, instruments, timbre, vocal quality, …
○ Playlists are generated using the similarity of music attribute vectors
○ Not biased by popularity
- Problems
○ A music expert takes 20-30 minutes to annotate a single track
○ The dictionary of music attributes has a fixed size
Pandora Internet Radio
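Pandora's actual similarity measure is not described here, so the following is only a minimal sketch assuming cosine similarity: each track is a vector of expert-annotated attribute scores, and candidates are ranked by similarity to a seed track. The attribute names, track names, and scores are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two attribute vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical attributes: [acoustic guitar, distorted guitar, breathy vocal, fast tempo]
tracks = {
    "seed": [0.9, 0.0, 0.8, 0.2],
    "track_a": [0.8, 0.1, 0.7, 0.3],
    "track_b": [0.0, 0.9, 0.1, 0.9],
    "track_c": [0.7, 0.0, 0.2, 0.1],
}

def rank_by_similarity(seed: str, catalog: dict[str, list[float]]) -> list[str]:
    """Return the other tracks sorted from most to least similar to the seed."""
    others = [name for name in catalog if name != seed]
    return sorted(others, key=lambda n: cosine_similarity(catalog[seed], catalog[n]), reverse=True)

print(rank_by_similarity("seed", tracks))  # ['track_a', 'track_c', 'track_b']
```

Because the ranking uses only annotated attributes, play counts never enter the score, which is why such playlists are not biased by popularity. The cost is that every vector must first be filled in by an expert.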
Music Listening
- We need to teach computers to describe music in natural language, as humans do
○ Associate music not only with musical terms (e.g., genre, instrument, timbre) but also with listening contexts (e.g., mood, time of day, location, and activity)
[Figure: input data → output data]
Google for Music (KAIST MAC Lab)
Music Performance
- Mobile apps have become popular in music education and entertainment
○ Music score
○ Karaoke
○ Instrument learning games
- Emergence of “smart” features
○ Performance evaluation
○ Score following and page turning
○ Auto-accompaniment
Source: https://promusicianhub.com/yousician-review/, https://musescore.org, https://www.smule.com/
MuseScore Smule Yousician
Music Performance
- Extract music score information from audio
○ Identify and separate sound sources under adverse acoustic conditions: microphone characteristics, reverberation, and interfering sources
○ Detect multiple pitches from polyphonic musical instruments, e.g., piano and guitar
Source: http://jameasy.com/ko/company.html, https://magenta.tensorflow.org/onsets-frames
Music Performance
- We need to teach computers to separate individual sources and extract musical information from complex auditory scenes
○ Source separation from mixed audio
○ Transcription of polyphonic music into a music score or MIDI
[Figure: input data → output data]
Re-Performance by Polyphonic Piano Transcription (KAIST MAC Lab)
Music Composition
- Automatic music composition has been a dream project since the birth of the computer
○ Illiac Suite (String Quartet No. 4) (1957)
■ The first music score composed by an electronic computer
■ Composed using a Markov model
■ https://www.youtube.com/watch?v=n0njBFLQSk8
○ Experiments in Musical Intelligence (EMI) (1980s)
■ Style imitation using pre-composed patterns (recombinant)
■ https://www.youtube.com/watch?v=t6WeiyvAiYQ&t=52s
○ Numerous approaches in “algorithmic composition”
■ Audio/MIDI programming languages: Music-N, Csound, Max/PD, Common Music, SuperCollider, ChucK
■ Rule-based or statistical models
Source: http://www.moz.ac.at/sem/lehre/lib/es/ems/hist/battisti.html
EMI: recombinant music (David Cope) The Experimental Music Studio (Lejaren Hiller and Leonard Isaacson)
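The Illiac Suite's actual rules are not reproduced here. As a minimal sketch of the first-order Markov idea only, the code counts how often each note follows another in a training melody and then samples a new melody in proportion to those counts. The training melody, start note, and function names are hypothetical.

```python
import random
from collections import defaultdict

def train_markov(melody: list[str]) -> dict[str, dict[str, int]]:
    """Count first-order transitions: counts[a][b] = times note b follows note a."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a, b in zip(melody, melody[1:]):
        counts[a][b] += 1
    return counts

def generate(counts: dict[str, dict[str, int]], start: str, length: int,
             rng: random.Random) -> list[str]:
    """Sample each next note with probability proportional to its transition count."""
    out = [start]
    while len(out) < length:
        successors = counts.get(out[-1])
        if not successors:  # dead end: this note was never followed by anything
            break
        notes = list(successors.keys())
        weights = list(successors.values())
        out.append(rng.choices(notes, weights=weights, k=1)[0])
    return out

training = ["C", "D", "E", "C", "E", "G", "E", "D", "C", "D", "E", "C"]
model = train_markov(training)
print(generate(model, "C", 8, random.Random(0)))
```

A first-order chain only remembers the previous note, so its output sounds locally plausible but lacks long-term structure; that limitation motivates the long-range sequence models discussed later in the course.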
Music Composition
- Recent advances in machine learning
○ Learn the sequential order of music data in a highly data-driven way
○ MIDI or audio generation
■ Flow Machine (Sony): https://www.flow-machines.com/
■ Music Transformer (Google): https://magenta.tensorflow.org/music-transformer
■ Jukebox (OpenAI): https://openai.com/blog/jukebox/
Music Transformer Jukebox Flow Machine (“Hello World”: AI-composed album)
Music Composition
- We need to teach computers to learn the distribution of high-dimensional, long-term sequential data and to generate music from conditions given by humans
○ The conditions can be semantic descriptions, artists, lyrics, scores, audio, or even preferences
○ Is it possible to create novel pieces in collaboration with humans?
[Figure: input data → output data]
Machine Learning for Music
- A powerful means to teach computers how to listen to, perform, and compose music
[Figure: a learning model linking audio, score (MIDI), and text]
Deep Learning
- The key element of recent AI technology, developed mainly in the computer vision, speech processing, and natural language processing communities
○ Each community handles a different modality: image, audio, and text (i.e., symbols)
○ Because deep learning is data-driven (relying less on domain knowledge), its advances have naturally spread to a wide variety of domains that use image, audio, and text as input or output
○ Music is one of the domains that have benefited greatly from them
- Deep learning is representation learning
○ Transform each type of data into a more meaningful vector space (i.e., feature space)
○ The vector spaces of different data modalities are associated with each other through their correspondence
Deep Learning for Music
- Modality-agnostic representation learning
[Figure: a learning model linking audio, score (MIDI), text, and image]
Objectives of This Course
- Understanding machine learning and deep learning
- Learning how to apply it to various tasks in the music domain
- Gaining hands-on experience with Python and machine learning libraries through homework
- Experiencing the full research cycle through the final project
Course Format
- This course is offered in a 100% online format
- There are two types of online sessions
○ Pre-Recorded videos
■ Cover the lecture part
■ Uploaded weekly to KLMS (YouTube links)
■ Students must watch the videos before the weekly Zoom meeting
○ Weekly Zoom meeting
■ Focus on review, interactive Q&A, and hands-on practice
■ Thursdays from Week 2, 2:30-3:45 PM (hopefully under 60 minutes)
Schedules
- Week 1
○ Course introduction
- Week 2
○ Audio data representations
- Week 3
○ Machine learning review: supervised learning
- Week 4
○ Machine learning review: unsupervised learning
- Week 5
○ Chuseok holiday (no class)
- Week 6
○ Convolutional neural network (CNN): music classification and tagging
- Week 7
○ Recurrent neural network (RNN): automatic music transcription
- Week 8
○ Break (no class)
- Week 9
○ Auto-encoder, U-net: source separation
- Week 10
○ Variational auto-encoder (VAE), generative adversarial network (GAN): music generation and sound synthesis
- Week 11
○ Auto-regressive models: music generation and sound synthesis
- Week 12
○ Transformer: music transcription and generation
- Week 13
○ Invited talk or advanced topics (TBD)
- Week 14
○ Invited talk or advanced topics (TBD)
- Week 15
○ TBD
- Week 16
○ Final project presentations
Prerequisites
- Linear Algebra
- Probability and Statistics
- Basic understanding of machine learning and deep learning
- Digital Signal Processing: digital filters, the discrete Fourier transform, and spectral analysis
- Programming Language: Python
Software
- Audio processing: Librosa
- Machine learning and deep learning: Scikit-learn, PyTorch
- And more…
Grading
- 4 assignments: 50%
- Final project (paper review, presentation and report): 50%
Course Website
- https://mac.kaist.ac.kr/~juhan/gct634