Course Introduction (Juhan Nam)

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) Course Introduction Juhan Nam

Who We Are ● Instructor: Juhan Nam ○ Associate Professor, Graduate School of Culture Technology (GSCT) ○ Affiliated Professor, Graduate School of Artificial Intelligence (GSAI) ○ Music and Audio Computing Lab: https://mac.kaist.ac.kr/ ● TAs ○ Taegyun Kwon, PhD student, GSCT ○ Taejun Kim, PhD student, GSCT ○ Wonil Kim, PhD student, GSCT ○ Seunghyun Lee, MS student, GSAI

Music ● The most widely enjoyed form of cultural content and activity (Figure: music in KAIST) Source: http://times.kaist.ac.kr/news/articleView.html?idxno=3835, http://times.kaist.ac.kr/news/articleView.html?idxno=3185

Music and Computer ● Computers are now essential in musical activities ○ Music listening: download music tracks as compressed audio files and decompress them into waveforms

Music and Computer ● Computers are now essential in musical activities ○ Music performance: musical instruments, karaoke machines

Music and Computer ● Computers are now essential in musical activities ○ Music composition and production: recording, MIDI, processing, mixing

Data and Processing ● The role of the computer is to represent musical data in digital form and process it according to the target task (Diagram: input data → output data) ○ Data: audio, MIDI, text (metadata) ○ Processing: audio compression and decompression, sound synthesis, recording, digital audio effects, editing, and mixing

Data and Processing ● The role of the computer is to represent musical data in digital form and process it according to the target task (Diagram: input data → output data) ○ In such music systems, each processing step is hand-designed and programmed by humans based on domain knowledge such as digital signal processing, acoustics, and music theory

Machine Learning for Music? ● Machine learning (ML) is a method of teaching a computer how to make accurate predictions using data (Diagram: input data → output data) ○ In ML-based systems, each processing step is learned from data through a learning algorithm ● Why do we need machine learning for music?

Music Listening ● Music at scale ○ Spotify: 60M tracks, 40K new tracks per day, and 4B playlists (2020) ○ SoundCloud: 200M tracks (2019) ○ YouTube: 500 min of video uploaded per minute (2020) (music tracks, music videos and performance videos) ● Content organization, search, and recommendation become important ○ Metadata is not enough to explain the “content” of music ○ Richer descriptions are needed (Figure: Naver VIBE) Source: https://newsroom.spotify.com/company-info/

Music Listening ● Pandora’s Music Genome Project (1999) ○ Annotate a track with about 450 music attributes ■ Genre, instruments, timbre, vocal quality, … ○ Playlists are generated using the similarity of music attribute vectors ○ Not biased by popularity (Figure: Pandora Internet Radio) ● Problems ○ It takes a music expert 20-30 minutes to annotate a single track ○ The dictionary size of music attributes is fixed
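The idea of building playlists from the similarity of attribute vectors can be sketched in a few lines of Python. This is only an illustration, not Pandora's actual method: the choice of cosine similarity, the four-attribute vectors, the track names, and the helper names `cosine_similarity` and `make_playlist` are all assumptions made for the example (a real annotation would have about 450 attributes).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_playlist(seed, catalog, length=2):
    """Return the `length` tracks most similar to `seed`, most similar first."""
    scored = [
        (cosine_similarity(catalog[seed], vector), name)
        for name, vector in catalog.items()
        if name != seed
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:length]]


# Hypothetical attribute vectors: [rock, acoustic guitar, bright timbre, breathy vocal]
catalog = {
    "track_a": [1.0, 0.2, 0.8, 0.1],
    "track_b": [0.9, 0.1, 0.7, 0.2],
    "track_c": [0.0, 1.0, 0.2, 0.9],
    "track_d": [0.1, 0.9, 0.3, 0.8],
}

print(make_playlist("track_a", catalog))
```

Because the ranking uses only the attribute vectors, a rarely played track scores the same as a hit with identical attributes, which is the sense in which such playlists are not biased by popularity.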

Music Listening ● Need to teach computers how to describe music in natural language as humans do ○ Associate music not only with musical terms (e.g., genre, instrument, timbre) but also with listening contexts (e.g., mood, time of day, location and activity) (Diagram: input data → output data)

Google for Music (KAIST MAC Lab)

Music Performance ● Mobile apps have become popular in music education and entertainment ○ Music score ○ Karaoke ○ Instrument learning game ● Emergence of “smart” features ○ Performance evaluation ○ Score following and page turning ○ Auto-accompaniment (Figures: MuseScore, Smule, Yousician) Source: https://promusicianhub.com/yousician-review/, https://musescore.org, https://www.smule.com/

Music Performance ● Extract music score information from audio ○ Identify and separate sound sources in adverse acoustic conditions: microphone, reverberation and interfering sources ○ Detect multiple pitches from polyphonic musical instruments: e.g., piano, guitar Source: http://jameasy.com/ko/company.html, https://magenta.tensorflow.org/onsets-frames

Music Performance ● Need to teach computers how to separate individual sources and extract musical information from complex auditory scenes ○ Source separation from mixed audio ○ Transcription of polyphonic music into a music score or MIDI (Diagram: input data → output data)

Re-Performance by Polyphonic Piano Transcription (KAIST MAC Lab)

Music Composition ● Automatic music composition has been a dream project since the birth of the computer ○ Illiac Suite (String Quartet No. 4) (1957) ■ The first music score composed by an electronic computer ■ Composed using a Markov model ■ https://www.youtube.com/watch?v=n0njBFLQSk8 ○ Experiments in Musical Intelligence (EMI) (1980s) ■ Style imitation using pre-composed patterns (recombinant) ■ https://www.youtube.com/watch?v=t6WeiyvAiYQ&t=52s ○ Numerous approaches in “algorithmic composition” ■ Audio/MIDI programming languages: Music-N, Csound, Max/PD, Common Music, SuperCollider, ChucK ■ Rule-based or statistical models (Figures: The Experimental Music Studio (Lejaren Hiller and Leonard Isaacson); EMI: recombinant music (David Cope)) Source: http://www.moz.ac.at/sem/lehre/lib/es/ems/hist/battisti.html
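To give a feel for what composing with a Markov model means, here is a minimal sketch of a first-order Markov chain over note names. It is not a reconstruction of the Illiac Suite procedure: the training melody, the note alphabet, and the helper names `train_transitions` and `generate` are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def train_transitions(melody):
    """Collect, for each note, the list of notes that followed it in the melody."""
    table = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        table[current].append(following)
    return dict(table)


def generate(table, start, length, seed=0):
    """Sample a melody by repeatedly drawing a successor of the latest note."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same melody
    notes = [start]
    while len(notes) < length:
        successors = table.get(notes[-1])
        if not successors:  # dead end: this note never had a successor in training
            break
        notes.append(rng.choice(successors))
    return notes


# Hypothetical training melody
training_melody = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]
table = train_transitions(training_melody)
melody = generate(table, start="C", length=8)
print(melody)
```

Storing successors as a list with repetitions means that `rng.choice` samples each transition in proportion to how often it occurred in the training melody, so no explicit probability table is needed.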

Music Composition ● Recent advances in machine learning ○ Learn the sequential order of music data in a highly data-driven way ○ MIDI or audio generation ■ Flow Machine (Sony): https://www.flow-machines.com/ ■ Music Transformer (Google): https://magenta.tensorflow.org/music-transformer ■ Jukebox (OpenAI): https://openai.com/blog/jukebox/ (Figures: Flow Machine (“Hello World”: AI-composed album), Music Transformer, Jukebox)

Music Composition ● Need to teach computers how to learn the distribution of high-dimensional, long-term sequential data and generate music from conditions given by humans ○ The conditions can be semantic descriptions, artist, lyrics, score, audio or even preference ○ Is it possible to create novel pieces (in collaboration with humans)? (Diagram: input data → output data)

Machine Learning for Music ● A powerful means of teaching computers how to listen to, perform and compose music (Diagram: audio, score (MIDI) and text connected through a learning model)

Deep Learning ● The key element of recent AI technology, developed mainly in the computer vision, speech processing and natural language processing communities ○ Each of them handles a different modality: image, audio, text (i.e., symbols) ○ Because the approach is data-driven (i.e., it relies less on domain knowledge), advances in deep learning have been readily applied to a wide variety of domains that use image, audio and text as an input or output data form ○ Music is one of the domains that has benefited greatly from them ● Deep learning is representation learning ○ Transform a type of data into a more meaningful vector space (i.e., feature space) ○ The vector spaces from different modalities of data are associated with each other through their correspondence

Deep Learning for Music ● Modality-agnostic representation learning (Diagram: audio, score (MIDI), text and image connected through a learning model)

Objectives of This Course ● Understanding machine learning and deep learning ● Learning how to apply them to various tasks in the music domain ● Gaining hands-on experience with Python and machine learning libraries through homework ● Gaining experience of the full cycle of research through the final project

Course Format ● This course is offered in a 100% online format ● There are two types of online sessions ○ Pre-recorded videos ■ Cover the lecture part ■ Uploaded weekly in KLMS (YouTube link) ■ Students must watch the videos before the weekly Zoom meeting ○ Weekly Zoom meeting ■ Focus on review, interactive Q&A and hands-on practice ■ Thursdays from Week #2: 2:30-3:45 PM (hopefully less than 60 minutes)

Schedules ● Week 1 ○ Course introduction ● Week 2 ○ Audio data representations ● Week 3 ○ Machine learning review: supervised learning ● Week 4 ○ Machine learning review: unsupervised learning

Schedules ● Week 5 ○ Chuseok (no class) ● Week 6 ○ Convolutional neural network (CNN): music classification and tagging ● Week 7 ○ Recurrent neural network (RNN): automatic music transcription ● Week 8 ○ Break (no class)

Schedules ● Week 9 ○ Auto-encoder, U-net: source separation ● Week 10 ○ Variational auto-encoder (VAE), generative adversarial network (GAN): music generation and sound synthesis ● Week 11 ○ Auto-regressive models: music generation and sound synthesis ● Week 12 ○ Transformer: music transcription and generation

Schedules ● Week 13 ○ Invited talk or advanced topics (TBD) ● Week 14 ○ Invited talk or advanced topics (TBD) ● Week 15 ○ TBD ● Week 16 ○ Final project presentations

Prerequisites ● Linear Algebra ● Probability and Statistics ● Basic understanding of machine learning and deep learning ● Digital Signal Processing: digital filters, discrete Fourier transform, and spectral analysis ● Programming Language: Python

Software ● Audio processing: Librosa ● Machine learning and deep learning: Scikit-learn, PyTorch ● And more…

Grading ● 4 assignments: 50% ● Final project (paper review, presentation and report): 50%
