Clova Music: An AI Assistant Like a Smart DJ
Clova AI Research(CLAIR), Naver Corp. 김정명 (Adrian Kim), M.S.
Clova: Cloud-based Virtual Assistant
General Purpose AI platform
https://clova.ai
Audio Domain Data
At 16 kHz, 30 seconds = 480k data points!
Expressive, has more information!
Mel filter banks: frequency bins > 1k, mel bins = 80, 96, 128
Image from Choi et al., 16
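The mel filter bank step above can be sketched as a matrix that maps more than 1k linear frequency bins down to 80, 96, or 128 mel bins. This is a minimal sketch, not the deck's code: it assumes the common HTK-style mel formula and triangular filters, and the function names and the defaults (sr=16000, n_fft=2048, n_mels=128) are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=2048, n_mels=128):
    """Return a (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb = mel_filterbank()
print(fb.shape)  # (n_mels, n_bins)
```

Multiplying this matrix with a magnitude spectrogram of shape (n_bins, frames) gives a mel-spectrogram of shape (n_mels, frames).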
Audio Domain Data
waveform (1323000,)
-> stft -> (1025, 2584) = 2648600
-> mel filter bank -> (128, 2584) = 330752
Inverting stft: if complex, inverse stft; if only magnitude, Griffin-Lim algorithm
Inverting mel filter bank: irreversible; WaveNet vocoder (Shen et al. 17)
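The shapes on this slide can be checked with simple arithmetic. The sketch below assumes a 44.1 kHz sampling rate, an FFT size of 2048, a hop of 512 samples, and centered (padded) framing; none of these are stated in the slides, they are chosen because they reproduce the numbers shown. `stft_shape` is a hypothetical helper, not a library function.

```python
def stft_shape(n_samples, n_fft=2048, hop=512):
    """Shape of a centered STFT: (frequency bins, frames)."""
    n_bins = n_fft // 2 + 1          # one-sided spectrum
    n_frames = 1 + n_samples // hop  # framing with padding at both ends
    return n_bins, n_frames

n_samples = 44100 * 30               # 30 seconds at an assumed 44.1 kHz
n_bins, n_frames = stft_shape(n_samples)
n_mels = 128

print(n_samples)                     # length of the raw waveform
print((n_bins, n_frames), n_bins * n_frames)
print((n_mels, n_frames), n_mels * n_frames)
```

Under these assumptions the STFT holds about twice as many values as the waveform, while the mel-spectrogram holds about a quarter.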
Storage problem
Memory problem
Low quality, weakly labeled (Choi et al. 2017)
Takes a lot of time for high quality
Not much open data
Information per data point is very small

Issues
                MNIST        GTZAN
Storage         45MB         1.2GB
Data pairs      60000        1000 (30 second)
Classes         10 digits    10 genres (100 each)
Preprocessing   Fast         Slow
Testing         Easy         Hard
Issues
News Speech Audio: Short, Single source
Bad Boy – Red Velvet: Long, Multiple source
[Figure: network diagram with convolution & pooling, channel summation, LSTM layers, attention (softmax), element-wise multiplication, and attention-weighted LSTM output]
Automatic tagging using deep convolutional neural networks, Choi et al., ISMIR 16
https://github.com/keunwoochoi/music-auto_tagging-keras
2D Convs
2D convs                   1D convs
Slow training              Fast training
Local structure in freq    Frequencies are discrete
nxm filters, 1 channel     nx1 filters, m channels
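The last row of the comparison can be made concrete with a small NumPy sketch. It assumes the input is a mel-spectrogram of shape (frequency bins, frames); the function names are hypothetical. The 2D version slides a small filter over both axes of a single-channel "image", while the 1D version treats every frequency bin as a channel and slides only along time.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_single_channel(spec, kernel):
    """Valid 2D cross-correlation: (F, T) with a (kf, kt) filter -> (F-kf+1, T-kt+1)."""
    windows = sliding_window_view(spec, kernel.shape)  # (F-kf+1, T-kt+1, kf, kt)
    return np.einsum("ftij,ij->ft", windows, kernel)

def conv1d_freq_as_channels(spec, kernel):
    """Valid 1D cross-correlation over time: (F, T) with an (F, kt) filter -> (T-kt+1,)."""
    windows = sliding_window_view(spec, kernel.shape[1], axis=1)  # (F, T-kt+1, kt)
    return np.einsum("ftk,fk->t", windows, kernel)

spec = np.random.default_rng(0).random((128, 100))   # toy mel-spectrogram
out2d = conv2d_single_channel(spec, np.ones((3, 3)))  # keeps a frequency axis
out1d = conv1d_freq_as_channels(spec, np.ones((128, 3)))  # collapses frequency
print(out2d.shape, out1d.shape)
```

The 1D output has no frequency axis left, which is one way to read "frequencies are discrete": each bin is its own channel rather than a neighbor in a local pattern.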
Real: real and imaginary values as separate channels
Complex: as suggested in Deep Complex Networks
Deep Complex Networks, Trabelsi et al., To appear at ICLR18
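The two options can be illustrated as follows. This is a sketch under assumptions: a plain matrix product stands in for the convolution, and the function names are hypothetical. The first helper stacks the real and imaginary parts as two real channels; the second computes a complex product using only real arithmetic, (A + iB)(x + iy) = (Ax - By) + i(Bx + Ay), which is the identity complex-valued layers rely on.

```python
import numpy as np

def as_real_channels(z):
    """(F, T) complex spectrogram -> (2, F, T) real array: [real, imaginary]."""
    return np.stack([z.real, z.imag], axis=0)

def complex_linear(W, z):
    """Complex matrix product W @ z written with real-valued operations only."""
    A, B = W.real, W.imag
    x, y = z.real, z.imag
    real = A @ x - B @ y
    imag = B @ x + A @ y
    return real + 1j * imag

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
W = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
print(as_real_channels(z).shape)
print(np.allclose(complex_linear(W, z), W @ z))
```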
Image from https://kakalabblog.wordpress.com/2017/07/18/wavenetnsynth-deep-audio-generative-models/
WaveNet: A Generative Model for Raw Audio, Oord et al., https://arxiv.org/pdf/1609.03499.pdf
and more!
https://magenta.tensorflow.org/nsynth
Generated example
https://magenta.tensorflow.org/performance-rnn
http://benanne.github.io/2014/08/05/spotify-cnns.html#contentbased
http://blog.galvanize.com/spotify-discover-weekly-data-science/
* Reported at 2017 Oct.
[Chart: NAVER_APP, NAVER_PC, WAVE, CLOVA_APP]
[Chart by genre, NAVER_APP vs. WAVE: Korean pop (gayo), functional music, pop, children's songs, OST, classical, jazz, religious music, electro…, rock, hip-hop, other]
[Charts: playing ratio by artist, WAVE vs. NAVER MUSIC APP]
핑크퐁 EXO 아이유 아이유 동요 젝스키스 동요 방탄소년단 EXO 뉴이스트 뉴이스트(NU`EST) 윤종신 Wanna One 별하나 동요 윤종신 이루마 우원재 오르골뮤직 볼빨간사춘기 볼빨간사춘기 뉴이스트 W 젝스키스 황치열 트니트니 헤이즈 헤이즈 선미 성시경 WINNER 힐링피아노 자장가
Autumn, Upbeat
Lack of well-defined metadata
Semantic Embedding
Personalized Playlists
< 밤편지_2 > < 밤편지_1 >
Semantic Embedding
Ben Athiwaratkun and Andrew Gordon Wilson, Multimodal Word Distributions, 2017
Collaborative Filtering
Most popular method: Matrix Factorization
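Matrix factorization can be sketched in a few lines. This is a generic illustration, not the production recommender: the toy user-by-track matrix, the rank k, the learning rate, and the regularization weight are all assumptions, and zeros are treated as unobserved entries. Each user and each track gets a k-dimensional vector, and their dot product approximates the observed value.

```python
import numpy as np

def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ~ P @ Q.T on observed entries with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], k))  # user factors
    Q = 0.1 * rng.standard_normal((R.shape[1], k))  # item (track) factors
    for _ in range(epochs):
        err = mask * (R - P @ Q.T)                  # error on observed entries only
        P, Q = P + lr * (err @ Q - reg * P), Q + lr * (err.T @ P - reg * Q)
    return P, Q

# Toy data: rows are users, columns are tracks, 0 means "not observed".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = (R > 0).astype(float)

P, Q = factorize(R, mask)
pred = P @ Q.T
rmse = np.sqrt((mask * (R - pred) ** 2).sum() / mask.sum())
print(np.round(pred, 2))
print(rmse)
```

The entries of `pred` at unobserved positions are the scores a recommender would rank on.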
ArtistNet
DeepArtistID
Using more artists increases the representation quality
Representation Learning Using Artist Labels for Audio Classification Tasks, Park et al., MIREX17 Challenge
Representation Learning of Music Using Artist Labels, Park et al., https://arxiv.org/abs/1710.06648
Music Emotion Recognition
Music Emotion Recognition via End-to-End Multimodal Neural Networks, Jeon et al., RecSys17
Data     Model       Accuracy
Audio    CNN         0.6479
Audio    RNN         0.6303
Audio    MCRN        0.6619
Lyrics   CNN         0.7815
Lyrics   RNN         0.7716
Both     MCRN, CNN   0.8046
Music Highlight Extraction via Convolutional Recurrent Attention Networks, Ha et al., ICML17 ML4MD Workshop
Automatic Highlight Extraction
Automatic Highlight Extraction
Input: mel-spectrogram of x
Output: Starting frame (H) of highlight of size S
[Figure: mel-spectrogram of x, with the highlight spanning frames H to H+S]
[Chart: Total, Popular (Top 10%), New Released (Top 10%)]
Automatic Highlight Extraction
Conv layer outputs
Genre classification
[Chart: CNN, CRNN, C-HiEx, R-HiEx]
Automatic Highlight Extraction
Attention
Mel energy
Attention-weighted energy from frame n
Cumulative sum of S energy values
Speed/acceleration of energy change
Highlight score
Frame where the highlight of length S starts
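The quantities listed on this slide can be combined roughly as in the sketch below. The exact scoring formula is not given in the slides, so the way the terms are added and the weight `beta` are assumptions made for illustration, as is the function name `highlight_start`. The attention vector would come from the trained network; here it is simply an input.

```python
import numpy as np

def highlight_start(mel, attention, S, beta=0.5):
    """Return (start frame, scores) for a highlight of S frames.

    mel: (n_mels, T) mel-spectrogram; attention: (T,) attention weights.
    """
    energy = mel.mean(axis=0)                    # mel energy per frame
    weighted = attention * energy                # attention-weighted energy
    csum = np.concatenate(([0.0], np.cumsum(weighted)))
    window_sum = csum[S:] - csum[:-S]            # sum of S values starting at frame n
    speed = np.diff(weighted, prepend=weighted[0])   # first difference
    accel = np.diff(speed, prepend=speed[0])         # second difference
    n_starts = window_sum.shape[0]
    # Assumed combination: window energy plus a weighted onset term.
    score = window_sum + beta * (speed[:n_starts] + accel[:n_starts])
    return int(np.argmax(score)), score

# Toy input: a loud section between frames 40 and 59, uniform attention.
mel = np.full((8, 100), 0.1)
mel[:, 40:60] = 1.0
attention = np.full(100, 1.0 / 100)
start, score = highlight_start(mel, attention, S=20)
print(start)
```

Using a running cumulative sum keeps the cost linear in the number of frames regardless of S.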
Naturally mixed sequence of music clips as a single song
Automatic DJ mix generation using highlight detection, Kim et al., ISMIR 17 late breaking session
Automatic DJ Mix Generation
Automatic DJ Mix Generation
Segments we want to play can be either highlights, or full songs
Downbeat segmentation is critical
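One way to see why downbeats matter: a segment cut at an arbitrary time will start or end mid-bar, so the transition sounds off-beat. The sketch below snaps a segment's boundaries to the nearest downbeats. It is an illustration only; the slides do not describe the actual procedure, the downbeat times are assumed to come from a separate downbeat tracker, and `snap_to_downbeats` is a hypothetical helper.

```python
import numpy as np

def snap_to_downbeats(start, end, downbeats):
    """Move segment boundaries (seconds) to the nearest downbeat times."""
    downbeats = np.asarray(downbeats, dtype=float)
    snapped_start = downbeats[np.argmin(np.abs(downbeats - start))]
    snapped_end = downbeats[np.argmin(np.abs(downbeats - end))]
    return float(snapped_start), float(snapped_end)

# Toy example: one downbeat every 2 seconds (120 BPM in 4/4).
downbeats = np.arange(0.0, 60.0, 2.0)
print(snap_to_downbeats(30.7, 50.9, downbeats))
```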
Features extracted from ArtistNet and mapped with t-SNE
조경현 (NYU), 임재환 (USC), 김성훈 (HKUST), 박혜원 (MIT), 신진우 (KAIST), 주재걸 (Korea University)
…
Jun-Yan Zhu(MIT)
Hannaneh Hajishirzi (UW)