Combining Musical and Cultural Features for Intelligent Style Detection
Brian Whitman, Paris Smaragdis
MIT Media Lab Music, Mind and Machine Group (formerly Machine Listening)
What We're Getting At: Overall Results
[Figure: Style ID Prediction. Axis: Style. Series: Combined, Audio, Cultural]
Artists, Sound & Score, Delivery (CDs, bits), Listeners
Information Source, Transmitter, Channel, Receiver, Destination
Artists, Sound & Score, Listeners
“My favorite song”
“Timbaland produced the new Missy record”
“Uninspired electro-glitch rock”
“Reminds me of my ex-girlfriend”
P2P Collections
Online playlists
Informal reviews
Query habits
! Musical memory, personal preference,
! Audio sim / rec as insult!
! Large-scale cultural factors, “stranger
Web mining, NLP
Automatic music description (“cultural representation”)
Time-aware recommendation (‘buzz factor’ extraction)
Query-by-description
P2P Network Models
Daily ‘Top 40’ for peer-to-peer networks (Napster/Gnutella/etc)
User models, trend ID
Content-based representation
Feature extraction (beat, instrument types)
Sound
! Acoustic:
! Instrumentation
! Short-time (timbral)
! Mid-time (structural)
! Usually all we have
! Cultural:
! Long-scale time
! Inherent user model
! Listener’s perspective
! Two-way IR
Which genre? Which artist? What instruments? Describe this. Do I like this? 10 years ago? Which style?
! Independent kernel
! Acoustic: fine-grained,
! Cultural: intrinsic user
! P2P, web, usenet, future?
! Machine learning friendly
! Cultural Feature Extraction:
! Web crawls for music information
! Retrieved documents are parsed for:
! P2P crawl:
! Robots watch OpenNap network for shared
$$ s(f_t, f_d) = f_t \, e^{-\frac{(\log(f_d) - \mu)^2}{2\sigma^2}} $$
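A minimal sketch of a Gaussian-smoothed term score, assuming it takes the form f_t * exp(-(log(f_d) - mu)^2 / (2 * sigma^2)). It reads f_t as a term's frequency in one artist's retrieved documents and f_d as that term's document frequency across the crawl; both readings are assumptions. The function name and the default values of mu and sigma are illustrative and do not come from the slides.

```python
import math


def gaussian_term_score(f_t, f_d, mu=6.0, sigma=0.9):
    """Weight term frequency f_t by a Gaussian over log document frequency.

    mu and sigma are placeholder defaults chosen for illustration only.
    """
    if f_d <= 0:
        raise ValueError("document frequency must be positive")
    # Terms whose log document frequency sits near mu keep most of their
    # weight; very rare and very common terms are damped.
    distance = math.log(f_d) - mu
    return f_t * math.exp(-(distance ** 2) / (2 * sigma ** 2))


# A term seen 3 times for an artist, at three different document frequencies.
for f_d in (5, 400, 50000):
    print(f_d, round(gaussian_term_score(3, f_d), 4))
```

With this shape, mu picks the document frequency range treated as most descriptive and sigma controls how quickly terms outside that range lose weight.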
! ‘Soft’ ground truth
! 10-20 songs per artist
! Minnowmatch testbed
! Cross album
! R&B / top 40 pop (marketing)
! Rap (substyle glut)
! “IDM”
! All styles have genres above them
! Artists can have multiple styles
! Albums can have styles, too
! = Sound + Peers + Time
! Marketing recommendation
! New music recommendation
! Self-recommendation
! Search
! Musical synonymy / hypernymy
Female R&B: Mya, Toni Braxton, Debelah Morgan, Aaliyah, Lauryn Hill
IDM: Mouse on Mars, Plone, Squarepusher, Aphex Twin, Boards of Canada
Hardcore Rap: Outkast, Mystikal, Wu-Tang Clan, Ice Cube, DMX
Contemporary Country: Kenny Chesney, Garth Brooks, Tim McGraw, Alan Jackson, Billy Ray Cyrus
Heavy Metal: Black Sabbath, Led Zeppelin, Skid Row, AC/DC, Guns N’ Roses
! 3 frame delay
[Figure: Acoustic Representation. Axes: Precision (%) and Style (1 to 5). Styles: Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B]
! Sum overlap of smoothing function
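One possible reading of summing the overlap, sketched under assumptions: each artist is a dictionary mapping terms to smoothed scores, and similarity is the sum of the first artist's scores over the terms both artists share. The function name, the example terms and scores, and the choice to sum only the first profile's scores are all hypothetical.

```python
def overlap_similarity(profile_a, profile_b):
    """Sum profile_a's scores over the terms it shares with profile_b."""
    shared_terms = profile_a.keys() & profile_b.keys()
    return sum(profile_a[term] for term in shared_terms)


# Hypothetical term profiles; the scores are made up for illustration.
artist_a = {"electronic": 2.0, "glitch": 1.5, "guitar": 0.5}
artist_b = {"electronic": 1.0, "glitch": 0.75, "ambient": 0.9}

print(overlap_similarity(artist_a, artist_b))
print(overlap_similarity(artist_b, artist_a))
```

Because only the first profile's scores are summed, this measure is asymmetric; averaging the two directions would give a symmetric variant.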
[Figure: Cultural Representation. Axes: Precision (%) and Style (1 to 5). Styles: Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B]
[Figure: Combined Representation. Axes: Precision (%) and Style (1 to 5). Styles: Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B]
[Figure: Style ID Prediction. Axis: Style. Series: Combined, Audio, Cultural]
! Community metadata (CM) proven for artist similarity
! Against AMG editors
! Against human evaluation
! Current IR uses of CM:
! Recommendation / Buzz Factor Extraction
! Query by Description
! Grounding Sound
! Artists change over time
! So does audience perception
! Parsable content goes up during album
! Freely available