Lyric-Based Music Recommendation
Paper Authors: Derek Gossi, Mehmet H Gunes (University of Nevada, Reno)
By: Brendan Abraham, Abhijith Mandya
Overview
The challenge of Music Recommendation
Industry standard (Collaborative Filtering) vs
Analysis)
most music-based recommendation systems use collaborative filtering
and not musical or lyrical content
scale well when songs lack user ratings
Hey! I like songs A, B, C, and D
Well! I like songs B, C, D, and E
You should definitely listen to A then
You should totally check
range of genres and styles
limit the capabilities
recommendation tasks
that influence a listener’s probability of enjoying a song.
lyrical content into categories using tags
show importance
rhyme, repetition, and meter
lyrical network and compares clustering methods on them
unique artists linked to MSD
stemming algorithm
accounting for ~92% of all unique words.
Lyric Data -> TF-IDF -> Lyrical VSM -> Similarity Matrix -> KNN -> Graph Analysis
User Listening Data -> User-based VSM -> Similarity Matrix -> KNN -> Graph Analysis
Song TFM: 237k rows, one word vector per song
Group song vectors by artist
Artist TFM: 22k unique artists, one word vector per artist
Goal: Represent an artist’s “vocabulary”
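The grouping step can be illustrated with a minimal sketch. It assumes song vectors are merged by summing word counts, which the slides do not state; the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_artist_tfm(song_rows):
    """Sum per-song word counts into one word-count vector per artist.

    song_rows: iterable of (artist, {word: count}) pairs, one per song.
    Summation is an assumption; the slides only say vectors are grouped.
    """
    artist_tfm = defaultdict(Counter)
    for artist, word_counts in song_rows:
        artist_tfm[artist].update(word_counts)  # adds counts word by word
    return dict(artist_tfm)

# Toy data, invented for illustration only.
songs = [
    ("artist_a", {"love": 2, "night": 1}),
    ("artist_a", {"love": 1, "road": 3}),
    ("artist_b", {"night": 4}),
]
artist_tfm = build_artist_tfm(songs)
```

Each artist ends up with a single vector that approximates their overall vocabulary, regardless of how many songs they have.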
Stopword Removal
Artist TF-IDF Matrix
computed from artist TFM
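As a rough sketch of how a TF-IDF matrix could be derived from the artist TFM: the version below uses raw counts as term frequency and idf = log(N / df), which is one common variant. The exact weighting used in the paper is not given in the slides, so treat this as an assumption.

```python
import math

def tf_idf(artist_tfm):
    """Weight each artist's word counts by inverse document frequency.

    tf is the raw count and idf = log(N / df); other variants exist and
    the paper's choice may differ.
    """
    n_artists = len(artist_tfm)
    doc_freq = {}
    for counts in artist_tfm.values():
        for word in counts:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {
        artist: {
            word: count * math.log(n_artists / doc_freq[word])
            for word, count in counts.items()
        }
        for artist, counts in artist_tfm.items()
    }

# Toy term-frequency matrix, invented for illustration only.
toy_tfm = {
    "artist_a": {"love": 3, "road": 3},
    "artist_b": {"love": 1, "night": 4},
}
weights = tf_idf(toy_tfm)
```

A word used by every artist (here "love") gets weight zero under this variant, so only distinguishing vocabulary contributes to similarity.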
Cosine similarity between artists Ai and Aj
       A1     A2     ...    An
A1     1      .8     ...    .2
A2     .32    1      ...    .4
...    ...    ...    ...    .6
An     .22    .6     ...    1
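For reference, a pairwise cosine similarity matrix over sparse artist vectors might be computed as follows. This is a minimal sketch with hypothetical names; vectors are stored as word-to-weight dictionaries, and returning 0 for an empty vector is a convention chosen here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Nested dict sim[a][b] holding the cosine similarity of every pair."""
    names = sorted(vectors)
    return {
        a: {b: cosine_similarity(vectors[a], vectors[b]) for b in names}
        for a in names
    }

# Toy vectors, invented for illustration only.
toy_vectors = {
    "A1": {"love": 1.0, "night": 1.0},
    "A2": {"love": 1.0},
    "A3": {"road": 2.0},
}
sim = similarity_matrix(toy_vectors)
```

By construction the diagonal is 1 and sim[a][b] equals sim[b][a].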
1. Built KNN based on cosine similarity matrix
   a. Vector: cosine similarities to all other artists
   b. Each artist connected to top K most similar artists
   c. Chose K=10 without much justification -_-
2. Each artist node has:
   a. Outdegree (outgoing edges) of K
   b. Unknown indegree (incoming edges)
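The KNN construction above can be sketched as a directed graph in which every node has out-degree k while in-degree varies. The helper names and the tie-breaking rule (alphabetical) are assumptions made for this example.

```python
def knn_graph(sim, k):
    """Directed graph: each node points to its k most similar other nodes."""
    edges = {}
    for node, row in sim.items():
        others = [(score, other) for other, score in row.items() if other != node]
        # Sort by descending similarity; ties broken by name for determinism.
        others.sort(key=lambda pair: (-pair[0], pair[1]))
        edges[node] = [other for _, other in others[:k]]
    return edges

def in_degrees(edges):
    """Count incoming edges per node."""
    counts = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            counts[target] += 1
    return counts

# Toy similarity matrix, invented for illustration only.
toy_sim = {
    "A1": {"A1": 1.0, "A2": 0.8, "A3": 0.2, "A4": 0.1},
    "A2": {"A1": 0.8, "A2": 1.0, "A3": 0.4, "A4": 0.3},
    "A3": {"A1": 0.2, "A2": 0.4, "A3": 1.0, "A4": 0.6},
    "A4": {"A1": 0.1, "A2": 0.3, "A3": 0.6, "A4": 1.0},
}
graph = knn_graph(toy_sim, k=2)
indeg = in_degrees(graph)
```

Even in this toy graph the in-degrees are uneven (A2 and A3 each receive three edges, A1 and A4 one each), although every node sends exactly two.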
[Figure: example artists: Eminem, 50 Cent, Linkin Park, Carlos Santana, Metallica, Shakira, Los Lonely Boys]
[Table header: Category Type, Unique Tags]
1. Compared network topologies of lyric and collaborative-filtering graphs using subgraph analysis
   a. Each connection = a recommendation
   b. Measured how often recommendations stay within genre
      i. Compared # of edges 'leaving' subgraph to # of edges 'staying' in subgraph
2. For each artist, calculated top 1k most similar artists from both graphs
   a. Calculated difference between lists
   b. Used Rank Biased Overlap (RBO)
3. Measured lyrical graph utility by comparing recs. to randomly generated recs.
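One way to picture the subgraph comparison in step 1: for all artists tagged with a given genre, count outgoing edges whose target shares the genre versus those whose target does not. This sketch assumes one genre label per artist and counts only outgoing edges; both are simplifications not confirmed by the slides.

```python
def genre_edge_counts(edges, genre_of, genre):
    """Count edges from artists in `genre` that stay in or leave that genre."""
    staying = leaving = 0
    for source, targets in edges.items():
        if genre_of.get(source) != genre:
            continue  # only edges that start inside the genre subgraph
        for target in targets:
            if genre_of.get(target) == genre:
                staying += 1
            else:
                leaving += 1
    return staying, leaving

# Toy graph and labels, invented for illustration only.
toy_edges = {
    "rap_1": ["rap_2", "rock_1"],
    "rap_2": ["rap_1", "rock_1"],
    "rock_1": ["rap_1", "rap_2"],
}
toy_genres = {"rap_1": "rap", "rap_2": "rap", "rock_1": "rock"}
staying, leaving = genre_edge_counts(toy_edges, toy_genres, "rap")
```

A high staying-to-leaving ratio would mean recommendations rarely cross genre boundaries.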
Network           Diameter   Average Shortest Path   Clustering Coefficient
Lyrics Network    10         4.52                    0.217
CF Network        6          4.22                    0.119
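For readers unfamiliar with these three metrics, the sketch below computes them on a small connected graph. It treats the graph as undirected and averages the local clustering coefficient over nodes; whether the paper used these exact definitions is an assumption.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, start):
    """Hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def graph_metrics(adj):
    """Diameter, average shortest path, and average clustering coefficient
    of a connected undirected graph given as {node: set(neighbors)}."""
    lengths = []
    for node in adj:
        dist = bfs_distances(adj, node)
        lengths.extend(d for other, d in dist.items() if other != node)
    local = []
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            local.append(0.0)  # no neighbor pairs to connect
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        local.append(2 * links / (k * (k - 1)))
    return max(lengths), sum(lengths) / len(lengths), sum(local) / len(local)

# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy_adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
diameter, avg_path, clustering = graph_metrics(toy_adj)
```

On graphs with tens of thousands of nodes a dedicated graph library would be the practical choice; the all-pairs BFS here is only meant to make the definitions concrete.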
categories
recommendations
from new/emerging artist or song.
Ranking Compared to CF    Mean RBO
Lyrical Ranking           0.0649
Random Ranking            0.0052
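To make the RBO scores concrete, here is a minimal sketch of a truncated rank-biased overlap: a weighted sum of the overlap between the top-d prefixes of two rankings, with geometrically decaying weights. The persistence parameter p=0.9 is an assumed default, not the value used in the paper, and this truncated form omits the extrapolation step of the full measure.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings without duplicates.

    At each depth d the agreement is |top-d of A intersect top-d of B| / d,
    weighted by p ** (d - 1) and scaled by (1 - p).
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score

# Toy rankings, invented for illustration only.
same = rbo_truncated(["a", "b", "c"], ["a", "b", "c"], p=0.5)
swapped = rbo_truncated(["a", "b", "c"], ["b", "a", "c"], p=0.5)
disjoint = rbo_truncated(["a", "b", "c"], ["x", "y", "z"], p=0.5)
```

Because the sum is cut off at a finite depth, identical lists score 1 - p^depth rather than exactly 1; disagreements near the top of the lists cost more than disagreements further down.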
○ Did not justify chosen parameter values such as the k-value (!)
○ Never explicitly explained how recommendations were made
○ Could incorporate more features other than TF-IDF
   ■ Sentiment analysis
   ■ Measures of repetition and word choice
   ■ Word embeddings for subtleties
○ Never mentioned how the random list was generated for the RBO analysis
○ Can only extract so much information from lyrics
   ■ Using raw sound data could be more fruitful (like bpm)
○ Combine approaches to reap benefits of both
Abhijith Mandya Brendan Abraham
The lyrics network is significantly more biased than the collaborative filtering network, with the top 10% of nodes receiving 65.1% of the possible edges. In comparison, the top 10% of nodes in the collaborative filtering network receive only 22.6%.
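A concentration figure of this kind could be computed roughly as follows. The sketch measures the share of all incoming edges received by the top fraction of nodes ranked by in-degree; interpreting "possible edges" as total incoming edges is an assumption, as is rounding the node count down.

```python
def top_share(in_degree, fraction=0.10):
    """Share of all incoming edges received by the top `fraction` of nodes."""
    degrees = sorted(in_degree.values(), reverse=True)
    total = sum(degrees)
    if total == 0:
        return 0.0
    n_top = max(1, int(fraction * len(degrees)))  # rounds down, at least one node
    return sum(degrees[:n_top]) / total

# Toy in-degrees: one hub with 11 incoming edges, nine nodes with 1 each.
toy_in_degree = {"hub": 11}
toy_in_degree.update({f"node_{i}": 1 for i in range(9)})
share = top_share(toy_in_degree)
```

In a KNN graph every node has the same out-degree, so a skewed in-degree distribution like this means a small set of artists is recommended far more often than the rest.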