Lyric-Based Music Recommendation Paper Authors: Derek Gossi, Mehmet - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Lyric-Based Music Recommendation

Paper Authors: Derek Gossi, Mehmet H Gunes (University of Nevada, Reno)
By: Brendan Abraham, Abhijith Mandya

slide-2
SLIDE 2

Overview

  • The challenge of Music Recommendation
  • Industry standard (Collaborative Filtering) vs Text Mining Approach (Lyrical Analysis)
  • Choice of Data (musiXmatch & Million Song Dataset - MSD)
  • Feature Engineering (TF-IDF, Cosine Similarity, Subgraph Analysis)
  • Ranked recommendations using K-Means
  • Performance comparison against each other and random recommendations
  • Conclusions and way forward
slide-3
SLIDE 3

Collaborative Filtering

  • Today, most music-based recommendation systems use collaborative filtering
  • Based on user preference and not musical or lyrical content
  • Doesn’t scale well when songs lack user ratings

"Hey! I like songs A, B, C, and D"
"Well! I like songs B, C, D and E"
"You should definitely listen to A then"
"You should totally check out E"
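
The exchange above can be sketched in a few lines of code. This is a minimal illustration of the idea only, not the method used by any production system or by the paper: the function name `recommend_from_neighbor` and the use of Jaccard overlap to pick the most similar listener are assumptions made for the example.

```python
def recommend_from_neighbor(likes, user):
    """Suggest songs liked by the most similar other user.

    `likes` maps a user name to the set of songs that user likes.
    Similarity is Jaccard overlap (an assumption for this sketch).
    """
    mine = likes[user]
    best, best_score = None, -1.0
    for other, theirs in likes.items():
        if other == user:
            continue
        union = mine | theirs
        score = len(mine & theirs) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = other, score
    if best is None:
        return []
    # Recommend what the neighbor likes that this user has not listed.
    return sorted(likes[best] - mine)


likes = {
    "listener_1": {"A", "B", "C", "D"},
    "listener_2": {"B", "C", "D", "E"},
}
print(recommend_from_neighbor(likes, "listener_1"))  # ['E']
print(recommend_from_neighbor(likes, "listener_2"))  # ['A']
```

The sketch also shows the cold-start problem from the last bullet: a song that appears in nobody's set can never be returned.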
slide-4
SLIDE 4

Subjectivity in Music Recommendations

ME TRYING TO EXPLAIN MY TASTE IN MUSIC

  • Preferences can span a wide range of genres and styles
  • User ratings and preferences limit the capabilities of recommendation tasks
  • Find factors beyond genre that influence a listener’s probability of enjoying a song.

slide-5
SLIDE 5
The Lyrical Approach

  • Supervised classification of lyrical content into categories using tags
  • Can show importance of rhyme, repetition, and meter
  • This paper uses a complex lyrical network and compares clustering methods on it

slide-6
SLIDE 6
Lyrics Dataset

  • Lyrics for 237,662 tracks and 22,821 unique artists linked to MSD
  • BOW (bag-of-words) format
  • Stemmed using a modified Porter2 stemming algorithm
  • Limited to the top 5,000 words, accounting for ~92% of all unique words

slide-7
SLIDE 7

Methodology: Overview

Lyric Data → TF-IDF → Lyrical VSM → Similarity Matrix → KNN → Graph Analysis
User Listening Data → User-based VSM → Similarity Matrix → KNN → Graph Analysis

slide-8
SLIDE 8

Methodology: Vector Space Models

Song TFM (237k rows; one word vector per song)
  → Group song vectors by artist
Artist TFM (22k unique artists; one word vector per artist)
  → Stopword Removal
Artist TF-IDF Matrix

Goal: Represent an artist’s “vocabulary”
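
The grouping and weighting steps on this slide might look roughly like the sketch below. The slides do not say which TF-IDF variant the paper used, so the weighting here (raw count times log(N / document frequency)) is an assumption, as are the function names and the toy data.

```python
import math
from collections import Counter


def aggregate_by_artist(song_counts, song_artist):
    """Sum per-song word counts into one word vector per artist."""
    artist_counts = {}
    for song, counts in song_counts.items():
        artist_counts.setdefault(song_artist[song], Counter()).update(counts)
    return artist_counts


def tfidf(artist_counts):
    """Weight each artist's counts by log(N / number of artists using the word)."""
    n_artists = len(artist_counts)
    doc_freq = Counter()
    for counts in artist_counts.values():
        doc_freq.update(counts.keys())
    return {
        artist: {w: c * math.log(n_artists / doc_freq[w]) for w, c in counts.items()}
        for artist, counts in artist_counts.items()
    }


# Hypothetical bag-of-words rows: song -> {stemmed word: count}
song_counts = {
    "s1": {"love": 2, "night": 1},
    "s2": {"love": 1},
    "s3": {"love": 3, "street": 2},
}
song_artist = {"s1": "artist_a", "s2": "artist_a", "s3": "artist_b"}

weights = tfidf(aggregate_by_artist(song_counts, song_artist))
```

With this weighting, a word used by every artist ("love" in the toy data) gets weight zero, which is the sense in which the matrix captures what is distinctive about an artist's vocabulary.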

slide-9
SLIDE 9

Methodology: Pairwise Artist Similarity Matrix

  • An n × n matrix computed from the artist TFM (n = number of artists)
  • Measures cosine similarity between artists Ai and Aj

      A1    A2    ...   An
A1    1     .8    ...   .2
A2    .32   1     ...   .4
...   ...   ...   ...   .6
An    .22   .6    ...   1
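
A pairwise cosine similarity matrix over sparse word vectors can be sketched as follows. The vectors and artist labels are made up for illustration, and the dictionary-based representation is an assumption chosen to keep the example dependency-free.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Return the sorted artist names and the n x n matrix of cosines."""
    names = sorted(vectors)
    matrix = [[cosine(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical artist vectors (term -> weight)
vectors = {
    "A1": {"love": 1.0, "night": 1.0},
    "A2": {"love": 1.0},
    "A3": {"street": 2.0},
}
names, sim = similarity_matrix(vectors)
```

Cosine similarity is symmetric, so the entry for (Ai, Aj) equals the entry for (Aj, Ai), and every artist with a non-zero vector has similarity 1 with itself.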

slide-10
SLIDE 10

Methodology: Artist Graph Construction

1. Built KNN based on cosine similarity matrix
   a. Vector: cosine similarities to all other artists
   b. Each artist connected to top K most similar artists
   c. Chose K=10 without much justification -_-
2. Each artist node has:
   a. Outdegree (outgoing edges) of k
   b. Unknown indegree (incoming edges)

[Figure: example artist graph with nodes Eminem, 50 Cent, Dr. Dre, Linkin Park, Carlos Santana, Metallica, Shakira, Los Lonely Boys]
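
The graph construction described on this slide can be sketched as below: every node gets exactly k outgoing edges, while its in-degree depends on how many other nodes rank it in their top k. The similarity values and node labels are invented for the example, and breaking ties by name is an assumption.

```python
def knn_graph(names, sim, k):
    """Connect each node to its k most similar other nodes (directed edges)."""
    edges = {}
    for i, name in enumerate(names):
        others = [j for j in range(len(names)) if j != i]
        # Highest similarity first; ties broken alphabetically (an assumption).
        others.sort(key=lambda j: (-sim[i][j], names[j]))
        edges[name] = [names[j] for j in others[:k]]
    return edges


def in_degrees(edges):
    """Count incoming edges per node."""
    degree = {name: 0 for name in edges}
    for targets in edges.values():
        for target in targets:
            degree[target] += 1
    return degree


names = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
edges = knn_graph(names, sim, k=2)
print(edges)
print(in_degrees(edges))
```

In this toy graph every node has out-degree 2, but B and C each receive three incoming edges while A and D receive one.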

slide-11
SLIDE 11

Methodology: Subgraph Analysis

  • 23 categories and 3 types
  • Each artist has set of tags (latin, spanish … etc)
  • Filter graph by category to only include artists with tags from that category
  • Analyze # of incoming and outgoing edges to subgraph

[Figure: example artist graph with nodes Eminem, 50 Cent, Dr. Dre, Linkin Park, Carlos Santana, Metallica, Shakira, Los Lonely Boys; table with columns Category, Type, Unique Tags]
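
One way to read the subgraph analysis above is as an edge count per category: edges that stay inside the category, edges that leave it, and edges that enter it from outside. The sketch below follows that reading; the function name, the returned keys, and the toy tags are assumptions for illustration.

```python
def subgraph_edge_counts(edges, tags, category_tags):
    """Count edges relative to the set of artists tagged with a category.

    edges: artist -> list of artists it points to
    tags: artist -> set of tags
    category_tags: set of tags that define the category
    """
    members = {artist for artist, t in tags.items() if t & category_tags}
    internal = outgoing = incoming = 0
    for src, targets in edges.items():
        for dst in targets:
            if src in members and dst in members:
                internal += 1   # recommendation stays within the category
            elif src in members:
                outgoing += 1   # recommendation leaves the category
            elif dst in members:
                incoming += 1   # recommendation enters from outside
    return {"members": members, "internal": internal,
            "outgoing": outgoing, "incoming": incoming}


edges = {"A": ["B", "C"], "B": ["A", "C"], "C": ["D", "B"], "D": ["C", "B"]}
tags = {"A": {"latin"}, "B": {"spanish"}, "C": {"rock"}, "D": {"metal"}}
counts = subgraph_edge_counts(edges, tags, {"latin", "spanish"})
```

A category whose internal count is high relative to its outgoing count is one where recommendations tend to stay within the category.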


slide-13
SLIDE 13

Evaluation Approach

1. Compared network topologies of lyric and collaborative-filtering graphs using subgraph analysis
   a. Each connection = a recommendation
   b. Measured how often recommendations stay within genre
      i. Compared # of edges ‘leaving’ subgraph to # of edges ‘staying’ in subgraph
2. For each artist, calculated top 1k most similar artists from both graphs
   a. Calculated difference between lists
   b. Used Rank Biased Overlap (RBO)
3. Measured lyrical graph utility by comparing recs. to randomly generated recs.
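
For readers unfamiliar with RBO, here is a minimal sketch of a truncated form of it: the weighted sum of overlap-at-depth-d, cut off at the length of the shorter list, with no extrapolation term. The slides do not state which RBO variant or persistence parameter p the paper used, so both the truncation and the default p = 0.9 are assumptions.

```python
def rbo_truncated(s, t, p=0.9):
    """Truncated rank-biased overlap of two ranked lists without duplicates.

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * |s[:d] & t[:d]| / d,
    where k is the length of the shorter list.
    """
    depth = min(len(s), len(t))
    seen_s, seen_t = set(), set()
    overlap = 0
    total = 0.0
    for d in range(1, depth + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1
        else:
            # A new item counts once it has appeared in the other list's prefix.
            if x in seen_t:
                overlap += 1
            if y in seen_s:
                overlap += 1
        seen_s.add(x)
        seen_t.add(y)
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total


same = rbo_truncated(["a", "b", "c"], ["a", "b", "c"], p=0.5)
swapped = rbo_truncated(["a", "b", "c"], ["b", "a", "c"], p=0.5)
disjoint = rbo_truncated(["a", "b", "c"], ["x", "y", "z"], p=0.5)
```

Because early ranks carry more weight, swapping the top two items lowers the score far more than swapping two items deep in the list would. In this truncated form, identical lists of length k score 1 - p**k rather than exactly 1.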

slide-14
SLIDE 14

Network Topology Comparison

                  Network Diameter   Average Shortest Path   Clustering Coefficient
Lyrics Network    10                 4.52                    0.217
CF Network        6                  4.22                    0.119

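  • The lyrics network is more tightly clustered (clustering coefficient 0.217 vs 0.119), whereas CF users listen to a broad spectrum of categories
  • The lyrics network has less connectivity between clusters (diameter 10 vs 6): niche lyrical content vs pop genres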
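
The three metrics in the table can be computed on a small graph as in the sketch below. The slides do not say whether the paper measured them on the directed or the undirected graph; treating edges as undirected, measuring paths only between reachable pairs, and counting nodes with fewer than two neighbors as having a clustering coefficient of 0 are all assumptions of this example.

```python
from collections import deque
from itertools import combinations


def to_undirected(edges):
    """Turn artist -> list of targets into a symmetric adjacency map."""
    adj = {}
    for src, targets in edges.items():
        adj.setdefault(src, set())
        for dst in targets:
            if dst != src:
                adj[src].add(dst)
                adj.setdefault(dst, set()).add(src)
    return adj


def bfs_distances(adj, start):
    """Shortest hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def topology(adj):
    """Return (diameter, average shortest path, average clustering coefficient)."""
    lengths = []
    for node in adj:
        lengths.extend(d for other, d in bfs_distances(adj, node).items()
                       if other != node)
    diameter = max(lengths) if lengths else 0
    avg_path = sum(lengths) / len(lengths) if lengths else 0.0
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    clustering = sum(coeffs) / len(coeffs) if coeffs else 0.0
    return diameter, avg_path, clustering


adj = to_undirected({"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []})
diameter, avg_path, clustering = topology(adj)
```

The toy graph is a triangle (A, B, C) with one extra node D attached to C, so it has a short diameter and a high clustering coefficient.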
slide-15
SLIDE 15
Recommendation Performance against Random

Ranking Compared to CF   Mean RBO
Lyrical Ranking          0.0649
Random Ranking           0.0052

  • Lyrical network rankings are about 12.5 times closer to the CF rankings than random recommendations are (mean RBO 0.0649 vs 0.0052)
  • Advantageous to consider when determining the initial recommendations to and from a new or emerging artist or song

slide-16
SLIDE 16
Improvements and Future Work

  • Improvements
    ○ Did not justify chosen parameter values (!) (k-value)
    ○ Never explicitly explained how recommendations were made…
    ○ Could incorporate more features other than TF-IDF
      ■ Sentiment Analysis
      ■ Measures of repetition and word choice
      ■ Word embeddings for subtleties
    ○ Never mentioned how random list was generated for RBO analysis
  • Future Work
    ○ Can only extract so much information from lyrics
      ■ Using raw sound data could be more fruitful (like bpm)
    ○ Combine approaches to reap benefits of both

slide-17
SLIDE 17

THANK YOU

Abhijith Mandya Brendan Abraham

slide-18
SLIDE 18

In-Degree Distribution

The lyrics network is significantly more biased than the collaborative filtering network, with the top 10% of nodes receiving 65.1% of the possible edges. In comparison, the top 10% of nodes in the collaborative filtering network only receive 22.6% of the possible edges.
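
A concentration figure of this kind can be computed from an in-degree table as sketched below. The slide does not define "possible edges"; this example assumes it means the total number of edges in the graph, and the function name and toy degrees are hypothetical.

```python
def top_share(in_degree, fraction=0.10):
    """Share of all incoming edges received by the top `fraction` of nodes."""
    degrees = sorted(in_degree.values(), reverse=True)
    total = sum(degrees)
    if total == 0:
        return 0.0
    # At least one node is always counted as the "top" group.
    top_n = max(1, round(len(degrees) * fraction))
    return sum(degrees[:top_n]) / total


# Hypothetical in-degrees: one hub and nine ordinary nodes.
in_degree = {"hub": 11}
in_degree.update({f"n{i}": 1 for i in range(9)})
share = top_share(in_degree)
```

Here the single hub is the top 10% of ten nodes and receives 11 of the 20 edges, so the share is 0.55.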