Cross-Domain Recommendation via Clustering on Multi-Layer Graphs



slide-1
SLIDE 1

Cross-Domain Recommendation via Clustering on Multi-Layer Graphs

Aug 8th, 2017

Aleksandr Farseev, Ivan Samborskii, Andrey Filchenkov, Tat-Seng Chua

By Aleksandr Farseev, http://farseev.com

slide-2
SLIDE 2

Venue Category Recommendation

Collaborative Venue Category Recommendation: recommendation of venue categories (e.g. restaurant, cinema) to a user, based on information about his/her profile (e.g. past visits) and/or information about other users from the same domain.

Venue categories: Clothing Store, Hotel, Ice Cream Shop, and others (764 different categories in total)

slide-3
SLIDE 3

Idea 1: Utilization of Individual And Group Knowledge for Better Recommendation

slide-4
SLIDE 4

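User Community-Based Collaborative Recommendation

We perform venue category recommendation based on both individual and group knowledge, which naturally models the impact of society on an individual's behavior when selecting a new place to go:

$$\mathrm{rec}(u) = \mathrm{sort}\left(\gamma \cdot \mathrm{vec}_u + \theta \cdot \frac{\sum_{v \in C_u} \mathrm{vec}_v}{|C_u|}\right)$$

Here $u$ is the target user, $C_u$ is the community of $u$, and $\gamma$ and $\theta$ weight the individual and group terms.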
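A minimal sketch of the scoring rule on this slide, written under stated assumptions: `vec` maps each user to a list of per-category scores (for example visit counts), `community` maps a user to the other members of that user's community, and the function name, the default weights, and the decision to leave the user out of the community average are all illustrative choices that the slides do not specify.

```python
def recommend(user, vec, community, gamma=0.5, theta=0.5):
    """Rank category indices by gamma * vec_u + theta * mean(vec_v for v in C_u)."""
    own = vec[user]
    members = community.get(user, [])  # assumed: C_u without the user itself
    n = len(own)
    if members:
        # Average profile of the community (the group knowledge term).
        group = [sum(vec[v][c] for v in members) / len(members) for c in range(n)]
    else:
        group = [0.0] * n
    score = [gamma * own[c] + theta * group[c] for c in range(n)]
    # "sort": category indices ordered by descending combined score.
    return sorted(range(n), key=lambda c: score[c], reverse=True)


# Hypothetical toy data: three users, three venue categories.
vec = {"a": [3, 0, 1], "b": [0, 6, 0], "c": [0, 2, 2]}
community = {"a": ["b", "c"]}
ranking = recommend("a", vec, community)
```

With `theta=0` the ranking falls back to the user's own history, which is the individual-only case of the same formula.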

slide-5
SLIDE 5

What do we need user communities for?

+ Users from the same community (extracted from multi-source data) may have similar location preferences
+ Searching within a user community significantly reduces the search space during the recommendation process

slide-6
SLIDE 6

Example of User Communities (1)

Community 1: Gingers
Community K: Darker Hair

slide-7
SLIDE 7

One way to find user communities is to model users' relationships in the form of a graph so that dense subgraphs are considered to be user communities.

User Relation and Community Representations

slide-8
SLIDE 8

Community Detection based on a single data source

One of the common formulations is the MinCut problem. For a given number $k$ of subsets, MinCut involves choosing a partition $C_1, \dots, C_k$ that minimizes the expression:

$$\mathrm{cut}(C_1, \dots, C_k) = \sum_{i=1}^{k} W(C_i, \bar{C}_i)$$

* $W(C_i, \bar{C}_i)$ is the sum of the weights of the edges between the vertices in $C_i$ and its complement $\bar{C}_i$
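To make the MinCut objective concrete, here is a small sketch that evaluates the cut value of a given partition. It assumes the graph is given as a symmetric weight matrix `W` (nested lists) and the partition as a list of vertex-index lists; the function name and the toy graph are illustrative, not from the slides.

```python
def cut_value(W, partition):
    """Sum over parts C_i of the total weight of edges leaving C_i."""
    n = len(W)
    total = 0.0
    for part in partition:
        inside = set(part)
        for i in inside:
            for j in range(n):
                if j not in inside:
                    total += W[i][j]  # edge from C_i to its complement
    return total


# Toy graph: two strongly tied pairs (0-1 and 2-3) joined by a weak edge 1-2.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
good = cut_value(W, [[0, 1], [2, 3]])
bad = cut_value(W, [[0], [1, 2, 3]])
```

Because every cut edge is counted once from each side, the value is twice the weight of the removed edges; some formulations add a factor of 1/2 for that reason.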

slide-9
SLIDE 9

How to solve the MinCut problem?

Approximation of MinCut as a standard trace minimization problem:

$$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L U), \quad \text{s.t. } U^T U = I$$

which can be solved by Spectral Clustering:

1. Calculate the Laplacian matrix $L \in \mathbb{R}^{N \times N}$
2. Build the matrix $U \in \mathbb{R}^{N \times k}$ of the first $k$ eigenvectors, corresponding to the $k$ smallest eigenvalues of $L$
3. Cluster the data in the new space $U$ using, e.g., the $k$-means algorithm
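The three steps of spectral clustering can be sketched with NumPy as follows. This is an illustration under assumptions rather than the authors' implementation: it uses the unnormalized Laplacian $L = D - W$ (the slides do not say which Laplacian is used) and a deliberately simple k-means with a deterministic farthest-point start; all function names are made up for this example.

```python
import numpy as np


def laplacian(W):
    """Step 1: unnormalized graph Laplacian L = D - W (assumed variant)."""
    return np.diag(W.sum(axis=1)) - W


def spectral_embedding(L, k):
    """Step 2: eigenvectors of the k smallest eigenvalues, shape (N, k)."""
    _, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, :k]


def kmeans(X, k, n_iter=50):
    """Step 3: plain Lloyd iterations with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def spectral_clustering(W, k):
    return kmeans(spectral_embedding(laplacian(W), k), k)


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by one weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_clustering(W, 2)
```

On this toy graph the two triangles should end up in different clusters, since the only edge between them is weak.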

slide-10
SLIDE 10

Idea 2: Utilization of Multi-Source Data

slide-11
SLIDE 11

Most users actively use ≈3 social networks

Accounts: ~6 registered social network accounts per person*
Active usage: people actively use ~3 social platforms simultaneously*

* GlobalWebIndex. 2016. GWI Social report. http://www.globalwebindex.net/blog/internet-users-have-average-of-5-social-media-accounts

slide-12
SLIDE 12

Multi-source data describe a user from multiple views

slide-13
SLIDE 13

Cross-Domain Venue Category Recommendation

Cross-Domain Venue Category Recommendation: recommendation of venue categories (e.g. restaurant, cinema) to a user, based on information about his/her profile (e.g. past visits) and/or information about users from other sources (e.g. images, texts, location types).

Venue categories: Clothing Store, Hotel, Ice Cream Shop

slide-14
SLIDE 14

Community detection must be performed in a cross-source manner...

Problems:

  • Data source integration
  • Community detection

slide-15
SLIDE 15

How to represent multi-source data?

Multi-layer graph: a graph $G$, where $G = \{G_i\}$ and $G_i = (V, E_i)$. All layers share the same vertex set $V$; each layer $i$ has its own edge set $E_i$.
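One possible in-memory representation of such a multi-layer graph is sketched below: a single shared vertex list plus one weighted edge set per layer. The class name, the layer names, and the edge weights are hypothetical; how edges are actually derived from each data source is not described on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLayerGraph:
    vertices: list                               # shared vertex set V (e.g. user ids)
    layers: dict = field(default_factory=dict)   # layer name -> E_i as {(u, v): weight}

    def add_edge(self, layer, u, v, weight=1.0):
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("both endpoints must belong to the shared vertex set V")
        edges = self.layers.setdefault(layer, {})
        edges[(u, v)] = edges[(v, u)] = weight   # undirected edge

    def adjacency(self, layer):
        """Dense weight matrix of one layer, indexed in the order of `vertices`."""
        idx = {v: i for i, v in enumerate(self.vertices)}
        n = len(self.vertices)
        W = [[0.0] * n for _ in range(n)]
        for (u, v), w in self.layers.get(layer, {}).items():
            W[idx[u]][idx[v]] = w
        return W


g = MultiLayerGraph(["a", "b", "c"])
g.add_edge("text", "a", "b", 0.8)   # hypothetical text-based similarity
g.add_edge("image", "b", "c")       # hypothetical image-based similarity
```

Keeping one vertex list for all layers mirrors the definition $G_i = (V, E_i)$: the same users appear in every layer, and only the edges differ.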

slide-16
SLIDE 16

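Extending definition of spectral clustering

$$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T L_i U), \quad \text{s.t. } U^T U = I$$

By linearity of the trace, this is the same as

$$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L_{sum} U), \quad \text{where } L_{sum} = \sum_{i=1}^{M} L_i$$

Such an approximation could suffer from poor generalization ability.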

slide-17
SLIDE 17

Regularized Clustering on Multi-layer Graph (1)

Use Grassmann manifolds to keep the final latent representation "close" to all layers of the multi-layer graph*. The projection distance between two subspaces $Y_1$ and $Y_2$ is

$$d_{proj}^2(Y_1, Y_2) = \frac{1}{2} \lVert Y_1 Y_1^T - Y_2 Y_2^T \rVert_F^2,$$

where $\lVert A \rVert_F$ is the Frobenius norm. Summing over the $M$ layer subspaces $U_i$ gives

$$d_{proj}^2\left(U, \{U_i\}_{i=1}^{M}\right) = \sum_{i=1}^{M} d_{proj}^2(U, U_i) = kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T U_i U_i^T)$$

* X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. IEEE Transactions on Signal Processing, 2014.
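The two forms of the projection distance can be checked numerically. The sketch below assumes each subspace is represented by an $N \times k$ matrix with orthonormal columns; the helper names and the random test matrices are illustrative only.

```python
import numpy as np


def orthonormal_basis(A):
    """Orthonormal basis for the column space of A (reduced QR)."""
    Q, _ = np.linalg.qr(A)
    return Q


def proj_dist_sq(Y1, Y2):
    """Definition: 0.5 * ||Y1 Y1^T - Y2 Y2^T||_F^2."""
    D = Y1 @ Y1.T - Y2 @ Y2.T
    return 0.5 * np.linalg.norm(D, "fro") ** 2


def proj_dist_sq_trace(Y1, Y2):
    """Equivalent trace form for orthonormal bases: k - tr(Y1 Y1^T Y2 Y2^T)."""
    k = Y1.shape[1]
    return k - np.trace(Y1 @ Y1.T @ Y2 @ Y2.T)


rng = np.random.default_rng(0)
Y1 = orthonormal_basis(rng.normal(size=(6, 2)))
Y2 = orthonormal_basis(rng.normal(size=(6, 2)))
d_norm = proj_dist_sq(Y1, Y2)
d_trace = proj_dist_sq_trace(Y1, Y2)
```

The trace form is what turns the distance into a term that can be folded into a Laplacian-like matrix, because $\mathrm{tr}(U U^T U_i U_i^T) = \mathrm{tr}(U^T U_i U_i^T U)$.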

slide-18
SLIDE 18

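Regularized Clustering on Multi-layer Graph (2)

Extends the objective function to introduce the subspace analysis regularization:

$$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T L_i U) + \alpha \left( kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T U_i U_i^T) \right), \quad \text{s.t. } U^T U = I$$

Dropping the constant $\alpha k M$, this is equivalent to

$$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L_{mod} U), \qquad L_{mod} = \sum_{i=1}^{M} \left( L_i - \alpha U_i U_i^T \right)$$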
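A short sketch of how the modified Laplacian $L_{mod} = \sum_i (L_i - \alpha U_i U_i^T)$ could be assembled, where $U_i$ holds the first $k$ eigenvectors of layer $i$ and $\alpha$ is the regularization weight. It assumes unnormalized Laplacians and symmetric weight matrices; the function names and the toy layers are not from the slides.

```python
import numpy as np


def laplacian(W):
    """Unnormalized Laplacian L = D - W (assumed; the slides do not specify)."""
    return np.diag(W.sum(axis=1)) - W


def first_k_eigvecs(L, k):
    """Eigenvectors of the k smallest eigenvalues of a symmetric matrix."""
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]


def modified_laplacian(laplacians, k, alpha):
    """L_mod = sum_i (L_i - alpha * U_i U_i^T)."""
    L_mod = np.zeros_like(laplacians[0], dtype=float)
    for L_i in laplacians:
        U_i = first_k_eigvecs(L_i, k)     # per-layer spectral embedding
        L_mod += L_i - alpha * (U_i @ U_i.T)
    return L_mod


# Two toy layers over the same four users.
W1 = np.array([[0, 1, 0, 0], [1, 0, 0.2, 0], [0, 0.2, 0, 1], [0, 0, 1, 0]], dtype=float)
W2 = np.array([[0, 2, 0.1, 0], [2, 0, 0, 0], [0.1, 0, 0, 2], [0, 0, 2, 0]], dtype=float)
Ls = [laplacian(W1), laplacian(W2)]
L_mod = modified_laplacian(Ls, k=2, alpha=0.5)
U = first_k_eigvecs(L_mod, 2)   # joint embedding, to be clustered with k-means
```

Setting `alpha=0` recovers the plain sum of Laplacians, so the regularizer can be switched off to compare against the unregularized baseline.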

slide-19
SLIDE 19

Idea 4: Making use of Inter-Layer (Inter-Source) Relations

slide-20
SLIDE 20

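Incorporating inter-layer relationship (1)

By using the distance on Grassmann manifolds, we present the new objective function for the $i$-th layer:

$$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T L_i \hat{U}_i) + \beta_i \left( kM - \sum_{j=1, j \neq i}^{M} w_{i,j} \, \mathrm{tr}(\hat{U}_i \hat{U}_i^T U_j U_j^T) \right)$$

Dropping the constant term, this becomes

$$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T \hat{L}_i \hat{U}_i), \qquad \hat{L}_i = L_i - \beta_i \sum_{j=1, j \neq i}^{M} w_{i,j} \, U_j U_j^T$$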

slide-21
SLIDE 21

But how can we determine $w_{i,j}$ when computing the $i$-th layer?

$$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T \hat{L}_i \hat{U}_i), \qquad \hat{L}_i = L_i - \beta_i \sum_{j=1, j \neq i}^{M} w_{i,j} \, U_j U_j^T$$

Inter-layer relationship graph $R(V, E)$: a weighted graph which represents the similarity between layers.

$$\forall (i, j) \in E: \quad w_{i,j} = \frac{1}{K - 1} \sum_{k=2}^{K} \left( 1 - \frac{\lVert M_{i,k} - M_{j,k} \rVert}{N(N - 1)} \right)$$

where $M_{i,k}$ is the clustering co-occurrence matrix of layer $i$ for $k$ clusters: $m_{a,b} = 1$ if users $a$ and $b$ are assigned to the same cluster, and 0 otherwise.
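The inter-layer weight can be sketched as follows: for each number of clusters $k = 2, \dots, K$, build the co-occurrence matrix of each layer, count the user pairs on which two layers disagree, and average the resulting agreement. The code assumes the cluster labels per layer and per $k$ are already available, and it reads the matrix difference as an entrywise count of disagreements; that reading, like the function names, is an assumption.

```python
def cooccurrence(labels):
    """m[a][b] = 1 if users a and b share a cluster, else 0."""
    n = len(labels)
    return [[1 if labels[a] == labels[b] else 0 for b in range(n)] for a in range(n)]


def layer_similarity(labelings_i, labelings_j):
    """Average agreement of two layers.

    Each argument maps k (number of clusters, k = 2..K) to a list of cluster
    labels, one label per user, in the same user order for both layers.
    """
    ks = sorted(labelings_i)
    n = len(labelings_i[ks[0]])
    total = 0.0
    for k in ks:
        M_i = cooccurrence(labelings_i[k])
        M_j = cooccurrence(labelings_j[k])
        # Entrywise count of ordered user pairs on which the layers disagree.
        disagreements = sum(
            abs(M_i[a][b] - M_j[a][b]) for a in range(n) for b in range(n)
        )
        total += 1.0 - disagreements / (n * (n - 1))
    return total / len(ks)   # len(ks) equals K - 1 when ks = 2..K


same = layer_similarity({2: [0, 0, 1, 1]}, {2: [1, 1, 0, 0]})
different = layer_similarity({2: [0, 0, 1, 1]}, {2: [0, 1, 0, 1]})
```

Because only co-membership is compared, the measure does not depend on how clusters are numbered, which is why the relabeled partition in `same` still gives full agreement.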

slide-22
SLIDE 22

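Final objective function

Let's combine the equations from the previous slides to define the final objective function:

$$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T \hat{L}_i U) + \alpha \left( kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T \hat{U}_i \hat{U}_i^T) \right)$$

which, up to the constant $\alpha k M$, equals

$$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}\left( U^T \left[ \sum_{i=1}^{M} \left( \hat{L}_i - \alpha \hat{U}_i \hat{U}_i^T \right) \right] U \right)$$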
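Putting the pieces together, the whole procedure can be sketched in a few lines of NumPy: compute per-layer embeddings $U_i$, form the inter-layer regularized Laplacians $\hat{L}_i$, compute $\hat{U}_i$ from them, and take the first $k$ eigenvectors of the combined matrix. The inter-layer weights `w` are passed in as given, the Laplacians are unnormalized, and every name here is an assumption made for illustration, not the authors' code.

```python
import numpy as np


def laplacian(W):
    return np.diag(W.sum(axis=1)) - W


def first_k_eigvecs(L, k):
    _, vecs = np.linalg.eigh(L)   # ascending eigenvalues
    return vecs[:, :k]


def final_embedding(laplacians, w, k, alpha, betas):
    M = len(laplacians)
    U = [first_k_eigvecs(L, k) for L in laplacians]            # U_i per layer
    L_hat = []
    for i in range(M):
        # hat{L}_i = L_i - beta_i * sum_{j != i} w_ij U_j U_j^T
        reg = sum(w[i][j] * (U[j] @ U[j].T) for j in range(M) if j != i)
        L_hat.append(laplacians[i] - betas[i] * reg)
    U_hat = [first_k_eigvecs(L, k) for L in L_hat]             # hat{U}_i
    # Final matrix: sum_i (hat{L}_i - alpha * hat{U}_i hat{U}_i^T)
    L_final = sum(L_hat[i] - alpha * (U_hat[i] @ U_hat[i].T) for i in range(M))
    return first_k_eigvecs(L_final, k)


def two_cliques(weight, bridge):
    """Toy layer: triangles {0,1,2} and {3,4,5} plus one weak bridge edge."""
    W = np.zeros((6, 6))
    for grp in ([0, 1, 2], [3, 4, 5]):
        for a in grp:
            for b in grp:
                if a != b:
                    W[a, b] = weight
    W[bridge[0], bridge[1]] = W[bridge[1], bridge[0]] = 0.1
    return W


Ls = [laplacian(two_cliques(1.0, (2, 3))), laplacian(two_cliques(2.0, (0, 5)))]
w = [[0.0, 1.0], [1.0, 0.0]]   # assumed inter-layer weights
U = final_embedding(Ls, w, k=2, alpha=0.5, betas=[0.5, 0.5])
```

The rows of `U` would then be clustered (for instance with k-means) to obtain the user communities.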

slide-23
SLIDE 23
Problems

  • Community detection
  • Data source integration

slide-24
SLIDE 24

Recall: Community-Based Cross-Domain Recommendation

We perform venue category recommendation based on both individual and group knowledge, where group knowledge is obtained from multiple sources:

$$\mathrm{rec}(u) = \mathrm{sort}\left(\gamma \cdot \mathrm{vec}_u + \theta \cdot \frac{\sum_{v \in C_u} \mathrm{vec}_v}{|C_u|}\right)$$

slide-25
SLIDE 25

NUS-MSS Dataset

The dataset* is presented as a set of features extracted from user-generated data in three social networks:

  • text-based features from Twitter (LDA, LIWC, text features)
  • image-based features from Instagram (concepts)
  • location-based features from Foursquare (LDA, categories, mobility features)

The Foursquare category data are split into two parts: 3 months of data (train) and 2 months (test).

* A. Farseev, N. Liqiang, M. Akbari, and T.-S. Chua. Harvesting multiple sources for user profile learning: a Big data study. ACM International Conference on Multimedia Retrieval (ICMR). China. June 23-26, 2015.

slide-26
SLIDE 26

Data Sources

Text Features:

  • Linguistic features: LIWC; Latent Topics
  • Heuristic features: Writing behavior

Location Features:

  • Location Semantics: Venue Category Distribution
  • Mobility Features: Areas of Interest (AOI)

Image Features:

  • Image Concept Distribution (ImageNet)

[Figure labels: Images, Image Concepts, Google Net, LIWC, LDA, Mobility, Location Type Preferences]

slide-27
SLIDE 27

Evaluation Baselines

Recommender Systems

  • Popular (POP): recommendation based on the user's past experience
  • Popular All (POP All): recommendation based on the experience of all users
  • Multi-Source Re-Ranking (MSRR): linearly combines recommendation results from all data modalities
  • Nearest Neighbor Collaborative Filtering (CF): recommendation based on the top k most similar Foursquare users
  • Early Fusion (EF): fuses multi-source data into a single feature vector
  • SVD++: makes use of the "implicit feedback" information
  • FM: brings together the advantages of different factorization-based models via regularization

Community Detection Approaches

  • C³R - $\hat{L}_i$: C³R recommendation without inter-layer regularization
  • C³R - $\hat{L}_i$ - $L_{mod}$: C³R recommendation without inter-layer regularization and sub-space regularization
  • C³R - Comm: C³R recommendation without user community extraction
  • C³R (DBSCAN): C³R recommendation, where user communities are detected by density-based clustering (DBSCAN)
  • C³R (x-means): C³R recommendation, where user communities are detected by x-means clustering
  • C³R (Hierarchical): C³R recommendation, where user communities are detected by hierarchical clustering
  • C³R: our approach

slide-28
SLIDE 28

Evaluation against other recommender systems

slide-29
SLIDE 29

Evaluation against other community detection approaches

+ Incorporation of group knowledge is important
+ Multi-modal clustering performs better than single-source clustering
+ Incorporation of inter-source relationships is crucial

slide-30
SLIDE 30

Evaluation against source combinations

+ In different geo regions, different data sources are of different importance
+ Location data is more powerful than other data modalities

slide-31
SLIDE 31

Examples of detected user communities

slide-32
SLIDE 32

Future Work

Community Detection is more useful when it is Source-Dependent => Introduce Supervision Into Clustering

How?

  • Graph Construction Level: reweight edges according to prior knowledge about existing user communities
  • Model Level: introduce community-related constraints into clustering

slide-33
SLIDE 33

Summary

+ Multi-View Data is crucial for User Community Detection
+ For the task of venue category recommendation, both Group and Individual Knowledge are important
+ Venue Category Recommendation is not a conventional recommendation task: users often revisit venue types they visited in the past (items from the train set often occur in the test set)

slide-34
SLIDE 34

The Released Datasets

Our released large multi-source multi-modal datasets:

  • NUS-MSS: http://nusmss.azurewebsites.net
  • NUS-SENSE: http://nussense.azurewebsites.net

Our Tutorial on Multi-View Learning @ WST WSSS'17: http://tutorial.farseev.com

slide-35
SLIDE 35

Thank You

Questions?

By Aleksandr Farseev, http://farseev.com
