Some graph optimization problems in data mining, P. Van Dooren (CESAME, Univ. catholique Louvain), University of Chicago, October 16, 2012

SLIDE 1

Some graph optimization problems in data mining

University of Chicago, October 16, 2012

P. Van Dooren, CESAME, Univ. catholique Louvain

based on work in collaboration with the group on

SLIDE 2

Call density over 6 months: Leuven (Lambiotte et al, Phys Rev, 2008)

SLIDE 3

Call density over 6 months: Brussels (Lambiotte et al, Phys Rev, 2008)

SLIDE 4

Ref: Melchior, Eng. Thesis, UCL

SLIDE 5

Outline of the talk

Reputation systems

Application to MovieLens Database

Similarity matrix of two graphs

Application to Synonym Extraction

Concluding remarks

SLIDE 6

What is a reputation system?

MovieLens

SLIDE 7

Motivation

Detecting dishonest participants in auction systems
Removing spammers in on-line review databases (MovieLens)
Giving a grade (reputation) to web raters
Evaluating the trust of nodes in peer-to-peer systems

SLIDE 8

Reputation of raters and objects

Given a bipartite graph with n raters and m objects and votes on the edges, what should be the reputation of these n+m items?

Example: graph and matrix form X (votes)
[Figure: bipartite graph with nodes r1, r2, r3 and o1, o2, next to the vote matrix X]
Characterize the reputation f of the raters and r of the objects

SLIDE 9

Reputation of raters and objects

[Figure: bipartite graph with values 4.2, 4.5, 2.8, 3.4, 3.3, 4.9]
f ? f1 = 4.6, f2 = 4.2, f3 = 3
r ?
Belief divergence = Variance

SLIDE 11

Reputation of raters and objects

[Figure: bipartite graph with values 4.2, 4.5, 2.8, 3.4, 3.3, 4.9]
f ? After convergence: f1 = 5, f2 = 4.8, f3 = 1.4
r ?
Belief divergence = Variance

SLIDE 12

Our approach

Assume that every rater evaluates all objects with a vote ∈ [0,1], and that X and f > 0 are the voting matrix and the raters’ reputation
The object’s reputation vector r is the weighted sum of the votes
The rater’s reputation f depends on the discrepancy with the other votes
There is a unique pair of vectors r and f satisfying these formulas when d is sufficiently large
De Kerchove-VD, SIAM News 08

SLIDE 13

Nonlinear iteration

These two formulas define a nonlinear iteration on r and f, where the voting matrix could be dynamic and then changes at each iteration. If the matrix X is fixed, we can prove:

Theorem: If d > m, the iteration converges to the unique fixed point, which gives the reputations r of the objects and f(r) of the raters.
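The update formulas themselves did not survive in this transcript, so the following Python sketch rests on assumed forms: r = X^T f / sum(f) for the objects and f_i = d - ||x_i - r||^2 for the raters. These are one choice consistent with "weighted sum of the votes", "discrepancy with the other votes" and the condition d > m. The function name and its parameters are hypothetical.

```python
import numpy as np

def reputation_iteration(X, d, iters=100):
    """Iterate r = X^T f / sum(f), f_i = d - ||x_i - r||^2 (assumed forms).

    X : (n, m) array of votes in [0, 1], one row per rater; every rater
        is assumed to vote on every object.
    d : scalar with d > m, which keeps every weight f_i positive.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if d <= m:
        raise ValueError("this sketch assumes d > m")
    f = np.ones(n)                            # equal trust: first r is the plain mean
    for _ in range(iters):
        r = X.T @ f / f.sum()                 # weighted average of the votes
        f = d - ((X - r) ** 2).sum(axis=1)    # penalize squared discrepancy
    return r, f

# Three raters, two objects; the third rater disagrees with the other two.
X = np.array([[0.8, 0.2],
              [0.8, 0.2],
              [0.1, 0.9]])
r, f = reputation_iteration(X, d=2.5)
```

In this toy case the third rater ends up with the smallest weight, and r moves away from the plain column means towards the votes of the two agreeing raters.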

SLIDE 14

Cost function

If d > m, the fixed point of our iteration corresponds to the minimum of a cost function defined on the unit hypercube [0,1]^m

E.g. for m=2: [Figure: the energy function for d>2 and for d=1.5]

SLIDE 15

Convergence

One iteration step corresponds to a steepest descent step (with a particular step size), and this converges monotonically to r*, since the decrease of the cost function at each step is bounded in terms of ||r_{k+1} - r_k||_2
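The cost function is not legible in this transcript. As a hedged illustration only, assume the rater weights are f_i(r) = d - ||x_i - r||^2 and the object update is r = X^T f / (1^T f). Under those assumptions, one cost function whose stationary points are exactly the fixed points, and for which one iteration is a steepest descent step, can be worked out as follows.

```latex
\begin{align*}
E(r) &= -\tfrac14 \sum_{i=1}^{n} f_i(r)^2,
  \qquad f_i(r) = d - \|x_i - r\|_2^2, \\
\nabla E(r) &= -\tfrac12 \sum_{i=1}^{n} f_i(r)\,\nabla f_i(r)
  = -\sum_{i=1}^{n} f_i(r)\,(x_i - r)
  = (\mathbf{1}^T f)\, r - X^T f, \\
r - \frac{1}{\mathbf{1}^T f}\,\nabla E(r) &= \frac{X^T f}{\mathbf{1}^T f}.
\end{align*}
```

The last line shows that, with step size 1/(1^T f), a gradient step reproduces the weighted-average update, and that the gradient vanishes exactly when r = X^T f / (1^T f).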

SLIDE 16

Data set consists of 100,000 ratings (1-5) from 943 users on 1682 movies. Each user has rated at least 20 movies. The data was collected through the MovieLens web site (movielens.umn.edu) during a seven-month period.

237 spammers (always scoring 1, except for their unique best friend, which receives the maximum: 5) are added (+25%).

The mean (left) is less robust than our iteration (middle), which also gives good results for the raters’ reputations (right).

Convergence for spammer separation after step 1, 2 and Inf
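The effect described above can be reproduced on a small synthetic example. This is not the MovieLens experiment: the data, the vote scale [0,1], the dense vote matrix and the update forms (r = X^T f / sum(f), f_i = d - ||x_i - r||^2) are all assumptions made for this sketch.

```python
import numpy as np

def iterate_reputation(X, d, iters=50):
    # Assumed update forms (not given in this transcript):
    # r = X^T f / sum(f) and f_i = d - ||x_i - r||^2, with d > m.
    f = np.ones(X.shape[0])
    for _ in range(iters):
        r = X.T @ f / f.sum()
        f = d - ((X - r) ** 2).sum(axis=1)
    return r, f

rng = np.random.default_rng(0)
truth = np.array([0.9, 0.7, 0.5, 0.3, 0.6])        # hypothetical "true" quality
honest = np.clip(truth + 0.02 * rng.standard_normal((40, 5)), 0.0, 1.0)
spam = np.zeros((10, 5))                           # lowest vote everywhere ...
spam[:, 4] = 1.0                                   # ... except for one "best friend"
X = np.vstack([honest, spam])                      # 10 spammers on 40 raters: +25%

r_mean = X.mean(axis=0)                            # plain average of the votes
r_iter, f = iterate_reputation(X, d=5.1)           # d > m = 5
err_mean = np.linalg.norm(r_mean - truth)
err_iter = np.linalg.norm(r_iter - truth)
```

Here `err_iter` is smaller than `err_mean`, and every spammer receives a lower weight in `f` than every honest rater, which is the separation the slide refers to.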

SLIDE 17

Some remarks

Strengths:

linear complexity (in the number of votes)
applicable to any graph and with any rating matrix
can be dynamic (varying matrix Xk)
reputations for the raters
robust against attackers and spammers

Further study:

choice of the function
stability for the dynamic case
mixing raters and objects

SLIDE 18

Similarity matrix of two arbitrary graphs

For A and B the adjacency matrices of the two graphs, S solves ρS = A S B^T + A^T S B
This matrix can be obtained as the fixed point of the power method (linear)

Ref: Blondel et al, SIAM Rev., ‘04

SLIDE 19

Similarity matrix of two arbitrary graphs

For A and B the adjacency matrices of the two graphs, S solves ρS = A S B^T + A^T S B
Element S_{54} says how similar node 5 of A is to node 4 of B

SLIDE 20

Similarity matrix of two arbitrary graphs

For A and B the adjacency matrices of the two graphs, S solves ρS = A S B^T + A^T S B
Element S_{43} says how similar node 4 of A is to node 3 of B

SLIDE 21

Similarity matrix of two arbitrary graphs

For A and B the adjacency matrices of the two graphs, S solves ρS = A S B^T + A^T S B
Two nodes are similar if their parents and children are similar
Such a recursive definition leads to an eigenvector equation

SLIDE 22

Algorithm?

The (normalized) sequence
Z_{k+1} = (A Z_k B^T + A^T Z_k B) / ||A Z_k B^T + A^T Z_k B||_F
has even and odd subsequences that converge to two limits Z_even and Z_odd for every Z_0 > 0
Similarity matrix S = lim_{k→∞} Z_{2k}, with Z_0 = 1 (the all-ones matrix)
S_{i,j} is the similarity score between V_i (A) and V_j (B)
With z_k = vec(Z_k), this is equivalent to the power method
z_{k+1} = (B ⊗ A + B^T ⊗ A^T) z_k / ||(B ⊗ A + B^T ⊗ A^T) z_k||_2
which is the power method on M = B ⊗ A + B^T ⊗ A^T
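A minimal NumPy sketch of this iteration and of its vectorized form. The function names are hypothetical, dense matrices are used for brevity, and vec() is taken column by column, so that vec(A Z B^T) = (B ⊗ A) vec(Z).

```python
import numpy as np

def similarity_matrix(A, B, iters=100):
    """Even iterate Z_{2k} of Z <- (A Z B^T + A^T Z B) / ||.||_F, Z_0 = ones."""
    if iters % 2:
        raise ValueError("use an even number of iterations")
    Z = np.ones((A.shape[0], B.shape[0]))
    for _ in range(iters):
        Z = A @ Z @ B.T + A.T @ Z @ B
        Z /= np.linalg.norm(Z, "fro")
    return Z

def kron_step(A, B, z):
    """One power-method step on M = B (x) A + B^T (x) A^T, with z = vec(Z)."""
    M = np.kron(B, A) + np.kron(B.T, A.T)
    z = M @ z
    return z / np.linalg.norm(z)

# Path graph 1 -> 2 -> 3 compared with itself.
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
S = similarity_matrix(P, P)
```

For this path graph compared with itself, S is the identity scaled to unit Frobenius norm (diagonal entries 1/sqrt(3)): each node is similar only to itself. Forming M explicitly is only sensible for tiny graphs; the matrix form of the iteration avoids it.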

SLIDE 23

Some properties

Satisfies ρS = A S B^T + A^T S B, ρ = ||A S B^T + A^T S B||_F
It is the nonnegative fixed point S of largest 1-norm
It solves the optimization problem max ⟨A S B^T + A^T S B, S⟩ subject to ||S||_F = 1
Extension of Kleinberg’s HITS method
Linear convergence (power method for sparse M)
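To illustrate the link with HITS: if A is the two-node graph hub -> authority, the update splits into s_hub <- B s_auth and s_auth <- B^T s_hub, which is the HITS recursion on B. The sketch below checks this numerically on a small example graph B that is purely hypothetical; the helper name is also an assumption.

```python
import numpy as np

def similarity_matrix(A, B, iters=100):
    Z = np.ones((A.shape[0], B.shape[0]))
    for _ in range(iters):              # iters is kept even: S is the even limit
        Z = A @ Z @ B.T + A.T @ Z @ B
        Z /= np.linalg.norm(Z, "fro")
    return Z

hub_auth = np.array([[0.0, 1.0],        # structure graph: hub -> authority
                     [0.0, 0.0]])
B = np.array([[0, 1, 1],                # hypothetical 3-node example graph
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

S = similarity_matrix(hub_auth, B)
hubs, auths = S[0], S[1]                # row 0: hub scores, row 1: authority scores

# Reference: HITS scores are the principal singular vectors of B.
U, _, Vt = np.linalg.svd(B)
hits_hubs, hits_auths = np.abs(U[:, 0]), np.abs(Vt[0])
```

After rescaling each row of S to unit length, `hubs` and `auths` agree with `hits_hubs` and `hits_auths` for this example.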

SLIDE 24

The dictionary graph

Nodes = words present in the dictionary: 112,169 nodes

Edge (u,v) if v appears in the definition of u: 1,398,424 edges
Average of 12 edges per node

Ref: Blondel et al, SIAM Rev., ‘04
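The edge rule above can be sketched on a toy dictionary. The entries below are invented for illustration, and the function name is hypothetical; for a graph of the size quoted on the slide a sparse matrix format would be the natural choice instead of a dense array.

```python
import numpy as np

# Hypothetical toy dictionary: word -> words used in its definition.
definitions = {
    "vanish": ["disappear", "sight"],
    "disappear": ["cease", "sight"],
    "cease": ["end"],
    "sight": [],
    "end": ["cease"],
}

def dictionary_graph(definitions):
    """Adjacency matrix with an edge (u, v) if v appears in the definition of u."""
    words = sorted(definitions)
    index = {w: i for i, w in enumerate(words)}
    A = np.zeros((len(words), len(words)))
    for u, body in definitions.items():
        for v in body:
            if v in index:              # skip words without an entry of their own
                A[index[u], index[v]] = 1.0
    return words, A

words, A = dictionary_graph(definitions)
```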

SLIDE 25

Neighborhood graph

The neighborhood graph is the subset of vertices used for finding synonyms: it contains “all” parents and children of the node
[Figure: neighborhood graph of “likely”]
“Central” uses this sub-graph to rank synonyms automatically
Rank each node in the graph by its similarity to node c in the path graph b → c → e

Ref: Blondel et al, SIAM Rev., ‘04
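A sketch of the two steps described on this slide, under the assumption that the structure graph is the three-node path b -> c -> e and that candidates are ranked by the row of the similarity matrix belonging to c. Function names and the toy edge list are hypothetical.

```python
import numpy as np

def neighborhood_graph(edges, word):
    """Sub-graph induced by `word`, its parents and its children."""
    keep = {word}
    for u, v in edges:
        if u == word:
            keep.add(v)                 # child: appears in the definition of `word`
        if v == word:
            keep.add(u)                 # parent: uses `word` in its definition
    nodes = sorted(keep)
    index = {w: i for i, w in enumerate(nodes)}
    B = np.zeros((len(nodes), len(nodes)))
    for u, v in edges:
        if u in index and v in index:
            B[index[u], index[v]] = 1.0
    return nodes, B

def central_scores(B, iters=100):
    """Similarity of every node of B to the middle node of the path b -> c -> e."""
    P = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
    Z = np.ones((3, B.shape[0]))
    for _ in range(iters):              # even count: S is the limit of Z_{2k}
        Z = P @ Z @ B.T + P.T @ Z @ B
        Z /= np.linalg.norm(Z, "fro")
    return Z[1]                         # row of the central node c

# Hypothetical toy edge list: (u, v) means v appears in the definition of u.
edges = [("p", "w"), ("p", "s"), ("w", "c"), ("s", "c"), ("w", "s"), ("x", "y")]
nodes, B = neighborhood_graph(edges, "w")
scores = dict(zip(nodes, central_scores(B)))
```

In this toy graph, "s" shares a parent and a child with "w" and obtains the same central score as "w", above the scores of "p" and "c"; the unrelated pair "x", "y" is not part of the neighborhood graph.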

SLIDE 26

Disappear

|  | Vectors | Central | ArcRank | Wordnet | Microsoft |
|---|---|---|---|---|---|
| 1 | vanish | vanish | epidemic | vanish | vanish |
| 2 | wear | pass | disappearing | go away | cease to exist |
| 3 | die | die | port | end | fade away |
| 4 | sail | wear | dissipate | finish | die out |
| 5 | faint | faint | cease | terminate | go |
| 6 | light | fade | eat | cease | evaporate |
| 7 | port | sail | gradually |  | wane |
| 8 | absorb | light | instrumental |  | expire |
| 9 | appear | dissipate | darkness |  | withdraw |
| 10 | cease | cease | efface |  | pass away |
| Mark | 3.6 | 6.3 | 1.2 | 7.5 | 8.6 |
| Std Dev | 1.8 | 1.7 | 1.2 | 1.4 | 1.3 |

Vectors, Central and ArcRank are automatic; Wordnet and Microsoft Word are manual

SLIDE 27

Sugar

|  | Vectors | Central | ArcRank | Wordnet | Microsoft |
|---|---|---|---|---|---|
| 1 | juice | cane | granulation | sweetening | darling |
| 2 | starch | starch | shrub | sweetener | baby |
| 3 | cane | sucrose | sucrose | carbohydrate | honey |
| 4 | milk | milk | preserve | saccharide | dear |
| 5 | molasses | sweet | honeyed | organic compound | love |
| 6 | sucrose | dextrose | property | saccarify | dearest |
| 7 | wax | molasses | sorghum | sweeten | beloved |
| 8 | root | juice | grocer | dulcify | precious |
| 9 | crystalline | glucose | acetate | edulcorate | pet |
| 10 | confection | lactose | saccharine | dulcorate | babe |
| Mark | 3.9 | 6.3 | 4.3 | 6.2 | 4.7 |
| Std Dev | 2.0 | 2.4 | 2.3 | 2.9 | 2.7 |

SLIDE 28

[Figure labels: ||S||_F = 1; U^T U = V^T V = I_k]

SLIDE 29

Optimization problems

The fixed point of ρS = A S B^T + A^T S B, ρ = ||A S B^T + A^T S B||_F, corresponds to
max ⟨A S B^T + A^T S B, S⟩ subject to ||S||_F = 1
The fixed point of U Σ V^T = Π_opt(A U V^T B^T + A^T U V^T B) corresponds to
max ⟨A U V^T B^T + A^T U V^T B, U V^T⟩ subject to U^T U = V^T V = I_k
This is not an eigenvalue problem anymore, but it can be computed using iterative techniques with a linear complexity per step

SLIDE 30

Projected correlation

max ⟨A U V^T B^T + A^T U V^T B, U V^T⟩ subject to U^T U = V^T V = I_k
is also equivalent to max ⟨U^T A U, V^T B V⟩ subject to U^T U = V^T V = I_k
U^T A U and V^T B V can be viewed as k×k “Rayleigh quotients”
Linearly converging iteration (truncated SVD):
U_{k+1} Σ_{k+1} V_{k+1}^T + U_┴ Σ_┴ V_┴^T = A U_k V_k^T B^T + A^T U_k V_k^T B + s U_k V_k^T
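A hedged NumPy sketch of the truncated SVD iteration above. The random orthonormal start, the default shift s = 1.0 and the fixed iteration count are assumptions of this sketch, and the function name is hypothetical; the slides do not specify them. The example also checks numerically that the two objectives differ only by a factor 2, which is why the two maximization problems are equivalent.

```python
import numpy as np

def projected_correlation(A, B, k, s=1.0, iters=200, seed=0):
    """Truncated-SVD iteration for max <U^T A U, V^T B V>, U^T U = V^T V = I_k."""
    rng = np.random.default_rng(seed)
    # Random orthonormal starting points (an assumption of this sketch).
    U, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    V, _ = np.linalg.qr(rng.standard_normal((B.shape[0], k)))
    for _ in range(iters):
        W = U @ V.T
        G = A @ W @ B.T + A.T @ W @ B + s * W
        P, _, Qt = np.linalg.svd(G, full_matrices=False)
        U, V = P[:, :k], Qt[:k].T       # keep the k dominant singular pairs
    return U, V

A = np.roll(np.eye(4), 1, axis=1)       # directed cycle on 4 nodes
B = np.roll(np.eye(5), 1, axis=1)       # directed cycle on 5 nodes
U, V = projected_correlation(A, B, k=2)

W = U @ V.T
lhs = np.sum((A @ W @ B.T + A.T @ W @ B) * W)          # <A W B^T + A^T W B, W>
rhs = 2.0 * np.sum((U.T @ A @ U) * (V.T @ B @ V))      # 2 <U^T A U, V^T B V>
```

Each step costs a few matrix products plus an SVD of an n_A × n_B matrix in this dense form; exploiting the low rank of U V^T and the sparsity of A and B is what keeps the per-step cost low on large graphs.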

SLIDE 31

Correlation of graphs

Graphs with similar structure
Correlation is nearly optimal

Fraikin, Nesterov, VD, LAA 07

SLIDE 32

Some remarks

Optimization is on large sparse graphs
Complexity of one iteration step is linear in the number of nodes in both graphs
We have methods with linear convergence (power-like method and gradient-like method)
We have Newton-like methods with manifold constraints (U^T U = V^T V = I_k)
Extensions to colored nodes and edges