
DS504/CS586: Big Data Analytics, Graph Mining II (Prof. Yanhua Li)


  1. Welcome to DS504/CS586: Big Data Analytics
     Graph Mining II
     Prof. Yanhua Li
     Time: 6-8:50 PM Thursday
     Location: AK233
     Spring 2018

  2. Logistics: Course Project I has been graded.
     - Grading was based on:
       1. Project report
       2. Project team presentation
       3. Self-and-cross evaluation form
       4. In-class survey/evaluation form
     - I also provided comments on your project reports in the Canvas discussion forum.
     - If you are interested in publishing your results, talk to me. (Totally optional.)

  3. Logistics: Course Project II
     - Projects will be in groups: 4-6 students per group, depending on enrollment.
     - "Research-oriented" project.
     - Timeline (team project):
       - Starting date: Week 8 (R), 3/1
       - Project proposal due date: Week 10 (R), 3/15
       - Project progress presentation: TBD, 15 minutes per team
       - Project due date: Week 16 (R), 4/26
       - Project final presentation: Week 16 (R), 4/26

  4. Graph Data
     Graphs are everywhere.
     [Figure panels: Ecological Network, Biological Network, Social Network, Chemical Network, Program Flow, Web Graph.]

  5. Complex Graphs
     Real-life graphs contain complex content: labels associated with nodes, edges, and graphs.
     Node labels: Location, Gender, Charts, Library, Events, Groups, Journal, Tags, Age, Tracks.

  6. Large Graphs
     Large-scale graphs:

     Network        # of Users (millions)    # of Links (millions)
     Facebook       400                      52,000
     Twitter        105                      10,000
     LinkedIn       60                       900
     Last.FM        40                       2,000
     LiveJournal    25                       2,000
     del.icio.us    5.3                      700
     DBLP           0.7                      8

  7. Mining in Big Graphs
     - Network statistic analysis (last lecture)
       - Network size
       - Degree distribution
     - Node ranking (this lecture)
       - Identifying the most important/influential nodes
       - Viral marketing, resource allocation

  8. Characterize Node Importance
     - Rank the webpages in a search engine.
     - Viral marketing, resource allocation.
     - Open a new restaurant: find the optimal location.
     - …

  9. Brainstorming: Node Importance
     [Figure: example graph with nodes 1-6.]
     They are equivalent.

  10. Ranking nodes on an undirected graph (connected graphs)
      [Figure: undirected example graph with nodes 1-6; |V| = 6, |E| = 7.]

      Node    Local importance (node degree)    Global importance (stationary distribution)
      5       d(5) = 4                          π(5) = 4/14
      3       d(3) = 3                          π(3) = 3/14
      4       d(4) = 2                          π(4) = 2/14
      2       d(2) = 2                          π(2) = 2/14
      1       d(1) = 2                          π(1) = 2/14
      6       d(6) = 1                          π(6) = 1/14

      They are equivalent.
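
The equivalence of the two rankings can be checked with a minimal Python sketch. It assumes only the degrees listed on the slide (the edge list itself is not given) and uses the relation π(i) = d(i) / 2|E|; the function name `stationary_from_degrees` is illustrative.

```python
from fractions import Fraction

# Degrees read off the slide's six-node example (|V| = 6, |E| = 7).
degree = {1: 2, 2: 2, 3: 3, 4: 2, 5: 4, 6: 1}

def stationary_from_degrees(degree):
    """Return pi(i) = d(i) / (2|E|) for a connected undirected graph."""
    total = sum(degree.values())  # the degrees sum to 2|E|
    return {node: Fraction(d, total) for node, d in degree.items()}

pi = stationary_from_degrees(degree)

# Rank by pi, breaking ties by node id; ranking by degree gives the same order.
ranking = sorted(pi, key=lambda node: (-pi[node], node))
print(ranking)  # [5, 3, 1, 2, 4, 6]
```

Because π is the degree vector divided by a constant, sorting by either quantity yields the same order.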

  11. Ranking nodes on a directed graph (strongly connected and aperiodic graphs)
      [Figure: directed example graph with nodes 1-6.]

      Local importance (node in- and out-degree)    Global importance (stationary distribution)
      d_in(3) = 3    d_out(5) = 3                   π(5) = ?
      d_in(5) = 2    d_out(3) = 2                   π(4) = ?
      d_in(1) = 2    d_out(1) = 2                   π(3) = ?
      d_in(2) = 2    d_out(4) = 2                   π(2) = ?
      d_in(4) = 1    d_out(2) = 1                   π(1) = ?
      d_in(6) = 1    d_out(6) = 1                   π(6) = ?

      They are equivalent?

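  12. Random Walk (Undirected Graph)
      [Figure: undirected graph on nodes 1-4.]
      - Adjacency matrix (symmetric) and degree matrix:

        $$A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

      - Transition probability matrix, with $P_{ij} = 1/k_i$ for each neighbor $j$ of node $i$, where $k_i$ is the degree of node $i$:

        $$P = D^{-1} A = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{pmatrix}$$

      - Update rule: $x_{t,i} = \sum_j x_{t-1,j}\, p_{ji}$
      - $|E|$: number of links.
      - Stationary distribution: $\pi_i = \dfrac{d_i}{2|E|}$, which gives π(1) = 3/10, π(2) = 2/10, π(3) = 3/10, π(4) = 2/10.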
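
The random walk on this four-node graph can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's own code: it row-normalizes the adjacency matrix, checks exactly that π_i = d_i / 2|E| is unchanged by one step, and then iterates the update rule from a walker placed on node 1. The helper names `transition_matrix` and `walk_step` are made up for the example.

```python
from fractions import Fraction

# Adjacency matrix of the four-node undirected example (nodes 1-4 -> indices 0-3).
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
]

def transition_matrix(A):
    """Row-normalize A so that P[i][j] = A[i][j] / k_i."""
    return [[Fraction(a, sum(row)) for a in row] for row in A]

def walk_step(x, P):
    """One step of the walk: x_t[i] = sum_j x_{t-1}[j] * P[j][i]."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

P = transition_matrix(A)

# Closed form: pi_i = d_i / (2|E|); the degrees sum to 2|E| = 10.
degrees = [sum(row) for row in A]
pi = [Fraction(d, sum(degrees)) for d in degrees]

# pi is stationary: one more step leaves it unchanged (exact arithmetic).
is_stationary = walk_step(pi, P) == pi

# Starting on node 1, the distribution of the walker approaches pi (floats).
P_float = [[float(p) for p in row] for row in P]
x = [1.0, 0.0, 0.0, 0.0]
for _ in range(100):
    x = walk_step(x, P_float)
```

The iteration converges here because the graph is connected and contains a triangle (nodes 1, 2, 3), so the walk is aperiodic.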

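  13. Random Walk (directed graph): strongly connected and aperiodic graphs
      [Figure: directed graph on nodes 1-4.]
      - Adjacency matrix (asymmetric) and out-degree matrix:

        $$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

      - Transition probability matrix, with $P_{ij} = 1/k_{\mathrm{out},i}$ for each link $i \to j$:

        $$P = D^{-1} A = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

      - Update rule: $x_{t,i} = \sum_j x_{t-1,j}\, p_{ji}$
      - $|E|$: number of directed links.
      - Stationary distribution: in general $\pi_i \neq \dfrac{d_i}{2|E|}$. Here π(1) = 6/18 = 1/3, π(2) = 4/18 = 2/9, π(3) = 3/18 = 1/6, π(4) = 5/18.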
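
A minimal Python sketch of the directed random walk follows. The edge list is an assumption: it uses the links 1→2, 1→3, 2→4, 3→1, 3→2, 3→4, 4→1, which is the set for which the quoted distribution π = (6, 4, 3, 5)/18 is stationary. The sketch verifies that claim exactly and shows that a degree-based guess (normalized in-degree) does not give the same vector.

```python
from fractions import Fraction

# Assumed directed links, chosen to agree with the quoted stationary distribution.
edges = [(1, 2), (1, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1)]
nodes = [1, 2, 3, 4]
out_links = {u: [v for (s, v) in edges if s == u] for u in nodes}

def walk_step(x):
    """x_t[i] = sum_j x_{t-1}[j] * p_ji, with p_ji = 1 / k_out(j) for each link j -> i."""
    nxt = {v: 0 for v in nodes}
    for u in nodes:
        share = x[u] / len(out_links[u])
        for v in out_links[u]:
            nxt[v] += share
    return nxt

# Stationary distribution quoted on the slide.
pi = {1: Fraction(6, 18), 2: Fraction(4, 18), 3: Fraction(3, 18), 4: Fraction(5, 18)}
is_stationary = walk_step(pi) == pi

# A degree-based guess does not match pi on a directed graph.
in_degree = {v: sum(1 for (_, t) in edges if t == v) for v in nodes}
in_degree_share = {v: Fraction(in_degree[v], len(edges)) for v in nodes}

# Power iteration from a walker on node 1 (floats) approaches pi.
x = {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}
for _ in range(500):
    x = walk_step(x)
```

Node 4 has in-degree 2 and out-degree 1, yet it ranks second under π, which illustrates why degree alone no longer determines the ranking.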

  14. Ranking nodes in a directed graph (strongly connected and aperiodic graphs)
      [Figure: directed example graph with nodes 1-6.]

      Local importance (node in- and out-degree)    Global importance (stationary distribution)
      d_in(3) = 3    d_out(5) = 3                   π(1) = 5/16
      d_in(5) = 2    d_out(3) = 2                   π(3) = 1/4
      d_in(1) = 2    d_out(1) = 2                   π(2) = 3/16
      d_in(2) = 2    d_out(4) = 2                   π(4) = 1/8
      d_in(4) = 1    d_out(2) = 1                   π(5) = 3/32
      d_in(6) = 1    d_out(6) = 1                   π(6) = 1/32

      They are no longer equivalent.

  15. Directed graphs: strongly connected and aperiodic
      [Figures: two directed example graphs on nodes 1-4.]
      - Periodic vs. aperiodic graphs: a graph is aperiodic if the greatest common divisor of the lengths of its cycles is one, and periodic otherwise.
      - Disconnected vs. connected graphs: a connected directed graph is either strongly connected or only weakly connected.
      - Ergodic: strongly connected and aperiodic.
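
These definitions can be tested mechanically. The Python sketch below is one possible way to do it, not the lecture's method: strong connectivity is checked with two breadth-first searches (on the graph and on its reverse), and the period is computed as the gcd of `level[u] + 1 - level[v]` over all links u→v, a standard BFS-based technique for strongly connected graphs. The three example graphs are hypothetical.

```python
from collections import deque
from math import gcd

def reachable(adj, start):
    """Set of nodes reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(adj):
    nodes = list(adj)
    reverse = {u: [] for u in nodes}
    for u in nodes:
        for v in adj[u]:
            reverse[v].append(u)
    start = nodes[0]
    return len(reachable(adj, start)) == len(nodes) == len(reachable(reverse, start))

def period(adj):
    """gcd of all cycle lengths; assumes adj is strongly connected."""
    start = next(iter(adj))
    level = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    g = 0
    for u in adj:
        for v in adj[u]:
            g = gcd(g, level[u] + 1 - level[v])
    return g

def is_ergodic(adj):
    return is_strongly_connected(adj) and period(adj) == 1

# Hypothetical examples: a pure 4-cycle, a graph with cycles of length 2 and 3, a chain.
cycle = {1: [2], 2: [3], 3: [4], 4: [1]}
mixed = {1: [2, 3], 2: [4], 3: [1, 2, 4], 4: [1]}
chain = {1: [2], 2: [3], 3: []}
```

The 4-cycle is strongly connected but periodic, the chain is only weakly connected, and the mixed graph is ergodic because its cycle lengths 2 and 3 are coprime.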

  16. Why This Order?

  17. Ranking nodes in a directed graph (II)
      [Figure: directed example graph with nodes 1-6.]

      PageRank (random walk with random jumps)    HITS (hub & authority)
      R(3) = ?                                    R_a(3) = ?    R_h(5) = ?
      R(5) = ?                                    R_a(5) = ?    R_h(3) = ?
      R(1) = ?                                    R_a(1) = ?    R_h(1) = ?
      R(2) = ?                                    R_a(2) = ?    R_h(4) = ?
      R(4) = ?                                    R_a(4) = ?    R_h(2) = ?
      R(6) = ?                                    R_a(6) = ?    R_h(6) = ?

      They are no longer equivalent.

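  18. Naïve PageRank
      [Figure: directed graph on nodes 1-4.]
      - Adjacency matrix and out-degree matrix:

        $$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

      - Transition probability matrix, with $P_{ij} = 1/k_{\mathrm{out},i}$ for each link $i \to j$:

        $$P = D^{-1} A = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

      - Rank equation: $R_i = \sum_j R_j\, p_{ji}$
      - Stationary distribution: $R_i = \pi_i$, with π(1) = 6/18 = 1/3, π(2) = 4/18 = 2/9, π(3) = 3/18 = 1/6, π(4) = 5/18.
      - Not handled: disconnected graphs and random surfing behaviors.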

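  19. Standard PageRank
      [Figure: directed graph on nodes 1-4.]
      - Adjacency matrix and out-degree matrix:

        $$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

      - Transition probability matrix ($d = 0.85$), with $P_{ij} = 1/k_{\mathrm{out},i}$ for each link $i \to j$:

        $$P = D^{-1} A = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

      - Rank equation: $R_i = d \sum_j R_j\, p_{ji} + (1 - d)\,\dfrac{1}{n}$
      - Stationary distribution ($J$ is the all-ones matrix): $R_i = \pi_{pr,i}$, where

        $$P_{pr} = d \cdot P + (1 - d)\,\frac{1}{n}\, J = \begin{pmatrix} 0.0375 & 0.4625 & 0.4625 & 0.0375 \\ 0.0375 & 0.0375 & 0.0375 & 0.8875 \\ 0.3208 & 0.3208 & 0.0375 & 0.3208 \\ 0.8875 & 0.0375 & 0.0375 & 0.0375 \end{pmatrix}$$

      - Convergence: $R$ is the leading (left) eigenvector of $P_{pr}$.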
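
The construction of P_pr and its leading eigenvector can be sketched in plain Python. This is an illustrative power iteration under stated assumptions: the out-links are read off the rows of the P_pr matrix on the slide, every node has at least one out-link (no dangling-node handling), and the names `out_links` and `pagerank` are made up for the example.

```python
# Out-links read off the rows of the slide's P_pr matrix (nodes 1-4 -> indices 0-3).
out_links = [[1, 2], [3], [0, 1, 3], [0]]
n = len(out_links)
d = 0.85  # damping factor from the slide

# Random-walk matrix P: row i spreads probability evenly over i's out-links.
P = [[0.0] * n for _ in range(n)]
for i, targets in enumerate(out_links):
    for j in targets:
        P[i][j] = 1.0 / len(targets)

# P_pr = d * P + (1 - d) * (1/n) * J, with J the all-ones matrix.
P_pr = [[d * P[i][j] + (1 - d) / n for j in range(n)] for i in range(n)]

def pagerank(P_pr, iterations=100):
    """Power iteration for R = R * P_pr, starting from the uniform vector."""
    size = len(P_pr)
    R = [1.0 / size] * size
    for _ in range(iterations):
        R = [sum(R[j] * P_pr[j][i] for j in range(size)) for i in range(size)]
    return R

R = pagerank(P_pr)
for node, score in enumerate(R, start=1):
    print(f"R({node}) = {score:.4f}")
```

Since every entry of P_pr is positive, the iteration converges from any starting distribution, including on graphs where the plain random walk would not.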

  20. How to quantify the importance as a hub and as an authority separately?

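  21. Hub & Authority (HITS)
      [Figure: directed graph on nodes 1-4.]
      - Adjacency matrix and out-degree matrix:

        $$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

      - Hub and authority scores:
        - Initial step: $\mathrm{hub}(p) = 1$; $\mathrm{auth}(p) = 1$.
        - Each step, with normalization:

          $$\mathrm{hub}(p) = \sum_{i=1}^{n} \mathrm{auth}(i), \qquad \mathrm{hub}(p) \leftarrow \frac{\mathrm{hub}(p)}{\sqrt{\sum_{i=1}^{n} \mathrm{hub}(i)^2}}$$

          $$\mathrm{auth}(p) = \sum_{i=1}^{n} \mathrm{hub}(i), \qquad \mathrm{auth}(p) \leftarrow \frac{\mathrm{auth}(p)}{\sqrt{\sum_{i=1}^{n} \mathrm{auth}(i)^2}}$$

          In the update sums, $i$ runs over the pages that $p$ links to (for the hub score) or the pages that link to $p$ (for the authority score); in the normalization sums, $i$ runs over all pages.
      - Convergence: the hub and authority vectors are the leading left and right singular vectors of the adjacency matrix $A$.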
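
The iteration above can be sketched in Python as follows. Two points are assumptions rather than something the slide states outright: the normalization divides by the Euclidean norm (square root of the sum of squares), and the link structure is the four-node graph 1→2, 1→3, 2→4, 3→1, 3→2, 3→4, 4→1. The function names are illustrative.

```python
from math import sqrt

# Assumed link structure of the four-node example: page -> pages it links to.
out_links = {1: [2, 3], 2: [4], 3: [1, 2, 4], 4: [1]}

def normalize(score):
    """Scale the scores so that their squares sum to one."""
    norm = sqrt(sum(v * v for v in score.values()))
    return {p: v / norm for p, v in score.items()}

def hits(out_links, iterations=100):
    pages = list(out_links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # hub(p): sum of the authority scores of the pages p links to.
        hub = normalize({p: sum(auth[q] for q in out_links[p]) for p in pages})
        # auth(p): sum of the hub scores of the pages that link to p.
        auth = normalize(
            {p: sum(hub[q] for q in pages if p in out_links[q]) for p in pages}
        )
    return hub, auth

hub, auth = hits(out_links)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
```

On this graph the top hub is node 3, which links to three pages, while the top authority is node 2, which is pointed to by the two strongest hubs; the two scores single out different nodes.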

  22. "A Note on Maximizing the Spread of Influence in Social Networks", E. Even-Dar and A. Shapira

  23. Social Influence

  24. Social Influence
      [Figure panels: Instant messaging, Collaboration networks, Sharing sites, Location-based services, Microblogs, Social networks.]

  25. Social Influence

  26. Voter Influence Model
      - Opinion diffusion: nodes switch opinions back and forth (word-of-mouth effect).
      - A node randomly selects one neighbor and adopts its opinion.
      [Figure: example with Bob, Alice, and David.]
      [1] P. Clifford and A. Sudbury. A model for spatial conflict. Biometrika, 60(3):581, 1973.
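
A small simulation makes the voter rule concrete. The Python sketch below rests on assumptions not fixed by the slide: all nodes update synchronously, neighbors are chosen uniformly at random, opinions are coded as 1 (red) and 0 (other), and the graph is a hypothetical four-node undirected example.

```python
import random

# Hypothetical undirected graph: node -> list of neighbors.
neighbors = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

def voter_step(opinion, neighbors, rng):
    """Every node copies the current opinion of one uniformly chosen neighbor."""
    return {node: opinion[rng.choice(nbrs)] for node, nbrs in neighbors.items()}

def simulate(opinion, neighbors, steps, seed=0):
    """Run the voter model and record the number of red nodes after each step."""
    rng = random.Random(seed)
    history = [sum(opinion.values())]
    for _ in range(steps):
        opinion = voter_step(opinion, neighbors, rng)
        history.append(sum(opinion.values()))
    return opinion, history

# One red seed on node 1; the red count can move up and down over time.
final, history = simulate({1: 1, 2: 0, 3: 0, 4: 0}, neighbors, steps=20)
print(history)
```

Because opinions can be lost as well as gained, a single run is noisy; averaging many runs with different seeds estimates the probability that each node is red.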

  27. Influence Maximization
      - Budget: select k individuals as initial red seeds.
      - Assumption: uniform cost of selecting each initial seed.
      - Goal: maximize the number of future red nodes.
      [15] E. Even-Dar and A. Shapira. A note on maximizing the spread of influence in social networks. In WINE, 2007.

  28. Formulation
      [Figure: example graph with nodes 1-6.]
      - Probability of node $i$ being red at step $t$: $x_t(i)$. At step $t > 0$, node $i$ is red with probability $x_t(i)$ and not red with probability $1 - x_t(i)$.
      - At step $t + 1$:

        $$x_{t+1}(i) = \sum_{j:\, a_{ij} > 0} x_t(j)\, p_{ji}, \qquad p_{ij} = \frac{a_{ij}}{\sum_{j \in V} a_{ij}}$$

      - Influence at step $t$: $f_t(x_0) = \sum_{i \in V} x_t(i)$
      - Influence contribution:
        - Short term: $\max_{x_0} \; f_t(x_0) - f_0(x_0)$
        - Long term: $\max_{x_0} \; \lim_{t \to \infty} f_t(x_0) - f_0(x_0)$

  29. Formulation (Random Walk)
      - Influence at step $t$: $\mathbf{1}\, x_t^T$, where $x_t^T$ is a column vector, the transpose of the row vector $x_t$.
      - Influence contribution:
        - Short term: $\max_{x_0} \; \mathbf{1}\, x_t^T$
        - Long term: $\max_{x_0} \; \lim_{t \to \infty} \mathbf{1}\, x_t^T$
      - Matrix form: $x_t = x_0 P^t$, and $\lim_{t \to \infty} x_t = \lim_{t \to \infty} x_0 P^t = \pi$.
      - Influence contribution:
        - Short term: $\max_{x_0} \; f_t(x_0) - f_0(x_0)$
        - Long term: $\max_{x_0} \; \lim_{t \to \infty} f_t(x_0) - f_0(x_0) = x_0 \pi^T - f_0(x_0)$
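
Taking the long-term objective $x_0 \pi^T - f_0(x_0)$ at face value, seed selection reduces to a sort. The Python sketch below assumes a 0/1 seed vector with exactly k ones, so that $f_0(x_0) = k$ is constant under the uniform-cost assumption, and a hypothetical four-node undirected graph on which $\pi_i = d_i / 2|E|$. It picks the k nodes with the largest $\pi_i$ and confirms the choice by brute force.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical undirected graph: node -> list of neighbors.
neighbors = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

# Stationary distribution of the random walk: pi_i = d_i / (2|E|).
two_E = sum(len(nbrs) for nbrs in neighbors.values())
pi = {v: Fraction(len(nbrs), two_E) for v, nbrs in neighbors.items()}

def long_term_score(seeds):
    """x_0 pi^T for the 0/1 seed vector x_0 that marks `seeds` as red."""
    return sum(pi[v] for v in seeds)

def best_seeds(k):
    """Pick the k nodes with the largest pi, breaking ties by node id."""
    ranked = sorted(pi, key=lambda v: (-pi[v], v))
    return sorted(ranked[:k])

k = 2
seeds = best_seeds(k)

# Brute force over all k-subsets gives the same optimum.
brute_force = max(long_term_score(s) for s in combinations(neighbors, k))
print(seeds, long_term_score(seeds))
```

On an undirected graph this amounts to seeding the highest-degree nodes, since π is proportional to degree.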
