

  1. Announcements:
     ▪ Thank you for participating in our mid-quarter evaluation!
     ▪ Thank you for participating in our homework feedback polls! ☺
     ▪ Course project: the average was ~80%. Don't worry about the grade, but take the feedback seriously.
     ▪ Project Milestone due Sun (moved from Thu). No late days and no exceptions.
     ▪ Consider meeting with your assigned TA.

  2. We often think of networks as being organized into modules, clusters, communities:

  3. [Figure: an example network exhibiting community structure]

  4. [Figure: a network and its adjacency matrix; rows and columns are indexed by nodes]
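
As a small illustration (not from the slides), the sketch below builds the adjacency matrix of a hypothetical 4-node undirected graph:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical example graph
n = 4

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: the adjacency matrix is symmetric

print(A)
```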

  5. Find micro-markets by partitioning the query-to-advertiser graph (queries on one side, advertisers on the other). [Andersen, Lang: Communities from seed sets, 2006]

  6. Clusters in the Movies-to-Actors graph. [Andersen, Lang: Communities from seed sets, 2006]

  7. Discovering social circles, circles of trust. [McAuley, Leskovec: Discovering social circles in ego networks, 2012]

  8. The graph is large:
     ▪ Assume the graph fits in main memory.
     ▪ For example, to work with a 200M-node, 2B-edge graph one needs approx. 16 GB of RAM.
     ▪ But the graph is too big to run anything more than linear-time algorithms.
     We will cover a PageRank-based algorithm for finding dense clusters:
     ▪ The runtime of the algorithm will be proportional to the cluster size (not the graph size!).
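
A back-of-the-envelope check of the ~16 GB figure, assuming one plausible encoding that the slide does not actually specify (adjacency lists storing each undirected edge once per endpoint, with 4-byte integer node IDs):

```python
edges = 2_000_000_000
bytes_per_entry = 4                  # int32 neighbor ID
ram = 2 * edges * bytes_per_entry    # each edge appears in two adjacency lists
print(ram / 10**9, "GB")             # -> 16.0 GB (per-node offsets add a bit more)
```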

  9. Discovering clusters based on seed nodes:
     ▪ Given: a seed node s.
     ▪ Compute (approximate) Personalized PageRank (PPR) around node s (teleport set = {s}).
     ▪ Idea: if s belongs to a nice cluster, the random walk will get trapped inside the cluster.
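
As a concrete starting point, here is a minimal sketch that computes PPR around a seed with networkx's built-in solver (not the local algorithm developed later in this lecture; the graph and seed are arbitrary examples):

```python
import networkx as nx

G = nx.karate_club_graph()   # small example graph shipped with networkx
s = 0                        # seed node

# Personalized PageRank with teleport set = {s}
ppr = nx.pagerank(G, alpha=0.85, personalization={s: 1.0})

# Nodes with high PPR score are candidates for s's cluster
print(sorted(ppr, key=ppr.get, reverse=True)[:10])
```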

  10. Algorithm outline:
      ▪ Pick a seed node s of interest.
      ▪ Run PPR with teleport set = {s}.
      ▪ Sort the nodes by decreasing PPR score.
      ▪ Sweep over the nodes and find good clusters.
      [Figure: cluster "quality" (lower is better) vs. node rank in decreasing PPR score; good clusters show up as dips near the seed node]

  11. Undirected graph G(V, E). [Figure: a 6-node example graph]
      Partitioning task:
      ▪ Divide the vertices into two disjoint groups A and B = V\A.
      Question:
      ▪ How can we define a "good" cluster in G?

  12. What makes a good cluster?
      ▪ Maximize the number of within-cluster connections.
      ▪ Minimize the number of between-cluster connections.

  13. Express cluster quality as a function of the "edge cut" of the cluster.
      Cut: the set of edges (total edge weight) with only one endpoint in the cluster:
      cut(A) = Σ_{i∈A, j∉A} w_ij
      Note: this works for both weighted and unweighted graphs (set all w_ij = 1).
      [Figure: 6-node example with cut(A) = 2]
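
A direct transcription of this definition (a sketch; it assumes the graph is stored as a dict of neighbor-to-weight maps, and the 6-node graph is a hypothetical example in the spirit of the slide's figure):

```python
def cut(adj, A):
    """Total weight of edges with exactly one endpoint in A.

    adj: dict node -> {neighbor: weight}; use weight 1 everywhere
    for unweighted graphs. A: set of nodes.
    """
    return sum(w for i in A for j, w in adj[i].items() if j not in A)

adj = {1: {2: 1, 3: 1}, 2: {1: 1, 3: 1, 4: 1},
       3: {1: 1, 2: 1, 5: 1}, 4: {2: 1, 5: 1, 6: 1},
       5: {3: 1, 4: 1, 6: 1}, 6: {4: 1, 5: 1}}
print(cut(adj, {1, 2, 3}))  # -> 2 (edges 2-4 and 3-5 cross the cut)
```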

  14. Partition quality: cut score.
      ▪ The quality of a cluster is the weight of connections pointing outside the cluster.
      Degenerate case: the "optimal" cut under this score is simply the minimum cut.
      Problem:
      ▪ Only considers external cluster connections.
      ▪ Does not consider internal cluster connectivity.

  15. [Shi–Malik] Criterion: conductance — the connectivity of the group to the rest of the network, relative to the density of the group:
      φ(A) = |{(i, j) ∈ E ; i ∈ A, j ∉ A}| / min(vol(A), 2m − vol(A))
      ▪ vol(A): total weight of the edges with at least one endpoint in A: vol(A) = Σ_{i∈A} d_i
      ▪ d_i … degree of node i; E … edge set of the graph; m … number of edges
      ▪ Equivalently, vol(A) = 2·#edges inside A + #edges pointing out of A.
      Why use this criterion?
      ▪ It produces more balanced partitions.
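
The same criterion in code, reusing the cut function and adjacency dict from the previous sketch:

```python
def conductance(adj, A):
    """phi(A) = cut(A) / min(vol(A), 2m - vol(A))."""
    vol_A = sum(sum(adj[i].values()) for i in A)               # sum of degrees in A
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())   # = 2m
    return cut(adj, A) / min(vol_A, two_m - vol_A)

print(conductance(adj, {1, 2, 3}))  # 2 / min(8, 8) = 0.25
```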

  16. [Figure: two example cuts with their conductances, φ = 2/4 = 0.5 and φ = 6/92 = 0.065]

  17. Algorithm outline:
      ▪ Pick a seed node s of interest.
      ▪ Run PPR with teleport set = {s}.
      ▪ Sort the nodes by decreasing PPR score.
      ▪ Sweep over the nodes and find good clusters.
      Sweep:
      ▪ Sort the nodes in decreasing PPR score: r_1 > r_2 > … > r_n.
      ▪ For each i compute φ(A_i = {r_1, …, r_i}).
      ▪ Local minima of φ(A_i) correspond to good clusters.
      [Figure: conductance φ(A_i) vs. node rank i in decreasing PPR score; good clusters are the dips]

  18. The whole sweep curve can be computed in linear time (see the sketch after this list):
      ▪ For-loop over the nodes.
      ▪ Keep a hash table of the nodes in the current set A_i.
      ▪ To compute φ(A_{i+1}) = Cut(A_{i+1}) / Vol(A_{i+1}):
      ▪ Vol(A_{i+1}) = Vol(A_i) + d_{i+1}
      ▪ Cut(A_{i+1}) = Cut(A_i) + d_{i+1} − 2·#(edges of u_{i+1} to A_i)
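
A sketch of that linear-time sweep, using the same adjacency-dict convention as the earlier sketches (like the slide, it uses Vol(A_i) as the denominator rather than min(vol(A), 2m − vol(A))):

```python
def sweep(adj, ppr):
    """Conductance phi(A_i) for each prefix A_i of the nodes sorted by
    decreasing PPR score, computed incrementally in linear time."""
    order = sorted(ppr, key=ppr.get, reverse=True)
    in_set, phis = set(), []
    vol = cut_w = 0.0
    for u in order:
        d_u = sum(adj[u].values())
        within = sum(w for v, w in adj[u].items() if v in in_set)
        vol += d_u                 # Vol(A_{i+1}) = Vol(A_i) + d_{i+1}
        cut_w += d_u - 2 * within  # Cut(A_{i+1}) = Cut(A_i) + d_{i+1}
        in_set.add(u)              #   - 2 * #(edges of u_{i+1} to A_i)
        phis.append(cut_w / vol)
    return phis  # local minima mark good clusters
```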

  19. How to compute Personalized PageRank (PPR) without touching the whole graph?
      ▪ The power method won't work, since each single iteration accesses all nodes of the graph:
      r^(t+1) = β M · r^(t) + (1 − β) a
      ▪ a is the teleport vector: a = [0 … 0 1 0 … 0]^T, with the 1 at index s.
      ▪ r is the personalized PageRank vector.
      Approximate PageRank [Andersen, Chung, Lang, '07]:
      ▪ A fast method for computing approximate Personalized PageRank (PPR) with teleport set = {s}.
      ▪ ApproxPageRank(s, β, ε)
      ▪ s … seed node
      ▪ β … teleportation parameter
      ▪ ε … approximation error parameter
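
For contrast, a sketch of the global power method the slide rules out, in the slide's notation r ← βM·r + (1 − β)a (dense numpy, so every iteration touches all n nodes):

```python
import numpy as np

def ppr_power_method(A, s, beta=0.8, iters=200):
    """A: dense adjacency matrix (no isolated nodes); s: seed node index."""
    n = A.shape[0]
    M = A / A.sum(axis=0, keepdims=True)  # column-stochastic transition matrix
    a = np.zeros(n); a[s] = 1.0           # teleport vector: 1 at index s
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = beta * (M @ r) + (1 - beta) * a
    return r
```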

  20. Overview of approximate PPR:
      ▪ Based on the lazy random walk, a variant of a random walk that stays put with probability 1/2 at each time step, and walks to a random neighbor the other half of the time.
      ▪ Keep track of the residual PPR score q_u^(t) = p_u − r_u^(t).
      ▪ The residual tells us how well the PPR score of node u is approximated:
      ▪ p_u … the "true" PageRank of node u
      ▪ r_u^(t) … the PageRank estimate of node u at round t
      ▪ d_u … the degree of node u
      ▪ If the residual q_u of node u is too big, q_u / d_u ≥ ε, then push the walk further (distribute some of the residual q_u to all of u's neighbors along outgoing edges); else don't touch the node.
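
A sketch of this push procedure. The update constants follow the lazy-walk formulation of Andersen–Chung–Lang, an assumption on my part since the slide gives only the intuition; adj is an unweighted adjacency dict with no isolated nodes:

```python
from collections import deque

def approx_page_rank(adj, s, beta=0.8, eps=1e-4):
    """Approximate PPR with teleport set {s}; only nodes whose residual
    per degree q_u / d_u reaches eps are ever touched."""
    r = {}          # PageRank estimate r_u (the output)
    q = {s: 1.0}    # residual mass q_u, all of it initially at the seed
    todo = deque([s])
    while todo:
        u = todo.popleft()
        d_u = len(adj[u])
        m = q.get(u, 0.0)
        if m / d_u < eps:
            continue                           # u is already well-approximated
        r[u] = r.get(u, 0.0) + (1 - beta) * m  # settle some mass at u
        q[u] = beta * m / 2                    # lazy walk: half the rest stays
        for v in adj[u]:                       # push the other half outward
            q[v] = q.get(v, 0.0) + beta * m / (2 * d_u)
            todo.append(v)
        todo.append(u)                         # u's residual may still be big
    return r
```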

  21. A different way to look at PageRank: [Jeh & Widom, Scaling Personalized Web Search, 2002]
      p_β(a) = (1 − β) a + β p_β(M · a)
      ▪ p_β(a) is the true PageRank vector with teleport parameter β and teleport vector a.
      ▪ p_β(M · a) is the PageRank vector with teleport vector M · a and teleport parameter β.
      ▪ M is the stochastic PageRank transition matrix.
      ▪ Notice: M · a is one step of a random walk.

  22. Proving p_β(a) = (1 − β) a + β p_β(M · a):
      ▪ We can break this probability into two cases:
      ▪ walks of length 0, and
      ▪ walks of length longer than 0.
      ▪ The probability of a length-0 walk is 1 − β, and the walk ends where it started, with walker distribution a.
      ▪ The probability of a walk of length > 0 is β; the walk then starts at distribution a, takes a step (so it has distribution M·a), and takes the rest of the random walk with distribution p_β(M·a).
      ▪ Note that we used the memoryless nature of the walk: once we know that the second step of the walk has distribution M·a, the rest of the walk can forget where it started and behave as if it started at M·a. This is the key idea of the proof.
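
A quick numerical sanity check of the identity on a hypothetical 4-node graph (long power iterations stand in for the exact p_β):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = A / A.sum(axis=0, keepdims=True)   # stochastic transition matrix
beta = 0.8
a = np.array([1.0, 0, 0, 0])           # teleport vector: all mass on node 0

def p(b):                              # p_beta(b) via power iteration
    r = np.full(4, 0.25)
    for _ in range(500):
        r = beta * (M @ r) + (1 - beta) * b
    return r

print(np.allclose(p(a), (1 - beta) * a + beta * p(M @ a)))  # -> True
```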
