Tim Althoff, UW CS547: Machine Learning for Big Data (5/7/2020)
http://www.cs.washington.edu/cse547
SLIDE 1

Announcements:

  • Thank you for participating in our mid-quarter evaluation
  • Thank you for participating in our homework feedback polls! ☺
  • Course project
  • Average was ~80%
  • Don’t worry about grade but take feedback seriously
  • Project Milestone due Thu Sun
  • No late days and no exceptions
  • Consider meeting with your assigned TA
SLIDE 2

 We often think of networks as being organized into modules, clusters, communities:

SLIDE 4

[Figure: a network and its adjacency matrix; rows and columns indexed by nodes, with community block structure]

SLIDE 5

 Find micro-markets by partitioning the query-to-advertiser graph:

[Figure: bipartite advertiser-query graph partitioned into micro-market clusters]

[Andersen, Lang: Communities from seed sets, 2006]

SLIDE 6

 Clusters in Movies-to-Actors graph:


[Andersen, Lang: Communities from seed sets, 2006]

SLIDE 7

 Discovering social circles, circles of trust:


[McAuley, Leskovec: Discovering social circles in ego networks, 2012]

SLIDE 8

 Graph is large

▪ Assume the graph fits in main memory

▪ For example, to work with a 200M-node, 2B-edge graph one needs approx. 16 GB of RAM

▪ But the graph is too big to run anything more than linear-time algorithms

 We will cover a PageRank-based algorithm for finding dense clusters

▪ The runtime of the algorithm will be proportional to the cluster size (not the graph size!)

SLIDE 9

 Discovering clusters based on seed nodes

▪ Given: seed node s
▪ Compute (approximate) Personalized PageRank (PPR) around node s (teleport set = {s})
▪ Idea: if s belongs to a nice cluster, the random walk will get trapped inside the cluster

[Figure: random walk spreading out from the seed node into its surrounding cluster]

SLIDE 10

 Algorithm outline:

▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters

[Figure: sweep curve from the seed node; x-axis: node rank in decreasing PPR score, y-axis: cluster “quality” (lower is better); local minima mark good clusters]

SLIDE 11

 Undirected graph 𝐺(𝑉, 𝐸)

 Partitioning task:

▪ Divide the vertices into two disjoint groups 𝐴 and 𝐵 = 𝑉∖𝐴

 Question:

▪ How can we define a “good” cluster in 𝐺?

[Figure: example graph on nodes 1-6 split into 𝐴 and 𝐵 = 𝑉∖𝐴]

SLIDE 12

 What makes a good cluster?

▪ Maximize the number of within-cluster connections
▪ Minimize the number of between-cluster connections

[Figure: example graph on nodes 1-6 with cluster 𝐴 and complement 𝑉∖𝐴]

SLIDE 13

 Express cluster quality as a function of the “edge cut” of the cluster

 Cut: set of edges (edge weights) with only one node in the cluster:

  $\mathrm{cut}(A) = \sum_{i \in A,\, j \notin A} w_{ij}$

[Figure: example graph on nodes 1-6 with cut(𝐴) = 2]

Note: This works for weighted and unweighted (set all $w_{ij} = 1$) graphs
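The cut score is easy to compute directly. Below is a minimal Python sketch (the adjacency-dict format and the 6-node example graph are our own assumptions, not from the slides):

```python
def cut_score(adj, A):
    """Weight of edges with exactly one endpoint in cluster A.

    adj: node -> dict of neighbor -> edge weight
         (use weight 1.0 everywhere for an unweighted graph)
    A:   set of nodes forming the candidate cluster
    """
    A = set(A)
    return sum(w for i in A
                 for j, w in adj[i].items()
                 if j not in A)

# A small 6-node example (edges are our assumption), unweighted:
adj = {
    1: {2: 1.0, 3: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {1: 1.0, 2: 1.0, 4: 1.0},
    4: {3: 1.0, 5: 1.0, 6: 1.0},
    5: {4: 1.0, 6: 1.0},
    6: {4: 1.0, 5: 1.0},
}
print(cut_score(adj, {1, 2, 3}))  # -> 1.0 (only edge 3-4 crosses the cut)
```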

SLIDE 14

 Partition quality: cut score

▪ Quality of a cluster is the weight of connections pointing outside the cluster

 Degenerate case / problem:

▪ Only considers external cluster connections
▪ Does not consider internal cluster connectivity

[Figure: the “optimal cut” compared to the minimum cut]

SLIDE 15

 Criterion: Conductance [Shi-Malik]

Connectivity of the group to the rest of the network, relative to the density of the group:

  $\phi(A) = \dfrac{|\{(i,j) \in E;\ i \in A,\, j \notin A\}|}{\min(\mathrm{vol}(A),\ 2m - \mathrm{vol}(A))}$

▪ vol(𝐴): total weight of the edges with at least one endpoint in 𝐴: $\mathrm{vol}(A) = \sum_{i \in A} d_i$

◼ Equivalently: vol(𝐴) = 2 · #edges inside 𝐴 + #edges pointing out of 𝐴

◼ Why use this criterion? It produces more balanced partitions

where 𝑚 is the number of edges of the graph, $d_i$ is the degree of node 𝑖, and 𝐸 is the edge set of the graph
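Under these definitions, conductance is a few lines of Python. A minimal sketch (function and argument names are ours):

```python
def conductance(adj, A, m):
    """phi(A) = cut(A) / min(vol(A), 2m - vol(A)).

    adj: node -> dict of neighbor -> weight (undirected: store both directions)
    A:   candidate cluster (set of nodes)
    m:   total edge weight of the graph
    """
    A = set(A)
    vol = sum(w for i in A for w in adj[i].values())   # sum of degrees in A
    cut = sum(w for i in A for j, w in adj[i].items()
              if j not in A)                            # edges leaving A
    denom = min(vol, 2 * m - vol)
    return cut / denom if denom > 0 else float("inf")
```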
SLIDE 16

[Figure: two example cuts with conductance 𝜙 = 2/4 = 0.5 and 𝜙 = 6/92 ≈ 0.065]

SLIDE 17

 Algorithm outline:

▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters

 Sweep:

▪ Sort nodes in decreasing PPR score: $r_1 > r_2 > \cdots > r_n$
▪ For each 𝑖 compute the conductance $\phi(A_i)$ of the set $A_i$ of the first 𝑖 nodes
▪ Local minima of $\phi(A_i)$ correspond to good clusters

[Figure: sweep curve; x-axis: node rank 𝑖 in decreasing PPR score, y-axis: conductance $\phi(A_i)$; local minima mark good clusters]

SLIDE 18

 The whole sweep curve can be computed in linear time:

▪ For loop over the nodes
▪ Keep a hash table of the nodes in the current set $A_i$
▪ To compute $\phi(A_{i+1}) = \mathrm{Cut}(A_{i+1}) / \mathrm{Vol}(A_{i+1})$, update incrementally:
  ▪ $\mathrm{Vol}(A_{i+1}) = \mathrm{Vol}(A_i) + d_{i+1}$
  ▪ $\mathrm{Cut}(A_{i+1}) = \mathrm{Cut}(A_i) + d_{i+1} - 2\,\#(\text{edges of } u_{i+1} \text{ to } A_i)$

[Figure: sweep curve; x-axis: node rank 𝑖 in decreasing PPR score, y-axis: conductance $\phi(A_i)$; local minima mark good clusters]
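Putting the incremental updates together, a minimal sketch of the linear-time sweep (our naming; it reuses the adjacency-dict format from the sketches above and uses the full min(vol, 2m - vol) denominator from the conductance definition):

```python
def sweep(adj, order, m):
    """Incrementally compute phi(A_i) for each prefix of `order`.

    order: nodes sorted by decreasing PPR score
    m:     total edge weight of the graph
    Returns the list phi(A_1), phi(A_2), ...; local minima mark good clusters.
    """
    in_set, vol, cut, phis = set(), 0.0, 0.0, []
    for u in order:
        d_u = sum(adj[u].values())                       # degree of u
        to_set = sum(w for v, w in adj[u].items() if v in in_set)
        vol += d_u                                       # Vol grows by d_u
        cut += d_u - 2 * to_set                          # u's edges into A_i flip sides
        in_set.add(u)
        denom = min(vol, 2 * m - vol)
        phis.append(cut / denom if denom > 0 else float("inf"))
    return phis
```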

SLIDE 19

 How to compute Personalized PageRank (PPR) without touching the whole graph?

▪ The power method won’t work, since each single iteration accesses all nodes of the graph:

  $r^{(t+1)} = \beta M \cdot r^{(t)} + (1 - \beta)\, a$

  ▪ 𝑎 is the teleport vector: 𝑎 = [0 … 0 1 0 … 0]ᵀ, with the 1 at index s
  ▪ 𝑟 is the personalized PageRank vector

 Approximate PageRank [Andersen, Chung, Lang, ’07]

▪ A fast method for computing approximate Personalized PageRank (PPR) with teleport set = {s}
▪ ApproxPageRank(s, β, ε)
  ▪ s … seed node
  ▪ β … teleportation parameter
  ▪ ε … approximation error parameter

SLIDE 20

 Overview of approximate PPR:

▪ Based on the lazy random walk, a variant of a random walk that stays put with probability 1/2 at each time step, and walks to a random neighbor the other half of the time
▪ Keep track of the residual PPR score $q_u = p_u - r_u^{(t)}$
  ▪ The residual tells us how well the PPR score of node 𝑢 is approximated
  ▪ $p_u$ … the “true” PageRank of node 𝑢
  ▪ $r_u^{(t)}$ … the PageRank estimate of node 𝑢 at round 𝑡
▪ If the residual $q_u$ of node 𝑢 is too big, $q_u / d_u \geq \varepsilon$ (with $d_u$ the degree of 𝑢), then push the walk further (distribute some of the residual $q_u$ to all of 𝑢’s neighbors along outgoing edges); else don’t touch the node

SLIDE 21

 A different way to look at PageRank:

[Jeh & Widom. Scaling Personalized Web Search, 2002]

  $p_\beta(a) = (1 - \beta)\, a + \beta\, p_\beta(M \cdot a)$

▪ $p_\beta(a)$ is the true PageRank vector with teleport parameter β and teleport vector 𝑎
▪ $p_\beta(M \cdot a)$ is the PageRank vector with teleport vector 𝑀 ⋅ 𝑎 and teleport parameter β
▪ where 𝑀 is the stochastic PageRank transition matrix
▪ Notice: 𝑀 ⋅ 𝑎 is one step of a random walk
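The identity can be sanity-checked numerically. A small self-contained sketch (ours; the 3-node column-stochastic matrix M is an arbitrary illustration):

```python
import numpy as np

def pagerank(M, a, beta=0.8, iters=200):
    """Power iteration for p = beta * M @ p + (1 - beta) * a."""
    p = np.copy(a)
    for _ in range(iters):
        p = beta * (M @ p) + (1 - beta) * a
    return p

# Tiny 3-node example; columns of M sum to 1 (column-stochastic)
M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
a = np.array([1.0, 0.0, 0.0])   # teleport to node 0
beta = 0.8

lhs = pagerank(M, a, beta)
rhs = (1 - beta) * a + beta * pagerank(M, M @ a, beta)
print(np.allclose(lhs, rhs))    # -> True
```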

SLIDE 22

 Proving $p_\beta(a) = (1 - \beta)\, a + \beta\, p_\beta(M \cdot a)$:

▪ We can break this probability into two cases:
  ▪ Walks of length 0, and
  ▪ Walks of length longer than 0
▪ The probability of a length-0 walk is (1 − β), and the walk ends where it started, with walker distribution 𝑎
▪ The probability of a walk of length > 0 is β; such a walk starts at distribution 𝑎, takes a step (so it has distribution 𝑀𝑎), and then takes the rest of the random walk with distribution $p_\beta(Ma)$
▪ Note that we used the memoryless nature of the walk: once we know that the position after the first step has distribution 𝑀𝑎, the rest of the walk can forget where it started and behave as if it started at 𝑀𝑎. This is the key idea of the proof.

SLIDE 23

 Idea:

▪ 𝑟 … approx. PageRank, 𝑞 … its residual PageRank (residual PPR score $q_u = p_u - r_u$)
▪ Start with the trivial approximation: 𝑟 = 0 and 𝑞 = 𝑎
▪ Iteratively push PageRank from 𝑞 to 𝑟 until 𝑞 is small

 Push: one step of a lazy random walk from node 𝑢:

  Push(u, r, q):
    r′ = r, q′ = q
    # update r: move (1 − β) of the probability from q_u to r_u
    r′_u = r_u + (1 − β) q_u
    # one step of the lazy walk: stay at u with prob. ½
    q′_u = ½ β q_u
    # spread the remaining ½ β fraction of q_u as if a single step
    # of a random walk were applied to u
    for each v such that u → v:
      q′_v = q_v + ½ β q_u / d_u
    return r′, q′
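A direct Python transcription of the push step, as a sketch under the slide’s definitions (dict-based sparse vectors are our choice):

```python
def push(u, r, q, adj, beta):
    """One lazy-random-walk push from node u (updates r and q in place).

    r:    dict node -> approximate PPR score
    q:    dict node -> residual PPR score
    adj:  node -> dict of neighbor -> edge weight
    beta: teleportation parameter
    """
    qu = q.get(u, 0.0)
    d_u = sum(adj[u].values())
    r[u] = r.get(u, 0.0) + (1 - beta) * qu   # move (1 - beta) of q_u into r_u
    q[u] = 0.5 * beta * qu                   # lazy walk: stay at u with prob. 1/2
    for v, w in adj[u].items():              # spread the other (1/2) * beta * q_u
        q[v] = q.get(v, 0.0) + 0.5 * beta * qu * w / d_u
    return r, q
```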

SLIDE 24

 If $q_u$ is large, this means that we have underestimated the importance of node 𝑢

 Then we want to take some of that residual ($q_u$) and give it away, since we know that we have too much of it

 So we keep $\frac{1}{2} \beta\, q_u$ and give the rest away to our neighbors, to get rid of it

▪ This corresponds to spreading the $\frac{1}{2} \beta\, q_u / d_u$ term

 Each node keeps giving away this excess PageRank until all nodes have no, or only a very small, excess of PageRank

SLIDE 25

 ApproxPageRank(s, β, ε):

  Set r = 0, q = [0 … 0 1 0 … 0]   (the 1 at index s)
  While max_{u∈V} q_u / d_u ≥ ε:
    Choose any vertex u where q_u / d_u ≥ ε
    Push(u, r, q):
      r′ = r, q′ = q
      r′_u = r_u + (1 − β) q_u
      q′_u = ½ β q_u
      for each v such that u → v:
        q′_v = q_v + ½ β q_u / d_u
    r = r′, q = q′
  Return r

▪ r … PPR vector, r_u … PPR score of u
▪ q … residual PPR vector, q_u … residual of node u
▪ d_u … degree of u
▪ Update r: move (1 − β) of the probability from q_u to r_u
▪ One step of a lazy random walk: stay at u with prob. ½; spread the remaining ½ β fraction of q_u as if a single step of a random walk were applied to u
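Combining the pieces, a runnable sketch of the full loop; it reuses the push() sketch above, and the worklist strategy for picking the next vertex is our own choice (the pseudocode allows any violating vertex):

```python
def approx_pagerank(s, beta, eps, adj):
    """Approximate PPR with teleport set {s}; touches only nodes near s."""
    def deg(x):                          # weighted degree, computed on demand
        return sum(adj[x].values())
    r, q = {}, {s: 1.0}                  # r = 0, q = indicator vector of s
    work = [s]                           # candidates with q_u / d_u >= eps
    while work:
        u = work.pop()
        if q.get(u, 0.0) / deg(u) < eps:
            continue                     # residual already small enough
        push(u, r, q, adj, beta)         # the push() sketched above
        for v in list(adj[u]) + [u]:     # only u and its neighbors changed
            if q.get(v, 0.0) / deg(v) >= eps:
                work.append(v)
    return r
```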

SLIDE 26

 Runtime:

▪ ApproxPageRank (PageRank-Nibble) computes PPR in time $O\left(\frac{1}{\varepsilon(1-\beta)}\right)$ with residual error ≤ ε
▪ The power method would take time $O\left(\frac{\log n}{\varepsilon(1-\beta)}\right)$

 Graph cut approximation guarantee:

▪ If there exists a cut of conductance 𝜙 and volume 𝑘, then the method finds a cut of conductance $O(\sqrt{\phi \log k})$
▪ Details in [Andersen, Chung, Lang. Local graph partitioning using PageRank vectors, 2007]
  http://www.math.ucsd.edu/~fan/wp/localpartfull.pdf

SLIDE 27

 The smaller the ε, the farther the random walk will spread!

[Figure: PPR mass spreading farther from the seed node as ε decreases]

SLIDE 28


[Andersen, Lang: Communities from seed sets, 2006]

SLIDE 29

 Algorithm summary:

▪ Pick a seed node s of interest
▪ Run PPR with teleport set = {s}
▪ Sort the nodes by decreasing PPR score
▪ Sweep over the nodes and find good clusters

[Figure: sweep curve from the seed node; x-axis: node rank in decreasing PPR score, y-axis: cluster “quality” (lower is better); local minima mark good clusters]

SLIDE 31

 Communities: sets of tightly connected nodes

 Define: Modularity 𝑄

▪ A measure of how well a network is partitioned into communities
▪ Given a partitioning of the network into groups 𝑠 ∈ 𝑆:

  𝑄 ∝ ∑_{𝑠∈𝑆} [ (# edges within group 𝑠) − (expected # edges within group 𝑠) ]

Need a null model!

SLIDE 32

 Given a real graph 𝐺 on 𝑛 nodes and 𝑚 edges, construct a rewired network 𝐺′

▪ Same degree distribution but random connections
▪ Consider 𝐺′ as a multigraph
▪ The expected number of edges between nodes 𝑖 and 𝑗 of degrees $k_i$ and $k_j$ equals $k_i \cdot \frac{k_j}{2m} = \frac{k_i k_j}{2m}$
▪ The expected number of edges in (multigraph) 𝐺′:

  $\frac{1}{2} \sum_{i \in N} \sum_{j \in N} \frac{k_i k_j}{2m} = \frac{1}{2} \cdot \frac{1}{2m} \sum_{i \in N} k_i \sum_{j \in N} k_j = \frac{1}{4m} \cdot 2m \cdot 2m = m$

Note: $\sum_{u \in V} k_u = 2m$

SLIDE 33

 Modularity of a partitioning 𝑆 of graph 𝐺:

▪ 𝑄 ∝ ∑_{𝑠∈𝑆} [ (# edges within group 𝑠) − (expected # edges within group 𝑠) ]

  $Q(G, S) = \frac{1}{2m} \sum_{s \in S} \sum_{i \in s} \sum_{j \in s} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$

  where $A_{ij} = 1$ if 𝑖 → 𝑗 and 0 otherwise, and $\frac{1}{2m}$ is a normalizing constant ensuring $-1 \leq Q \leq 1$

 Modularity values take the range [−1, 1]

▪ 𝑄 is positive if the number of edges within groups exceeds the expected number
▪ 𝑄 greater than 0.3-0.7 means significant community structure
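As a concrete reading of this formula, a minimal Python sketch (naming is ours; `communities` maps each node to its group id; the double loop is O(n²), fine for small examples):

```python
def modularity(adj, communities):
    """Q(G, S) = (1/2m) * sum over same-community pairs of (A_ij - k_i*k_j/(2m))."""
    deg = {u: sum(adj[u].values()) for u in adj}   # k_u
    two_m = sum(deg.values())                      # 2m = sum of degrees
    Q = 0.0
    for i in adj:
        for j in adj:
            if communities[i] == communities[j]:
                A_ij = adj[i].get(j, 0.0)
                Q += A_ij - deg[i] * deg[j] / two_m
    return Q / two_m
```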

SLIDE 34

Equivalently, modularity can be written as:

  $Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$

where $\delta(c_i, c_j)$ is an indicator function that equals 1 when nodes 𝑖 and 𝑗 are in the same community ($c_i = c_j$) and 0 otherwise.

Idea: We can identify communities by maximizing modularity

SLIDE 36

 Greedy algorithm for community detection

▪ O(𝑛 log 𝑛) run time (observed empirically)

 Supports weighted graphs
 Provides hierarchical partitions
 Widely utilized to study large networks because:

▪ It is fast
▪ It has rapid convergence properties
▪ It yields high-modularity output (i.e., “better communities”)

[Fast unfolding of communities in large networks, Blondel et al. (2008)]

SLIDE 37

 The Louvain algorithm greedily maximizes modularity

 Each pass is made of 2 phases:

▪ Phase 1: Modularity is optimized by allowing only local changes of communities
▪ Phase 2: The identified communities are aggregated to build a new network of communities
▪ Go to Phase 1

The passes are repeated iteratively until no increase of modularity is possible!

SLIDE 38

 Put each node of the graph into a distinct community (one node per community)

 For each node 𝑖, the algorithm performs two calculations:

▪ Compute the modularity gain (Δ𝑄) when moving node 𝑖 from its current community into the community of some neighbor 𝑗 of 𝑖
▪ Move 𝑖 to the community that yields the largest modularity gain Δ𝑄

 The loop runs until no movement yields a gain

This first phase stops when a local maximum of the modularity is attained, i.e., when no individual move can improve the modularity. Note that the output of the algorithm depends on the order in which the nodes are considered; research indicates that the ordering of the nodes does not have a significant influence on the modularity that is obtained.

SLIDE 39

What is Δ𝑄 if we move node 𝑖 to community 𝐶?

  $\Delta Q(i \to C) = \left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right] - \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right]$

▪ where:
  ▪ $\Sigma_{in}$ … sum of link weights between nodes in 𝐶
  ▪ $\Sigma_{tot}$ … sum of all link weights of nodes in 𝐶
  ▪ $k_{i,in}$ … sum of link weights between node 𝑖 and 𝐶
  ▪ $k_i$ … sum of all link weights (i.e., degree) of node 𝑖

 We also need to derive Δ𝑄(𝐷 → 𝑖), the gain of taking node 𝑖 out of its current community 𝐷

 And then: Δ𝑄 = Δ𝑄(𝑖 → 𝐶) + Δ𝑄(𝐷 → 𝑖)
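In code, the gain collapses to a compact expression; the algebraic simplification below (expand both brackets and cancel) is ours:

```python
def delta_q_join(i, C, adj, deg, m):
    """Modularity gain from moving isolated node i into community C.

    Expanding the bracketed formula above gives
        dQ = k_i_in / m - (sigma_tot * k_i) / (2 * m * m)
    C: set of nodes; deg: node -> weighted degree; m: total edge weight.
    """
    k_i = deg[i]
    k_i_in = sum(w for j, w in adj[i].items() if j in C)   # links from i into C
    sigma_tot = sum(deg[j] for j in C)                     # degree mass of C
    return k_i_in / m - (sigma_tot * k_i) / (2 * m * m)
```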

SLIDE 40

 The partitions obtained in the first phase are contracted into super-nodes, and a weighted network is created as follows:

▪ Super-nodes are connected if there is at least one edge between nodes of the corresponding communities
▪ The weight of the edge between two super-nodes is the sum of the weights of all edges between their corresponding partitions

 The loop runs until the community configuration does not change anymore
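A minimal sketch of this aggregation step (our naming; representing internal community weight as self-loops is a common Louvain convention, assumed here rather than stated on the slide):

```python
from collections import defaultdict

def aggregate(adj, communities):
    """Contract communities into super-nodes of a weighted graph.

    communities: node -> community id. The weight between two super-nodes
    sums all weights between their member nodes; internal edges become
    self-loop weight (each undirected edge is seen from both endpoints).
    """
    super_adj = defaultdict(lambda: defaultdict(float))
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = communities[u], communities[v]
            super_adj[cu][cv] += w
    return {c: dict(nbrs) for c, nbrs in super_adj.items()}
```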
