Online Social Networks and Media
Community detection
Team 1 (Forest Fire): Μαρία Ζέρβα, Ιωάννης Κουβάτης, Χρήστος Σπαθάρης
Team 2 (Kronecker graph): Άγγελος Παπαμιχαήλ, Δημήτρης Βαλεκάρδας, Βιργινία Τσίντζου, Γιώργος Αδαμόπουλος
Team 3 (Preferential attachment and copying model): Μαρία Παππά, Κωνσταντίνος Δημολίκας, Chaysri Piyabhum
Nodes: Football Teams Edges: Games played
Can we identify node groups? (communities, modules, clusters)
NCAA conferences Nodes: Football Teams Edges: Games played
Can we identify functional modules?
Nodes: Proteins Edges: Physical interactions
Functional modules Nodes: Proteins Edges: Physical interactions
Can we identify social communities?
Nodes: Facebook Users Edges: Friendships
[Figure: Facebook ego network (nodes: Facebook users; edges: friendships) with labeled groups: high school, summer internship, Stanford (squash), Stanford (basketball)]
social circles, circles of trust
(e.g., Canadians who call the USA, reading tastes, etc.)
(assigning web clients to web servers, routing in ad hoc networks, etc.)
[Figure: a network and its adjacency matrix (rows and columns indexed by nodes)]
Given a graph G(V, E), find subsets C_i of V such that ∪_i C_i ⊆ V
by individuals (in the same location, of the same gender, etc)
Multipartite graphs – e.g., affiliation networks, citation networks, customers-products: reduced to unipartite projections of each vertex class
Clique: a maximum complete subgraph in which all pairs of vertices are connected by an edge. A clique of size k is a subgraph of k vertices where the degree of each vertex is k − 1.
Cliques vs complete graphs
Search for:
– the maximum clique (the clique with the largest number of vertices), or
– all maximal cliques (cliques that are not subgraphs of a larger clique; i.e., cannot be expanded further).
Both problems are NP-hard, as is verifying whether a graph contains a clique larger than size k.
Enumerate all cliques: checks all vertex subsets! For 100 vertices, 2^100 − 1 different candidate subsets.
Check all neighbors of the last node sequentially: if a neighbor is connected with all members of the clique, it forms a new clique → push it. Each vertex in a clique of size k has degree k − 1.
“Exact cliques” are rarely observed in real networks. E.g., a clique of 1,000 vertices has (999×1000)/2 = 499,500 edges; a single missing edge destroys the clique.
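The enumeration idea above (test candidate vertex sets and keep only those cliques that cannot be expanded) can be sketched in Python. This brute-force subset scan is purely illustrative and exponential, as just noted; the graph and helper names are toy assumptions:

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair in `nodes` is connected in adjacency dict `adj`."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def maximal_cliques(adj):
    """Brute-force enumeration of maximal cliques (exponential; illustration only)."""
    nodes = list(adj)
    cliques = []
    # Scan subsets from largest to smallest so maximality is easy to check.
    for r in range(len(nodes), 0, -1):
        for cand in combinations(nodes, r):
            if is_clique(adj, cand):
                if not any(set(cand) <= c for c in cliques):
                    cliques.append(set(cand))
    return cliques

# Two triangles sharing node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(maximal_cliques(adj))  # [{0, 1, 2}, {2, 3, 4}]
```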
k-plex: all vertices have a minimum degree, but not necessarily k − 1. For a set of vertices V, for all u ∈ V: d_u ≥ |V| − k, where d_u is the degree of u in the induced subgraph. What is k for a clique? k = 1. As with cliques, we search for maximal k-plexes.
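The k-plex condition d_u ≥ |V| − k can be checked directly; a minimal sketch (the adjacency-dict representation is an assumption, not from the slides):

```python
def is_k_plex(adj, nodes, k):
    """True if every vertex in `nodes` has degree >= |nodes| - k
    within the subgraph induced by `nodes`."""
    nodes = set(nodes)
    return all(len(adj[u] & nodes) >= len(nodes) - k for u in nodes)

# A 4-cycle: every vertex misses exactly one other vertex.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_k_plex(adj, {0, 1, 2, 3}, 2))  # True
print(is_k_plex(adj, {0, 1, 2, 3}, 1))  # False: a 1-plex is exactly a clique
```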
Assumption: communities are formed from a set of cliques and edges that connect these cliques.
In the clique graph, each k-clique is a vertex, and two cliques that share k − 1 vertices are connected via an edge.
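The clique-percolation idea can be sketched for small graphs: enumerate k-cliques, link cliques sharing k − 1 vertices, and take the connected components of the clique graph. This is an illustrative brute-force version, not an optimized CPM implementation:

```python
from itertools import combinations

def cpm_communities(adj, k=3):
    """Clique percolation sketch: enumerate k-cliques, connect those sharing
    k-1 vertices, and return the vertex sets of the connected components."""
    cliques = [frozenset(c) for c in combinations(adj, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # Union-find over cliques
    parent = {c: c for c in cliques}
    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c
    for a, b in combinations(cliques, 2):
        if len(a & b) == k - 1:
            parent[find(a)] = find(b)
    comms = {}
    for c in cliques:
        comms.setdefault(find(c), set()).update(c)
    return list(comms.values())

# Two triangle communities joined by a single edge (2-3): CPM keeps them separate.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(cpm_communities(adj, 3))  # [{0, 1, 2}, {3, 4, 5}]
```

Note that the bridge edge (2, 3) belongs to no triangle, so it joins no community: overlap requires shared k-cliques, not just shared edges.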
(v1, v2, v3), (v8, v9, v10), and (v3, v4, v5, v6, v7, v8)
(v1, v2, v3), (v8, v9, v10), and (v3, v4, v5, v6, v7, v8)
Note: the example protein network was detected using a CPM algorithm
A k-clique community is the union of all k-cliques that can be reached from one another through a series of adjacent k-cliques, where rolling means rotating a k-clique about the k − 1 vertices it shares with any adjacent k-clique. A vertex may be reached by different paths and end up in different (overlapping) clusters; there are also vertices that cannot be reached by any k-clique, e.g., in sparse graphs.
Use the adjacency matrix A,
If we map vertices u, v to n-dimensional points A, B in the Euclidean space,
Many more – we shall revisit this issue when we talk about link prediction
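Treating row u of the adjacency matrix as an n-dimensional point, vertex similarity can be computed as sketched below (node labels and helper names are illustrative):

```python
import math

def adjacency_rows(adj, n):
    """Row u of the adjacency matrix as an n-dimensional 0/1 vector."""
    return {u: [1 if v in adj[u] else 0 for v in range(n)] for u in adj}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rows = adjacency_rows(adj, 4)
# Vertices 0 and 1 share neighbor 2 and differ only in their mutual edge:
print(euclidean(rows[0], rows[1]))  # sqrt(2) ≈ 1.414
```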
Inter-cluster distances are maximized; intra-cluster distances are minimized.
How many clusters? [Figure: the same points grouped into two, four, or six clusters]
– Division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset – Assumes that the number of clusters is given
– A set of nested clusters organized as a hierarchical tree
[Figure: original points and a partitional clustering of them]
[Figure: a dendrogram over points 1–6 (merge heights 0.05–0.2) and the corresponding nested-cluster diagram]
– In non-exclusive clustering, points may belong to multiple clusters. – Can represent multiple classes or ‘border’ points
– In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1 – Weights must sum to 1 – Probabilistic clustering has similar characteristics
– In some cases, we only want to cluster some of the data
– Clusters of widely different sizes, shapes, and densities
Finds clusters that minimize or maximize an objective function.
– Enumerate all possible ways of dividing the points into clusters and evaluate the 'goodness' of each potential set of clusters using the given objective function (NP-hard). – Can have global or local objectives.
– A variation of the global objective function approach is to fit the data to a parameterized model.
K-means: each cluster is associated with a centroid; each point is assigned to the cluster with the closest centroid.
– Initial centroids are often chosen randomly, so the clusters produced vary from one run to another.
– The 'closeness' of a point to a centroid is measured by Euclidean distance, cosine similarity, correlation, etc.
– Most of the convergence happens in the first few iterations; often the stopping condition is changed to 'until relatively few points change clusters'.
– Complexity: O(n × K × I × d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
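The basic K-means loop (assign each point to the nearest centroid, recompute centroids, stop when assignments stabilize) can be sketched as follows; the fixed initial centroids are an illustrative assumption in place of random initialization:

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic K-means on 2-D points; `centroids` are the initial guesses."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
```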
[Figure: K-means on sample 2-D data (x, y), iterations 1–6]
[Figure: K-means on the same data with a different choice of initial centroids, iterations 1–6]
[Figure: original points, an optimal clustering, and a sub-optimal clustering]
– The most common measure is the Sum of Squared Errors (SSE). For each point, the error is the distance to the nearest cluster centroid; to get SSE, we square these errors and sum them:

SSE = Σ_{i=1}^{K} Σ_{x ∈ C_i} dist²(m_i, x)

where x is a data point in cluster C_i and m_i is the representative point (centroid) of cluster C_i.
– Given two clusterings, we can choose the one with the smallest error.
– One easy way to reduce SSE is to increase K, the number of clusters; still, a good clustering with smaller K can have a lower SSE than a poor clustering with higher K.
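The SSE definition can be computed directly; the clusters and centroids below are toy values:

```python
import math

def sse(clusters, centroids):
    """SSE = sum over clusters i, points x in C_i, of dist(m_i, x)^2."""
    return sum(math.dist(m, x) ** 2
               for cl, m in zip(clusters, centroids) for x in cl)

clusters = [[(0, 0), (0, 2)], [(5, 5), (7, 5)]]
centroids = [(0, 1), (6, 5)]  # each point sits at distance 1 from its centroid
print(sse(clusters, centroids))  # 1 + 1 + 1 + 1 = 4.0
```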
– Agglomerative: start with the points as individual clusters; at each step, merge the closest pair of clusters until only one cluster (or k clusters) is left.
– Divisive: start with one, all-inclusive cluster; at each step, split a cluster until each cluster contains a single point (or there are k clusters).
– Traditional algorithms use a similarity or distance (proximity) matrix
– Merge or split one cluster at a time
Basic agglomerative algorithm:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat:
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains

The key operation is the computation of the proximity of two clusters; different approaches to defining the distance between clusters distinguish the different algorithms.
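The basic agglomerative algorithm with MIN (single-link) proximity can be sketched as follows; for clarity it recomputes pairwise distances instead of maintaining a proximity matrix, and stops at k clusters rather than one:

```python
import math

def single_link(points, k):
    """Agglomerative clustering sketch: start with singletons and repeatedly
    merge the two clusters whose closest pair of points is nearest (MIN),
    stopping when k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(math.dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (0.5, 0.5)]
print(single_link(points, 2))
```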
[Figure: clusters p1–p5 and their proximity matrix] How to define inter-cluster similarity?
MIN or single link: similarity of two clusters is based on the two most similar (closest) points in the different clusters. (Sensitive to outliers.)
MAX or complete linkage: similarity of two clusters is based on the two least similar (most distant) points in the different clusters. (Tends to break large clusters; biased towards globular clusters.)
Group Average: proximity of two clusters is the average of pairwise proximity between points in the two clusters.
Distance Between Centroids
– Divisive (top-down): progressively remove “bridge links” between densely-connected regions.
– Agglomerative (bottom-up): identify nodes that are likely to belong to the same region and merge them together.
Edge strengths (call volume) in a real network Edge betweenness in a real network
Betweenness of an edge (a, b): the number of pairs of nodes x and y such that the edge (a, b) lies on the shortest path between x and y. Since there can be several such shortest paths, edge (a, b) is credited with the fraction of those shortest paths that include it:

bt(a, b) = Σ_{x,y} #shortest_paths(x, y) through (a, b) / #shortest_paths(x, y)

Edges that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen nodes have high betweenness. Think of it as traffic (units of flow). Example values from the figure: 7×7 = 49, 3×11 = 33, 1×12 = 12, 1.
» Undirected unweighted networks
[Girvan-Newman ‘02]
Betweenness(7, 8) = 7×7 = 49. Betweenness(3, 7) = Betweenness(6, 7) = Betweenness(8, 9) = Betweenness(8, 12) = 3×11 = 33. Betweenness(1, 3) = 1×12 = 12.
Need to re-compute betweenness at every step
Betweenness(3, 7) = Betweenness(6, 7) = Betweenness(8, 9) = Betweenness(8, 12) = 3×4 = 12. Betweenness(1, 3) = 1×5 = 5.
Betweenness of every edge = 1
Hierarchical network decomposition: [Figure: steps 1–3 of repeated edge removal and the resulting dendrogram; recomputed betweenness values include 5×5 = 25 and 5×6 = 30]
Communities in physics collaborations
[Figure: the initial network and the BFS tree rooted at A, with levels 1–4 computed top-down]
For each edge e: calculate the sum over all nodes Y of the fraction of shortest paths from the root A to Y that go through e. Each edge (X, Y) participates in the shortest paths from the root to Y and to the nodes (at levels) below Y → bottom-up calculation.
Count the flow through each edge:

credit(e, (X, Y)) = |{shortest paths (X, Y) through e}| / |{shortest paths (X, Y)}|

Example (labels from the figure): the portion of the shortest paths to K that go through (I, K) is 3/6 = 1/2. The portion of the shortest paths to I that go through (F, I) is 2/3, plus the portion of the shortest paths to K that go through (F, I), (1/2)(2/3) = 1/3, for a total of 2/3 + 1/3 = 1; similarly, 1/3 + (1/3)(1/2) = 1/2.
Examples from the figure: 1 path to K, split evenly; 1 + 0.5 paths to J, split 1:2; 1 + 1 paths to H, split evenly.

The algorithm: working bottom-up, each node Y receives credit 1 + Σ (credits of the edges to its children Y_1, …, Y_m), and this credit is divided among the edges to Y's parents in proportion to the parents' shortest-path counts:

flow(X, Y) = (p_X / p_Y) · (1 + Σ_{Y_i child of Y} flow(Y, Y_i))

where p_X, p_Y are the numbers of shortest paths from the root to X and Y. Repeat the whole procedure for each starting node V.
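The two phases (BFS with shortest-path counts, then bottom-up credit propagation) can be sketched for a single root; summing these credits over all roots and dividing by 2 gives the edge betweenness:

```python
from collections import deque

def edge_credits(adj, root):
    """Single-source edge credits for Girvan-Newman betweenness:
    BFS from `root` counts shortest paths, then credit flows bottom-up."""
    # Forward phase: BFS levels and shortest-path counts p_v.
    dist, paths, order = {root: 0}, {root: 1}, []
    q = deque([root])
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                q.append(v)
            if dist[v] == dist[u] + 1:  # v is one level below u
                paths[v] += paths[u]
    # Backward phase: every node except the root starts with credit 1;
    # the edge up to parent X carries node_credit(Y) * p_X / p_Y.
    node_credit = {u: 1.0 for u in order}
    node_credit[root] = 0.0
    credits = {}
    for y in reversed(order):
        for x in adj[y]:
            if dist.get(x) == dist[y] - 1:  # x is a parent of y
                c = node_credit[y] * paths[x] / paths[y]
                credits[tuple(sorted((x, y)))] = c
                node_credit[x] += c
    return credits

# Diamond graph: two equal shortest paths from A to D.
adj = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
print(edge_credits(adj, 'A'))
```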
Need a null model! A copy of the original graph keeping some of its structural properties, but without community structure.
Consider the expected number of edges between nodes i and j in the null model.

Note: Σ_{v∈V} d_v = 2m, where m is the number of edges.

For any edge going out of i at random, the probability of this edge getting connected to node j is d_j / 2m. Because the degree of i is d_i, there are d_i such edges, so the expected number of edges between i and j is d_i d_j / 2m.

Summing over all pairs, the expected number of edges in the null model is

(1/2) Σ_{j∈V} Σ_{k∈V} d_j d_k / 2m = (1/2) · (1/2m) · Σ_{j∈V} d_j · Σ_{k∈V} d_k = (1/2) · (2m · 2m) / 2m = m

so the null model preserves the number of edges of the original graph.
Modularity compares the actual number of edges within communities to this null-model expectation. Again Σ_{v∈V} d_v = 2m. For a partition T of the vertex set into communities:

Q = (1/2m) Σ_{t∈T} Σ_{j∈t} Σ_{k∈t} (A_jk − d_j d_k / 2m)
A_ij = 1 if (i, j) is an edge, 0 else. Normalizing constant: −1 < Q < 1.
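The modularity formula can be evaluated directly; the two-triangle graph below is a toy example:

```python
def modularity(adj, communities):
    """Q = (1/2m) * sum over communities t, nodes j,k in t, of (A_jk - d_j*d_k/2m)."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # 2m: each edge counted twice
    q = 0.0
    for t in communities:
        for j in t:
            for k in t:
                a = 1 if k in adj[j] else 0
                q += a - len(adj[j]) * len(adj[k]) / m2
    return q / m2

# Two triangles joined by one edge (2-3): the natural split scores well.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # 5/14 ≈ 0.357
```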
Since the joining of a pair of communities between which there are no edges can never result in an increase in modularity, we need only consider those pairs between which there are edges, of which there will at any time be at most m.
Purity example: (5 + 6 + 4)/20 = 0.75
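A purity computation of this form (sum each cluster's majority-label count, divide by the total number of points) can be sketched with illustrative label lists that reproduce the 0.75 above:

```python
from collections import Counter

def purity(clusters):
    """Sum of each cluster's majority-label count, divided by total points."""
    total = sum(len(c) for c in clusters)
    return sum(Counter(c).most_common(1)[0][1] for c in clusters) / total

# Three clusters of ground-truth labels; majorities are 5, 6 and 4 of 20 points.
clusters = [['x'] * 5 + ['y'] * 1,
            ['y'] * 6 + ['z'] * 2,
            ['z'] * 4 + ['x'] * 2]
print(purity(clusters))  # (5+6+4)/20 = 0.75
```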
Based on pair counting: the number of pairs of vertices which are classified in the same (different) clusters in the two partitions.
– True Positive (TP): two similar vertices are assigned to the same community. This is a correct decision.
– True Negative (TN): two dissimilar vertices are assigned to different communities. This is a correct decision.
– False Negative (FN): two similar vertices are assigned to different communities. This is an incorrect decision.
– False Positive (FP): two dissimilar vertices are assigned to the same community. This is an incorrect decision.
For TP, we need to compute the number of pairs with the same label that are in the same community
For FP, compute dissimilar pairs that are in the same community. For FN, compute similar members that are in different communities.
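The four pair counts, and the resulting Rand index (TP + TN over all pairs), can be computed as sketched below with toy labels and detected communities:

```python
from itertools import combinations

def pair_counts(labels, communities):
    """Count TP/TN/FP/FN over all vertex pairs, comparing ground-truth
    labels with the detected community of each vertex."""
    tp = tn = fp = fn = 0
    for u, v in combinations(labels, 2):
        same_label = labels[u] == labels[v]
        same_comm = communities[u] == communities[v]
        if same_label and same_comm: tp += 1
        elif not same_label and not same_comm: tn += 1
        elif not same_label and same_comm: fp += 1
        else: fn += 1
    return tp, tn, fp, fn

labels      = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b'}
communities = {0: 0,   1: 0,   2: 1,   3: 1,   4: 1}
tp, tn, fp, fn = pair_counts(labels, communities)
rand_index = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, rand_index)  # 2 4 2 2 0.6
```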
– Cohesion is measured by the within-cluster sum of squares (SSE):
WSS = Σ_i Σ_{x ∈ C_i} (x − m_i)²
– Separation is measured by the between-cluster sum of squares:
BSS = Σ_i |C_i| (m − m_i)²
where |C_i| is the size of cluster i, m_i is the centroid of cluster i, and m is the overall mean.
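The cohesion/separation decomposition can be checked numerically: WSS + BSS equals the total sum of squares about the overall mean. A 1-D sketch with toy data:

```python
def wss_bss(clusters):
    """Within-cluster (WSS) and between-cluster (BSS) sums of squares, 1-D data."""
    allpts = [x for c in clusters for x in c]
    m = sum(allpts) / len(allpts)                   # overall mean
    means = [sum(c) / len(c) for c in clusters]     # cluster means m_i
    wss = sum((x - mi) ** 2 for c, mi in zip(clusters, means) for x in c)
    bss = sum(len(c) * (m - mi) ** 2 for c, mi in zip(clusters, means))
    return wss, bss

clusters = [[1, 2, 3], [7, 8, 9]]
wss, bss = wss_bss(clusters)
tss = sum((x - 5) ** 2 for c in clusters for x in c)  # total SS; overall mean is 5
print(wss, bss, wss + bss == tss)  # 4.0 54.0 True
```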
110
111
112
113
References:
– J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, Chapter 10, http://www.mmds.org/
– R. Zafarani, M. A. Abbasi, H. Liu, Social Media Mining: An Introduction, Chapter 6, http://dmml.asu.edu/smm/
– S. Fortunato, Community Detection in Graphs, https://arxiv.org/abs/0906.0612v2 (2010)
– P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Chapter 8, http://www.users.cs.umn.edu/~kumar/dmbook/index.php