Sublinear Algorithms for Big Data
Qin Zhang

Part 2: Sublinear in Communication

Sublinear in communication

The model: k parties hold inputs x_1 = 010011, x_2 = 111011, x_3 = 111111, ..., x_k = 100011.
They want to jointly compute f(x_1, x_2, ..., x_k).
Goal: minimize the total bits of communication.
Applications etc.

A natural approach

The model: sites S_1, S_2, S_3, ..., S_k hold inputs x_1 = 010011, x_2 = 111011, x_3 = 111111, ..., x_k = 100011, and each site talks to a coordinator C.
They want to jointly compute f(x_1, x_2, ..., x_k).
Goal: minimize the total bits of communication.

The natural approach: each S_i computes a sketch sk(S_i) of its input and sends it to C, and then C computes f(x_1, ..., x_k) based on sk(S_1), ..., sk(S_k).

The slides from the next page on are borrowed from Andrew McGregor.
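As a minimal illustration of the sketch-and-send pattern, the Python snippet below uses the four bit strings from the slide. The function f (total number of ones) and the sketch (a site's own count of ones) are assumptions chosen only to make the pattern concrete; the slides do not fix a particular f.

```python
def site_sketch(x: str) -> int:
    """Sketch of one site's bit string: its number of ones (assumed sk)."""
    return x.count("1")


def coordinator(sketches):
    """C combines the sketches; the assumed f is the total number of ones."""
    return sum(sketches)


inputs = ["010011", "111011", "111111", "100011"]  # x_1, x_2, x_3, x_k
sketches = [site_sketch(x) for x in inputs]
total = coordinator(sketches)

# A count between 0 and len(x) fits in len(x).bit_length() bits,
# compared with len(x) bits for sending the raw input.
bits_per_sketch = len(inputs[0]).bit_length()
print(sketches, total, bits_per_sketch)
```

For this toy f, each site sends 3 bits instead of 6; the gap grows with the input length.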

Outline: I. Connectivity, II. k-Connectivity, III. Min-Cut

I. Connectivity

Theorem (Testing Connectivity):
a) Dynamic graph stream: O(n polylog n) space.
b) Simultaneous messages: O(polylog n) length.

Ingredient 1: Basic Algorithm

Algorithm (Spanning Forest):
1. For each node: pick an incident edge.
2. For each connected component: pick an incident edge.
3. Repeat until there are no edges between connected components.

Lemma: After O(log n) rounds, the selected edges include a spanning forest.
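The rounds above can be sketched in Python as follows. This is a non-streaming illustration that assumes direct access to the edge list; the names `spanning_forest`, `find`, and the 0-indexed node labels are choices made for this example only.

```python
def spanning_forest(n, edges):
    """Round-based spanning forest: each component picks one leaving edge."""
    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest, rounds = [], 0
    while True:
        # Every current component picks one edge that leaves it, if any.
        picked = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                picked.setdefault(ru, (u, v))
                picked.setdefault(rv, (u, v))
        if not picked:
            break  # no edges between components remain
        rounds += 1
        # Merge along the picked edges, skipping any that would close a cycle.
        for u, v in picked.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
    return forest, rounds


edges = [(0, 1), (1, 2), (2, 0), (3, 4)]  # hypothetical graph; node 5 is isolated
forest, rounds = spanning_forest(6, edges)
print(forest, rounds)
```

Every component that still has a leaving edge merges with at least one other component per round, which is why the number of rounds is logarithmic in n.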

Ingredient 2: Sketching Neighborhoods

For node i, let a_i be a vector indexed by node pairs. Non-zero entries: a_i[i,j] = 1 if j > i and a_i[i,j] = -1 if j < i, for each edge {i,j}.

Example (graph on nodes 1, ..., 5; coordinates ordered {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}): a_1 has entry 1 and a_2 has entry -1 at coordinate {1,2}, so this coordinate cancels in a_1 + a_2 and only the edges leaving {1, 2} remain.

Lemma: For any subset of nodes S ⊂ V,
$\mathrm{support}\bigl(\sum_{i \in S} a_i\bigr) = E(S, V \setminus S)$

Lemma: There exists a random M: R^N → R^k with k = O(polylog N) such that for any a ∈ R^N, with high probability, some e ∈ support(a) can be recovered from M a:
$M a \longrightarrow e \in \mathrm{support}(a)$
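The first lemma can be checked directly on a small graph. The Python sketch below builds the vectors a_i as sparse dictionaries and compares the support of their sum with the cut E(S, V \ S). The edge list is a hypothetical 5-node graph chosen for illustration, not the graph drawn on the slide.

```python
from itertools import combinations


def node_vector(i, edges):
    """Return a_i as a dict {pair: +1 or -1}, pairs written (u, v) with u < v."""
    a = {}
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        if i == u:
            a[(u, v)] = 1    # the other endpoint v is larger than i
        elif i == v:
            a[(u, v)] = -1   # the other endpoint u is smaller than i
    return a


def support_of_sum(S, n, edges):
    """Support of the sum of a_i over i in S."""
    total = {p: 0 for p in combinations(range(1, n + 1), 2)}
    for i in S:
        for pair, val in node_vector(i, edges).items():
            total[pair] += val
    return {p for p, val in total.items() if val != 0}


def cut_edges(S, edges):
    """E(S, V \\ S): edges with exactly one endpoint in S."""
    S = set(S)
    return {(min(u, v), max(u, v)) for u, v in edges if (u in S) != (v in S)}


edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]  # hypothetical graph
S = {1, 2}
print(support_of_sum(S, 5, edges), cut_edges(S, edges))
```

An edge with both endpoints in S contributes +1 from its smaller endpoint and -1 from its larger one, so it cancels; an edge with one endpoint in S contributes a single non-zero entry.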

Recipe: Sketch & Compute on Sketches

Sketch: Each player j sends M a_j.

Central player runs the algorithm in sketch space:
Use M a_j to get an incident edge on each node j.
For i = 2 to log n: to get an incident edge on a component S ⊂ V, use
$\sum_{j \in S} M a_j = M\bigl(\sum_{j \in S} a_j\bigr) \longrightarrow e \in \mathrm{support}\bigl(\sum_{j \in S} a_j\bigr) = E(S, V \setminus S)$

Detail: Each player actually sends log n independent sketches M_1 a_j, M_2 a_j, ..., and the central player uses M_i a_j when emulating the i-th iteration of the algorithm.
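The recipe relies only on M being linear and on being able to recover a support element from M a. The Python sketch below uses a toy stand-in for M (geometric subsampling of coordinates plus a 1-sparse recovery test with a fingerprint at each level). This is an assumed construction for illustration, not necessarily the sketch behind the lemma; `ToySketch` and `component_edge` are hypothetical names, and the stored `depth` list stands in for a hash function.

```python
import random
from itertools import combinations

P = (1 << 61) - 1  # prime modulus for the fingerprint test


class ToySketch:
    """Toy linear sketch: subsampling levels with 1-sparse recovery."""

    def __init__(self, N, seed):
        rng = random.Random(seed)  # equal seeds give the same linear map M
        self.N = N
        self.levels = N.bit_length() + 1
        self.r = rng.randrange(2, P)
        self.depth = []  # coordinate t survives in levels 0..depth[t]
        for _ in range(N):
            d = 0
            while d < self.levels - 1 and rng.random() < 0.5:
                d += 1
            self.depth.append(d)
        self.s0 = [0] * self.levels  # sum of values
        self.s1 = [0] * self.levels  # sum of index * value
        self.fp = [0] * self.levels  # sum of value * r^index mod P

    def update(self, t, delta):
        w = pow(self.r, t, P)
        for l in range(self.depth[t] + 1):
            self.s0[l] += delta
            self.s1[l] += delta * t
            self.fp[l] = (self.fp[l] + delta * w) % P

    def add(self, other):
        """Linearity: sketch(a + b) = sketch(a) + sketch(b)."""
        for l in range(self.levels):
            self.s0[l] += other.s0[l]
            self.s1[l] += other.s1[l]
            self.fp[l] = (self.fp[l] + other.fp[l]) % P

    def recover(self):
        """Return a support index if some level looks 1-sparse, else None."""
        for l in range(self.levels):
            s0, s1 = self.s0[l], self.s1[l]
            if s0 != 0 and s1 % s0 == 0:
                t = s1 // s0
                if 0 <= t < self.N and self.fp[l] == (s0 * pow(self.r, t, P)) % P:
                    return t
        return None


def component_edge(S, n, edges, seed):
    """Sum the players' sketches over S and try to recover an edge leaving S."""
    pairs = list(combinations(range(n), 2))
    index = {p: t for t, p in enumerate(pairs)}
    total = ToySketch(len(pairs), seed)
    for j in S:
        sk = ToySketch(len(pairs), seed)  # player j's message M a_j
        for u, v in edges:
            u, v = min(u, v), max(u, v)
            if j == u:
                sk.update(index[(u, v)], +1)
            elif j == v:
                sk.update(index[(u, v)], -1)
        total.add(sk)                     # coordinator adds the messages
    t = total.recover()
    return None if t is None else pairs[t]


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]  # hypothetical graph
results = [component_edge({0, 1}, 5, edges, seed) for seed in range(20)]
print(results)
```

A single toy sketch can fail and return None, which mirrors the need for independent repetitions; any edge it does return should cross the cut between {0, 1} and the rest, up to a negligible fingerprint-collision probability.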

II. k-Connectivity

Theorem (checking that every cut has size ≥ k):
a) Dynamic graph stream: O(n k polylog n) space.
b) Simultaneous messages: O(k polylog n) length.

Ingredient 1: Basic Algorithm

Algorithm (k-Connectivity):
1. Let F_1 be a spanning forest of G(V, E).
2. For i = 2 to k:
2.1. Let F_i be a spanning forest of G(V, E - F_1 - ... - F_{i-1}).

Lemma: G(V, F_1 + ... + F_k) is k-connected iff G(V, E) is.
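The forest-peeling algorithm and its lemma can be tried out on small graphs. The Python sketch below is a non-streaming illustration with direct access to the edges; the brute-force `min_cut` helper and the test graphs are assumptions added for this example, and "k-connected" is read as "every cut has size at least k", as in the theorem.

```python
from itertools import combinations


def spanning_forest(n, edges):
    """Greedy spanning forest via union-find."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def forest_union(n, edges, k):
    """Return F_1 + ... + F_k, peeling one spanning forest per step."""
    remaining, H = list(edges), []
    for _ in range(k):
        F = spanning_forest(n, remaining)
        H.extend(F)
        chosen = set(F)
        remaining = [e for e in remaining if e not in chosen]
    return H


def min_cut(n, edges):
    """Brute-force minimum cut size (0 if disconnected); tiny graphs only."""
    best = len(edges)
    for r in range(1, n):
        for S in combinations(range(n), r):
            if 0 in S:
                S = set(S)
                best = min(best, sum((u in S) != (v in S) for u, v in edges))
    return best


K5 = list(combinations(range(5), 2))  # complete graph on 5 nodes
H = forest_union(5, K5, k=2)
print(len(K5), len(H), min_cut(5, K5), min_cut(5, H))
```

The union keeps at most k(n - 1) edges, yet every cut of G with at least k edges still has at least k edges in the union, which is what makes checking the smaller graph sufficient.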

Ingredient 2: Connectivity Sketches

Sketch: Simultaneously construct k independent connectivity sketches {M_1 G, M_2 G, ..., M_k G}.

Run the algorithm in sketch space:
Use M_1 G to find a spanning forest F_1 of G.
Use M_2 G - M_2 F_1 = M_2 (G - F_1) to find F_2.
Use M_3 G - M_3 F_1 - M_3 F_2 = M_3 (G - F_1 - F_2) to find F_3.
etc.
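The identity M_2 G - M_2 F_1 = M_2 (G - F_1) is just linearity of the sketch. The short Python check below uses a random sign matrix as a stand-in for M_2 and 0/1 edge-indicator vectors for G and F_1; both are assumptions for illustration and not the actual connectivity sketch.

```python
import random


def random_sign_matrix(rows, cols, seed):
    """Stand-in for a linear sketch matrix with entries in {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]


def apply(M, x):
    """Matrix-vector product M x over the integers."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Edge-indicator vectors over 6 hypothetical edge slots.
G = [1, 1, 1, 0, 1, 1]
F1 = [1, 0, 1, 0, 0, 1]  # edges of a forest already found (a subset of G)
M2 = random_sign_matrix(3, 6, seed=2)

lhs = [a - b for a, b in zip(apply(M2, G), apply(M2, F1))]  # M2 G - M2 F1
rhs = apply(M2, [g - f for g, f in zip(G, F1)])             # M2 (G - F1)
print(lhs == rhs)
```

Because F_1 is found from M_1 G only, M_2 is independent of F_1, so the sketch M_2 (G - F_1) behaves like a fresh sketch of the remaining graph.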

III. Min-Cut

Theorem ((1+ε)-approximating the minimum cut):
a) Dynamic graph stream: O(ε^{-2} n polylog n) space.
b) Simultaneous messages: O(ε^{-2} polylog n) length.

Ingredient 1: Subsampling

Lemma (Karger): Define the subgraph G_i by sampling each edge with probability 2^{-i} (so G = G_0). Then
$\text{Min-Cut}(G) = (1 \pm \epsilon) \cdot 2^{i} \cdot \text{Min-Cut}(G_i)$ if $i < -\log p^{*}$, where $p^{*} = 6 \epsilon^{-2} \log n / \text{Min-Cut}(G)$.
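To get a feel for the condition i < -log p*, the Python snippet below plugs in numbers. It assumes the natural logarithm for log n in p* and base 2 for -log p* (matching the sampling probability 2^{-i}); the slide does not state the bases, and the values of n, ε, and Min-Cut(G) are hypothetical.

```python
import math


def karger_levels(n, eps, min_cut):
    """Return p* and the largest level i >= 0 with i < -log2(p*)."""
    p_star = 6 * math.log(n) / (eps ** 2 * min_cut)
    if p_star >= 1:
        return p_star, 0  # only G_0 = G itself is covered
    bound = -math.log2(p_star)
    # Largest integer strictly below the bound.
    return p_star, max(0, math.ceil(bound) - 1)


p_star, i_max = karger_levels(n=1000, eps=0.5, min_cut=10**6)
print(p_star, i_max)
```

With these numbers the graph can be subsampled down to a 2^{-12} fraction of its edges while the rescaled minimum cut stays within a (1 ± ε) factor, which is where the space savings come from.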
