[PPT] - Sublinear Algorithms Lecture 3 Sofya Raskhodnikova Penn State PowerPoint Presentation

SLIDE 1


Sublinear Algorithms

Lecture 3

Sofya Raskhodnikova

Penn State University

SLIDE 2

Graph Properties

SLIDE 3

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

in adjacency lists representation

(a list of neighbors for each vertex)

maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries

Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$

Exact Answer: $\Omega(dn)$ time

Approximate version:

Is the graph connected or $\varepsilon$-far from connected?

$\mathrm{dist}(G_1, G_2) = \dfrac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$

Time: $O\left(\frac{1}{\varepsilon^2 d}\right)$ today

+ improvement on HW

No dependence on $n$!

SLIDE 4

Testing Connectedness: Algorithm

Connectedness Tester($G$, $d$, $\varepsilon$)

1. Repeat $s = 16/(\varepsilon d)$ times:
2. pick a random vertex $u$
3. determine if connected component of $u$ is small: perform BFS from $u$, stopping after at most $8/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.

Run time: $O(d/(\varepsilon^2 d^2)) = O(1/(\varepsilon^2 d))$

Analysis:

Connected graphs are always accepted.
Remains to show:

If a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge 2/3$.
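The tester can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: the graph is assumed to be a list `adj` of neighbor lists (one per vertex), and the names `bounded_component_size` and `connectedness_tester` are illustrative. The check `size < n` is an added guard for the corner case where $8/(\varepsilon d) \ge n$, in which a connected graph could otherwise look "small".

```python
import math
import random
from collections import deque


def bounded_component_size(adj, start, limit):
    """BFS from `start`; return the component size if it is at most `limit`,
    or None as soon as more than `limit` nodes have been discovered."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) > limit:
                    return None  # component is not small; stop early
                queue.append(v)
    return len(seen)  # BFS exhausted the whole component


def connectedness_tester(adj, d, eps, rng=random):
    """Return True (accept) or False (reject), following steps 1-4."""
    n = len(adj)
    s = math.ceil(16 / (eps * d))        # number of sampled vertices
    limit = math.floor(8 / (eps * d))    # size threshold for a "small" component
    for _ in range(s):
        u = rng.randrange(n)
        size = bounded_component_size(adj, u, limit)
        # Added guard (assumption): a component covering all n vertices
        # is not evidence of disconnectedness.
        if size is not None and size < n:
            return False
    return True
```

On a connected graph the function always returns True, matching the one-sided error of the tester; on a graph made only of isolated vertices the first sample already finds a small component.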

SLIDE 5

Testing Connectedness: Analysis

If Claim 2 holds, at least $\varepsilon dn/8$ nodes are in small connected components.

By Witness lemma, it suffices to sample $\frac{2 \cdot 8}{\varepsilon dn/n} = \frac{16}{\varepsilon d}$ nodes to detect one from a small connected component.

Claim 1

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/4$ connected components.

Claim 2

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/8$ connected components of size at most $8/(\varepsilon d)$.

SLIDE 6

Testing Connectedness: Proof of Claim 1

We prove the contrapositive: If G has $< \varepsilon dn/4$ connected components, one can make G connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon dn$ entries.

If there are no degree restrictions, $k$ components can be connected by adding $k-1$ edges, each affecting 2 nodes. Here, $k < \varepsilon dn/4$, so $2k-2 < \varepsilon dn$.

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?

Claim 1

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/4$ connected components.

SLIDE 7

Freeing up an Adjacency List Entry

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are d?

Consider an MST of this component.
Let $v$ be a leaf of the MST.
Disconnect $v$ from a node other than its parent in the MST.
Two entries are changed while keeping the same number of components.
Thus, $k$ components can be connected by modifying at most $2k-1$ edges (at most $k$ removals to free up entries, plus $k-1$ additions), each affecting 2 nodes. Here, $k < \varepsilon dn/4$, so $4k-2 < \varepsilon dn$.

Claim 1

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/4$ connected components.

SLIDE 8

Testing Connectedness: Proof of Claim 2

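If Claim 1 holds, there are at least $\varepsilon dn/4$ connected components.

Their average size $\le \frac{n}{\varepsilon dn/4} = \frac{4}{\varepsilon d}$.

By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average, i.e., at least $\varepsilon dn/8$ components have size at most $8/(\varepsilon d)$.

Claim 1

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/4$ connected components.

Claim 2

If G is $\varepsilon$-far from connected, it has $\ge \varepsilon dn/8$ connected components of size at most $8/(\varepsilon d)$.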

SLIDE 9

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

in adjacency lists representation

(a list of neighbors for each vertex)

maximum degree $d$

Connected or $\varepsilon$-far from connected? $O\left(\frac{1}{\varepsilon^2 d}\right)$ time

(no dependence on $n$)

SLIDE 10

Randomized Approximation in sublinear time

Simple Examples

SLIDE 11

Randomized Approximation: a Toy Example

Input: a string $w \in \{0,1\}^n$ (for example, 0 0 0 1 … 0 1 0 0)

Goal: Estimate the fraction of 1's in $w$ (like in polls)

It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.

$Y_i$ = value of sample $i$. Then $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot (\text{fraction of 1's in } w)$

$\Pr[|(\text{sample average}) - (\text{fraction of 1's in } w)| \ge \varepsilon] = \Pr[|Y - E[Y]| \ge \varepsilon s] \le 2e^{-2\delta^2/s} = 2e^{-2} < 1/3$

(Apply Hoeffding Bound with $\delta = \varepsilon s$, then substitute $s = 1/\varepsilon^2$.)
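The toy estimator can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is taken to be a Python sequence `bits` of 0/1 values, positions are sampled uniformly with replacement, and the function name is hypothetical.

```python
import math
import random


def estimate_fraction_of_ones(bits, eps, rng=random):
    """Sample s = ceil(1/eps^2) positions uniformly with replacement
    and return the sample average."""
    s = math.ceil(1 / eps ** 2)
    total = sum(bits[rng.randrange(len(bits))] for _ in range(s))
    return total / s
```

The number of queried positions depends only on `eps`, not on the length of the input; the guarantee from the Hoeffding Bound is that the returned value is within `eps` of the true fraction with probability at least 2/3.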

SLIDE 12

Approximating # of Connected Components

[Chazelle Rubinfeld Trevisan]

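Input: a graph $G = (V, E)$ on $n$ vertices

in adjacency lists representation

(a list of neighbors for each vertex)

maximum degree $d$

Exact Answer: $\Omega(dn)$ time

Additive approximation: # of CC $\pm \varepsilon n$ with probability $\ge 2/3$

Time:

Known: $O\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\left(\frac{d}{\varepsilon^2}\right)$

Today: $O\left(\frac{d}{\varepsilon^3}\right)$. No dependence on $n$!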

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

SLIDE 13

Approximating # of CCs: Main Idea

Let $C$ = number of components
For every vertex $u$, define

$n_u$ = number of nodes in $u$'s component

– for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$

$\sum_{u \in V} \frac{1}{n_u} = C$ (breaks $C$ up into contributions of different nodes)

Estimate this sum by estimating $n_u$'s for a few random nodes

– If $u$'s component is small, its size can be computed by BFS.
– If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum
– Can stop BFS after a few steps

Similar to property tester for connectedness [Goldreich Ron]
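The identity $\sum_{u \in V} 1/n_u = C$ can be checked exactly on a small graph. The sketch below is illustrative only: it assumes the graph is a list `adj` of neighbor lists, reads the whole graph (so it is not sublinear), and uses exact rational arithmetic so the sum is an integer rather than a rounded float. The function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adj):
    """Return a list `size` where size[u] is the number of nodes
    in u's connected component (computed by full BFS)."""
    n = len(adj)
    size = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    component.append(v)
                    queue.append(v)
        for u in component:
            size[u] = len(component)
    return size


def count_components_via_sum(adj):
    """Sum 1/n_u over all vertices; each component contributes exactly 1."""
    return sum(Fraction(1, n_u) for n_u in component_sizes(adj))
```

For example, with components {0, 1}, {2, 3, 4} and {5}, the per-vertex terms are 1/2, 1/2, 1/3, 1/3, 1/3 and 1, which add up to 3.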

SLIDE 14

Approximating # of CCs: Algorithm

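Estimating $n_u$ = the number of nodes in $u$'s component:

Let estimate $\hat{n}_u = \min\left\{n_u, \frac{2}{\varepsilon}\right\}$

– When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$
– Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$

In both cases, $\left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon}{2}$.

Corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:

$|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$

APPROX_#_CCs ($G$, $d$, $\varepsilon$)

1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2. pick a random vertex $u$
3. compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$

Run time: $O(d/\varepsilon^3)$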
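A minimal Python sketch of APPROX_#_CCs, under stated assumptions: the graph is a list `adj` of neighbor lists, $0 < \varepsilon \le 1$, and the constant hidden in $s = \Theta(1/\varepsilon^2)$ is taken to be `c = 4`, which makes the Hoeffding failure bound $2e^{-\varepsilon^2 s/2}$ at most $2e^{-2} < 1/3$. Function and parameter names are illustrative.

```python
import math
import random
from collections import deque


def truncated_component_size(adj, start, cap):
    """BFS from `start`; return min(n_u, cap), stopping as soon as
    `cap` or more nodes have been discovered."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) >= cap:
                    return cap  # component is big; its exact size is not needed
                queue.append(v)
    return len(seen)


def approx_num_ccs(adj, eps, c=4, rng=random):
    """Estimate the number of connected components within +/- eps * n
    (with probability at least 2/3 for c = 4)."""
    n = len(adj)
    s = math.ceil(c / eps ** 2)  # assumed constant for Theta(1/eps^2)
    cap = 2 / eps                # truncation threshold for component sizes
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1 / truncated_component_size(adj, u, cap)
    return n * total / s
```

When all components have the same size below the cap, every sampled value $1/\hat{n}_u$ is identical, so the estimate equals the true count regardless of which vertices are sampled.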

SLIDE 15

Approximating # of CCs: Analysis

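Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$

Let $Y_i = 1/\hat{n}_u$ for the $i$th vertex $u$ in the sample

$Y = \sum_{i=1}^{s} Y_i = \frac{s\tilde{C}}{n}$ and $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot E[Y_1] = s \cdot \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{s\hat{C}}{n}$

$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[\left|\frac{n}{s} Y - \frac{n}{s} E[Y]\right| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon s}{2}\right] \le 2e^{-\varepsilon^2 s/2}$

Need $s = \Theta\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\le \frac{1}{3}$

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.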
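To make the constant in $s = \Theta(1/\varepsilon^2)$ concrete, the bound $2e^{-\varepsilon^2 s/2} \le 1/3$ can be solved for $s$, giving $s \ge 2\ln 6/\varepsilon^2$. The Python sketch below does this arithmetic; the function names are illustrative and the target failure probability is a parameter.

```python
import math


def hoeffding_failure_bound(eps, s):
    """Upper bound 2 * exp(-eps^2 * s / 2) on Pr[|Y - E[Y]| > eps * s / 2]."""
    return 2 * math.exp(-eps ** 2 * s / 2)


def samples_needed(eps, target=1 / 3):
    """Smallest integer s with 2 * exp(-eps^2 * s / 2) <= target,
    i.e. s >= 2 * ln(2 / target) / eps^2."""
    return math.ceil(2 * math.log(2 / target) / eps ** 2)
```

Since $2\ln 6 \approx 3.58$, taking $s = 4/\varepsilon^2$ is always enough for failure probability at most 1/3.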

SLIDE 16

Approximating # of CCs: Analysis

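So far: $|\hat{C} - C| \le \frac{\varepsilon n}{2}$

$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$

With probability $\ge \frac{2}{3}$,

$|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$

Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm\varepsilon n$ in time $O\left(\frac{d}{\varepsilon^3}\right)$.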

SLIDE 17

Minimum spanning tree (MST)

What is the cheapest way to connect all the dots?

Input: a weighted graph with n vertices and m edges

Exact computation:

– Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
– Randomized $O(m)$ time [Karger Klein Tarjan]

(Figure: example graph with numeric labels 1, 3, 7, 5, 2, 4)

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

SLIDE 18

Approximating MST Weight in Sublinear Time

[Chazelle Rubinfeld Trevisan]

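Input: a graph $G = (V, E)$ on $n$ vertices

in adjacency lists representation
maximum degree $d$ and maximum allowed weight $w$
weights in $\{1, 2, \dots, w\}$

Output: $(1+\varepsilon)$-approximation to MST weight, $w_{MST}$

Time:

Known: $O\left(\frac{dw}{\varepsilon^3} \log \frac{dw}{\varepsilon}\right)$, $\Omega\left(\frac{dw}{\varepsilon^2}\right)$

Today: $O\left(\frac{dw^3 \log w}{\varepsilon^3}\right)$

No dependence on $n$!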

SLIDE 19

Idea Behind Algorithm

Characterize MST weight in terms of number of connected

components in certain subgraphs of G

Already know that number of connected components can be

estimated quickly


SLIDE 20

MST and Connected Components: Warm-up

Recall Kruskal's algorithm for computing MST exactly.

Suppose all weights are 1 or 2. Then MST weight

= (# weight-1 edges in MST) + 2 ⋅ (# weight-2 edges in MST)

= $n - 1$ + (# of weight-2 edges in MST) [MST has $n - 1$ edges]

= $n - 1$ + (# of CCs induced by weight-1 edges) $- 1$ [by Kruskal]

(Figure legend: weight 1; weight 2; connected components induced by weight-1 edges; MST)

SLIDE 21

MST and Connected Components

In general: Let $G_i$ = subgraph of $G$ containing all edges of weight $\le i$
$C_i$ = number of connected components in $G_i$
Then MST has $C_i - 1$ edges of weight $> i$.

Let $\beta_i$ be the number of edges of weight $> i$ in MST
Each MST edge contributes 1 to $w_{MST}$, each MST edge of weight $>1$ contributes 1 more, each MST edge of weight $>2$ contributes one more, …

$w_{MST}(G) = \sum_{i=0}^{w-1} \beta_i = \sum_{i=0}^{w-1} (C_i - 1) = -w + \sum_{i=0}^{w-1} C_i = n - w + \sum_{i=1}^{w-1} C_i$

(The last step uses $C_0 = n$, since $G_0$ has no edges.)

Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$
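The claim can be checked exactly on a small connected graph by comparing Kruskal's algorithm with the component-counting formula. This Python sketch is only a sanity check under assumptions: edges are given as `(a, b, weight)` triples with integer weights in {1, …, w}, the graph is connected, and all names are illustrative. It reads the whole graph, so it is not a sublinear algorithm.

```python
class DisjointSets:
    """Union-find structure that also tracks the number of components."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.count -= 1
        return True


def mst_weight_kruskal(n, edges):
    """Exact MST weight: scan edges by increasing weight, keep those
    that join two different components."""
    dsu = DisjointSets(n)
    total = 0
    for a, b, wt in sorted(edges, key=lambda e: e[2]):
        if dsu.union(a, b):
            total += wt
    return total


def mst_weight_via_components(n, edges, w):
    """Evaluate n - w + sum_{i=1}^{w-1} C_i, where C_i is the number of
    components of the subgraph with all edges of weight <= i."""
    total = n - w
    for i in range(1, w):
        dsu = DisjointSets(n)
        for a, b, wt in edges:
            if wt <= i:
                dsu.union(a, b)
        total += dsu.count
    return total
```

For the 4-vertex example with edges (0,1,1), (1,2,2), (2,3,3), (0,3,3), (0,2,2) and w = 3, the subgraphs have C_1 = 3 and C_2 = 2 components, so both computations give 4 − 3 + 5 = 6.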

SLIDE 22

Algorithm for Approximating $w_{MST}$

APPROX_MSTweight ($G$, $w$, $d$, $\varepsilon$)

1. For $i = 1$ to $w - 1$ do:
2. $\tilde{C}_i \leftarrow$ APPROX_#CCs($G_i$, $d$, $\varepsilon/w$).
3. Return $\tilde{w}_{MST} = n - w + \sum_{i=1}^{w-1} \tilde{C}_i$.

Analysis:

Suppose all estimates of $C_i$'s are good: $|\tilde{C}_i - C_i| \le \frac{\varepsilon}{w} n$.

Then $|\tilde{w}_{MST} - w_{MST}| = \left|\sum_{i=1}^{w-1} (\tilde{C}_i - C_i)\right| \le \sum_{i=1}^{w-1} |\tilde{C}_i - C_i| \le w \cdot \frac{\varepsilon}{w} n = \varepsilon n$

Pr[all $w - 1$ estimates are good] $\ge (2/3)^{w-1}$
Not good enough! Need error probability $\le \frac{1}{3w}$ for each iteration

Then, by Union Bound, Pr[error] $\le w \cdot \frac{1}{3w} = \frac{1}{3}$

Can amplify success probability of any algorithm by repeating it and taking the median answer.

Can take more samples in APPROX_#CCs. What's the resulting run time?

Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$
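The algorithm, with the median-based amplification, can be sketched in Python as follows. Several details are assumptions made for illustration and are not taken from the slides: the graph is a list `adj` where `adj[u]` holds `(neighbor, weight)` pairs, $G_i$ is never built explicitly (the BFS simply ignores edges of weight above `i`), the sample-count constant is 4, and the default number of repetitions per iteration is an untuned placeholder of order $\log w$.

```python
import math
import random
import statistics
from collections import deque


def truncated_component_size(adj, start, cap, max_weight):
    """BFS in G_i (edges of weight <= max_weight only); return min(n_u, cap)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, wt in adj[u]:
            if wt <= max_weight and v not in seen:
                seen.add(v)
                if len(seen) >= cap:
                    return cap
                queue.append(v)
    return len(seen)


def approx_num_ccs(adj, eps, max_weight, rng):
    """Estimate the number of components of G_i within +/- eps * n."""
    n = len(adj)
    s = math.ceil(4 / eps ** 2)  # assumed constant for Theta(1/eps^2)
    cap = 2 / eps
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1 / truncated_component_size(adj, u, cap, max_weight)
    return n * total / s


def approx_mst_weight(adj, w, eps, repetitions=None, rng=random):
    """Return n - w + sum of amplified estimates of C_1, ..., C_{w-1}."""
    n = len(adj)
    if repetitions is None:
        # Placeholder (assumption): an odd number of runs of order log w.
        repetitions = 2 * math.ceil(math.log(3 * w)) + 1
    total = n - w
    for i in range(1, w):
        runs = [approx_num_ccs(adj, eps / w, i, rng) for _ in range(repetitions)]
        total += statistics.median(runs)  # median of independent runs
    return total
```

Taking the median of independent runs is one of the two amplification options named in the analysis; the other, taking more samples inside APPROX_#CCs, would change only the sample count `s`.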

SLIDE 23

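Multiplicative Approximation for $w_{MST}$

For MST cost, additive approximation ⟹ multiplicative approximation

$w_{MST} \ge n - 1$ ⟹ $w_{MST} \ge n/2$ for $n \ge 2$

$\varepsilon n$-additive approximation:

$w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n$

$(1 \pm 2\varepsilon)$-multiplicative approximation (using $\varepsilon n \le 2\varepsilon\, w_{MST}$, which follows from $n \le 2 w_{MST}$):

$w_{MST}(1 - 2\varepsilon) \le w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n \le w_{MST}(1 + 2\varepsilon)$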