Sublinear Algorithms, Lecture 3. Sofya Raskhodnikova, Penn State University
Graph Properties
Testing if a Graph is Connected [Goldreich Ron]
Input: a graph $G = (V, E)$ on $n$ vertices
- in adjacency lists representation
(a list of neighbors for each vertex)
- maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries
Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$
Exact Answer: $\Omega(dn)$ time
- Approximate version:
Is the graph connected or $\varepsilon$-far from connected?
$\mathrm{dist}(G_1, G_2) = \dfrac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$
Time: $O\!\left(\frac{1}{\varepsilon^2 d}\right)$ today
+ improvement on HW
No dependence on $n$!
Testing Connectedness: Algorithm
Connectedness Tester($G$, $d$, $\varepsilon$)
1. Repeat $s = 16/(\varepsilon d)$ times:
2. pick a random vertex $u$
3. determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $8/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.
Run time: $O\!\left(\frac{d}{\varepsilon^2 d^2}\right) = O\!\left(\frac{1}{\varepsilon^2 d}\right)$
Analysis:
- Connected graphs are always accepted.
- Remains to show: If a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge 2/3$.
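The tester can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the graph is assumed to be a list of neighbor lists (empty entries omitted), and the names `bfs_component_size` and `connectedness_tester` are hypothetical. One detail the slide leaves implicit is made explicit here as an assumption: a small component counts as a witness only if it is smaller than the whole graph, so connected graphs are accepted even when $n \le 8/(\varepsilon d)$.

```python
import math
import random
from collections import deque


def bfs_component_size(adj, start, limit):
    """BFS from start. Return the component size if it is at most `limit`,
    otherwise None (the search stops once more than `limit` nodes are seen)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) > limit:
                    return None  # component is not small; stop early
                queue.append(v)
    return len(seen)


def connectedness_tester(adj, d, eps, rng=random):
    """Return True (accept) or False (reject)."""
    n = len(adj)
    s = math.ceil(16 / (eps * d))      # number of sampled vertices
    limit = math.ceil(8 / (eps * d))   # size threshold for a "small" component
    for _ in range(s):
        u = rng.randrange(n)
        size = bfs_component_size(adj, u, limit)
        # Assumption: a small component is a witness only if it is not all of G.
        if size is not None and size < n:
            return False
    return True
```

On a 50-vertex cycle this sketch always accepts; on 50 isolated vertices every sample is a witness, so it always rejects.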
Testing Connectedness: Analysis
Claim 1
If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
Claim 2
If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.
- If Claim 2 holds, at least $\frac{\varepsilon d n}{8}$ nodes are in small connected components.
- By the Witness Lemma, it suffices to sample $\frac{2 \cdot 8}{\varepsilon d n / n} = \frac{16}{\varepsilon d}$ nodes to detect one from a small connected component.
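The sample count can be checked directly. The calculation below assumes the Witness Lemma is the standard statement that if at least a $\rho$ fraction of the items are witnesses, then $2/\rho$ uniform samples contain a witness with probability at least $2/3$. The symbol $\rho$ is introduced here only for illustration.

```latex
\rho \;\ge\; \frac{\varepsilon d n / 8}{n} \;=\; \frac{\varepsilon d}{8},
\qquad
s \;=\; \frac{16}{\varepsilon d} \;\ge\; \frac{2}{\rho},
\qquad
\Pr[\text{no witness among } s \text{ samples}]
\;\le\; (1-\rho)^{s} \;\le\; e^{-\rho s} \;\le\; e^{-2} \;<\; \frac{1}{3}.
```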
Testing Connectedness: Proof of Claim 1
We prove the contrapositive: If $G$ has $< \frac{\varepsilon d n}{4}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon d n$ entries.
- If there are no degree restrictions, $k$ components can be connected by adding $k-1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $2k-2 < \varepsilon d n$.
- What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
Freeing up an Adjacency List Entry
What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
- Consider a spanning tree (MST) of this component.
- Let $v$ be a leaf of the MST.
- Disconnect $v$ from a node other than its parent in the MST.
- Two entries are changed while keeping the same number of components.
- Thus, $k$ components can be connected by at most $2k-1$ edge modifications ($k-1$ additions and at most $k$ removals), each affecting 2 entries. Here, $k < \frac{\varepsilon d n}{4}$, so $4k-2 < \varepsilon d n$.
Testing Connectedness: Proof of Claim 2
- If Claim 1 holds, there are at least $\frac{\varepsilon d n}{4}$ connected components.
- Their average size is $\le \frac{n}{\varepsilon d n / 4} = \frac{4}{\varepsilon d}$.
- By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average.
- Hence at least $\frac{\varepsilon d n}{8}$ components have size at most $\frac{8}{\varepsilon d}$, which is Claim 2.
Testing if a Graph is Connected [Goldreich Ron]
Input: a graph $G = (V, E)$ on $n$ vertices
- in adjacency lists representation
(a list of neighbors for each vertex)
- maximum degree $d$
Connected or $\varepsilon$-far from connected? $O\!\left(\frac{1}{\varepsilon^2 d}\right)$ time
(no dependence on $n$)
Randomized Approximation in sublinear time
Simple Examples
Randomized Approximation: a Toy Example
Input: a string $w \in \{0,1\}^n$, e.g., 0 0 0 1 … 0 1 0 0
Goal: Estimate the fraction of 1's in $w$ (like in polls)
It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.
Hoeffding Bound
Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.
$Y_i$ = value of sample $i$. Then $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot (\text{fraction of 1's in } w)$.
$\Pr[|(\text{sample average}) - (\text{fraction of 1's in } w)| \ge \varepsilon] = \Pr[|Y - E[Y]| \ge \varepsilon s] \le 2e^{-2\delta^2/s} = 2e^{-2} < 1/3$
Apply the Hoeffding Bound with $\delta = \varepsilon s$ and substitute $s = 1/\varepsilon^2$.
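The toy estimator is short enough to write out. The sketch below assumes the input is a Python list of 0/1 values and that samples are drawn uniformly with replacement; the function name `estimate_fraction_of_ones` is hypothetical.

```python
import math
import random


def estimate_fraction_of_ones(bits, eps, rng=random):
    """Estimate the fraction of 1's in `bits` within additive error eps
    (with probability >= 2/3) from s = ceil(1/eps^2) uniform samples."""
    n = len(bits)
    s = math.ceil(1 / eps ** 2)
    # Sample positions independently and uniformly, with replacement.
    sample_sum = sum(bits[rng.randrange(n)] for _ in range(s))
    return sample_sum / s
```

The number of samples depends only on `eps`, not on the length of the input, which is the point of the example.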
Approximating # of Connected Components
[Chazelle Rubinfeld Trevisan]
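Input: a graph $G = (V, E)$ on $n$ vertices
- in adjacency lists representation
(a list of neighbors for each vertex)
- maximum degree $d$
Exact Answer: $\Omega(dn)$ time
Additive approximation: # of CC $\pm\varepsilon n$ with probability $\ge 2/3$
Time:
- Known: $O\!\left(\frac{d}{\varepsilon^2}\log\frac{1}{\varepsilon}\right)$, $\Omega\!\left(\frac{d}{\varepsilon^2}\right)$
- Today: $O\!\left(\frac{d}{\varepsilon^3}\right)$.
No dependence on $n$!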
Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf
Approximating # of CCs: Main Idea
- Let $C$ = number of components
- For every vertex $u$, define $n_u$ = number of nodes in $u$'s component
  - for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$
  - $\sum_{u \in V} \frac{1}{n_u} = C$ (breaks $C$ up into contributions of different nodes)
- Estimate this sum by estimating $n_u$'s for a few random nodes
  - If $u$'s component is small, its size can be computed by BFS.
  - If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum
  - Can stop BFS after a few steps
Similar to property tester for connectedness [Goldreich Ron]
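The identity behind the main idea, that summing $1/n_u$ over all vertices gives the number of components, can be checked exactly on a small graph. This is an illustrative sketch only: it runs in linear time, uses exact fractions to avoid rounding, and the helper names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adj):
    """Return a list whose entry u is n_u, the size of u's component."""
    n = len(adj)
    size = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    component.append(v)
                    queue.append(v)
        for u in component:
            size[u] = len(component)
    return size


def count_components_via_sum(adj):
    """Compute C as the sum over all vertices u of 1/n_u."""
    return sum(Fraction(1, n_u) for n_u in component_sizes(adj))
```

For a graph with components of sizes 3, 2 and 1, the sum is $3 \cdot \frac13 + 2 \cdot \frac12 + 1 = 3$.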
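Approximating # of CCs: Algorithm
Estimating $n_u$ = the number of nodes in $u$'s component:
- Let estimate $\hat{n}_u = \min\{n_u, \frac{2}{\varepsilon}\}$
  - When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$
  - Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$
  - In both cases, $\left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon}{2}$
- Corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:
$|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$
APPROX_#_CCs ($G$, $d$, $\varepsilon$)
1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2. pick a random vertex $u$
3. compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$
Run time: $O(d/\varepsilon^3)$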
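A minimal Python sketch of the sampling procedure follows. It is an illustration under stated assumptions rather than the lecture's code: the graph is a list of neighbor lists, the cap $2/\varepsilon$ is rounded up to an integer (which keeps $1/\hat{n}_u \le \varepsilon/2$ for large components), and the hidden constant in $s = \Theta(1/\varepsilon^2)$ is taken to be 4. The function names are hypothetical.

```python
import math
import random
from collections import deque


def capped_component_size(adj, start, cap):
    """Return min(n_u, cap) via a BFS that stops once `cap` nodes are seen."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < cap:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if len(seen) >= cap:
                    break  # budget reached; the component is "big"
                queue.append(v)
    return len(seen)


def approx_num_ccs(adj, eps, samples=None, rng=random):
    """Estimate the number of connected components within +/- eps * n."""
    n = len(adj)
    cap = math.ceil(2 / eps)
    # Assumption: 4 / eps^2 samples stand in for Theta(1 / eps^2).
    s = samples if samples is not None else math.ceil(4 / eps ** 2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1.0 / capped_component_size(adj, u, cap)
    return n * total / s
```

Each BFS touches at most `cap` nodes and scans their adjacency lists, so one sample costs $O(d/\varepsilon)$ and the whole loop costs $O(d/\varepsilon^3)$, matching the stated run time.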
Approximating # of CCs: Analysis
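Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$
Let $Y_i = 1/\hat{n}_u$ for the $i$th vertex $u$ in the sample
- $Y = \sum_{i=1}^{s} Y_i = \frac{s\tilde{C}}{n}$ and $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot E[Y_1] = s \cdot \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{s\hat{C}}{n}$
$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[\left|\frac{n}{s} Y - \frac{n}{s} E[Y]\right| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon s}{2}\right] \le 2e^{-\varepsilon^2 s / 2}$
(Hoeffding Bound with $\delta = \varepsilon s / 2$)
- Need $s = \Theta\!\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\le \frac{1}{3}$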
Approximating # of CCs: Analysis
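So far: $|\hat{C} - C| \le \frac{\varepsilon n}{2}$ and $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$
- With probability $\ge \frac{2}{3}$,
$|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$
Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm\varepsilon n$ in time $O\!\left(\frac{d}{\varepsilon^3}\right)$.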
Minimum spanning tree (MST)
- What is the cheapest way to connect all the dots?
Input: a weighted graph with n vertices and m edges
- Exact computation:
  - Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
  - Randomized $O(m)$ time [Karger Klein Tarjan]
(Figure: example weighted graph with edge weights 1, 3, 7, 5, 2, 4.)
Approximating MST Weight in Sublinear Time
[Chazelle Rubinfeld Trevisan]
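Input: a graph $G = (V, E)$ on $n$ vertices
- in adjacency lists representation
- maximum degree $d$ and maximum allowed weight $w$
- weights in $\{1, 2, \dots, w\}$
Output: $(1+\varepsilon)$-approximation to MST weight, $w_{MST}$
Time:
- Known: $O\!\left(\frac{dw}{\varepsilon^3}\log\frac{dw}{\varepsilon}\right)$, $\Omega\!\left(\frac{dw}{\varepsilon^2}\right)$
- Today: $O\!\left(\frac{d w^3 \log w}{\varepsilon^3}\right)$
No dependence on $n$!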
Idea Behind Algorithm
- Characterize MST weight in terms of number of connected
components in certain subgraphs of G
- Already know that number of connected components can be
estimated quickly
MST and Connected Components: Warm-up
- Recall Kruskal's algorithm for computing MST exactly.
Suppose all weights are 1 or 2. Then MST weight
= (# weight-1 edges in MST) + 2 $\cdot$ (# weight-2 edges in MST)
= $n - 1$ + (# of weight-2 edges in MST)   [MST has $n - 1$ edges]
= $n - 1$ + (# of CCs induced by weight-1 edges) $- 1$   [by Kruskal]
(Figure: graph with weight-1 and weight-2 edges, showing the connected components induced by weight-1 edges and the MST.)
MST and Connected Components
In general: Let $G_i$ = subgraph of $G$ containing all edges of weight $\le i$
$C_i$ = number of connected components in $G_i$
Then MST has $C_i - 1$ edges of weight $> i$.
- Let $\beta_i$ be the number of edges of weight $> i$ in MST
- Each MST edge contributes 1 to $w_{MST}$, each MST edge of weight $>1$ contributes 1 more, each MST edge of weight $>2$ contributes one more, …
$w_{MST}(G) = \sum_{i=0}^{w-1} \beta_i = \sum_{i=0}^{w-1} (C_i - 1) = -w + \sum_{i=0}^{w-1} C_i = n - w + \sum_{i=1}^{w-1} C_i$
(the last step uses $C_0 = n$, since $G_0$ has no edges)
Claim
$w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$
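The claim can be sanity-checked on a small connected graph by comparing Kruskal's algorithm with the component-counting formula. The sketch below is exact rather than sublinear and is only meant to illustrate the identity; the edge-list format `(u, v, weight)` and all function names are assumptions made for this example.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(n, edges, max_weight):
    """C_i: number of components using only edges of weight <= max_weight."""
    parent = list(range(n))
    components = n
    for u, v, wt in edges:
        if wt <= max_weight:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components


def mst_weight_kruskal(n, edges):
    """MST weight of a connected graph via Kruskal's algorithm."""
    parent = list(range(n))
    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total


def mst_weight_via_components(n, edges, w):
    """n - w + sum of C_i for i = 1 .. w-1 (assumes G is connected)."""
    return n - w + sum(count_components(n, edges, i) for i in range(1, w))
```

On the 4-vertex example in the test below, $C_1 = C_2 = 2$ and $w = 3$, so the formula gives $4 - 3 + 2 + 2 = 5$, the same value Kruskal's algorithm returns.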
Algorithm for Approximating $w_{MST}$
APPROX_MSTweight ($G$, $w$, $d$, $\varepsilon$)
1. For $i = 1$ to $w - 1$ do:
2. $\tilde{C}_i \leftarrow$ APPROX_#CCs($G_i$, $d$, $\varepsilon/w$).
3. Return $\tilde{w}_{MST} = n - w + \sum_{i=1}^{w-1} \tilde{C}_i$.
Analysis:
- Suppose all estimates of $C_i$'s are good: $|\tilde{C}_i - C_i| \le \frac{\varepsilon}{w} n$.
Then $|\tilde{w}_{MST} - w_{MST}| = \left|\sum_{i=1}^{w-1} (\tilde{C}_i - C_i)\right| \le \sum_{i=1}^{w-1} |\tilde{C}_i - C_i| \le w \cdot \frac{\varepsilon}{w} n = \varepsilon n$
- Pr[all $w - 1$ estimates are good] $\ge (2/3)^{w-1}$
- Not good enough! Need error probability $\le \frac{1}{3w}$ for each iteration
- Then, by Union Bound, Pr[error] $\le w \cdot \frac{1}{3w} = \frac{1}{3}$
- Can amplify success probability of any algorithm by repeating it and taking the median answer.
- Can take more samples in APPROX_#CCs. What's the resulting run time?
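The median trick mentioned in the analysis can be sketched generically. The code below assumes `estimator` is any zero-argument callable that returns a good answer with probability at least 2/3 on each independent call; both function names are hypothetical. The repetition count $18\ln(3w)$ is one sufficient choice obtained from a Hoeffding bound on the number of bad runs (the median fails only if at least half the runs are bad, which has probability at most $e^{-k/18}$ for $k$ runs), not a constant taken from the lecture.

```python
import math
import statistics


def repetitions_for(w):
    """Number of runs that suffices to push the failure probability
    from 1/3 down to 1/(3w). Assumption: exp(-k/18) <= 1/(3w)."""
    return math.ceil(18 * math.log(3 * w))


def amplify_by_median(estimator, repetitions):
    """Call `estimator()` `repetitions` times and return the median answer."""
    return statistics.median(estimator() for _ in range(repetitions))
```

If more than half of the runs land within the allowed error of the true value, the median does too, which is why outliers such as the 1000s in `[10, 1000, 10, 10, 1000]` do not affect the result.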
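Multiplicative Approximation for $w_{MST}$
For MST cost, additive approximation $\Longrightarrow$ multiplicative approximation
$w_{MST} \ge n - 1 \Longrightarrow w_{MST} \ge n/2$ for $n \ge 2$
- $\varepsilon n$-additive approximation:
$w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n$
- $(1 \pm 2\varepsilon)$-multiplicative approximation (using $\varepsilon n \le 2\varepsilon\, w_{MST}$):
$w_{MST}(1 - 2\varepsilon) \le w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n \le w_{MST}(1 + 2\varepsilon)$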