
slide-1
SLIDE 1

9/10/2020

Sublinear Algorithms

LECTURE 3

Last time

  • Properties of lists and functions.
  • Testing if a list is sorted/Lipschitz and if a function is monotone.

Today

  • Testing if a graph is connected.
  • Estimating the number of connected components.
  • Estimating the weight of an MST.

Sofya Raskhodnikova; Boston University

slide-2
SLIDE 2

Graph Properties

slide-3
SLIDE 3

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)
  • maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries

Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$

Exact answer: $\Omega(dn)$ time

  • Approximate version:

Is the graph connected or $\varepsilon$-far from connected?

$\mathrm{dist}(G_1, G_2) = \dfrac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$

Time: $O\left(\frac{1}{\varepsilon^2 d}\right)$ (today, plus an improvement on HW). No dependence on $n$!

slide-4
SLIDE 4

Testing Connectedness: Algorithm

Connectedness Tester($n$, $d$, $\varepsilon$, query access to $G$)

1. Repeat $s = 8/(\varepsilon d)$ times:
2.   pick a random vertex $u$
3.   determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $4/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.

Run time: $O\left(\frac{d}{\varepsilon^2 d^2}\right) = O\left(\frac{1}{\varepsilon^2 d}\right)$

Analysis:

  • Connected graphs are always accepted.
  • Remains to show: if a graph is $\varepsilon$-far from connected, it is rejected with probability $\geq \frac{2}{3}$.
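The tester can be sketched in Python as follows. This is a minimal sketch under assumptions not stated on the slide: the graph is a list `adj` of adjacency lists of length `d` with `None` marking empty entries, the names `has_small_component` and `connectedness_tester` are hypothetical, and a component counts as "small" only if BFS exhausts it without covering all `n` vertices (so that connected graphs are always accepted, even tiny ones).

```python
import math
import random
from collections import deque


def has_small_component(adj, start, limit):
    """BFS from `start`, giving up once more than `limit` new nodes are found.

    Returns True only if the BFS exhausts the component and the component
    does not cover the whole graph.
    """
    n = len(adj)
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v is None or v in seen:  # empty entry or already visited
                continue
            seen.add(v)
            if len(seen) - 1 > limit:  # too many new nodes: not small
                return False
            queue.append(v)
    return len(seen) < n  # exhausted; small only if other vertices remain


def connectedness_tester(adj, d, eps, rng=random):
    """Return True (accept) or False (reject), following steps 1-4."""
    n = len(adj)
    repetitions = math.ceil(8 / (eps * d))
    limit = math.ceil(4 / (eps * d))
    for _ in range(repetitions):
        u = rng.randrange(n)  # pick a random vertex
        if has_small_component(adj, u, limit):
            return False  # a small connected component was found
    return True
```

For example, a 10-vertex cycle is always accepted, while a graph of 10 isolated vertices is rejected on the first sample, since every vertex sits in a component of size 1.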

slide-5
SLIDE 5

Testing Connectedness: Analysis

  • By Claim 2, at least $\frac{\varepsilon d n}{4}$ nodes are in small connected components.
  • By the Witness lemma, it suffices to sample $\frac{2 \cdot 4}{\varepsilon d n / n} = \frac{8}{\varepsilon d}$ nodes to detect one from a small connected component.

Claim 1

If $G$ is $\varepsilon$-far from connected, it has $\geq \frac{\varepsilon d n}{2}$ connected components.

Claim 2

If $G$ is $\varepsilon$-far from connected, it has $\geq \frac{\varepsilon d n}{4}$ connected components of size at most $4/(\varepsilon d)$.
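The sample count can be checked with a short calculation. This sketch assumes the Witness lemma has its usual form (if at least a $\rho$ fraction of the elements are witnesses, then $2/\rho$ uniform samples contain a witness with probability at least $2/3$); the lemma itself is not stated in these slides, and the symbol $\rho$ is introduced here only for illustration.

```latex
\rho \;\ge\; \frac{\varepsilon d n / 4}{n} \;=\; \frac{\varepsilon d}{4},
\qquad
s \;=\; \frac{2 \cdot 4}{\varepsilon d} \;=\; \frac{8}{\varepsilon d},
\qquad
\Pr[\text{no witness among } s \text{ samples}]
\;\le\; (1-\rho)^{s}
\;\le\; e^{-\rho s}
\;\le\; e^{-2}
\;<\; \frac{1}{3}.
```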
slide-6
SLIDE 6

Testing Connectedness: Proof of Claim 1

We prove the contrapositive: if $G$ has $< \frac{\varepsilon d n}{2}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon d n$ entries.

  • If there are no degree restrictions, $k$ components can be connected by adding $k - 1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{2}$, so $2k - 2 < \varepsilon d n$.
  • What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?


slide-7
SLIDE 7

Freeing up an Adjacency List Entry

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are d?

  • Consider an MST of this component.
  • Let $v$ be a leaf of the MST.
  • Disconnect $v$ from a node other than its parent in the MST.
  • Two entries are changed while keeping the same number of components.


slide-8
SLIDE 8

Freeing up an Adjacency List Entry

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are d?

  • Apply this to each component with $< 2$ free spots in adjacency lists.
  • Now we can connect all the components using the freed up spots while ensuring that we never change more than 2 spots per component.
  • Thus, $k$ components can be connected by changing $2k$ spots.

Here, $k < \frac{\varepsilon d n}{2}$, so $2k < \varepsilon d n$.


slide-9
SLIDE 9

Testing Connectedness: Proof of Claim 2

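  • By Claim 1, there are at least $\frac{\varepsilon d n}{2}$ connected components.
  • Their average size is at most $\frac{n}{\varepsilon d n / 2} = \frac{2}{\varepsilon d}$.
  • By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average, i.e., at least $\frac{\varepsilon d n}{4}$ components have size at most $\frac{4}{\varepsilon d}$.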

slide-10
SLIDE 10

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)
  • maximum degree $d$

Connected or $\varepsilon$-far from connected? $O\left(\frac{1}{\varepsilon^2 d}\right)$ time (no dependence on $n$)

slide-11
SLIDE 11

Randomized Approximation in sublinear time

A Simple Example

slide-12
SLIDE 12

Randomized Approximation: a Toy Example

Input: a string $w \in \{0,1\}^n$

Goal: Estimate the fraction of 1's in $w$ (like in polls)

It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\geq 2/3$.

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$. Let $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i$ (called the sample mean). Then $\Pr\left[|Y - E[Y]| \geq \varepsilon\right] \leq 2e^{-2s\varepsilon^2}$.

Let $Y_i$ = value of sample $i$. Then $E[Y] = \frac{1}{s} \sum_{i=1}^{s} E[Y_i]$ = (fraction of 1's in $w$).

Apply the Hoeffding Bound and substitute $s = 1/\varepsilon^2$:

$\Pr\left[\,|(\text{sample mean}) - (\text{fraction of 1's in } w)| \geq \varepsilon\,\right] \leq 2e^{-2s\varepsilon^2} = 2e^{-2} < \frac{1}{3}$
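The toy example can be tried directly. The sketch below assumes the input is a Python list of 0/1 integers and samples positions with replacement; the function names `estimate_fraction_of_ones` and `hoeffding_bound` are hypothetical helpers, not part of the lecture.

```python
import math
import random


def estimate_fraction_of_ones(bits, eps, rng=random):
    """Sample s = ceil(1/eps^2) positions uniformly and return the sample mean."""
    s = math.ceil(1 / eps ** 2)
    total = sum(bits[rng.randrange(len(bits))] for _ in range(s))
    return total / s


def hoeffding_bound(s, eps):
    """Upper bound 2 * exp(-2 * s * eps^2) on Pr[|Y - E[Y]| >= eps]."""
    return 2 * math.exp(-2 * s * eps ** 2)
```

With `eps = 0.1` the estimator reads about 100 positions regardless of the length of `bits`, and `hoeffding_bound(100, 0.1)` evaluates to roughly $2e^{-2}$, which is below $1/3$ as on the slide.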

slide-13
SLIDE 13

Approximating # of Connected Components

[Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)
  • maximum degree $d$

Exact answer: $\Omega(dn)$ time

Additive approximation: # of CCs $\pm\varepsilon n$ with probability $\geq 2/3$

Time:

  • Known: $O\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\left(\frac{d}{\varepsilon^2}\right)$
  • Today: $O\left(\frac{d}{\varepsilon^3}\right)$. No dependence on $n$!

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

slide-14
SLIDE 14

Approximating # of CCs: Main Idea

  • Let $C$ = number of components
  • For every vertex $u$, define $n_u$ = number of nodes in $u$'s component
      – for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$
      – $\sum_{u \in V} \frac{1}{n_u} = C$ (breaks $C$ up into contributions of different nodes)
  • Estimate this sum by estimating $n_u$'s for a few random nodes
      – If $u$'s component is small, its size can be computed by BFS.
      – If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum.
      – Can stop BFS after a few steps.

Similar to property tester for connectedness [Goldreich Ron]

slide-15
SLIDE 15

Approximating # of CCs: Algorithm

Estimating $n_u$ = the number of nodes in $u$'s component:

  • Let estimate $\hat{n}_u = \min\left\{n_u, \frac{2}{\varepsilon}\right\}$
      – When $u$'s component has $\leq 2/\varepsilon$ nodes, $\hat{n}_u = n_u$
      – Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$
      – In both cases, $\left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \leq \frac{\varepsilon}{2}$
  • Corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:

$|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \leq \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \leq \frac{\varepsilon n}{2}$

APPROX_#_CCs ($n$, $d$, $\varepsilon$, query access to $G$)

1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2.   pick a random vertex $u$
3.   compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$

Run time: $O(d/\varepsilon^3)$
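A possible Python rendering of this estimator is sketched below. It rests on assumptions beyond the slide: `adj` is a list of adjacency lists with `None` for empty entries, the cap is rounded up to `ceil(2/eps)`, and the sample count defaults to `ceil(4/eps**2)`, one concrete constant for which the Hoeffding bound $2e^{-\varepsilon^2 s/2}$ is below $1/3$. The names `capped_component_size` and `approx_num_ccs` are hypothetical.

```python
import math
import random
from collections import deque


def capped_component_size(adj, start, cap):
    """BFS from `start`, stopping once `cap` nodes are seen; returns min(n_u, cap)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v is None or v in seen:  # empty entry or already visited
                continue
            seen.add(v)
            if len(seen) >= cap:  # component is big: report the cap
                return cap
            queue.append(v)
    return len(seen)  # component fully explored: exact size


def approx_num_ccs(adj, eps, samples=None, rng=random):
    """Estimate the number of connected components within about eps * n."""
    n = len(adj)
    cap = math.ceil(2 / eps)
    # Assumed constant: s = 4 / eps^2 makes 2 * exp(-eps^2 * s / 2) < 1/3.
    s = samples if samples is not None else math.ceil(4 / eps ** 2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)  # pick a random vertex
        total += 1.0 / capped_component_size(adj, u, cap)
    return n * total / s
```

On a graph made of four disjoint triangles every sampled vertex reports size 3, so the estimate is 4 no matter which vertices are drawn. On a 100-vertex path with `eps = 0.5` every BFS hits the cap of 4, giving an estimate of 25, which is still within $\varepsilon n = 50$ of the true count 1.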

slide-16
SLIDE 16

Approximating # of CCs: Analysis

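Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \leq \frac{1}{3}$

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$. Let $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i$ (called the sample mean). Then $\Pr\left[|Y - E[Y]| \geq \varepsilon\right] \leq 2e^{-2s\varepsilon^2}$.

Let $Y_i = 1/\hat{n}_u$ for the $i$th vertex $u$ in the sample.

  • $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i = \frac{\tilde{C}}{n}$
  • $E[Y] = \frac{1}{s} \sum_{i=1}^{s} E[Y_i] = E[Y_1] = \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{\hat{C}}{n}$

$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[|nY - n\,E[Y]| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon}{2}\right] \leq 2e^{-\varepsilon^2 s/2}$

  • Need $s = \Theta\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\leq \frac{1}{3}$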

slide-17
SLIDE 17

Approximating # of CCs: Analysis

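So far: $|\hat{C} - C| \leq \frac{\varepsilon n}{2}$ and $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \leq \frac{1}{3}$

  • With probability $\geq \frac{2}{3}$,

$|\tilde{C} - C| \leq |\tilde{C} - \hat{C}| + |\hat{C} - C| \leq \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \leq \varepsilon n$

Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm\varepsilon n$ in time $O\left(\frac{d}{\varepsilon^3}\right)$.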

slide-18
SLIDE 18

Minimum spanning tree (MST)

  • What is the cheapest way to connect all the dots?

Input: a weighted graph with n vertices and m edges

  • Exact computation:
      – Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
      – Randomized $O(m)$ time [Karger Klein Tarjan]

[Figure: example weighted graph with edge weights 1, 3, 7, 5, 2, 4]

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

slide-19
SLIDE 19

Approximating MST Weight in Sublinear Time

[Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation
  • maximum degree $d$ and maximum allowed weight $w$
  • weights in $\{1, 2, \dots, w\}$

Output: $(1+\varepsilon)$-approximation to MST weight, $w_{MST}$

Time:

  • Known: $O\left(\frac{dw}{\varepsilon^3} \log \frac{dw}{\varepsilon}\right)$, $\Omega\left(\frac{dw}{\varepsilon^2}\right)$
  • Today: $O\left(\frac{dw^4 \log w}{\varepsilon^3}\right)$

No dependence on $n$!

slide-20
SLIDE 20

Idea Behind Algorithm

  • Characterize MST weight in terms of number of connected components in certain subgraphs of $G$
  • Already know that number of connected components can be estimated quickly

slide-21
SLIDE 21
MST and Connected Components: Warm-up

  • Recall Kruskal's algorithm for computing MST exactly.

Suppose all weights are 1 or 2. Then MST weight

= (# weight-1 edges in MST) + 2 ⋅ (# weight-2 edges in MST)

= $n - 1$ + (# of weight-2 edges in MST)   (MST has $n - 1$ edges)

= $n - 1$ + (# of CCs induced by weight-1 edges) $- 1$   (by Kruskal)

[Figure: a graph with weight-1 and weight-2 edges, showing the connected components induced by weight-1 edges and the MST]

slide-22
SLIDE 22

MST and Connected Components

In general: Let

  $G_i$ = subgraph of $G$ containing all edges of weight $\leq i$
  $C_i$ = number of connected components in $G_i$

Then MST has $C_i - 1$ edges of weight $> i$.

  • Let $\beta_i$ be the number of edges of weight $> i$ in MST
  • Each MST edge contributes 1 to $w_{MST}$, each MST edge of weight $> 1$ contributes 1 more, each MST edge of weight $> 2$ contributes one more, …

$w_{MST}(G) = \sum_{i=0}^{w-1} \beta_i = \sum_{i=0}^{w-1} (C_i - 1) = -w + \sum_{i=0}^{w-1} C_i = n - w + \sum_{i=1}^{w-1} C_i$

The last step uses $C_0 = n$, since $G_0$ has no edges.

Claim

$w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$
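The claim can be checked on a small example by comparing Kruskal's algorithm with the component-counting formula. The sketch below assumes a connected graph given as a list of `(u, v, weight)` triples with integer weights in `{1, ..., w}`; `DisjointSet`, `kruskal_weight`, and `mst_weight_from_components` are hypothetical names chosen for illustration.

```python
class DisjointSet:
    """Union-find that also tracks the current number of components."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same component
        self.parent[ra] = rb
        self.components -= 1
        return True


def kruskal_weight(n, edges):
    """Exact MST weight: scan edges by increasing weight, keep those that merge."""
    ds = DisjointSet(n)
    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        if ds.union(u, v):
            total += wt
    return total


def mst_weight_from_components(n, edges, w):
    """Evaluate n - w + sum_{i=1}^{w-1} C_i, where C_i counts components of G_i."""
    total = n - w
    for i in range(1, w):
        ds = DisjointSet(n)
        for u, v, wt in edges:
            if wt <= i:  # G_i keeps only edges of weight at most i
                ds.union(u, v)
        total += ds.components
    return total
```

For the 5-vertex graph with edges `(0,1,1), (1,2,2), (2,3,3), (3,4,1), (0,4,2), (1,3,3)` and `w = 3`, we have $C_1 = 3$ and $C_2 = 1$, so the formula gives $5 - 3 + 3 + 1 = 6$, matching Kruskal.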

slide-23
SLIDE 23

Algorithm for Approximating $w_{MST}$

APPROX_MSTweight ($n$, $d$, $w$, $\varepsilon$; $G$)

1. For $i = 1$ to $w - 1$ do:
2.   $\tilde{C}_i \leftarrow$ APPROX_#CCs($n$, $d$, $\frac{\varepsilon}{w}$; $G_i$).
3. Return $\tilde{w}_{MST} = n - w + \sum_{i=1}^{w-1} \tilde{C}_i$.

  • Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$

Analysis:

  • Suppose all estimates of $C_i$'s are good: $|\tilde{C}_i - C_i| \leq \frac{\varepsilon}{w} n$. Then

$|\tilde{w}_{MST} - w_{MST}| = \left|\sum_{i=1}^{w-1} (\tilde{C}_i - C_i)\right| \leq \sum_{i=1}^{w-1} |\tilde{C}_i - C_i| \leq w \cdot \frac{\varepsilon}{w} n = \varepsilon n$

  • Pr[all $w - 1$ estimates are good] $\geq (2/3)^{w-1}$
  • Not good enough! Need error probability $\leq \frac{1}{3w}$ for each iteration
  • Then, by Union Bound, Pr[error] $\leq w \cdot \frac{1}{3w} = \frac{1}{3}$
  • Can amplify success probability of any algorithm by repeating it and taking the median answer.
  • Can take more samples in APPROX_#CCs. What's the resulting run time?
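The three steps of APPROX_MSTweight might look as follows in Python. This is only a sketch under stated assumptions: `adj[u]` is a list of `(neighbor, weight)` pairs, access to $G_i$ is simulated by ignoring edges heavier than `i` during BFS, the sample count `ceil(4/eps**2)` is an assumed constant, and the per-iteration error probability is not amplified to $\frac{1}{3w}$ as the analysis requires. All function names are hypothetical.

```python
import math
import random
from collections import deque


def capped_size(adj, start, cap, max_weight):
    """BFS in G_i (edges of weight <= max_weight); returns min(n_u, cap)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, wt in adj[u]:
            if wt > max_weight or v in seen:  # edge not in G_i, or visited
                continue
            seen.add(v)
            if len(seen) >= cap:
                return cap
            queue.append(v)
    return len(seen)


def approx_num_ccs(adj, eps, max_weight, rng):
    """Estimate C_i, the number of components of G_i, by sampling vertices."""
    n = len(adj)
    cap = math.ceil(2 / eps)
    s = math.ceil(4 / eps ** 2)  # assumed constant; no amplification here
    total = sum(
        1.0 / capped_size(adj, rng.randrange(n), cap, max_weight)
        for _ in range(s)
    )
    return n * total / s


def approx_mst_weight(adj, w, eps, rng=random):
    """Return n - w + sum of the estimated C_i for i = 1 .. w-1."""
    n = len(adj)
    estimate = n - w
    for i in range(1, w):
        estimate += approx_num_ccs(adj, eps / w, i, rng)
    return estimate
```

On the 4-vertex path with weights 1, 2, 1 (so `w = 2`), both components of $G_1$ have size 2, every sample contributes exactly $1/2$, and the returned estimate equals the true MST weight 4.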

slide-24
SLIDE 24

Multiplicative Approximation for $w_{MST}$

For MST cost, additive approximation ⟹ multiplicative approximation

$w_{MST} \geq n - 1 \implies w_{MST} \geq n/2$ for $n \geq 2$

  • $\varepsilon n$-additive approximation:

$w_{MST} - \varepsilon n \leq \tilde{w}_{MST} \leq w_{MST} + \varepsilon n$

  • $(1 \pm 2\varepsilon)$-multiplicative approximation:

$w_{MST}(1 - 2\varepsilon) \leq w_{MST} - \varepsilon n \leq \tilde{w}_{MST} \leq w_{MST} + \varepsilon n \leq w_{MST}(1 + 2\varepsilon)$