Sublinear Algorithms Lecture 3 Sofya Raskhodnikova Penn State - - PowerPoint PPT Presentation




slide-1
SLIDE 1


Sublinear Algorithms

Lecture 3

Sofya Raskhodnikova

Penn State University

slide-2
SLIDE 2

Graph Properties

slide-3
SLIDE 3

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)

  • maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries

Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$

Exact Answer: $\Omega(dn)$ time

  • Approximate version: Is the graph connected or $\varepsilon$-far from connected?

$\mathrm{dist}(G_1, G_2) = \frac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$

Time: $O\left(\frac{1}{\varepsilon^2 d}\right)$ today + improvement on HW

No dependence on $n$!
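The distance measure above can be illustrated with a short Python sketch. It rests on assumptions that are not on the slide: each adjacency list is stored as a Python list of length exactly d, empty entries are `None`, and two representations are compared position by position. The name `adjacency_distance` is hypothetical.

```python
def adjacency_distance(lists1, lists2, d):
    """Fraction of the d*n adjacency-list entries on which two graphs differ.

    Assumption: lists1[u] and lists2[u] both have length d, with None
    marking an empty entry, and entries are compared by position.
    """
    n = len(lists1)
    differing = sum(
        1
        for u in range(n)
        for i in range(d)
        if lists1[u][i] != lists2[u][i]
    )
    return differing / (d * n)


# Adding the edge {1, 2} changes one entry in each endpoint's list.
g1 = [[1, None], [0, None], [None, None]]
g2 = [[1, None], [0, 2], [1, None]]
print(adjacency_distance(g1, g2, d=2))
```

A graph is then $\varepsilon$-far from connected if every connected graph with the same $n$ and $d$ is at distance more than $\varepsilon$ from it under this measure.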

slide-4
SLIDE 4

Testing Connectedness: Algorithm

Connectedness Tester($G$, $d$, $\varepsilon$)

1. Repeat $s = 16/(\varepsilon d)$ times:
2.     pick a random vertex $u$
3.     determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $8/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.

Run time: $O\left(\frac{d}{\varepsilon^2 d^2}\right) = O\left(\frac{1}{\varepsilon^2 d}\right)$

Analysis:

  • Connected graphs are always accepted.
  • Remains to show: If a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge 2/3$.
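The four steps of the tester can be sketched in Python as follows. This is a minimal illustration, not the lecture's reference code: the graph is assumed to be a list `adj` where `adj[v]` holds the neighbors of vertex `v` (empty entries omitted), and the names `bounded_component_size` and `connectedness_tester` are hypothetical. One detail is added as an assumption: a component counts as "small" only if the BFS finishes within the budget and the component has fewer than n nodes, so that a tiny connected graph is not rejected.

```python
import math
import random
from collections import deque


def bounded_component_size(adj, start, limit):
    """Size of start's component, or None if it has more than `limit` nodes."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for nbr in adj[v]:
            if nbr not in seen:
                if len(seen) == limit:
                    return None  # BFS budget exhausted: component is not small
                seen.add(nbr)
                queue.append(nbr)
    return len(seen)


def connectedness_tester(adj, d, eps, rng=random):
    """Accept (True) connected graphs; reject eps-far graphs with prob. >= 2/3."""
    n = len(adj)
    samples = math.ceil(16 / (eps * d))        # s = 16 / (eps * d)
    limit = max(1, math.floor(8 / (eps * d)))  # BFS budget 8 / (eps * d)
    for _ in range(samples):
        u = rng.randrange(n)
        size = bounded_component_size(adj, u, limit)
        if size is not None and size < n:
            return False  # found a small component that is not the whole graph
    return True


path = [[1]] + [[i - 1, i + 1] for i in range(1, 49)] + [[48]]
isolated = [[] for _ in range(100)]
print(connectedness_tester(path, d=2, eps=0.5))      # connected: accepted
print(connectedness_tester(isolated, d=2, eps=0.5))  # all singletons: rejected
```

Each BFS inspects at most $8/(\varepsilon d)$ nodes with $d$ list entries each, which matches the run time stated above.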

slide-5
SLIDE 5

Testing Connectedness: Analysis

Claim 1

If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.

Claim 2

If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.

  • If Claim 2 holds, at least $\frac{\varepsilon d n}{8}$ nodes are in small connected components.

  • By Witness lemma, it suffices to sample $\frac{2 \cdot 8}{\varepsilon d n / n} = \frac{16}{\varepsilon d}$ nodes to detect one from a small connected component.
slide-6
SLIDE 6

Testing Connectedness: Proof of Claim 1

Claim 1

If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.

We prove the contrapositive: If $G$ has $< \frac{\varepsilon d n}{4}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon d n$ entries.

  • If there are no degree restrictions, $k$ components can be connected by adding $k-1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $2k-2 < \varepsilon d n$.

  • What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?

slide-7
SLIDE 7

Freeing up an Adjacency List Entry

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?

  • Consider an MST of this component.
  • Let $v$ be a leaf of the MST.
  • Disconnect $v$ from a node other than its parent in the MST.
  • Two entries are changed while keeping the same number of components.
  • Thus, $k$ components can be connected by modifying at most $2k-1$ edges (at most $k$ deletions to free up entries, plus $k-1$ additions), each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $4k-2 < \varepsilon d n$.

slide-8
SLIDE 8

Testing Connectedness: Proof of Claim 2

Claim 1

If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.

Claim 2

If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.

  • If Claim 1 holds, there are at least $\frac{\varepsilon d n}{4}$ connected components.

  • Their average size $\le \frac{n}{\varepsilon d n / 4} = \frac{4}{\varepsilon d}$.

  • By an averaging argument (or Markov inequality), at least half of the components, i.e., at least $\frac{\varepsilon d n}{8}$ of them, are of size at most twice the average, i.e., at most $8/(\varepsilon d)$.
slide-9
SLIDE 9

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)

  • maximum degree $d$

Connected or $\varepsilon$-far from connected? $O\left(\frac{1}{\varepsilon^2 d}\right)$ time (no dependence on $n$)

slide-10
SLIDE 10

Randomized Approximation in sublinear time

Simple Examples

slide-11
SLIDE 11

Randomized Approximation: a Toy Example

Input: a string $w \in \{0,1\}^n$, e.g., 0 0 0 1 … 0 1 0 0

Goal: Estimate the fraction of 1's in $w$ (like in polls)

It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.

$Y_i$ = value of sample $i$. Then $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot (\text{fraction of 1's in } w)$.

$\Pr[|(\text{sample average}) - \text{fraction of 1's in } w| \ge \varepsilon] = \Pr[|Y - E[Y]| \ge \varepsilon s] \le 2e^{-2\delta^2/s} = 2e^{-2} < 1/3$

Apply Hoeffding Bound with $\delta = \varepsilon s$, then substitute $s = 1/\varepsilon^2$.
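The sampling estimator analyzed above can be sketched in a few lines of Python. The function name `estimate_fraction_of_ones` and the use of sampling with replacement are assumptions made for illustration; the sample count $s = \lceil 1/\varepsilon^2 \rceil$ follows the slide.

```python
import math
import random


def estimate_fraction_of_ones(bits, eps, rng=random):
    """Estimate the fraction of 1's in `bits` within +-eps with prob. >= 2/3."""
    s = math.ceil(1 / eps**2)  # number of sampled positions
    # Sample positions uniformly and independently (with replacement).
    total = sum(bits[rng.randrange(len(bits))] for _ in range(s))
    return total / s


bits = [1, 0] * 5000  # exactly half of the entries are 1
print(estimate_fraction_of_ones(bits, eps=0.05, rng=random.Random(0)))
```

The number of samples depends only on $\varepsilon$, not on the length of the string.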

slide-12
SLIDE 12

Approximating # of Connected Components

[Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation (a list of neighbors for each vertex)

  • maximum degree $d$

Exact Answer: $\Omega(dn)$ time

Additive approximation: # of CC $\pm \varepsilon n$ with probability $\ge 2/3$

Time:

  • Known: $O\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\left(\frac{d}{\varepsilon^2}\right)$

  • Today: $O\left(\frac{d}{\varepsilon^3}\right)$. No dependence on $n$!

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

slide-13
SLIDE 13

Approximating # of CCs: Main Idea

  • Let $C$ = number of components
  • For every vertex $u$, define $n_u$ = number of nodes in $u$'s component

– for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$

– hence $\sum_{u \in V} \frac{1}{n_u} = C$ (this breaks $C$ up into contributions of different nodes)

  • Estimate this sum by estimating $n_u$'s for a few random nodes

– If $u$'s component is small, its size can be computed by BFS.

– If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum.

– Can stop BFS after a few steps.

Similar to property tester for connectedness [Goldreich Ron]

slide-14
SLIDE 14

Approximating # of CCs: Algorithm

Estimating $n_u$ = the number of nodes in $u$'s component:

  • Let estimate $\hat{n}_u = \min\left\{n_u, \frac{2}{\varepsilon}\right\}$

– When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$

– Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$

In both cases, $\left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon}{2}$.

  • Corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:

$|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$

APPROX_#_CCs ($G$, $d$, $\varepsilon$)

1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2.     pick a random vertex $u$
3.     compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$

Run time: $O(d/\varepsilon^3)$

slide-15
SLIDE 15

Approximating # of CCs: Analysis

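Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$

Hoeffding Bound

Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.

Let $Y_i = 1/\hat{n}_u$ for the $i$th vertex $u$ in the sample.

  • $Y = \sum_{i=1}^{s} Y_i = \frac{s\tilde{C}}{n}$ and $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot E[Y_1] = s \cdot \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{s\hat{C}}{n}$

$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[\left|\frac{n}{s} Y - \frac{n}{s} E[Y]\right| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon s}{2}\right] \le 2e^{-\varepsilon^2 s/2}$

  • Need $s = \Theta\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\le \frac{1}{3}$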
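To make the constant in $s = \Theta(1/\varepsilon^2)$ concrete, one can solve the last bound for $s$. This worked step is not on the slide; it only rearranges the bound obtained from Hoeffding with $\delta = \varepsilon s / 2$.

```latex
2e^{-\varepsilon^2 s/2} \le \frac{1}{3}
\iff \frac{\varepsilon^2 s}{2} \ge \ln 6
\iff s \ge \frac{2\ln 6}{\varepsilon^2} \approx \frac{3.58}{\varepsilon^2}
```

So taking, for example, $s = \lceil 4/\varepsilon^2 \rceil$ samples is enough for error probability at most $1/3$.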

slide-16
SLIDE 16

Approximating # of CCs: Analysis

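So far: $|\hat{C} - C| \le \frac{\varepsilon n}{2}$ and $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$

  • With probability $\ge \frac{2}{3}$,

$|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$

Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm\varepsilon n$ in time $O\left(\frac{d}{\varepsilon^3}\right)$.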

slide-17
SLIDE 17

Minimum spanning tree (MST)

  • What is the cheapest way to connect all the dots?

Input: a weighted graph with n vertices and m edges

  • Exact computation:

– Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]

– Randomized $O(m)$ time [Karger Klein Tarjan]

(Figure: example graph with weights 1, 3, 7, 5, 2, 4)

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

slide-18
SLIDE 18

Approximating MST Weight in Sublinear Time

[Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices

  • in adjacency lists representation
  • maximum degree $d$ and maximum allowed weight $w$
  • weights in $\{1, 2, \dots, w\}$

Output: $(1+\varepsilon)$-approximation to MST weight, $w_{MST}$

Time:

  • Known: $O\left(\frac{dw}{\varepsilon^3} \log \frac{dw}{\varepsilon}\right)$, $\Omega\left(\frac{dw}{\varepsilon^2}\right)$

  • Today: $O\left(\frac{dw^3 \log w}{\varepsilon^3}\right)$

No dependence on $n$!

slide-19
SLIDE 19

Idea Behind Algorithm

  • Characterize MST weight in terms of number of connected components in certain subgraphs of G

  • Already know that number of connected components can be estimated quickly

slide-20
SLIDE 20
MST and Connected Components: Warm-up

  • Recall Kruskal's algorithm for computing MST exactly.

Suppose all weights are 1 or 2. Then MST weight

= (# weight-1 edges in MST) + 2 $\cdot$ (# weight-2 edges in MST)

= $n - 1$ + (# of weight-2 edges in MST)   [MST has $n - 1$ edges]

= $n - 1$ + (# of CCs induced by weight-1 edges) $- 1$   [By Kruskal]

(Figure: a graph with weight-1 and weight-2 edges, showing the connected components induced by weight-1 edges and the MST)

slide-21
SLIDE 21

MST and Connected Components

In general: Let

$G_i$ = subgraph of $G$ containing all edges of weight $\le i$

$C_i$ = number of connected components in $G_i$

Then MST has $C_i - 1$ edges of weight $> i$.

  • Let $\beta_i$ be the number of edges of weight $> i$ in MST
  • Each MST edge contributes 1 to $w_{MST}$, each MST edge of weight $> 1$ contributes 1 more, each MST edge of weight $> 2$ contributes one more, …

$w_{MST}(G) = \sum_{i=0}^{w-1} \beta_i = \sum_{i=0}^{w-1} (C_i - 1) = -w + \sum_{i=0}^{w-1} C_i = n - w + \sum_{i=1}^{w-1} C_i$

The last step uses $C_0 = n$, since $G_0$ has no edges.

Claim

$w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$
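The claim can be checked numerically on a small example. The sketch below compares the MST weight computed by Kruskal's algorithm with the value of $n - w + \sum_{i=1}^{w-1} C_i$, where each $C_i$ is counted exactly with union-find. The example graph and all function names are made up for illustration, and the graph is assumed to be connected with integer weights in $\{1, \dots, w\}$.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(n, edges):
    """Number of connected components of the graph on n nodes with these edges."""
    parent = list(range(n))
    comps = n
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps


def kruskal_weight(n, weighted_edges):
    """MST weight via Kruskal: scan edges by increasing weight."""
    parent = list(range(n))
    total = 0
    for wt, u, v in sorted(weighted_edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total


def mst_weight_via_components(n, w, weighted_edges):
    """Evaluate n - w + sum_{i=1}^{w-1} C_i with exact component counts."""
    total = n - w
    for i in range(1, w):
        g_i = [(u, v) for wt, u, v in weighted_edges if wt <= i]
        total += count_components(n, g_i)
    return total


# Hypothetical connected graph: (weight, u, v) triples, weights in {1, 2, 3}.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (1, 3, 4), (3, 0, 4), (2, 0, 2)]
print(kruskal_weight(5, edges), mst_weight_via_components(5, 3, edges))  # 7 7
```

Here $C_1 = 3$ and $C_2 = 2$, so the formula gives $5 - 3 + 3 + 2 = 7$, matching Kruskal.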

slide-22
SLIDE 22

Algorithm for Approximating $w_{MST}$

  • Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$

APPROX_MSTweight ($G$, $w$, $d$, $\varepsilon$)

1. For $i = 1$ to $w - 1$ do:
2.     $\tilde{C}_i \leftarrow$ APPROX_#CCs($G_i$, $d$, $\varepsilon/w$).
3. Return $\tilde{w}_{MST} = n - w + \sum_{i=1}^{w-1} \tilde{C}_i$.

Analysis:

  • Suppose all estimates of $C_i$'s are good: $|\tilde{C}_i - C_i| \le \frac{\varepsilon}{w} n$.

Then $|\tilde{w}_{MST} - w_{MST}| = \left|\sum_{i=1}^{w-1} (\tilde{C}_i - C_i)\right| \le \sum_{i=1}^{w-1} |\tilde{C}_i - C_i| \le w \cdot \frac{\varepsilon}{w} n = \varepsilon n$

  • Pr[all $w - 1$ estimates are good] $\ge (2/3)^{w-1}$
  • Not good enough! Need error probability $\le \frac{1}{3w}$ for each iteration
  • Then, by Union Bound, Pr[error] $\le w \cdot \frac{1}{3w} = \frac{1}{3}$
  • Can amplify success probability of any algorithm by repeating it and taking the median answer.
  • Can take more samples in APPROX_#CCs. What's the resulting run time?
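The algorithm and the median-based amplification can be sketched together in Python. Everything here is illustrative: `adj[v]` is assumed to be a list of `(neighbor, weight)` pairs, the subgraph $G_i$ is never built explicitly (the BFS just ignores edges heavier than $i$), and the default number of repetitions is an unproven placeholder for the $O(\log w)$ repetitions that amplification needs. All names are hypothetical.

```python
import math
import random
import statistics
from collections import deque


def capped_size(adj, start, cap, max_weight):
    """min(component size, cap) in the subgraph of edges with weight <= max_weight."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < cap:
        v = queue.popleft()
        for nbr, wt in adj[v]:
            if wt <= max_weight and nbr not in seen:
                seen.add(nbr)
                if len(seen) >= cap:
                    break
                queue.append(nbr)
    return len(seen)


def approx_ccs(adj, eps, max_weight, rng):
    """One run of the component-count estimator on G_i, with i = max_weight."""
    n = len(adj)
    cap = math.ceil(2 / eps)
    s = math.ceil(4 / eps**2)  # assumed constant in Theta(1/eps^2)
    total = sum(
        1.0 / capped_size(adj, rng.randrange(n), cap, max_weight)
        for _ in range(s)
    )
    return n * total / s


def approx_mst_weight(adj, w, eps, repeats=None, rng=random):
    """Estimate MST weight as n - w + sum of median-amplified C_i estimates."""
    n = len(adj)
    if repeats is None:
        # Placeholder for O(log w) repetitions; the constant 8 is not derived.
        repeats = max(1, math.ceil(8 * math.log(3 * w)))
    estimate = n - w
    for i in range(1, w):
        runs = [approx_ccs(adj, eps / w, i, rng) for _ in range(repeats)]
        estimate += statistics.median(runs)  # median boosts success probability
    return estimate


# Path 0-1-2-3 with edge weights 1, 2, 1; its MST weight is 4.
adj = [[(1, 1)], [(0, 1), (2, 2)], [(1, 2), (3, 1)], [(2, 1)]]
print(approx_mst_weight(adj, w=2, eps=0.5))
```

On this tiny input every component is smaller than the cap, so each estimate of $C_1$ is exact and the output equals the true MST weight.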

slide-23
SLIDE 23

Multiplicative Approximation for $w_{MST}$

For MST cost, additive approximation $\Longrightarrow$ multiplicative approximation

$w_{MST} \ge n - 1 \Longrightarrow w_{MST} \ge n/2$ for $n \ge 2$

  • $\varepsilon n$-additive approximation:

$w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n$

  • $(1 \pm 2\varepsilon)$-multiplicative approximation:

$w_{MST}(1 - 2\varepsilon) \le w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n \le w_{MST}(1 + 2\varepsilon)$