
SLIDE 1

Algorithms for Big Data (VI)

Chihao Zhang

Shanghai Jiao Tong University

Oct. 25, 2019

Algorithms for Big Data (VI) 1/13

SLIDE 2

Review

We learnt the AMS algorithm to estimate ∥f∥_k^k for k ≥ 2 using O(k · n^{1−1/k} (log m + log n)) bits. An ad-hoc algorithm for ∥f∥_2^2 costs O(log m + log n):

▶ Pick h : [n] → {−1, 1} from a 4-universal family;
▶ On input (j, ∆), let x ← x + ∆ · h(j);
▶ Output x².

Algorithms for Big Data (VI) 2/13
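A minimal Python sketch of this ad-hoc estimator. It assumes one standard way to realize the 4-universal family: a random degree-3 polynomial over a prime field, mapped to {−1, 1} by parity (the parity map is only approximately unbiased since the prime is odd; a careful implementation would fix this). Class and method names are illustrative.

import random

P = (1 << 61) - 1  # a prime larger than any stream item

class TugOfWar:
    def __init__(self):
        # Four independent uniform coefficients give a 4-universal family.
        self.coeffs = [random.randrange(P) for _ in range(4)]
        self.x = 0

    def _h(self, j):
        # Degree-3 polynomial at j (Horner's rule), then map to {-1, +1}.
        v = 0
        for c in self.coeffs:
            v = (v * j + c) % P
        return 1 if v % 2 == 0 else -1

    def update(self, j, delta):
        self.x += delta * self._h(j)   # x <- x + delta * h(j)

    def estimate(self):
        return self.x ** 2             # E[x^2] = ||f||_2^2

est = TugOfWar()
est.update(7, 3); est.update(7, -1); est.update(42, 5)
print(est.estimate())  # one sample whose expectation is 2^2 + 5^2 = 29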

SLIDE 3

An Algebraic View

It is instructive to view the Tug-of-War algorithm through linear algebra. Assume that we run the algorithm k times (to apply the averaging trick), each time with a function h_i. Consider the matrix A = (a_ij), i ∈ [k], j ∈ [n], where a_ij = h_i(j). Let x = Af; we know that E[x_i²] = ∥f∥_2². Our algorithm outputs

(1/k) Σ_{i=1}^k x_i² = ∥x∥_2² / k.

The 2-norm of the vector x/√k is close to that of f!

Algorithms for Big Data (VI) 3/13
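A quick numpy check of this view. For simplicity the demo draws fully independent ±1 entries, rather than rows from a 4-universal family as the actual algorithm does.

import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 400

f = rng.normal(size=n)                    # an arbitrary frequency vector
A = rng.choice([-1.0, 1.0], size=(k, n))  # rows play the role of h_1, ..., h_k
x = A @ f

print(np.dot(f, f))       # ||f||_2^2
print(np.dot(x, x) / k)   # ||x||_2^2 / k, close to ||f||_2^2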

SLIDE 4

Dimension Reduction

Suppose k ≪ n. What the matrix A does is map a vector in ℝⁿ to a vector in ℝᵏ without changing its norm much. This operation is often referred to as dimension reduction or metric embedding. The algorithm we met is similar to one important dimension reduction technique, the Johnson-Lindenstrauss transformation.

Algorithms for Big Data (VI) 4/13

SLIDE 5

Johnson-Lindenstrauss transformation

Theorem
For any 0 < ε < 1/2 and any positive integer m, consider a set of m points S ⊆ ℝⁿ. There exists a matrix A ∈ ℝ^{k×n} with k = O(ε⁻² log m) satisfying

∀x, y ∈ S: (1 − ε)∥x − y∥ ≤ ∥Ax − Ay∥ ≤ (1 + ε)∥x − y∥.

We construct A by drawing each of its entries from N(0, 1/k) independently.

Algorithms for Big Data (VI) 5/13
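A sketch of the construction in numpy. The theorem only promises k = O(ε⁻² log m); the constant 8 below is an illustrative assumption, not the tight constant.

import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 1_000, 50, 0.2
k = int(np.ceil(8 * np.log(m) / eps**2))   # k = O(eps^-2 log m)

S = rng.normal(size=(m, n))                          # m points in R^n
A = rng.normal(scale=np.sqrt(1.0 / k), size=(k, n))  # entries ~ N(0, 1/k)

proj = S @ A.T                                       # images Ax for x in S
for _ in range(5):
    i, j = rng.choice(m, size=2, replace=False)
    d = np.linalg.norm(S[i] - S[j])
    dp = np.linalg.norm(proj[i] - proj[j])
    print(round(dp / d, 3))   # distortion ratio, should lie in [1 - eps, 1 + eps]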

SLIDE 6

Gaussian Distribution

Recall that the density function of a variable X ∼ N(µ, σ²) is

f_X(x) = (1 / (√(2π) σ)) · e^{−(x−µ)² / (2σ²)}.

The distribution function is

F_X(x) = (1 / (√(2π) σ)) ∫_{−∞}^{x} e^{−(t−µ)² / (2σ²)} dt.

Assume X₁ ∼ N(µ₁, σ₁²) and X₂ ∼ N(µ₂, σ₂²) are independent; then

aX₁ + bX₂ ∼ N(aµ₁ + bµ₂, a²σ₁² + b²σ₂²).

Algorithms for Big Data (VI) 6/13
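A quick sampling check of this closure property; the two sample arrays below are independent, matching the assumption above.

import numpy as np

rng = np.random.default_rng(2)
a, b = 3.0, -2.0
mu1, s1, mu2, s2 = 1.0, 2.0, -1.0, 0.5

x1 = rng.normal(mu1, s1, size=1_000_000)
x2 = rng.normal(mu2, s2, size=1_000_000)   # independent of x1
y = a * x1 + b * x2

print(y.mean(), a * mu1 + b * mu2)            # empirical vs. predicted mean
print(y.var(), a**2 * s1**2 + b**2 * s2**2)   # empirical vs. predicted variance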

SLIDE 7

Proof of JL

The statement is equivalent to

1 − ε ≤ ∥A(x − y)∥ / ∥x − y∥ ≤ 1 + ε.

We only need to show that for every unit-length vector f,

Pr[ |∥Af∥ − 1| > ε ] ≤ δ

for a sufficiently small δ, so that a union bound over all pairs in S applies. Assume x = Af; then x_i = Σ_{j ∈ [n]} a_ij · f_j ∼ N(0, 1/k). We need a concentration inequality for the squared sum of Gaussians:

Pr[ |Σ_{i=1}^k x_i² − 1| ≥ ε ] ≤ δ.

Algorithms for Big Data (VI) 7/13
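The claim x_i ∼ N(0, 1/k) follows from the closure property on the previous slide: a linear combination of independent N(0, 1/k) entries with unit coefficient norm is again N(0, 1/k). A quick empirical confirmation (sizes are arbitrary demo choices):

import numpy as np

rng = np.random.default_rng(3)
n, k, trials = 200, 64, 20_000

f = rng.normal(size=n)
f /= np.linalg.norm(f)   # unit-length vector

# Sample x_1 = sum_j a_1j * f_j many times, with a_1j ~ N(0, 1/k).
a1 = rng.normal(scale=np.sqrt(1.0 / k), size=(trials, n))
x1 = a1 @ f

print(x1.var(), 1.0 / k)   # empirical variance matches 1/k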

SLIDE 8

Concentration

Theorem
Assume X₁, X₂, . . . , X_k are i.i.d. N(0, 1). Then for 0 < ε < 1,

Pr[ |Σ_{i=1}^k X_i² − k| ≥ εk ] < 2e^{−ε²k/8}.

The proof is similar to the proof of the Chernoff bound we met before.

Algorithms for Big Data (VI) 8/13
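A Monte Carlo sanity check of the stated tail bound; k, ε, and the number of trials are arbitrary demo choices.

import numpy as np

rng = np.random.default_rng(4)
k, eps, trials = 100, 0.3, 50_000

X = rng.normal(size=(trials, k))
S = (X ** 2).sum(axis=1)              # each row: sum of k squared Gaussians

empirical = np.mean(np.abs(S - k) >= eps * k)
bound = 2 * np.exp(-eps**2 * k / 8)
print(empirical, bound)               # empirical tail should not exceed the bound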

SLIDE 9

Estimate F2 from JL

We can use JL to estimate F₂:

Algorithm JL Transformation
Init: Draw Z₁, . . . , Z_n from N(0, 1); x ← 0.
On Input (y, ∆): x ← x + ∆ · Z_y.
Output: Output x².

The algorithm is neither friendly to implement nor efficient, but it is inspiring. The core property we used to prove its correctness is that Σ_{j=1}^n Z_j · f_j has the same distribution as ∥f∥₂ · Z where Z ∼ N(0, 1).

Algorithms for Big Data (VI) 9/13
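A direct Python transcription of this algorithm (items 0-indexed here). Note it stores n real numbers Z₁, . . . , Z_n, one reason it is not efficient; and a single counter has high variance, so in practice one would average or take medians over independent copies.

import random

class JLF2:
    def __init__(self, n):
        # Init: draw Z_1, ..., Z_n from N(0, 1); x <- 0.
        self.z = [random.gauss(0.0, 1.0) for _ in range(n)]
        self.x = 0.0

    def update(self, y, delta):
        # On input (y, delta): x <- x + delta * Z_y
        self.x += delta * self.z[y]

    def estimate(self):
        # x is distributed as ||f||_2 * Z with Z ~ N(0, 1), so E[x^2] = F_2.
        return self.x ** 2

sk = JLF2(n=100)
sk.update(3, 2.0); sk.update(9, -1.0)
print(sk.estimate())   # one sample whose expectation is 2^2 + 1^2 = 5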

SLIDE 10

p-stable Distributions

The property generalizes to p < 2. For some distribution D_p, if the Z_j ∼ D_p are independent, then Σ_j Z_j · f_j has the same distribution as ∥f∥_p · Z where Z ∼ D_p. Such a distribution is called p-stable. We can use them to estimate F_p. Many technical issues of the algorithm are beyond the scope of this course.

Algorithms for Big Data (VI) 10/13
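As a concrete instance: the standard Cauchy distribution is 1-stable, which yields an estimator for F₁ = ∥f∥₁. Since a Cauchy variable has no mean, the sketch below takes a median over independent counters instead of an average (this median trick is the standard workaround); it deliberately glosses over the technical issues the slide mentions, and all names are illustrative.

import random
import statistics

def std_cauchy():
    # The ratio of two independent standard Gaussians is standard Cauchy.
    return random.gauss(0.0, 1.0) / random.gauss(0.0, 1.0)

class CauchyF1:
    def __init__(self, n, k):
        self.k = k
        # k independent counters, each with its own Cauchy coefficients.
        self.z = [[std_cauchy() for _ in range(n)] for _ in range(k)]
        self.x = [0.0] * k

    def update(self, j, delta):
        # Counter i tracks sum_j Z_ij * f_j, distributed as ||f||_1 * (std Cauchy).
        for i in range(self.k):
            self.x[i] += delta * self.z[i][j]

    def estimate(self):
        # The median of |std Cauchy| is 1, so the median of the |x_i|
        # concentrates around ||f||_1.
        return statistics.median(abs(v) for v in self.x)

sk = CauchyF1(n=50, k=101)
sk.update(3, 2.0); sk.update(9, -1.0)
print(sk.estimate())   # concentrates around ||f||_1 = 3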

SLIDE 11

Graph Stream

We have a graph with vertex set [n], but its edges are unknown. The edges are given in a streaming fashion; namely, each time we reveal an edge (u, v). Can we compute graph properties using a small number of bits of memory? Say in O(n · poly(log n)).

Algorithms for Big Data (VI) 11/13

SLIDE 12

Connectedness

A basic graph property is whether the graph is connected. We can maintain a spanning forest F of G:

Init: F ← ∅, X ← 0.
On Input (u, v):
  if X = 0 and F ∪ {(u, v)} has no cycle then
    F ← F ∪ {(u, v)};
    if |F| = n − 1 then X ← 1 end if
  end if
Output: Output X.

Algorithms for Big Data (VI) 12/13
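The test "F ∪ {(u, v)} has no cycle" is exactly a union-find query: the new edge closes a cycle iff u and v already lie in the same tree of the forest. A Python sketch, assuming vertices are 0, . . . , n − 1:

class Connectedness:
    def __init__(self, n):
        self.parent = list(range(n))   # union-find over the vertex set
        self.size = 0                  # |F|, the number of forest edges
        self.n = n

    def _find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def update(self, u, v):
        # F ∪ {(u, v)} has no cycle iff u and v are in different trees.
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.size += 1

    def output(self):
        # X = 1 iff |F| = n - 1, i.e. F spans all of [n].
        return 1 if self.size == self.n - 1 else 0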

SLIDE 13

Bipartiteness

The following algorithm decides whether G is bipartite.

Init: F ← ∅, X ← 1.
On Input (u, v):
  if X = 1 then
    if F ∪ {(u, v)} has no cycle then
      F ← F ∪ {(u, v)};
    else if F ∪ {(u, v)} has an odd cycle then
      X ← 0
    end if
  end if
Output: Output X.

Algorithms for Big Data (VI) 13/13
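One way to implement both cycle tests at once is union-find augmented with the parity of each vertex's path to its root: an edge inside a single tree closes an odd cycle exactly when its endpoints have equal parity. A sketch under the same 0-indexed assumption as before (path compression is omitted to keep the parity bookkeeping simple):

class Bipartiteness:
    def __init__(self, n):
        self.parent = list(range(n))
        self.parity = [0] * n          # parity of the path to the root
        self.bipartite = 1             # the output bit X

    def _find(self, u):
        # Return (root of u, parity of the path from u to the root).
        p = 0
        while self.parent[u] != u:
            p ^= self.parity[u]
            u = self.parent[u]
        return u, p

    def update(self, u, v):
        if self.bipartite == 0:
            return
        ru, pu = self._find(u)
        rv, pv = self._find(v)
        if ru != rv:
            # F ∪ {(u, v)} has no cycle: add the edge, recording that
            # u and v must receive opposite colors.
            self.parent[ru] = rv
            self.parity[ru] = pu ^ pv ^ 1
        elif pu == pv:
            # Same tree, equal colors: the edge closes an odd cycle,
            # so G is not bipartite.
            self.bipartite = 0

    def output(self):
        return self.bipartite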