
Algorithms for Big Data (XIV)

Chihao Zhang

Shanghai Jiao Tong University

Dec. 20, 2019


Review

Last week we studied electrical networks using matrices. We defined the graph Laplacian L: L = U^T W U, where U is the signed edge-vertex incidence matrix and W is the diagonal matrix of edge weights. We also defined the notion of effective resistance between two vertices in terms of L:

Reff(u, v) ≜ (e_u − e_v)^T L^+ (e_u − e_v).
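To make the two definitions concrete, here is a minimal numpy sketch; the toy graph, its weights, and the helper name reff are illustrative choices, not from the lecture:

```python
import numpy as np

# Toy weighted graph on 4 vertices (illustrative, not from the slides).
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
weights = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
n, m = 4, len(edges)

# Signed edge-vertex incidence matrix U (one row per edge) and the
# diagonal weight matrix W, so that L = U^T W U.
U = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    U[i, u], U[i, v] = 1.0, -1.0
W = np.diag(weights)
L = U.T @ W @ U

# Effective resistance: Reff(u, v) = (e_u - e_v)^T L^+ (e_u - e_v),
# with L^+ the Moore-Penrose pseudo-inverse of L.
L_plus = np.linalg.pinv(L)

def reff(u, v):
    e = np.zeros(n)
    e[u], e[v] = 1.0, -1.0
    return e @ L_plus @ e

print(reff(0, 3))
```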

Sparsification

Given a graph G, the goal of sparsification is to construct a sparse graph H such that

(1 − ε) L_G ≼ L_H ≼ (1 + ε) L_G.

A similar Laplacian implies
▶ a similar spectrum;
▶ similar effective resistance between any two vertices;
▶ similar clustering;
▶ …
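The guarantee can be tested numerically: (1 − ε)L_G ≼ L_H ≼ (1 + ε)L_G holds exactly when both (1 + ε)L_G − L_H and L_H − (1 − ε)L_G are positive semi-definite. A small sketch, with the function name and tolerance my own:

```python
import numpy as np

def is_eps_approximation(LG, LH, eps, tol=1e-9):
    """Test (1 - eps)·LG ≼ LH ≼ (1 + eps)·LG via the equivalent
    statement that two difference matrices are PSD."""
    def psd(M):
        # A symmetric M is PSD iff its smallest eigenvalue is >= 0
        # (up to numerical tolerance).
        return np.linalg.eigvalsh(M).min() >= -tol
    return psd((1 + eps) * LG - LH) and psd(LH - (1 - eps) * LG)
```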

The Construction

We use L_{u,v} to denote the Laplacian of the unweighted graph containing a single edge {u, v}. For a graph G = (V, E), we have

L_G = ∑_{{u,v}∈E} w_{u,v} · L_{u,v},

where w_{u,v} is the weight on the edge {u, v} ∈ E. Let {p_{u,v}}_{{u,v}∈E} be a collection of probabilities on each pair of vertices.
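This decomposition is easy to write down directly, using L_{u,v} = (e_u − e_v)(e_u − e_v)^T; a sketch with helper names of my own:

```python
import numpy as np

def edge_laplacian(n, u, v):
    """L_{u,v}: Laplacian of the unweighted graph on n vertices whose
    only edge is {u, v}; equals (e_u - e_v)(e_u - e_v)^T."""
    e = np.zeros(n)
    e[u], e[v] = 1.0, -1.0
    return np.outer(e, e)

def laplacian(n, edges, weights):
    """L_G as the weighted sum of single-edge Laplacians."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L += w * edge_laplacian(n, u, v)
    return L
```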

Let H = (V, E_H) be the sparse graph we are going to construct…

H contains the edge {u, v} with probability p_{u,v} for every pair {u, v} independently. If an edge {u, v} ∈ E_H, we assign it the weight w_{u,v}/p_{u,v}. It is easy to verify that E[L_H] = L_G. We will carefully choose {p_{u,v}} to guarantee that
▶ H is sparse with high probability;
▶ L_H is well concentrated around its expectation.
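A minimal sketch of this sampling step, assuming p is a dict mapping each edge to its probability; the clamping of probabilities above 1 anticipates the question on the final slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(edges, weights, p):
    """Keep edge {u, v} independently with probability p[(u, v)],
    reweighting kept edges by w/p so that E[L_H] = L_G."""
    H_edges, H_weights = [], []
    for (u, v), w in zip(edges, weights):
        q = min(1.0, p[(u, v)])  # clamp; see "p_{u,v} > 1?" on the last slide
        if rng.random() < q:
            H_edges.append((u, v))
            H_weights.append(w / q)
    return H_edges, H_weights
```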

A Transformation

Sometimes it is more convenient to work with L_G^+, the pseudo-inverse of L_G. Note that

L_H ≼ (1 + ε) L_G  ⟺  L_G^{+/2} L_H L_G^{+/2} ≼ (1 + ε) L_G^{+/2} L_G L_G^{+/2}.

The matrix L_G^{+/2} L_G L_G^{+/2} is the projection onto the column space of L_G. We will now study L_G^{+/2} L_H L_G^{+/2}.
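Since L_G is symmetric PSD, L_G^{+/2} can be computed from an eigendecomposition by inverting the square roots of the nonzero eigenvalues only; a sketch (function name and tolerance mine):

```python
import numpy as np

def pinv_sqrt(L, tol=1e-9):
    """L^{+/2}: square root of the Moore-Penrose pseudo-inverse of a
    symmetric PSD matrix, acting as zero on the kernel of L."""
    vals, vecs = np.linalg.eigh(L)
    inv_sqrt = np.where(vals > tol, 1.0 / np.sqrt(np.maximum(vals, tol)), 0.0)
    return vecs @ np.diag(inv_sqrt) @ vecs.T

# For H = G, pinv_sqrt(LG) @ LG @ pinv_sqrt(LG) is the projection onto
# the column space of L_G: all of its nonzero eigenvalues equal 1.
```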

Chernoff Bound for Matrices

The main tool to establish concentration is the following analogue of the Chernoff bound for matrices.

Theorem

Let X_1, …, X_n ∈ ℝ^{n×n} be independent random positive semi-definite matrices such that λ_max(X_i) ≤ R almost surely. Let X = ∑_{i=1}^n X_i, and let μ_min and μ_max be the minimum and maximum eigenvalues of E[X], respectively. Then
▶ Pr[λ_min(X) ≤ (1 − ε) μ_min] ≤ n · (e^{−ε} / (1 − ε)^{1−ε})^{μ_min/R}, for 0 < ε < 1, and
▶ Pr[λ_max(X) ≥ (1 + ε) μ_max] ≤ n · (e^{ε} / (1 + ε)^{1+ε})^{μ_max/R}, for ε > 0.
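A quick numerical illustration of the theorem (entirely my own setup, not from the slides): sum independent random rank-1 PSD matrices with λ_max of each term at most R = 1, and watch λ_min of the sum concentrate around μ_min.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n_terms, trials = 5, 200, 500
lmins = []
for _ in range(trials):
    X = np.zeros((d, d))
    for _ in range(n_terms):
        g = rng.normal(size=d)
        g /= np.linalg.norm(g)   # unit vector, so lambda_max(g g^T) = 1 = R
        X += np.outer(g, g)      # random rank-1 PSD term
    lmins.append(np.linalg.eigvalsh(X).min())

# E[g g^T] = I/d for a uniform unit vector, so E[X] = (n_terms/d)·I and
# mu_min = n_terms/d = 40; the sampled minima cluster near that value.
print(np.mean(lmins), np.min(lmins))
```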

Setting p_{u,v}

For every pair of vertices u and v, we define

p_{u,v} ≜ (1/R) · w_{u,v} · ∥L_G^{+/2} L_{u,v} L_G^{+/2}∥.

Following our construction of H, for every {u, v}, define a random variable

X_{u,v} = (w_{u,v}/p_{u,v}) · L_G^{+/2} L_{u,v} L_G^{+/2}  with probability p_{u,v},  and  X_{u,v} = 0  otherwise.

Then

L_G^{+/2} L_H L_G^{+/2} = ∑_{{u,v}∈E} X_{u,v},  and  λ_max(X_{u,v}) ≤ R.
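Computed literally from this definition, a sketch for p_{u,v} (function name mine; Lg_pinv_sqrt is L_G^{+/2}, e.g. from the pinv_sqrt sketch above):

```python
import numpy as np

def sampling_prob(Lg_pinv_sqrt, u, v, w, R):
    """p_{u,v} = (1/R) * w_{u,v} * ||L_G^{+/2} L_{u,v} L_G^{+/2}||
    (spectral norm), using L_{u,v} = (e_u - e_v)(e_u - e_v)^T."""
    n = Lg_pinv_sqrt.shape[0]
    e = np.zeros(n)
    e[u], e[v] = 1.0, -1.0
    x = Lg_pinv_sqrt @ e
    M = np.outer(x, x)                 # = L_G^{+/2} L_{u,v} L_G^{+/2}
    return (w / R) * np.linalg.norm(M, 2)
```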

Relation to Resistance

It remains to compute p_{u,v}. It is easy to verify that

L_G^{+/2} L_{u,v} L_G^{+/2} = L_G^{+/2} (e_u − e_v)(e_u − e_v)^T L_G^{+/2}

is a rank-1 matrix. Therefore

∥L_G^{+/2} L_{u,v} L_G^{+/2}∥ = Tr(L_G^{+/2} L_{u,v} L_G^{+/2}) = (e_u − e_v)^T L_G^+ (e_u − e_v) = Reff(u, v).

We can then use the algorithm learnt in the last lecture to approximate Reff(u, v).
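A numerical sanity check of this chain of equalities on a toy graph (the graph is an illustrative choice):

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
weights = [1.0, 2.0, 1.0, 3.0, 1.0]

L = np.zeros((n, n))
for (u, v), w in zip(edges, weights):
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    L += w * np.outer(e, e)

# L_G^{+/2} via eigendecomposition (inverting nonzero eigenvalues only).
vals, vecs = np.linalg.eigh(L)
inv_sqrt = np.where(vals > 1e-9, 1.0 / np.sqrt(np.maximum(vals, 1e-9)), 0.0)
Lps = vecs @ np.diag(inv_sqrt) @ vecs.T

u, v = 0, 2
e = np.zeros(n); e[u], e[v] = 1.0, -1.0
M = Lps @ np.outer(e, e) @ Lps          # L_G^{+/2} L_{u,v} L_G^{+/2}

print(np.linalg.norm(M, 2))             # spectral norm
print(np.trace(M))                      # trace: equal, since M is rank 1
print(e @ np.linalg.pinv(L) @ e)        # Reff(u, v): the same value again
```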

Analysis

We now compute E[|E_H|]. It holds that

E[|E_H|] = ∑_{{u,v}∈E} p_{u,v} = (1/R) · ∑_{{u,v}∈E} w_{u,v} · Reff(u, v).

We can also directly compute

∑_{{u,v}∈E} w_{u,v} Reff(u, v)
  = ∑_{{u,v}∈E} w_{u,v} (e_u − e_v)^T L_G^+ (e_u − e_v)
  = ∑_{{u,v}∈E} w_{u,v} Tr(L_G^+ (e_u − e_v)(e_u − e_v)^T)
  = Tr(L_G^+ ∑_{{u,v}∈E} w_{u,v} (e_u − e_v)(e_u − e_v)^T)
  = Tr(L_G^+ L_G)
  = n − 1,

since L_G^+ L_G is the projection onto the column space of L_G, which has rank n − 1 for a connected graph G.
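The identity ∑ w_{u,v} Reff(u, v) = n − 1 is easy to confirm numerically on the toy graph from the earlier sketches:

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
weights = [1.0, 2.0, 1.0, 3.0, 1.0]

L = np.zeros((n, n))
for (u, v), w in zip(edges, weights):
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    L += w * np.outer(e, e)
L_plus = np.linalg.pinv(L)

total = 0.0
for (u, v), w in zip(edges, weights):
    e = np.eye(n)[u] - np.eye(n)[v]
    total += w * (e @ L_plus @ e)      # w_{u,v} * Reff(u, v)
print(total)   # ≈ 3.0 = n - 1
```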

Therefore, E[|E_H|] = (n − 1)/R.

Note that |E_H| is the sum of m independent Bernoulli trials; therefore, for a suitable R, we can control its concentration using the standard Chernoff bound. We choose R = ε^2 / (3.5 log n); then |E_H| ≤ 4 ε^{-2} n log n with high probability.

Now we can apply the matrix Chernoff bound to obtain the concentration bound we need.
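Putting the pieces together, a small end-to-end sketch (dense linear algebra throughout; the complete graph and the parameters are illustrative choices of mine): sample H with p_{u,v} = w_{u,v} · Reff(u, v)/R and check the spectral guarantee.

```python
import numpy as np

rng = np.random.default_rng(42)
n, eps = 200, 0.5
R = eps**2 / (3.5 * np.log(n))

# Illustrative dense input: the unweighted complete graph K_n.
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]

def laplacian(edge_list, ws):
    L = np.zeros((n, n))
    for (u, v), w in zip(edge_list, ws):
        e = np.zeros(n); e[u], e[v] = 1.0, -1.0
        L += w * np.outer(e, e)
    return L

LG = laplacian(edges, [1.0] * len(edges))
L_plus = np.linalg.pinv(LG)

H_edges, H_weights = [], []
for (u, v) in edges:
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    p = min(1.0, (e @ L_plus @ e) / R)   # w = 1, so p = Reff(u, v)/R
    if rng.random() < p:
        H_edges.append((u, v)); H_weights.append(1.0 / p)

LH = laplacian(H_edges, H_weights)
print(f"kept {len(H_edges)} of {len(edges)} edges")

# Check: nonzero eigenvalues of L_G^{+/2} L_H L_G^{+/2} in [1-eps, 1+eps]?
vals, vecs = np.linalg.eigh(LG)
inv_sqrt = np.where(vals > 1e-9, 1.0 / np.sqrt(np.maximum(vals, 1e-9)), 0.0)
Lps = vecs @ np.diag(inv_sqrt) @ vecs.T
spec = np.linalg.eigvalsh(Lps @ LH @ Lps)
nz = spec[spec > 1e-6]
print(nz.min(), nz.max())
```

At this small size the compression is modest (here Reff(u, v) = 2/n in K_n, so p_{u,v} ≈ 0.74); the ε^{-2} n log n bound only pays off for much larger n.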


p_{u,v} > 1?

