

SLIDE 1

Randomness in reduced order modeling

Akil Narayan¹

¹Department of Mathematics, and Scientific Computing and Imaging (SCI) Institute

University of Utah

February 7, 2020, ICERM



SLIDE 5

Randomness is your friend

Many things that are difficult to accomplish with deterministic optimization/algorithms can be accomplished* with randomization.

*: with "high" probability

We'll consider three examples of this in ROM:
  • RBM for elliptic PDEs
  • Sparse approximation
  • Measure atomization/discretization



SLIDE 8

Why is randomness helpful?

Intuition is straightforward and simplistic: Let $X$ be a random variable, and let $(X_m)_{m \ge 1}$ be iid copies of $X$. Law of large numbers:

$$S(M) := \frac{1}{M} \sum_{m=1}^{M} X_m \longrightarrow \mathbb{E}[X].$$

Furthermore, this convergence is quantitative through the Central Limit Theorem:

$$S(M) - \mathbb{E}[X] \sim \mathcal{N}\left(0, \frac{\sigma^2(X)}{M}\right).$$

In other words, $S(M)$ concentrates around $\mathbb{E}[X]$. This statement is quite powerful:
  • $S(M)$ provides an estimator for $\mathbb{E}[X]$, without knowing $\mathbb{E}[X]$.
  • Convergence is essentially independent of the distribution of $X$.
  • The convergence rate is independent of the dimension of $X$.

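As a concrete illustration of this concentration, here is a minimal Monte Carlo sketch (the exponential distribution and the sample sizes are arbitrary, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E[X] for X ~ Exponential(1); the true mean is 1.
true_mean = 1.0
for M in [10**2, 10**4, 10**6]:
    X = rng.exponential(scale=1.0, size=M)   # M iid copies of X
    S = X.mean()                             # S(M) = (1/M) * sum_{m=1}^{M} X_m
    # CLT: the error should shrink like sigma(X) / sqrt(M).
    print(f"M = {M:>7}: S(M) = {S:.5f}, |S(M) - E[X]| = {abs(S - true_mean):.2e}")
```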

SLIDE 9

Examples of concentration

Concentration in general plays an important role in computing estimates:
  • Monte Carlo (CLT) estimates
  • Chebyshev inequalities (bounds on mass away from the mean)
  • Hoeffding inequalities (bounds on deviation of iid sums from the mean)
  • Chernoff bounds (bounds on deviation of spectrum)
  • Concentration of measure (bounds on deviation of random functions)

Today: We'll see a particular Chernoff bound in action.


SLIDE 10

Chernoff bound applications

We will see how randomization and Chernoff bounds can be applied to:
  • RBM for elliptic PDEs
  • Sparse approximation
  • Measure atomization/discretization

Before discussing ROM, let's present the Chernoff bound.



SLIDE 13

Matrix law of large numbers

Let $G \in \mathbb{R}^{N \times N}$ be a Gramian matrix that is an iid sum of symmetric rank-1 matrices. I.e., let $X \in \mathbb{R}^N$ have distribution $\mu$ on $\mathbb{R}^N$, and define

$$G := \frac{1}{M} \sum_{m=1}^{M} X_m X_m^T,$$

where $\{X_m\}_{m \ge 1}$ are iid copies of $X$. Chernoff bounds make quantitative statements about the spectrum of $G$ that depend on the distribution of $X$.

For large $M$, we expect that $(G)_{j,k} \to \mathbb{E}[X_j X_k]$ as $M \uparrow \infty$.

For simplicity, in all that follows we assume that the components of $X$ are uncorrelated and of unit variance, so that $G \to I$ as $M \uparrow \infty$.

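A small numerical sketch of this matrix law of large numbers (illustrative choices of $N$ and $M$): with iid standard normal components, the components of $X$ are uncorrelated with unit variance, so the spectrum of $G$ should approach that of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10

for M in [10**2, 10**4, 10**6]:
    X = rng.standard_normal((M, N))    # rows are iid copies of X
    G = (X.T @ X) / M                  # G = (1/M) sum_m X_m X_m^T
    sigma = np.linalg.eigvalsh(G)      # spectrum of the symmetric matrix G
    print(f"M = {M:>7}: sigma_min = {sigma.min():.3f}, sigma_max = {sigma.max():.3f}")
```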


SLIDE 16

Matrix Chernoff bounds

The proximity of $G$ to $I$, as a function of $M$, is determined by

$$K := \sup_X \|X\|^2,$$

which is assumed finite.

Theorem ([Cohen, Davenport, Leviatan 2012])

Assume that

$$\frac{M}{\log M} \gtrsim \frac{K}{\delta^2} \log\left(\frac{1}{\epsilon}\right).$$

Then,

$$\Pr\left[ \left(\sigma_{\min}(G) < 1 - \delta\right) \cup \left(\sigma_{\max}(G) > 1 + \delta\right) \right] \le \epsilon.$$

What can we do with $G$? Form least-squares approximations using $X$.

Remarks:
  • The $\delta^{-2}$ dependence is "CLT-like".
  • $K$ is the only thing that depends on the distribution of $X$.

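To make the least-squares remark concrete, here is a minimal sketch under illustrative assumptions (an orthonormal Legendre basis for the uniform measure on $[-1,1]$, and an arbitrary target function $f$): when the spectrum of $G$ lies in $[1-\delta, 1+\delta]$, the Monte Carlo least-squares problem is well-conditioned.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)
N, M = 10, 2000

f = lambda y: np.exp(y) * np.sin(3 * y)   # arbitrary target function
Y = rng.uniform(-1, 1, M)                 # iid samples from mu (uniform here)

# Rows are X(Y_m): sqrt(2n+1) * P_n is orthonormal under uniform mu on [-1,1].
V = legendre.legvander(Y, N - 1) * np.sqrt(2 * np.arange(N) + 1)

G = (V.T @ V) / M                            # the Gramian from the slides
c = np.linalg.lstsq(V, f(Y), rcond=None)[0]  # least-squares coefficients
sigma = np.linalg.eigvalsh(G)
print(f"sigma(G) in [{sigma.min():.3f}, {sigma.max():.3f}]")
print("leading coefficients:", np.round(c[:3], 3))
```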

SLIDE 17

The induced distribution

It turns out that $K$ can be quite large (or infinite) for practical situations. A fix for this utilizes importance sampling. In particular, define

$$d\rho(x) := \left( \frac{1}{N} \sum_{n=1}^{N} x_n^2 \right) d\mu(x),$$

where $\mu$ is the distribution of $X$. $\rho$ is a probability measure on $\mathbb{R}^N$, and is frequently called the induced distribution.

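For intuition, the weight $d\rho/d\mu = \frac{1}{N}\sum_n x_n^2$ can be evaluated directly. A minimal sketch, assuming the orthonormal Legendre basis under the uniform measure (an illustrative choice): the weight is largest near the endpoints $\pm 1$, exactly where $\|X\|^2$ peaks.

```python
import numpy as np
from numpy.polynomial import legendre

N = 12
y = np.linspace(-1, 1, 7)

# Orthonormal Legendre basis evaluated at y; the induced-density weight is
# d(rho)/d(mu) = (1/N) * sum_n phi_n(y)^2.
V = legendre.legvander(y, N - 1) * np.sqrt(2 * np.arange(N) + 1)
weight = (V**2).sum(axis=1) / N
print(np.round(weight, 3))   # peaks at the endpoints y = -1 and y = +1
```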


SLIDE 19

A (more) optimal Chernoff bound

In practical scenarios, the induced distribution $\rho$ can also be sampled from without too much effort. More importantly, we can get a (much) better Chernoff bound here. Let $(Y_m)_{m \ge 1} \subset \mathbb{R}^N$ be iid samples from $\rho$. We need to weight the Gramian so that we produce an unbiased estimate:

$$F := \frac{1}{M} \sum_{m=1}^{M} w_m Y_m Y_m^T, \qquad w_m := \frac{d\mu}{d\rho}(Y_m).$$

This results in the (better) Chernoff bound

$$\Pr\left[ \left(\sigma_{\min}(F) < 1 - \delta\right) \cup \left(\sigma_{\max}(F) > 1 + \delta\right) \right] \le \epsilon,$$

with the much more reasonable assumption

$$\frac{M}{\log M} \gtrsim \frac{N}{\delta^2} \log\left(\frac{1}{\epsilon}\right).$$

This Chernoff bound will be a seed for achieving model reduction.

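A hedged sketch of the weighted Gramian: instead of sampling the exact induced distribution, this uses the Chebyshev (arcsine) measure as the biasing measure $\rho$, an easy-to-sample stand-in for Legendre bases (an assumption of this sketch, not the slides' construction); the weights $w = d\mu/d\rho$ still make $F$ unbiased.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
N, M = 12, 2000

# Sample rho: the Chebyshev (arcsine) measure on [-1,1].
Y = np.cos(np.pi * rng.random(M))
# w = d(mu)/d(rho): uniform density (1/2) over Chebyshev density 1/(pi*sqrt(1-y^2)).
w = (np.pi / 2) * np.sqrt(1.0 - Y**2)

# Rows are the orthonormal Legendre features evaluated at the samples.
V = legendre.legvander(Y, N - 1) * np.sqrt(2 * np.arange(N) + 1)

F = (V.T * w) @ V / M              # F = (1/M) sum_m w_m Y_m Y_m^T
sigma = np.linalg.eigvalsh(F)
print(f"sigma(F) in [{sigma.min():.3f}, {sigma.max():.3f}]")
```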

SLIDE 20

Example 1: RBM (for elliptic problems)



SLIDE 22

Reduced basis methods

For the parameterized problem,

$$-\nabla \cdot \left( \sum_{j=1}^{\infty} \mu_j a_j(x) \nabla u \right) = b,$$

with $\mu \in [-1,1]^\infty$, recall that RBM (essentially) iteratively computes

$$\arg\max_{\mu} \left\| u(\mu) - P_{j-1}(u(\mu)) \right\|.$$

If (any truncation of) $\mu$ is high-dimensional, this is an expensive optimization, even if the objective is easy to evaluate. There's a bigger problem: the $\arg\max$ is typically taken over a discrete $\mu$ grid. If $\mu$ is high-dimensional, how can we certify error without densely sampling?

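To make the iteration concrete, here is a toy "strong greedy" sketch of the $\arg\max$ loop, with a synthetic solution map (an illustrative assumption; certified RBM replaces the true projection error with a cheap error estimator):

```python
import numpy as np

rng = np.random.default_rng(4)

mus = rng.uniform(-1, 1, (400, 4))                  # discrete candidate mu grid
t = np.linspace(0, 1, 100)
U = np.cos((mus @ np.arange(1, 5))[:, None] * t)    # synthetic snapshots u(mu)

Q = np.empty((0, t.size))                           # orthonormal reduced basis
for j in range(5):
    err = np.linalg.norm(U - (U @ Q.T) @ Q, axis=1)  # ||u(mu) - P_{j-1} u(mu)||
    k = err.argmax()                                 # the argmax over the grid
    print(f"iteration {j}: max error = {err[k]:.3e}")
    q = U[k] - (U[k] @ Q.T) @ Q                      # Gram-Schmidt step
    Q = np.vstack([Q, q / np.linalg.norm(q)])
```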

SLIDE 23

Reduction feasibility

Some analysis gives us a strategy to proceed: if $\{a_j\}_{j=1}^{\infty}$ satisfies an $\ell^p$ summability condition,

$$\sum_{j=1}^{\infty} \|a_j\|_{L^\infty}^p < \infty, \qquad p < 1,$$

then there is an $N$-dimensional downward-closed polynomial space $P_N$ in the variable $\mu$ such that

$$\sup_{\mu} \left\| u(\mu) - \mathrm{Proj}_{P_N} u(\mu) \right\| \le N^{-s}, \qquad s := \frac{1}{p} - \frac{1}{2}.$$

There are constructive algorithms to essentially identify $P_N$ [Cohen, Devore, Schwab 2011]. In particular, once $P_N$ is identified, this approximation can be obtained by $\mu$-least-squares approximation.



SLIDE 27

Polynomial meshes

I.e., if we can certify accuracy on a "polynomial grid", we can probably obtain accuracy.

  • Let $\mu$ be a random variable with distribution $\nu$.
  • Let $X = (X_n(\nu))_{n=1}^{N}$ denote a $d\nu$-orthonormal basis for $P_N$.
  • Define the induced distribution $\rho = \rho(\nu, X)$ based on this, sample $\{Y_m\}_{m=1}^{M}$ from $\rho$, and use this to discretize the $\arg\max$ procedure in RBM.
  • Let $u_N(\mu)$ denote the resulting $N$-degree-of-freedom RBM surrogate.

If

$$\frac{M}{\log M} \gtrsim \frac{N}{\delta^2} \log\left(\frac{1}{\epsilon}\right),$$

then the least-squares $P_N$-polynomial approximation $v_N(\mu) \in P_N$ to $u_N$ satisfies

$$\mathbb{E}\big[ v_N(\mu) - u(\mu) \big]^2 \lesssim N^{-2s} + U^2 \epsilon\, \frac{1+\delta}{1-\delta},$$

where $U$ is the uniform bound $U = \sup_\mu \|u(\mu)\|$.

Without randomization, such a rigorous bound is practically infeasible.


SLIDE 28

Example 2: Sparse (polynomial) approximation



SLIDE 30

Underdetermined systems

Let $x_0$ be a signal (vector) in $\mathbb{R}^N$. If we have $M \ge N$ linear measurements of $x_0$,

$$b := A x_0,$$

then there is (usually) a unique solution $x^\ast$ that minimizes the $\ell^2$ discrepancy:

$$x^\ast := \arg\min_{z \in \mathbb{R}^N} \|Az - b\|_2.$$

And (usually), $x^\ast = x_0$.

The situation is (far) more complicated if $M < N$. This is a particularly salient concern for MOR: $x$ may be a high-dimensional model, but we may only have a small number of measurements.

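A quick sanity check of the $M \ge N$ case (arbitrary sizes, for illustration): with more measurements than unknowns and a generic $A$, the $\ell^2$ minimizer recovers $x_0$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 20, 50                                # M >= N measurements

x0 = rng.standard_normal(N)                  # the signal
A = rng.standard_normal((M, N))              # generic measurement matrix
b = A @ x0

x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # argmin ||Az - b||_2
print("recovery error:", np.linalg.norm(x_star - x0))  # ~ machine precision
```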


SLIDE 33

Compressive sampling

How can we make this problem well-posed? Suppose that $x_0$ is $s$-sparse, i.e., the number of non-zero terms is at most $s \ll N$. We can consider the optimization problem

$$\min \|z\|_0 \quad \text{such that} \quad Az = b.$$

This problem is well-posed under mild conditions. Unfortunately, it's also NP-hard. A (fairly naive) relaxation of this problem is

$$\min \|z\|_1 \quad \text{such that} \quad Az = b.$$

This is a convex problem, and hence it is computationally practical to solve. If $x_0$ is sparse, does the $\ell^1$ minimization problem recover the sparse solution?

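A minimal basis-pursuit sketch (arbitrary sizes), solving the $\ell^1$ problem as a linear program via the standard split $z = u - v$ with $u, v \ge 0$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
N, M, s = 64, 24, 3                           # M < N: underdetermined

x0 = np.zeros(N)                              # s-sparse ground truth
x0[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((M, N)) / np.sqrt(M)  # random measurement matrix
b = A @ x0

# min ||z||_1 s.t. Az = b, as an LP: z = u - v with u, v >= 0.
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
z = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(z - x0))
```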


SLIDE 35

Null space and restricted isometry properties

The matrix $A$ satisfies the (robust) null space property (NSP) with constant $c$ and sparsity $s$ if

$$\|k_S\|_1 \le c\, \|k_{S^c}\|_1, \qquad (1)$$

holds for every $k \in \ker(A)$, and every subset $S \subset [N]$ with cardinality at most $s$. Needless to say, this is a rather difficult condition to verify directly. But: (1) is a necessary and sufficient condition so that $\ell^1$ minimization and $\ell^0$ minimization are equivalent. [Cohen, Devore 2009]

There is a stronger condition to ensure that $\ell^1$ minimization can compute sparse solutions, the restricted isometry property (RIP). $A$ satisfies the RIP with constant $\epsilon$ and sparsity $s$ if

$$(1 - \epsilon)\|x\|_2 \le \|Ax\|_2 \le (1 + \epsilon)\|x\|_2,$$

for every $s$-sparse vector $x$. This condition may also seem difficult to verify, but it contains $\ell^2$ norms!



SLIDE 39

RIP and sparse approximation

The virtue of the RIP is that

$$\text{RIP} \implies \text{NSP} \quad (\iff \ell^1 \equiv \ell^0),$$

and the RIP is much easier to verify. [Candes, Tao 2005]

In particular, suppose that $B \in \mathbb{R}^{P \times N}$ with $P \ge N$ satisfies

$$1 - \delta \le \sigma_{\min}(B), \qquad \sigma_{\max}(B) \le 1 + \delta.$$

Now, form $A$ from $B$ by uniformly at random subsampling $M$ rows from $B$. Then $A$ satisfies the $(s, \epsilon)$ RIP "with high probability" if

$$M \gtrsim K \log\left(\frac{1}{\epsilon}\right) \frac{1}{1 - \delta^2}\, s \log^3(s) \log N,$$

where $K$ is the maximum row norm of $B$. [Rauhut 2010]

The problems: (i) $K$ can be very large, and (ii) sometimes $P$ must be (extremely) large before $\delta$ is small.

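A hedged sketch of the subsampling step, assuming an orthonormal DCT matrix as $B$ (so $\delta = 0$ and the rows have small, equal norm). Spot-checking $\|Ax\|/\|x\|$ on one random $s$-sparse vector is only an illustration, not a verification of the RIP:

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(7)
P, N, M, s = 256, 128, 64, 4

B = dct(np.eye(P), norm="ortho")[:, :N]       # orthonormal columns: B^T B = I

rows = rng.choice(P, size=M, replace=False)   # uniformly subsample M rows
A = np.sqrt(P / M) * B[rows]                  # rescale so E[A^T A] = I

x = np.zeros(N)                               # one random s-sparse test vector
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
print("||Ax|| / ||x|| =", np.linalg.norm(A @ x) / np.linalg.norm(x))
```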

SLIDE 40

The major point

If $B$ is a matrix with "nearly" orthonormal columns, and maximum row norm $K$, then forming $A$ with $M \sim K s$ subsampled rows yields an RIP matrix. Hence, if $b$ contains measurements from a sparse vector $x_0$, then (with high probability) the solution to

$$\min \|z\|_1 \quad \text{such that} \quad Az = b,$$

is the sparse vector $x_0$.



SLIDE 42

The major point (optimized)

From the Chernoff bound: forming $A$ with $M \sim 1 \cdot s$ subsampled rows yields an RIP matrix, if:
  • we form $B$ by taking $P \sim N \log N$ samples from the induced distribution;
  • we use the appropriate biasing weights to rescale $A$.

Hence with $M \sim s$ samples, we can guarantee recovery of sparse vectors with sparse measurements. [Adcock, Brugiapaglia, Razi, N 2020]

This type of guarantee is extremely difficult to achieve in general without randomization.


SLIDE 43

Randomness is your friend

Many things that cannot be accomplished with deterministic methods can be accomplished* with randomization.

*: with "high" probability

We looked at two examples of this in ROM:
  • RBM for elliptic PDEs
  • Sparse approximation

There are many more examples.
