SLIDE 1

Finding Structure with Randomness

Joel A. Tropp

Computing + Mathematical Sciences, California Institute of Technology
jtropp@cms.caltech.edu

Joint with P.-G. Martinsson and N. Halko, Applied Mathematics, Univ. Colorado at Boulder

Research supported in part by NSF, DARPA, and ONR

SLIDE 2

Top 10 Scientific Algorithms

Source: Dongarra and Sullivan, Comput. Sci. Eng., 2000.

Finding Structure with Randomness, ACM Colloquium, Caltech, 23 January 2012

SLIDE 3

The Decompositional Approach

“The underlying principle of the decompositional approach to matrix computation is that it is not the business of the matrix algorithmicists to solve particular problems but to construct computational platforms from which a variety of problems can be solved.”

❧ A decomposition solves not one but many problems
❧ Often expensive to compute but can be reused
❧ Shows that apparently different algorithms produce the same object
❧ Facilitates rounding-error analysis
❧ Can be updated efficiently to reflect new information
❧ Has led to highly effective black-box software

Source: Stewart, 2000.

SLIDE 4

What’s Wrong with Classical Methods?

❧ Nothing... unless the matrices are large-scale
❧ Problem: The major cost in numerical algorithms is data transfer
❧ One solution: Design multiplication-rich algorithms
❧ Matrix multiplication is efficient on many architectures:
  ❧ Graphics processing units
  ❧ Multi-core and parallel processors
  ❧ Distributed systems

Source: Demmel and coauthors, 2003–present

SLIDE 5

Randomized Truncated SVD

SLIDE 6

Truncated Singular Value Decomposition

A ≈ UΣV∗, where U and V have orthonormal columns and Σ is diagonal.

Interpretation: the k-SVD is the optimal rank-k approximation.

Applications:
❧ Least-squares computations
❧ Principal component analysis
❧ Summarization and data reduction
❧ ...
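As a quick numerical illustration (not from the talk), the Eckart–Young fact behind "k-SVD = optimal rank-k approximation" can be checked with NumPy; the matrix below is a random dense example of my own choosing:

```python
import numpy as np

# Eckart-Young check: the spectral-norm error of the truncated k-SVD
# equals sigma_{k+1}, the best achievable error over all rank-k matrices.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 30))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]          # truncated k-SVD of A
# The approximation error matches the (k+1)-st singular value (s[k], 0-indexed).
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
```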

SLIDE 7

Overview of Two-Stage Randomized Approach

Goal: Compute the k-SVD of an input matrix A

Stage A: Finding the range
❧ Use a randomized algorithm to compute a k-dimensional basis Q that captures most of the range of A: Q has orthonormal columns and A ≈ QQ∗A.

Stage B: Constructing the decomposition
❧ Use the basis Q to reduce the problem size
❧ Apply a classical SVD algorithm to the reduced problem
❧ Obtain the k-SVD in factored form

SLIDE 8

Stage A: Finding the Range

Given:
❧ An m × n matrix A
❧ A target rank k ≪ min{m, n}

Construct an m × k matrix Q with orthonormal columns such that

‖A − QQ∗A‖ ≈ min_{rank(B) ≤ k} ‖A − B‖,

❧ QQ∗ is the orthogonal projector onto the range of Q

Approach: Use a randomized algorithm! (The balance of this talk...)

Total Cost: One multiply (m × n × k) + O(k²n) flops

SLIDE 9

Stage B: Constructing the Decomposition

Assume A is m × n and Q is m × k with orthonormal columns and A ≈ QQ∗A

Approach: Reduce the problem size; apply classical numerical linear algebra!

  • 1. Form k × n matrix B = Q∗A
  • 2. Factor B = UΣV ∗
  • 3. Conclude A ≈ (QU)ΣV ∗

Total Cost: One multiply (m × n × k) + O(k2n) flops
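The two stages can be sketched in a few lines of NumPy. This is a minimal illustration of the scheme on the preceding slides; the function name, the oversampling default `p`, and the Gaussian test matrix are my choices, not prescriptions from the talk:

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """Two-stage randomized approximate k-SVD (sketch).

    Stage A finds an orthonormal basis Q capturing most of the range of A;
    Stage B reduces to a small dense SVD. `p` is an oversampling
    parameter, so ell = k + p columns are sampled. Real matrices assumed.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Stage A: randomized range finder.
    Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Y = A @ Omega                             # sample the range of A
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for range(Y)
    # Stage B: reduce and apply a classical SVD.
    B = Q.T @ A                               # (k+p) x n reduced matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                             # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]
```

On a matrix of exact rank k, the reconstruction (U diag(s) V∗) recovers A to rounding error.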

SLIDE 10

Total Costs for Approximate k-SVD

Two-Stage Randomized Algorithm: 2 multiplies (m × n × k) + O(k²(m + n)) flops
Classical Sparse Methods (Krylov): k multiplies (m × n × 1) + O(k²(m + n)) flops
Classical Dense Methods (RRQR + full SVD): not based on multiplies; O(mnk) flops

SLIDE 11

Randomized Range Finder

SLIDE 12

Randomized Range Finder: Intuition

SLIDE 13

Prototype for Randomized Range Finder

Input: An m × n matrix A, a number ℓ of samples
Output: An m × ℓ matrix Q with orthonormal columns

  • 1. Draw an n × ℓ random matrix Ω.
  • 2. Form the matrix product Y = AΩ.
  • 3. Construct an orthonormal basis Q for the range of Y .

Total Cost: 1 multiply (m × n × ℓ) + O(ℓ²n) flops

Sources:
❧ NLA community: Stewart (1970s)
❧ GFA: Johnson–Lindenstrauss (1984) et seq.
❧ TCS: Boutsidis, Deshpande, Drineas, Frieze, Kannan, Mahoney, Papadimitriou, Sarlós, Vempala (1998–present)
❧ SciComp: Martinsson, Rokhlin, Szlam, Tygert (2004–present)

SLIDE 14

Implementation Issues

Q: How do we pick the number of samples ℓ?
A: Adaptively, using a randomized error estimator.

Q: How does the number ℓ of samples compare with the target rank k?
A: Remarkably, ℓ = k + 5 or ℓ = k + 10 is usually adequate!

Q: What random matrix Ω?
A: For many applications, a standard Gaussian works brilliantly.

Q: How do we perform the matrix–matrix multiply?
A: Exploit the computational architecture.

Q: How do we compute the orthonormal basis?
A: Carefully... Double Gram–Schmidt or Householder reflectors.
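The "randomized error estimator" in the first answer can be sketched as follows. The estimator form and its constants follow the a posteriori bound in the HMT paper; the function name and the real-matrix assumption are mine:

```python
import numpy as np

def estimated_range_error(A, Q, r=10, rng=None):
    """Randomized a posteriori estimate of ||A - Q Q* A|| (sketch).

    Per HMT: with probability at least 1 - 10^(-r), the true spectral-norm
    error is at most 10 * sqrt(2/pi) * max_i ||(I - Q Q*) A w_i||, where
    the w_i are r independent standard Gaussian probe vectors.
    Real matrices assumed (Q.T rather than a conjugate transpose).
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    W = rng.standard_normal((n, r))         # r Gaussian probes
    AW = A @ W
    R = AW - Q @ (Q.T @ AW)                 # residuals (I - QQ*) A w_i
    return 10 * np.sqrt(2 / np.pi) * np.linalg.norm(R, axis=0).max()
```

When the estimate falls below tolerance, the adaptive scheme stops drawing further samples.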

SLIDE 15

Approximating a Helmholtz Integral Operator

[Figure: approximation error eℓ, a posteriori error estimate fℓ, and singular values σℓ+1, plotted in order of magnitude (log10 scale) against the total number of samples ℓ.]

SLIDE 16

Error Bound for Randomized Range Finder

Theorem 1. [HMT 2009] Assume
❧ the matrix A is m × n with m ≥ n;
❧ the target rank is k;
❧ the optimal error is σk+1 = min_{rank(B) ≤ k} ‖A − B‖;
❧ the test matrix Ω is n × (k + p) standard Gaussian.
Then the basis Q computed by the randomized range finder satisfies

E ‖A − QQ∗A‖ ≤ [1 + (4√(k + p) / (p − 1)) · √n] · σk+1.

The probability of a substantially larger error is negligible.

Note: The √n term is actually the ℓ2 norm of the residual singular values!

SLIDE 17

Randomized Power Scheme

SLIDE 18

Randomized Range Finder + Power Scheme

Problem: The singular values of a data matrix often decay slowly
Remedy: Apply the randomized range finder to (AA∗)^q A for small q

Input: An m × n matrix A, a number ℓ of samples
Output: An m × ℓ matrix Q with orthonormal columns

  • 1. Draw an n × ℓ random matrix Ω.
  • 2. Carefully form the matrix product Y = (AA∗)^q AΩ.
  • 3. Construct an orthonormal basis Q for the range of Y .

Total Cost: 2q + 1 multiplies (m × n × ℓ) + O(qℓ²n) flops
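The steps above can be sketched as follows, with "carefully" interpreted as re-orthonormalizing between multiplications, the standard remedy for losing small singular directions to floating-point rounding; the function name and defaults are illustrative:

```python
import numpy as np

def power_range_finder(A, ell, q=2, rng=None):
    """Randomized range finder applied (implicitly) to (A A*)^q A.

    Rather than forming (A A*)^q A Omega directly, alternate the
    multiplications by A* and A with QR re-orthonormalizations so that
    small singular directions survive in floating point. Real matrices
    assumed.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Y = A @ rng.standard_normal((n, ell))  # initial sample of range(A)
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)       # orthonormalize A* Q
        Q, _ = np.linalg.qr(A @ Q)         # orthonormalize A (A* Q)
    return Q                               # m x ell, orthonormal columns
```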

SLIDE 19

Eigenfaces

❧ Database consists of 7,254 photographs with 98,304 pixels each
❧ Form the 98,304 × 7,254 data matrix A
❧ Total storage: 5.4 gigabytes (uncompressed)
❧ Center each column and scale to unit norm to obtain A
❧ The dominant left singular vectors are called eigenfaces
❧ Attempt to compute the first 100 eigenfaces using the power scheme

Source: Image from Scholarpedia article “Eigenfaces,” accessed 12 October 2009

SLIDE 20

[Figure: left panel, approximation error eℓ against the total number of samples ℓ; right panel, estimated singular values σj against the rank j of the singular value; log-scale curves for q = 0, 1, 2, 3 together with the minimal error (est).]
SLIDE 21

Error Bound for Power Scheme

Theorem 2. [HMT 2009] Assume
❧ the matrix A is m × n with m ≥ n;
❧ the target rank is k;
❧ the optimal error is σk+1 = min_{rank(B) ≤ k} ‖A − B‖;
❧ the test matrix Ω is n × (k + p) standard Gaussian.
Then the basis Q computed by the power scheme satisfies

E ‖A − QQ∗A‖ ≤ [1 + (4√(k + p) / (p − 1)) · √n]^{1/(2q+1)} · σk+1.

❧ The power scheme drives the extra factor to one exponentially fast!

SLIDE 22

Structured Randomness

SLIDE 23

Accelerating the Randomized Range Finder

Input: An m × n matrix A, a number ℓ of samples
Output: An m × ℓ matrix Q with orthonormal columns

  • 1. Draw an n × ℓ random matrix Ω.
  • 2. Form the matrix product Y = AΩ.
  • 3. Construct an orthonormal basis Q for the range of Y .

❧ Choose a structured random matrix Ω to accelerate the multiply
❧ Example: the subsampled randomized Fourier transform (SRFT)

Ω = D F R,

where D is an n × n diagonal matrix of random signs, F is the n × n discrete Fourier transform, and R randomly samples ℓ columns to form the test matrix.

SLIDE 24

Error Bound for Structured Random Matrices

Theorem 3. [HMT 2009] Assume
❧ the matrix A is m × n with m ≥ n;
❧ the target rank is k;
❧ the optimal error is σk+1;
❧ the test matrix Ω is an n × ℓ SRFT where ℓ ≳ (k + log n) log k.
Then, except with probability O(k⁻¹),

‖A − QQ∗A‖ ≤ √(1 + 7n/ℓ) · σk+1.

❧ Proofs involve matrix probability inequalities (T 2010)

SLIDE 25

Faster SVD with Structured Randomness

[Figure: acceleration factors t(direct)/t(gauss), t(direct)/t(srft), and t(direct)/t(svd) against the rank ℓ of the truncated SVD, shown on a log scale for n = 1,024, 2,048, and 4,096.]
SLIDE 26

To learn more...

E-mail: jtropp@cms.caltech.edu
Web: http://users.cms.caltech.edu/~jtropp

Papers:
❧ HMT, “Finding structure with randomness: Probabilistic algorithms for computing approximate matrix decompositions,” SIREV 2011
❧ T, “Improved analysis of the subsampled randomized Hadamard transform,” Adv. Adapt. Data Anal., to appear
❧ T, “User-friendly tail bounds for sums of random matrices,” Found. Comput. Math., to appear
