From math to peta-app: Challenges to practical computation with tensor-based algorithms (PowerPoint presentation)


SLIDE 1

From math to peta-app:

Challenges to practical computation with tensor-based algorithms

Robert J. Harrison harrisonrj@ornl.gov robert.harrison@utk.edu

SLIDE 2

Outline

  • Why tensors?
    – Many-body physics
    – Low-rank approximation for operators & functions
  • Math issues
    – Operators: accurate representations without an underlying integral form
    – Functions: efficient and accurate computation with local/global low-rank approximations
  • C/S issues
    – Generating efficient code for modern computers
    – High-level composition

SLIDE 3

The Electronic Schrödinger Equation

  • A 2nd-order, linear partial differential equation in 3N dimensions (N electrons)

\hat{H}\,\Psi(r) = E\,\Psi(r), \qquad \hat{H}\,\Psi(r,t) = i\,\frac{\partial \Psi(r,t)}{\partial t}

\hat{H} = -\frac{1}{2}\sum_i \nabla_i^2 \;-\; \sum_A \sum_i \frac{Z_A}{\lvert r_A - r_i \rvert} \;+\; \sum_{i<j} \frac{1}{\lvert r_i - r_j \rvert}

SLIDE 4

NWChem Overview

  • Simulation capability for molecular science
    – Broad range of molecules, including biomolecules, nanoparticles and heavy elements
    – Electronic structure of molecules (non-relativistic, relativistic, ECPs, first and second derivatives)
    – Solid-state capability (DFT plane-wave, CPMD)
    – Molecular dynamics, molecular mechanics
  • Emphasis on modularity and portability
    – Freely distributed
    – Performance characteristics: designed for MPP
    – Portable: runs on a wide range of computers

SLIDE 5

Synthesis of High Performance Algorithms for Electronic Structure Calculations

http://www.cis.ohio-state.edu/~gb/TCE

  • Collaboration between DOE/SciDAC, NSF/ITR and ORNL/LDRD
  • Objective: develop a high-level programming tool that translates many-body quantum theory into efficient massively parallel codes. This is anticipated to revolutionize the rate of progress in this field by eliminating man-years of programming effort.
  • NSF project: Sadayappan (PI), Baumgartner, Cociorva, Pitzer (OSU); Bernholdt, Harrison (unfunded) (ORNL); Ramanujam (LSU); Nooijen (Waterloo)
  • DOE SciDAC: Harrison (PI), Hirata (PNNL)
  • DOE ORNL/LDRD: Bernholdt (PI, 2002-3)
  • Other SciDAC projects adopting this tool: Piecuch, Gordon
  • Also being applied to nuclear physics (Bernholdt and Dean)
SLIDE 6

CCSD Doubles Equation

hbar[a,b,i,j] == sum[f[b,c]*t[i,j,a,c],{c}] -sum[f[k,c]*t[k,b]*t[i,j,a,c],{k,c}] +sum[f[a,c]*t[i,j,c,b],{c}] -sum[f[k,c]*t[k,a]*t[i,j,c,b],{k,c}]
-sum[f[k,j]*t[i,k,a,b],{k}] -sum[f[k,c]*t[j,c]*t[i,k,a,b],{k,c}] -sum[f[k,i]*t[j,k,b,a],{k}] -sum[f[k,c]*t[i,c]*t[j,k,b,a],{k,c}]
+sum[t[i,c]*t[j,d]*v[a,b,c,d],{c,d}] +sum[t[i,j,c,d]*v[a,b,c,d],{c,d}] +sum[t[j,c]*v[a,b,i,c],{c}] -sum[t[k,b]*v[a,k,i,j],{k}] +sum[t[i,c]*v[b,a,j,c],{c}] -sum[t[k,a]*v[b,k,j,i],{k}] -sum[t[k,d]*t[i,j,c,b]*v[k,a,c,d],{k,c,d}] -sum[t[i,c]*t[j,k,b,d]*v[k,a,c,d],{k,c,d}] -sum[t[j,c]*t[k,b]*v[k,a,c,i],{k,c}] +2*sum[t[j,k,b,c]*v[k,a,c,i],{k,c}] -sum[t[j,k,c,b]*v[k,a,c,i],{k,c}]
-sum[t[i,c]*t[j,d]*t[k,b]*v[k,a,d,c],{k,c,d}] +2*sum[t[k,d]*t[i,j,c,b]*v[k,a,d,c],{k,c,d}] -sum[t[k,b]*t[i,j,c,d]*v[k,a,d,c],{k,c,d}]
-sum[t[j,d]*t[i,k,c,b]*v[k,a,d,c],{k,c,d}] +2*sum[t[i,c]*t[j,k,b,d]*v[k,a,d,c],{k,c,d}] -sum[t[i,c]*t[j,k,d,b]*v[k,a,d,c],{k,c,d}]
-sum[t[j,k,b,c]*v[k,a,i,c],{k,c}] -sum[t[i,c]*t[k,b]*v[k,a,j,c],{k,c}] -sum[t[i,k,c,b]*v[k,a,j,c],{k,c}]
-sum[t[i,c]*t[j,d]*t[k,a]*v[k,b,c,d],{k,c,d}] -sum[t[k,d]*t[i,j,a,c]*v[k,b,c,d],{k,c,d}] -sum[t[k,a]*t[i,j,c,d]*v[k,b,c,d],{k,c,d}]
+2*sum[t[j,d]*t[i,k,a,c]*v[k,b,c,d],{k,c,d}] -sum[t[j,d]*t[i,k,c,a]*v[k,b,c,d],{k,c,d}] -sum[t[i,c]*t[j,k,d,a]*v[k,b,c,d],{k,c,d}]
-sum[t[i,c]*t[k,a]*v[k,b,c,j],{k,c}] +2*sum[t[i,k,a,c]*v[k,b,c,j],{k,c}] -sum[t[i,k,c,a]*v[k,b,c,j],{k,c}]
+2*sum[t[k,d]*t[i,j,a,c]*v[k,b,d,c],{k,c,d}] -sum[t[j,d]*t[i,k,a,c]*v[k,b,d,c],{k,c,d}] -sum[t[j,c]*t[k,a]*v[k,b,i,c],{k,c}]
-sum[t[j,k,c,a]*v[k,b,i,c],{k,c}] -sum[t[i,k,a,c]*v[k,b,j,c],{k,c}] +sum[t[i,c]*t[j,d]*t[k,a]*t[l,b]*v[k,l,c,d],{k,l,c,d}]
-2*sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[k,l,c,d],{k,l,c,d}]
+sum[t[k,a]*t[l,b]*t[i,j,c,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,c,d],{k,l,c,d}]
-2*sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,c,d],{k,l,c,d}] +sum[t[j,d]*t[l,b]*t[i,k,c,a]*v[k,l,c,d],{k,l,c,d}]
-2*sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[k,l,c,d],{k,l,c,d}]
+sum[t[i,c]*t[l,b]*t[j,k,d,a]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,c,d],{k,l,c,d}] +4*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,c,d],{k,l,c,d}]
-2*sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,k,c,a]*t[j,l,d,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,c]*t[j,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}] +sum[t[i,j,c,d]*t[k,l,a,b]*v[k,l,c,d],{k,l,c,d}]
-2*sum[t[i,j,c,b]*t[k,l,a,d]*v[k,l,c,d],{k,l,c,d}] -2*sum[t[i,j,a,c]*t[k,l,b,d]*v[k,l,c,d],{k,l,c,d}] +sum[t[j,c]*t[k,b]*t[l,a]*v[k,l,c,i],{k,l,c}] +sum[t[l,c]*t[j,k,b,a]*v[k,l,c,i],{k,l,c}] -2*sum[t[l,a]*t[j,k,b,c]*v[k,l,c,i],{k,l,c}] +sum[t[l,a]*t[j,k,c,b]*v[k,l,c,i],{k,l,c}]
-2*sum[t[k,c]*t[j,l,b,a]*v[k,l,c,i],{k,l,c}] +sum[t[k,a]*t[j,l,b,c]*v[k,l,c,i],{k,l,c}] +sum[t[k,b]*t[j,l,c,a]*v[k,l,c,i],{k,l,c}]
+sum[t[j,c]*t[l,k,a,b]*v[k,l,c,i],{k,l,c}] +sum[t[i,c]*t[k,a]*t[l,b]*v[k,l,c,j],{k,l,c}] +sum[t[l,c]*t[i,k,a,b]*v[k,l,c,j],{k,l,c}]
-2*sum[t[l,b]*t[i,k,a,c]*v[k,l,c,j],{k,l,c}] +sum[t[l,b]*t[i,k,c,a]*v[k,l,c,j],{k,l,c}] +sum[t[i,c]*t[k,l,a,b]*v[k,l,c,j],{k,l,c}]
+sum[t[j,c]*t[l,d]*t[i,k,a,b]*v[k,l,d,c],{k,l,c,d}] +sum[t[j,d]*t[l,b]*t[i,k,a,c]*v[k,l,d,c],{k,l,c,d}] +sum[t[j,d]*t[l,a]*t[i,k,c,b]*v[k,l,d,c],{k,l,c,d}] -2*sum[t[i,k,c,d]*t[j,l,b,a]*v[k,l,d,c],{k,l,c,d}]
-2*sum[t[i,k,a,c]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,c,a]*t[j,l,b,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,a,b]*t[j,l,c,d]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,c,b]*t[j,l,d,a]*v[k,l,d,c],{k,l,c,d}] +sum[t[i,k,a,c]*t[j,l,d,b]*v[k,l,d,c],{k,l,c,d}] +sum[t[k,a]*t[l,b]*v[k,l,i,j],{k,l}] +sum[t[k,l,a,b]*v[k,l,i,j],{k,l}] +sum[t[k,b]*t[l,d]*t[i,j,a,c]*v[l,k,c,d],{k,l,c,d}] +sum[t[k,a]*t[l,d]*t[i,j,c,b]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,d]*t[j,k,b,a]*v[l,k,c,d],{k,l,c,d}] -2*sum[t[i,c]*t[l,a]*t[j,k,b,d]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,c]*t[l,a]*t[j,k,d,b]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,j,c,b]*t[k,l,a,d]*v[l,k,c,d],{k,l,c,d}] +sum[t[i,j,a,c]*t[k,l,b,d]*v[l,k,c,d],{k,l,c,d}] -2*sum[t[l,c]*t[i,k,a,b]*v[l,k,c,j],{k,l,c}] +sum[t[l,b]*t[i,k,a,c]*v[l,k,c,j],{k,l,c}] +sum[t[l,a]*t[i,k,c,b]*v[l,k,c,j],{k,l,c}] +v[a,b,i,j]

\bar{H}^{ab}_{ij} = \left\langle \Phi^{ab}_{ij} \,\middle|\, e^{-T_1 - T_2}\, H\, e^{T_1 + T_2} \,\middle|\, 0 \right\rangle

SLIDE 7

Tensor Contraction Engine (TCE)

    range V = 3000;
    range O = 100;
    index a,b,c,d,e,f : V;
    index i,j,k,l : O;
    mlimit = 100GB;
    procedure P(in A[V,V,O,O], in B[V,V,V,O], in C[V,V,O,O], in D[V,V,V,O],
                out S[V,V,O,O]) =
    begin
      S[a,b,i,j] == sum[ A[a,c,i,k] * B[b,e,f,l] * C[d,f,j,k] * D[c,d,e,l], {c,d,e,f,k,l} ];
    end

  • High-level domain-specific language for a class of problems in quantum chemistry/physics based on contraction of large multi-dimensional tensors
  • Specialized optimizing compiler
    – Produces F77+GA code, linked to runtime libraries

S_{abij} = \sum_{c\,d\,e\,f\,k\,l} A_{acik}\, B_{befl}\, C_{dfjk}\, D_{cdel}
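The payoff from the operation-minimizing rewrites the TCE searches for can be sketched with NumPy. This is a toy illustration, not TCE output: the sizes are shrunk from the slide's V = 3000, O = 100, and the intermediates T1, T2 and this particular factorization are just one illustrative choice.

```python
import numpy as np

# Toy sizes so the naive contraction fits in memory.
V, O = 6, 3
rng = np.random.default_rng(0)
A = rng.random((V, V, O, O))
B = rng.random((V, V, V, O))
C = rng.random((V, V, O, O))
D = rng.random((V, V, V, O))

# Direct evaluation of S[a,b,i,j] = sum_{c,d,e,f,k,l} A[a,c,i,k] B[b,e,f,l] C[d,f,j,k] D[c,d,e,l]
# as one giant loop nest over all ten indices.
S_naive = np.einsum('acik,befl,dfjk,cdel->abij', A, B, C, D, optimize=False)

# One possible factorization into three binary contractions, the kind of
# rewrite the TCE derives automatically to minimize operation count:
T1 = np.einsum('befl,cdel->bfcd', B, D)    # sum over e, l
T2 = np.einsum('bfcd,dfjk->bcjk', T1, C)   # sum over d, f
S = np.einsum('bcjk,acik->abij', T2, A)    # sum over c, k

assert np.allclose(S, S_naive)
```

Each binary contraction touches far fewer index combinations than the single ten-index loop nest, which is exactly why the choice of factorization dominates the cost at realistic V and O.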

SLIDE 8

CCSD(T) run on Cray XT5: 18 waters

(H2O)18, 54 atoms, 918 basis functions, cc-pVTZ(-f) basis

Floating-point performance at 90K cores: 358 TFLOPs

  • E. Apra, ORNL
SLIDE 9

CCSD(T) run on Cray XT5: 20 waters

(H2O)20, 60 atoms, 1020 basis functions, cc-pVTZ(-f) basis

Floating-point performance at 92K cores: 475 TFLOPs; efficiency > 50%

  • E. Apra, ORNL
SLIDE 10

SLIDE 11

CSGF June 2008 11

Multiresolution Adaptive Numerical Scientific Simulation

Ariana Beste[1], George I. Fann[1], Robert J. Harrison[1,2], Rebecca Hartman-Baker[1], Judy Hill[1]

[1] Oak Ridge National Laboratory
[2] University of Tennessee, Knoxville

In collaboration with Gregory Beylkin[4], Fernando Perez[4], Lucas Monzon[4], Martin Mohlenkamp[5] and others

[4] University of Colorado
[5] Ohio University

harrisonrj@ornl.gov

SLIDE 12


Funding

  • DOE Office of Science, SciDAC, divisions of Advanced Scientific Computing Research and Basic Energy Science, under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory, in part using the National Center for Computational Sciences
  • NSF CHE 0625598: Cyber-infrastructure and Research Facilities: Chemical Computations on Future High-end Computers
  • NSF CNS-0509410: CAS-AES: An integrated framework for compile-time/run-time support for multi-scale applications on high-end systems
  • DARPA HPCS2: HPCS programming language evaluation

SLIDE 13


MADNESS objectives

  • Scaling to 1+M processors ASAP
  • High-level composition (Matlab-like) for accurate and efficient solvers
    – Targeting graduate students in physical science
  • Correct scaling of cost with system size
    – Fast algorithms with guaranteed precision
  • Broad relevance
    – Applications now in chemistry, atomic physics, fusion, materials science, nuclear physics

http://code.google.com/p/m-a-d-n-e-s-s

SLIDE 14


High-level composition

  • Close to the physics

    operatorT op = CoulombOperator(k, rlo, thresh);
    functionT rho = psi*psi;
    double twoe = inner(apply(op,rho),rho);
    double pe = 2.0*inner(Vnuc*psi,psi);
    double ke = 0.0;
    for (int axis=0; axis<3; axis++) {
        functionT dpsi = diff(psi,axis);
        ke += inner(dpsi,dpsi);
    }
    double energy = ke + pe + twoe;

E = \left\langle \psi \,\middle|\, -\tfrac{1}{2}\nabla^2 + V \,\middle|\, \psi \right\rangle + \iint \frac{\psi^2(x)\,\psi^2(y)}{\lvert x - y \rvert}\, dx\, dy

SLIDE 15


SLIDE 16

Essential techniques for fast computation

  • Multiresolution

V_0 \subset V_1 \subset \cdots \subset V_n
V_n = V_0 + (V_1 - V_0) + \cdots + (V_n - V_{n-1})

  • Low separation rank

f(x_1, \ldots, x_d) = \sum_{l=1}^{M} \sigma_l \prod_{i=1}^{d} f_i^{(l)}(x_i) + O(\epsilon), \qquad \lVert f_i^{(l)} \rVert_2 = 1,\ \sigma_l > 0

  • Low operator rank

A = \sum_{\mu=1}^{r} \sigma_\mu\, u_\mu v_\mu^T + O(\epsilon), \qquad \sigma_\mu > 0,\ v_\mu^T v_\nu = u_\mu^T u_\nu = \delta_{\mu\nu}

SLIDE 17

Integral Formulation

  • Solving the integral equation
    – Eliminates the derivative operator and related "issues"
    – Converges as a fixed-point iteration with no preconditioner

\left(-\tfrac{1}{2}\nabla^2 + V\right)\Psi = E\,\Psi \;\Longrightarrow\; \Psi = -2\,\hat{G} * (V\Psi)

(\hat{G} * f)(r) = \int \frac{e^{-k\,\lvert r - s \rvert}}{4\pi \lvert r - s \rvert}\, f(s)\, ds \quad \text{in 3D}, \qquad k^2 = -2E

Such Green's functions (bound-state Helmholtz, Poisson) can be rapidly and accurately applied with a single, sparse matrix-vector product.
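A 1-D toy of this fixed-point iteration is sketched below. It is not the MADNESS solver: it uses the 1-D bound-state Helmholtz kernel e^{-k|x-y|}/(2k) on a uniform grid, and the Pöschl-Teller well V = -sech²(x) as a test problem because its exact ground-state energy is -1/2.

```python
import numpy as np

# Fixed-point iteration psi <- -2 G_k (V psi) with k^2 = -2E, no preconditioner.
# 1-D bound-state Helmholtz kernel: G_k(x,y) = exp(-k|x-y|) / (2k).
x = np.linspace(-15.0, 15.0, 1501)
h = x[1] - x[0]
V = -1.0 / np.cosh(x)**2        # Poschl-Teller well, exact E = -1/2

psi = np.exp(-x**2)             # crude initial guess
E = -0.3                        # crude initial energy
for _ in range(50):
    k = np.sqrt(-2.0 * E)
    G = np.exp(-k * np.abs(x[:, None] - x[None, :])) / (2.0 * k)
    psi = -2.0 * h * (G @ (V * psi))            # apply the integral operator
    psi /= np.sqrt(h * np.sum(psi**2))          # normalize
    # update E from <psi|H|psi> with a finite-difference Laplacian
    lap = (np.roll(psi, -1) - 2*psi + np.roll(psi, 1)) / h**2
    E = h * np.sum(psi * (-0.5 * lap + V * psi))
    E = min(E, -1e-2)           # guard: stay on the bound-state branch

assert abs(E + 0.5) < 1e-2
```

Despite the crude start, the iteration settles onto the bound state without any preconditioning, which is the behavior the slide claims for the 3-D case.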

SLIDE 18


Separated form for integral operators

  • Approach
    – Represent the kernel over a finite range as a sum of products of 1-D operators (often, not always, Gaussian)
    – Only need to compute 1-D transition matrices (X, Y, Z)
    – SVD the 1-D operators (low rank away from the singularity)
    – Apply the most efficient choice of low/full-rank 1-D operator
    – Even better algorithms not yet implemented

T * f = \int K(r - s)\, f(s)\, ds

r^{n,\,l-l'}_{ii',\,jj',\,kk'} = \sum_{\mu=0}^{M} X^{\mu,n,\,l_x-l'_x}_{ii'}\; Y^{\mu,n,\,l_y-l'_y}_{jj'}\; Z^{\mu,n,\,l_z-l'_z}_{kk'} + O(\epsilon)
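Applying one separated (Gaussian) term as three 1-D matrices can be sketched as follows. Toy sizes only; the X, Y, Z below are dense Gaussian matrices built for illustration, not MADNESS's multiwavelet transition matrices.

```python
import numpy as np

# One separated term of a kernel applied dimension-by-dimension:
# three O(n^4) contractions instead of one dense O(n^6) 3-D operator.
n = 10
g = np.linspace(-2.0, 2.0, n)
f = np.random.default_rng(1).random((n, n, n))

X = np.exp(-(g[:, None] - g[None, :])**2)   # 1-D Gaussian transition matrix
Y = Z = X                                   # isotropic kernel: same in y and z

# Reference: dense 3-D application, O(n^6).
dense = np.einsum('ax,by,cz,xyz->abc', X, Y, Z, f)

# Separated application: contract one dimension at a time.
t = np.tensordot(f, X, axes=([0], [1]))     # result indices (y, z, a)
t = np.tensordot(t, Y, axes=([0], [1]))     # result indices (z, a, b)
sep = np.tensordot(t, Z, axes=([0], [1]))   # result indices (a, b, c)

assert np.allclose(dense, sep)
```

With M Gaussian terms the total cost is O(M n^4) per box rather than O(n^6), which is the entire point of the separated form.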

SLIDE 19


Accurate Quadratures

  • Trapezoidal quadrature
    – Geometric precision for periodic functions with sufficient smoothness
  • Beylkin & Monzon
    – Further reductions, but not automatic

[Figure: the kernel for mu = 1e-4, 1e-3, 1e-2, 1e-1, 1e0; the curve for mu = 1e-4 is the rightmost.]

\frac{e^{-\mu r}}{r} = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-r^2 t^2 - \mu^2/(4t^2)}\, dt = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-r^2 e^{2s} - \mu^2 e^{-2s}/4}\, e^{s}\, ds
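The geometric convergence of the trapezoidal rule on the transformed integral is easy to verify numerically; each quadrature point also contributes one Gaussian to the separated representation of the kernel. A minimal check for one (mu, r) pair:

```python
import numpy as np

# Trapezoidal quadrature of the substituted integral t = exp(s): the integrand
# decays doubly exponentially in both directions, so the rule converges
# geometrically in the step size h.
mu, r = 1.0, 0.5
s = np.linspace(-10.0, 10.0, 256)
h = s[1] - s[0]
integrand = np.exp(-r**2 * np.exp(2*s) - mu**2 * np.exp(-2*s) / 4) * np.exp(s)
approx = (2.0 / np.sqrt(np.pi)) * h * np.sum(integrand)

exact = np.exp(-mu * r) / r
assert abs(approx - exact) < 1e-12 * exact
```

A few hundred points already reach near machine precision at this r; the slide's caveat is that uniformly accurate quadratures over a wide range of r (and mu) take more care, which is where the Beylkin-Monzon reductions come in.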

SLIDE 20

02/20/09 Robert J. Harrison, UT/ORNL 20

Computational kernels

  • Discontinuous spectral element
    – In each "box" a tensor product of coefficients
    – Most operations are small matrix multiplications
    – Exploit the low operator rank of the operator matrices
  • Leads to irregular dimensions and shapes
    – Typical matrix dimensions are 2 to 30
    – E.g., (20,400)^T * (20,20)

r_{i'j'k'} = \sum_{ijk} s_{ijk}\, c_{ii'}\, c_{jj'}\, c_{kk'} = \sum_k c_{kk'} \sum_j c_{jj'} \sum_i s_{ijk}\, c_{ii'} \;\Longrightarrow\; r = \left(\left(\left(s^T c\right)^T c\right)^T c\right)

SLIDE 21


Comparison with MKL, Goto, ATLAS on Intel Xeon 5355 for (20,400)T*(20,n)

(ratio of speeds mtxmq/other; a value > 1 means mtxmq is faster)

 n    MKL     Goto    ATLAS
 2    6.25    4.1667  5
 4    3.1042  3.6341  4.6563
 6    4.375   2.625   5.122
 8    1.3132  2.0427  5.1957
10    2.7368  1.9549  5.3061
12    1.0605  1.5843  2.4352
14    2.0323  1.4737  2.1356
16    0.8966  1.2581  2.0708
18    1.7763  1.3636  2.4545
20    0.9556  1.2727  2.6168
22    1.6416  1.2968  2.7308
24    0.9638  1.2208  1.9664
26    1.5337  1.2814  2.1295
28    0.8411  1.0588  2.0301

SLIDE 22


XT5 single core FLOPs/cycle (AMD Barcelona)

(nj,ni)^T * (nj,nk):

  ni  nj  nk   MTXMQ  ACML
 400   2  20    2.55   0.95
 400   4  20    2.62   1.50
 400   6  20    2.60   1.79
 400   8  20    2.56   2.02
 400  10  20    2.58   2.12
 400  12  20    2.64   2.27
 400  14  20    2.90   2.35
 400  16  20    2.80   2.46
 400  18  20    2.74   2.49
 400  20  20    2.89   2.58

Nested transform, (nj,ni)^T * (nj,nk):

  ni  nj  nk   MTXMQ  ACML
   4   2   2    0.10   0.07
  16   4   4    1.04   0.51
  36   6   6    1.74   0.99
  64   8   8    2.33   1.56
 100  10  10    2.61   1.80
 144  12  12    2.69   2.12
 196  14  14    2.94   2.17
 256  16  16    2.97   2.41
 324  18  18    2.93   2.38
 400  20  20    3.03   2.49
 484  22  22    3.01   2.52
 576  24  24    3.09   2.73
 676  26  26    3.02   2.73
 784  28  28    2.87   2.87
 900  30  30    2.88   2.81

L2 cache is 512 KB = 2*32^3 doubles

  • hence expect good multi-core scaling
  • confirmed linear scaling to 8 cores for the nested transformation
SLIDE 23


Electron correlation

  • All defects in the mean-field model are ascribed to electron correlation
  • Singularities in the Hamiltonian imply, for a two-electron atom,

\Psi(r_1, r_2, r_{12}) = \left(1 + \tfrac{1}{2} r_{12} + \cdots\right) \quad \text{as } r_{12} \to 0

  • Include the inter-electron distance in the wavefunction
    – E.g., Hylleraas 1938 wavefunction for He:

\Psi(r_1, r_2, r_{12}) = \exp\!\left(-\zeta (r_1 + r_2)\right)\left(1 + a\, r_{12} + \cdots\right)

    – Potentially very accurate, but not systematically improvable, and (until recently) not computationally feasible for many-electron systems
  • Configuration interaction expansion is slowly convergent:

\Psi(r_1, r_2, \ldots) = \sum_i c_i \left| \phi_1^i(r_1)\, \phi_2^i(r_2) \right|

SLIDE 24


[Figure: hierarchical partitioning of |x-y| into blocks near and away from the diagonal.]

  • In 3D, ideally must be one box removed from the diagonal
  • The diagonal box has full rank
  • Boxes touching the diagonal (face, edge, or corner) have increasingly low rank
  • Away from the diagonal, r = O(-log eps), where r is the separation rank

\lvert x - y \rvert = \sum_{\mu=1}^{r} f_\mu(x)\, g_\mu(y)

Partitioned SVD representation
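These rank claims are easy to probe in 1-D (illustrative only): away from the diagonal the sign of x - y is fixed, so |x - y| reduces to y - x there, which has separation rank exactly 2, while the diagonal block is numerically full rank.

```python
import numpy as np

def numerical_rank(M, eps=1e-10):
    """Number of singular values above eps relative to the largest."""
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > eps * sv[0]))

x = np.linspace(0.0, 1.0, 200)
# Diagonal block: the x and y boxes coincide, so the kernel has its kink inside.
diag_block = np.abs(x[:, None] - x[None, :])
# Far block: the y box is shifted well away, so x - y never changes sign.
far_block = np.abs(x[:, None] - (x + 2.0)[None, :])

assert numerical_rank(far_block) == 2       # |x-y| = (y) - (x): two separable terms
assert numerical_rank(diag_block) > 100     # kink on the diagonal: (nearly) full rank
```

The same dichotomy drives the partitioned SVD in higher dimensions: full-rank blocks are confined to a thin band around the diagonal, and everything else compresses hard.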

SLIDE 25

Preliminary results for He atom

Iter   Variational E   Delta E      Residual
HF     -2.861 61
0      -2.871 08                    0.414 73
1      -2.894 92       -0.023 84    0.017 28
2      -2.900 43       -0.005 51    0.007 94
3      -2.902 18       -0.001 75    0.003 84
4      -2.902 88       -0.000 70    0.002 02
5      -2.903 20       -0.000 32    0.001 25
6      -2.903 39       -0.000 20    0.000 91
...    ...             ...          ...
12     -2.903 73       -0.000 04    0.000 36
13     -2.903 73       +0.000 004   0.000 32
14     -2.903 77       -0.000 04    0.000 28

Computational details:

  • 5th-order multiwavelets
  • Wavelet threshold: 2x10^-5
  • SVD threshold: 2x10^-6
  • Exponential correlation factor

Perturbative wavefunction:

  • Maximum refinement: n = 4
  • Memory: 132 MB in full partitioned SVD form; ~10 GB without SVD

Energy is variational (the small non-variational step is just truncation error).

Reference energies:

exact                -2.903 74  (E(HF) = -2.861 68)
Hylleraas (6 terms)  -2.903 24
Löwdin and Redei     -2.895 4
cc-pV6Z (FCI)        -2.903 48  (E(HF) = -2.861 67)
SLIDE 26


MADNESS architecture (layered, top to bottom):

  • MADNESS applications: chemistry, physics, nuclear, ...
  • MADNESS math and numerics
  • MADNESS parallel runtime
  • MPI, Global Arrays, ARMCI, GPC/GASNET

Intel Thread Building Blocks being considered for multicore

SLIDE 27


Runtime Objectives

  • Scalability to 1+M processors ASAP
  • Runtime responsible for
    – scheduling and placement,
    – managing data dependencies,
    – hiding latency, and
    – medium- to coarse-grain concurrency
  • Compatible with existing models
    – MPI, Global Arrays
  • Borrow successful concepts from Cilk, Charm++, Python
  • Anticipating next-generation languages
SLIDE 28


Key elements

  • Futures for hiding latency and automating dependency management
  • Global names and name spaces
  • Non-process-centric computing
    – one-sided messaging between objects
    – retain place=process for MPI/GA legacy
  • Dynamic load balancing
    – data redistribution, work stealing, randomization
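The futures idea can be sketched with Python's concurrent.futures (only an analogy: the MADNESS runtime implements futures in C++ over its own task queue with work stealing, not over a thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

# Submitting a task returns immediately with a future; dependencies are
# expressed by handing futures to follow-up tasks, which block only at the
# moment they actually need the value. The caller's thread stays free,
# which is how futures hide latency and automate dependency management.
def add(a, b):
    return a + b

with ThreadPoolExecutor(max_workers=4) as pool:
    f1 = pool.submit(add, 1, 2)        # runs asynchronously
    f2 = pool.submit(add, 3, 4)        # runs asynchronously
    # f3 depends on f1 and f2; .result() waits only until each is ready
    f3 = pool.submit(lambda x, y: add(x.result(), y.result()), f1, f2)
    total = f3.result()

assert total == 10
```

The dependency graph (f1, f2 feeding f3) is exactly the structure the runtime schedules; no explicit synchronization code appears in the composition.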
SLIDE 29


Multi-threaded architecture

[Diagram: RMI server (MPI or Portals); incoming active messages feed a task dequeue with work stealing; the application logical main thread issues outgoing active messages.]

SLIDE 30


Summary

  • Tensor-based computation is central to
    – efficient computation in many dimensions
    – efficient use of present and future computers
  • Challenges/opportunities
    – Theoretical framework for low-rank representations of operators/functions with controlled error
    – Fast algorithms for rank reduction and for computing with low-rank representations
    – Automating generation of efficient code
    – LAPACK-like software base
    – Specialization to application domains