Silicon Graphics Scientific Library Update

Mimi Celis (celis@sgi.com), Tom Elken (telken@sgi.com)
Supercomputing Applications, Silicon Graphics, Inc.
41st Cray User Group Conference, Minneapolis, Minnesota

Contents
• Scientific libraries available on SGI hardware
• SCSL Scientific Library (like "SGI", "SCSL" doesn't mean anything ;-) )
• SCSL Release 1.2
• Signal Processing in SCSL 1.2
• Performance
• Special Solvers in SCSL 1.2
• Future

Scientific Libraries on SGI
There are "many" scientific libraries available on SGI platforms today.
• LibSci on Cray platforms.
• CHALLENGEcomplib on IRIX platforms (libcomplib.sgimath, libblas).
  - Part of the IDO in IRIX 6.4 and older
  - Part of the IRIX development libraries in IRIX 6.5
  - Version 3.1
• SCSL on IRIX platforms.
  - Unbundled product
  - Available for IRIX 6.4 and newer
  - Version 1.1

SCSL Scientific Library
• SCSL is a scientific and math library.
• SCSL is (initially) available on IRIX 6.4 and 6.5 systems.
• SCSL will become the standard scientific library on all SGI platforms.
• SCSL will merge the important functionality of CHALLENGEcomplib and LibSci into one library.
• SCSL will provide a new library with more functionality and better performance than either library by itself.

SCSL Contents
• BLAS (Basic Linear Algebra Subprograms)
  - BLAS1: vector-vector operations
  - BLAS2: matrix-vector operations
  - BLAS3: matrix-matrix operations
• LAPACK
  - Symmetric and nonsymmetric linear systems of equations
  - Symmetric and nonsymmetric eigenvalue/eigenvector problems
  - Singular value decomposition
  - Linear least squares
BLAS and LAPACK were developed at the University of Tennessee.
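To make the three BLAS levels concrete, here is a minimal pure-Python sketch of the reference semantics of one representative routine per level, using the standard DAXPY, DGEMV and DGEMM operations (no-transpose case only). The lowercase function names and list-of-rows matrices are illustrative assumptions; these loops are not SCSL's tuned implementations or its calling sequences.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: a list of rows


def daxpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """BLAS1 (vector-vector): returns alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dgemv(alpha: float, a: Matrix, x: Vector, beta: float, y: Vector) -> Vector:
    """BLAS2 (matrix-vector): returns alpha*A*x + beta*y."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]


def dgemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """BLAS3 (matrix-matrix): returns alpha*A*B + beta*C."""
    inner = len(b)
    cols = len(b[0])
    return [[alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
             for j in range(cols)]
            for i in range(len(a))]
```

For n-by-n operands, BLAS3 does O(n^3) arithmetic on O(n^2) data, while BLAS1 and BLAS2 do only a constant amount of arithmetic per element touched; that reuse is what lets a tuned matrix-matrix routine run far closer to peak.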

SCSL Contents (continued)
• Sparse linear equation solvers
  - Symmetric linear systems of equations
  - Nonsymmetric linear systems of equations (NO pivoting)
• FFTs
  - Multiple one-dimensional, mixed radix
  - One-, two- and three-dimensional, mixed radix
  - Single and double precision, for both real and complex data types
The sparse solvers and FFTs were developed at SGI (there is no de facto standard API for them).
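As an illustration of what a "multiple one-dimensional" FFT computes, the sketch below applies an independent 1D transform to each row of a 2D array. It is a minimal sketch under a stated simplification: it uses a recursive radix-2 FFT (power-of-two lengths only), whereas the SCSL FFTs are mixed radix. The function names are hypothetical and are not SCSL entry points; a direct DFT is included only as a correctness reference.

```python
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("this sketch only handles power-of-two lengths")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x: List[complex]) -> List[complex]:
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def multiple_fft(rows: List[List[complex]]) -> List[List[complex]]:
    """'Multiple 1D' transform: an independent 1D FFT of every row."""
    return [fft(row) for row in rows]
```

Because each row is transformed independently, a library can spread the rows of a multiple FFT across processors with no communication between them.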

How to use SCSL
• Documentation in the form of man pages:
  - intro_libscsl
  - intro_blas1, intro_blas2, intro_blas3
  - intro_fft
  - intro_lapack
  - intro_sparse (soon)
  - these will point you to more detailed man pages
• Linking:
  - Serial: -lscs
  - OpenMP or libmp parallel: -lscs_mp -mp

SCSL Release 1.2
SCSL 1.1 is the current release; Release 1.2 will be the next SCSL release. Goals for 1.2:
• Add the missing complib Signal Processing functionality.
• Provide C language interfaces for the Signal Processing routines.
• Enhance the ordering techniques in the sparse linear solvers.
• Tune performance for the MIPS R12000 processor.
• Roll up bug fixes from SCSL 1.1 and complib 3.1.
SCSL 1.2 will be released with IRIX 6.5.5 (late July 1999).

SCSL Release 1.2 (continued)
SCSL 1.2 is the follow-on to CHALLENGEcomplib, with some exceptions:
• SCSL 1.2 will NOT include o32 versions of the libraries.
• SCSL 1.2 will NOT support LINPACK and EISPACK.
• SCSL 1.2 will run on all platforms that have n32 or 64-bit ABI support.
CHALLENGEcomplib remains available to run on older and current platforms; however:
• There will be no further releases of complib.
• There will be no complib bug fixes (with rare exceptions).

Signal Processing for SCSL 1.2
Additions to the FFTs:
• A multiple 1D routine, which calculates a one-dimensional FFT of each row of a two-dimensional matrix.
• 1D, 2D and 3D routines that compute the product of the Fourier transform of a sequence with the Fourier transform of a filter (the *prod routines in complib).
• Functions to release memory allocated within the FFT routines.
• C language bindings.
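The *prod routines rest on the convolution theorem: multiplying the DFT of a sequence by the DFT of a filter, then inverse-transforming, yields the circular convolution of the two. The sketch below shows that identity with hypothetical function names; to stay short it uses a direct O(n^2) DFT where a real library would use an FFT, and it is not the complib or SCSL interface.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct DFT (forward sign -1); the inverse includes the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def transform_product(seq: Sequence[complex], filt: Sequence[complex]) -> List[complex]:
    """Pointwise product of the transforms; seq and filt must have equal length
    (zero-pad the filter to the sequence length first)."""
    return [a * b for a, b in zip(dft(seq), dft(filt))]


def circular_convolution(seq: Sequence[complex], filt: Sequence[complex]) -> List[complex]:
    """Inverse transform of the product = circular convolution (convolution theorem)."""
    return dft(transform_product(seq, filt), inverse=True)


def direct_circular_convolution(seq: Sequence[complex], filt: Sequence[complex]) -> List[complex]:
    """Reference: y[k] = sum_j seq[j] * filt[(k - j) mod n]."""
    n = len(seq)
    return [sum(seq[j] * filt[(k - j) % n] for j in range(n)) for k in range(n)]
```

Zero-padding both inputs to at least len(seq) + len(filt) - 1 points turns the circular result into an ordinary linear convolution.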

Signal Processing for SCSL 1.2 (continued)
SCSL 1.2 will include convolution and correlation routines:
• Convolution for Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, together with correlations.
• 1D and 2D convolution and correlation.
• Single and double precision, for real and complex arithmetic.
• 2D routines will run on multiple processors.
• API similar to the complib API (but not fully compatible).
• Fortran and C language bindings.
The two main goals of the convolution and correlation library are performance and generality: it provides well-tuned modules usable for most convolution and correlation problems.
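For readers less familiar with the three operations listed above, a direct-form sketch of each follows: an FIR filter as a full linear convolution, an IIR filter as a difference equation, and correlation as convolution with the reversed filter. The function names and argument conventions (coefficient lists b and a, with a[0] normalizing the output) are illustrative assumptions, not the SCSL API, and the 1D real-valued case stands in for the 2D and complex variants.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution y[n] = sum_k h[k] * x[n - k]; length len(x)+len(h)-1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y


def iir_filter(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct-form I recursion: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def cross_correlation(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """r[m] = sum_j x[m + j] * h[j] for lags m = -(len(h)-1) .. len(x)-1,
    computed as a convolution of x with h reversed."""
    return fir_filter(x, list(reversed(h)))
```

For example, iir_filter(x, [1.0], [1.0, -0.5]) implements y[n] = x[n] + 0.5*y[n-1], whose impulse response decays geometrically instead of ending after a finite number of taps as an FIR response does.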

Performance
• BLAS
• Fast Fourier Transforms
• Sparse Solver

BLAS Performance
[Chart: DGEMM Performance; Mflops (0 to 700) vs. matrix size (32 to 2048)]

BLAS Performance
[Chart: DGEMV Performance; Mflops (0 to 450) vs. matrix size (32 to 2048)]

BLAS Performance
[Chart: DGEMM Parallel Performance; Mflops (0 to 18000) vs. number of processors (1 to 32)]
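Rates like those on the BLAS charts are normally derived from an operation count divided by elapsed time. The sketch below assumes the conventional counts of 2n^3 flops for an n-by-n DGEMM and 2n^2 for DGEMV (the slides do not state which counts were used) and shows how a parallel rate converts into an efficiency. The helper names are hypothetical, and the timed triple loop only demonstrates the arithmetic; it says nothing about SCSL's speed.

```python
import time


def gemm_flops(n: int) -> int:
    """Conventional count for an n-by-n DGEMM: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3


def gemv_flops(n: int) -> int:
    """Conventional count for an n-by-n DGEMV."""
    return 2 * n ** 2


def mflops(flop_count: int, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    return flop_count / (seconds * 1.0e6)


def parallel_efficiency(mflops_p: float, mflops_1: float, p: int) -> float:
    """Fraction of ideal linear scaling achieved on p processors."""
    return mflops_p / (p * mflops_1)


def time_naive_gemm(n: int) -> float:
    """Time a deliberately naive C += A*B loop and return its Mflops rate."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        ci, ai = c[i], a[i]
        for k in range(n):
            aik, bk = ai[k], b[k]
            for j in range(n):
                ci[j] += aik * bk[j]
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return mflops(gemm_flops(n), elapsed)
```

For example, parallel_efficiency(800.0, 100.0, 8) is 1.0, i.e. perfect scaling; these numbers are made up for illustration and are not read from the charts.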

Fast Fourier Transforms (FFT)
• 1-dimensional FFT applications:
  - Seismic: many short FFTs (1024-4096 data points)
  - Sonar, radar cross-section, speech recognition and astronomical systems: large 1D FFTs
• Multi-dimensional FFTs:
  - Image processing
  - PDEs from CFD applications
The following charts show an "effective megaflop rate" based on 5n*log2(n) floating-point operations for each complex-to-complex FFT of length n.
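The 5n*log2(n) figure comes from the radix-2 operation count: (n/2)*log2(n) butterflies, each costing one complex multiply (6 real flops) and two complex additions (4 real flops). The rate is "effective" because it normalizes every transform to that count regardless of how many operations a mixed-radix implementation actually executes. A minimal sketch of the calculation, with a hypothetical helper name; the copies argument reflects the assumption that a multiple-FFT rate counts every transform in the batch.

```python
import math


def fft_effective_mflops(n: int, seconds: float, copies: int = 1) -> float:
    """Effective rate: copies * 5*n*log2(n) flops for complex FFTs of length n,
    divided by the elapsed time (reported in Mflops)."""
    if n < 2 or seconds <= 0:
        raise ValueError("need n >= 2 and a positive elapsed time")
    return copies * 5.0 * n * math.log2(n) / (seconds * 1.0e6)
```

For a single length-1024 transform taking one second this gives 5 * 1024 * 10 / 10^6 = 0.0512 Mflops; a batch of 1024 such transforms in the same time gives 1024 times that.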

FFT Performance
[Chart: 1D complex-complex FFT; Mflops (0 to 600) vs. FFT size (1 to 1E+08, log scale); single and double precision]

FFT Performance
[Chart: Complex-complex multiple 1D FFT; Mflops (0 to 600) vs. FFT size and number of repetitions (10 to 10000); single and double precision]

FFT Performance
[Chart: 2D complex-complex FFT; Mflops (0 to 450) vs. FFT size of one dimension (10 to 1000); single and double precision]

FFT Parallel Performance
[Chart: Complex-complex multiple 1D FFT; Mflops (0 to 6000) vs. number of CPUs (1 to 100); series 1024-single, 2048-single, 4096-single, 1024-double, 2048-double, 4096-double]
"1024-single" means 1024 copies of a size-1024 single-precision (32-bit) FFT.

Changes to SGI Sparse Solvers
• New matrix ordering options
  - Methods 3 and 4 are termed "Extreme2" ordering
• New default ordering option
  - Extreme ordering (Method 2) is now the default
• Out-of-core solver option
  - Was present in recent SCSL versions, but is now documented
  - Single-processor only
  - A striped file system is useful
  - Simple interface, and performs well

New Ordering Options
3. Multiple Nested Dissection (ND) orders
  • Default is OMP_NUM_THREADS orders
  • Repeatable quality
4. Multiple ND orders using feedback file information
  • Default is 2 x OMP_NUM_THREADS orders
  • Feedback file is binary, at most 5KB (up to 200 records)
  • A solver that learns
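The idea behind options 3 and 4, computing several nested dissection orders and keeping the best, can be illustrated on a small 2D grid. The sketch below is hypothetical and deliberately simple: it uses geometric nested dissection (halves first, separator line last) with a shifted separator to produce alternative orders, and scores each order by the fill-in created during symbolic elimination. SGI's Extreme2 heuristics, the OMP_NUM_THREADS-based count and the feedback file are not modeled.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]
Graph = Dict[Node, Set[Node]]


def grid_graph(rows: int, cols: int) -> Graph:
    """Adjacency of a rows x cols grid with a 5-point stencil."""
    g: Graph = {(r, c): set() for r in range(rows) for c in range(cols)}
    for (r, c) in list(g):
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in g:
                g[(r, c)].add(nb)
                g[nb].add((r, c))
    return g


def nested_dissection(r0: int, r1: int, c0: int, c1: int, offset: int = 0) -> List[Node]:
    """Order rows [r0, r1) x cols [c0, c1): both halves first, separator last.
    `offset` shifts the separator away from the middle to give alternative orders."""
    height, width = r1 - r0, c1 - c0
    if height <= 2 or width <= 2:  # small block: natural order
        return [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
    if width >= height:  # split with a vertical separator column
        mid = min(max(c0 + width // 2 + offset, c0 + 1), c1 - 2)
        return (nested_dissection(r0, r1, c0, mid, offset)
                + nested_dissection(r0, r1, mid + 1, c1, offset)
                + [(r, mid) for r in range(r0, r1)])
    mid = min(max(r0 + height // 2 + offset, r0 + 1), r1 - 2)  # horizontal separator row
    return (nested_dissection(r0, mid, c0, c1, offset)
            + nested_dissection(mid + 1, r1, c0, c1, offset)
            + [(mid, c) for c in range(c0, c1)])


def fill_in(graph: Graph, order: List[Node]) -> int:
    """Count fill edges created by symbolic (Cholesky) elimination in `order`."""
    position = {v: i for i, v in enumerate(order)}
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    fill = 0
    for v in order:
        # The not-yet-eliminated neighbours of v become pairwise connected.
        later = [u for u in adj[v] if position[u] > position[v]]
        for a, b in combinations(later, 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
    return fill


def best_of_orders(graph: Graph, orders: List[List[Node]]) -> Tuple[int, List[Node]]:
    """Score every candidate order by fill-in and keep the cheapest one."""
    return min(((fill_in(graph, o), o) for o in orders), key=lambda pair: pair[0])
```

Usage: best_of_orders(grid_graph(7, 7), [nested_dissection(0, 7, 0, 7, off) for off in (-1, 0, 1)]) returns the lowest fill count and its order. Because each candidate is computed independently, generating the candidates is naturally parallel.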

Choosing a Default Method
• Should the default be best for which size model?
• Decided to optimize for medium or larger problems (at least 5000 equations).
• Extreme2 (Method 3) is about 3% faster than Extreme, but it is new technology, so Method 2 is the new default.
[Chart: Total time for nine models (0 to 3500) vs. ordering method (1 to 4)]

Out-of-Core (OOC) Option
• OOC performance is 10-40% slower than in-core runs with Extreme (Method 2) ordering; 15% slower in this case,
• but faster than AMF (Method 1).
• This run used 4-way striping on the file system, reaching 140 MB/s on some reads.
• 128MB was allowed in-core for factor storage.
[Chart: Total time for nine models, 1-CPU runs (0 to 1800) vs. ordering method / factor storage (1 to 4, and OOC)]
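An in-core budget for factor storage, as in the 128MB setting above, suggests a simple policy: keep factor blocks in memory until the budget is used up and spill the rest to a scratch file. The sketch below illustrates that policy with a hypothetical FactorStore class; it is not SCSL's out-of-core interface, and SGI's actual spill policy is not described in the slides. Passing a directory on a striped file system is where the striping mentioned above would help.

```python
import os
import tempfile
from array import array
from typing import Dict, List, Optional, Tuple


class FactorStore:
    """Hypothetical factor-block store: blocks stay in memory until
    `budget_bytes` is used up; later blocks are appended to a scratch file."""

    def __init__(self, budget_bytes: int, directory: Optional[str] = None) -> None:
        self.budget = budget_bytes
        self.used = 0
        self.in_core: Dict[int, array] = {}
        self.on_disk: Dict[int, Tuple[int, int]] = {}  # block id -> (byte offset, count)
        self.file = tempfile.TemporaryFile(dir=directory)  # deleted when closed

    def put(self, block_id: int, values: List[float]) -> None:
        data = array("d", values)
        size = data.itemsize * len(data)
        if self.used + size <= self.budget:
            self.in_core[block_id] = data
            self.used += size
        else:
            self.file.seek(0, os.SEEK_END)  # append the spilled block
            self.on_disk[block_id] = (self.file.tell(), len(data))
            data.tofile(self.file)

    def get(self, block_id: int) -> List[float]:
        if block_id in self.in_core:
            return self.in_core[block_id].tolist()
        offset, count = self.on_disk[block_id]
        self.file.seek(offset)
        data = array("d")
        data.fromfile(self.file, count)
        return data.tolist()

    def close(self) -> None:
        self.file.close()
```

A factorization would put() each block as it is computed and get() it back during the triangular solves; with a 16-byte budget, a 2-value block stays in memory and a following 3-value block goes to disk.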

Scalability: Factorization Mflops
• Amdahl's law is responsible for much of the lack of scaling in the previous chart.
• Over 11 Gflops achieved on gismondi on 48 CPUs.
• More can be done to improve memory placement.
• These results used DSM_ROUND_ROBIN data placement.
[Chart: Factorization Mflops (0 to 3500) vs. number of CPUs (0 to 10); models gismondi, fleet10, th2, 280Kdof]
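Amdahl's law bounds the speedup on p processors when a fraction f of the work is parallel: S(p) = 1 / ((1 - f) + f/p). Serial phases therefore cap scaling no matter how well the factorization itself parallelizes. A short sketch follows. It also includes the Karp-Flatt metric, a standard technique not mentioned in the slides, which estimates the serial fraction from a measured speedup. No values from the charts are used.

```python
def amdahl_speedup(parallel_fraction: float, p: int) -> float:
    """Amdahl's law: S(p) = 1 / ((1 - f) + f / p) for parallel fraction f."""
    if not 0.0 <= parallel_fraction <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)


def measured_speedup(t1: float, tp: float) -> float:
    """Speedup from measured elapsed times on 1 CPU and on p CPUs."""
    return t1 / tp


def serial_fraction_from_speedup(speedup: float, p: int) -> float:
    """Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p), for p > 1."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)
```

For instance, with 90% of the work parallel, 8 CPUs give at most 1 / (0.1 + 0.9/8), about 4.7; feeding that speedup back into the Karp-Flatt metric recovers the 10% serial fraction.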

PSLDLT: Scalability to 8 CPUs
• Measured: elapsed time for 1 preprocess, 2 factorizations and 2 solves.
• Number of floating-point operations to factor, and preprocess time:

  Model      Gflop   Preprocess (secs.)
  fleet10    383     27
  gismondi   133     3
  th2        34      18
  280Kdof    18      15

[Chart: Speedup (0 to 7) vs. number of CPUs (0 to 10); models fleet10, gismondi, th2, 280Kdof]

Summary
• SCSL 1.2 improvements:
  - FFTs have a new interface
  - Add the missing complib Signal Processing functionality
  - Provide C language interfaces for the Signal Processing routines
  - Enhance the ordering techniques in the sparse linear solvers
  - Performance tuning for the MIPS R12000 processor
  - Roll up bug fixes from SCSL 1.1 and complib 3.1
• Comments, questions:
  - Mimi Celis; celis@sgi.com
  - Tom Elken; telken@sgi.com
