SLIDE 1

Getting Science Out of Computing

Dr Frank Löffler

Fri, Aug 1st 2014

SLIDE 2

1. Goals
2. Summary
3. Additional Framework Concepts
4. Application efficiency
5. Scientific Programming

SLIDE 3

Goals

SLIDE 4

Goals

We already discussed:

The concept of a simulation and its ingredients.
Supercomputers from the application scientist's point of view.
Parallelization: data structures, load balancing, domain decomposition.
Software engineering: multi-physics simulations, large projects, distributed code development.
The component model as a software architecture for real-world simulation codes.
The Cactus software framework as a specific example.

In this lecture we will discuss:

Additional framework concepts.
Scientific programming.

SLIDE 5

Summary

SLIDE 6

Summary

To go from physics to a simulation, one usually:

1. Finds a mathematical model (e.g. PDEs) expressing the physics.
2. Discretises the model (finite differences, spectral methods, ...).
3. Implements the discretised equations on a supercomputer (programming, testing, debugging).

Many simulation codes have a similar structure. Many supercomputers have a similar architecture.
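
As a concrete (if minimal) illustration of steps 2 and 3 — a sketch not taken from the slides — here is an explicit finite-difference discretisation of the 1D heat equation u_t = u_xx in C; grid size, step count and initial data are arbitrary illustrative choices:

    #include <stdio.h>

    /* 1D heat equation u_t = u_xx, discretised with explicit
       finite differences on N points of the unit interval. */
    int main(void)
    {
      enum { N = 101 };
      double u[N], unew[N];
      const double dx = 1.0 / (N - 1);
      const double dt = 0.4 * dx * dx;   /* explicit scheme is stable for dt <= 0.5*dx*dx */

      for (int i = 0; i < N; i++)        /* initial data: a single hot spot */
        u[i] = (i == N / 2) ? 1.0 : 0.0;

      for (int step = 0; step < 1000; step++) {
        for (int i = 1; i < N - 1; i++)  /* u_xx ~ (u[i-1] - 2u[i] + u[i+1]) / dx^2 */
          unew[i] = u[i] + dt / (dx * dx) * (u[i-1] - 2.0*u[i] + u[i+1]);
        unew[0] = unew[N-1] = 0.0;       /* fixed boundary values */
        for (int i = 0; i < N; i++)
          u[i] = unew[i];
      }
      printf("u at midpoint: %g\n", u[N/2]);
      return 0;
    }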

SLIDE 7

Summary

Parallel algorithms are necessary due to the size of the problems (memory) and the computational cost (CPU time).
MPI is the tool of choice (right now). It requires domain decomposition, advanced data structures and load-balancing algorithms; a minimal sketch follows below.
A component model is necessary to develop complicated multi-physics codes with geographically distributed code developers.
A framework provides the glue between components.
We introduced the Einstein Toolkit as a real-world example.
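
To make the domain decomposition idea concrete, the following is a minimal, hedged MPI sketch (not from the lecture): each rank owns a 1D slab of the grid with one ghost zone per side and exchanges ghost values with its neighbours; nlocal and the message tags are arbitrary:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);
      int rank, nprocs;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      const int nlocal = 100;                       /* interior points owned by this rank */
      double *u = calloc(nlocal + 2, sizeof *u);    /* +2 for one ghost zone per side */

      int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
      int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

      /* Exchange ghost zones: send first/last owned point, receive into ghosts */
      MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                   &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                   &u[0], 1, MPI_DOUBLE, left, 1,
                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      /* ... update interior points u[1..nlocal] using the ghost values ... */

      free(u);
      MPI_Finalize();
      return 0;
    }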

SLIDE 8

Summary

We introduced the Cactus framework.
Applications consist of many components (thorns) glued together by the framework (flesh).
Cactus provides the main program, while components are libraries.
The end user can mix and match the thorns necessary for a specific problem and control which thorns are active at runtime.
Thorns have implementation (regular code) and interface (ccl) files.
Thorns "talk" to each other only through well-defined interfaces and an API provided by the flesh; see the sketch below.
The MPI parallelisation issues are (mostly) hidden from the application programmer (SYNC statements in the schedule determine ghost-zone updates).
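
As a hedged illustration (not shown in the slides), a thorn work routine in C might look like the sketch below. The thorn name and grid functions (phi, phi_new) are hypothetical; the CCTK_ARGUMENTS macros, the cctk_lsh local-shape array and CCTK_GFINDEX3D indexing are the standard flesh API:

    #include "cctk.h"
    #include "cctk_Arguments.h"
    #include "cctk_Parameters.h"

    /* The flesh calls this routine on every process, handing it only
       that process's portion of the grid (including ghost zones). */
    void WaveToy_Evolve(CCTK_ARGUMENTS)
    {
      DECLARE_CCTK_ARGUMENTS;    /* grid variables declared in interface.ccl */
      DECLARE_CCTK_PARAMETERS;   /* parameters declared in param.ccl */

      for (int k = 1; k < cctk_lsh[2] - 1; k++)
        for (int j = 1; j < cctk_lsh[1] - 1; j++)
          for (int i = 1; i < cctk_lsh[0] - 1; i++) {
            const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
            phi_new[idx] = phi[idx];   /* hypothetical update */
          }
      /* Ghost zones are filled afterwards via a SYNC entry in schedule.ccl. */
    }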

SLIDE 9

Additional Framework Concepts

SLIDE 10

Cactus: Driver Thorn

A driver is a special thorn in Cactus that implements parallelism and memory management.
The driver implements the "grid function" data type (as well as "grid arrays").
This externalizes parallelism so that other thorns don't have to implement parallel algorithms.
However, this places certain restrictions on other thorns: there must be exactly one active driver (the standard Cactus driver is PUGH).
The driver can provide advanced discretisation methods, such as AMR or multi-block (e.g. the Carpet driver).
The driver can be based on an existing parallel library (e.g. Chombo or SAMRAI).
Closely related thorns provide I/O.

SLIDE 11

Application efficiency

SLIDE 12

Data Access

Simulations handle large data sets. The data cannot easily be copied:

Not enough memory.
It takes too much time.

If possible, each process must compute with the data it owns ("bring computation to data").
In Cactus, work routines are called on each process with access to the data owned by that process.

SLIDE 13

Data Sharing

Different components may need to access the same data.

Example: a spacetime evolution thorn needs access to the stress-energy tensor, and a hydrodynamics evolution thorn needs access to the spacetime metric.

If components are very independent, data must be copied.
If data cannot be copied, the components must interact in some (non-trivial) way.
In Cactus this is done by inheritance: a thorn can have direct access to another thorn's data.

SLIDE 14

Component Coupling

How closely are components coupled in a framework?

No coupling: independently executing programs. Data "sharing" requires writing/copying/reading files.
Loose coupling: independent data management and parallelism in each component. Data sharing requires memory transfers.
Tight coupling: data are managed outside of the components (or by a special component). Data sharing is efficient (components share access to the same memory), but components need to rely on an external data manager.

SLIDE 15

Component Coupling

SLIDE 16

Component Safety

Efficient data sharing between components requires running in the same address space.
This means that components can (accidentally?) modify each other's data.
E.g. errors (such as an array index out of bounds) can propagate between components.
Compile-time access control and coding standards can provide some safety.

SLIDE 17

Additional Framework Concepts Summary

Many simulation frameworks with many different designs exist.
The fundamental design question is: how tightly are components coupled?
Tight coupling requires shared data management between components.
There is a trade-off between independence/ease of programming/safety and efficiency.

SLIDE 18

Scientific Programming

SLIDE 19

Shared Code Development

Developing a large code as a group (or community) is different from small-scale programming.

There is old code (> 10 years old) that "belongs to nobody".
People use "your" code without understanding it.
People make changes to "your" code without understanding it.

It is best not to have "your" or "my" code; instead, share responsibility.
Program defensively, so that wrong usage is (always) detected.
There needs to be a testing mechanism so that bad changes can be detected quickly.

SLIDE 20

Test Cases

Code can be > 10 years old and still very good.

Cannot rewrite old code every year (and introduce new errors every year).

But one needs to make sure old code is actually still working, despite the many other changes to the framework and to the other components it interacts with.
A test case stores program input and expected output so that any change in behavior can be detected; see the sketch below.
Test cases can also be used to test portability: one should get the same result on different architectures, to within roundoff error.
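
A minimal sketch of the comparison step of such a test case — illustrative only, not Cactus's actual test harness; the function name and the tolerance are arbitrary:

    #include <math.h>
    #include <stdio.h>

    /* Compare computed output against stored reference data, allowing
       roundoff-level differences between architectures. */
    static int matches_reference(const double *computed, const double *reference,
                                 int n, double rel_tol)
    {
      for (int i = 0; i < n; i++) {
        double scale = fmax(fabs(computed[i]), fabs(reference[i]));
        if (fabs(computed[i] - reference[i]) > rel_tol * fmax(scale, 1e-30))
          return 0;   /* behavior changed: investigate before committing */
      }
      return 1;
    }

    int main(void)
    {
      double ref[] = {1.0, 2.0, 3.0};            /* stored expected output */
      double out[] = {1.0, 2.0 + 1e-14, 3.0};    /* freshly computed output */
      printf(matches_reference(out, ref, 3, 1e-12) ? "PASS\n" : "FAIL\n");
      return 0;
    }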

SLIDE 21

Recovering From Errors

Mistakes (bugs) happen, and it should be possible to undo bad changes to the code.
It is therefore important to keep the complete history of all changes to the code, in order to be able to undo them when necessary.
One needs to use source code management tools such as Subversion, darcs, Git, Mercurial, ...
These not only keep track of the changes to the code but also of who made them.

SLIDE 22

Working Together

A source code management system also defines a single standard version of the components on which everybody works.
It would be too confusing to send source code around by email or to look into other people's directories.
Source code management systems also allow temporary branches for heavy development, so that new features can be added without disturbing people doing production runs.
Source code management systems are indispensable for scientific code development.
Tutorials for source code management systems are available online.

SLIDE 23

Policies

Working in a group on a code base requires some policies regarding:

Coding style (routine names, indentation, commit messages).
Access rights (using, modifying, adding, committing).
Testing standards before committing changes.
Peer review before/after making changes.

It is necessary to know what is acceptable behavior.

SLIDE 24

Component Life Cycle

1. Idea, experimental implementation.
2. Prototype, useful for a single paper.
3. Production code: more features added, most bugs removed, useful for a series of papers.
4. Mature code: very useful, few changes.
5. Outdated: used mostly for historic investigations, but still somewhat useful.

SLIDE 25

Portability

Machines become old, outdated and unreliable after a few years, while new machines become available.
HPC systems frequently (sometimes once a week!) require maintenance, or are unavailable for longer periods of time for an upgrade (maybe once a year!).
Installed software (e.g. compilers) may have bugs that make a machine unusable until they are fixed.
Therefore, scientific codes need to be portable, so that one can quickly move to other machines.

SLIDE 26

Computational Challenges

SLIDE 27

Computational Challenges

SLIDE 28-30

[Image-only slides: no recoverable text]
SLIDE 31

More and more diverse hardware

SLIDE 32

[Image-only slide: no recoverable text]
SLIDE 33

Computational Challenges

Simulate cutting-edge science.
Use the latest numerical methods.
Make use of the latest hardware:

Cache
Vector units
SMP parallelism
Scaling to many nodes

SLIDE 34

New Hardware Architectures

SLIDE 35

New Hardware Architectures

Software stays around much longer than hardware.

Software: > 15 years (Cactus).
Hardware: 3 years on average (5 years at most).

Software design must not only be portable but also architecture-independent.
Software has to be adaptable when the architecture changes dramatically.

SLIDE 36

Multi-Core CPUs

The clock speed has not increased since 2005, whereas the transistor density has continued to increase.
End result: more and more cores on a chip, i.e. nodes with multiple cores able to access the same shared memory.
One could in principle continue to use pure MPI parallelization.
However, remember the memory overhead due to ghost zones and the additional computational overhead from domain decomposition; a worked example follows below.
Scaling can suffer at very large core counts.
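
A worked example (not from the slides) of the ghost-zone overhead: if each rank owns an $n^3$ block with $g$ ghost layers per face, the extra memory fraction is

$$\frac{(n+2g)^3 - n^3}{n^3}.$$

For $n = 100$, $g = 3$ this is $(106^3 - 100^3)/100^3 \approx 19\%$; halving the block to $n = 50$ raises it to $(56^3 - 50^3)/50^3 \approx 41\%$. So using more MPI ranks (smaller blocks per rank) makes the relative ghost-zone cost grow quickly.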

SLIDE 37

OpenMP

Better scaling may be obtained by using OpenMP (Open Multi-Processing) parallelization within a shared-memory node and MPI parallelization between nodes.
OpenMP is a shared-memory parallelization paradigm based on lightweight threads and workload distribution between threads.
Parallelization is implemented using compiler directives that tell the compiler how to distribute the work in a loop among threads.
Parallelization can be done incrementally (i.e. loop by loop).
OpenMP directives are ignored when compiling with a sequential compiler.
Works with Fortran, C and C++.

SLIDE 38

OpenMP examples

C:

    #pragma omp parallel for private(i,w) shared(N,a,b) reduction(+:sum)
    for (i = 0; i < N; i++) {
      b[i] = 2*a[i] + 3;
      w = i*i;
      sum = sum + w*a[i];
    }

Fortran:

    !$omp parallel do private(i,w) shared(N,a,b) reduction(+:sum)
    do i = 1, N
      b(i) = 2*a(i) + 3
      w = (i-1)*(i-1)
      sum = sum + w*a(i)
    end do
    !$omp end parallel do

SLIDE 39

Accelerator Computing

Early example from 2001: the Sony/Toshiba/IBM collaboration on the Cell processor.
It has one "real" core (Power Processor Element) and 8 "additional" cores (Synergistic Processing Elements) on a single chip, linked together by a high-speed bus (Element Interconnect Bus).
It was used in, e.g., Sony's PlayStation 3 and in Roadrunner at Los Alamos National Laboratory.
Newer example: the Intel Xeon Phi used in TACC's Stampede.

SLIDE 40

GPU Computing

Graphics cards with a Graphics Processing Unit (GPU) have a similar architecture with many cores.
In a GPU, each core has to perform the same operation (Single Instruction, Multiple Data), i.e. the cores together work like a vector unit.
The GPU cannot directly access the main memory of the CPU.
Several of the top 10 machines on the current Top500 list contain a significant number of GPUs.

SLIDE 41

GPU Computing Challenges

Each core is small and slow; a GPU needs to use many cores to be as fast as a regular CPU:

Slower clock speed.
Less memory per core.

One needs to use even more cores to get good performance.
But if this is possible, the total speed is much larger than that of regular CPUs (of the same total cost), while using a lot less power.
Memory management is a lot more complicated:

Data have to be copied from CPU main memory to GPU memory and back (slow).
Algorithms have to be modified to do as many calculations as possible on the data on the GPU before copying back and forth.

SLIDE 42

Framework Architecture Challenges

We don't want to redesign the framework and rewrite code for new architectures.
The framework should isolate the programmer from architectural changes as much as possible.
MPI clusters have been around for about 20 years. Now something new is coming (or has already arrived):

Multicore CPUs (require the use of OpenMP, pthreads or similar).
Cell and GPU accelerators (require the use of CUDA, OpenCL or OpenACC).
Intel Xeon Phi accelerators (currently work only with the Intel compilers).

SLIDE 43

Cactus Approach to Architecture Independence

Separate physics code from computer-science code.
Each thorn "sees" only the small part of the overall problem that is relevant for its function (information hiding).
Ideally, each physics thorn would act on a single grid point at a time; however, that would have too much overhead.
This externalizes parallelism, load balancing, data distribution and data sharing (Cactus's biggest success).
Multicore and accelerator programming poses a distinct challenge for Cactus.
It is currently being addressed with macros, templates and automatic code generation (LoopControl, CaKernel and Kranc); see the sketch below.
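
As a hedged sketch of the idea behind such loop macros — hypothetical, not LoopControl's actual API — the physics code writes a loop body once, and the macro decides how to traverse the grid (plain loops here; tiling, OpenMP or GPU kernels in other configurations):

    /* Hypothetical traversal macro: the physics code below stays
       unchanged if the macro is redefined for a different architecture. */
    #define LOOP3(i, j, k, ni, nj, nk)          \
      for (int k = 0; k < (nk); k++)            \
        for (int j = 0; j < (nj); j++)          \
          for (int i = 0; i < (ni); i++)

    /* Usage: zero a 3D grid function stored in a flat array. */
    LOOP3(i, j, k, nx, ny, nz) {
      rho[i + nx*(j + ny*k)] = 0.0;
    }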

SLIDE 44

Einstein Toolkit

SLIDE 45

Einstein Toolkit

Collection of scientific software components and tools to simulate and analyze general relativistic astrophysical systems.
Freely available as open source at http://einsteintoolkit.org.
Supported by NSF grants (1212401/1212426/1212433/1212460).
State-of-the-art, open-source set of tools for numerical relativity.
Currently more than 100 members from over 50 sites worldwide.
8 maintainers from 6 sites.
> 200 publications and > 30 theses building on these components.
Regular, tested releases.
User support through various channels.

SLIDE 46

Community Effort!

SLIDE 47

Collaborative Challenges

How can we work together?

Researchers in the USA:

Louisiana
Pennsylvania
Georgia
California

Researchers in Germany.
Researchers in Canada.

SLIDE 48

Einstein Toolkit as a growing project

Cactus

Initially: some infrastructure, some application code

SLIDE 49

Einstein Toolkit as a growing project

Cactus

Growing application suite

SLIDE 50

Einstein Toolkit as a growing project

Cactus

Growing infrastructure “return”

SLIDE 51

Einstein Toolkit as a growing project

Cactus

Users from more fields of science

SLIDE 52

Einstein Toolkit as a growing project

Cactus

Most modules open-source, but not necessarily all

SLIDE 53

Base Modules

SLIDE 54

The Einstein Equations

SLIDE 55

spacetime curvature

SLIDE 56

spacetime curvature, constants

SLIDE 57

spacetime curvature, matter, constants

SLIDE 58

[Figure: the Einstein equations annotated with "spacetime curvature", "matter" and "constants"; matter terms: hydrodynamics, el.-magnetism, particle radiation]

SLIDE 59

[Image-only slide: no recoverable text]
SLIDE 60

ADMBase

SLIDE 61

ADMBase TmunuBase

SLIDE 62

ADMBase ML_BSSN TmunuBase

SLIDE 63

ADMBase HydroBase ML_BSSN TmunuBase

SLIDE 64

ADMBase HydroBase ML_BSSN TmunuBase GRHydro

SLIDE 65

ADMBase HydroBase Initial Data / Analysis ML_BSSN TmunuBase GRHydro

SLIDE 66

[Module diagram: ADMBase, HydroBase, ML_BSSN, TmunuBase, GRHydro, with initial data / analysis modules and further evolution modules (G-evol, T-evol)]

SLIDE 67

Guiding Principles

Open, community-driven software development.
Separation of physics software and computational infrastructure.
Stable interfaces, allowing extensions.
Simplify usage where possible:

Doing science >> running a simulation.
Students need to know a lot about physics (meaningful initial conditions, numerical stability, accuracy/resolution), have patience and curiosity, and develop a "gut feeling" for what is right.
The Einstein Toolkit cannot provide that; however, open codes that are easy to use allow one to concentrate on these things!

SLIDE 68

Credits, Citations

In academics: citations, citations, citations!
For the Einstein Toolkit:

Open and free source.
No requirement to cite anything.
However: users are requested to cite a few publications.

Which publications?

One, maybe two for the Toolkit itself.
Some components list a few as well.
The list is published on the website and managed through a publication database.

SLIDE 69

Summary

Use a source code management system (or several) for code development.
Keep track of software versions for each simulation.
Have test cases for each component to ensure correctness.
Aim for portability and architecture independence, or at least enough flexibility to make programs future-proof.
Use a programming framework if at all possible.
