SLIDE 1

Lecture 2: Terminology and Definitions

Abhinav Bhatele, Department of Computer Science

High Performance Computing Systems (CMSC714)

SLIDE 2

Abhinav Bhatele, CMSC714

Announcements

  • ELMS/Canvas page for the course is up: myelms.umd.edu
  • https://umd.instructure.com/courses/1273118
  • Slides from previous class are now posted online
  • Assignments, project and midterm dates are also online

SLIDE 3

Summary of last lecture

  • Need for high performance computing
  • Parallel architecture: nodes, memory, network, storage
  • Programming models: shared memory vs. distributed
  • Performance and debugging tools
  • Systems issues: job scheduling, routing, parallel I/O, fault tolerance, power
  • Parallel algorithms and applications

SLIDE 4

Cores, sockets, nodes

  • CPU: processor
  • Single or multi-core: a core is a processing unit; multiple such units on a single chip make it a multi-core processor
  • Socket: the package holding one chip on the motherboard
  • Node: the packaging of one or more sockets, which share the node's memory


https://www.glennklockwood.com/hpc-howtos/process-affinity.html
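As a quick sanity check of these terms, a small Python sketch (illustrative, not from the slides) can report how many logical cores the OS exposes on the current node; note that it cannot distinguish sockets, which needs platform tools such as `lscpu`.

```python
# Illustrative sketch: query the logical core count visible to this process.
# os.cpu_count() reports cores across all sockets on the node (or None if
# the count cannot be determined); it says nothing about socket layout.
import os

logical_cores = os.cpu_count()
print(f"This node exposes {logical_cores} logical cores")
```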

SLIDE 5

Serial vs. parallel code

  • Thread: a light-weight path of execution managed by the OS; threads within a process share its resources
  • Process: heavy-weight; processes do not share resources such as memory, file descriptors, etc.
  • Serial or sequential code: can only run on a single thread or process
  • Parallel code: can be run on one or more threads or processes
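The shared-vs-private memory distinction can be seen directly in a small sketch (names invented; it assumes a POSIX system, since it uses the `fork` start method of `multiprocessing`):

```python
# Sketch contrasting a thread (shares the parent's memory) with a process
# (gets its own copy of memory). Assumes POSIX "fork" is available.
import threading
import multiprocessing

counter = {"value": 0}

def bump():
    counter["value"] += 1

# A thread shares the parent's memory, so its update is visible here.
t = threading.Thread(target=bump)
t.start(); t.join()
print(counter["value"])  # 1

# A forked process works on its own copy, so the parent sees no change.
ctx = multiprocessing.get_context("fork")
p = ctx.Process(target=bump)
p.start(); p.join()
print(counter["value"])  # still 1: the child's increment was private
```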

SLIDE 6

Scaling and scalable

  • Scaling: running a parallel program on 1 to n processes
  • 1, 2, 3, … , n
  • 1, 2, 4, 8, …, n
  • Scalable: a program is scalable if its performance improves when using more resources

SLIDE 7

Weak versus strong scaling

  • Strong scaling: fixed total problem size as we run on more processes
  • Weak scaling: fixed problem size per process, so the total problem size increases as we run on more processes
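The two regimes can be sketched numerically; the base problem size of one million grid points below is invented for illustration:

```python
# Sketch of how problem size varies with process count in each regime.
# The base size is made up for illustration.
base_size = 1_000_000

def strong_scaling_local_size(n_procs):
    # Strong scaling: total size fixed, so each process's share shrinks.
    return base_size // n_procs

def weak_scaling_total_size(n_procs):
    # Weak scaling: per-process size fixed, so the total grows with n_procs.
    return base_size * n_procs

print(strong_scaling_local_size(8))  # 125000 points per process
print(weak_scaling_total_size(8))    # 8000000 points in total
```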

SLIDE 8

Speedup and efficiency

  • Speedup: Ratio of execution time on one process to that on n processes
  • Efficiency: Speedup per process


Speedup = t1 / tn        Efficiency = t1 / (tn × n)
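Plugging made-up timings into these definitions:

```python
# Sketch of the definitions above; the timings t1 and t16 are invented.
def speedup(t1, tn):
    return t1 / tn

def efficiency(t1, tn, n):
    return t1 / (tn * n)

t1, t16 = 120.0, 10.0           # seconds on 1 process and on 16 processes
print(speedup(t1, t16))         # 12.0x speedup
print(efficiency(t1, t16, 16))  # 0.75, i.e. 75% parallel efficiency
```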

SLIDE 9

Amdahl’s law

  • Speedup is limited by the serial portion of the code
  • Often referred to as serial “bottleneck”
  • Let's say only a fraction p of the code can be parallelized on n processes


Speedup = 1 / ((1 − p) + p/n)
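A small sketch of the formula, with an invented parallel fraction p = 0.95; even then, speedup saturates at 1/(1 − p) = 20x no matter how many processes are used:

```python
# Sketch of Amdahl's law: the serial fraction (1 - p) bounds speedup.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95  # invented: 95% of the code parallelizes
for n in (4, 16, 256):
    print(n, amdahl_speedup(p, n))
print("limit:", 1.0 / (1.0 - p))  # asymptotic bound as n grows
```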

SLIDE 10

Supercomputers vs. commodity clusters

  • Typically, supercomputer refers to customized hardware
  • IBM Blue Gene, Cray XT, Cray XC
  • Cluster refers to a parallel machine put together using off-the-shelf hardware

SLIDE 11

Communication and synchronization

  • Each physical node might compute independently for a while
  • When data is needed from other (remote) nodes, messaging occurs
  • Referred to as communication or synchronization or MPI messages
  • Intra-node vs. inter-node communication
  • Bulk synchronous programs: all processes compute simultaneously, then synchronize together
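The bulk synchronous pattern can be sketched with a barrier; threads stand in for processes here, and the "work" of each superstep is invented:

```python
# Sketch of one bulk synchronous superstep: independent compute, then a
# barrier where all workers synchronize before results are combined.
import threading

n_workers = 4
barrier = threading.Barrier(n_workers)
results = []
lock = threading.Lock()

def superstep(rank):
    local = rank * rank          # 1. compute phase: independent work
    barrier.wait()               # 2. synchronization: wait for everyone
    with lock:
        results.append(local)    # 3. combine after the barrier

threads = [threading.Thread(target=superstep, args=(r,))
           for r in range(n_workers)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(results))  # [0, 1, 4, 9]
```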

SLIDE 12

Different models of parallel computation

  • SIMD: Single Instruction Multiple Data
  • MIMD: Multiple Instruction Multiple Data
  • SPMD: Single Program Multiple Data
  • Typical in HPC
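A minimal sketch of the SPMD idea, with ranks simulated in a loop; the function and variable names are invented (the rank/size pair mirrors what MPI provides, but no MPI is used here):

```python
# Sketch of SPMD: every rank runs the same program body, and the rank id
# selects that rank's share of the same global data.
def spmd_body(rank, n_ranks, data):
    chunk = len(data) // n_ranks          # assume n_ranks divides len(data)
    lo, hi = rank * chunk, (rank + 1) * chunk
    return sum(data[lo:hi])               # each rank sums its own block

data = list(range(8))
partials = [spmd_body(r, 4, data) for r in range(4)]  # simulate 4 ranks
print(partials, sum(partials))  # [1, 5, 9, 13] 28
```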

SLIDE 13

Writing parallel programs

  • Decide the algorithm first
  • Data: how to distribute data among threads/processes?
  • Data locality
  • Computation: how to divide work among threads/processes?
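One common answer to the data question (an assumed convention, not specified on the slide) is a block distribution, where the first N mod P processes take one extra element:

```python
# Sketch of a block distribution of n_elems items over n_procs processes.
def block_range(rank, n_procs, n_elems):
    base, extra = divmod(n_elems, n_procs)
    lo = rank * base + min(rank, extra)          # earlier ranks absorb extras
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# 10 elements over 4 processes -> block sizes 3, 3, 2, 2
print([block_range(r, 4, 10) for r in range(4)])
```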

SLIDE 14

Writing parallel programs: examples

  • Molecular Dynamics
  • N-body Simulations
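Both application classes revolve around pairwise interactions; below is a hypothetical sketch of the naive O(n²) loop at their core, with positions invented and the "force" simplified to a signed inverse-square magnitude in 1D:

```python
# Hypothetical sketch of the all-pairs loop in naive N-body/MD codes.
# Every particle interacts with every other: O(n^2) work per step.
def pairwise_forces(pos):
    n = len(pos)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[j] - pos[i]
                # simplified attractive inverse-square "force" in 1D
                forces[i] += (1.0 if r > 0 else -1.0) / (r * r)
    return forces

f = pairwise_forces([0.0, 1.0, 2.0])
print(f)  # symmetric setup: the middle particle feels zero net force
```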

SLIDE 15

Load balance and grain size

  • Load balance: try to balance the amount of work (computation) assigned to different threads/processes

  • Grain size: ratio of computation-to-communication
  • Coarse-grained vs. fine-grained
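One way to quantify load balance (a common metric, assumed here rather than taken from the slide) is the ratio of the maximum load to the mean load:

```python
# Sketch of a max-over-mean load imbalance metric; the sample per-process
# loads are invented. A perfectly balanced run gives exactly 1.0.
def load_imbalance(loads):
    return max(loads) / (sum(loads) / len(loads))

print(load_imbalance([10, 10, 10, 10]))  # 1.0 (perfectly balanced)
print(load_imbalance([16, 8, 8, 8]))     # 1.6 (one overloaded process)
```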

SLIDE 16

2D Jacobi iteration

  • Stencil computation
  • Commonly found kernel in computational codes


A[i, j] = (A[i, j] + A[i − 1, j] + A[i + 1, j] + A[i, j − 1] + A[i, j + 1]) / 5
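A sketch of one Jacobi sweep implementing the five-point averaging update above; the grid contents are invented, and updates are written to a copy, as the Jacobi method requires:

```python
# Sketch of one 2D Jacobi sweep over the interior of a grid, using the
# five-point stencil: average of the point and its four neighbors.
def jacobi_sweep(A):
    n, m = len(A), len(A[0])
    B = [row[:] for row in A]  # write into a copy; boundary rows stay fixed
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            B[i][j] = (A[i][j] + A[i-1][j] + A[i+1][j]
                       + A[i][j-1] + A[i][j+1]) / 5.0
    return B

A = [[0.0] * 4 for _ in range(4)]
A[1][1] = 5.0                  # a single "hot" interior point
B = jacobi_sweep(A)
print(B[1][1], B[1][2])        # 1.0 1.0: the hot value spreads to neighbors
```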

SLIDE 17

Abhinav Bhatele, 5218 Brendan Iribe Center (IRB), College Park, MD 20742 / phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu

Questions?