4. Performance Analysis of Parallel Programs 4.1 Performance - - PowerPoint PPT Presentation




SLIDE 1

4. Performance Analysis of Parallel Programs

SLIDE 2

4.1 Performance Evaluation of Computer

User criteria:

  • Short response times

Computing center criteria:

  • High throughput

SLIDE 3

4.1.1 Evaluation of CPU Performance

The response time of a program A can be split into:

  • User CPU time of A
  • System CPU time of A
  • Waiting time of A
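The three-way split of the response time can be observed on a running program. The following is a minimal sketch, not part of the original slides: it assumes that waiting time can be approximated as the elapsed time minus user and system CPU time, and the helper name `split_response_time` is hypothetical.

```python
import os
import time


def split_response_time(work):
    """Run work() and split its response time into user CPU, system CPU, and waiting time."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    work()
    wall_after = time.perf_counter()
    cpu_after = os.times()

    user = cpu_after.user - cpu_before.user        # user CPU time of this process
    system = cpu_after.system - cpu_before.system  # system CPU time of this process
    response = wall_after - wall_before            # elapsed (response) time
    # Assumption: everything that is not CPU time is counted as waiting time.
    waiting = max(response - user - system, 0.0)
    return {"response": response, "user": user, "system": system, "waiting": waiting}


if __name__ == "__main__":
    # Sleeping uses almost no CPU, so nearly all of the response time is waiting time.
    print(split_response_time(lambda: time.sleep(0.1)))
```

For a multithreaded program the CPU times of all threads are summed, so the simple subtraction used here only gives a rough estimate.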

SLIDE 9

4.1.1 Evaluation of CPU Performance

User CPU time of A = n_cycle(A) * t_cycle

  • t_cycle -> cycle time, the reciprocal of the clock rate f: t_cycle = 1/f. For f = 2 GHz, t_cycle = 1/(2*10^9) s = 0.5 ns.
  • n_cycle(A) -> total number of CPU cycles needed for all instructions of A
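The cycle-time arithmetic on this slide can be checked with a few lines of Python. This is a small sketch under the assumption that the user CPU time is the product of the cycle count and the cycle time; the function names and the 4 * 10^9 cycle count are illustrative and do not come from the slides.

```python
def cycle_time(clock_rate_hz):
    """Cycle time t_cycle in seconds for a clock rate f in Hz (t_cycle = 1/f)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


def user_cpu_time(n_cycle, clock_rate_hz):
    """User CPU time in seconds, assuming n_cycle(A) * t_cycle."""
    return n_cycle * cycle_time(clock_rate_hz)


if __name__ == "__main__":
    t_cycle = cycle_time(2e9)  # 2 GHz, as on the slide
    print(f"t_cycle = {t_cycle * 1e9:.1f} ns")
    # Hypothetical program that needs 4 * 10^9 cycles.
    print(f"user CPU time = {user_cpu_time(4e9, 2e9):.1f} s")
```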

SLIDE 11

4.1.1 Evaluation of CPU Performance

CPI (Clock cycles Per Instruction)

  • n_instr(A) -> total number of instructions executed for A
  • n_i(A) -> number of instructions of type I_i executed for the program A
  • CPI_i -> number of CPU cycles needed for an instruction of type I_i
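The quantities defined here are usually combined as follows: the total cycle count is the sum of n_i(A) * CPI_i over all instruction types, the average CPI of A is that sum divided by n_instr(A), and the user CPU time is n_instr(A) * CPI(A) * t_cycle. The sketch below assumes these standard relations; the function names and the instruction mix in the example are hypothetical.

```python
def total_cycles(counts, cpi_per_type):
    """Sum of n_i(A) * CPI_i over all instruction types."""
    if len(counts) != len(cpi_per_type):
        raise ValueError("counts and cpi_per_type must have the same length")
    return sum(n * cpi for n, cpi in zip(counts, cpi_per_type))


def average_cpi(counts, cpi_per_type):
    """Average CPI of a program: total cycles divided by n_instr(A)."""
    n_instr = sum(counts)
    if n_instr == 0:
        raise ValueError("program executes no instructions")
    return total_cycles(counts, cpi_per_type) / n_instr


def user_cpu_time(counts, cpi_per_type, t_cycle):
    """User CPU time, assuming n_instr(A) * CPI(A) * t_cycle."""
    return sum(counts) * average_cpi(counts, cpi_per_type) * t_cycle


if __name__ == "__main__":
    counts = [50, 30, 20]  # hypothetical n_i(A) for three instruction types
    cpis = [1, 2, 3]       # hypothetical CPI_i
    print("average CPI:", average_cpi(counts, cpis))
    print("user CPU time in s:", user_cpu_time(counts, cpis, 0.5e-9))
```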

SLIDE 15

4.1.1 Evaluation of CPU Performance

CPI (Clock cycles Per Instruction)

Example: We consider a processor with three instruction classes I1, I2, I3, whose instructions require 1, 2, and 3 cycles, respectively, for their execution. We assume that there are two different possibilities for the translation of a programming language construct, each using different instructions.

  • Translation 1: CPI1 = 10/5 = 2 (10 cycles for 5 instructions)
  • Translation 2: CPI2 = 9/6 = 1.5 (9 cycles for 6 instructions)
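The two CPI values of the example can be reproduced as below. The per-class instruction counts are not given in the text, so the counts used here are an assumption, chosen only because they yield the stated totals (10 cycles for 5 instructions and 9 cycles for 6 instructions).

```python
# Cycles per instruction for the classes I1, I2, I3, as stated in the example.
CYCLES_PER_CLASS = (1, 2, 3)

# Assumed instruction counts per class (I1, I2, I3); not taken from the slides.
TRANSLATION_1 = (2, 1, 2)
TRANSLATION_2 = (4, 1, 1)


def cycles_and_cpi(counts):
    """Return (total cycles, CPI) for the given per-class instruction counts."""
    n_instr = sum(counts)
    n_cycle = sum(n * c for n, c in zip(counts, CYCLES_PER_CLASS))
    return n_cycle, n_cycle / n_instr


if __name__ == "__main__":
    for name, counts in (("translation 1", TRANSLATION_1), ("translation 2", TRANSLATION_2)):
        n_cycle, cpi = cycles_and_cpi(counts)
        print(f"{name}: {sum(counts)} instructions, {n_cycle} cycles, CPI = {cpi}")
```

With these counts, the second translation executes more instructions but needs fewer cycles, so it is the faster of the two on the same processor.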

SLIDE 16

4.1.2 MIPS and MFLOPS

MIPS (Million Instructions Per Second)

Drawbacks/limitations:

  • Considers only the number of instructions executed, not what each instruction accomplishes.
  • The MIPS rate does not necessarily correspond to the execution time.
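The second limitation can be made concrete with a small calculation. The sketch assumes the usual definition MIPS(A) = n_instr(A) / (user CPU time of A * 10^6); the two code variants X and Y and the 2 GHz clock rate are hypothetical numbers chosen for illustration.

```python
CLOCK_HZ = 2e9  # hypothetical 2 GHz processor

# Hypothetical code variants that compute the same result.
VARIANT_X = {"n_instr": 5, "n_cycle": 10}
VARIANT_Y = {"n_instr": 12, "n_cycle": 15}


def mips(n_instr, user_cpu_seconds):
    """MIPS rate, assuming n_instr / (user CPU time * 10^6)."""
    return n_instr / (user_cpu_seconds * 1e6)


def rate_and_time(variant):
    """Return (MIPS rate, user CPU time in seconds) for a code variant."""
    seconds = variant["n_cycle"] / CLOCK_HZ
    return mips(variant["n_instr"], seconds), seconds


if __name__ == "__main__":
    for name, variant in (("X", VARIANT_X), ("Y", VARIANT_Y)):
        rate, seconds = rate_and_time(variant)
        print(f"variant {name}: {rate:.0f} MIPS, {seconds * 1e9:.1f} ns")
```

Variant Y reaches the higher MIPS rate because it executes many cheap instructions, yet it takes longer to finish than variant X.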

SLIDE 19

4.1.2 MIPS and MFLOPS

MFLOPS (Million Floating-point Operations Per Second)

Drawbacks/limitations:

  • Does not differentiate between the types of floating-point operations performed.
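A short calculation shows why ignoring the operation type matters. The sketch assumes the usual definition MFLOPS(A) = n_flp_op(A) / (user CPU time of A * 10^6); the operation counts and timings are hypothetical and only illustrate that the rate depends on the operation mix.

```python
def mflops(n_flp_op, user_cpu_seconds):
    """MFLOPS rate, assuming n_flp_op / (user CPU time * 10^6)."""
    if user_cpu_seconds <= 0:
        raise ValueError("user CPU time must be positive")
    return n_flp_op / (user_cpu_seconds * 1e6)


if __name__ == "__main__":
    # Hypothetical measurements on one machine: additions are cheap, divisions are expensive.
    additions = mflops(1_000_000, 1e-3)  # 10^6 additions in 1 ms
    divisions = mflops(1_000_000, 4e-3)  # 10^6 divisions in 4 ms
    print(f"addition kernel: {additions:.0f} MFLOPS")
    print(f"division kernel: {divisions:.0f} MFLOPS")
```

Both kernels perform the same number of floating-point operations, so a single MFLOPS figure for the machine says little without knowing which operations were counted.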

SLIDE 22

4.1.3 Performance of Processors with a Memory

  • n_mm_cycles(A) -> number of additional machine cycles caused by memory accesses of A
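The additional memory cycles are typically added to the cycle count before multiplying by the cycle time. The sketch below assumes the relation user CPU time = (n_cycle(A) + n_mm_cycles(A)) * t_cycle; the function name and the numbers in the example are hypothetical.

```python
def user_cpu_time_with_memory(n_cycle, n_mm_cycles, t_cycle):
    """User CPU time in seconds, assuming (n_cycle + n_mm_cycles) * t_cycle."""
    if n_cycle < 0 or n_mm_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    return (n_cycle + n_mm_cycles) * t_cycle


if __name__ == "__main__":
    t_cycle = 0.5e-9  # 2 GHz clock
    # Hypothetical program: 10^9 execution cycles plus 5 * 10^8 cycles spent on memory accesses.
    without_memory = user_cpu_time_with_memory(1e9, 0, t_cycle)
    with_memory = user_cpu_time_with_memory(1e9, 5e8, t_cycle)
    print(f"without memory cycles: {without_memory:.2f} s")
    print(f"with memory cycles:    {with_memory:.2f} s")
```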