Complexity Measures for Parallel Computation (PowerPoint PPT Presentation)



SLIDE 1

Complexity Measures for Parallel Computation

SLIDE 2

Complexity Measures for Parallel Computation

Problem parameters:

  • n: index of problem size
  • p: number of processors

Algorithm parameters:

  • tp: running time on p processors
  • t1: time on 1 processor = sequential time = “work”
  • t∞: time on unlimited processors = critical-path length = “span”
  • v: total communication volume

Performance measures:

  • speedup s = t1 / tp
  • efficiency e = t1 / (p*tp) = s / p
  • (potential) parallelism pp = t1 / t∞
  • computational intensity q = t1 / v
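The four performance measures above are simple ratios of the algorithm parameters. As a quick illustration, here is a sketch that computes all four from hypothetical measured values (the numbers are made up, not from the slides):

```python
# Hypothetical measurements for some parallel algorithm (illustrative only).
t1 = 64.0     # time on 1 processor ("work"), seconds
tp = 10.0     # time on p processors, seconds
t_inf = 4.0   # time on unlimited processors ("span"), seconds
p = 8         # number of processors
v = 1024      # total communication volume, in words

speedup = t1 / tp              # s  = t1 / tp
efficiency = t1 / (p * tp)     # e  = t1 / (p*tp) = s / p
parallelism = t1 / t_inf       # pp = t1 / t_inf
intensity = t1 / v             # q  = t1 / v

print(speedup, efficiency, parallelism, intensity)
```

With these numbers the algorithm gets a speedup of 6.4 on 8 processors (efficiency 0.8), while the potential parallelism of 16 says up to 16 processors could, in principle, still be kept busy.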
SLIDE 3

Several possible models!

  • Execution time and parallelism: Work / Span Model
  • Total cost of moving data: Communication Volume Model
  • Detailed models that try to capture time for moving data:
    • Latency / Bandwidth Model (for message-passing)
    • Cache Memory Model (for hierarchical memory)
  • Other detailed models we won’t discuss: LogP, UMH, …
SLIDE 4

Work / Span Model

  • tp = execution time on p processors

SLIDE 5

Work / Span Model

  • tp = execution time on p processors
  • t1 = work

SLIDE 6

Work / Span Model

  • tp = execution time on p processors
  • t1 = work
  • t∞ = span *

* Also called critical-path length or computational depth.

SLIDE 7

Work / Span Model

  • tp = execution time on p processors
  • t1 = work
  • t∞ = span *

* Also called critical-path length or computational depth.

WORK LAW: tp ≥ t1/p
SPAN LAW: tp ≥ t∞
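The Work and Span Laws give two lower bounds on tp, so together tp ≥ max(t1/p, t∞). The sketch below (my own illustration, with made-up task costs and dependencies) computes work and span for a tiny task DAG and the combined lower bound:

```python
# Made-up task DAG: cost of each task and its dependencies.
cost = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

work = sum(cost.values())  # t1 = total work over all tasks

memo = {}
def finish(node):
    # Cost-weighted longest path ending at `node` (its critical path).
    if node not in memo:
        memo[node] = cost[node] + max((finish(d) for d in deps[node]), default=0)
    return memo[node]

span = max(finish(n) for n in cost)  # t_inf = critical-path length

p = 2
lower_bound = max(work / p, span)  # Work Law and Span Law combined
print(work, span, lower_bound)
```

Here the work is 10 and the span is 9 (the path a → b → d), so even with p = 2 processors no schedule can finish in less than 9 time units: the Span Law dominates the Work Law for this DAG.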

SLIDE 8

Series Composition (A followed by B)

Work: t1(A∪B) = t1(A) + t1(B)
Span: t∞(A∪B) = t∞(A) + t∞(B)

SLIDE 9

Parallel Composition (A alongside B)

Work: t1(A∪B) = t1(A) + t1(B)
Span: t∞(A∪B) = max{t∞(A), t∞(B)}
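The two composition rules can be written as two tiny functions on (work, span) pairs. This is a sketch of my own (the example task values are made up), not code from the slides:

```python
# Represent each subcomputation as a (work, span) pair.

def series(a, b):
    # A followed by B: both work and span add.
    return (a[0] + b[0], a[1] + b[1])

def parallel(a, b):
    # A alongside B: work adds, span is the max of the two spans.
    return (a[0] + b[0], max(a[1], b[1]))

A = (6.0, 2.0)  # (t1, t_inf) for subcomputation A (illustrative)
B = (4.0, 3.0)  # (t1, t_inf) for subcomputation B

print(series(A, B))    # work 10.0, span 5.0
print(parallel(A, B))  # work 10.0, span 3.0
```

Note that the work is the same either way; only the span, and hence the potential parallelism t1/t∞, changes depending on how the pieces are composed.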

SLIDE 10

Speedup

Def. t1/tp = speedup on p processors.

If t1/tp = Θ(p), we have linear speedup;
if t1/tp = p, we have perfect linear speedup;
if t1/tp > p, we have superlinear speedup
(which is not possible in this model, because of the Work Law tp ≥ t1/p).

SLIDE 11

Parallelism

Because the Span Law requires tp ≥ t∞, the maximum possible speedup is

t1/t∞ = (potential) parallelism
      = the average amount of work per step along the span.

SLIDE 12

Laws of Parallel Complexity

  • Work law: tp ≥ t1 / p
  • Span law: tp ≥ t∞
  • Amdahl’s law: if a fraction f, between 0 and 1, of the work must be done sequentially, then speedup ≤ 1 / f
  • Exercise: prove Amdahl’s law from the span law.
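Amdahl's law follows from splitting the time on p processors into the sequential part, f·t1, plus the perfectly parallelizable remainder, (1−f)·t1/p. A small sketch (my own illustration; f and p values are made up) shows the speedup approaching but never exceeding 1/f:

```python
def amdahl_speedup(f, p):
    # Time on p processors: f*t1 sequentially + (1-f)*t1/p in parallel,
    # so speedup = t1 / (f*t1 + (1-f)*t1/p) = 1 / (f + (1-f)/p).
    return 1.0 / (f + (1.0 - f) / p)

f = 0.1  # 10% of the work is sequential => speedup can never exceed 10
for p in (1, 10, 100, 1000):
    s = amdahl_speedup(f, p)
    assert s <= 1.0 / f  # the bound from the slide
    print(p, s)
```

Even with 1000 processors the speedup stays just below 10: the sequential fraction is a hard ceiling, which is exactly what the span law predicts once you note t∞ ≥ f·t1.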
SLIDE 13

Communication Volume Model

  • Network of p processors, each with local memory
  • Message-passing
  • Communication volume (v): total size (in words) of all messages passed during the computation
  • Broadcasting one word costs volume p (actually, p-1)
  • No explicit accounting for communication time
  • Thus, can’t really model parallel efficiency or speedup; for that, we’d use the latency-bandwidth model (see later slide)
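Counting volume is just bookkeeping over all messages. A small sketch (the all-to-all pattern is my own added example, not from the slide):

```python
def broadcast_volume(p, words=1):
    # One processor's message must reach each of the other p-1
    # processors, so the volume is (p-1)*words however the broadcast
    # is actually implemented (tree, pipeline, ...).
    return (p - 1) * words

def all_to_all_volume(p, words=1):
    # Illustrative extra pattern: every processor sends `words` words
    # to each of the other p-1 processors.
    return p * (p - 1) * words

print(broadcast_volume(8))    # 7 words of volume
print(all_to_all_volume(8))   # 56 words of volume
```

Because the model ignores when the messages happen, two algorithms with equal volume can still have very different running times; that is why the latency/bandwidth model on the next slide exists.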

SLIDE 14

Complexity Measures for Parallel Computation

Problem parameters:

  • n: index of problem size
  • p: number of processors

Algorithm parameters:

  • tp: running time on p processors
  • t1: time on 1 processor = sequential time = “work”
  • t∞: time on unlimited processors = critical-path length = “span”
  • v: total communication volume

Performance measures:

  • speedup s = t1 / tp
  • efficiency e = t1 / (p*tp) = s / p
  • (potential) parallelism pp = t1 / t∞
  • computational intensity q = t1 / v
SLIDE 15

Detailed complexity measures for data movement I: Latency/Bandwidth Model

Moving data between processors by message-passing

  • Machine parameters:
    • α or tstartup: latency (message startup time, in seconds)
    • β or tdata: inverse bandwidth (in seconds per word)
    • Between nodes of Triton, α ≈ 2.2 × 10⁻⁶ and β ≈ 6.4 × 10⁻⁹
  • Time to send & recv or bcast a message of w words: α + w*β
  • tcomm: total communication time
  • tcomp: total computation time
  • Total parallel time: tp = tcomp + tcomm
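Plugging in the Triton numbers from the slide shows how latency dominates small messages while bandwidth dominates large ones:

```python
# Machine parameters from the slide (Triton, approximate).
ALPHA = 2.2e-6  # latency: message startup time, seconds
BETA = 6.4e-9   # inverse bandwidth, seconds per word

def message_time(w):
    # Time to send/recv a message of w words: alpha + w*beta.
    return ALPHA + w * BETA

print(message_time(1))      # ~2.2e-6 s: almost entirely startup cost
print(message_time(10**6))  # ~6.4e-3 s: almost entirely transfer time
```

The crossover, where transfer time equals startup time, is at w = α/β ≈ 344 words; below that, batching many small messages into one large message is a clear win in this model.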
SLIDE 16

Detailed complexity measures for data movement II: Cache Memory Model

Moving data between cache and memory on one processor:

  • Assume just two levels in the memory hierarchy, fast and slow
  • All data initially in slow memory
  • m = number of memory elements (words) moved between fast and slow memory
  • tm = time per slow-memory operation
  • f = number of arithmetic operations
  • tf = time per arithmetic operation, tf << tm
  • q = f / m (computational intensity): flops per slow-memory access
  • Minimum possible time = f * tf, when all data fits in fast memory
  • Actual time: f * tf + m * tm = f * tf * (1 + tm/tf * 1/q)
  • Larger q means time closer to the minimum f * tf
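The formula on the last bullet can be checked numerically. In this sketch the parameter values are made up for illustration (they are not from any real machine); it computes the slowdown relative to the minimum possible time f*tf as q grows:

```python
TF = 1e-9  # time per arithmetic operation (fast), seconds (illustrative)
TM = 1e-7  # time per slow-memory access, seconds; note TM >> TF

def slowdown(f, m):
    # Actual time f*TF + m*TM divided by the minimum possible f*TF;
    # equals 1 + (TM/TF)*(1/q) with q = f/m.
    return (f * TF + m * TM) / (f * TF)

f = 10**8                        # arithmetic operations
for m in (10**8, 10**7, 10**6):  # fewer slow accesses => larger q = f/m
    print(f / m, slowdown(f, m))
```

With q = 1 the computation runs 101× slower than the arithmetic alone; raising q to 100 brings the slowdown to 2×. This is why blocking/tiling algorithms to increase computational intensity matters so much.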