Should We Defy Amdahl’s Law (or DAL’s motivations) - André Seznec



SLIDE 1

Should We Defy Amdahl’s Law

(or DAL’s motivations)

André Seznec, INRIA/IRISA

SLIDE 2

DAL: Defying Amdahl’s Law

  • ERC advanced grant to A. Seznec (2011-2016)

DAL objective: « Given that Amdahl’s Law is forever, propose (impact) the microarchitecture of the 2020 general-purpose manycore »

SLIDE 3

10 years into the multicore era, and what?

  • Multicores are everywhere
  • Parallel (mainstream) apps do not materialize

SLIDE 4

Multicores are everywhere

  • Multicores in servers, desktop, laptops
  • 2-4-8-12 O-O-O cores
  • Multicores in smart phones, tablets
  • 2-4-(not that simple) cores
  • Manycores for niche markets
  • 48-80-100 simple cores
  • Tilera, Intel MIC

SLIDE 5

Multicore/multithread for everyone

  • End-user : improved usage comfort
  • Can read e-mail while listening to MP3s
  • Parallel performance for the masses?
  • Very few (scalable) mainstream // apps
  • Graphics
  • Niche market segments

SLIDE 6

No parallel software bonanza in the near future

  • Inheritance of sequential legacy code
  • Parallelism is not cost-effective for most apps
  • Sequential programming will remain dominant

SLIDE 7

Inheritance of sequential legacy code

  • Software is more resilient than hardware
  • Apps are surviving/evolving for years, often decades
  • Very few parallel apps now
  • Redevelopment of parallel apps from scratch is unlikely
  • Compute-intensive sections will be parallelized
  • But significant code sections will remain sequential

SLIDE 8

Parallelism is not cost-effective for most apps

  • Why parallelism ?
  • Only for performance
  • But costly:
  • Difficult, time-consuming, error-prone
  • Poorly portable, in both functionality and performance

SLIDE 9

Sequential programming will remain dominant

  • Just easier
  • The « Joe » programmer
  • Portability, maintenance, debug
  • + compiler to parallelize
  • + parallel libraries
  • + software components (developed by experts)

SLIDE 10

Looking backwards

SLIDE 11

2002: The End of the Uniprocessor Road

  • Power and temperature walls:
  • Stopped the frequency increase
  • 2x transistors: 5 %? 10 %? more performance (if any)

Economic logic: buy smaller chips!

IC industry needs to sell new (expensive) chips. Marketing: « You need 2 (4, 8) cores »

SLIDE 12

Marketing multicores to the masses 2002- ..

GREAT !!

SLIDE 13

And now ?

The end user is not such a fool ..

SLIDE 14

Following the trend: 2020

  • Silicon area, power envelope:
  • for 100 Nehalem-class cores
  • or for 1,000 simple cores (VLIW, in-order superscalar)

SLIDE 15

Amdahl’s Law

“Cannot run faster than sequential part”

[Figure: execution timeline with a sequential (seq.) part and a parallel part]
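
The quote above is the usual informal reading of Amdahl's law. As a reminder, and assuming a sequential fraction written SEQ and N identical processors (symbols chosen here for illustration, not taken from the slide), the speedup can be written as:

```latex
\[
\text{Speedup}(N) = \frac{1}{\text{SEQ} + \dfrac{1 - \text{SEQ}}{N}}
\qquad\text{so}\qquad
\lim_{N \to \infty} \text{Speedup}(N) = \frac{1}{\text{SEQ}}
\]
```

However many processors are added, the run time never drops below that of the sequential part.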

SLIDE 16

Naive model

  • A parallel application:
  • Parallel section: can use 1000 processors
  • Sequential section: runs on a single processor

SEQ: fraction of code in the sequential section
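
The naive model can be sketched in a few lines of Python. This is only an illustration: the function names, the normalization (a simple core has speed 1, total work is 1) and the perfect scaling of the parallel section are assumptions made here.

```python
def exec_time(seq, seq_speed=1.0, par_throughput=1000.0):
    """Normalized run time under the naive model.

    seq            -- SEQ, fraction of the work in the sequential section
    seq_speed      -- speed of the single core running that section
    par_throughput -- aggregate speed of the cores running the parallel section
                      (1000 simple cores of speed 1 by default)
    """
    if not 0.0 <= seq <= 1.0:
        raise ValueError("seq must be a fraction between 0 and 1")
    return seq / seq_speed + (1.0 - seq) / par_throughput


def speedup(seq, seq_speed=1.0, par_throughput=1000.0):
    """Speedup relative to one simple core executing all the work."""
    return 1.0 / exec_time(seq, seq_speed, par_throughput)


if __name__ == "__main__":
    for seq in (0.0, 0.001, 0.01, 0.1):
        print(f"SEQ = {seq:6.1%}  speedup on 1000 simple cores = {speedup(seq):7.1f}")
```

Even a 1 % sequential fraction already limits 1000 simple cores to a speedup below 100 in this model.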

SLIDE 17

Complex cores against simple cores

  • CC: 100 complex cores vs. SC: 1000 simple cores

with a complex core 2X faster than a simple core

If SEQ > 0.8 % then CC > SC
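
A worked version of this comparison, assuming a simple core has speed 1, a complex core speed 2, and a parallel section that scales perfectly over all cores of the chip (T denotes normalized execution time; these conventions are assumptions, not taken from the slide):

```latex
\begin{align*}
T_{SC} &= \text{SEQ} + \frac{1 - \text{SEQ}}{1000} \\
T_{CC} &= \frac{\text{SEQ}}{2} + \frac{1 - \text{SEQ}}{100 \times 2} \\
T_{CC} < T_{SC}
  &\iff (1 - \text{SEQ})\left(\frac{1}{200} - \frac{1}{1000}\right) < \frac{\text{SEQ}}{2} \\
  &\iff \text{SEQ} > \frac{0.004}{0.504} \approx 0.79\,\%
\end{align*}
```

This is in line with the 0.8 % break-even quoted on the slide.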

SLIDE 18

And if ..

  • Use a huge amount of resources for a single core:
  • 10X the area of the complex core
  • 10X the power of the complex core

Use all the uniprocessor techniques:

  • Very wide issue (8 – 16 ?)
  • Ultimate frequency ( « heat and run »)
  • Helper threads
  • Value prediction
  • ..

SLIDE 19

And if ..

  • UC ultra complex cores (but only 10)
  • 10X more resources than complex cores
  • but only 10 of them
  • 2X faster

If SEQ > 3.3 % then UC > SC

If SEQ > 8 % then UC > CC
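
The break-even points can be recomputed with a small sketch. It assumes that "2X faster" means an ultra complex core is twice as fast as a complex core (so 4X a simple core), and that the parallel section scales perfectly over every core of the chip. The `Chip` type and `break_even_seq` helper are hypothetical names introduced for this illustration.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    seq_speed: float       # speed of the core running the sequential section (simple = 1)
    par_throughput: float  # aggregate speed available to the parallel section


# Assumed speeds: simple = 1, complex = 2, ultra complex = 4.
SC = Chip("SC", 1.0, 1000 * 1.0)
CC = Chip("CC", 2.0, 100 * 2.0)
UC = Chip("UC", 4.0, 10 * 4.0)


def break_even_seq(challenger: Chip, baseline: Chip) -> float:
    """SEQ above which `challenger` (faster sequential core, lower parallel
    throughput) finishes before `baseline` in the naive model."""
    gain = 1.0 / baseline.seq_speed - 1.0 / challenger.seq_speed
    loss = 1.0 / challenger.par_throughput - 1.0 / baseline.par_throughput
    return loss / (gain + loss)


if __name__ == "__main__":
    for challenger, baseline in ((CC, SC), (UC, SC), (UC, CC)):
        seq = break_even_seq(challenger, baseline)
        print(f"{challenger.name} > {baseline.name} when SEQ > {seq:.2%}")
```

Under these assumptions the exact values come out at roughly 3.1 % (UC vs. SC) and 7.4 % (UC vs. CC), close to the rounded 3.3 % and 8 % on the slide.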

SLIDE 20

So what ?

  • Embarrassingly parallel: SC (simple cores)
  • Some parallel + some sequential: CC (complex cores)
  • Sequential + poor parallel + multiprogrammed: UC (ultra complex cores)

SLIDE 21

And hybrid SC + CC ?

CC_SC:

  • 50 complex
  • 500 simple

If SEQ > 0.2 % then CC_SC > SC
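
A sketch of the hybrid case follows. The 0.2 % figure is reproduced when the sequential section runs on one complex core (assumed speed 2) and the parallel section uses the 500 simple cores only. That reading is an assumption, so the variant where the 50 complex cores also help in parallel sections is computed alongside it. The helper name `threshold_vs_sc` is hypothetical.

```python
def run_time(seq, seq_speed, par_throughput):
    """Normalized run time: sequential part on one core, parallel part on many."""
    return seq / seq_speed + (1.0 - seq) / par_throughput


def threshold_vs_sc(seq_speed, par_throughput, sc_cores=1000):
    """SEQ above which a chip beats `sc_cores` simple cores of speed 1."""
    gain = 1.0 - 1.0 / seq_speed                   # time saved per unit of sequential work
    loss = 1.0 / par_throughput - 1.0 / sc_cores   # time lost per unit of parallel work
    return loss / (gain + loss)


# Assumption 1: parallel sections run on the 500 simple cores only.
simple_only = threshold_vs_sc(seq_speed=2.0, par_throughput=500.0)
# Assumption 2: the 50 complex cores (speed 2) also take part in parallel sections.
all_cores = threshold_vs_sc(seq_speed=2.0, par_throughput=500.0 + 50 * 2.0)

if __name__ == "__main__":
    print(f"parallel on simple cores only: CC_SC > SC when SEQ > {simple_only:.2%}")
    print(f"parallel on all cores:         CC_SC > SC when SEQ > {all_cores:.2%}")
```

The first variant gives about 0.20 %, the second about 0.13 %.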

SLIDE 22

DAL architecture proposition

  • Heterogeneous architecture:
  • A few ultra complex cores
  • to enable performance on sequential codes and/or critical sections

  • A « sea » of simple cores
  • for parallel sections

SLIDE 23

For our simple model

« DAL »: UC_SC, 5 ultra complex cores + 500 simple cores

  • If SEQ > 0.13 % then « DAL » > SC
  • « DAL » always better than UC, CC, CC_SC
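
The five configurations can be compared side by side with the sketch below. The speeds (simple 1, complex 2, ultra complex 4) and the rule that hybrid chips run parallel sections on their simple cores only are assumptions chosen because they reproduce the thresholds quoted on the slides. The `CONFIGS` table and helper names are illustrative.

```python
# name -> (speed of the core used for sequential code, parallel throughput)
CONFIGS = {
    "SC":    (1.0, 1000.0),  # 1000 simple cores
    "CC":    (2.0, 200.0),   # 100 complex cores
    "UC":    (4.0, 40.0),    # 10 ultra complex cores
    "CC_SC": (2.0, 500.0),   # 50 complex + 500 simple, parallel on simple cores
    "DAL":   (4.0, 500.0),   # 5 ultra complex + 500 simple, parallel on simple cores
}


def run_time(seq, name):
    """Normalized run time of configuration `name` for sequential fraction `seq`."""
    seq_speed, par_throughput = CONFIGS[name]
    return seq / seq_speed + (1.0 - seq) / par_throughput


def best(seq):
    """Name of the configuration with the shortest run time."""
    return min(CONFIGS, key=lambda name: run_time(seq, name))


if __name__ == "__main__":
    print("SEQ      " + "".join(f"{name:>9}" for name in CONFIGS) + "   best")
    for seq in (0.0005, 0.001, 0.002, 0.01, 0.05, 0.2):
        times = "".join(f"{run_time(seq, name):9.5f}" for name in CONFIGS)
        print(f"{seq:7.2%}  {times}   {best(seq)}")
```

In this model « DAL » is never slower than UC, CC or CC_SC, and it overtakes SC once SEQ exceeds about 0.13 %.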

SLIDE 24

DAL

  • Many groups targeting architecture for parallel performance
  • Many groups targeting energy efficiency

Let us concentrate on performance on sequential apps or code sections

SLIDE 25

DAL research directions

  • Focus on the sequential performance
  • The sequential accelerator
  • Heat and run
  • Microarchitecture of O-O-O execution cores
  • Revisit all the « old » concepts
  • but with quasi-unlimited resources
  • Manycores and sequential codes
  • Can we use (adapt) the plurality of (simple) cores?