Heterogeneous Concurrency - Michael L. Scott



SLIDE 1

Heterogeneous Concurrency

Michael L. Scott

(on leave at Google Madison)

www.cs.rochester.edu/u/scott/

Schloss Dagstuhl, January 2015

SLIDE 2

Future processors may not be pretty

  • Specter of “dark silicon”: most of the chip may need to be “off” most of the time
  • Architects likely to fill the space with customized circuits

» compression, encryption, XML parsing, pattern matching, media transcoding, vector/matrix algebra, arbitrary precision math, FFT, even FPGA ...

» Not to mention cores with different computational/energy tradeoffs

  • “Typical” program may need to jump frequently from one core to another

SLIDE 3

Progression of functionality

  • FPU: pure simple function (e.g., arctan)

» protection not really an issue

  • GPU: fire-and-forget rendering
  • GPGPU: compute and return (with memory access)

» direct access from user space

» one protection domain at a time

  • first-class core: juggle multiple contexts safely

» preemption, multiprogramming

SLIDE 4

How do we...

  • arbitrate access to resources (cycles, scratchpad memory, bandwidth, ...)

» what do we need in HW that we don’t have now?

  • choose among cores with non-trivial tradeoffs (speed, power, energy, load)

  • access system services on nontraditional cores
  • balance computational ability v. locality

» how fast can we stream data from core to core?

  • accommodate heterogeneous ISAs (esp. if choosing among cores on which these differ)

SLIDE 5

And (w.r.t. concurrency), how do we...

  • dispatch across cores (HW queues? flat combining?)
  • manage stacks (contiguous v. linked frames)
  • wait for completion (spin? yield? deschedule? ship continuations?)

  • avoid writing code in a different language for every accelerator

  • unblock threads across cores? across languages?

» connections here to Eliot’s talk

SLIDE 6

(Unsupported) Hypotheses

  • Traditional kernel interface will not suffice

» must expose more of the underlying architecture, so run-time systems can figure out what to do

» must not make everything a pthread [Capriccio, Akaros, ...]

  • Contiguous stack frames will not suffice; neither will proliferating languages

» compiler help will be required

  • “Accelerator” cores will need “first-class status”

» ability to request OS services directly [GPUfs, ...]

  • Tree-structured dynamic call graph will be too restrictive

» will sometimes want to “return” elsewhere than whence we came (continuation shipping)

SLIDE 7

www.cs.rochester.edu/u/scott/

Plenty to keep us busy!