Structural Object Programming Model: Enabling Efficient Development on Massively Parallel Architectures
SLIDE 1

Structural Object Programming Model: Enabling Efficient Development on Massively Parallel Architectures

Mike Butts, Laurent Bonetto, Brad Budlong, Paul Wasson

HPEC 2008 – September 2008

SLIDE 2

Structural Object Programming Model: Enabling Efficient Development on Massively Parallel Architectures – Ambric, Inc. – HPEC 2008

Introduction

  • HPEC systems run compute-intensive real-time applications such as image

processing, video compression, software radio and networking.

  • Familiar CPU, DSP, ASIC and FPGA technologies have all reached

fundamental scaling limits, failing to track Moore’s Law.

  • A number of parallel embedded platforms have appeared to address this:

— SMP (symmetric multiprocessing) multithreaded architectures,

adapted from general-purpose desktop and server architectures.

— SIMD (single-instruction, multiple data) architectures,

adapted from supercomputing and graphics architectures.

— MPPA (massively parallel processor array) architectures,

specifically aimed at high-performance embedded computing.

  • Ambric devised a practical, scalable MPPA programming model first,

then developed an architecture, chip and tools to realize this model.

SLIDE 3

Scaling Limits: CPU/DSP, ASIC/FPGA

[Chart: single-processor MIPS vs. year, 1986–2006, log scale (10 to 10000). Performance grew 52%/year until 2002, then only 20%/year, leaving a 5X gap versus the Moore's Law trend by 2008.]

  • Single CPU & DSP performance has fallen off Moore's Law
    — All the architectural features that turn Moore's Law area into speed have been used up.
    — Now it's just device speed.
  • CPU/DSP does not scale
  • ASIC project now up to $30M
    — NRE, Fab / Design, Validation

  • HW Design Productivity Gap
    — Stuck at RTL: 21%/yr productivity vs. 58%/yr Moore's Law
  • ASICs limited now, FPGAs soon
  • ASIC/FPGA does not scale

Parallel Processing is the Only Choice

Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th ed. Gary Smith, The Crisis of Complexity, DAC 2003

SLIDE 4

Parallel Platforms for Embedded Computing

  • Program processors in software, far more productive than hardware design
  • Massive parallelism is available

— A basic pipelined 32-bit integer CPU takes less than 50,000 transistors.
— A medium-sized chip has over 100 million transistors available.

  • But many parallel chips are difficult to program.
  • The trick is to:

1) Find the right programming model first,
2) Arrange and interconnect the CPUs and memories to suit the model,
3) Provide an efficient, scalable platform that's reasonable to program.

  • Embedded computing is free to adopt a new platform.

— General-purpose platforms are bound by huge compatibility constraints.
— Embedded systems are specialized and implementation-specific.

SLIDE 5

Choosing a Parallel Platform That Lasts

  • How to choose a durable parallel platform for embedded computing?

— Don't want to adopt a new platform only to have to change again soon.

  • Effective parallel computing depends on common-sense qualities:
    — Suitability: How well-suited is its architecture for the full range of high-performance embedded computing applications?
    — Efficiency: How much of the processors' potential performance can be achieved? How energy-efficient and cost-efficient is the resulting solution?
    — Development Effort: How much work to achieve a reliable result?
  • Inter-processor communication and synchronization are key:
    — Communication: How easily can processors pass data and control from stage to stage, correctly and without interfering with each other?
    — Synchronization: How do processors coordinate with one another, to maintain the correct workflow?
    — Scalability: Will the hardware architecture and development effort scale up to a massively parallel system of hundreds or thousands of processors?

SLIDE 6

Symmetric Multiprocessing (SMP)

  • Multiple processors share similar access to a common memory space
  • Incremental path from the old serial programming model

— Each processor sees the same memory space it saw before.
— Existing applications run unmodified (unaccelerated as well, of course).
— Old applications with millions of lines of code can run without modification.

  • SMP programming model has task-level and thread-level parallelism.

— Task-level is like multi-tasking operating system behavior on serial platforms.

  • To use more parallelism the tasks must become parallel: Multithreading

— Programmer writes source code which forks off separate threads of execution
— Programmer explicitly manages data sharing and synchronization
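As a flavor of this DIY burden, a minimal sketch in standard Java (a hypothetical example, not from the slides) of forking worker threads by hand and also deciding, by hand, how the shared result is updated safely:

```java
import java.util.concurrent.atomic.AtomicLong;

// Summing an array on an SMP: the programmer forks the threads explicitly
// and must also manage the shared accumulator explicitly (here: an AtomicLong).
public class SmpSum {
    public static long parallelSum(int[] data, int nThreads) {
        AtomicLong total = new AtomicLong();              // explicitly managed shared state
        Thread[] workers = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int lo = Math.min(data.length, t * chunk);
            final int hi = Math.min(data.length, lo + chunk);
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = lo; i < hi; i++) local += data[i];
                total.addAndGet(local);                   // the one synchronization point
            });
            workers[t].start();                           // fork
        }
        for (Thread w : workers) {
            try { w.join(); }                             // explicit barrier before using the result
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return total.get();
    }
}
```

Even in this tiny case, the decomposition, the sharing discipline and the barrier are all the programmer's responsibility.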

  • Commercial SMP Platforms:

— Multicore GP processors: Intel, AMD (not for embedded systems)
— Multicore DSPs: TI, Freescale, ...
— Multicore Systems-on-Chip: using cores from ARM, MIPS, ...

SLIDE 7

SMP Interconnects, Cache Coherency

  • Each SMP processor has its own single or multi-level cache.
  • Needs a scalable interconnect to reach other caches, memory, I/O.

[Diagram: three scalable SMP interconnect options, each connecting many CPUs (each with its own L1$ and L2$) to memories and I/O. Bus, Ring: saturates. Crossbar: N-squared. Network-on-chip: complex.]

  • SMP processors have separate caches which must be kept coherent
    — Bus snooping, network-wide directories
  • As the number of processors goes up, total cache traffic goes up linearly, but the possible cache-conflict combinations go up as the square.
    — Maintaining cache coherence becomes more expensive and more complex faster than the number of processors.

SLIDE 8

SMP Communication

  • In SMP, communication is a second-class function: just a side-effect of shared memory.
  • Data is copied five times, through four memories and an interconnect. The destination CPU must wait through a two-level cache miss to satisfy its read request.
  • Poor cache reuse if the data only gets used once. It pushes out other data, causing further cache misses.
  • Communication through shared memory is expensive in power compared with communicating directly.
  • The way SMPs do inter-processor communication through shared memory is complex and expensive.

[Diagram: CPU → L1$ → L2$ → interconnect → L2$ → L1$ → CPU]

SLIDE 9

SMP: The Troubles with Threads

  • SMP's multithreaded programming model is deeply flawed: multithreaded programs behave unpredictably.
  • A single-threaded (serial) program always goes through the same sequence of intermediate states, i.e. the values of its data structures, every time.
    — Testing a serial program for reliable behavior is reasonably practical.
  • Multiple threads communicate with one another through shared variables:
    — Synchronization is partly in one thread, partly in the other.
  • The result depends on the behavior of all threads.
    — It depends on dynamic behavior: indeterminate results. Untestable.

[Diagram: intended behavior vs. synchronization failure in interleaved updates to shared variables x, y, z; another thread may interfere.]

“If we expect concurrent programming to become mainstream, and if we demand reliability and predictability from programs, we must discard threads as a programming model.” -- Prof. Edward Lee

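The unpredictability is easy to reproduce; a hedged sketch (hypothetical example, not from the slides) of the classic lost-update race, whose final value genuinely varies from run to run:

```java
// counter++ is a read-modify-write of a shared variable. With no
// synchronization, the final value depends on how the two threads happen
// to interleave, so the program has no single repeatable result to test.
public class RaceDemo {
    static int counter;

    public static int unsafeIncrements(int n) {
        counter = 0;
        Runnable body = () -> { for (int i = 0; i < n; i++) counter++; };
        Thread a = new Thread(body);
        Thread b = new Thread(body);
        a.start(); b.start();
        joinQuietly(a); joinQuietly(b);
        // Any value from 1 up to 2*n is a legal outcome under the Java memory model.
        return counter;
    }

    private static void joinQuietly(Thread t) {
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The only property a test can check is the range of legal outcomes, which is exactly the point: the intended single answer (2*n) is not guaranteed.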

SLIDE 10

SMP for HPECs Summary

  • Easy use of general-purpose 2-to-4-way SMPs is misleading.

— Big difference between small multicore SMP implementations,

and massively parallel SMP’s expensive interconnect, cache coherency

  • SMPs are non-deterministic, and get worse as they get larger.

— Debugging massively parallel multithreaded applications promises to be difficult.

Suitability: Limited. Intended for multicore general-purpose computing.
Efficiency: Fair, depending on caching, communication and synchronization.
Development effort: Poor: DIY synchronization, multithreaded debugging.
Communication: Poor: complex, slow and wasteful.
Synchronization: Poor: DIY thread synchronization is difficult, dangerous.
Scalability: Poor: interconnect architecture, communication through caches, and multithreaded synchronization problems indicate poor hardware and/or software scalability beyond the 2-to-8-way multicore level.

  • Massively parallel SMP platforms are unlikely to be well-suited to the

development, reliability, cost, power needs of embedded systems.

SLIDE 11

Single Instruction Multiple Data (SIMD)

  • Tens to hundreds of datapaths, all run by one instruction stream

— Often a general-purpose host CPU executes the main application, with data

transfer and calls to the SIMD processor for the compute-intensive kernels.

  • SIMD has dominated high-performance computing (HPC) since the Cray-1.

— Massively data-parallel, feed-forward and floating-point-intensive
— Fluid dynamics, molecular dynamics, structural analysis, medical image processing

  • Main commercial SIMD platforms are SMP / SIMD hybrids

— NVIDIA CUDA (not for embedded), IBM/Sony Cell

[Diagram: SIMD organization at two scales: one instruction controller driving 8 or 16 datapaths through caches and/or shared memory, backed by main memory.]

SLIDE 12

SIMD can work well in supercomputing

  • Massive feed-forward data parallelism is expected, so datapaths are

deeply pipelined and run at a very high clock rate.

  • Large register files are provided in the datapaths to hold large regular data

structures such as vectors.

  • SIMD performance depends on hiding random-access memory latency, which may be hundreds of cycles, by accessing data in big chunks at very high memory bandwidth.

  • Data-parallel feed-forward applications are common in HPC

— Long regular loops
— Little other branching
— Predictable access to large, regular data structures

  • In embedded systems, some signal and image processing applications

may have enough data parallelism and regularity to be a good match to SIMD architectures.

SLIDE 13

SIMD can work poorly in HPECs

  • SIMD’s long pipelines can be very inefficient:

— When there are feedback loops in the dataflow (x[i] depends on x[i-n])
— When data items are only a few words, or irregularly structured
— When testing and branching (other than loops)
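The feedback case can be made concrete; a sketch (hypothetical example, not from the slides) contrasting a feed-forward loop with a first-order IIR filter whose y[i] depends on y[i-1]:

```java
// The first loop is feed-forward: every iteration depends only on the inputs,
// so iterations can be spread across SIMD lanes. The second loop has a
// feedback path: y[i] depends on y[i-1], so each iteration must wait for the
// previous one to finish, leaving a deep SIMD pipeline mostly idle.
public class Feedback {
    // Feed-forward, data-parallel: y[i] = g * x[i]
    public static float[] scale(float[] x, float g) {
        float[] y = new float[x.length];
        for (int i = 0; i < x.length; i++) y[i] = g * x[i];
        return y;
    }

    // Feedback (first-order IIR filter): y[i] = x[i] + a * y[i-1]
    public static float[] iir(float[] x, float a) {
        float[] y = new float[x.length];
        float prev = 0f;
        for (int i = 0; i < x.length; i++) {
            y[i] = x[i] + a * prev;   // loop-carried dependence
            prev = y[i];
        }
        return y;
    }
}
```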

  • SIMD is not well suited to high-performance embedded applications

— Often function-parallel, with feedback paths and data-dependent behavior
— Increasingly found in video codecs, software radio, networking and elsewhere

  • Example: real-time H.264 broadcast-quality HD video encoding

— Massive parallelism is required
— Feedback loops in the core algorithms
— Many different subsystem algorithms, parameters, coefficients, etc. are used dynamically, in parallel (function-parallel), according to the video being encoded.

  • Commercial devices (CUDA, Cell) have high power consumption
SLIDE 14

SIMD for HPECs Summary

  • SIMD architectures were developed for the massively data-parallel feed-forward applications found in scientific computing and graphics.

Suitability: Limited. Intended for scientific computing (HPC).
Efficiency: Good to Poor: good for data-parallel feed-forward computation; otherwise it gets poor quickly.
Development effort: Good to Poor: good for suitable applications, since there is a single instruction stream; gets poor when forcing complexity and data-dependency into the SIMD model.
Communication and Synchronization: Good by definition, since everything is always in the same step.
Scalability: Poor without lots of data parallelism available in the application. A few embedded applications have vector lengths in the hundreds to thousands; most don't.

  • Massively parallel SIMD platforms are unlikely to be well-suited to the

most high-performance embedded system applications.

SLIDE 15

Massively Parallel Processor Array (MPPA)

  • Massively parallel array of CPUs and memories
  • 2D-mesh configurable interconnect of word-wide buses
  • MIMD architecture
    — Distributed memory
    — Strict encapsulation
    — Point-to-point communication
  • Complex applications are decomposed into a hierarchy of subsystems and their component function objects, which run in parallel, each on their own processor.
  • Likewise, large on-chip data objects are broken up and distributed into local memories with parallel access.
  • Objects communicate over a parallel structure of dedicated channels.
  • Programming model, communications and synchronization are all simple, which is good for development, debugging and reliability.

[Diagram: a 2D mesh of CPU(s)-plus-RAM(s) tiles joined by bus switches, shown at two scales.]

SLIDE 16

Massively Parallel Processor Array (MPPA)

  • Developed specifically for high-performance embedded systems.

— Video codecs, software-defined radio, radar, ultrasound, machine vision, image recognition, network processing, ...
— Continuous GB/s data in real time, often hard real-time.
— Performance needed is growing exponentially.

  • Function-parallel, data-parallel, feed-forward/back, data-dependent
  • TeraOPS, low cost, power efficiency, and deterministic reliable behavior.
  • Ambric MPPA platform objectives:

1) Optimize performance and performance per watt
2) Reasonable and reliable application development
3) Moore's Law-scalable hardware architecture and development effort

SLIDE 17

Ambric said: Choose the Right Programming Model First

Ambric's Structural Object Programming Model

[Diagram: numbered software objects (1–7) running on CPUs, connected by a structure of self-synchronizing Ambric channels.]

  • Everyone writes software: software objects run on CPUs.
  • Everyone draws block diagrams: a structure of self-synchronizing Ambric channels.

SLIDE 18

Structural Object Programming Model

  • Objects are software programs running concurrently on an array of Ambric processors and memories
  • Objects exchange data and control through a structure of self-synchronizing Ambric channels
  • Mix and match objects hierarchically to create new objects, snapped together through a simple common interface
  • Easier development, high performance and scalability

[Diagram: an application built as a composite object of numbered objects (1–7), each running on an Ambric processor, linked by Ambric channels.]

SLIDE 19

Communication and Synchronization

  • Ambric configurable circuit-switched interconnect joins CPUs, memories
    — Dedicated hardware for each channel
    — Word-wide, not bit-wide
    — Registered at each stage
    — Scales well to very large sizes
    — Place & route take seconds, not hours (1000s of elements, not 100,000s)
  • Ambric channels provide explicit synchronization as well as communication
    — CPU only sends data when channel is ready, else it just stalls.
    — CPU only receives when channel has data, else it just stalls.
    — Sending a word from one CPU to another is also an event. It keeps them directly in step with each other.
    — Built into the programming model: not an option, not a problem for the developer.

[Diagram: two CPUs joined by a channel through a switch. The sender stalls if the channel is not accepting; the receiver stalls if the channel is not valid. Direct CPU-to-CPU communication is fast and efficient.]
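The stall-based semantics can be imitated in ordinary software; a rough analogy (hypothetical, and not the Ambric API) using a one-word bounded queue as the channel register:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Software analogy of a self-synchronizing channel: put() stalls until the
// channel is accepting; take() stalls until the channel holds valid data.
// No locks or flags appear in the object code itself: the stalling IS the
// synchronization.
public class ChannelDemo {
    public static int[] runPipeline(int[] words) {
        ArrayBlockingQueue<Integer> channel = new ArrayBlockingQueue<>(1); // 1-word register
        int[] out = new int[words.length];
        Thread producer = new Thread(() -> {
            try {
                for (int w : words) channel.put(w);      // stalls if channel is not accepting
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        try {
            for (int i = 0; i < out.length; i++)
                out[i] = channel.take() * 2;             // stalls if channel has no data
            producer.join();
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return out;
    }
}
```

The producer and consumer stay in step automatically, word by word, without any shared-variable protocol between them.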

SLIDE 20

MPPA Determinism

  • Recall the multithreaded SMP’s difficulties

and danger due to communicating and synchronizing through shared state.

  • An MPPA has no explicitly shared state.

— Every piece of data is encapsulated in one memory.
— It can only be changed by the processor it's connected to.
— A wayward pointer in one part of the code cannot trash the state of another, since it has no physical access to any state but its own.

  • MPPA applications are deterministic

— Two processors communicate and synchronize only through a channel

dedicated to them, physically inaccessible to anyone else.

— No opportunity for outside interference

  • MPPA applications have deterministic timing as well

— No thread or task swapping, no caching or virtual memory,

no packet switching over shared networks

SLIDE 21

[Diagram: Ambric brics tiled into an array. Each bric contains compute units (CU) of streaming DSP and RISC processors with their RAMs, plus RAM units (RU) of RAM banks.]

Ambric MPPA Hardware Architecture

  • Brics connect by abutment to form a core array
  • Each bric has
    — 4 streaming 32-bit DSPs (64-bit accumulator, dual 16-bit ops)
    — 4 streaming 32-bit RISCs
    — 22KB RAM (8 banks with engines, 8 local memories)
  • Configurable interconnect of Ambric channels
    — Local channels
    — Bric-hopping channels

SLIDE 22

Ambric Am2045 Device

  • 130nm standard-cell ASIC

— 180 million transistors

  • 45 brics
    — 336 32-bit processors
    — 7.1 Mbits distributed SRAM
    — 8 µ-engine VLIW accelerators
  • High-bandwidth I/O
    — PCI Express
    — DDR2-400 x 2
    — 128 bits GPIO
    — Serial flash
  • In production since 1Q08

[Die diagram: bric array surrounded by PCIe, GPIO, two SDRAM controllers, JTAG, Flash/UPI, and µ-engine blocks.]

SLIDE 23

Ambric Performance, Applications

  • Video Processing Reference Platform

— PCIe plug-in card
— Integrated with desktop video tools
— Accelerates MPEG-2 and H.264/AVC broadcast-quality HD video encoding by up to 8X over a multicore SMP PC.

  • Embedded Development Platform

— Four 32-bit GPIO ports
— USB or PCIe host I/F
— End-user applications in video processing, medical imaging, network processing.

  • Am2045: array of 336 32-bit CPUs, 336 memories, 300 MHz

— 1.03 teraOPS (video SADs), 126,000 32-bit MIPS, 50.4 GMACS
— Total power dissipation is 6-12 watts, depending on usage.

SLIDE 24

Ambric Tools

  • aDesigner Eclipse-based IDE

— Processor objects are written in

standard Java subset or assembly

— Objects are taken from libraries — Structure is defined in aStruct,

a coordination language*

— Simulate in aDesigner for testing,

debugging and analysis

— Objects are compiled and auto-placed onto the chip — Structure is routed onto chip’s configurable interconnect.

  • On-chip parallel source-level in-circuit

debugging and analysis

— Any object can be stopped, debugged and resumed in a running system without

disrupting its correct operation. Self-synchronizing channels!

[Diagram: tool flow: Library → compile each object → simulate → place & route → debug and tune on hardware.]

*Coordination languages allow components to communicate to accomplish a shared goal, a deterministic alternative to multithreaded SMP.

SLIDE 25

MPPA for HPECs Summary

  • MPPA hardware is dense, fast, efficient and scalable.
  • Efficient, reliable applications, reasonable development effort, scalability.

Suitability: Good. Developed specifically for embedded systems.
Efficiency: High. Data or functional parallel, feed-forward or feedback, regular or irregular control.
Development effort: Good. Modularity and encapsulation help development and code reuse. Whole classes of bugs, such as bad pointers and synchronization failures, are impossible. Testing is practical and reliable.
Communication: Good. Direct communication is fast, reliable, deterministic, efficient.
Synchronization: Good. Built in. Fully deterministic.
Scalability: Good. Hardware is free from many-to-many interconnect and caching, and can easily use GALS clocking for very large arrays. Software development is free from SMP's multithreading and SIMD's vector-length scaling limits, and leverage from code reuse is effective.

SLIDE 26

MPPA Development Example

  • JPEG is at the root of nearly all image and video compression algorithms. A JPEG encoder is a realistic example of a complete HPEC application, while remaining simple enough to serve as a clear example.

  • A video-rate JPEG encoder was implemented on the Ambric Am2045.
  • A three-phase methodology was used:
    1. Functional implementation: start with a simple HLL implementation, debug it in simulation
    2. Optimization: refine the objects into the final implementation, using cycle budgets
    3. Validation and tuning: target the real chip, check with real data, observe and tune performance
  • This is the same implementation process used for more complex video

codecs (H.264, MPEG2, etc.), and other applications in general.

SLIDE 27

Example: Video-rate JPEG

  • All video compression standards share many functional blocks with JPEG:
  • Most video codecs operate on small blocks of pixels (8x8 or similar sizes)
  • On which they perform similar operations such as:

— color space mapping, — transformation into the frequency domain (DCT or similar algorithms), — quantization, — run-length and Huffman encoding, etc.

[Pipeline: Raw image → RGB to YCbCr → Horiz DCT → Vertical DCT → Quantize → Zigzag → Run-length encode → Huffman encode → Bit pack → JPEG]
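As a flavor of these per-block operations, a minimal sketch of the quantize step (a hypothetical generic form, not the IJG code the port actually reused):

```java
// Quantization: each frequency-domain coefficient is divided by its table
// entry and rounded. Larger divisors for high frequencies are what discard
// visually insignificant detail, giving JPEG its compression.
public class Quantize {
    public static int[] quantize(int[] dctCoeffs, int[] qtable) {
        int[] out = new int[dctCoeffs.length];
        for (int i = 0; i < dctCoeffs.length; i++)
            out[i] = Math.round((float) dctCoeffs[i] / qtable[i]);
        return out;
    }
}
```

Note how naturally this becomes one pipeline object: coefficients stream in on one channel, quantized values stream out on another.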

SLIDE 28

[Pipeline: Raw image → RGB to YCbCr → Horiz DCT → Vertical DCT → Quantize → Zigzag → Run-length encode → Huffman encode → Bit pack → JPEG]

Phase I: Functional implementation

  • Goal: Create a functionally correct design as quickly as possible, as a

starting point for the fully optimized implementation.

  • Do a natural decomposition of the functions into a small number of objects

— Naturally parallel, intuitive to the developer

  • Write objects in high-level language (Java) for simulation

— No physical constraints apply
— Based on the IJG JPEG library source code (Java here is very similar to C)

  • Simulate with test vectors (aSim)

— Developed both JPEG encoder and decoder, so each could test the other
— Reused IJG C code, edited into Java objects

SLIDE 29

Phase II: Optimization Methodology

  • Goal: Improve speed to meet application requirements.
  • Design a cycle budget for each part of the workflow.

— Like DSP designs, the developer writes software, which can be optimized.
— Like FPGA or ASIC designs, the developer can trade area for speed.

  • Many ways to speed up an object:
    — Functional parallelism: split the algorithm across a pipeline
    — Data parallelism: run multiple copies on separate data
    — Optimize code: use assembly code, dual 16-bit ops
  • Optimizing with assembly code is simpler than on a DSP, and not needed as often.
    — Simpler code, simpler processors (no VLIW)
    — With many processors available, only optimize the bottlenecks
  • This phase may be done in simulation with a testbench and/or on real hardware with live data.
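The data-parallel option can be sketched as a round-robin dispatcher (a hypothetical example, not the actual aStruct structure used):

```java
import java.util.ArrayList;
import java.util.List;

// Data parallelism by splitting: a dispatcher alternates incoming blocks
// between two identical copies of a slow stage. Throughput roughly doubles,
// provided a downstream object restores block order.
public class TwoWaySplit {
    public static List<List<Integer>> dispatch(List<Integer> blocks) {
        List<List<Integer>> copies = new ArrayList<>();
        copies.add(new ArrayList<>());
        copies.add(new ArrayList<>());
        for (int i = 0; i < blocks.size(); i++)
            copies.get(i % 2).add(blocks.get(i));   // round-robin between the two copies
        return copies;
    }
}
```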

SLIDE 30

Phase II: Optimization Process

  • Goal: 640x480 at 60 fps,

running on Am2045 at 300 MHz

  • Cycle budgets:

— 5.4 cycles per input byte (pixel color)
— 9 cycles per Huffman code
— 90 cycles per 32-bit output word

  • Optimize code in aDesigner simulation, one object at a time

— Use its ISS and profiling tools to see how many cycles each object takes
— Color conversion, DCT, quantize, zigzag: use assembly, with dual 16-bit ops
— Run-length and Huffman encoding: use assembly, keep one sample at a time
— Packing: Java is fast enough

  • Parallelize any objects that still miss the cycle budget

— Run-length and Huffman encoding: 2-way data-parallel objects

[Pipeline: Raw image → RGB to YCbCr → Horiz DCT → Vertical DCT → Quantize → Zigzag → Run-length encode → Huffman encode → Bit pack → JPEG]
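The 5.4-cycles-per-byte budget follows directly from the throughput goal; a quick check, assuming 3 color bytes per pixel:

```java
// At 640x480 pixels, 60 fps and 3 bytes per pixel, the encoder must absorb
// 55,296,000 bytes/s; a 300 MHz processor then has about 5.4 cycles per byte.
public class CycleBudget {
    public static double cyclesPerByte(int w, int h, int fps, int bytesPerPixel, double clockHz) {
        double bytesPerSecond = (double) w * h * fps * bytesPerPixel;
        return clockHz / bytesPerSecond;
    }
}
```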

SLIDE 31

Phase II: Optimization Result

  • 300 lines of assembly code were written

— Normal programming, nothing too ‘creative’

  • Fits in 2 brics out of 45 brics in Am2045

— Completely encapsulated: Other applications on the same chip have no effect

[Pipeline diagram: Raw image → RGB to YCbCr → DCTs → Quantize → Zigzag → two alternating Run-length & Huffman encode objects → Bit pack (Java) → JPEG, plus a DRAM frame buffer and a block-dispatch object; the assembly objects range from 20 to 100 instructions each.]

SLIDE 32

Phase III: On-chip Validation & Tuning

  • Thoroughly validate the application on hardware in real time:

— aDesigner automatically compiles, assembles, places and routes the application onto the processors, memories and interconnect channels.

— Am2045’s dedicated debug and visibility facilities are used through aDesigner’s

runtime debugging and performance analysis tools.

  • Resulting JPEG encoder

— Uses < 5% of the Am2045 device capacity.
— Runs at 72 frames per second throughput (vs. 60 fps target).

  • The JPEG implementation on the Ambric Am2045 MPPA architecture

illustrates an HPEC platform and development methodology that can easily be scaled to achieve levels of performance higher than any high-end DSP and comparable to those achieved by many FPGAs and even ASICs.

SLIDE 33

What Ambric Developers are Saying…

"Having done evaluations of numerous software development tools in the embedded computing market, the quality of results and robustness of Ambric's aDesigner tool suite is very obvious to us. We performed rigorous tests on aDesigner before accepting it as a certified development platform for our massively parallel processor development."

Sriram Edupunganti, CEO, Everest Consultants Inc.

"Solving real time high definition video processing and digital cinema coding functions poses some unique programming challenges. Having an integrated tool suite that can simulate and execute the design in hardware eases development of new products and features for high resolution and high frame-rate imaging …"

Ari Presler, CEO of Silicon Imaging

“Most applications are compiled in less than one minute . . . As with software development, the design, debug, edit, and re-run cycle is nearly interactive… The inter-processor communication and synchronization is simple. Sending and receiving a word through a channel is so simple, just like reading or writing a processor register. This kind of development is much easier and cheaper and achieves long-term scalability, performance, and power advantages of massive parallelism.”

Chaudhry Majid Ali and Muhammad Qasim, Halmstad University, Sweden

“…designers are getting our implementation done in half the time. Our engineers are even having fun using the tool!...”

Shawn Carnahan, CTO, Telestream

SLIDE 34

www. .com

References: see the workshop paper. Thank you!