Hybrid Computer Architecture
Brian Van Essen, Benjamin Ylvisaker, Carl Ebeling
Moore’s Law: Is it Over?
- von Neumann processors no longer scale
  - Overhead of speculative execution is too high
  - Complexity of a superscalar OOO core is n²
  - Optimum power/performance pipeline depth is ~7 stages
- Spatial processors benefit from added transistors
  - Reconfigurability allows virtualization
  - Enables programming abstraction
Keeping up with streams is hard
- Multimedia workloads
  - Audio & Video
- Communication workloads
  - Networking
[Figure: ƒ(x)=… Example of a streaming transformation]
Spatial processors are good at this
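To make the idea of a streaming transformation concrete, here is a minimal Python sketch. The function `gain_and_clip` is a hypothetical stand-in for ƒ(x), which the slide does not define, and `stream_map` is an illustrative helper name. The point is only that each sample is transformed as it arrives, with no dependence on other samples.

```python
from typing import Callable, Iterable, Iterator


def stream_map(f: Callable[[int], int], samples: Iterable[int]) -> Iterator[int]:
    """Apply f to each sample as it arrives, without buffering the stream."""
    for x in samples:
        yield f(x)


def gain_and_clip(x: int) -> int:
    # Hypothetical f(x): double the sample, then clip to a signed 8-bit range.
    return max(-128, min(127, 2 * x))


out = list(stream_map(gain_and_clip, [-100, -3, 0, 40, 90]))
print(out)  # [-128, -6, 0, 80, 127]
```

Because every output depends on exactly one input, the same operation could in principle be laid out once in hardware and applied to each sample that flows past.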
Hybrid Architecture Research
- Blend sequential and spatial computing
- One program executes both types of computation
Overview
- What is spatial computing?
- Why is it interesting?
- Hybrid architectures
- What is hard about hybrid architectures?
- Future research
What is spatial computing?
- Spatial processors:
  - Parallel array of compute elements (fabric)
  - Assign operations to different physical resources
  - Stream operands through the fabric
  - Execute many operations in parallel
- Sequential processors:
  - Step through a sequence of instructions
Encoding a program
Instruction Stream:
  Load r1, A
  Load r2, B
  Load r3, C
  Load r4, D
  Add r5, r1, r2
  Mul r6, r2, r3
  Add r7, r1, r5
  Sub r8, r5, r4
  Sub r9, r7, r6
  Add r10, r7, r8
  Mul r11, r8, r4
[Figure: Dataflow Graph of the program; LD nodes feed +, x, and - operator nodes]
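The relationship between the two encodings can be sketched in Python. The code below takes the slide's instruction stream, derives each instruction's dependences from its source registers, and groups instructions into steps that could run in parallel. The function names and the tuple encoding are illustrative assumptions, and the sketch assumes each register is written only once, as in this program.

```python
from collections import defaultdict

# The instruction stream from the slide, as (opcode, dest, sources...) tuples.
PROGRAM = [
    ("Load", "r1", "A"), ("Load", "r2", "B"),
    ("Load", "r3", "C"), ("Load", "r4", "D"),
    ("Add", "r5", "r1", "r2"), ("Mul", "r6", "r2", "r3"),
    ("Add", "r7", "r1", "r5"), ("Sub", "r8", "r5", "r4"),
    ("Sub", "r9", "r7", "r6"), ("Add", "r10", "r7", "r8"),
    ("Mul", "r11", "r8", "r4"),
]


def dataflow_levels(program):
    """Map each destination register to the earliest step it can be computed."""
    level = {}
    for _op, dest, *srcs in program:
        # Only registers produced earlier count as dependences; memory
        # names such as "A" are treated as always available.
        deps = [level[s] for s in srcs if s in level]
        level[dest] = 1 + max(deps) if deps else 0
    return level


def schedule(program):
    """Group instructions by dataflow level (one list per parallel step)."""
    level = dataflow_levels(program)
    by_step = defaultdict(list)
    for op, dest, *_ in program:
        by_step[level[dest]].append(f"{op} {dest}")
    return [by_step[i] for i in range(len(by_step))]


steps = schedule(PROGRAM)
for i, ops in enumerate(steps):
    print(i, ops)
```

Under these assumptions the 11 instructions collapse into 4 dependent steps, which is the parallelism a dataflow graph exposes and a sequential instruction stream hides.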
Processors: Under the hood
[Figure: Traditional Computer (Load / Store Arch), with blocks labeled Instructions, Fetch, Decode, Execute, WriteBack, ALU, LS, Register File, and Memory. Spatial Computer (e.g. FPGA, PipeRench), with Data flowing through an array of 16 PEs.]
Why spatial processors?
- Extremely efficient for certain applications
  - Regular computation
  - Regular communication
  - e.g. streaming data
- Excellent performance/power ratio
- Limitations:
  - Difficult to execute control flow
  - Hard to program
Basic Hybrid Architectures
- Two processors on a single chip
  - Integrates control plane and data plane processors
  - Provides a high-speed interconnect
  - Share memory
- Execute independent programs
  - Manage synchronization
Unified Hybrid Architecture
- Single programming model
  - Collapses control plane and data plane processors into a single abstraction
  - Implicit synchronization
  - Simplified programming abstraction
  - Program "automagically" executes on the appropriate processor
- Runtime system manages fabric configuration
Research Challenges
- Creating a new Instruction Set Architecture (ISA)
  - Provides canonical sequential interpretation
  - Exposes good spatial configuration
  - Efficient synchronization of runtime control
- Virtualization of spatial processors is hard
  - Necessary to provide an abstract programmer's model
  - Use dynamic reconfiguration
- Programming language
  - Explicit stream operations
  - Disambiguate memory references
Research Synopsis
- Define new processor architecture and ISA
- New level of ease of use
  - Unified programming model
- Blend sequential and spatial computing
  - Excels at streaming data applications
  - One program executes both types of computation
  - Implicit communication
- Efficient virtualization of spatial processors
- System-level programming language
Appendix
- Type Architectures
- Programming Languages
Abstract processor models
- von Neumann Type Architecture - RAM Model
  - A processor interpreting 3-address instructions
  - PC describing the next instruction of the program in memory
  - Flat, randomly accessed memory; each access requires 1 time unit
  - Memory is composed of fixed-size addressable units
  - One instruction executes at a time, and is completed before the next instruction executes
- Modern RISC & CISC processors emulate this model
- C directly implements this model
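The RAM model described above can be illustrated with a very small interpreter. This is a sketch under simplifying assumptions: memory is a Python dict keyed by name rather than fixed-size addressable units, the opcode set is limited to the Load/Add/Sub/Mul forms used earlier in the slides, and `run` and `ALU` are illustrative names.

```python
import operator

# Hypothetical opcode table for the 3-address arithmetic instructions.
ALU = {"Add": operator.add, "Sub": operator.sub, "Mul": operator.mul}


def run(program, memory):
    """Interpret 3-address instructions one at a time, in PC order."""
    regs = {}
    pc = 0      # program counter: index of the next instruction
    steps = 0   # one time unit per completed instruction
    while pc < len(program):
        op, dest, *srcs = program[pc]
        if op == "Load":
            regs[dest] = memory[srcs[0]]          # flat memory access
        else:
            a, b = (regs[s] for s in srcs)
            regs[dest] = ALU[op](a, b)
        pc += 1     # each instruction completes before the next begins
        steps += 1
    return regs, steps


program = [
    ("Load", "r1", "A"),
    ("Load", "r2", "B"),
    ("Add", "r3", "r1", "r2"),
    ("Mul", "r4", "r3", "r2"),
]
regs, steps = run(program, {"A": 2, "B": 5})
print(regs["r4"], steps)  # 35 4
```

The step count always equals the instruction count here, which is the defining property of the model: no two instructions overlap in time.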
Hybrid Type Architecture
- von Neumann sequential processor
- Spatial Fabric
  - P operations per cycle
  - Statically scheduled
- Main Memory
  - ~1 access per cycle
- Local Memory (Workspace)
  - ~P accesses per cycle
  - enough to maintain P ops
- Alternating Execution
  - Sequential program executes
  - Control transferred to spatial fabric
  - Shared state transferred
  - Atomic execution of spatial section
  - Shared state transferred back
[Figure: Sequential Processor with Main Memory (M1); Spatial Processor with Spatial Computing Fabric and Local Memory (M2); Working set]
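The alternating execution steps can be sketched as a toy cost model in Python. Everything quantitative here is an assumption made for illustration: main memory is charged exactly 1 cycle per access, the fabric performs exactly P operations per cycle, and the spatial section applies one operation per element. The function name `run_spatial_section` is hypothetical.

```python
import math


def run_spatial_section(main_memory, start, length, kernel, p):
    """Run one spatial section over a working set; return a cycle estimate."""
    # Shared state transferred: copy the working set from M1 into M2.
    local = main_memory[start:start + length]
    transfer_in = length                 # assumed ~1 main-memory access per cycle

    # Atomic execution of the spatial section on local memory only.
    local = [kernel(x) for x in local]
    compute = math.ceil(length / p)      # assumed P operations per cycle

    # Shared state transferred back to main memory.
    main_memory[start:start + length] = local
    transfer_out = length

    return transfer_in + compute + transfer_out


m1 = list(range(8))
cycles = run_spatial_section(m1, start=2, length=4, kernel=lambda x: x * x, p=4)
print(m1, cycles)  # [0, 1, 4, 9, 16, 25, 6, 7] 9
```

In this toy model 8 of the 9 cycles are spent moving the working set, which suggests why the type architecture gives local memory ~P accesses per cycle while main memory supplies only ~1.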
A new Programming Language
- “System level”
  - Full control of underlying ISA
  - Explicit resource management
- Key Issues
  - Expressing parallel portions of computation
    - Easily mapped to spatial processor
  - “Relaxed” memory access ordering
    - e.g. streams
  - Disambiguate memory references
    - Mitigate aliasing
  - Reflect constraints of type architecture
    - e.g. low main memory bandwidth
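Why memory disambiguation matters for relaxed ordering can be shown with a small Python sketch. The two functions perform the same element-wise update, one in strict program order and one with all reads issued before any write, which is one possible reordering a parallel fabric might use. The function names and the overlap check are illustrative assumptions, not part of the proposed language.

```python
def update_sequential(buf, src, dst, n):
    """Program order: each write is visible to every later read."""
    for i in range(n):
        buf[dst + i] = buf[src + i] + 1


def update_relaxed(buf, src, dst, n):
    """Relaxed order: all reads happen first, then all writes."""
    values = [buf[src + i] + 1 for i in range(n)]
    buf[dst:dst + n] = values


def ranges_overlap(src, dst, n):
    """True if [src, src+n) and [dst, dst+n) may alias."""
    return src < dst + n and dst < src + n


# Overlapping ranges: the two orderings disagree.
a, b = [0] * 5, [0] * 5
update_sequential(a, 0, 1, 4)
update_relaxed(b, 0, 1, 4)
print(a, b, ranges_overlap(0, 1, 4))  # [0, 1, 2, 3, 4] [0, 1, 1, 1, 1] True

# Disjoint ranges: reordering is safe.
c, d = [1, 2, 3, 0, 0, 0], [1, 2, 3, 0, 0, 0]
update_sequential(c, 0, 3, 3)
update_relaxed(d, 0, 3, 3)
print(c == d, ranges_overlap(0, 3, 3))  # True False
```

If the language can state that two references never alias, for example by expressing them as separate streams, the relaxed ordering becomes legal without any runtime check.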