A Reconfigurable Architecture for Load-Balanced Rendering
Graphics Hardware July 31, 2005, Los Angeles, CA
Jiawen Chen, Michael I. Gordon, William Thies, Matthias Zwicker, Kari Pulli, Frédo Durand
The Load Balancing Problem
data parallel
Figure: Parallelism in multiple graphics pipelines (four parallel pipelines, each with stages labeled V, R, T, F, D)
Screenshot from Counterstrike
Input Vertex Vertex Sync Triangle Setup Pixel Pixel V P
Simplified graphics pipeline
Screenshot from Doom 3
Input Vertex Vertex Sync Triangle Setup V R
Simplified graphics pipeline
Rest of Pixel Pipeline Rest of Pixel Pipeline
Rasterizer Rasterizer
Die photo of 16-tile Raw chip
Diagram of a 4x4 Raw processor
Input Vertex Vertex join split Triangle Setup split Pixel Pixel V P
Sort-middle graphics pipeline stream graph
Stream graph for graphics pipeline
StreamIt Layout on 8x8 Raw
– Mesh of identical tiles – No global signals
– Integrated into bypass paths – Register mapped – Fast neighbor communications – Essential for flexible resource allocation
– Compute processor – Programmable Switch Processor
A 4x4 Raw chip
Computation Resources
Switch Processor Diagram
– 180nm process – 16 tiles at 425 MHz – 6.8 GFLOPS peak – 47.6 GB/s memory bandwidth
– 64 tiles at 425 MHz – 27.2 GFLOPS peak – 108.8 GB/s memory bandwidth (32 ports)
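The peak figures on these two slides can be cross-checked with a little arithmetic. The sketch below assumes each tile retires one floating-point operation per cycle; the slides do not say this, but it reproduces both GFLOPS numbers. The per-port bandwidth is derived from the 64-tile figures, and the port count it implies for the 16-tile chip is an inference, not something the slides state.

```python
# Cross-check of the peak numbers quoted on the slides.
# Assumption (not stated in the slides): one FLOP per tile per cycle.

CLOCK_HZ = 425e6  # tile clock from the slides


def peak_gflops(tiles, flops_per_cycle=1):
    """Peak GFLOPS for a Raw configuration with the given tile count."""
    return tiles * CLOCK_HZ * flops_per_cycle / 1e9


def per_port_bandwidth_gbs(total_gbs, ports):
    """Memory bandwidth per port, in GB/s."""
    return total_gbs / ports


if __name__ == "__main__":
    print(round(peak_gflops(16), 3))   # 16-tile chip
    print(round(peak_gflops(64), 3))   # 64-tile configuration
    port_bw = per_port_bandwidth_gbs(108.8, 32)
    print(round(port_bw, 3))           # GB/s per port, from the 64-tile figures
    # Port count that 47.6 GB/s would imply at the same per-port rate (inferred).
    print(round(47.6 / port_bw))
```

Under that assumption, 16 and 64 tiles at 425 MHz give 6.8 and 27.2 GFLOPS, matching the slides.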
Die photo of 16-tile Raw chip (180nm process, 331 mm²)
Example stream graph
parallel computation may be any StreamIt language construct
joiner splitter pipeline feedback loop joiner splitter splitjoin filter
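To make the constructs named in the example stream graph concrete, here is a minimal sketch of filter, pipeline, and splitjoin as composable Python functions. This is not StreamIt syntax and not how the StreamIt compiler represents graphs. The names `make_filter`, `pipeline`, and `splitjoin` are hypothetical, the round-robin split and join policy is an assumption made for illustration, and the feedback loop is left out for brevity.

```python
# Toy model of three stream-graph constructs over finite lists.
# All names here are hypothetical; this only illustrates how the
# constructs compose hierarchically.

def make_filter(fn):
    """A filter applies fn to every item flowing through it."""
    return lambda items: [fn(x) for x in items]


def pipeline(*stages):
    """A pipeline runs its child constructs one after another."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


def splitjoin(*branches):
    """Round-robin splitter, parallel branches, round-robin joiner."""
    n = len(branches)

    def run(items):
        # Splitter: item i goes to branch i mod n.
        outputs = [branch(list(items[i::n])) for i, branch in enumerate(branches)]
        # Joiner: take one item from each branch in turn.
        joined = []
        for k in range(max(len(o) for o in outputs)):
            for out in outputs:
                if k < len(out):
                    joined.append(out[k])
        return joined
    return run


# Any child may itself be a pipeline or splitjoin.
graph = pipeline(
    make_filter(lambda v: v * 2),
    splitjoin(make_filter(lambda p: p + 1), make_filter(lambda p: p + 1)),
)

if __name__ == "__main__":
    print(graph([1, 2, 3, 4]))  # [3, 5, 7, 9]
```

Because every construct has the same list-in, list-out shape, the parallel branches of a splitjoin can be any other construct, which is the property the slide points out.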
Graphics pipeline stream graph
– Simulated annealing layout algorithm – Generates code for compute processors – Generates routing schedule for switch processors
Layout on 8x8 Raw
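The slides name simulated annealing as the layout algorithm but give no details. The sketch below shows the general technique on a toy problem: it places stream-graph nodes on a grid of tiles so that connected nodes end up close together. The cost function (total Manhattan distance over edges), the geometric cooling schedule, and all parameter values are assumptions for illustration, not the StreamIt compiler's actual choices.

```python
import math
import random


def layout_cost(placement, edges):
    """Total Manhattan distance between connected nodes (assumed cost)."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_layout(nodes, edges, grid=8, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Place each node on a distinct tile of a grid x grid mesh."""
    rng = random.Random(seed)
    tiles = [(x, y) for x in range(grid) for y in range(grid)]
    if len(nodes) > len(tiles):
        raise ValueError("more nodes than tiles")
    placement = dict(zip(nodes, rng.sample(tiles, len(nodes))))
    cost = layout_cost(placement, edges)
    best, best_cost = dict(placement), cost

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        node = rng.choice(nodes)
        target = rng.choice(tiles)
        other = next((n for n, t in placement.items() if t == target), None)
        old = placement[node]
        # Move the node; swap if the target tile is occupied by another node.
        placement[node] = target
        if other is not None and other != node:
            placement[other] = old
        new_cost = layout_cost(placement, edges)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[node] = old
            if other is not None and other != node:
                placement[other] = target
    return best, best_cost


if __name__ == "__main__":
    stages = ["Input", "Vertex Processor", "Sync", "Triangle Setup",
              "Rasterizer", "Pixel Processor", "Frame Buffer"]
    chain = list(zip(stages, stages[1:]))
    placement, cost = anneal_layout(stages, chain)
    print(cost, placement)
```

A real layout pass would also have to account for routing through the switch processors, which this toy cost ignores.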
Input Vertex Processor Sync Triangle Setup Rasterizer Pixel Processor Frame Buffer
StreamIt Compiler Stream graph
Input Vertex Vertex join split Triangle Setup split Pixel Pixel V P
Sort-middle Stream Graph
Configuration 1 Configuration 2
Manual layout on Raw
Fixed Resource Allocation: 6 vertex units, 15 pixel pipelines
Input Vertex Processor Sync Triangle Setup Rasterizer Pixel Processor Frame Buffer
– Devote 2 tiles to pixel shader – 1 for computing the lighting direction and normal – 1 for shading
– Eliminate texture coordinate interpolation, etc.
Output, rendered using the Raw simulator
Input Vertex Processor Triangle Setup Rasterizer Pixel Processor A Frame Buffer Pixel Processor B
Phong Shading Stream Graph
Automatic Layout on Raw
Fixed pipeline Reconfigurable pipeline
most of the screen
– Initialize depth buffer – Draw extruded shadow volume geometry with Z-fail algorithm – Draw textured triangles with stencil testing
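The slides name the Z-fail algorithm without spelling it out. As a reminder of the standard formulation, the sketch below computes the stencil value for a single pixel: back faces of the shadow volume increment the stencil when they fail the depth test, and front faces decrement it. The function names and the convention that smaller depth means closer to the camera are assumptions made for this example.

```python
# Per-pixel model of Z-fail stencil counting (standard formulation).
# Assumption: smaller depth values are closer to the camera.

def zfail_stencil(scene_depth, volume_fragments):
    """Return the stencil count for one pixel.

    scene_depth: depth already in the depth buffer after the first pass.
    volume_fragments: iterable of (depth, is_front_facing) for each
        shadow-volume face covering this pixel.
    """
    stencil = 0
    for depth, is_front_facing in volume_fragments:
        depth_test_fails = depth >= scene_depth  # face lies behind the visible surface
        if depth_test_fails:
            stencil += -1 if is_front_facing else 1
    return stencil


def is_lit(scene_depth, volume_fragments):
    """The final pass draws only where the stencil count is zero."""
    return zfail_stencil(scene_depth, volume_fragments) == 0


if __name__ == "__main__":
    volume = [(3.0, True), (8.0, False)]   # front face at depth 3, back face at depth 8
    print(is_lit(5.0, volume))   # surface inside the volume: False (shadowed)
    print(is_lit(2.0, volume))   # surface in front of the volume: True
    print(is_lit(10.0, volume))  # surface behind the volume: True
```

The three bullets map onto this directly: the first pass fills `scene_depth`, the second accumulates the stencil count, and the third shades only pixels where the count is zero.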
pass
– Adjust ratio of vertex to pixel units – Eliminate unused operations
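A simple throughput model shows why adjusting the vertex-to-pixel ratio matters. In the sketch below, a pass takes as long as its slower stage, and work is assumed to divide evenly across the units assigned to a stage. The workload numbers are hypothetical. Only the 6 vertex units and 15 pixel pipelines of the fixed allocation come from the slides, and synchronization and communication costs are ignored.

```python
# Toy load-balancing model: a pass is limited by its slower stage.
# Workload numbers are hypothetical; perfect scaling is assumed.

def pass_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Time for one pass when each stage's work is split evenly over its units."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def best_split(vertex_work, pixel_work, total_units):
    """Exhaustively pick the (vertex_units, pixel_units) split with the lowest pass time."""
    candidates = ((v, total_units - v) for v in range(1, total_units))
    return min(candidates, key=lambda s: pass_time(vertex_work, pixel_work, *s))


if __name__ == "__main__":
    total = 6 + 15  # same unit count as the fixed allocation on the slides
    # A hypothetical vertex-heavy pass.
    vertex_work, pixel_work = 2000.0, 100.0
    fixed = pass_time(vertex_work, pixel_work, 6, 15)
    v, p = best_split(vertex_work, pixel_work, total)
    balanced = pass_time(vertex_work, pixel_work, v, p)
    print(f"fixed 6/15: {fixed:.1f}, rebalanced {v}/{p}: {balanced:.1f}")
```

For a pass whose work happens to match the 6:15 ratio, the same search returns the fixed allocation, so reconfiguration only pays off when the workload ratio shifts between passes.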
Output, rendered using the Raw simulator
Input Vertex Processor Triangle Setup Rasterizer Frame Buffer
Input Vertex Processor Triangle Setup Rasterizer Texture Lookup Frame Buffer Texture Filtering
Shadow Volumes Pass 3 Stream Graph
Automatic Layout on Raw
Figure: Fixed pipeline vs. reconfigurable pipeline, each shown for Pass 1, Pass 2, and Pass 3