Physical-Aware System-Level Design for Tiled Hierarchical Chip Multiprocessors
Jordi Cortadella, Javier de San Pedro, Nikita Nikitin and Jordi Petit
Universitat Politècnica de Catalunya (Barcelona)
Project funded by Intel Corp.
Designing a Chip Multiprocessor
[Figure: workloads (DSP, graphics, data mining, bioinformatics, ...), a CMP, and off-chip memory]
• How many cores?
• How much L2/L3 on-chip cache?
• Interconnect: mesh/ring/bus?
• How many memory controllers?
ISPD 2013 Tiled CMPs
What is architectural exploration?
[Figure: two example tiled CMPs, with blocks labeled MC, R, NI, Bus, C1, C2, L2 and L3]
Configuration 1 (throughput = 85.71 IPC):
  – 6x4 mesh, 24 clusters
  – total 144 cores
  – 6 cores/cluster
  – 1 C1, 128K L1, 256K L2
  – 5 C2, 64K L1, 96K L2
  – 146 Mb total shared L3
Configuration 2 (throughput = 103.26 IPC):
  – 5x5 mesh, 25 clusters
  – total 100 cores
  – 4 cores/cluster
  – 3 C1, 128K L1, 1M L2
  – 1 C2, 128K L1, 1M L2
  – 130 Mb total shared L3
Library of models
CMP configuration:
  Parameter            Value
  Chip area            350 mm^2
  Mesh dimensions      2x2 to 16x16
  Mem. Cntrl. latency  200 cycles
  Interconnect         Bus, uni-/bi-ring
  Link width           256–1024 bits
  Workload MPI         0.5
  Workload MLP         1.25
Cache models:
  Cache size  Area (mm^2)  Latency (cycles)
  64Kb        0.063        2
  128Kb       0.125        3
  256Kb       0.25         4
  …           …            …
  8Mb         8.0          9
[Plot: core library, IPC vs. area (mm^2) for cores C1 (IO), C2 (OoO) and C3 (OoO)]
[Plot: miss rates for SPEC CPU2006, miss ratio vs. cache size (Mb)]
Physical planning for tiled CMPs
[Figure: a tile with its N, W, E and S sides labeled]
Outline
• Architectural exploration
  – The cost of exploration
  – Exploring with metaheuristics
  – Analytical models
• Physical planning for tiled CMPs
• Current work: regular floorplanning
Exploration engines
Design space: 10^9 configurations
[Bar chart: exploration runtime (sec), log scale from 1 to 1E+13]
  – Simulation (full system): 300 centuries
  – Simulation (probabilistic): 300 years
  – Analytical (exhaustive): 100 days
  – Analytical (metaheuristic): 100 seconds
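
The runtimes on this slide imply a rough per-configuration cost for each engine. The sketch below works through that arithmetic. It assumes 365-day years and that each of the first three runtimes covers the whole 10^9-configuration space; the pairing of runtimes with engines follows the order of the labels. The metaheuristic is left out because it visits only part of the space.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day years
DESIGN_SPACE = 10**9                      # configurations, from the slide

# Total exploration runtime per engine, converted to seconds.
runtimes_s = {
    "simulation (full system)": 300 * 100 * SECONDS_PER_YEAR,  # 300 centuries
    "simulation (probabilistic)": 300 * SECONDS_PER_YEAR,      # 300 years
    "analytical (exhaustive)": 100 * SECONDS_PER_DAY,          # 100 days
}

def seconds_per_configuration(total_seconds, n_configs=DESIGN_SPACE):
    """Average time spent on one configuration in an exhaustive sweep."""
    return total_seconds / n_configs

for name, total in runtimes_s.items():
    per_cfg = seconds_per_configuration(total)
    print(f"{name}: {total:.2e} s total, {per_cfg:.3g} s per configuration")
```

Under these assumptions the exhaustive analytical sweep spends a few milliseconds per configuration, while full-system simulation spends on the order of a quarter of an hour.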
Scalable exploration
Architectural configurations → Analytical modeling → Promising configurations → Simulation
Automated exploration
[Diagram: inputs and output of the exploration tool]
Inputs:
  – Physical info: cores, caches, interconnects
  – Models (performance/power): cores, on-chip caches, off-chip memories, interconnect fabrics, cache protocol, memory controllers, workloads
  – Constraints: area, throughput, power
Output: architectural configuration
  – Number of cores
  – Cluster size
  – L2/L3 size
  – Intra-cluster interconnect
  – Inter-cluster interconnect
Exploration engine: metaheuristics
• Explore huge design spaces efficiently
• Our proposal:
  – Simulated Annealing (Kirkpatrick et al., 1983)
  – Extremal Optimization (Boettcher et al., 1999)
[Diagram: Models and Constraints feed the exploration tool, which alternates partial generation of configurations with analytical modeling, steered by the search direction; the best configuration goes to simulation]
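
As a reminder of how simulated annealing drives such a search, here is a minimal generic sketch. It is not the authors' exploration tool: the cooling schedule, parameter values and the toy cost function are all assumptions for illustration. In an architectural setting, `cost` would plausibly be the negated throughput predicted by the analytical model, with constraint violations penalized.

```python
import math
import random

def simulated_annealing(initial, neighbors, cost, t_start=1.0, t_end=1e-3,
                        alpha=0.95, moves_per_temp=50, seed=0):
    """Generic simulated annealing with geometric cooling (illustrative)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = rng.choice(neighbors(current))
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost

# Toy problem (hypothetical): minimize (x - 7)^2 over the integers 0..20.
def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]

def toy_cost(x):
    return (x - 7) ** 2

best, best_cost = simulated_annealing(0, toy_neighbors, toy_cost)
print(best, best_cost)
```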
Generation of configurations
• Generate neighbors by applying transformations
  – Increase/Decrease
    • mesh dimensions
    • core count per cluster
    • L1, L2 size
  – Change interconnect type (bus/uni-ring/bi-ring)
  – Complex updates (increase mesh/decrease core count)
• Example: Increase_X(mesh 4x4) => mesh 5x4
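
The transformations above could be sketched as follows. The `Config` fields, the halving/doubling of cache sizes and the bounds are assumptions for illustration (the mesh bounds and the 64Kb to 8Mb range echo the library-of-models slide); L1 sizing is omitted for brevity. Only `Increase_X` and the list of move types come from the slide.

```python
from dataclasses import dataclass, replace

INTERCONNECTS = ("bus", "uni-ring", "bi-ring")

@dataclass(frozen=True)
class Config:
    """Hypothetical, simplified CMP configuration."""
    mesh_x: int
    mesh_y: int
    cores_per_cluster: int
    l2_kb: int
    interconnect: str

def increase_x(cfg):
    """Increase_X from the slide: mesh 4x4 becomes mesh 5x4."""
    return replace(cfg, mesh_x=cfg.mesh_x + 1)

def neighbors(cfg, min_dim=2, max_dim=16, min_l2_kb=64, max_l2_kb=8192):
    """All configurations reachable from cfg by one transformation."""
    out = []
    # Increase/decrease each mesh dimension.
    for name in ("mesh_x", "mesh_y"):
        for step in (-1, 1):
            value = getattr(cfg, name) + step
            if min_dim <= value <= max_dim:
                out.append(replace(cfg, **{name: value}))
    # Increase/decrease the core count per cluster.
    for step in (-1, 1):
        if cfg.cores_per_cluster + step >= 1:
            out.append(replace(cfg, cores_per_cluster=cfg.cores_per_cluster + step))
    # Halve/double the L2 size (assumed step rule).
    for l2 in (cfg.l2_kb // 2, cfg.l2_kb * 2):
        if min_l2_kb <= l2 <= max_l2_kb:
            out.append(replace(cfg, l2_kb=l2))
    # Change the interconnect type.
    for ic in INTERCONNECTS:
        if ic != cfg.interconnect:
            out.append(replace(cfg, interconnect=ic))
    # Complex update: grow the mesh while shrinking the clusters.
    if cfg.mesh_x < max_dim and cfg.cores_per_cluster > 1:
        out.append(replace(cfg, mesh_x=cfg.mesh_x + 1,
                           cores_per_cluster=cfg.cores_per_cluster - 1))
    return out

cfg = Config(mesh_x=4, mesh_y=4, cores_per_cluster=4, l2_kb=256, interconnect="bus")
print(increase_x(cfg))
print(len(neighbors(cfg)))
```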
Analytical performance model for CMPs
Nonlinear analytical models
[Figure: core and memory subsystem, annotated with λ_i and L_i]
  – Throughput model
  – Traffic model
  – Latency model
  – Queueing model: … Umit Ogras et al., IEEE TCAD, Dec 2010
[Plot: L, average latency (cycles) vs. λ, average traffic rate (flits/cycle). Curve λ(L): characteristic of the cores/workload. Curve L(λ): characteristic of the IC. Other labels: throughput, hop-count latency]
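
The plot suggests that the operating point is where the core characteristic λ(L) meets the interconnect characteristic L(λ). The sketch below finds such an intersection by bisection. Both closed forms, and every parameter value, are hypothetical stand-ins chosen only to have the right shape (traffic falls as latency grows, latency rises with traffic); they are not the models used in the talk.

```python
def core_traffic(latency, misses_per_instr=0.05, base_cpi=1.0):
    """lambda(L): traffic a core injects when each miss costs `latency` cycles.

    Hypothetical form: one flit per miss, CPI = base_cpi + misses * latency.
    """
    return misses_per_instr / (base_cpi + misses_per_instr * latency)

def interconnect_latency(traffic, zero_load=10.0, service_rate=0.2):
    """L(lambda): zero-load latency plus an M/M/1-style queueing delay (assumed)."""
    if traffic >= service_rate:
        return float("inf")
    return zero_load + 1.0 / (service_rate - traffic)

def operating_point(lo=0.0, hi=1000.0, tol=1e-9):
    """Solve L = L(lambda(L)) by bisection.

    g(L) = L - L(lambda(L)) is increasing in L because lambda(L) decreases
    with L and L(lambda) increases with lambda, so g has a single root.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - interconnect_latency(core_traffic(mid)) < 0:
            lo = mid
        else:
            hi = mid
    latency = (lo + hi) / 2
    return latency, core_traffic(latency)

latency, traffic = operating_point()
print(f"L = {latency:.3f} cycles, lambda = {traffic:.5f} flits/cycle")
```

With these stand-in curves, the per-core throughput follows as traffic divided by misses per instruction.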
Analytical model vs. simulation
[Plot: throughput (IPC) from analytical modeling and from simulation; configurations sorted in descending order of throughput]
Case Study: Power-performance exploration
Power-performance trade-off (search space: 1.5 · 10^9 configurations)
[Plot: throughput (IPC) vs. power (W); labeled configurations (mesh, interconnect, cores per cluster):]
  – 6x5, Bi-Ring, 4 C2
  – 5x4, Bi-Ring, 6 C2
  – 5x3, Bi-Ring, 8 C2
  – 4x2, Bi-Ring, 15 C2
  – 4x3, Bi-Ring, 10 C2
  – 3x2, Bi-Ring, 20 C2
  – 6x5, Bus, 4 C2
  – 7x4, Bus, 3 C2 + 1 C3
  – 7x4, Bus, 4 C2
  – 6x5, Bus, 2 C1 + 2 C2
  – 6x4, Bus, 3 C1 + 2 C2
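
A power-performance trade-off of this kind is usually read as a Pareto front: a configuration is worth keeping only if no other one is at least as good in both power and throughput and strictly better in one. A minimal sketch of that filter follows. The data points are made up for illustration and are not read from the plot.

```python
def pareto_front(points):
    """Keep non-dominated (label, power_w, throughput_ipc) points.

    A point is dominated if another point has power <= and throughput >=,
    with at least one of the two strictly better.
    """
    front = []
    for label, power, ipc in points:
        dominated = any(
            (p2 <= power and t2 >= ipc) and (p2 < power or t2 > ipc)
            for _, p2, t2 in points
        )
        if not dominated:
            front.append((label, power, ipc))
    return sorted(front, key=lambda point: point[1])

# Hypothetical sample points: (label, power in W, throughput in IPC).
samples = [("A", 120, 70), ("B", 150, 90), ("C", 160, 85),
           ("D", 200, 120), ("E", 240, 118)]
front = pareto_front(samples)
print([label for label, _, _ in front])
```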
Outline
• Architectural exploration
• Physical planning for tiled CMPs
  – Impact of physical planning
  – Floorplanning
  – Wire planning
• Current work: regular floorplanning
Physical planning
[Figure: tile floorplan with blocks labeled C, L2, L3, R and r, and N/S/W/E sides]
The impact of physical planning
[Figure: tile floorplan with blocks labeled C, L2, L3, R and r, and N/S/W/E sides]
Physical planning for tiles
[Figure: a tile with its N, W, E and S sides labeled]
Link width: how many wires?
[Figure: two routers connected by two channels, each carrying Cntrl, Addr and Cache line fields labeled 64 and 512; scale label: 100 μm]
> 1K wires
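
The "> 1K wires" figure can be reproduced with simple arithmetic. The reading below is an assumption: 64 bits for control plus address and 512 bits for the cache line per channel, with one channel in each direction between the two routers.

```python
def link_wires(header_bits=64, cache_line_bits=512, directions=2):
    """Wires between two routers under the assumed field widths."""
    return directions * (header_bits + cache_line_bits)

print(link_wires())  # 2 * (64 + 512)
```

Under that reading a router-to-router link needs 1152 wires, consistent with the slide's bound.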
3D Wire Planning
[Figure: cross-section with FEOL (Core, Router, Memory) below metal layers m1 to m6]
In systems where memory bandwidth is the bottleneck, the physical resources providing the bandwidth are critical.
Exploration without physical planning
[Diagram: Models and Constraints feed architectural exploration, a loop between generation of configurations and analytical modeling steered by the search direction; the best configuration is validated by simulation]
Exploration with physical planning
[Diagram: Models, Phys. Info and Constraints feed architectural exploration, a loop between generation of configurations and analytical modeling steered by the search direction; validation adds physical planning (floorplanning and wire planning) alongside simulation of the best configuration]
Physical planning
[Figure: cluster with C, L2, L3, R and local IC blocks; flow through analytical modeling, physical planning and simulation]
Estimations:
• Area
• Wirelength
• Routability
Physical planning technology
• Floorplanning
  – Slicing structures & Simulated Annealing
  – Lightweight 3D maze router
  – Constraints:
    • Adjacency (Core, L2)
    • Balanced links (rings)
• Wire planning
  – SAT-based 3D global routing
  – Boolean constraints
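
For readers unfamiliar with maze routing, the sketch below shows the classic breadth-first (Lee-style) search on a 3D grid, where the third coordinate stands for the metal layer. It is a textbook illustration under assumed data structures, not the lightweight router mentioned in the list above.

```python
from collections import deque

def maze_route(grid_size, blocked, source, target):
    """Shortest path between two cells of a 3D grid, or None if unroutable.

    grid_size: (nx, ny, nz); blocked: set of (x, y, z) cells that cannot be used.
    """
    nx, ny, nz = grid_size
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    parent = {source: None}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == target:
            # Walk the parent links back to the source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y, z = cell
        for dx, dy, dz in steps:
            nxt = (x + dx, y + dy, z + dz)
            inside = 0 <= nxt[0] < nx and 0 <= nxt[1] < ny and 0 <= nxt[2] < nz
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

# The direct route on layer 0 is blocked, so the path detours through layer 1.
path = maze_route((3, 1, 2), {(1, 0, 0)}, (0, 0, 0), (2, 0, 0))
print(path)
```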
Slicing structures
[Figure: floorplans of blocks 1 to 5 with their slicing trees of H and V cuts]
D.F. Wong and C.L. Liu, “A New Algorithm for Floorplan Design”, DAC, 1986, pages 101-107.
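
A slicing tree can be written as a postfix expression and evaluated with a stack to obtain the floorplan's bounding box. The sketch below assumes fixed block sizes and the convention that V places its two operands side by side and H stacks them; the block dimensions are made up for illustration.

```python
def slicing_area(expression, sizes):
    """Evaluate a postfix slicing expression.

    expression: block ids and the operators "H" / "V" in postfix order.
    sizes: dict mapping block id -> (width, height).
    Returns (width, height, area) of the enclosing rectangle.
    """
    stack = []
    for token in expression:
        if token in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "V":   # vertical cut: operands side by side
                stack.append((w1 + w2, max(h1, h2)))
            else:              # horizontal cut: operands stacked
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[token])
    if len(stack) != 1:
        raise ValueError("malformed slicing expression")
    width, height = stack[0]
    return width, height, width * height

# Hypothetical blocks: 1 and 2 side by side, block 3 stacked with the pair.
sizes = {1: (2, 3), 2: (2, 1), 3: (1, 2)}
print(slicing_area([1, 2, "V", 3, "H"], sizes))
```

A simulated-annealing floorplanner in this style perturbs the expression (swapping operands or flipping operators) and re-evaluates the area after each move.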
Bounding curves
[Figure: bounding curves; label: Memory]
L. Stockmeyer, “Optimal Orientation of Cells in Slicing Floorplan Designs”, 1983.
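
A bounding curve can be represented as the list of non-dominated (width, height) shapes a block or sub-floorplan can take. The sketch below combines the curves of two children under a cut and prunes dominated shapes. It is a simple quadratic all-pairs version for illustration, not Stockmeyer's more efficient merge, and the block dimensions are hypothetical.

```python
def prune(shapes):
    """Keep only non-dominated (width, height) pairs, sorted by width."""
    result = []
    for w, h in sorted(set(shapes)):
        # A wider shape is useful only if it is strictly shorter.
        if not result or h < result[-1][1]:
            result.append((w, h))
    return result

def rotations(w, h):
    """Bounding curve of a hard block that may be rotated by 90 degrees."""
    return prune([(w, h), (h, w)])

def combine(left, right, cut):
    """Bounding curve of two sub-floorplans joined by a "V" or "H" cut."""
    shapes = []
    for w1, h1 in left:
        for w2, h2 in right:
            if cut == "V":   # side by side
                shapes.append((w1 + w2, max(h1, h2)))
            else:            # stacked
                shapes.append((max(w1, w2), h1 + h2))
    return prune(shapes)

a = rotations(2, 3)
b = rotations(1, 4)
curve = combine(a, b, "V")
print(curve)
print(min(w * h for w, h in curve))  # smallest enclosing area on the curve
```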
Wire planner
• SAT-based approach for gridded routing
• Grid unit: link width (500-1000 wires)
• Support for floating terminals
• Customizable for any type of Boolean-encoded constraints (symmetry, 1D/2D routing, …)
[Figures: top view and cross-section view]
Wire planner
[Figure: router with W and E ports]
• Concurrent routing: all nets simultaneously
• Using Euler’s theory to find legal routes
• SAT: a route is always found if it exists
• ILP-based route optimization
Design space
[Plot: axis label: wire length [10^6 μm]]
Filtering floorplans
[Plot: axis label: area [mm^2]]
After physical planning
Outline
• Architectural exploration
• Physical planning for tiled CMPs
• Current work: regular floorplanning
  – Memory floorplanning
  – Regularity extraction
Min-area floorplan
[Figure: floorplan with blocks labeled C, L2, L3, R and r, and N/S/W/E sides]
Integrated memory floorplanner
[Figure: memory floorplans built from 1Mb, 512Kb and 256Kb blocks and a router (R); shapes labeled L-shape and T-shape]
Regular floorplan
[Figure: floorplan with blocks labeled C, L2, L3, R and r, and N/S/W/E sides]
Regular floorplan
[Figure: floorplan with blocks labeled C, L2, L3, R and r]
Regularity:
• Smaller design effort
• Efficient timing closure
• Choppability
Regular floorplan
[Figure: floorplan with blocks labeled C, L2, L3, R and r]
Regularity:
• Smaller design effort
• Efficient timing closure
• Choppability
Exploration:
• Graph-based knowledge discovery
• Hierarchical slicing structures
• Simulated Annealing