
Performance-Driven Placement for Design of Rotation and Right Arithmetic Shifters in Monolithic 3D ICs
Hao Zhuang, Jingwei Lu, Kambiz Samadi*, Yang Du*, and Chung-Kuan Cheng
Dept. of Computer Science & Engineering, University of California, San Diego, CA, 92093, USA
*Qualcomm Research, San Diego, CA, 92121, USA

Outline
• Motivation
  – Monolithic 3D ICs (M3D)
  – Our target circuits:
    • Rotation Shifter
    • Arithmetic Shifter (right shift)
• Optimization Approach of Shifter Designs
  – Permutation-based Optimization + M3D technology
  – Efficient Simulated Annealing Solver
• Experiment
• Conclusions

Motivation: Resume Moore’s Law – 3D ICs
• 3D ICs are a promising solution for continued VLSI scaling.
• Standard cell height = 1.4 um [1]
• Through-silicon via (TSV)-based 3D ICs
  – Dies are fabricated separately.
  – Wafers need to be thinned, aligned, and then bonded.
  – TSVs are large.
    • TSV diameter = 6 um [1]
• Monolithic 3D ICs (M3D)
  – Tiers are fabricated sequentially.
  – Monolithic inter-tier vias (MIVs) serve as vertical connections. They are only the size of ordinary metal vias.
    • MIV diameter = 70 nm [1]
[1] Shreepad Panth, et al. ASPDAC 2012

Motivation: Monolithic 3D ICs (M3D) / Monolithic Inter-Tier Vias (MIVs)
The advantages of M3D/MIV:
• High-density integration: avoids the large dimensions and area overhead of TSVs in 3D IC designs.
• Copes with interconnect-limited 2D ICs, where most problems are essentially caused by the high interconnect density at the gate level.
• Inserts vertical connections and shortens the distance between connected modules.
• Reduces total wire length and power, and improves routability and timing behavior.

Motivation: Our target circuits
• Shifter circuits
  – Indispensable datapath components in MPUs and ASICs.
  – Have a broad spectrum of applications and can affect system performance at a larger scale.
  – The wiring inside each shifter module is quite dense, so improving the timing and power behavior of shifters is an important subject.
• In this work, the specific shifter targets are:
  – Rotation shifter
  – Arithmetic shifter (right shift)

Rotation Shifter (Rotator)
This is a linear ordering (LO) design of a rotation shifter (rotator), also known as a cyclic shifter. Rotation requires long wrap-around wires.
[Figure: 8-bit LO rotator. Inputs D7 to D0 enter at row y = 0; three rows of 2:1 MUX cells, labeled (x, y) with x = 7..0 and y = 1..3, are controlled by the select signals S0, S1, and S2; the last row drives the outputs Z.]
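
To make the wiring pattern concrete, here is a minimal behavioral sketch of such a logarithmic rotator in Python. It is an illustration only, not code from the presentation: the function names `rotate_right`, `to_bits`, and `from_bits` are hypothetical, and the bit width is assumed to be a power of two. Row k either passes each column straight down or takes the value from the cell 2^k positions away, wrapping around at the edge, which is where the long wires come from.

```python
def rotate_right(bits, amount):
    """Rotate a list of bits (index 0 = LSB) right through log2(n) MUX rows."""
    n = len(bits)
    stages = n.bit_length() - 1  # assumes n is a power of two
    row = list(bits)
    for k in range(stages):
        if (amount >> k) & 1:
            # Select input 1: cell x takes the value of cell (x + 2^k) mod n.
            # The modulo is the wrap-around wire of the linear order design.
            row = [row[(x + (1 << k)) % n] for x in range(n)]
        # Select input 0: every cell passes its own column straight down.
    return row


def to_bits(value, n):
    """Split an integer into n bits, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Reassemble an integer from a list of bits, LSB first."""
    return sum(b << i for i, b in enumerate(bits))


# Rotating 0b00000001 right by one position moves the set bit to the MSB.
print(from_bits(rotate_right(to_bits(0b00000001, 8), 1)))  # → 128
```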

Right Arithmetic Shifter
This is a linear ordering design of a right arithmetic shifter. The original MSB (most significant bit) is extended into the vacated positions.
[Figure: 8-bit LO right arithmetic shifter. Inputs D7 to D0 enter at row y = 0; three rows of 2:1 MUX cells, labeled (x, y) with x = 7..0 and y = 1..3, are controlled by the select signals S0, S1, and S2; the last row drives the outputs Z.]
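
The same kind of behavioral sketch works for the right arithmetic shifter. Again this is only an illustration under assumptions: `arithmetic_shift_right` is a hypothetical name, bits are stored LSB first, and the width is a power of two. The only change from a rotator is that a cell whose source would fall beyond the MSB takes the MSB instead of wrapping around.

```python
def arithmetic_shift_right(bits, amount):
    """Arithmetic right shift of a bit list (index 0 = LSB) through log2(n) MUX rows."""
    n = len(bits)
    stages = n.bit_length() - 1  # assumes n is a power of two
    row = list(bits)
    for k in range(stages):
        if (amount >> k) & 1:
            step = 1 << k
            msb = row[n - 1]
            # Cell x takes cell x + 2^k when it exists, otherwise the MSB
            # (sign extension replaces the rotator's wrap-around wire).
            row = [row[x + step] if x + step < n else msb for x in range(n)]
    return row


# 0b10000000 shifted right by 2 with sign extension gives 0b11100000.
value = 0b10000000
bits = [(value >> i) & 1 for i in range(8)]
shifted = arithmetic_shift_right(bits, 2)
print(sum(b << i for i, b in enumerate(shifted)))  # → 224
```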

Objectives
• Objectives:
  – Reduce the longest path to improve timing.
  – Reduce the total wire length to improve power.
• The linear order design carries heavy wire loads, caused by the long wrap-around wires.
[Figure: the 8-bit LO shifter again, with inputs D7 to D0, three rows of 2:1 MUX cells (x, y) controlled by S0, S1, and S2, and outputs Z.]

Approaches
Our optimization approach combines two aspects:
• M3D/MIV
  – Inserts vertical connections, which may shorten the distance between connected cells.
  – The extra dimension reduces total wire length and dynamic power, and improves routability and timing behavior.
• Cell order permutations (proposed in our ASPDAC 2007 paper [2])
  – Idea/observation: swapping the physical positions of cells in the shifter reduces the longest path and the total wire length.
  – In some cases it also compensates for the delay penalty of detoured routes that appears when a 2D design is only naïvely folded into a 3D IC (shown in the right arithmetic shifter experiment).
This is the first work to optimize 3D shifters by cell order permutations. Previous work tends to simply fold the 2D linear design into 3D space, which is not efficient.
[2] Haikun Zhu, et al. ASPDAC 2007.

Optimization (Cell Order Permutation)
Illustration case of permutation-based optimization (8-bit rotator)
[Figure: two 8-bit rotators side by side, each with the stages >> 1-bit, >> 2-bit, and >> 4-bit and each cell annotated with its accumulated path span. Left: the LO design. Right: an optimized solution with permuted cell order.]
• LO design: the longest path span (LPS) is 13 MUX cells.
• Optimized solution: the LPS is only 8 MUX cells.
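
The LPS of a given cell ordering can be checked with a short dynamic program. The sketch below is an illustration under assumptions rather than the authors' tool: `longest_path_span` and the `perms` layout (one list per row, mapping logical cell index to physical x position) are hypothetical, and only the x-direction span of a 2D rotator is counted. With the identity ordering it reproduces the 13 MUX cells quoted for the 8-bit LO design.

```python
def longest_path_span(n_bits, perms=None):
    """Longest accumulated x-span over all input-to-output paths of a rotator.

    perms[k][b] is the physical x position of logical cell b in row k
    (row 0 = inputs). None means the linear order (identity) placement.
    """
    stages = n_bits.bit_length() - 1  # assumes n_bits is a power of two
    if perms is None:
        perms = [list(range(n_bits)) for _ in range(stages + 1)]
    best = [0] * n_bits  # best[b]: longest span of any path ending at cell b
    for k in range(stages):
        nxt = [0] * n_bits
        for b in range(n_bits):
            # Cell b in row k+1 is fed by cell b and by cell b + 2^k (mod n).
            for src in (b, (b + (1 << k)) % n_bits):
                span = abs(perms[k][src] - perms[k + 1][b])
                nxt[b] = max(nxt[b], best[src] + span)
        best = nxt
    return max(best)


print(longest_path_span(8))  # → 13
```

Passing a different `perms` list evaluates a permuted placement with the same routine, which is the quantity a permutation-based optimizer tries to reduce.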

Optimization (M3D)
• Fold the 2D LO design into a 3D linear order design.
• Cut the long wrap-around wires and use short MIVs instead.
[Figure: the 2D design is cut, then folded onto different layers (3D).]

Optimization (M3D)
(1) Physical placement of the cells and (2) connection of the different layers with MIVs for vertical communication.

Cell Order Permutation in 3D space
• Extend the permutations to 3D space
  – Wires along the x-y directions are not shown.
  – The MIVs are treated as vertical interconnects.
  – An MIV connecting adjacent layers is assumed to span 5% of the width of a MUX cell.
  – Swap the two highlighted cells, and so on.
[Figure: 3D placement with the input and output sides marked and two swap candidates highlighted.]
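
As a rough sketch of how the 5% MIV assumption enters the span calculation, the code below evaluates a folded 8-bit LO rotator on two layers. Several things here are assumptions, not details from the slides: the names `folded_positions` and `longest_path_span_3d` are hypothetical, the fold is taken to be a mirror fold (the upper half of the row is flipped onto the second layer so that cell 7 sits above cell 0), and every row uses the same placement.

```python
MIV_SPAN = 0.05  # one layer crossing costs 5% of a MUX cell width (from the slide)


def folded_positions(n_bits):
    """Mirror-fold a linear row onto two layers: returns (x, z) per logical cell."""
    half = n_bits // 2
    return [(b, 0) if b < half else (n_bits - 1 - b, 1) for b in range(n_bits)]


def longest_path_span_3d(n_bits, pos):
    """Longest accumulated x/z span of a rotator whose rows all use placement pos."""
    stages = n_bits.bit_length() - 1  # assumes n_bits is a power of two
    best = [0.0] * n_bits
    for k in range(stages):
        nxt = [0.0] * n_bits
        for b in range(n_bits):
            for src in (b, (b + (1 << k)) % n_bits):
                dx = abs(pos[src][0] - pos[b][0])
                dz = abs(pos[src][1] - pos[b][1])
                nxt[b] = max(nxt[b], best[src] + dx + MIV_SPAN * dz)
        best = nxt
    return max(best)


flat = [(b, 0) for b in range(8)]  # unfolded 2D linear order
print(longest_path_span_3d(8, flat))                 # 2D reference
print(longest_path_span_3d(8, folded_positions(8)))  # folded onto two layers
```

In this sketch the wrap-around connection between cell 7 and cell 0 becomes a single MIV crossing after folding, which is the effect the folding slides describe. Permuting cells in 3D would amount to giving each row its own `pos` list.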

Solve the whole optimization via Simulated Annealing-Based Solver (SA)
• Use slacks for timing
  – Take the two fan-out nodes of each net into consideration, not just the longest path.
  – The weight of net $n_i$: $w(n_i) = \sum_{e \in n_i} \left(1 - \frac{\mathit{slack}_e}{D}\right)^{\theta}$
  – Total slack of the shifter network: $W_{\mathit{slack}} = \sum_{n_i \in \mathit{network}} w(n_i)$
• Use total wire length as another cost function
  – Power is proportional to wire length.
  – $W_{\mathit{TWL}}$ is the total wire length.
• Auto-normalizing cost function [3][4]
  $\Delta \mathit{cost} = \gamma \, \frac{\Delta W_{\mathit{slack}}}{W_{\mathit{slack}}^{\mathit{prev}}} + (1 - \gamma) \, \frac{\Delta W_{\mathit{TWL}}}{W_{\mathit{TWL}}^{\mathit{prev}}}$
  – $\gamma$ is a tuning parameter.
[3] A. Marquardt, et al. FPGA 2000
[4] K. Eguro, et al. DAC 2008
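
The auto-normalizing cost can be written as a one-line helper. The sketch below is illustrative only: the function name `delta_cost` and its argument names are hypothetical, and the numbers in the usage line are made up to show the normalization, not taken from the experiments.

```python
def delta_cost(gamma, w_slack_prev, w_slack_new, w_twl_prev, w_twl_new):
    """Normalized cost change of one move.

    Each term is divided by its value before the move, so the timing term
    and the wire length term are both dimensionless and comparable.
    gamma in [0, 1] trades timing against wire length.
    """
    d_slack = w_slack_new - w_slack_prev
    d_twl = w_twl_new - w_twl_prev
    return gamma * d_slack / w_slack_prev + (1.0 - gamma) * d_twl / w_twl_prev


# Hypothetical move: the timing term improves by 10%, wire length worsens by 5%.
print(delta_cost(0.5, 10.0, 9.0, 200.0, 210.0))
```

Because both terms are relative changes, the same gamma keeps a similar meaning across shifter sizes even though the absolute values of the two metrics differ by orders of magnitude.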

Scalable SA optimization solver
• Integer linear programming (ILP), which was used in [2], is not scalable in our case.
• SA is a scalable method for this optimization problem, and it achieves almost the same LPS quality as ILP (shown in Table II, 16-bit rotator cases).
• “LPS”: the span of the longest path along the x-/z-directions, measured in number of MUX cells (the wire span along the y-direction contributes the same amount in all shifters).
• When optimizing a 32-bit rotator in 2 layers, ILP takes days to obtain a solution, while SA takes only minutes.
[2] Haikun Zhu, et al. ASPDAC 2007.
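
A swap-based annealing loop for this kind of problem might look like the sketch below. It is a simplified stand-in, not the solver from the presentation: it handles only a 2D rotator, uses the LPS itself as the timing term instead of the slack-weighted sum, permutes only the interior rows, and applies the textbook Metropolis acceptance rule with a geometric cooling schedule. All names and parameter defaults (`anneal`, `spans`, `t0`, `cooling`, `steps`) are hypothetical.

```python
import math
import random


def spans(perms, n_bits):
    """Return (longest path span, total wire length) of a 2D rotator placement."""
    best = [0] * n_bits
    twl = 0
    for k in range(len(perms) - 1):
        nxt = [0] * n_bits
        for b in range(n_bits):
            for src in (b, (b + (1 << k)) % n_bits):
                span = abs(perms[k][src] - perms[k + 1][b])
                twl += span
                nxt[b] = max(nxt[b], best[src] + span)
        best = nxt
    return max(best), twl


def anneal(n_bits=8, gamma=0.5, steps=5000, t0=0.1, cooling=0.999, seed=0):
    """Swap-based simulated annealing; needs n_bits >= 4 (a power of two)."""
    rng = random.Random(seed)
    stages = n_bits.bit_length() - 1
    perms = [list(range(n_bits)) for _ in range(stages + 1)]  # start from LO
    lps, twl = spans(perms, n_bits)
    best = (lps, twl, [row[:] for row in perms])
    temp = t0
    for _ in range(steps):
        row = rng.randrange(1, stages)  # interior rows only; I/O order stays fixed
        i, j = rng.sample(range(n_bits), 2)
        perms[row][i], perms[row][j] = perms[row][j], perms[row][i]
        new_lps, new_twl = spans(perms, n_bits)
        # Auto-normalized cost change, with LPS standing in for W_slack.
        delta = gamma * (new_lps - lps) / lps + (1 - gamma) * (new_twl - twl) / twl
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            lps, twl = new_lps, new_twl
            if (lps, twl) < best[:2]:
                best = (lps, twl, [r[:] for r in perms])
        else:
            perms[row][i], perms[row][j] = perms[row][j], perms[row][i]  # undo
        temp *= cooling
    return best


lps, twl, placement = anneal()
print(lps, twl)
```

Each move costs one re-evaluation of the placement, so the run time grows polynomially with the bit width, which is the practical reason an annealer scales where an exact ILP formulation does not. The result depends on the seed and schedule, and this sketch makes no claim of matching the LPS values reported in the slides.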

Experiment of Shifter Design Optimization
• The parameters for performance evaluation
• Notation on the following pages:
  – “SA”: permutation-based optimization by the simulated annealing-based solver.
  – “LO”: linear order design in 2D, or folded linear order design in 3D.
  – “LPS”: the span of the longest path along the x-/z-directions, measured in number of MUX cells (the wire span along the y-direction contributes the same amount in all shifters).
  – Delay and power are measured based on the following methods.
