Inherently Lower Complexity Architectures using Dynamic Optimization

Michael Gschwind, Erik Altman

What is the Problem?
  Out-of-order superscalars achieve high performance ...
  ... but at the cost of high hardware complexity:
    Predictors
    Complex decode
    Complex issue queues with wakeup and issue logic
    Register mapping tables
    ...

What is the Problem?
  Out-of-order superscalars achieve high performance ...
  ... but at the cost of high power.
    Many out-of-order components operate every cycle.
    Many components query a large set of data to operate on a single element.

What is the Problem?
  Out-of-order superscalars achieve high performance ...
  ... but at the cost of deep pipelines.
    Complex logic has long latency.
    To achieve high frequency with long latency, superpipelining is required.
    Deep pipelines require excellent branch predictors.
    Excellent branch predictors are complex.
    Complex logic has long latency ...

What is the Problem?
  Out-of-order superscalars achieve high performance ...
  ... but at the cost of high verification and debug complexity.
    With Moore's Law, schedule slips = performance slips.
    Schedule Slip    Relative Performance
    1 month          4%
    3 months         12%
    6 months         26%
    9 months         41%
    12 months        59%
    18 months        100%
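
The percentages in the table are consistent with performance doubling every 18 months. That doubling period is an assumption on our part, not something the slide states. Under that assumption, a minimal sketch that reproduces the table (the function name is hypothetical):

```python
def relative_performance_slip(months_late, doubling_period_months=18.0):
    """Performance gained by the rest of the market while a design slips.

    Assumes performance doubles every `doubling_period_months`
    (an assumed reading of "Moore's Law" on the slide).
    """
    return 2.0 ** (months_late / doubling_period_months) - 1.0


for months in (1, 3, 6, 9, 12, 18):
    slip = relative_performance_slip(months)
    print(f"{months:2d} month slip -> {slip:.0%}")
```

Rounded to whole percent, the six values match the slide's table.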

What is the Solution?
  Software Dynamic Optimization
  Allows reduced hardware complexity:
    Shorter pipelines for same frequency.
    Fewer hardware predictors.
    Simpler issue logic.
    Less power, a la Transmeta.
    Less debug and verification.
    Smaller chips and higher yield.

How to Implement the Solution
  BOA Architecture for Complexity Effective Design
  BOA = Binary Translation Optimized Architecture
  BOA in combination with its dynamic optimization software is architecturally compatible with PowerPC.

What is interesting about BOA?
  Software dynamic optimization.
  Precise behavior on most memory faults.
  Load/Store order tables ensure memory semantics and allow aggressive dynamic software reordering.
  Instruction recirculation mechanism to simplify issue and exception handling.
  Predictable latencies handled by software, unpredictable by hardware.

BOA System Architecture
  [Flowchart. Recoverable labels: Interpret ins X (PowerPC); Update statistics; X previously translated as entry point? (yes: exec X's BOA translation); X seen 15 times? (yes: form group at X and translate ins to BOA instructions; no: go to next ins).]
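
The interpret-then-translate loop in this flowchart can be sketched as follows. This is a toy model under stated assumptions, not BOA's actual software: the "program" is just a hypothetical map from an address to its successor, `form_group` is a hypothetical stand-in for group formation, and only the threshold of 15 comes from the slide.

```python
from collections import Counter

HOT_THRESHOLD = 15  # the slide's "seen 15 times" test


def form_group(program, start, max_len=8):
    """Hypothetical group formation: follow successors until a repeat or max_len."""
    group = [start]
    pc = program[start]
    while pc is not None and pc not in group and len(group) < max_len:
        group.append(pc)
        pc = program[pc]
    return group


def run(program, entry, max_steps=1000):
    """program maps address -> next address (None ends the run)."""
    seen = Counter()    # interpreter statistics per address
    translations = {}   # entry address -> addresses in the translated group
    log = []            # ("interp" | "translated", address) events
    pc, steps = entry, 0
    while pc is not None and steps < max_steps:
        steps += 1
        if pc in translations:               # existing entry point: run the group
            log.append(("translated", pc))
            pc = program[translations[pc][-1]]
            continue
        seen[pc] += 1                        # update statistics
        if seen[pc] >= HOT_THRESHOLD:        # hot: form a group and translate
            translations[pc] = form_group(program, pc)
            continue                         # the next iteration executes it
        log.append(("interp", pc))           # cold: interpret one instruction
        pc = program[pc]
    return translations, log


# A three-instruction loop: 0 -> 1 -> 2 -> 0
translations, log = run({0: 1, 1: 2, 2: 0}, entry=0, max_steps=100)
print(translations)
```

In this sketch the loop head is interpreted 14 times and translated on its 15th visit. Whether the real system interprets that 15th visit before translating is not recoverable from the slide.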

BOA ISA (1)
  BOA is a variable-length VLIW machine.
    BOA instructions (bundles) are 128 bits.
    Bundles have 3 primitive ops.
    Primitive ops have 39 bits plus stop bit.
    Complex PowerPC ops cracked.
    8 bits of bundle reserved for future uses such as predication.
  Instruction Issue:
    Up to 6 primitive ops are issued together.
    Only last op issued may have stop bit set.
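
The bundle arithmetic works out as 3 x (39 + 1) + 8 = 128 bits. The sketch below packs and unpacks such a bundle and splits an op stream into issue groups. The field order inside the bundle, the helper names, and the choice to cut a group at 6 ops when no stop bit has been seen are all assumptions for illustration; the slide gives only the bit counts and the issue rules.

```python
OP_BITS = 39
SLOT_BITS = OP_BITS + 1          # 39-bit primitive op plus its stop bit
OPS_PER_BUNDLE = 3
RESERVED_BITS = 8
BUNDLE_BITS = OPS_PER_BUNDLE * SLOT_BITS + RESERVED_BITS
assert BUNDLE_BITS == 128


def pack_bundle(ops, reserved=0):
    """ops: three (op, stop) pairs. Assumed layout: reserved bits first, then ops."""
    assert len(ops) == OPS_PER_BUNDLE
    word = reserved & ((1 << RESERVED_BITS) - 1)
    for op, stop in ops:
        assert 0 <= op < (1 << OP_BITS)
        word = (word << SLOT_BITS) | (op << 1) | int(stop)
    return word


def unpack_bundle(word):
    """Inverse of pack_bundle: returns ([(op, stop), ...], reserved)."""
    ops = []
    for _ in range(OPS_PER_BUNDLE):
        slot = word & ((1 << SLOT_BITS) - 1)
        ops.append((slot >> 1, bool(slot & 1)))
        word >>= SLOT_BITS
    return list(reversed(ops)), word


def issue_groups(ops, max_width=6):
    """Split (op, stop) pairs into groups ending at a stop bit or at max_width ops."""
    groups, current = [], []
    for op, stop in ops:
        current.append(op)
        if stop or len(current) == max_width:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


bundle = pack_bundle([(1, False), (2, True), (3, False)], reserved=0xAB)
print(unpack_bundle(bundle))
```

Because groups end at stop bits rather than at bundle boundaries, an issue group can span bundles, which is one plausible reading of "variable length" here.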

BOA Instructions

BOA ISA (2)
  64 Integer Registers
  64 Float Registers
  16 4-bit Condition Registers
  Branches take 1 cycle:
    Branch mispredicts cost 7 cycles
    Static branch pred (using interpreter stats)
    At most one branch per cycle
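
One way to read "static branch pred (using interpreter stats)" is that the translator fixes each branch's predicted direction from the taken and not-taken counts gathered during interpretation. The sketch below assumes exactly that and estimates the average mispredict penalty per execution. The function names and the majority-vote rule are assumptions; only the 7-cycle cost comes from the slide.

```python
MISPREDICT_PENALTY = 7  # cycles, from the slide


def static_prediction(taken, not_taken):
    """Predict the direction seen more often while interpreting (True = taken)."""
    return taken >= not_taken


def expected_mispredict_cycles(taken, not_taken):
    """Average penalty cycles per execution if future behavior matches the counts."""
    total = taken + not_taken
    if total == 0:
        return 0.0
    wrong = min(taken, not_taken)   # executions that go against the prediction
    return MISPREDICT_PENALTY * wrong / total


# A branch observed taken 12 times and not taken 3 times while interpreting:
print(static_prediction(12, 3), expected_mispredict_cycles(12, 3))
```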

PowerPC State and Precise Exceptions
  [Figure. Recoverable labels: PowerPC Regs, Shadow Regs, Scratch Regs.]

BOA Latencies
  Integer ops take 1 cycle
    No bypass => Dependent ops must be 2 cycles apart
  LOADs take 3 cycles
    No bypass => Dependent ops must be 4 cycles later
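
Both rules on this slide follow the same pattern: the minimum distance to a dependent op is the producer's latency plus one cycle for the missing bypass. A minimal sketch of how scheduling software might apply that rule, ignoring issue-width and functional-unit limits (the op tuple format and function names are assumptions):

```python
LATENCY = {"int": 1, "load": 3}   # cycles, from the slide
NO_BYPASS_DELAY = 1               # extra cycle because results are not bypassed


def min_distance(kind):
    """Cycles that must separate a producer of this kind from its consumer."""
    return LATENCY[kind] + NO_BYPASS_DELAY


def schedule(ops):
    """ops: (name, kind, [producer names]) in program order. Returns name -> cycle."""
    cycle, kinds = {}, {}
    for name, kind, deps in ops:
        earliest = 0
        for dep in deps:
            earliest = max(earliest, cycle[dep] + min_distance(kinds[dep]))
        cycle[name] = earliest
        kinds[name] = kind
    return cycle


ops = [
    ("ld", "load", []),        # r1 = load
    ("add", "int", ["ld"]),    # r2 = r1 + ...  (needs the loaded value)
    ("sub", "int", ["add"]),   # r3 = r2 - ...
    ("inc", "int", []),        # independent op can fill an early cycle
]
print(schedule(ops))
```

Here the load issues in cycle 0, its consumer in cycle 4, and the next dependent integer op in cycle 6, while the independent op can go in cycle 0.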

BOA Resources
  6 Issue Slots
  2 LOAD / STORE units
    Each with own copy of register file
  4 Integer units
    Each with own copy of register file
  2 Float units
  1 Branch unit
  32-entry Load and Store Buffers
  Register scoreboarding of LOAD values
    Stall when trying to use loaded value
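
The unit counts add up to 9 functional units behind only 6 issue slots, so a group of ops has to satisfy both the slot limit and the per-unit limits. A small sketch of that check, assuming each primitive op needs exactly one unit of one kind (the kind names and function name are hypothetical):

```python
from collections import Counter

ISSUE_SLOTS = 6
UNIT_LIMITS = {"mem": 2, "int": 4, "float": 2, "branch": 1}  # from the slide


def fits_in_one_cycle(op_kinds):
    """True if the ops respect the issue-slot limit and every per-unit limit."""
    if len(op_kinds) > ISSUE_SLOTS:
        return False
    counts = Counter(op_kinds)
    return all(n <= UNIT_LIMITS.get(kind, 0) for kind, n in counts.items())


print(fits_in_one_cycle(["mem", "mem", "int", "int", "int", "branch"]))  # within limits
print(fits_in_one_cycle(["mem", "mem", "mem"]))                          # too many memory ops
```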

Dynamic Optimization

BOA Dynamic Optimization
  BOA's software optimizer originates with IBM's earlier DAISY project.
  BOA adjusted and tuned optimizer:
    To support a narrower, higher frequency target machine.
    To optimize along single hyperblock paths, instead of tree region with multiple paths.
      Improves code packing, reduces TLB misses
      Improves code layout and helps IFetch, a la trace caches.

Dynamic Optimization Environments
  Dynamic Optimization can be used in a variety of environments:
    Process level
      Idealized virtual memory
      Fewer difficult system/kernel code issues
    Operating system level
      No modifications to operating system
      More transparent
      Less danger of compatibility issues

Dynamic Optimization Targets (1)
  Simpler implementation of the same architecture
  Ability to bail out and revert to native execution:
    If overhead too high
    For hard to emulate sequences
    When no benefit of DO can be measured
    Or actually degrades

Dynamic Optimization Targets (2)
  Different architecture, e.g., RISC => VLIW
    Drastically simplify architecture
    Reduce decoding overhead even further
    Add more registers, add new concepts
  All code must be emulated. Can cause severe degradation if low reuse, e.g. WinStone.
  Get benefits of code packing
