Hardware–Software Co-Design: Not Just a Cliché. Adrian Sampson, James Bornholt, Luis Ceze (PowerPoint PPT presentation)



SLIDE 1

Adrian Sampson, James Bornholt, Luis Ceze
SAMPA group, University of Washington
SNAPL 2015

Hardware–Software Co-Design: Not Just a Cliché

SLIDE 2

timeline (not to scale): time immemorial → 2005 → 2015

SLIDE 3

timeline (not to scale): time immemorial → 2005 → 2015

free lunch

exponential single-threaded performance scaling!

SLIDE 4

[Figure: clock frequency (MHz, log scale from 10 to 10,000) versus year of introduction, 1985–2020; the surviving caption fragment refers to the period from 1986 to 2008 as measured by benchmarks.]

SLIDE 5

timeline: time immemorial → 2005 → 2015

free lunch → multicore era

we’ll scale the number of cores instead

SLIDE 6

The multicore transition was a stopgap, not a panacea.

SLIDE 7

timeline: time immemorial → 2005 → 2015

free lunch → multicore era → ? ? ? ? (who knows?)

SLIDE 8

Application / Language / Architecture / Circuits

SLIDE 9

Application / Language / Architecture / Circuits
hardware–software abstraction boundary

parallelism · data movement · guard bands · energy costs

SLIDE 10

Application / Language / Architecture / Circuits
hardware–software abstraction boundary

parallelism · data movement · guard bands · energy costs

SLIDE 11

lessons learned from Approximate Computing

New Opportunities for hardware–software co-design

SLIDE 12

lessons learned from Approximate Computing

New Opportunities for hardware–software co-design

SLIDE 13

Application / Language / Architecture / Circuits
new abstractions for incorrectness

SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17

Application / Language / Architecture / Circuits
new abstractions for incorrectness

type systems · debuggers · probabilistic guarantees · auto-tuning · flaky functional units · lossy cache compression · neural acceleration · drowsy SRAMs
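The software-side items in that list are easiest to make concrete with a type system. Below is a minimal sketch in the spirit of approximate type systems such as EnerJ; the class and function names are mine, not any real system's API. The key rule: approximate data may not flow into precise data unless the programmer explicitly endorses it.

```python
# Hedged sketch (not the EnerJ implementation): a wrapper that enforces the
# central typing rule of approximate type systems -- approximate data may
# not flow into precise storage without an explicit endorsement.

class Approx:
    """A value that may have been computed unreliably."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Arithmetic involving approximate data stays approximate.
        other_v = other.value if isinstance(other, Approx) else other
        return Approx(self.value + other_v)

def endorse(x):
    """Explicitly cast approximate data back to precise (the unsafe escape hatch)."""
    return x.value if isinstance(x, Approx) else x

def precise_store(dest, key, x):
    """Model a precise assignment: reject unendorsed approximate data."""
    if isinstance(x, Approx):
        raise TypeError("approximate value flows into precise storage; endorse() it first")
    dest[key] = x

pixels = Approx(100) + 20                         # approximate arithmetic: still Approx
state = {}
precise_store(state, "total", endorse(pixels))    # OK: explicit endorsement
```

An approximate value stays approximate through arithmetic, so one annotation at the source taints every result derived from it, and `endorse` is the single, auditable escape hatch.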

SLIDE 18

The von Neumann curse

useful work

other crud we don’t care about and can’t fix

SLIDE 19

Hardware design costs sanity & well-being

Thierry Moreau, FPGA design champion

[Moreau et al.; HPCA 2015]

SLIDE 20

Trust your compiler

[Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

approximate cache

SLIDE 21

Trust your compiler

st   r1 x
st.a r2 y
ld   x r3
ld.a y r4

[Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

approximate cache

SLIDE 22

Trust your compiler

st   r1 x
st.a r2 y
ld   x r3
ld.a y r4

[Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

approximate cache
1 1 1
line state bits?

SLIDE 23

Trust your compiler

st   r1 x
st.a r2 y
ld   x r3
ld.a y r4

[Esmaeilzadeh, Sampson, Ceze, Burger; ASPLOS 2012]

approximate cache
line state bits?
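The question on that slide can be made concrete with a toy model (my Python sketch, not the mechanism of the cited ASPLOS 2012 design): give each cache line one extra state bit recording whether the compiler stored approximate data there, set and checked by the `st.a`/`ld.a` instructions.

```python
# Hedged sketch of the "line state bits?" question: a cache where one extra
# bit per line records whether the line holds approximate data. The
# compiler-emitted st.a/ld.a set and check the bit.

class ApproxCache:
    def __init__(self):
        self.lines = {}   # address -> (value, approx_bit)

    def st(self, addr, value):            # precise store: clear the bit
        self.lines[addr] = (value, False)

    def st_a(self, addr, value):          # approximate store: tag the line
        self.lines[addr] = (value, True)

    def ld(self, addr):                   # precise load: must not read a tagged line
        value, approx = self.lines[addr]
        if approx:
            raise RuntimeError("precise load from approximate line")
        return value

    def ld_a(self, addr):                 # approximate load: either is fine
        return self.lines[addr][0]

cache = ApproxCache()
cache.st("x", 1)      # st   r1 x
cache.st_a("y", 2)    # st.a r2 y
r3 = cache.ld("x")    # ld   x r3  -> precise line, allowed
r4 = cache.ld_a("y")  # ld.a y r4  -> approximate line, allowed
```

One bit per line is the cheapest encoding; the open design question the slide points at is what happens when precise and approximate words share a line.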

SLIDE 24

lessons learned from Approximate Computing

New Opportunities for hardware–software co-design

SLIDE 25

More hardware flexibility that humans can actually program

SLIDE 26

More hardware flexibility that humans can actually program

FPGA

SLIDE 27

More hardware flexibility that humans can actually program

FPGA: explicit data movement · explicit memory blocks · explicit physical routing · explicit clock frequency · explicit ILP · explicit numeric bit width
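One knob from that list, explicit numeric bit width, can be illustrated directly (the example and function name are mine): on an FPGA the designer picks exactly how many bits each datapath carries, and a quick model of an n-fractional-bit fixed-point value shows the precision/width trade-off a language for this hardware would have to expose.

```python
# Hedged illustration of explicit numeric bit width: quantize a value to a
# fixed-point representation with a chosen number of fractional bits, the
# way an FPGA datapath of that width would.

def quantize(x, frac_bits):
    """Round x to the nearest fixed-point value with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

# Fewer bits -> cheaper hardware, coarser values.
pi_8  = quantize(3.14159265, 8)   # 3.140625
pi_16 = quantize(3.14159265, 16)  # much closer to pi
assert abs(pi_16 - 3.14159265) < abs(pi_8 - 3.14159265)
```

Narrower datapaths cost less area and energy on the fabric, which is exactly why the width wants to be programmer-visible rather than fixed at 32 or 64 bits.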

SLIDE 28

More hardware flexibility that humans can actually program

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger (Microsoft). Abstract:

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the …

… desirable to reduce management issues and to provide a consistent platform that applications can rely on. Second, datacenter services evolve extremely rapidly, making non-programmable hardware features impractical. Thus, datacenter providers are faced with a conundrum: they need continued improvements in performance and efficiency, but cannot obtain those improvements from general-purpose systems. Reconfigurable chips, such as Field Programmable Gate Arrays (FPGAs), offer the potential for flexible acceleration of many workloads. However, as of this writing, FPGAs have not been widely deployed as compute accelerators in either datacenter infrastructure or in client devices. One challenge traditionally associated with FPGAs is the need to fit the accelerated function into the available reconfigurable area. One could virtualize the FPGA by reconfiguring it at run-time to support more functions than could fit into a single device. However, current reconfiguration times for standard FPGAs …

SLIDE 29

More hardware flexibility that humans can actually program

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

23 authors!

SLIDE 30

Trust, but formally verify

useful work

SLIDE 31

Trust, but formally verify

useful work
checking that software doesn’t do anything crazy

SLIDE 32

Trust, but formally verify

Application / Language / Architecture / Circuits

verified properties

e.g., [Hunt and Larus; OSR April 2007]
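A minimal sketch of what "verified properties" buys (my framing, not the mechanism of the cited Singularity work): if the language layer can prove a property, e.g., that every index is in bounds, the layers below can stop re-checking it on every access.

```python
# Hedged sketch: the same computation written with per-access run-time
# checking (what hardware and runtimes do for untrusted code) and without
# it (what a compiler-verified property permits).

def checked_sum(buf):
    # Untrusted path: every access pays an explicit bounds check.
    total = 0
    for i in range(len(buf)):
        if not (0 <= i < len(buf)):
            raise IndexError(i)
        total += buf[i]
    return total

def verified_sum(buf):
    # Verified path: a static proof that i stays in 0..len(buf)-1
    # discharges the check once, at compile time, so none executes here.
    total = 0
    for x in buf:
        total += x
    return total

assert checked_sum([3, 1, 4]) == verified_sum([3, 1, 4]) == 8
```

Singularity applies the same move at the process-isolation level: verified type-safe code lets the system drop hardware memory protection between processes.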

SLIDE 33

Hardware beyond core computation

power supply & battery · mobile display & backlight · new memory technologies · software-defined networking · CPU · GPU · FPGA · accelerators

SLIDE 34

timeline: time immemorial → 2005 → 2015

free lunch → multicore era → the era of language co-design?

SLIDE 35