SLIDE 1

MIT Lincoln Laboratory

000523-jca-1 KAM 10/7/2004

Panel Session: Amending Moore’s Law for Embedded Applications

James C. Anderson
MIT Lincoln Laboratory
HPEC04
29 September 2004

This work is sponsored by the HPEC-SI (high performance embedded computing software initiative) under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government. Reference to any specific commercial product, trade name, trademark or manufacturer does not constitute or imply endorsement.

SLIDE 2

Objective, Questions for the Panel & Schedule

Objective: identify & characterize factors that affect the impact of Moore’s Law on embedded applications

Questions for the panel

– 1). Moore’s Law: what’s causing the slowdown?
– 2). What is the contribution of Moore’s Law to improvements at the embedded system level?
– 3). Can we preserve historical improvement rates for embedded applications?

Schedule

– 1540-1600: panel introduction & overview
– 1600-1620: guest speaker Dr. Robert Schaller
– 1620-1650: panelist presentations
– 1650-1720: open forum
– 1720-1730: conclusions & the way ahead

Panel members & audience may hold diverse, evolving opinions

SLIDE 3

Panel Session: Amending Moore’s Law for Embedded Applications

Moderator: Dr. James C. Anderson, MIT Lincoln Laboratory

Dr. Richard Linderman, Air Force Research Laboratory

Dr. Mark Richards, Georgia Institute of Technology

Mr. David Martinez, MIT Lincoln Laboratory

Dr. Robert R. Schaller, College of Southern Maryland

SLIDE 4

Four Decades of Progress at the System Level

1965 Gordon Moore publishes “Cramming more components onto integrated circuits”

Computers lose badly at chess

SLIDE 5

Four Decades of Progress at the System Level

1965: Gordon Moore publishes “Cramming more components onto integrated circuits”; computers lose badly at chess

1997: Robert Schaller publishes “Moore’s Law: past, present and future”; Deep Blue (1270kg) beats chess champ Kasparov

SLIDE 6

Four Decades of Progress at the System Level

1965: Gordon Moore publishes “Cramming more components onto integrated circuits”; computers lose badly at chess

1997: Robert Schaller publishes “Moore’s Law: past, present and future”; Deep Blue (1270kg) beats chess champ Kasparov

2002-2004: Mark Richards (with Gary Shaw) publishes “Sustaining the exponential growth of embedded digital signal processing capability”; chess champ Kramnik ties Deep Fritz & Kasparov ties Deep Junior (10K lines C++ running on 15 GIPS server using 3 Gbytes)

~2008

SLIDE 7

Four Decades of Progress at the System Level

1965: Gordon Moore publishes “Cramming more components onto integrated circuits”; computers lose badly at chess

1997: Robert Schaller publishes “Moore’s Law: past, present and future”; Deep Blue (1270kg) beats chess champ Kasparov

2002-2004: Mark Richards (with Gary Shaw) publishes “Sustaining the exponential growth of embedded digital signal processing capability”; chess champ Kramnik ties Deep Fritz & Kasparov ties Deep Junior (10K lines C++ running on 15 GIPS server using 3 Gbytes)

~2005: Deep Dew hand-held chess champ (0.6L & 0.6kg) uses 22 AA cells (Li/FeS2, 22W for 3.5 hrs) & COTS parts incl. voice I/O chip

SLIDE 8

Four Decades of Progress at the System Level

1965: Gordon Moore publishes “Cramming more components onto integrated circuits”; computers lose badly at chess

1997: Robert Schaller publishes “Moore’s Law: past, present and future”; Deep Blue (1270kg) beats chess champ Kasparov

2002-2004: Mark Richards (with Gary Shaw) publishes “Sustaining the exponential growth of embedded digital signal processing capability”; chess champ Kramnik ties Deep Fritz & Kasparov ties Deep Junior (10K lines C++ running on 15 GIPS server using 3 Gbytes)

~2005: Deep Dew hand-held chess champ (0.6L & 0.6kg) uses 22 AA cells (Li/FeS2, 22W for 3.5 hrs) & COTS parts incl. voice I/O chip

~2008: Deep Yogurt has 1/3 the size & power of Deep Dew, with 3X improvement in 3 yrs
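The Deep Dew and Deep Yogurt figures can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the quoted 22W, 3.5 hrs, 22 cells and 0.6L are exact, that the load is split evenly across cells, and that "1/3 the size & power" applies to both volume and power. All names in the code are illustrative and do not come from the presentation.

```python
# Figures quoted on the slide for the hypothetical Deep Dew hand-held unit.
DEW_VOLUME_L = 0.6     # liters
DEW_POWER_W = 22.0     # watts
DEW_RUNTIME_H = 3.5    # hours on one set of cells
DEW_CELLS = 22         # AA Li/FeS2 cells


def power_density(power_w, volume_l):
    """Power per unit volume in W/L."""
    return power_w / volume_l


def energy_per_cell(power_w, hours, cells):
    """Energy each cell must deliver in Wh, assuming an even split across cells."""
    return power_w * hours / cells


dew_density = power_density(DEW_POWER_W, DEW_VOLUME_L)
dew_cell_wh = energy_per_cell(DEW_POWER_W, DEW_RUNTIME_H, DEW_CELLS)

# Assumption: Deep Yogurt scales both volume and power by 1/3.
yogurt_volume_l = DEW_VOLUME_L / 3
yogurt_power_w = DEW_POWER_W / 3
yogurt_density = power_density(yogurt_power_w, yogurt_volume_l)

print(f"Deep Dew: {dew_density:.1f} W/L, {dew_cell_wh:.1f} Wh per cell")
print(f"Deep Yogurt: {yogurt_volume_l:.2f} L, {yogurt_power_w:.1f} W, {yogurt_density:.1f} W/L")
```

Under these assumptions Deep Dew dissipates about 37 W/L, and scaling size and power together leaves the power density of Deep Yogurt unchanged.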

SLIDE 9

Power per Unit Volume (Watts/Liter) for Representative Systems ca. 2003

[Log-log plot. Axes: Computation Efficiency (GIPS/watt) and Computation Density (GIPS/Liter). Throughput in GIPS (billions of Dhrystone instructions/sec).]

Plot labels:

– Deep Fritz & Deep Junior Chess Server
– 70 W/L limit for convection-cooled cards (typical for conduction-cooled)
– Hand-held unit feasible with COTS parts 4Q03, but not built

SLIDE 10

Power per Unit Volume (Watts/Liter) for Representative Systems ca. 2003

[Log-log plot. Axes: Computation Efficiency (GIPS/watt) and Computation Density (GIPS/Liter). Throughput in GIPS (billions of Dhrystone instructions/sec).]

Plot labels:

– Deep Fritz & Deep Junior Chess Server
– Human chess champs Kramnik & Kasparov
– Chess champs’ brains
– 70 W/L limit for convection-cooled cards (typical for conduction-cooled)
– 1.6 W/L moderately active human (human vs. machine “Turing Tests”)
– Hand-held unit feasible with COTS parts 4Q03, but not built

[Photo: Kramnik & Deep Fritz]

SLIDE 11

Power per Unit Volume (Watts/Liter) for Representative Systems ca. 2003

[Log-log plot. Axes: Computation Efficiency (GIPS/watt) and Computation Density (GIPS/Liter). Throughput in GIPS (billions of Dhrystone instructions/sec).]

Plot labels:

– 8000 Watts/Liter nuclear reactor core
– Deep Fritz & Deep Junior Chess Server
– PowerPC 750FX (0.13µm, 800 MHz): active die volume (1µm depth), die (1mm thick), packaged device, computer card
– Human chess champs Kramnik & Kasparov
– Chess champs’ brains
– 70 W/L limit for convection-cooled cards (typical for conduction-cooled)
– 1.6 W/L moderately active human (human vs. machine “Turing Tests”)
– Hand-held unit feasible with COTS parts 4Q03, but not built

[Photos: Kramnik & Deep Fritz; RBMK-1500 reactor]

SLIDE 12

System-level Improvements Falling Short of Historical Moore’s Law

[Log-log plot. Axes: Computation Efficiency (GFLOPS/Watt) and Computation Density (GFLOPS/Liter). Legend: ASIC, RISC, FPGA. Time markers: Y2K, 2010.]

GFLOPS (billions of 32 bit floating-point operations/sec) sustained for 1K complex FFT using 6U form factor convection-cooled COTS multiprocessor cards <55W, 2Q04 data

Plot labels: 7/99; 3/00; 3/99; SRAM-based FPGA, 2/09; Special-purpose ASIC, 6/10; General-purpose RISC with on-chip vector processor, 2/10

Moore’s Law slope: 100X in 10 yrs

COTS ASIC & FPGA improvements outpacing general-purpose processors, but all fall short of historical Moore’s Law
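The "100X in 10 yrs" slope is the same rate as the "4X in 3 yrs" figure used elsewhere in the deck, expressed over a longer interval. A minimal sketch of the conversion, assuming steady exponential improvement; the function name and the dictionary labels are illustrative, and the 3X and 2X rates are the card-level rates quoted on the backup slides.

```python
def rescale_rate(factor, years, target_years):
    """Convert 'factor X in `years`' into the equivalent factor over `target_years`,
    assuming steady exponential improvement."""
    return factor ** (target_years / years)


# Improvement rates quoted in the deck, each expressed per 3 years.
rates_per_3_yrs = {
    "historical Moore's Law": 4.0,
    "FPGA & ASIC cards": 3.0,
    "general-purpose RISC cards": 2.0,
}

for label, factor in rates_per_3_yrs.items():
    per_decade = rescale_rate(factor, 3, 10)
    print(f"{label}: {factor:.0f}X in 3 yrs is about {per_decade:.0f}X in 10 yrs")
```

On these numbers 4X in 3 yrs compounds to roughly 100X per decade, while 3X and 2X in 3 yrs compound to roughly 39X and 10X, which is the shortfall the slide describes.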

SLIDE 13

Timeline for ADC Sampling Rate & COTS Processors (2Q04)

[Plot: Rate (MSPS, 1 to 10,000, log scale) vs. Year (1992 to 2010)]

Plot labels:

– Moore’s Law slope: 4X in 3 yrs
– 12- to 14-bit ADCs
– 2X in 3 yrs
– 3X in 3 yrs
– SRAM-based FPGAs
– Special-purpose ASICs
– General-purpose µP, DSP & RISC (w/ vector processor)
– Highest-performance 6U form factor multiprocessor cards <55W
– Pair of analog-to-digital converters provide data to processor card for 32 bit floating-point 1K complex FFT

Open systems architecture goal: mix old & new general- & special-purpose cards, with upgrades as needed (from 1992-2003, a new card could replace four 3-yr-old cards)

Projections assume future commercial market for 1 GSPS 12-bit ADCs & 50 GFLOPS cards with 8 Gbytes/sec I&O

SLIDE 14

Representative Embedded Computing Applications

Sonar for anti-submarine rocket-launched lightweight torpedo (high throughput requirements but low data rates)

Radio for soldier’s software-defined comm/nav system (severe size, weight & power constraints)

Radar for mini-UAV surveillance applications (stressing I/O data rates)

[Mini-UAV image captions: ~3m wingspan; wingspan < 3m]

Cost- & schedule-sensitive real-time applications with high RAS (reliability, availability & serviceability) requirements

SLIDE 15

Embedded Signal Processor Speed & Numeric Representations Must Track ADC Improvements

[Plot: Effective Number of Bits (ADC ENOB, 5 to 20) vs. Sampling Rate (MSPS, 0.1 to 10,000, log scale). Curves: 2005 (2Q04 data) and 2009 (2Q04 projections) for highest performance commercial off-the-shelf analog-to-digital converters. Marked examples: Sonar, Radio, Radar.]

Typical Processor Numeric Representation (by ADC ENOB region):

– 48-64 bit floating-point
– 32 bit floating-point
– 32 bit floating- or fixed-point*
– 16-32 bit fixed-point

Sonar example near limit of 32 bit floating-point (18 ADC bits @ 100 KSPS + 5 bits processing gain vs. 23 bit mantissa + sign bit)

Radio example near limit of 16 bit fixed-point (10 ADC bits @ 400 MSPS + 5 bits processing gain)

*Floating-point preferred (same memory & I/O as fixed-point)
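The sonar and radio examples are bit budgets: ADC bits plus processing gain, compared with the magnitude bits the numeric format can carry. A minimal sketch, assuming the 5 bits of processing gain come from a 1K FFT at 1/2 bit per octave (the figure quoted on the backup slides) and that a 16 bit fixed-point word holds 15 magnitude bits plus a sign bit. Function and constant names are illustrative.

```python
import math


def processing_gain_bits(fft_points, bits_per_octave=0.5):
    """Bits of processing gain, assuming 1/2 bit per doubling of FFT length."""
    return bits_per_octave * math.log2(fft_points)


def bits_needed(adc_bits, fft_points=1024):
    """ADC bits plus processing gain: magnitude bits the processor must carry."""
    return adc_bits + processing_gain_bits(fft_points)


# Magnitude bits available, excluding the sign bit.
FLOAT32_MANTISSA_BITS = 23    # 32 bit floating-point: 23 bit mantissa + sign bit
FIXED16_MAGNITUDE_BITS = 15   # assumption: 16 bit fixed-point = 15 bits + sign bit

sonar_bits = bits_needed(18)  # 18 ADC bits @ 100 KSPS
radio_bits = bits_needed(10)  # 10 ADC bits @ 400 MSPS

print(f"Sonar: {sonar_bits:.0f} of {FLOAT32_MANTISSA_BITS} mantissa bits used")
print(f"Radio: {radio_bits:.0f} of {FIXED16_MAGNITUDE_BITS} magnitude bits used")
```

Both examples use every available bit under these assumptions, which is why the slide places them near the limit of their numeric representations.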

SLIDE 16

Objective, Questions for the Panel & Schedule

Objective: identify & characterize factors that affect the impact of Moore’s Law on embedded applications

Questions for the panel

– 1). Moore’s Law: what’s causing the slowdown?
– 2). What is the contribution of Moore’s Law to improvements at the embedded system level?
– 3). Can we preserve historical improvement rates for embedded applications?

Schedule

– 1540-1600: panel introduction & overview
– 1600-1620: guest speaker Dr. Robert Schaller
– 1620-1650: panelist presentations
– 1650-1720: open forum
– 1720-1730: conclusions & the way ahead

Panel members & audience may hold diverse, evolving opinions

SLIDE 18

Conclusions & The Way Ahead

Slowdown in Moore’s Law due to a variety of factors

– Improvement rate was 4X in 3 yrs, now 2-3X in 3 yrs (still substantial)
– Impact of slowdown greatest in “leading edge” embedded applications
– Software issues may overshadow Moore’s Law slowdown

COTS markets may not emerge in time to support historical levels of improvement

– Federal government support may be required in certain areas (e.g., ADCs)
– Possible return of emphasis on advanced packaging and custom devices/technologies for military embedded applications

Developers need to overcome issues with I/O standards & provide customers with cost-effective solutions in a timely manner: success may depend more on economic & political rather than technical considerations

Hardware can be designed to drive down software cost/schedule, but new methodologies face barriers to acceptance

Improvements clearly come both from Moore’s Law & algorithms, but better metrics needed to measure relative contributions

“It’s absolutely critical for the federal government to fund basic research. Moore’s Law will take care of itself. But what happens after that is what I’m worried about.”

Gordon Moore, Nov. 2001

SLIDE 19

Backup Slides

SLIDE 20

Points of Reference

6U form factor card

– Historical data available for many systems
– Convection cooled
  Fans blow air across heat sinks
  Rugged version uses conduction cooling
– Size: 16x23cm, 2cm slot-to-slot (0.76L)
– Weight: 0.6kg, typ.
– Power: 54W max. (71W/L)
  Power limitations on connectors & backplane
  Reliability decreases with increasing temperature
– Can re-package with batteries for hand-held applications (e.g., walkie-talkie similar to 1L water bottle weighing 1kg)

1024-point complex FFT (fast Fourier transform)

– Historical data available for many computers (e.g., fftw.org)
– Realistic benchmark that exercises connections between processor, memory and system I/O
– Up to 5 bits processing gain for extracting signals from noise
– Expect 1µsec/FFT (32 bit floating-point) on 6U COTS card ~7/05
  Assume each FFT computation requires 51,200 real operations
  51.2 GFLOPS (billions of floating point operations/sec) throughput
  1024 MSPS (million samples/sec, complex) sustained, simultaneous input & output (8 Gbytes/sec I&O)

[Image: COTS (commercial off-the-shelf) 6U multiprocessor card]
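The FFT reference point ties together operation count, throughput, sample rate and I/O bandwidth. A minimal sketch of that arithmetic, assuming the conventional 5·N·log2(N) operation count (which reproduces the 51,200 figure), 8 bytes per complex sample (two 32 bit floats), and reading "8 Gbytes/sec I&O" as roughly 8 Gbytes/sec in each direction. Variable names are illustrative.

```python
import math

N = 1024                       # FFT length (complex points)
FFTS_PER_SEC = 1_000_000       # one FFT every 1 usec
BYTES_PER_COMPLEX_SAMPLE = 8   # assumption: two 32 bit floats (real + imaginary)

# Conventional 5*N*log2(N) operation count for a complex FFT; it reproduces
# the 51,200 real operations assumed on the slide.
ops_per_fft = 5 * N * int(math.log2(N))

gflops = ops_per_fft * FFTS_PER_SEC / 1e9
msps = N * FFTS_PER_SEC / 1e6
gbytes_per_sec_each_way = N * FFTS_PER_SEC * BYTES_PER_COMPLEX_SAMPLE / 1e9

print(f"{ops_per_fft} operations per FFT")
print(f"{gflops:.1f} GFLOPS sustained")
print(f"{msps:.0f} MSPS complex")
print(f"{gbytes_per_sec_each_way:.3f} Gbytes/sec in, and the same out")
```

The 6U card's power density follows the same pattern: 54W over 0.76L gives about 71 W/L.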

SLIDE 21

Moore’s Law & Variations, 1965-1997

“Original” Moore’s Law (1965, revised 1975)

– 4X transistors/die every 3 yrs
– Held from late ’70s - late ’90s for DRAM (dynamic random access memory), the most common form of memory used in personal computers
– Improvements from decreasing geometry, “circuit cleverness,” & increasing die size
– Rates of speed increase & power consumption decrease not quantified

“Amended” Moore’s Law: 1997 National Technology Roadmap for Semiconductors (NTRS97)

– Models provided projections for 1997-2012
– Improvement rates of 1.4X speed @ constant power & 2.8X density (transistors per unit area) every 3 yrs
– For constant power, speed x density gave max 4X performance improvement every 3 yrs
– Incorrectly predicted 560 mm² DRAM die size for 2003 (4X actual)

Historically, Performance = 2^(Years/1.5)
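The historical formula and the NTRS97 "speed x density" product describe nearly the same rate. A minimal sketch comparing the two, assuming performance doubles every 1.5 years and that the NTRS97 speed and density factors simply multiply; the function and constant names are illustrative.

```python
def historical_performance(years):
    """Relative performance after `years`, doubling every 1.5 years."""
    return 2 ** (years / 1.5)


# NTRS97 improvement rates per 3 years, as quoted on the slide.
NTRS97_SPEED = 1.4     # speed at constant power
NTRS97_DENSITY = 2.8   # transistors per unit area

ntrs97_performance = NTRS97_SPEED * NTRS97_DENSITY
historical_3yr = historical_performance(3)

print(f"Historical: {historical_3yr:.2f}X in 3 yrs")
print(f"NTRS97 speed x density: {ntrs97_performance:.2f}X in 3 yrs")
```

The product comes to 3.92X, which the slide rounds to a maximum of 4X every 3 yrs.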

SLIDE 22

Moore’s Law Slowdown, 1999-2003 (recent experience with synchronous DRAM)

Availability issues: production did not come until 4 yrs after development for 1Gbit DDR (double data rate) SDRAMs (7/99 – 7/03)

SDRAM price crash

– 73X reduction in 2.7 yrs (11/99 – 6/02)
– Justice Dept. price-fixing investigation began in 2002

Reduced demand

– Users unable to take advantage of improvements as $3 SDRAM chip holds 1M lines of code having $100M development cost (6/02)
– Software issues made Moore’s Law seem irrelevant
  Moore’s Law impacted HW, not SW
  Old SW development methods unable to keep pace with HW improvements
  SW slowed at a rate faster than HW accelerated
  Fewer projects had HW on critical path
  In 2000, 25% of U.S. commercial SW projects ($67B) canceled outright with no final product
  4 yr NASA SW project canceled (9/02) after 6 yrs (& $273M) for being 5 yrs behind schedule

System-level improvement rates possibly slowed by factors not considered in Moore’s Law “roadmap” models
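The scale of the SDRAM price crash is easier to compare with the 4X-in-3-yrs improvement rate once it is annualized. A minimal sketch, assuming the 73X reduction was spread evenly over the quoted 2.7 yrs (an idealization, since real prices did not fall smoothly); variable names are illustrative.

```python
import math

PRICE_REDUCTION = 73.0   # 73X reduction quoted on the slide
PERIOD_YEARS = 2.7       # duration quoted on the slide

# Equivalent steady reduction factor per year.
annual_factor = PRICE_REDUCTION ** (1 / PERIOD_YEARS)

# Time for the price to halve at that steady rate, in months.
halving_time_months = 12 * PERIOD_YEARS * math.log(2) / math.log(PRICE_REDUCTION)

print(f"About {annual_factor:.1f}X price reduction per year")
print(f"Price halves about every {halving_time_months:.1f} months")
```

For comparison, 4X in 3 yrs corresponds to a doubling every 18 months.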

SLIDE 23

The End of Moore’s Law, 2004-20XX

2003 International Technology Roadmap for Semiconductors (ITRS03)

– Models provide projections for 2003-2018
– 2003 DRAM size listed as 139 mm² (1/4 the area predicted by NTRS97)
– Predicts that future DRAM die will be smaller than in 2003
– Improvement rates of 1.5X speed @ constant power & 2X density every 3 yrs
– Speed x density gives max 3X performance improvement every 3 yrs
– Limited by lithography improvement rate (partially driven by economics)

Future implications (DRAMs & other devices)

– Diminished “circuit cleverness” for mature designs (chip & card level)
– Die sizes have stopped increasing (and in some cases are decreasing)
– Geometry & power still decreasing, but at a reduced rate
– Fundamental limits (e.g., speed of light) may be many (more) years away
  Nearest-neighbor architectures
  3D structures
– Heat dissipation issues becoming more expensive to address
– More chip reliability & testability issues
– Influence of foundry costs on architectures may lead to fewer device types in latest technology (e.g., only SDRAMs and static RAM-based FPGAs)

Slower (but still substantial) improvement rate predicted, with greatest impact on systems having highest throughput & memory requirements
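The ITRS03 numbers can be checked the same way as the NTRS97 ones, and the gap between 3X and 4X per 3 yrs compounds over the roadmap's span. A minimal sketch, assuming the speed and density factors multiply and that each rate holds steadily over the 15 years from 2003 to 2018; names are illustrative.

```python
# ITRS03 improvement rates per 3 years, as quoted on the slide.
ITRS03_SPEED = 1.5
ITRS03_DENSITY = 2.0
itrs03_performance = ITRS03_SPEED * ITRS03_DENSITY

# DRAM die size for 2003: NTRS97 prediction vs. the ITRS03 listing.
NTRS97_PREDICTED_DIE_MM2 = 560
ITRS03_LISTED_DIE_MM2 = 139
die_ratio = NTRS97_PREDICTED_DIE_MM2 / ITRS03_LISTED_DIE_MM2

# Cumulative shortfall vs. the historical 4X per 3 yrs over 2003-2018.
HISTORICAL_RATE = 4.0
periods = (2018 - 2003) // 3
cumulative_gap = (HISTORICAL_RATE / itrs03_performance) ** periods

print(f"ITRS03 speed x density: {itrs03_performance:.1f}X in 3 yrs")
print(f"Predicted / listed die area: {die_ratio:.2f}")
print(f"Gap vs. 4X in 3 yrs after {periods} periods: {cumulative_gap:.1f}X")
```

The die-area ratio of about 4 matches the "1/4 the area" remark, and the compounded gap after five 3-yr periods is a little over 4X.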

SLIDE 24

High-Performance MPU (microprocessor unit) & ASIC (application-specific integrated circuit) Trends

Year of production                  2004     2007     2010     2013     2016
MPU/ASIC 1/2 pitch, nm                90       65       45       32       22
Transistors/chip                    553M    1106M    2212M    4424M    8848M
Max watts @ volts                158@1.2  189@1.1  218@1.0  251@0.9  288@0.8
Clock freq, MHz                    4,171    9,285   15,079   22,980   39,683
Clock freq, MHz, for 158W power    4,171    7,762   10,929   14,465   21,771

2003 International Technology Roadmap for Semiconductors

– http://public.itrs.net
– Executive summary tables 1i&j, 4c&d, 6a&b
– Constant 310 mm² die size

Lithography improvement rate (partially driven by economics) allows 2X transistors/chip every 3 yrs

– 1.5X speed @ constant power
– ~3X throughput for multiple independent ASIC (or FPGA) cores while maintaining constant power dissipation
– ~2X throughput for large-cache MPUs (constant throughput/memory), but power may possibly decrease with careful design
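The "for 158W power" row of the table appears to follow from the roadmap clock frequencies by scaling each one by 158W over that year's maximum power. A minimal sketch of that derating, assuming clock frequency scales linearly with the power budget; this rule is an inference that happens to reproduce the table row, not something the slide states. Names are illustrative.

```python
YEARS = [2004, 2007, 2010, 2013, 2016]
MAX_WATTS = [158, 189, 218, 251, 288]
CLOCK_MHZ = [4171, 9285, 15079, 22980, 39683]
TABLE_CONSTANT_POWER_MHZ = [4171, 7762, 10929, 14465, 21771]

BASE_WATTS = MAX_WATTS[0]

# Assumption: clock frequency scales linearly with the power budget.
derated_mhz = [round(f * BASE_WATTS / w) for f, w in zip(CLOCK_MHZ, MAX_WATTS)]

# Average speed improvement per 3-yr step at constant power.
periods = len(YEARS) - 1
speedup_per_3_yrs = (derated_mhz[-1] / derated_mhz[0]) ** (1 / periods)

for year, mhz in zip(YEARS, derated_mhz):
    print(f"{year}: {mhz} MHz at {BASE_WATTS}W")
print(f"Average speedup at constant power: {speedup_per_3_yrs:.2f}X per 3 yrs")
```

The geometric mean over the four 3-yr steps comes to about 1.5X, consistent with the "1.5X speed @ constant power" bullet, although the individual steps vary.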

SLIDE 25

Bottleneck Issues

Bottlenecks occur when interconnection bandwidth (e.g., processor-to-memory, bisection or system-level I/O) is inadequate to support the throughput for a given application

For embedded applications, I/O bottlenecks are a greater concern for general-purpose, highly interconnected back-end vs. special-purpose, channelized front-end processors

Can developers provide timely, cost-effective solutions to bottleneck problems?

SLIDE 26

Processor Bottlenecks at Device & System Levels

Device level (ITRS03)

– 2X transistors & 1.5X speed every 3 yrs
  High-performance microprocessor units & ASICs
  Constant power & 310 mm² die size
– 3X throughput every 3 yrs possible if chip is mostly logic gates changing state frequently (independent ASIC or FPGA cores)
– 2X throughput every 3 yrs is limit for microprocessors with large on-chip cache (chip is mostly SRAM & throughput/memory remains constant)
– Possible technical solutions for microprocessors: 3D structures, on-chip controller for external L3 cache

System level

– 54W budget for hypothetical 6U COTS card computing 32 bit floating-point 1K complex FFT every 1µsec
  10% (5W) DC-to-DC converter loss
  40% (22W) I/O (7 input & 7 output links @ 10 Gbits/sec & 1.5W ea., RF coax, 2004)
  50% (27W) processor (51 GFLOPS sustained) & memory (5 Gbytes)
– Possible technical solutions for I/O
  RF coax point-to-point serial links with central crosspoint switch network (PIN diodes or MEMS switches)
  Fiber optic links (may require optical free-space crosspoint switch) & optical chip-to-chip interconnects
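The system-level budget can be reproduced from the percentages and the per-link figures. A minimal sketch, assuming the percentages apply to the full 54W and that link bandwidth is the raw 10 Gbits/sec line rate with no allowance for encoding or protocol overhead (the slide does not say). Names are illustrative.

```python
CARD_BUDGET_W = 54.0

# Budget shares quoted on the slide.
shares = {
    "DC-to-DC converter loss": 0.10,
    "I/O": 0.40,
    "processor & memory": 0.50,
}
budget_w = {name: share * CARD_BUDGET_W for name, share in shares.items()}

# I/O links quoted on the slide.
LINKS_IN = 7
LINKS_OUT = 7
WATTS_PER_LINK = 1.5
GBITS_PER_SEC_PER_LINK = 10

link_power_w = (LINKS_IN + LINKS_OUT) * WATTS_PER_LINK
# Assumption: raw line rate, ignoring any encoding overhead.
raw_input_gbytes_per_sec = LINKS_IN * GBITS_PER_SEC_PER_LINK / 8

for name, watts in budget_w.items():
    print(f"{name}: {watts:.1f}W")
print(f"Link power: {link_power_w:.1f}W of the {budget_w['I/O']:.1f}W I/O share")
print(f"Raw input bandwidth: {raw_input_gbytes_per_sec:.2f} Gbytes/sec")
```

The 14 links account for 21W of the roughly 22W I/O share, and the raw input rate of 8.75 Gbytes/sec sits just above the 8 Gbytes/sec needed for the 1K FFT reference point.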

SLIDE 27

Examples of Hardware vs. Algorithms

Static RAM-based FPGAs

– 2002: system-level throughput improved substantially vs. 1999
– 2/3 of improvement attributable to new devices, 1/3 to architecture changes

Chess computers

– 1997: Deep Blue provided 40 trillion operations per second using 600nm custom ASICs (but 250nm was state-of-the-art)
– 2001: Desktop version of Deep Blue using state-of-the-art custom ASICs feasible, but not built
– 2002-2003: improved algorithms provide functional equivalent of Deep Blue using COTS servers instead of custom ASICs

Speedup provided by FFT & other “fast” algorithms

Contributions of HW vs. algorithms may be difficult to quantify, even when all necessary data are available

SLIDE 28

Cost vs. Time for Modern HW/SW Development Process (normalized to a constant funding level)

[Chart: Cost (effort & expenditures, 25% to 100%) vs. Time (SW release version 1 to 4). Legend: Software, Hardware, Management.]

Frequent SW-only “tech refresh” provides upgraded capabilities for fixed HW in satellites & space probes, ship-based missiles & torpedoes, radars, “software radios,” etc.

HW delivered with IOC SW

Initial operating capability SW has 12% HW utilization, allowing 8X growth over 9 yr lifetime (2X every 3 yrs): HW still programmable @ end-of-life
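The utilization headroom claim is a compound-growth calculation. A minimal sketch, assuming software demand on the hardware doubles steadily every 3 yrs from the 12% starting point; function and constant names are illustrative.

```python
import math

IOC_UTILIZATION = 0.12       # HW utilization at initial operating capability
GROWTH_FACTOR = 2.0          # 2X growth ...
GROWTH_PERIOD_YEARS = 3.0    # ... every 3 yrs
LIFETIME_YEARS = 9.0


def utilization(years):
    """HW utilization after `years`, assuming steady exponential SW growth."""
    return IOC_UTILIZATION * GROWTH_FACTOR ** (years / GROWTH_PERIOD_YEARS)


end_of_life = utilization(LIFETIME_YEARS)
years_to_full = GROWTH_PERIOD_YEARS * math.log2(1 / IOC_UTILIZATION)

print(f"Utilization at end of life: {end_of_life:.0%}")
print(f"Hardware saturates after about {years_to_full:.1f} yrs")
```

Starting at 12%, 8X growth reaches 96% at 9 yrs, so the hardware is nearly but not fully used at end-of-life.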

SLIDE 29

Timeline for Highest Performance COTS ADCs, 2Q04

[Plot: Sampling Rate (MSPS, 0.1 to 10,000, log scale) vs. Year (1986 to 2008)]

Dashed lines indicate upper bounds (all ADCs below a dashed line have Effective Number Of Bits indicated):

– 6- to 8-bit (ENOB>5.0)
– 10-bit (ENOB>7.6)
– 12-bit (ENOB>9.8)
– 14-bit (ENOB>11.8)
– 16-bit (ENOB>12.5)
– 24-bit (ENOB>15.3)

2X speed (up to 1/2 bit processing gain) in ~4.5 yrs for high-speed ADCs

~1/3 effective bit/yr @ 100 MSPS

4X in 3 yrs for ENOB ~10

SLIDE 30

Improvement Rates for Highest Performance COTS ADCs, 2Q04

[Plot: Effective Number of Bits (5 to 20) vs. Sampling Rate (million samples/sec, 0.1 to 10,000, log scale). Data grouped by period: 1985-89, 1990-94, 1995-99, 2000-04.]

Plot labels:

– 1/2 bit/octave maximum processing gain with linearization
– ~1/3 bit/yr for high-resolution ADCs @ 100 MSPS
– ~1 bit/octave slope
– ADC improvements @ ~100 KSPS limited by commercial audio market
– 2X speed (up to 1/2 bit processing gain) in ~4.5 yrs for high-speed ADCs

SLIDE 31

Evolution of COTS Embedded Multiprocessor Cards, 2Q04

[Log-log plot. Axes: Computation Efficiency (GFLOPS/Watt) and Computation Density (GFLOPS/Liter).]

GFLOPS (billions of 32-bit floating point operations/sec) sustained for 1K complex FFT

71 W/Liter limit for convection-cooled cards

Data-point dates: 11/01; 7/99; 3/00; 6/10; 3/99; 12/01; 10/04; 6/10; 8/02; 7/05; 6/10; 12/03

– Reconfigurable FPGA cards (~100 FLOPS/byte) improving 3X in 3 yrs
– Special-purpose ASIC cards (~10 FLOPS/byte) improving 3X in 3 yrs
– General-purpose RISC (with on-chip vector processor) cards (~10 FLOPS/byte) improving 2X in 3 yrs

SLIDE 32

Timeline for Highest Performance COTS Multiprocessors, 2Q04

[Plot: Sustained Throughput (GFLOPS, card-level, for 32-bit flt 1K cmplx FFT; 0.1 to 1000, log scale) vs. Year (1995 to 2015)]

Card-level Moore’s Law improvement rate was 4X in 3 yrs

6U form factor cards <55W

Plot labels: i860 µP 4@40 MHz; G4 400 MHz; G3 375 MHz; G4 500 MHz; G2 200 MHz; Quad PowerPC RISC; SHARC DSP; Virtex 1000-6 FPGA 2@100 MHz; 12@40 MHz; 18@40 MHz; Virtex II 6000 FPGA 3@150 MHz; Pathfinder1(2) ASIC 2@80(120) MHz; Quad PowerPC RISC with AltiVec; Future Virtex FPGA 3@250 MHz; Future G4 800 MHz; TM-44 ASIC 2@100 MHz; CS301 ASIC 8@166 MHz

ITRS2003 projections

– Future FPGAs & ASICs 3X in 3 yrs
– Future microprocessors 2X in 3 yrs

Open systems architecture goal: mix old & new general- & special-purpose cards, with upgrades as needed (a new card may replace four 3-yr-old cards)

SLIDE 33

Timeline for COTS Processor I&O Rate and ADC Sampling Rate (2Q04)

[Plot: Rate (MSPS, 1 to 10,000, log scale) vs. Year (1992 to 2010)]

Moore’s Law slope: 4X in 3 yrs

Effective number of bits @ sampling rate for high-speed, high-resolution analog-to-digital converters: 11.8, 10.6, 11.8, 10.8, 10.9, 10.5, 9.8, 10.6, ~10; Future ADC

Card-level I&O cmplx sample rate sustained for 32 bit flt-pt 1K cmplx FFT (1000 MSPS for 50 GFLOPS)

Highest-performance 6U form factor multiprocessor cards <55W

– Programmable microprocessors, digital signal processors & reduced instruction set computers: i860 µP, SHARC DSP, PowerPC RISC with AltiVec, Future AltiVec (2X in 3yrs)
– Field-programmable gate arrays: Virtex FPGA, Virtex II FPGA, Future FPGA (3X in 3yrs)
– ASICs: Pathfinder-1 ASIC, TM-44 Blackbird ASIC, CS301 ASIC, Future ASIC

Open systems architecture goal: mix old & new general- & special-purpose cards, with upgrades as needed (a new card may replace four 3-yr-old cards)