Computing - Big Impact in the 21st Century, Wen-mei Hwu - PowerPoint PPT Presentation



slide-1
SLIDE 1

Computing - Big Impact in the 21st Century

Wen-mei Hwu Professor and Sanders-AMD Chair, ECE University of Illinois at Urbana-Champaign

slide-2
SLIDE 2

  • 1988: Start of the Hwu Family
  • 2016: Yale Wins Franklin Medal

slide-3
SLIDE 3

Intel 286 vs. a current processor:

  • 134K vs. 12.1B transistors
  • 12 MHz vs. 1.1 GHz
  • 1.5 um vs. 12 nm
  • 2.7 MIPS (needs 287 for FP) vs. 14 TFLOPS
  • 1 MB DRAM vs. 16 GB HBM

slide-4
SLIDE 4

The Industry Landscape

  • Apple II
  • Sony, DEC, IBM, Intel, and Microsoft
  • iPhone X
  • Samsung, Apple, NVIDIA, Amazon, Google, and Facebook

slide-5
SLIDE 5

Innovation

A high-value concept in the right historical context

slide-6
SLIDE 6

Important Innovations in Recent History

  • Telescope
  • Microscope
  • Electricity
  • Telephone
  • Medical imaging
  • Electrical Motor
  • Automobiles
  • Airplane
  • Credit cards
  • Radar
  • Clean Energy
  • Mobile phones
  • Internet and search engine
  • eCommerce
  • Social media
  • GPS

slide-7
SLIDE 7

Future innovations will rely heavily on computing

slide-8
SLIDE 8

On May 11, 1997, IBM Deep Blue defeated world chess champion Garry Kasparov

slide-9
SLIDE 9

On Feb 16, 2011, IBM Watson defeated two world champions of Final Jeopardy!

slide-10
SLIDE 10

On March 15, 2016, Google AlphaGo defeated South Korean Go grandmaster Lee Sedol

slide-11
SLIDE 11

In Jan 2017, CMU Libratus beat professional players in heads-up no-limit Texas hold’em poker

slide-12
SLIDE 12

On Jan 24, 2019, Google AlphaStar defeated human pros at the real-time strategy game StarCraft II

slide-13
SLIDE 13

On Feb 11, 2019, IBM Project Debater debated a European world champion

slide-14
SLIDE 14

Hardware for Watson Jeopardy! 2011

  • 90 x IBM Power 750 servers
  • 2880 Power7 cores
  • 3.55 GHz clock
  • 80 TeraFLOPS
  • 15 Terabytes of DRAM
  • 20 Terabytes of disk
  • 10 Gb Ethernet network
  • > 100,000 Watt power
slide-15
SLIDE 15

Software for Watson Jeopardy! 2011

“Watson DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.”

slide-16
SLIDE 16

Novelty vs. Great Product

  • German Flocken Elektrowagen of 1888, regarded as the first electric car in the world
  • American Tesla Model X of 2017, whose producer is worth more than GM and Ford

slide-17
SLIDE 17

A Simplified View of IBM Newell with NVIDIA Volta GPUs

  • CPU Host (~1 TFLOPS)
  • DDR Memory System (~100 GB)
  • GPU 1 (~14 TFLOPS) with HBM2 (~16 GB)
  • GPU 2 (~14 TFLOPS) with HBM2 (~16 GB)
  • Storage (~10 TB), 16 GB/s
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)

slide-18
SLIDE 18

Data Access Challenge (HBM2)

  • Volta: 14.03 SP TFLOPS
  • HBM2: 16 GB, 900 GB/s
  • At 4 bytes per SP operand, 900 GB/s delivers 225 Giga SP operands/s

Each operand must be used 62.3 times once fetched to achieve the peak FLOPS rate; otherwise the GPU sustains < 1.6% of peak without data reuse.
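The reuse figure on this slide follows from dividing the peak compute rate by the rate at which operands arrive. The sketch below is a minimal illustration of that arithmetic, assuming 4-byte single-precision operands and one floating-point operation per fetched operand when there is no reuse. The function names are illustrative and do not come from the talk.

```python
def operand_rate(bandwidth_bytes_per_s, bytes_per_operand=4):
    """Operands delivered per second at a given memory bandwidth."""
    return bandwidth_bytes_per_s / bytes_per_operand


def required_reuse(peak_flops, bandwidth_bytes_per_s, bytes_per_operand=4):
    """Uses per fetched operand needed to keep the compute units at peak."""
    return peak_flops / operand_rate(bandwidth_bytes_per_s, bytes_per_operand)


PEAK_FLOPS = 14.03e12      # Volta SP peak, from the slide
HBM2_BANDWIDTH = 900e9     # bytes per second, from the slide

rate = operand_rate(HBM2_BANDWIDTH)
reuse = required_reuse(PEAK_FLOPS, HBM2_BANDWIDTH)
# Assumption: with no reuse, each fetched operand feeds exactly one operation.
fraction_without_reuse = 1.0 / reuse

print(f"{rate / 1e9:.0f} G operands/s")
print(f"required reuse: {reuse:.1f}x")
print(f"peak fraction with no reuse: {fraction_without_reuse:.2%}")
```

Changing `HBM2_BANDWIDTH` to another link's bandwidth gives the corresponding figure for that link under the same assumptions.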

slide-19
SLIDE 19

Graph Analytics Example – if graph fits into GPU Memory

  • CPU Host (~1 TFLOPS)
  • DDR Memory System (~100 GB)
  • GPU 1 (~10 TFLOPS) with HBM2 (~16 GB)
  • GPU 2 (~10 TFLOPS) with HBM2 (~16 GB)
  • Storage (~10 TB), 16 GB/s
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)

~100 GOPS sustained

slide-20
SLIDE 20

Data Access Challenge (Host DDR3)

  • Volta: 14.03 SP TFLOPS
  • Host DDR3: 128-512 GB
  • NVLINK: 80 GB/s
  • At 4 bytes per SP operand, 80 GB/s delivers 20 Giga SP operands/s

Each operand must be used 700 times once fetched to achieve the peak FLOPS rate; otherwise the GPU sustains < 0.14% of peak without data reuse.

slide-21
SLIDE 21

Graph Analytics Example – if graph fits into Host DDR Memory

  • CPU Host (~1 TFLOPS)
  • DDR Memory System (~100 GB)
  • GPU 1 (~10 TFLOPS) with HBM2 (~16 GB)
  • GPU 2 (~10 TFLOPS) with HBM2 (~16 GB)
  • Storage (~10 TB), 16 GB/s
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)

~10 GOPS sustained

Tremendous loss of both performance and energy efficiency

slide-22
SLIDE 22

Data Access Challenge (FLASH SSD)

  • Volta: 14.03 SP TFLOPS
  • FLASH: 1,000-5,000 GB
  • PCIe 3: 16 GB/s
  • At 4 bytes per SP operand, 16 GB/s delivers 4 Giga SP operands/s

Each operand must be used 3,507 times once fetched to achieve the peak FLOPS rate; otherwise the GPU sustains < 0.03% of peak without data reuse.
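The three Data Access Challenge slides apply the same bound to three levels of the memory hierarchy. As a rough sketch, the code below tabulates them side by side with a simple bandwidth-bound model: throughput is the operand delivery rate times the reuse factor, capped at the compute peak. It assumes 4-byte single-precision operands and one operation per operand use. The tier labels and function names are illustrative, and the model ignores latency, caching, and access granularity.

```python
PEAK_FLOPS = 14.03e12        # Volta SP peak, from the slides
BYTES_PER_OPERAND = 4        # assumption: single-precision operands

# Bandwidths in bytes per second, from the slides.
TIERS = {
    "HBM2": 900e9,
    "Host DDR3 over NVLink": 80e9,
    "Flash SSD over PCIe 3": 16e9,
}


def attainable_flops(reuse, bandwidth_bytes_per_s):
    """Bandwidth-bound throughput for a given reuse factor, capped at peak."""
    operands_per_s = bandwidth_bytes_per_s / BYTES_PER_OPERAND
    return min(PEAK_FLOPS, reuse * operands_per_s)


def break_even_reuse(bandwidth_bytes_per_s):
    """Reuse factor at which the bandwidth bound meets the compute peak."""
    return PEAK_FLOPS * BYTES_PER_OPERAND / bandwidth_bytes_per_s


for name, bandwidth in TIERS.items():
    no_reuse = attainable_flops(1, bandwidth)
    print(f"{name:24s} break-even reuse {break_even_reuse(bandwidth):8.1f}  "
          f"no-reuse throughput {no_reuse / 1e9:6.1f} GFLOPS "
          f"({no_reuse / PEAK_FLOPS:.3%} of peak)")
```

Under this model, each step down the hierarchy raises the reuse needed to stay compute-bound in proportion to the drop in bandwidth.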

slide-23
SLIDE 23

Graph Analytics Example – if graph is accessed from storage

  • CPU Host (~1 TFLOPS)
  • DDR Memory System (~100 GB)
  • GPU 1 (~10 TFLOPS) with HBM2 (~16 GB)
  • GPU 2 (~10 TFLOPS) with HBM2 (~16 GB)
  • Storage (~10 TB), 16 GB/s
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)

< 1 GOPS sustained

slide-24
SLIDE 24

Erudite Vision: remove file system from data access path

  • CPU Host (~1 TFLOPS)
  • DDR/Flash Memory System (~10 TB)
  • GPU 1 (~14 TFLOPS) with HBM2 (16 GB)
  • GPU 2 (~14 TFLOPS) with HBM2 (16 GB)
  • Storage (~10 TB), 16 GB/s
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)
  • Diagram label: software/DMA

ASPLOS 2016, OOPSLA 2017, ASPLOS 2019

slide-25
SLIDE 25

FlatFlash – Storage-class Memory

Figure panels: Traditional vs. FlatFlash

ASPLOS ’19: Abulila, Mailthody, Qureshi, Xiong, Huang, Kim, Hwu

slide-26
SLIDE 26

Erudite Vision: place NMA compute inside memory system

  • CPU Host (~1 TFLOPS)
  • GPU 1 (~10 TFLOPS) with HBM2 (16 GB)
  • GPU 2 (~10 TFLOPS) with HBM2 (16 GB)
  • DDR/Flash Memory System (~10 TB) with 100+ GFLOPS NMA compute, proportional to data capacity
  • Link bandwidths shown in the diagram: 80 GB/s (three links), 100 GB/s, 900 GB/s (two links)
  • Compute kernel launched from CPU and GPU

IEEE MICRO 2017

slide-27
SLIDE 27

DeepStore: In-Storage Acceleration for Intelligent Image Search

Hard to Build Index for Intelligent Image-Based Search Applications

  • Each image has multiple features
  • Different apps query different features

slide-28
SLIDE 28

Case Study: Person Re-Identification

  • Stages: offline preprocessing, online query for one image, online comparison for all images
  • Operations: 2 convolutions, 1 matrix multiplication, 2 matrix additions, 2 comparisons

slide-29
SLIDE 29

Some Predictions for Yale:100

  • Prominent companies will be very different from today.
  • Prominent products will be very different from today.
  • The role of universities will be very different from today.
  • We will still complain about ISCA and MICRO reviews…
  • We will still come to Barcelona.

slide-30
SLIDE 30

Way to go, Yale!
