ASIC Clouds: Specializing the Datacenter Ikuo Magaki, Moein - - PowerPoint PPT Presentation



slide-1
SLIDE 1

ASIC Clouds: Specializing the Datacenter

Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor UC San Diego and Toshiba

Presented By: Vandit Agarwal

slide-2
SLIDE 2

Motivation

  • GPU and FPGA based clouds already successful
  • Even ASIC Clouds have been successfully used
  • Take this idea ahead to form ASIC based clouds for other applications
  • Purpose built Datacenter
  • Large arrays of ASIC accelerators
  • Optimize Total Cost of Ownership (TCO)
  • For increasingly common high-volume chronic computations
  • Downside:
  • High Non Recurring Engineering (NRE)
  • Inflexibility
slide-3
SLIDE 3

Introduction

  • Two visible trends:
  • Heavy work done on cloud; interactive moved to client
  • Rise of dark silicon - specialization and near-threshold computation
  • Conjunction of these two trends proved viable
  • On a single-machine level, ASICs can offer at least an order of magnitude improvement - explore and propose ASIC cloud

  • Identify key issues by studying Bitcoin ASIC Cloud
slide-4
SLIDE 4

Objective In a Nutshell

  • Two key metrics drive the development:
  • H/w cost per performance = $ per op/s
  • Energy per operation = W per op/s
  • Working with a joint knowledge/control over datacenter and h/w design
  • Select single TCO-optimal point amongst many Pareto-optimal points
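
The selection step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `DesignPoint` type, the sample numbers, and the linear TCO model (hardware cost plus a lifetime dollar cost per watt, `dollars_per_watt`) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    name: str
    dollars_per_ops: float  # hardware cost per performance, $ per op/s
    watts_per_ops: float    # energy per operation, W per op/s


def pareto_front(points):
    """Keep the points that no other point beats on both metrics (lower is better)."""
    front = []
    for p in points:
        dominated = any(
            q.dollars_per_ops <= p.dollars_per_ops
            and q.watts_per_ops <= p.watts_per_ops
            and (q.dollars_per_ops < p.dollars_per_ops
                 or q.watts_per_ops < p.watts_per_ops)
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def tco_per_ops(point, dollars_per_watt):
    # Hypothetical linear TCO model: hardware cost plus the lifetime cost of each watt.
    return point.dollars_per_ops + dollars_per_watt * point.watts_per_ops


def tco_optimal(points, dollars_per_watt):
    """Pick the single Pareto-optimal point with the lowest modeled TCO."""
    return min(pareto_front(points), key=lambda p: tco_per_ops(p, dollars_per_watt))


# Made-up sample points, for illustration only.
points = [
    DesignPoint("cost-optimal", 1.0, 4.0),
    DesignPoint("balanced", 2.0, 2.0),
    DesignPoint("energy-optimal", 4.0, 1.0),
    DesignPoint("dominated", 3.0, 3.0),
]
print(tco_optimal(points, dollars_per_watt=1.0).name)
```

Under this toy model the chosen point shifts toward the energy-optimal end as the assumed cost per watt grows, which is why a single TCO-optimal point depends on datacenter-level costs and not on the chip alone.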
slide-5
SLIDE 5

Specialization Hierarchy

  • ASIC Design: achieves reduction in silicon area and energy consumption
  • ASIC Server: organization of ASIC, heat sinks, selective components, custom voltages
  • ASIC Datacenter: optimize rack and datacenter level thermal distribution, costs such as provisioning cost, availability, taxes etc.

**To meet the requirements at datacenter level, modifications trickle down in the hierarchy

[Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]


slide-6
SLIDE 6

ASIC Cloud Architecture

  • Trying to create a generic skeleton for ASIC Cloud
  • Heart of ASIC cloud - Replicated Compute Accelerator (RCA) - multiplied recursively
  • Customization: e.g., if RCA requires DRAM, then ASIC contains shared DRAM controllers connected to ASIC-local DRAMs

[Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]

slide-7
SLIDE 7

ASIC Server Overview

  • Focussed on 1U 19-inch rackmount servers
  • Forced-air cooling system
  • Air intake from front, removal from back
  • Air at 30 °C
slide-8
SLIDE 8

ASIC Server Evaluation Flow

  • Given an implementation and architecture for target RCA:

  • VLSI tools used to map it to target process
  • Analysis tools provide info on:
  • Area
  • Performance
  • Power density
  • Tune the following to find lowest TCO:
  • No. of RCAs/Chip
  • No. of chips/PCB
  • Organization of chips on PCB
  • Power delivery mechanism
  • Cooling mechanism
  • Choice of voltage
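
The tuning step above amounts to a search over a design space. A minimal sketch of an exhaustive sweep follows; `sweep`, `toy_tco`, and every constant in the toy model are hypothetical placeholders, and the real flow would call VLSI and thermal analysis tools instead of this made-up formula.

```python
from itertools import product


def sweep(tco_model, **knobs):
    """Evaluate every combination of knob values; return (best_config, best_tco) or None."""
    names = list(knobs)
    best = None
    for values in product(*(knobs[name] for name in names)):
        config = dict(zip(names, values))
        tco = tco_model(**config)
        if tco is None:  # the model reports an infeasible design
            continue
        if best is None or tco < best[1]:
            best = (config, tco)
    return best


def toy_tco(rcas_per_chip, chips_per_pcb, voltage):
    """Placeholder cost model in arbitrary units; NOT the model from the paper."""
    n_rcas = rcas_per_chip * chips_per_pcb
    perf = n_rcas * voltage            # assume performance scales with voltage
    power = n_rcas * voltage ** 3      # assume power scales with voltage cubed
    if power > 100.0:                  # pretend thermal limit for the server
        return None
    hardware = 2.0 * n_rcas + 5.0 * chips_per_pcb + 50.0
    energy = 3.0 * power
    return (hardware + energy) / perf  # modeled TCO per unit of performance


best = sweep(
    toy_tco,
    rcas_per_chip=[10, 20],
    chips_per_pcb=[4, 8],
    voltage=[0.5, 0.7, 0.9],
)
print(best)
```

The remaining knobs from the list (chip organization, power delivery, cooling) could be added as further keyword arguments, provided the cost model accepts them.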
slide-9
SLIDE 9

Thermally-Aware ASIC Server Design

  • ASICs and DC/DC converters - major sources of heat
  • Heat Sinks:
  • Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM)
  • Spreader has fins - air is blown through them
  • Increasing spreader size improves cooling
  • Increasing the die size improves cooling - overcomes TIM resistance
  • Developed a model:
  • Input: fan curve, ASIC count/row
  • Output: Optimal heat sink parameters
slide-10
SLIDE 10

Arranging ASICs on PCB

slide-11
SLIDE 11

More Chips vs Fewer Chips

  • How large (in mm²) should each chip be?
  • Determines how many RCAs will be on each chip
  • Many small ASICs easier to cool than few large ASICs
  • Increasing silicon area -> heat dissipation capacity increases (TIM)
  • Large total die area in a row is effective
  • Increasing no. of chips increases the packaging cost but not by much

slide-12
SLIDE 12

Power Density and Server Cost

  • Given same RCA, increasing watts increases performance
  • Moving right (high power density), very little total silicon per lane (due to temperature constraints) and must be divided into many smaller chips
  • Cooling and packaging cost
  • Moving left (low power density), more silicon per lane and fewer chips

  • Silicon area cost
slide-13
SLIDE 13

Bitcoin

  • Semi-anonymously and securely transfer money
  • Blockchain - globally replicated public ledger of transactions
  • A distributed consensus algorithm called Byzantine Fault Tolerance determines whose transactions are added to the blockchain
  • Mining:
  • Machines request work from a pool server
  • Hash - brute force attempt at partial inversion of a cryptographically hard hash function
  • Hashrate - rate of hashing - typically gigahashes per second (GH/s)
  • On success, other machines verify, accept, and append the block

slide-14
SLIDE 14

What Led to Bitcoin ASIC Cloud?

  • People are incentivized to mine:
  • More machines = more secure system
  • Block reward (25 BTC = ~USD 11k in 2016)
  • 144 blocks daily x 25 BTC per block = ~USD 1.5M daily
  • Rising TCO justifies the increased investment in NRE and other development cost
  • Leads to more specialization
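
The daily reward figure can be checked with a few lines of arithmetic. The inputs below are the rounded values quoted on the slide; the variable names are introduced here only for illustration.

```python
BLOCKS_PER_DAY = 144       # from the slide (one block about every 10 minutes)
REWARD_BTC = 25            # block reward quoted on the slide
REWARD_USD = 11_000        # approximate 2016 value of one block reward, from the slide

usd_per_btc = REWARD_USD / REWARD_BTC       # implied exchange rate
btc_per_day = BLOCKS_PER_DAY * REWARD_BTC   # total BTC issued per day
usd_per_day = btc_per_day * usd_per_btc     # total daily reward in USD

print(f"{btc_per_day} BTC/day at ~USD {usd_per_btc:.0f}/BTC = ~USD {usd_per_day:,.0f}/day")
```

With these rounded inputs the total comes to roughly USD 1.58M per day, consistent with the ~USD 1.5M quoted above.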
slide-15
SLIDE 15

Bitcoin ASIC Trend

Difficulty

slide-16
SLIDE 16

Implementation

  • 0.66 mm² silicon in UMC 28-nm process.
  • Power density: 2 W/mm²
  • Extremely high power density
slide-17
SLIDE 17

Results

  • More silicon -> optimal voltage decreases -> server efficiency increases
  • Initially, costs reduce (right to left) but then silicon costs start building up

slide-18
SLIDE 18

Voltage Stacking

  • DC/DC power is significant
  • Chips serially chained so that their supplies sum to 12 V
  • Leads to significant savings in TCO optimal case
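
A small sketch of the stacking arithmetic: how many chips fit in one series chain under a 12 V rail, and how the rail current compares with feeding the same chips in parallel at their core voltage. The function names and the example operating point (0.5 V, 2 W per chip) are assumptions for illustration, and the model assumes every chip in the chain draws the same power.

```python
def stack_size(supply_mv: int, chip_mv: int) -> int:
    """Number of chips in series whose core voltages fit under the supply rail.

    Voltages are integers in millivolts to avoid floating-point rounding.
    """
    if chip_mv <= 0 or chip_mv > supply_mv:
        raise ValueError("chip voltage must be positive and no larger than the supply")
    return supply_mv // chip_mv


def rail_current_amps(chip_watts: float, chip_mv: int, n_chips: int, stacked: bool) -> float:
    """Current drawn by n_chips identical chips.

    In a series stack the same current flows through every chip, so the chain
    draws one chip's current; in parallel the currents add up.
    """
    per_chip_amps = chip_watts / (chip_mv / 1000)
    return per_chip_amps if stacked else per_chip_amps * n_chips


n = stack_size(12_000, 500)  # hypothetical 0.5 V core voltage under a 12 V rail
print(n, rail_current_amps(2.0, 500, n, stacked=True), rail_current_amps(2.0, 500, n, stacked=False))
```

In this idealized view the stack draws the same total power at 12 V with a fraction of the current, which is the reason the per-chip DC/DC conversion stage can be dropped or reduced.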

slide-19
SLIDE 19

Litecoin ASIC Cloud

slide-20
SLIDE 20

Video Transcoding ASIC Cloud

**Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages

slide-21
SLIDE 21

CNN ASIC Cloud

slide-22
SLIDE 22

When is ASIC Cloud Feasible

slide-23
SLIDE 23

Discussion

  • This is one of the earlier attempts to create a general framework/skeleton for an ASIC cloud. How feasible do you think this technology is, and how widely and how soon can we potentially adopt it for a large variety of applications?
  • The authors recommend that cloud providers and silicon foundries open-source various tools, which would potentially lead to lower TCO. Is this a good solution? Why or why not?
  • Which do you think is better: investing heavily (high NRE) in more advanced nodes (e.g., 16 nm) or using/modifying older nodes (e.g., 65 nm) in an ASIC?
slide-24
SLIDE 24

Bitcoin ASIC Cloud Design

  • Repeatedly execute a Bitcoin hash operation
  • Input: 512 bit block
  • Mutate the block and perform SHA256 on it
  • Fed into another round of SHA256
  • Leading zero count performed and matched with the target

  • 64 rounds in each SHA
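
The hash operation listed above can be sketched in software with Python's standard `hashlib`. This is a simplified illustration, not the real Bitcoin block-header format or the ASIC datapath: mutating the block by overwriting its last four bytes with a nonce, the all-zero sample block, and the helper names are assumptions for the example.

```python
import hashlib
import struct


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a digest, read as a big-endian integer."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def double_sha256(block: bytes, nonce: int) -> bytes:
    """Mutate the block with a nonce, then apply SHA-256 twice."""
    # Assumption for this sketch: the nonce replaces the last 4 bytes of the block.
    mutated = block[:-4] + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(mutated).digest()).digest()


def mine(block: bytes, target_zero_bits: int, max_nonce: int = 1 << 20):
    """Brute-force nonces until the hash has at least target_zero_bits leading zeros."""
    for nonce in range(max_nonce):
        if leading_zero_bits(double_sha256(block, nonce)) >= target_zero_bits:
            return nonce
    return None  # no nonce found within the search budget


block = bytes(64)  # placeholder 512-bit input, all zeros
nonce = mine(block, target_zero_bits=8)
print(nonce, double_sha256(block, nonce).hex())
```

Each extra required zero bit doubles the expected number of attempts, which is why the work is dominated by raw hash throughput and maps well onto replicated hardware.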