ASIC Clouds: Specializing the Datacenter


Nov 21, 2023


SLIDE 1

ASIC Clouds: Specializing the Datacenter

Ikuo Magaki, Moein Khazraee, Luis Vega Gutierrez, and Michael Bedford Taylor UC San Diego and Toshiba

Presented By: Vandit Agarwal

SLIDE 2

Motivation

GPU- and FPGA-based clouds are already successful
Even ASIC clouds have been used successfully
Extend this idea to build ASIC-based clouds for other applications
Purpose built Datacenter
Large arrays of ASIC accelerators
Optimize Total Cost of Ownership (TCO)
For increasingly common high-volume chronic computations
Downside:
High Non-Recurring Engineering (NRE) cost
Inflexibility

SLIDE 3

Introduction

Two visible trends:
Heavy work done on cloud; interactive moved to client
Rise of dark silicon - specialization and near-threshold computation
Conjunction of these two trends proved viable
At the single-machine level, ASICs can offer at least an order of magnitude improvement - explore and propose ASIC cloud

Identify key issues by studying Bitcoin ASIC Cloud

SLIDE 4

Objective In a Nutshell

Two key metrics drive the development:
H/w cost per performance = $ per op/s
Energy per operation = W per op/s
Working with joint knowledge of and control over datacenter and h/w design
Select the single TCO-optimal point amongst many Pareto-optimal points
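The selection step above can be sketched in a few lines. This is a minimal illustration, not the authors' TCO model: it assumes each candidate server design is summarized by the two metrics on this slide, and that TCO per op/s is simply the hardware cost plus an assumed lifetime cost per watt. The names `Design`, `dollars_per_watt`, and the candidate numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    cost_per_ops: float   # hardware cost per performance, $ per op/s
    watts_per_ops: float  # energy per operation, W per op/s


def pareto_front(designs):
    """Keep the designs that no other design beats on both metrics."""
    front = []
    for d in designs:
        dominated = any(
            o.cost_per_ops <= d.cost_per_ops
            and o.watts_per_ops <= d.watts_per_ops
            and (o.cost_per_ops < d.cost_per_ops
                 or o.watts_per_ops < d.watts_per_ops)
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front


def tco_per_ops(design, dollars_per_watt):
    # Assumed simplified model: hardware cost plus the lifetime cost of
    # supplying and cooling each watt (dollars_per_watt is hypothetical).
    return design.cost_per_ops + design.watts_per_ops * dollars_per_watt


def tco_optimal(designs, dollars_per_watt):
    """Pick the Pareto-optimal design with the lowest modeled TCO."""
    return min(pareto_front(designs),
               key=lambda d: tco_per_ops(d, dollars_per_watt))


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Design("cheap-hot", 1.0, 4.0),
    Design("balanced", 2.0, 2.0),
    Design("pricey-cool", 4.0, 1.0),
    Design("dominated", 3.0, 3.0),
]
```

With these made-up numbers, a low cost per watt favours the cheap, power-hungry design and a high cost per watt favours the efficient one, which is why the TCO-optimal point depends on datacenter-level costs and not on the chip alone.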

SLIDE 5

Specialization Hierarchy

ASIC Design: achieves reduction in silicon area and energy consumption
ASIC Server: organization of ASICs, heat sinks, selective components, custom voltages
ASIC Datacenter: optimize rack and datacenter level thermal distribution, costs such as provisioning cost, availability, taxes etc.

**To meet the requirements at datacenter level, modifications trickle down in the hierarchy

[Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]

SLIDE 6

ASIC Cloud Architecture

Trying to create a generic skeleton for ASIC Cloud
Heart of ASIC cloud - Replicated Compute Accelerator (RCA) - multiplied recursively
Customization: e.g. if RCA requires DRAM, then ASIC contains shared DRAM controllers connected to ASIC-local DRAMs

[Figure labels: Off-PCB Interface, On-PCB Network, On-ASIC Interconnection Network]

SLIDE 7

ASIC Server Overview

Focussed on 1U 19-inch rackmount servers
Forced air-cooling system
Air intake from front, removal from back
Air at 30 °C

SLIDE 8

ASIC Server Evaluation Flow

Given an implementation and architecture for target RCA:

VLSI tools used to map it to target process
Analysis tools provide info on:
Area
Performance
Power density
Tune the following to find lowest TCO:
No. of RCAs/Chip
No. of chips/PCB
Organization of chips on PCB
Power delivery mechanism
Cooling mechanism
Choice of voltage

SLIDE 9

Thermally-Aware ASIC Server Design

ASICs and DC/DC converters - major sources of heat
Heat Sinks:
Heat spreader glued to the heat source (die) using Thermal Interface Material (TIM)
Spreader has fins - air is blown through them
Increasing spreader size improves cooling
Increasing the die size improves cooling - overcomes TIM resistance
Developed a model:
Input: fan curve, ASIC count/row
Output: Optimal heat sink parameters
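Why a larger die helps can be seen from the standard conduction formula R = t / (k * A): the thermal resistance of the TIM layer falls as the contact area grows. The sketch below is not the authors' heat sink model; it is a first-order series-resistance estimate, and the default TIM thickness, TIM conductivity, and heat sink resistance are hypothetical values chosen only for illustration. The 30 °C inlet temperature is taken from the server overview slide.

```python
def tim_resistance(thickness_m, conductivity_w_per_mk, die_area_mm2):
    """Conduction resistance of the TIM layer in K/W: R = t / (k * A)."""
    area_m2 = die_area_mm2 * 1e-6  # mm^2 -> m^2
    return thickness_m / (conductivity_w_per_mk * area_m2)


def junction_temp_c(power_w, die_area_mm2, inlet_c=30.0,
                    sink_resistance=0.5,      # K/W, hypothetical
                    tim_thickness_m=50e-6,    # 50 um, hypothetical
                    tim_k=4.0):               # W/(m*K), hypothetical
    """First-order die temperature: inlet air plus the temperature rise
    across the TIM and the heat sink, treated as resistances in series."""
    r_tim = tim_resistance(tim_thickness_m, tim_k, die_area_mm2)
    return inlet_c + power_w * (r_tim + sink_resistance)


if __name__ == "__main__":
    # Same power dissipated by a 100 mm^2 die and a 200 mm^2 die.
    for area in (100.0, 200.0):
        print(area, "mm^2 ->", round(junction_temp_c(20.0, area), 2), "C")
```

Under these assumptions, doubling the die area halves the TIM term, so the same power runs cooler on the larger die.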

SLIDE 10

Arranging ASICs on PCB

SLIDE 11

More Chips vs Fewer Chips

How large (in mm²) should each chip be?
Determines how many RCAs will be on each chip
Many small ASICs easier to cool than few large ASICs
Increasing silicon area -> heat dissipation capacity increases (TIM)
Large total die area in a row is effective
Increasing no. of chips increases the packaging cost but not by much

SLIDE 12

Power Density and Server Cost

Given the same RCA, increasing Watts increases performance
Moving right (high power density), very little total silicon per lane (due to temperature constraints) and must be divided into many smaller chips
Cooling and packaging cost
Moving left (low power density), more silicon per lane and fewer chips
Silicon area cost

SLIDE 13

Bitcoin

Semi-anonymously and securely transfer money
Blockchain - globally replicated public ledger of transactions
A distributed consensus algorithm called Byzantine Fault Tolerance determines whose transactions are added to the blockchain
Mining:
Machines request work from a pool server
Hash - brute force attempt at partial inversion of a cryptographically hard hash function
Hashrate - rate of hashing - typically in gigahashes per second (GH/s)
On success, other machines verify, accept and append the block

SLIDE 14

What Led to Bitcoin ASIC Cloud?

People are incentivized to mine:
More machines = more secure system
Block reward (25 BTC = ~USD 11k in 2016)
144 blocks daily x 25 BTC per block = ~USD 1.5M daily
Rising TCO justifies the increased investment in NRE and other development cost
Leads to more specialization
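The daily figure on this slide can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above (144 blocks per day, 25 BTC per block, roughly USD 11k per reward in 2016); the variable names are illustrative.

```python
BLOCKS_PER_DAY = 144     # from the slide (one block roughly every 10 minutes)
REWARD_BTC = 25          # block reward in 2016, from the slide
REWARD_USD = 11_000      # approximate USD value of one reward, from the slide

usd_per_btc = REWARD_USD / REWARD_BTC    # implied exchange rate: 440.0
daily_btc = BLOCKS_PER_DAY * REWARD_BTC  # 3600 BTC per day
daily_usd = BLOCKS_PER_DAY * REWARD_USD  # 1,584,000 USD per day

if __name__ == "__main__":
    print(f"{daily_btc} BTC/day at ~{usd_per_btc:.0f} USD/BTC "
          f"= ~{daily_usd / 1e6:.2f}M USD/day")
```

The result, about USD 1.58M per day, matches the slide's rounded ~USD 1.5M.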

SLIDE 15

Bitcoin ASIC Trend

[Figure; axis label: Difficulty]

SLIDE 16

Implementation

0.66 mm² of silicon in UMC 28-nm process
Power density: 2 W/mm²
Extremely high power density
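To see what this power density means per accelerator, the two figures above can be multiplied out. The sketch below uses only the slide's area and power density; the count of 100 RCAs per chip is a hypothetical example, not a number from the talk.

```python
RCA_AREA_MM2 = 0.66            # from the slide
POWER_DENSITY_W_PER_MM2 = 2.0  # from the slide


def rca_power_w():
    """Power drawn by one RCA at the quoted power density."""
    return RCA_AREA_MM2 * POWER_DENSITY_W_PER_MM2


def chip_power_w(num_rcas):
    """Power of a chip made of num_rcas RCAs, ignoring non-RCA logic."""
    return num_rcas * rca_power_w()


if __name__ == "__main__":
    print(f"one RCA: {rca_power_w():.2f} W")
    # 100 RCAs per chip is a hypothetical count for illustration.
    print(f"100 RCAs: {chip_power_w(100):.0f} W in {100 * RCA_AREA_MM2:.0f} mm^2")
```

Each RCA draws about 1.3 W, so even a modest-sized chip packed with RCAs dissipates well over 100 W, which is why cooling constrains how much silicon fits in each lane.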

SLIDE 17

Results

More silicon -> optimal voltage decreases -> server efficiency increases
Initially, costs reduce (right to left) but then silicon costs start building up

SLIDE 18

Voltage Stacking

DC/DC power is significant
Chips serially chained so that their supplies sum to 12V
Leads to significant savings in the TCO-optimal case
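The chaining rule above fixes how many chips go in one stack: the per-chip voltages must add up to the 12 V supply. A minimal sketch, assuming every chip in the stack runs at the same voltage; the function name and the example voltages are hypothetical.

```python
def stack_depth(supply_v, chip_v, tol=1e-6):
    """Number of identical chips to chain in series so that their
    supply voltages sum to supply_v. Raises ValueError if chip_v
    does not divide the supply evenly."""
    if chip_v <= 0:
        raise ValueError("chip voltage must be positive")
    depth = round(supply_v / chip_v)
    if depth < 1 or abs(depth * chip_v - supply_v) > tol:
        raise ValueError(
            f"{chip_v} V chips cannot be stacked to exactly {supply_v} V")
    return depth


if __name__ == "__main__":
    # Hypothetical operating voltages, for illustration only.
    for v in (0.4, 0.6):
        print(f"{v} V per chip -> {stack_depth(12.0, v)} chips in series")
```

A lower operating voltage therefore means a deeper stack, so the choice of voltage and the number of chips per PCB cannot be tuned independently in a stacked design.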

SLIDE 19

Litecoin ASIC Cloud

SLIDE 20

Video Transcoding ASIC Cloud

**Pareto points are glitchy because of variations in constants and polynomial order for server components as they vary with voltages

SLIDE 21

CNN ASIC Cloud

SLIDE 22

When is ASIC Cloud Feasible

SLIDE 23

Discussion

This is one of the earlier attempts to create a general framework/skeleton for an ASIC cloud. How feasible do you think this technology is, and how widely and how soon can we potentially adopt it for a large variety of applications?

The authors recommend that open sourcing of various tools by the cloud providers and silicon foundries would potentially lead to lower TCO. Is this a good solution? Why or why not?

What do you think is more optimal: investing heavily (high NRE) in more advanced nodes (e.g. 16nm) or using/modifying older nodes (e.g. 65nm) in an ASIC?

SLIDE 24

Bitcoin ASIC Cloud Design

Repeatedly execute a Bitcoin hash operation
Input: 512 bit block
Mutate the block and perform SHA256 on it
Fed into another round of SHA256
Leading zero count performed and matched with the target

64 rounds in each SHA
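The steps above can be sketched in software with Python's standard `hashlib`. This is a toy illustration of the double-SHA256 loop, not the RCA's hardware pipeline and not the real Bitcoin protocol: actual mining hashes an 80-byte block header and compares the result against a numeric target, whereas this sketch appends a nonce to arbitrary bytes and counts leading zero bits, as the slide describes. The function names and the input bytes are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def bitcoin_style_hash(block: bytes, nonce: int) -> bytes:
    """Mutate the block with a nonce, then apply SHA256 twice."""
    mutated = block + nonce.to_bytes(4, "little")
    first = hashlib.sha256(mutated).digest()
    return hashlib.sha256(first).digest()


def mine(block: bytes, zero_bits: int, max_nonce: int = 2**32):
    """Brute-force search for a nonce whose double hash has at least
    zero_bits leading zero bits. Returns None if none is found."""
    for nonce in range(max_nonce):
        if leading_zero_bits(bitcoin_style_hash(block, nonce)) >= zero_bits:
            return nonce
    return None


if __name__ == "__main__":
    # A low difficulty (8 bits) so the search finishes almost instantly.
    nonce = mine(b"example block", zero_bits=8)
    print(nonce, bitcoin_style_hash(b"example block", nonce).hex())
```

Each extra required zero bit doubles the expected number of attempts, which is why the search is embarrassingly parallel and maps well onto many replicated accelerators.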

SLIDE 25

SLIDE 26

SLIDE 27