Dataflow Super-Computing, Jacob Bower, YU INFO, February 2012



slide-1
SLIDE 1

Jacob Bower

YU INFO, February 2012

Dataflow Super-Computing

slide-2
SLIDE 2
Maxeler Technologies

  • Maxeler offers complete hardware, software and application acceleration solutions for high performance computing
  • ~70 people, offices in London, UK and Palo Alto, CA

Hardware

  • Card: PCI Express x16, compute, memory and local interconnect
  • Node: 1U solutions with 1 or 4 Cards
  • Rack: 10U, 20U or 40U, balancing compute, storage & network

Software

  • MaxelerOS: Resource management of Dataflow Computing
  • Runtime support: memory management and data choreography
  • MaxCompiler: providing programmability

Consulting

  • HPC System Performance Architecture
  • Algorithms and Numerical Optimization
  • Integration into business and technical processes

slide-3
SLIDE 3

Overview

  • Dataflow Computing
  • Programming Dataflow Systems
  • Dataflow Engines and Platforms
  • Case Study: Accelerating Risk Computation

slide-4
SLIDE 4

DATAFLOW COMPUTING

slide-5
SLIDE 5


Computing with Instruction Processors

slide-6
SLIDE 6

Instruction Processor Spectrum

  • Single-Core CPU
  • Multi-Core: Intel, AMD
  • Many-Core: GPU (NVIDIA, AMD), Tilera, XMOS etc...
  • Hybrid: e.g. AMD Fusion, IBM Cell

slide-7
SLIDE 7

Computing with Dataflow


slide-8
SLIDE 8

Computation Resources

[Figure: computation resources on an Intel 6-Core X5680 “Westmere” and on a Xilinx Virtex-6 SX 475T. Labeled regions: Computation, MaxelerOS.]

slide-9
SLIDE 9

PROGRAMMING DATAFLOW SYSTEMS

slide-10
SLIDE 10

Programming with MaxCompiler

[Figure: system diagram. Labels: Host application, MaxCompilerRT, MaxelerOS, CPU, Memory, PCI Express, Dataflow Engine, Memory, Kernels (drawn with * and + operators), Manager.]

slide-11
SLIDE 11

MaxCompiler Architecture


slide-12
SLIDE 12

Dataflow Kernel Programming

$y_i = (x_{i-1} + x_i + x_{i+1}) / 3$
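
Assuming the formula on this slide is the three-point moving average y_i = (x_{i-1} + x_i + x_{i+1}) / 3, the following plain-Python sketch mimics how a dataflow kernel might consume it: one input element streams in per step, and each output depends only on a fixed window of neighbouring stream elements. This is not MaxCompiler code; the function name `moving_average_stream` and the decision to skip the two boundary elements are illustrative assumptions.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average_stream(xs: Iterable[float]) -> Iterator[float]:
    """Yield (x[i-1] + x[i] + x[i+1]) / 3 for every interior element of xs."""
    window = deque(maxlen=3)  # holds x[i-1], x[i], x[i+1] once it is full
    for x in xs:
        window.append(x)      # a new element arrives; the oldest one drops out
        if len(window) == 3:
            yield sum(window) / 3


print(list(moving_average_stream([1.0, 2.0, 3.0, 4.0, 5.0])))  # [2.0, 3.0, 4.0]
```

The generator keeps only three values of state, which is the property that lets such a computation be laid out as a fixed pipeline rather than a loop over memory.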

slide-13
SLIDE 13

DATAFLOW ENGINES AND PLATFORMS

slide-14
SLIDE 14

Various Dataflow Systems

  • MaxWorkstation: Desktop development system
  • MaxNode: High density compute system, 1-4 Dataflow Engines with up to 192GB RAM
  • MaxNode10G: Low latency connectivity platform, 1-2 Dataflow Engines with up to six 10GE connections
  • MaxRack: 10, 20 or 40 node rack systems, balanced compute, networking & storage

slide-15
SLIDE 15

MaxNode with MAX3

  • 1U Form Factor
  • 4x MAX3 cards with Virtex-6 FPGAs

  • MaxRing interconnect
  • 2x Intel Xeon CPUs
  • Up to 192GB host RAM
  • Up to 192GB FPGA RAM
  • 3x 3.5” disks
  • ~700W Power


slide-16
SLIDE 16

CASE STUDY: ACCELERATING J.P. MORGAN RISK COMPUTATION

slide-17
SLIDE 17

Computational Finance

  • Compute value of complex financial products
  • Compute risk: what’s the sensitivity to X changing?
  • Typically computed overnight on hundreds to thousands of CPU cores. But:

    – The market changes throughout the day!
    – We really need to evaluate scenarios: what happens if country X defaults?

slide-18
SLIDE 18
Credit Derivatives 101

  • Bonds are a way for Companies/Countries to borrow
    – Investors make profit through coupon payments
  • Investors mitigate risk using
    – Credit Default Swaps (CDS)
    – Collateralized Debt Obligations (CDO)...

slide-19
SLIDE 19

CDOs

[Figure: a pool of CDS contracts.]

slide-20
SLIDE 20

CDOs

[Figure: a pool of CDS contracts split into tranches, with a scale from high risk to low risk.]

slide-21
SLIDE 21

CDOs

[Figure: a pool of CDS contracts, with a scale from high risk to low risk.]

slide-22
SLIDE 22

CDOs

[Figure: a pool of CDS contracts, with a scale from high risk to low risk.]

slide-23
SLIDE 23

Losses for Different Tranches

[Figure: Amount of loss for $1M notional in various tranches of a 125-name pool. x-axis: Number of Names Defaulted (ticks up to 120). y-axis: Amount of Loss in Tranche ($) (ticks up to 1,000,000). Series: 0%-100% (CDSI), 0%-3%, 3%-7%, 7%-15%, 15%-30%, 30%-100%.]
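
The shape of the curves in this plot can be approximated with the usual attachment/detachment rule for tranche losses. The sketch below is a minimal illustration, not the presenter's model: it assumes equally weighted names and a single recovery rate, and the function name `tranche_loss` and the default of zero recovery are hypothetical choices made only to keep the arithmetic simple.

```python
def tranche_loss(defaults: int, attach: float, detach: float,
                 names: int = 125, recovery: float = 0.0,
                 tranche_notional: float = 1_000_000.0) -> float:
    """Dollar loss on the tranche [attach, detach] after `defaults` names default.

    Assumes equally weighted names and one recovery rate (both hypothetical).
    """
    pool_loss = defaults / names * (1.0 - recovery)   # fraction of the pool lost
    # The tranche absorbs only the part of the pool loss between its two bounds.
    absorbed = min(max(pool_loss - attach, 0.0), detach - attach)
    return tranche_notional * absorbed / (detach - attach)


# Five defaults wipe out the 0%-3% tranche and start eating into the 3%-7% one.
for lo, hi in [(0.0, 0.03), (0.03, 0.07), (0.07, 0.15)]:
    print(f"{lo:.0%}-{hi:.0%}: {tranche_loss(5, lo, hi):,.0f}")
```

Junior tranches saturate after a few defaults while senior tranches stay untouched until many names have defaulted, which is the high-risk to low-risk ordering the slides describe.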

slide-24
SLIDE 24

Valuing Tranched Credit Derivatives

[Figure: formula with labeled terms: Market factor (M), Correlation, Unconditional Survival Probability for this Name, Conditional Survival Probability for this Name. Two plots of Probability (up to 1) against Amount of Loss (%, up to 100): Good Market (M>>0) and Bad Market (M<<0).]
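
The labels on this slide (market factor, correlation, unconditional and conditional survival probability) match the standard one-factor Gaussian copula model. The formula itself did not survive in this transcript, so the sketch below assumes that model rather than reproducing the slide; the function name `conditional_survival` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def conditional_survival(unconditional: float, correlation: float, market: float) -> float:
    """Survival probability of one name given market factor M.

    Assumes a one-factor Gaussian copula, with 0 < unconditional < 1
    and 0 <= correlation < 1.
    """
    threshold = _STD_NORMAL.inv_cdf(unconditional)
    shifted = (threshold + sqrt(correlation) * market) / sqrt(1.0 - correlation)
    return _STD_NORMAL.cdf(shifted)


# A good market (M >> 0) raises survival; a bad market (M << 0) lowers it.
for m in (2.0, 0.0, -2.0):
    print(m, conditional_survival(0.9, 0.3, m))
```

With zero correlation the market factor has no effect and the conditional probability equals the unconditional one, which is a quick sanity check on the formula.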

slide-25
SLIDE 25

Application Analysis


slide-26
SLIDE 26

Convoluter Design

[Figure: convoluter design. Labels: Credits Unrolled (c), Market Factors Unrolled (m), Conditional Survival Probabilities, Weights, Notional Sizes, Accumulated Loss Distribution (weighted sum).]
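
Going by the labels on this slide, the convoluter builds a loss distribution per market factor by folding in one credit at a time, then combines the per-factor results as a weighted sum. The following is a minimal software sketch of that idea under stated assumptions: notional sizes are expressed in integer loss units, a defaulting name loses its full notional, and the function names are hypothetical. It says nothing about how the hardware design unrolls the two loops.

```python
from typing import List, Sequence


def conditional_loss_distribution(survival_probs: Sequence[float],
                                  notionals: Sequence[int]) -> List[float]:
    """Loss distribution for one market-factor value; index = loss in integer units."""
    total = sum(notionals)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0                            # before any name is added, loss is 0
    for s, n in zip(survival_probs, notionals):
        new = [0.0] * (total + 1)
        for loss, p in enumerate(dist):
            if p == 0.0:
                continue
            new[loss] += p * s               # this name survives
            new[loss + n] += p * (1.0 - s)   # this name defaults and loses n units
        dist = new
    return dist


def accumulated_loss_distribution(survival_by_factor: Sequence[Sequence[float]],
                                  weights: Sequence[float],
                                  notionals: Sequence[int]) -> List[float]:
    """Weighted sum of the conditional distributions over all market-factor values."""
    acc = [0.0] * (sum(notionals) + 1)
    for probs, w in zip(survival_by_factor, weights):
        for loss, p in enumerate(conditional_loss_distribution(probs, notionals)):
            acc[loss] += w * p
    return acc


# Two names (1 and 2 loss units), two market-factor values with equal weight.
print(accumulated_loss_distribution([[0.9, 0.8], [0.5, 0.5]], [0.5, 0.5], [1, 2]))
```

The loop over credits and the loop over market factors are independent of each other, which is presumably what makes both dimensions candidates for unrolling in hardware.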

slide-27
SLIDE 27
Credit Derivatives Acceleration

  • Calculation of current value and credit spread risk for a population of 2,925 bespoke tranches.
  • Speedup from 1 MAX2:
    – 219x to 270x compared to 1 core
    – ~30x compared to 8-core node
  • Power consumption drops from 250W/node to 240W/node with acceleration
    – >30x more power efficient
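
The figures on this slide are consistent with each other, as a short calculation shows. The sketch assumes that the CPU code scales perfectly across the 8 cores of a node and that a node draws constant power for the whole run; neither assumption is stated in the slides, and the helper names are illustrative.

```python
def per_node_speedup(per_core_speedup: float, cores_per_node: int = 8) -> float:
    """Speedup over a whole node, assuming perfect scaling across its cores."""
    return per_core_speedup / cores_per_node


def energy_efficiency_gain(speedup: float, base_watts: float, accel_watts: float) -> float:
    """Ratio of energy per job before and after acceleration (energy = power x time)."""
    return speedup * base_watts / accel_watts


print(per_node_speedup(219), per_node_speedup(270))  # 27.375 33.75
print(energy_efficiency_gain(30, 250, 240))          # 31.25
```

The per-core range of 219x to 270x brackets the quoted ~30x per node, and a 30x speedup at slightly lower power gives an energy gain just above 30x.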

slide-28
SLIDE 28

Summary & Conclusions

  • Dataflow computing allows:
    – massive parallelism in computation
    – highly efficient use of silicon area on chips
  • Maxeler creates:
    – dataflow engines and systems
    – MaxCompiler to easily program these
  • Dataflow computing used at J.P. Morgan
    – around 30x speedup