An Initial Characterization of the Emu Chick

Eric Hein, Tom Conte (ECE); Jeff Young (CS); Srinivas Eswar, Jiajia Li, Patrick Lavin, Richard Vuduc, Jason Riedy (CSE)
5/21/2018

Migratory Memory-side Processing
• Main innovation: thread contexts migrate to the data
• Threads always read from local memory
• Migration is hardware-controlled, triggered on a remote read
• CRNCH Rogues Gallery: early access to the Emu Chick hardware prototype

Outline
• Emu Architecture Description
• Data Allocation and Thread Spawning
• Benchmark Results
  - STREAM
  - Sparse Matrix-Vector Multiply (SpMV)
  - Pointer Chasing
• Simulator Validation
• Conclusion

Emu Architecture

Fine-grained Memory Accesses
• Narrow-channel DRAM (NCDRAM)
  - An 8-bit bus allows access at 8-byte granularity without waste
  - Many narrow channels instead of a few wide channels
• Remote Writes
  - Write to a remote nodelet without migrating
  - Proceed directly to the memory front-end, bypassing the Gossamer Core (GC)
• Remote Atomics
  - Performed in the Memory Front-End (MFE), near memory

Emu Cilk
• cilk_spawn: create a child thread to execute a function in parallel
  - Creates an actual thread, not just a continuation stack frame
  - No work-stealing across nodelets
• cilk_sync: wait for all child threads to complete
  - Threads die instead of waiting; the last thread to arrive continues
• "Remote spawn": create a thread on a remote nodelet
  - Determines the location of the cactus stack frame

                          Emu Nodelet         Emu Node Card       Emu Chick           Emu1 Rack
                                              (8 nodelets)        (8 nodes)           (256 nodes)
                          Current   Future    Current   Future    Current   Future
# of cores                1         4         8         32        64        256       8192
# of threads              64        256       512       2048      4096      16384     > 2 million
Memory capacity           2 GiB     8 GiB     16 GiB    64 GiB    128 GiB   512 GiB   16 TiB
# of 8-bit DDR4 channels  1         1         8         8         64        64        2048
Memory bandwidth          120 MB/s  2.5 GB/s  1.2 GB/s  20 GB/s   8 GB/s    160 GB/s  5.12 TB/s

Images and data from www.emutechnology.com

STREAM: Thread Spawning
• Serial spawn
• Recursive spawn (see https://www.cilkplus.org/tutorial-cilk-plus-keywords#cilk_for)

Emu Memory Layouts

[Figure: thread placement across Nodelets 0 to 3 for Serial Spawn, Serial Remote Spawn, and Recursive Remote Spawn]

STREAM: Emu Hardware Results (Single Node)
• ~140 MB/s per nodelet
• ~1.2 GB/s per node (8 nodelets)
• Surprisingly, serial spawn performance matches recursive spawn
• Remote spawn is necessary to saturate global bandwidth

STREAM: Emu Hardware Results (Multi-node)

Proxy Applications
• Pointer chasing -> streaming graphs
  - Designed to mimic the traversal of an edge list in a streaming graph data structure
  - Using the results to tune a streaming graph engine for Emu
• SpMV -> sparse tensor analysis
  - Exploring data layout options with a sparse matrix
  - Will transfer this knowledge to the design of a sparse tensor library

Sparse Matrix-Vector Multiply (SpMV)
Data layout is critical on Emu. We experimented with three layouts for the sparse matrix. In each case, the vector X was replicated onto all nodelets.

SpMV Results: Emu Simulator vs. Haswell Xeon
• The "2D" layout outperforms the "1D" layout
• Spurious thread migrations are limiting performance on Emu

Pointer Chasing Microbenchmark
• Designed to mimic the access pattern of dynamically allocated data structures (e.g., streaming graphs)
  - Data-dependent loads: memory-level parallelism is severely limited, since each thread must wait for one pointer dereference to complete before it can access the next pointer.
  - Fine-grained accesses: spatial locality is restricted, since all accesses are at 16 B granularity. This is smaller than a 64 B cache line on x86 platforms and much smaller than a typical DRAM page (around 8 KB).
  - Random access pattern: since each block of memory is read exactly once, in random order, caching and prefetching are mostly ineffective.

Pointer Chasing: Initialization
1. Create a linked list of elements (“Ordered”)
• The access pattern is sequential and predictable
• Plenty of spatial locality is available

Pointer Chasing: Intra-block Shuffle
2. Randomize the traversal order of the elements within each block (“Intra-block shuffle”)
• Elements within each small contiguous block of memory are visited in random order
• The blocks themselves are still visited in sequence, so the overall access pattern is still sequential

Pointer Chasing: Block Shuffle
3. Randomize the traversal order of the blocks (“Block shuffle”)
• The overall access pattern is now random
• Small chunks of locality are still available within each block

Pointer Chasing: Sandy Bridge Xeon Results

Pointer Chasing: Emu Hardware Results (Single Node)

Pointer Chasing: Bandwidth Utilization

Simulator Validation: STREAM
When configured to match the current hardware specifications, the simulator's results closely match the hardware for both local and global STREAM.

Simulator Validation: Migrations
• Pointer chasing performs 2x better in the simulator than on the hardware
• The simulator over-estimates migration throughput
• The updated simulator matches the hardware more closely

Conclusions
• First independent evaluation of the Emu Chick prototype
• STREAM bandwidth is low, but scales well
• Memory layout and thread-management decisions are critical to achieving scalability in SpMV
• Pointer chasing maintains 80% memory bandwidth utilization in a worst-case pointer-chasing scenario
• Future work: applying these lessons to streaming graph analytics and sparse tensor processing
