Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations - PowerPoint PPT Presentation



SLIDE 1

spcl.inf.ethz.ch @spcl_eth

MACIEJ BESTA, TORSTEN HOEFLER

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

SLIDE 2

REMOTE MEMORY ACCESS (RMA) PROGRAMMING

SLIDE 11

REMOTE MEMORY ACCESS (RMA) PROGRAMMING

[Figure: processes p and q, each with its own memory (A at p, B at q), running on Cray Blue Waters. Process p issues put(A) into q's memory and get(B) from q's memory; a final flush completes both transfers.]

SLIDE 13

REMOTE MEMORY ACCESS PROGRAMMING

  • Implemented in hardware in NICs in the majority of HPC networks (RDMA)

SLIDE 18

REMOTE MEMORY ACCESS PROGRAMMING

  • Supported by many HPC libraries and languages
SLIDE 23

REMOTE MEMORY ACCESS PROGRAMMING

  • Enables significant speedups over message passing in many types of applications, e.g.:
  • Speedup of ~1.5 for communication patterns in irregular workloads
  • Speedup of ~1.4-2 in physics computations

[1] R. Gerstenberger et al. Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided. SC13.
[2] D. Petrovic et al. High-performance RMA-based broadcast on the Intel SCC. SPAA'12.

SLIDE 30

RMA VS. MESSAGE PASSING

  • Communication in RMA is one-sided

[Figure: RMA: process p puts A directly into q's memory and flushes; no active participation by q, direct access to its memory. Message Passing: p sends a message carrying A; q must post an explicit receive, with possible queueing.]

SLIDE 34

REMOTE MEMORY ACCESS PROGRAMMING

  • Is it ideal?
  • Consider an insert in a distributed hashtable...

No hash collision:
  • 1 remote atomic
  • Up to 5x speedup over MP [1]

A hash collision:
  • 4 remote atomics + 2 remote puts
  • Significant performance drops

What we want at proc q: local execution, triggered by an active access from proc p. How to enable it in RMA?
  • Use "active" semantics
  • Use and extend IOMMUs and their paging capabilities

[1] R. Gerstenberger et al. Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One-Sided. SC13.

SLIDE 35

USE SEMANTICS FROM ACTIVE MESSAGES (AM) [1]

[Figure: process p sends to process q; q holds a table mapping addresses to handlers (A's addr -> Handler A, ..., Z's addr -> Handler Z), as in GASNet [3] and AM++ [2].]

We need active puts/gets:
  • Invoke a handler upon accessing a given page
  • Preserve one-sided RMA behavior

We use AM syntax & semantics to enable the "active" behavior.

[1] T. von Eicken et al. Active messages: a mechanism for integrated communication and computation. ISCA'92.
[2] J. J. Willcock et al. AM++: A generalized active message framework. PACT'10.
[3] D. Bonachea. GASNet Specification, v1.1. Berkeley Technical Report, 2002.

SLIDE 36

USE INPUT/OUTPUT MEMORY MANAGEMENT UNITS (IOMMUS)

[Figure: the CPU's MMU/TLB translates virtual addresses to physical addresses in main memory; analogously, the IOMMU/IOTLB translates I/O devices' device addresses to physical addresses.]

We propose IOMMUs as a way to implement the "active" behavior.

SLIDE 37

IOMMUS AND RMA

[Figure: an RDMA packet arrives at the NIC and is forwarded as PCIe packets through the IOMMU (IOTLB, device-to-page-table cache, remapping structures, W/R bits) to main memory; a fault raises an MSI and deposits an entry in a single system-wide fault log, from which SMT cores run user handlers.]

We could use this machinery somehow. But...
  • Data is discarded... extremely BAD
  • No parallelism (single log)... BAD
  • No multiplexing (single log)... BAD

SLIDE 38

ACTIVE PUTS

[Figure: the design extends the IOMMU with per-process access logs (each holding fault entries plus the request data, so data can be reused), an access log table that stores the address of each access log, and new page-table bits WL and WLD plus an IUID field, which map each page to an access log and decide on keeping or discarding the entry and its data. This enables data-centric programming.]

SLIDE 39

ACTIVE PUTS

[Figure: process p writes X to a page at q marked W = 0, WL = 1, WLD = 1. The attempt to write(X) raises a page fault (W = 0); X is moved into the access log instead of the page; the CPU then runs process(X). The page itself is not modified.]

Log both the entry and the data of an incoming put.

SLIDE 40

ACTIVE GETS

[Figure: the same design as for active puts, with two additional page-table bits, RL and RLD, which control logging of reads and of the data read.]

SLIDE 41

ACTIVE GETS

[Figure: process p reads X from a page at q marked R = 1, RL = 1, RLD = 1. Reading from the page is enabled (R = 1); X is copied into the access log and then processed.]

Log both the entry and the data accessed by a get.

Sounds like we can reuse most of the existing stuff!

SLIDE 43

INTERACTIONS WITH THE CPU

[Figure: the IOMMU notifies the CPU of logged accesses via:]
  • Interrupts
  • Polling
  • Direct notifications via scratchpads (a scratchpad memory shared with a hyperthread that runs the handler)

Are we done? Well...

SLIDE 44

CONSISTENCY

  • A weak consistency model [1]
  • Consistency on-demand: active_flush(int target_id)
  • Enforces the completion of active accesses issued by the calling process and targeted at target_id
  • Implemented with an active get issued at a special flushing page

[Figure: process p issues an active get at a special flushing page of q; the IOMMU places it in q's access log behind the earlier accesses X, Y, Z.]

[1] K. Gharachorloo et al. Memory consistency and event ordering in scalable shared-memory multiprocessors. ISCA'90.

SLIDE 45

CONSISTENCY

[Figure: the IOMMU is further extended with a flushing buffer, which contains the addresses of flushing pages and maps flushing pages to IUIDs and access logs, and with a packet tag buffer.]

SLIDE 46

How can we use it?

Let’s summarize…

[Recap: Active Messages (semantics) + IOMMUs (mechanism) give us Active Puts/Gets + Consistency.]

SLIDE 47

ACTIVE ACCESS USE-CASES

DISTRIBUTED HASHTABLE

  • Used to construct key-value stores (e.g., Memcached [1])

[Figure: each process i (i = 0 .. N-1) holds local volume i, consisting of a table of elements plus an overflow heap.]

[1] B. Fitzpatrick. Distributed caching with memcached. Linux Journal, 2004.

SLIDE 48

ACTIVE ACCESS USE-CASES

DISTRIBUTED HASHTABLE: INSERTS (RMA)

[Figure: process p inserts into q's table of elements and overflow heap using remote accesses.]

SLIDE 49

ACTIVE ACCESS USE-CASES

DISTRIBUTED HASHTABLE: INSERTS (AA)

[Figure: process p issues a single active access; the insert executes at q.]

All other accesses become local.

SLIDE 50

ACTIVE ACCESS USE-CASES

VIRTUAL GLOBAL ADDRESS SPACE (V-GAS)

[Figure: machines 0 .. N-1, each with a process, NIC, MMU, IOMMU, and memory, form a V-GAS. The MMU provides local memory protection; the IOMMU provides remote memory protection and fetches data (used for logging, fault-tolerance, etc.).]

SLIDE 51

PERFORMANCE

  • Evaluation on CSCS Monte Rosa
  • 1,496 Cray XE6 compute nodes
  • 47,872 schedulable cores
  • 46 TB memory
  • 3 microbenchmarks
  • 4 use-cases

SLIDE 52

PERFORMANCE: MICROBENCHMARKS

RAW DATA TRANSFER

  • Workload simulated with gem5 [1]
  • Data generated with:
  • PktGen [2]
  • Netmap [3]

[1] N. Binkert et al. The gem5 simulator. SIGARCH Comput. Archit. News, 2011.
[2] R. Olsson. pktgen: the Linux packet generator. Linux Symposium, 2005.
[3] L. Rizzo. netmap: a novel framework for fast packet I/O. USENIX Annual Technical Conference, 2012.

SLIDE 53

PERFORMANCE: LARGE-SCALE CODES

COMPARISON TARGETS

  • Active Access: AA-Int, AA-Poll, AA-SP
  • RMA
  • Active Messages: AM, AM-Exp, AM-Onload, AM-Ints (cf. DMAPP, RoCE, Cell, PAMI, DCMF, LAPI, MX, GASNet, AM++)

SLIDE 54

PERFORMANCE: LARGE-SCALE CODES

DISTRIBUTED HASHTABLE

[Figure: insert performance for collision rates of 5% and 25%.]


SLIDE 56

ACTIVE ACCESS: CONCLUSIONS

  • Data-centric programming: addresses of pages guide the execution of handlers
  • Alleviates RMA's problems with AMs while preserving one-sided semantics
  • Extends paging capabilities in a distributed environment
  • Use-cases: hashtables, logging schemes, counters, V-GAS, checkpointing...
  • Performance: accelerates various distributed codes
  • Uses commodity & common IOMMUs

SLIDE 57

Thank you for your attention

SLIDE 58

ACTIVE ACCESS USE-CASES

ACCELERATING LOGGING FOR RMA

  • Logging: a popular mechanism for fault-tolerance.
  • Remote communication (puts/gets) is logged.
  • Upon a process crash, the process is restored and uses the logs to replay its previous actions.
  • Logs are stored in volatile memories.
SLIDE 59

ACTIVE ACCESS USE-CASES

ACCELERATING LOGGING FOR RMA

  • Logging puts:

[Figure: proc p logs the PUT and issues it; q is modified; after a crash the logged PUT can be replayed.]

SLIDE 60

ACTIVE ACCESS USE-CASES

ACCELERATING LOGGING FOR RMA

  • Logging gets (naive):

[Figure: log the GET, then attempt to replay the GET; but p is modified in the meantime. FAIL!]

SLIDE 61

ACTIVE ACCESS USE-CASES

ACCELERATING LOGGING FOR RMA

  • Logging gets (traditional) [1]:

[Figure: p is modified; the logs are fetched to replay the GET. Bandwidth wasted.]

[1] M. Besta and T. Hoefler. Fault tolerance for remote memory access programming models. HPDC'14.

SLIDE 62

ACTIVE ACCESS USE-CASES

ACCELERATING LOGGING FOR RMA

  • Logging gets (AA):

[Figure: the IOMMU logs the GET at the target; p is modified; the logs are fetched to replay the GET.]

SLIDE 63

ACTIVE ACCESS USE-CASES

INCREMENTAL CHECKPOINTING FOR RMA

[Figure: processes 1 .. k on nodes 1 .. N alternate compute phases with barriers; on failure, a global rollback restores the last checkpoint.]

SLIDE 64

COORDINATED CHECKPOINTING (MP)

[Figure: processes 1 .. k on nodes 1 .. N alternate compute phases with barriers; on failure, a global rollback restores the last checkpoint.]


SLIDE 66

PERFORMANCE: LARGE-SCALE CODES

FAULT TOLERANCE SCHEME

[Figure: results for logging gets and for sorting time.]