SLIDE 1

CoMerge

Toward Efficient Data Placement in Shared Heterogeneous Memory Systems

Thaleia Dimitra Doudali, Ada Gavrilovska

SLIDE 2

MEMSYS 17

Motivation

Performance slowdown in heterogeneous memory systems.

[Figure: an application's data objects placed across a heterogeneous memory subsystem of DRAM (cost ↑) and Non-Volatile Memory; NVM's higher access latency ⇒ performance slowdown relative to 'all-data-in-DRAM'.]

How do we reduce this slowdown?

SLIDE 3

Existing Solutions

Data tiering that maximizes DRAM accesses.

[Figure: application data objects tiered across DRAM and Non-Volatile Memory in the heterogeneous memory subsystem; the key decision is which objects get allocated in DRAM.]

Existing solutions:

  • 1. X-Mem - Dulloor et al.
  • 2. Dataplacer - Shen et al.
  • 3. Valgrind extension - Peña and Balaji.

⇒ more memory requests served at DRAM's lower latency.

SLIDE 4

Problem Statement

Limited Utility of Existing Solutions in Shared Systems.

[Figure: two collocated applications (Application 1 and Application 2), each with its own data objects, sharing a memory system of DRAM and Non-Volatile Memory.]

Which objects should now be in DRAM? Do partitioning techniques built on the existing solutions:

  • Reduce the slowdown across all collocated applications?
  • Maximize DRAM utilization?

⇒ NO

SLIDE 5

Our Contributions

What do we need to do differently?

  • 1. Sorting objects within one application:

The co-benefit metric captures:

a. The exact contribution of a data object to the overall application runtime.
b. The overall application sensitivity to execution over Non-Volatile Memory.

  • 2. Distributing DRAM across applications:

The CoMerge memory sharing technique:

a. Mitigates slowdown across all collocated applications.
b. Maximizes DRAM usage.

SLIDE 6

Observations

What are we going to see next?

  • 1. Not all applications are slowed down to the same degree when accessing Non-Volatile Memory.

  • 2. Not all data objects of an application help reduce the performance slowdown when placed in DRAM.


Polybench Benchmarks

  • 30 simple algebraic kernels.
  • Single-threaded.

CORAL Suite of mini-apps

  • 3 representative HPC kernels.
  • Multi-threaded (OpenMP).

Hardware Testbed

CPU + DRAM + emulated NVM.

Emulate Non-Volatile Memory for various combinations of reduced bandwidth and increased latency.

e.g. B0.5:L2 = half the DRAM bandwidth, 2× the DRAM latency.

SLIDE 7

Overall Application Sensitivity

Do all applications get slowed down in the same way when accessing Non Volatile Memory?

[Figure: performance slowdown across Polybench/C, normalized to 'all-data-in-DRAM' execution; applications cluster into None, Low, Medium and High sensitivity classes.]

Applications show different levels of sensitivity to execution over slower memory components.
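The class labels could be reproduced by bucketing the normalized slowdown. The sketch below is only illustrative: the threshold values are assumptions of mine, not the cutoffs used in the paper.

```python
# Hypothetical bucketing of normalized slowdown (runtime over NVM divided by
# 'all-data-in-DRAM' runtime) into sensitivity classes.
# The threshold values are illustrative assumptions, not from the paper.

def sensitivity_class(slowdown):
    if slowdown < 1.05:
        return "None"    # essentially unaffected by NVM
    if slowdown < 1.5:
        return "Low"
    if slowdown < 3.0:
        return "Medium"
    return "High"

print(sensitivity_class(4.3))  # a kernel slowed 4.3x over NVM -> High
```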

SLIDE 8

Data Object Sensitivity

Do all data objects help minimize the slowdown, when allocated in DRAM?

[Figure: per-object slowdown results, annotated with the observation numbers 1-3 below.]

Observations

  • 1. For non- or low-sensitivity apps, it doesn't matter which object is in DRAM.
  • 2. Different data objects can contribute equally to the application runtime.
  • 3. There can be objects whose allocation in DRAM is the only way to mitigate the slowdown.

fixed NVM at B0.2:L5 (0.2× the bandwidth, 5× the latency)

SLIDE 9

Co-Benefit Metric

Can we capture the previous observations?

[Figure: application runtime as a function of which objects are in DRAM, from S (runtime with no objects in DRAM) down to F (runtime with all objects in DRAM); t(O) is the runtime with only object O in DRAM.]

Normalize. How much does a specific object help reduce the slowdown?

B(O) = (S − t(O)) / (S − F), so B(O) = 0 at S (object O does not help) and B(O) = 1 at F (object O alone removes the slowdown).

Scale. How can we make sure that objects of higher-sensitivity kernels are prioritized? Weight the benefit by the application's sensitivity S/F:

coB(O) = B(O) × S/F

e.g. B(O) = 0.9: coB(O) = 0.9 × low sensitivity (S/F ≈ 1) = 0.9; coB(O) = 0.9 × high sensitivity (S/F ≈ 4.3) = 3.9.
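A minimal Python sketch of the co-benefit metric, using the slide's S, F and t(O) names; the runtimes and object names below are hypothetical inputs, not measurements from the paper:

```python
# Sketch of the co-benefit metric (runtimes are hypothetical).
# S: runtime with no objects in DRAM, F: runtime with all objects in DRAM,
# t[o]: runtime with only object o in DRAM.

def co_benefit(S, F, t):
    sensitivity = S / F  # overall application sensitivity to NVM
    # B(o) = (S - t[o]) / (S - F): 0 = object does not help,
    # 1 = object alone removes all of the slowdown
    return {o: (S - t_o) / (S - F) * sensitivity for o, t_o in t.items()}

# A hypothetical high-sensitivity kernel: S = 13.0, F = 3.0, so S/F ~ 4.33
scores = co_benefit(13.0, 3.0, {"A": 4.0, "B": 10.0})
# B("A") = (13-4)/(13-3) = 0.9, so coB("A") = 0.9 * 13/3 = 3.9
```

Objects are then sorted by their coB score, so that an equally beneficial object of a more sensitive kernel ranks higher.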

SLIDE 10

DRAM Distribution

What are the goals of an efficient technique?

1. Minimize the overall runtime slowdown across all applications:

Overall Slowdown = Collocation Runtime / All-in-DRAM Runtime

where the collocation runtime is determined by both the DRAM sharing and the data tiering decisions.

2. Maximize the utilization of DRAM.

[Figure: DRAM partly filled with Object 1, Object 3 and Object 2, leaving an unutilized gap.]
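One way to pursue both goals at once is a greedy pass over the objects of all collocated applications, sorted by co-benefit. The sketch below is an illustrative simplification under assumed inputs (the application names come from the deck, but the sizes and scores are made up), not the paper's exact algorithm:

```python
# Illustrative greedy DRAM distribution driven by the co-benefit metric.
# Objects from all collocated applications compete in one sorted list.

def distribute_dram(objects, dram_capacity):
    """objects: list of (app, name, size, co_benefit) tuples."""
    in_dram, used = [], 0
    # Highest co-benefit first, regardless of which application owns the object.
    for app, name, size, cob in sorted(objects, key=lambda o: o[3], reverse=True):
        if used + size <= dram_capacity:  # smaller objects can still fill gaps
            in_dram.append((app, name))
            used += size
    return in_dram, used

objs = [("adi", "A", 4, 3.9), ("adi", "B", 3, 2.0),
        ("jacobi-2d", "X", 2, 0.9), ("jacobi-2d", "Y", 1, 0.5)]
placed, used = distribute_dram(objs, 8)
# "X" (size 2) no longer fits after "A" and "B", but "Y" (size 1) does,
# so all 8 units of DRAM end up utilized.
```

Under this global ordering, higher-sensitivity applications naturally claim more DRAM than under an equal split, while leftover capacity still goes to lower-ranked objects that fit.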

SLIDE 11

Sharing DRAM

Sorting objects using co-benefit metric.

[Figure: per-object DRAM allocation for jacobi-2d (low sensitivity) and adi (high sensitivity) under Fair Merge vs. CoMerge.]

SLIDE 12

Summary

More detailed analysis in the paper

[Figure: for collocated xsbench, clomp and stream: partitioning with the existing solutions (Equal Split, Proportional Split) leaves DRAM unused and incurs 7x and 6x slowdowns, while sharing with the co-benefit metric (Fair Merge, CoMerge) cuts the slowdown to 2.7x and 2.6x with less unused DRAM.]

Co-Benefit metric allows CoMerge to achieve:

  • Lower runtime across all collocated applications.
  • Higher DRAM utilization.
