DMA API Performance and Contention on IOMMU Enabled Environments

Sep 08, 2023

DMA API Performance and Contention on IOMMU Enabled Environments
● Thadeu Cascardo <cascardo@linux.vnet.ibm.com>
Linux is a registered trademark of Linus Torvalds.

Disclaimer
● There is some bias towards PPC64.
● Please help me collaborate with you.

Agenda
● Why
● What
● How much
● How

Why
I/O virtualization in the form of PCI passthrough provides:
– Isolation between guests
– Performance
– Stability
– Debug

What
DMA API:
– translates a virtual address into an I/O (DMA) address
IOMMU:
– translates addresses coming from the bus into memory addresses

PVDMA
● pSeries uses a paravirtualized IOMMU
● KVM PVDMA didn't make it into mainline (5-ish years old)
● Advantage: no need to pin the whole guest memory

DMA maps performance
         Direct          PVDMA
Map      Adds offset     Allocate IOVA, insert mapping
Unmap    Nothing         Remove mapping, free IOVA
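
The map and unmap costs compared above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the class names `DirectDMA` and `IommuDMA`, the 4 KiB page size, and the simple free-list allocator are all hypothetical and chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DirectDMA:
    """Direct mapping: the DMA address is the physical address plus a fixed offset."""

    def __init__(self, offset):
        self.offset = offset

    def map(self, phys_addr):
        return phys_addr + self.offset  # map: adds offset

    def unmap(self, dma_addr):
        pass  # unmap: nothing to undo


class IommuDMA:
    """IOMMU-style mapping: allocate an IOVA page, then insert a translation."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # free IOVA page numbers
        self.table = {}  # IOVA page number -> physical page number

    def map(self, phys_addr):
        iova_page = self.free_pages.pop(0)              # allocate IOVA
        self.table[iova_page] = phys_addr // PAGE_SIZE  # insert mapping
        return iova_page * PAGE_SIZE + phys_addr % PAGE_SIZE

    def translate(self, dma_addr):
        """What the IOMMU does for an address coming from the bus."""
        phys_page = self.table[dma_addr // PAGE_SIZE]
        return phys_page * PAGE_SIZE + dma_addr % PAGE_SIZE

    def unmap(self, dma_addr):
        iova_page = dma_addr // PAGE_SIZE
        del self.table[iova_page]          # remove mapping
        self.free_pages.append(iova_page)  # free IOVA
```

In this model the direct path is pure arithmetic, while the IOMMU path touches shared state (the free list and the table) on every map and unmap, which is where allocation cost and contention can arise.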
PVDMA Performance
● Hypercall cost
● IOVA allocation cost <- Contention
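
One way to reduce contention on IOVA allocation is to split the IOVA space into pools, each with its own lock and bitmap. The sketch below is an assumption about what such a pooled bitmap allocator could look like; the slides name a "Bitmap Pool" variant but do not describe its design. The names `BitmapPool` and `PooledIovaAllocator` and the `hint` parameter (for example a CPU or thread number) are hypothetical.

```python
import threading


class BitmapPool:
    """One slice of the IOVA space, guarded by its own lock."""

    def __init__(self, base, size):
        self.base = base
        self.used = [False] * size  # one flag per IOVA page in this pool
        self.lock = threading.Lock()

    def alloc(self):
        with self.lock:
            for i, busy in enumerate(self.used):
                if not busy:
                    self.used[i] = True
                    return self.base + i
        return None  # this pool is full

    def free(self, page):
        with self.lock:
            self.used[page - self.base] = False


class PooledIovaAllocator:
    """Spreads callers over several pools so they rarely share a lock."""

    def __init__(self, num_pages, num_pools):
        # Assumes num_pages is a multiple of num_pools.
        self.per_pool = num_pages // num_pools
        self.pools = [BitmapPool(i * self.per_pool, self.per_pool)
                      for i in range(num_pools)]

    def alloc(self, hint):
        # Start at the pool selected by the hint, fall back to the others.
        start = hint % len(self.pools)
        for k in range(len(self.pools)):
            page = self.pools[(start + k) % len(self.pools)].alloc()
            if page is not None:
                return page
        raise MemoryError("IOVA space exhausted")

    def free(self, page):
        self.pools[page // self.per_pool].free(page)
```

With a single pool this degenerates to one global lock; with one pool per CPU, callers on different CPUs only meet on the same lock when their preferred pool is full.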
Drivers performance
● 10Gbps NIC
● Device driver mapped for every packet
● Result: 3Gbps

Drivers optimization
● After: the driver allocates chunks from mapped pages
● Result: 9.5Gbps
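
The effect of this optimization on the number of map operations can be sketched with a counting model. Everything here is illustrative: `CountingMapper`, the 4 KiB page size, and the 2048-byte buffer size are assumptions, and the model counts map calls only; it does not reproduce the throughput figures above.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class CountingMapper:
    """Stand-in for a DMA mapping call; it only counts how often it is used."""

    def __init__(self):
        self.map_calls = 0
        self.next_addr = 0

    def map(self, size):
        self.map_calls += 1
        addr = self.next_addr
        self.next_addr += size
        return addr


def per_packet_buffers(mapper, num_packets, buf_size):
    """Before: one map operation for every packet buffer."""
    return [mapper.map(buf_size) for _ in range(num_packets)]


def chunked_buffers(mapper, num_packets, buf_size):
    """After: map whole pages, then hand out chunks of each mapped page."""
    if not 0 < buf_size <= PAGE_SIZE:
        raise ValueError("buf_size must be between 1 and PAGE_SIZE")
    per_page = PAGE_SIZE // buf_size
    buffers = []
    while len(buffers) < num_packets:
        base = mapper.map(PAGE_SIZE)  # one map call covers several buffers
        for i in range(per_page):
            if len(buffers) == num_packets:
                break
            buffers.append(base + i * buf_size)
    return buffers
```

For 1000 buffers of 2048 bytes, the per-packet variant issues 1000 map calls and the chunked variant issues 500, since two buffers fit in each mapped page. Smaller buffers amortize the map cost further.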
Performance Numbers [chart: time (s) vs. threads (1 to 1024); series: IOMMU only, Direct DMA, Direct DMA IOMMU, Bitmap IOMMU, Bitmap Pool IOMMU, RBTree IOMMU]
Performance Numbers: 1M IOMMU Ops [chart: time (s, log scale) vs. threads (1 to 256)]
Performance Numbers: 1M Direct DMA map [chart: time (s, log scale) vs. threads (1 to 1024)]
Performance Numbers: 1M Direct DMA IOMMU Ops [chart: time (s, log scale) vs. threads (1 to 256)]
Performance Numbers: 1M Bitmap DMA IOMMU Ops [chart: time (s, log scale) vs. threads (1 to 64)]
Performance Numbers: 1M Bitmap Pool DMA IOMMU Ops [chart: time (s, log scale) vs. threads (1 to 256)]
Performance Numbers: 1M RBTree DMA IOMMU Ops [chart: time (s, log scale) vs. threads (1 to 8)]
Performance Numbers: 1M RBTree DMA IOMMU Ops on X86 [chart: time (s, log scale) vs. threads (1 to 8)]
Sharing code
● IOMMU drivers infrastructure
● Allocation algorithm(s)
● PVDMA Guest and Host code

Future work
● Experiment with other tree-based algorithms

Conclusions
● A lower bound for allocation algorithms
● Current RBTree IOVA code has bad performance
● IOMMUs are currently underused
