DMA API Performance and Contention on IOMMU Enabled Environments


  1. DMA API Performance and Contention on IOMMU Enabled Environments
     ● Thadeu Cascardo <cascardo@linux.vnet.ibm.com>
     Linux is a registered trademark of Linus Torvalds.

  2. Disclaimer
     There is some bias towards PPC64. Please help me collaborate with you.

  3. Agenda
     ● Why
     ● What
     ● How much
     ● How

  4. Why
     I/O Virtualization in the form of PCI passthrough
     Provides:
     – Isolation between guests
     – Performance
     – Stability
     – Debug

  5. What
     DMA API:
     – virtual address -> IO/DMA address
     IOMMU:
     – translates addresses coming from the bus into memory addresses
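
The translation role of the IOMMU described on this slide can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `ToyIommu`, its methods, and the 4 KiB page size are hypothetical names and values chosen for illustration, not kernel code or a real driver interface.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyIommu:
    """Hypothetical page-granular translation table (not a real driver)."""

    def __init__(self):
        self.table = {}  # IO page number -> physical page number

    def insert(self, io_addr, phys_addr):
        # Record that the IO page containing io_addr maps to the
        # physical page containing phys_addr.
        self.table[io_addr >> PAGE_SHIFT] = phys_addr >> PAGE_SHIFT

    def translate(self, io_addr):
        # An access to an unmapped IO page is rejected; this is the
        # property that isolates guests from each other.
        io_page = io_addr >> PAGE_SHIFT
        if io_page not in self.table:
            raise PermissionError(f"no mapping for IO address {io_addr:#x}")
        return (self.table[io_page] << PAGE_SHIFT) | (io_addr & PAGE_MASK)
```

In this model a device address keeps its offset within the page, and only the page number is looked up.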

  6. PVDMA
     – pSeries uses a paravirtualized IOMMU
     – KVM PVDMA didn't make it into mainline (the patches are about 5 years old)
     – Advantage: no need to pin the whole guest memory

  7. DMA maps performance
               Direct         PVDMA
     Map       Adds offset    Allocate IOVA, insert mapping
     Unmap     Nothing        Remove mapping, free IOVA
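
The difference in work between the two columns can be sketched in a user-space model. Everything here is an assumption made for illustration: the class names `DirectDma` and `IommuDma`, the first-fit scan over a per-page flag list, and the 4 KiB page size do not come from the slides and are not the kernel's implementation.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class DirectDma:
    """Direct mapping: map adds a fixed offset, unmap does nothing."""

    def __init__(self, offset):
        self.offset = offset

    def map(self, phys_addr):
        return phys_addr + self.offset

    def unmap(self, dma_addr):
        pass  # nothing to undo


class IommuDma:
    """IOMMU mapping: allocate an IOVA page, then insert a translation."""

    def __init__(self, num_pages):
        self.used = [False] * num_pages  # one flag per IOVA page
        self.table = {}                  # IOVA page -> physical page

    def map(self, phys_addr):
        # First-fit scan for a free IOVA page (an illustrative choice).
        for page, busy in enumerate(self.used):
            if not busy:
                self.used[page] = True
                self.table[page] = phys_addr >> PAGE_SHIFT
                return (page << PAGE_SHIFT) | (phys_addr & PAGE_MASK)
        raise MemoryError("IOVA space exhausted")

    def unmap(self, dma_addr):
        # Remove the translation, then free the IOVA page.
        page = dma_addr >> PAGE_SHIFT
        del self.table[page]
        self.used[page] = False
```

The direct path is a single addition, while the IOMMU path has to search shared state on every map and update it again on every unmap.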

  8. PVDMA Performance
     – Hypercall cost
     – IOVA allocation cost <- Contention
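
One general way to reduce contention on a shared allocator is to split the address range into pools, each protected by its own lock, so that threads mostly touch different locks. The sketch below is a hypothetical user-space model of that idea: `PooledIovaAllocator`, the `hint` parameter, and the use of per-pool free lists instead of bitmaps are assumptions for illustration and are not taken from the presentation's code.

```python
import threading


class PooledIovaAllocator:
    """Hypothetical sketch: IOVA pages split into pools, one lock per pool."""

    def __init__(self, num_pages, num_pools):
        # Assumes num_pages is a multiple of num_pools.
        self.pool_size = num_pages // num_pools
        self.locks = [threading.Lock() for _ in range(num_pools)]
        self.free = [
            list(range(i * self.pool_size, (i + 1) * self.pool_size))
            for i in range(num_pools)
        ]

    def alloc(self, hint):
        # Start at the caller's "home" pool (for example a CPU or thread
        # index) and fall back to the other pools only when it is empty.
        n = len(self.locks)
        for step in range(n):
            idx = (hint + step) % n
            with self.locks[idx]:
                if self.free[idx]:
                    return self.free[idx].pop()
        raise MemoryError("IOVA space exhausted")

    def release(self, page):
        # A page always returns to the pool that owns its range.
        idx = page // self.pool_size
        with self.locks[idx]:
            self.free[idx].append(page)
```

With one pool this degenerates to a single global lock, which is the contended case named on the slide.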

  9. Driver performance
     A 10 Gbps NIC device driver performed a DMA mapping for every packet

  10. Driver performance
      Result: 3 Gbps

  11. Driver optimization
      After: the driver allocates chunks from already mapped pages

  12. Driver optimization
      Result: 9.5 Gbps
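
The optimization on these slides, carving packet buffers out of a page that is already mapped instead of mapping once per packet, can be modeled as follows. This is a sketch under assumptions: `CountingMapper`, `ChunkedRxBuffers`, the 2048-byte chunk size, and the starting DMA address are hypothetical, and unmapping or recycling of pages is left out.

```python
PAGE_SIZE = 4096  # assumed page size


class CountingMapper:
    """Stand-in for the DMA mapping layer; it only counts map calls."""

    def __init__(self):
        self.map_calls = 0
        self.next_dma = 0x10000  # arbitrary starting DMA address

    def map_page(self):
        self.map_calls += 1
        dma = self.next_dma
        self.next_dma += PAGE_SIZE
        return dma


class ChunkedRxBuffers:
    """Maps a page once, then hands out fixed-size chunks from it."""

    def __init__(self, mapper, chunk_size):
        self.mapper = mapper
        self.chunk_size = chunk_size
        self.base = None
        self.offset = PAGE_SIZE  # forces a mapping on first use

    def next_buffer(self):
        # Map a new page only when the current one has no room left.
        if self.offset + self.chunk_size > PAGE_SIZE:
            self.base = self.mapper.map_page()
            self.offset = 0
        dma = self.base + self.offset
        self.offset += self.chunk_size
        return dma
```

With 2048-byte chunks, eight buffers cost four map calls in this model instead of eight, and smaller chunks would reduce the number of map calls further.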

  13. Performance Numbers
      [Chart: time (s), 0 to 450, versus threads, 1 to 1024. Series: IOMMU only, Direct DMA, Direct DMA IOMMU, Bitmap IOMMU, Bitmap Pool IOMMU, RBTree IOMMU]

  14. Performance Numbers
      [Chart: 1M IOMMU Ops. Time (s), log scale from 0.1 to 100, versus threads, 1 to 256]

  15. Performance Numbers
      [Chart: 1M Direct DMA map. Time (s), log scale from 0.01 to 10, versus threads, 1 to 1024]

  16. Performance Numbers
      [Chart: 1M Direct DMA IOMMU Ops. Time (s), log scale from 0.1 to 100, versus threads, 1 to 256]

  17. Performance Numbers
      [Chart: 1M Bitmap DMA IOMMU Ops. Time (s), log scale from 0.1 to 1000, versus threads, 1 to 64]

  18. Performance Numbers
      [Chart: 1M Bitmap Pool DMA IOMMU Ops. Time (s), log scale from 0.1 to 1000, versus threads, 1 to 256]

  19. Performance Numbers
      [Chart: 1M RBTree DMA IOMMU Ops. Time (s), log scale from 1 to 1000, versus threads, 1 to 8]

  20. Performance Numbers
      [Chart: 1M RBTree DMA IOMMU Ops on X86. Time (s), log scale from 1 to 1000, versus threads, 1 to 8]

  21. Sharing code
      ● IOMMU drivers infrastructure
      ● Allocation algorithm(s)
      ● PVDMA Guest and Host code

  22. Future work
      ● Experiment with other tree-based algorithms

  23. Conclusions
      ● A lower bound for allocation algorithms
      ● Current RBTree IOVA code has bad performance
      ● IOMMUs are currently underused
