
Network Performance Workshop: Memory bottlenecks

  1. Part of: Network Performance Workshop
     Memory bottlenecks
     Jesper Dangaard Brouer, Principal Engineer, Red Hat
     Date: April 2017
     Venue: NetDevConf 2.1, Montreal, Canada

  2. Memory vs. Networking
     ● Networking provokes bottlenecks in memory allocators
     ● Lots of work needed in the MM area
       ● SLAB/SLUB area: basically done, via bulk APIs (sketch below)
       ● Page allocator: currently limiting XDP
         ● Baseline performance too slow
     ● Drivers implement page recycle caches
       ● Can we generalize this?
       ● And integrate it into the page allocator?
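The bulk APIs referred to above are kmem_cache_alloc_bulk() and kmem_cache_free_bulk() (plus kfree_bulk()), merged around v4.4. A minimal sketch of a caller, assuming an existing kmem_cache; the batch size and function name are illustrative:

```c
#include <linux/slab.h>

#define BATCH 16

static int process_batch(struct kmem_cache *cache)
{
	void *objs[BATCH];
	size_t n;

	/* One call fills the whole array; returns 0 on failure. */
	n = kmem_cache_alloc_bulk(cache, GFP_KERNEL, BATCH, objs);
	if (!n)
		return -ENOMEM;

	/* ... use objs[0..n-1] ... */

	/* One call returns all objects, amortizing per-object overhead. */
	kmem_cache_free_bulk(cache, n, objs);
	return 0;
}
```

Moving an array of objects per call amortizes the locking and per-call overhead that dominates at 10G packet rates, which is why the SLAB/SLUB side is considered "basically done".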

  3. Cost when page order increases (kernel 4.11-rc1)
     ● Page allocator performance vs. size
       ● Per-CPU cache: order-0 only
       ● No cache above order-0
     ● Order to size: 0=4K, 1=8K, 2=16K (sketch below)
     ● Yellow line: cost amortized per 4K
       ● Trick used by some drivers
     ● Want to avoid this trick:
       ● Attacker can pin down memory
       ● Bad for concurrent workloads
       ● Reclaim/compaction stalls
     [Chart: cycles per allocation vs. page order, order-0 through order-6; y-axis 0 to 1200 cycles; series: cycles, cycles amortized per 4K, and the 10G per-packet cycle budget]
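For context, a hedged sketch of what "order" means at the allocation call site; the wrapper names are made up for illustration:

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/* Order-N returns 2^N contiguous 4K pages: order-0 = 4K, order-1 = 8K,
 * order-2 = 16K. Only order-0 is served from the per-CPU page cache;
 * higher orders always take the slower buddy-allocator path. */
static struct page *rx_alloc(unsigned int order)
{
	return alloc_pages(GFP_ATOMIC, order);
}

static void rx_free(struct page *page, unsigned int order)
{
	/* Must free with the same order it was allocated with. */
	__free_pages(page, order);
}
```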

  4. Issues with: higher-order pages
     ● Performance workaround:
       ● Alloc larger-order page, hand out fragments (sketch below)
       ● Amortize alloc cost over several packets
     ● Troublesome:
       ● 1. Fast sometimes; other times requires reclaim/compaction, which can stall for prolonged periods of time
       ● 2. A clever attacker can pin down memory
         ● Especially relevant for the end-host TCP/IP use case
       ● 3. Does not scale as well under concurrent workloads
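The fragment trick from slides 3 and 4 looks roughly like this; frag_state, frag_get, and the order-3/2K split are illustrative rather than any specific driver's code (the in-kernel page_frag_cache implements the same idea):

```c
#include <linux/gfp.h>
#include <linux/mm.h>

#define FRAG_ORDER	3	/* one 32K compound page... */
#define FRAG_SZ		2048	/* ...handed out as 16 x 2K fragments */

struct frag_state {
	struct page *page;
	unsigned int offset;
};

static void *frag_get(struct frag_state *st)
{
	unsigned int size = PAGE_SIZE << FRAG_ORDER;
	void *va;

	if (!st->page || st->offset + FRAG_SZ > size) {
		if (st->page)
			put_page(st->page);	/* drop our hold on the old page */
		/* One allocator call covers 16 packets, but order-3 may
		 * trigger the reclaim/compaction stalls listed above. */
		st->page = alloc_pages(GFP_ATOMIC | __GFP_COMP, FRAG_ORDER);
		if (!st->page)
			return NULL;
		st->offset = 0;
	}
	get_page(st->page);	/* each fragment pins the whole compound page */
	va = page_address(st->page) + st->offset;
	st->offset += FRAG_SZ;
	return va;		/* consumer frees via put_page(virt_to_head_page(va)) */
}
```

The pin-down issue is visible in the refcounting: a single fragment held by a remote peer keeps the entire 32K compound page alive.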

  5. Driver page recycling
     ● All high-speed NIC drivers do page recycling
     ● Two reasons:
       ● 1. Page allocator is too slow
       ● 2. Avoiding DMA mapping cost
     ● Different variations per driver (pattern sketched below)
     ● Want to generalize this
       ● Every driver developer is reinventing a page recycle mechanism
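A minimal sketch of the pattern each driver reinvents; names and sizing are illustrative, and the dma_map_page()/dma_unmap_page() calls that real drivers pair with the slow paths are elided:

```c
#include <linux/gfp.h>
#include <linux/mm.h>

#define RECYCLE_SZ 128

/* Per-RX-ring stack of pages that stay DMA-mapped across reuse. */
struct rx_recycle {
	struct page *pages[RECYCLE_SZ];
	unsigned int count;
};

static struct page *rx_page_get(struct rx_recycle *r)
{
	if (r->count)
		return r->pages[--r->count];	/* fast: reuse, DMA mapping kept */
	return alloc_page(GFP_ATOMIC);		/* slow: page allocator + dma_map */
}

static void rx_page_put(struct rx_recycle *r, struct page *page)
{
	if (r->count < RECYCLE_SZ) {
		r->pages[r->count++] = page;	/* recycle instead of freeing */
		return;
	}
	put_page(page);				/* cache full: dma_unmap + free */
}
```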

  6. Page pool: Generic recycle cache
     ● Basic concept for the page_pool:
       ● Pages are recycled back into the originating pool
         ● At put_page() time
     ● Drivers still need to handle the dma_sync part
     ● Page pool handles dma_map/unmap
       ● Essentially: constructor and destructor calls (usage sketch below)
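This concept later landed upstream as the page_pool API (include/net/page_pool.h, v4.18). A sketch of driver usage against that API, hedged since the signatures have shifted between kernel versions; pool_size and the DMA direction are illustrative:

```c
#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <net/page_pool.h>

static struct page_pool *rx_pool_create(struct device *dev)
{
	struct page_pool_params params = {
		.flags     = PP_FLAG_DMA_MAP,	/* pool does dma_map once per page */
		.order     = 0,
		.pool_size = 256,
		.nid       = NUMA_NO_NODE,
		.dev       = dev,
		.dma_dir   = DMA_FROM_DEVICE,
	};

	/* "Constructor": pages allocated through the pool come back mapped. */
	return page_pool_create(&params);
}

static void rx_refill_one(struct page_pool *pool)
{
	struct page *page = page_pool_dev_alloc_pages(pool);

	if (!page)
		return;
	/* Driver still owns the dma_sync_*() calls before CPU access. */
	/* ... post page to the RX ring ... */
}

/* Release path: the page recycles into its originating pool (the
 * "destructor" point) instead of hitting put_page() + dma_unmap(). */
static void rx_page_release(struct page_pool *pool, struct page *page)
{
	page_pool_recycle_direct(pool, page);
}
```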

  7. The end
     ● kfree_bulk(7, slides);

  8. Page pool: Generic solution, many advantages
     ● Five features of a recycling page pool (per device):
       1) Faster than page-allocator speed
          ● As a specialized allocator, it requires fewer checks
       2) DMA IOMMU mapping cost removed
          ● By keeping pages mapped (credit to Alexei)
       3) Makes pages writable
          ● Via a predictable DMA unmap point
       4) OOM protection at the device level (sketch below)
          ● Feedback loop knows the number of outstanding pages
       5) Zero-copy RX, solving memory early demux
          ● Depends on HW filters into RX queues
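Feature 4 follows from the recycling invariant: every page comes back to its originating pool, so the pool can keep an exact count of outstanding pages. An illustrative sketch of that feedback loop, not the upstream page_pool internals:

```c
#include <linux/atomic.h>

struct pool_account {
	atomic_t outstanding;	/* pages currently out in the stack/device */
	int	 limit;		/* per-device ceiling */
};

static bool pool_may_hand_out(struct pool_account *acc)
{
	if (atomic_read(&acc->outstanding) >= acc->limit)
		return false;	/* back-pressure: device-level OOM protection */
	atomic_inc(&acc->outstanding);
	return true;
}

static void pool_page_returned(struct pool_account *acc)
{
	/* The recycle point sees every page, so the count stays exact. */
	atomic_dec(&acc->outstanding);
}
```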
