Swapping and embedded: compression relieves the pressure? (Vitaly Wool)


  1. Swapping and embedded: compression relieves the pressure? Vitaly Wool Embedded Linux Conference 2016

  2. Intro> Swapping (Paging)
     ● Paging: [OS capability of] using a secondary storage to store and retrieve data
       – With RAM being primary
       – Storing and retrieving happens on a per-page basis
     ● Page
       – Uniform-size storage block, usually of size 2^n
       – Corresponds to a single record in the page table
     ● Paging is only possible with VM (virtual memory) enabled
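A minimal sketch of the per-page bookkeeping described above, assuming a page size of 4 KiB (2^12 bytes). The slides do not state a page size, and the name `split_address` is purely illustrative. Because the page size is a power of two, the page number and the offset inside the page fall out of a shift and a mask.

```python
PAGE_SHIFT = 12                  # assumption: 4 KiB pages, i.e. 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1


def split_address(vaddr: int) -> tuple[int, int]:
    """Split an address into (page number, offset within that page)."""
    return vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK


if __name__ == "__main__":
    page, offset = split_address(0x12345)
    print(f"page {page}, offset {offset}")   # page 18, offset 837
```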

  3. Intro> Swapping

  4. Intro> Embedded device objectives
     ● [very] limited RAM
     ● [relatively] slow storage
       – Using swap will hurt performance
     ● [relatively] small storage
       – There is hardly room for a big swap area
     ● Flash chip used as storage
       – Swap on flash wears it out fast

  5. Intro> Swapping in Embedded
     ● Should be applicable
       – Constrained RAM
     ● But it isn't sometimes
       – Constrained storage
     ● May have adverse effects
       – Faster wear-out of flash storage
       – Longer delays if the storage device is slow
     ● There has to be a way out...

  6. Smarter swapping> Swapping optimization: zswap
     ● zswap: compressed write-back cache for swapped pages
       – Write operation completion signaled on write-to-cache completion
     ● Compresses swapped-out pages and moves them into a pool
       – This pool is dynamically allocated in RAM
     ● Configurable parameters
       – Pool size
       – Compression algorithm
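The write-back cache idea can be illustrated with a toy model in user space. This is only a sketch under stated assumptions, not the kernel's implementation: `CompressedSwapCache` and all of its members are hypothetical names, `zlib` stands in for the configurable compression algorithm, a dictionary stands in for the swap device, and eviction is simply oldest-first.

```python
import zlib


class CompressedSwapCache:
    """Toy compressed write-back cache placed in front of a swap device."""

    def __init__(self, max_pool_bytes: int, level: int = 6):
        self.max_pool_bytes = max_pool_bytes  # configurable pool size
        self.level = level                    # stands in for the algorithm choice
        self.pool = {}        # offset -> compressed page (insertion ordered)
        self.pool_bytes = 0
        self.backing = {}     # offset -> raw page; stands in for the swap device

    def invalidate(self, offset: int) -> None:
        """Forget any copy of the page stored at this offset."""
        blob = self.pool.pop(offset, None)
        if blob is not None:
            self.pool_bytes -= len(blob)
        self.backing.pop(offset, None)

    def store(self, offset: int, page: bytes) -> None:
        """Swap a page out; it stays in RAM, compressed, if the pool allows."""
        self.invalidate(offset)
        blob = zlib.compress(page, self.level)
        if len(blob) > self.max_pool_bytes:
            self.backing[offset] = page       # can never fit: bypass the cache
            return
        # Pool full: write the oldest cached pages back to the device.
        while self.pool_bytes + len(blob) > self.max_pool_bytes:
            old_offset = next(iter(self.pool))
            old_blob = self.pool.pop(old_offset)
            self.pool_bytes -= len(old_blob)
            self.backing[old_offset] = zlib.decompress(old_blob)
        self.pool[offset] = blob
        self.pool_bytes += len(blob)

    def load(self, offset: int) -> bytes:
        """Swap a page in, from the pool if cached, else from the device."""
        blob = self.pool.get(offset)
        if blob is not None:
            return zlib.decompress(blob)
        return self.backing[offset]
```

In this model a store completes as soon as the compressed copy is in the pool; the backing device is touched only when the pool overflows, which is the property that saves flash writes.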

  7. Smarter swapping> zswap backend: zbud
     ● zbud: special purpose memory allocator
       – Allocation is always per-page
     ● Stores up to 2 compressed pages per page
       – One bound to the beginning, one to the end
       – The in-page pages are called “buddies”
     ● Key characteristics
       – Simplicity and stability
     ● zbud is the allocator backend for zswap
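The two-buddies-per-page rule can be sketched as a small packing simulation. This is a simplified model, assuming 4 KiB pages and ignoring the in-page header and any size rounding that the real allocator applies; `pages_needed` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_needed(compressed_sizes):
    """Pack objects at most two per page (first fit) and return the page count."""
    half_used = []  # free bytes left in pages that currently hold one buddy
    pages = 0
    for size in compressed_sizes:
        if not 0 < size <= PAGE_SIZE:
            raise ValueError(f"object of {size} bytes does not fit in a page")
        for i, free in enumerate(half_used):
            if size <= free:
                half_used.pop(i)   # second buddy placed: the page is now closed
                break
        else:
            pages += 1             # no partner page found: open a new one
            half_used.append(PAGE_SIZE - size)
    return pages


if __name__ == "__main__":
    print(pages_needed([1000, 3000, 2500, 2000, 500]))  # 3
    print(pages_needed([100, 100, 100, 100]))           # 2
```

The second call shows why the scheme cannot do better than two compressed pages per physical page: even very small objects are never packed more than two to a page.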

  8. Smarter swapping> RAM as a swap storage
     ● Compression required
       – No gain otherwise
       – But increases CPU load
     ● Implementation of a [virtual] block device required
     ● Careful memory management is required
       – Should not use high-order page allocations

  9. Smarter swapping> ZRAM
     ● Block device for compressed data storage in RAM
       – Compression algorithm is configurable
       – Default algorithm is LZO
       – LZ4 is used mostly
     ● Usually deployed as a self-contained swap device
       – The size is specified at runtime (via sysfs)
       – Configuration is the same otherwise
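A hedged sketch of the runtime configuration step, assuming the zram module is loaded, the device is `zram0`, and the script runs as root. The `comp_algorithm` and `disksize` sysfs attributes are the ones the kernel's zram documentation describes; the function name `configure_zram` and the `sysfs_dir` parameter are illustrative (the parameter exists so the function can be tried against a scratch directory).

```python
from pathlib import Path


def configure_zram(disksize: str, algorithm: str = "lz4",
                   sysfs_dir: str = "/sys/block/zram0") -> None:
    """Set the compression algorithm and size of a zram device via sysfs."""
    base = Path(sysfs_dir)
    # The algorithm has to be selected before the device gets its size.
    (base / "comp_algorithm").write_text(algorithm)
    # Size of the uncompressed block device, e.g. "64M".
    (base / "disksize").write_text(disksize)
```

After a call such as `configure_zram("64M")`, the device is typically turned into swap with `mkswap /dev/zram0` followed by `swapon /dev/zram0`.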

  10. Smarter swapping> ZRAM vs Flash swap
     ● Compared on Carambola (MIPS24kc)
       – Details on the configuration will follow
     ● Standard I/O measurement tools
       – 'fio' with 'tiobench' script
     ● Results
       – Average read speed: 730 vs 699 (kb/s)
       – Average write speed: 180.5 vs 172 (kb/s)
     ● Difference is larger where RAM is faster
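To put the figures above in proportion, the relative difference can be worked out as follows. This assumes the first number in each pair is the ZRAM result and the second the flash result, following the order in the slide title; the slide itself does not label them.

```python
def percent_gain(zram: float, flash: float) -> float:
    """Relative advantage of the first figure over the second, in percent."""
    return (zram - flash) / flash * 100.0


if __name__ == "__main__":
    print(f"read:  {percent_gain(730, 699):.1f}%")    # read:  4.4%
    print(f"write: {percent_gain(180.5, 172):.1f}%")  # write: 4.9%
```

On this board the gap is therefore in the range of 4 to 5 percent.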

  11. Smarter swapping> zsmalloc: ZRAM backend
     ● Special purpose pool-based memory allocator
     ● Packs objects into a set of non-contiguous pages
       – ZRAM calls into zsmalloc to allocate space for compressed data
       – Compressed data is stored in scattered pages within the pool

  12. z--- in detail> zsmalloc and zbud compared

                               zsmalloc          zbud
      Compression ratio        High (3x – 4x)    Medium/Low (1.8x – 2x)
      CPU utilization          Medium/High       Medium
      Internal fragmentation   yes               no
      Latencies                Medium/Low        Low
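The compression ratios in the comparison translate directly into RAM consumed by the pool. A back-of-the-envelope sketch, where the figure of 10,000 swapped-out pages is an arbitrary example and `pool_pages_needed` is a hypothetical helper:

```python
import math


def pool_pages_needed(swapped_pages: int, ratio: float) -> int:
    """Physical pages required to hold `swapped_pages` at a compression ratio."""
    return math.ceil(swapped_pages / ratio)


if __name__ == "__main__":
    for name, ratio in [("zsmalloc, 3x", 3.0), ("zsmalloc, 4x", 4.0),
                        ("zbud, 1.8x", 1.8), ("zbud, 2x", 2.0)]:
        print(f"{name}: {pool_pages_needed(10_000, ratio)} pages")
```

For the same workload, the lower ratio roughly means the pool occupies between one and a half and twice as much RAM.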

  13. z--- in detail> zpool: a unified API
     ● Common API for compressed memory storage
     ● Any memory allocator can implement zpool API
       – And register in zpool
     ● 2 main zpool users
       – zbud
       – zsmalloc
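The "implement the API, then register" pattern can be illustrated with a small user-space analogue. Nothing below is the kernel's zpool interface: `PoolBackend`, `register_driver`, `create_pool`, and `ToyBackend` are hypothetical names that only loosely mirror the allocate, free, and map-handle operations a compressed-memory backend has to provide.

```python
from abc import ABC, abstractmethod

_drivers = {}  # backend name -> backend class


class PoolBackend(ABC):
    """Operations every backend has to offer to its users."""

    @abstractmethod
    def malloc(self, size: int) -> int:
        """Reserve `size` bytes and return an opaque handle."""

    @abstractmethod
    def free(self, handle: int) -> None:
        """Release the object behind `handle`."""

    @abstractmethod
    def map(self, handle: int) -> bytearray:
        """Return a writable view of the object behind `handle`."""


def register_driver(name: str):
    """Class decorator that makes a backend selectable by name."""
    def wrap(cls):
        _drivers[name] = cls
        return cls
    return wrap


def create_pool(name: str) -> PoolBackend:
    """Instantiate whichever backend was registered under `name`."""
    return _drivers[name]()


@register_driver("toy")
class ToyBackend(PoolBackend):
    def __init__(self):
        self._objects = {}
        self._next_handle = 1

    def malloc(self, size: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = bytearray(size)
        return handle

    def free(self, handle: int) -> None:
        del self._objects[handle]

    def map(self, handle: int) -> bytearray:
        return self._objects[handle]
```

A caller only needs the backend's name, for example `pool = create_pool("toy")`, and never touches the concrete class, which is what makes the backend swappable.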

  14. z--- in detail> zswap uses zpool API!
     ● zswap is now backend-independent
       – As long as the backend implements zpool API
     ● zswap can use zsmalloc
       – Better compression ratio
       – Less disk/flash utilization

  15. ZRAM moving forward> What if ZRAM used zbud?
     ● Persistent storage is not used anyway
       – Compression ratio may not be the key
     ● No performance degradation over time
     ● Less dependency on memory subsystem
     ● CPU utilization may get lower
     ● Throughput may get higher
     ● Latencies may get lower

  16. ZRAM moving forward> Why can't ZRAM use zbud?
     ● zbud can't handle PAGE_SIZE allocations
       – Uses a small part of the page for an internal structure
         ● Called struct zbud_header
       – Easy to fix: it can go to struct page
     ● ZRAM doesn't use zpool API
       – zsmalloc API fits zpool API nicely
       – Easy to fix: just implement it
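The PAGE_SIZE limitation is simple arithmetic: if part of each page is reserved for a header, a full-page object no longer fits. A sketch assuming 4 KiB pages and a purely illustrative 64-byte header; neither value is taken from the slides, and `fits_in_page` is a hypothetical helper.

```python
PAGE_SIZE = 4096    # assumption: 4 KiB pages
HEADER_SIZE = 64    # hypothetical space reserved for the in-page header


def fits_in_page(size: int, header_in_page: bool = True) -> bool:
    """Can an object of `size` bytes be stored in a single page?"""
    usable = PAGE_SIZE - (HEADER_SIZE if header_in_page else 0)
    return 0 < size <= usable


if __name__ == "__main__":
    print(fits_in_page(PAGE_SIZE))                        # False
    print(fits_in_page(PAGE_SIZE, header_in_page=False))  # True
```

Moving the bookkeeping out of the page, as the slide suggests, is what the `header_in_page=False` case models.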

  17. ZRAM moving forward> Allow ZRAM to use zbud
     ● An initiative taken by the author
       – Allow PAGE_SIZE allocations in zbud
       – Make ZRAM use zpool
     ● Two mainlining attempts
         ● https://lkml.org/lkml/2015/9/14/356 [1]
         ● https://lkml.org/lkml/2015/9/22/220 [2]
       – Faced strong opposition from ZRAM authors
       – Vendor neutrality questionable
     ● More attempts to come

  18. Measurements> Prerequisites
     ● Use fio for performance measurement
       – Written by Jens Axboe
       – Flexible and versatile
     ● EXT4 file system on /dev/zram0
       – 50% full
     ● A flavor of fio 'enospc' script
       – Adapted for smaller block device (zram)
     ● 40 iterations per z--- backend (zbud/zsmalloc)

  19. Measurements> Test device 1
     ● Sony Xperia Z2
       – MSM8974 CPU
         ● 2.3 GHz Quad-Core Krait™
       – 3 GB RAM
     ● CyanogenMod build as of Jan 15, 2016 (12.1)
       – A flavor of Android 5.1.1
       – Custom 3.10-based kernel

  20. Measurements> ZRAM performance: Android
     [Chart comparing zsmalloc and zbud]
     Outcome: zbud clearly outperforms

  21. Measurements> ZRAM latency: Android
     [Chart comparing zsmalloc and zbud]
     Outcome: zbud outperforms again

  22. Measurements> ZRAM performance: Android
     OK, what happens in the long run: does zbud remain superior to zsmalloc?

  23. Measurements> ZRAM performance: Android
     [Chart comparing zsmalloc and zbud over a longer run]
     Outcome: yes it does.

  24. Measurements> Test device 2
     ● Intel Minnowboard Max EVB
       – 64-bit Atom™ CPU E3815 @ 1.46 GHz
       – DDR3 2 GB RAM
       – Storage 4 GB eMMC
     ● Debian 8.4 64-bit
       – Custom 4.3-based kernel

  25. Measurements> ZRAM performance: x86_64
     [Chart comparing zsmalloc and zbud]
     Outcome: obvious.

  26. Measurements> ZRAM latency: x86_64
     [Chart comparing zsmalloc and zbud]
     Outcome: zbud is better again.

  27. Measurements> Test device 3
     ● Carambola 2
       – MIPS32 24Ke
       – Qualcomm/Atheros AR9331 SoC
       – 400 MHz CPU
       – 64 MB DDR2 RAM
       – Storage 512 MB NAND flash
     ● OpenWRT
       – Git as of Jan 15, 2016
       – Custom 4.3-based kernel

  28. Measurements> ZRAM performance: MIPS32
     [Chart; series labeled zsmalloc and zram]
     Outcome: roughly equal.

  29. Measurements> ZRAM latency: MIPS32
     [Chart comparing zsmalloc and zbud]
     Outcome: more stability with zbud.

  30. Wrap-Up
     ● Compressed RAM swap is a generous idea
       – Many systems can benefit from it
     ● Two implementations mainlined
       – Zswap: mostly targeting big systems
       – ZRAM: mostly for embedded / small systems
     ● Each has its own backend
       – Zswap uses zbud
       – ZRAM uses zsmalloc

  31. Conclusions
     ● Compressed RAM swap is the way out for embedded systems
     ● ZRAM over zbud is a good match for non-compression-ratio-demanding cases
       – Lower latencies
       – Higher throughput
       – Minimal aging
     ● Having options is good

  32. swapping completed. Questions? mailto: vitalywool@gmail.com
