pblcache: A client side persistent block cache for the data center


  1. PBLCACHE: A client side persistent block cache for the data center. Vault Boston 2015 - Luis Pabón - Red Hat

  2. ABOUT ME: Luis Pabón, Principal Software Engineer, Red Hat Storage. IRC, GitHub: lpabon

  3. QUESTIONS: What are the benefits of client side persistent caching? How to effectively use the SSD? (Diagram: Compute Node, SSD, Storage)

  4. MERCURY*: Use in-memory data structures to handle cache misses as quickly as possible. Write sequentially to the SSD. Increase storage backend availability by reducing requests. The cache must be persistent, since read warming could be time consuming. * S. Byan, et al., Mercury: Host-side flash caching for the data center

  5. MERCURY QEMU INTEGRATION (diagram)

  6. PBLCACHE

  7. PBLCACHE: Persistent BLock Cache. A persistent, block-based, look-aside cache for QEMU. User space library/application. Based on ideas described in the Mercury paper. Requires exclusive access to mutable objects.

  8. GOAL: QEMU SHARED CACHE

  9. PBLCACHE ARCHITECTURE (diagram: PBL Application, Cache Map, Log, SSD)

  10. PBL APPLICATION: Sets up the cache map and log. Decides how to use the cache (writethrough, read-miss). Inserts, retrieves, or invalidates blocks from the cache. (Diagram: Pbl App, Msg Queue, Cache map, Log)
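      The slide above describes a look-aside, read-miss/writethrough usage pattern. The Go sketch below (Go is the language pblcache is written in) illustrates that pattern only; the Cache and Storage interfaces and all method names are assumptions for illustration, not the actual pblcache API.

      package pblsketch

      // Illustrative interfaces only; the real pblcache API differs.
      type Cache interface {
          Get(addr uint64) ([]byte, bool) // returns the cached block on a hit
          Put(addr uint64, buf []byte)    // inserts or refreshes a block
          Invalidate(addr uint64)         // drops a block from the cache
      }

      type Storage interface {
          Read(addr uint64) ([]byte, error)
          Write(addr uint64, buf []byte) error
      }

      // readBlock implements the read-miss policy: serve hits from the cache,
      // fetch misses from the backend and insert them for future reads.
      func readBlock(c Cache, s Storage, addr uint64) ([]byte, error) {
          if buf, ok := c.Get(addr); ok {
              return buf, nil
          }
          buf, err := s.Read(addr)
          if err != nil {
              return nil, err
          }
          c.Put(addr, buf)
          return buf, nil
      }

      // writeBlock implements writethrough: the backend is always updated,
      // and the cached copy is refreshed so later reads hit locally.
      func writeBlock(c Cache, s Storage, addr uint64, buf []byte) error {
          if err := s.Write(addr, buf); err != nil {
              return err
          }
          c.Put(addr, buf)
          return nil
      }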

  11. CACHE MAP: Composed of two data structures. Maintains all block metadata. (Diagram: Address Map, Block Descriptor Array)

  12. ADDRESS MAP: Implemented as a hash table. Translates object blocks to Block Descriptor Array (BDA) indices. Cache misses are determined extremely quickly. (Diagram: Address Map, Block Descriptor Array)

  13. BLOCK DESCRIPTOR ARRAY: Contains metadata for blocks stored in the log. Length is equal to the maximum number of blocks stored in the log. Handles CLOCK evictions. Invalidations are extremely fast. Insertions always append. (Diagram: Address Map, Block Descriptor Array)
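      A rough Go sketch of the two structures on slides 11-13; the type and field names are assumptions for illustration and differ from the real pblcache code. The address map is a hash table from block address to BDA index, and the BDA holds per-slot metadata such as the CLOCK reference bit.

      // BlockDescriptor holds the metadata for one block slot in the log.
      type BlockDescriptor struct {
          key   uint64 // object block address currently stored in this slot
          clock bool   // CLOCK reference bit, set on cache hits
          used  bool   // whether the slot is occupied
      }

      // CacheMap pairs the address map with the block descriptor array (BDA).
      // The BDA length equals the maximum number of blocks the log can hold.
      type CacheMap struct {
          addressmap map[uint64]int // block address -> BDA index
          bda        []BlockDescriptor
          clockhand  int // next eviction candidate scanned by CLOCK
      }

      func NewCacheMap(blocks int) *CacheMap {
          return &CacheMap{
              addressmap: make(map[uint64]int),
              bda:        make([]BlockDescriptor, blocks),
          }
      }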

  14. CACHE MAP I/O FLOW (diagram: Block Descriptor Array)

  15. CACHE MAP I/O FLOW: Get: if the block is in the address map, it is a hit, so set the CLOCK bit in the BDA and read from the log; otherwise it is a miss.

  16. CACHE MAP I/O FLOW: Invalidate: free the BDA index and delete the entry from the address map.
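      Continuing the illustrative CacheMap sketch above, the two flows on slides 15-16 might look like this: a hit sets the CLOCK bit and returns the BDA index (which locates the block in the log), a miss returns false, and an invalidate frees the BDA slot and removes the map entry without any log I/O.

      // Get implements the lookup flow: present in the address map is a hit,
      // so set the CLOCK reference bit and return the BDA index used to read
      // the block from the log; absent is a miss.
      func (c *CacheMap) Get(addr uint64) (index int, hit bool) {
          i, ok := c.addressmap[addr]
          if !ok {
              return 0, false // miss
          }
          c.bda[i].clock = true // recently referenced; CLOCK will spare it once
          return i, true
      }

      // Invalidate frees the BDA index and deletes the entry from the address
      // map; no log I/O is required, which is why invalidations are fast.
      func (c *CacheMap) Invalidate(addr uint64) {
          if i, ok := c.addressmap[addr]; ok {
              c.bda[i] = BlockDescriptor{} // free the slot
              delete(c.addressmap, addr)
          }
      }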

  17. LOG: Block location is determined by the BDA. CLOCK is optimized with segment read-ahead. Segment pool with buffered writes. Contiguous block support. (Diagram: Segments, SSD)

  18. LOG SEGMENT STATE MACHINE (diagram)

  19. LOG READ I/O FLOW: Read: if the block is in a segment buffer, read from the segment; otherwise read from the SSD.
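      A sketch of the log read path from slides 17 and 19, assuming (for illustration only) that the log is split into fixed-size segments and that BDA index i maps to byte offset i * blocksize; a read is served from the buffered segment when it covers that offset, otherwise it goes to the SSD.

      import "os"

      // Log is an assumed, simplified layout: one buffered segment plus the SSD.
      type Log struct {
          fp           *os.File // SSD device or file backing the log
          blocksize    int64
          segmentsize  int64
          buffer       []byte // contents of the currently buffered segment
          bufferOffset int64  // byte offset of the buffered segment, -1 if none
      }

      // ReadBlock serves a read for a BDA index: if the block falls inside the
      // buffered segment, copy it from RAM; otherwise read it from the SSD.
      func (l *Log) ReadBlock(index int) ([]byte, error) {
          offset := int64(index) * l.blocksize
          buf := make([]byte, l.blocksize)

          if l.bufferOffset >= 0 &&
              offset >= l.bufferOffset &&
              offset+l.blocksize <= l.bufferOffset+l.segmentsize {
              copy(buf, l.buffer[offset-l.bufferOffset:]) // segment (RAM) hit
              return buf, nil
          }

          _, err := l.fp.ReadAt(buf, offset) // read directly from the SSD
          return buf, err
      }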

  20. PERSISTENT METADATA: Save the address map to a file on application shutdown. Cache is warm on application restart. Not designed to be durable: a system crash will cause the metadata file not to be created.
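      The shutdown/restart behavior on slide 20 could look roughly like the following, continuing the CacheMap sketch and using encoding/gob purely as an example serialization (the deck does not specify pblcache's metadata format).

      import (
          "encoding/gob"
          "os"
      )

      // SaveMetadata writes the address map to a file on clean shutdown so the
      // next run starts warm. After a crash the file is simply never written,
      // matching the "not designed to be durable" note.
      func (c *CacheMap) SaveMetadata(path string) error {
          fp, err := os.Create(path)
          if err != nil {
              return err
          }
          defer fp.Close()
          return gob.NewEncoder(fp).Encode(c.addressmap)
      }

      // LoadMetadata repopulates the address map on restart; a missing file
      // just means the cache starts cold.
      func (c *CacheMap) LoadMetadata(path string) error {
          fp, err := os.Open(path)
          if err != nil {
              return err
          }
          defer fp.Close()
          return gob.NewDecoder(fp).Decode(&c.addressmap)
      }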

  21. PBLIO BENCHMARK: PBL APPLICATION

  22. PBLIO: Benchmark tool. Uses an enterprise workload generator from NetApp*. Cache is set up as writethrough. Can be used with or without pblcache. Documentation: https://github.com/pblcache/pblcache/wiki/Pblio. * S. Daniel et al., A portable, open-source implementation of the SPC-1 workload. * https://github.com/lpabon/goioworkload

  23. ENTERPRISE WORKLOAD: Synthetic OLTP enterprise workload generator. Tests for the maximum number of IOPS before exceeding 30 ms latency. Divides the storage system into three logical storage units: ASU1 - Data Store - 45% of total storage - RW; ASU2 - User Store - 45% of total storage - RW; ASU3 - Log - 10% of total storage - Write Only. BSU - Business Scaling Units: 1 BSU = 50 IOPS.
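      To make the units concrete: with 1 BSU = 50 IOPS, the bsu=2 runs below offer about 100 IOPS (and indeed report roughly 99 average IOPS), while the 32 BSU run on slide 27 offers 1600 IOPS. The 45/45/10 split is also why the simple example sizes its backing files at 45 MiB, 45 MiB, and 10 MiB.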

  24. SIMPLE EXAMPLE:
      $ fallocate -l 45MiB file1
      $ fallocate -l 45MiB file2
      $ fallocate -l 10MiB file3
      $ ./pblio -asu1=file1 \
                -asu2=file2 \
                -asu3=file3 \
                -runlen=30 -bsu=2
      ----- pblio -----
      Cache   : None
      ASU1    : 0.04 GB
      ASU2    : 0.04 GB
      ASU3    : 0.01 GB
      BSUs    : 2
      Contexts: 1
      Run time: 30 s
      -----
      Avg IOPS: 98.63  Avg Latency: 0.2895 ms

  25. RAW DEVICES EXAMPLE:
      $ ./pblio -asu1=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde \
                -asu2=/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi \
                -asu3=/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm \
                -runlen=30 -bsu=2

  26. CACHE EXAMPLE:
      $ fallocate -l 10MiB mycache
      $ ./pblio -asu1=file1 -asu2=file2 -asu3=file3 \
                -runlen=30 -bsu=2 -cache=mycache
      ----- pblio -----
      Cache   : mycache (New)
      C Size  : 0.01 GB
      ASU1    : 0.04 GB
      ASU2    : 0.04 GB
      ASU3    : 0.01 GB
      BSUs    : 2
      Contexts: 1
      Run time: 30 s
      -----
      Avg IOPS: 98.63  Avg Latency: 0.2573 ms
      Read Hit Rate: 0.4457  Invalidate Hit Rate: 0.6764
      Read hits: 1120  Invalidate hits: 347
      Reads: 2513  Insertions: 1906  Evictions: 0  Invalidations: 513
      == Log Information ==
      Ram Hit Rate: 1.0000  Ram Hits: 1120
      Buffer Hit Rate: 0.0000  Buffer Hits: 0
      Storage Hits: 0  Wraps: 1  Segments Skipped: 0
      Mean Read Latency: 0.00 usec
      Mean Segment Read Latency: 4396.77 usec
      Mean Write Latency: 1162.58 usec
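      The hit rates in this output follow directly from the counters: the Read Hit Rate of 0.4457 is read hits over reads (1120 / 2513), and the Invalidate Hit Rate of 0.6764 is invalidate hits over invalidations (347 / 513). The Ram Hit Rate of 1.0000 reflects that all 1120 read hits were served from segments buffered in RAM rather than from the cache SSD (Storage Hits: 0).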

  27. LATENCY OVER 30 MS:
      ----- pblio -----
      Cache   : /dev/sdg (Loaded)
      C Size  : 185.75 GB
      ASU1    : 673.83 GB
      ASU2    : 673.83 GB
      ASU3    : 149.74 GB
      BSUs    : 32
      Contexts: 1
      Run time: 600 s
      -----
      Avg IOPS: 1514.92  Avg Latency: 112.1096 ms
      Read Hit Rate: 0.7004  Invalidate Hit Rate: 0.7905
      Read hits: 528539  Invalidate hits: 120189
      Reads: 754593  Insertions: 378093  Evictions: 303616  Invalidations: 152039
      == Log Information ==
      Ram Hit Rate: 0.0002  Ram Hits: 75
      Buffer Hit Rate: 0.0000  Buffer Hits: 0
      Storage Hits: 445638  Wraps: 0  Segments Skipped: 0
      Mean Read Latency: 850.89 usec
      Mean Segment Read Latency: 2856.16 usec
      Mean Write Latency: 6472.74 usec

  28. EVALUATION

  29. TEST SETUP: Client using a 180 GB SAS SSD (about 10% of workload size). GlusterFS 6x2 cluster. 100 files for each ASU. pblio v0.1 compiled with go1.4.1. Each system has: Fedora 20, 6 Intel Xeon E5-2620 @ 2 GHz, 64 GB RAM, 5 300 GB SAS drives, 10 Gbit network.

  30. CACHE WARMUP IS TIME CONSUMING: 16 hours (chart)

  31. INCREASED RESPONSE TIME: 73% increase (chart)

  32. STORAGE BACKEND IOPS REDUCTION: BSU = 31, or 1550 IOPS; ~75% IOPS reduction (chart)
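      For scale: 31 BSU corresponds to 31 × 50 = 1550 IOPS offered by the client, so the ~75% reduction means the GlusterFS backend only has to service roughly a quarter of that load once the cache is warm.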

  33. CURRENT STATUS

  34. MILESTONES: 1. Create Cache Map - COMPLETED. 2. Create Log - COMPLETED. 3. Create Benchmark application - COMPLETED. 4. Design pblcached architecture - IN PROGRESS.

  35. NEXT: QEMU SHARED CACHE: Work with the community to bring this technology to QEMU. Possible architecture (diagram). Some conditions to think about: VM migration, volume deletion, VM crash.

  36. FUTURE: Hyperconvergence. Peer-cache. Writeback. Shared cache. QoS using mClock*. Possible integrations with Ceph and GlusterFS backends. * A. Gulati et al., mClock: Handling Throughput Variability for Hypervisor IO Scheduling

  37. JOIN! GitHub: https://github.com/pblcache/pblcache. IRC (Freenode): #pblcache. Google Group: https://groups.google.com/forum/#!forum/pblcache. Mailing list: pblcache@googlegroups.com

  38. FROM THIS... (image)

  39. TO THIS (image)
