The Performance Analysis of Cache Architecture based on Alluxio

The Performance Analysis of Cache Architecture based on Alluxio over Virtualized Infrastructure. Xu Chang, Li Zha. Contents: Background, Related Works, Motivation, Experiments, Results, Conclusion, Future Work.


  1. The Performance Analysis of Cache Architecture based on Alluxio over Virtualized Infrastructure
     Xu Chang, Li Zha

  2. Contents
     • Background
     • Related Works
     • Motivation
     • Experiments
     • Results
     • Conclusion
     • Future Work

  3. Background
     • Cloud Computing
       – Computing as a service
       – Resources are requested on demand and paid for on demand
     • Virtualization
       – Integrates and encapsulates resources
       – Provides resources in pieces
       – Transparent to users

  4. Background: Decoupling vs Traditional
     • Traditional architecture
       – [Diagram: three compute nodes and three data nodes]
     • Decoupling architecture of computing and storage
       – [Diagram: a compute cluster of three compute nodes, and a data center (object storage) of three data nodes]
     • Advantage:
       – More flexible
       – Overall cost is reduced
     • Shortcoming:
       – Performance decline

  5. Related Works
     Making up for the loss of performance:
     • Traditional optimization methods
       – Speed up the shuffle phase of jobs with SSDs
       – [kambatla2014truth] [ruan2017improving]
     • Reduce the frequency of accesses to the object storage
       – Construct a cache layer between applications and object storage
       – [shankar2017performance] [qureshi2014cache]

  6. Related Works: Alluxio (Tachyon)
     • The world’s first memory-speed virtual distributed storage system
     • Resides between computation frameworks and storage systems
     Source: https://www.alluxio.org/

  7. Motivation
     • Existing work is concerned only with performance and does not consider cost
     • Cost reduction is critical
     • Question:
       – How should the caching architecture be designed to maximize cost performance?
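
The slides do not define "cost performance". A minimal sketch, assuming it means throughput divided by cost, might look like the following. The function names, the definition itself, and every number below are hypothetical and are not taken from the presentation.

```python
def cost_performance(throughput_mb_s: float, cost: float) -> float:
    """Throughput per unit of cost (assumed definition, not from the slides)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return throughput_mb_s / cost


def best_configuration(measurements: dict) -> str:
    """Return the name of the configuration with the highest cost performance.

    `measurements` maps a configuration name to (throughput in MB/s, cost).
    """
    return max(measurements, key=lambda name: cost_performance(*measurements[name]))


# Hypothetical numbers for illustration only; these are not measured results.
example = {
    "all-memory": (100.0, 50.0),
    "hybrid": (90.0, 30.0),
    "all-ssd": (60.0, 25.0),
}
print(best_configuration(example))
```

Under this assumed definition, a configuration with slightly lower throughput can still rank first when its cost is much lower, which is the trade-off the question on this slide is about.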

  8. Experiments: System architecture
     [Diagram: MapReduce jobs run on top of Alluxio instances, which sit on top of cloud storage]
     Source: https://www.alluxio.org/
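
The layering on this slide (application, cache, cloud storage) can be illustrated with a toy read-through cache that has a memory tier in front of an SSD tier. This is only a sketch of the general idea: the `TieredCache` class, its slot-based capacities, and the LRU eviction policy are assumptions made for illustration, not a description of how Alluxio is implemented.

```python
from collections import OrderedDict


class TieredCache:
    """Toy read-through cache: memory tier, then SSD tier, then object store.

    Both tiers use LRU eviction here (an assumption made for this sketch).
    """

    def __init__(self, mem_slots, ssd_slots, fetch):
        self.mem_slots = mem_slots
        self.ssd_slots = ssd_slots
        self.fetch = fetch            # callable that reads a block from the object store
        self.mem = OrderedDict()
        self.ssd = OrderedDict()
        self.backend_reads = 0        # number of reads that reached the object store

    def read(self, key):
        if key in self.mem:           # memory hit: refresh recency
            self.mem.move_to_end(key)
            return self.mem[key]
        if key in self.ssd:           # SSD hit: take the block out for promotion
            value = self.ssd.pop(key)
        else:                         # miss in both tiers: go to the object store
            value = self.fetch(key)
            self.backend_reads += 1
        self._put_mem(key, value)
        return value

    def _put_mem(self, key, value):
        if self.mem_slots == 0:       # no memory tier configured
            self._put_ssd(key, value)
            return
        self.mem[key] = value
        self.mem.move_to_end(key)
        while len(self.mem) > self.mem_slots:
            old_key, old_value = self.mem.popitem(last=False)
            self._put_ssd(old_key, old_value)   # demote the least recently used block

    def _put_ssd(self, key, value):
        if self.ssd_slots == 0:       # no SSD tier configured
            return
        self.ssd[key] = value
        self.ssd.move_to_end(key)
        while len(self.ssd) > self.ssd_slots:
            self.ssd.popitem(last=False)        # drop from the cache entirely


# Usage: a dict stands in for the object store.
store = {"a": b"A", "b": b"B", "c": b"C"}
cache = TieredCache(mem_slots=1, ssd_slots=1, fetch=store.__getitem__)
for key in ["a", "b", "a", "c", "a"]:
    cache.read(key)
print(cache.backend_reads)
```

Changing `mem_slots` and `ssd_slots` changes how many reads reach the backing store, which is the knob the memory-to-SSD ratio in the experiments varies.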

  9. Experiments: Experimental environment
     • Experiment 1:
       – Platform: AWS
       – Servers: m3.2xlarge * 4
       – Object storage: S3
     • Experiment 2:
       – Platform: G-Cloud
       – Servers: 8 cores & 30G memory * 4
       – Object storage: Ceph

  10. Experiments: Experimental scheme
      • Experiment 1:
        – Workload: Terasort * 6
      • Experiment 2:
        – Workload: Hive-Join * 3
      • Data Size: 120G
      • Cost ratio of memory to SSD (Memory : SSD): 8:0, 7:1, 5:3, 3:5, 1:7, 0:8
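
The six cost ratios above correspond to the percentage labels used on the result charts (for example, 7:1 is 87.5% memory and 12.5% SSD). The short sketch below shows the arithmetic. The `capacities_gb` helper additionally converts a budget into capacities; its per-GB prices are hypothetical parameters, since the slides give no prices.

```python
# Memory-to-SSD cost ratios listed on the slide.
RATIOS = [(8, 0), (7, 1), (5, 3), (3, 5), (1, 7), (0, 8)]


def budget_shares(mem_parts, ssd_parts):
    """Return (memory %, SSD %) of the cache budget for a memory:SSD cost ratio."""
    total = mem_parts + ssd_parts
    if total <= 0:
        raise ValueError("ratio must have a positive total")
    return (100 * mem_parts / total, 100 * ssd_parts / total)


def capacities_gb(budget, mem_parts, ssd_parts, mem_price_per_gb, ssd_price_per_gb):
    """Split a budget by cost ratio and convert each part to capacity in GB.

    The prices are hypothetical inputs; the slides do not state any.
    """
    mem_pct, ssd_pct = budget_shares(mem_parts, ssd_parts)
    mem_gb = budget * mem_pct / 100 / mem_price_per_gb
    ssd_gb = budget * ssd_pct / 100 / ssd_price_per_gb
    return (mem_gb, ssd_gb)


for mem_parts, ssd_parts in RATIOS:
    mem_pct, ssd_pct = budget_shares(mem_parts, ssd_parts)
    print(f"{mem_parts}:{ssd_parts} -> {mem_pct}% MEM, {ssd_pct}% SSD")
```

Because the ratio is expressed in cost rather than capacity, the same ratio yields more SSD capacity than memory capacity whenever SSD is cheaper per GB.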

  11. Results: Experiment 1
      [Figure: two charts over the cache configurations 100%MEM, 87.5%MEM/12.5%SSD, 62.5%MEM/37.5%SSD, 37.5%MEM/62.5%SSD, 12.5%MEM/87.5%SSD, 100%SSD. Performance: throughput (MB/s), axis from 76 to 92. Cost performance: axis from 0 to 5.]

  12. Results: Experiment 2
      [Figure: two charts over the cache configurations 100%MEM, 87.5%MEM/12.5%SSD, 62.5%MEM/37.5%SSD, 37.5%MEM/62.5%SSD, 12.5%MEM/87.5%SSD, 100%SSD. Performance: throughput (MB/s), axis from 175 to 210. Cost performance: axis from 0 to 3.]

  13. Conclusion
      • A hybrid cache architecture is recommended.
      • For workloads with a large output and a small amount of hot data, the cost ratio of memory to SSD in the cache should be around 1:7.
      • For workloads with a small output and a large amount of hot data, the cost ratio of memory to SSD in the cache should be around 5:3.

  14. Future Work
      • Study the factors that affect cost performance, and try to derive a configuration scheme with the best cost performance
      • Add more workload types and application scenarios, so that the conclusions are closer to real-world use and generalize better

  15. Q & A
      Thanks!
