
Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Presented by Hadeel Alabandi, 2/18/2014


  1. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. Presented by Hadeel Alabandi, 2/18/2014

  2. Introduction and Motivation
     • Cache partitioning and sharing is a critical issue for the effective utilization of multicore processors
     • Existing studies evaluated cache partitioning by simulation, which has several limitations:
       • Excessive simulation time
       • Absence of OS activities
       • Proneness to simulation inaccuracy

  3. Introduction and Motivation (cont.)
     • This paper uses a software approach instead
     • It supports static and dynamic cache partitioning through memory address mapping
     • It emulates a hardware partitioning mechanism, so cache partitioning policies can be examined on real systems
     • Three kinds of metrics are used in the evaluation and as optimization objectives:
       • Performance
       • Fairness
       • QoS

  4. Cache Partitioning for Multicore Processors
     • Cache partitioning has two interdependent parts
     • Mechanism
       • Enforces the cache partitioning
       • Provides the inputs that the partitioning policy needs
     • Policy
       • Decides how much cache to allocate to each program, given an optimization objective

  5. Adopted Evaluation Metrics in the Study
     • Performance metrics
       • Throughput (IPCs): absolute IPC values
       • Combined miss rates: summarizes the miss rates of the co-scheduled programs
       • Combined misses: summarizes the number of cache misses of the co-scheduled programs
     • QoS metrics
       • The study assumes that QoS constraints are never violated
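
The performance metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the study's measurement code: the `ProgramStats` record and its field names are hypothetical, and reading "summarizes" as a plain sum over the co-scheduled programs is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgramStats:
    """Hypothetical per-program counters collected over one run."""
    instructions: int
    cycles: int
    l2_accesses: int
    l2_misses: int

    @property
    def ipc(self) -> float:
        # Instructions per cycle for this program.
        return self.instructions / self.cycles

    @property
    def miss_rate(self) -> float:
        # Fraction of L2 accesses that missed.
        return self.l2_misses / self.l2_accesses


def throughput(stats: List[ProgramStats]) -> float:
    """Throughput (IPCs): assumed here to be the sum of per-program IPCs."""
    return sum(s.ipc for s in stats)


def combined_miss_rate(stats: List[ProgramStats]) -> float:
    """Combined miss rates: assumed here to be the sum of miss rates."""
    return sum(s.miss_rate for s in stats)


def combined_misses(stats: List[ProgramStats]) -> int:
    """Combined misses: assumed here to be the total number of misses."""
    return sum(s.l2_misses for s in stats)
```

A policy that optimizes for performance would compare these values across candidate partitionings and prefer the one with the highest throughput or the lowest combined misses.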

  6. Adopted Evaluation Metrics in the Study (cont.)
     • Fairness metrics
       • Miss rates
       • Number of misses
       • Ideal fairness: the slowdown of each co-scheduled program is identical after cache partitioning
     • In the study, fairness metrics are defined relative to single-core execution with a dedicated L2 cache
     • Data for the policy metric and the evaluation metric were collected by running each workload under different cache partitionings
     • The correlation between the two metrics is then computed; its value lies in the range -1 to 1
     • A value of 1 means the two metrics are perfectly correlated
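
As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between a policy metric and an evaluation metric sampled over several partitionings. The slide only states that the result lies between -1 and 1; the choice of Pearson's coefficient and the function name `pearson` are assumptions.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long samples, in [-1, 1]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Each element of `xs` and `ys` would be the value of one metric measured under one cache partitioning of the same workload. A result near 1 suggests that optimizing the policy metric also improves the evaluation metric.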

  7. Static OS-based Cache Partitioning
     • A static cache partitioning policy predetermines the number of cache blocks allocated to each program at the beginning of its execution
     • Page coloring is used as the partitioning mechanism
       • In the physical address, several bits are common to the cache index and the physical page number
       • These bits are used as the page color
       • The physically addressed cache is divided into non-intersecting regions by page color
       • Pages with the same color are mapped to the same cache region
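
The color of a page can be derived from its physical address with simple arithmetic. The sketch below uses the 4 MB, 16-way L2 cache from the Experimental Methodology slide; the 64-byte cache line and the 4 KB page size are assumptions not stated in the slides, and the function names are illustrative.

```python
CACHE_SIZE = 4 * 1024 * 1024   # 4 MB shared L2 (from the platform slide)
ASSOCIATIVITY = 16             # 16-way set associative (from the platform slide)
LINE_SIZE = 64                 # assumption: 64-byte cache lines
PAGE_SIZE = 4096               # assumption: 4 KB pages

NUM_SETS = CACHE_SIZE // (ASSOCIATIVITY * LINE_SIZE)  # number of cache sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE                # sets covered by one page
NUM_COLORS = NUM_SETS // SETS_PER_PAGE                # distinct page colors


def cache_set(phys_addr: int) -> int:
    """Index of the cache set that a physical address maps to."""
    return (phys_addr // LINE_SIZE) % NUM_SETS


def page_color(phys_addr: int) -> int:
    """Color of the page containing a physical address.

    The color is the part of the set index that lies above the page offset,
    i.e. the low bits of the physical page number.
    """
    return (phys_addr // PAGE_SIZE) % NUM_COLORS
```

Under these assumptions there are 64 colors, and every address in a page of color c falls into the same group of 64 consecutive sets, so giving two programs disjoint color sets keeps them in disjoint cache regions.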

  8. Cache Partitioning – Page Coloring

  9. Cache Partitioning – Page Coloring

  10. Dynamic OS-based Cache Partitioning
     • Adjusts cache quotas among processes dynamically
     • Page recoloring procedure, used when a process's cache allocation (i.e., the number of colors it uses) is increased:
       • The kernel rearranges the virtual memory mapping of the process
       • It allocates physical pages of the new color
       • It copies the memory contents
       • It frees the old pages
     • Remapping virtual pages causes performance overhead
       • The overall overhead can be reduced by lowering the frequency of cache allocation adjustments
       • Another option is lazy page migration: the content of a recolored page is moved only when the page is accessed
     • Average overhead of dynamic partitioning is reduced to 2%
     • Highest migration overhead observed: 7%
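
The recoloring steps and the lazy variant can be illustrated with a toy user-space model. This is not kernel code: the `RecolorableProcess` class, its round-robin placement rule, and the use of `bytearray` objects as stand-ins for physical pages are all assumptions made for the example.

```python
class RecolorableProcess:
    """Toy model of a process whose pages are spread over a set of colors."""

    def __init__(self, colors):
        self.colors = sorted(colors)
        self.pages = {}        # virtual page number -> [color, page contents]
        self.migrations = 0    # number of pages copied so far

    def _target_color(self, vpn):
        # Hypothetical placement rule: round-robin over the allowed colors.
        return self.colors[vpn % len(self.colors)]

    def map_page(self, vpn, data):
        self.pages[vpn] = [self._target_color(vpn), bytearray(data)]

    def _migrate_if_needed(self, vpn):
        color, data = self.pages[vpn]
        target = self._target_color(vpn)
        if color == target:
            return
        new_page = bytearray(data)            # allocate a page of the new color and copy
        self.pages[vpn] = [target, new_page]  # remap; the old page is released
        self.migrations += 1

    def set_colors(self, colors, lazy=True):
        """Change the color quota; migrate now (eager) or on access (lazy)."""
        self.colors = sorted(colors)
        if not lazy:
            for vpn in list(self.pages):
                self._migrate_if_needed(vpn)

    def access(self, vpn):
        # With lazy migration, the copy happens on the first access.
        self._migrate_if_needed(vpn)
        return bytes(self.pages[vpn][1])
```

With `lazy=True`, growing the quota from two colors to four copies nothing until a page is touched, so pages that are never accessed again never pay the migration cost.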

  11. Page Recoloring

  12. Dynamic Cache Partitioning Policies
     • The policies adjust the cache partitioning periodically, at the end of each epoch
     • Dynamic cache partitioning policy for performance
       • Adjusts the cache partitioning dynamically
       • Metrics
         • Throughput (IPCs)
         • Combined miss rate
         • Combined misses
         • Fair speedup
     • Dynamic cache partitioning policy for fairness
       • Two dynamic policies were implemented, based on FM0 and FM4
       • FM0 is the evaluation metric (i.e., the ratio of the current cumulative IPC over the baseline IPC)
       • FM4 is based on the cache miss rates
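
One simple way to adjust a partition at the end of each epoch is a hill-climbing step. The sketch below is an assumption for illustration, not the policy from the paper: it moves one color per epoch between two programs and reverses direction whenever the chosen metric gets worse. The function name and parameters are hypothetical.

```python
def adjust_partition(colors_a, total_colors, direction,
                     prev_metric, curr_metric, min_colors=1):
    """Return (new_colors_a, new_direction) for the next epoch.

    colors_a     -- colors currently given to program A (B gets the rest)
    direction    -- +1 if the last move gave A a color, -1 if it took one
    prev_metric  -- metric value in the previous epoch (higher is better)
    curr_metric  -- metric value in the epoch that just ended
    """
    if curr_metric < prev_metric:
        # The last move hurt the metric, so step back the other way.
        direction = -direction
    new_colors_a = colors_a + direction
    # Keep at least min_colors for each of the two programs.
    new_colors_a = max(min_colors, min(total_colors - min_colors, new_colors_a))
    return new_colors_a, direction
```

For a metric where lower is better, such as combined misses, the caller can pass the negated value. Longer epochs mean fewer adjustments and therefore less recoloring overhead.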

  13. Dynamic Cache Partitioning Policies (cont.)
     • Dynamic cache partitioning policy for QoS
       • Two-core workload of two programs
         • The first is the target program
         • The second is the partner program
       • QoS guarantee
         • Ensure that the performance of the target program is at least X% of a baseline execution
         • The baseline is a homogeneous workload on a dual-core processor with half of the cache capacity allocated to each program
       • Subject to that guarantee, increase the performance of the partner program
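
The QoS rule above can be sketched as a per-epoch decision. This is a minimal sketch under stated assumptions, not the paper's policy: the one-color step, the `slack` margin that avoids oscillation, and all names are hypothetical.

```python
def qos_adjust(target_colors, total_colors, target_ipc, baseline_ipc,
               x_percent, slack=0.02, min_colors=1):
    """Return (target_colors, partner_colors) for the next epoch.

    The target program must reach at least x_percent of baseline_ipc.
    Colors the target does not need are handed to the partner program.
    """
    threshold = baseline_ipc * x_percent / 100
    if target_ipc < threshold:
        # Guarantee violated: give the target one more color.
        target_colors += 1
    elif target_ipc >= threshold * (1 + slack):
        # Guarantee met with margin: release one color to the partner.
        target_colors -= 1
    # Keep at least min_colors for each of the two programs.
    target_colors = max(min_colors, min(total_colors - min_colors, target_colors))
    return target_colors, total_colors - target_colors
```

With `x_percent=95`, a target running at 90% of its baseline gains a color, while one running at 100% gives a color to the partner.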

  14. Experimental Methodology
     • Hardware and software platform
       • Dell PowerEdge 1950
       • Two dual-core 3.0 GHz Intel Xeon 5160 processors and 8 GB of Fully Buffered DIMM (FB-DIMM) main memory
       • Shared 4 MB, 16-way set-associative L2 cache
       • Each core has a private 32 KB instruction cache and a private 32 KB data cache
       • Red Hat Enterprise Linux 4.0
       • Kernel linux-2.6.20.3
       • Performance data collected using pfmon

  15. Evaluation Results
     • Shows the improvement of the best static partitioning for each workload over a shared cache

  16. The Performance – Static & Dynamic

  17. Fairness – Correlation between Evaluation Metrics and Policy Metrics

  18. QoS – Static & Dynamic

  19. Related Work
     • Cache partitioning for multicore processors
     • Page coloring

  20. Summary
     • An OS-based cache partitioning mechanism for multicore processors was designed and implemented
     • It was used to study different cache partitioning policies
     • Some findings of simulation-based studies were confirmed, and the approach also revealed new insights that simulation had not shown
     • Future work
       • Reducing cache partitioning overhead
       • Adding an easy-to-use interface
       • Conducting partitioning research at the compiler level for both multiprogrammed and multithreaded applications

  21. Discussion
     • Has the OS-based approach provided new insights and observations that simulation could not show, or failed to show?

  22. References
     • Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
     • http://www.contrib.andrew.cmu.edu/~hyoseunk/pdf/ecrts13-hyos-slides.pdf
     • http://ftp.cs.rochester.edu/~xiao/eurosys09/euro061-zhang.pdf
