Stratus: Clouds with Microarchitectural Resource Management


  1. Stratus: Clouds with Microarchitectural Resource Management Kaveh Razavi and Animesh Trivedi

  2. Once Upon a Time in the Cloud
     Large: 4 cores, 16 GB
     Medium: 2 cores, 8 GB
     Small: 1 core, 4 GB

  3. Once Upon a Time in the Cloud
     Large: 4 cores, 16 GB
     Medium: 2 cores, 8 GB
     Small: 1 core, 4 GB
     Diagram: a tenant sends allocate(small) to the cloud provider; resources shown: CPU, DRAM.

  5. Once Upon a Time in the Cloud
     Large: 4 cores, 16 GB
     Medium: 2 cores, 8 GB
     Small: 1 core, 4 GB
     Diagram: one tenant sends allocate(small) and a second tenant sends allocate(medium) to the cloud provider; resources shown: CPU, DRAM.

  6. Once Upon a Time in the Cloud
     Large: 4 cores, 16 GB
     Medium: 2 cores, 8 GB
     Small: 1 core, 4 GB
     Diagram: one tenant sends allocate(small) and a second tenant sends allocate(medium) to the cloud provider; resources shown: CPU, shared L3 cache, DRAM.
     What could possibly go wrong here when two tenants share the L3 cache?

  7. Problems with (Unsupervised) Sharing
     Diagram: CPU with a shared L3 cache.
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     Reference: Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.

  8. Problems with (Unsupervised) Sharing
     Diagram: CPU with a shared L3 cache.
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     References:
     Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.
     Kurth, M.; Gras, B.; Andriesse, D.; Giuffrida, C.; Bos, H.; and Razavi, K. NetCAT: Practical Cache Attacks from the Network. In S&P, 2020.

  9. Problems with (Unsupervised) Sharing
     Diagram: CPU with a shared L3 cache.
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     Are these two examples L3 cache specific?
     References:
     Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.
     Kurth, M.; Gras, B.; Andriesse, D.; Giuffrida, C.; Bos, H.; and Razavi, K. NetCAT: Practical Cache Attacks from the Network. In S&P, 2020.

  10. New Classes of Microarchitectural Resources
      Resource: microarchitectural resources
      - CPU: caches, TLBs, hyperthreads, ALUs
      - Smart NICs: caches (memory, requests, connection), TLBs, RMT pipelines, DMA engines
      - NVM storage: blocks, pages, internal r/w ports, programmable cores, SRAM
      - GPUs: memories, caches, execution units
      - In-network switches: SRAM and TCAM memories, match-action unit processors, ALUs
      And more: TPUs, FPGAs, near-memory compute elements …

  11. What is happening
      ● Diverse microarchitectural resources are here to stay
      ● They have security and performance ramifications
      ● The root cause is unsupervised sharing (or lack of isolation)
      Key challenge: How to manage microarchitectural resources in a principled manner?

  12. Stratus: Clouds with Principled Microarchitectural Resource Management
      Key property: isolation

  13. Stratus: Clouds with Principled Microarchitectural Resource Management
      ● A cloud resource allocation framework
      ● Captures and reasons about microarchitectural resource isolation
      ● Security and performance requirements are the two sides of isolation

  14. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  16. Capturing Isolation: A Declarative Interface
      Isolation captured as constraints on resource allocations:
      handle = ISOLATE (resource, scale, quantity);

  17. Capturing Isolation: A Declarative Interface
      Isolation captured as constraints on resource allocations:
      handle = ISOLATE (resource, scale, quantity);
      - resource: hard (discrete allocation), e.g. LLC slots, ALUs, TLBs; soft (contended in time), e.g. DRAM bandwidth
      - scale: extent of isolation requested {0,1}
      - quantity: number of resources for which this constraint must be satisfied

  18. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      Multiple constraints can be “attached” (AND) by their handles.

  19. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      ALLOCATE cloud_resource, .. where constraints
      Pass multiple microarchitectural constraints during cloud resource allocations (also labeled grouped constraints; see the paper).

  20. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      ALLOCATE cloud_resource, .. where constraints
      E.g., mitigating NetCAT:
      ALLOCATE small where
          h1 = ISOLATE (CPU.LLC, 1.0, 64), // the first 64 lines
          h2 = ISOLATE (NIC.*, *, ...),    // all NIC-level uarch
          ATTACH (h1, h2);                 // both for small VM allocations
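
The NetCAT example on slide 20 can be mimicked in ordinary code to show how handles, attachment, and allocation fit together. The following is a minimal sketch, not the Stratus implementation: the functions `isolate`, `attach`, and `allocate`, the `Constraint` class, and the wildcard encoding are all assumptions made for illustration. The third argument of `h2` is elided on the slide, so the sketch keeps it elided with Python's `...` literal.

```python
from dataclasses import dataclass
from itertools import count

_handle_ids = count(1)  # hypothetical source of handle numbers


@dataclass(frozen=True)
class Constraint:
    """One isolation constraint, mirroring ISOLATE(resource, scale, quantity)."""
    handle: int
    resource: str     # e.g. "CPU.LLC" or the wildcard "NIC.*"
    scale: object     # extent of isolation, or "*" as a wildcard
    quantity: object  # number of resources, or ... when left elided


def isolate(resource, scale, quantity):
    """Hypothetical stand-in for ISOLATE: record a constraint and return it."""
    return Constraint(next(_handle_ids), resource, scale, quantity)


def attach(*constraints):
    """Hypothetical stand-in for ATTACH: AND the constraints into one group."""
    return frozenset(constraints)


def allocate(cloud_resource, where):
    """Hypothetical stand-in for ALLOCATE: bundle the request, no placement."""
    ordered = sorted(where, key=lambda c: c.handle)
    return {"cloud_resource": cloud_resource, "constraints": ordered}


h1 = isolate("CPU.LLC", 1.0, 64)   # the first 64 lines
h2 = isolate("NIC.*", "*", ...)    # all NIC-level uarch; quantity elided on the slide
request = allocate("small", where=attach(h1, h2))
```

The sketch only records the request. Deciding whether a host can honour it is a separate step, which the talk assigns to the reasoning framework.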

  21. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  22. Building a Reasoning Framework
      Cloud Knowledge Base (CKB)
      ● Similar in spirit to the SKB in Barrelfish OS (SOSP’09)
      ● Structured representation of knowledge in one place
      ● Can be queried
      Resources
      - Topology, numbers, types
      - Datasheets information
      Online measurements
      - Utilization, spare
      - Occupancy

  23. Building a Reasoning Framework
      Cloud Knowledge Base (CKB)
      Resources
      - Topology, numbers, types
      - Datasheets information
      Online measurements
      - Utilization, spare
      - Occupancy
      Tenant’s constraints (CNF format)
      Allocation strategy
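
Slide 23 states that tenant constraints reach the CKB in CNF format, and slide 22 that the CKB can be queried. As a rough sketch of what such a query might look like, the code below checks a CNF formula (an AND of clauses, each an OR of literals) against per-host free resources. The data layout, the host names, the resource names `NIC.cache` and `NIC.dma`, and the "free quantity is at least the requested quantity" test are all assumptions for illustration; the slides do not specify any of them.

```python
# Hypothetical CKB snapshot: free microarchitectural resources per host.
ckb = {
    "host-a": {"CPU.LLC": 32, "NIC.cache": 0},
    "host-b": {"CPU.LLC": 128, "NIC.cache": 4},
}

# CNF formula: every clause must hold; a clause holds if any literal holds.
# Each literal is (resource, quantity needed).
cnf = [
    [("CPU.LLC", 64)],
    [("NIC.cache", 1), ("NIC.dma", 1)],
]


def literal_holds(free, literal):
    """A literal holds when the host has at least the requested quantity free."""
    resource, needed = literal
    return free.get(resource, 0) >= needed


def satisfies(free, formula):
    """Evaluate the CNF formula against one host's free resources."""
    return all(any(literal_holds(free, lit) for lit in clause) for clause in formula)


def candidate_hosts(knowledge_base, formula):
    """Return the hosts whose free resources satisfy the formula, sorted by name."""
    return sorted(h for h, free in knowledge_base.items() if satisfies(free, formula))


candidates = candidate_hosts(ckb, cnf)
```

With this snapshot only `host-b` qualifies, since `host-a` has too few free LLC lines for the first clause.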

  24. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  25. Charging for Isolation: Isolation Credits
      Isolation spectrum
      Low isolation: + High utilization, + Better efficiency, - Low perf/security
      High isolation: + Low interference, + Better perf/security, - Low utilization

  26. Charging for Isolation: Isolation Credits
      Isolation spectrum
      Low isolation: + High utilization, + Better efficiency, - Low perf/security
      High isolation: + Low interference, + Better perf/security, - Low utilization
      Isolation credit: a currency to capture this tradeoff
      ● Encourages tenants to only specify relevant constraints
      ● Encourages providers to innovate in better isolation mechanisms

  27. Charging for Isolation: Isolation Credits
      Isolation spectrum
      Low isolation: + High utilization, + Better efficiency, - Low perf/security
      High isolation: + Low interference, + Better perf/security, - Low utilization
      Diagram (Stratus Cloud): a tenant who knows its constraints (labels: constraints, 42 credits); a tenant with a budget (labels: credit budget = 42, constraints).
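
Slide 27 shows two kinds of tenants: one that knows its constraints and one that only has a credit budget. The slides do not give a pricing formula, so the sketch below invents one purely for illustration: cost is price per unit times scale times quantity, and the price table, resource names, and greedy budget policy are all assumptions. The prices were picked so that the example totals 42 credits, matching the figure on the slide.

```python
# Hypothetical price list: credits per unit of each resource at full isolation.
PRICE_PER_UNIT = {"CPU.LLC": 0.5, "NIC.cache": 10.0}


def cost(constraint):
    """Assumed pricing rule: price per unit * scale * quantity."""
    resource, scale, quantity = constraint
    return PRICE_PER_UNIT[resource] * scale * quantity


def quote(constraints):
    """Tenant who knows its constraints: total credits for all of them."""
    return sum(cost(c) for c in constraints)


def affordable(constraints, budget):
    """Tenant with a budget: keep constraints, in order, while they still fit."""
    chosen, remaining = [], budget
    for c in constraints:
        price = cost(c)
        if price <= remaining:
            chosen.append(c)
            remaining -= price
    return chosen


# (resource, scale, quantity) triples; values are illustrative only.
wanted = [("CPU.LLC", 1.0, 64), ("NIC.cache", 1.0, 1)]
total_credits = quote(wanted)
within_budget = affordable(wanted, budget=35)
```

Under these made-up prices, a budget of 35 credits covers the LLC constraint but not the NIC one, which is the kind of tradeoff the isolation spectrum describes.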

  28. Summary: Stratus
      Managing microarchitectural resources in a principled manner
      Three key ideas:
      1. A declarative interface
      2. A Cloud Knowledge Base (CKB)
      3. Isolation Credits

  29. Challenges and Discussion Points
      Enforcing isolation
      ● Mechanisms
      ● Policies
      Right constraints
      ● Profile guided
      ● Security libraries
      Scalability
      ● CKB
      ● O(1-10ms)
      Is microarchitectural resource management really worth it?
      More efforts on mechanisms or better policies?
      Can we have better hardware support from vendors?
      What are we missing from a cloud-operation point of view?
