Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation, by Yiying Zhang (presentation transcript)


  1. Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang

  2.

  3. Monolithic Computer: OS / Hypervisor

  4. Can monolithic servers continue to meet datacenter needs? (Application and Hardware; Heterogeneity, Flexibility, Perf/$)

  5. TPU, GPU, FPGA, HBM, NVM, ASIC, DNA Storage, NVMe

  6. Making new hardware work with existing servers is like fitting puzzle pieces.

  7. Can monolithic servers continue to meet datacenter needs? (Application and Hardware; Heterogeneity, Flexibility, Perf/$)

  8. Poor Hardware Elasticity
     • Hard to change hardware components: add (hotplug), remove, reconfigure, restart
     • No fine-grained failure handling: the failure of one device can crash a whole machine

  9. Can monolithic servers continue to meet datacenter needs? (Application and Hardware; Heterogeneity, Flexibility, Perf/$)

  10. Poor Resource Utilization
     • A whole VM/container has to run on one physical machine
     • Must move current applications to make room for new ones; resources are wasted
     [Figure: CPU/memory usage of Job 1 and Job 2 on Server 1 and Server 2; the available space on each server is smaller than the required space]
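The slide's point can be sketched in a few lines (a toy illustration of mine, not from the talk): a job must fit entirely on one monolithic server, even when the cluster as a whole has enough free CPU and memory; a disaggregated pool removes that constraint. The server sizes and job shape below are made up.

```python
# Toy sketch: why per-server packing wastes resources.
servers = [
    {"cpu": 4, "mem": 8},   # free resources on Server 1
    {"cpu": 2, "mem": 16},  # free resources on Server 2
]

def fits_monolithic(job):
    """The job must fit entirely on a single server."""
    return any(job["cpu"] <= s["cpu"] and job["mem"] <= s["mem"] for s in servers)

def fits_disaggregated(job):
    """The job may draw CPU and memory from different components."""
    return (job["cpu"] <= sum(s["cpu"] for s in servers)
            and job["mem"] <= sum(s["mem"] for s in servers))

job = {"cpu": 4, "mem": 12}          # needs 4 CPUs and 12 GB
print(fits_monolithic(job))          # False: no single server has both
print(fits_disaggregated(job))       # True: the pool as a whole does
```

Here the cluster has 6 CPUs and 24 GB free in total, yet the job is unschedulable on monolithic servers: exactly the "unused resources plus waiting/killed jobs" effect the production traces show.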

  11. Resource Utilization in Production Clusters
     Unused resources, plus jobs waiting or killed because of physical-node constraints
     * Google Production Cluster Trace Data: https://github.com/google/cluster-data
     * Alibaba Production Cluster Trace Data: https://github.com/alibaba/clusterdata

  12. Can monolithic servers continue to meet datacenter needs? (Application and Hardware; Heterogeneity, Flexibility, Perf/$)

  13. How to achieve better heterogeneity, flexibility, and perf/$? Go beyond the physical node boundary.

  14. Resource Disaggregation: breaking monolithic servers into network-attached, independent hardware components

  15. 15

  16. [Figure: Application / Network / Hardware stack, labeled with Heterogeneity, Flexibility, and Perf/$]

  17. Why Possible Now?
     • Network is faster: InfiniBand (200Gbps, 600ns), Optical Fabric (400Gbps, 100ns)
     • More processing power at the device: SmartNIC, SmartSSD, PIM
     • Network interface closer to the device: Omni-Path, Innova-2
     Related efforts: Berkeley Firebox, Intel Rack-Scale System, HP The Machine, IBM Composable System

  18. Disaggregated Datacenter: End-to-End Solution
     Stack: Unmodified Application / Dist Sys / OS / Network / Hardware
     Goals: Performance, Heterogeneity, Flexibility, Reliability, Cost ($)

  19. Disaggregated Datacenter: End-to-End Solution
     • Physically Disaggregated Resources
     • Disaggregated Operating System (OSDI’18): new processor and memory architecture
     • Networking for Disaggregated Resources: kernel-level RDMA virtualization (SOSP’17), RDMA network

  20. 20

  21. Can Existing Kernels Fit?
     [Figure: three architectures compared: a monolithic server running a monolithic/micro-kernel (e.g., Linux, L4) over CPU, memory, NIC, and disk; a multikernel (e.g., Barrelfish, Helios, fos) with one kernel per core/device (Core, GPU, P-NIC) over shared main memory; and multiple such servers with a network only across servers]

  22. Existing Kernels Don’t Fit
     • Access remote resources over the network
     • Distributed resource management
     • Fine-grained failure handling

  23. When hardware is disaggregated, the OS should be also.

  24. [Figure: a monolithic OS containing Process Mgmt, the Virtual Memory System, and the File & Storage System, sitting above the network]

  25. [Figure: the OS functions split apart: Process Mgmt, the Virtual Memory System, and the File & Storage System, each attached directly to the network]

  26. The Splitkernel Architecture
     • Split OS functions into monitors
     • Run each monitor at a hardware device
     • Network messaging across non-coherent components
     • Distributed resource management and failure handling
     [Figure: Process Monitor (CPU), GPU Monitor (GPU), XPU Manager (new h/w), and Memory, NVM, HDD, and SSD Monitors, connected by network messaging across non-coherent components]
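The defining property of the splitkernel on the slide above is that monitors share no coherent memory and interact only through explicit messages. A minimal sketch of that interaction model (my illustration, not LegoOS code; the monitor names and message fields are made up, and a local queue stands in for the network):

```python
import queue

class Monitor:
    """A splitkernel monitor: private state, message-only interaction."""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()   # stands in for the network link

    def send(self, dst, msg):
        dst.inbox.put((self.name, msg))

    def poll(self):
        sender, msg = self.inbox.get_nowait()
        return sender, msg

process_monitor = Monitor("process@CPU")
memory_monitor = Monitor("memory@DRAM")

# The process monitor asks a memory monitor for a virtual-memory operation;
# in LegoOS this message would travel over RDMA, not a local queue.
process_monitor.send(memory_monitor, {"op": "alloc", "bytes": 4096})
print(memory_monitor.poll())  # ('process@CPU', {'op': 'alloc', 'bytes': 4096})
```

Because there is no shared state, a new monitor type (say, for an XPU) can join by speaking the same message protocol, which is what makes the architecture extensible to new hardware.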

  27. LegoOS: The First Disaggregated OS
     [Figure: Lego-style bricks for Processor, Memory, Storage, and NVM components]

  28. How Should LegoOS Appear to Users?
     As a set of hardware devices? As a giant machine?
     • Our answer: as a set of virtual nodes (vNodes)
     • Similar semantics to virtual machines
     • Unique vID, vIP, storage mount point
     • Can run on multiple processor, memory, and storage components
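The vNode described on this slide can be pictured as a small record: an identity (vID, vIP, mount point) plus lists of the components it spans. A hedged sketch; the field and component names are my invention, not LegoOS data structures:

```python
from dataclasses import dataclass, field

@dataclass
class VNode:
    """A virtual node as the slide describes it (illustrative fields)."""
    vid: int                 # unique vNode ID
    vip: str                 # virtual IP address
    mount_point: str         # storage mount point
    processors: list = field(default_factory=list)
    memories: list = field(default_factory=list)
    storages: list = field(default_factory=list)

# One vNode spans several components; the same component (e.g. "M1")
# could also appear in another vNode's lists (slide 29).
vnode1 = VNode(vid=1, vip="10.0.0.1", mount_point="/vnode1",
               processors=["P1"], memories=["M1", "M2"], storages=["S1"])
print(vnode1.vid, len(vnode1.memories))  # 1 2
```

Unlike a VM, nothing here ties the vNode to one physical machine: its processor, memory, and storage lists can grow or shrink independently.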

  29. Abstraction: vNode
     [Figure: vNode1 and vNode2 mapped across the Process Monitor (CPU), GPU Monitor (GPU), XPU Manager (new h/w), and Memory, NVM, HDD, and SSD Monitors]
     • One vNode can run on multiple hardware components
     • One hardware component can run multiple vNodes

  30. Abstraction
     • Appear as vNodes to users
     • Linux ABI compatible
     • Support the unmodified Linux system call interface (common calls)
     • A level of indirection translates the Linux interface to the LegoOS interface
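The "level of indirection" on this slide can be sketched as a dispatch table: applications keep issuing Linux system calls by name, and each call is rewritten into a message for the monitor that owns the resource. Everything below is hypothetical (the function names, message fields, and monitor names are mine); it only illustrates the translation idea.

```python
# Hypothetical translations from Linux calls to LegoOS-style messages.
def lego_open(path, flags=0):
    return {"to": "storage-monitor", "op": "open", "path": path, "flags": flags}

def lego_brk(addr):
    return {"to": "memory-monitor", "op": "brk", "addr": addr}

SYSCALL_TABLE = {"open": lego_open, "brk": lego_brk}

def syscall(name, *args, **kwargs):
    """Unmodified applications use Linux names; we translate underneath."""
    return SYSCALL_TABLE[name](*args, **kwargs)

msg = syscall("open", "/etc/hosts")
print(msg["to"])  # storage-monitor
```

The slide deck later notes (slide 42) that the real indirection layer stores state for 113 Linux syscalls; the point here is only that the application-facing interface stays unchanged while the destination of each call moves across the network.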

  31. LegoOS Design
     1. Clean separation of OS and hardware functionalities
     2. Build monitors with hardware constraints
     3. RDMA-based message passing for both kernel and applications
     4. Two-level distributed resource management
     5. Memory failure tolerance through replication

  32. Separate Processor and Memory
     [Figure: a processor with per-CPU caches, a last-level cache, TLB, MMU, page table, and DRAM]

  33. Separate Processor and Memory
     Separate hardware units and move them to the memory component, across the network.
     [Figure: the TLB, MMU, page table, and DRAM move to the memory component]

  34. Separate Processor and Memory
     [Figure: same as slide 33, with the virtual memory system highlighted]

  35. Separate Processor and Memory
     Separate the virtual memory system and move it to the memory component, across the network.

  36. Separate Processor and Memory
     • Processor components only see virtual memory addresses
     • All levels of cache are virtual caches
     • Memory components manage virtual and physical memory
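The division of labor on this slide, where the processor issues only virtual addresses and the memory component owns the page table and does the translation, can be sketched as follows. This is a toy model of mine, not LegoOS code; it allocates a physical frame on first touch purely to keep the example self-contained.

```python
PAGE = 4096

class MemoryComponent:
    """Owns the page table; the processor never sees physical addresses."""
    def __init__(self):
        self.page_table = {}   # virtual page number -> physical frame
        self.next_frame = 0
        self.frames = {}       # physical frame -> page bytes

    def access(self, vaddr):
        """Called over the network with a virtual address only."""
        vpn, offset = divmod(vaddr, PAGE)
        if vpn not in self.page_table:          # allocate on first touch
            self.page_table[vpn] = self.next_frame
            self.frames[self.next_frame] = bytearray(PAGE)
            self.next_frame += 1
        frame = self.page_table[vpn]
        return self.frames[frame][offset]

mem = MemoryComponent()
mem.access(0x5000)
print(mem.page_table)  # {5: 0}: virtual page 5 mapped to physical frame 0
```

Because translation happens entirely at the memory side, the processor's caches can be indexed and tagged by virtual address, which is why the slide says all cache levels become virtual caches.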

  37. Challenge: the network is 2x-4x slower than the memory bus.

  38. Add Extended Cache at Processor
     [Figure: DRAM added at the processor, between the caches and the network to the memory component]

  39. Add Extended Cache at Processor
     • Add a small DRAM/HBM at the processor
     • Use it as an Extended Cache, or ExCache
     • Software and hardware co-managed
     • Inclusive
     • Virtual cache

  40. LegoOS Design
     1. Clean separation of OS and hardware functionalities
     2. Build monitors with hardware constraints
     3. RDMA-based message passing for both kernel and applications
     4. Two-level distributed resource management
     5. Memory failure tolerance through replication

  41. Distributed Resource Management
     • Global Process Manager (GPM), Global Memory Manager (GMM), Global Storage Manager (GSM)
     • Global resource management: 1. coarse-grained allocation, 2. load balancing, 3. failure handling
     [Figure: the three global managers above the per-device monitors (process, GPU, XPU, memory, NVM, HDD, SSD), with network messaging across non-coherent components]
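The two-level split named on slides 31 and 41 can be sketched like this (my simplification; class names and sizes are invented): the global manager makes only the coarse-grained decision of which memory component serves a request, while the chosen monitor manages its own memory locally.

```python
class MemoryMonitor:
    """Per-component monitor: fine-grained, local allocation decisions."""
    def __init__(self, name, free_gb):
        self.name, self.free_gb = name, free_gb

    def allocate(self, gb):
        assert gb <= self.free_gb
        self.free_gb -= gb

class GlobalMemoryManager:
    """Coarse-grained: only picks which component serves a request,
    balancing load across components."""
    def __init__(self, monitors):
        self.monitors = monitors

    def place(self, gb):
        m = max(self.monitors, key=lambda m: m.free_gb)  # most free first
        m.allocate(gb)
        return m.name

gmm = GlobalMemoryManager([MemoryMonitor("M1", 64), MemoryMonitor("M2", 32)])
print(gmm.place(16))  # M1 (had the most free memory)
print(gmm.place(40))  # M1 again (still 48 GB free vs. M2's 32)
```

Keeping the global level coarse-grained means it stays off the critical path of individual allocations, which only involve the local monitor.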

  42. Implementation and Emulation
     • Processor: reserve DRAM as ExCache (4KB page as cache line); hardware only on the hit path, software-managed miss path; indirection layer to store states for 113 Linux syscalls
     • Memory: limited number of cores, kernel space only
     • Storage/Global Resource Monitors: implemented as kernel modules on Linux
     • Network: RDMA RPC stack based on LITE [SOSP’17]
     [Figure: processor, memory, and storage machines (each with CPUs, LLC, DRAM, disk) connected by an RDMA network]

  43. Performance Evaluation
     • Unmodified TensorFlow running CIFAR-10; working set 0.9GB; 4 threads
     • Systems in comparison: baseline Linux with unlimited memory; Linux swapping to SSD and to ramdisk; InfiniSwap [NSDI’17]
     • LegoOS config: 1P, 1M, 1S
     • Only 1.3x to 1.7x slowdown when disaggregating devices with LegoOS, while gaining better resource packing, elasticity, and fault tolerance
     [Figure: slowdown vs. ExCache/memory size (128, 256, 512 MB)]

  44. LegoOS Summary
     • Resource disaggregation calls for a new system
     • LegoOS: a new OS designed and built from scratch for datacenter resource disaggregation
     • Splits the OS into distributed micro-OS services, running at devices
     • Many challenges and many potentials

  45. Disaggregated Datacenter
     Flexible, heterogeneous, elastic, perf/$, resilient, scalable, easy to use
     • Physically Disaggregated Resources
     • Disaggregated Operating System (OSDI’18): new processor and memory architecture
     • Networking for Disaggregated Resources: kernel-level RDMA virtualization (SOSP’17), RDMA network

  46. Network Requirements for Resource Disaggregation
     • Low latency (RDMA)
     • High bandwidth
     • Scalable
     • Reliable
