How .NET Runtime Evolves for the Cloud
Mei-Chin Tsai

[Figure: two deployment stacks. Workloads such as Exchange and Bing: Monolithic Application on Virtual Machine, Host OS, Physical Server. Workloads such as Lambda or Functions: Apps in Containers on Virtual Machine, Host OS, Physical Server.]

Physical resources that impact Runtime
• Number of available CPU cores
  • Number of threads
  • Number of managed heaps
• Size of available memory
  • Heap size
  • Number of heaps
• Other heuristics

.NET GCs
• .NET GCs are generational
• Two different flavors of GCs today
  • Workstation GC
    • One managed heap (one GC thread)
  • Server GC
    • N managed heaps and N GC threads

[Figure: Server GC, one GC heap per core (Cores 1 to 4, Heaps 1 to 4), next to Workstation GC, one heap for all (Cores 1 to 4, a single Heap).]

Use multi-pronged approach for scaling
[Diagram labels: Runtime; Application/Process; Application Runtime Configuration]
• Using less memory is generally better
• Scale down: Docker support
• Scale up: optimize for many-core chip architecture
• Allow application to specify intent

Using less memory is generally better: less GC memory by default
• Reduce the initial commit size of gen 0
• Reduce the initial gen 0 allocation budget to better align with modern cache size and cache hierarchy
• New policy to determine number of GC heaps to create based on memory limit
• Example:
  • Application memory limit is 160MB, default memory segment per heap is 16MB
  • Old behavior: allocating one heap per core on 48 core machine exceeds limit
  • New behavior: allocate 10 heaps, meets limit
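The arithmetic behind the example can be checked in a few lines. This is a minimal sketch, assuming the policy amounts to "one heap per core, capped so that one default segment per heap still fits in the memory limit"; the function name `heap_count` and that exact rule are illustrative assumptions, not the runtime's actual code.

```python
def heap_count(cores: int, memory_limit_mb: int, segment_mb: int = 16) -> int:
    """Hypothetical policy: one heap per core, capped so that one
    segment per heap still fits inside the memory limit."""
    max_heaps_by_memory = max(1, memory_limit_mb // segment_mb)
    return min(cores, max_heaps_by_memory)


# Slide example: 48 cores, 160MB limit, 16MB segment per heap.
old_footprint_mb = 48 * 16        # one heap per core needs 768MB, over the limit
new_heaps = heap_count(48, 160)   # 160 // 16 = 10 heaps
print(old_footprint_mb, new_heaps, new_heaps * 16)  # prints: 768 10 160
```

On a small machine the core count stays the binding constraint: `heap_count(4, 160)` returns 4 under the same assumptions.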

TechEmpower benchmarks: ~50% reduction in committed memory

Scale down: Docker container support
• Memory limit set on container
  • docker run -m 100mb -t xxx
• GC heap is not the only component that uses memory.
• Introducing GCHeapHardLimit config
  • GCHeapHardLimit - specifies a hard limit for the GC heap
  • GCHeapHardLimitPercent - specifies a percentage of the physical memory this process is allowed to use
  • If neither is specified but the process is running inside a container with a memory limit specified, we will take this as the hard limit:
    • max (20mb, 75% of the memory limit on the container)
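The default rule in the last bullet is easy to work through for the `docker run -m 100mb` case. The sketch below only restates the slide's formula; the function name `default_heap_hard_limit` is hypothetical, and the real runtime may round or clamp differently.

```python
MB = 1024 * 1024


def default_heap_hard_limit(container_limit_bytes: int) -> int:
    """Default from the slide when neither GCHeapHardLimit nor
    GCHeapHardLimitPercent is set: max(20mb, 75% of the container limit)."""
    return max(20 * MB, container_limit_bytes * 75 // 100)


# docker run -m 100mb: 75% of 100MB is above the 20MB floor.
print(default_heap_hard_limit(100 * MB) // MB)  # prints: 75
# For a 16MB container, 75% is 12MB, so the 20MB floor applies.
print(default_heap_hard_limit(16 * MB) // MB)   # prints: 20
```

The remaining 25% leaves room for the other components that use memory besides the GC heap.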

Allow application to specify intent: large pages support
• Observation: Bing frontend observed many TLB misses in their workload latency
• Add an application config to allow large page support
• Pay more cost on each new page load request but hope to pay less frequently
• On Windows, the runtime commits all the managed memory upfront.
• Does change application performance characteristics
• Use carefully

Bing frontend (SNR): P95 improved from ~108ms to ~88ms (18.5% improvement). At the 50th percentile (average), the improvement was around 9%.

Scale up: many-core processors
• Trend is to use more cores (many of our customers are on 32 to 48 cores and are looking to upgrade core count)
• E.g. AMD ROME CPU: 64 cores, NUMA
• The heap balancing mechanism needed to be revisited

Server GC: one GC heap per core
Each heap maintains its gen0 budget (i.e., the allocations it allows before triggering the next GC)
• When any heap's budget is exceeded, a GC pass is triggered
• When GC is triggered, the whole world is stopped
[Figure: Cores 1 to 4, each with its own heap (Heaps 1 to 4), showing memory in use per heap.]
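A toy model makes the consequence of per-heap budgets concrete: the same total allocation volume triggers GCs when it lands on one heap but not when it is spread out. This is only a sketch under simplifying assumptions (equal budgets, every heap reset to empty after a GC); the class `ServerGCSketch` is hypothetical and not how the runtime is implemented.

```python
class ServerGCSketch:
    """Toy model: N heaps, each with its own gen0 allocation budget."""

    def __init__(self, heaps: int, gen0_budget: int):
        self.budget = gen0_budget
        self.allocated = [0] * heaps
        self.gc_count = 0

    def allocate(self, heap: int, size: int) -> bool:
        """Record an allocation; return True if it triggered a GC."""
        self.allocated[heap] += size
        if self.allocated[heap] > self.budget:
            # One heap over budget stops the world and collects all heaps.
            self.gc_count += 1
            self.allocated = [0] * len(self.allocated)
            return True
        return False


unbalanced = ServerGCSketch(heaps=4, gen0_budget=100)
for _ in range(8):
    unbalanced.allocate(0, 30)       # every allocation lands on heap 0

balanced = ServerGCSketch(heaps=4, gen0_budget=100)
for i in range(8):
    balanced.allocate(i % 4, 30)     # same volume, spread across heaps

print(unbalanced.gc_count, balanced.gc_count)  # prints: 2 0
```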

Heap balancing goal
• When allocations on threads are balanced, they should stay allocating on the same heap
• When allocations on threads are unbalanced, they should in general spread evenly across heaps
• But there are special considerations, e.g., we should favor the heap for that core

Current heap balancing mechanism explained
• Home and alloc heap
• Local heaps (on current NUMA node) vs remote heaps
  • Look at local heaps first
  • Requires a large delta to balance to a remote heap
  • When allocating to a remote heap, we incur not just remote allocation cost. We also incur remote access cost in the future.
• Problem: we are trying too hard to keep heaps well balanced
  • Not showing up as problems when you had fewer heaps to search
  • The cost of remote access cannot be easily factored in ahead of time
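The "local first, large delta for remote" rule can be sketched as a small selection function. Everything here is an illustrative assumption: the function `pick_alloc_heap`, the two threshold parameters, and the use of bytes allocated as the balance metric are hypothetical, and the actual GC code is considerably more involved.

```python
def pick_alloc_heap(home, allocated, numa_node, local_delta, remote_delta):
    """Choose the heap a thread should allocate on.

    home        index of the thread's home heap
    allocated   allocated[i] = bytes allocated on heap i since the last GC
    numa_node   numa_node[i] = NUMA node that heap i belongs to
    *_delta     how much emptier a candidate must be before we move to it
                (hypothetical thresholds; remote_delta should be larger)
    """
    heaps = range(len(allocated))
    local = [i for i in heaps if i != home and numa_node[i] == numa_node[home]]
    remote = [i for i in heaps if numa_node[i] != numa_node[home]]

    # Look at local heaps first, then remote heaps with a larger threshold.
    for candidates, delta in ((local, local_delta), (remote, remote_delta)):
        if not candidates:
            continue
        emptiest = min(candidates, key=lambda i: allocated[i])
        if allocated[emptiest] + delta < allocated[home]:
            return emptiest
    return home  # favor the home heap when nothing is clearly better


nodes = [0, 0, 1, 1]  # heaps 0 and 1 on node 0, heaps 2 and 3 on node 1
print(pick_alloc_heap(0, [100, 95, 10, 10], nodes, 10, 200))   # prints: 0
print(pick_alloc_heap(0, [100, 50, 10, 10], nodes, 10, 200))   # prints: 1
print(pick_alloc_heap(0, [500, 495, 10, 10], nodes, 10, 200))  # prints: 2
```

In the second call the remote heaps are emptier, yet the local heap wins because it is checked first; only in the third call is the gap large enough to justify the remote allocation and later remote access cost.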

Realizations
• If we do less work and still achieve similar fill ratios, we should do that instead of looking at each heap
• Balancing on earlier allocations is less important than later ones which tend to survive more

Thoughts
• Really need better tooling to help with the investigation
• VTune does show many memory counters but they can be hard to interpret; we also want to correlate with GC activities
• New GC specific tooling shows how threads and their alloc heaps migrate
Show the heap/thread logs of runtime instrumentation
