Multicore Linux and Middleware
Chenyang Lu
CSE 520S


Motivation and Contributions
- Trend towards multi-processor and multi-core platforms affects both OS and middleware
  - Techniques designed for uni-processors need revisiting
- Contributions to real-time systems on multi-core platforms
  - A performance evaluation of relevant Linux features
  - MC-ORB middleware designed for multi-core platforms
  - Evaluation of MC-ORB's multi-core aware RT performance

Background and Related Work
- Linux 2.6 introduced SMP and multi-core support
  - Linux 2.6.23 added the Completely Fair Scheduler (CFS)
  - However, many deployed platforms predate 2.6.23
  - We studied Linux 2.6.17 as a representative compromise
- We assume unmodified COTS Linux as our middleware design point, for highly portable real-time performance
- The differing trade-offs for uni-processor vs. multi-processor platforms motivate new middleware designs

Linux Performance: Clock Differences I
- We first evaluated clock differences between cores
  - How well do platform/Linux maintain synchronization?
  - We used RDTSC instruction to record clock ticks on each core
- We bounced a message back and forth between two cores
  - Used arrival TSCs (x, y, z) to measure round trip delay (RTD)
  - The results show that the cores' frequencies were well matched

Linux Performance: Clock Differences II
- We then estimated the cores' temporal offsets as $\delta_0 = 2y_1 - x_0 - z_0$; $\delta_1 = 2y_0 - x_1 - z_1$
- Insight 1
  - Though frequencies matched well, avg. offset was ~1.3 μs
  - Motivates measuring offsets in our subsequent analyses
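The ping-pong measurement can be worked through in a few lines. This is a minimal sketch, not the authors' measurement code: it reads the subscripts in the offset formula as core indices (x and z recorded on one core, y on the other), which is an assumption, and the function names and the synthetic tick values are made up for illustration.

```python
from statistics import mean


def round_trip_delay(x, z):
    """Ticks between two arrivals on the same core (x first, z after the bounce)."""
    return z - x


def offset_estimate(x, y, z):
    """The slide's estimate, delta = 2y - x - z, in ticks.

    Assumption: x and z are read on one core and y on the other. If the two
    one-way delays are equal, y sits at the midpoint (x + z) / 2 plus the
    remote clock's lead, so delta is twice that lead.
    """
    return 2 * y - x - z


def mean_offset(samples):
    """Average delta over many (x, y, z) exchanges to smooth out jitter."""
    return mean(offset_estimate(x, y, z) for x, y, z in samples)


# Synthetic exchange: one-way delay 1000 ticks, remote clock ahead by 500 ticks.
x, y, z = 10_000, 11_500, 12_000
print(round_trip_delay(x, z))    # 2000
print(offset_estimate(x, y, z))  # 1000
```

Because x and z come from the same clock, the round trip delay is independent of any offset between the cores; only the estimate that mixes in y is sensitive to it.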

Linux Performance: Load Balancing

Tasks  Utilization  Imbalances detected  Overhead per imbalance (ns)  Overhead
                    in 5 min             Minimum  Mean   Maximum      (total μs)
10     0.6          211                  405      983    1899         207
30     0.6          210                  566      1178   2120         247
10     1.0          588                  536      854    1463         509
30     1.0          596                  671      1124   2069         670

- Can thread affinity thwart (bad) Linux rebalancing?
  - We ran sets of 10 vs. 30 tasks (all bound to one core to prevent rebalancing), with total utilizations of 0.6 vs. 1.0
- Insight 2
  - Though overhead is small and amortized, compiling kernels with rebalancing off appears to be a preferable method
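Binding a task to one core, as in the experiment above, can be sketched with the Linux affinity calls that Python exposes in the `os` module. The slides do not say which API the experiments used, so this is only an illustration; `pin_to_core` is a hypothetical helper name.

```python
import os


def pin_to_core(core):
    """Bind the calling thread to a single core and return its new affinity mask."""
    allowed = os.sched_getaffinity(0)        # cores this thread may currently use
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})          # on Linux, 0 targets the calling thread
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    first = min(os.sched_getaffinity(0))     # any allowed core works for the demo
    print(pin_to_core(first))
```

Affinity only restricts where the thread may run. It does not switch off the kernel's periodic balancing checks, which is consistent with the table still reporting detected imbalances for bound tasks.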

Linux Performance: Migration Strategies
- Two key migration strategies
  - Thread migrates itself
  - Separate manager thread migrates it
- State of the migrating thread influences the mechanisms & cost
  - Thread core affinity mask is updated
  - For running thread, may involve kernel run queues & scheduler
  - Self migration always entails migrating a running thread

[Figure: three diagrams, each showing cores 0 to 3]
  Case 1: a running thread modifies its own affinity
  Case 2: a separate manager thread modifies a running thread's affinity
  Case 3: a separate manager thread modifies a sleeping thread's affinity
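Case 3 (a manager changes the affinity of a sleeping thread) can be sketched as follows. This assumes Linux and Python 3.8 or later, where `Thread.native_id` gives the kernel thread ID that `os.sched_setaffinity` accepts; the function name and the short sleep used to let the worker block are illustrative choices, not part of the original study.

```python
import os
import threading
import time


def migrate_sleeping_worker(target_core):
    """Manager (the caller) re-pins a blocked worker and returns the mask it wakes up with."""
    ready = threading.Event()   # set by the worker once it has started
    wake = threading.Event()    # the worker sleeps on this until released
    seen = {}

    def worker():
        ready.set()
        wake.wait()                              # blocked here: the "sleeping" state
        seen["mask"] = os.sched_getaffinity(0)   # mask as seen by the worker itself

    t = threading.Thread(target=worker)
    t.start()
    ready.wait()
    time.sleep(0.01)            # give the worker time to block (a sketch, not a guarantee)
    # The manager targets one specific thread by its kernel thread ID.
    os.sched_setaffinity(t.native_id, {target_core})
    wake.set()
    t.join()
    return seen["mask"]


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    print(migrate_sleeping_worker(min(os.sched_getaffinity(0))))
```

Case 1 corresponds to the worker calling `os.sched_setaffinity(0, ...)` on itself, and Case 2 to the manager making the same call while the worker is busy rather than blocked.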

Linux Performance: Migration Costs
- Insight 3
  - Every strategy risks a non-negligible thread migration cost
  - Motivates binding task threads into core-specific thread pools
  - Motivates an ORB architecture with a separate manager thread (next)

[Figure: measured migration costs]
  self migration (~16 to 45 µs)
  manager migrates sleeping thread (~4 to 10 µs)
  manager migrates running thread (~18 to 36 µs)

Conventional Middleware Architecture
- Traditional single-CPU approach benefits from leader/followers to reduce costly hand-offs
  - E.g., TAO, nORB
- However, multiple cores increase risk of migration

1. Leader invokes TA (and AC) for task
2. Picks new leader
3. New leader may need to move old
4. Old leader runs the task (on the appropriate core)

MC-ORB Middleware Architecture
- In contrast, MC-ORB's threading architecture leverages hand-offs to avoid thread migrations
  - Key trade off: copying/locking costs vs. migration costs

1. Request is queued
2. Manager thread reads requests in priority order
3. Invokes TA w/AC
4. Manager picks thread from pool
5. Thread runs task
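The five steps above can be sketched as a small manager-plus-pools program. This is not MC-ORB's implementation: the first-fit placement, the per-core utilization bound of 1.0 used as the admission test, the convention that a lower number means higher priority, and every name below are assumptions chosen only to make the hand-off structure concrete. TA is read here as task allocation and AC as admission control.

```python
import heapq
import itertools
import queue
import threading

BOUND = 1.0  # hypothetical per-core utilization bound used as the admission test


class Manager:
    def __init__(self, cores):
        self.pending = []                       # heap of (priority, seq, name, util)
        self.seq = itertools.count()            # tie-breaker for equal priorities
        self.load = [0.0] * cores               # utilization already placed per core
        self.pools = [queue.Queue() for _ in range(cores)]  # one hand-off queue per core

    def submit(self, priority, name, util):
        """Step 1: the request is queued (lower number = higher priority)."""
        heapq.heappush(self.pending, (priority, next(self.seq), name, util))

    def allocate(self, util):
        """Step 3: first-fit allocation; returning None means the task is rejected."""
        for core, used in enumerate(self.load):
            if used + util <= BOUND:
                self.load[core] = used + util
                return core
        return None

    def dispatch_all(self):
        """Steps 2 to 4: read in priority order, allocate, hand off to a pool."""
        placed, rejected = [], []
        while self.pending:
            _, _, name, util = heapq.heappop(self.pending)
            core = self.allocate(util)
            if core is None:
                rejected.append(name)
            else:
                self.pools[core].put(name)      # the request moves, the thread does not
                placed.append((name, core))
        return placed, rejected


def worker(core, inbox, log):
    """Step 5: a pool thread that serves one core's queue until it sees None."""
    while True:
        name = inbox.get()
        if name is None:
            return
        log.append((name, core))                # stand-in for running the task


def run(requests, cores=2):
    mgr, log = Manager(cores), []
    threads = [threading.Thread(target=worker, args=(c, mgr.pools[c], log))
               for c in range(cores)]
    for t in threads:
        t.start()
    for request in requests:
        mgr.submit(*request)
    placed, rejected = mgr.dispatch_all()
    for inbox in mgr.pools:
        inbox.put(None)                         # shut the pools down
    for t in threads:
        t.join()
    return placed, rejected, sorted(log)


if __name__ == "__main__":
    print(run([(2, "b", 0.6), (1, "a", 0.6), (3, "c", 0.6)]))
```

In a fuller version each worker would first bind itself to its core, so that the only thing crossing cores is the queued request. That is the copying/locking cost the slide trades against migration cost.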

Real-Time ORB Performance Evaluation
- To gauge performance costs of our middleware architecture we examined four key features
  - Allocate on same vs. other core (as manager thread)
  - Thread available vs. migration needed
  - Reallocation is vs. is not required to allocate task
  - New task is admitted vs. rejected
- We evaluated our middleware architecture both with (MC-ORB) and without (MC-ORB*) rejection
  - MC-ORB* compared to nORB (designed for uniprocessors)
  - Varied utilization granularity & magnitude (10 task sets)
  - We measured how many of the task sets missed a deadline

Overheads for MC-ORB's Extensions (μs)

Scenario  Minimum  Mean  Maximum
1         43       55    109
2         42       58    111
3         50       64    121
4         222      235   289
5         39       50    107

Scenarios used for Overhead Evaluation
1. New task on same core as manager
2. New task on different core (similar cost to 1)
3. (Sleeping) thread moved from other core to run new task
4. (All) running tasks reallocated to make room for new task
5. The new task is rejected (low cost, but it's pure overhead)

Fraction of Workloads w/ Deadline Misses

Total        ORB      Balance Factor
Utilization           0.1    0.2    0.3    0.5
1.4          nORB     0.4    0      0      0
1.4          MC-ORB*  0      0      0      0
1.5          nORB     0.8    0.3    0.1    0.1
1.5          MC-ORB*  0      0.1+   0.1+   0
1.6          nORB     1.0    0.5    0.1    0.1
1.6          MC-ORB*  0.3+   0.4+   0.4+   0.3+

- With rejection, >94% of tasks were admitted by MC-ORB and all admitted tasks met all deadlines
- Without rejection (where + shows need for AC) MC-ORB*
  - Worked better with less balanced workloads
  - Outperformed nORB in 6 cases (green)
  - Performed the same as nORB in 4 cases (grey)
  - Underperformed nORB in 2 cases (red)
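The case counts in the bullets can be checked directly against the table. The sketch below transcribes the twelve table cells (dropping the + markers) and tallies where MC-ORB* has a lower, equal, or higher miss fraction than nORB; the variable and function names are illustrative.

```python
# Fractions of task sets with a deadline miss, transcribed from the table above.
# Keys are total utilizations; each list follows balance factors 0.1, 0.2, 0.3, 0.5.
NORB = {
    1.4: [0.4, 0, 0, 0],
    1.5: [0.8, 0.3, 0.1, 0.1],
    1.6: [1.0, 0.5, 0.1, 0.1],
}
MC_ORB_STAR = {
    1.4: [0, 0, 0, 0],
    1.5: [0, 0.1, 0.1, 0],
    1.6: [0.3, 0.4, 0.4, 0.3],
}


def tally(baseline, candidate):
    """Count cells where the candidate misses less, the same, or more than the baseline."""
    counts = {"better": 0, "same": 0, "worse": 0}
    for util, base_row in baseline.items():
        for base, cand in zip(base_row, candidate[util]):
            if cand < base:
                counts["better"] += 1
            elif cand > base:
                counts["worse"] += 1
            else:
                counts["same"] += 1
    return counts


print(tally(NORB, MC_ORB_STAR))  # {'better': 6, 'same': 4, 'worse': 2}
```

Both cells where MC-ORB* does worse are at total utilization 1.6 with balance factors 0.3 and 0.5, which lines up with the remark that it worked better with less balanced workloads.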

Concluding Remarks
- COTS OS evaluations
  - Measurement on specific target platforms is crucial
  - Behaviors of hardware and OS mechanisms are important
- Middleware architectures
  - OS evaluations establish design trade-off parameters
  - Prior design decisions may be reversed on new platforms
- Performance evaluations bear out our new design
  - Even w/out admission control, MC-ORB architecture helps
  - W/ AC admitted high utilization, and met all deadlines
