uniprof: Transparent Unikernel Performance Profiling & Debugging

Florian Schmidt, Research Scientist, NEC Europe Ltd.

Unikernels?
▌ Faster, smaller, better!
▌ But ever heard this?
  • "Unikernels are hard to debug."
  • "Kernel debugging is horrible!"
▌ Then you might say
  • "But that’s not really true! Unikernels are a single linked binary. They have a shared address space. You can just use gdb!"
▌ And while that is true…
▌ … we are admittedly lacking tools
▌ Such as effective profilers
(clip arts: clipproject.info)

Enter uniprof
▌ Goals:
  • Performance profiler
  • No changes to profiled code necessary
  • Minimal overhead
  • Useful in production environments
▌ So, a stack profiler
  • Collect stack traces at regular intervals
  • Many of them
  • Analyze which code paths show up often
    • Either because they take a long time
    • Or because they are hit often
  • Point towards potential bottlenecks

[Figure: several example stack traces. The first reads call_main+0x278, main+0x1c, schedule+0x3a, monotonic_clock+0x1a. The others also start with call_main+0x278, main+0x1c and contain blkfront_aio_poll+0x32, netfront_rx+0xa, netfront_get_responses+0x1c, netfrontif_rx_handler+0x20, netfrontif_transmit+0x1a0 and netfront_xmit_pbuf+0xa4.]
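The "analyze which code paths show up often" step can be sketched in a few lines of Python. This is only an illustration, not uniprof's own analysis code: the sample list is made up (it reuses function names from the slide), and the helper names count_stacks and count_frames are hypothetical.

```python
from collections import Counter

def count_stacks(samples):
    """Count how often each complete stack trace was sampled."""
    return Counter(tuple(s) for s in samples)

def count_frames(samples):
    """Count in how many samples each function appears (once per sample)."""
    counts = Counter()
    for s in samples:
        # Strip the "+0x..." offset so all hits in one function are merged.
        counts.update({frame.split("+")[0] for frame in s})
    return counts

# Made-up samples, outermost frame first.
clock = ("call_main+0x278", "main+0x1c", "schedule+0x3a", "monotonic_clock+0x1a")
net = ("call_main+0x278", "main+0x1c", "netfront_rx+0xa")
samples = [clock, net, net, clock, net]

for stack, hits in count_stacks(samples).most_common():
    print(hits, " -> ".join(stack))
```

A function that shows up in many samples either runs for a long time or is called often; sampling alone cannot tell the two apart, which matches the two sub-bullets on the slide.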

xenctx
▌ Turns out, a stack profiler for Xen already exists
  • Well, kinda
▌ xenctx is bundled with Xen
  • Introspection tool
  • Option to print call stack
▌ So if we run this over and over, we have a stack profiler
  • Well, kinda

    $ xenctx -f -s <symbol table file> <DOMID>
    [...]
    Call Trace:
                        [<0000000000004868>] three+0x58 <--
    00000000000ffea0:   [<00000000000044f2>] two+0x52
    00000000000ffef0:   [<00000000000046a6>] one+0x12
    00000000000fff40:   [<000000000002ff66>]
    00000000000fff80:   [<0000000000012018>] call_main+0x278
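To turn repeated xenctx runs into profiler input, each "Call Trace" block has to be parsed into frames. The sketch below assumes the output looks like the excerpt on the slide (the exact whitespace is a guess), and parse_call_trace is a hypothetical helper, not part of Xen.

```python
import re

# One frame: "[<hex address>]" optionally followed by "symbol+0xoffset".
FRAME_RE = re.compile(r"\[<([0-9a-fA-F]+)>\](?:\s+([^\s<]\S*))?")

def parse_call_trace(text):
    """Return a list of (address, symbol or None), innermost frame first."""
    frames = []
    in_trace = False
    for line in text.splitlines():
        if line.strip() == "Call Trace:":
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1), 16), match.group(2)))
    return frames

# Text copied from the slide; the column spacing is an assumption.
SAMPLE = """\
Call Trace:
                    [<0000000000004868>] three+0x58 <--
00000000000ffea0:   [<00000000000044f2>] two+0x52
00000000000ffef0:   [<00000000000046a6>] one+0x12
00000000000fff40:   [<000000000002ff66>]
00000000000fff80:   [<0000000000012018>] call_main+0x278
"""

frames = parse_call_trace(SAMPLE)
```

Frames without a resolved symbol, such as the fourth one in the excerpt, come back with None in place of the name.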

xenctx
▌ Downside: xenctx is slow
  • Very slow: 3ms+ per trace
  • Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s)
  • Can’t really blame it, not designed as a fast stack profiler
▌ Performance isn’t just a nice-to-have
  • We interrupt the guest all the time
  • Can’t walk stack while guest is running: race conditions
  • High overhead can influence results!
  • Low overhead is imperative for use on production unikernels
▌ First question: extend xenctx or write something from scratch?
  • Spoiler: look at the talk title
  • More insight when I come to the evaluation
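The overhead figure on the slide is simple arithmetic, worked through below. The sketch assumes the guest is interrupted for the full duration of each trace, which is a simplification; the function names are made up for illustration.

```python
def tracing_ms_per_second(samples_per_second, trace_cost_us):
    """Milliseconds of tracing time spent per second of wall-clock time."""
    return samples_per_second * trace_cost_us / 1000

def max_sample_rate(trace_cost_us):
    """Upper bound on samples per second if tracing ran back to back."""
    return 1_000_000 // trace_cost_us

# Slide example: 100 samples/s at 3 ms (3000 us) per trace.
cost = tracing_ms_per_second(100, 3000)
limit = max_sample_rate(3000)
```

With these numbers, cost is 300 ms per second, so roughly 30% of every second goes to tracing, and no more than 333 samples per second are possible even if the guest never gets to run.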

What do we need?
▌ Registers (for FP, IP)
  • This is pretty easy: getvcpucontext() hypercall
▌ Access to stack memory (to read return addresses and next FPs)
  • This is the complicated step
  • We need to do address resolution
    • Memory introspection requires mapping memory over
    • We’re looking at (uni)kernel code
    • But there’s still a virtual → (guest) physical resolution
    • Even if the guest is PVH, we can’t benefit from it, because we’re looking in from outside
    • So we need to manually walk page tables
▌ Symbol table (to resolve function names)
  • Thankfully, this is easy again: extract symbols from ELF with nm
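The manual page-table walk can be illustrated with a generic x86-64 4-level translation. This is a sketch under stated assumptions, not uniprof's code: it assumes 4 KiB pages only (no huge pages), works on a toy in-memory table instead of mapped guest memory, and read_entry is a hypothetical callback standing in for whatever reads a guest page-table entry.

```python
PAGE_SHIFT = 12                    # 4 KiB pages
INDEX_BITS = 9                     # 512 entries per table
LEVELS = 4                         # x86-64 4-level paging
PRESENT = 0x1                      # bit 0 of an entry
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: address of next level

def split_vaddr(vaddr):
    """Split a virtual address into 4 table indices (top level first) and a page offset."""
    indices = [(vaddr >> (PAGE_SHIFT + INDEX_BITS * lvl)) & 0x1FF
               for lvl in reversed(range(LEVELS))]
    return indices, vaddr & 0xFFF

def translate(read_entry, root, vaddr):
    """Walk the tables from root and return the translated address."""
    indices, offset = split_vaddr(vaddr)
    table = root
    for idx in indices:
        entry = read_entry(table, idx)
        if not entry & PRESENT:
            raise LookupError(f"address {vaddr:#x} is not mapped")
        table = entry & ADDR_MASK
    return table | offset

# Toy page tables: table address -> {index: entry}. Entirely made up.
tables = {
    0x1000: {0: 0x2000 | PRESENT},
    0x2000: {0: 0x3000 | PRESENT},
    0x3000: {0: 0x4000 | PRESENT},
    0x4000: {0xFF: 0x7000 | PRESENT},
}

def read_entry(table, idx):
    return tables[table].get(idx, 0)

paddr = translate(read_entry, 0x1000, 0xFFEA0)
```

In a real introspection tool every read_entry call means reading guest memory from the outside, which is why the slide calls this the complicated step.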

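[Figure: walking the stack with frame pointers. Registers: IP, FP. The stack consists of frames, each holding local variables, other registers, a return address and a saved frame pointer, with NULL terminating the chain of saved frame pointers. Example code: function two() { […] three(); } and function three() { […] }. Stack trace built so far: three+0xca (from IP), two+0xc1 (from the return address at FP + 1 word).]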
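The walk shown in the figure can be sketched as follows. The code assumes a 64-bit guest compiled with frame pointers, the conventional layout (saved FP at [FP], return address one word above it) and a NULL-terminated chain; the memory contents and symbol addresses are made up, and read_word, load_symbols, resolve and walk_stack are hypothetical names, not uniprof's API.

```python
import bisect

WORD = 8  # assumption: 64-bit guest

def load_symbols(nm_text):
    """Parse 'address type name' lines (nm's default layout), keeping code symbols."""
    syms = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("t", "T"):
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Map an address to 'name+0xoffset' using the closest symbol at or below it."""
    starts = [start for start, _ in syms]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)
    start, name = syms[i]
    return f"{name}+{addr - start:#x}"

def walk_stack(read_word, ip, fp, max_depth=64):
    """Collect IP, then follow saved frame pointers until NULL."""
    addrs = [ip]
    while fp != 0 and len(addrs) < max_depth:
        addrs.append(read_word(fp + WORD))  # return address: FP + 1 word
        fp = read_word(fp)                  # saved frame pointer of the caller
    return addrs

# Made-up symbol table and stack memory.
NM_TEXT = """\
0000000000004694 T one
00000000000044a0 T two
0000000000004810 T three
"""
memory = {
    0xFFE90: 0xFFEE0, 0xFFE98: 0x4561,  # frame of three(): saved FP, return into two()
    0xFFEE0: 0x0,     0xFFEE8: 0x46A6,  # frame of two(): NULL FP, return into one()
}

syms = load_symbols(NM_TEXT)
trace = [resolve(syms, a) for a in walk_stack(memory.__getitem__, 0x48DA, 0xFFE90)]
```

The max_depth bound is a safety net: a corrupted or non-frame-pointer stack could otherwise send the walk into a loop.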
