uniprof: Transparent Unikernel Performance Profiling & Debugging

Florian Schmidt, Research Scientist, NEC Europe Ltd.


  1. uniprof: Transparent Unikernel Performance Profiling & Debugging
     Florian Schmidt, Research Scientist, NEC Europe Ltd.

  6. Unikernels?
     ▌ Faster, smaller, better!
     ▌ But ever heard this? "Unikernels are hard to debug." "Kernel debugging is horrible!"
     ▌ Then you might say: "But that’s not really true! Unikernels are a single linked binary. They have a shared address space. You can just use gdb!"
     ▌ And while that is true…
     ▌ … we are admittedly lacking tools
     ▌ Such as effective profilers
     (Clip art: clipproject.info)

  11. Enter uniprof
      ▌ Goals:
        - Performance profiler
        - No changes to profiled code necessary
        - Minimal overhead
        - Useful in production environments
      ▌ So, a stack profiler
        - Collect stack traces at regular intervals
        - Many of them
        - Analyze which code paths show up often
          • Either because they take a long time
          • Or because they are hit often
        - Point towards potential bottlenecks
      [Figure: several overlapping example stack traces. First trace, in slide order: call_main+0x278, main+0x1c, schedule+0x3a, monotonic_clock+0x1a. Frames in the other traces: call_main+0x278, main+0x1c, blkfront_aio_poll+0x32, netfront_rx+0xa, netfront_get_responses+0x1c, netfrontif_rx_handler+0x20, netfrontif_transmit+0x1a0, netfront_xmit_pbuf+0xa4.]
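The "analyze which code paths show up often" step can be sketched as a simple aggregation over collected samples. This is a minimal illustration, not uniprof's actual code: the function names are taken from the slide, but the grouping of frames into traces and the number of times each trace occurs are made up for the example.

```python
from collections import Counter

# Hypothetical samples: one list of frames per sample, caller before callee.
# Frame names come from the slide; the traces and their frequencies are invented.
samples = [
    ["call_main", "main", "schedule", "monotonic_clock"],
    ["call_main", "main", "netfront_rx"],
    ["call_main", "main", "netfront_rx"],
    ["call_main", "main", "schedule", "monotonic_clock"],
    ["call_main", "main", "netfront_rx"],
]


def fold(samples):
    """Count identical stacks; each key is a ';'-joined call path."""
    return Counter(";".join(frames) for frames in samples)


def inclusive_counts(samples):
    """Count in how many samples each function appears anywhere on the stack."""
    counts = Counter()
    for frames in samples:
        counts.update(set(frames))
    return counts


folded = fold(samples)
hot = inclusive_counts(samples)

# Most frequent call paths first: candidates for a closer look.
for path, n in folded.most_common():
    print(f"{n:3d}  {path}")
```

A path that shows up in many samples is either slow or called often; the counts alone do not say which, so they only point towards potential bottlenecks.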

  14. xenctx
      ▌ Turns out, a stack profiler for Xen already exists
        - Well, kinda
      ▌ xenctx is bundled with Xen
        - Introspection tool
        - Option to print call stack
      ▌ So if we run this over and over, we have a stack profiler
        - Well, kinda

      $ xenctx -f -s <symbol table file> <DOMID>
      [...]
      Call Trace:
                        [<0000000000004868>] three+0x58 <--
      00000000000ffea0: [<00000000000044f2>] two+0x52
      00000000000ffef0: [<00000000000046a6>] one+0x12
      00000000000fff40: [<000000000002ff66>]
      00000000000fff80: [<0000000000012018>] call_main+0x278
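To turn repeated xenctx runs into profiler samples, each run's call trace has to be parsed. The sketch below only handles the output format shown on the slide; actual xenctx output may differ between Xen versions, and invoking the tool itself is left out. The names `FRAME_RE` and `parse_call_trace` are hypothetical helpers, not part of xenctx.

```python
import re

# Call trace copied from the slide (output before it was elided there as "[...]").
XENCTX_OUTPUT = """\
Call Trace:
                  [<0000000000004868>] three+0x58 <--
00000000000ffea0: [<00000000000044f2>] two+0x52
00000000000ffef0: [<00000000000046a6>] one+0x12
00000000000fff40: [<000000000002ff66>]
00000000000fff80: [<0000000000012018>] call_main+0x278
"""

# "[<address>]" optionally followed by "symbol+0xoffset"; the "<--" marker is ignored.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\](?:\s+([^\s<]+))?")


def parse_call_trace(text):
    """Return (address, symbol-or-None) pairs in the order they were printed."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1), 16), match.group(2)))
    return frames


trace = parse_call_trace(XENCTX_OUTPUT)
for address, symbol in trace:
    print(f"{address:#018x}  {symbol or '(no symbol)'}")
```

Frames without a resolved symbol, like the fourth one in the slide's output, are kept with `None` so that the trace depth stays intact.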

  17. xenctx
      ▌ Downside: xenctx is slow
        - Very slow: 3ms+ per trace
        - Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s)
        - Can’t really blame it, not designed as a fast stack profiler
      ▌ Performance isn’t just a nice-to-have
        - We interrupt the guest all the time
        - Can’t walk stack while guest is running: race conditions
        - High overhead can influence results!
        - Low overhead is imperative for use on production unikernels
      ▌ First question: extend xenctx or write something from scratch?
        - Spoiler: look at the talk title
        - More insight when I come to the evaluation
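The "100 samples/s = 300ms/s" figure is plain multiplication, spelled out below. The sketch assumes the guest is paused for the full duration of every trace, which matches the slide's reasoning but is a simplification; `profiling_overhead` is a name invented for this example.

```python
def profiling_overhead(cost_per_trace_ms, samples_per_second):
    """Fraction of each wall-clock second the guest spends paused for tracing."""
    paused_ms_per_second = cost_per_trace_ms * samples_per_second
    return paused_ms_per_second / 1000.0


# Numbers from the slide: 3 ms per trace at 100 samples per second.
xenctx_overhead = profiling_overhead(3.0, 100)
print(f"{xenctx_overhead:.0%} of guest time lost to profiling")
```

With these inputs the guest loses close to a third of its run time, which is why the per-trace cost matters so much for production use.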

  23. What do we need?
      ▌ Registers (for FP, IP)
        - This is pretty easy: getvcpucontext() hypercall
      ▌ Access to stack memory (to read return addresses and next FPs)
        - This is the complicated step
        - We need to do address resolution
          • Memory introspection requires mapping memory over
          • We’re looking at (uni)kernel code
          • But there’s still a virtual → (guest) physical resolution
          • Even if the guest is PVH, we can’t benefit from it, because we’re looking in from outside
          • So we need to manually walk page tables
      ▌ Symbol table (to resolve function names)
        - Thankfully, this is easy again: extract symbols from ELF with nm
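The manual page table walk can be illustrated with a toy model. The sketch assumes x86-64 4-level paging with 4 KiB pages and no large pages, and it replaces the mapping of guest memory with a dictionary lookup; the table addresses and entries are invented, and `read_entry` is a stand-in for reading an 8-byte entry from mapped guest memory. It is not how uniprof is necessarily implemented.

```python
PAGE_SHIFT = 12                 # 4 KiB pages
INDEX_BITS = 9                  # 512 entries per table
LEVELS = 4                      # 4-level paging (assumption)
PRESENT = 0x1                   # bit 0: entry is valid
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: address of the next table or page


def split_virtual_address(vaddr):
    """Return ([L4, L3, L2, L1 indexes], page offset) for a 48-bit address."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indexes = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indexes.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indexes, offset


def walk(read_entry, root, vaddr):
    """Translate vaddr by following one table entry per level, starting at root."""
    indexes, offset = split_virtual_address(vaddr)
    table = root
    for index in indexes:
        entry = read_entry(table, index)
        if not entry & PRESENT:
            raise LookupError(f"address {vaddr:#x} is not mapped")
        table = entry & ADDR_MASK
    return table | offset


# Toy page tables: {(table address, index): entry}. All values are made up.
tables = {
    (0x1000, 0): 0x2000 | PRESENT,
    (0x2000, 0): 0x3000 | PRESENT,
    (0x3000, 0): 0x4000 | PRESENT,
    (0x4000, 0xFF): 0x7F000 | PRESENT,
}


def read_entry(table, index):
    """Stand-in for reading a page table entry out of mapped guest memory."""
    return tables.get((table, index), 0)


paddr = walk(read_entry, 0x1000, 0xFFEA0)
print(f"{0xFFEA0:#x} -> {paddr:#x}")
```

In a real tool every `read_entry` call means mapping a guest page into the profiler's address space, which is what makes this step the expensive one.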

  29. [Diagram: walking the stack via frame pointers]
      Registers: IP, FP, …
      Stack: a chain of frames, each holding local variables, other registers, a return address and a frame pointer; the frame pointer chain ends at NULL.
      Code on the slide: function two() { […] three(); } and function three() { […] }
      Stack trace:
        three+0xca   (from IP)
        two+0xc1     (from FP + 1 word)
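The walk in the diagram can be sketched in a few lines: the current function comes from IP, each caller from the return address one word above the saved frame pointer, and the chain ends at a NULL frame pointer. The sketch assumes x86-64, code built with frame pointers, and nm-style symbol lines ("address type name"); every address and memory value below is invented so that the result matches the slide's `three+0xca` and `two+0xc1`.

```python
import bisect

WORD = 8  # bytes per stack slot on x86-64 (assumption)

# nm-style symbol lines; the addresses are invented for the example.
NM_OUTPUT = """\
0000000000003000 T two
0000000000004000 T three
0000000000002000 T one
"""


def load_symbols(nm_text):
    """Parse nm output into parallel, address-sorted lists (addresses, names)."""
    pairs = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            pairs.append((int(parts[0], 16), parts[2]))
    pairs.sort()
    return [addr for addr, _ in pairs], [name for _, name in pairs]


def symbolize(addresses, names, addr):
    """Map addr to 'name+0xoffset' using the closest symbol at or below it."""
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return f"{addr:#x}"
    return f"{names[i]}+{addr - addresses[i]:#x}"


def walk_stack(read_word, ip, fp, max_depth=64):
    """Yield code addresses: IP first, then each saved return address."""
    yield ip
    depth = 0
    while fp != 0 and depth < max_depth:
        yield read_word(fp + WORD)  # return address: one word above the saved FP
        fp = read_word(fp)          # caller's FP; NULL ends the chain
        depth += 1


# Toy guest stack memory: {address: 8-byte value}. All values are made up.
memory = {
    0xFFEA0: 0xFFEF0,  # saved FP of two()
    0xFFEA8: 0x30C1,   # return address into two()
    0xFFEF0: 0x0,      # NULL: outermost frame
    0xFFEF8: 0x2012,   # return address into one()
}

addresses, names = load_symbols(NM_OUTPUT)
stack_trace = [
    symbolize(addresses, names, addr)
    for addr in walk_stack(memory.__getitem__, ip=0x40CA, fp=0xFFEA0)
]
print(stack_trace)
```

The `max_depth` guard is a precaution against corrupted or cyclic frame pointer chains; in a real profiler `read_word` would go through the page table walk for every access.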
