  1. Measuring PROOF Lite performance in (non)virtualized environment Ioannis Charalampidis, Aristotle University of Thessaloniki Summer Student 2010

  2. Overview • Introduction • Benchmarks: Overall execution time • Benchmarks: In-depth analysis • Conclusion

  3. What am I looking for? • There is a known overhead caused by the virtualization process ▫ How big is it? ▫ Where is it located? ▫ How can we minimize it? ▫ Which hypervisor has the best performance? • I am using CernVM as the guest

  4. What is CernVM? • It’s a baseline Virtual Software Appliance for use by the LHC experiments • It’s available for many hypervisors ▫ Hyper-V ▫ KVM / QEMU ▫ VMware ▫ XEN ▫ VirtualBox

  5. How am I going to find the answers? • Using a standard data analysis application (ROOT + PROOF Lite) as the benchmark • Testing it on different hypervisors • And on a varying number of workers/CPUs • Comparing the performance (physical vs. virtualized)
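
To make the measurement concrete, the following is a minimal PyROOT sketch of this kind of run, assuming Python 2 (the version listed in the configuration slides): open a PROOF-Lite session with a given number of workers and time the processing of a dataset. TProof.Open, SetProof and Process are standard ROOT calls; the tree name, selector and input files are placeholders, not the actual benchmark analysis used here.

# Hedged sketch of a PROOF-Lite throughput measurement (placeholder dataset and selector).
import time
import ROOT

def run_benchmark(n_workers, files):
    # Open a local PROOF-Lite session with the requested number of workers
    proof = ROOT.TProof.Open("lite://", "workers=%d" % n_workers)
    chain = ROOT.TChain("EventTree")          # placeholder tree name
    for f in files:
        chain.Add(f)
    chain.SetProof()                          # route Process() through PROOF-Lite
    start = time.time()
    chain.Process("EventSelector.C+")         # placeholder analysis selector
    elapsed = time.time() - start
    proof.Close()
    return chain.GetEntries() / elapsed       # events per second

if __name__ == "__main__":
    files = ["data/events_%02d.root" % i for i in range(16)]   # placeholder input files
    for n in [1] + range(2, 17, 2):           # the same worker sweep used in the slides
        print "%2d workers: %.0f events/sec" % (n, run_benchmark(n, files))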

  6. Problem • The benchmark application requires too much time to complete (2 min to ~15 min) ▫ At least 3 runs are required for reliable results ▫ The in-depth analysis adds about 40% overhead ▫ It is not efficient to perform a detailed analysis for every CPU / hypervisor configuration → Create the overall execution time benchmarks first → Then find the best configuration to run the traces on

  7. Benchmarks performed • Overall time ▫ Using the time utility and automated batch scripts • In-depth analysis ▫ Tracing system calls using STrace and SystemTap ▫ Analyzing the trace files using applications I wrote: BASST (Batch Analyzer based on STrace) and KARBON (general-purpose application profiler based on trace files)
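
As an illustration of how such traces can be collected (not the exact batch scripts used here), the overall time can be taken with the time utility and a full system call trace with strace; the strace flags below produce per-call timestamps and durations in the format of the sample output shown on slide 26.

# Hedged sketch: run the benchmark once under /usr/bin/time and once under strace.
# The benchmark command line itself is a placeholder.
import subprocess

BENCHMARK = ["root", "-b", "-q", "run_benchmark.C"]   # placeholder job

# Overall execution time only (low overhead)
subprocess.call(["/usr/bin/time", "-v", "-o", "overall.time"] + BENCHMARK)

# Full system call trace: follow forks (-f), absolute timestamps in seconds (-ttt),
# time spent inside each call (-T), written to trace.log (-o)
subprocess.call(["strace", "-f", "-ttt", "-T", "-o", "trace.log"] + BENCHMARK)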

  8. Process description and results

  9. Benchmark Configuration • Base machine ▫ Scientific Linux CERN 5 • Guests ▫ CernVM 2.1 • Software packages from SLC repositories ▫ Linux kernel 2.6.18-194.8.1.el5 ▫ XEN 3.1.2 + 2.6.18-194.8.1.el5 ▫ KVM 83-194.8.1.el5 ▫ Python 2.5.4p2 (from AFS) ▫ ROOT 5.26.00b (from AFS) • Base machine hardware ▫ 24 x Intel Xeon X7460 2.66 GHz cores with VT-x support (64-bit) ▫ No VT-d or Extended Page Tables (EPT) hardware support ▫ 32 GB RAM

  10. Benchmark Configuration • Virtual machine configuration ▫ 1 CPU, then 2 to 16 CPUs in steps of 2 ▫ <CPU#> + 1 GB RAM for the physical disk and network tests ▫ <CPU#> + 17 GB RAM for the RAM disk tests ▫ Disk image for the OS ▫ Physical disk for the data + software • Important background services running ▫ NSCD (name service caching daemon)

  11. Benchmark Configuration • Caches were cleared before every test ▫ Page cache, dentries and inodes ▫ Using the /proc/sys/vm/drop_caches flag (see the sketch below) • No swap was used ▫ Verified by periodically monitoring the free memory
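
For reference, a minimal sketch of how the page cache, dentries and inodes can be dropped through that /proc flag (must run as root; the actual commands used by the batch scripts are not shown in the slides):

# Hedged sketch: flush dirty pages, then drop the page cache, dentries and inodes.
import subprocess

def drop_caches():
    subprocess.call(["sync"])                 # write dirty pages to disk first
    f = open("/proc/sys/vm/drop_caches", "w")
    f.write("3\n")                            # 3 = drop page cache + dentries + inodes
    f.close()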

  12. Automated batch scripts • The VM batch script runs on the host machine • It repeats the following procedure (sketched below): ▫ Create a new virtual machine on the hypervisor ▫ Wait for the machine to finish booting ▫ Connect to the controlling script inside the VM ▫ Drop caches both on the host and the guest ▫ Start the benchmark job ▫ Receive and archive the results
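
A schematic reconstruction of that loop is sketched below; every helper (start_vm, wait_for_boot, connect_to_client, ...) is a hypothetical placeholder standing in for hypervisor- and guest-specific commands that the slides do not show.

# Hedged skeleton of the VM batch procedure described above; the placeholders
# below are not the actual scripts.
def start_vm(hypervisor, cpus, ram_gb):
    raise NotImplementedError("hypervisor-specific VM creation (placeholder)")

def wait_for_boot(vm):
    raise NotImplementedError("e.g. poll SSH until the guest answers (placeholder)")

def connect_to_client(vm):
    raise NotImplementedError("connect to the controlling script in the guest (placeholder)")

def drop_caches_on_host():
    raise NotImplementedError("see the drop_caches() sketch above (placeholder)")

def run_batch(hypervisor, cpu_counts, repetitions=3):
    """Repeat the create / boot / drop-caches / benchmark / collect cycle."""
    results = []
    for cpus in cpu_counts:
        for _ in range(repetitions):                # at least 3 runs for reliable numbers
            vm = start_vm(hypervisor, cpus, ram_gb=cpus + 1)
            wait_for_boot(vm)
            client = connect_to_client(vm)
            drop_caches_on_host()
            client.drop_caches()                    # the same flush inside the guest
            client.start_job(workers=cpus)          # launch the PROOF-Lite benchmark
            results.append(client.fetch_results())
            vm.destroy()                            # tear down before the next run
    return results

if __name__ == "__main__":
    run_batch("xen", [1] + range(2, 17, 2))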

  13. Problem • There was a bug in PROOF Lite that looked up a non-existent hostname during the startup of each worker ▫ Example: 0.2-plitehp24.cern.ch-1281241251-1271 • Discovered by detailed system call tracing ▫ The hostname couldn’t be cached → the application had to wait for the DNS timeout → the startup time was delayed randomly → call tracing applications made this delay even bigger, virtually hanging the application

  14. Problem • The problem was resolved as follows: ▫ A minimal DNS proxy was developed that fakes the existence of the buggy hostname (a sketch of such a proxy follows below) ▫ It was later fixed in the PROOF source (Diagram: the application sends its DNS queries to the fake DNS proxy; real names such as cernvm.cern.ch are forwarded to the DNS server and resolved normally, e.g. to 137.138.234.20, while the buggy x.x-xxxxxx-xxx-xxx names are answered locally with 127.0.0.1.)
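
A rough idea of how such a proxy can work is sketched below. This is a simplified reconstruction, not the proxy actually developed for this work: it listens on the loopback interface, answers any name matching an assumed buggy-session pattern with 127.0.0.1, and relays everything else to a real resolver (the upstream resolver address and the hostname pattern are assumptions; Python 2 string handling is assumed).

# Hedged sketch of a minimal UDP DNS proxy (needs root to bind port 53).
import re
import socket

UPSTREAM = ("137.138.17.5", 53)                  # assumed upstream resolver (placeholder)
FAKE = re.compile(r"^\d+\.\d+-plite")            # assumed pattern of the buggy session names

def qname(packet):
    # Extract the queried hostname from a raw DNS query packet
    labels, pos = [], 12                         # the question starts right after the header
    while packet[pos] != "\x00":
        length = ord(packet[pos])
        labels.append(packet[pos + 1:pos + 1 + length])
        pos += length + 1
    return ".".join(labels)

def fake_reply(query):
    # Build an A-record answer pointing the queried name to 127.0.0.1
    header = query[:2] + "\x81\x80" + "\x00\x01\x00\x01\x00\x00\x00\x00"
    question = query[12:]                        # echo the original question section
    answer = ("\xc0\x0c"                         # compression pointer to the queried name
              "\x00\x01\x00\x01"                 # TYPE A, CLASS IN
              "\x00\x00\x00\x3c"                 # TTL: 60 seconds
              "\x00\x04" + socket.inet_aton("127.0.0.1"))
    return header + question + answer

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 53))
while True:
    query, client = sock.recvfrom(512)
    if FAKE.match(qname(query)):
        sock.sendto(fake_reply(query), client)   # fake answer for the buggy names
    else:                                        # relay everything else to the real resolver
        fwd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        fwd.sendto(query, UPSTREAM)
        sock.sendto(fwd.recvfrom(512)[0], client)
        fwd.close()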

  15. Problem • Example: events/sec for different CPU settings, as reported by the buggy benchmark (charts, before vs. after the fix: events/sec, up to 18,000, vs. number of CPUs; series: RAM Disk and Phys. Disk, with and without the fixed DNS, on XEN and on the host)

  16. Results – Physical Disk (chart: events/sec vs. workers = CPUs, 1 to 16; series: Baremetal, XEN, KVM)

  17. Results – Network (XROOTD) (chart: events/sec vs. workers = CPUs, 1 to 16; series: Baremetal, XEN, KVM)

  18. Results – RAM Disk (chart: events/sec vs. workers = CPUs, 1 to 16; series: Baremetal, XEN, KVM)

  19. Results – Relative values (charts: VM/Baremetal ratio, 0 to 1.2, vs. workers = CPUs for Physical Disk, RAM Disk and Network (XROOTD); series: Bare metal, XEN, KVM)

  20. Results – Absolute values (charts: events/sec vs. workers = CPUs for Physical Disk, RAM Disk and Network (XROOTD); series: Bare metal, XEN, KVM)

  21. Results – Comparison chart (chart: events/sec vs. workers = CPUs for all nine combinations of Physical Disk / Xrootd / RAM Disk and Bare metal / XEN / KVM)

  22. Procedure, problems and results

  23. In-depth analysis • In order to get more details, the program execution was monitored and all the system calls were traced and logged • Afterwards, the analyzer extracted useful information from the trace files, such as ▫ The time spent in each system call ▫ The filesystem / network activity • The tracing itself adds some overhead, but it is kept out of the overall performance measurements, which were taken separately

  24. System call tracing utilities • STrace ▫ Traces application-wide system calls from user space ▫ Attaches to the traced process using the ptrace() system call and monitors its activity • Advantages ▫ Traces the application’s system calls in real time ▫ Has very verbose output • Disadvantages ▫ Introduces a large overhead

  25. System call tracing utilities • SystemTap ▫ Traces system-wide kernel activity, asynchronously ▫ Runs as a kernel module • Advantages ▫ Can trace virtually everything on a running kernel ▫ Supports scriptable kernel probes • Disadvantages ▫ It is not simple to extract detailed information ▫ System calls can be lost under high CPU activity

  26. System call tracing utilities • Sample STrace output:
5266 1282662179.860933 arch_prctl(ARCH_SET_FS, 0x2b5f2bcc27d0) = 0 <0.000005>
5266 1282662179.860960 mprotect(0x34ca54d000, 16384, PROT_READ) = 0 <0.000007>
5266 1282662179.860985 mprotect(0x34ca01b000, 4096, PROT_READ) = 0 <0.000006>
5266 1282662179.861009 munmap(0x2b5f2bc92000, 189020) = 0 <0.000011>
5266 1282662179.861082 open("/usr/lib/locale/locale-archive", O_RDONLY) = 4 <0.000008>
5266 1282662179.861113 fstat(4, {st_mode=S_IFREG|0644, st_size=56442560, ...}) = 0 <0.000005>
5266 1282662179.861166 mmap(NULL, 56442560, PROT_READ, MAP_PRIVATE, 4, 0) = 0x2b5f2bcc3000 <0.000007>
5266 1282662179.861192 close(4) = 0 <0.000005>
5266 1282662179.861269 brk(0) = 0x1ad1f000 <0.000005>
5266 1282662179.861290 brk(0x1ad40000) = 0x1ad40000 <0.000006>
5266 1282662179.861444 open("/usr/share/locale/locale.alias", O_RDONLY) = 4 <0.000009>
5266 1282662179.861483 fstat(4, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0 <0.000005>
5266 1282662179.861944 read(4, "", 4096) = 0 <0.000006>
5266 1282662179.861968 close(4) = 0 <0.000005>
5266 1282662179.861989 munmap(0x2b5f2f297000, 4096) = 0 <0.000009>
5264 1282662179.863063 wait4(-1, 0x7fff8d813064, WNOHANG, NULL) = -1 ECHILD (No child processes)
...
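
The per-system-call timing reported later can be extracted from output in this format with a small parser; the sketch below illustrates the idea but is not the actual BASST/KARBON code (it assumes the strace -f -ttt -T format shown above and skips "unfinished"/"resumed" lines and lines without a duration).

# Hedged sketch: sum the time spent in each system call from an strace -f -ttt -T log
# (line format: PID TIMESTAMP name(args) = retval <duration>).
import re
import sys
from collections import defaultdict

LINE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\w+)\((.*)\)\s*=\s*(\S+).*<([\d.]+)>\s*$")

def summarize(path):
    totals = defaultdict(float)          # syscall name -> total seconds
    counts = defaultdict(int)            # syscall name -> number of calls
    for line in open(path):
        m = LINE.match(line)
        if not m:                        # skip unfinished/resumed and signal lines
            continue
        name, duration = m.group(3), float(m.group(6))
        totals[name] += duration
        counts[name] += 1
    return totals, counts

if __name__ == "__main__":
    totals, counts = summarize(sys.argv[1])
    for name in sorted(totals, key=totals.get, reverse=True):
        print "%-12s %8d calls %10.3f s" % (name, counts[name], totals[name])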

  27. KARBON – A trace file analyzer

  28. KARBON – A trace file analyzer • A general-purpose application profiler based on system call trace files • It tracks file descriptors and reports detailed I/O statistics for files, network sockets and FIFO pipes • It analyzes the child processes and creates process graphs and process trees • It can detect the “hot spots” of an application • Custom analysis tools can be created on demand using the development API
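
The file descriptor bookkeeping can be illustrated with a sketch in the same spirit, again an illustration under the assumed strace format rather than KARBON itself: remember which path each descriptor refers to when open() returns, then credit the bytes moved by read() and write() to that path.

# Hedged sketch of per-file I/O accounting from an strace log. Simplified:
# it ignores dup(), sockets, pipes and per-process descriptor tables.
import re
import sys
from collections import defaultdict

OPEN = re.compile(r'\bopen\("([^"]+)"[^)]*\)\s*=\s*(\d+)')
RDWR = re.compile(r"\b(read|write)\((\d+),.*\)\s*=\s*(\d+)")
CLOSE = re.compile(r"\bclose\((\d+)\)\s*=\s*0")

def io_per_file(path):
    fd_table = {}                                  # fd -> file path
    stats = defaultdict(lambda: [0, 0])            # path -> [bytes read, bytes written]
    for line in open(path):
        m = OPEN.search(line)
        if m:
            fd_table[int(m.group(2))] = m.group(1)
            continue
        m = RDWR.search(line)
        if m and int(m.group(2)) in fd_table:
            name = fd_table[int(m.group(2))]
            stats[name][0 if m.group(1) == "read" else 1] += int(m.group(3))
            continue
        m = CLOSE.search(line)
        if m:
            fd_table.pop(int(m.group(1)), None)    # descriptor numbers get reused
    return stats

if __name__ == "__main__":
    for name, (rd, wr) in sorted(io_per_file(sys.argv[1]).items()):
        print "%-50s read %10d B written %10d B" % (name, rd, wr)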

  29. KARBON – Application block diagram (block diagram: Source (file or TCP stream) → Tokenizer → Router → Filter → Analyzer → Presenter, plus a preprocessing tool with its own presenter)
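
The diagram can be read as a simple streaming pipeline; the sketch below is a hypothetical reconstruction of that structure (class names mirror the diagram, filters are folded into the router for brevity) and is not KARBON's actual API.

# Hedged sketch of the pipeline suggested by the block diagram: a source yields
# raw trace lines, a tokenizer turns them into records, a router fans them out
# to analyzers, and each analyzer presents its accumulated results at the end.
import sys

class Source(object):
    def __init__(self, path):
        self.path = path
    def lines(self):
        return open(self.path)

class Tokenizer(object):
    def parse(self, line):
        return line.split(None, 2)               # [pid, timestamp, rest] (simplified)

class Analyzer(object):
    def __init__(self):
        self.records = 0
    def feed(self, record):
        self.records += 1                        # a real analyzer would inspect the record
    def present(self):
        print "%d records analyzed" % self.records

class Router(object):
    def __init__(self, analyzers):
        self.analyzers = analyzers
    def run(self, source, tokenizer):
        for line in source.lines():
            record = tokenizer.parse(line)
            for analyzer in self.analyzers:      # fan each record out to every analyzer
                analyzer.feed(record)
        for analyzer in self.analyzers:
            analyzer.present()

if __name__ == "__main__":
    Router([Analyzer()]).run(Source(sys.argv[1]), Tokenizer())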

  30. Results • Time utilization of the traced application (chart: time spent in ms, 0 to 300,000, broken down into file I/O, network I/O (TCP and UNIX sockets) and miscellaneous calls, for Physical Disk, Network (Xrootd) and RAM Disk, each on KVM, XEN and Baremetal)
