Measuring PROOF Lite performance in (non)virtualized environment

Measuring PROOF Lite performance in (non)virtualized environment Ioannis Charalampidis, Aristotle University of Thessaloniki Summer Student 2010

Overview • Introduction • Benchmarks: Overall execution time • Benchmarks: In-depth analysis • Conclusion

What am I looking for? • There is a known overhead caused by the virtualization process ▫ How big is it? ▫ Where is it located? ▫ How can we minimize it? ▫ Which hypervisor has the best performance? • I am using CernVM as the guest

What is CernVM? • It’s a baseline Virtual Software Appliance for use by LHC experiments • It’s available for many hypervisors ▫ Hyper-V ▫ KVM / QEMU ▫ VMware ▫ XEN ▫ VirtualBox

How am I going to find the answers? • Use a standard data analysis application (ROOT + PROOF Lite) as the benchmark • Test it on different hypervisors • And with a varying number of workers/CPUs • Compare the performance (Physical vs. Virtualized)

Problem • The benchmark application requires too much time to complete (2 to 15 min) ▫ At least 3 runs are required for reliable results ▫ The in-depth analysis overhead is about 40% ▫ It is not efficient to perform detailed analysis for every CPU / Hypervisor configuration → Create the overall execution time benchmarks → Find the best configuration to run the traces on

Benchmarks performed • Overall time ▫ Using the time utility and automated batch scripts • In-depth analysis ▫ Tracing system calls using ▪ STrace ▪ SystemTAP ▫ Analyzing the trace files using applications I wrote ▪ BASST (Batch analyzer based on STrace) ▪ KARBON (General purpose application profiler based on trace files)
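
The overall-time measurement can be sketched roughly as follows. This is only an illustration, not the author's batch script: the function name `time_runs` is hypothetical, and the default of three runs is taken from the "at least 3 runs" requirement stated earlier.

```python
import subprocess
import sys
import time


def time_runs(command, runs=3):
    """Run `command` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the benchmark command fails
        durations.append(time.perf_counter() - start)
    return durations
```

For a quick check, `time_runs([sys.executable, "-c", "pass"])` times a trivial placeholder command; a real run would launch the ROOT + PROOF Lite job instead.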

Process description and results

Benchmark Configuration • Base machine ▫ Scientific Linux CERN 5 • Guests ▫ CernVM 2.1 • Software packages from SLC repositories ▫ Linux Kernel 2.6.18-194.8.1.el5 ▫ XEN 3.1.2 + 2.6.18-194.8.1.el5 ▫ KVM 83-194.8.1.el5 ▫ Python 2.5.4p2 (from AFS) ▫ ROOT 5.26.00b (from AFS) • Base machine hardware ▫ 24 x Intel Xeon X7460 2.66GHz with VT-x Support (64 bit) ▫ No VT-d or Extended Page Tables (EPT) hardware support ▫ 32 GB RAM

Benchmark Configuration • Virtual machine configuration ▫ 1 CPU, then 2 to 16 CPUs in steps of 2 ▫ <CPU#> + 1 GB RAM for Physical disk and Network tests ▫ <CPU#> + 17 GB RAM for RAM Disk tests ▫ Disk image for the OS ▫ Physical disk for the Data + Software • Important background services running ▫ NSCD (Caching daemon)

Benchmark Configuration • Caches were cleared before every test ▫ Page cache, dentries and inodes ▫ Using the /proc/sys/vm/drop_caches flag • No swap memory was used ▫ Verified by periodically monitoring the free memory
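
A minimal sketch of the cache-clearing step, assuming a Linux host and root privileges. The function name and the `path` parameter are illustrative (the parameter exists only so the function can be tried against a scratch file); the slides do not show how the flag was actually written.

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches", mode=3):
    """Flush dirty pages, then ask the kernel to drop its clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    Writing to the real /proc path requires root.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # write dirty pages to disk first so they become droppable
    with open(path, "w") as handle:
        handle.write("%d\n" % mode)
```

Mode 3 corresponds to the combination named on the slide: page cache plus dentries and inodes.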

Automated batch scripts • The VM batch script runs on the host machine (server) • It repeats the following procedure: ▫ Create a new Virtual Machine ▫ Wait for the machine to finish booting ▫ Connect to the controlling script (client) inside the VM ▫ Drop caches both on the host and the guest ▫ Start the job ▫ Receive and archive the results
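
The procedure above can be sketched as a simple loop. Every callable passed in here (`create_vm`, `wait_for_boot`, `connect`, `drop_host_caches`, `archive`) and the client methods `drop_caches` and `run_job` are hypothetical stand-ins for hypervisor-specific steps; none of them come from the original scripts.

```python
def run_batch(configurations, create_vm, wait_for_boot, connect,
              drop_host_caches, archive):
    """Run one benchmark per configuration and archive each result."""
    for config in configurations:
        vm = create_vm(config)         # create a new virtual machine
        wait_for_boot(vm)              # block until the guest has booted
        client = connect(vm)           # reach the controlling script in the guest
        drop_host_caches()             # clear caches on the host
        client.drop_caches()           # clear caches on the guest
        result = client.run_job()      # start the job and wait for the result
        archive(config, result)        # receive and archive the results
```

Passing the steps in as callables keeps the loop identical across hypervisors; only the VM creation and connection functions would differ.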

Problem • There was a bug in PROOF Lite: it looked up a non-existent hostname during the startup of each worker ▫ Example: 0.2-plitehp24.cern.ch-1281241251-1271 • Discovered by detailed system call tracing → The hostname couldn’t be cached → The application had to wait for the timeout → The startup time was delayed randomly → Call tracing applications made this delay even bigger, virtually hanging the application

Problem • The problem was resolved as follows: ▫ A minimal DNS proxy was developed that fakes the existence of the buggy hostname ▫ The bug was later fixed in the PROOF source [Diagram: Application → Fake DNS Proxy → DNS Server. A query for cernvm.cern.ch is passed on and answered with 137.138.234.20; a query for x.x-xxxxxx-xxx-xxx is answered with 127.0.0.1]
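
The decision the proxy makes can be sketched as below. This covers only the name-matching logic, not a DNS server or the wire protocol, and it is not the author's proxy. The regular expression is an assumption about what names shaped like the masked `x.x-xxxxxx-xxx-xxx` look like (an ordinal, a host part, then two numeric fields).

```python
import re
import socket

# Assumed shape of the bogus names: "<n>.<n>-<host>-<digits>-<digits>".
BOGUS_NAME = re.compile(r"^\d+\.\d+-.+-\d+-\d+$")


def resolve(hostname, upstream=socket.gethostbyname):
    """Answer bogus worker names locally; pass everything else upstream."""
    if BOGUS_NAME.match(hostname):
        return "127.0.0.1"        # pretend the name exists, so no timeout occurs
    return upstream(hostname)     # real names go to the real resolver
```

Answering locally removes the wait for the lookup timeout, which is what delayed worker startup.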

Problem • Example: Events / sec for different CPU settings, as reported by the buggy benchmark [Figure: two panels, Before and After; y-axis 0 to 18000, x-axis 0 to 30. Series: RAM Disk - XEN, RAM Disk - Host, RAM Disk + Fixed DNS - XEN, RAM Disk + Fixed DNS - Host, Phys. Disk - XEN, Phys. Disk - Host, Phys. Disk + Fixed DNS - XEN, Phys. Disk + Fixed DNS - Host]

Results – Physical Disk [Chart: Events / Sec (0 to 14000) vs. Workers = CPUs (1, 2, 4, 6, 8, 10, 12, 14, 16); series: Baremetal, XEN, KVM]

Results – Network (XROOTD) [Chart: Events / Sec (0 to 14000) vs. Workers = CPUs (1, 2, 4, 6, 8, 10, 12, 14, 16); series: Baremetal, XEN, KVM]

Results – RAM Disk [Chart: Events / Sec (0 to 14000) vs. Workers = CPUs (1, 2, 4, 6, 8, 10, 12, 14, 16); series: Baremetal, XEN, KVM]

Results – Relative values [Charts: VM/Baremetal ratio (0 to 1.2) vs. Workers = CPUs (0 to 20); panels: Physical Disk, RAM Disk, Network (XROOTD); series: Bare metal, XEN, KVM]
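
The relative values are the VM event rate divided by the bare-metal event rate at the same worker count. A small sketch of that arithmetic, with a hypothetical function name and dictionaries keyed by worker count; no measured numbers from the charts are reproduced here.

```python
def relative_performance(vm_rates, baremetal_rates):
    """Return {workers: vm_rate / baremetal_rate} for worker counts present in both."""
    ratios = {}
    for workers, base in baremetal_rates.items():
        if workers in vm_rates and base > 0:
            ratios[workers] = vm_rates[workers] / base
    return ratios
```

A ratio of 1.0 means the guest matches bare metal; values below 1.0 are the virtualization overhead for that configuration.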

Results – Absolute values [Charts: Events / Sec (0 to 14000) vs. Workers = CPUs (0 to 15); panels: Physical Disk, RAM Disk, Network (XROOTD); series: Bare metal, XEN, KVM]

Results – Comparison chart [Chart: Events / Sec (0 to 14000) vs. Workers = CPUs (0 to 18); series: Physical Disk - Bare metal, Xrootd - Bare metal, RAM Disk - Bare metal, Physical Disk - XEN, Xrootd - XEN, RAM Disk - XEN, Physical Disk - KVM, Xrootd - KVM, RAM Disk - KVM]

Procedure, problems and results

In-depth analysis • To get more details, the program execution was monitored and all the system calls were traced and logged • Afterwards, the analyzer extracted useful information from the trace files, such as ▫ The time spent on each system call ▫ The filesystem / network activity • The process of tracing adds some overhead, but it is cancelled out from the overall performance measurement

System call tracing utilities • STrace ▫ Traces application-wide system calls from user space ▫ Connects to the traced process using the ptrace() system call and monitors its activity • Advantages ▫ Traces the application’s system calls in real time ▫ Has very verbose output • Disadvantages ▫ Adds a large overhead [Diagram: Kernel, STrace, Process]

System call tracing utilities • SystemTAP ▫ Traces system-wide kernel activity, asynchronously ▫ Runs as a kernel module • Advantages ▫ Can trace virtually everything on a running kernel ▫ Supports scriptable kernel probes • Disadvantages ▫ It is not simple to extract detailed information ▫ System calls can be lost under high CPU activity [Diagram: SystemTAP, Kernel, Process]

System call tracing utilities • Sample STrace output:
5266 1282662179.860933 arch_prctl(ARCH_SET_FS, 0x2b5f2bcc27d0) = 0 <0.000005>
5266 1282662179.860960 mprotect(0x34ca54d000, 16384, PROT_READ) = 0 <0.000007>
5266 1282662179.860985 mprotect(0x34ca01b000, 4096, PROT_READ) = 0 <0.000006>
5266 1282662179.861009 munmap(0x2b5f2bc92000, 189020) = 0 <0.000011>
5266 1282662179.861082 open("/usr/lib/locale/locale-archive", O_RDONLY) = 4 <0.000008>
5266 1282662179.861113 fstat(4, {st_mode=S_IFREG|0644, st_size=56442560, ...}) = 0 <0.000005>
5266 1282662179.861166 mmap(NULL, 56442560, PROT_READ, MAP_PRIVATE, 4, 0) = 0x2b5f2bcc3000 <0.000007>
5266 1282662179.861192 close(4) = 0 <0.000005>
5266 1282662179.861269 brk(0) = 0x1ad1f000 <0.000005>
5266 1282662179.861290 brk(0x1ad40000) = 0x1ad40000 <0.000006>
5266 1282662179.861444 open("/usr/share/locale/locale.alias", O_RDONLY) = 4 <0.000009>
5266 1282662179.861483 fstat(4, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0 <0.000005>
5266 1282662179.861944 read(4, "", 4096) = 0 <0.000006>
5266 1282662179.861968 close(4) = 0 <0.000005>
5266 1282662179.861989 munmap(0x2b5f2f297000, 4096) = 0 <0.000009>
5264 1282662179.863063 wait4(-1, 0x7fff8d813064, WNOHANG, NULL) = -1 ECHILD (No child processes)
...
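
Trace lines in this layout (PID, epoch timestamp, call, return value, elapsed time in angle brackets) resemble what `strace -f -ttt -T` writes. As a rough sketch of how the time spent per system call could be totalled from such a file, and not a reproduction of BASST or KARBON, one might write the following; the function name is hypothetical.

```python
import re
from collections import defaultdict

# pid, timestamp, syscall name, arguments, "= result", "<elapsed seconds>"
LINE = re.compile(r"^(\d+)\s+(\d+\.\d+)\s+(\w+)\(.*\)\s+=\s+.*<(\d+\.\d+)>\s*$")


def time_per_syscall(lines):
    """Return {syscall name: total seconds} for lines carrying an elapsed time."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip lines without a <time> field (e.g. truncated ones)
        totals[match.group(3)] += float(match.group(4))
    return dict(totals)
```

Lines that do not carry an elapsed-time field, such as the final `wait4` line in the sample, are ignored rather than guessed at.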

KARBON – A trace file analyzer

KARBON – A trace file analyzer • Is a general purpose application profiler based on system call trace files • It traces file descriptors and reports detailed I/O statistics for files, network sockets and FIFO pipes • It analyzes the child processes and creates process graphs and process trees • It can detect the “Hot spots” of an application • Custom analyzing tools can be created on-demand using the development API
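
To illustrate what tracking file descriptors through a trace involves, here is a simplified sketch. It is not KARBON's code or API: it assumes a single traced process, handles only `open`, `read` and `close`, and the function name is hypothetical.

```python
import re
from collections import defaultdict

OPEN = re.compile(r'\bopen\("([^"]*)",.*\)\s+=\s+(\d+)')
READ = re.compile(r'\bread\((\d+),.*\)\s+=\s+(\d+)')
CLOSE = re.compile(r'\bclose\((\d+)\)\s+=\s+0')


def bytes_read_per_file(lines):
    """Return {path: bytes read} by following descriptors from open to close."""
    open_files = {}             # descriptor -> path currently bound to it
    totals = defaultdict(int)   # path -> bytes read
    for line in lines:
        m = OPEN.search(line)
        if m:                   # failed opens return -1 and do not match
            open_files[int(m.group(2))] = m.group(1)
            continue
        m = READ.search(line)
        if m:
            fd = int(m.group(1))
            totals[open_files.get(fd, "<unknown fd %d>" % fd)] += int(m.group(2))
            continue
        m = CLOSE.search(line)
        if m:
            open_files.pop(int(m.group(1)), None)
    return dict(totals)
```

Because descriptor numbers are reused after `close`, the mapping has to be updated on every `open` and `close`; a real analyzer would also keep one such table per process.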

KARBON – Application block diagram [Diagram blocks: Source (File or TCP Stream), Tokenizer, Router, Filter, Analyzer, Presenter, Preprocessing Tool]

Results • Time utilization of the traced application [Bar chart: Time spent (ms), 0 to 300000, broken down into File IO, Net IO (UNIX Sockets, TCP Sockets) and Misc calls; bars: Physical Disk - KVM, Physical Disk - XEN, Physical Disk - Baremetal, Network (Xrootd) - KVM, Network (Xrootd) - XEN, Network (Xrootd) - Baremetal, RAM Disk - KVM, RAM Disk - XEN, RAM Disk - Baremetal]
