
Improving the Scalability of a Multi-core Web Server, by Raoufehsadat Hashemian, The 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013)



  1. R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3) • Affiliations: (1) University of Calgary, (2) HP Labs, (3) Linköping University • Presented by: Raoufehsadat Hashemian • The 4th ACM/SPEC International Conference on Performance Engineering, ICPE 2013

  2. OUTLINE • Introduction • Scalability Evaluation • Scalability Enhancement Approach • Validation • Conclusion (ICPE13: Improving the Scalability of a Multi-core Web Server)

  3. INTRODUCTION: PROBLEM DESCRIPTION • Enterprise applications • Performance: improving QoS, e.g. lower response times • Cost: less money spent on hardware, e.g. improving effective utilization • [Figure: response time vs. CPU utilization (%)] • Goal: higher utilization with acceptable response time • Multi-core technology

  4. INTRODUCTION: BACKGROUND • Web servers before multi-core: mature topic, wide-ranging discussions • Multi-core architecture: most research is on batch (non-interactive) workloads • Web servers running on multi-core: bus problem in UMA systems (Veal et al. '07) • Multiple Web server instances: 1 instance per processor (Scogland et al. '09, Boyd et al. '10, Gaud et al. '11)

  5. SCALABILITY EVALUATION: SCALABILITY MEASUREMENT • Measure Web server scalability for two workloads • Evaluate how effectively the multiple-Web-server approach improves scalability • Scalability metric: Maximum Achievable Throughput (MAT)
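
The slides name MAT as the scalability metric but do not spell out how it is computed. The following is a minimal sketch, assuming MAT is taken as the highest measured request rate whose mean response time stays at or below a chosen threshold. The function names, the threshold, and the sample numbers are illustrative assumptions, not values from the presentation.

```python
from typing import Iterable, Tuple


def max_achievable_throughput(
    samples: Iterable[Tuple[float, float]],
    max_response_ms: float,
) -> float:
    """Return the highest request rate (req/sec) whose mean response time
    is at or below max_response_ms, or 0.0 if no sample qualifies.

    Each sample is a (request_rate, mean_response_ms) pair from one load test.
    The threshold-based definition is an assumption for this sketch.
    """
    qualifying = [rate for rate, resp_ms in samples if resp_ms <= max_response_ms]
    return max(qualifying, default=0.0)


def scaling_efficiency(mat_n_cores: float, mat_one_core: float, n_cores: int) -> float:
    """Ratio of the measured MAT on n cores to perfectly linear scaling.

    A value of 1.0 means linear scaling; below 1.0 means sub-linear.
    """
    return mat_n_cores / (n_cores * mat_one_core)


# Hypothetical measurements: (request rate, mean response time in ms).
hypothetical_samples = [(1000.0, 0.5), (2000.0, 0.9), (3000.0, 4.0)]
mat = max_achievable_throughput(hypothetical_samples, max_response_ms=1.0)
```

Comparing `scaling_efficiency` across core counts is one way to tell "almost linear" from "sub-linear" behaviour.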

  6. SCALABILITY EVALUATION: EXPERIMENTAL SETUP • 2 x 4-core Intel Xeon E5620 processors, NUMA architecture • Microarchitecture: Nehalem • Frequency: 2.4 GHz • L1 cache: 32K IC, 32K DC • L2 cache: 256K • L3 cache: 12M (inclusive) • Interconnect: QPI, 5.86 GT/s • Memory: 16GB DDR3-1333 • [Diagram: Processor 0 and Processor 1, each with four cores, one L1 and one L2 cache per core, one L3 cache per processor, and one memory bank per processor (Bank 0, Bank 1)] • OS: Linux, kernel 3, Ubuntu • Web server: Lighttpd • Application server: PHP (FastCGI module)

  7. SCALABILITY EVALUATION: WORKLOADS • TCP/IP intensive workload: high TCP connection rate; processing is low at user level and high at kernel level; 1 KB static file, up to 155,000 requests/second • SPECweb Support workload: both static requests and PHP requests; wider range of request types; processing is high at user level and moderate at kernel level

  8. SCALABILITY EVALUATION: CONFIGURATION TUNING • 1 Lighttpd worker process per core • Eliminating OS scheduler effects (use CPU affinity) • Distributing the interrupt handling load • Improved MAT by up to 69% • Balanced utilization levels across the eight cores • Fully utilized the server
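
The slides do not say which tool was used to bind workers to cores. As a hedged sketch of the idea (one worker process per core, each pinned by CPU affinity), the Linux-only `os.sched_setaffinity` call from the Python standard library can be used. The helper names and the PIDs below are hypothetical.

```python
import os
from typing import Dict, Sequence


def plan_worker_affinity(worker_pids: Sequence[int], cores: Sequence[int]) -> Dict[int, int]:
    """Pair each worker PID with exactly one core (one worker per core)."""
    if len(worker_pids) != len(cores):
        raise ValueError("expected exactly one worker per core")
    if len(set(cores)) != len(cores):
        raise ValueError("each core may be assigned only once")
    return dict(zip(worker_pids, cores))


def apply_affinity(plan: Dict[int, int]) -> None:
    """Pin each process to its assigned core.

    Linux only; needs permission to change the affinity of the target PIDs.
    """
    for pid, core in plan.items():
        os.sched_setaffinity(pid, {core})


# Hypothetical PIDs of eight worker processes on an eight-core server.
hypothetical_pids = [4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008]
plan = plan_worker_affinity(hypothetical_pids, list(range(8)))
# apply_affinity(plan)  # left commented out: the PIDs above do not exist
```

Keeping the plan separate from the system call makes the mapping easy to inspect before anything is changed on the machine.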

  9. SCALABILITY EVALUATION: RESULTS • TCP/IP intensive workload: sub-linear scalability, Maximum Achievable Throughput of 146,000 req/sec • SPECweb Support workload: almost linear scalability, Maximum Achievable Throughput of 23,000 req/sec • [Figures: scalability vs. number of cores]

  10. SCALABILITY EVALUATION: RESPONSE DISTRIBUTION ANALYSIS • “Low response time” requests (static) → increased response times • “High response time” requests (dynamic) → decreased response times • [Figure: CDF of response times, P[X <= x], where x = response time (msec, log scale); SPECweb Support workload at 80% CPU utilization; curves for 1, 2, 4 and 8 cores]
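
The CDF on this slide, P[X <= x], is the fraction of requests whose response time is at most x. A minimal sketch of how such an empirical CDF can be computed from raw response times follows; the function name and the sample values are illustrative assumptions, not data from the presentation.

```python
from bisect import bisect_right
from typing import Callable, Iterable


def empirical_cdf(response_times_ms: Iterable[float]) -> Callable[[float], float]:
    """Build the empirical CDF of a set of response times.

    The returned function maps x to P[X <= x], the fraction of samples
    that are less than or equal to x.
    """
    data = sorted(response_times_ms)
    if not data:
        raise ValueError("at least one response time is required")
    n = len(data)

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(data, x) / n

    return cdf


# Hypothetical response times in milliseconds.
cdf = empirical_cdf([1.0, 2.0, 2.0, 10.0])
fraction_within_2ms = cdf(2.0)
```

Evaluating one such function per core count at the same x values gives curves that can be compared directly, as in the figure.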

  11. SCALABILITY ENHANCEMENT: MULTIPLE WEBSITE REPLICAS • Approach: use one Web server instance per processor • Goal: reduce inter-processor data migration • [Diagram: original configuration with one replica (a single replica process and the NIC 1 and NIC 2 queues on Processor 0 and Processor 1) vs. alternative configuration with two replicas (Replica 1 and Replica 2 processes, each with one NIC queue)]

  12. SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION • SPECweb Support workload: scalability degradation, MAT decrement of 10% • TCP/IP intensive workload: scalability improvement, MAT increment of 12.3% • [Figures: response time (ms) vs. request rate (req/sec) for each workload]

  13. SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION • The response time inflation for dynamic requests outweighs the improvement achieved for static requests • Mean and 99.9th percentile response times increase with 2 replicas • [Figure: CDF of response times, P[X <= x], vs. response time (ms, log scale); SPECweb Support workload at 80% CPU utilization, 22,000 req/sec]

  14. VALIDATION: INTER-CONNECT TRAFFIC • Inter-connect traffic decreased significantly for the TCP/IP intensive workload • For the Support workload, the change is not significant • [Figures: inter-connect traffic (bytes/sec) vs. request rate (req/sec) for the SPECweb Support workload and the TCP/IP intensive workload]

  15. VALIDATION: LAST LEVEL CACHE • The last-level cache (LLC) hit ratio degrades with the 2-replica configuration • [Figure: L3 cache hit ratio vs. request rate (req/sec), SPECweb Support workload]

  16. CONCLUSIONS • Multi-core Web server: scalable after tuning • Multiple website replicas: the effect on scalability was workload dependent; LLC contention was caused by the PHP application; the result may be architecture dependent; the result may be application dependent • Future plan: an automatic, workload-adaptive approach for deciding on the best configuration

  17. Raoufeh Hashemian, University of Calgary, Canada, rhashem@ucalgary.ca • This work is financially supported by:

  18. REFERENCES • Cherkasova et al. '00: Characterizing Temporal Locality and its Impact on Web Server Performance, International Conference on Computer Communications and Networks '00; Cherkasova, Ciardo; HP Labs • Elnozahy et al. '03: Energy Conservation Policies for Web Servers, USITS '03; Elnozahy, Kistler, Ramakrishnan; IBM • Majo et al. '12: Matching Memory Access Patterns and Data Placement for NUMA Systems, GC '12; Majo, Gross; ETH • Blagodurov et al. '11: A Case for NUMA-aware Contention Management on Multicore Systems, USENIX ATC '11; Blagodurov, Zhuravlev, Dashti, Fedorova; SFU • Veal et al. '07: Performance Scalability of a Multi-core Web Server, ACM/IEEE ANCS '07; Veal, Foong; Intel • Scogland et al. '09: Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation, ACM/IEEE SC '08; Scogland, Balaji, Feng, Narayanaswamy • Boyd et al. '10: An Analysis of Linux Scalability to Many Cores, USENIX OSDI '10; Boyd-Wickizer, Clements, Mao, Pesterev, Kaashoek, Morris, Zeldovich; MIT • Gaud et al. '11: Application-level Optimizations on NUMA Multicore Architectures: The Apache Case Study, RR-LIG-011; Gaud, Lachaize, Lepers, Muller, Quema

  19. SCALABILITY EVALUATION: CONFIGURATION TUNING • Network interrupt handling: 4 RSS queues per NIC port, each queue bound to one core • [Figure: response time (msec) vs. rate (req/sec), before and after distributing the interrupt load]
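
On Linux, an interrupt can be bound to a core by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity`. The slides do not state which mechanism was used, so the sketch below is only an illustration of binding each RSS queue to one core. The IRQ numbers, core numbers, and helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def irq_affinity_masks(cores: Sequence[int]) -> List[str]:
    """Return one hexadecimal CPU bitmask per core, with only that core's bit set."""
    return [format(1 << core, "x") for core in cores]


def plan_queue_binding(queue_irqs: Sequence[int], cores: Sequence[int]) -> Dict[int, str]:
    """Pair each RSS queue IRQ with the mask of the single core that serves it."""
    if len(queue_irqs) != len(cores):
        raise ValueError("expected exactly one core per queue")
    return dict(zip(queue_irqs, irq_affinity_masks(cores)))


def write_irq_affinity(binding: Dict[int, str]) -> None:
    """Apply the binding through /proc (Linux only, requires root)."""
    for irq, mask in binding.items():
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as handle:
            handle.write(mask)


# Hypothetical IRQ numbers for the four RSS queues of one NIC port,
# each bound to one of four hypothetical cores.
binding = plan_queue_binding([64, 65, 66, 67], [0, 1, 2, 3])
# write_irq_affinity(binding)  # left commented out: needs root and real IRQ numbers
```

A daemon such as irqbalance may overwrite these masks, so it is typically stopped before binding interrupts by hand.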

  20. SCALABILITY EVALUATION: CONFIGURATION TUNING • OS scheduling: binding each lighttpd process to 1 core • [Figure: response time (msec) vs. rate (req/sec), with and without affinity]

  21. SCALABILITY EVALUATION: WEB TIER VS. APPLICATION TIER • Static: requests with lower response times, processed only in the Web tier (lighttpd) • Dynamic: requests with higher response times, processed in both the Web and application tiers (lighttpd and PHP) • [Figure: response time (ms) vs. file size (bytes)]

  22. SCALABILITY EVALUATION: EXPERIMENTAL SETUP
