OUTLINE Introduction Scalability Evaluation Scalability - - PowerPoint PPT Presentation


R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3). 1. University of Calgary, 2. HP Labs, 3. Linköping University. Presented by Raoufehsadat Hashemian at the 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013).


SLIDE 1
  • R. Hashemian (1), D. Krishnamurthy (1), M. Arlitt (2), N. Carlsson (3)
  • 1. University of Calgary
  • 2. HP Labs
  • 3. Linköping University

The 4th ACM/SPEC International Conference on Performance Engineering (ICPE 2013)

by: Raoufehsadat Hashemian

SLIDE 2

OUTLINE

  • Introduction
  • Scalability Evaluation
  • Scalability Enhancement Approach
  • Validation
  • Conclusion

Improving the Scalability of a Multi-core Web Server

ICPE13

SLIDE 3

INTRODUCTION: PROBLEM DESCRIPTION

  • Enterprise applications
    • Performance: improving QoS
      • e.g. lower response times
    • Cost: less money spent on hardware
      • e.g. improving effective utilization
  • Goal: higher utilization with acceptable response time
  • How can this goal be achieved for Web servers running on multi-core hardware?

[Figure: response time vs. CPU utilization (%)]

SLIDE 4

INTRODUCTION: BACKGROUND

  • Web servers before multi-core
    • Mature topic, wide-ranging discussions
  • Multi-core architecture
    • Most research on batch (non-interactive) workloads
  • Web servers running on multi-core
    • Bus problem in UMA systems (Veal et al. '07)
    • Multiple Web server instances: 1 instance per processor (Scogland et al. '09; Boyd et al. '10; Gaud et al. '11)

SLIDE 5

SCALABILITY EVALUATION: SCALABILITY MEASUREMENT

  • Measure Web server scalability for two workloads
  • Evaluate the effect of the multiple Web server approach on the server's scalability
  • Scalability
    • Maximum Achievable Throughput (MAT)
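The slides name Maximum Achievable Throughput (MAT) as the scalability metric but do not spell out how it is computed. The sketch below shows one plausible reading: MAT as the highest tested request rate whose measured response time stays under an acceptability threshold. The function name, the threshold, and the sample measurements are all assumptions made for illustration, not values from the talk.

```python
from typing import Iterable, Optional, Tuple


def max_achievable_throughput(
    samples: Iterable[Tuple[float, float]],
    max_response_ms: float,
) -> Optional[float]:
    """Return the highest request rate whose response time is acceptable.

    samples: (request rate in req/sec, measured response time in ms) pairs.
    max_response_ms: hypothetical acceptability threshold (not from the slides).
    Returns None when no tested rate meets the threshold.
    """
    acceptable = [rate for rate, resp_ms in samples if resp_ms <= max_response_ms]
    return max(acceptable) if acceptable else None


# Made-up measurements, for illustration only.
measurements = [(50_000, 0.3), (100_000, 0.5), (150_000, 0.9), (200_000, 4.0)]
mat = max_achievable_throughput(measurements, max_response_ms=1.0)
print(mat)
```

With a stricter threshold the same data yields a lower MAT, which is why the threshold has to be stated alongside any MAT figure.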

SLIDE 6

SCALABILITY EVALUATION: EXPERIMENTAL SETUP

  • 2 x 4-core Intel Xeon E5620 processors, NUMA architecture
  • OS: Linux, kernel 3, Ubuntu
  • Web server: Lighttpd
  • Application server: PHP (FastCGI module)

Processor specification:

  • Microarchitecture: Nehalem
  • Frequency: 2.4 GHz
  • L1 cache: 32K IC - 32K DC
  • L2 cache: 256K
  • L3 cache: 12M (inclusive)
  • Interconnect: QPI, 5.86 GT/s
  • Memory: 16GB DDR3-1333

[Diagram: Processor 0 and Processor 1, each with four cores, one L1 and one L2 cache per core, and one L3 cache per processor; Memory Bank 0 and Memory Bank 1]

SLIDE 7

SCALABILITY EVALUATION: WORKLOADS

  • TCP/IP Intensive workload
    • High TCP connection rate
    • Processing: low user level & high kernel level
    • 1 KB static file, up to 155,000 requests/second
  • SPECweb Support workload
    • Both static requests and PHP requests
    • Wider range of request types
    • Processing: high user level & moderate kernel level

SLIDE 8

SCALABILITY EVALUATION: CONFIGURATION TUNING

  • Change default lighttpd recommendation (1 lighttpd worker process per core)
  • Disable default Linux scheduling (use affinity)
  • Distribute interrupt handling load

After tuning:

  • Improved MAT by up to 69%
  • Balanced utilization levels for the eight cores
  • Fully utilized the server

SLIDE 9

SCALABILITY EVALUATION: RESULTS

  • TCP/IP Intensive workload
    • Sub-linear scalability
    • Maximum Achievable Throughput: 146,000 req/sec
  • SPECweb Support workload
    • Almost linear scalability
    • Maximum Achievable Throughput: 23,000 req/sec

[Figures: scalability vs. number of cores, one plot per workload]
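To make "sub-linear" and "almost linear" concrete, the sketch below normalizes MAT at each core count by the single-core MAT (speedup) and divides by the core count (efficiency). This normalization is an assumption about how the scalability axis might be defined; the slides do not state it, and the MAT values here are invented purely to show the arithmetic.

```python
from typing import Dict, Tuple


def scaling_report(mat_by_cores: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map core count -> (speedup over 1 core, efficiency = speedup / cores)."""
    base = mat_by_cores[1]  # single-core MAT is the reference point
    report = {}
    for cores in sorted(mat_by_cores):
        speedup = mat_by_cores[cores] / base
        report[cores] = (speedup, speedup / cores)
    return report


# Hypothetical MAT values (req/sec) per active core count, not measured data.
hypothetical_mat = {1: 10_000, 2: 19_000, 4: 36_000, 8: 64_000}
report = scaling_report(hypothetical_mat)
for cores, (speedup, efficiency) in report.items():
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency that stays close to 1.0 as cores are added corresponds to near-linear scaling; a steadily falling efficiency corresponds to sub-linear scaling.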

SLIDE 10

SCALABILITY EVALUATION: RESPONSE DISTRIBUTION ANALYSIS

Response time vs. Core Count

[Figure: CDF of response times, P[X <= x] vs. x = response time (msec), at 80% CPU utilization for the SPECweb Support workload; curves for 1, 2, 4, and 8 cores]

  • “Low response time” requests
    • Static requests
    • Performance degrades
  • “High response time” requests
    • Dynamic requests
    • Performance improves

Knowing this behavior, how can we improve the scalability?
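The analysis above reads a CDF of response times, P[X <= x]. As a reminder of what such a curve encodes, here is a minimal sketch of an empirical CDF built from raw response-time samples. The helper name and the sample values are illustrative assumptions and are not taken from the experiments.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a function x -> fraction of samples that are <= x."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Made-up response times in msec: many fast requests and a few slow ones.
response_ms = [0.4, 0.5, 0.5, 0.7, 1.2, 3.0, 8.0, 25.0, 60.0, 140.0]
cdf = empirical_cdf(response_ms)
print(cdf(1.0), cdf(10.0), cdf(140.0))
```

Comparing such curves across core counts shows which part of the distribution moves: the left side reflects the fast requests and the right tail reflects the slow ones.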

SLIDE 11

SCALABILITY ENHANCEMENT: MULTIPLE WEBSITE REPLICAS

  • Approach: Use 1 Web server instance per processor
  • Goal: Reduce inter-processor data migration

[Diagram: original configuration with one replica, where a single replica process serves the NIC 1 and NIC 2 queues across Processor 0 and Processor 1; alternative configuration with two replicas, where the Replica 1 process serves the NIC 1 queue and the Replica 2 process serves the NIC 2 queue]
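The one-instance-per-processor idea can be sketched as a placement plan: each replica is restricted to the cores of a single processor so that its work stays on one socket. The core numbering below is an assumption for illustration; on a real machine it would be read from the system. The function and key names are hypothetical.

```python
from typing import Dict, List


def plan_replicas(cores_by_processor: Dict[int, List[int]]) -> Dict[str, dict]:
    """Assign one replica to each processor, limited to that processor's cores."""
    plan = {}
    for index, processor in enumerate(sorted(cores_by_processor), start=1):
        plan[f"replica-{index}"] = {
            "processor": processor,
            "cores": sorted(cores_by_processor[processor]),
        }
    return plan


# Assumed core numbering for a 2 x 4-core machine (not confirmed by the slides).
topology = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
plan = plan_replicas(topology)
for name, placement in plan.items():
    print(name, placement)
```

The plan only describes placement; applying it would additionally require pinning each replica's processes and steering each NIC queue to the matching processor.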

SLIDE 12

SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION

TCP/IP Intensive Workload

  • Scalability improvement
  • MAT increase: 12.3%

SPECweb Support Workload

  • Scalability degradation
  • MAT decrease: 10%

[Figures: response time (ms) vs. request rate (req/sec), one plot per workload]

SLIDE 13

SCALABILITY ENHANCEMENT: EVALUATING NEW CONFIGURATION

  • Hypothesis:
    • Cache contention with 2 replicas due to the larger working set size of dynamic requests
  • The response time inflation for dynamic requests dominates the improvement achieved for static requests
  • Mean and 99.9th percentile response times increase with 2 replicas

[Figure: CDF of response times, P[X <= x] vs. response time (ms), at 80% CPU utilization and 22,000 req/sec for the SPECweb Support workload]
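The comparison above relies on the mean and the 99.9th percentile of response times. The sketch below computes both for a list of samples using the nearest-rank percentile definition; the slides do not say which percentile method was used, so that choice is an assumption, and the sample data is invented to show how a small group of slow requests shapes both statistics.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up data: 990 fast requests at 1 ms and 10 slow requests at 50 ms.
response_ms = [1.0] * 990 + [50.0] * 10
summary = {
    "mean": mean(response_ms),
    "p50": percentile(response_ms, 50),
    "p99.9": percentile(response_ms, 99.9),
}
print(summary)
```

In this toy data the median is untouched by the slow requests while the mean and the tail percentile both move, which is why the tail is reported separately.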

SLIDE 14

VALIDATION: INTER-CONNECT TRAFFIC

TCP/IP Intensive Workload

  • Inter-connect traffic decreased significantly
  • Improved performance

SPECweb Support Workload

  • No significant decrease in inter-connect traffic
  • Improved performance for static requests

[Figures: inter-connect traffic (bytes/sec) vs. request rate (req/sec), one plot per workload]

SLIDE 15

VALIDATION: LAST LEVEL CACHE

SPECweb Support Workload

  • Last level cache (LLC) hit ratio degrades with the 2-replica configuration
  • Confirms the cache contention hypothesis

[Figure: L3 cache hit ratio vs. request rate (req/sec)]

SLIDE 16

CONCLUSIONS

  • Multi-core Web server: scalable after tuning
    • 80% utilization with acceptable response time
  • Multiple Website Replicas
    • The effect on scalability is workload dependent
    • Dynamic requests trigger LLC contention
    • Contention may be architecture and application dependent
  • Future plan:
    • Design and develop an automatic, workload-adaptive technique that decides on the best configuration

SLIDE 17

This work is financially supported by: [sponsor names not recoverable from the slide]

Raoufeh Hashemian
University of Calgary, Canada
rhashem@ucalgary.ca

SLIDE 18

REFERENCES

  • Cherkasova et al. '00: Characterizing Temporal Locality and its Impact on Web Server Performance, International Conference on Computer Communications and Networks '00, Cherkasova; Ciardo; HP Labs
  • Elnozahy et al. '03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
  • Majo et al. '12: Matching Memory Access Patterns and Data Placement for NUMA Systems, GC'12, Majo; Gross; ETH
  • Blagodurov et al. '11: A case for NUMA-aware contention management on multicore systems, USENIX ATC '11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
  • Veal et al. '07: Performance scalability of a multi-core web server, ACM/IEEE ANCS '07, Veal; Foong; Intel
  • Scogland et al. '09: Asymmetric interactions in symmetric multi-core systems: Analysis, enhancements and evaluation, ACM/IEEE SC '08, Scogland; Balaji; Feng; Narayanaswamy
  • Boyd et al. '10: An analysis of Linux scalability to many cores, USENIX OSDI '10, Boyd-Wickizer; Clements; Mao; Pesterev; Kaashoek; Morris; Zeldovich; MIT
  • Gaud et al. '11: Application-level optimizations on NUMA multicore architectures: the Apache case study, RR-LIG-011, Gaud; Lachaize; Lepers; Muller; Quema
SLIDE 19

SCALABILITY EVALUATION: CONFIGURATION TUNING

  • Network interrupt handling
    • 4 RSS queues per NIC port
    • Each queue bound to one core

[Figure: response time (msec) vs. rate (req/sec), before and after distributing the interrupt load]
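Binding each RSS queue to one core is commonly done on Linux by writing a hexadecimal CPU bitmask to /proc/irq/<IRQ>/smp_affinity. The slides do not show the commands that were used, so the sketch below only computes the masks and prints the intended writes. The IRQ numbers and the core list are hypothetical placeholders, and the helper names are made up for this example.

```python
from typing import Dict, Sequence


def smp_affinity_mask(core: int) -> str:
    """Hex bitmask with only bit `core` set (valid as-is for cores 0-31)."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    return format(1 << core, "x")


def plan_irq_affinity(irqs: Sequence[int], cores: Sequence[int]) -> Dict[int, str]:
    """Pair each queue IRQ with one core and return IRQ -> affinity mask."""
    if len(irqs) != len(cores):
        raise ValueError("need exactly one core per IRQ")
    return {irq: smp_affinity_mask(core) for irq, core in zip(irqs, cores)}


# Hypothetical IRQ numbers for the 4 RSS queues of one NIC port,
# and an assumed set of cores on one processor.
irq_plan = plan_irq_affinity([64, 65, 66, 67], [0, 2, 4, 6])
for irq, mask in irq_plan.items():
    # Applying the plan would require root; here we only print it.
    print(f"/proc/irq/{irq}/smp_affinity <- {mask}")
```

On the machine itself, the real IRQ numbers for each queue would be looked up in /proc/interrupts before applying any mask.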

SLIDE 20

SCALABILITY EVALUATION: CONFIGURATION TUNING

  • OS scheduling
    • Binding each lighttpd process to 1 core

[Figure: response time (msec) vs. rate (req/sec), with and without affinity]
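Binding a process to one core replaces the scheduler's free placement with a fixed CPU set. The slides do not say which tool was used to pin the lighttpd workers; as an illustration of the mechanism only, the sketch below pins the current Python process with the standard library's Linux-only `os.sched_setaffinity` and then restores the original CPU set. The helper name `pin_to_core` is made up for this example.

```python
import os
from typing import Set


def pin_to_core(pid: int, core: int) -> Set[int]:
    """Restrict process `pid` (0 = current process) to one core; return its new CPU set."""
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_setaffinity"):  # these calls exist only on Linux
    original = os.sched_getaffinity(0)  # CPUs this process may currently use
    target = min(original)              # pick any CPU we are allowed to run on
    pinned = pin_to_core(0, target)
    print(f"allowed before: {sorted(original)}, after pinning: {sorted(pinned)}")
    os.sched_setaffinity(0, original)   # restore the original CPU set
else:
    print("CPU affinity calls are not available on this platform")
```

For worker processes, the same call would be made once per worker PID, each with a different core.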

SLIDE 21

SCALABILITY EVALUATION: WEB TIER VS. APPLICATION TIER

  • Static: requests with lower response time
    • Processed only in the Web tier (lighttpd)
  • Dynamic: requests with higher response time
    • Processed in both the Web and application tiers (lighttpd and PHP)

[Figure: response time (ms) vs. file size (bytes)]

SLIDE 22

SCALABILITY EVALUATION: EXPERIMENTAL SETUP