Selective Early Request Termination for Busy Internet Services (PowerPoint presentation)



SLIDE 1

Selective Early Request Termination for Busy Internet Services

Jingyu Zhou and Tao Yang
Ask.com; University of California, Santa Barbara

SLIDE 2

Multi-tier Internet Services

[Figure: multi-tier search service architecture. Components: firewall/Web switch, query frontends, query caches, index servers (partitions 1, 2, and 3), and doc servers, connected by a local-area network.]

SLIDE 3

Multi-thread Programming Model for Request Processing

  • Multi-threaded service tier

– E.g., Apache, IIS, BEA WebLogic, and Neptune

[Figure: threads 1 through N each repeat the cycle get request, process request, send result; thread 3 is marked with a question mark.]
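Each thread in this model repeats the same get/process/send loop. The C sketch below shows that loop with POSIX threads. It is a simplified illustration and not code from any of the servers listed. The request queue is a hypothetical counter of integer ids, and get_request(), process_request(), and send_result() are placeholders. A thread that is busy with one long process_request() call cannot pick up other requests until that call returns.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_REQUESTS 16
#define MAX_THREADS 8

/* Hypothetical shared request queue: request i is just the integer i. */
static int next_request = 0;
static long completed = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Get request: claim the next request id, or -1 when none are left. */
static int get_request(void) {
    int id = -1;
    pthread_mutex_lock(&lock);
    if (next_request < NUM_REQUESTS)
        id = next_request++;
    pthread_mutex_unlock(&lock);
    return id;
}

/* Process request: placeholder computation standing in for real work. */
static long process_request(int id) { return (long)id * id; }

/* Send result: this sketch only counts delivered results. */
static void send_result(long result) {
    (void)result;
    pthread_mutex_lock(&lock);
    completed++;
    pthread_mutex_unlock(&lock);
}

/* Worker thread: loop get -> process -> send until the queue drains. */
static void *worker(void *arg) {
    (void)arg;
    int id;
    while ((id = get_request()) >= 0)
        send_result(process_request(id));
    return NULL;
}

/* Start nthreads workers, wait for all of them, return results sent. */
long run_pool(int nthreads) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    next_request = 0;
    completed = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return completed;
}
```

Calling run_pool(4) processes the 16 placeholder requests on four workers and returns 16.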

SLIDE 4

Problem Statement

  • Service-level agreement (SLA)

– E.g., 99% of requests complete within 1 s

  • Meeting the SLA is a QoS challenge during:

– Flash-crowd-style surges in request rate
– Size distribution shifts: the percentage of long requests increases
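You can check the example SLA directly from measured response times. It holds when the fraction of requests that finish within 1 s (1000 ms) is at least 0.99. Below is a minimal C sketch. The function names are illustrative, and it assumes latencies are recorded in milliseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of response times (ms) that are at or below limit_ms. */
double fraction_within(const double *latency_ms, size_t n, double limit_ms) {
    assert(n > 0);
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (latency_ms[i] <= limit_ms)
            ok++;
    return (double)ok / (double)n;
}

/* SLA check: at least `quantile` of requests finish within limit_ms,
   e.g. quantile = 0.99 and limit_ms = 1000.0 for "99% within 1 s". */
int meets_sla(const double *latency_ms, size_t n,
              double quantile, double limit_ms) {
    return fraction_within(latency_ms, n, limit_ms) >= quantile;
}
```

With 100 measurements, one request slower than 1 s still satisfies the SLA, but two do not.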

SLIDE 5

Motivating Example: Size Distribution Shift

  • Settings

– 50 requests/s
– Two types of requests: 5 ms and 500 ms
– Long requests vary from 0.1% to 10%

  • Results

– Significant throughput loss
– Order-of-magnitude increase in response time
– Admission control alone isn't enough

[Figure: throughput (requests/s) and mean response time (ms) versus percentage of long requests.]
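A back-of-the-envelope calculation shows why the shift hurts. For a long-request fraction f, the mean work per request is (1 - f) × 5 ms + f × 500 ms. At 50 requests/s, the offered work therefore rises from about 275 ms to 2,725 ms of processing per second as f goes from 0.1% to 10%, roughly a tenfold increase. The sketch below computes only this demand. It assumes fixed service times, and it ignores queueing and the number of threads or CPUs, which the slide does not state.

```c
#include <assert.h>

/* Mean service time (ms) for a two-class mix: a fraction frac_long of
   requests take long_ms, the rest take short_ms. */
double mean_service_ms(double frac_long, double short_ms, double long_ms) {
    assert(frac_long >= 0.0 && frac_long <= 1.0);
    return (1.0 - frac_long) * short_ms + frac_long * long_ms;
}

/* Offered work, in ms of processing per second of wall-clock time,
   at `rate` requests per second. */
double offered_work_ms_per_s(double rate, double frac_long,
                             double short_ms, double long_ms) {
    return rate * mean_service_ms(frac_long, short_ms, long_ms);
}
```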

SLIDE 6

Current Techniques

  • Admission control

– Response time feedback (e.g., SEDA, Quorum)
– Bounding request queue length (e.g., Neptune)
– Policing TCP SYN packets (e.g., [Voigt'01])

  • Adaptive service degradation

– E.g., reduce image quality

  • Size-based scheduling

– Only for static content
– File size as the size estimator
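Of these techniques, bounding the request queue length is the simplest to illustrate. A new request is admitted only while fewer than a fixed number of requests are pending; otherwise it is rejected immediately. The single-threaded C sketch below uses a hypothetical ring buffer and capacity and is not Neptune's code. A real server would also protect the queue with a lock. The policy looks only at how many requests are queued, not how expensive they are, so it cannot tell four 5 ms requests from four 500 ms ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 4   /* hypothetical bound on pending requests */

typedef struct {
    int items[QUEUE_CAP];
    size_t head, count;
} bounded_queue;

/* Admission control: accept a request only while the queue has room. */
bool admit(bounded_queue *q, int request_id) {
    assert(q->count <= QUEUE_CAP);
    if (q->count == QUEUE_CAP)
        return false;                        /* reject: queue is full */
    q->items[(q->head + q->count) % QUEUE_CAP] = request_id;
    q->count++;
    return true;
}

/* Dequeue the oldest request for a worker; returns false if empty. */
bool next_request(bounded_queue *q, int *request_id) {
    if (q->count == 0)
        return false;
    *request_id = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```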

SLIDE 7

SERT Idea & Challenges

  • Idea

– Request-aware: differentiate long and short requests
– Early termination: abort long requests during overload

  • Challenges

– Detecting long vs. short dynamic requests
– Adaptive selection of the termination threshold
– Resource accounting for safety
– Simplicity in programming

SLIDE 8

SERT Architecture

[Figure: SERT architecture. Components: thread pool, request queue, resources (memory, lock, file, ...) reached via resource access, resource accounting module, termination handler, timer & terminator, and threshold controller. Arrows are labeled Invoke, Set/Cancel Timer, and Terminate.]
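One way to picture the Set/Cancel Timer and Terminate interactions is a deadline table. Each worker thread owns a slot. Starting a request sets the slot's deadline to the current time plus the termination threshold, and finishing the request cancels it. The timer thread periodically scans for expired slots and hands them to the terminator. This is a hypothetical C sketch, since the slides do not say how SERT's timer is organized. Time is passed in explicitly so that the logic can be tested without real threads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 8
#define NO_DEADLINE (-1L)

/* deadline_ms[i] is the absolute expiry time of worker i's request,
   or NO_DEADLINE when worker i has no timed request in flight. */
typedef struct {
    long deadline_ms[MAX_WORKERS];
} timer_table;

void timer_init(timer_table *t) {
    for (size_t i = 0; i < MAX_WORKERS; i++)
        t->deadline_ms[i] = NO_DEADLINE;
}

/* Set Timer: called when worker `slot` starts a request. */
void timer_set(timer_table *t, size_t slot, long now_ms, long threshold_ms) {
    assert(slot < MAX_WORKERS && threshold_ms >= 0);
    t->deadline_ms[slot] = now_ms + threshold_ms;
}

/* Cancel Timer: called when the request finishes normally. */
void timer_cancel(timer_table *t, size_t slot) {
    assert(slot < MAX_WORKERS);
    t->deadline_ms[slot] = NO_DEADLINE;
}

/* Timer thread's periodic scan: collect workers whose request passed its
   deadline and clear their slots. The caller would then terminate each
   returned worker (e.g., by signaling it). Returns the number collected. */
size_t timer_collect_expired(timer_table *t, long now_ms,
                             size_t *expired, size_t max_expired) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_WORKERS && n < max_expired; i++) {
        if (t->deadline_ms[i] != NO_DEADLINE && now_ms >= t->deadline_ms[i]) {
            expired[n++] = i;
            t->deadline_ms[i] = NO_DEADLINE;
        }
    }
    return n;
}
```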

SLIDE 9

Resource Accounting

  • Targets a class of requests that are:

– Read-only
– Stateless

  • Resources

– Memory: track heaps and memory-mapped areas
– Locks: use an integer counter
– Sockets & file descriptors
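The accounting idea can be sketched as a per-request ledger. Every heap allocation made while a request runs is recorded, and lock acquisitions update an integer counter. On termination, the handler frees whatever the request still owns. This C sketch uses explicit wrapper functions with hypothetical names (acct_malloc, acct_free, acct_lock_acquired, acct_rollback). SERT instead intercepts the GLIBC/Pthread calls so that applications need no changes. The sketch also omits memory-mapped areas, sockets, and file descriptors. It only reports the lock counter, because releasing the locks would require tracking the lock objects themselves.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TRACKED 64

/* Resources held by the request currently running on this thread. */
typedef struct {
    void *blocks[MAX_TRACKED];  /* live heap blocks (NULL = free slot) */
    int locks_held;             /* integer lock counter */
} request_ledger;

/* Allocate and record the block so it can be reclaimed on termination. */
void *acct_malloc(request_ledger *l, size_t size) {
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (l->blocks[i] == NULL) {
            void *p = malloc(size);
            l->blocks[i] = p;
            return p;
        }
    }
    return NULL;  /* ledger full: treat as an allocation failure */
}

/* Free a block and drop it from the ledger. */
void acct_free(request_ledger *l, void *p) {
    for (int i = 0; p != NULL && i < MAX_TRACKED; i++) {
        if (l->blocks[i] == p) {
            l->blocks[i] = NULL;
            break;
        }
    }
    free(p);
}

void acct_lock_acquired(request_ledger *l) { l->locks_held++; }

void acct_lock_released(request_ledger *l) {
    assert(l->locks_held > 0);
    l->locks_held--;
}

/* Termination handler: free every block the aborted request still owns
   and return how many were reclaimed. locks_held is left for the caller
   to inspect; this sketch does not know which locks to release. */
int acct_rollback(request_ledger *l) {
    int reclaimed = 0;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (l->blocks[i] != NULL) {
            free(l->blocks[i]);
            l->blocks[i] = NULL;
            reclaimed++;
        }
    }
    return reclaimed;
}
```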

SLIDE 10

Threshold Controller Adjusts Termination Threshold

  • Ideas

– During light load, allow requests to execute longer: large threshold
– During heavy load, terminate earlier: small threshold
– The load index p is the throughput loss

  • Formula

– Threshold = LB + F(p) × (UB - LB), where the timeout range is [LB, UB] and

$$
F(p) =
\begin{cases}
1, & p \le LW \\
\left( \dfrac{HW - p}{HW - LW} \right)^{\alpha}, & LW < p < HW \\
0, & p \ge HW
\end{cases}
$$
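The controller formula translates directly into C. The sketch below assumes that p, LW, and HW are throughput-loss fractions with LW < HW. To avoid depending on libm, it restricts α to a positive integer; a real-valued α would use pow() from <math.h>. The function names and the example parameter values are illustrative and do not come from SERT.

```c
#include <assert.h>

/* F(p): 1 at or below LW, 0 at or above HW, and ((HW - p) / (HW - LW))^alpha
   in between. This sketch assumes alpha is a positive integer. */
double load_factor(double p, double lw, double hw, unsigned alpha) {
    assert(lw < hw && alpha >= 1);
    if (p <= lw) return 1.0;       /* light load: use the full range */
    if (p >= hw) return 0.0;       /* heavy load: shortest timeout (LB) */
    double base = (hw - p) / (hw - lw), f = 1.0;
    for (unsigned i = 0; i < alpha; i++)
        f *= base;
    return f;
}

/* Threshold = LB + F(p) * (UB - LB). */
double termination_threshold(double p, double lb, double ub,
                             double lw, double hw, unsigned alpha) {
    assert(lb <= ub);
    return lb + load_factor(p, lw, hw, alpha) * (ub - lb);
}
```

For example, with LB = 1 s, UB = 9 s, LW = 0.25, HW = 0.75, and α = 2, a throughput loss of p = 0.5 gives F = 0.25 and a threshold of 3 s. Loss at or below 0.25 keeps the full 9 s, and loss at or above 0.75 drops the threshold to 1 s.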

SLIDE 11

Implementation & Usage

  • Intercept GLIBC/Pthread functions

– Memory, Pthread locks, etc.

  • POSIX signal for terminations
  • Use sigsetjmp()/siglongjmp()
  • Neptune middleware uses SERT APIs
  • Applications link the SERT library with no code changes
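The slides name the termination mechanism (a POSIX signal plus sigsetjmp()/siglongjmp()) but do not show its code. The self-contained C sketch below shows the control flow in a single thread, under several assumptions. A SIGALRM from setitimer() stands in for SERT's timer thread, and the rollback point is a static sigjmp_buf. Resource reclamation is omitted. This is an illustration of the technique, not the SERT library.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static sigjmp_buf rollback_point;        /* where a terminated request resumes */
static volatile sig_atomic_t armed = 0;  /* is the rollback point valid? */

/* Termination handler: abandon the running request by jumping back to
   the rollback point. Freeing the request's resources would go here. */
static void on_terminate(int signum) {
    (void)signum;
    if (armed) {
        armed = 0;
        siglongjmp(rollback_point, 1);
    }
}

/* Arm (ms > 0) or cancel (ms == 0) a one-shot real-time timer. */
static void set_timer_ms(long ms) {
    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Simulated request: spin for `iterations` steps (-1 means forever). */
static void process_request(long iterations) {
    volatile long spin = 0;
    while (iterations < 0 || spin < iterations)
        spin++;
}

/* Run one request with a timeout; returns 1 if it was terminated early. */
int run_with_timeout(long iterations, long timeout_ms) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    /* savemask = 1 so siglongjmp restores the mask and unblocks SIGALRM. */
    if (sigsetjmp(rollback_point, 1) != 0)
        return 1;                  /* resumed here after termination */
    armed = 1;
    set_timer_ms(timeout_ms);      /* request start: set the timer */
    process_request(iterations);
    set_timer_ms(0);               /* request end: cancel the timer */
    armed = 0;
    return 0;
}
```

If the timer fires after process_request() returns but before the cancel, the request is reported as terminated. A production version would need to close that window.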

SLIDE 12

SERT APIs

  • Start timer thread and set signal type

extern int SERT_init_timer(int signum);

  • Start & end of a request

extern void SERT_start();
extern void SERT_end();

  • Set timeout value and controller parameters

extern void SERT_set_args(struct sert_arg *);

  • Set the rollback point

extern void SERT_register_rollbackpoint(void *);

SLIDE 13

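A Pseudo-code Example

#include <setjmp.h>

/* Request, get_request(), process_request(), and send_result() are
   application-defined; the SERT_* functions come from the SERT library. */
void worker(void) {
    while (1) {
        Request *request = get_request();
        sigjmp_buf env;
        if (sigsetjmp(env, 1) == 0) {
            /* First pass: register this point as the rollback target. */
            SERT_register_rollbackpoint(env);
        } else {
            /* Returned via siglongjmp after termination; the request's
               resources have already been deallocated. */
            continue;
        }
        SERT_start();               /* request starts */
        process_request(request);
        SERT_end();                 /* request finished normally */
        send_result(request);
    }
}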

SLIDE 14

Experimental Settings

  • Hardware

– 9 dual PIII 1.4 GHz machines
– Each has 4 GB RAM and a 10K RPM SCSI disk
– Fast Ethernet

  • Applications from Ask.com

– Index matching: finds web pages containing keywords; heavy-tailed distribution; 2.1 GB of warm data in memory
– Ranking: ranks page importance; exponential distribution; in memory

App.           Avg. (ms)   90th percentile (ms)   Max. (ms)
Index Match    23.6        46                     2,732
Ranking        93          212                    14,035

SLIDE 15

Size Distribution Shift

  • During the shift, about 10% of requests take 500 ms or more
  • SERT compared with admission control (AC):

– 209.1% higher throughput
– 54.7% lower response time

[Figure: throughput and response time (s) over time (s) for AC and SERT, together with the request rate; the pattern shift begins at 30 s and ends at 155 s.]

SLIDE 16

Ranking Service Evaluation

[Figure: four panels comparing AC and SERT versus load (%): throughput loss percent and mean response time (ms), each shown for the underloaded (50-100% load) and overloaded (100-200% load) regimes.]

SLIDE 17

Evaluation of Threshold Controller for Ranking Service

  • Adaptive controller vs. fixed thresholds of 0.5 s, 3.0 s, and 15 s

[Figure: throughput loss percent and response time (ms) versus load (80-200%) for fixed thresholds of 15, 3.0, and 0.5 s and for the adaptive controller.]

SLIDE 18

Evaluation of Threshold Controller for Index Matching

  • Adaptive controller vs. fixed thresholds of 1.5 s, 3.0 s, and 8.0 s

[Figure: throughput loss percent and response time (ms) versus load (100-200%) for fixed thresholds of 8.0, 3.0, and 1.5 s and for the adaptive controller.]

SLIDE 19

Related Work

  • Real-time database systems [Kuo'00, Lin'90, Shu'94]

– A higher-priority transaction aborts lower-priority ones
– UNDO/REDO logs for recovery

  • Recoverable memory libraries

– Recoverable virtual memory [Saty.'94], Rio Vista [Lowell'97]
– Application modifications needed

  • Process checkpointing and rollback

– Fault tolerance [Li'90], program replay [Srinivasan'04], and debugging [Qin'05]

SLIDE 20

Conclusions

  • Contribution: an early-termination scheme for busy Internet services

– Dynamically selects the termination threshold
– Safely terminates requests early
– Provides an API for multi-threaded services

  • Future work

– Cooperative early termination across different nodes and tiers

SLIDE 21

Questions?

SLIDE 22

CDF of Response Time during Size Distribution Shift

  • E.g., requests completed within one second:

– SERT: 81.7%
– AC: 45.3%