The Impact of Concurrency Gains on the Analysis and Control of Multi-threaded Internet Services (PowerPoint PPT Presentation) by Hani Jamjoom, Chun-Ting Chou, and Kang G. Shin


SLIDE 1

MICHIGAN ENGINEERING

The Impact of Concurrency Gains
on the Analysis and Control
of Multi-threaded Internet Services

Hani Jamjoom, Chun-Ting Chou, Kang G. Shin

Electrical Engineering and Computer Science, The University of Michigan

SLIDE 2

Environment of Interest

A typical shared hosting environment with:

  • Multiple services time-sharing a single system
  • All requests treated equally

Services use multi-threading or multi-processing to improve their efficiency:

  • Increase throughput
  • Reduce request waiting time

SLIDE 3

Interactions of Interest

  • Concurrency gains are workload-dependent
  • Characterize the impact of gains on server performance
  • Multiple services compete for server resources
  • Predict and control thread and service interactions

SLIDE 4

Why Should We Care?

  • Better predict how concurrency will impact perceived performance
  • Improved system configuration
  • Maximize the benefits from multi-threading/multi-processing abstractions
  • Better predict interactions between different threads and services
  • Design better QoS control mechanisms
  • Avoid possible pitfalls when designing ad hoc controls
SLIDE 5

Challenges in Analyzing Multi-threading

  • Concurrency gains are workload-dependent
  • There is no fixed parameterization for all workloads
  • System time is not the only source of thread interactions
  • Bottleneck resources (e.g., disk) introduce indirect interactions
  • A small number of requests for one service can have a big impact on the performance of other services
  • Multi-threading or multi-processing is implemented in different ways (cooperative, preemptive, etc.)

SLIDE 6

Goals of Paper & Talk

  • Characterize concurrency gains of different workloads
  • Predict performance of a single service
  • Look at how different services can interact with each other
  • Design a generic mechanism for configuring thread limits to provide worst-case QoS guarantees

SLIDE 7

Characterizing Concurrency Gains

  • Express the increase in throughput as a function of thread allocation
  • Use SPECWeb’99-like workloads as examples of possible workloads

SLIDE 8

Real Workload Measurements

SLIDE 9

Simple Model…but Workload Dependent

  • REGION 1: Gain is approximated by a linear function
  • REGION 2: Gain is flat, since additional threads yield no further performance gains
  • REGION 3: Gain drops dramatically due to system thrashing
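The three-region shape can be captured by a simple piecewise gain function. The sketch below is illustrative only: the slope, knee, and cliff breakpoints are assumed values, not the paper's measured parameters.

```python
# Hypothetical three-region concurrency-gain model G(n): linear growth,
# a flat plateau, then a thrashing collapse. All constants are assumptions.

def concurrency_gain(n, slope=0.8, knee=16, cliff=64, drop=0.05):
    """Approximate throughput gain from running n threads.

    Region 1 (n <= knee):        gain grows roughly linearly with n.
    Region 2 (knee < n <= cliff): gain is flat; extra threads add nothing.
    Region 3 (n > cliff):        gain collapses as the system thrashes.
    """
    if n <= 0:
        return 0.0
    if n <= knee:
        return 1.0 + slope * (n - 1)              # Region 1: linear
    plateau = 1.0 + slope * (knee - 1)
    if n <= cliff:
        return plateau                            # Region 2: flat
    # Region 3: linear collapse toward zero past the cliff
    return max(plateau - drop * plateau * (n - cliff), 0.0)
```

Fitting `slope`, `knee`, and `cliff` per workload is what makes the parameterization workload-dependent, as the slide above notes.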
SLIDE 10

Service Model

[Diagram: requests arrive at per-service-class queues (service classes 1, 2, and 3) feeding Application A and Application B; the CPU schedules worker threads in a round-robin fashion.]

SLIDE 11

Analysis of Single Service Class

  • Continuous-Time Markov Chain (CTMC) model
  • µ(i) is the state-dependent service rate: µ(i) = µ G(i) / i
  • The CTMC assumes the scheduling quantum is infinitesimally small, but this assumption is necessary to account for speedup
  • Use standard techniques to derive steady-state probabilities and estimate the processing delay
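For a birth-death CTMC of this kind, the steady-state probabilities follow directly from the balance equations. The sketch below is a minimal illustration, assuming the aggregate service rate in state i is µ·G(min(i, m)) for a thread limit m, a finite queue capacity, and a stand-in gain function; the paper's measured G and arrival model are not reproduced here.

```python
# Minimal birth-death CTMC sketch for a single service class.
# Assumptions (illustrative, not from the paper): Poisson arrivals at
# rate lam, aggregate service rate mu * G(min(i, m)) in state i, and a
# simple linear-then-flat gain function.

def gain(n):
    """Illustrative concurrency gain: linear up to 8 threads, then flat."""
    return float(min(n, 8))

def steady_state_delay(lam, mu, m, capacity):
    """Solve the balance equations lam * p[i-1] = mu_i * p[i] for states
    0..capacity, then estimate mean processing delay via Little's law.
    Returns (steady-state probabilities, mean delay)."""
    p = [1.0]                        # unnormalized, p[0] = 1
    for i in range(1, capacity + 1):
        mu_i = mu * gain(min(i, m))  # aggregate service rate in state i
        p.append(p[-1] * lam / mu_i)
    total = sum(p)
    p = [x / total for x in p]       # normalize to a distribution
    mean_n = sum(i * pi for i, pi in enumerate(p))
    lam_eff = lam * (1.0 - p[-1])    # arrivals are blocked when full
    return p, mean_n / lam_eff       # Little's law: W = N / lambda_eff

probs, delay = steady_state_delay(lam=5.0, mu=1.0, m=8, capacity=50)
```

With λ = 5 below the saturated rate µ·G(8) = 8, the chain concentrates around a small backlog and the blocking probability is negligible, matching the underloaded regime.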
SLIDE 12

Predicting Behavior of Real Systems

SLIDE 13

Potential Inaccuracies in CTMC Method

  • The transition point between overload and underload is affected by our simplified arrival model
  • Using infinitesimal time quanta for thread scheduling overestimates delay
  • A well-behaved service distribution does not account for a few long-lived requests

SLIDE 14

Extending Results to Multiple Service Classes

  • Assume that services are independent
  • Do not consider situations where requests are processed in multiple stages
  • Split independent services into two types:
    • Homogeneous: services with similar workload requirements (e.g., differentiating between different client groups)
    • Heterogeneous: services with different workload requirements (e.g., static vs. dynamic workloads)
  • Characterize the change in speedup
SLIDE 15

Non-predictable Interactions between Heterogeneous Services

  • Existence of an artificial ceiling for the static workload when hosted with a dynamic workload
  • Shift in bottleneck from CPU to disk

SLIDE 16

Proportional Resource Sharing between Homogeneous Services

  • Throughput is proportional to thread allocation
  • The bottleneck resource is proportionally shared
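The proportional-sharing observation can be written as a one-line prediction: with homogeneous classes, each class receives roughly its thread share of the saturated server's total throughput. The rates and the gain function below are illustrative assumptions, not the paper's measured values.

```python
# Sketch of proportional sharing among homogeneous service classes.
# Assumption: at saturation, total server throughput is mu * G(M) for
# M total threads, and class k gets the fraction m_k / M of it.

def total_gain(n):
    """Illustrative aggregate concurrency gain: linear to 8, then flat."""
    return float(min(n, 8))

def per_class_throughput(mu, allocations):
    """Predicted throughput per class under proportional sharing."""
    total = sum(allocations)
    server_rate = mu * total_gain(total)   # saturated total throughput
    return [server_rate * m / total for m in allocations]

# Two homogeneous classes with 8 and 16 threads: a 1:2 throughput split.
rates = per_class_throughput(mu=1.0, allocations=[8, 16])
```

This mirrors the measurement on this slide: doubling a class's thread allocation doubles its share of the bottleneck, while the total stays pinned at the saturated server rate.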

SLIDE 17

Providing Worst-Case Delay Guarantees

  • How to configure thread limits for each running service
  • Confine performance degradation when one service gets overloaded
  • Algorithm based on an analytical model of homogeneous services
  • Associate a cost function with each service class
  • Use dynamic programming to allow any cost function
  • Choose the thread allocation that minimizes the total cost, assuming any service can become overloaded

SLIDE 18

Overview of Dynamic Programming Algorithm

[Diagram: per-class cost tables (Class 1, Class 2, Class 3), each indexed from 1 to mmax threads, are combined pairwise into tables for classes 1,2 and then classes 1,2,3.]

  1. Each table corresponds to the worst-case cost of class i if given m threads. The remaining (mmax − m) threads belong to the other service classes, which are assumed to be saturated.
  2. A new table combines the tables of classes 1 and 2. Each row holds the minimum cost if classes 1 and 2 are given m threads. This process is repeated, combining the resulting table from the previous step with the next service class.
  3. The optimal allocation is found by tracing back the allocation that produced the minimum worst-case cost.
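The table-combining steps above can be sketched as a min-plus combination with traceback. Everything below is illustrative: the cost tables are made-up stand-ins for the per-class worst-case costs that the analytical model would supply.

```python
# Sketch of the dynamic-programming thread-allocation algorithm:
# combine per-class worst-case cost tables pairwise, recording the
# winning split at each budget so the optimal allocation can be
# traced back. Cost tables here are illustrative assumptions.

def combine(table_a, table_b):
    """Combine two cost tables over a shared thread budget.
    split[m] records how many of m threads go to table_a's classes
    in the cheapest split."""
    size = len(table_a)              # tables indexed 0..mmax
    combined, split = [], []
    for m in range(size):
        best_cost, best_a = float("inf"), 0
        for a in range(m + 1):
            c = table_a[a] + table_b[m - a]
            if c < best_cost:
                best_cost, best_a = c, a
        combined.append(best_cost)
        split.append(best_a)
    return combined, split

def allocate(tables, mmax):
    """Return the per-class allocation of mmax threads that minimizes
    total worst-case cost, assuming any class may saturate."""
    running, splits = tables[0], []
    for t in tables[1:]:
        running, split = combine(running, t)
        splits.append(split)
    # Trace back: peel off each class's share from the total budget.
    alloc, m = [], mmax
    for split in reversed(splits):
        prev = split[m]              # threads kept by earlier classes
        alloc.append(m - prev)       # threads for the class just merged
        m = prev
    alloc.append(m)                  # threads left for class 0
    return list(reversed(alloc))

# Illustrative tables for three classes over a 6-thread budget:
# cost(m) = base / (m + 1), with different sensitivities per class.
mmax = 6
tables = [[base / (m + 1) for m in range(mmax + 1)]
          for base in (10.0, 20.0, 40.0)]
allocation = allocate(tables, mmax)  # -> [1, 2, 3]
```

Because each combine step is exact, the traceback recovers the globally optimal split; here the most cost-sensitive class (base 40) correctly receives the most threads.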

SLIDE 19

Experimental Setup

Configure one service class with a fixed number of threads that are always busy, and compare predicted response time of the other service class with the measured values for different thread allocations.

SLIDE 20

Predicted Allocation is Close to Measured

[Figures: low-priority service allocated 8 threads; low-priority service allocated 16 threads]

SLIDE 21

Summary of Results

  • Accounting for concurrency gains
    • Improves accuracy of prediction
    • Crucial when analyzing multiple service classes
  • Bimodal behavior of service queues
    • Queues are almost empty or totally full
    • Queueing only occurs when the service becomes critically loaded or overloaded, since a request is otherwise admitted quickly by a worker thread
    • Long queues are unnecessary to improve service performance
  • Analysis of Web workloads
    • Hard to (analytically) predict performance when services have different workload characteristics (such as I/O-heavy vs. CPU-heavy)
    • The analysis can be used to provide predictable results and worst-case delay guarantees with similar workloads

SLIDE 22

Future Directions

  • Fine-grained analysis of heterogeneous services
    • Key for providing effective and general thread-based controls
  • Multi-Staged Services
    • A request may pass through multiple stages (e.g., HTTP, application, and DB servers) before completion
    • Adds an extra level of possible interactions between threads
SLIDE 23

Thank You…

HANI JAMJOOM

Dept. of EECS
The University of Michigan
jamjoom@eecs.umich.edu
www.eecs.umich.edu/~jamjoom