Exploiting Hardware Heterogeneity for Interactive Services




SLIDE 1

Exploiting Hardware Heterogeneity for Interactive Services

Yuxiong He²

Joint work with Shaolei Ren¹, Sameh Elnikety², Kathryn S. McKinley²

¹ Florida International University

² Microsoft Research

SLIDE 2

Interactive Services

  • Applications

– Web search, web server, finance server

  • Requirements

– High quality, fast response
– High throughput, low cost

SLIDE 3

Hardware for Interactive Services in Today’s Data Center

  • Homogeneous servers

– Few fast high-performance cores
– Many slow energy-efficient cores

SLIDE 4

Variance of Job Service Demand

  • Figure. Measured Bing search service demand distribution (x-axis: service demand (ms), 5 to 95; y-axis: probability, 0.05 to 0.4)
  • Homogeneous server with slow cores: cannot satisfy QoS of long requests
  • Homogeneous server with fast cores: meets QoS, but is energy-consuming and has lower throughput
SLIDE 5

Opportunity of Heterogeneity

  • Figure. Measured Bing search service demand distribution (x-axis: service demand (ms), 5 to 95; y-axis: probability, 0.05 to 0.4), with regions labeled “Slow cores” and “Fast cores”

Heterogeneous server: combine fast and slow cores

Challenges:

  • 1. Service demand is unknown.
  • 2. Jobs compete for cores.

SLIDE 6

Contributions

  • FOF scheduler for heterogeneous servers
  • Bing search server simulation

– Double throughput while meeting QoS

  • FOF for servers with SMT (Simultaneous Multithreading)

  • Finance server implementation

– 16% higher throughput than default OS scheduler

SLIDE 7

Scheduling Model

  • Inputs

– Queue of jobs
– Job service demand unknown
– Job deadline
– Partial results

  • Figure. Measured Bing search quality profile

SLIDE 8

Scheduling Model

  • Inputs

– Queue of jobs
– Job service demand unknown
– Job deadline
– Partial results

  • Outputs

– Assign jobs to fast/slow cores
– Decide processing time of jobs

  • Objective

– Maximize total quality of all jobs
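
The scheduling model above can be written down as a small data structure plus an objective. The following is a minimal sketch only. The field names, the `quality` function, and its exponential shape (with parameter `c`) are assumptions made for illustration; the slides refer to a measured Bing quality profile whose actual values are not given here.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    arrival_ms: float          # time the job enters the queue
    deadline_ms: float         # absolute deadline of the job
    demand_ms: float           # total work; unknown to the scheduler in the model
    processed_ms: float = 0.0  # work completed so far (a partial result)


def quality(job: Job, c: float = 5.0) -> float:
    """Hypothetical concave quality profile in [0, 1].

    Partial results earn partial quality: 0 for no work, 1 for a
    fully processed job, with diminishing returns in between.
    """
    fraction = min(job.processed_ms / job.demand_ms, 1.0)
    return (1.0 - math.exp(-c * fraction)) / (1.0 - math.exp(-c))


def total_quality(jobs) -> float:
    """Objective from the slide: the sum of quality over all jobs."""
    return sum(quality(j) for j in jobs)
```

A scheduler in this model chooses, for each job, a core and a processing time, and is scored by `total_quality` once deadlines have passed.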

SLIDE 9

Challenge I. Unknown Service Demand

  • How can we assign long jobs to fast cores and short jobs to slow cores?
  • Key insight: Slow to Fast

– Migrate a job from slower to faster cores
– Short jobs complete on slow cores
– Leave fast cores for long jobs
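
The "Slow to Fast" idea can be illustrated without knowing a job's demand in advance: start every job on the slowest core and move it up if it has not finished. The sketch below is a simplification under stated assumptions. The fixed `quantum_ms` migration threshold and the function name are hypothetical and not taken from the slides, which do not say when a migration is triggered.

```python
def finishing_core(demand_ms, speeds, quantum_ms):
    """Return (core index, elapsed wall time) at which a job completes.

    demand_ms:  work of the job, measured in ms on a core of speed 1.0
    speeds:     relative core speeds, ordered from slowest to fastest
    quantum_ms: assumed wall time a job may spend on a core before it
                migrates to the next faster one (hypothetical parameter)
    """
    if not speeds:
        raise ValueError("speeds must be non-empty")
    remaining = demand_ms
    elapsed = 0.0
    for i, speed in enumerate(speeds):
        is_fastest = i == len(speeds) - 1
        # The fastest core runs the job to completion; others only for one quantum.
        capacity = float("inf") if is_fastest else speed * quantum_ms
        if remaining <= capacity:
            return i, elapsed + remaining / speed
        remaining -= capacity
        elapsed += quantum_ms
```

With `speeds=[0.5, 1.0]` and `quantum_ms=20`, a job with 5 ms of demand finishes on the slow core (index 0), while a job with 50 ms of demand migrates and finishes on the fast core (index 1), so the fast core is only occupied by jobs that turned out to be long.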

SLIDE 10

Challenge II. Jobs Compete for Cores

  • Which jobs should be processed by fast cores?

  • Key insight: Fast Old

– Assign fast cores to old jobs.

SLIDE 11

“Fast Old” insight

  • Figure. Service demand distribution (x-axis: service demand, 5 to 95; y-axis: probability, 0.05 to 0.4), annotated with 20 ms, 27.2 ms, and 31.6 ms
  • Older job has closer deadline.
  • Older job has more work left.
  • “Fast old” improves response quality

SLIDE 12

FOF Scheduler: Fast Old & First

  • Core types (figure labels): Fast, Medium, Slow

1. Fast first: always use the fastest available core
2. Fast old: promote old jobs from slow to fast
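
The two FOF rules can be sketched as an event-driven scheduler. This is an illustrative reading of the slide, not the authors' implementation. The class and method names are hypothetical, and the cascading refill (a vacated slower core is itself refilled by the oldest job on an even slower core) is an assumption about how "promote old jobs" plays out across three core types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Core:
    name: str
    speed: float                # relative speed; larger is faster
    job: Optional[str] = None   # id of the job currently running here


@dataclass
class FOF:
    cores: list
    arrival: dict = field(default_factory=dict)  # job id -> arrival time
    waiting: list = field(default_factory=list)  # FIFO queue of job ids

    def submit(self, job_id, now):
        """Fast first: a new job takes the fastest available core."""
        self.arrival[job_id] = now
        free = [c for c in self.cores if c.job is None]
        if free:
            max(free, key=lambda c: c.speed).job = job_id
        else:
            self.waiting.append(job_id)

    def complete(self, job_id):
        """Fast old: refill a freed core with the oldest job on a slower core."""
        freed = next(c for c in self.cores if c.job == job_id)
        freed.job = None
        del self.arrival[job_id]
        while True:
            slower = [c for c in self.cores
                      if c.job is not None and c.speed < freed.speed]
            if not slower:
                break
            src = min(slower, key=lambda c: self.arrival[c.job])
            freed.job, src.job = src.job, None
            freed = src  # the vacated slower core may be refilled in turn
        if self.waiting:
            freed.job = self.waiting.pop(0)
```

For example, with one fast, one medium, and one slow core and jobs a, b, c, d submitted in that order, d waits in the queue. When a completes, b moves to the fast core, c moves to the medium core, and d starts on the slow core.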

SLIDE 13

Evaluation

  • Simulation modeling Bing search workload
  • Hardware: 4 server configurations with the same design-time power budget

– A: 2 Big cores (Sandy Bridge)
– B: 10 Medium cores (Nehalem)
– C: 24 Small cores (AtomD)
– D: 1 Big + 4 Medium + 2 Small

SLIDE 14

Homogeneous Fast vs Slow Cores

  • Figure. Quality vs. arrival rate (x-axis: queries per second, 10 to 100; y-axis: quality, 0.975 to 1), with the 0.998 quality level marked

Homogeneous

  • A. 2 Fast
  • B. 10 Medium
  • C. 24 Slow

SLIDE 15

Homogeneous Fast vs Slow Cores

  • Figure. Quality vs. arrival rate (x-axis: queries per second, 10 to 100; y-axis: quality, 0.975 to 1), with the 0.998 quality level marked; curve A labeled

Homogeneous

  • A. 2 Fast
  • B. 10 Medium
  • C. 24 Slow

SLIDE 16

Homogeneous Fast vs Slow Cores

  • Figure. Quality vs. arrival rate (x-axis: queries per second, 10 to 100; y-axis: quality, 0.975 to 1), with the 0.998 quality level marked; curves A and B labeled

Homogeneous

  • A. 2 Fast
  • B. 10 Medium
  • C. 24 Slow

SLIDE 17

Heterogeneous vs. Homogeneous

  • Figure. Two panels of quality vs. arrival rate (x-axis: QPS, 10 to 100; y-axis: quality, 0.975 to 1), with the 0.998 quality level marked; curves A, B, and D labeled

Homogeneous

  • A. 2 Big
  • B. 10 Medium
  • C. 24 Small

FOF on Heterogeneous

  • D. 1 Fast + 4 Medium + 2 Slow

Double throughput, or buy 50% fewer servers
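
The link between doubled per-server throughput and buying 50% fewer servers is simple arithmetic, shown here as a small sketch. The function name and the load and throughput figures in the usage note are hypothetical and are not measurements from the talk.

```python
import math


def servers_needed(total_qps: float, per_server_qps: float) -> int:
    """Servers required to carry total_qps when each server sustains
    per_server_qps at the target quality level."""
    return math.ceil(total_qps / per_server_qps)
```

For an assumed load of 10,000 QPS, a server that sustains 40 QPS at the target quality requires `servers_needed(10000, 40)` servers, and one that sustains 80 QPS requires `servers_needed(10000, 80)`, which is half as many.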

SLIDE 18

Opportunities on Existing Data Center Hardware

  • SMT (Simultaneous Multithreading) or Hyperthreading
  • SMT creates asymmetry among cores

– Fast core: a physical core only runs one job
– Slow core: two logical cores belonging to the same physical core both run jobs

SLIDE 19

Insight: SMT = dynamic heterogeneous core

  • Simultaneous Multithreading (SMT)

– SMT off: 4 fast
– SMT on 1 core: 3 fast + 2 slow
– SMT on 2 cores: 2 fast + 4 slow
– ...
– SMT on all cores: 8 slow
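
The sequence of configurations above follows one rule: each physical core that runs two jobs stops being one fast core and becomes two slow logical cores. A minimal sketch that enumerates the mixes for any core count (the function name is hypothetical):

```python
def smt_configs(physical_cores: int):
    """Yield (shared, fast, slow) for every number of physical cores
    that currently run two jobs on their two logical cores."""
    for shared in range(physical_cores + 1):
        fast = physical_cores - shared  # physical cores running a single job
        slow = 2 * shared               # logical cores sharing a physical core
        yield shared, fast, slow
```

For 4 physical cores, `list(smt_configs(4))` walks from 4 fast cores (no sharing) through 3 fast + 2 slow and 2 fast + 4 slow to 8 slow cores (all shared).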

SLIDE 20

FOF Scheduler for SMT

1. Fast first: fastest = unshared core
2. Fast old: free core? Find shared pair (oldest, X); move X to free core
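
The SMT variant of the two rules can be sketched on a simple representation in which each physical core is a list holding zero, one, or two job ids. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and thread pinning and migration on a real machine are left out.

```python
def place(cores, job_id):
    """Fast first: prefer an idle physical core (unshared, hence fastest),
    then a physical core that runs only one job."""
    target = min((c for c in cores if len(c) < 2), key=len, default=None)
    if target is None:
        return False  # every logical core is busy; the caller should queue the job
    target.append(job_id)
    return True


def rebalance(cores, arrival):
    """Fast old: if a physical core is idle, find the shared pair (oldest, X)
    and move X to the idle core, so that both jobs run unshared."""
    free = next((c for c in cores if not c), None)
    shared = [c for c in cores if len(c) == 2]
    if free is None or not shared:
        return
    # The shared pair that contains the oldest job, by arrival time.
    pair = min(shared, key=lambda c: min(arrival[j] for j in c))
    oldest = min(pair, key=lambda j: arrival[j])
    other = next(j for j in pair if j != oldest)
    pair.remove(other)
    free.append(other)
```

Calling `rebalance` whenever a job completes keeps the oldest shared job from competing with a sibling for its physical core while an idle core exists.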

SLIDE 21

Evaluation

  • Implementation on Finance application: Monte-Carlo computation for option price
  • Hardware: 6-core, 2-way SMT, 3.33 GHz Intel Xeon X5680

– Shared (slow) SMT core speed = 0.63 × unshared (fast) core speed

  • FOF achieves

– 16% higher throughput than default OS scheduler while meeting QoS
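
The 0.63 speed ratio quantifies the trade-off that the scheduler works with: sharing a physical core slows each job, but the two logical cores together do more work than one unshared core (2 × 0.63 = 1.26). A small sketch of that arithmetic follows. The function name is hypothetical, and it assumes the 0.63 ratio applies uniformly to every shared logical core.

```python
def aggregate_speed(physical_cores: int, shared: int,
                    shared_speed: float = 0.63) -> float:
    """Total processing speed, in units of one unshared core, when
    `shared` physical cores each run two jobs and the rest run one."""
    fast = physical_cores - shared  # unshared cores at speed 1.0
    slow = 2 * shared               # logical cores at shared_speed each
    return fast * 1.0 + slow * shared_speed
```

On the 6-core machine, `aggregate_speed(6, 0)` is 6 unshared-core units with every job at full speed, while `aggregate_speed(6, 6)` is about 7.56 units with every job at 0.63 of full speed.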

SLIDE 22

Conclusions

  • FOF scheduler for interactive services

– Exploit hardware heterogeneity
– Achieve both high quality and high throughput

  • Heterogeneous servers: Bing search simulation

– Double throughput while meeting QoS

  • SMT: Finance server implementation

– 16% higher throughput than default OS scheduler

SLIDE 23

Thank you & Questions
