SLIDE 1

Green-CM: Energy efficient contention management for Transactional Memory

Shady Alaa Paolo Romano – INESC-ID/IST Mats Brorsson - KTH

SLIDE 2

Agenda

  • Introduction
  • Related work
  • Architecture
  • Green-CM
  • Evaluation
  • Conclusion

ICPP 2015 - Green-CM 2

SLIDE 3

Introduction

  • Multicores are everywhere

– Complex programming

  • Locks
  • Deadlocks

– Transactional memory

  • Atomic blocks
  • Transparent to the programmer

[Diagram: Core 1, Core 2, Core 3 and Core 4 sharing main memory]

atomic{ if(bal>amount) withdraw(amount); }
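One way to read the atomic block above is as an optimistic read, validate, commit loop. The Python sketch below is only an illustration under assumptions: `VersionedAccount`, `atomic_withdraw`, `run_demo` and the single version counter are hypothetical names, not part of Green-CM or of any TM library. It shows why a conflicting transaction aborts and retries, which is the event a contention manager reacts to.

```python
import threading


class VersionedAccount:
    """Shared balance with a version counter (hypothetical toy, not a real TM)."""

    def __init__(self, balance):
        self.balance = balance
        self.version = 0
        self.commit_lock = threading.Lock()  # serialises only the commit step


def atomic_withdraw(account, amount):
    """Emulate atomic{ if(bal>amount) withdraw(amount); }. Returns the abort count."""
    aborts = 0
    while True:
        seen_version = account.version   # snapshot the version first
        bal = account.balance            # speculative read
        new_bal = bal - amount if bal > amount else bal
        with account.commit_lock:
            if account.version == seen_version:  # nobody committed in between
                account.balance = new_bal
                account.version += 1
                return aborts
        aborts += 1                      # conflict detected: abort and retry


def run_demo(threads=8, withdrawals=100, start_balance=1000):
    """Run concurrent withdrawals of 1 unit each and return the final balance."""
    account = VersionedAccount(start_balance)

    def worker():
        for _ in range(withdrawals):
            atomic_withdraw(account, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return account.balance
```

The retry loop here restarts immediately after an abort; deciding how long to wait before that restart is exactly the contention manager's job.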

SLIDE 4

Introduction

  • Energy efficiency

– First-order design choice
– Battery-based devices
– Data centers

  • Goal

– Transactional memory that is efficient in terms of both energy and performance

SLIDE 5

Introduction

  • Contention Manager

– Minimize contention
– Which transaction to abort
– When to restart an aborted transaction

  • Energy efficiency:

– Wait implementation
– DVFS (dynamic voltage and frequency scaling)

SLIDE 6

Related work

  • Little work in the literature

– Mainly HTM (hardware transactional memory)

  • Clock gating processors upon abort

– Lowering frequency upon abort

  • Using a simulator
  • Studies

– HTM consumes less energy

  • Does not fit all workloads

– Need for adaptability

  • Using DVFS in TM

– Fastlane

  • Designed for a low number of threads

SLIDE 7

Architecture

[Architecture diagram. Labels: Tx abort (no. of retries, core on which tx is executing); Asymmetric Contention Manager; back-off duration; Hybrid Wait Implementation; End back-off; Restart Tx; Controller; Throughput; Energy; Tuning of β; Tuning of α, Τ]

SLIDE 8

Architecture

[Architecture diagram. Labels: Tx abort (no. of retries, core on which tx is executing); Asymmetric Contention Manager; back-off duration; Hybrid Wait Implementation; End back-off; Restart Tx; Controller; Throughput; Energy; Tuning of β; Tuning of α, Τ]

SLIDE 9

Implementing waits

  • Building block for contention managers
  • Drastic effect on energy consumption
  • Can be implemented in two ways:

– Busy waiting
– Sleeping

SLIDE 10

Implementing waits

  • Busy waiting

– Fine granularity
– Similar to doing actual work

  • Sleeping

– Coarse granularity
– Low energy consumption
– Expensive

SLIDE 11

Implementing waits

  • Hybrid approach

– Either busy wait or sleep

  • Adaptive fashion

– How to determine the threshold

  • Cost of sleep
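The hybrid approach can be sketched as a single function that spins for short waits and sleeps for long ones. This is a minimal sketch under assumptions, not the Green-CM implementation: the name `hybrid_wait`, the use of nanoseconds and the strict less-than comparison against the threshold are all hypothetical choices made for illustration.

```python
import time


def hybrid_wait(duration_ns, threshold_ns):
    """Wait for roughly duration_ns and return the strategy that was used.

    Waits shorter than threshold_ns are busy waits (fine granularity, but the
    core keeps burning energy); longer waits are sleeps (coarse granularity,
    lower energy, but entering and leaving the sleep has a cost).
    """
    if duration_ns < threshold_ns:
        deadline = time.perf_counter_ns() + duration_ns
        while time.perf_counter_ns() < deadline:
            pass  # spin until the deadline passes
        return "spin"
    time.sleep(duration_ns / 1e9)  # hand the core back to the OS
    return "sleep"
```

For example, `hybrid_wait(1_000, 100_000)` spins while `hybrid_wait(200_000, 100_000)` sleeps; the open question on the slide is how to pick the threshold.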

SLIDE 12

Implementing waits

  • Static Thresholds

[Plot: EDP / best EDP (y-axis, 1 to 6) against Threshold (x-axis, 100 to 1x10^7); series: Intruder, Kmeans]

No one size fits all
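The metric on the plot, EDP, is the energy-delay product: energy consumed multiplied by execution time, so lower is better. The sketch below shows how a "EDP / best EDP" ratio can be computed per threshold. The function names and the sample measurements are made up for illustration and are not results from the talk.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * time_seconds


def normalise_to_best(edp_by_threshold):
    """Divide every EDP by the smallest one, so the best setting maps to 1.0."""
    best = min(edp_by_threshold.values())
    return {threshold: value / best for threshold, value in edp_by_threshold.items()}


# Hypothetical (energy in J, time in s) measurements for three static thresholds.
samples = {100: (10.0, 2.0), 1000: (12.0, 1.0), 10000: (20.0, 2.0)}
ratios = normalise_to_best({t: edp(e, s) for t, (e, s) in samples.items()})
```

A workload whose ratio stays near 1.0 only around one threshold, while another workload prefers a different one, is what "no one size fits all" refers to.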

SLIDE 13

Architecture

[Architecture diagram. Labels: Tx abort (no. of retries, core on which tx is executing); Asymmetric Contention Manager; back-off duration; Hybrid Wait Implementation; End back-off; Restart Tx; Controller; Throughput; Energy; Tuning of β; Tuning of α, Τ]

SLIDE 14

Asymmetric CM

  • DVFS

– Variable operating frequency

  • Exploiting DVFS

– Boosting active threads
– Reducing the frequency of backing-off threads

  • Enabling DVFS

– Manual control is expensive
– How to favor automatic boosting?

P-states: P0 3.0 GHz, P1 2.4 GHz, P2 2.2 GHz, P3 2.0 GHz, P4 1.8 GHz, P5 1.6 GHz, P6 1.4 GHz

SLIDE 15

Asymmetric CM

[Diagram: backoff policy per core, in order: linear, exp., exp., exp., linear, exp., exp., exp.]

  • Linear backoff cores:

– Shorter backoff periods
– Mainly busy-waiting backoffs

  • Exp. Backoff cores:

– Longer backoff periods
– Mainly sleeping backoffs

  • Favor boosting

– When enough cores are in sleep states

[Diagram: 8-core processor. Wait state per core: busy wait, sleep, sleep, sleep, busy wait, sleep, sleep, sleep. Resulting state per core: boosted, sleep, sleep, sleep, boosted, sleep, sleep, sleep]
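The asymmetric policy can be sketched as a backoff function that depends on which core the aborted transaction runs on. Everything here is an assumption made for illustration: the function name, the `base` and `cap` values, the time unit, and the choice of cores 0 and 4 as the linear-backoff cores are hypothetical, and the slides do not give the actual formula.

```python
def backoff_duration(retries, core_id, linear_cores, base=100, cap=10_000_000):
    """Return a backoff duration (arbitrary time units) after `retries` aborts.

    Cores in `linear_cores` grow their backoff linearly, so their waits stay
    short. All other cores grow exponentially, so their waits quickly become
    long enough to be served by sleeping.
    """
    if core_id in linear_cores:
        duration = base * retries        # linear growth
    else:
        duration = base * 2 ** retries   # exponential growth
    return min(duration, cap)            # bound the wait


# Hypothetical layout mirroring the 8-core picture: two linear-backoff cores.
linear_cores = {0, 4}
after_three_aborts = [backoff_duration(3, core, linear_cores) for core in range(8)]
```

With these made-up parameters, the two linear cores wait 300 units after three aborts while the other six wait 800, and the gap widens with every further abort.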

SLIDE 16

Asymmetric CM

  • Increased contention?

– Cores not backing off exponentially

  • Control the number of cores to be boosted

SLIDE 17

Asymmetric CM

  • No. of Boosted Threads

[Plot: EDP / best EDP against static no. of boosted threads; series: Intruder, Kmeans, Genome, Memcached, STM7]

SLIDE 18

Architecture

[Architecture diagram. Labels: Tx abort (no. of retries, core on which tx is executing); Asymmetric Contention Manager; back-off duration; Hybrid Wait Implementation; End back-off; Restart Tx; Controller; Throughput; Energy; Tuning of β; Tuning of α, Τ]

SLIDE 19

Controller

  • Online, lightweight
  • Hill climbing
  • Challenges:

– Collecting energy measurements
– Multi-dimensional

  • Different exploration strategies

– Stabilization
– Random jumps
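A one-dimensional hill climber with optional random jumps could look like the sketch below. It is a generic illustration, not the controller from the talk: `hill_climb`, `measure_edp`, `jump_prob` and the integer search range are hypothetical, and the stabilization strategy is left out because the slides do not describe how it works.

```python
import random


def hill_climb(measure_edp, start, lo, hi, steps, jump_prob=0.0, rng=None):
    """Search the integers in [lo, hi] for a setting with low measured EDP.

    Each step tries a neighbour of the current setting, or, with probability
    jump_prob, a random setting. A candidate is adopted only if its measured
    EDP is strictly lower; otherwise the search direction is reversed.
    """
    rng = rng or random.Random(0)
    current, best = start, measure_edp(start)
    direction = 1
    for _ in range(steps):
        if rng.random() < jump_prob:
            candidate = rng.randint(lo, hi)                       # random jump
        else:
            candidate = min(hi, max(lo, current + direction))     # neighbour step
        cost = measure_edp(candidate)
        if cost < best:
            current, best = candidate, cost
        else:
            direction = -direction
    return current


def toy_edp(setting):
    """Hypothetical cost curve with its minimum at setting 5."""
    return (setting - 5) ** 2 + 1


tuned = hill_climb(toy_edp, start=1, lo=1, hi=16, steps=40)
```

In an online setting `measure_edp` would run the workload for a short window and return the observed energy-delay product, so every step has a real cost and the number of steps matters.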

SLIDE 20

Controller

  • Tuning α (threshold for hybrid)

[Plot: EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average); series: no stab, stab, stab jmp 1, stab jmp 10]

SLIDE 21

Controller

  • Tuning β (no. of boosted threads)

[Plot: EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average); series: no stab, stab, stab jmp 1, stab jmp 10]

SLIDE 22

Controller

  • Merging the learners

[Plot: Coupling the Tuners. EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average); legend as extracted: independent stab jmp 1 stab – stab stab jmp 1 – stab stab jmp 10 – stab bidim stab jmp 1]

SLIDE 23

Evaluation

[Plot: Intruder. EDP-GreenCM / EDP against number of threads (4, 8, 16, 32, 48, 64)]

SLIDE 24

Evaluation

[Plot: STM7. EDP-GreenCM / EDP against number of threads (4, 8, 16, 32, 48, 64)]

SLIDE 25

Evaluation

[Plot: Memcached. EDP-GreenCM / EDP against number of threads (4, 8, 16, 32, 48, 64)]

SLIDE 26

Evaluation

[Plot: Intruder, 64 threads. % of total cores in each P-state (p0 to p6) for the spin, no-asym and asym configurations]

SLIDE 27

Conclusion

  • Implementation of waits has a significant impact on energy efficiency
  • Experimental results (obtained on a real system) contradict previously published ones based on simulation
  • Exploiting DVFS enhances energy efficiency
  • Self-tuning is needed to adapt to different workloads

SLIDE 28

THANK YOU

SLIDE 29

Evaluation

[Plot: Intruder. Energy-GreenCM / Energy against number of threads (4, 8, 16, 32, 48, 64)]

SLIDE 30

Evaluation

[Plot: Intruder. Time-GreenCM / Time against number of threads (4, 8, 16, 32, 48, 64)]