System Challenges and Opportunities for Transactional Memory
JaeWoong Chung
Computer System Lab, Stanford University
Computer system design that helps leverage hardware parallelism
Transactional memory (TM) for easy parallel programming
Contribution
Challenges to building an efficient and practical TM system
Opportunities to use TM beyond parallel programming
No more frequency race
The era of multi-cores
Parallel programming is not easy
Split a sequential task into multiple sub-tasks
[Chart: performance vs. year — Pentium (1993), Pentium 4 (2000), Core Duo (2006)]
Object reference graph (e.g. Java and C++)
Synchronize access to shared data
Coarse-grain locking
Easy to program
The other task is blocked
Fine-grain locking
High concurrency
Hard to use
Deadlock, priority inversion, …
High locking overhead
[Figure: object reference graph — objects 1–6 traversed by Task1 and Task2; legend: object, reference]
Atomic and isolated execution of instructions
Atomicity : All or nothing
Isolation : No intermediate results
Programmer
A transaction encloses instructions
logically sequential execution of transactions
TM system
Transactions are executed in parallel without conflict
If transactions conflict, one of them is aborted and restarted

  TX_Begin              TX_Begin
    // Instructions       // Instructions
    // for Task1          // for Task2
  TX_End                TX_End
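The TX_Begin/TX_End model above can be sketched in software. The following is a minimal illustration of optimistic execution with abort-and-restart, not any particular TM system; the `TMSystem` class and its `run` API are invented for this example.

```python
# Minimal sketch of the TX_Begin / TX_End programming model: each
# transaction buffers its writes and publishes them atomically at
# commit, retrying if another transaction committed a conflicting
# write in the meantime. (Hypothetical API, for illustration only.)

class TMSystem:
    def __init__(self):
        self.memory = {}    # committed values
        self.version = {}   # per-address commit counter

    def run(self, tx_body):
        """TX_Begin ... TX_End with abort-and-restart on conflict."""
        while True:
            read_versions, write_buffer = {}, {}

            def load(addr):
                if addr in write_buffer:
                    return write_buffer[addr]
                read_versions[addr] = self.version.get(addr, 0)
                return self.memory.get(addr, 0)

            def store(addr, value):
                write_buffer[addr] = value

            tx_body(load, store)                  # body between TX_Begin/TX_End
            # Commit: validate that nothing we read has changed.
            if all(self.version.get(a, 0) == v for a, v in read_versions.items()):
                for addr, value in write_buffer.items():
                    self.memory[addr] = value
                    self.version[addr] = self.version.get(addr, 0) + 1
                return                            # commit succeeded
            # else: conflict -> abort (discard buffer) and restart

tm = TMSystem()
tm.memory["x"] = 10

def task1(load, store):        # move 5 from x to y, atomically
    store("x", load("x") - 5)
    store("y", load("y") + 5)

tm.run(task1)
print(tm.memory["x"], tm.memory["y"])   # 5 5
```

Either the whole transfer commits (atomicity) and no partial state is ever visible in `tm.memory` (isolation).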
Data versioning
At TX_Begin, save register values
At write, save old memory values
Conflict detection
Read-set and write-set per transaction
Conflict detection with set comparison
[Figure: objects 1–6; Tx1 and Tx2 with R (read) and W (write) marks on the objects they access]
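The set-comparison rule above can be written down directly: two transactions conflict iff one's write-set overlaps the other's read- or write-set. A small sketch (the function name and sets are illustrative):

```python
# Conflict detection by read-/write-set comparison: a conflict exists
# iff one transaction's write-set intersects the other's read-set or
# write-set.

def conflicts(rs1, ws1, rs2, ws2):
    return bool(ws1 & (rs2 | ws2)) or bool(ws2 & (rs1 | ws1))

# Transactions touching disjoint objects commit in parallel ...
assert not conflicts({1, 3}, {3}, {2, 6}, {6})
# ... but a write to an object the other transaction read is a conflict.
assert conflicts({1, 3}, {3}, {3}, {2})
```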
Logically sequential execution of transactions
Optimistic concurrency control for parallel transaction execution
No deadlock, priority inversion, or convoying
TM system handles pathological cases
Composability
Error Recovery
Many proposals in hardware and software
Hardware acceleration for TM is crucial for performance
HTM is 2–3 times faster than STM
Correctness : strong isolation
Hardware TM
At transaction begin
Register checkpoint
At memory access
Set read/write bits per cache line
Buffer new values in cache or log old values
Conflict detection
With cache coherence protocol
With transaction validation protocol
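The "log old values" option above (eager versioning with an undo log) can be sketched as follows. The class and method names are illustrative, not a real HTM interface:

```python
# Eager versioning sketch: writes go directly to memory while the old
# value is logged, so commit is cheap and abort replays the undo log.
# Read-/write-sets stand in for the per-cache-line R/W bits.

class UndoLogTx:
    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []          # (addr, old value), oldest first
        self.read_set, self.write_set = set(), set()

    def load(self, addr):
        self.read_set.add(addr)     # set the R bit
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        if addr not in self.write_set:
            self.undo_log.append((addr, self.memory.get(addr, 0)))
            self.write_set.add(addr)  # set the W bit
        self.memory[addr] = value   # new value kept in place

    def commit(self):
        self.undo_log.clear()       # nothing to do: memory is current

    def abort(self):
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old # restore old values, newest first
        self.undo_log.clear()

mem = {"a": 1}
tx = UndoLogTx(mem)
tx.store("a", 99)
tx.store("b", 7)
tx.abort()
print(mem)    # {'a': 1, 'b': 0}
```

Buffering new values in the cache is the dual design: commit publishes the buffer, abort simply discards it.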
TM hardware
[Figure: animation — two cores running Tx1 and Tx2, each with a register checkpoint and an L1 cache recording ADDR/DATA and per-line R/W bits, connected over a shared bus to memory; loads set R bits, and a conflicting access to address 5 is detected through the coherence protocol]
How do we build an efficient TM system tuned for the common case?
How do we build a practical TM system that deals with uncommon cases?
Can we use TM to support system software?
Can we use TM to improve other important system metrics?
Challenges to building TM systems
Common case behavior of parallel programs
Extract architectural parameters for efficient TM system design
TM virtualization
Overcome the limitation of TM hardware
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Guarantee correctness of DBT
Support for reliability, security, and fast memory snapshot
Improve important system metrics other than performance
Software parallelization : a major issue for performance
Transactional memory
Challenges to building TM systems
Common case behavior of parallel programs
TM virtualization
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Support for reliability, security, and fast memory snapshot
Conclusion
Goal
Understand the common case behavior of TM programs
Few TM programs available
More TM programs now, but written for research purposes
Few efficient TM systems to use as development tools
A “chicken & egg” problem
Analyze existing parallel programs
Assumption : the inherent parallelism remains regardless of programming tools
Mapping programming primitives to transactions
Programming primitive    Transaction primitive
Lock/Unlock              Begin/End
Parallel_For             Begin/End
Different domains and different languages

Languages    Applications
OpenMP       APPLU, Equake, Art, Swim, CG, BT, IS
ANL          Barnes, Mp3d, Ocean, Radix, FMM, Cholesky, Radiosity, FFT, Volrend, Water-N2, Water-Spatial
Pthread      Apache, Kingate, Bp-vision, Localize, Ultra Tic Tac Toe, MPEG2, AOL Server
Java         MolDyn, MonteCarlo, RayTracer, Crypt, LUFact, Series, SOR, SparseMatmult, SPECjbb2000, PMD, HSQLDB
Key metrics : transaction length; read-/write-set size; write-set to length ratio; transaction commit/abort; frequency of nesting & I/O in transactions
Architectural parameters : TM primitive overhead; buffer size; support for nesting and system calls
Measure the key metrics of TM programs
Use the metrics to make suggestions for TM designs
Observation : Up to 95% of transactions are < 5000 instructions
Suggestion : Light-weight transactional primitives
Observation : Rare but long transactions
Suggestion : Transactions that survive context switches

Number of instructions executed in transactions:

Application       Avg    50th %   95th %   Max
ANL average       256    114      772      16782
Pthreads average  879    805      1056     22591
Java average      5949   149      4256     13519488
[Charts: read-set and write-set size in KB (log scale, 0.01–1000) vs. percentile of transactions (50th, 80th), for ANL, Java, and Pthreads]
Observation : 98% of transactions have <16KB read-set and <6KB write-set
Suggestion : A 32KB L1 cache is sufficient
Observation : A few very large transactions > 32KB
Suggestion : Need for buffer space virtualization
Bytes of data read/written by transactions
Software parallelization : a major issue for performance
Transactional memory
Challenges to building TM systems
Common case behavior of parallel programs
TM virtualization
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Support for reliability, security, and fast memory snapshot
Conclusion
Problem
Limited hardware resources tuned for common cases
E.g. a buffer sized for 99% of transactions
How do we cover uncommon cases as well?
Cache as buffer for transactional data
What if cache capacity is exhausted? ⇒ Space virtualization
What if a transaction is interrupted? ⇒ Time virtualization
What if transactions are nested deeply? ⇒ Depth virtualization
Goals
Virtualize TM space, time, and depth at low HW cost
Completely transparent to user SW
Minimize interference with coexisting HW transactions
Assumption
Overflows, interrupts, and deep nesting are rare
Approach
Transactional data and metadata in virtual memory
Using virtual memory support in OS
Data versioning & conflict detection at page granularity
Similar to page-based software DSM systems
Basic operation
On HTM overflow, rollback and restart in SW mode
At the first access, create a copy of original (master) page
Change the address mapping to the copy (private page)
Transactional data in private page, committed data in master page
At commit, make the private page the new master page
All orchestrated by the operating system (no HW)
Conflict detection
Use TLB shoot-downs to gain exclusive page access
Hardware requirement
Overflow exception
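The copy-on-commit operation above can be sketched at page granularity. This is an illustrative sketch of the idea, not XTM's actual OS implementation; `PAGE`, `PagedTx`, and the method names are invented:

```python
# Page-granularity software versioning in the style described above:
# on first transactional access, a private copy of the master page is
# made; commit installs the private page as the new master page.

PAGE = 4   # tiny pages for the example

class PagedTx:
    def __init__(self, master):
        self.master = master          # page number -> list of words
        self.private = {}             # per-transaction private copies

    def _page(self, addr):
        pno = addr // PAGE
        if pno not in self.private:   # first access: copy the master page
            self.private[pno] = list(self.master.get(pno, [0] * PAGE))
        return self.private[pno]

    def read(self, addr):
        return self._page(addr)[addr % PAGE]

    def write(self, addr, value):
        self._page(addr)[addr % PAGE] = value

    def commit(self):
        # make each private page the new master page
        for pno, page in self.private.items():
            self.master[pno] = page
        self.private = {}

master = {0: [1, 2, 3, 4]}
tx = PagedTx(master)
tx.write(1, 42)                 # dirties page 0 in a private copy
assert master[0][1] == 2        # master page unchanged before commit
tx.commit()
print(master[0])                # [1, 42, 3, 4]
```

In the real system the "private copy" is installed by remapping the virtual address, so the application's loads and stores need no instrumentation.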
[Figure: XTM metadata — a per-transaction page table pointing to private page copies with R/W bits, alongside the master page table and master pages; timeline: HTM overflow, transactional write, transactional read, commit]
XTM-g
Gradual page-by-page switching
Reduce the switch overhead from hardware mode to software mode
A portion of transactional data in private pages, the rest in the cache
Hardware requirement : OV (overflow) bit in page table
XTM-e
Additional buffer for overflowed read/write bits
Reduce false conflicts at page granularity
Hardware requirement : Eviction buffer
Interrupt and context-switch procedure
[Flowchart: on an interrupt — can it wait? If yes, wait for the short transaction to finish; if no, is the transaction young? If yes, abort the young transaction; if no, switch to software mode (labels in the slide: “naïve approach”, “rare case”)]
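The decision procedure in the flowchart might look like the following. The threshold value and function name are invented for illustration; only the three outcomes come from the slide:

```python
# Sketch of the interrupt/context-switch policy described above:
# short transactions are allowed to finish, young ones are cheap to
# abort and restart, and only the rare long-running case pays for a
# switch to software (virtualized) mode.

def on_interrupt(tx_age_insns, can_wait, young_threshold=1000):
    if can_wait:
        return "wait for short transaction to finish"
    if tx_age_insns < young_threshold:      # little work lost on abort
        return "abort young transaction"
    return "switch to software mode"        # rare case

print(on_interrupt(200, can_wait=True))     # wait for short transaction to finish
print(on_interrupt(200, can_wait=False))    # abort young transaction
print(on_interrupt(50_000, can_wait=False)) # switch to software mode
```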
XTM causes no cost for applications without overflow
XTM-g presents a good cost/performance tradeoff point
20% faster to 50% slower than a fully-hardware solution
[Chart: normalized execution time for XTM, XTM-g, XTM-e, and VTM on tomcatv (37.7% overflow), volrend (0.01%), radix (0.26%), and micro-benchmarks P10 (39.2%), P20 (60.3%), P30 (60.8%); time broken into versioning, validation, commit, violations, idle, and useful; one XTM bar reaches 8.3]
Software parallelization : a major issue for performance
Transactional memory
Challenges to building TM systems
Common case behavior of parallel programs
TM virtualization
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Support for reliability, security, and fast memory snapshot
Conclusion
DBT
Binary code is translated at run time
PIN, Valgrind, DynamoRIO, StarDBT, etc.
DBT use cases
Translation for a new target architecture
JIT optimizations in virtual machines
Binary instrumentation
Profiling, security, debugging, …
[Figure: original binary → DBT framework with a DBT tool → translated binary]
Track untrusted data
A taint bit per memory byte
Security policy uses the taint bit.
E.g. no syscall with untrusted data

  t = XX;     // untrusted data from network    taint(t) = 1;
  …
  swap t, u1;                                   swap taint(t), taint(u1);
  u2 = u1;                                      taint(u2) = taint(u1);

[Figure: variables t, u1, u2 with their taint bits]
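The taint bookkeeping above can be sketched as plain code. `value`, `taint`, `assign`, and `syscall` here are a hypothetical shadow-state model with one taint bit per variable (real DIFT tracks one bit per memory byte):

```python
# DIFT sketch: every data move propagates a shadow taint bit, and a
# policy check rejects system calls whose arguments are tainted.

value = {}
taint = {}            # shadow state: 1 = untrusted

def assign(dst, src):           # dst = src, plus its instrumentation
    value[dst] = value[src]
    taint[dst] = taint[src]

def syscall(arg):
    if taint[arg]:              # security policy on the taint bit
        raise PermissionError("syscall with untrusted data")
    return "ok"

value["t"], taint["t"] = "XX", 1     # untrusted data from the network
value["u1"], taint["u1"] = "safe", 0

assign("u1", "t")               # taint flows t -> u1
assign("u2", "u1")              # and on to u2

try:
    syscall("u2")
except PermissionError as e:
    print(e)                    # syscall with untrusted data
```

The next slide shows why the data move and its taint update must execute atomically when threads interleave.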
Atomicity between original and instrumented instructions for correctness

  Thread 1                       Thread 2
  swap t, u1;
                                 u2 = u1;                // copies untrusted XX
                                 taint(u2) = taint(u1);  // taint(u1) not yet updated
  swap taint(t), taint(u1);

[Resulting state: u2 holds the untrusted value XX, but its taint bit is not set]

Security breach !!
Easy but unsatisfactory solutions
No multithreaded programs (StarDBT)
Serialization (Valgrind)
Hard solution : Locking
Idea : Enclose original and instrumented instructions with a lock
Fine-grained locks
Locking overhead, convoying, limited scope of DBT
Coarse-grained locks
Performance degradation
Lock nesting between app & DBT locks
Potential deadlock
Tool developers should be feature + multithreading experts
Idea
Original and instrumented instructions in a transaction
Advantages
Atomic execution
High performance through optimistic concurrency
Support for nested transactions

  Thread 1                       Thread 2
  TX_Begin                       TX_Begin
    swap t, u1;                    u2 = u1;
    swap taint(t), taint(u1);      taint(u2) = taint(u1);
  TX_End                         TX_End
Per instruction : short
High overhead of executing TX_Begin and TX_End
Limited scope for DBT optimizations
Per basic block : long
Amortizes the TX_Begin and TX_End overhead
Easy to match TX_Begin and TX_End
Per trace : longer
Further amortization of the overhead
Potentially high transaction conflict rate
Profile-based sizing : dynamic
Optimize transaction size based on transaction abort ratio
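Profile-based sizing can be sketched as a simple feedback loop. The 10% threshold, the halving/increment rule, and the `resize` name are invented for illustration; only the idea of sizing by abort ratio comes from the slide:

```python
# Feedback sketch of profile-based transaction sizing: grow the number
# of traces per transaction while aborts are rare (amortizing the
# TX_Begin/TX_End overhead), shrink it when the abort ratio climbs.

def resize(traces_per_tx, commits, aborts, abort_limit=0.10):
    ratio = aborts / max(commits + aborts, 1)
    if ratio > abort_limit:
        return max(1, traces_per_tx // 2)   # conflicts: smaller transactions
    return traces_per_tx + 1                # quiet: amortize more overhead

size = 4
size = resize(size, commits=99, aborts=1)    # 1% aborts -> grow
print(size)                                  # 5
size = resize(size, commits=60, aborts=40)   # 40% aborts -> shrink
print(size)                                  # 2
```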
[Chart: normalized overhead (%) for Barnes, Equake, Fmm, Radiosity, Radix, Swim, Tomcatv, Water, and Water-spatial on 8 CPUs, ranging up to ~60%]
Software TM and DIFT on PIN
41% overhead on average
Transactions at the DBT trace granularity
Overhead reduction
28% with register checkpoint
12% with register checkpoint + hardware signature
6% with full hardware TM
[Chart: normalized overhead (%) for the same benchmarks under SW-only TM, register checkpoint, register checkpoint + signature, and full HW TM]
Software parallelization : a major issue for performance
Transactional memory
Challenges to building TM systems
Common case behavior of parallel programs
TM virtualization
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Support for reliability, security, and fast memory snapshot
Conclusion
TM hardware consists of
Fine-grain data versioning HW
Fine-grain access tracking HW
Fast exception handlers
Can use such HW for other purposes
Reliability, security, …
The benefits for SW
Finer granularity (compared to VM-based approach)
User-level event handling (compared to VM-based approach)
No instrumentation overhead (compared to DBT-based approach)
Simplified code (compared to DBT-based approach)
Reliability
Global & local checkpoints (data versioning)
Security
Fine-grain read/write barriers (address tracking)
Isolated execution (data versioning)
Memory snapshot (data versioning)
Concurrent garbage collector
Dynamic memory profiler
Snapshot
Read-only
Multiple regions
Shared by multiple threads
Applications
Service threads that analyze memory in parallel with app threads
Garbage collection, memory profiling (heap & stack), …
[Figure: mutator threads work on memory while collector threads scan a read-only snapshot — garbage collection]
Feature correspondence
TM metadata ⇒ track data written since snapshot
TM versioning ⇒ storage for progressive snapshot
Including virtualization mechanism
TM conflict detection ⇒ catch errors
Writes to read-only snapshot regions
Differences & additions
Data versioning for a single thread vs. multiple threads
Table to record snapshot regions
Resulting snapshot system
Fast : O(# CPUs) snapshot creation and O(1) read/write
Small memory footprint : O(# memory locations written)
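The correspondence above amounts to a copy-on-write snapshot. A single-threaded sketch of the idea (illustrative software model, not the hardware mechanism; the class and method names are invented):

```python
# Copy-on-write snapshot sketch: the first write to each location
# since the snapshot logs the old value, so the extra footprint is
# O(# memory locations written) and snapshot reads are O(1).

class SnapshotMemory:
    def __init__(self, data):
        self.data = data
        self.old = None           # old values saved since snapshot

    def take_snapshot(self):
        self.old = {}             # start tracking writes

    def write(self, addr, value):
        if self.old is not None and addr not in self.old:
            self.old[addr] = self.data.get(addr, 0)  # save pre-snapshot value
        self.data[addr] = value   # mutator sees the new value

    def snapshot_read(self, addr):
        # collector sees the value as of take_snapshot()
        if self.old is not None and addr in self.old:
            return self.old[addr]
        return self.data.get(addr, 0)

m = SnapshotMemory({"p": 1, "q": 2})
m.take_snapshot()
m.write("p", 100)                          # mutator keeps running
print(m.data["p"], m.snapshot_read("p"))   # 100 1
print(m.snapshot_read("q"))                # 2
```

With TM hardware, the write tracking and old-value logging come for free from the versioning and metadata mechanisms, which is what makes snapshot creation cheap.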
Parallel GC: stop app threads & run GC threads
20% to 30% overhead for memory intensive apps
Snapshot GC ⇒ GC is essentially free
Fast : Stop app, take snapshot, then run GC & app concurrently
Simple : +100 lines over parallel GC by Boehm
Fundamentally simpler than any other concurrent GC
Challenges to building TM systems
Common case behavior of parallel programs
Extract architectural parameters for efficient TM system design
TM virtualization
Overcome the limitation of TM hardware
Opportunities for systems beyond parallel programming
Multithreading for dynamic binary translation
Fix correctness issue for DBT
Support for reliability, security, and fast memory snapshot
Improve important system metrics other than performance
KyungHae, wife
Parents, brother, in-laws
Kozyrakis, advisor
Olukotun, associate advisor
Molina
Saraswat
TCC group mates and research colleagues
Korean mafia