A Scalable, Nonblocking Approach to Transactional Memory, Jared Casper (PowerPoint presentation)


Jared Casper, Hassan Chafi, Brian D. Carlstrom, Austen McDonald, Woongki Baek, Chi Cao Minh, Christos Kozyrakis, Kunle Olukotun


SLIDE 1
  • A Scalable, Nonblocking Approach to Transactional Memory

Jared Casper, Hassan Chafi, Brian D. Carlstrom, Austen McDonald, Woongki Baek, Chi Cao Minh, Christos Kozyrakis, Kunle Olukotun

Computer Systems Laboratory, Stanford University, http://tcc.stanford.edu

SLIDE 2
  • Transactional Memory
  • Problem: Parallel programming is hard and expensive.

Correctness vs. performance

  • Solution: Transactional Memory

Programmer-defined isolated, atomic regions.
Easy to program, with performance comparable to fine-grained locking.
Done in software (STM), hardware (HTM), or both (Hybrid).

  • Conflict detection

Optimistic: Detect conflicts at transaction boundaries.
Pessimistic: Detect conflicts during execution.

  • Version management

Lazy: Speculative writes are kept in the cache until the end of the transaction.
Eager: Speculatively write "in place" and roll back on abort.
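To make the lazy/eager distinction concrete, here is a minimal single-threaded Python sketch of the two version-management policies. The class names `LazyTx` and `EagerTx` are hypothetical, memory is modeled as a dict in which absent addresses read as 0, and conflict detection is left out entirely: the point is only where speculative values live and what commit and abort must do.

```python
class LazyTx:
    """Lazy versioning: speculative writes are buffered until commit."""

    def __init__(self, memory):
        self.memory = memory       # shared dict: address -> committed value (absent reads as 0)
        self.write_buffer = {}     # speculative writes, invisible to others

    def read(self, addr):
        # Our own speculative write wins; otherwise read committed memory.
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        # Commit publishes the whole buffer: this is the extra work lazy TM does at commit.
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()

    def abort(self):
        # Abort only discards the buffer: memory was never touched.
        self.write_buffer.clear()


class EagerTx:
    """Eager versioning: write in place, keep an undo log for rollback."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []         # (address, previous value) pairs, oldest first

    def read(self, addr):
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory.get(addr, 0)))
        self.memory[addr] = value

    def commit(self):
        # Commit is cheap: the new values are already in memory.
        self.undo_log.clear()

    def abort(self):
        # Roll back newest-first so each address ends at its pre-transaction value.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


memory = {0x10: 1}
lazy = LazyTx(memory)
lazy.write(0x10, 5)
print(lazy.read(0x10), memory[0x10])   # 5 1: the write is still private
lazy.abort()

eager = EagerTx(memory)
eager.write(0x10, 7)
print(memory[0x10])                    # 7: already visible in place
eager.abort()
print(memory[0x10])                    # 1: restored from the undo log
```

The asymmetry is the one the talk builds on: the lazy abort is trivial and the lazy commit does the publishing work, while the eager scheme pays at abort time instead.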

SLIDE 3
  • So what's the problem? (Haven't we figured this out already?)
  • Cores are the new GHz

Trend is 2x cores every 2 years: 2 in '05, 4 in '07, >16 not far away.
Sun: N2 has 8 cores with 8 threads each = 64 threads.

  • It takes a lot to adopt a new programming model

It must last tens of years without much tweaking.
Transactional Memory must (eventually) scale to 100s of processors.

  • TM studies so far use a small number of cores!

They assume a broadcast snooping protocol.

  • If it does not scale, it does not matter
SLIDE 4
  • Lazy optimistic vs. eager pessimistic
  • Lazy optimistic
  • Optimistic parallelism
  • Fast aborts
  • Slower commits… good enough??

SLIDE 5
  • What are we going to do about it?
  • Serial commit ⇒ parallel commit

At 256 processors, if 5% of the work is serial, the maximum speedup is 18.6x.
Two-phase commit using directories.

  • Write-through ⇒ write-back

Bandwidth requirements must scale nicely.
Again, using directories.

  • Rest of talk:

Augmenting TCC with directories. Does it work?
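The 18.6x bound is Amdahl's law: with a serial fraction s of the work and N processors, speedup is at most 1 / (s + (1 - s) / N). A quick check in Python (the function name `amdahl_speedup` is ours):

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when serial_fraction of the work cannot run in parallel."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# 5% serial work (for example, a serialized commit) on 256 processors.
print(f"{amdahl_speedup(0.05, 256):.1f}x")   # 18.6x
```

As N grows the bound approaches 1/s = 20x, so a commit step that serializes 5% of the work caps the machine no matter how many cores are added.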

SLIDE 6
  • Protocol Overview
  • During the transaction

Track read and write sets in the cache.
Track sharers of a line in the directory.

  • Two-phase commit

Validation: Mark all lines in the write set in their directories.

  • Locks lines from being written by other transactions

Commit: Invalidate all sharers of marked lines.

  • Dirty lines become "owned" in the directory
  • Requires a global ordering of transactions

Use a global Transaction ID (TID) Vendor.
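A global TID vendor only needs to hand out unique, increasing transaction IDs. Below is a minimal Python sketch that models it as a lock-protected counter; `TIDVendor` and `request_tid` are hypothetical names, and the Python lock merely stands in for however the hardware serializes requests. Because a transaction requests its TID once, at the start of its commit, traffic to the vendor scales with the commit rate rather than with memory accesses.

```python
import threading


class TIDVendor:
    """Hands out globally unique, increasing transaction IDs (hypothetical model)."""

    def __init__(self, first_tid: int = 1):
        self._next = first_tid
        self._lock = threading.Lock()

    def request_tid(self) -> int:
        # Atomic fetch-and-increment: no two transactions get the same TID.
        with self._lock:
            tid = self._next
            self._next += 1
            return tid


vendor = TIDVendor()
tids = []
tids_lock = threading.Lock()


def worker():
    # Each simulated processor commits 100 transactions.
    for _ in range(100):
        tid = vendor.request_tid()
        with tids_lock:
            tids.append(tid)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(tids), min(tids), max(tids))   # 800 1 800
```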

SLIDE 7
  • Directory Structure

[Figure: Directory. Per-line entries (Address 0x0000, 0x0004, …, 0x1000), each with a Sharers List (P0, P1, …, PN), a Marked bit, and an Owned bit; the directory also holds the Now Serving TID (NSTID) and a Skip Vector.]

  • Directory tracks sharers of each line at its home node

The Marked bit is used in the protocol.

  • Now Serving TID: the transaction currently being serviced by the directory

Used to ensure a global ordering of transactions.
Skip vector used to help manage the NSTID (see paper).
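The directory state above can be sketched in Python as follows. The field names are ours, and a Python set stands in for the P0..PN sharer bits. The skip-vector behavior is an assumption modeled on the commit example later in the talk, where a Skip from TID n lets the directory's NSTID move past n; the paper gives the actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    sharers: set = field(default_factory=set)  # processor IDs; stands in for the P0..PN bits
    marked: bool = False                       # set during validation of a committing write
    owner: Optional[int] = None                # processor holding the line dirty ("Owned")


class Directory:
    def __init__(self):
        self.entries = {}       # line address -> DirectoryEntry
        self.nstid = 1          # Now Serving TID
        self.skipped = set()    # assumed skip vector: TIDs that will not write here

    def entry(self, addr):
        return self.entries.setdefault(addr, DirectoryEntry())

    def skip(self, tid):
        # A committing transaction announces it will not write to this directory.
        # The skip may arrive before NSTID reaches tid, so remember it.
        self.skipped.add(tid)
        self._advance()

    def finish(self, tid):
        # The transaction currently being served has finished its commit here.
        assert tid == self.nstid, "only the TID being served can finish"
        self.nstid += 1
        self._advance()

    def _advance(self):
        # Move NSTID past every consecutive TID that has already skipped us.
        while self.nstid in self.skipped:
            self.skipped.remove(self.nstid)
            self.nstid += 1


d = Directory()
d.skip(2)        # TID 2 skips early; NSTID must still wait for TID 1
print(d.nstid)   # 1
d.skip(1)        # now TIDs 1 and 2 can both be passed
print(d.nstid)   # 3
d.entry(0x40).sharers.add(0)
d.finish(3)      # TID 3 wrote here and completed
print(d.nstid)   # 4
```

Recording early skips, instead of dropping them, is what keeps the ordering global: a directory never serves TID n+1 before it knows TID n is either done or not coming.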

SLIDE 8
  • Cache Structure

[Figure: Cache. Each line holds Valid, Dirty, SR, and SM bits, a Tag, and Data; the cache also keeps a Sharing Vector and a Writing Vector.]

  • Each cache line tracks whether it was speculatively read (SR) or modified (SM)

Meaning that the line was read or written in the current transaction.

  • Sharing and Writing vectors remember the directories read from or written to

Simple bit vector.
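A Python sketch of the per-line speculative bits and the per-cache directory vectors. `LINE_SIZE` matches the 32-byte lines of the evaluation setup; `NUM_DIRS`, the address-to-home interleaving in `home_directory`, the fully associative organization, and the single data value per line are all hypothetical simplifications.

```python
from dataclasses import dataclass

LINE_SIZE = 32   # bytes per line, as in the evaluated L1/L2 configuration
NUM_DIRS = 4     # hypothetical number of directories (home nodes)


def home_directory(addr: int) -> int:
    # Hypothetical interleaving: consecutive lines live at consecutive directories.
    return (addr // LINE_SIZE) % NUM_DIRS


@dataclass
class CacheLine:
    tag: int
    data: int = 0          # one value per line, for simplicity
    valid: bool = True
    dirty: bool = False
    sr: bool = False       # speculatively read in the current transaction
    sm: bool = False       # speculatively modified in the current transaction


class TxCache:
    def __init__(self):
        self.lines = {}            # line base address -> CacheLine (fully associative)
        self.sharing_vector = 0    # bit d set: read from directory d
        self.writing_vector = 0    # bit d set: will write to directory d

    def _line(self, addr):
        base = addr - addr % LINE_SIZE
        return self.lines.setdefault(base, CacheLine(tag=base))

    def load(self, addr):
        # Fetching the data from the home directory on a miss is omitted.
        line = self._line(addr)
        line.sr = True
        self.sharing_vector |= 1 << home_directory(addr)
        return line.data

    def store(self, addr, value):
        line = self._line(addr)
        line.sm = True
        line.data = value
        self.writing_vector |= 1 << home_directory(addr)

    def after_commit(self):
        # Write-back commit: modified lines stay cached as dirty; speculative state resets.
        for line in self.lines.values():
            line.dirty = line.dirty or line.sm
            line.sr = line.sm = False
        self.sharing_vector = self.writing_vector = 0


cache = TxCache()
cache.load(0x00)        # line 0 -> directory 0
cache.store(0x40, 7)    # line 2 -> directory 2
print(bin(cache.sharing_vector), bin(cache.writing_vector))   # 0b1 0b100
```

At commit time the two vectors tell the processor exactly which directories to contact, so commit traffic depends on the transaction's footprint rather than on the machine size.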

SLIDE 9
  • Commit procedure
  • Validation

Request a TID.
Inform all directories not in the writing vector that we will not be writing to them (Skip).
Request the NSTID of all directories in the writing vector.

  • Wait until all NSTIDs ≥ our TID

Mark all lines that we have modified.

  • Can happen in parallel with getting NSTIDs

Request the NSTID of all directories in the sharing vector.

  • Wait until all NSTIDs ≥ our TID
  • Commit

Inform all directories in the writing vector of the commit.
Each directory invalidates all other copies of the written lines and marks them owned.

  • Invalidation may violate other transactions
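The commit procedure can be sketched as one Python generator per transaction, stepped by a round-robin scheduler so that two commits can overlap in time. This is a hypothetical, single-threaded model: `commit_procedure` and `run` are our names, each phase takes one time step, a set of skipped TIDs stands in for the skip vector, and invalidations and violations are not modeled. The usage replays a scenario modeled on the parallel commit example: P1 writes only Y (homed at Directory 1) and P2 writes only X (homed at Directory 0).

```python
from itertools import count


class Directory:
    def __init__(self):
        self.nstid = 1        # Now Serving TID
        self.skips = set()    # TIDs that said they will not write here (skip vector stand-in)
        self.marked = set()   # lines marked by the transaction being served
        self.owner = {}       # line -> processor that owns it after commit

    def skip(self, tid):
        self.skips.add(tid)
        self._advance()

    def _advance(self):
        while self.nstid in self.skips:
            self.skips.remove(self.nstid)
            self.nstid += 1

    def commit(self, proc, lines):
        # Invalidation of other sharers is omitted; the writer becomes the owner.
        for line in lines:
            self.marked.discard(line)
            self.owner[line] = proc
        self.nstid += 1       # done serving this TID
        self._advance()


def commit_procedure(proc, reads, writes, home, dirs, vendor):
    """One transaction's commit as a generator; each `yield` ends one time step."""
    tid = next(vendor)                                    # request TID
    yield
    writing = {home[a] for a in writes}                   # writing vector
    sharing = {home[a] for a in reads}                    # sharing vector
    for d in range(len(dirs)):
        if d not in writing:
            dirs[d].skip(tid)                             # Skip: we won't write here
    yield
    while any(dirs[d].nstid < tid for d in writing):      # wait: NSTIDs >= our TID
        yield
    for a in writes:
        dirs[home[a]].marked.add(a)                       # mark modified lines
    yield
    while any(dirs[d].nstid < tid for d in sharing):      # wait on sharing vector
        yield
    for d in writing:                                     # commit
        dirs[d].commit(proc, {a for a in writes if home[a] == d})


def run(txs):
    """Round-robin scheduler: returns the step in which each transaction finished."""
    finished, step = {}, 0
    while len(finished) < len(txs):
        step += 1
        for name, gen in txs.items():
            if name not in finished:
                try:
                    next(gen)
                except StopIteration:
                    finished[name] = step
    return finished


dirs = [Directory(), Directory()]          # Directory 0 homes X, Directory 1 homes Y
home = {"X": 0, "Y": 1}
vendor = count(1)                          # TID vendor: 1, 2, 3, ...
txs = {
    "P1": commit_procedure("P1", {"Y"}, {"Y"}, home, dirs, vendor),
    "P2": commit_procedure("P2", {"X"}, {"X"}, home, dirs, vendor),
}
print(run(txs))                            # {'P1': 4, 'P2': 4}
print([d.owner for d in dirs])             # [{'X': 'P2'}, {'Y': 'P1'}]
```

Both transactions finish in the same step because P1's Skip lets Directory 0 serve TID 2 immediately. If both wrote X, the model serializes them instead (P2 finishes one step after P1), which is the ordering the NSTID is there to enforce.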
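SLIDE 10
  • Parallel Commit Example

[Figure: Directory 0 (home of X) and Directory 1 (home of Y), each with NSTID 1 and per-line P1/P2 sharer bits plus Marked (M) and Owned (O) bits; a TID Vendor; processors P1 and P2 with no TID yet. X and Y are loaded from their home directories; one processor runs LD Y, ST Y, Commit and the other runs LD X, ST X, Commit.]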

SLIDE 11
  • Parallel Commit Example

[Figure: Both processors send TID requests to the TID Vendor; P1 receives TID 1 and P2 receives TID 2.]

SLIDE 12
  • Parallel Commit Example

[Figure: Each processor sends a Skip to the directory it will not write (Skip 1, Skip 2) and an NSTID probe to the directory it will write; the directories' NSTIDs advance past the skipped TIDs.]

SLIDE 13
  • Parallel Commit Example

[Figure: Directory 0 now has NSTID 2 and Directory 1 has NSTID 1, so X (at Directory 0) and Y (at Directory 1) are marked in parallel.]

SLIDE 14
  • Parallel Commit Example

[Figure: Both transactions send Commit to their directories in parallel.]

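SLIDE 15
  • Conflict Resolution Example

[Figure: Same setup as the parallel commit example: Directory 0 (X) and Directory 1 (Y) with NSTID 1, a TID Vendor, and P1 and P2 with no TID yet. X and Y are loaded; one processor runs LD Y, LD X, ST X, Commit and the other runs LD X, ST X, Commit, so both write X.]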

SLIDE 16
  • Conflict Resolution Example

[Figure: A load of X is serviced by Directory 0; P1 requests a TID and receives TID 1.]

SLIDE 17
  • Conflict Resolution Example

[Figure: P2 requests a TID and receives TID 2; NSTID probes and a Skip for TID 1 are sent to the directories.]

SLIDE 18
  • Conflict Resolution Example

[Figure: P1 (TID 1) marks X at Directory 0 (NSTID 1); a Skip for TID 2 and an NSTID probe are sent, and Directory 1's NSTID advances to 3.]

SLIDE 19
  • Conflict Resolution Example

[Figure: P1 commits; Directory 0 invalidates P2's copy of X, and P2's transaction is violated ("Violation!").]
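In this scheme conflict detection falls out of the commit invalidations: a transaction is violated only when an invalidation hits a line it speculatively read (SR set). A hypothetical Python sketch of that rule; `Processor`, `invalidate`, and `commit_line` are illustrative names, and restarting is reduced to clearing the read and write sets.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # identity comparison: each Processor is a distinct cache
class Processor:
    name: str
    read_set: set = field(default_factory=set)    # lines with SR set
    write_set: set = field(default_factory=set)   # lines with SM set
    violated: bool = False

    def invalidate(self, line):
        # Losing a line that this transaction speculatively read means a
        # conflicting transaction committed first: abort and restart.
        if line in self.read_set:
            self.violated = True
            self.read_set.clear()     # restart: discard speculative state
            self.write_set.clear()


def commit_line(line, writer, sharers):
    """Directory side of commit for one marked line: invalidate all other sharers."""
    for p in sharers:
        if p is not writer:
            p.invalidate(line)
    return [writer]                   # the writer is now the only (owning) sharer


p1 = Processor("P1", read_set={"X"}, write_set={"X"})
p2 = Processor("P2", read_set={"X"}, write_set={"X"})
p3 = Processor("P3")                  # caches X, but has not read it in its transaction
sharers = commit_line("X", p1, [p1, p2, p3])   # P1 (TID 1) commits X first
print(p2.violated, p3.violated)       # True False
```

No transaction ever waits on a lock held by another: the earlier TID simply commits, and only readers of the committed lines restart.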

SLIDE 20
  • Conflict Resolution Example (Write-back)

[Figure: Directory 0 now has NSTID 2 and Directory 1 has NSTID 3. P2 restarts LD X, ST X, Commit; its load of X goes to Directory 0, which requests X from the owner; the owner writes X back (WB: X) and the data is returned to P2.]
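With write-back commit, committed data stays in the committer's cache and the directory only records ownership; the home node is updated when another processor asks for the line, as in the request for X above. A minimal Python sketch; `HomeDirectory` and its methods are hypothetical, and invalidations, evictions, and the rest of the commit protocol are omitted.

```python
class HomeDirectory:
    """Home node for a set of lines, with write-back commit (hypothetical model)."""

    def __init__(self, memory, caches):
        self.memory = memory      # address -> value held at the home node
        self.caches = caches      # processor name -> {address: value}
        self.owner = {}           # address -> processor holding the line dirty
        self.writebacks = 0       # lines fetched back from an owner

    def commit(self, proc, addr, value):
        # Write-back commit: the value stays in the committer's cache and
        # the directory only records ownership. Nothing is sent to memory.
        self.caches[proc][addr] = value
        self.owner[addr] = proc

    def load(self, proc, addr):
        if addr in self.caches[proc]:
            return self.caches[proc][addr]      # hit in the requester's own cache
        owner = self.owner.pop(addr, None)
        if owner is not None:
            # Line is owned by another cache: request it; the owner writes it back.
            self.memory[addr] = self.caches[owner][addr]
            self.writebacks += 1
        self.caches[proc][addr] = self.memory[addr]
        return self.memory[addr]


memory = {"X": 0}
caches = {"P1": {}, "P2": {}}
home0 = HomeDirectory(memory, caches)
for v in (1, 2, 3):
    home0.commit("P1", "X", v)          # three commits by P1, no memory traffic
print(memory["X"], home0.writebacks)    # 0 0
print(home0.load("P2", "X"), memory["X"], home0.writebacks)   # 3 3 1
```

A write-through design would instead send X to the home node on each of the three commits; here a single write-back happens only when P2 actually needs the line.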

SLIDE 21
  • Evaluation environment

  • CPU: 1–64 single-issue PowerPC cores
  • L1: 32 KB, 32-byte cache line, 4-way, 1-cycle latency
  • L2: 512 KB, 32-byte cache line, 8-way, 16-cycle latency
  • Interconnection: 2D grid topology, 14-cycle link latency
  • Main memory: 100-cycle latency
  • Directory: 1 per node, 10-cycle latency

SLIDE 22
  • It Scales!

[Figure: Speedup results for barnes, radix, SVMClassify, and equake, with the callout "57x!"]

SLIDE 23
  • Results for small transactions

[Figure: Results for volrend and water-nsquared.]

SLIDE 24
  • Latency Tolerance

[Figure: Latency tolerance results for swim, radix, and water-spatial.]

SLIDE 25
  • Remote traffic bandwidth

[Figure: Remote traffic bandwidth results.]
SLIDE 26
  • Take home
  • Transactional Memory systems must scale for TM to be useful
  • Lazy optimistic TM systems have inherent benefits

Nonblocking. Fast abort.

  • Lazy optimistic TM systems scale

Fast parallel commit. Bandwidth efficiency through write-back commit.

SLIDE 27
  • Questions?

Whew!

Jared Casper

jaredc@stanford.edu

Computer Systems Lab, Stanford University, http://tcc.stanford.edu