An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees
, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle Olukotun
Computer Systems Laboratory, Stanford University, http://tcc.stanford.edu
1
Transactional Memory (TM) systems are promising
Large atomic blocks simplify parallel programming
Speed of fine-grain locks with simplicity of coarse-grain locks
TM can be implemented in either hardware or software
Hardware TM (HTM) is fast but inflexible & costly
Software TM (STM) is flexible but slow
Signature-Accelerated TM (SigTM) is a new hybrid TM
Uses hardware signatures to accelerate software transactions
Fast, flexible, & cost-effective
Implements strong isolation of transactional code
Correct & predictable execution of software transactions
2
Introduction, SigTM Performance, SigTM Strong Isolation, Related Work, Conclusion
3
[Figure: Compiler → Low-level code]
What do these STM functions do?
4
Transaction start: constant overhead cost per transaction
Expensive only for short transactions
5
Read barrier: building the read-set is expensive
Overhead cost per transaction varies
Depends on locality of read accesses, size of read-set, transaction length
6
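To make that cost concrete, here is a minimal C sketch of a software read barrier. It assumes a TL2-style STM with a global version clock and a hashed table of versioned locks, which is an assumption because the deck does not show its STM code. `stm_read`, `Tx`, `lock_table`, and the table sizes are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024
#define READ_SET_MAX 256

/* Hypothetical versioned-lock table: each entry is (version << 1) | locked_bit. */
static uint64_t lock_table[LOCK_TABLE_SIZE];

typedef struct {
    uint64_t start_version;                  /* global clock sampled at transaction start */
    const uint64_t *read_set[READ_SET_MAX];  /* software read-set: every address read */
    size_t n_reads;
    bool aborted;
} Tx;

static size_t lock_index(const void *addr) {
    return ((uintptr_t)addr >> 3) % LOCK_TABLE_SIZE;  /* hash each 8-byte word to a lock */
}

/* Read barrier: check the location's lock/version, load, then log the address.
 * Single-threaded sketch: omits memory fences and the write-set lookup. */
static uint64_t stm_read(Tx *tx, const uint64_t *addr) {
    uint64_t pre = lock_table[lock_index(addr)];
    uint64_t value = *addr;
    uint64_t post = lock_table[lock_index(addr)];
    if ((pre & 1) || pre != post || (pre >> 1) > tx->start_version) {
        tx->aborted = true;  /* locked, changed during the load, or newer than our snapshot */
        return 0;
    }
    assert(tx->n_reads < READ_SET_MAX);
    tx->read_set[tx->n_reads++] = addr;  /* per-read logging: the expensive part */
    return value;
}
```

Every transactional load pays for the lock/version check and the append to `read_set`, and commit later rescans `read_set`, so the overhead grows with the number and locality of reads.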
Write barrier: called to write shared data → add to write-set
Cost depends on locality of write accesses, size of write-set, transaction length
Significantly less expensive than the read barrier (reads ≥ writes)
7
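A minimal sketch of a write barrier under lazy versioning, assuming new values are buffered in a software write-set until commit. The `WriteSet` type, `stm_write`, and `write_set_lookup` are hypothetical names. Unlike a read barrier, it performs no validation; it only records the address and the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_SET_MAX 256

/* Hypothetical software write-set: new values stay here until commit. */
typedef struct {
    uint64_t *addr[WRITE_SET_MAX];
    uint64_t value[WRITE_SET_MAX];
    size_t n;
} WriteSet;

/* Write barrier: record (or overwrite) the buffered value for addr. */
static void stm_write(WriteSet *ws, uint64_t *addr, uint64_t value) {
    for (size_t i = 0; i < ws->n; i++) {
        if (ws->addr[i] == addr) {  /* repeated write to the same word */
            ws->value[i] = value;
            return;
        }
    }
    assert(ws->n < WRITE_SET_MAX);
    ws->addr[ws->n] = addr;
    ws->value[ws->n] = value;
    ws->n++;
}

/* Reads inside the same transaction must see their own buffered writes. */
static int write_set_lookup(const WriteSet *ws, const uint64_t *addr, uint64_t *out) {
    for (size_t i = ws->n; i-- > 0;) {
        if (ws->addr[i] == addr) {
            *out = ws->value[i];
            return 1;
        }
    }
    return 0;
}
```

The linear search keeps the sketch short; an optimized STM would typically use a hash table or a filter in front of the buffer.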
Commit is expensive: scan read-set (1x); scan write-set (3x); locks
8
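The commit-time scans can be sketched as follows, again assuming a TL2-style STM. Names such as `stm_commit`, `lock_table`, and `global_clock` are hypothetical. The sketch runs single-threaded and uses plain loads and stores where a real implementation would need atomic compare-and-swap and memory fences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024
#define MAX_SET 64

static uint64_t lock_table[TABLE_SIZE];  /* (version << 1) | locked_bit, hypothetical layout */
static uint64_t global_clock;            /* global version clock */

typedef struct {
    uint64_t start_version;
    const uint64_t *reads[MAX_SET];
    size_t n_reads;
    uint64_t *waddr[MAX_SET];
    uint64_t wval[MAX_SET];
    size_t n_writes;
} Tx;

static uint64_t *lock_for(const void *addr) {
    return &lock_table[((uintptr_t)addr >> 3) % TABLE_SIZE];
}

/* Sketch only: assumes distinct written words never share a lock entry. */
static bool stm_commit(Tx *tx) {
    size_t locked = 0;
    /* Write-set scan 1: acquire the lock of every written word. */
    for (; locked < tx->n_writes; locked++) {
        uint64_t *l = lock_for(tx->waddr[locked]);
        if (*l & 1) goto fail;  /* held by another committing transaction */
        *l |= 1;
    }
    uint64_t commit_version = ++global_clock;
    /* Read-set scan: no location read may have been committed after tx start. */
    for (size_t r = 0; r < tx->n_reads; r++) {
        if ((*lock_for(tx->reads[r]) >> 1) > tx->start_version) goto fail;
    }
    /* Write-set scan 2: copy buffered values to memory. */
    for (size_t i = 0; i < tx->n_writes; i++) *tx->waddr[i] = tx->wval[i];
    /* Write-set scan 3: release locks, stamping them with the new version. */
    for (size_t i = 0; i < tx->n_writes; i++) *lock_for(tx->waddr[i]) = commit_version << 1;
    return true;
fail:
    for (size_t i = 0; i < locked; i++) *lock_for(tx->waddr[i]) &= ~(uint64_t)1;
    return false;
}
```

The three loops over the write-set and the single loop over the read-set correspond to the 3x and 1x scans listed for commit.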
1.5x to 7x slowdown over sequential
Hybrid TM should focus on the read barrier and commit
9
SigTM simplifies STM by using simple hardware
[Figure: STM mechanisms and their implementation: SW, SW, SW (locks), SW (version #)]
10
SigTM adds a little HW (signatures) to accelerate STM
Each HW thread has 2 HW signatures: read-set and write-set
No other HW modifications (e.g., no extra cache states)
The read and write barriers populate the signatures
[Figure: Read-set signature contents over time, steps 1 to 4]
11
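A signature can be modeled in software as a Bloom filter over cache-line addresses. This C sketch assumes 64-byte lines and two hash functions; the hash functions, `Signature`, `sig_insert`, and `sig_member` are hypothetical, since the deck does not specify the hardware hash design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIG_BITS 1024   /* read-set signature length recommended in the deck */
#define LINE_SHIFT 6    /* assumes 64-byte cache lines */

typedef struct { uint64_t bits[SIG_BITS / 64]; } Signature;

/* Two hypothetical hash functions over the cache-line address. */
static unsigned hash1(uint64_t line) { return (unsigned)(line % SIG_BITS); }
static unsigned hash2(uint64_t line) {
    return (unsigned)((line * 0x9E3779B97F4A7C15ull) >> 54) % SIG_BITS;  /* multiplicative hash */
}

static void set_bit(Signature *s, unsigned i) {
    assert(i < SIG_BITS);
    s->bits[i / 64] |= (uint64_t)1 << (i % 64);
}
static bool test_bit(const Signature *s, unsigned i) {
    return (s->bits[i / 64] >> (i % 64)) & 1;
}

static void sig_clear(Signature *s) { memset(s, 0, sizeof *s); }

/* Insert: done by a read or write barrier for each address it touches. */
static void sig_insert(Signature *s, uintptr_t addr) {
    uint64_t line = (uint64_t)addr >> LINE_SHIFT;
    set_bit(s, hash1(line));
    set_bit(s, hash2(line));
}

/* Lookup: done for snooped coherence requests. May report a false
 * positive (a false conflict), never a false negative. */
static bool sig_member(const Signature *s, uintptr_t addr) {
    uint64_t line = (uint64_t)addr >> LINE_SHIFT;
    return test_bit(s, hash1(line)) && test_bit(s, hash2(line));
}
```

Because different lines can set the same bits, `sig_member` can report a line that was never inserted; this is where false conflicts come from.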
Signatures watch coherence messages
SW enables/disables the signatures
On a hit in a signature, either:
Trigger SW abort handler (conflict detection)
NACK remote request (isolation enforcement)
Signatures may generate false conflicts
A performance issue, not a correctness issue
Reduce with longer signatures & better hash functions
[Figure: Read-set signature]
12
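The two responses to a signature hit can be summarized as a small decision function. This is a simplified software model, not the hardware logic: `on_snoop`, `SigControl`, and the enable flags are hypothetical. It also assumes the write-set signature is consulted only while a transaction commits.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REQ_SHARED, REQ_EXCLUSIVE } ReqType;  /* remote read vs. remote write */
typedef enum { ACT_NONE, ACT_ABORT, ACT_NACK } SnoopAction;

/* Hypothetical enable bits that software sets and clears. */
typedef struct {
    bool read_lookup_on;   /* enabled at transaction start (conflict detection) */
    bool write_lookup_on;  /* enabled during commit (isolation enforcement) */
} SigControl;

/* hit_read / hit_write: whether the requested line hits in each signature. */
static SnoopAction on_snoop(const SigControl *c, ReqType t, bool hit_read, bool hit_write) {
    /* While committing, hold off any remote access to lines being written. */
    if (c->write_lookup_on && hit_write) return ACT_NACK;
    /* A remote write to a line this transaction read is a conflict. */
    if (c->read_lookup_on && t == REQ_EXCLUSIVE && hit_read) return ACT_ABORT;
    return ACT_NONE;
}
```

A plain store from a non-transactional thread also arrives as an exclusive request, so this same check aborts a transaction whose read-set it touches.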
At transaction start, the read-set signature starts monitoring coherence messages
If hit, the signature invokes the SW abort handler
Continuous validation of read-set
13
The read barrier does not need to:
Validate read address → continuous validation by HW signature
Build software read-set → just add to read-set signature
14
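For contrast with a software read barrier, a SigTM-style read barrier can be sketched as below. It assumes word-granularity accesses, a software write buffer for lazy versioning, and a single-hash `sig_insert` standing in for the hardware insert operation. `SigTx`, `sigtm_read`, and `sig_insert` are hypothetical names; the real hardware interface is not shown in the deck.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_BITS 1024
#define MAX_WRITES 64

typedef struct { uint64_t bits[SIG_BITS / 64]; } Signature;

/* Hypothetical stand-in for the hardware insert: one bit per 64-byte line. */
static void sig_insert(Signature *s, uintptr_t addr) {
    unsigned i = (unsigned)((addr >> 6) % SIG_BITS);
    s->bits[i / 64] |= (uint64_t)1 << (i % 64);
}

typedef struct {
    Signature rsig;               /* models the hardware read-set signature */
    uint64_t *waddr[MAX_WRITES];  /* software write-set (lazy versioning) */
    uint64_t wval[MAX_WRITES];
    size_t n_writes;
} SigTx;

/* SigTM-style read barrier sketch: no version check, no software read-set. */
static uint64_t sigtm_read(SigTx *tx, const uint64_t *addr) {
    assert(tx->n_writes <= MAX_WRITES);
    for (size_t i = tx->n_writes; i-- > 0;) {
        if (tx->waddr[i] == addr) return tx->wval[i];  /* read-after-write */
    }
    sig_insert(&tx->rsig, (uintptr_t)addr);  /* hardware now watches this line */
    return *addr;
}
```

There is no lock or version check and no per-read log; a conflicting write by another thread is caught by the read-set signature when its coherence request arrives.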
The write barrier populates the write-set signature
Used during commit
Write-set versioning still in SW
15
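A matching write-barrier sketch under similar assumptions: `SigTx` and `sigtm_write` are hypothetical, and a single-hash `sig_insert` stands in for the hardware operation. The address goes into the write-set signature while the value is still buffered in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_BITS 128    /* write-set signature length recommended in the deck */
#define MAX_WRITES 64

typedef struct { uint64_t bits[SIG_BITS / 64]; } Signature;

/* Hypothetical stand-in for the hardware insert: one bit per 64-byte line. */
static void sig_insert(Signature *s, uintptr_t addr) {
    unsigned i = (unsigned)((addr >> 6) % SIG_BITS);
    s->bits[i / 64] |= (uint64_t)1 << (i % 64);
}

typedef struct {
    Signature wsig;               /* models the hardware write-set signature */
    uint64_t *waddr[MAX_WRITES];  /* software write-set: versioning stays in SW */
    uint64_t wval[MAX_WRITES];
    size_t n_writes;
} SigTx;

static void sigtm_write(SigTx *tx, uint64_t *addr, uint64_t value) {
    sig_insert(&tx->wsig, (uintptr_t)addr);  /* consulted during commit */
    for (size_t i = 0; i < tx->n_writes; i++) {
        if (tx->waddr[i] == addr) {          /* overwrite an earlier buffered write */
            tx->wval[i] = value;
            return;
        }
    }
    assert(tx->n_writes < MAX_WRITES);
    tx->waddr[tx->n_writes] = addr;
    tx->wval[tx->n_writes] = value;
    tx->n_writes++;
}
```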
Commit: the read-set signature eliminates the read-set scan for validation
Write-set signature eliminates locks
Two write-set scans instead of three
16
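The SigTM commit path can then be sketched with two write-set loops and no read-set loop. `fetch_exclusive` is a hypothetical stand-in for the coherence request hardware would issue, `write_lookup_enabled` models the software enable for NACKing, and `aborted` models the effect of the abort handler; none of these names come from the deck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITES 64

/* Hypothetical stand-ins for hardware state and operations. */
static bool write_lookup_enabled;   /* models SW enabling NACKs from the write-set signature */
static size_t exclusive_requests;   /* counts simulated exclusive coherence requests */

static void fetch_exclusive(const uint64_t *addr) {
    (void)addr;                     /* hardware would gain exclusive ownership of the line */
    exclusive_requests++;
}

typedef struct {
    bool aborted;                   /* models the SW abort handler having fired */
    uint64_t *waddr[MAX_WRITES];    /* software write-set */
    uint64_t wval[MAX_WRITES];
    size_t n_writes;
} SigTx;

/* SigTM-style commit sketch: two write-set scans, no read-set scan, no locks. */
static bool sigtm_commit(SigTx *tx) {
    assert(tx->n_writes <= MAX_WRITES);
    write_lookup_enabled = true;    /* remote requests to write-set lines are now NACKed */
    /* Write-set scan 1: acquire exclusive ownership of every line to be written. */
    for (size_t i = 0; i < tx->n_writes; i++) {
        fetch_exclusive(tx->waddr[i]);
        if (tx->aborted) {          /* a conflict hit the read-set signature meanwhile */
            write_lookup_enabled = false;
            return false;
        }
    }
    /* No read-set scan: the read-set signature validated every read continuously. */
    /* Write-set scan 2: copy buffered values to memory. */
    for (size_t i = 0; i < tx->n_writes; i++) *tx->waddr[i] = tx->wval[i];
    write_lookup_enabled = false;
    return true;
}
```

Compared with an STM commit, which scans the read-set once and the write-set three times under locks, the validation loop and both lock loops disappear.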
Measured dynamic instruction counts
R = # words in read-set; W = # words in write-set
Measured single-thread performance relative to sequential
[Table: instruction counts 41+12W, 44+16R+31W, 8, 19; single-thread performance 1.25x, 0.41, 0.14, 0.81, 0.65]
17
Execution-driven simulation to compare SigTM, STM, and HTM
STAMP: Stanford Transactional Apps for Multiprocessing
4 benchmarks for TM research, written in C
delaunay: Delaunay mesh generation
genome: gene sequencing
kmeans: K-means clustering
vacation: travel reservation system (similar to SPECjbb2000)
Parallelized from sequential code
Coarse-grain transactions (intuitive parallel programming)
Over 95% of time is spent in transactions
STM code is manually optimized (same code for SigTM)
HTM code has no instrumentation on reads/writes
18
SigTM is faster than STM but slower than HTM
Genome: SigTM 30% faster than STM; within 10% of HTM
Vacation: SigTM 2.8x faster than STM; 2x slower than HTM
Many non-redundant read barriers → large performance difference
19
Decreased signature size to increase false conflicts
Performance is sensitive to read-set signature length
1024 bits is recommended
Performance is insensitive to write-set signature length
128 bits is recommended
20
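As a rough guide to why signature length matters, suppose a signature behaves like a Bloom filter with m bits and k hash functions (an assumption; the deck does not describe the hash design). After n distinct cache lines have been inserted, the standard estimate for the probability that an unrelated line falsely hits is given on the first line below. The remaining lines plug in k = 2 with hypothetical set sizes of n = 100 lines for a read-set and n = 10 lines for a write-set.

```latex
\begin{aligned}
p_{\text{false}} &\approx \left(1 - e^{-kn/m}\right)^{k} \\
m = 1024,\ n = 100,\ k = 2:\quad p_{\text{false}} &\approx \left(1 - e^{-0.195}\right)^{2} \approx (1 - 0.823)^{2} \approx 0.031 \\
m = 512,\ n = 100,\ k = 2:\quad p_{\text{false}} &\approx \left(1 - e^{-0.391}\right)^{2} \approx (1 - 0.677)^{2} \approx 0.10 \\
m = 128,\ n = 10,\ k = 2:\quad p_{\text{false}} &\approx \left(1 - e^{-0.156}\right)^{2} \approx (1 - 0.855)^{2} \approx 0.021
\end{aligned}
```

Under these assumptions, halving the read-set signature roughly triples the false-positive rate, while a small write-set keeps even a 128-bit signature fairly accurate, which is consistent with the sensitivity results on this slide.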
Introduction, SigTM Performance, SigTM Strong Isolation, Related Work, Conclusion
21
Two acceptable outcomes:
T1 commits first; T1 privatizes & uses non-incremented *
T2 commits first; T1 privatizes & uses incremented *
Works correctly with lock-based synchronization
Race-free program
Thread 1
*99
22
Thread 1
*99
All STMs may lead to unexpected results with this code
T1 may use both old & new value after privatization
Cause: non-transactional accesses are not instrumented
Non-Tx writes do not cause Tx to abort
Tx commit is not isolated with respect to non-Tx accesses
23
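The privatization problem can be replayed deterministically in one thread. The scenario below is hypothetical, because the slide's own code did not survive extraction: T1 transactionally marks an item private and then reads it with plain loads, while T2 transactionally increments the item under an STM with lazy versioning. `item_shared`, `item_value`, and `replay` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

static bool item_shared;   /* true while other transactions can reach the item */
static int item_value;
static bool t2_wrote;      /* did T2 find the item shared and buffer a write? */
static int t2_buffer;      /* T2's software write-set entry (lazy versioning) */

/* T2's transactional body plus a successful validation: the increment is
 * buffered, and T2 is now committed even though memory is unchanged. */
static void t2_execute_and_validate(void) {
    t2_wrote = item_shared;
    if (t2_wrote) t2_buffer = item_value + 1;
}

/* Last step of T2's commit: copy the buffered value to memory. */
static void t2_write_back(void) {
    if (t2_wrote) item_value = t2_buffer;
}

/* T1's transaction: unlink the item so it becomes private to T1. */
static void t1_privatize(void) { item_shared = false; }

/* Replays one problematic interleaving and reports T1's two plain reads. */
static void replay(int *first_read, int *second_read) {
    item_shared = true;
    item_value = 0;
    t2_execute_and_validate();
    t1_privatize();              /* T1 commits; T2 is already past validation */
    *first_read = item_value;    /* uninstrumented read: old value */
    t2_write_back();
    *second_read = item_value;   /* same uninstrumented read: new value */
}
```

Neither transaction aborts, yet T1's uninstrumented reads observe 0 and then 1, which matches neither acceptable outcome.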
Definition: transactions are isolated from non-Tx accesses
HTM → inherent strong isolation
Non-Tx accesses cause coherence messages
Conflict detection mechanism enforces strong isolation
STM → supplemented strong isolation
Additional barriers needed in non-Tx accesses
Some can be optimized, but still a source of overhead
SigTM → inherent strong isolation
Without additional instrumentation or overhead
24
Non-Tx write to read-set?
Hits in read-set signature → transaction aborts
25
Introduction, SigTM Performance, SigTM Strong Isolation, Related Work, Conclusion
26
Kumar (PPoPP’06) and HyTM (ASPLOS’06)
Require significant cache modifications for HTM
Need 2 versions of transaction code
HASTM (MICRO’06)
Requires cache modifications (expensive for nesting)
Cache updates from prefetching/speculation are problematic
RTM (ISCA’07, later today)
Requires significant cache modifications (TMESI)
Cache handles common-case conflict detection and buffering
Poor performance (slower than sequential…)
27
Bulk (ISCA’06)
First use of signatures for TM
Requires additional HW for write versioning
LogTM-SE (HPCA’07)
Additional HW to implement undo log
Additional HW to remember recently logged lines
Recommended smaller signatures (32–64 bits)
28
SigTM is a hybrid TM that:
Uses minimal additional hardware
1 Kbit for read-set signature; 128 bits for write-set signature
No modification to caches
Reduces the runtime overhead of SW transactions
Eliminates SW read-set, locks, and timestamps
Continuous validation of read-set by HW signatures
Leads to good performance
Outperforms STM by 30%–280%
Slowdown compared to HTM is 10%–100%
Delivers strong isolation for predictable behavior
29
A new benchmark suite designed for TM research