The moment of truth: are we done with STM?
Nuno Diegues, Paolo Romano, Luís Rodrigues
ndiegues@gsd.inesc-id.pt

Over 20 years of Transactional Memory
Commodity processors with hardware support
Processors by IBM (BG/Q and zEC12) and Intel (Haswell)

The question
Raise the question: are we done with STM?
+ Hardware ought to be faster
+ Transparency and ease of use
- Research in STMs has evolved into a mature state
- Limited nature of hardware
What else is there to find?

Outline
1 (Quick) Motivation
2 Study Description
3 Compared Techniques
4 Results and Insights
5 Summary of Conclusions

Study
Commodity hardware in Intel TSX
  ◮ IBM processors target high performance computing
  ◮ Intel Haswell Xeon E3-1275v3 3.5GHz (3.9GHz Turbo)
  ◮ 4 cores, 8 hardware threads (via hyper-threading)
  ◮ 4x32KB L1 caches, 4x256KB L2 caches, 8MB L3 cache
Standard metrics for evaluation
  ◮ Time to complete benchmarks
  ◮ Power consumed (collected via Intel RAPL)
  ◮ Relative to sequential, non-instrumented executions
  ◮ Combined metric: Speedup / KJoules
STAMP benchmarks (excluded Bayes) with standard parameters
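
The combined metric above can be sketched in a few lines. This is a minimal illustration, not the authors' measurement harness: it assumes that speedup is the sequential, non-instrumented time divided by the parallel time, and that the RAPL reading is the total energy of the run in Joules. The function and parameter names are hypothetical.

```python
def speedup(sequential_seconds: float, parallel_seconds: float) -> float:
    """Speedup relative to the sequential, non-instrumented execution."""
    if sequential_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("execution times must be positive")
    return sequential_seconds / parallel_seconds


def speedup_per_kilojoule(sequential_seconds: float,
                          parallel_seconds: float,
                          energy_joules: float) -> float:
    """Combined metric: speedup divided by the energy of the run in kJ.

    Assumption: energy_joules is the total energy of the parallel run,
    e.g. the difference between two RAPL counter readings.
    """
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return speedup(sequential_seconds, parallel_seconds) / (energy_joules / 1000.0)


# Hypothetical numbers: 12 s sequential, 4 s parallel, 250 J consumed.
print(speedup_per_kilojoule(12.0, 4.0, 250.0))  # 12.0
```

A higher value is better: it rewards configurations that finish faster without spending proportionally more energy.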

Compared Techniques
Locks
STM
HTM
Hybrid TM

Compared Techniques - Locks
All benchmarks used an interface with the atomic construct:
GL: single global lock
FL: fine-grained locks (per-application effort)
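
As a rough illustration of the GL variant, the atomic construct can be backed by one process-wide lock. The sketch below is an assumption about the shape of such an interface, not the code used in the study; `atomic` and `run_workers` are hypothetical names.

```python
import threading
from contextlib import contextmanager

# GL: every atomic block in the program takes the same lock.
_GLOBAL_LOCK = threading.Lock()


@contextmanager
def atomic():
    """Run the enclosed block while holding the single global lock."""
    with _GLOBAL_LOCK:
        yield


def run_workers(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads inside atomic blocks."""
    shared = {"counter": 0}

    def worker() -> None:
        for _ in range(increments):
            with atomic():
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


print(run_workers(4, 1000))  # 4000
```

The FL variant would replace the single lock with several locks chosen per data structure, which is where the per-application effort comes from.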

Compared Techniques - STM
TL2: commit-time locking
NOrec: aimed at low thread count (single commit lock)
TinySTM: encounter-time locking
SwissTM: mixed encounter-time and commit-time locking

Compared Techniques - HTM
Intel TSX is single-version, ensures strong isolation, and allows nesting.
Most importantly, it is best-effort:
No transaction is guaranteed to commit
  ◮ Exhausting cache lines with transactional footprint
  ◮ Architectural states, instructions, traps
Fallback path must be provided in software
  ◮ address to routine provided on XBEGIN
Variants evaluated: TSX-GL and TSX-FL
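
Because no hardware transaction is guaranteed to commit, the usual pattern is to retry a bounded number of times and then run the software fallback. The sketch below simulates that control flow in plain Python; it does not execute TSX instructions. `TransactionAborted`, `run_with_fallback` and the retry budget of 5 are assumptions made for illustration, since the slides do not state a retry policy.

```python
import threading


class TransactionAborted(Exception):
    """Raised by the (simulated) hardware attempt when it aborts."""


def run_with_fallback(hardware_attempt, software_fallback,
                      fallback_lock, max_attempts=5):
    """Try the best-effort path up to max_attempts times, then fall back.

    hardware_attempt: callable that returns a result or raises
        TransactionAborted (capacity, conflict, unsupported instruction...).
    software_fallback: callable run under fallback_lock when the
        hardware path keeps aborting.
    """
    for _ in range(max_attempts):
        try:
            return hardware_attempt()
        except TransactionAborted:
            continue  # best-effort: simply try again
    # A real lock-based fallback also needs hardware transactions to
    # check this lock so they abort while a fallback holder is running.
    with fallback_lock:
        return software_fallback()


# Hypothetical usage: a hardware path that always aborts.
def always_abort():
    raise TransactionAborted()


print(run_with_fallback(always_abort, lambda: "fallback", threading.Lock()))
```

With a global lock as `fallback_lock` this corresponds to the TSX-GL shape; replacing `software_fallback` with an STM transaction gives the hybrid shape.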

Compared Techniques - HyTM
Use an STM in the fallback path of TSX:
TSX-TL2: with reduced hardware transactions
TSX-NOrec: simpler, since NOrec has a single lock

STAMP results
Workload Characterization:

Intensity  Benchmark  Time in Tx (%)  Contention
L          kmeans     low (7)         low
L          ssca2      low (17)        low
M          intruder   medium (33)     high
M          vacation   high (89)       low
H          genome     high (97)       low
H          yada       high (99)       medium
H          labyrinth  high (100)      high

Plot labels
All techniques: GL, TSX-GL, TL2, TSX-TL2, NOrec, TSX-NOrec, SwissTM, TinySTM
Highlighted: TSX-GL, TSX-NOrec, TinySTM
Speedup / KJoule along increasing threads

kmeans - low intensity
[Plot: Speedup/Joule vs. threads (1 to 8) for TSX-GL, TSX-NOrec, TinySTM]
Sequential overhead is noticeable
GL allows some concurrency due to L workload
HyTMs lag behind due to the STMs' poor performance

STAMP results - low intensity of transactions
[Plot: % of transactions aborted at 1, 4 and 8 threads for TSX-GL, TSX-TL2, TSX-NOrec; legend: capacity, architectural, conflict, interaction]
1 thread has negligible aborts
STMs have 15% abort rate

intruder - medium intensity
[Plot: Speedup/Joule vs. threads (1 to 8) for TSX-GL, TSX-NOrec, TinySTM]
Binding threads round-robin: > 4t uses hyper-threading
TSX-based approaches suffer from pressure on caches
Best STMs (not TL2) scale regardless
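
The round-robin binding can be pictured with a small mapping from thread index to physical core. This is a sketch under the assumption that threads are spread over the 4 physical cores first and only then placed on the second hardware thread of each core; the slides do not give the actual CPU numbering, and the names below are hypothetical.

```python
PHYSICAL_CORES = 4        # from the study's machine description
THREADS_PER_CORE = 2      # hyper-threading: 8 hardware threads in total


def placement(thread_index: int) -> tuple:
    """Map a 0-based thread index to (physical core, slot on that core)."""
    if not 0 <= thread_index < PHYSICAL_CORES * THREADS_PER_CORE:
        raise ValueError("thread index out of range for this machine")
    return thread_index % PHYSICAL_CORES, thread_index // PHYSICAL_CORES


def uses_hyper_threading(num_threads: int) -> bool:
    """True once at least two threads have to share a physical core."""
    return num_threads > PHYSICAL_CORES


for i in range(8):
    print(i, placement(i))
```

Under this mapping the fifth thread is the first one to share a core, and with it that core's L1 and L2 caches, which is consistent with the cache pressure noted for the TSX-based approaches.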

yada - high intensity
[Plot: Speedup/Joule vs. threads (1 to 8) for TSX-GL, TSX-NOrec, TinySTM]
TSX-GL does not scale
HyTMs follow the trend of the STM counterpart
When time to complete stagnates, power consumption stagnates
  ◮ Logical cores of hyper-threading
  ◮ Allow for additional hardware parallelism
  ◮ Do not consume as much additional power

STAMP results - high intensity of transactions
[Plot: % of transactions aborted for TSX-GL, TSX-TL2, TSX-NOrec; legend: capacity, architectural, conflict, interaction]
Most conflicts are not due to data accesses

STAMP results
Characterization of the Techniques

Intensity  Benchmark  Most Performant            Least Power Consumption
L          kmeans     TSX-GL                     TSX-GL
L          ssca2      TSX-GL                     TSX-GL
M          intruder   TSX-GL ≤ 4t; TinySTM ≥ 5t  TSX-GL ≤ 5t; TinySTM ≥ 6t
M          vacation   TSX-GL ≤ 2t; TinySTM ≥ 3t  TSX-GL ≤ 4t; TinySTM ≥ 5t
H          genome     TinySTM                    TinySTM
H          yada       SwissTM                    TinySTM
H          labyrinth  STMs (except TL2)          STMs (except TL2)
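
The "Most Performant" column can be read as a lookup keyed by benchmark and thread count. The sketch below only encodes the entries reported in the table above; the data layout and the `most_performant` helper are hypothetical, and the 1 to 8 thread range is taken from the machine used in the study.

```python
# Each entry lists (largest thread count, best technique) thresholds in
# increasing order, transcribed from the "Most Performant" column.
MOST_PERFORMANT = {
    "kmeans":    [(8, "TSX-GL")],
    "ssca2":     [(8, "TSX-GL")],
    "intruder":  [(4, "TSX-GL"), (8, "TinySTM")],
    "vacation":  [(2, "TSX-GL"), (8, "TinySTM")],
    "genome":    [(8, "TinySTM")],
    "yada":      [(8, "SwissTM")],
    "labyrinth": [(8, "STMs (except TL2)")],
}


def most_performant(benchmark: str, threads: int) -> str:
    """Return the best-performing technique reported for this setting."""
    if not 1 <= threads <= 8:
        raise ValueError("the study covers 1 to 8 threads")
    for max_threads, technique in MOST_PERFORMANT[benchmark]:
        if threads <= max_threads:
            return technique
    raise AssertionError("unreachable: every entry ends at 8 threads")


print(most_performant("intruder", 4))  # TSX-GL
print(most_performant("intruder", 5))  # TinySTM
```

Laid out this way, the crossover is easy to see: the medium-intensity benchmarks switch from TSX-GL to TinySTM as threads are added, while the high-intensity ones favor STMs throughout.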

STAMP - fine-grained locking
Requires a per-application effort
Reasoning with transactions is meant to simplify programming
Does not change the landscape of performance and power consumption
