SLIDE 1

TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS

Xiangyao Yu, Srini Devadas CSAIL, MIT

SLIDE 2

FOR FIFTY YEARS, WE HAVE RIDDEN MOORE’S LAW

Moore’s Law and the scaling of clock frequency = printing press for the currency of performance

SLIDE 3

TECHNOLOGY SCALING

[Chart, 1970–2010: transistor count (×1000), clock frequency (MHz), power (W), and number of cores]

Each generation of Moore's Law doubles the number of transistors, but clock frequency has stopped increasing.

SLIDE 4

TECHNOLOGY SCALING

[Same chart as Slide 3: transistor count (×1000), clock frequency (MHz), power (W), and number of cores, 1970–2010]

To increase performance, we need to exploit parallelism.

SLIDE 5

DIFFERENT KINDS OF PARALLELISM - 1

Instruction Level

a = b + c
d = e + f
g = d + b

Transaction Level

Txn 1: Read A, Read B, Compute C
Txn 2: Read A, Read D, Compute E
Txn 3: Read C, Read E, Compute F

SLIDE 6

DIFFERENT KINDS OF PARALLELISM - 2

Thread Level

A × B = C: a different thread computes each entry of the product matrix C.
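
A minimal sketch of this thread-level pattern in C++ (matrix size and values are illustrative): one thread is spawned per entry of C, and since the entries are independent the only synchronization needed is the final join.

#include <thread>
#include <vector>
#include <cstdio>

int main() {
    const int n = 3;
    int A[n][n] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int B[n][n] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int C[n][n] = {};

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            workers.emplace_back([&, i, j] {
                int sum = 0;                    // dot product of row i of A and column j of B
                for (int k = 0; k < n; ++k) sum += A[i][k] * B[k][j];
                C[i][j] = sum;                  // each entry is written by exactly one thread
            });
    for (auto& t : workers) t.join();

    for (int i = 0; i < n; ++i)
        printf("%d %d %d\n", C[i][0], C[i][1], C[i][2]);
}

(One thread per entry is only for illustration; real code would use a thread pool or a parallel-for.)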

Task Level

Cloud

Search(“image”)

SLIDE 7

DIFFERENT KINDS OF PARALLELISM - 3

Thread Level

A × B = C: a different thread computes each entry of the product matrix C.

User Level

Cloud

Search(“image”) Query(“record”) Lookup(“data”)

SLIDE 8

DEPENDENCY DESTROYS PARALLELISM

for i = 1 to n
    a[b[i]] = (a[b[i-1]] + b[i]) / c[i]

The i-th entry can only be computed after the (i-1)-th has been computed.

SLIDE 9

DIFFERENT KINDS OF DEPENDENCY

Write A → Read A     RAW: the read needs the new value
Write A → Write A    WAW: semantics decide the order
Read A → Write A     WAR: we have flexibility here!
Read A → Read A      No dependency!
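
The four cases in a minimal C++ fragment (variable names are arbitrary):

void dependency_kinds() {
    int A = 0, r1, r2, r3, r4;

    A = 1;       // write A
    r1 = A;      // read A   -> RAW: the read needs the new value; order is fixed

    A = 2;       // write A
    A = 3;       // write A  -> WAW: program semantics decide which write is last

    r2 = A;      // read A
    A = 4;       // write A  -> WAR: the read only needs the old value, so with
                 //    copies/versions the write does not have to wait

    r3 = A;      // read A
    r4 = A;      // read A   -> no dependency: the reads may run in any order

    (void)r1; (void)r2; (void)r3; (void)r4;
}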

SLIDE 10

DEPENDENCE IS ACROSS TIME, BUT WHAT IS TIME?

  • Time can be physical time
  • Time could correspond to logical timestamps assigned to instructions
  • Time could be a combination of the above

→ Time is a definition of ordering

SLIDE 11

WAR DEPENDENCE

[Diagram: Thread 0 reads A while Thread 1 writes A. Initially A = 10; Thread 1 writes A = 13; Thread 0 still reads its local copy, A = 10.]

The read happens later than the write in physical time but is before the write in logical time.

SLIDE 12

WHAT IS CORRECTNESS?

  • We define correctness of a parallel program based on its outputs in relation to the program run sequentially

SLIDE 13

SEQUENTIAL CONSISTENCY

[Diagram: operations A B C D and candidate global memory orders: A B C D, A C B D, C D A B, C B A D]

Can we exploit this freedom in correct execution to avoid dependency?

SLIDE 14

AVOIDING DEPENDENCY ACROSS THE STACK

Circuit: efficient atomic instructions
Multicore Processor: Tardis coherence protocol
Multicore Database: TicToc concurrency control
Distributed Database: Distributed TicToc
Distributed Shared Memory: transaction processing with fault tolerance

(with Andy Pavlo and Daniel Sanchez)

SLIDE 15

SHARED MEMORY SYSTEMS

Multi-core Processor: Cache Coherence
OLTP Database: Concurrency Control

SLIDE 16

DIRECTORY-BASED COHERENCE

  • Data replicated and cached locally for access
  • Uncached data copied to local cache; writes invalidate data copies

[Diagram: processors (P × 9) sharing memory through a directory]

SLIDE 17

CACHE COHERENCE SCALABILITY

[Diagram: two cores Read A; a Write A triggers invalidations; the directory tracks an O(N) sharer list]

[Chart: storage overhead (0–250%) vs. core count (16–1024); "Today" marks current core counts]

SLIDE 18

LEASE-BASED COHERENCE

  • A read gets a lease on a cacheline
  • Lease renewal after the lease expires
  • A store can only commit after leases expire
  • Tardis: logical leases

[Timeline diagram, t = 0..7: Core 0 and Core 1 issue Ld(A)/St(A); leases marked by Write Timestamp (wts), Read Timestamp (rts), and Program Timestamp (pts) on a logical timestamp axis]

SLIDE 19

LOGICAL TIMESTAMP

[Diagram: invalidation orders versions in physical time; Tardis (no invalidation) orders old and new versions in logical time — a concept borrowed from databases]

SLIDE 20

TIMESTAMP MANAGEMENT

[Diagram: a core with pts = 5, its private cache, and the shared LLC; each cacheline carries a state plus wts and rts (e.g., S 0 10, S 0 5); the span from wts to rts is the lease in logical time]

Program Timestamp (pts): timestamp of the last memory operation
Write Timestamp (wts): data created at wts
Read Timestamp (rts): data valid from wts to rts
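
A minimal single-line sketch of these timestamp rules (coherence states, the LLC, renewals, and the directory are omitted; the constant lease of 10 matches the example on the following slides):

#include <algorithm>
#include <cstdio>

struct Line { int wts = 0, rts = 0, data = 0; };
struct Core { int pts = 0; };

const int LEASE = 10;

int load(Core& c, Line& l) {
    c.pts = std::max(c.pts, l.wts);          // jump forward to the version's creation time
    l.rts = std::max(l.rts, c.pts + LEASE);  // reserve a lease: data valid up to rts
    return l.data;
}

void store(Core& c, Line& l, int v) {
    c.pts = std::max(c.pts, l.rts + 1);      // the new version must begin after all leases
    l.wts = l.rts = c.pts;                   // version created (and so far valid) at pts
    l.data = v;
}

int main() {
    Core c0, c1;
    Line A, B;
    store(c0, A, 1);   // Core 0 store A  -> pts0 = 1, A: wts = rts = 1
    load(c0, B);       // Core 0 load B   -> lease: B.rts = 11
    store(c1, B, 2);   // Core 1 store B  -> pts1 = 12, B: wts = rts = 12 (no invalidation)
    load(c1, A);       // Core 1 load A   -> A.rts = 22, pts1 stays 12
    printf("pts0=%d pts1=%d  A=[%d,%d]  B=[%d,%d]\n",
           c0.pts, c1.pts, A.wts, A.rts, B.wts, B.rts);
    // In the real protocol Core 0 still holds its old cached copy of B (wts=0, rts=11)
    // and can keep reading it at pts0 = 1: two versions coexist, ordered in logical time.
}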

SLIDE 21

TWO-CORE EXAMPLE

[Initial state: Core 0 and Core 1 both at pts = 0; A and B cached in S state with wts = 0, rts = 0]

Physical-time order of operations:
1. Core 0: store A
2. Core 0: load B
3. Core 1: store B
4. Core 1: load A
5. Core 0: load B

SLIDE 22

STORE A @ CORE 0  (step 1)

Core 0 issues ST(A) and receives the line in M state (owner: 0). The write happens at pts = 1: Core 0's pts becomes 1 and A becomes (M, wts = 1, rts = 1).

SLIDE 23

LOAD B @ CORE 0  (step 2)

Core 0 (pts = 1) issues LD(B). The lease is reserved: rts = pts + lease = 11, so B becomes (S, wts = 0, rts = 11).

SLIDE 24

STORE B @ CORE 1  (step 3)

Core 1 issues ST(B); exclusive ownership is returned (owner: 1) with no invalidation of Core 0's copy. The new version must come after the existing lease (rts = 11), so Core 1 jumps to pts = 12 and B becomes (M, wts = 12, rts = 12).

SLIDE 25

TWO VERSIONS COEXIST

Core 0 (pts = 1) still caches the old B (S, wts = 0, rts = 11); Core 1 (pts = 12) holds the new B (M, wts = 12, rts = 12). Core 1 has traveled ahead in time; the versions are ordered in logical time.

SLIDE 26

LOAD A @ CORE 1  (step 4)

Core 1 (pts = 12) issues LD(A). A write-back request goes to Core 0, which downgrades A from M to S; the lease is reserved: rts = pts + lease = 22, so A becomes (S, wts = 1, rts = 22).

SLIDE 27

LOAD B @ CORE 0  (step 5)

Core 0 (pts = 1) loads B again and hits its cached copy (S, wts = 0, rts = 11), which is still valid at pts = 1. This load happens after Core 1's operations in physical time (step 5 follows step 4) but before them in logical timestamp order: global memory order ≠ physical time order.

SLIDE 28

SUMMARY OF EXAMPLE

Directory (physical time order): the RAW dependency on A and the WAR dependency on B are both enforced in physical time, which requires invalidation.

Tardis (physical + logical time order): the RAW dependency on A is preserved, while the WAR dependency on B is satisfied in logical time by letting Core 0 keep reading the old version.

SLIDE 29

PHYSIOLOGICAL TIME

Global memory order of the example — Core 0: store A (1), load B (1), load B (1); Core 1: store B (12), load A (12) — where the numbers are logical timestamps.

Three notions of time: (1) physical time, (2) logical time, (3) physiological time.

Tardis: X <PL Y := X <L Y, or (X =L Y and X <P Y)

Thm: Tardis obeys Sequential Consistency.
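
A minimal sketch of the ordering relation (struct and field names are illustrative):

#include <cstdio>

// Each memory operation carries a logical timestamp and a physical timestamp.
struct Op { long logical; long physical; };

// Physiological order: compare logical time first; break ties with physical time.
bool before(const Op& x, const Op& y) {
    return x.logical < y.logical ||
           (x.logical == y.logical && x.physical < y.physical);
}

int main() {
    Op core0_loadB  {1, 5};    // logical 1, physical step 5
    Op core1_storeB {12, 3};   // logical 12, physical step 3
    // Physically later but logically earlier: load B is ordered before store B.
    printf("%d\n", before(core0_loadB, core1_storeB));   // prints 1
}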

SLIDE 30

TARDIS PROS AND CONS

Pro — scalability: no invalidation, multicast, or broadcast.

Cons and mitigations:
  • Lease renewals → speculative reads
  • Timestamp size → timestamp compression
  • Time standing still → livelock avoidance

SLIDE 31

EVALUATION

Storage overhead per cacheline (N cores):
  • Directory: N bits per cacheline
  • Tardis: max(const, log N) bits per cacheline

[Chart: storage overhead (0–250%) vs. core count (16, 64, 256, 1024) for Directory and Tardis]
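
As a rough back-of-the-envelope check (assuming 64-byte, i.e. 512-bit, cache lines): at N = 1024 cores a full-map directory needs 1024 sharer bits per line, which is already 200% of the data itself, whereas Tardis stores only the wts/rts timestamps plus a log2(1024) = 10-bit owner ID, essentially independent of the core count.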

SLIDE 32

SPEEDUP

Graphite multi-core simulator (64 cores)

[Chart: normalized speedup (0.9–1.3) for DIRECTORY, TARDIS, and TARDIS at 256 cores]

SLIDE 33

NETWORK TRAFFIC

[Chart: normalized traffic (0.8–1.05) for DIRECTORY, TARDIS, and TARDIS at 256 cores]

SLIDE 34

[Comparison of coherence schemes — Snoopy Coherence, Directory Coherence, Optimized Directory, TARDIS — against their costs: storage overhead, performance degradation, complexity, network traffic]

SLIDE 35

CONCURRENCY CONTROL

Serializability

[Diagram: concurrently executing transactions, each running BEGIN ... COMMIT]

Results should correspond to some serial order of atomic execution.

SLIDE 36

CONCURRENCY CONTROL

Can't Have This

[Diagram: an interleaving of BEGIN ... COMMIT transactions that corresponds to no serial order]

Results should correspond to some serial order of atomic execution.

SLIDE 37

BOTTLENECK 1: TIMESTAMP ALLOCATION

  • Centralized Allocator
    – Timestamp allocation is a scalability bottleneck
  • Synchronized Clock
    – Clock skew causes unnecessary aborts

[Chart: throughput (million txn/s, 0–25) vs. thread count (0–80) for T/O and 2PL]
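
To make the first bullet concrete, here is a minimal sketch (not the paper's implementation) of a centralized allocator: every transaction on every thread increments one shared atomic counter, so all threads serialize on that single cache line.

#include <atomic>
#include <thread>
#include <vector>
#include <cstdio>

// Hypothetical centralized timestamp allocator: one global atomic counter.
std::atomic<long> global_ts{0};

long allocate_ts() {
    // Every transaction from every thread contends on this one cache line.
    return global_ts.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    const int threads = 8, txns_per_thread = 1'000'000;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < txns_per_thread; ++i)
                allocate_ts();             // bottleneck: all threads serialize here
        });
    for (auto& w : workers) w.join();
    printf("allocated %ld timestamps\n", global_ts.load());
}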

SLIDE 38

BOTTLENECK 2: STATIC ASSIGNMENT

  • Timestamps are assigned before a transaction starts
  • Suboptimal assignment leads to unnecessary aborts

[Example: T1 reads A, then T2 writes A. With T1@ts=1 and T2@ts=2, both commit; with T1@ts=2 and T2@ts=1, the same schedule forces T2 to abort.]

SLIDE 39

KEY IDEA: DATA DRIVEN TIMESTAMP MANAGEMENT

Traditional T/O (timestamp allocation; static timestamp assignment):
1. Acquire timestamp (TS)
2. Determine tuple visibility using TS

TicToc (no timestamp allocation; dynamic timestamp assignment):
1. Access tuples and remember their timestamp info
2. Compute commit timestamp (CommitTS)

SLIDE 40

TICTOC TRANSACTION EXECUTION

Tuple format: data, wts (write timestamp), rts (read timestamp)
  • wts: the last write to the data happened at wts
  • rts: the last read of the data happened at rts
  • data is valid between wts and rts

BEGIN ... COMMIT:
  • READ PHASE: read and write tuples, execute the transaction
  • VALIDATION PHASE: compute CommitTS, decide commit/abort
  • WRITE PHASE: update the database

SLIDE 41

TICTOC EXAMPLE

Transaction 1: load A, store B, commit?
Transaction 2: load A, load B, commit?

Physical interleaving: (1) T1 load A, (2) T2 load A, (3) T1 store B, (4) T2 load B, (5) T1 commit, (6) T2 commit.

[Diagram: database state (Tuple A and Tuple B, both at version v1, with wts/rts on a logical-time axis 1–4) and each transaction's local state]

SLIDE 42

LOAD A FROM T1  (step 1)

T1 loads a snapshot of tuple A: data, wts, and rts.
SLIDE 43

LOAD A FROM T2  (step 2)

T2 loads a snapshot of tuple A: data, wts, and rts.
SLIDE 44

STORE B FROM T1  (step 3)

T1 stores B to its local write set.

SLIDE 45

LOAD B FROM T2  (step 4)

T2 loads a snapshot of tuple B: data, wts, and rts.
SLIDE 46

COMMIT PHASE OF T1  (step 5)

Compute CommitTS:
  • Write set: tuple.rts + 1 ≤ CommitTS
  • Read set: tuple.wts ≤ CommitTS ≤ tuple.rts
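
A minimal single-threaded sketch of these rules (the per-tuple latching, concurrent rts extension, and logging of a real implementation are omitted; the initial wts/rts values below are made up for illustration):

#include <algorithm>
#include <cstdio>
#include <vector>

struct Tuple      { long wts = 0, rts = 0; int data = 0; };
struct ReadEntry  { Tuple* t; long wts, rts; };   // snapshot of wts/rts taken at read time
struct WriteEntry { Tuple* t; int new_data; };

// Returns true and installs the writes if the transaction can commit.
bool tictoc_commit(std::vector<ReadEntry>& rs, std::vector<WriteEntry>& ws) {
    // 1. Compute CommitTS: after every overwritten tuple's rts,
    //    and no earlier than any read version's wts.
    long commit_ts = 0;
    for (auto& w : ws) commit_ts = std::max(commit_ts, w.t->rts + 1);
    for (auto& r : rs) commit_ts = std::max(commit_ts, r.wts);

    // 2. Validate the read set: each version read must still be valid at CommitTS.
    for (auto& r : rs) {
        if (r.t->wts != r.wts) return false;              // overwritten since we read it: abort
        if (r.t->rts < commit_ts) r.t->rts = commit_ts;   // rts extension (always succeeds here)
    }

    // 3. Write phase: install new versions at CommitTS.
    for (auto& w : ws) {
        w.t->data = w.new_data;
        w.t->wts = w.t->rts = commit_ts;
    }
    return true;
}

int main() {
    Tuple A{1, 1, 10}, B{0, 1, 20};                       // illustrative wts/rts values
    std::vector<ReadEntry>  rs = {{&A, A.wts, A.rts}};    // T1 read A
    std::vector<WriteEntry> ws = {{&B, 21}};              // T1 will overwrite B
    bool ok = tictoc_commit(rs, ws);                      // CommitTS = max(B.rts+1, A.wts) = 2
    printf("committed=%d  A.rts=%ld  B=[%ld,%ld]\n", ok, A.rts, B.wts, B.rts);
    // committed=1, A.rts extended to 2, B becomes wts=rts=2 (cf. "T1 commits @ TS=2" below)
}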

SLIDE 47

COMMIT PHASE OF T1  (continued)

rts extension on tuple A; T1 commits @ TS = 2.

SLIDE 48

COMMIT PHASE OF T1  (continued)

Copy tuple B from the write set to the database (B becomes v2); T1 commits @ TS = 2.

SLIDE 49

COMMIT PHASE OF T2  (step 6)

Compute CommitTS for T2: find a consistent read time for T2 (there are no writes in T2).

SLIDE 50

FINAL STATE

T1 commits @ TS = 2; T2 commits @ TS = 0.

Txn 1 precedes Txn 2 in physical time, but Txn 2 precedes Txn 1 in logical time.

Thm: Serializability = all operations valid at CommitTS.

SLIDE 51

EXPERIMENTAL SETUP

  • DBx1000: main-memory DBMS
    – No logging
    – No B-tree (hash indexing)
  • Concurrency control algorithms
    – MVCC: HEKATON (Microsoft)
    – OCC: SILO (Harvard/MIT)
    – 2PL: DL_DETECT, NO_WAIT
  • 10 GB YCSB benchmark

SLIDE 52

EVALUATION

[Charts: throughput (million txn/s) vs. thread count (0–80) for YCSB under medium and high contention; algorithms: TICTOC, HEKATON, DL_DETECT, NO_WAIT, SILO]

SLIDE 53

TICTOC DISCUSSION

  • Transactions may have the same CommitTS
  • The growth rate of the logical timestamp indicates the inherent parallelism

[Chart: commit timestamp vs. number of committed transactions (0–10,000) for TS ALLOC and for TicToc under high and medium contention]

Thm: Serializability = all operations valid at CommitTS.

SLIDE 54

PHYSIOLOGICAL TIME ACROSS THE STACK

Circuit: efficient atomic instructions
Multicore Processor: Tardis coherence protocol
Multicore Database: TicToc concurrency control
Distributed Database: Distributed TicToc
Distributed Shared Memory: transaction processing with fault tolerance

SLIDE 55

ATOMIC INSTRUCTION (LR/SC)

  • ABA problem
  • Detect ABA using the timestamp (wts)

[Example: Core 0 executes LR(x) and sees x = A; Core 1 then stores x = B and stores x = A again; Core 0's SC(x) still sees x = A — ABA? With wts, the intervening stores advanced the timestamp, so the change is detected.]
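
LR/SC is an ISA-level primitive, so the following is only a rough user-level analogy (the names are made up): the same A→B→A hazard arises with a value-only check, and pairing the value with a version counter, playing the role wts plays above, exposes the intervening writes.

#include <atomic>
#include <cstdio>

// Value packed with a version counter: every store bumps the version,
// so A -> B -> A is distinguishable from "never changed".
struct Versioned { int value; unsigned version; };

std::atomic<Versioned> x{{0, 0}};     // value 0 plays the role of "A"

int main() {
    Versioned seen = x.load();        // like LR: remember the value and its version

    // Stores another core would perform in the meantime:
    x.store({1, seen.version + 1});   // x = B
    x.store({0, seen.version + 2});   // x = A again

    Versioned now = x.load();         // like SC time: re-check x
    bool value_same   = (now.value   == seen.value);    // true  -> a value-only check misses the change
    bool version_same = (now.version == seen.version);  // false -> ABA detected via the counter
    printf("value same: %d, version same: %d\n", value_same, version_same);
}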

SLIDE 56

TARDIS CACHE COHERENCE

  • Simple: no invalidation
  • Scalable:
    – O(log N) storage
    – No broadcast, no multicast
    – No clock synchronization
  • Supports relaxed consistency models
SLIDE 57

T1000: PROPOSED 1000-CORE SHARED MEMORY PROCESSOR

SLIDE 58

TICTOC CONCURRENCY CONTROL

  • Data Driven Timestamp Management
  • No Central Timestamp Allocation
  • Dynamic Timestamp Assignment
SLIDE 59

DISTRIBUTED TICTOC

  • Data Driven Timestamp Management
  • Efficient Two-Phase Commit Protocol
  • Support Local Caching of Remote Data
SLIDE 60

FAULT TOLERANT DISTRIBUTED SHARED MEMORY

  • Transactional Programming Model
  • Distributed Command Logging
  • Dynamic Dependency Tracking Among Transactions (WAR dependency can be ignored)

SLIDE 61

TIME TRAVELING TO ELIMINATE WAR

Xiangyao Yu, Srini Devadas CSAIL, MIT