Java on Scalable Memory Architectures

SLIDE 1

Java on Scalable Memory Architectures

University of Crete, 25th of October 2016

Foivos S. Zakkak

Except where otherwise noted, this presentation is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Third party marks and brands are the property of their respective holders.

SLIDE 2

Outline

1 Introduction
2 Java Distributed Memory Model
3 Designing DiSquawk
4 Modeling DiSquawk
5 Implementation & Evaluation
6 Conclusions

SLIDE 3

1 Introduction: 1.1 Hardware Trends

Moore’s Law, Power Wall & Coherence Protocols

■ The size of a transistor shrinks to half every ~18 months [Krzanich 2015]
■ Due to the power wall, frequency scaling stopped
■ The extra transistors are used for extra cores
■ Cache coherence does not scale in terms of performance and energy consumption [Kaxiras et al. 2010, Choi et al. 2011]

Ph.D. Thesis Defense • F. Zakkak - zakkak@ics.forth.gr
SLIDE 4

1 Introduction: 1.1 Hardware Trends

Emerging Trends

■ Hundreds of cores per U (rack unit)
■ Lack of global memory coherence
■ Faster communication (TCP/IP vs RDMA)
SLIDE 5

1 Introduction: 1.1 Hardware Trends

Architectures

■ EuroServer [Durand et al. 2014] (figure source: Durand et al. 2014)
■ Runnemede [Carter et al. 2013] (figure source: Carter et al. 2013)
SLIDE 6

1 Introduction: 1.2 Programming Languages

Managed Programming Languages

Highlights

■ Consistent behavior across different platforms
■ Abstract away hardware details
■ Shorter time to market (TTM)
■ Portability: “write once run everywhere”

Mechanisms

■ Language Specification
■ Memory Model
■ Process Virtual Machine
SLIDE 8

1 Introduction: 1.2 Programming Languages

Contributions

■ Java Distributed Memory Model (JDMM) [ISMM’14]
  □ Extension of JMM that exposes memory management operations
  □ Adheres to JMM
  □ Allows same optimizations as JMM
■ Process Virtual Machine Design and Implementation [JTRES’16]
  □ Design of novel algorithms for software caching and synchronization
  □ Implementation of a distributed JVM for non-cache-coherent architectures
  □ Evaluation on Formic, a 512-core Emulator
■ Distributed Java Calculus (DJC) [PPPJ’16]
  □ Mathematical model of the implementation
  □ Proof of adherence to JMM

SLIDE 10

2 Java Distributed Memory Model: 2.1 Background

What is a memory model?

■ Formal specification that describes the behavior of a program
■ Models all possible executions of a program (legal or not)
■ Provides rules that determine which executions are legal
SLIDE 11

2 Java Distributed Memory Model: 2.1 Background

What is it good for?

■ Contract between language or machine designers, implementers, and end-users
■ Language requirements regarding interaction of threads through memory
■ Helps language implementers build the runtime and/or the compiler
■ Helps developers understand a program’s behavior
SLIDE 12

2 Java Distributed Memory Model: 2.1 Background

The Java Memory Model

run() { // Thread 1
    // ...
    thread2.start();
    synchronized (object1) {
        object1.field1 = 42;
    }
}

run() { // Thread 2
    synchronized (object1) {
        temp1 = object1.field1;
    }
    thread1.join();
    object1.field1 = 24;
    // ...
}

[Figure: execution graph of the two threads. Node labels: Sp, St, L, W, R, U, J, Fi, B, F. Node types: synchronization action, regular action, cache action. Edges: $\le_{po}$ program order, $\le_{so}$ synchronization order, $\le_{sw}$ synchronizes-with, $\le_{hb}$ happens-before.]
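The two `run()` bodies on this slide can be turned into a small runnable program. The following is a minimal sketch, not code from the thesis: the class `JmmSlideExample`, the holder class `Shared`, and the method `runOnce` are hypothetical names introduced only for illustration. It relies on three standard Java Memory Model guarantees: `Thread.start()` happens-before the started thread's actions, a monitor unlock happens-before a later lock of the same monitor, and a thread's last action happens-before a successful `join()` on it.

```java
final class JmmSlideExample {
    /** Hypothetical stand-in for the slide's object1. */
    static final class Shared {
        int field1;
    }

    /** Runs both threads once; returns {value read by thread 2, final value of field1}. */
    static int[] runOnce() {
        final Shared object1 = new Shared();
        final int[] temp1 = new int[1];
        final Thread[] t = new Thread[2]; // t[0] is thread 1, t[1] is thread 2

        t[1] = new Thread(() -> {
            synchronized (object1) {
                temp1[0] = object1.field1; // sees 0 or 42, depending on lock order
            }
            try {
                t[0].join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            object1.field1 = 24; // ordered after thread 1's write by join()
        });
        t[0] = new Thread(() -> {
            t[1].start();
            synchronized (object1) {
                object1.field1 = 42;
            }
        });

        try {
            t[0].start();
            t[0].join();
            t[1].join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { temp1[0], object1.field1 };
    }

    public static void main(String[] args) {
        int[] result = runOnce();
        System.out.println("thread 2 read " + result[0] + ", final value " + result[1]);
    }
}
```

The value read by thread 2 depends on which thread takes the monitor first, but the final value is always 24 because the write of 24 is ordered after thread 1's write through `join()`.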
SLIDE 31

2 Java Distributed Memory Model: 2.2 The model

JDMM Introduced Well-Formedness Rules

[Figure: diagrams of the well-formedness rules over the actions R, W, F, B, Iv, Vr, and Vw. Node types: synchronization action, regular action, cache action. Edges: $\le_{po}$ program order, $\le_{so}$ synchronization order, $\le_{sw}$ synchronizes-with, $\le_{hb}$ happens-before.]
SLIDE 42

2 Java Distributed Memory Model: 2.2 The model

Formalization

JMM Execution

$E = \langle P, A, \le_{po}, \le_{so}, W(), V(), \le_{sw}, \le_{hb} \rangle$

JDMM Execution

$E_D = \langle P, A_D, \le^{d}_{po}, \le^{d}_{so}, W(), V(), \le^{d}_{sw}, \le^{d}_{hb}, Cs(), Bf(), Ab(), Ai() \rangle$

JMM & JDMM Rules

$\forall r \in A_D : \exists x \in A_D : (x \le^{d}_{po} r) \land (x.v = r.v) \land (x.k \in \{W, F\})$
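The rule shown on this slide can be checked mechanically on a single thread's action sequence. The sketch below is only an illustration under stated assumptions: it takes `r` to range over read actions, represents program order as list order, and uses hypothetical names (`ReadRuleCheck`, `Action`, `everyReadHasSource`) that do not come from the thesis. The fields `k` (kind) and `v` (variable) mirror the rule's notation.

```java
import java.util.Arrays;
import java.util.List;

final class ReadRuleCheck {
    /** Action kinds used on the slides: read, write, fetch, write-back, invalidate. */
    enum Kind { R, W, F, B, IV }

    /** One action with its kind k and the variable v it touches. */
    static final class Action {
        final Kind k;
        final String v;

        Action(Kind k, String v) {
            this.k = k;
            this.v = v;
        }
    }

    /** True if every read is preceded in program order by a W or F on the same variable. */
    static boolean everyReadHasSource(List<Action> programOrder) {
        for (int i = 0; i < programOrder.size(); i++) {
            Action r = programOrder.get(i);
            if (r.k != Kind.R) {
                continue; // assumption: the rule only constrains reads
            }
            boolean found = false;
            for (int j = 0; j < i && !found; j++) {
                Action x = programOrder.get(j);
                found = x.v.equals(r.v) && (x.k == Kind.W || x.k == Kind.F);
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Action> ok = Arrays.asList(new Action(Kind.F, "o1.f1"), new Action(Kind.R, "o1.f1"));
        List<Action> bad = Arrays.asList(new Action(Kind.R, "o1.f1"));
        System.out.println(everyReadHasSource(ok));  // prints true
        System.out.println(everyReadHasSource(bad)); // prints false
    }
}
```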
SLIDE 46

2 Java Distributed Memory Model: 2.2 The model

JDMM Properties

■ Allows same optimizations as JMM
■ Satisfies the same causality test cases as JMM
  □ Small tests, testing the conformance of the implementation and specification to the expected behaviour
  □ Behaviors extracted by JVM implementations at the time of JMM specification
SLIDE 47

2 Java Distributed Memory Model: 2.2 The model

Implementation Advice Based on JDMM

■ Avoid sync on nested monitor acquisition
■ Avoid local caching
■ Avoid sync at context switching
■ Sync at thread migration
■ Direct cache-to-cache transfers

SLIDE 49

3 Designing DiSquawk

Memory Management
SLIDE 51

3 Designing DiSquawk

Software Caching

■ Private write caches per thread
  □ Write-back on capacity misses
  □ Write-back at release operations
■ Shared read caches per core
  □ Invalidate on capacity misses
  □ Invalidate on acquire operations
  □ Update on write-back

[Figure: timelines for threads T1 and T2, with the steps m-enter, write, m-exit and m-enter, read, m-exit.]
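The caching policy listed on this slide can be illustrated with a small single-threaded simulation. This is a sketch under assumptions, not DiSquawk's implementation: all names (`SoftwareCacheSketch`, `Core`, `SimThread`, `monitorEnter`, `monitorExit`) are hypothetical, capacity misses are ignored, fields default to 0, and "update on write-back" is modeled as updating the read cache of the writing thread's own core.

```java
import java.util.HashMap;
import java.util.Map;

final class SoftwareCacheSketch {
    /** A core with a read cache shared by the threads running on it. */
    static final class Core {
        final Map<String, Integer> heap; // stand-in for the global heap
        final Map<String, Integer> readCache = new HashMap<>();

        Core(Map<String, Integer> heap) {
            this.heap = heap;
        }
    }

    /** A simulated thread with a private write cache. */
    static final class SimThread {
        final Core core;
        final Map<String, Integer> writeCache = new HashMap<>();

        SimThread(Core core) {
            this.core = core;
        }

        void write(String field, int value) {
            writeCache.put(field, value); // buffered until the next release
        }

        int read(String field) {
            Integer own = writeCache.get(field);
            if (own != null) {
                return own; // a thread sees its own pending writes
            }
            Integer cached = core.readCache.get(field);
            if (cached != null) {
                return cached; // possibly stale until the next acquire
            }
            int fetched = core.heap.getOrDefault(field, 0);
            core.readCache.put(field, fetched);
            return fetched;
        }

        /** Acquire: invalidate the core's read cache. */
        void monitorEnter() {
            core.readCache.clear();
        }

        /** Release: write back pending writes and update the local read cache. */
        void monitorExit() {
            core.heap.putAll(writeCache);
            core.readCache.putAll(writeCache);
            writeCache.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> heap = new HashMap<>();
        SimThread t1 = new SimThread(new Core(heap));
        SimThread t2 = new SimThread(new Core(heap));

        System.out.println(t2.read("o1.f1")); // prints 0
        t1.monitorEnter();
        t1.write("o1.f1", 42);
        t1.monitorExit();
        System.out.println(t2.read("o1.f1")); // prints 0 (stale cached copy)
        t2.monitorEnter();
        System.out.println(t2.read("o1.f1")); // prints 42
    }
}
```

The second read returns the stale value on purpose: without an acquire, nothing forces the reader's core to drop its cached copy, which is the behavior the acquire/release rules permit.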
SLIDE 54

3 Designing DiSquawk

Synchronization

■ Synchronization Managers (on dedicated cores)
■ Queuing requests when monitor not available
■ Co-located threads may reuse monitors to reduce network traffic (combining)
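The queuing behavior of a synchronization manager can be sketched as a sequential request handler. This is a simplified illustration, not the DiSquawk protocol: the class `SyncManagerSketch` and its methods are hypothetical, requests are assumed to be handled one at a time by the manager, and reentrant acquisition and combining are left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class SyncManagerSketch {
    private final Map<Integer, Integer> owner = new HashMap<>();
    private final Map<Integer, Queue<Integer>> waiting = new HashMap<>();

    /** Returns true if the monitor is granted now; otherwise the request is queued. */
    boolean acquire(int monitorId, int threadId) {
        if (!owner.containsKey(monitorId)) {
            owner.put(monitorId, threadId);
            return true;
        }
        waiting.computeIfAbsent(monitorId, k -> new ArrayDeque<>()).add(threadId);
        return false;
    }

    /** Releases the monitor; returns the next owner's id, or -1 if the monitor is now free. */
    int release(int monitorId, int threadId) {
        Integer current = owner.get(monitorId);
        if (current == null || current != threadId) {
            throw new IllegalStateException("thread " + threadId + " does not own monitor " + monitorId);
        }
        Queue<Integer> queue = waiting.get(monitorId);
        Integer next = (queue == null) ? null : queue.poll();
        if (next == null) {
            owner.remove(monitorId);
            return -1;
        }
        owner.put(monitorId, next); // hand the monitor to the oldest waiter
        return next;
    }

    public static void main(String[] args) {
        SyncManagerSketch manager = new SyncManagerSketch();
        System.out.println(manager.acquire(7, 1)); // prints true
        System.out.println(manager.acquire(7, 2)); // prints false (queued)
        System.out.println(manager.release(7, 1)); // prints 2
        System.out.println(manager.release(7, 2)); // prints -1
    }
}
```

Handing the monitor directly to the oldest queued request keeps the manager's reply to one message per waiter, which is one plausible reason for queuing instead of letting threads retry.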
SLIDE 55

3 Designing DiSquawk

Scheduling

■ New thread start
  1. Pick a random core from the same island
  2. If its deque is full, pick a random island
■ Work-stealing (asynchronous) within coherent islands
  1. Attempt ⌈√(#cores per island)⌉ steals
■ Work-dealing (synchronous) across coherent islands
  1. Attempt ⌈√(#coherent islands)⌉ deals
  2. Get half of the tasks
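The attempt counts and the "get half of the tasks" step above can be written down in a few lines. The sketch below is illustrative only: `SchedulingSketch`, `attempts`, and `takeHalf` are hypothetical names, and both the rounding of "half" (rounded down) and the end of the deque that tasks are taken from are assumptions the slide does not specify.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SchedulingSketch {
    /** Number of steal or deal attempts for n candidates: the ceiling of sqrt(n). */
    static int attempts(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return (int) Math.ceil(Math.sqrt(n));
    }

    /** Removes half of the victim's tasks (rounded down, an assumption) and returns them. */
    static <T> List<T> takeHalf(Deque<T> victim) {
        int count = victim.size() / 2;
        List<T> taken = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            taken.add(victim.pollFirst()); // taking from the head is an assumption
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(attempts(8));  // prints 3
        System.out.println(attempts(64)); // prints 8

        Deque<Integer> victim = new ArrayDeque<>();
        for (int task = 1; task <= 5; task++) {
            victim.add(task);
        }
        System.out.println(takeHalf(victim)); // prints [1, 2]
        System.out.println(victim.size());    // prints 3
    }
}
```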
SLIDE 56

3 Designing DiSquawk

Thread.start()

1. Weak thread creation
2. Send thread start request to random core
3. Local thread initialization
4. Local stack allocation

■ Running thread operates on VMThread class
■ Remote threads synchronize with Thread class

[Figure: labels Thread state, VMThread, VMThread stack.]

SLIDE 58

4 Modeling DiSquawk

Approach

■ Definition of Distributed Java Calculus (DJC), a Java core calculus
  □ Explicit cache operations
  □ Locking & Synchronization actions
■ Operational semantics reflecting DiSquawk’s design
SLIDE 59

4 Modeling DiSquawk

Modeling DiSquawk

[Figure: memory state during the execution. The heap $\mathcal{H}$ holds o1.f1, o1.f2, o2.f1, and o3.f1; the per-core states $\mathcal{C}_1$, $\mathcal{D}_1$ and $\mathcal{C}_2$, $\mathcal{D}_2$ hold copies of o1.f1 and o1.f2.]

run() { // Thread 1
    // ...
    o1.f2++;
}

run() { // Thread 2
    int temp;
    // ...
    temp = o1.f1;
    thread1.join();
    temp = o1.f2;
}

Thread 1 actions: F R W B Fi

Thread 2 actions: F R Iv J F R Fi
SLIDE 72

4 Modeling DiSquawk

Distributed Java Calculus

DJC Memory State

$\mathcal{H}; \vec{\mathcal{C}}; \vec{\mathcal{D}}$

DJC Execution Step

$\mathcal{H}; \vec{\mathcal{C}}; \vec{\mathcal{D}} \vdash T \xrightarrow{\vec{\alpha}}_{\vec{c}} \mathcal{H}'; \vec{\mathcal{C}}'; \vec{\mathcal{D}}' \vdash T'$

SLIDE 76

5 Implementation & Evaluation: 5.1 Implementation

DiSquawk

■ Runs on the Formic-Cube [Lyberis et al. 2012]
■ Bare-metal
■ Single System Image (SSI)
■ Based on Squawk [Simon et al. 2006]
  □ Not OS dependent
  □ Targets single-core embedded devices
■ Uses Myrmics’ core libraries [Lyberis 2013]
SLIDE 77

5 Implementation & Evaluation: 5.1 Implementation

The Formic-Cube

■ 64 boards
■ 8 cores per board
■ 128 MiB per board
■ 512 cores in total
■ 8 GiB in total
■ No memory coherence (not even on the same board)
■ Emulator (CPU clock @ 10 MHz)
slide-78
SLIDE 78

5 Implementation & Evaluation 5.1 Implementation 23/ 31

Limitations

■ Garbage Collection
■ Files / File-system
■ Sockets / Network
■ java.util.concurrent
■ Coherent Islands
■ JIT compilation
slide-79
SLIDE 79

5 Implementation & Evaluation 5.2 Evaluation 24/ 31

Methodology

■ 4 benchmarks (Java Grande and PARSEC) + microbenchmarks
■ Compare scale factor against the HotSpot VM
□ x86 64-core NUMA machine
□ Disabled JIT compilation (-Xint)
□ Run in server mode (-server)
□ Tuned heap size to avoid garbage collection
■ Evaluate weak scaling
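Under weak scaling the work per thread stays fixed as threads are added, so the ideal execution time is constant and the ideal throughput grows linearly with the thread count. The sketch below shows one way a throughput-scaling figure could be derived from two timings. The class and method names are hypothetical, and the sample timings are made up for illustration; they are not measurements from this evaluation.

```java
// Hypothetical helper for weak-scaling results; not part of DiSquawk.
class WeakScaling {
    // With n threads, n units of work finish in tNSeconds, versus one unit in t1Seconds
    // on a single thread, so scaling = (n / tN) / (1 / t1) = n * t1 / tN.
    static double throughputScaling(int threads, double t1Seconds, double tNSeconds) {
        return threads * t1Seconds / tNSeconds;
    }

    public static void main(String[] args) {
        // Made-up timings: perfect scaling, then a run that is 25% slower at 64 threads.
        System.out.println(throughputScaling(64, 10.0, 10.0)); // prints 64.0
        System.out.println(throughputScaling(64, 10.0, 12.5)); // prints 51.2
    }
}
```

A value equal to the thread count corresponds to the "Linear" reference line in the scaling plots.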
slide-80
SLIDE 80

5 Implementation & Evaluation 5.2 Evaluation 25/ 31

SOR (Successive Over-Relaxation)

■ Stencil computation
■ Volatile accesses at every iteration
■ Memory intensive
■ 143% DiSquawk overhead on 1 thread

[Figure: normalized execution time (baseline and overhead); throughput scaling vs. number of Java threads (1 per core, 1 to 504) for Linear, HotSpot, and DiSquawk]
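For readers unfamiliar with the kernel, a successive over-relaxation sweep updates every interior grid point from its four neighbours, which is what makes it a stencil computation. The following is a minimal sequential sketch; the class name SorSweep and the method signature are assumptions, and it deliberately omits the multi-threaded row partitioning and the volatile-based synchronization mentioned on the slide.

```java
// Hypothetical sequential sketch of one SOR relaxation sweep; not the benchmark's code.
class SorSweep {
    // Updates every interior point of g in place from its four neighbours.
    static void sweep(double[][] g, double omega) {
        double omegaOverFour = omega * 0.25;
        double oneMinusOmega = 1.0 - omega;
        for (int i = 1; i < g.length - 1; i++) {
            for (int j = 1; j < g[i].length - 1; j++) {
                g[i][j] = omegaOverFour
                        * (g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1])
                        + oneMinusOmega * g[i][j];
            }
        }
    }

    public static void main(String[] args) {
        double[][] g = {
            {1.0, 1.0, 1.0},
            {1.0, 0.0, 1.0},
            {1.0, 1.0, 1.0}
        };
        sweep(g, 1.0);
        System.out.println(g[1][1]); // prints 1.0
    }
}
```

Each update reads four neighbouring values and writes one, so the kernel does little arithmetic per memory access, which is consistent with the slide calling it memory intensive.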
slide-81
SLIDE 81

5 Implementation & Evaluation 5.2 Evaluation 26/ 31

Crypt

■ 2-phase (1. Encrypt 2. Decrypt)
■ Synchronization through barrier
■ Memory intensive
■ 181% DiSquawk overhead on 1 thread

[Figure: normalized execution time (baseline and overhead); throughput scaling vs. number of Java threads (1 per core, 1 to 504) for Linear, HotSpot, and DiSquawk]
slide-82
SLIDE 82

5 Implementation & Evaluation 5.2 Evaluation 27/ 31

Black-Scholes

■ Embarrassingly parallel
■ Computation intensity > Memory intensity
■ 72% DiSquawk overhead on 1 thread

[Figure: normalized execution time (baseline and overhead); throughput scaling vs. number of Java threads (1 per core, 1 to 504) for Linear, HotSpot, and DiSquawk]
slide-83
SLIDE 83

5 Implementation & Evaluation 5.2 Evaluation 28/ 31

Series

■ Fourier coefficients calculation
■ Embarrassingly parallel
■ Compute intensive
■ 16% DiSquawk overhead on 1 thread

[Figure: normalized execution time (baseline and overhead); throughput scaling vs. number of Java threads (1 per core, 1 to 504) for Linear, HotSpot, and DiSquawk]
slide-84
SLIDE 84

5 Implementation & Evaluation 5.2 Evaluation 29/ 31

Overhead Breakdown

[Figure: normalized execution time for SOR, Crypt, Black-Scholes, and Series, broken down into Baseline, SC Overhead, and SM Overhead]
slide-86
SLIDE 86

6 Conclusions 6.1 Takeaways 30/ 31

Takeaways

■ We can run Java on non-cache-coherent architectures with hundreds of cores
■ If you are willing to follow our example:
□ download DiSquawk and port it to your platform
or
□ start by understanding JDMM
□ study our design and let us know of any optimizations you might think of
□ use DJC to argue about your implementation’s correctness
slide-87
SLIDE 87

6 Conclusions 6.2 Open Research Problems 31/ 31

Open Research Problems

■ Machine-checked proofs
■ Implementation & Evaluation on partially coherent architectures (e.g., EUROSERVER [Durand et al. 2014])
■ Use of a state-of-the-art JVM as the base
■ On-the-fly/Concurrent Garbage Collection
■ Study of java.util.concurrent and porting to future many-core architectures (started in GreenVM project)
■ Java Memory Model update (ongoing JEP-188)

Thank You!

slide-89
SLIDE 89

7 Backup Slides 7.1 Publications 31/ 31

Publications

Peer reviewed

■ Java Distributed Memory Model in ISMM’14
■ Distributed Java Calculus in PPPJ’16
■ DiSquawk Design in JTRES’16

Other

■ DiSquawk open-source @github
■ DJC technical report
slide-90
SLIDE 90

7 Backup Slides 7.1 Publications 31/ 31

Conferences

ISMM’14

■ International Symposium on Memory Management
■ Co-located with PLDI
slide-91
SLIDE 91

7 Backup Slides 7.1 Publications 31/ 31

Conferences

PPPJ’16

■ 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools
■ Part of Managed Languages & Runtimes week ’16
slide-92
SLIDE 92

7 Backup Slides 7.1 Publications 31/ 31

Conferences

JTRES’16

■ 14th International Workshop on Java Technologies for Real-time and Embedded Systems
■ Part of Managed Languages & Runtimes week ’16
slide-93
SLIDE 93

7 Backup Slides 7.1 Publications 31/ 31

Synchronization

■ Synchronization Managers (on dedicated cores)
■ Queuing requests when monitor not available
■ Co-located threads may reuse monitors to reduce network traffic (combining)
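The queuing behaviour can be illustrated with a small, single-threaded model of a manager's request handling. This is a sketch under stated assumptions, not DiSquawk's implementation: the class MonitorManager, its method names, the integer ids, and the FIFO hand-off order are all hypothetical, and neither reentrant acquisition nor the combining optimization is modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a synchronization manager; each call stands for one handled request.
class MonitorManager {
    private final Map<Integer, Integer> owner = new HashMap<>();          // monitor id -> owning thread id
    private final Map<Integer, Deque<Integer>> waiting = new HashMap<>(); // monitor id -> queued thread ids

    // Handles an acquire request: returns true if granted immediately, false if queued.
    boolean acquire(int monitorId, int threadId) {
        if (!owner.containsKey(monitorId)) {
            owner.put(monitorId, threadId);
            return true;
        }
        waiting.computeIfAbsent(monitorId, k -> new ArrayDeque<>()).addLast(threadId);
        return false;
    }

    // Handles a release: hands the monitor to the oldest queued requester, if any.
    // Returns the new owner's thread id, or -1 if the monitor is now free.
    int release(int monitorId, int threadId) {
        Integer current = owner.get(monitorId);
        if (current == null || current != threadId) {
            throw new IllegalStateException(
                    "thread " + threadId + " does not own monitor " + monitorId);
        }
        Deque<Integer> queue = waiting.get(monitorId);
        if (queue == null || queue.isEmpty()) {
            owner.remove(monitorId);
            return -1;
        }
        int next = queue.pollFirst();
        owner.put(monitorId, next);
        return next;
    }
}
```

For example, if thread 1 acquires monitor 7 and threads 2 and 3 then request it, both requests are queued; when thread 1 releases, the model grants the monitor to thread 2, and thread 3 follows on the next release.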
slide-94
SLIDE 94

7 Backup Slides 7.1 Publications 31/ 31

Formalization

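DJC Local Execution Step Example

$$\text{[Assign]}\quad \frac{r \in \mathrm{dom}(\mathcal{H}) \qquad \neg\mathrm{volatile}(r.f) \qquad \mathcal{D}' = \mathcal{D}[r.f \mapsto v]}{\mathcal{H};\ \mathcal{C};\ \mathcal{D} \vdash c\langle r_t,\ r.f := v\rangle \xrightarrow{W} \mathcal{H};\ \mathcal{C};\ \mathcal{D}' \vdash c\langle r_t,\ v\rangle}$$

DJC Global Execution Step Example

$$\text{[Spawn]}\quad \frac{\mathcal{H}(r_t') = \langle C, \overrightarrow{f \mapsto v}\rangle \qquad \mathcal{H}'(r_t') = \langle C, \overrightarrow{f \mapsto v}, \mathrm{spawned}\rangle \qquad \texttt{run()\{return } e\texttt{;\}} \in C \qquad \vec{\mathcal{D}}(c) = \emptyset \qquad c' \in \mathit{Cids}}{\mathcal{H};\ \vec{\mathcal{C}};\ \vec{\mathcal{D}} \vdash c\langle r_t,\ r_t'.\texttt{start()}\rangle \xrightarrow{\{Sp\}}_{\{c\}} \mathcal{H}';\ \vec{\mathcal{C}};\ \vec{\mathcal{D}} \vdash c\langle r_t,\ ()\rangle \parallel c'\langle r_t',\ \mathit{start}\rangle}$$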
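One way to read the [Assign] rule is as a function on the per-core state: if the target reference is in the heap's domain and the field is not volatile, the step leaves the heap and the first per-core map untouched and only records the new value in the second per-core map. The sketch below encodes that reading for a single core. The class name AssignStep, the string keys of the form "ref.field", and the use of Java collections are all assumptions made for illustration and are not taken from the DJC formalization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of the [Assign] rule for one core; not the authors' formal model.
class AssignStep {
    // Returns the updated per-core map (the old map extended with "r.f" -> v).
    // The heap domain and the input map are left unmodified.
    static Map<String, Integer> assign(Set<String> heapDomain,     // references in the heap's domain
                                       Set<String> volatileFields, // keys "r.f" declared volatile
                                       Map<String, Integer> d,     // per-core map before the step
                                       String r, String f, int v) {
        if (!heapDomain.contains(r)) {
            throw new IllegalArgumentException("premise violated: " + r + " is not in the heap");
        }
        String key = r + "." + f;
        if (volatileFields.contains(key)) {
            throw new IllegalArgumentException("premise violated: " + key + " is volatile");
        }
        Map<String, Integer> dPrime = new HashMap<>(d);
        dPrime.put(key, v);
        return dPrime;
    }

    public static void main(String[] args) {
        Map<String, Integer> d = new HashMap<>();
        Map<String, Integer> dPrime = assign(Set.of("o1"), Set.of(), d, "o1", "f2", 1);
        System.out.println(dPrime); // prints {o1.f2=1}
    }
}
```

Returning a fresh map rather than mutating the argument mirrors the rule's distinction between the state before and after the step.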
