DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert - PowerPoint PPT Presentation

ROME Workshop, August 23, 2016 DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert Kuban, Mark Simon Sch¨ ops, J¨ org Nolte, Randolf Rotta rottaran@b-tu.de 1 Research supported by German BMBF grant 01IH13003.

PROBLEM: MEMORY LATENCY ON INTEL XEON PHI KNC Example: Measuring avg. time is unstable between restarts Affects: micro-benchmarks, algorithm tuning, developer’s sanity. . . also application performance? ⇒ Outline 1. Causes? 2. Solutions? 3. Is it worthwhile? 2

OUTLINE 1. Causes? 2. Solutions? 3. Is it worthwhile? 4. Conclusions 1 · Causes? 3

CAUSES: MULTIPLE PERFORMANCE BOTTLENECKS 1 1. compute bound core core cache cache 2. memory throughput : streaming, matrix alg. 4,5 3. memory latency : key-value stores, graphs coherence directory 4. coherence latency : synchronisation variable 2,3 5. coherence throughput : many sync. variables memory 1 · Causes? 4

HW SOLUTION: STRIPING TO MAXIMISE THROUGHPUT 1. striping over memory channels, banks, and coherence directories 2. past: NUMA throughput bottlenecks ⇒ mostly local striping 3. many-cores: no throughput bottlenecks but larger network core core core core cache cache cache cache directory directory directory memory memory memory memory memory memory 1 · Causes? 5

HW SOLUTION: STRIPING TO MAXIMISE THROUGHPUT 1. striping over memory channels, banks, and coherence directories 2. past: NUMA throughput bottlenecks ⇒ mostly local striping 3. many-cores: no throughput bottlenecks but larger network core core core core cache cache cache cache directory directory directory directory memory memory memory memory memory memory 1 · Causes? 5

INTEL XEON PHI KNC IN DETAIL 4 threads 4 threads 57 - 61 cores L2 cache L2 cache 2x 2x memory memory directory 64 directories directory ctrl ctrl 2x 2x memory memory bi-directional ring network ctrl ctrl memory striping by (PhysAddr/62)&0xF . 1 avg. remote L2 read ≈ 240 cycles, contention > 16 threads. 2 some lines near to memory, up to 28% app. speedup possible. 3 1John McCalpin: https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/586138 2Ramos et al: Modeling communication in cache-coherent SMP systems: A case-study with Xeon Phi. 3Balazs Gerofi et al: Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs 1 · Causes? 6

OUTLINE 1. Causes? 2. Solutions? 3. Is it worthwhile? 4. Conclusions 2 · Solutions? 7

REVERSE ENGINEERING KNC’S DIRECTORY STRIPING measure: fetch line currently owned by neighbour L2 two cores, two lines: one for measurement, other for coordination minimum RDTSC cycles, MyThOS kernel as bare-metal env. core core L2 cache L2 cache 3 2 1 directory core core L2 cache 3 L2 cache 2 directory 1 2 · Solutions? 8

RESULTS: PSEUDO-RANDOMLY SCATTERED ≈ 140 cycles best case vs. ≈ 400 cycles worst case 400 latency from core 0 to 1 300 200 100 0 256 512 768 1024 cache line 2 · Solutions? 9

RESULTS: RECONSTRUCTED MAPPING OF LINES TO DIRECTORIES Enables quick initialisation without measurements 400 latency from core 0 to 1 300 200 100 0 16 32 48 64 tag directory 2 · Solutions? 10

IMPLICATIONS Support in the MyThOS kernel per page: base address for line �→ directory per node: balanced mapping for directory �→ nearby core kernel objects can allocate local lines for sync. vars. Application challenges avoid > 16 threads accessing same line co-locate dependent tasks squeeze synchronisation into cache lines no easy migration after allocation 2 · Solutions? 11

OUTLINE 1. Causes? 2. Solutions? 3. Is it worthwhile? 4. Conclusions 3 · Is it worthwhile? 12

DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert - PowerPoint PPT Presentation

ROME Workshop, August 23, 2016 DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert Kuban, Mark Simon Sch ops, J org Nolte, Randolf Rotta rottaran@b-tu.de 1 Research supported by German BMBF grant 01IH13003. PROBLEM: MEMORY

Curriculum on The Cadet Corps Uniform Class A Uniform Class A Uniform Agenda C1. Class A

Obfuscation: know your enemy Ninon EYROLLES neyrolles@quarkslab.com Serge GUELTON

Curriculum on The Cadet Corps Uniform Wear It WIth honor Class C Uniform Class C Uniform

Non-Uniform Computation Lecture 10 Non-Uniform Computational Models: Circuits 1 Non-Uniform

Obfuscation Lecture 26 Different Flavours VBB Obfuscation Note: Considers only corrupt

HOST Circuit Obfuscation I ECE 525 Hardware Obfuscation (Drawn from "Hardware Protection

Dealing With The Irate Customer Dealing With The Irate Customer Dealing with difficult

Circuits Lecture 11 Uniform Circuit Complexity 1 Recall 2 Recall Non-uniform complexity 2

Non-Uniform Computation & Circuits Lecture 10 Wherein every language can be decided 1

OBFUSCURO : : A Commodity Obfuscation Engine for Intel SGX Adil Ahmad , Byunggill Joe, Yuan

Obfuscation from LWE? proofs, attacks, candidates Hoeteck Wee CNRS & ENS . . . . . .

Authorship Obfuscation Using Heuristic Search Masters Thesis Defence by Janek Bevendorff on 20

Lattice-Based SNARGs and Their Application to More Efficient Obfuscation Dan Boneh, Yuval Ishai,

) UNION SELECT `This_Talk` AS ('New Optimization and Obfuscation Optimization and Obfuscation

Engineering Code Obfuscation ISSISP 2017 - Obfuscation I Christian Collberg Department of

An NFR Pattern Approach to Dealing An NFR Pattern Approach to Dealing An NFR Pattern Approach to

Fluid/solid coupled convection/diffusion in unidirectional flows. Charles Pierre 1 , Franck

Access Control in Untrusted Cloud Storage using Unidirectional Re-encryption Zach Kissel, Jie

Advanced Election Techniques in Rings Eero Hkkinen 2007-02-21 Advanced Election Techniques in

Unicast Conjecture in Network Coding Zongpeng Li What is Network

Bound states in PT -symmetric layers Radek Nov ak Department of Physics Faculty of Nuclear

Advanced Interconnects In-house Bump Bonding Helmholtz Program: Matter and Technologies PoF III

Cellular Neural Networks and Least Squares for partial differential problems parallel solving

AIRS/IASI Radiance Comparisons Tom Pagano George Aumann Steve Broberg NASA AIRS Project Office

DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert - PowerPoint PPT Presentation

ROME Workshop, August 23, 2016 DEALING WITH LAYERS OF OBFUSCATION IN PSEUDO-UNIFORM MEMORY Robert Kuban, Mark Simon Sch ops, J org Nolte, Randolf Rotta rottaran@b-tu.de 1 Research supported by German BMBF grant 01IH13003. PROBLEM: MEMORY

Curriculum on The Cadet Corps Uniform Class A Uniform Class A Uniform Agenda C1. Class A

Obfuscation: know your enemy Ninon EYROLLES neyrolles@quarkslab.com Serge GUELTON

Curriculum on The Cadet Corps Uniform Wear It WIth honor Class C Uniform Class C Uniform

Non-Uniform Computation Lecture 10 Non-Uniform Computational Models: Circuits 1 Non-Uniform

Obfuscation Lecture 26 Different Flavours VBB Obfuscation Note: Considers only corrupt

HOST Circuit Obfuscation I ECE 525 Hardware Obfuscation (Drawn from &quot;Hardware Protection

Dealing With The Irate Customer Dealing With The Irate Customer Dealing with difficult

Circuits Lecture 11 Uniform Circuit Complexity 1 Recall 2 Recall Non-uniform complexity 2

Non-Uniform Computation &amp; Circuits Lecture 10 Wherein every language can be decided 1

OBFUSCURO : : A Commodity Obfuscation Engine for Intel SGX Adil Ahmad *, Byunggill Joe*, Yuan

Obfuscation from LWE? proofs, attacks, candidates Hoeteck Wee CNRS &amp; ENS . . . . . .

Authorship Obfuscation Using Heuristic Search Masters Thesis Defence by Janek Bevendorff on 20

Lattice-Based SNARGs and Their Application to More Efficient Obfuscation Dan Boneh, Yuval Ishai,

) UNION SELECT `This_Talk` AS ('New Optimization and Obfuscation Optimization and Obfuscation

Engineering Code Obfuscation ISSISP 2017 - Obfuscation I Christian Collberg Department of

An NFR Pattern Approach to Dealing An NFR Pattern Approach to Dealing An NFR Pattern Approach to

Fluid/solid coupled convection/diffusion in unidirectional flows. Charles Pierre 1 , Franck

Access Control in Untrusted Cloud Storage using Unidirectional Re-encryption Zach Kissel, Jie

Advanced Election Techniques in Rings Eero Hkkinen 2007-02-21 Advanced Election Techniques in

Unicast Conjecture in Network Coding Zongpeng Li What is Network

Bound states in PT -symmetric layers Radek Nov ak Department of Physics Faculty of Nuclear

Advanced Interconnects In-house Bump Bonding Helmholtz Program: Matter and Technologies PoF III

Cellular Neural Networks and Least Squares for partial differential problems parallel solving

AIRS/IASI Radiance Comparisons Tom Pagano George Aumann Steve Broberg NASA AIRS Project Office

HOST Circuit Obfuscation I ECE 525 Hardware Obfuscation (Drawn from "Hardware Protection

Non-Uniform Computation & Circuits Lecture 10 Wherein every language can be decided 1

OBFUSCURO : : A Commodity Obfuscation Engine for Intel SGX Adil Ahmad , Byunggill Joe, Yuan

Obfuscation from LWE? proofs, attacks, candidates Hoeteck Wee CNRS & ENS . . . . . .