Memory Hierarchies

[FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache-Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, 2012.
[BFJ02] Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob. Cache-Oblivious Search Trees via Binary Trees of Small Height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 39-48, 2002.
[JM13] Tomasz Jurkiewicz, Kurt Mehlhorn. The Cost of Address Translation. In Proc. 15th Annual Meeting on Algorithm Engineering & Experiments (ALENEX), 148-162, 2013.

Memory Hierarchies vs Efficiency
- Cache misses (L1, L2, L3, ...)
- Prefetching
- Cache associativity
- Virtual-to-physical address mapping
- Translation Lookaside Buffer (TLB)
- TLB misses

Some Typical Access Times
Level            Size      Access time   Line / block size
L1 data          ~16 KB    5 ns          64 bytes
L1 instruction   ~16 KB
L2               ~512 KB   20 ns         64 bytes
L3               ~10 MB    30 ns         64 bytes
Main memory                60 ns
Disk                       10 ms         4 KB
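The sketch below shows how these latencies combine into an expected access time. The per-level hit rates are hypothetical. It also assumes that each latency in the table is the full time to serve a request from that level, not an increment over the level above. Disk is left out.

```python
# Latencies (ns) taken from the table above; disk is omitted.
LATENCY_NS = [("L1", 5.0), ("L2", 20.0), ("L3", 30.0), ("main memory", 60.0)]


def expected_access_time(hit_rates):
    """Expected latency of one access (hypothetical model).

    hit_rates[i] is the probability that cache level i serves a request
    that reached it; main memory serves whatever misses in all caches.
    """
    if len(hit_rates) != len(LATENCY_NS) - 1:
        raise ValueError("need one hit rate per cache level")
    total = 0.0
    p_reach = 1.0  # probability that a request gets down to this level
    for (_, latency), h in zip(LATENCY_NS, hit_rates):
        total += p_reach * h * latency
        p_reach *= 1.0 - h
    total += p_reach * LATENCY_NS[-1][1]  # remaining requests go to memory
    return total


print(expected_access_time([0.9, 0.5, 0.5]))  # about 7.75 ns
```

If every request hits in L1, the result is simply 5 ns. Lowering the L1 hit rate raises the average quickly, because the lower levels in the table are 4 to 12 times slower.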

Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
- 32 nm, 4 cores (8 threads); L1, L2 and L3 line size 64 bytes
- L1 instruction cache: 32 KB, 8-way, write-through, per core
- L1 data cache: 32 KB, 8-way, write-back, per core
- L1 cache latency: 3 clock cycles
- L2: 256 KB, 8-way, write-back, unified, per core
- L2 cache latency: 12 clock cycles
- L3: 10 MB, 20-way, write-back, unified, shared by all cores
- L3 cache latency: 26-31 clock cycles
- L1 instruction TLB: 4 KB pages, 64 entries, 4-way
- L1 data TLB: 4 KB pages, 64 entries, 4-way
- L2 TLB: 4 KB pages, 512 entries, 4-way
- All caches and TLBs use a pseudo-LRU replacement policy
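The L1 data cache parameters above are 32 KB, 8-way and 64-byte lines. They imply 32768 / (8 · 64) = 64 sets, so address bits 6-11 select the set. Addresses that are a multiple of 4096 bytes apart therefore compete for the same 8 ways. The sketch below simulates such a cache. It uses true LRU as a stand-in for the hardware's pseudo-LRU policy, and the class and function names are our own.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache with true LRU per set (simplification of pseudo-LRU)."""

    def __init__(self, size_bytes=32 * 1024, ways=8, line_bytes=64):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)  # 64 for the defaults
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def split(self, addr):
        """Return (tag, set index, byte offset) of a byte address."""
        line = addr // self.line_bytes
        return line // self.num_sets, line % self.num_sets, addr % self.line_bytes

    def access(self, addr):
        tag, s, _ = self.split(addr)
        lines = self.sets[s]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


def count_misses(num_lines, rounds=10, stride=4096):
    """Cycle through num_lines addresses that all map to the same set."""
    cache = SetAssociativeCache()
    for _ in range(rounds):
        for i in range(num_lines):
            cache.access(i * stride)
    return cache.misses


print(count_misses(8), count_misses(9))  # 8 90
```

Under LRU, eight such lines stay resident after the first round. A ninth line evicts the next one needed, so every access misses. This is a conflict miss pattern that a fully associative cache of the same size would avoid.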

Virtual to Physical Address Mapping
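The sketch below assumes x86-64 four-level paging with 4 KB pages, matching the page size listed for the i7 TLBs above. Under that assumption, a 48-bit virtual address splits into four 9-bit page-table indices and a 12-bit page offset. A TLB miss can therefore cost up to four dependent page-table reads. The helper names are illustrative.

```python
PAGE_SHIFT = 12          # 4 KB pages
INDEX_BITS = 9           # 512 entries per page-table page
LEVELS = ("PML4", "PDPT", "PD", "PT")  # top level first


def split_virtual_address(vaddr):
    """Return the four page-table indices (top level first) and the page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(len(LEVELS)):
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - level)
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return tuple(indices), offset


def tlb_reach(entries, page_bytes=1 << PAGE_SHIFT):
    """Bytes of memory that can be accessed without a TLB miss."""
    return entries * page_bytes


idx, off = split_virtual_address(0x7F3A_1234_5678)
print(dict(zip(LEVELS, idx)), hex(off))
print(tlb_reach(64) // 1024, "KB", tlb_reach(512) // (1024 * 1024), "MB")  # 256 KB 2 MB
```

With these numbers, the 64-entry L1 data TLB covers only 256 KB and the 512-entry L2 TLB covers 2 MB. Both are well below the 10 MB L3. Random accesses over a few megabytes of data can thus miss in the TLB even when they hit in L3.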

Cost of Address Translation [JM13]

Cache-Oblivious Model [FLPR12]
- The I/O model, but algorithms do not know $B$ and $M$
- Assume an optimal cache replacement strategy
- Optimal on all levels (under some assumptions)
[Diagram: cache of size $M$, blocks of size $B$]
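The sketch below makes the model concrete. It counts block transfers for a trace of word addresses in a fully associative cache of M words with blocks of B words. LRU stands in for the optimal replacement policy. FLPR12 argue that, for many algorithms, this substitution changes the bounds by at most a constant factor, given a constant factor more cache. The scan never looks at M or B, yet it transfers N/B blocks for every B tried. Function names are our own.

```python
from collections import OrderedDict


def count_transfers(trace, M, B):
    """Block transfers for a trace of word addresses; the cache holds M // B blocks (LRU)."""
    capacity = M // B
    cache = OrderedDict()
    transfers = 0
    for addr in trace:
        block = addr // B
        if block in cache:
            cache.move_to_end(block)  # hit: refresh recency
            continue
        transfers += 1  # miss: one block transfer
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict least recently used block
        cache[block] = None
    return transfers


def scan(n):
    """A cache-oblivious scan: touches words 0..n-1 in order, unaware of M and B."""
    return range(n)


N = 1 << 16
for B in (8, 64, 512):
    print(B, count_transfers(scan(N), M=1 << 12, B=B), N // B)
```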

Recursive Tree Layout (van Emde Boas layout)
[Diagram: binary tree]
- Searches: $O(\log_B N)$ I/Os
- Range searches: $O(\log_B N + k/B)$ I/Os, where $k$ is the number of reported elements
Harald Prokop. Cache-Oblivious Algorithms. MSc thesis, MIT, June 1999.
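The sketch below computes the recursive layout of a complete binary tree under two assumptions. Nodes are named by heap indices (root 1, children 2i and 2i+1). The top subtree gets ⌊h/2⌋ of the h levels, although papers differ on this rounding. The sketch also counts the largest number of size-B blocks that any root-to-leaf path touches, which is the quantity behind the O(log_B N) search bound. The function names are hypothetical.

```python
def veb_order(h, root=1):
    """Heap indices of a complete binary tree with h levels, in van Emde Boas order.

    Assumed convention: the top subtree gets h // 2 levels, each bottom
    subtree the remaining h - h // 2 levels.
    """
    if h == 1:
        return [root]
    top = h // 2
    bottom = h - top
    order = veb_order(top, root)
    # Roots of the bottom subtrees: descendants of `root` exactly `top`
    # levels below it, from left to right.
    first = root << top
    for b in range(first, first + (1 << top)):
        order.extend(veb_order(bottom, b))
    return order


def max_blocks_on_path(position, h, B):
    """Max number of distinct size-B blocks touched by a root-to-leaf path.

    `position` maps each heap index to its array slot in the layout.
    """
    worst = 0
    for leaf in range(1 << (h - 1), 1 << h):
        blocks = set()
        node = leaf
        while node >= 1:
            blocks.add(position[node] // B)
            node //= 2
        worst = max(worst, len(blocks))
    return worst


h, B = 16, 16
veb_pos = {node: i for i, node in enumerate(veb_order(h))}
bfs_pos = {node: node - 1 for node in range(1, 1 << h)}
bfs_worst = max_blocks_on_path(bfs_pos, h, B)
veb_worst = max_blocks_on_path(veb_pos, h, B)
print("BFS:", bfs_worst, "vEB:", veb_worst)
```

For h = 16 and B = 16, every root-to-leaf path in this vEB layout crosses four contiguous 15-node subtrees, so it touches at most 8 blocks. In the BFS layout, the path enters a new block at every level below the top few.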

Four Tree Layouts
- DFS
- Inorder
- BFS
- Recursive (van Emde Boas)
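For a complete binary tree, the DFS (preorder), inorder and BFS layouts can be implicit: a node's children follow from its array slot by index arithmetic, with no pointers. The sketch below builds the three layouts over the keys 0, ..., 2^h - 2 and searches each by arithmetic alone. The recursive layout is left out for brevity, and all function names are hypothetical.

```python
def key_of(node, h):
    """Key at heap index `node` when keys 0..2^h-2 are stored in search-tree order."""
    d = node.bit_length() - 1          # depth of the node
    j = node - (1 << d)                # position within its level
    return (2 * j + 1) * (1 << (h - 1 - d)) - 1


def bfs_layout(h):
    return [key_of(v, h) for v in range(1, 1 << h)]


def dfs_layout(h):
    out = []

    def visit(v):                      # preorder: node, left subtree, right subtree
        if v < (1 << h):
            out.append(key_of(v, h))
            visit(2 * v)
            visit(2 * v + 1)

    visit(1)
    return out


def inorder_layout(h):
    return list(range((1 << h) - 1))   # inorder layout = sorted array


def search_bfs(a, x):
    i = 0
    while i < len(a):
        if a[i] == x:
            return i
        i = 2 * i + 1 if x < a[i] else 2 * i + 2
    return -1


def search_dfs(a, x, h):
    i, height = 0, h                   # slot of current node, height of its subtree
    while height > 0:
        if a[i] == x:
            return i
        half = 1 << (height - 1)       # left subtree holds half - 1 nodes
        i = i + 1 if x < a[i] else i + half
        height -= 1
    return -1


def search_inorder(a, x):
    lo, hi = 0, len(a)                 # binary search visits the implicit tree
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return -1
```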

Random Searches in Pointer Layouts
[Plot: random searches in pointer-based layouts; curve label: vEB]

Random Searches in Implicit Layouts
[Plot: random searches in implicit layouts; curve label: 9-ary BFS]
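We assume that "9-ary BFS" denotes a complete 9-ary search tree stored in BFS order; the slide does not say this explicitly. Under that assumption, each node holds 8 sorted keys, and the children of node i are nodes 9i + 1, ..., 9i + 9. The sketch below implements this for a general k. A search visits one node per level, so it reads about log_k N nodes instead of log_2 N. All names are hypothetical.

```python
from bisect import bisect_left


def build_kary_bfs(sorted_keys, k):
    """Lay out len(sorted_keys) == k**h - 1 keys as a complete k-ary search tree.

    Node i occupies slots i*(k-1) .. i*(k-1)+k-2; its children are nodes
    k*i+1 .. k*i+k (BFS order, no pointers).
    """
    n_nodes = len(sorted_keys) // (k - 1)
    layout = [None] * len(sorted_keys)
    it = iter(sorted_keys)

    def fill(node):
        # In-order fill: child 0, key 0, child 1, key 1, ..., key k-2, child k-1.
        if node >= n_nodes:
            return
        for j in range(k - 1):
            fill(k * node + 1 + j)
            layout[node * (k - 1) + j] = next(it)
        fill(k * node + k)

    fill(0)
    return layout


def search_kary_bfs(layout, k, x):
    """Return the slot holding x, or -1 if x is absent."""
    n_nodes = len(layout) // (k - 1)
    node = 0
    while node < n_nodes:
        lo = node * (k - 1)
        i = bisect_left(layout, x, lo, lo + k - 1)  # first key >= x in this node
        if i < lo + k - 1 and layout[i] == x:
            return i
        node = k * node + 1 + (i - lo)              # descend into child number i - lo
    return -1


keys = list(range(9 ** 3 - 1))       # three levels of a 9-ary tree
layout = build_kary_bfs(keys, 9)
print(layout[:8], search_kary_bfs(layout, 9, 500))
```

With k = 2 this reduces to the binary BFS layout. Larger k packs more keys into each node, so each visited node fills more of a cache line and the path from root to leaf gets shorter.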

Making Trees Dynamic?
- Trees of bounded depth: Andersson and Lai 1990
- Rebuild subtrees when depth > $\log n + O(1)$
- Insert: $O(\log^2 n)$ amortized
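Below is a minimal sketch of bounded-depth partial rebuilding, in the spirit of the idea on this slide. Its rebuild rule is a simplification chosen for brevity: after an insertion that lands too deep, rebuild the lowest ancestor whose subtree fits within the depth limit once perfectly balanced. This rule keeps every depth at most log n + O(1). We do not claim that it matches Andersson and Lai's exact rule or their O(log² n) amortized analysis. The `slack` parameter, a hypothetical name, plays the role of the O(1) term.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def subtree_size(t):
    return 0 if t is None else 1 + subtree_size(t.left) + subtree_size(t.right)


def max_depth(t):
    """Depth of the deepest node (root has depth 0; empty tree gives -1)."""
    return -1 if t is None else 1 + max(max_depth(t.left), max_depth(t.right))


def flatten(t, out):
    if t is not None:
        flatten(t.left, out)
        out.append(t)
        flatten(t.right, out)


def build_balanced(nodes, lo, hi):
    """Perfectly balanced tree on nodes[lo:hi]; it has (hi - lo).bit_length() levels."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


class BoundedDepthTree:
    def __init__(self, slack=1):
        self.root, self.n, self.slack = None, 0, slack

    def depth_limit(self):
        # ceil(log2(n + 1)) + slack, computed exactly via bit_length
        return self.n.bit_length() + self.slack

    def insert(self, key):
        path, t = [], self.root
        while t is not None:
            if key == t.key:
                return  # key already present
            path.append(t)
            t = t.left if key < t.key else t.right
        node = Node(key)
        if not path:
            self.root = node
        elif key < path[-1].key:
            path[-1].left = node
        else:
            path[-1].right = node
        self.n += 1
        path.append(node)
        limit = self.depth_limit()
        if len(path) - 1 <= limit:
            return
        # Walk up from the new leaf; rebuild the lowest ancestor whose subtree,
        # once perfectly balanced, no longer reaches below depth `limit`.
        size = 1
        for d in range(len(path) - 1, -1, -1):
            if d < len(path) - 1:
                child = path[d + 1]
                sibling = path[d].right if child is path[d].left else path[d].left
                size += 1 + subtree_size(sibling)
            if d + size.bit_length() - 1 <= limit:
                nodes = []
                flatten(path[d], nodes)
                new_root = build_balanced(nodes, 0, len(nodes))
                if d == 0:
                    self.root = new_root
                elif path[d - 1].left is path[d]:
                    path[d - 1].left = new_root
                else:
                    path[d - 1].right = new_root
                return


tree = BoundedDepthTree()
for x in range(1000):  # sorted insertions: the worst case for a plain BST
    tree.insert(x)
print(tree.n, max_depth(tree.root), tree.depth_limit())
```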

Static → Dynamic [BFJ02]
- Embed the dynamic tree into a complete tree
- Use a static layout of the complete tree (e.g. the van Emde Boas layout)
- Search: $O(\log_B N)$ I/Os
- Update: $O(\log_B N + (\log^2 N)/B)$ I/Os

Insertions into Implicit Layout
- Insertions are a factor 10-100 slower than searches

Matrix Transpose and Multiply [FLPR12]
[Plot: transpose of an N × N matrix, y-axis divided by $N^2$]
[Plot: multiplication of N × N matrices, y-axis divided by $N^3$]
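The cache-oblivious transpose and multiply algorithms in FLPR12 recursively halve the largest dimension until the subproblem is small. The sketch below follows that idea on Python lists of lists. The base-case cutoff `CUTOFF` is an arbitrary choice of ours, not a tuned value. Python's interpreter overhead means the sketch illustrates the recursion and access pattern rather than the measured performance.

```python
CUTOFF = 16  # hypothetical base-case size; real code would tune this


def transpose(A, B, r0, r1, c0, c1):
    """Set B[j][i] = A[i][j] for rows r0..r1-1 and columns c0..c1-1 of A."""
    rows, cols = r1 - r0, c1 - c0
    if rows * cols <= CUTOFF:
        for i in range(r0, r1):
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif rows >= cols:                      # halve the longer side
        mid = (r0 + r1) // 2
        transpose(A, B, r0, mid, c0, c1)
        transpose(A, B, mid, r1, c0, c1)
    else:
        mid = (c0 + c1) // 2
        transpose(A, B, r0, r1, c0, mid)
        transpose(A, B, r0, r1, mid, c1)


def multiply(A, B, C, i0, i1, k0, k1, j0, j1):
    """C[i][j] += sum over k of A[i][k] * B[k][j] on the given index ranges."""
    di, dk, dj = i1 - i0, k1 - k0, j1 - j0
    if di * dk * dj <= CUTOFF:
        for i in range(i0, i1):
            for k in range(k0, k1):
                a = A[i][k]
                for j in range(j0, j1):
                    C[i][j] += a * B[k][j]
    elif di >= dk and di >= dj:             # halve the largest of the three dimensions
        m = (i0 + i1) // 2
        multiply(A, B, C, i0, m, k0, k1, j0, j1)
        multiply(A, B, C, m, i1, k0, k1, j0, j1)
    elif dk >= dj:
        m = (k0 + k1) // 2
        multiply(A, B, C, i0, i1, k0, m, j0, j1)
        multiply(A, B, C, i0, i1, m, k1, j0, j1)
    else:
        m = (j0 + j1) // 2
        multiply(A, B, C, i0, i1, k0, k1, j0, m)
        multiply(A, B, C, i0, i1, k0, k1, m, j1)


n, m = 9, 13
A = [[i * 7 + j for j in range(m)] for i in range(n)]
T = [[0] * n for _ in range(m)]
transpose(A, T, 0, n, 0, m)
print(T[2][5] == A[5][2])
```

For an n × m matrix A, call `transpose(A, B, 0, n, 0, m)` with an m × n output B. For the product of an n × m matrix A and an m × p matrix B, call `multiply(A, B, C, 0, n, 0, m, 0, p)` with C initialised to zeros.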
