

SLIDE 1

TLB and Pagewalk Performance in Multicore Architectures with Large Die-Stacked DRAM Cache

Adarsh Patil

Adviser: Prof. R Govindarajan

Perspective Seminar 6th Nov 2015

SLIDE 2

Outline

■ Introduction
  – Address Translation: TLBs and Page Walks
  – Die-stacked DRAM caches
■ Objective
■ Experimental Setup
  – Framework
  – Methodology
■ Results
■ Conclusion and Future Work

CSA Perspective Seminar 2 6th Nov 2015

SLIDE 3-6

Computing Trends

■ Software
  – Large memory footprint (*apps in BigDataBench / CloudSuite benchmarks)
  – Virtualization and cloud computing (*Source: VMware)
■ Architectural
  – Multicore / manycore architectures (e.g. Intel Haswell-E & IBM POWER8)
  – Large die-stacked DRAM caches (*Source: Invensas, Tessera)

SLIDE 7

Paged Virtual Memory

■ Virtual address space is divided into "pages"
■ "Page table": an in-memory table, organized as a radix tree, that maps
  virtual to physical addresses and stores meta-information (replacement,
  access privileges, dirty bit, etc.)
■ Page table entries are cached in fast lookup structures called
  Translation Lookaside Buffers (TLBs)
■ The page table has evolved into a 4-level tree to accommodate 48-bit
  virtual addresses

SLIDE 8

Page Table Structure

■ Hierarchical page table
■ 4 memory references for a 4KB page;
  3 memory references for a 2MB superpage
■ Each entry is 8 bytes
■ TLB stores VA-to-PA mappings

Virtual address layout (x86-64, 48-bit VA):

  63:48            47:39        38:30       29:21      20:12       11:0
  Sign extension   PML4 | PL4   PDP | PL3   PD | PL2   PTE | PL1   Page Offset

[Figure: 4-level radix-tree page walk rooted at the CR3 register; an L2 (PD)
entry can map a 2MB data superpage directly, while L1 (PTE) entries map 4KB
data pages.]
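The bit-slicing above can be made concrete with a short sketch. The helper below is illustrative (not from the talk); it splits a canonical 48-bit x86-64 virtual address into the four 9-bit table indices and the 12-bit page offset:

```python
# Illustrative helper: decompose a 48-bit x86-64 virtual address into
# the four page-table indices and the page offset shown in the layout.
def split_va(va):
    offset = va & 0xFFF            # bits 11:0  - page offset
    pl1 = (va >> 12) & 0x1FF       # bits 20:12 - PTE index (PL1)
    pl2 = (va >> 21) & 0x1FF       # bits 29:21 - PD index  (PL2)
    pl3 = (va >> 30) & 0x1FF       # bits 38:30 - PDP index (PL3)
    pl4 = (va >> 39) & 0x1FF       # bits 47:39 - PML4 index (PL4)
    return pl4, pl3, pl2, pl1, offset
```

Each 9-bit index selects one of 512 eight-byte entries, i.e. exactly one 4KB table per level (512 × 8B = 4KB), which is why each walk level costs one memory reference.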

SLIDE 18

Page Table Structure - Virtualization

■ Guest Page Table (gPT)
  – Translates guest virtual to guest physical addresses
  – Set up and modified independently by the guest
■ Nested Page Table (nPT)
  – Translates guest physical to host physical addresses
  – Controlled by the host
■ Up to 24 memory references per page walk
■ TLB stores the end-to-end (guest virtual to host physical) translation
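The "up to 24 references" figure follows from a simple count. The sketch below is an illustrative derivation (not from the talk) assuming 4-level guest and nested page tables:

```python
# Back-of-the-envelope count of why a 2D/nested page walk takes up to
# 24 memory references with 4-level guest and nested page tables.
def nested_walk_refs(guest_levels=4, nested_levels=4):
    # The walk must translate (guest_levels + 1) guest-physical
    # addresses through the nested table: one per guest PTE fetch plus
    # one for the final data address. Each guest PTE fetch is itself a
    # memory reference; the data access is not counted as walk cost.
    return (guest_levels + 1) * nested_levels + guest_levels

print(nested_walk_refs())  # 24
```

(4 + 1) × 4 nested references plus 4 guest PTE fetches gives 24, which is why virtualized page walks benefit so much from MMU/page-walk caches.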

SLIDE 19

Address Translation in Hardware

[Figure: per-core address translation hardware in a 4-core chip. Each core
has split L1 instruction/data TLBs and a superpage L1 TLB backed by a shared
L2 TLB; on a miss, a hardware page walker performs the walk, aided by MMU
caches (page walk caches). The L1 cache is VIPT; the large L2/L3 caches are
PIPT. Latencies noted on the slide: < 4 cycles, 4 cycles, 6/10 cycles, and
180-200 cycles to memory.]

SLIDE 20

TLB-reach & page walk latencies

■ TLB-reach: the amount of memory that can be accessed through the TLB
  without causing a miss
  – Clustered [HPCA '14] and Coalesced [MICRO '12] TLBs
  – Superpage-friendly TLBs [HPCA '15] using skewed TLBs
  – Shared last-level TLBs [HPCA '11]: evaluates shared TLBs for multicores
  – Direct segment [ISCA '13]: a primary-region abstraction that maps part
    of the virtual address space with segment registers, avoiding paging
    entirely
  – Redundant memory mappings [ISCA '15]: allocates in units called ranges
    (eager paging in the OS), held in a separate range-TLB; compatible with
    traditional paging
■ Speeding up miss handling
  – AMD's Accelerating 2D page walks [ASPLOS '08]: page walk caches for
    virtualization
  – TLB behavior and sensitivity characterized for individual SPEC 2000
    [SIGMETRICS '02] and PARSEC [PACT '09] applications

SLIDE 21

Die-stacked DRAM Cache

■ 50% lower power consumption*
■ 8x bandwidth improvement*
■ 20% lower latency than main memory
■ Large capacities (~GBs)
■ 35% smaller package size*
■ JEDEC Wide-IO / HBM standards

[Figure: cores with 3D-stacked DRAM layers, and a 2.5D organization with
cores and DRAM layers on a silicon interposer.]

*Source: Applied Materials and HMC

SLIDE 22

Die-stacked DRAM caches

■ Metadata overhead - tag storage
  – Loh-Hill cache [MICRO '11]: co-locates tags and data in the same DRAM
    row, with a MissMap; the direct-mapped Alloy cache [MICRO '12] uses a
    TAD (tag-and-data) structure to reduce access latency
  – Bimodal cache [MICRO '14]: caches in both 64B and 512B block sizes,
    using the abundant bandwidth to store metadata in a bank of another
    channel
  – ATCache [PACT '14]: caches hot metadata in on-chip SRAM
■ Piggybacking translation on tag match
  – Tagtables [HPCA '15]: organizes tags as a hierarchical but flipped page
    table
  – Tagless fully-associative DRAM cache [ISCA '15]: translates VA directly
    to a cache address, moving VA-to-PA translation off the critical path
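To see the scale of the metadata problem the designs above attack, a back-of-the-envelope calculation helps. The 4 bytes of tag metadata per block used below is an assumed illustrative figure, not one from the talk:

```python
# Rough tag-storage cost for a die-stacked DRAM cache. The per-block
# tag size is an illustrative assumption.
GB, MB = 1 << 30, 1 << 20

def tag_storage_bytes(cache_bytes, block_bytes, tag_bytes_per_block=4):
    blocks = cache_bytes // block_bytes
    return blocks * tag_bytes_per_block

# A 1GB cache with 64B blocks needs ~64MB of tags - far too much for
# on-chip SRAM, which is why tags end up in the DRAM cache itself.
print(tag_storage_bytes(GB, 64) // MB)    # 64
# Larger 512B blocks (as in Bimodal cache) cut the overhead 8x.
print(tag_storage_bytes(GB, 512) // MB)   # 8
```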

SLIDE 23

Objective

■ Experimentally measure the overheads of paged address translation in the
  x86-64 architecture
  – Determine page walk latency on a TLB miss
  – Assess the effects of caching the page-walk levels in the cache
    hierarchy
  – Quantify the impact of page walks on IPC relative to an ideal TLB /
    address translation
■ Correlate the TLB-reach problem with the size and latency of last-level
  die-stacked caches
  – Count the percentage of accesses where a TLB miss occurs for blocks
    that hit in the large LLC

SLIDE 24

Experimental Framework

■ MARSSx86 - QEMU-based full-system simulator
  – Added shared L2 TLB and superpage TLB structures
  – Page walk handler: reduced page-walk levels for superpages
  – Flat cache hierarchy for the stacked L4 cache
■ Unmodified Linux 2.6.38 with transparent huge pages (THP)
■ Configuration
  – Four 5-way OoO cores, 4GB memory, x86-64 ISA
  – L1 i/d cache private, 32KB, 2/4 cycles; L2 cache private, 256KB,
    6 cycles
  – L3 cache shared, 4MB, 9 cycles
  – Die-stacked L4 cache shared, 64/128/256/512/1024MB, 16-way
    set-associative, 40 cycles
  – L1 i/d TLBs, 64 entries - 4KB pages
  – Shared L2 TLB, 1024 entries - 4KB pages
  – L1 dTLB, 4 entries - 2MB pages
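Under the TLB parameters listed above, the combined TLB-reach can be computed directly; this sketch compares it against the simulated L4 capacities (summing the data-side TLBs this way is my own simplification, not the talk's methodology):

```python
# TLB-reach of the simulated configuration vs. the die-stacked L4
# capacities, using the parameters from the setup slide.
KB, MB = 1 << 10, 1 << 20

reach = (64 * 4 * KB       # L1 dTLB: 64 entries x 4KB pages
         + 1024 * 4 * KB   # shared L2 TLB: 1024 entries x 4KB pages
         + 4 * 2 * MB)     # L1 dTLB: 4 entries x 2MB superpages

print(reach / MB)  # total reach in MB
for l4_mb in (64, 128, 256, 512, 1024):
    # Fraction of each L4 capacity the TLBs can cover without a miss.
    print(l4_mb, round(reach / (l4_mb * MB), 4))
```

Total reach is about 12.25MB, so even the smallest 64MB L4 far exceeds what the TLBs can cover - the crux of the TLB-reach problem studied here.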

SLIDE 25

Experimental Methodology

■ Multi-programmed
  – SPEC CPU 2006 programs
  – High-CPI apps chosen
  – 8 billion instructions fast-forwarded
  – 4 billion cycles in detailed mode
■ Multi-threaded
  – PARSEC, 4 threads
  – 'simlarge' input data set
  – Entire ROI in detailed mode
  – Apps with unbounded and large working sets

SPEC CPU 2006 mixes:
  mix1: milc, mcf, omnetpp, gcc
  mix2: GemsFDTD, leslie3d, dealII, soplex
  mix3: cactusADM, libquantum, tonto, sphinx3
  mix4: lbm, bwaves, zeusmp, sjeng
  mix5: milc, GemsFDTD, cactusADM, lbm
  mix6: mcf, omnetpp, soplex, leslie3d
  mix7: bwaves, astar, zeusmp, gcc
  mix8: gobmk, bzip2, h264ref, hmmer

PARSEC (4 threads): canneal, dedup, ferret, fluidanimate, freqmine, raytrace

SLIDE 26-27

Page Walk Latency

Multi-programmed workloads: average page walk latency ranges from 37 to 147
cycles.

  Page walk latency range (cycles) | Count
  37-74                            | 2
  74-111                           | 1
  111-148                          | 5

SLIDE 28

Page Walk Latency

Multi-threaded workloads: average page walk latency ranges from 18 to 144
cycles.

  Page walk latency range (cycles) | Count
  18-50                            | 3
  50-82                            | 2
  114-146                          | 1

SLIDE 29

Page Walk Levels Locality

■ PL1 has the highest number of memory accesses
■ PL2 has a roughly uniform hit percentage across all cache levels, and most
  of the 8 translations in a cache line are used
■ PL3 sees around 50% of its hits in either the L1 or L2 cache
■ Cache pollution: AMD processors perform the page walk in the L2 cache

Multi-programmed: mix1 to mix8, left to right.
Multi-threaded: canneal, dedup, ferret, fluidanimate, freqmine, raytrace.
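These per-level hit locations translate into walk latency roughly additively. The model below is an illustrative sketch; the per-level latencies follow my reading of the numbers on the earlier hardware slide (L1 ~4, L2 ~6, L3 ~10, memory ~200 cycles), and the hit placements in the examples are assumptions, not measured data:

```python
# Simple additive model of a single page walk's latency: each level's
# PTE fetch costs the latency of wherever in the hierarchy it hits.
LATENCY = {"L1": 4, "L2": 6, "L3": 10, "MEM": 200}

def walk_latency(hits):
    # 'hits' lists where the PL4, PL3, PL2, PL1 entries were found.
    return sum(LATENCY[h] for h in hits)

# Upper levels (PL4/PL3) usually hit high in the hierarchy; a PL1 miss
# that goes to memory dominates the entire walk.
print(walk_latency(["L1", "L1", "L2", "L3"]))   # 24 cycles
print(walk_latency(["L1", "L1", "L2", "MEM"]))  # 214 cycles
```

This is why PL1's poor cache locality matters most: a single PL1 miss to memory costs more than an entire walk served from the cache hierarchy.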

SLIDE 30

IPC Impact due to Page Walks

■ Modern architectures hide memory access latency using the ROB and LSQ
  (here, a 128-entry ROB and an 80-entry LSQ)
■ Impact on IPC due to page walk latency, compared against an ideal TLB that
  – has zero page walk overhead,
  – causes no cache pollution, and
  – returns a translation in a single cycle

SLIDE 31

TLB-reach and large LLCs

[Figure: results for a 64B-block-size L4 cache and a 512B-block-size L4
cache.]

SLIDE 32

Conclusion

■ Conclusion
  – Measured the effect of page walk latency on IPC
  – Built a framework to study the effects of page walk latency and
    TLB-reach
  – Quantified the TLB-reach problem in the context of large die-stacked
    caches
■ Future Work
  – Use large-footprint applications from CloudSuite & BigDataBench
  – Detailed timing simulation of die-stacked DRAM
  – Superpage TLBs
