

SLIDE 1

Multicore Workshop

NUMA

Mark Bull, David Henty

EPCC, University of Edinburgh

SLIDE 2

Distributed shared memory

  • Shared memory machines using buses and a single main memory do not scale to large numbers of processors
    – bus and memory become a bottleneck
  • Distributed shared memory machines are designed to:
    – scale to larger numbers of processors
    – retain a single address space
  • Modest-sized multi-socket systems connected with HyperTransport or QPI are, in fact, distributed shared memory
  • Also true of recent multicore chips
    – multiple “dies” on a single chip (i.e. single socket)


SLIDE 3

True shared memory

Examples: Sun X4600, all multicore PCs, IBM p575, NEC SX8, Fujitsu PRIMEQUEST

[Diagram: six processors (P) connected through a network to a single shared memory]


SLIDE 4

Distributed shared memory

[Diagram: eight nodes, each containing two processors (P) and a local memory (M), connected by a network]


SLIDE 5

Directory-based coherency

  • For scalability, there is no bus, so snooping is not possible
  • Instead use a directory structure

    – bit vector for every block: one bit per processor (see the sketch below)
    – stored in (distributed) memory
    – bit is set to 1 whenever the corresponding processor caches the block

  • Still some scalability issues:

    – directory takes up a lot of space for large machines
    – e.g. 128 byte cache block, 256 processors: 256 presence bits = 32 bytes of directory per block, so the directory is 32/(128+32) = 20% of memory
    – some techniques exist to get round this
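
A minimal sketch of such a directory entry in C, assuming a fixed processor count; the names are illustrative, not from the slides:

    #include <stdint.h>

    #define NPROCS 256   /* processors in the machine */

    /* Directory entry for one memory block: one presence bit per
       processor, packed into 64-bit words. */
    struct dir_entry {
        uint64_t presence[NPROCS / 64];
    };

    /* Record that processor p now caches the block. */
    static void dir_set_sharer(struct dir_entry *e, int p)
    {
        e->presence[p / 64] |= 1ULL << (p % 64);
    }

    /* Test whether processor p caches the block. */
    static int dir_is_sharer(const struct dir_entry *e, int p)
    {
        return (int)((e->presence[p / 64] >> (p % 64)) & 1);
    }

At 256 processors each entry is 32 bytes, which is where the 20% overhead for 128-byte blocks comes from.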


SLIDE 6

Implementation

  • Node where memory (and directory entry) is located is called the home node.

  • Basic principle is the same as the snoopy protocol
    – cache block has same 3 states (modified, shared, invalid)
    – directory entry has modified, shared and uncached states

  • Cache misses go to the home node for data, and directory bits are set accordingly for read/write misses.

  • Directory can:
    – invalidate a copy in a remote cache
    – fetch the data back from a remote cache

  • Cache can write back to the home node (a sketch of the protocol follows below).
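
A simplified, self-contained sketch of home-node miss handling under this protocol; the helpers that print messages are stand-ins for real interconnect traffic, and all names are illustrative:

    #include <stdio.h>

    #define NPROCS 8

    enum dir_state { DIR_UNCACHED, DIR_SHARED, DIR_MODIFIED };

    struct dir_entry {
        enum dir_state state;
        unsigned char present[NPROCS];  /* present[i] => proc i caches block */
    };

    /* Stand-ins for interconnect messages. */
    static void fetch_from_owner(void) { printf("fetch dirty copy back to home\n"); }
    static void send_data_to(int p)    { printf("send block to proc %d\n", p); }

    static void invalidate_sharers(struct dir_entry *e)
    {
        for (int i = 0; i < NPROCS; i++)
            if (e->present[i]) {
                printf("invalidate copy at proc %d\n", i);
                e->present[i] = 0;
            }
    }

    void handle_read_miss(struct dir_entry *e, int req)
    {
        if (e->state == DIR_MODIFIED)
            fetch_from_owner();     /* dirty copy is written back first */
        e->present[req] = 1;        /* record the new sharer */
        e->state = DIR_SHARED;
        send_data_to(req);
    }

    void handle_write_miss(struct dir_entry *e, int req)
    {
        if (e->state == DIR_MODIFIED)
            fetch_from_owner();
        invalidate_sharers(e);      /* all remote copies become invalid */
        e->present[req] = 1;        /* requester is now the sole owner */
        e->state = DIR_MODIFIED;
        send_data_to(req);
    }

    int main(void)
    {
        struct dir_entry e = { DIR_UNCACHED, {0} };
        handle_read_miss(&e, 2);    /* uncached -> shared */
        handle_write_miss(&e, 5);   /* shared -> modified, proc 2 invalidated */
        return 0;
    }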


SLIDE 7

cc-NUMA

  • We have described a distributed shared memory system where every memory address has a home node.

  • This type of system is known as a cache-coherent non-uniform memory architecture (cc-NUMA).

  • Main problem is that access to remote memory takes longer than to local memory
    – difficult to determine the best node on which to allocate a given page

  • OS is responsible for allocating pages
  • Common policies are:
    – first touch: allocate on the node which makes the first access to the page (exploited in the sketch below)
    – round robin: allocate cyclically
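
First touch has direct performance consequences in user code: if the threads that will use the data also initialise it, the pages land on their local nodes. A minimal OpenMP sketch, assuming a Linux-style first-touch policy and a static loop schedule (compile with e.g. -fopenmp):

    #include <stdlib.h>

    #define N 100000000L

    int main(void)
    {
        double *a = malloc(N * sizeof(double));

        /* Parallel initialisation: each thread first-touches the pages
           it will later work on, so they are allocated on its node. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* Later loops with the same schedule access mostly local memory. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 2.0 * a[i] + 1.0;

        free(a);
        return 0;
    }

Initialising the array serially instead would place every page on the master thread's node, turning most later accesses into remote ones.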


SLIDE 8

Migration and replication

  • Possible for the OS to move pages between nodes as an application is running

  • Pages can either be migrated or replicated.
  • Migration involves the relocation of a page to a new home node.

  • Replication involves the creation of a “shadow” of the page on another node
    – read miss can go to the shadow page

  • Cache coherency is still maintained by hardware on a cache block basis (a user-level migration sketch follows below).
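
The slides describe the OS migrating pages automatically, but on Linux a program can also request migration explicitly. A hedged sketch using the move_pages(2) call from libnuma (compile with -lnuma); the target node number is an assumption for illustration:

    #include <numaif.h>     /* move_pages, MPOL_MF_MOVE */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        long psz = sysconf(_SC_PAGESIZE);
        void *page;
        if (posix_memalign(&page, psz, psz) != 0)
            return 1;
        *(volatile char *)page = 0;   /* first touch places it locally */

        void *pages[1] = { page };
        int   nodes[1] = { 1 };       /* illustrative target: node 1 */
        int   status[1];

        /* pid 0 means "the calling process". */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) == 0)
            printf("page is now on node %d\n", status[0]);
        else
            perror("move_pages");

        free(page);
        return 0;
    }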
