Harris Corner Detection on a NUMA Manycore

Harris Corner Detection on a NUMA Manycore. Claude TADONKI, Centre de Recherche en Informatique (CRI), Mines ParisTech - PSL. Joint work with Olfa HAGGUI and Lionel LACASSAGNE, Sousse National School of Engineering - Laboratoire d'Informatique de Paris 6 (LIP6).

Corner points are used for motion detection, for instance. From the intensity I (color is not needed), we need to compute (approximate) derivatives and combine them.
Claude TADONKI, Harris Corner Detection on a NUMA Manycore, seminar at Centre de Recherche en Informatique, April 16, 2018, Fontainebleau, France.

The procedure applies a series of convolution kernels to the input intensity matrix. Each convolution is a stencil computation. The whole computation can be fully serial, fully pipelined, or hybrid. Memory access patterns are the main focus with respect to performance.
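The fully serial variant can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 3x3 Sobel and binomial Gauss kernels, the border handling (borders left at 0), and the helper names `stencil3x3`, `multiply` and `harris_serial` are not taken from the slides. The coarsity is assumed here to be the determinant Sxx*Syy - Sxy^2; the classical Harris response additionally subtracts k times the squared trace, and the slides do not say which form is used.

```python
# Assumed kernels (not given in the slides): 3x3 Sobel and 3x3 binomial Gauss.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def stencil3x3(img, k):
    """Apply a 3x3 stencil (no kernel flip) to the interior; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(
                k[a][b] * img[i + a - 1][j + b - 1]
                for a in range(3)
                for b in range(3)
            )
    return out


def multiply(a, b):
    """Point-wise product of two images of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def harris_serial(img):
    """Fully serial schedule: every stage writes a full intermediate image."""
    ix = stencil3x3(img, SOBEL_X)                 # horizontal derivative
    iy = stencil3x3(img, SOBEL_Y)                 # vertical derivative
    sxx = stencil3x3(multiply(ix, ix), GAUSS)     # smoothed Ix*Ix
    syy = stencil3x3(multiply(iy, iy), GAUSS)     # smoothed Iy*Iy
    sxy = stencil3x3(multiply(ix, iy), GAUSS)     # smoothed Ix*Iy
    # Assumed coarsity: determinant of the smoothed structure tensor.
    return [
        [a * b - c * c for a, b, c in zip(ra, rb, rc)]
        for ra, rb, rc in zip(sxx, syy, sxy)
    ]
```

Each stage reads and writes whole images, which is exactly the intermediate traffic that the pipelined and hybrid schedules try to reduce.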

Stencil computation: redundant memory accesses, cache misses, and misalignment. Scheduling of the convolutions: intermediate reads/writes (space and access time). SIMD: not efficient in its standard form (what we get from the compiler). Shared-memory parallelism: bus contention, thread synchronization, NUMA. We are going to explain our approach for each of the aforementioned aspects.

Separability and half-pipe clustering.
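Separability means that a 3x3 kernel which is the outer product of two 3-tap vectors can be applied as a vertical pass followed by a horizontal pass, giving the same result with 6 multiplications per pixel instead of 9. The sketch below checks this for the Sobel-x kernel; the kernel values and the function names `stencil3x3` and `separable3x3` are assumptions made for illustration, not the authors' code.

```python
def stencil3x3(img, k):
    """Direct 3x3 stencil on the interior; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(
                k[a][b] * img[i + a - 1][j + b - 1]
                for a in range(3)
                for b in range(3)
            )
    return out


def separable3x3(img, col, row):
    """Same stencil for k[a][b] = col[a] * row[b], as two 1D passes."""
    h, w = len(img), len(img[0])
    # Vertical pass with the 3-tap column vector.
    tmp = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(w):
            tmp[i][j] = sum(col[a] * img[i + a - 1][j] for a in range(3))
    # Horizontal pass with the 3-tap row vector.
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(row[b] * tmp[i][j + b - 1] for b in range(3))
    return out


# Sobel-x as an outer product: smoothing [1, 2, 1] vertically,
# central difference [-1, 0, 1] horizontally.
SMOOTH = [1, 2, 1]
DIFF = [-1, 0, 1]
SOBEL_X = [[c * r for r in DIFF] for c in SMOOTH]
```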

Vectorization with the original memory access pattern leads to unaligned accesses. We propose a diagonal shift to keep all accesses aligned. The last vector register contains 2 dirty values, but the whole vector is stored (4-component vector).

The goal here is to load data into vector registers once, and then perform all dependent calculations (optimal data consumption and fewer memory accesses).

Thus, for the computation of an entire row, the typical steps at each iteration are:
[Figure: steps of one iteration]

We consider the half-pipe clustering. We pipeline the two cluster steps (SOBEL+MUL and GAUSS+COARSITY) through loop fusion. We apply an array contraction (mod 3) for the intermediate storage.
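Loop fusion with a mod 3 array contraction can be illustrated on a reduced pipeline. The sketch below is a simplification, not the authors' implementation: it keeps a single channel (Ix*Ix followed by a 3x3 Gauss, without the coarsity step), and the kernels and the names `sobel_mul_row`, `gauss_row`, `two_pass` and `fused` are assumptions. The point is that the second stage only ever needs three consecutive intermediate rows, so a 3-row circular buffer indexed by the row number mod 3 can replace the full intermediate image.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # assumed kernel
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]       # assumed kernel


def sobel_mul_row(img, r):
    """Stage 1 (SOBEL+MUL): row r of Ix*Ix, zero on the image border."""
    h, w = len(img), len(img[0])
    row = [0] * w
    if 0 < r < h - 1:
        for j in range(1, w - 1):
            ix = sum(
                SOBEL_X[a][b] * img[r + a - 1][j + b - 1]
                for a in range(3)
                for b in range(3)
            )
            row[j] = ix * ix
    return row


def gauss_row(up, mid, down):
    """Stage 2 (GAUSS): one output row from three intermediate rows."""
    rows = (up, mid, down)
    w = len(mid)
    out = [0] * w
    for j in range(1, w - 1):
        out[j] = sum(
            GAUSS[a][b] * rows[a][j + b - 1]
            for a in range(3)
            for b in range(3)
        )
    return out


def two_pass(img):
    """Reference: the full intermediate image is stored."""
    h, w = len(img), len(img[0])
    inter = [sobel_mul_row(img, r) for r in range(h)]
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        out[i] = gauss_row(inter[i - 1], inter[i], inter[i + 1])
    return out


def fused(img):
    """Fused loop: intermediate storage contracted to 3 rows (index mod 3)."""
    h, w = len(img), len(img[0])
    buf = [sobel_mul_row(img, 0), sobel_mul_row(img, 1), None]
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        # Row i+1 overwrites the slot of row i-2, which is no longer needed.
        buf[(i + 1) % 3] = sobel_mul_row(img, i + 1)
        out[i] = gauss_row(buf[(i - 1) % 3], buf[i % 3], buf[(i + 1) % 3])
    return out
```

Both functions produce the same image; the fused one needs intermediate storage for 3 rows regardless of the image height.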

Both input and output images are stored on NUMA node 0. Each NUMA node locally computes its chunk (block of lines) of the final result. Within each NUMA node, the work is equally distributed by block among its threads. The expected memory allocation on the NUMA nodes is enforced through explicit binding routines.
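The two-level block distribution can be sketched as an index computation. The code below only derives the row ranges; it performs no thread or memory binding, and the names `split_blocks` and `numa_partition` as well as the choice to give leftover rows to the first blocks are assumptions for illustration.

```python
def split_blocks(start, stop, parts):
    """Split [start, stop) into `parts` contiguous blocks whose sizes
    differ by at most one row (the first blocks take the remainder)."""
    base, extra = divmod(stop - start, parts)
    blocks = []
    lo = start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def numa_partition(height, nodes, threads_per_node):
    """Rows per NUMA node, then rows per thread within each node.

    Returns one list per node; each holds the (first_row, end_row)
    half-open range assigned to every thread of that node.
    """
    return [
        split_blocks(lo, hi, threads_per_node)
        for lo, hi in split_blocks(0, height, nodes)
    ]
```

For example, `numa_partition(10, 2, 2)` returns `[[(0, 3), (3, 5)], [(5, 8), (8, 10)]]`: node 0 handles rows 0 to 4 and node 1 rows 5 to 9, each split between two threads.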

(1) Original SIMD without the in-register strategy. (2) Optimized SIMD with the in-register strategy. The in-register strategy doubles the overall performance, and our sequential implementation outperforms the state of the art in absolute performance.
