
A new Direct Connected Component Labeling and Analysis Algorithm for GPUs
Arthur Hennequin (1, 2), Lionel Lacassagne (1)
(1) LIP6, Sorbonne University, CNRS, France
(2) LHCb experiment, CERN, Switzerland
GTC 2019, March 21st

What are Connected Component Labeling and Analysis?
• Connected Components Labeling (CCL) consists in assigning a unique number (label) to each connected component of a binary image
• Connected Components Analysis (CCA) consists in computing some features associated with each connected component, like the bounding box [x_min, x_max] × [y_min, y_max], the sum of pixels S, and the sums of x and y coordinates Sx, Sy
[Figure: gray-level image, binary-level image (segmentation by motion detection), connected component labeling, connected component analysis]
• seems easy for a human being, who has a global view of the image, but
• ill-posed problem: the computer has only a local view around a pixel (neighborhood)
• important in computer vision for pattern recognition, motion detection ...
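The CCA features listed above can be accumulated in one pass over an already labeled image. The following is a minimal sketch, not the authors' code: the function name `cca_features`, the dictionary layout and the small label image are assumptions made for illustration, and S is taken to be the pixel count of each component.

```python
def cca_features(labels):
    """Accumulate bounding box, S, Sx and Sy for every non-zero label."""
    feats = {}
    for y, row in enumerate(labels):
        for x, e in enumerate(row):
            if e == 0:          # background pixel, no feature to update
                continue
            f = feats.setdefault(e, {"xmin": x, "xmax": x,
                                     "ymin": y, "ymax": y,
                                     "S": 0, "Sx": 0, "Sy": 0})
            f["xmin"] = min(f["xmin"], x)
            f["xmax"] = max(f["xmax"], x)
            f["ymin"] = min(f["ymin"], y)
            f["ymax"] = max(f["ymax"], y)
            f["S"] += 1         # number of pixels in the component
            f["Sx"] += x        # sum of x coordinates
            f["Sy"] += y        # sum of y coordinates
    return feats

# Hypothetical 3 x 4 label image with two components.
labels = [
    [1, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 2],
]
features = cca_features(labels)
```

From these sums, the centroid of a component is (Sx / S, Sy / S).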

Two classes of CCL algorithms
• multi-pass iterative algorithms
  ◮ compute the local positive min over a 3 × 3 neighborhood
  ◮ until stabilization: the number of iterations depends on the data
  ◮ not predictable, nor suited for embedded systems
• two-pass direct algorithms
  ◮ first pass = temporary label creation and equivalence building
  ◮ need an equivalence table to memorize the connectivity between labels
  ◮ then transitive closure of the tree associated with the equivalence table
  ◮ second pass = relabeling
• on CPU, scalar algorithms are all direct and can be parallelized
• on SIMD CPU, until 2019, all SIMD algorithms are iterative, except 1
• on GPU, until 2018, all algorithms are iterative, except 3
Why so few direct algorithms on GPU and SIMD?
⇒ because they are extremely complex to design (not suited for SIMD nor GPU)
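To make the iterative class concrete, here is a minimal sequential sketch of min propagation over a 3 × 3 neighborhood. It is an illustration under assumptions, not one of the published algorithms: the name `iterative_ccl`, the initial labels (linear address + 1) and the test image are hypothetical, and each pass reads the previous pass only, to mimic a parallel update.

```python
def iterative_ccl(img):
    """Repeat 3x3 positive-min propagation until no label changes."""
    h, w = len(img), len(img[0])
    # Unique positive label per foreground pixel, 0 for background.
    lab = [[(y * w + x + 1) if img[y][x] else 0 for x in range(w)]
           for y in range(h)]
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        new = [row[:] for row in lab]   # write to a copy: synchronous pass
        for y in range(h):
            for x in range(w):
                if lab[y][x] == 0:
                    continue
                m = lab[y][x]
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and 0 < lab[ny][nx] < m:
                            m = lab[ny][nx]
                if m != lab[y][x]:
                    new[y][x] = m
                    changed = True
        lab = new
    return lab, passes

# Hypothetical image: two components separated by an empty column.
img = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
final_labels, n_passes = iterative_ccl(img)
```

The number of passes grows with the length of the longest path a label has to travel, which is why the running time depends on the data.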

Direct algorithms are based on Union-Find structure

Algorithm 1: Rosenfeld labeling algorithm
  for i = 0 : h − 1 do
    for j = 0 : w − 1 do
      if I[i][j] ≠ 0 then
        e1 ← E[i − 1][j]
        e2 ← E[i][j − 1]
        if (e1 = e2 = 0) then
          ne ← ne + 1
          ex ← ne
        else
          r1 ← Find(e1, T)
          r2 ← Find(e2, T)
          ex ← min+(r1, r2)
          if (r1 ≠ 0 and r1 ≠ ex) then T[r1] ← ex
          if (r2 ≠ 0 and r2 ≠ ex) then T[r2] ← ex
      else
        ex ← 0
      E[i][j] ← ex

Algorithm 2: Find(e, T)
  while T[e] ≠ e do
    e ← T[e]
  return e // the root of the tree

Algorithm 3: Union(e1, e2, T)
  r1 ← Find(e1, T)
  r2 ← Find(e2, T)
  if (r1 < r2) then
    T[r2] ← r1
  else
    T[r1] ← r2

Algorithm 4: Transitive Closure
  for e = 0 : ne do
    T[e] ← T[T[e]]

Parallel algorithms do:
• sparse addressing ⇒ scatter/gather SIMD instructions (AVX512/SVE)
• concurrent min computation ⇒ recursive atomic min instruction (CUDA)
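Find, Union and Transitive Closure translate almost line for line into sequential code. The sketch below assumes the equivalence table T is a plain Python list with T[e] = e for roots; the sequence of unions is a made-up example.

```python
def find(e, T):
    """Follow parent links until reaching a root (T[r] == r)."""
    while T[e] != e:
        e = T[e]
    return e

def union(e1, e2, T):
    """Attach the larger root to the smaller one."""
    r1, r2 = find(e1, T), find(e2, T)
    if r1 < r2:
        T[r2] = r1
    else:
        T[r1] = r2

def transitive_closure(T):
    """Flatten every tree so that each label points directly to its root."""
    for e in range(len(T)):
        T[e] = T[T[e]]

T = list(range(6))      # six labels, each its own root
union(4, 5, T)          # T[5] = 4
union(3, 4, T)          # T[4] = 3
union(1, 3, T)          # T[3] = 1
chain = list(T)         # chain 5 -> 4 -> 3 -> 1 before flattening
transitive_closure(T)
```

A single ascending pass is enough for the closure because every label points to a smaller or equal label, so T[T[e]] is already flattened when e is visited.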

Classic direct algorithm: Rosenfeld (1966)
The Rosenfeld algorithm is the first 2-pass algorithm with an equivalence table
• when two labels belong to the same component, an equivalence is created and stored in the equivalence table T
• for example, there is an equivalence between 2 and 3 (stair pattern) and between 4 and 2 (concavity pattern)
• stair and concavity are the only two patterns that generate equivalences
• here, background in gray and foreground in white

[Figure: predecessor pixels p1, p2 and current pixel px; predecessor labels e1, e2 and current label ex; the stair and concavity patterns]

Binary image of pixels:
  1 1 1 0 0 0 1 1
  1 0 0 0 1 1 1 1
  1 0 1 0 1 1 1 1
  1 0 1 1 1 1 1 1

Image of labels:
  1 1 1 0 0 0 2 2
  1 0 0 0 3 3 2 2
  1 0 4 0 2 2 2 2
  1 0 4 4 2 2 2 2

Image of labels after relabeling:
  1 1 1 0 0 0 2 2
  1 0 0 0 2 2 2 2
  1 0 2 0 2 2 2 2
  1 0 2 2 2 2 2 2

Equivalence table:
  e     0 1 2 3 4
  T[e]  0 1 2 2 2

Equivalence trees: 1 alone; 2 is the root of 3 and 4
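The example can be replayed with a short sequential sketch of the two passes. It looks at the upper and left neighbors only; out-of-image neighbors are treated as background, and T grows by one entry per new label, both of which are assumptions of this sketch. The function name `rosenfeld` is illustrative.

```python
def find(e, T):
    while T[e] != e:
        e = T[e]
    return e

def rosenfeld(I):
    """Return (temporary labels, equivalence table, final labels)."""
    h, w = len(I), len(I[0])
    E = [[0] * w for _ in range(h)]
    T = [0]                                   # T[0] = 0 is the background
    for i in range(h):                        # first pass
        for j in range(w):
            if I[i][j] == 0:
                continue
            e1 = E[i - 1][j] if i > 0 else 0  # upper neighbor
            e2 = E[i][j - 1] if j > 0 else 0  # left neighbor
            if e1 == 0 and e2 == 0:
                T.append(len(T))              # new label, root of itself
                ex = len(T) - 1
            else:
                r1, r2 = find(e1, T), find(e2, T)
                ex = min(r for r in (r1, r2) if r != 0)  # positive min
                if r1 != 0 and r1 != ex:
                    T[r1] = ex
                if r2 != 0 and r2 != ex:
                    T[r2] = ex
            E[i][j] = ex
    for e in range(len(T)):                   # transitive closure
        T[e] = T[T[e]]
    relabeled = [[T[e] for e in row] for row in E]  # second pass
    return E, T, relabeled

I = [
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1],
]
E, T, relabeled = rosenfeld(I)
```

On this image the sketch yields the temporary labels, the table T[e] = 0 1 2 2 2 and the relabeled image of the slide.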

Parallel State-of-the-art
• Parallel Light Speed Labeling [1] (L. Cabaret, L. Lacassagne, D. Etiemble) (2018)
  ◮ parallel algorithm for CPU
  ◮ based on RLE (Run Length Encoding) to speed up processing and save memory accesses
  ◮ current fastest CCA algorithm on CPU
• Distanceless Label Propagation [2] (L. Cabaret, L. Lacassagne, D. Etiemble) (2018)
  ◮ direct CCL algorithm for GPU
• Playne-Equivalence [3] (D. P. Playne, K. A. Hawick) (2018)
  ◮ direct CCL algorithm for GPU (2D and 3D versions)
  ◮ based on the analysis of the local pixel configuration, to avoid unnecessary and costly atomic operations and to save memory accesses

Equivalence merge function & concurrency issue
The direct CCL algorithms rely on Union-Find to manage equivalences. A parallel merge operation can lead to concurrency issues:
[Figure: four label configurations, one per example below]
• 1st example (top-left): no concurrency, T[3] ← 1, T[4] ← 1
• 2nd example (top-right): no concurrency, T[3] ← 1, T[4] ← 2
• 3rd example (bottom-left): non-problematic concurrency, T[4] ← 1, T[4] ← 1
• 4th example (bottom-right): concurrency issue, T[4] ← 1, T[4] ← 2
  ◮ 4 can't be equal to both 1 and 2
  ◮ ⇒ 4 has to point to 1 and 2 has to point to 1 too...

Equivalence merge function (aka recursive Union)
The merge function, introduced by Playne and Hawick, solves the concurrency issues by iteratively merging labels using atomic operations

Algorithm 5: merge(L, e1, e2)
  while e1 ≠ e2 and e1 ≠ L[e1] do
    e1 ← L[e1] // root of e1
  while e1 ≠ e2 and e2 ≠ L[e2] do
    e2 ← L[e2] // root of e2
  while e1 ≠ e2 do
    if e1 < e2 then swap(e1, e2)
    e3 ← atomicMin(L[e1], e2) // recursive min
    if e3 = e1 then e1 ← e2
    else e1 ← e3

atomicMin returns the value that L[e1] held before the update, so e3 ≤ e1:
• if e3 = e1: no concurrent write, the update of L is successful, which terminates the loop
• if e3 < e1: concurrent write, L was updated by another thread, so e3 and e2 still need to be merged
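The behavior of merge on the problematic case (two threads writing T[4] ← 1 and T[4] ← 2) can be tried out on a CPU. The sketch below is a simulation under assumptions: `atomic_min` is a hypothetical stand-in that emulates CUDA's atomicMin with a lock, and Python threads replace GPU threads.

```python
import threading

_lock = threading.Lock()

def atomic_min(L, i, v):
    """Emulated atomicMin: store min(L[i], v) and return the old L[i]."""
    with _lock:
        old = L[i]
        if v < old:
            L[i] = v
        return old

def merge(L, e1, e2):
    while e1 != e2 and e1 != L[e1]:
        e1 = L[e1]                    # climb to the root of e1
    while e1 != e2 and e2 != L[e2]:
        e2 = L[e2]                    # climb to the root of e2
    while e1 != e2:
        if e1 < e2:
            e1, e2 = e2, e1           # keep e1 as the larger label
        e3 = atomic_min(L, e1, e2)
        if e3 == e1:
            e1 = e2                   # our write succeeded: loop ends
        else:
            e1 = e3                   # another thread won: merge e3 with e2

def root(L, e):
    while L[e] != e:
        e = L[e]
    return e

L = list(range(5))                    # labels 0..4, each its own root
threads = [threading.Thread(target=merge, args=(L, 4, 1)),
           threading.Thread(target=merge, args=(L, 4, 2))]
for t in threads:
    t.start()
for t in threads:
    t.join()
roots = [root(L, e) for e in range(5)]
```

Whatever the interleaving, labels 1, 2 and 4 end up in the same tree rooted at 1, while a plain pair of stores to T[4] would keep only one of the two equivalences.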

Hardware Accelerated algorithm: HA4
Analysis of state-of-the-art weaknesses:
• vertical borders (non-coalesced memory accesses)
• expensive atomic operations
Analysis of state-of-the-art strengths:
• equivalence table embedded in the image (Cabaret, Playne)
• merge function (Komura [4] + Playne)
• segments labeling (Light Speed Labeling)
• necessary condition to merge two equivalence trees (Playne)
Figure 1: All possible 4-pixel configurations. Only (f) needs to merge labels. (Playne)

Hardware Accelerated: HA4
The algorithm is divided into 3 kernels:
• strip labeling: the image is split into horizontal strips of 4 rows. Each strip is processed by a block of 32 × 4 threads (one warp per row). Only the head of each segment is labeled
• border merging: to merge the labels on the horizontal borders between strips
• relabeling / features computation: to propagate the label of each segment to the pixels or to compute the features associated with the connected components

Example – Strip labeling initialization (Step #0)
The 8 × 8 image is divided into 2 strips of 8 × 4 pixels, warp size = 8
Initial strip labeling:
• only the head of each segment (start node) is labeled with a unique label
• equal to its linear address: L[k] = k with k = y × width + x
• warning: label numbering starts at 0, not 1
[Figure (a) Initialization: the 8 × 8 image in two strips, each segment head labeled with its linear address]
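The initialization rule can be sketched sequentially as follows. This is an illustration, not the GPU kernel: the name `init_segment_heads`, the dictionary used to hold the labels and the two-row image are assumptions. A pixel counts as a segment head when it is foreground and its left neighbor is background or outside the image.

```python
def init_segment_heads(img):
    """Label every segment head with its linear address k = y * width + x."""
    h, w = len(img), len(img[0])
    L = {}
    for y in range(h):
        for x in range(w):
            is_head = img[y][x] != 0 and (x == 0 or img[y][x - 1] == 0)
            if is_head:
                k = y * w + x
                L[k] = k          # the head is the root of its own tree
    return L

# Hypothetical 2-row, 8-column image.
img = [
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1, 1, 1],
]
heads = init_segment_heads(img)
```

Because the labels are linear addresses, the first label is 0, which is why the slide warns that numbering does not start at 1.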

Example – Strip labeling (Step #1)
After initialization:
• detection of merging nodes using necessary conditions in each thread
• update of start nodes only
Strips' segments are now labeled
[Figure: (b) Strip labeling, (c) Strip labeled, with the corresponding equivalence trees]
Here, a CC spanning several strips is represented by 3 disjoint trees of labels

Example – Border merging (Step #2)
Same merging operations on border nodes only. All the segments are correctly labeled. A CC spanning several strips is represented by 1 tree.
[Figure: (d) Border merging, (e) Border merged, with the equivalence trees before and after merging]
