ORNL is managed by UT-Battelle, LLC for the US Department of Energy
Toward Large-Scale Image Segmentation On Summit. Sudip K. Seal, Seung-Hwan Lim, Dali Wang, Jacob Hinkle, Dalton Lunga, Aristeidis Tsaris. Oak Ridge National Laboratory, USA. August 19, 2020. International Conference on Parallel Processing, Alberta, Canada.
Introduction
Semantic Segmentation of Images
- Given an image with N×N pixels and a set of c distinct classes, label each of the N^2 pixels with one of the c distinct classes.
- For example, given a 256×256 image of a car, road, buildings and people, a semantic segmentation of the image classifies each of the 256×256 = 2^16 pixels into one of c = 4 classes {car, road, building, people} (see the sketch after the figure below).
[Figure: input image and its semantic segmentation. Image credit: https://mc.ai/how-to-do-semantic-segmentation-using-deep-learning/]
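A tiny PyTorch sketch (illustrative only, not from the paper) of what "label each pixel with one of c classes" means in practice: a segmentation network outputs one score per class per pixel, and the per-pixel argmax gives the class map.

```python
import torch

c = 4                                   # {car, road, building, people}
logits = torch.randn(1, c, 256, 256)    # network output: one score per class per pixel
segmentation = logits.argmax(dim=1)     # (1, 256, 256) map of class indices in [0, c)
assert segmentation.shape == (1, 256, 256)
```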
[Figure legend: conv 3x3 + ReLU; max pool 2x2; up-conv 2x2; conv 1x1; copy and crop. Input image to segmented image.]
The U-Net Model
U-Net Architecture
[Figure: U-Net architecture, input image to output image.] Source: Olaf Ronneberger, Philipp Fischer, Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, LNCS, Vol. 9351: 234-241, 2015.
- Refer to this as the h-region (halo).
- Halo width (h) is a function of the U-Net architecture (depth, channel width, filter sizes, etc.).
- Halo width (h) determines the receptive field of the model.
- The larger the receptive field, the wider the length-scales of identifiable objects.
- For a U-Net with L levels and n_c 3x3 convolutions per level: h = (3 · 2^(L−1) − 2) · n_c.
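A minimal sketch, assuming the halo formula reconstructed above for 3x3 convolutions (the constant in the formula is my reconstruction, not quoted verbatim from the slide). It reproduces the halo widths listed later for the benchmark models.

```python
# Halo width for a U-Net with `levels` levels and `convs_per_level` 3x3 convs per level,
# assuming h = (3 * 2**(L - 1) - 2) * n_c (reconstructed formula).
def halo_width(levels: int, convs_per_level: int) -> int:
    return (3 * 2 ** (levels - 1) - 2) * convs_per_level

assert halo_width(5, 2) == 92    # Small (standard) U-Net
assert halo_width(5, 5) == 230   # Medium-1
assert halo_width(6, 2) == 188   # Medium-2
assert halo_width(7, 2) == 380   # Large
```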
Why Is It A Summit-scale Problem?
- Satellite images collected at high resolutions (30-50 cm) yield very large 10,000 x 10,000 images.
- Most computer vision workloads deal with images of O(10^2 × 10^2) resolution (for example, ImageNet).
- This work targets ultra-wide-extent images with O(10^4 × 10^4) resolution: 10,000-fold larger data samples.
- At present, it takes many days to train a single model (even on special-purpose DL platforms like DGX boxes).
- Hyperparameter tuning of these models takes much longer.
- Need an accurate, scalable, high-speed training framework.
- Large U-Net models are needed to resolve multi-scale objects (buildings, solar panels, land cover details).
- Advanced DAQ systems generate vast amounts of high-resolution images: large data volume.
- Sample size: 10,000-fold larger image size. Model size: larger receptive fields require larger models. Data size: multi-TB of data from DAQ systems.
Sample Parallelism - Taming Large Image Size
Leveraging Summit's Vast GPU Farm
Tile size chosen such that appended tile plus model parameters fit on a single Summit GPU.
- Given an N×N image, U-Net segments an (N−h)×(N−h) inset square.
- Partition each N×N = 10000×10000 image sample into non-overlapping tiles (see the sketch below).
- Append an extra halo region of width h along each side of each tile.
- Assign each appended tile to a Summit GPU; use a standard U-Net to segment the appended tile.
- Each GPU segments an area equal to that of the original non-overlapping tile.
Blue dashed square is segmented for each appended tile.
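A minimal NumPy sketch (not the authors' implementation) of the sample-parallel decomposition: partition an N×N image into p×p non-overlapping tiles and append a halo of width h on every side of each tile, reflect-padding the image border so edge tiles also receive a full halo. The function name and padding mode are illustrative assumptions.

```python
import numpy as np

def tile_with_halo(image: np.ndarray, p: int, h: int):
    """Yield (row, col, padded_tile) for each of the p x p tiles of a square image."""
    n = image.shape[0]
    assert image.shape[1] == n and n % p == 0
    t = n // p                                        # tile size N' = N / p
    padded = np.pad(image, ((h, h), (h, h)) + ((0, 0),) * (image.ndim - 2),
                    mode="reflect")                   # full halo even at image borders
    for i in range(p):
        for j in range(p):
            r, c = i * t, j * t                       # top-left corner in the original image
            yield i, j, padded[r:r + t + 2 * h, c:c + t + 2 * h]

# Example matching the setup on the next slide: one 10000 x 10000 x 4 sample,
# an 8 x 8 tiling, and halo width h = 92.
sample = np.zeros((10000, 10000, 4), dtype=np.uint8)
i, j, tile = next(tile_with_halo(sample, p=8, h=92))
assert tile.shape == (1434, 1434, 4)                  # 1250 + 2*92 appended tile
```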
Performance of Sample-Parallel U-Net Training
100X+ Faster U-Net Training
- The optimal tiling for each 10000×10000 sample image was found to be 8×8.
- Each 1250×1250 tile was appended with a halo of width h = 92 and assigned to a single Summit GPU.
- 10-11 Summit nodes are needed per 10000×10000 image sample.
- A U-Net model was trained on a data set of 100 10000×10000×4 satellite images, collected at 30-50 cm resolution.
- The training time per epoch was ~12 seconds using 1200 Summit GPUs, compared to ~1,740 seconds on a DGX-1.
- Initial testing revealed no appreciable loss of training/validation accuracy with the new parallel framework.
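The node count follows directly from the tiling; a short worked check, assuming Summit's 6 GPUs per node:

```latex
\[
\frac{10000}{8} = 1250, \qquad 1250 + 2\cdot 92 = 1434 \ \text{(appended tile size)},
\]
\[
8 \times 8 = 64 \ \text{tiles per sample}, \qquad \frac{64}{6} \approx 10.7
\ \Rightarrow\ \text{10--11 Summit nodes per } 10000 \times 10000 \ \text{sample}.
\]
```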
Limitations of Sample Parallelism
- N: global image size
- N′: tile size
- h: halo (overlap) width
- n_c: no. of convs per level
- L: no. of U-Net levels
- N×N = p^2 (N′×N′)
- An image of size N×N is partitioned into a p×p array of N′×N′ tiles, so N = pN′.
- Overhead factor F ~ (total volume of computations per tile) / (total volume of useful computations per tile) = (N′ + 2h)^2 / N′^2 ~ (1 + 2hp/N)^2.
- Ideally, F = 1.
- Decreasing p (increasing tile size) increases the memory requirement and quickly overtakes the memory available per GPU.
- Decreasing h decreases the receptive field of the model.
- On the other hand, the goal is to decrease F and increase h.
- Decreasing F means increasing the tile size N′ and/or decreasing h, which steers away from the target receptive fields (recall h = (3 · 2^(L−1) − 2) · n_c).
- To satisfy both, U-Net models larger than can fit on a single GPU are needed.
- Need model-parallel execution.
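A quick numerical evaluation of the overhead factor for the configuration reported on the previous slide (my evaluation of the reconstructed formula; the numeric value is not quoted on the slides):

```python
# N = 10000, p = 8 (so N' = 1250) and h = 92 as in the sample-parallel experiment.
N, p, h = 10000, 8, 92
n_tile = N // p                           # N' = 1250
F = (n_tile + 2 * h) ** 2 / n_tile ** 2   # equivalently (1 + 2*h*p/N)**2
print(F)                                  # ~1.32, i.e. roughly 32% redundant computation per tile
```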
torchgpipe: a PyTorch implementation of the GPipe* framework
Model-Parallelism - Taming Large Model Size
Node-level Pipeline-Parallel Execution
[Figure: a model-parallel U-Net partitioned across GPUs 1-6 of a single Summit node (skip connections omitted for ease of presentation; no load balance), with the GPipe-style pipeline schedule of forward and backward passes over micro-batches on each partition, followed by parameter updates.]
- The set of consecutive layers mapped to a GPU is called a partition.
- The number of layers in each partition is called the balance.
- Each mini-batch of tiles is subdivided into smaller micro-batches (chunks) that are pipelined through the partitions (see the sketch below the reference).
- Memory needed per GPU = size(micro-batch) + size(partition).
* Huang, Yanping, Youlong Cheng, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le and Zhifeng Chen. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism." NeurIPS (2019).
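A minimal sketch of node-level pipeline parallelism with torchgpipe (not the authors' code). It assumes the U-Net has been flattened into an nn.Sequential (skip connections handled inside custom blocks) and that 6 GPUs are visible, as on a Summit node; the stand-in model and the balance values below are illustrative only.

```python
import torch
from torch import nn
from torchgpipe import GPipe

unet = nn.Sequential(                        # stand-in for a flattened U-Net
    nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.Conv2d(64, 4, 1),
)

# balance = number of consecutive layers per partition (one partition per GPU);
# chunks  = number of micro-batches each mini-batch is split into for pipelining.
model = GPipe(unet, balance=[1, 1, 1, 1, 1, 1], chunks=4)

x = torch.randn(8, 4, 256, 256)              # a mini-batch of (small) padded tiles
y = model(x.to(model.devices[0]))            # output is produced on the last GPU
loss = y.mean()
loss.backward()
```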
Model Parallel Experiments
Single Node Execution
[Figure: N×N image samples are partitioned into p×p padded N′×N′ tiles (sample parallel); each tile is processed by a model-parallel U-Net on a 96 GB Summit node.]
Benchmark U-Net Models:

Model | No. of Levels | Conv. Layers per Level | No. of Trainable Parameters | Halo width h
Small (Standard) | 5 | 2 | 72,301,856 | 92
Medium-1 | 5 | 5 | 232,687,904 | 230
Medium-2 | 6 | 2 | 289,357,088 | 188
Large | 7 | 2 | 1,157,578,016 | 380

- 10× larger number of trainable parameters.
- 4× larger receptive field.
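A generic sketch (not the authors' script) of how the "No. of Trainable Parameters" column is typically obtained in PyTorch: sum the element counts of all parameters that require gradients.

```python
from torch import nn

def count_trainable_params(model: nn.Module) -> int:
    # Total number of elements across all trainable parameter tensors.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. count_trainable_params(small_unet) would return 72,301,856 for the
# Small (standard) configuration in the table above.
```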
Medium-1: speedups at 192, 512 and 1024 using 6 pipeline stages. Speedup roughly doubles (Small: 1.97×; Medium-2: 2.01×) as the number of pipeline stages increases from 1 to 6.
Need for Performance Improvement
Single Node Execution
- Small, Medium-2 and Large models: 109, 129 and 149 layers, respectively.
- Balances: Small {14, 24, 30, 22, 12, 7}; Medium-2 {16, 26, 38, 26, 12, 11}; Large {18, 30, 44, 30, 14, 13}.
- Need load-balanced pipelined execution.
- Encoder memory at level ℓ: E_ℓ = O( I_ℓ^2 + 2^ℓ n_f Σ_{i=1..n_c} (I_ℓ − i·d)^2 )
- Decoder memory at level ℓ′: D_ℓ′ = O( 2^ℓ′ n_f ( 2 I_ℓ′^2 + Σ_{i=1..n_c} (I_ℓ′ − i·d)^2 ) )
- Memory profile: E_ℓ + D_ℓ′ vs. ℓ, with ℓ′ = L − ℓ (I_ℓ: input size at level ℓ; n_f: base filter count; d: per-convolution shrinkage).
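A rough numerical sketch of the per-level memory profile implied by the expressions above (my reconstruction; the constants and the level-to-level size recursion are illustrative assumptions, not values from the paper). It illustrates why an even split of layers across GPUs is poorly load balanced.

```python
def level_sizes(I0, levels, n_c=2, d=2):
    # Input size at each encoder level, assuming unpadded convs (shrink d per conv)
    # followed by 2x2 max pooling between levels.
    sizes = [I0]
    for _ in range(levels - 1):
        sizes.append((sizes[-1] - n_c * d) // 2)
    return sizes

n_f, n_c, d = 64, 2, 2                       # illustrative U-Net constants
for lvl, I in enumerate(level_sizes(1434, 5), start=1):
    conv_acts = sum((I - i * d) ** 2 for i in range(1, n_c + 1))
    E = I * I + 2 ** lvl * n_f * conv_acts                # encoder-side activations
    D = 2 ** lvl * n_f * (2 * I * I + conv_acts)          # decoder-side activations
    print(f"level {lvl}: I={I:5d}  E~{E:.2e}  D~{D:.2e}")
```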
Wrapping Up
Ongoing Work: Sample + Model + Data Parallel Framework (load balance heuristics, data parallelism)
[Figure: N×N image samples are partitioned into p×p padded tiles (sample parallel); each tile is processed by a model-parallel U-Net on a 96 GB Summit node; the node-level pipelines are replicated across nodes (data parallel).]
This Paper: Prototype Sample + Model Parallel Framework
[Figure: N×N image samples are partitioned into p×p padded tiles (sample parallel); each tile is processed by a model-parallel U-Net on a 96 GB Summit node.]
- 10× larger number of trainable parameters.
- 4× larger receptive field.
- 10000× larger image size.
- Training image segmentation neural network models becomes extremely challenging when: image sizes are very large; desired receptive fields are large; the volume of training data is large.
- Fast training/inference is needed for geo-sensing applications: satellite imagery, disaster assessment, precision agriculture, etc.
- This work is a first step: it can train 10× larger U-Net models with a 4× larger receptive field on 10000× larger images.
- Ongoing efforts are underway to integrate load-balancing heuristics and data-parallel execution to handle large volumes of training data efficiently.