  1. CRESITT Event: Embedded AI and Upstream Research (IA embarquée et recherche amont). CEA presentation for CRESITT, October 17th, 2019. Sandrine Varenne, David Briand, CEA LIST. sandrine.varenne@cea.fr

  2. EMBEDDED AI AND UPSTREAM RESEARCH
     1 The work of CEA DRT (Direction de la Recherche Technologique) in artificial intelligence
     2 Overview of our activities in embedded AI
     3 Focus on our N2D2 tools and our hardware accelerators (PNeuro, DNeuro…)
     4 Conclusion

  3. CEA Tech & Artificial Intelligence. [Overview diagram: data sources (text & audio, semantics, images, video, other signals) meeting CEA know-how across the stack: algorithms, data analytics, software/hardware adequation, architecture, IC conception, NVM, 3D integration, communication, smart systems, certification and verification tools.] CEA CONFIDENTIEL

  4. CEA Tech & Artificial Intelligence: addressing the embedded challenges. Training: labeled databases feed a machine-learning algorithm to produce a trained DNN model; reaching the target accuracy (topology, training set, parameters…) takes days to weeks on a multi-GPU server such as an Nvidia DGX-1 (8 Tesla P100). Prediction: new data is run through the trained DNN model (e.g. "a car") with low-latency inference on TPU, FPGA, GPU, PNeuro…

  5. CEA know-how in deep learning & embedded AI. Code generation linked with modules for CPU, many-core CPU, GPU and FPGA. Off-the-shelf elements: optimized C, C++, OpenMP, CUDA, CuDNN, OpenCL, HLS, TensorRT. Dedicated hardware libraries and IP: PNeuro, DNeuro, spiking, spiking + NVM.

  6. N2D2, a European platform to address embedded systems' challenges. N2D2 has been entirely developed by CEA.
     • Database handling and data-preprocessing help: data conditioning, semi-automatic data labelling
     • Standalone code generation for COTS* components (CPU, GPU, FPGA), for specific hardware targets (ST, Kalray, Renesas…), and for NN hardware accelerators based on CEA IP >> well adapted for embedded AI
     • Decision help for the implementation phase: hardware cost & form factor, power consumption, latency
     • Spike coding
     * COTS: Commercial Off-The-Shelf Components

  7. Context / motivations
     • Deep Neural Networks (DNN) are very successful in the vast majority of classification/recognition benchmarks… on high-end clusters of multiple 250 W GPUs
     [Chart: Top-1 ImageNet accuracy (%) versus model complexity (MMACs), showing accuracy rising with complexity]
     • Embedding low-power DNNs remains challenging: DNN topologies must be adapted and simplified, layer complexity (number of operations) reduced, and precision reduced (8-bit integer or less)
     • Today's general-purpose COTS parts are inefficient for DNNs: too few cores, computing cores too complex (floating-point computation), low MAC/cycle efficiency, insufficient memory
     >> Balancing speed/power against applicative performance is a major challenge
     >> Need for a framework to automate DNN shrinking exploration and evaluation, performance projection, and porting to embedded platforms
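The precision-reduction point above can be made concrete with a minimal sketch (illustrative only, not N2D2 code): symmetric 8-bit quantization maps float weights onto [-127, 127] with a single scale factor, so the per-weight rounding error is bounded by half the scale.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization: map float weights onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy example: quantize a small weight tensor and measure the error
w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # bounded by scale / 2
```

This is the simplest possible scheme; the calibration slide later in the deck refines the choice of scale instead of taking the raw maximum.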

  8. Deep learning for embedded computing
     N2D2, a DNN design framework: unified modeling and NN exploration tool; custom application building & optimization (CNN, Faster-RCNN…); hardware mapping & benchmarking (CPUs, GPUs, FPGAs, ASIPs); optimized embedded code generation. N2D2 is available at https://github.com/CEA-LIST/N2D2/
     Hardware acceleration, PNeuro: embedded programmable ASIC processor for deep neural computing; clustered 8-bit SIMD architecture; designed for DNN processing chains and image processing; published at DATE 2018.
     FPGA hardware acceleration, DNeuro: dataflow FPGA IP; optimized RTL DNN layer kernels; automatic RTL generation through N2D2; dataflow computation designed to use the DSPs available on the FPGA.

  9. [Repeat slide: restates the N2D2 framework summary of slide 8 and the motivations of slide 7.]

  10. N2D2: DNN design environment. A unique platform for the design and exploration of DNN applications.
      • SW DNN libraries: OpenCL, OpenMP, CuDNN, CUDA, TensorRT
      • HW DNN libraries: PNeuro, DNeuro, C/HLS
      • COTS targets: many-core CPUs (MPPA, P2012, ARM…), GPUs, FPGAs; HW accelerators: PNeuro, ASMP
      • Considered criteria: accuracy (approximate computing…), memory need, computational complexity
      Flow: learning & test databases > data conditioning > modeling > learning > test > optimization > trained DNN > code generation > code execution

  11. N2D2: Data Augmentation, Conditioning and Analysis
      • N2D2 integrates data-processing and analysis dataflow building
      • Genericity: processes image and sound; 1D, 2D or 3D data
      • Associates a label with each data point (1D or 2D labels); supports arbitrary label shapes (circular, rectangular, polygonal or pixel-wise defined)
      • Applies transformations to data, pixel-wise labels and geometrical labels
      • Basic operations: rescaling, flipping, normalization, affine, filtering, DFT…
      • Advanced operations: elastic distortion, random slice/label extraction, morphological reconstructions…
      [Diagram: learn, validation and test sets flow from the database through per-channel transformation modules (extract, slice, affine, rescale, with cumulative statistics feeding mean/stdDev normalization) into the DL core or spike coding; a data-analysis module reports per-set statistics (number of data, min/max/mean values) and annotation data (geometric and pixel-wise).]
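As an illustration only (N2D2 configures these stages through its INI files, not through this API), a transformation pipeline of the kind shown on this slide can be sketched as composed image-to-image functions:

```python
import numpy as np

# Hypothetical re-creation of a rescale / flip / normalize chain;
# the function names are ours, not N2D2's.
def rescale(size):
    def f(img):
        # Nearest-neighbour resize, enough for a sketch
        ys = np.linspace(0, img.shape[0] - 1, size).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, size).astype(int)
        return img[np.ix_(ys, xs)]
    return f

def hflip(img):
    """Horizontal flip (a basic augmentation)."""
    return img[:, ::-1]

def normalize(img):
    """Zero-mean, unit-variance normalization, as the STATS modules feed."""
    return (img - img.mean()) / (img.std() + 1e-8)

def compose(*steps):
    """Chain the transformation modules in order."""
    def pipeline(img):
        for step in steps:
            img = step(img)
        return img
    return pipeline

pipeline = compose(rescale(24), hflip, normalize)
out = pipeline(np.random.rand(32, 32).astype(np.float32))  # 24x24, normalized
```

The composition order mirrors the slide's dataflow: geometric operations first, statistics-based normalization last.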

  12. N2D2: Typical Outputs. Layer-wise detailed memory and computing requirements; dataflow visualization; results visualization (pixel-wise segmentation, ROI bounding-box extraction and classification); layer-wise weights and kernels visualization, distribution and data-range analysis; layer-wise output visualization and data-range analysis; pixel-wise and object-wise confusion-matrix reporting.

      N2D2 INI network description file:

      ; Database
      [database]
      Type=MNIST_IDX_Database
      Validation=0.2

      ; Environment
      [env]
      SizeX=24
      SizeY=24
      BatchSize=128

      [env.Transformation]
      Type=PadCropTransformation
      Width=[env]SizeX
      Height=[env]SizeY

      [env.OnTheFlyTransformation]
      Type=DistortionTransformation
      ApplyTo=LearnOnly
      ElasticGaussianSize=21
      ElasticSigma=6.0
      ElasticScaling=36.0
      Scaling=10.0
      Rotation=10.0

      ; First layer (convolutional)
      [conv1]
      Input=env
      Type=Conv
      KernelWidth=5
      KernelHeight=5
      NbChannels=6
      Stride=2
      ConfigSection=common.config

      ; Second layer (convolutional)
      [conv2]
      Input=conv1
      Type=Conv
      KernelWidth=5
      KernelHeight=5
      NbChannels=12
      Stride=2
      ConfigSection=common.config

      ; Third layer (fully connected)
      [fc1]
      Input=conv2
      Type=Fc
      NbOutputs=100
      ConfigSection=common.config

      ; Output layer (fully connected)
      [fc2]
      Input=fc1
      Type=Fc
      NbOutputs=10
      ConfigSection=common.config

      ; Softmax layer
      [soft]
      Input=fc2
      Type=Softmax
      NbOutputs=10
      WithLoss=1
      ConfigSection=common.config

      ; Common solvers config
      [common.config]
      WeightsSolver.LearningRate=0.05
      WeightsSolver.Decay=0.0005
      Solvers.LearningRatePolicy=StepDecay
      Solvers.LearningRateStepSize=[sp]_EpochSize
      Solvers.LearningRateDecay=0.993
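To show how the description file hangs together, here is a small stand-alone sketch (plain Python configparser, not the N2D2 loader) that recovers the layer order from the Input= chain of an INI fragment like the one on this slide:

```python
import configparser

# A fragment in the style of the slide's INI file: each section is a
# layer, chained through its Input= key (layer names are from the slide)
ini_text = """
[conv1]
Input=env
Type=Conv
NbChannels=6

[conv2]
Input=conv1
Type=Conv
NbChannels=12

[fc1]
Input=conv2
Type=Fc
NbOutputs=100
"""

cfg = configparser.ConfigParser()
cfg.read_string(ini_text)

# Walk the Input= chain, starting from the environment, to recover
# the topological order of the layers
inputs = {s: cfg[s]["Input"] for s in cfg.sections()}
order = []
layer = "env"
while True:
    successors = [s for s, i in inputs.items() if i == layer]
    if not successors:
        break
    layer = successors[0]
    order.append(layer)
# order == ["conv1", "conv2", "fc1"]
```

This chaining is what lets N2D2 derive the whole dataflow (and the per-layer reports on this slide) from one declarative file.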

  13. N2D2: DNN complexity analysis. [Charts: per-layer absolute and relative metrics, highlighting layers with high weights memory, high input/output buffer memory, and high computation.]
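The per-layer metrics on this slide follow directly from the layer shapes; a hypothetical helper (ours, not part of N2D2) computing them for a convolutional layer:

```python
def conv_macs(h_out, w_out, k, c_in, c_out):
    """MAC count of a 2-D convolution: one k x k x c_in dot product
    per output pixel and per output channel."""
    return h_out * w_out * k * k * c_in * c_out

def conv_weights_bytes(k, c_in, c_out, bits=8):
    """Weights memory in bytes for the same layer (biases ignored)."""
    return k * k * c_in * c_out * bits // 8

# Example: the first conv of the MNIST network on the previous slide
# (5x5 kernels, 1 input channel, 6 output maps, stride 2 on a 24x24
# input, no padding -> 10x10 output maps)
macs = conv_macs(10, 10, 5, 1, 6)      # 15,000 MACs
mem = conv_weights_bytes(5, 1, 6)      # 150 bytes at 8-bit precision
```

Running these formulas over every layer is enough to reproduce the "absolute metrics" bars; the relative view simply normalizes each layer by the network total.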

  14. N2D2: calibration for integer precision
      • Weights clamping and/or normalization
      • Quantization of each layer's output-activation distribution
      • Histogram analysis and optimal quantization-threshold determination, using the Kullback-Leibler divergence
      >> Goal: automatic, guaranteed best result without retraining
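A minimal sketch of the threshold search described above, in the spirit of entropy calibration. Assumptions to note: activations are taken as absolute values already folded into a histogram, and the merge-and-average approximation below is ours; this is not the N2D2 implementation.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) over histogram bins where p > 0 (both renormalized)."""
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.clip(q[mask], 1e-12, None))))

def best_threshold(hist, bin_edges, n_levels=128):
    """Scan candidate clipping thresholds; keep the one whose saturated,
    n_levels-quantized distribution stays closest (in KL divergence) to
    the original activation distribution."""
    best_t, best_d = bin_edges[-1], np.inf
    for i in range(n_levels, len(hist) + 1):
        # Reference: clip everything beyond bin i into the last kept bin
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()
        # Candidate: merge the i bins down to n_levels quantization
        # levels, then spread each level's average back over its bins
        idx = np.arange(i) * n_levels // i
        sums = np.bincount(idx, weights=p, minlength=n_levels)
        counts = np.bincount(idx, minlength=n_levels)
        q = (sums / np.maximum(counts, 1))[idx]
        d = kl_divergence(p, q)
        if d < best_d:
            best_t, best_d = bin_edges[i], d
    return best_t

# Example: histogram of absolute activations drawn from a normal law
rng = np.random.default_rng(0)
hist, edges = np.histogram(np.abs(rng.standard_normal(20000)),
                           bins=512, range=(0.0, 8.0))
threshold = best_threshold(hist, edges)  # then scale = threshold / 127
```

Clipping at the KL-optimal threshold rather than the raw maximum is what preserves accuracy at 8 bits without retraining, which is the point of the slide.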

  15. N2D2: hardware exports. A unified tool for multiple hardware targets:
      • Generic CPU (x86 / ARM / DSP): C/OpenMP, C++/OpenCL (generic, not optimized for a specific product)
      • GPU (NVidia): C++/CUDA/CuDNN/TensorRT; supports SSD and Faster-RCNN with TensorRT on Drive PX2
      • HLS FPGA (Intel and Xilinx): C/HLS
      • Dataflow DNeuro: configurable RTL library
      • DSP-like PNeuro programmable SIMD processor: RTL/ASM
      • MPPA: C++/OpenCL, KaNN API
      • R-Car: CNN-IP C API
      • ASMP: C/OpenMP/CVA8
      • NeuroSpike: generic spike, RTL, SystemC
