
Tsinghua University Introduction Deep learning - PowerPoint PPT Presentation

Deep500 BOF 2018. Jidong Zhai, Tsinghua University. Introduction: Deep learning has been widely used in many areas. There are many deep learning frameworks, compute libraries, and acceleration devices.


  1. Deep500 BOF 2018, Jidong Zhai, Tsinghua University

  2. Introduction • Deep learning has been widely used in many areas

  3. Introduction • There are many deep learning frameworks, compute libraries, and acceleration devices • Frameworks: CNTK, ··· • Compute libraries: BLAS, ··· • Compute devices: TPU, ···

  4. Introduction • However, how should they be evaluated? • Benchmark (?) for frameworks (CNTK, ···), compute libraries (BLAS, ···), and compute devices (TPU, ···)

  5. Introduction • However, how should they be evaluated? • Benchmark set: answers "Which is better?" and provides an optimization target, which promotes development • Metrics: running time, resource use, scalability, efficiency, … • Evaluated layers: frameworks (CNTK, ···), compute libraries (BLAS, ···), compute devices (TPU, ···)

  6. Related Deep Learning Benchmarks • Compared: convnet-benchmarks [1], DeepBench [2], DAWNBench [3], TensorFlow Benchmark [4]
     • Target (varies by benchmark): framework, compute library, compute device
     • Granularity: neural network (convnet-benchmarks, DAWNBench, TensorFlow Benchmark); basic operation (DeepBench)
     • Models (low diversity): only CNN (convnet-benchmarks); training and inference (DeepBench); 2 CNN + 1 RNN (DAWNBench); 4 CNN (TensorFlow Benchmark)
     • Dataset (limited): ImageNet (convnet-benchmarks); dummy data (DeepBench); CIFAR10, ImageNet, SQuAD (DAWNBench); ImageNet (TensorFlow Benchmark)
     • Metrics (single metric): time per iteration (convnet-benchmarks); time (DeepBench); training time and cost to a certain accuracy (DAWNBench); total training time (TensorFlow Benchmark)
     [1] convnet-benchmarks: https://github.com/soumith/convnet-benchmarks
     [2] Baidu DeepBench: https://github.com/baidu-research/DeepBench
     [3] Cody A. Coleman et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS 2017
     [4] TensorFlow Benchmark: https://www.tensorflow.org/performance/benchmarks

  7. Related Deep Learning Benchmarks • MLPerf [1]
     • Evaluation target: framework, compute device
     • Granularity: neural network
     • Diversity (various applications): 1. Image (classification, detection) 2. NLP (translation, sentiment analysis) 3. Speech (recognition) 4. Reinforcement learning and recommendation
     • Dataset (various datasets): ImageNet, COCO, WMT, Librispeech, MovieLens, …
     • Evaluation metrics: training time, power use, and cost to a certain accuracy
     [1] https://mlperf.org/

  8. How to evaluate HPC systems for machine learning?

  9. Our Work on Workload Analysis for Deep Learning • Preliminary workload analysis
     • Applications: image classification, machine translation, language model, question answering
     • Models: VGG, ResNet, Seq2seq, RNN LM, AoA Reader
     • Datasets: real data (Cifar, Tatoeba, WikiText-2, CBTest) and dummy data
     • Dataset properties noted: easy to obtain, real time, controllable, generative

  10. Our Work • Time • Time of every operation type within one iteration • Time of phases within one iteration
     [Figure: bar chart of time per iteration (ms, axis 0 to 700) for Seq2seq, AoA Reader, RNN LM, ResNet, and VGG, split into Data, Forward, Backward, Loss, and Update phases]

  11. Workload Analysis • Memory usage • Memory usage breakdown • Memory usage vs. input size
     [Figure: memory use (MB) for training and inference, with the training/inference ratio, plotted against picture area (pixel²)]
     [Figure: memory use (MB) for training and inference, with the training/inference ratio, plotted against sequence length]
     [Figure: memory use (MB) per model (Seq2seq, AoA Reader, RNN LM, ResNet, VGG), broken down into Weight and Mediate Result + Temp]

  12. Workload Characterization • Hardware counters • For GPU: GPU occupancy, warp execution efficiency, warp non-predicated execution efficiency, bandwidth utilization, TFLOPS • Normalized values as listed on the slide: 1, 0.46, 1.00, 1.00, 4.02, 5.65

  13. Questions about an HPC Oriented Deep Learning Benchmark • Questions we need to think about:
     • Model selection: various application areas? A synthetic model with the main features?
     • Dataset: a fixed dataset (ImageNet)? Generated data?
     • Metrics: time for training? GFLOPS? AI operations per second?
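
The per-phase timing described in slide 10 (data, forward, backward, loss, and update within one iteration) could be collected with a small timer like the sketch below. The `PhaseTimer` class, the `run_iteration` helper, and the placeholder workloads are hypothetical illustrations, not the authors' tooling. On a GPU, each phase would also need a device synchronization before the clock is read, which this sketch omits.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

PHASES = ("data", "forward", "loss", "backward", "update")


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals_ms = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time everything executed inside the `with` block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals_ms[name] += (time.perf_counter() - start) * 1000.0

    def shares(self):
        """Return each phase's fraction of the total measured time."""
        total = sum(self.totals_ms.values())
        if total == 0:
            return {name: 0.0 for name in self.totals_ms}
        return {name: ms / total for name, ms in self.totals_ms.items()}


def run_iteration(timer, work):
    """Run one iteration; `work` maps a phase name to a zero-argument callable."""
    for name in PHASES:
        with timer.phase(name):
            work[name]()


if __name__ == "__main__":
    # Placeholder workloads standing in for real framework calls (assumption).
    work = {name: (lambda: sum(range(10000))) for name in PHASES}
    timer = PhaseTimer()
    for _ in range(5):
        run_iteration(timer, work)
    for name, ms in timer.totals_ms.items():
        print(f"{name}: {ms:.3f} ms")
```

Accumulating over several iterations, as in the usage above, smooths out per-iteration noise before the phase shares are compared across models.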

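
To make the metric question in slide 13 concrete (training time versus GFLOPS), the following sketch converts a measured time into a throughput figure. It assumes the conventional count of 2·m·k·n floating-point operations for multiplying an m×k matrix by a k×n matrix; the function names are illustrative, and a real benchmark would need an operation count for every layer type in the model.

```python
def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * k * n


def gflops(flop_count, seconds):
    """Convert an operation count and an elapsed time into GFLOPS."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    flops = matmul_flops(1024, 1024, 1024)
    # Assumed measurement: the product took 0.5 s.
    print(f"{gflops(flops, 0.5):.2f} GFLOPS")  # prints "4.29 GFLOPS"
```

A rate like this is independent of model accuracy, which is one reason the slides weigh it against time-to-train style metrics.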