event.cwi.nl/lsde
Big Data for Data Science: Scalable Machine Learning
A SHORT INTRODUCTION TO NEURAL NETWORKS
credits: cs231n.stanford.edu; Fei-Fei Li, Justin Johnson, Serena Yeung
Example: Image Recognition
AlexNet 'convolutional' neural network: input image ➔ weights ➔ loss
Neural Nets - Basics
- Score function (linear; a matrix multiplication)
- Activation function (normalizes scores to [0, 1])
- Regularization function (penalizes complex W)
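A pure-Python sketch of these three ingredients (the weights, inputs, and regularization strength below are arbitrary illustration values, not from the slides):

```python
import math

def score(W, x, b):
    # Linear score function: s = W x + b  (W given as a list of rows)
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def sigmoid(s):
    # Activation function: squash each score into [0, 1]
    return [1.0 / (1.0 + math.exp(-v)) for v in s]

def l2_penalty(W, lam=0.1):
    # Regularization function: penalize large ("complex") weights
    return lam * sum(w * w for row in W for w in row)

W = [[0.2, -0.5], [1.5, 1.3]]
b = [0.0, 0.2]
x = [1.0, 2.0]
print(sigmoid(score(W, x, b)))   # two activations, each in (0, 1)
print(l2_penalty(W))
```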
Neural Nets are Computational Graphs
- Score, activation, and regularization functions, combined with a loss function, form a computational graph
- For backpropagation, we need a formula for the "gradient", i.e. the derivative of each computational function
- (In the slide's diagram, the backward pass starts from gradient 1.00 at the loss output)
Training the model: backpropagation
- Backpropagate the loss to the weights to be adjusted, proportional to a learning rate
- For backpropagation, we need a formula for the "gradient", i.e. the derivative of each computational function
- Slide example: the output gradient 1.00 flows into a 1/x node whose forward input was 1.37; the local gradient of 1/x is -1/x², so the backpropagated gradient is -1/(1.37)² · 1.00 = -0.53
Training the model: backpropagation
- Backpropagate the loss to the weights to be adjusted, proportional to a learning rate
- For backpropagation, we need a formula for the "gradient", i.e. the derivative of each computational function
- Slide example, next node: the +1 (add-constant) node has local gradient 1, so the incoming gradient passes through unchanged: 1 · (-0.53) = -0.53
Training the model: backpropagation
- Backpropagate the loss to the weights to be adjusted, proportional to a learning rate
- For backpropagation, we need a formula for the "gradient", i.e. the derivative of each computational function
- Slide example, next node: the exp node saw forward input -1.00, so its local gradient is e^-1.00; the backpropagated gradient becomes e^-1.00 · (-0.53) = -0.20
Training the model: backpropagation
- Backpropagate the loss to the weights to be adjusted, proportional to a learning rate
- For backpropagation, we need a formula for the "gradient", i.e. the derivative of each computational function
- Slide example, final nodes: the ×(-1) node flips -0.20 to +0.20; the add gates distribute 0.20 unchanged to each of their inputs; the multiply gates scale the gradient by the other operand, yielding (rounded) input gradients of -0.20 and 0.40 on the first weight/input pair, -0.40 and -0.60 on the second, and 0.20 on the bias
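The worked example on these slides can be replayed in a few lines. A sketch assuming the standard two-input sigmoid neuron f(w, x) = 1/(1 + e^-(w0·x0 + w1·x1 + w2)) behind these numbers, with assumed inputs w0=2, x0=-1, w1=-3, x1=-2, w2=-3 (the inputs are not stated on the slides; the intermediate gradients reproduce them):

```python
import math

# Forward pass of f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))), node by node
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

s = w0 * x0 + w1 * x1 + w2   # 1.00
a = -1.0 * s                 # -1.00
e = math.exp(a)              # 0.37
b = 1.0 + e                  # 1.37
f = 1.0 / b                  # 0.73  (the neuron's output)

# Backward pass: multiply each node's local gradient by the upstream gradient
df = 1.00                    # gradient at the output
db = (-1.0 / b**2) * df      # 1/x node: -1/(1.37)^2 * 1.00 = -0.53
de = 1.0 * db                # +1 node passes the gradient unchanged: -0.53
da = math.exp(a) * de        # exp node: e^-1.00 * -0.53 = -0.20
ds = -1.0 * da               # *(-1) node flips the sign: 0.20
# add gates distribute ds unchanged; multiply gates swap in the other operand
dw0, dx0 = x0 * ds, w0 * ds  # -0.20 and 0.39 (the slide rounds to 0.40)
dw1, dx1 = x1 * ds, w1 * ds  # -0.39 and -0.59 (slide: -0.40, -0.60)
dw2 = ds                     # 0.20
```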
Activation Functions
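The slide shows these as plots, which did not survive the export; a sketch of the usual candidates (sigmoid, tanh, ReLU, leaky ReLU; the leaky slope 0.01 is a common default, assumed here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # output in (0, 1)

def tanh(x):
    return math.tanh(x)                  # output in (-1, 1), zero-centered

def relu(x):
    return max(0.0, x)                   # kills negative inputs

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x     # small slope for negative inputs

for g in (sigmoid, tanh, relu, leaky_relu):
    print(g.__name__, g(-2.0), g(2.0))
```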
Get going quickly: Transfer Learning
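The transfer-learning idea as a toy sketch (all weights and data below are illustrative, not from the slides): keep a "pretrained" feature extractor frozen and train only a small new head on top of it.

```python
import math

# Frozen "pretrained" part: fixed weights standing in for the early layers
# of a network trained on a big dataset (e.g. ImageNet); never updated.
W_frozen = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]

def features(x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in W_frozen]

# New trainable head: logistic regression on the frozen features.
w_head, b_head = [0.0] * 4, 0.0

def predict(x):
    z = sum(w * f for w, f in zip(w_head, features(x))) + b_head
    return 1.0 / (1.0 + math.exp(-z))

# Tiny labeled dataset for the new task
data = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], 1), ([1, 1, 0], 0)]

lr = 0.5
for _ in range(200):                   # SGD on the head only
    for x, y in data:
        g = predict(x) - y             # d(log-loss)/dz
        f = features(x)
        for i in range(4):
            w_head[i] -= lr * g * f[i]
        b_head -= lr * g

print([round(predict(x)) for x, _ in data])   # matches the labels once fit
```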
Neural Network Architecture
- (mini) batch-wise training
- matrix calculations galore
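The two bullets above can be made concrete: mini-batching turns B separate vector-matrix products into one matrix-matrix product. A pure-Python sketch (shapes and values are my own illustration; a framework runs the same multiply on a GPU):

```python
def matmul(X, W):
    # (B x D) times (D x H) -> (B x H)
    return [[sum(x_d * W[d][h] for d, x_d in enumerate(row))
             for h in range(len(W[0]))] for row in X]

X = [[1.0, 2.0],        # mini-batch of B=3 inputs, D=2 features each
     [3.0, 4.0],
     [5.0, 6.0]]
W = [[1.0, 0.0, 1.0],   # D=2 x H=3 weight matrix of one layer
     [0.0, 1.0, 1.0]]

H = matmul(X, W)
print(H)   # 3 x 3: one hidden-layer row per input in the batch
```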
DEEP LEARNING SOFTWARE
Deep Learning Frameworks
- Caffe (UC Berkeley) ➔ Caffe2 (Facebook)
- Torch (NYU/Facebook) ➔ PyTorch (Facebook)
- Theano (Univ. Montreal) ➔ TensorFlow (Google)
- Paddle (Baidu), CNTK (Microsoft), MXNET (Amazon)
- Easily build big computational graphs
- Easily compute gradients in these graphs
- Run it at high speed (e.g. GPU)
Deep Learning Frameworks
- NumPy-style code: ..have to compute gradients by hand.. no GPU support
- TensorFlow: ..gradient computations are generated automagically from the forward phase (z=x*y; b=a+x; c=sum(b)).. + GPU support
- PyTorch: ..similar to TensorFlow.. not a "new language" but embedded in Python (control flow)
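To see what "gradients by hand" means in practice, here is a tiny graph of my own (not the slide's exact example) with hand-derived gradients, checked against finite differences, which is exactly the work an autograd framework generates for you:

```python
def forward(x, y):
    z = [xi * yi for xi, yi in zip(x, y)]   # z = x * y  (elementwise)
    return sum(z)                            # c = sum(z)

def backward(x, y):
    # hand-derived: dc/dx_i = y_i, dc/dy_i = x_i
    return list(y), list(x)

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
dx, dy = backward(x, y)

eps = 1e-6
for i in range(len(x)):                      # numeric gradient check
    xp = x[:]; xp[i] += eps
    num = (forward(xp, y) - forward(x, y)) / eps
    assert abs(num - dx[i]) < 1e-4

print(dx, dy)   # [4.0, 5.0, 6.0] [1.0, 2.0, 3.0]
```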
TensorFlow: TensorBoard GUI
Higher Levels of Abstraction
- Wrappers let you pick formulas "by name", e.g. sgd = stochastic gradient descent
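Behind the name "sgd" sits a simple update rule: step the weights against the gradient, scaled by the learning rate. A minimal sketch (the toy objective is my own choice):

```python
def sgd_step(w, grad, lr=0.1):
    # one stochastic gradient descent update: w <- w - lr * grad
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Toy objective: L(w) = sum(w_i^2), with gradient 2*w -> minimum at w = 0
w = [4.0, -2.0]
for _ in range(50):
    grad = [2.0 * wi for wi in w]
    w = sgd_step(w, grad)
print(w)   # both components shrink toward 0
```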
Static vs Dynamic Graphs
Static vs Dynamic: optimization
Static vs Dynamic: serialization
serialization = create a runnable program from the trained network
Static vs Dynamic: conditionals, loops
What to Use?
- TensorFlow is a safe bet for most projects: not perfect, but it has a huge community and wide usage; maybe pair it with a high-level wrapper (Keras, Sonnet, etc.)
- PyTorch is best for research; however, it is still new, and there can be rough patches
- Use TensorFlow to run one graph over many machines
- Consider Caffe, Caffe2, or TensorFlow for production deployment
- Consider TensorFlow or Caffe2 for mobile
DEEP LEARNING PERFORMANCE OPTIMIZATIONS
credits: cs231n.stanford.edu, Song Han
ML models are getting larger
First Challenge: Model Size
Second Challenge: Energy Efficiency
Third Challenge: Training Speed
Hardware Basics
Special hardware? It's in your pocket..
- iPhone 8 with A11 chip
- 6 CPU cores: 2 powerful, 4 energy-efficient
- Apple GPU
- Apple TPU (deep learning ASIC)
- Only on-chip FPGA missing (will come in time..)
Hardware Basics: Number Representation
Hardware Basics: Memory = Energy
larger model ➔ more memory references ➔ more energy consumed
Pruning Neural Networks
Pruning Neural Networks
- Learning both Weights and Connections for Efficient Neural Networks, Han, Pool, Tran, Dally, NIPS 2015
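The core of magnitude pruning in the spirit of Han et al. fits in a few lines (the example weights and threshold are illustrative; the real pipeline also retrains the surviving weights afterwards):

```python
def prune(weights, threshold):
    # drop (zero out) every weight whose magnitude is below the threshold,
    # keeping only the strong connections
    return [0.0 if abs(w) < threshold else w for w in weights]

w = [0.01, -0.4, 0.003, 0.9, -0.05, 0.3]
pruned = prune(w, threshold=0.1)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned)     # [0.0, -0.4, 0.0, 0.9, 0.0, 0.3]
print(sparsity)   # 0.5 -- half the connections removed
```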
Pruning Changes the Weight Distribution
Pruning Happens in the Human Brain
Trained Quantization
Trained Quantization: Before
- Continuous weight distribution
Trained Quantization: After
- Discrete weight distribution
Trained Quantization: How Many Bits?
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Han, Mao, Dally, ICLR 2016
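Trained quantization à la Deep Compression clusters the weights and stores only a centroid index per weight. A minimal 1-D k-means sketch (my own toy values; the paper's pipeline also fine-tunes the shared centroids during retraining):

```python
def kmeans_1d(values, centroids, iters=10):
    for _ in range(iters):
        # assign each weight to its nearest centroid
        assign = [min(range(len(centroids)),
                      key=lambda j: abs(v - centroids[j])) for v in values]
        # move each centroid to the mean of its assigned weights
        for j in range(len(centroids)):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assign

weights = [-1.1, -0.9, 0.05, -0.02, 1.0, 1.2]
centroids, assign = kmeans_1d(weights, centroids=[-1.0, 0.0, 1.0])
quantized = [centroids[a] for a in assign]
print(quantized)   # every weight snapped to one of 3 shared values
```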
Quantization to Fixed Point Decimals (=Ints)
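Fixed-point quantization represents a real weight as an integer times a fixed scale 2^-f. A sketch with 8-bit integers (the choice of f=5 fractional bits is mine, for illustration): more fractional bits mean more precision but a smaller representable range.

```python
F = 5                      # fractional bits (illustrative choice)

def to_fixed(x):
    q = round(x * (1 << F))            # scale by 2^F and round to an integer
    return max(-128, min(127, q))      # saturate to the int8 range

def from_fixed(q):
    return q / (1 << F)                # back to a real value

for x in [0.7, -1.3, 3.0, 5.0]:
    q = to_fixed(x)
    print(x, "->", q, "->", from_fixed(q))   # 5.0 saturates at 127
```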
Hardware Basics: Number Representation
Mixed Precision Training
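A minimal illustration of the loss-scaling trick used in mixed-precision training, using Python's half-precision struct format as a stand-in for fp16 hardware (the scale factor 1024 is an arbitrary illustrative choice): tiny gradients underflow to zero in fp16, but survive if the loss is scaled up first and the update is unscaled in fp32.

```python
import struct

def fp16(x):
    # round-trip a float through IEEE half precision
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8                       # a tiny fp32 gradient
print(fp16(grad))                 # 0.0 -- underflows in fp16

scale = 1024.0                    # loss scaling
scaled = fp16(grad * scale)       # now representable in fp16
print(scaled / scale)             # unscale in fp32: gradient recovered
```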
DEEP LEARNING HARDWARE
The end of CPU scaling
CPUs for Training - SIMD to the rescue?
- 4 scalar instructions vs. 1 SIMD instruction: one SIMD instruction processes four values at once
CPU vs GPU
- "ALU": arithmetic logic unit (implements +, *, - etc. instructions)
- CPU: a lot of chip surface spent on cache memory and control
- GPU: almost all chip surface spent on ALUs (compute power)
- GPU cards have their own memory chips: smaller, but nearby and faster than system memory
Programming GPUs
- CUDA (NVIDIA only)
– Write C-like code that runs directly on the GPU
– Higher-level APIs: cuBLAS, cuFFT, cuDNN, etc.
- OpenCL
– Similar to CUDA, but runs on anything
– Usually slower :(
All major deep learning libraries (TensorFlow, PyTorch, MXNET, etc.) support training and model evaluation on GPUs.
CPU vs GPU: performance
CPU - GPU: communication
GPUs for Training
New in Volta: Tensor Core
Volta Chip Area
GPU evolution
- Fast memory that sits on top of the GPU chip: a big jump in ML speed
Pascal vs Volta
Tensor Processing Unit (TPU), 2015
TPU Architecture
GPU vs TPU
Google Cloud TPU (v2 2017)
- Cloud TPU delivers up to 180 teraflops to train and run machine learning
models. — Google Blog
Google TPU pods
- A “TPU pod” built with 64 second-generation TPUs delivers up to 11.5
petaflops of machine learning acceleration.
- “One of our new large-scale translation models used to take a full day to
train on 32 of the best commercially-available GPUs—now it trains to the same accuracy in an afternoon using just one eighth of a TPU pod.” — Google Blog
DEEP LEARNING PARALLEL TRAINING
Data Parallel: run multiple inputs in parallel
- Doesn’t affect latency for one input
- Requires P-fold larger batch size (i.e. limited scaling only)
- For training requires coordinated weight update
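A toy sketch of the coordinated weight update (the linear model and data are my own illustration; real systems average the gradients with an all-reduce across GPUs): split the batch over P workers, compute gradients per shard, average, then update.

```python
def grad_on_shard(w, shard):
    # gradient of L = mean((w*x - y)^2) over this worker's examples
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # y = 2x
P = 2
shards = [batch[i::P] for i in range(P)]    # split batch over P workers

w, lr = 0.0, 0.05
for _ in range(100):
    grads = [grad_on_shard(w, s) for s in shards]   # in parallel, ideally
    g = sum(grads) / P                              # coordinated update
    w -= lr * g
print(w)   # approaches 2.0, the true slope
```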
The Need to Exchange Weight-deltas
Fully connected layers
- Parallelize by partitioning the weight matrix
- Requires communicating the activations
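A sketch of that partitioning (matrix values are illustrative): each worker holds a horizontal slice of the weight matrix, receives the full input activation vector (hence the communication), and produces a slice of the output, which is then concatenated.

```python
def fc(x, W):
    # one worker's share: its rows of W produce a slice of the output
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

x = [1.0, 2.0]                      # input activations, broadcast to all
W = [[1.0, 0.0],                    # full 4 x 2 weight matrix
     [0.0, 1.0],
     [1.0, 1.0],
     [2.0, -1.0]]

# two workers, each owning a contiguous slice of the rows
parts = [fc(x, W[0:2]), fc(x, W[2:4])]
y = parts[0] + parts[1]             # concatenate the output activations
print(y)                            # same result as a single worker
assert y == fc(x, W)
```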
Convolution layers: easier to parallelize
- by output region (needs some communication around convolution borders)
Multi-GPU training
- Servers with up to 8 GPUs
- Direct GPU-GPU communication
– “NVLink” (2x150GB/s on Volta) (compare to 2x10Gb/s~=2GB/s Ethernet networks..)
Parallelism in TensorFlow
- Multi-GPU (1 machine) training with normal TensorFlow
- Distributed TensorFlow: results for 1-8 machines (8-64 GPUs)
Recap Parallelism
- Lots of parallelism in DNNs
– 16M independent multiplies in one FC layer
– Limited by overhead to exploit a fraction of this
- Hyper-parameter search parallelism (not discussed so far)
– Train multiple networks in parallel with different parameters
- Data parallel
– Run multiple training examples in parallel
– Limited by batch size
- Model parallel
– Split model over multiple processors
– By layer
– Conv layers by map region
– Fully connected layers by output activation
Summary: Deep Learning
..on Big Data, ..in the Cloud
- popular frameworks: TensorFlow, PyTorch, Caffe2, MXNET
- algorithmic optimizations ➔ making networks smaller
– quantization, pruning, mixed-precision
- hardware for deep learning
– CPUs (SIMD), GPUs, TPUs
- parallel training: does deep learning scale?
– Trivially distributed: hyper-parameter search (e.g. tensorflow-on-spark)
– Multi-GPU in one machine (P2P GPU communication - NVLink)
– Distributed TensorFlow?