Getting Started with TensorFlow on GPUs
Magnus Hyttsten
@MagnusHyttsten
An Awkward Social Experiment
(that I'm afraid you will be part of...)
[Diagram: Input Data → Model (Your Brain) → Output, trained on Examples (Train & Test Data)]
Input: "GTC"
<Awkward Silence>
[Diagram extended: Labels (Correct Answers) feed a Loss function, which drives an Optimizer that updates the Model]
Input: "GTC" → Output: "Rocks"
Input: "GTC" → Output: "Rocks" "Rocks"
"Classical" Programming Machine Learning Input Data + Code Input Data + Output Data Output Data Code
Scalable
Tested at Google-scale. Deploy everywhere
Easy
Simplified APIs. Focused on Keras and eager execution
Powerful
Flexibility and performance. Power to do cutting edge research and scale to > 1 exaflops
TensorFlow 2.0 Alpha is out
tf.data (Dataset) tf.feature_column (Transfer Learning) High-level APIs Perform Distributed Training (talk @1pm) E.g. V100
Premade Estimators
DNNClassifier, DNNRegressor
LinearClassifier, LinearRegressor
DNNLinearCombinedClassifier, DNNLinearCombinedRegressor
BaselineClassifier, BaselineRegressor
BoostedTreesClassifier, BoostedTreesRegressor
input_fn (Datasets, tf.data) — called by the Estimator to get input data
Built to Distribute and Scale
estimator = ...  # a premade Estimator; train locally
estimator.train(input_fn=..., ...)
estimator.evaluate(input_fn=..., ...)
estimator.predict(input_fn=..., ...)
Premade Estimators
Datasets Premade Estimators
LinearRegressor(...), LinearClassifier(...)
DNNRegressor(...), DNNClassifier(...)
DNNLinearCombinedRegressor(...), DNNLinearCombinedClassifier(...)
BaselineRegressor(...), BaselineClassifier(...)
BoostedTreesRegressor(...), BoostedTreesClassifier(...)
Datasets
wide_columns = [
    tf.feature_column.bucketized_column(
        tf.feature_column.numeric_column('age'),
        boundaries=[18, 27, 40, 65])]
deep_columns = [
    tf.feature_column.numeric_column('visits'),
    tf.feature_column.numeric_column('clicks')]
tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 75, 50, 25])
Premade Estimator - Wide & Deep
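What the bucketized 'age' column does can be sketched in plain Python. This is an illustration of the bucketing logic only, not the TensorFlow implementation; `bucketize` is a hypothetical helper name.

```python
import bisect

boundaries = [18, 27, 40, 65]

def bucketize(age):
    # Buckets are (-inf, 18), [18, 27), [27, 40), [40, 65), [65, +inf)
    idx = bisect.bisect_right(boundaries, age)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[idx] = 1
    return one_hot

print(bucketize(30))  # age 30 falls in [27, 40) -> [0, 0, 1, 0, 0]
```

The wide (linear) part of the model then learns one weight per bucket instead of a single weight for the raw age value.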
tf.data (Dataset) tf.feature_column (Transfer Learning) Perform Distributed Training E.g. V100
tf.keras.layers tf.keras
Custom Models
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(dataset, epochs=5)
model.evaluate(dataset)
model.predict(dataset)
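To unpack the loss used above: 'sparse_categorical_crossentropy' is the negative log of the softmax probability the model assigns to the true class, where the label is a plain integer index rather than a one-hot vector. A minimal plain-Python sketch (the logits values are made up for illustration):

```python
import math

# Made-up logits for a 3-class example, as the last Dense layer would produce
logits = [2.0, 1.0, 0.1]

# Softmax: exponentiate and normalize so the outputs form a probability distribution
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

true_class = 0                       # a "sparse" label is just the class index
loss = -math.log(probs[true_class])  # cross-entropy for this one example
print(loss)
```

The loss shrinks toward 0 as the probability of the true class approaches 1, which is exactly what the optimizer pushes the weights toward.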
Datasets
TensorFlow Datasets
○ "nsynth"
○ "celeb_a" ○ "cifar10" ○ "coco2014" ○ "diabetic_retinopathy_detection" ○ "imagenet2012" ○ "mnist" ○ "open_images_v4"
○ "titanic"
○ "imdb_reviews" ○ "lm1b" ○ "squad" import tensorflow_datasets as tfds train_ds = tfds.load("imdb_reviews", split="train", as_supervised=True)
○ "wmt_translate_ende" ○ "wmt_translate_enfr"
○ "bair_robot_pushing_small" ○ "moving_mnist" ○ "starcrafu_video"
a. TensorFlow Datasets is great
b. tf.feature_columns are cool too
TensorFlow Summary
And why is it so good @ Machine Learning???
Disclaimer
Strengths of V100
Deep Learning Workloads (Tensor Cores, mixed-precision execution, etc)
Tesla SXM V100
My Questions Around the GPU
What are we going to do with 5376 FP32 cores?
"Execute things in parallel"!
The Unsatisfactory Answer
Yes, but how exactly can we do that for ML workloads?
"Hey, that's your job - that's why we're here listening"!
Alright, let me try to talk about that then
All knobs are W values that we need to tune, so that given a certain input, they generate the correct output.
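That tuning loop can be sketched in plain Python with a single knob: one weight w, fit by gradient descent on made-up data (all names and values here are illustrative, not from the talk):

```python
# Data generated by the unknown rule y = 2x; the model must discover the rule
inputs  = [1.0, 2.0, 3.0, 4.0]
outputs = [2.0, 4.0, 6.0, 8.0]

w = 0.0      # the single "knob" (weight) to tune
lr = 0.01    # learning rate

for _ in range(1000):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, outputs)) / len(inputs)
    w -= lr * grad   # optimizer step: nudge the knob downhill

print(round(w, 2))  # -> 2.0: the knob now maps each input to the correct output
```

A real network does exactly this, just with millions of knobs updated simultaneously, which is where the matrix math below comes from.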
"Matrix Multiplication is EATING (the computing resources of) THE WORLD"
h_i,j = [X0, X1, X2, ...] · [W0, W1, W2, ...]
h_i,j = X0*W0 + X1*W1 + X2*W2 + ...
X = [1.0, 2.0, ..., 256.0]  # Let's say we have 256 input values
W = [0.1, 0.1, ..., 0.1]    # Then we need to have 256 weight values
h0,0 = X * W  # 1*0.1 + 2*0.1 + ... + 256*0.1 == 3289.6
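The arithmetic is easy to check in plain Python, using explicit lists in place of the slide's shorthand:

```python
X = [float(i) for i in range(1, 257)]   # 1.0, 2.0, ..., 256.0
W = [0.1] * 256                          # 256 weights, all 0.1

h00 = sum(x * w for x, w in zip(X, W))   # the dot product X . W
print(round(h00, 1))  # 3289.6, i.e. 0.1 * (256 * 257 / 2)
```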
Matmul
Single-threaded Execution

One multiply-accumulate at a time across X = [1, 2, ..., 256] and W = [0.1, 0.1, ..., 0.1]:

1*0.1 = 0.1
0.1 + 2*0.1 = 0.3
. . .
3238.5 + 255*0.1 = 3264.0
3264.0 + 256*0.1 = 3289.6

256 sequential steps → 256 * t
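The single-threaded walk above can be written as an explicit loop; each iteration depends on the previous partial sum, so nothing can run in parallel:

```python
X = [float(i) for i in range(1, 257)]
W = [0.1] * 256

acc = 0.0
steps = 0
for x, w in zip(X, W):
    acc += x * w   # one multiply-accumulate per time step t
    steps += 1

print(steps, round(acc, 1))  # 256 steps, result 3289.6
```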
GPU Execution

GPU - #1 Multiplication Step
Tesla SXM V100: 5376 cores (FP32)
All 256 products are independent, so each one runs on its own core:

X1_mul_vector
1*0.1 = 0.1
2*0.1 = 0.2
. . .
256*0.1 = 25.6

Multi-threaded Execution (256 Threads) → 1 * t
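The multiplication step can be mimicked in plain Python; the list comprehension below stands in for 256 independent GPU threads, one product each:

```python
X = [float(i) for i in range(1, 257)]
W = [0.1] * 256

# Every product is independent of the others, so on a GPU all 256
# could execute at the same time, one per thread.
x1_mul_vector = [x * w for x, w in zip(X, W)]

print(round(x1_mul_vector[0], 1), round(x1_mul_vector[-1], 1))  # 0.1 25.6
```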
GPU - #1: But what about the Summation?
The 256 products in X1_mul_vector still have to be added up into a single value:

+ + + ... = h0,0

GPU - #2 Summation Step
Add the products pairwise in parallel: 128 additions at once, then 64, then 32, ...
Multi-threaded Execution (256 Threads)
log2(128) = 7 → 7 * t
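The summation step is a pairwise tree reduction: each level halves the vector, and all additions within a level are independent, so each level costs one t. A plain-Python sketch (note: fully reducing 256 values takes log2(256) = 8 levels; the slide's 7 is log2(128), the number of additions performed in the first level):

```python
vals = [i * 0.1 for i in range(1, 257)]  # the 256 products from step #1

levels = 0
while len(vals) > 1:
    # Each pair is summed by its own "thread"; one whole level costs one t
    vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    levels += 1

print(levels, round(vals[0], 1))  # 8 levels, result 3289.6
```

256 is a power of two, so no padding is needed here; odd-length levels would need a zero appended.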
Comparing - Order of Magnitude (sequences)
Single-Threaded Execution: 256 * t
GPU Multi-Threaded Execution: 1 * t + 7 * t = 8 * t
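Plugging in the slide's counts gives the order-of-magnitude difference directly (t is an arbitrary unit: the time for one core operation):

```python
t = 1.0
single_threaded = 256 * t   # 256 sequential multiply-accumulates
gpu = (1 + 7) * t           # 1 multiplication level + 7 summation levels

print(single_threaded / gpu)  # 32.0x fewer sequential steps
```

And this is for a single 256-wide dot product; real layers do this for every one of their many h_i,j outputs, so the gap widens further.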
Many knobs to tune, but the type of calculation we perform is very well suited for GPUs.
Summary
multi-core CPU
multi-core CPU + GPU
(just use a GPU build)
Use Distribution Strategy API
There's a talk for that (@ 1pm)
tensorflow.org/learn TensorFlow Courses
coursera.org/learn/introduction-tensorflow udacity.com/tensorflow
Distribution Strategies
tensorflow.org/alpha/guide/distribute_strategy
@MagnusHyttsten