Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS - PowerPoint PPT Presentation

Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS SO IB CINECA Roma - SCAI Department M arco R orro S tefano T agliaventi L uca F erraro F rancesco S alvadore S ergio O rlandini I sabella B accarelli Rome, 7th-9th Nov 2018

Outline AE Best practices 1 Autoencoders Large-Scale Deep Learning 2 Best practices DAVIDE Jupyter on DAVIDE 3 Large-Scale Deep Learning GAN Objective Algorithm 4 DAVIDE References 5 Jupyter on DAVIDE 6 GAN Objective Algorithm 7 References

Autoencoders AE Best • A basic autoencoder is a neural network that is trained to attempt to copy practices its input to its output. Large-Scale Deep • as such it is usually considered an unsupervised learning algorithm Learning • Internally, it has a hidden layer h that describes a code ( internal DAVIDE representation ) used to represent the input. Jupyter on DAVIDE • The network may be viewed as consisting of two parts GAN • an encoder function h = f ( x ) Objective Algorithm • a decoder that produces a reconstruction r = g ( h ) . References

Autoencoders sketch AE Best practices Large-Scale Deep Learning DAVIDE Jupyter on DAVIDE GAN Objective Algorithm References

AE applications AE Best practices Large-Scale Deep Learning DAVIDE • If an autoencoder succeeds in simply learning to set g ( f ( x )) = x Jupyter on everywhere, it does not seem especially useful DAVIDE • However, (different types of) autoencoders have several applications: GAN Objective • compression Algorithm • dimensionality reduction (to be used to feed other networks) References • feature extraction (possibly before feeding to other networks) • unsupervised pre-training • noise reduction • anomaly detection

Undercomplete AE AE Best practices Large-Scale • If the h is constrained to have a smaller dimension than x – undercomplete Deep Learning autoencoder – the autoencoder can only produce an approximate DAVIDE reconstruction of x Jupyter on • ...often we are not interested in the output of the decoder but in the the DAVIDE code output of the encoding part h GAN Objective • In such case the code h Algorithm • is a compressed version of x References • can capture the most salient features of the training data • The learning process tries to minimize the loss function which penalizes the reconstruction for being dissimilar from the input L ( x , g ( f ( x ))) • Nonlinear autoencoders somehow learn a nonlinear generalization of PCA

Regularized AE AE Best practices Large-Scale Deep • If the dimensionality of the code is too high – equal to input or larger than Learning DAVIDE input ( overcomplete ) – it can completely store the input information (even Jupyter on using linear encoder and decoder) but may fail to learn anything useful DAVIDE • Rather than limiting the model capacity by keeping the code size small, GAN regularized autoencoders use a loss function that encourages the model Objective Algorithm to have other properties besides the ability to copy its input to its output References • sparsity of the representation • smallness of the derivative of the representation • robustness to noise • robustness to missing inputs • A regularized autoencoder can be nonlinear and overcomplete but still learn something useful about the data distribution

Sparse AE • A sparse autoencoder is simply an autoencoder whose training criterion AE Best involves a sparsity penalty Ω( h ) on the code layer h , in addition to the practices reconstruction error: Large-Scale Deep L ( x , g ( f ( x ))) + Ω( h ) Learning where typically we have h = f ( x ) , the encoder output. DAVIDE • Sparse autoencoders are typically used to learn features for another task Jupyter on DAVIDE such as classification. GAN Objective • The penalty Ω( h ) can be seen as a regularizer term added to a Algorithm feedforward network whose primary task is to copy the input to the output References (unsupervised learning objective) and possibly also perform some supervised task (with a supervised learning objective) that depends on these sparse features. • Unlike other regularizers such as weight decay, there is not a straightforward Bayesian interpretation to this regularizer. • One way to achieve actual zeros in h for sparse (and denoising) autoencoders is to use rectified linear units to produce the code layer. With a prior that actually pushes the representations to zero (like the absolute value penalty), one can thus indirectly control the average number of zeros in the representation.

Denoising AE • Rather than adding a penalty Ω to the cost function, we can obtain an AE Best autoencoder that learns something useful by changing the reconstruction practices error term of the cost function. Large-Scale Deep • A denoising autoencoder or DAE instead minimizes Learning DAVIDE L ( x , g ( f (˜ x ))) Jupyter on DAVIDE where ˜ x is a copy of x that has been corrupted by some form of noise. GAN Objective • Denoising autoencoders must therefore undo this corruption rather than Algorithm simply copying their input. References

Denoising AE sketch • Not only denoising autoencoders are used to remove noise: in order to AE force the hidden layer to discover more robust features and prevent it from Best practices simply learning the identity, we train the autoencoder to reconstruct the Large-Scale input from an artificially corrupted version of it. Deep Learning DAVIDE Jupyter on DAVIDE GAN Objective Algorithm References

Deep or shallow? AE Best • Autoencoders are just feedforward networks. The same loss functions and practices output unit types that can be used for traditional feedforward networks are Large-Scale Deep also used for autoencoders. Learning • Autoencoders are often trained with only a single layer encoder and a DAVIDE single layer decoder. However, this is not a requirement. In fact, using Jupyter on DAVIDE deep encoders and decoders offers many advantages. GAN • Depth can exponentially reduce the computational cost of representing Objective Algorithm some functions. References • Depth can also exponentially decrease the amount of training data needed to learn some functions. • Experimentally, deep autoencoders yield much better compression than corresponding shallow or linear autoencoders • A common strategy for training a deep autoencoder is to greedily pretrain the deep architecture by training a stack of shallow autoencoders, so we often encounter shallow autoencoders, even when the ultimate goal is to train a deep autoencoder.

Anomaly detection using AE - Part I AE Best practices Large-Scale Deep • Example dataset: 280000 instances of credit card use and for each Learning DAVIDE transaction the classification fraudulent/legit Jupyter on • The goal: for a new transaction predict if fraudulent or not DAVIDE • Some critical issues: GAN Objective • as a classification problem there is lack of large fraudulent training set Algorithm • marking every transaction as non-fraud would lead to > 99% accuracy (we References need a different metrics) • no single parameter analysis helps in finding out the fraud transactions • Idea: use AE training the model on the normal (non-fraud) transactions • Once the model is trained, in order to predict whether or not a new/unseen transaction is normal or fraudulent, we calculate the reconstruction error

Anomaly detection using AE - Part II AE Best practices • If the error is larger than a predefined threshold, we will mark it as a fraud Large-Scale Deep • depending on the threshold we will have different quality of results Learning • the so called confusion matrix helps summarizing the quality of results DAVIDE Jupyter on DAVIDE GAN Objective Algorithm References

Accuracy vs Precision vs Recall AE Best Accuracy = Number of correct predictions practices = Large-Scale Total number of predictions Deep Learning TP+TN = DAVIDE TP+TN+FP+FN Jupyter on DAVIDE TP GAN Precision = Objective TP + FP Algorithm References TP Recall = TP + FN • Accuracy alone doesn’t tell the full story when you’re working with a class-imbalanced data set • assuming trivial always predominant model gives high accuracy, but it is useless

Variational autoencoders (ideas) - 1 • Variational Autoencoders (VAEs) are powerful generative models now AE having applications as diverse as Best • generate a random, new output, that looks similar to the training data (e.g. practices generating fake human faces, producing purely synthetic music) Large-Scale • alter, or explore variations on data you already have, and not just in a random Deep Learning way either, but in a desired, specific direction DAVIDE • The fundamental problem with autoencoders, for generation, is that the Jupyter on latent space where their encoded vectors lie, may not be continuous, or DAVIDE allow easy interpolation. GAN Objective Algorithm References

Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS - PowerPoint PPT Presentation

Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS SO IB CINECA Roma - SCAI Department M arco R orro S tefano T agliaventi L uca F erraro F rancesco S alvadore S ergio O rlandini I sabella B accarelli Rome, 7th-9th Nov 2018

C-FX-02-V1.0 DSV 4.0 2 45 15 TensorFlow TensorBoard TensorFlow

Getting Started with TensorFlow Part I: TensorFlow Graphs and Sessions Nick Winovich Department

TensorFlow: a Framework for Scalable Machine Learning ACM Learning Center, 2016 You probably

TensorFlow: neural networks lab Paolo Dragone and Andrea Passerini paolo.dragone@unitn.it

A Trip Through the NGC TensorFlow Container GTC 2019 S9256 AGENDA A Trip Through the TensorFlow

Distributed TensorFlow Stony Brook University CSE545, Fall 2017 Goals Understand

TensorFlow w/XLA: TensorFlow, Compiled! Expressiveness with performance Pre-release

Some resources for ML/TensorFlow TensorFlow resources A good tutorial (about 2:40:00 long)

TensorFlow Probability Joshua V. Dillon Software Engineer Google Research What is TensorFlow

Machine learning on mobile and edge devices with TensorFlow Lite Developer advocate for

TensorFlow Flexible, Scalable, Portable Rajat Monga Engineering Director, TensorFlow Released

Tensorflow 2.x Review Session CS330: Deep Multi-task and Meta Learning 9/17/2019 Rafael

Getting Started with TensorFlow Part II: Monitoring Training and Validation Nick Winovich

TensorFlow Extended (TFX) An End-to-End ML Platform Clemens Mewald TensorFlow Extended (TFX): An

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Government 320: Public Opinion and Public Choice Spring 2007 Tuesday and Thursday 2:554:10 (MG

Crops and Climate Len Coop, OSU Integrated Plant Protection Center & Botany Plant Pathology

Enhancing Security of Linux-based Android Devices Aubrey-Derrick Schmidt, Hans-Gunther Schmidt,

Measurement of the Evolution of Reactor Antineutrino Flux and Spectrum at Daya Bay Phys. Rev.

We use Blue Waters to: Replicate 4-D evolution of mantle dynamics PI: Lijun Liu Members: Jiashun

Worldline colour fields and quantum field theory James P. Edwards ICNFP Crete Aug 2017 Based on

TDDD04: Test plans and software defect taxonomies Lena Buffoni lena.buffoni@liu.se 3 Lecture

DATA FORWARD ADVANCING BUSINESS AND MARKETING GOALS THROUGH ANALYTICS TOPICS FOR DISCUSSION

Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS - PowerPoint PPT Presentation

Introduction to Deep Learning and Tensorflow Day 3 MR ST LF FS SO IB CINECA Roma - SCAI Department M arco R orro S tefano T agliaventi L uca F erraro F rancesco S alvadore S ergio O rlandini I sabella B accarelli Rome, 7th-9th Nov 2018

C-FX-02-V1.0 DSV 4.0 2 45 15 TensorFlow TensorBoard TensorFlow

Getting Started with TensorFlow Part I: TensorFlow Graphs and Sessions Nick Winovich Department

TensorFlow: a Framework for Scalable Machine Learning ACM Learning Center, 2016 You probably

TensorFlow: neural networks lab Paolo Dragone and Andrea Passerini paolo.dragone@unitn.it

A Trip Through the NGC TensorFlow Container GTC 2019 S9256 AGENDA A Trip Through the TensorFlow

Distributed TensorFlow Stony Brook University CSE545, Fall 2017 Goals Understand

TensorFlow w/XLA: TensorFlow, Compiled! Expressiveness with performance Pre-release

Some resources for ML/TensorFlow TensorFlow resources A good tutorial (about 2:40:00 long)

TensorFlow Probability Joshua V. Dillon Software Engineer Google Research What is TensorFlow

Machine learning on mobile and edge devices with TensorFlow Lite Developer advocate for

TensorFlow Flexible, Scalable, Portable Rajat Monga Engineering Director, TensorFlow Released

Tensorflow 2.x Review Session CS330: Deep Multi-task and Meta Learning 9/17/2019 Rafael

Getting Started with TensorFlow Part II: Monitoring Training and Validation Nick Winovich

TensorFlow Extended (TFX) An End-to-End ML Platform Clemens Mewald TensorFlow Extended (TFX): An

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Government 320: Public Opinion and Public Choice Spring 2007 Tuesday and Thursday 2:554:10 (MG

Crops and Climate Len Coop, OSU Integrated Plant Protection Center &amp; Botany Plant Pathology

Enhancing Security of Linux-based Android Devices Aubrey-Derrick Schmidt, Hans-Gunther Schmidt,

Measurement of the Evolution of Reactor Antineutrino Flux and Spectrum at Daya Bay Phys. Rev.

We use Blue Waters to: Replicate 4-D evolution of mantle dynamics PI: Lijun Liu Members: Jiashun

Worldline colour fields and quantum field theory James P. Edwards ICNFP Crete Aug 2017 Based on

TDDD04: Test plans and software defect taxonomies Lena Buffoni lena.buffoni@liu.se 3 Lecture

DATA FORWARD ADVANCING BUSINESS AND MARKETING GOALS THROUGH ANALYTICS TOPICS FOR DISCUSSION

Crops and Climate Len Coop, OSU Integrated Plant Protection Center & Botany Plant Pathology