Introduction to Big Data and Machine Learning Neural Networks



SLIDE 1

Introduction to Big Data and Machine Learning Neural Networks. Slides primarily based on Ch. 5 of the PRML book by Bishop.

  • Dr. Mihail

November 5, 2019

SLIDE 2

Neural Networks

History

  • Origins in attempts to find mathematical representations of information processing in biological systems (ca. 1943, McCulloch and Pitts)
  • In essence: input goes through a sequence of transformations using a fixed number of basis functions
  • Many variations exist (more each day) that are domain-specific (e.g., convolutional neural networks, recurrent neural networks, etc.)

SLIDE 3

Neural Networks

Feedforward Neural Networks

We begin with a reminder of models for generalized linear regression and classification, based on linear combinations of fixed nonlinear basis functions φ_j(x):

y(x, w) = f( Σ_{j=1}^{M} w_j φ_j(x) )    (1)

where f is a nonlinear activation function in the case of classification, and the identity in the case of regression.

Neural networks extend this model by making the basis functions φ_j(x) depend on parameters, and by allowing these parameters to be adjusted, along with the coefficients {w_j}, during training.
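To make Equation (1) concrete, here is a minimal NumPy sketch (not from the slides) that evaluates y(x, w) for a scalar input with Gaussian basis functions; the choice of basis, the centers, and the width are arbitrary assumptions for illustration.

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    # phi_j(x) = exp(-(x - mu_j)^2 / (2 * width^2)), one value per center mu_j
    return np.exp(-(x - centers) ** 2 / (2.0 * width ** 2))

# M = 5 fixed basis functions on a scalar input (centers and width are arbitrary)
centers = np.linspace(-2.0, 2.0, 5)
w = np.random.randn(5)               # coefficients w_j (random here, fixed after training)

def y(x, w, f=lambda a: a):
    # Equation (1): y(x, w) = f( sum_j w_j * phi_j(x) ); f = identity for regression
    return f(np.dot(w, gaussian_basis(x, centers)))

print(y(0.3, w))
```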

SLIDE 4

Basic Neural Networks

Basics

A series of functional transformations. First, we construct M linear combinations of the input variables x_1, . . . , x_D in the form

a_j = Σ_{i=1}^{D} w^(1)_ji x_i + w^(1)_j0    (2)

where j = 1, . . . , M and the superscript (1) indicates that the corresponding parameters are in the first “layer” of the network. We shall refer to the parameters w^(1)_ji as weights and the parameters w^(1)_j0 as biases.
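A small NumPy sketch of Equation (2), computing all M first-layer activations at once; the sizes D and M and the random parameter values are placeholders, not anything specified in the slides.

```python
import numpy as np

D, M = 4, 3                      # input dimension and number of hidden units (placeholders)
x  = np.random.randn(D)          # input variables x_1, ..., x_D
W1 = np.random.randn(M, D)       # first-layer weights w^(1)_ji
b1 = np.random.randn(M)          # first-layer biases  w^(1)_j0

a = W1 @ x + b1                  # Equation (2): a_j = sum_i w^(1)_ji x_i + w^(1)_j0
print(a)                         # M activations, one per hidden unit
```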

SLIDE 5

Neural Networks

Basics

The parameters a_j are known as activations; each of them is transformed using a differentiable, nonlinear activation function h to give

z_j = h(a_j)    (3)

These quantities correspond to the outputs of the basis functions φ, which in the context of NNs are called hidden units. The nonlinear functions h are chosen based on domain-specific applications (e.g., ReLU, sigmoid, tanh). Their values are combined linearly to give the output unit activations

a_k = Σ_{j=1}^{M} w^(2)_kj z_j + w^(2)_k0    (4)

where k = 1, . . . , K and K is the total number of outputs.
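Continuing in the same spirit, a hedged NumPy sketch of Equations (3) and (4); tanh is used as the hidden activation h, and M, K, and the random parameters are placeholder choices.

```python
import numpy as np

M, K = 3, 2                      # number of hidden units and outputs (placeholders)
a  = np.random.randn(M)          # first-layer activations a_j from Equation (2)
W2 = np.random.randn(K, M)       # second-layer weights w^(2)_kj
b2 = np.random.randn(K)          # second-layer biases  w^(2)_k0

z = np.tanh(a)                   # Equation (3): z_j = h(a_j), with h = tanh here
a_out = W2 @ z + b2              # Equation (4): a_k = sum_j w^(2)_kj z_j + w^(2)_k0
print(a_out)
```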

SLIDE 6

Neural Networks

Output

Finally, the output unit activations are transformed using an appropriate activation function to give a set of network outputs y_k. For standard regression problems, the activation can be the identity, y_k = a_k. For binary classification problems, each output is transformed using a logistic sigmoid function

y_k = σ(a_k)    (5)

where

σ(a) = 1 / (1 + exp(−a))    (6)

For multiclass problems, softmax can be used.
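A short sketch of the three output transformations mentioned here: the identity, the logistic sigmoid of Equation (6), and softmax. The max-subtraction in softmax is a standard numerical-stability trick, not something stated in the slides.

```python
import numpy as np

def identity(a):                 # regression: y_k = a_k
    return a

def sigmoid(a):                  # Equation (6): sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):                  # multiclass alternative: normalized exponentials
    e = np.exp(a - np.max(a))    # subtract the max for numerical stability
    return e / e.sum()

a_out = np.array([2.0, -1.0, 0.5])
print(sigmoid(a_out), softmax(a_out))
```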

SLIDE 7

Neural Networks

Example

SLIDE 8

Neural Networks

Model

We can combine these various stages to give the overall network function:

y_k(x, w) = σ( Σ_{j=1}^{M} w^(2)_kj h( Σ_{i=1}^{D} w^(1)_ji x_i + w^(1)_j0 ) + w^(2)_k0 )    (7)

The NN model is simply a nonlinear function from a set of input variables {x_i} to a set of output variables {y_k}, controlled by a vector w of adjustable parameters.

The function shown in the figure in the previous slide can then be interpreted as a forward propagation of information through the network
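Assembling the pieces, a sketch of Equation (7) as a single forward-propagation function; all shapes and parameter values are placeholders, and tanh/sigmoid are example choices of h and the output activation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h=np.tanh, out=sigmoid):
    # Equation (7): y_k(x, w) = out( sum_j w^(2)_kj * h( sum_i w^(1)_ji x_i + w^(1)_j0 ) + w^(2)_k0 )
    z = h(W1 @ x + b1)           # hidden-unit outputs z_j
    return out(W2 @ z + b2)      # network outputs y_k

D, M, K = 4, 3, 2                # placeholder sizes
W1, b1 = np.random.randn(M, D), np.random.randn(M)
W2, b2 = np.random.randn(K, M), np.random.randn(K)
print(forward(np.random.randn(D), W1, b1, W2, b2))
```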

SLIDE 9

Neural Networks

Nomenclature

  • These models first came to be known as multilayer perceptrons (MLPs), due to the multiple uses of nonlinearities through the layers
  • “Deep” models simply refer to networks with many such layers of nonlinearities
  • These functions are differentiable w.r.t. the network parameters, a key fact for training

SLIDE 10

Neural Networks

Number of internal (hidden) nodes

  • If the activations are linear and the number of hidden nodes is greater than both the number of input and output nodes, there is an equivalent linear function
  • If the activations are linear and the number of hidden units is less than the number of input or output nodes, the network can be shown to be related to PCA dimensionality reduction

SLIDE 11

Training Neural Networks

Training

NNs are a general class of parametric, nonlinear functions from a vector x of input variables to a vector y of output variables (or t for targets, as used in previous lectures).

One way to think about learning is to think of “fitting” the model from the inputs x to the targets t by minimizing some error function:

E(w) = (1/2) Σ_{n=1}^{N} ||y(x_n, w) − t_n||^2    (8)

There are many ways to optimize the above function, but |w| in modern networks can be in the millions.
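A minimal sketch of the sum-of-squares error in Equation (8); the model here is a placeholder linear map standing in for y(x_n, w), and the data are random toy arrays.

```python
import numpy as np

def sum_of_squares_error(y_fn, X, T):
    # Equation (8): E(w) = 1/2 * sum_n || y(x_n, w) - t_n ||^2, for a model with fixed w
    return 0.5 * sum(np.sum((y_fn(x) - t) ** 2) for x, t in zip(X, T))

# Toy usage with a placeholder linear model standing in for y(x, w)
X = np.random.randn(10, 4)       # N = 10 inputs x_n
T = np.random.randn(10, 2)       # matching targets t_n
W = np.random.randn(2, 4)
print(sum_of_squares_error(lambda x: W @ x, X, T))
```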

SLIDE 12

Neural Networks

Optimizing E

The task: find w that minimizes E(w). Looking at this geometrically, the weights w define an error surface.

[Figure: error surface E(w) over the weight space (w1, w2), showing points wA, wB, wC and the local gradient ∇E]

SLIDE 13

Neural Networks

Gradually minimize

Take small steps in weight space from w to w + δw. When doing so, δE ≈ δw^T ∇E(w), where ∇E is a vector that points in the direction of the greatest rate of increase of E. The smallest E will occur where the gradient vanishes:

∇E(w) = 0    (9)

An analytic solution is not feasible (for a network with M hidden units, each point in weight space is a member of a family of M! 2^M equivalent points). Compromise: take small steps in the direction of −∇E(w).

SLIDE 14

Neural Networks

Gradient descent algorithm

  • Initialize with some random w^(0)
  • Iterate w^(τ+1) = w^(τ) + Δw^(τ) until convergence
  • MANY algorithms (solvers) exist to do this efficiently and find good solutions
  • Outside of lecture scope: analytic computation of ∂E_n/∂w_ji, done by repeated application of the chain rule
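A toy gradient-descent loop illustrating the update w^(τ+1) = w^(τ) + Δw^(τ) with Δw = −η ∇E. Since the analytic gradient (backpropagation) is outside the lecture scope, a finite-difference estimate of ∇E stands in for it; the learning rate, step count, and toy error surface are arbitrary.

```python
import numpy as np

def numerical_grad(E, w, eps=1e-6):
    # Finite-difference stand-in for the gradient nabla E(w)
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (E(w + d) - E(w - d)) / (2 * eps)
    return g

def gradient_descent(E, w0, lr=0.1, steps=100):
    w = w0.copy()
    for _ in range(steps):
        w -= lr * numerical_grad(E, w)   # w^(tau+1) = w^(tau) - lr * grad E(w^(tau))
    return w

# Toy error surface with its minimum at (1, -2)
E = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
print(gradient_descent(E, np.zeros(2)))
```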

SLIDE 15

Neural Networks

Convolutional Neural Networks

A special type of feedforward network architecture where the input is an image (a matrix, or a tensor for color images). Why? To achieve the kinds of invariance typical of imagery (e.g., translation, rotation, scale).

SLIDE 16

Neural Networks

Convolution in CNNs

Simply a dot product: an element-wise multiplication of a small matrix (the kernel) with a patch of the image, summed into a single output value.
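A sketch of that operation in NumPy: the kernel slides over the image, and each output value is the sum of the element-wise product of the kernel with one image patch (stride 1, no padding). The image and kernel values are arbitrary.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image, multiply element-wise with each patch and sum
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))   # "valid" output size, stride 1
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(6, 6)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)      # arbitrary 3x3 kernel
print(conv2d(image, kernel).shape)             # (4, 4) feature map
```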

SLIDE 17

Neural Networks

CNN example

SLIDE 18

Neural Networks

Another CNN example

SLIDE 19

Neural Networks

Another CNN example

SLIDE 20

Fully Convolutional Neural Networks

Fully Convolutional NN

SLIDE 21

Neural Networks

Implementations

Tens of popular ones; the most widely known:

1. TensorFlow
2. Keras
3. Caffe
4. Torch
5. DeepPy
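For orientation only, a hedged sketch of how a two-layer feedforward network like Equation (7) might be declared with the Keras API bundled in TensorFlow 2.x; the layer sizes, activations, and loss are arbitrary example choices, not taken from the slides.

```python
# Assumes TensorFlow 2.x is installed; the Keras API ships bundled with it
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="tanh", input_shape=(4,)),  # hidden layer, h = tanh
    tf.keras.layers.Dense(1, activation="sigmoid"),                 # output y_k = sigma(a_k)
])
model.compile(optimizer="sgd", loss="mse")   # gradient descent on a squared-error loss
model.summary()
```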
