Dropout as a Structured Shrinkage Prior

Eric Nalisnick, José Miguel Hernández-Lobato, Padhraic Smyth
University of Cambridge; University of California, Irvine
Dropout & Multiplicative Noise (2012)

[Figure: Standard Neural Network vs. After Applying Dropout; image from Srivastava et al., 2014]

Implementation as multiplicative noise: the hidden units h_{l-1} are multiplied by a diagonal matrix of random variables Λ_l before being passed through the weights W_l, so the layer computes

h_{l-1} Λ_l W_l,   with Λ_l = diag(λ_{1,1}, …, λ_{d_{l-1},d_{l-1}}),   λ_{i,i} ∼ p(λ).

Bernoulli noise corresponds to Dropout, but other noise distributions (Gaussian, Beta, uniform) have been shown to work as well.
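As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of one layer under multiplicative noise, with the noise distribution left swappable. The function name, the tanh activation, and the specific noise parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_layer(h, W, noise="bernoulli", p=0.5):
    """One layer with multiplicative noise on its inputs: activation(h @ Lambda @ W).

    h : (d_in,) hidden units from the previous layer
    W : (d_in, d_out) weight matrix
    """
    d_in = h.shape[0]
    if noise == "bernoulli":       # classic dropout, rescaled so E[lambda] = 1
        lam = rng.binomial(1, 1 - p, size=d_in) / (1 - p)
    elif noise == "gaussian":      # Gaussian multiplicative noise around 1 (one common variance choice)
        lam = rng.normal(1.0, np.sqrt(p / (1 - p)), size=d_in)
    elif noise == "uniform":       # uniform noise with mean 1
        lam = rng.uniform(0.0, 2.0, size=d_in)
    else:
        raise ValueError(f"unknown noise: {noise}")
    Lam = np.diag(lam)             # diagonal matrix of random variables
    return np.tanh(h @ Lam @ W)

# Example: 4 hidden units feeding 3 units
h = np.ones(4)
W = np.full((4, 3), 0.1)
print(noisy_layer(h, W, noise="bernoulli"))
```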
Gaussian Scale Mixtures

Definition of a Gaussian Scale Mixture: a random variable θ is a Gaussian scale mixture iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959]:

θ = λ z,   z ∼ N(0, σ_0²),   λ ∼ p(λ).

This expanded parametrization can be rewritten in an equivalent hierarchical form:

λ ∼ p(λ),   θ | λ ∼ N(0, λ² σ_0²).

Let's assume a Gaussian prior on the NN weights, w_{i,j} ∼ N(0, σ_0²), and treat the multiplicative noise λ_{i,i} ∼ p(λ) as the scale variable. Switching to the hierarchical parametrization, each noise–weight product is itself a Gaussian scale mixture:

w̃_{i,j} | λ_{i,i} ∼ N(0, λ_{i,i}² σ_0²),   λ_{i,i} ∼ p(λ).

The noise distribution becomes a scale prior, so we can translate noise distributions into the marginal prior they induce on the weights.
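To make the reparametrization concrete, here is a small Monte Carlo sketch (an illustrative addition, not from the slides) checking that the expanded and hierarchical parametrizations yield the same marginal on θ; the uniform noise distribution is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0, n = 1.0, 200_000

# Expanded parametrization: theta = lambda * z with z ~ N(0, sigma0^2), lambda ~ p(lambda)
lam = rng.uniform(0.0, 2.0, size=n)       # any scalar noise distribution; uniform chosen arbitrarily
z = rng.normal(0.0, sigma0, size=n)
theta_expanded = lam * z

# Hierarchical parametrization: lambda ~ p(lambda), theta | lambda ~ N(0, lambda^2 * sigma0^2)
lam2 = rng.uniform(0.0, 2.0, size=n)
theta_hier = rng.normal(0.0, lam2 * sigma0)

# Both parametrizations define the same marginal distribution on theta
print(np.quantile(theta_expanded, [0.05, 0.5, 0.95]))
print(np.quantile(theta_hier, [0.05, 0.5, 0.95]))
```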
Sampling noise for each hidden unit induces a particular structure. For a layer with d_{l-1} hidden units feeding d_l units, the noise matrix Λ_l is (d_{l-1} × d_{l-1}) and the weight matrix W_l is (d_{l-1} × d_l), so the effective weights are

W̃_l = Λ_l W_l,   i.e.   w̃_{i,j} = λ_{i,i} w_{i,j},

where i indexes the rows of W_l: every weight in row i shares the same scale λ_{i,i}.
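A quick sketch of this induced structure (assumed example code, not from the slides): per-unit noise scales whole rows of the weight matrix, whereas per-weight noise (DropConnect-style, discussed below) shares no scales.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 4, 3
W = rng.normal(0.0, 1.0, size=(d_in, d_out))   # draw from the Gaussian prior on the weights

# Per-hidden-unit noise: one scale per *row* of W (dropout-style)
lam_unit = rng.uniform(0.0, 2.0, size=d_in)
W_unit = np.diag(lam_unit) @ W                 # w~_{i,j} = lambda_{i,i} * w_{i,j}

# Per-weight noise: one scale per *entry* of W (DropConnect-style)
lam_weight = rng.uniform(0.0, 2.0, size=(d_in, d_out))
W_weight = lam_weight * W

# Ratios against the original weights: constant within each row under per-unit noise,
# unstructured under per-weight noise
print(W_unit / W)
print(W_weight / W)
```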
Dropout has been shown to have a Bayesian interpretation [Gal & Ghahramani, 2016], but open questions remain. Viewing multiplicative noise as a Gaussian scale mixture decouples dropout's Bayesian interpretation from variational inference, allowing for any inference strategy, and translates noise distributions into the marginal priors they induce, better revealing their modeling assumptions.
Automatic Relevance Determination & Automatic Depth Determination

This scale structure is the same as that of automatic relevance determination (ARD) [MacKay, 1994]. The intuition is that all outgoing weights from a unit grow or shrink together, in a form of group regularization. DropConnect, which samples noise for each weight, does not have this structure. Residual networks (ResNets) allow scale sharing to be extended to whole layers, since information can still propagate via the skip connection. We term this natural analog of ARD automatic depth determination (ADD). A similar scale mixture analysis reveals connections to stochastic depth regularization [Huang et al., 2016].
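As an illustration of layer-wise scale sharing (a sketch under assumed notation, not code from the paper), a single random scale can multiply a residual block's entire non-skip branch; the skip connection keeps information flowing even when that scale is near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_block(h, W, lam):
    """Residual block whose whole non-skip branch shares a single random scale lam."""
    return h + lam * np.tanh(h @ W)   # the skip path keeps information flowing even when lam = 0

d, n_blocks = 5, 3
h = rng.normal(size=d)
for _ in range(n_blocks):
    W = rng.normal(0.0, 0.3, size=(d, d))
    lam = rng.binomial(1, 0.8)        # Bernoulli noise on the entire layer (stochastic-depth-like)
    h = residual_block(h, W, lam)
print(h)
```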
UCI Regression Data Sets

The figure (right) shows heat maps of the hidden-to-hidden weight matrices. ARD induces row-structured shrinkage, ADD induces matrix-wide shrinkage, and ARD-ADD allows some rows to grow while preserving global shrinkage; it seems to balance having some row structure with strong global shrinkage.
References

Beale, E. M. L., and C. L. Mallows. "Scale Mixing of Symmetric Distributions with Zero Means." The Annals of Mathematical Statistics, 1959.
Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian Approximation." ICML 2016.
Huang, Gao, et al. "Deep Networks with Stochastic Depth." ECCV 2016.
MacKay, David J. C. "Bayesian Nonlinear Modeling for the Prediction Competition." ASHRAE Transactions, 1994.
Srivastava, Nitish, et al. "Dropout." JMLR 2014.