Dropout as a Structured Shrinkage Prior

  1. Dropout as a Structured Shrinkage Prior. Eric Nalisnick, José Miguel Hernández-Lobato, Padhraic Smyth. University of Cambridge; University of California, Irvine.

  2. Dropout & Multiplicative Noise (2012). [Figure: a standard neural network, and the same network after applying dropout.]

  3. Dropout & Multiplicative Noise (2012). Implementation as multiplicative noise: f_l(h_{n,l−1} Λ_l W_l), where h_{n,l−1} are the hidden units, W_l are the weights, and Λ_l is a diagonal matrix of random variables with λ_{i,i} ∼ p(λ). [Figure: standard neural network vs. after applying dropout.]

  4. Dropout & Multiplicative Noise (2012). Implementation as multiplicative noise: f_l(h_{n,l−1} Λ_l W_l), with λ_{i,i} ∼ p(λ). Dropout corresponds to p(λ) being Bernoulli; Gaussian, beta, and uniform noise have been shown to work as well.

  5. (Build of slide 4; same content.)
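To make the multiplicative-noise view concrete, here is a minimal NumPy sketch (my own illustration, not from the deck; the relu activation, layer sizes, and drop rate are assumptions) of a forward pass f(h Λ W), with dropout as the Bernoulli special case:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def noisy_layer(h, W, noise="bernoulli", drop_rate=0.5):
    """Forward pass f(h Λ W): Λ is a diagonal matrix with λ_{i,i} ~ p(λ)."""
    d = h.shape[-1]
    if noise == "bernoulli":          # classic dropout: λ ∈ {0, 1}
        lam = rng.binomial(1, 1.0 - drop_rate, size=d)
    elif noise == "gaussian":         # Gaussian multiplicative noise also works
        lam = rng.normal(1.0, 1.0, size=d)
    else:
        raise ValueError(noise)
    return relu((h * lam) @ W)        # h * lam is the same as h @ diag(lam)

h = rng.normal(size=(32, 100))        # a batch of hidden activations
W = rng.normal(size=(100, 50)) * 0.1  # the layer's weights
out = noisy_layer(h, W, noise="bernoulli")
```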

  6. Dropout as a Gaussian Scale Mixture

  7. Dropout as a Gaussian Scale Mixture. Gaussian scale mixtures: a random variable θ is a Gaussian scale mixture iff it can be expressed as the product of a Gaussian random variable and an independent scalar random variable [Beale & Mallows, 1959]: θ = λ·z, with z ∼ N(0, σ₀²) and λ ∼ p(λ).

  8. Dropout as a Gaussian Scale Mixture. The product form θ = λ·z, z ∼ N(0, σ₀²), λ ∼ p(λ), can be reparametrized into a hierarchical form: θ | λ ∼ N(0, λ²σ₀²), λ ∼ p(λ).
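A quick sanity check of that equivalence (again my own sketch, not from the slides; the choice p(λ) = Exponential(1) and σ₀ = 1 are arbitrary): sampling from the product form and from the hierarchical form should give the same marginal distribution over θ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0, n = 1.0, 1_000_000

# Product form: theta = lam * z, with z ~ N(0, sigma0^2) and lam ~ p(lam).
lam = rng.exponential(1.0, size=n)
theta_product = lam * rng.normal(0.0, sigma0, size=n)

# Hierarchical form: theta | lam ~ N(0, lam^2 * sigma0^2), lam ~ p(lam).
lam = rng.exponential(1.0, size=n)
theta_hier = rng.normal(0.0, lam * sigma0)

# The two parametrizations define the same marginal over theta,
# so their quantiles should approximately agree.
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(theta_product, qs))
print(np.quantile(theta_hier, qs))
```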

  9. Dropout as a Gaussian Scale Mixture. Let's assume a Gaussian prior on the NN weights: in the layer computation f_l(h_{n,l−1} Λ_l W_l), the noise is λ_{i,i} ∼ p(λ) and the weights are w_{i,j} ∼ N(0, σ₀²).

  10. Dropout as a Gaussian Scale Mixture. Each product of a noise variable and a weight, λ_{i,i}·w_{i,j}, matches the definition of a Gaussian scale mixture from slide 7.

  11. Dropout as a Gaussian Scale Mixture. Since λ_{i,i}·w_{i,j} is a Gaussian scale mixture, we can switch to the hierarchical parametrization of the prior it implies on the effective weights.
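The following sketch (my own, not from the slides; σ₀ = 1, a 0.5 drop rate, and N(1, 1) Gaussian noise are illustrative choices) samples the effective weights θ = λ·w to show the shrinkage priors this induces: Bernoulli noise gives a spike-and-slab prior, Gaussian noise a heavier-tailed scale mixture.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0, n = 1.0, 1_000_000

# Gaussian prior on a weight, scaled by multiplicative noise: theta = lam * w.
w = rng.normal(0.0, sigma0, size=n)

# Bernoulli noise (dropout): the implied prior on theta is spike-and-slab,
# a point mass at zero mixed with a Gaussian.
lam_bern = rng.binomial(1, 0.5, size=n)
theta_dropout = lam_bern * w

# Gaussian noise: the implied prior is a heavier-tailed Gaussian scale mixture.
lam_gauss = rng.normal(1.0, 1.0, size=n)
theta_gauss = lam_gauss * w

def excess_kurtosis(x):
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

print("P(theta = 0) under dropout:", np.mean(theta_dropout == 0.0))      # ~0.5
print("excess kurtosis, Gaussian noise:", excess_kurtosis(theta_gauss))  # > 0
```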

