Math for VGG

1 Intro

I am writing this to help you understand what the code is doing. It's still a work in progress, toward making it more reader-friendly. At this point, it's just a bunch of math formulas you do not want to follow.

2 Notational Conventions

• Since there are many variables that take multiple indices, they would be difficult to parse if we used subscripts for indices. We therefore put indices in parentheses, like x(i, j, k, l), instead of subscripts x_{i,j,k,l}. This is much easier to read.

3 Symbols

Constant Parameters and Indices

• B (Batch size): the number of samples in a mini-batch
  – 0 ≤ b < B (batch index): an index of a sample in a mini-batch
• C (Classes): the number of classes
  – 0 ≤ c < C (class index): an index of a class
• IC (Input Channels): the number of channels in an input image of a layer (e.g., three if an image has red, green and blue components)
  – 0 ≤ ic < IC (input channel index): an index of a channel in an input image
• OC (Output Channels): the number of channels in an output image of a layer
  – 0 ≤ oc < OC (output channel index): an index of a channel in an output image
• H (Height): the number of pixels in a single column of an image
  – 0 ≤ i < H (image row index)
• W (Width): the number of pixels in a single row of an image
  – 0 ≤ j < W (image column index)
• K (Kernel size): half the width of the kernel, so a kernel spans (2K + 1) × (2K + 1) pixels. Throughout VGG, K is always 1 and the kernel is 3 × 3 pixels.
  – −K ≤ i′ ≤ K (kernel row index)
  – −K ≤ j′ ≤ K (kernel column index)
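To make the notation concrete, here is a small sketch of how these symbols map onto array shapes for the data x, w, y introduced in the next section. NumPy and the specific sizes are my own assumptions for illustration, not taken from the actual code:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
B, C, IC, OC = 2, 10, 3, 4   # batch size, classes, input channels, output channels
H, W, K = 8, 8, 1            # height, width, half kernel size (kernel is 2K+1 = 3 wide)

x = np.zeros((B, IC, H, W))                    # input batch:  x(b, ic, i, j)
w = np.zeros((OC, IC, 2 * K + 1, 2 * K + 1))   # filters:      w(oc, ic, i', j')
y = np.zeros((B, OC, H, W))                    # output batch: y(b, oc, i, j)

# x(b, ic, i, j) in the text is x[b, ic, i, j] here.  Kernel indices i', j' run
# over -K..K in the text but 0..2K in the array, so w(oc, ic, i', j') is stored
# at w[oc, ic, i' + K, j' + K].
```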

Multidimensional Data

• x(b, ic, i, j): a batch of images input to a layer
• y(b, oc, i, j): a batch of images output from a layer
• w(oc, ic, i′, j′): filters (kernels) applied to each image

4 Convolution2D

Description: Convolution takes a batch of images (x) and a filter (w) and outputs another batch of images (y). An input batch x consists of B images, each of which consists of IC channels, each of which consists of (H × W) pixels. A filter is essentially a small image. It consists of OC output channels, each of which consists of IC input channels, each of which consists of (2K + 1) × (2K + 1) pixels. An output batch consists of B images, each of which consists of OC channels, each of which consists of (H × W) pixels. Each output pixel y(b, oc, i, j) is obtained by taking the inner product of the filter's oc-th output channel and the (2K + 1) × (2K + 1) patch of the input centered at (i, j), summed over all input channels; here x(b, ic, i, j) denotes the pixel value of the b-th image's ic-th channel at row i, column j.

Forward:

y(b, oc, i, j) = \sum_{0 \le ic < IC,\ -K \le i' \le K,\ -K \le j' \le K} w(oc, ic, i', j') \, x(b, ic, i + i', j + j')    (1)

The actual code must take care of array index underflow and overflow. In the expression above, we assume all elements whose indices underflow or overflow are zero.

Backward:

\frac{\partial L}{\partial x(b, ic, i + i', j + j')}
  = \sum_{b', oc, i, j} \frac{\partial L}{\partial y(b', oc, i, j)} \frac{\partial y(b', oc, i, j)}{\partial x(b, ic, i + i', j + j')}    (2)
  = \sum_{oc, i, j} \frac{\partial L}{\partial y(b, oc, i, j)} \frac{\partial y(b, oc, i, j)}{\partial x(b, ic, i + i', j + j')}    (3)
  = \sum_{oc, i, j} \frac{\partial L}{\partial y(b, oc, i, j)} \, w(oc, ic, i', j')    (4)

Equivalently, let i'' = i + i' and j'' = j + j'. Then

0 \le i = i'' - i' < H    (5)
0 \le j = j'' - j' < W    (6)

\frac{\partial L}{\partial x(b, ic, i'', j'')}
  = \sum_{oc,\ i'' - H < i' \le i'',\ j'' - W < j' \le j''} \frac{\partial L}{\partial y(b, oc, i'' - i', j'' - j')} \, w(oc, ic, i', j')    (7)

Replacing i'' with i and j'' with j for readability, we get

\frac{\partial L}{\partial x(b, ic, i, j)}
  = \sum_{oc,\ i - H < i' \le i,\ j - W < j' \le j} \frac{\partial L}{\partial y(b, oc, i - i', j - j')} \, w(oc, ic, i', j')    (8)
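To connect these formulas with code, here is a deliberately naive NumPy sketch of the forward pass (1) and the input gradient (8). The function names, the array layout, and the use of NumPy are my own assumptions, not the actual implementation; out-of-range indices are treated as zero, as stated above.

```python
import numpy as np

def conv2d_forward(x, w, K):
    """Equation (1): x is (B, IC, H, W), w is (OC, IC, 2K+1, 2K+1)."""
    B, IC, H, W = x.shape
    OC = w.shape[0]
    y = np.zeros((B, OC, H, W))
    for b in range(B):
        for oc in range(OC):
            for i in range(H):
                for j in range(W):
                    s = 0.0
                    for ic in range(IC):
                        for di in range(-K, K + 1):        # i' in the text
                            for dj in range(-K, K + 1):    # j' in the text
                                ii, jj = i + di, j + dj
                                if 0 <= ii < H and 0 <= jj < W:   # out-of-range x is zero
                                    s += w[oc, ic, di + K, dj + K] * x[b, ic, ii, jj]
                    y[b, oc, i, j] = s
    return y

def conv2d_backward_x(gy, w, K):
    """Equation (8): gy = dL/dy of shape (B, OC, H, W); returns dL/dx."""
    B, OC, H, W = gy.shape
    IC = w.shape[1]
    gx = np.zeros((B, IC, H, W))
    for b in range(B):
        for ic in range(IC):
            for i in range(H):
                for j in range(W):
                    s = 0.0
                    for oc in range(OC):
                        for di in range(-K, K + 1):
                            for dj in range(-K, K + 1):
                                ii, jj = i - di, j - dj    # y(b, oc, i - i', j - j')
                                if 0 <= ii < H and 0 <= jj < W:
                                    s += gy[b, oc, ii, jj] * w[oc, ic, di + K, dj + K]
                    gx[b, ic, i, j] = s
    return gx
```

The weight gradient (11) below has exactly the same loop structure, with the innermost product taken between dL/dy and x instead of dL/dy and w.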

\frac{\partial L}{\partial w(oc, ic, i', j')}
  = \sum_{b, oc', i, j} \frac{\partial L}{\partial y(b, oc', i, j)} \frac{\partial y(b, oc', i, j)}{\partial w(oc, ic, i', j')}    (9)
  = \sum_{b, i, j} \frac{\partial L}{\partial y(b, oc, i, j)} \frac{\partial y(b, oc, i, j)}{\partial w(oc, ic, i', j')}    (10)
  = \sum_{b, i, j} \frac{\partial L}{\partial y(b, oc, i, j)} \, x(b, ic, i + i', j + j')    (11)

5 Linear4D

Forward:

y(b, c, 0, 0) = \sum_{ic} x(b, ic, 0, 0) \, w(ic, c)    (12)

Backward:

\frac{\partial L}{\partial x(b, ic, 0, 0)}
  = \sum_{b', c} \frac{\partial L}{\partial y(b', c, 0, 0)} \frac{\partial y(b', c, 0, 0)}{\partial x(b, ic, 0, 0)}    (13)
  = \sum_{c} \frac{\partial L}{\partial y(b, c, 0, 0)} \, w(ic, c)    (14)

\frac{\partial L}{\partial w(ic, c)}
  = \sum_{b, c'} \frac{\partial L}{\partial y(b, c', 0, 0)} \frac{\partial y(b, c', 0, 0)}{\partial w(ic, c)}    (15)
  = \sum_{b} \frac{\partial L}{\partial y(b, c, 0, 0)} \, x(b, ic, 0, 0)    (16)

6 Dropout4

Forward:

y(b, c, i, j) = R(b, c, i, j) \, x(b, c, i, j)    (17)

where R(b, c, i, j) is a random array whose elements are 0 with probability p and 1/(1 - p) with probability 1 - p.

Backward:

\frac{\partial L}{\partial x(b, c, i, j)} = \frac{\partial L}{\partial y(b, c, i, j)} \, R(b, c, i, j)    (18)

7 BatchNormalization4

Forward:

\mu(ic) = \frac{1}{BHW} \sum_{b, i, j} x(b, ic, i, j),    (19)

\sigma^2(ic) = \frac{1}{BHW} \sum_{b, i, j} (x(b, ic, i, j) - \mu(ic))^2,    (20)

\hat{x}(b, ic, i, j) = \frac{x(b, ic, i, j) - \mu(ic)}{\sqrt{\sigma^2(ic) + \epsilon}},    (21)

y(b, ic, i, j) = \gamma(ic) \, \hat{x}(b, ic, i, j) + \beta(ic).    (22)
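Before continuing with BatchNormalization4, here is a minimal NumPy sketch of the Linear4D and Dropout4 layers, equations (12)-(18) above. The names and array layout are my own assumptions; the trailing 0, 0 indices are kept as singleton array dimensions so everything stays four-dimensional.

```python
import numpy as np

def linear4d_forward(x, w):
    """Equation (12): x is (B, IC, 1, 1), w is (IC, C)."""
    B, IC, _, _ = x.shape
    C = w.shape[1]
    return (x[:, :, 0, 0] @ w).reshape(B, C, 1, 1)

def linear4d_backward(gy, x, w):
    """Equations (14) and (16), given gy = dL/dy of shape (B, C, 1, 1)."""
    gx = (gy[:, :, 0, 0] @ w.T)[:, :, None, None]   # dL/dx(b, ic, 0, 0)
    gw = x[:, :, 0, 0].T @ gy[:, :, 0, 0]           # dL/dw(ic, c)
    return gx, gw

def dropout4_forward(x, p, rng=None):
    """Equation (17): R is 0 with probability p and 1/(1-p) with probability 1-p."""
    rng = np.random.default_rng() if rng is None else rng
    R = (rng.random(x.shape) >= p) / (1.0 - p)
    return R * x, R      # R is reused by the backward pass

def dropout4_backward(gy, R):
    """Equation (18)."""
    return gy * R
```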

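A corresponding sketch of the BatchNormalization4 forward pass (19)-(22). The per-channel mean and variance are reductions over the b, i, j axes, and the intermediate values µ, σ², x̂ reappear in the backward derivation below. Again, the naming and the use of NumPy are assumptions on my part.

```python
import numpy as np

def batchnorm4_forward(x, gamma, beta, eps=1e-5):
    """Equations (19)-(22): x is (B, IC, H, W); gamma, beta are (IC,)."""
    mu = x.mean(axis=(0, 2, 3))                                      # (19)
    var = ((x - mu[None, :, None, None]) ** 2).mean(axis=(0, 2, 3))  # (20)
    xhat = (x - mu[None, :, None, None]) / np.sqrt(var + eps)[None, :, None, None]  # (21)
    y = gamma[None, :, None, None] * xhat + beta[None, :, None, None]               # (22)
    return y, xhat, mu, var
```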
Backward:

\frac{\partial L}{\partial \gamma(ic)}
  = \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\partial y(b, ic, i, j)}{\partial \gamma(ic)}    (23)
  = \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)} \frac{x(b, ic, i, j) - \mu(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (24)

\frac{\partial L}{\partial \beta(ic)}
  = \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)}    (25)

\frac{\partial L}{\partial \hat{x}(b, ic, i, j)}
  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\partial y(b, ic, i, j)}{\partial \hat{x}(b, ic, i, j)}    (26)
  = \frac{\partial L}{\partial y(b, ic, i, j)} \, \gamma(ic)    (27)

\frac{\partial L}{\partial \sigma^2(ic)}
  = \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{\partial \hat{x}(b, ic, i, j)}{\partial \sigma^2(ic)}    (28)
  = \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \cdot \frac{-1}{2} \cdot \frac{x(b, ic, i, j) - \mu(ic)}{(\sigma^2(ic) + \epsilon)^{3/2}}    (29)

\frac{\partial L}{\partial \mu(ic)}
  = \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{\partial \hat{x}(b, ic, i, j)}{\partial \mu(ic)} + \frac{\partial L}{\partial \sigma^2(ic)} \frac{\partial \sigma^2(ic)}{\partial \mu(ic)}    (30)
  = - \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{1}{\sqrt{\sigma^2(ic) + \epsilon}} + \frac{\partial L}{\partial \sigma^2(ic)} \cdot \frac{2}{BHW} \sum_{b, i, j} (\mu(ic) - x(b, ic, i, j))    (31)
  = - \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{1}{\sqrt{\sigma^2(ic) + \epsilon}}    (32)

(the second term of (31) vanishes because \sum_{b, i, j} (\mu(ic) - x(b, ic, i, j)) = 0 by the definition of \mu(ic))

\frac{\partial L}{\partial x(b, ic, i, j)}
  = \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{\partial \hat{x}(b, ic, i, j)}{\partial x(b, ic, i, j)} + \frac{\partial L}{\partial \sigma^2(ic)} \frac{\partial \sigma^2(ic)}{\partial x(b, ic, i, j)} + \frac{\partial L}{\partial \mu(ic)} \frac{\partial \mu(ic)}{\partial x(b, ic, i, j)}    (33)
  = \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{1}{\sqrt{\sigma^2(ic) + \epsilon}} + \frac{\partial L}{\partial \sigma^2(ic)} \cdot \frac{2}{BHW} (x(b, ic, i, j) - \mu(ic)) + \frac{\partial L}{\partial \mu(ic)} \cdot \frac{1}{BHW}    (34)
  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (35)
    + \left( \frac{-1}{2} \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{x(b, ic, i, j) - \mu(ic)}{(\sigma^2(ic) + \epsilon)^{3/2}} \right) \frac{2}{BHW} (x(b, ic, i, j) - \mu(ic))    (36)
    + \left( - \sum_{b, i, j} \frac{\partial L}{\partial \hat{x}(b, ic, i, j)} \frac{1}{\sqrt{\sigma^2(ic) + \epsilon}} \right) \frac{1}{BHW}    (37)
  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (38)
    - \frac{\gamma(ic)}{BHW} \frac{x(b, ic, i, j) - \mu(ic)}{(\sigma^2(ic) + \epsilon)^{3/2}} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)} (x(b, ic, i, j) - \mu(ic))    (39)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)}    (40)
  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (41)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \cdot \frac{x(b, ic, i, j) - \mu(ic)}{\sigma^2(ic) + \epsilon} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)} (x(b, ic, i, j) - \mu(ic))    (42)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)}    (43)
  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (44)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \cdot \frac{x(b, ic, i, j) - \mu(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)} \frac{x(b, ic, i, j) - \mu(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (45)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \sum_{b, i, j} \frac{\partial L}{\partial y(b, ic, i, j)}    (46)

Recognizing \hat{x} from (21) and the two sums as \frac{\partial L}{\partial \gamma(ic)} and \frac{\partial L}{\partial \beta(ic)} from (24) and (25),

  = \frac{\partial L}{\partial y(b, ic, i, j)} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}}    (47)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \frac{\partial L}{\partial \gamma(ic)} \hat{x}(b, ic, i, j)    (48)
    - \frac{1}{BHW} \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \frac{\partial L}{\partial \beta(ic)}    (49)
  = \frac{\gamma(ic)}{\sqrt{\sigma^2(ic) + \epsilon}} \left( \frac{\partial L}{\partial y(b, ic, i, j)} - \frac{1}{BHW} \left( \frac{\partial L}{\partial \gamma(ic)} \hat{x}(b, ic, i, j) + \frac{\partial L}{\partial \beta(ic)} \right) \right)    (50)

8 Relu4

Forward:

y(b, c, i, j) = \max(0, x(b, c, i, j))    (51)
             = \begin{cases} x(b, c, i, j) & (x(b, c, i, j) \ge 0) \\ 0 & (x(b, c, i, j) < 0) \end{cases}    (52)

Backward:

\frac{\partial L}{\partial x(b, c, i, j)} = \begin{cases} \frac{\partial L}{\partial y(b, c, i, j)} & (x(b, c, i, j) \ge 0) \\ 0 & \text{otherwise} \end{cases}    (53)

9 MaxPooling2d

Forward:

y(b, c, i, j) = \max_{Si \le i' < S(i+1),\ Sj \le j' < S(j+1)} x(b, c, i', j')    (54)

where S is the pooling window size and stride (S = 2 throughout VGG).

Backward:

\frac{\partial L}{\partial x(b, c, i, j)} = \sum_{b', c', i', j'} \frac{\partial L}{\partial y(b', c', i', j')} \frac{\partial y(b', c', i', j')}{\partial x(b, c, i, j)}    (55)
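A minimal NumPy sketch of Relu4 and MaxPooling2d, equations (51)-(55). For max pooling, ∂y/∂x is 1 only at the position of the maximum of each window, so the sum in (55) reduces to routing each output gradient back to that position. The names and the assumption that H and W are multiples of S are mine.

```python
import numpy as np

def relu4_forward(x):
    """Equation (51)."""
    return np.maximum(0.0, x)

def relu4_backward(gy, x):
    """Equation (53): pass the gradient through where x >= 0."""
    return gy * (x >= 0)

def maxpool2d_forward(x, S):
    """Equation (54): non-overlapping S x S windows; assumes H and W are multiples of S."""
    B, C, H, W = x.shape
    # Reshape so each window gets its own pair of axes, then take the max.
    xr = x.reshape(B, C, H // S, S, W // S, S)
    return xr.max(axis=(3, 5))

def maxpool2d_backward(gy, x, S):
    """Equation (55): dy/dx is 1 only at the maximum of each window (first maximum on ties)."""
    B, C, H, W = x.shape
    gx = np.zeros_like(x)
    for b in range(B):
        for c in range(C):
            for i in range(H // S):
                for j in range(W // S):
                    window = x[b, c, S*i:S*(i+1), S*j:S*(j+1)]
                    di, dj = np.unravel_index(np.argmax(window), window.shape)
                    gx[b, c, S*i + di, S*j + dj] += gy[b, c, i, j]
    return gx
```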

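Finally, the BatchNormalization4 backward pass boils down to equations (23)-(25) and the closed form (50): the intermediate gradients with respect to σ²(ic) and µ(ic) never need to be materialized. A minimal NumPy sketch, pairing with the forward sketch given after equation (22) (the naming is mine):

```python
import numpy as np

def batchnorm4_backward(gy, xhat, var, gamma, eps=1e-5):
    """Equations (23)-(25) and (50); gy = dL/dy, xhat and var come from the forward pass.

    eps must be the same value that was used in the forward pass.
    """
    B, IC, H, W = gy.shape
    BHW = B * H * W
    ggamma = (gy * xhat).sum(axis=(0, 2, 3))   # (23)-(24): dL/dgamma(ic)
    gbeta = gy.sum(axis=(0, 2, 3))             # (25):      dL/dbeta(ic)
    # (50): dL/dx = gamma / sqrt(var + eps) * (gy - (dL/dgamma * xhat + dL/dbeta) / BHW)
    coeff = (gamma / np.sqrt(var + eps))[None, :, None, None]
    gx = coeff * (gy - (ggamma[None, :, None, None] * xhat
                        + gbeta[None, :, None, None]) / BHW)
    return gx, ggamma, gbeta
```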