Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Devices
Albert Gural¹, Boris Murmann¹
¹Stanford University
The 36th International Conference on Machine Learning, Long Beach, California, June 11, 2019
Negative-Memory Overhead
[Figure: network topology. 28 × 28 × 1 input → AvgPool 2x2 → Conv 3x3 → Conv 3x3 → Conv 3x3 → MaxPool 2x2 → Flatten (176) → Dense (10 outputs)]
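The diagram gives kernel and pool sizes but no padding or channel counts. As a worked check, the Python sketch below traces the spatial size through the layers under our own assumption of unpadded (valid), stride-1 convolutions and non-overlapping pooling windows, then asks what the 176-value Flatten output implies:

```python
def trace_shapes(input_hw, layers):
    """Follow the spatial size (height, width) through a list of layers.

    Assumption (not stated on the slide): convolutions are unpadded and
    stride 1; pooling windows do not overlap.
    """
    h, w = input_hw
    shapes = [(h, w)]
    for kind, size in layers:
        if kind == "conv":
            h, w = h - size + 1, w - size + 1
        elif kind in ("avgpool", "maxpool"):
            h, w = h // size, w // size
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((h, w))
    return shapes


# Spatial layers from the diagram (Flatten and Dense do not change h, w).
LAYERS = [("avgpool", 2), ("conv", 3), ("conv", 3), ("conv", 3), ("maxpool", 2)]

shapes = trace_shapes((28, 28), LAYERS)
print(shapes)  # [(28, 28), (14, 14), (12, 12), (10, 10), (8, 8), (4, 4)]

# How many channels would the last convolution need to give 176 values?
h, w = shapes[-1]
channels, remainder = divmod(176, h * w)
print(channels, remainder)  # 11 0
```

Under these assumptions the final feature map is 4 × 4, so 176 values would correspond to 11 channels in the last convolution. With zero ("same") padding the map would instead end at 7 × 7, and 176 is not a multiple of 49, so the valid-padding reading appears more consistent with the figure; this is an inference, not something the slide states.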
[Figure: in-place direct convolution for the two cases $f_{out} \le f_{in}$ and $f_{out} > f_{in}$. Legend: input pixel, stale pixel, kernel. Panels, with pixels numbered in processing order, are annotated "25 cost; 20 free", "30 cost; 32 free", and "55 cost; 60 free".]
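To make the "cost" and "free" bookkeeping concrete, here is a simplified Python model, not the authors' implementation. It assumes a single unpadded, stride-1 k × k convolution computed in plain raster order (the slide numbers pixels in a different order), charges f_out values for every output written, and credits f_in values for every input pixel once no remaining output reads it (a stale pixel). The function names and this accounting are illustrative assumptions.

```python
def last_use_step(i, j, h, w, k):
    """Raster-order index of the last output that reads input pixel (i, j).

    Output (r, c) of an unpadded, stride-1 k x k convolution reads input
    rows r..r+k-1 and columns c..c+k-1, so the last reader has the largest
    valid r and then the largest valid c.
    """
    out_w = w - k + 1
    return min(i, h - k) * out_w + min(j, w - k)


def overhead_profile(h, w, k, f_in, f_out):
    """Net memory (in values) above the input buffer after each output step.

    Writing output t costs f_out values; every input pixel whose last use
    was an earlier step is stale and frees f_in values.
    """
    n_out = (h - k + 1) * (w - k + 1)
    stale_after = [0] * n_out  # pixels that go stale once step t is done
    for i in range(h):
        for j in range(w):
            stale_after[last_use_step(i, j, h, w, k)] += 1

    profile, freed = [], 0
    for t in range(n_out):
        # Output t is stored while its own inputs are still live.
        profile.append((t + 1) * f_out - freed * f_in)
        freed += stale_after[t]
    final = n_out * f_out - freed * f_in  # every input pixel is freed by now
    return profile, final


profile, final = overhead_profile(4, 4, 3, f_in=1, f_out=1)
print(profile, max(profile), final)  # [1, 1, -1, -3] 1 -12
```

For a 4 × 4 input and a 3 × 3 kernel with f_in = f_out = 1, the net extra memory peaks at one value early on and ends at −12, below the size of the input. In this model the end state is negative whenever the output feature map holds fewer values than the input, which is always the case for f_out ≤ f_in because a valid convolution shrinks the map.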
[Figure: system overview. The serialized CNN and the input images are sent to the device over serial communication (the slide shows the serialized network as a block of hex bytes), and the device returns the output classes. Device memory is shown as Program and SRAM.]
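The slide does not specify the byte layout of the serialized CNN. The Python sketch below uses a hypothetical format of our own (a layer count, one 2-byte record per layer, a 16-bit weight count, then signed 8-bit weights) to illustrate how topology and parameters can share one compact buffer that is easy to send as hex over a serial link. The opcodes, field widths, and the `serialize`/`deserialize` names are assumptions, not the authors' format.

```python
import struct

# Hypothetical opcode table; the real on-device encoding is not shown.
OPCODES = {"avgpool": 1, "conv": 2, "maxpool": 3, "flatten": 4, "dense": 5}
NAMES = {code: name for name, code in OPCODES.items()}


def serialize(layers, weights):
    """Pack (name, size) layer records and int8 weights into one buffer."""
    buf = bytearray(struct.pack("<B", len(layers)))
    for name, size in layers:
        buf += struct.pack("<BB", OPCODES[name], size)
    buf += struct.pack("<H", len(weights))              # uint16 weight count
    buf += struct.pack(f"<{len(weights)}b", *weights)   # signed 8-bit weights
    return bytes(buf)


def deserialize(blob):
    """Inverse of serialize(): walk the buffer with a running offset."""
    (n_layers,) = struct.unpack_from("<B", blob, 0)
    offset, layers = 1, []
    for _ in range(n_layers):
        code, size = struct.unpack_from("<BB", blob, offset)
        layers.append((NAMES[code], size))
        offset += 2
    (n_weights,) = struct.unpack_from("<H", blob, offset)
    offset += 2
    weights = list(struct.unpack_from(f"<{n_weights}b", blob, offset))
    return layers, weights


# Sizes follow the diagram (kernel/pool size, or output width for Dense);
# the weights are toy values, not the trained network.
layers = [("avgpool", 2), ("conv", 3), ("conv", 3), ("conv", 3),
          ("maxpool", 2), ("flatten", 0), ("dense", 10)]
toy_weights = [-3, 0, 5, 127, -128]
blob = serialize(layers, toy_weights)
print(len(blob))   # 22
print(blob.hex())  # hex text suitable for a serial link
assert deserialize(blob) == (layers, toy_weights)
```

Keeping topology and weights in one flat buffer means the device can parse it in place with a single running offset instead of building separate data structures, which matters when the whole workspace must fit in a few kilobytes.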
SRAM layout:
NN workspace (1960B) = NN serialization (1525B: network topology, weights and biases) + NN activations (435B)
Stack (88B)
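The memory figures on the slide add up exactly. As a worked check (all sizes in bytes):

```latex
\begin{aligned}
\text{NN workspace} &= \underbrace{1525}_{\text{serialization}} + \underbrace{435}_{\text{activations}} = 1960,\\
\text{NN workspace} + \text{stack} &= 1960 + 88 = 2048\ \text{B} = 2\ \text{KB}.
\end{aligned}
```

Taken at face value, the workspace and the stack together occupy exactly 2 KB of SRAM, with no bytes left over.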