slide-1
SLIDE 1

Ligeng Zhu May 4th

Neural Architecture

1

slide-2
SLIDE 2

The Blooming of CNNs

2

slide-3
SLIDE 3

Bypass Connection

3

$x_{\ell+1} = F_\ell(x_\ell) + x_\ell = F_\ell(x_\ell) + F_{\ell-1}(x_{\ell-1}) + x_{\ell-1} = F_\ell(x_\ell) + F_{\ell-1}(x_{\ell-1}) + \dots + F_1(x_1) + x_1 = y_\ell + y_{\ell-1} + \dots + y_1$, writing $y_i$ for the contribution of layer $i$.

Direct gradient flow between any two layers makes

  • the network easy to optimize.
slide-4
SLIDE 4

Cons of Residual Connection

4

  • Information loss during summation (especially in very deep networks)

CIFAR-10    Params   Error (%)
Res-32      0.46M    7.51
Res-44      0.66M    7.17
Res-56      0.85M    6.97
Res-110     1.7M     6.43
Res-1202    19.4M    7.93

3 + 10 + 15 = 28 (easy)    28 = ? + ? + ? (difficult)

slide-5
SLIDE 5

Improvements over Residual Connections

5

  • Avoid information loss by replacing summation with concatenation

3 + 10 + 15 = 28 (easy)    28 = ? + ? + ? (difficult)
concat(3, 10, 15) = [3, 10, 15]    [3, 10, 15] = concat(3, 10, 15) (nothing is lost)

# ResNet pre-activation
def ResidualBlock(x):
    x1 = BN_ReLU_Conv(x)
    x2 = BN_ReLU_Conv(x1)
    return x + x2

for i in range(N):
    model.add(ResidualBlock)

# DenseNet BC structure
def DenseBlock(x):
    x1 = BN_ReLU_Conv(x)
    x2 = BN_ReLU_Conv(x1)
    return Concat([x, x2])

for i in range(N):
    model.add(DenseBlock)

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).

slide-6
SLIDE 6

DenseNet

6

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).

  • Concat is more parameter-efficient than sum.
slide-7
SLIDE 7

Cons of Concatenation

7

  • Disadvantages:
  • Exploding parameter count in deep networks -> O(n^2)
  • Redundant inputs in deeper layers

Model           Params
Dense-40-12     1.0M
Dense-100-12    7.0M
Dense-100-24    27.2M
Dense-200-12    OOM

slide-8
SLIDE 8

Rethinking ResNet and DenseNet

8

  • Features are densely aggregated in both ResNet and DenseNet.

ResNet (sum):

$x_{\ell+1} = F_\ell(x_\ell) + x_\ell = F_\ell(x_\ell) + F_{\ell-1}(x_{\ell-1}) + x_{\ell-1} = \dots = F_\ell(x_\ell) + F_{\ell-1}(x_{\ell-1}) + \dots + F_1(x_1) + x_1 = y_\ell + y_{\ell-1} + \dots + y_1$

DenseNet (concat):

$x_{\ell+1} = F_\ell(x_\ell) \oplus x_\ell = F_\ell(x_\ell) \oplus F_{\ell-1}(x_{\ell-1}) \oplus x_{\ell-1} = \dots = F_\ell(x_\ell) \oplus F_{\ell-1}(x_{\ell-1}) \oplus \dots \oplus F_1(x_1) \oplus x_1 = y_\ell \oplus y_{\ell-1} \oplus \dots \oplus y_1$
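To see the difference concretely, here is a minimal numpy sketch (the layer outputs ys are made-up stand-ins): summation keeps the channel count fixed but mixes the addends irreversibly, while concatenation grows the channel count but keeps every contribution recoverable.

import numpy as np

# three layer outputs with 4 channels each (hypothetical values)
ys = [np.full((1, 4), v) for v in (3.0, 10.0, 15.0)]

agg_sum = sum(ys)                     # shape (1, 4): channels fixed, addends lost
agg_cat = np.concatenate(ys, axis=1)  # shape (1, 12): channels grow, addends preserved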

slide-9
SLIDE 9

Variations of dense aggregation (how to aggregate)

9

ResNet, DenseNet, Mixed Link, Dual Path

slide-10
SLIDE 10

Sum and Concat

10

  • ResNet and DenseNet are both dense aggregation structures.
  • Summation is powerful for gradient flow, BUT
  • information loss leads to parameter inefficiency.
  • Concatenation aggregates without information loss, BUT
  • parameters blow up and deeper inputs become redundant.
  • Is there a way to combine both advantages without bringing new troubles?
slide-11
SLIDE 11

Sparsely Aggregated Convolutional Networks

11

Zhu, L., Deng, R., Maire, M., Deng, Z., Mori, G., & Tan, P. (2018). Sparsely aggregated convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 186-201).

  • Instead of "how to aggregate", consider "what to aggregate".
  • Only aggregate layers at exponential offsets, as sketched below.
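The "exponential offsets" rule is easy to state in code; a minimal sketch, assuming base c (c = 2 gives the offsets i−1, i−2, i−4, i−8, … used in the paper) and a hypothetical helper name sparse_predecessors:

def sparse_predecessors(i, c=2):
    """Earlier layer indices aggregated by layer i: i - c**0, i - c**1, ..."""
    preds, k = [], 0
    while i - c**k >= 0:
        preds.append(i - c**k)
        k += 1
    return preds

print(sparse_predecessors(9))  # [8, 7, 5, 1] -> offsets 1, 2, 4, 8

Each layer therefore aggregates only O(log N) predecessors instead of O(N).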
slide-12
SLIDE 12

Params and Gradient Flow Analysis

12

Zhu, L., Deng, R., Maire, M., Deng, Z., Mori, G., & Tan, P. (2018). Sparsely aggregated convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 186-201).

  • The total number of skip connections (params)
  • The gradient flow between any two layers
  • For example, when the base c is 2

Total skip connections: $\log_c 1 + \log_c 2 + \dots + \log_c N = \log_c N! \approx N \log_c N = O(N \lg N)$

An offset of up to $N$ is covered in at most $\log_c N \times (c - 1)$ steps.

offset 14 => 1110 (base 2) => 3 steps    offset 23 => 10111 (base 2) => 4 steps
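For base c = 2, the step count is simply the number of ones in the binary representation of the offset, since every skip connection spans a power-of-two distance; a small sketch checking the two examples above:

def hops(offset: int) -> int:
    # one hop per set bit: each skip connection spans a power-of-two offset
    return bin(offset).count("1")

assert hops(14) == 3   # 14 = 1110 in binary
assert hops(23) == 4   # 23 = 10111 in binary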

slide-13
SLIDE 13

Dense Concatenation and Sparse Aggregation

13

(a) Dense Aggregation (equivalent exploded view): layers F0 through F8, with every layer connected to all later layers.

(b) Sparse Aggregation (our proposed topology): layers F0 through F8, with each layer connected only to layers at exponential offsets.

ResNet & DenseNet: each layer takes all previous outputs. SparseNet: each layer takes only the outputs at exponential offsets (e.g., i−1, i−2, i−4, i−8, …).

Zhu, L., Deng, R., Maire, M., Deng, Z., Mori, G., & Tan, P. (2018). Sparsely aggregated convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 186-201).
slide-14
SLIDE 14

Better parameter utilization

14

Zhu, L., Deng, R., Maire, M., Deng, Z., Mori, G., & Tan, P. (2018). Sparsely aggregated convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 186-201).
slide-15
SLIDE 15

Better Parameter-Performance Curve

15

Zhu, L., Deng, R., Maire, M., Deng, Z., Mori, G., & Tan, P. (2018). Sparsely aggregated convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 186-201).
slide-16
SLIDE 16

Remaining Question

16

  • What if we let the network choose for itself what to aggregate?
slide-17
SLIDE 17

From Manual Design to Architecture Search

17

Manual Architecture Design (driven by human expertise):

VGGNets, Inception models, ResNets, DenseNets, …

Automatic Architecture Search (driven by machine learning and computational resources):

Reinforcement Learning, Neuro-evolution, Bayesian Optimization, Monte Carlo Tree Search, …

slide-18
SLIDE 18

NASNet

18

slide-19
SLIDE 19

Everything is good, except the cost

19

Learning Transferable Architectures for Scalable Image Recognition

4 days * 24 hours * 500 GPUs = 48,000 GPU hours

slide-20
SLIDE 20

Common Way: Proxy

20

  • Search on a small dataset, then transfer to a large one.
  • e.g., CIFAR -> ImageNet
  • Search over a subset (a single block or a few blocks), then repeat it.
  • Train for only a few epochs instead of fully training the model.

Proxies lead to sub-optimal architectures!

slide-21
SLIDE 21

Exploration on Efficient NAS

21

slide-22
SLIDE 22

Efficient Architecture Search by Network Transformation

22

Net2Wider Net2Deeper

slide-23
SLIDE 23

Efficient Architecture Search by Network Transformation

23

  • Instead of sampling a random layer, sample an equivalent (function-preserving) transformation
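As an illustration of such a function-preserving transformation, here is a minimal numpy sketch of the Net2Wider idea, assuming two consecutive bias-free fully connected layers (the helper name net2wider and its signature are ours, not the paper's code):

import numpy as np

def net2wider(W1, W2, new_width, rng=np.random.default_rng(0)):
    """Widen the hidden layer between W1 (in, h) and W2 (h, out)
    without changing the network's function."""
    h = W1.shape[1]
    # map every new unit to an existing one (identity for the first h units)
    mapping = np.concatenate([np.arange(h),
                              rng.integers(0, h, new_width - h)])
    counts = np.bincount(mapping, minlength=h)           # replication count per unit
    W1_new = W1[:, mapping]                              # copy incoming weights
    W2_new = W2[mapping, :] / counts[mapping][:, None]   # split outgoing weights
    return W1_new, W2_new

Because each duplicated unit's outgoing weights are divided by its replication count, the widened network computes exactly the same function and can keep training from there.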
slide-24
SLIDE 24

Exploration on Efficient NAS

24

slide-25
SLIDE 25

Understanding and Simplifying One-Shot Architecture Search

25

  • 1. Train a large network (with all candidates).
  • 2. Sample a path and validate its performance.
  • 3. Repeat step 2.
  • 4. Choose the path with the highest performance.
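Steps 2-4 reduce to a plain loop; a minimal sketch, where supernet, sample_path, and validate are hypothetical stand-ins for the trained one-shot model, a path sampler, and a validation routine:

def one_shot_search(supernet, sample_path, validate, num_samples=1000):
    """Repeatedly sample a sub-network from the trained supernet,
    score it with inherited weights, and keep the best."""
    best_path, best_acc = None, 0.0
    for _ in range(num_samples):
        path = sample_path()
        acc = validate(supernet, path)
        if acc > best_acc:
            best_path, best_acc = path, acc
    return best_path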
slide-26
SLIDE 26

26

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Han Cai, Ligeng Zhu, Song Han
Massachusetts Institute of Technology
slide-27
SLIDE 27

27

From General Design to Specialized CNN

ResNet, Inception, DenseNet, MobileNet, ShuffleNet

Previous Paradigm: One CNN for all datasets. Our Work: Customize CNN for each dataset.

Proxyless NAS

slide-28
SLIDE 28

28

From General Design to Specialized CNN

Previous Paradigm: One CNN for all platforms.

ResNet, Inception, DenseNet, MobileNet, ShuffleNet

Our Work: Customize CNN for each platform.

Proxyless NAS

slide-29
SLIDE 29

29

Conventional NAS: Computation Expensive

Current neural architecture search (NAS) is VERY EXPENSIVE.

  • NASNet: 48,000 GPU hours ≈ 5 years on a single GPU
  • DARTS: 100 GB GPU memory* ≈ 9× the memory of a modern GPU

Therefore, previous works have had to rely on proxy tasks:

  • CIFAR-10 -> ImageNet
  • Small architecture space (e.g., low depth) -> large architecture space
  • Training for fewer epochs -> full training
  • …

*when searching directly on ImageNet, as we do

Figure: the learner makes architecture updates on a proxy task, then transfers the result to the target task & hardware.

slide-30
SLIDE 30

30

Conventional NAS: proxy-based

Proxies:

  • CIFAR-10 -> ImageNet
  • Small architecture space (e.g., low depth) -> large architecture space
  • Training for fewer epochs -> full training

Limitations of Proxy

  • Suboptimal for the target task
  • Blocks are forced to share the same structure
  • Cannot optimize for a specific hardware platform

Figure: the learner makes architecture updates on a proxy task, then transfers the result to the target task & hardware.

slide-31
SLIDE 31

31

Our Work: proxyless, saving GPU hours by 200×

Goal: directly learn architectures on the target task and hardware, while allowing all blocks to have different structures. We achieve this by

  • 1. Reducing the cost of NAS (GPU hours and memory) to the same level as regular training.
  • 2. Incorporating hardware feedback (e.g., latency) into the search process.

Figure: instead of "learner -> proxy task -> transfer architecture -> target task & hardware", the learner now updates the architecture directly on the target task & hardware.

slide-32
SLIDE 32

32

To make NAS 200x more Efficient

AI research institutes (Google, Facebook, NVIDIA): good weapons (high-end GPU clusters) and many engineers.

Us: fewer GPUs, but a more efficient algorithm ("poor equipment, smart algorithm").

slide-33
SLIDE 33

33

Model Compression

Two model-compression ideas carried over to Neural Architecture Search: pruning (to save GPU hours) and binarization (to save GPU memory).

slide-34
SLIDE 34

34

Save GPU Hours

  • Stand on the shoulders of giants: build the cumbersome, over-parameterized network with all candidate paths.
  • Prune redundant paths based on the architecture parameters.
  • This simplifies NAS to a single training process of the over-parameterized network, with no meta-controller.
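A minimal PyTorch-style sketch of one such over-parameterized block (the class name MixedOp and its details are ours): all candidate paths are kept and weighted by learnable architecture parameters, and the weakest paths can be pruned afterwards.

import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One learnable block holding all candidate paths."""
    def __init__(self, candidates):
        super().__init__()
        self.ops = nn.ModuleList(candidates)
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # architecture params

    def forward(self, x):
        p = torch.softmax(self.alpha, dim=0)          # path probabilities
        return sum(p[i] * op(x) for i, op in enumerate(self.ops))

    def prune(self):
        # after search, keep only the strongest path
        return self.ops[int(self.alpha.argmax())]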

slide-35
SLIDE 35

35

Save GPU Memory

Binarize the architecture parameters and allow only one path of activations to be active in memory at run-time. We propose gradient-based and RL-based methods to update the binarized parameters. Thereby, the memory footprint is reduced from O(N) to O(1).
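Continuing the MixedOp sketch above, binarization amounts to sampling a single active path per forward pass, so only one candidate's activations live in memory; an illustrative sketch, not the paper's exact implementation:

import torch

def binarized_forward(block, x):
    """Run only one sampled path; activation memory goes from O(N) ops to O(1)."""
    p = torch.softmax(block.alpha, dim=0)
    idx = int(torch.multinomial(p, 1).item())  # one-hot "binarized" gate
    return block.ops[idx](x)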

slide-36
SLIDE 36

Search Cost

36

slide-37
SLIDE 37

FLOPs != Latency

37

A 10% difference in FLOPs can mean a 60% difference in latency.

slide-38
SLIDE 38

Hardware-aware Constraints

38

Figure: each learnable block i queries a latency lookup table (LUT) (op -> latency, e.g., 4ms, 3ms, 7ms, …, 1ms) to obtain its estimated latency from its architecture info.

Query the latency from the lookup table:

$E[\text{LAT}_i] = p_\alpha \times F(\text{conv 3x3}) + p_\beta \times F(\text{conv 5x5}) + p_\sigma \times F(\text{identity}) + \dots + p_\zeta \times F(\text{pool 3x3})$

$E[\text{LAT}] = \sum_{i=1}^{N} E[\text{LAT}_i]$

$\text{Loss} = \text{Loss}_{CE} + \lambda_1 \lVert w \rVert_2^2 + \lambda_2 E[\text{LAT}]$

The binarized architecture parameters are updated with either gradient-based or REINFORCE-based methods.
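Since E[LAT_i] is a probability-weighted sum of lookup-table entries, it stays differentiable with respect to the architecture parameters; a minimal sketch with made-up LUT numbers (the helper name expected_latency is hypothetical):

import torch

# hypothetical per-op latencies (ms), measured once on the target device
LUT = {"conv3x3": 4.0, "conv5x5": 7.0, "identity": 1.0, "pool3x3": 3.0}

def expected_latency(alpha, op_names, lut=LUT):
    """E[LAT_i] = sum_j p_j * F(op_j); differentiable w.r.t. alpha."""
    p = torch.softmax(alpha, dim=0)
    return sum(p[j] * lut[name] for j, name in enumerate(op_names))

# total loss: Loss = Loss_CE + lambda1 * ||w||^2 + lambda2 * sum_i E[LAT_i]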

slide-39
SLIDE 39

39

Results: ProxylessNAS on CIFAR-10

  • Directly explore a huge search space: 54 distinct blocks and a combinatorially large number of possible architectures
  • State-of-the-art test error with 6× fewer params (compared to AmoebaNet-B)
slide-40
SLIDE 40

40

When targeting the GPU platform, the accuracy further improves to 75.1%, 3.1% higher than MobileNetV2.

Results: ProxylessNAS on ImageNet, GPU Platform

slide-41
SLIDE 41

41

  • With >74.5% top-1 accuracy, ProxylessNAS is 1.8× faster than MobileNetV2, the current industry standard.

Results: ProxylessNAS on ImageNet, Mobile Platform

slide-42
SLIDE 42

42

ProxylessNAS for Hardware Specialization

slide-43
SLIDE 43

43

Results: ProxylessNAS on ImageNet, Mobile Platform

ProxylessNAS achieves state-of-the-art accuracy (%) on ImageNet (under mobile latency constraint ≤ 80ms) with 200× less search cost in GPU hours. "LL" indicates latency regularization loss.

Category           Model                    Top-1  Latency  Hardware Aware  No Proxy  No Repeat  Search Cost (GPU hours)
Manually designed  MobileNetV1              70.6   113ms    x               -         -          -
                   MobileNetV2              72.0   75ms     x               -         -          -
NAS                NASNet-A                 74.0   183ms    x               x         x          48,000
                   AmoebaNet-A              74.4   190ms    x               x         x          75,600
                   MNasNet                  74.0   76ms     yes             x         x          40,000
ProxylessNAS       ProxylessNAS-G           71.8   83ms     yes             yes       yes        200
                   ProxylessNAS-G + LL      74.2   79ms     yes             yes       yes        200
                   ProxylessNAS-R           74.6   78ms     yes             yes       yes        200
                   ProxylessNAS-R + MIXUP   75.1   78ms     yes             yes       yes        200

slide-44
SLIDE 44

44

The History of Architectures

(1) The history of finding an efficient mobile model. (2) The history of finding an efficient CPU model. (3) The history of finding an efficient GPU model.

https://hanlab.mit.edu/files/proxylessNAS/visualization.mp4

slide-45
SLIDE 45

45

Detailed Architectures

Figure: the three specialized architectures as chains of blocks. "MBk n×n" denotes a mobile inverted bottleneck block with expansion ratio k and an n×n depthwise convolution; each chain starts from a 3x224x224 input with a Conv 3x3 stem and ends with Pooling and an FC layer, with per-stage feature maps down to 7x7.

(1) Efficient mobile architecture found by ProxylessNAS. (2) Efficient CPU architecture found by ProxylessNAS. (3) Efficient GPU architecture found by ProxylessNAS.

slide-46
SLIDE 46

46

Hardware-Centric AutoML (spanning machine learning experts, hardware experts, and non-experts):

  • AMC: AutoML for Model Compression (He et al. [ECCV'18])
  • ProxylessNAS: Proxyless Neural Architecture Search (Cai et al. [ICLR'19])
  • HAQ: Hardware-aware Automated Quantization (Wang et al. [CVPR'19], oral)

Figure: HAQ's hardware accelerator policy quantizes each layer to a different precision (e.g., layer 3: 3-bit weights / 5-bit activations; layer 4: 6-bit / 7-bit; layer 5: 4-bit / 6-bit; layer 6: 5-bit / 6-bit) and maps the quantized model onto bit-serial accelerators (BISMO on the edge and cloud, BitFusion on the edge) using direct hardware feedback.

slide-47
SLIDE 47

Embrace Open-source

47

  • Our models are now released on GitHub with pre-trained weights.

# https://github.com/MIT-HAN-LAB/ProxylessNAS
from proxyless_nas import *

net = proxyless_cpu(pretrained=True)
net = proxyless_gpu(pretrained=True)
net = proxyless_mobile(pretrained=True)
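The returned net behaves like a regular PyTorch module; a quick usage sketch, assuming ImageNet-sized inputs:

import torch

x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized batch
logits = net(x)                  # net from one of the proxyless_* calls above
print(logits.shape)              # one logit per ImageNet class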

slide-48
SLIDE 48

AutoDeeplab

48

slide-49
SLIDE 49

NAS-FPN

49