SLIDE 1

Jianchao Yang Toutiao AI Lab in Silicon Valley

Joint work with Xiaojie Jin (NUS), Ning Xu (Snap), Yingzhen Yang

SLIDE 2
  • Quest for compact and efficient deep models
  • Memory usage
  • Computation cost
  • App size
SLIDE 3
  • Quest for compact and efficient deep models
  • Memory usage
  • Computation cost
  • App size
  • WSNet: Compact and efficient network design
  • Smaller model (e.g., up to 180x smaller on ESC50)
  • Faster computation (e.g., up to 18x faster on ESC50)
  • Accuracy comparable to the state of the art
SLIDE 4
  • Conventional convolution filters are initialized and trained separately.


SLIDE 5
  • Conventional convolution filters are initialized and trained separately.
  • Convolution filters are highly redundant.


SLIDE 6
  • Conventional convolution filters are initialized and trained separately.
  • Convolution filters are highly redundant.
  • Prior remedies: model quantization, model pruning, low rank, signal sparsity, …

SLIDE 7
  • Main idea: convolution filters are generated from a compact learnable parameter set (a low-dimensional manifold), instead of learned separately.

SLIDE 8
  • Main idea: convolution filters are generated from a compact learnable parameter set (a low-dimensional manifold), instead of learned separately.

F_k = f_k(Φ)

Φ: learnable compact parameter set
f_k: mapping function that generates the k-th convolution filter

SLIDE 9
  • Main idea: convolution filters are generated from a compact learnable parameter set (a low-dimensional manifold), instead of learned separately.

F_k = f_k(Φ)

Φ: learnable compact parameter set
f_k: mapping function that generates the k-th convolution filter

We focus on weight sampling for the function f_k in this work: weight tying!

SLIDE 10
  • Model quantization (e.g., Han et al. 2015)
  • Weight tying as a result of weight quantization on a learnt model
  • HashedNet (Chen et al. 2015)
  • Random weight tying with hashing before model training
  • Epitome (Jojic et al. 2003)
  • A statistical model that ties pixel values in overlapping patches

Chen et al. Compressing convolutional neural networks with the hashing trick. ICML 2015.
Jojic et al. Epitomic analysis of appearance and shape. ICCV 2003.

SLIDE 11
  • Simplest case: 1D convolution with a single channel
  • Shift sampling
  • f: projection matrix
  • Φ: condensed parameter set

F_k = f_k(Φ)

SLIDE 16
  • Simplest case: 1D convolution with a single channel
  • Shift sampling
  • f: projection matrix
  • Φ: condensed parameter set

F_k = f_k(Φ): 7 condensed weights generate 5 1×3 filters (15 weights)
SLIDE 17
  • 1D convolution
  • Input feature map X ∈ ℝ^(T×C), where (T, C) denote the input length and the number of channels
  • Output feature map Y ∈ ℝ^(T×M), where M denotes the number of filters
  • Convolution kernel W ∈ ℝ^(L×C×M), where L denotes the filter length
  • Number of Multi-Adds: TLCM
SLIDE 18
  • Weight sampling overview

L*: length of Φ
C*: number of channels of Φ
L: filter length
C: number of channels per filter
M: number of filters
S: sampling stride
R: repeating factor

Compactness = LCM / (L*C*)

SLIDE 19
  • Weight shift sampling in the spatial dimension
  • Conventional CNN: M independent filters of size L, #params = ML
  • WSNet: condensed filter of size L*, #params = L + (M − 1)S

Compactness = ML / L* ≈ L / S

SLIDE 20
  • Repeated weight sampling in the channel dimension
  • Conventional CNN: each filter has C channels
  • WSNet: condensed filter with C* channels

Compactness = C / C*
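A minimal sketch of channel repeating (names and shapes are illustrative): tile the C* condensed channels until the filter reaches C channels.

```python
import numpy as np

def repeat_channels(phi, C):
    """Tile the C* condensed channels (last axis) of phi up to C channels."""
    C_star = phi.shape[-1]
    reps = -(-C // C_star)                 # ceil(C / C_star)
    return np.tile(phi, reps)[..., :C]

phi = np.random.randn(3, 4)                # filter length 3, C* = 4
f = repeat_channels(phi, C=16)             # compactness C / C* = 4
print(f.shape)                             # (3, 16)
```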

SLIDE 21
  • Example

Stride S = 1, channel repeating R = 4 times, filter length L = 16
Compactness = LCM / (L*C*) ≈ LR = 16 × 4 = 64 (with S = 1)

The same idea generalizes to fully connected layers!
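The compactness arithmetic above can be checked directly; M below is an assumed filter count, since the approximation ML/L* ≈ L/S only holds when M is large.

```python
def spatial_compactness(L, M, S):
    """Compactness from shift sampling in the spatial dimension."""
    L_star = L + (M - 1) * S               # length of the condensed filter
    return M * L / L_star                  # ≈ L / S for large M

def total_compactness(L, M, S, R):
    """Channel repeating multiplies the spatial compactness by R."""
    return spatial_compactness(L, M, S) * R

# Slide example: S = 1, R = 4, L = 16 -> compactness close to L * R = 64
print(total_compactness(L=16, M=1024, S=1, R=4))
```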

SLIDE 22
  • Sample more filters with a larger condensed filter (bigger L*) and a smaller sampling stride to increase capacity.
  • Increased computation?

[Figure: dense sampling with stride Ŝ vs. stride S]

SLIDE 23
  • Recap of conventional 1D convolution
  • Input feature map X ∈ ℝ^(T×C)
  • Output feature map Y ∈ ℝ^(T×M)
  • Convolution kernel W ∈ ℝ^(L×C×M)
  • Number of Multi-Adds: TLCM
SLIDE 24
  • Re-use the convolution results between overlapping inputs and filters

SLIDE 25
  • An efficient variant of the integral image method
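The reuse idea can be illustrated with a toy 1D implementation (illustrative code, not the paper's exact algorithm): every product x[i]·phi[p] is computed once, and each output becomes a difference of two prefix sums along a diagonal of the product matrix, in the spirit of an integral image.

```python
import numpy as np

def wsnet_conv1d_naive(x, phi, L, M, S=1):
    """Baseline: materialize each shift-sampled filter, then convolve."""
    T = len(x) - L + 1
    y = np.empty((T, M))
    for k in range(M):
        f = phi[k * S : k * S + L]
        for t in range(T):
            y[t, k] = x[t : t + L] @ f
    return y

def wsnet_conv1d_reuse(x, phi, L, M, S=1):
    """Reuse products shared by overlapping filters: each x[i]*phi[p] is
    formed once; outputs are differences of prefix sums along diagonals."""
    T = len(x) - L + 1
    O = np.outer(x, phi)                   # all pairwise products, computed once
    prefix = {}                            # diagonal offset -> prefix sums
    y = np.empty((T, M))
    for k in range(M):
        for t in range(T):
            d = k * S - t                  # entries (t+j, k*S+j) lie on diagonal d
            if d not in prefix:
                diag = np.diagonal(O, offset=d)
                prefix[d] = np.concatenate(([0.0], np.cumsum(diag)))
            s = min(t, k * S)              # where this filter's run starts on the diagonal
            y[t, k] = prefix[d][s + L] - prefix[d][s]
    return y

rng = np.random.default_rng(0)
x, phi = rng.standard_normal(20), rng.standard_normal(10)
a = wsnet_conv1d_naive(x, phi, L=4, M=7, S=1)
b = wsnet_conv1d_reuse(x, phi, L=4, M=7, S=1)
print(np.allclose(a, b))                   # True
```

Each diagonal is prefix-summed once and then shared across all (t, k) pairs that fall on it, which is where the savings over the naive loop come from.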
SLIDE 26
  • Acceleration in terms of Multi-Adds:
  • Example convolution layer
  • Conv kernel size (L, C, M) = (8, 64, 128)
  • Condensed kernel (L*, C*) = (135, 16)
  • Input feature map (T, C)
  • Computation acceleration of ~27× for this layer

Acceleration = CLM / (C*(3 + L* − 1) + L* + M) ≈ 26.7
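Plugging the slide's numbers into the acceleration ratio, reconstructed as CLM / (C*(3 + L* − 1) + L* + M) per output position:

```python
L, C, M = 8, 64, 128         # conventional kernel (L, C, M)
Ls, Cs = 135, 16             # condensed kernel (L*, C*)

conventional = C * L * M                       # Multi-Adds per output position
wsnet = Cs * (3 + Ls - 1) + Ls + M             # with the integral-image trick
print(conventional, wsnet, conventional / wsnet)
```

This gives 65536 / 2455 ≈ 26.7, matching the ~27× figure on the slide.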

SLIDE 27
  • Direct extension
  • Spatial sampling: shifting patch sampling from a 2D condensed filter
  • Channel sampling: repeat sampling in the channel dimension
SLIDE 30
  • Direct extension
  • Spatial sampling: shifting patch sampling from a 2D condensed filter
  • Channel sampling: repeat sampling in the channel dimension
  • Compactness
  • Conventional filters: (w, h, C, M)
  • Condensed filter: (W, H, C*)
  • Sampling strides: (S_w, S_h)
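The direct 2D extension of shift sampling, sketched with illustrative names: slide a w×h window over a 2D condensed parameter grid with strides (Sw, Sh), each window position yielding one filter.

```python
import numpy as np

def sample_2d_filters(phi, w, h, Sw=1, Sh=1):
    """Shift-sample all w x h filter patches from the 2D condensed filter phi."""
    W, H = phi.shape
    return np.stack([phi[i * Sw : i * Sw + w, j * Sh : j * Sh + h]
                     for i in range((W - w) // Sw + 1)
                     for j in range((H - h) // Sh + 1)])

phi = np.random.randn(10, 10)              # condensed 2D parameter grid
F = sample_2d_filters(phi, w=3, h=3)       # 8 x 8 = 64 filters of size 3x3
print(F.shape)                             # (64, 3, 3)
```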
SLIDE 31
  • Tensor decomposition extension
  • Decompose 3D weight tensors into three 1D vectors (Jin et al. 2015)
  • Apply WSNet to each 1D vector as in the 1D CNN.

[Figure: 3D convolution vs. 1D convolutions over three directions]

Jin et al. Flattened neural networks for feedforward acceleration. ICLR 2015.
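The decomposition can be checked on a toy example: a rank-1 3D kernel is the outer product of three 1D vectors, so the full 3D contraction at one output position equals three successive 1D contractions (illustrative NumPy, not the paper's code).

```python
import numpy as np

rng = np.random.default_rng(1)
kx, ky, kc = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(4)
K = np.einsum('i,j,k->ijk', kx, ky, kc)    # rank-1 3D kernel from three 1D vectors

patch = rng.standard_normal((3, 3, 4))     # one receptive field of the input
full = np.tensordot(patch, K, axes=3)      # direct 3D contraction at this position
seq = np.einsum('ijk,k->ij', patch, kc)    # 1D contraction over channels
seq = np.einsum('ij,j->i', seq, ky)        # then over height
seq = seq @ kx                             # then over width
print(np.isclose(full, seq))               # True
```

WSNet then applies weight sampling to each of the three 1D vectors separately.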

SLIDE 32
  • The channel dimension dominates model size and computation.
  • Channel reordering reduces computation cost.

SLIDE 33
  • Tasks and datasets
  • WSNet-1D: Audio classification
  • ESC-50
  • UrbanSound8K
  • DCASE
  • WSNet-2D: Image classification
  • CIFAR 10
  • MNIST
  • ImageNet
SLIDE 34
  • Notation settings of WSNet
  • A WSNet model is named in the form SsCcAaQq
  • Ss denotes compactness s in the spatial dimension
  • Cc denotes channel repeating c times
  • Aa denotes the ratio a of filters between WSNet and the baseline through dense sampling
  • Qq denotes the compression ratio q from weight quantization, when used.
SLIDE 35
  • Baseline network for ESC-50, UrbanSound8K, DCASE
  • Network adopted from SoundNet (Aytar et al. 2016) for fair comparison

Aytar et al. SoundNet: Learning sound representations from unlabeled video. NIPS 2016.

SLIDE 36
  • ESC-50: A collection of 2000 short environmental recordings comprising 50 equally balanced classes of sound events (e.g., animals, water sounds, urban noises, human non-speech sounds).

SLIDE 37
SLIDE 38
  • UrbanSound8K: A collection of 8732 short recordings of various urban sound sources (air conditioner, car horn, children playing, etc.).

SLIDE 39
  • DCASE: Detection and Classification of Acoustic Scenes and Events Challenge. It contains 10 acoustic scene categories, with ten 30-second recordings per category for training.

SLIDE 40
  • Direct 2D extension on CIFAR 10 and MNIST
  • Same baseline network as HashedNet (Chen et al. 2015)

Chen et al. Compressing convolutional neural networks with the hashing trick. ICML 2015.

SLIDE 41
  • Tensor decomposition extension on ImageNet
  • Single-view test
  • Baseline network is Res34

Model      #Params   #Multi-Adds   Top-1
Res18      11.2M     1800M         70.6
Res34      21.3M     3600M         73.1
MobileNet  4.2M      575M          70.6
WSNet      2.7M      540M          70.4

SLIDE 42
  • WSNet provides a novel design scheme for convolutional neural networks to learn compact and efficient models.
  • It achieves accuracy comparable to the state of the art, with far fewer parameters and much lower computation cost.
  • Future work: explore more filter generation methods, e.g., learning a generative statistical model or a low-dimensional basis.

SLIDE 43

We are hiring research scientists, software engineers, and interns in

  • Areas
  • Computer vision
  • Computer graphics
  • Machine learning
  • Natural language processing
  • Knowledge discovery and data mining
  • Speech and audio processing
  • Recommender systems
  • Sites
  • Beijing
  • Silicon Valley (USA)
  • Seattle (USA)

Send resumes to lab-hr@bytedance.com for Beijing positions and rdus.staffing@bytedance.com for Silicon Valley positions.

SLIDE 44

Thank You!

Reference: https://arxiv.org/abs/1711.10067?context=cs