Learning and Vision Research Group, Shuicheng YAN, National University of Singapore



SLIDE 1

Learning and Vision Research Group

Shuicheng YAN

National University of Singapore

SLIDE 2

Learning and Vision Research Group (LV)

  • Founded early 2008
  • 20-30 members
SLIDE 3

Three Indicators of Excellence for Members

  • High citations
  • Competition awards
  • Industry commercialization

Any one of these indicators is enough for a member to count as an excellent researcher.

SLIDE 4

Past, Present and Future of LV

  • Past: Subspace Learning, Sparsity/Low-rank (revisiting the classics)
  • Present: Deep Learning (grounded in the present)
  • Future: Smart Services/Devices, Never-ending Learning (glimpsing the future)

SLIDE 5

Learning and Vision Group, Past

Subspace Learning, Sparsity/Low‐rank [Block‐Diagonality]

[Guangcan LIU, Canyi LU, Jiashi FENG]

SLIDE 6

Subspace: Graph Embedding and Extensions

Intrinsic graph $G = \{X, S\}$ and penalty graph $G^p = \{X, S^p\}$, over samples $X = \{x_i\}_{i=1}^{N}$, $x_i \in \mathbb{R}^n$.

Graph Laplacians:

$$L = D - S, \quad D_{ii} = \sum_{j \neq i} S_{ij}; \qquad L^p = D^p - S^p, \quad D^p_{ii} = \sum_{j \neq i} S^p_{ij}.$$

Graph embedding objective, with the penalty graph providing the normalization $\sum_{i \neq j} \|y_i - y_j\|^2 S^p_{ij} = \operatorname{Tr}(Y L^p Y^\top) = d$:

$$Y^{*} = \arg\min_{\operatorname{Tr}(Y L^p Y^\top) = d} \sum_{i \neq j} \|y_i - y_j\|^2 S_{ij} = \arg\min_{\operatorname{Tr}(Y L^p Y^\top) = d} \operatorname{Tr}(Y L Y^\top), \quad Y = [y_1, y_2, \dots, y_N].$$

Graph Embedding and Extensions: A General Framework for Dimensionality Reduction, TPAMI'07, Yan, et al.
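The constrained trace minimization above reduces to a generalized eigenproblem, $L v = \lambda L^p v$. A minimal NumPy sketch (the function name and the small ridge added to keep $L^p$ invertible are my own choices, not from the slides):

```python
import numpy as np

def graph_embedding(S, Sp, d):
    """Direct graph embedding: minimize Tr(Y L Y^T) subject to
    Tr(Y L^p Y^T) = const, via the generalized eigenproblem
    L v = lambda (L^p + eps I) v, keeping the d smallest eigenvectors."""
    N = len(S)
    L = np.diag(S.sum(axis=1)) - S            # intrinsic Laplacian
    Lp = np.diag(Sp.sum(axis=1)) - Sp         # penalty Laplacian
    B = Lp + 1e-6 * np.eye(N)                 # ridge keeps B positive definite
    # whiten by B: solve the symmetric problem B^{-1/2} L B^{-1/2} u = lambda u
    wb, Vb = np.linalg.eigh(B)
    Bih = Vb @ np.diag(1.0 / np.sqrt(wb)) @ Vb.T
    _, U = np.linalg.eigh(Bih @ L @ Bih)      # eigenvalues in ascending order
    return (Bih @ U)[:, :d].T                 # embedding Y, shape (d, N)
```

With an intra-cluster similarity graph as $S$ and its complement as $S^p$, the smallest non-trivial eigenvector separates the clusters.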

SLIDE 7

Subspace: Graph Embedding and Extensions

Same graphs and Laplacians as on the previous slide: $L = D - S$, $D_{ii} = \sum_{j \neq i} S_{ij}$, and $L^p = D^p - S^p$, $D^p_{ii} = \sum_{j \neq i} S^p_{ij}$.

| Type | Formulation | Example |
| --- | --- | --- |
| Direct graph embedding | $\min_Y \operatorname{Tr}(Y L Y^\top)$ s.t. $\operatorname{Tr}(Y L^p Y^\top) = d$ | Original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap |
| Linearization | $y = W^\top x$ | PCA, LDA, LPP, LEA, ... |
| Kernelization | $W = \sum_i \alpha_i \phi(x_i)$ | KPCA, KDA, ... |
| Tensorization | $y = X \times_1 W_1 \times_2 W_2 \cdots \times_n W_n$ | CSA, DATER, ... |

Graph Embedding and Extensions: A General Framework for Dimensionality Reduction, TPAMI'07, Yan, et al.

SLIDE 8

Block‐diagonality‐1: Low‐Rank Representation (LRR)

  • Given data $X$, learn the affinity matrix $Z$ by LRR: $\min_{Z, E} \|Z\|_{*} + \lambda \|E\|_{2,1}$ s.t. $X = XZ + E$.
  • Theorem: the solution to LRR is block diagonal when the data are drawn from independent subspaces, i.e., the affinity matrix is block diagonal.

Robust recovery of subspace structures by low-rank representation, TPAMI’13, Liu, et al.

SLIDE 9

Block‐diagonality‐2: Unified Block‐diagonal Conditions

  • Theorem: the solution to the following self-expression problem is block diagonal if (1) the data lie in independent subspaces, and (2) the regularizer satisfies the EBD conditions or the solution is unique.
  • Enforced Block Diagonal (EBD) conditions
  • Many known regularizers satisfy the EBD conditions, e.g., the Frobenius norm used in least squares regression (LSR).
  • Robust and efficient subspace segmentation via least squares regression, ECCV'12, Lu, et al.
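For the least-squares-regression (Frobenius-norm) case the solution has a closed form, which makes the block-diagonal property easy to check numerically. A minimal NumPy sketch (the function name is mine; orthogonal subspaces are used as a simple special case of independent subspaces):

```python
import numpy as np

def lsr_affinity(X, lam=0.01):
    """Least-squares-regression self-expression:
    min_Z ||X - X Z||_F^2 + lam ||Z||_F^2, with the closed-form
    solution Z* = (X^T X + lam I)^{-1} X^T X (columns of X are samples)."""
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)
```

For columns drawn from independent (here orthogonal) subspaces, the off-diagonal blocks of Z* vanish, matching the theorem.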
SLIDE 10

Block‐diagonality‐3: Hard Block‐diagonal Constraint

  • Key property
  • The block diagonal prior
  • LRR with hard block diagonal constraint

Robust Subspace Segmentation with Laplacian Constraint, CVPR’14, Feng, et al.

SLIDE 11

Learning and Vision Group, Present

NUS‐Purine: A Bi‐graph based Deep Learning Framework

[A3: Architecture, Algorithms, Applications]

SLIDE 12

Deep Learning in Learning and Vision (LV) Research Group

Architecture. Purine:
  • General, bi-graph based deep learning framework
  • Multi-PC, multi-CPU/GPU, with approximately linear speedup
  • High re-usability, bridging academia and industry

Algorithms. Network-in-Network + Computational Baby Learning:
  • More human-brain-like network structure and learning process; regularizers

Applications. Smart Services/Devices + Cloud/Embedded System:
  • Object analytics, product search/recommendation, human analytics, and others; landing

SLIDE 13

Deep Learning in Learning and Vision (LV) Research Group

(Same architecture/algorithms/applications layout as the previous slide, annotated with selected results:)

  1. Four winner awards in VOC
  2. One 2nd prize in VOC
  3. 2nd prize in ImageNet'13
  4. 1st prize in ImageNet'14
  • Best paper/demo awards: ACM MM'13, ACM MM'12; also licensed
  • LFW: 98.78%, 2nd best
  • Best human parsing performance
  • Cross-age synthesis
  • Face analysis with occlusions

SLIDE 14

A3‐I. Architecture

Purine: a Bi-graph based Deep Learning Framework

[Min LIN, Xuan LUO, Shuo LI]

SLIDE 15

What is “Purine”

  • Purine benefits from the open-source deep learning framework Caffe.
  • In Purine, the math functions and core computations are adapted from Caffe.
  • The names also reflect a close molecular structure (purine vs. caffeine).

http://caffe.berkeleyvision.org/

SLIDE 16

Difference from Caffe

(Figure: side-by-side comparison of Caffe and Purine.)

SLIDE 17

Definition Graph vs Computation Graph

(Figure: definition graph vs. the computation graph of a convolutional layer.)

SLIDE 18

(Figure: definition graph vs. the computation graph of a dropout layer.)

Definition Graph vs Computation Graph

SLIDE 19

Purine Overview

Two subsystems in Purine:
  • Interpretation: compose the network in Python; generate the computation graph in YAML
  • Optimization: dispatch and solve computation graphs

SLIDE 20

Basic Components

  • Blob: a tensor that contains data
  • Op: an operator that performs computation on input blobs and outputs blobs

Built-in Op types: SoftmaxLoss, SoftmaxLossDown, Gaussian, Bernoulli, Constant, Uniform, Copy, Merge, Slice, Sum, WeightedSum, Mul, Swap, Dumper, Loader, Conv, ConvDown, ConvWeight, Inner, InnerDown, InnerWeight, Bias, BiasDown, Pool, PoolDown, Relu, ReluDown, Softmax, SoftmaxDown.

Ops are modular: they can be developed and packed in a shared library with some common functions exported, and Purine can then dynamically load the ops like extensions.

SLIDE 21

Sub‐system‐1: Interpretation

(Figure: a definition graph being expanded into a computation graph.)

SLIDE 22

Sub‐system‐2: Optimization

How is a computation graph solved?

  • Start from the sources, stop at the sinks; this applies to any Directed Acyclic Graph (DAG)
  • An Op computes when all of its inputs are ready
  • A Blob is ready when all of its inputs have been computed
  • All computations are event-based and asynchronous, parallelized where possible
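The firing rules above can be sketched as a small, single-threaded event loop (a stand-in for Purine's asynchronous dispatcher; all names here are illustrative, not Purine's API):

```python
from collections import defaultdict

def run_graph(ops, sources):
    """ops: {name: (input_blobs, output_blobs, fn)}.  Fire an op once all
    of its input blobs are ready; a blob becomes ready once every op that
    writes it has run.  Single-threaded stand-in for event-based,
    asynchronous dispatch over a DAG."""
    writers = defaultdict(int)                 # pending producers per blob
    for _, outs, _ in ops.values():
        for blob in outs:
            writers[blob] += 1
    ready, done, order = set(sources), set(), []
    progress = True
    while progress:
        progress = False
        for name, (ins, outs, fn) in ops.items():
            if name not in done and all(b in ready for b in ins):
                fn()                           # the op's actual computation
                done.add(name)
                order.append(name)
                for blob in outs:
                    writers[blob] -= 1
                    if writers[blob] == 0:     # all producers finished
                        ready.add(blob)
                progress = True
    return order
```

In the real system each op would fire on an event in its own thread; the loop here only makes the readiness rules concrete.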

SLIDE 23

Why Computation Graph

  • Less hard coding: all tasks (the algorithm and the parallel computing) are consistently defined in graphs
  • Solver: with a definition graph, concepts like forward pass and backward pass must be hard-coded to alternate; in a computation graph, forward and backward pass live in the same graph, so the logic is in the graph itself
  • Any scheme of parallelism can be expressed in a computation graph
SLIDE 24

Parallelization Implementation

Properties of Ops and Blobs

Example Blob defined in YAML:

    type: blob
    name: weight
    size: [96, 3, 11, 11]
    location:
      ip: 127.0.0.1
      device: 0

Example Op defined in YAML:

    type: op
    op_type: Conv
    name: conv1
    inputs: [bottom, weight]
    outputs: [top]
    location:
      ip: 127.0.0.1
      device: 0
      thread: 1
    # other fields ...

Location: where the blob/op resides, including the IP address of the target machine and which device it is on (CPU/GPU).

Thread: a thread is needed for an Op because both CPU and GPU can be multi-threaded (streams, in NVIDIA GPU terms).

SLIDE 25

Parallelization‐1 (Pipeline)

One computation graph can span multiple machines. Case 1: pipeline.

Special Op: Copy.
  • If locations A and B are the same machine but different devices, Copy does one of:
    1. cudaMemcpyHostToDevice
    2. cudaMemcpyDeviceToDevice
    3. cudaMemcpyDeviceToHost
  • If locations A and B are on different machines, Copy resides on both: the source side calls nn_send(socket, data) and the target side calls nn_receive(socket, data).
  • Copy executes as soon as its input blob is ready, and runs in its own worker thread, so computation and data transfer are overlapped wherever possible.
  • GPU inbound and outbound copies use different streams, fully utilizing CUDA's dual copy engines.
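The case analysis above can be summarized in a few lines (a sketch only; the tuple encoding of locations and the returned labels are mine, not Purine's):

```python
def copy_kind(src, dst):
    """Which transfer a Copy op performs, following the cases above.
    src/dst are (ip, device) pairs; device -1 means CPU, >= 0 is a GPU id."""
    (src_ip, src_dev), (dst_ip, dst_dev) = src, dst
    if src_ip != dst_ip:
        return "network"                      # nn_send / nn_receive pair
    if src_dev < 0 and dst_dev >= 0:
        return "cudaMemcpyHostToDevice"
    if src_dev >= 0 and dst_dev >= 0:
        return "cudaMemcpyDeviceToDevice"
    if src_dev >= 0 and dst_dev < 0:
        return "cudaMemcpyDeviceToHost"
    return "memcpy"                           # CPU to CPU on one machine
```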

How to run this pipeline?

SLIDE 26

Parallelization‐1 (Pipeline)

(Figure: replicate the graph (Graph 1, Graph 2, Graph 3) and iterate over graphs/subgraphs.)

SLIDE 27

Parallelization‐2 (Data parallelism)

Case 2: data parallelism.
  • Explicitly duplicate the nets at different locations
  • Each duplicate runs on different data
  • Gather the weight gradients at a parameter server

SLIDE 28

Parallelization‐2 (Data parallelism)

Overlap data transfer and computation:
  • Higher-layer gradients are computed earlier than lower-layer gradients
  • A higher layer can send its gradients to the parameter server and get the updates back while the lower layers are still computing
  • This is especially true for very deep networks
  • Data parallelism works even for fully connected layers: FC layers have many parameters, but the latency is hidden
  • Cross-machine (network) latency is therefore less of a problem
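A toy NumPy sketch of one data-parallel step with a parameter-server reduce (the loss and all names are illustrative; in Purine the per-replica work and the transfers run concurrently rather than in this sequential loop):

```python
import numpy as np

def data_parallel_step(w, shards, grad_fn, lr=0.1):
    """One data-parallel SGD step: every replica computes a gradient on its
    own data shard (concurrently in the real system), the parameter server
    averages them, and the updated weights are sent back."""
    grads = [grad_fn(w, shard) for shard in shards]   # per-replica work
    g = np.mean(grads, axis=0)                        # parameter-server reduce
    return w - lr * g

# toy loss 0.5 * ||w - mean(shard)||^2 per shard, so grad = w - mean(shard)
toy_grad = lambda w, shard: w - shard.mean(axis=0)
```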

SLIDE 29

Profiling Result

(Figure: profiling timeline showing that the parameter update of the lowest layer, and data transfer in general, overlap with computation. Plot: images per second for 1, 2, 3, 4, and 8 GPUs.)

Note that the 8 GPUs are on different machines. 8 GPUs train GoogLeNet in 40 hours; top-5 error rate 12.67% (still tuning).

SLIDE 30

A3‐II. Algorithms

Network-in-Network

[More Human-brain-like Network Structure] [Min LIN, Qiang CHENG]

SLIDE 31

“Network in Network” (NIN)

NIN: more human-brain-like: non-linear filters, pure convolutional

(Figure: a conventional CNN.)

SLIDE 32

“Network in Network” (NIN)

NIN: more human-brain-like: non-linear filters, pure convolutional

(Figure: CNN vs. NIN.)

SLIDE 33

“Network in Network” (NIN)

NIN: more human-brain-like: non-linear filters, pure convolutional. Intuitively, less overfitting globally and more discriminative locally, with fewer parameters [4].

(Figure: CNN vs. NIN.)

[4] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C. Courville, Yoshua Bengio. Maxout Networks. ICML 2013: 1319-1327.

SLIDE 34

Better Local Abstraction

Motivation: better local abstraction. Each local patch is projected to its feature vector using a small network.

Cascaded Cross Channel Parametric Pooling (CCCP)

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." ICLR‐2014.

SLIDE 35

CCCP ≈ Cascaded 1x1 Convolution in Implementation
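A NumPy sketch of why the two are equivalent: a 1x1 convolution applies one linear map across channels at every pixel, so cascading two with ReLUs is a per-pixel micro-MLP, i.e. the CCCP/mlpconv structure (function names are mine):

```python
import numpy as np

def conv1x1(x, W):
    """1x1 convolution: x is (C_in, H, W_img), W is (C_out, C_in).
    The same linear map across channels is applied at every pixel."""
    c_in, h, w_img = x.shape
    return (W @ x.reshape(c_in, -1)).reshape(W.shape[0], h, w_img)

def mlpconv(x, W1, W2):
    """Two cascaded 1x1 convolutions with ReLUs: a tiny MLP applied
    per pixel, matching the CCCP structure of NIN."""
    return np.maximum(conv1x1(np.maximum(conv1x1(x, W1), 0.0), W2), 0.0)
```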

SLIDE 36

Global Average Pooling

Global average pooling saves a huge number of parameters, and benefits Purine greatly.

(Figure: CNN with fully connected layers vs. NIN producing a confidence map for each category.)
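Global average pooling itself is one line: each category's confidence map is averaged spatially into a single score (a sketch; the function name is mine):

```python
import numpy as np

def global_average_pool(conf_maps):
    """conf_maps: (num_classes, H, W) category confidence maps from the
    last mlpconv layer.  Averaging each map spatially yields one score
    per class, replacing the fully connected classifier layers."""
    return conf_maps.mean(axis=(1, 2))
```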

SLIDE 37

Inspiring other deeper models

GoogLeNet = a deeper Network-in-Network. (Figure: GoogLeNet stages with 256, 480, 480, 512, 512, and 512 channels.)

SLIDE 38

A3‐II. Algorithms

Computational Baby Learning

[More Human-brain-like Learning Process] [Xiaodan LIANG, Si LIU]

SLIDE 39

Prior knowledge → exploring the physical world → exploring a broader physical world ..., starting from few positive instances.

Computational Baby Learning: More Human‐brain‐like Learning Process

SLIDE 40

Computational Baby Learning Framework

Learning with video contexts. Stages: prior knowledge modeling → exemplar learning → mining variable instances → detector update → knowledge update → better knowledge base.

SLIDE 41

Experimental Results

Performance across iterations of our framework on VOC 2007, with two positive samples per class.

SLIDE 42

Results: Explored Video Contexts

Tracking Results

(Figure: tracking results for bird, bicycle, chair, and TV monitor.)

SLIDE 43

Experimental Results

Detection average precision (%) on the PASCAL VOC 2012 and VOC 2010 test sets.

SLIDE 44

A3‐III. Applications

Object Analytics and Fashion Search/Recommendation, Human Analytics

SLIDE 45

Object Analytics

[Min LIN, Qiang CHENG, Jian DONG, Yunchao WEI]

SLIDE 46

NIN for ImageNet Object Detection

R-CNN framework: region hypotheses → handcrafted features, coding + pooling → SVM learning.

NIN for object detection: region hypotheses → NIN features → context refinement + adaptive non-parametric rectification, using the PASCAL VOC 2012 classification solution as global context and a three-model average.

SLIDE 47

Large Scale Visual Recognition Challenge

  • ImageNet challenges:
    – Held yearly, 2010-2014
    – Data: 1,000 object classes, 1.2 million images
    – Tasks: object classification, detection, and fine-grained recognition
    – Trend: from PASCAL VOC to ImageNet
  • Challenges:
    – Large scale: data storage, computation cost, etc.
    – Learning in practice: feature learning and discriminative learning

(Figure: more "discriminable" synsets vs. less "discriminable" synsets.)

SLIDE 48

NIN for ILSVRC'14 Object Detection

Results on the validation set (0.5:0.5 split of the val set for validation and testing):

| Submission | Method | mAP |
| --- | --- | --- |
| NIN | baseline: NIN as the feature extractor for R-CNN | 35.61% |
| 3 NINs | concatenated features from multiple NINs as the feature extractor for R-CNN | 36.52% (↑0.91%) |
| 3 NINs + ctx | adaptive non-parametric rectification of the outputs from both the object detectors and the global context | 37.49% (↑0.97%) |

Result on the test set (winner of the ILSVRC'14 detection task): 3 NINs + ctx, 37.212% mAP.

SLIDE 49

PASCAL VOC‐12 Classification: Hypotheses‐CNN‐Pooling (HCP)

Our framework:

(Figure: each hypothesis passes through a shared convolutional neural network (AlexNet-style: conv layers of 96, 256, 384, 384, 256 filters with interleaved max pooling, then two 4096-d fully connected layers); max pooling across the scores for the individual hypotheses yields the multi-label output, e.g. dog, person, sheep.)

Hypothesis assumption: each hypothesis is single-labeled.
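The fusion step can be sketched in a few lines: cross-hypothesis max pooling turns per-hypothesis class scores into one multi-label prediction (a sketch; the function name and threshold are mine, not from the paper):

```python
import numpy as np

def hcp_fuse(hypothesis_scores, threshold=0.5):
    """hypothesis_scores: (num_hypotheses, num_classes) scores from the
    shared CNN, one row per hypothesis.  Max pooling across hypotheses
    gives one score per class; thresholding yields the multi-label set."""
    image_scores = hypothesis_scores.max(axis=0)
    labels = np.flatnonzero(image_scores > threshold)
    return image_scores, labels
```

Because each hypothesis only needs to be right about one object, the max across many noisy hypotheses is naturally multi-label.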

SLIDE 50

Characteristics of Our Framework

  • No ground-truth bounding box information is required for training on the multi-label image dataset
  • The proposed HCP infrastructure is robust to noisy and/or redundant hypotheses
  • No explicit hypothesis label is required for training
  • The shared CNN can be well pre-trained with a large-scale single-label image dataset
  • The HCP outputs are naturally multi-label prediction results

SLIDE 51

Training of HCP

  • Hypotheses extraction
  • Initialization of HCP:
    – Pre-training on a large-scale single-label image set, e.g. ImageNet
    – Image-fine-tuning on a multi-label image set
  • Hypotheses-fine-tuning

SLIDE 52

CNN in HCP

(Figure: the shared CNN scores each hypothesis; max pooling over the per-hypothesis scores gives the multi-label output, e.g. dog, person, sheep.)

SLIDE 53

PASCAL VOC

  • PASCAL VOC: the Visual Object Classes challenges, held yearly 2007-2012
  • Tens of teams from universities and industry participated, including INRIA, Berkeley, Oxford, NEC, etc.
  • It became "the dataset" for visual object recognition research
  • Main tasks: object classification, detection, and segmentation
  • Other tasks: person layout, action recognition, etc.
  • Data: 20 object classes, ~23,000 images with fine labeling

(Figure: examples of object classification (person, horse, barrier, table, etc.), object detection, and object segmentation.)

SLIDE 54

Experimental Results

  • Performance on PASCAL VOC 2012

SLIDE 55

NIN in HCP

(Figure: the same HCP pipeline with the shared CNN replaced by a shared NIN; max pooling over the per-hypothesis scores gives the multi-label output, e.g. dog, person, sheep.)

SLIDE 56

Comparison with the State of the Art on VOC 2012

| Category | NUS-PSL [1] | PRE-1000C [2] | PRE-1512 [2] | Chatfield et al. [3] | HCP-NIN | HCP-NIN + NUS-PSL |
| --- | --- | --- | --- | --- | --- | --- |
| plane | 97.3 | 93.5 | 94.6 | 96.8 | 98.4 | 99.5 |
| bicycle | 84.2 | 78.4 | 82.9 | 82.5 | 89.5 | 93.7 |
| bird | 80.8 | 87.7 | 88.2 | 91.5 | 96.2 | 96.8 |
| boat | 85.3 | 80.9 | 84.1 | 88.1 | 91.7 | 94.0 |
| bottle | 60.8 | 57.3 | 60.3 | 62.1 | 72.5 | 77.7 |
| bus | 89.9 | 85.0 | 89.0 | 88.3 | 91.1 | 95.3 |
| car | 86.8 | 81.6 | 84.4 | 81.9 | 87.2 | 92.4 |
| cat | 89.3 | 89.4 | 90.7 | 94.8 | 97.1 | 98.2 |
| chair | 75.4 | 66.9 | 72.1 | 70.3 | 73.0 | 86.1 |
| cow | 77.8 | 73.8 | 86.8 | 80.2 | 89.5 | 91.3 |
| table | 75.1 | 62.0 | 69.0 | 76.2 | 75.1 | 83.5 |
| dog | 83.0 | 89.5 | 92.1 | 92.9 | 96.3 | 97.3 |
| horse | 87.5 | 83.2 | 93.4 | 90.3 | 93.0 | 96.8 |
| motor | 90.1 | 87.6 | 88.6 | 89.3 | 90.5 | 96.3 |
| person | 95.0 | 95.8 | 96.1 | 95.2 | 94.8 | 95.8 |
| plant | 57.8 | 61.4 | 64.3 | 57.4 | 66.5 | 72.2 |
| sheep | 79.2 | 79.0 | 86.6 | 83.6 | 90.3 | 91.5 |
| sofa | 73.4 | 54.3 | 62.3 | 66.4 | 65.8 | 81.1 |
| train | 94.5 | 88.0 | 91.1 | 93.5 | 95.6 | 97.6 |
| tv | 80.7 | 78.3 | 79.8 | 81.9 | 82.0 | 90.0 |
| mAP | 82.2 | 78.7 | 82.8 | 83.2 | 86.8 | 91.4 |

[1] S. Yan, J. Dong, Q. Chen, Z. Song, Y. Pan, W. Xia, H. Zhongyang, Y. Hua, and S. Shen. Generalized hierarchical matching for subcategory aware object classification. Visual Recognition Challenge workshop, ECCV, 2012.
[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. CVPR, 2014.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014.

From 81.7% to 90.3%.

SLIDE 57

Comparison with the State of the Art on VOC 2012 [Live Demo]

| Category | NUS-PSL [1] | PRE-1000C [2] | Chatfield [3] | HCP-1000C | PRE-1512 [2] | HCP-2000C | HCP-2000C + [1] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| plane | 97.3 | 93.5 | 96.8 | 98.4 | 94.6 | 98.5 | 99.5 |
| bicycle | 84.2 | 78.4 | 82.5 | 89.5 | 82.9 | 91.4 | 94.1 |
| bird | 80.8 | 87.7 | 91.5 | 96.2 | 88.2 | 96.2 | 96.9 |
| boat | 85.3 | 80.9 | 88.1 | 91.7 | 84.1 | 93.2 | 94.7 |
| bottle | 60.8 | 57.3 | 62.1 | 72.5 | 60.3 | 72.5 | 77.6 |
| bus | 89.9 | 85.0 | 88.3 | 91.1 | 89.0 | 92.6 | 95.8 |
| car | 86.8 | 81.6 | 81.9 | 87.2 | 84.4 | 88.9 | 93.3 |
| cat | 89.3 | 89.4 | 94.8 | 97.1 | 90.7 | 97.4 | 98.3 |
| chair | 75.4 | 66.9 | 70.3 | 73.0 | 72.1 | 77.0 | 87.2 |
| cow | 77.8 | 73.8 | 80.2 | 89.5 | 86.8 | 95.9 | 96.0 |
| table | 75.1 | 62.0 | 76.2 | 75.1 | 69.0 | 79.3 | 84.3 |
| dog | 83.0 | 89.5 | 92.9 | 96.3 | 92.1 | 96.8 | 97.6 |
| horse | 87.5 | 83.2 | 90.3 | 93.0 | 93.4 | 97.5 | 98.5 |
| motor | 90.1 | 87.6 | 89.3 | 90.5 | 88.6 | 92.9 | 96.8 |
| person | 95.0 | 95.8 | 95.2 | 94.8 | 96.1 | 95.4 | 96.1 |
| plant | 57.8 | 61.4 | 57.4 | 66.5 | 64.3 | 67.8 | 73.4 |
| sheep | 79.2 | 79.0 | 83.6 | 90.3 | 86.6 | 94.7 | 94.7 |
| sofa | 73.4 | 54.3 | 66.4 | 65.8 | 62.3 | 70.0 | 83.2 |
| train | 94.5 | 88.0 | 93.5 | 95.6 | 91.1 | 96.8 | 98.2 |
| tv | 80.7 | 78.3 | 81.9 | 82.0 | 79.8 | 83.0 | 89.6 |
| mAP | 82.2 | 78.7 | 83.2 | 86.8 | 82.8 | 88.9 | 92.3 |

[1] S. Yan, J. Dong, Q. Chen, Z. Song, Y. Pan, W. Xia, H. Zhongyang, Y. Hua, and S. Shen. Generalized hierarchical matching for subcategory aware object classification. Visual Recognition Challenge workshop, ECCV, 2012.
[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. CVPR, 2014.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014.

From 84.2% to 90.3%.

SLIDE 58

Fashion Search and Recommendation

[Junshi HUANG]

SLIDE 59

Deep Search: Attribute‐aware Neural Network for Clothes Retrieval

(Screenshot: demo web interface for uploading a query image or entering an image URL.)

SLIDE 60

Deep Search is an attribute-aware, fashion-related search engine based on a tree-structured neural network.

(Figure: the Deep Search framework: input image → cropped image → high-level feature → retrieval result.)

SLIDE 61

Qualitative Results [Live Demo, licensed to ******* company, online already]

(Figure: retrieval results for top and bottom clothing queries.)

SLIDE 62

Results [Live Demo, licensed to ******* company, online already]

SLIDE 63

Human Analytics

[Luoqi LIU, Zhiheng NIU, Xiangbo SHU, Xiaodan LIANG, Si LIU]

SLIDE 64

NIN-based Face Recognition Solution

  • NIN trained on over 400k face images of 8,000 subjects
  • Our current accuracy on LFW is 98.78%: second best in the world (vs. 99.40%), better than Facebook (97.35%) and Face++ (97.27%)

SLIDE 65

Current Focus on Face Analytics

Face analysis with occlusions; cross-age face recognition (FR) and synthesis

(Example subject: Jiao XU, 徐娇)


SLIDE 69

Human Parsing

(Figure: human parsing comparison: test image, Paperdoll, ATR (noSPR), and ATR.)

Current best human parsing engine in the world!


SLIDE 71

Learning and Vision Group, Future

Smart Services/Devices [Never‐ending Learning]

SLIDE 72

Never‐ending Visual Learning

  • Computation-efficient: cloud computation, large-scale computing
  • Energy-efficient: wearable sensors, movable sensors

(Figure: user smartphone → cloud: data storage, analysis, and service recommendation.)

Commonality: these require new learning strategies, not one-stage passive learning: self-learning, never-ending learning, and learning with noisy labels.

SLIDE 73

Past, Present and Future of LV

  • Past: Subspace Learning, Sparsity/Low-rank (revisiting the classics)
  • Present: Deep Learning (grounded in the present)
  • Future: Smart Services/Devices, Never-ending Learning (glimpsing the future)

SLIDE 74

Email: eleyans@nus.edu.sg