Learning and Vision Research Group
Shuicheng YAN
National University of Singapore
Learning and Vision Research Group (LV)
Founded early 2008, 20-30 members
Three Indicators of Excellence for Members
Industry
One indicator is enough for a member to be an excellent researcher
Subspace Learning Sparsity/Low-rank Deep Learning
Smart Services/Devices
(Never-ending Learning)
Past (reviewing the classics) Present (grounded in the present) Future (a brief look at the future)
[Guangcan LIU, Canyi LU, Jiashi FENG]
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction, TPAMI’07, Yan, et al.
$\{x_i\}_{i=1}^{N}$, $x_i \in \mathbb{R}^{n}$
$L = D - S$, $D_{ii} = \sum_{j \neq i} S_{ij}$
$L^p = D^p - S^p$, $D^p_{ii} = \sum_{j \neq i} S^p_{ij}$
$\max_Y \sum_{i \neq j} \|y_i - y_j\|^2 S^p_{ij} = 2\,\mathrm{Tr}(Y L^p Y^T)$
$\min_Y \sum_{i \neq j} \|y_i - y_j\|^2 S_{ij} = 2\,\mathrm{Tr}(Y L Y^T)$, $Y = [y_1, y_2, \ldots, y_N]$
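A small numerical check may help make the graph-embedding objective concrete. The sketch below builds L = D - S for a toy similarity matrix and compares the pairwise form with the trace form. The matrix S, the embedding Y and all helper names are illustrative assumptions, not values from the talk. Summed over all ordered pairs, the pairwise form equals 2 Tr(Y L Y^T), so both forms share the same optimum.

```python
# Toy check: sum_{i,j} ||y_i - y_j||^2 * S_ij == 2 * Tr(Y L Y^T), with L = D - S.
# S and Y are made-up illustrative values (assumptions, not from the slides).

def laplacian(S):
    """Return L = D - S, where D is diagonal with D_ii = sum_{j != i} S_ij."""
    n = len(S)
    D = [sum(S[i][j] for j in range(n) if j != i) for i in range(n)]
    return [[(D[i] if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]

def pairwise_objective(Y, S):
    """Sum over ordered pairs (i, j) of ||y_i - y_j||^2 * S_ij.
    Y is stored as rows of coordinates, i.e. one column per sample."""
    n = len(S)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dist2 = sum((row[i] - row[j]) ** 2 for row in Y)
            total += dist2 * S[i][j]
    return total

def trace_form(Y, L):
    """Tr(Y L Y^T) for Y of shape (dim, N)."""
    n = len(L)
    return sum(row[i] * L[i][j] * row[j]
               for row in Y for i in range(n) for j in range(n))

S = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0]]      # symmetric similarities, zero diagonal
Y = [[0.0, 1.0, 3.0],      # first coordinate of y_1, y_2, y_3
     [1.0, 1.0, 0.0]]      # second coordinate of y_1, y_2, y_3

L = laplacian(S)
lhs = pairwise_objective(Y, S)
rhs = 2.0 * trace_form(Y, L)
print(lhs, rhs)
```

Swapping S for a penalty-graph matrix gives the corresponding check for the maximised term.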
Type | Formulation | Example
Direct Graph Embedding | $\max_Y \mathrm{Tr}(Y L^p Y^T) / \mathrm{Tr}(Y L Y^T)$ | Original PCA & LDA, ISOMAP, LLE, Laplacian Eigenmap
Linearization | $y = W^T x$ | PCA, LDA, LPP, LEA ……
Kernelization | $W = \sum_i \alpha_i \phi(x_i)$ | KPCA, KDA ……
Tensorization | $y_i = X_i \times_1 W_1 \times_2 W_2 \cdots \times_n W_n$ | CSA, DATER ……
Robust recovery of subspace structures by low-rank representation, TPAMI’13, Liu, et al.
Robust Subspace Segmentation with Laplacian Constraint, CVPR’14, Feng, et al.
Architecture
General, bi-graph based DL framework
Multi-PC, multi-CPU/GPU
Approximate linear speedup
High re-usability, bridges academia and industry
Algorithms
More human-brain-like network structure and learning process, regularizers
Applications
Object analytics, product search/recom., human analytics, others
Landing
1. 4 winner awards in VOC
2. One 2nd prize in VOC
3. 2nd prize in ImageNet’13
4. 1st prize in ImageNet’14
Best paper/demo awards: ACM MM13, ACM MM12
Also licensed
LFW: 98.78%, 2nd best
Best human parsing performance
Cross-age synthesis
Face analysis with occlusions
http://caffe.berkeleyvision.org/
SoftmaxLoss SoftmaxLossDown Gaussian Bernoulli Constant Uniform Copy Merge Slice Sum WeightedSum Mul Swap Dumper Loader Conv ConvDown ConvWeight Inner InnerDown InnerWeight Bias BiasDown Pool PoolDown Relu ReluDown Softmax SoftmaxDown
hard coded
Example Blob defined in YAML
  type: blob
  name: weight
  size: [96, 3, 11, 11]
  location:
    ip: 127.0.0.1
    device: 0
Example Op defined in YAML
  type: op
  name: conv1
  inputs: [bottom, weight]
  location:
    ip: 127.0.0.1
    device: 0
    thread: 1
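One way to read the two YAML fragments is as nodes of a bipartite graph, where an op consumes the blobs listed under `inputs`. The sketch below mirrors the same fields as plain Python dictionaries and runs two sanity checks. No YAML parser is used. The `bottom` blob's size and the helper functions are hypothetical and not part of the framework described in the talk.

```python
# Plain-dict mirror of the Blob and Op descriptions.
# Field names follow the slide; the "bottom" blob's size and all helper
# functions are illustrative assumptions.

blobs = {
    "weight": {"type": "blob", "name": "weight", "size": [96, 3, 11, 11],
               "location": {"ip": "127.0.0.1", "device": 0}},
    "bottom": {"type": "blob", "name": "bottom", "size": [1, 3, 227, 227],  # assumed
               "location": {"ip": "127.0.0.1", "device": 0}},
}

ops = {
    "conv1": {"type": "op", "name": "conv1", "inputs": ["bottom", "weight"],
              "location": {"ip": "127.0.0.1", "device": 0, "thread": 1}},
}

def num_elements(blob):
    """Product of the blob's dimensions."""
    count = 1
    for dim in blob["size"]:
        count *= dim
    return count

def missing_inputs(op, blobs):
    """Names listed in op['inputs'] that have no blob definition."""
    return [name for name in op["inputs"] if name not in blobs]

def same_device(op, blobs):
    """True if every input blob lives on the op's (ip, device) pair."""
    target = (op["location"]["ip"], op["location"]["device"])
    return all(
        (blobs[n]["location"]["ip"], blobs[n]["location"]["device"]) == target
        for n in op["inputs"]
    )

weight_params = num_elements(blobs["weight"])  # 96 * 3 * 11 * 11
print(weight_params, missing_inputs(ops["conv1"], blobs), same_device(ops["conv1"], blobs))
```

A check like `same_device` hints at why each node carries a `location`: with multi-PC, multi-GPU placement, an op whose inputs live elsewhere would need a copy step first.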
[Figure: plot with axis "Images per second", ticks 100 to 800]
CNN
CNN NIN
With fewer parameters
[4] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C. Courville, Yoshua Bengio: Maxout
[4]
CNN NIN
Cascaded Cross Channel Parametric Pooling (CCCP)
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network In Network." ICLR‐2014.
CNN NIN Confidence map of each category
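Cascaded cross channel parametric pooling can be read as a 1x1 convolution: at each spatial position the input channels are linearly recombined and passed through a nonlinearity, and several such layers are stacked. The sketch below shows one such layer in plain Python, followed by the global average pooling that NIN applies to the per-category confidence maps. The ReLU choice, the toy weights and the function names are assumptions for illustration.

```python
def cccp_layer(feature_maps, weights, biases):
    """One cross-channel parametric pooling layer: 1x1 convolution + ReLU.
    feature_maps: [C_in][H][W], weights: [C_out][C_in], biases: [C_out]."""
    c_in = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    out = []
    for k, w_row in enumerate(weights):
        plane = [
            [max(0.0, biases[k] + sum(w_row[c] * feature_maps[c][y][x]
                                      for c in range(c_in)))
             for x in range(width)]
            for y in range(height)
        ]
        out.append(plane)
    return out

def global_average_pool(feature_maps):
    """Average each map over all spatial positions: one score per map."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
            for plane in feature_maps]

x = [[[1.0, 2.0], [3.0, 4.0]],   # input channel 0
     [[0.0, 1.0], [1.0, 0.0]]]   # input channel 1
W = [[1.0, -1.0],                # output map 0 = relu(ch0 - ch1)
     [-1.0, 0.0]]                # output map 1 = relu(-ch0)
b = [0.0, 0.0]

maps = cccp_layer(x, W, b)           # stack further calls to "cascade" layers
scores = global_average_pool(maps)   # one confidence value per output map
print(maps, scores)
```

Because the weights are shared across positions, the layer adds only C_out x C_in parameters per stage.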
GoogLeNet = Deeper Network-in-Network
Prior Knowledge Exploring Physical World Exploring Broader Physical World… … Few positive instances
Learning with Video Contexts
Knowledge Update Better Knowledge Base Detector Update Mining Variable Instances Exemplar Learning Prior Knowledge Modeling
Performances in different iterations of our framework on VOC 2007 with two positive samples for each class.
Tracking Results
[Min LIN, Qiang CHENG, Jian DONG, Yunchao WEI]
Context Refinement + Adaptive Non-parametric Rectification Region Hypothesis
Handcrafted Features Coding + Pooling SVMs Learning
PASCAL VOC 2012 Classification Solution as Global Context Three Model Average NIN for Object Detection
R-CNN Framework
– Held yearly, 2010 – 2014.
– Data: 1000 object classes, 1.2 million images.
– Tasks: object classification, detection and fine-grained recognition.
– Trend: from PASCAL VOC to ImageNet.
– Large-scale problem: data storage, computation cost, etc.
– Learning in practice: feature learning and discriminative learning.
more “discriminable” synsets less “discriminable” synsets
Results on validation set (0.5:0.5 of val set for validation and testing)
Method | Description | mAP
NIN | Baseline result using NIN as feature extractor for RCNN | 35.61%
3 NINs | Concatenated features from multiple NINs as feature extractor for RCNN | 36.52% (↑0.91%)
3 NINs + ctx | Adaptive non-parametric rectification of outputs from both object detectors and global context | 37.49% (↑0.97%)
Results on test set (winner of ILSVRC14 detection task)
Method | Description | mAP
3 NINs + ctx | Adaptive non-parametric rectification of outputs from both object detectors and global context | 37.212%
Our framework
[Figure: hypotheses pass through a shared convolutional neural network; scores for individual hypotheses are combined by max pooling; example output: dog, person, sheep]
Hypotheses assumption: single-labeled
No ground‐truth bounding box information is required for training on the
The proposed HCP infrastructure is robust to the noisy and/or redundant
No explicit hypothesis label is required for training
The shared CNN can be well pre-trained with a large-scale single-label
The HCP outputs are naturally multi‐label prediction results
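The fusion step can be sketched as an element-wise max over the per-hypothesis score vectors, which yields one score per class and therefore a multi-label prediction. The scores, the class list and the 0.5 decision threshold below are made-up values for illustration, not numbers from the talk.

```python
# Cross-hypothesis max pooling, sketched with toy numbers (all assumed).
CLASSES = ["dog", "person", "sheep", "car"]

def max_pool_hypotheses(scores):
    """Element-wise max over hypotheses.
    scores: [num_hypotheses][num_classes] -> [num_classes]."""
    return [max(column) for column in zip(*scores)]

def predict_labels(pooled, threshold=0.5):
    """Keep every class whose pooled score reaches the (assumed) threshold."""
    return [name for name, score in zip(CLASSES, pooled) if score >= threshold]

hypothesis_scores = [
    [0.9, 0.1, 0.0, 0.0],   # hypothesis covering the dog
    [0.1, 0.8, 0.1, 0.0],   # hypothesis covering the person
    [0.0, 0.2, 0.7, 0.1],   # hypothesis covering the sheep
    [0.3, 0.3, 0.3, 0.1],   # noisy hypothesis with no clear object
]

pooled = max_pool_hypotheses(hypothesis_scores)
labels = predict_labels(pooled)
print(pooled, labels)
```

The noisy fourth hypothesis never wins the max for any class, which illustrates why the pooling tolerates noisy or redundant hypotheses.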
Hypotheses extraction
Initialization of HCP
  Pre-training on a large-scale single-label image set, e.g. ImageNet
  Image-fine-tuning on a multi-label image set
Hypotheses-fine-tuning
PASCAL VOC: Visual Object Classes challenges.
Held yearly, 2007 – 2012. Tens of teams from universities and industry participated, including INRIA,
Became “the dataset” for visual object recognition research.
Main tasks: object classification, detection and segmentation.
Other tasks: person layout, action recognition, etc.
Data: 20 object classes, ~23,000 images with fine labeling.
Visual Object Recognition Object Segmentation Object Classification Person, Horse, Barrier, Table, etc Object Detection
Performance on PASCAL VOC 2012
Category | NUS-PSL[1] | PRE-1000C[2] | PRE-1512[2] | Chatfield et al.[3] | HCP-NIN | HCP-NIN+NUS-PSL
plane | 97.3 | 93.5 | 94.6 | 96.8 | 98.4 | 99.5
bicycle | 84.2 | 78.4 | 82.9 | 82.5 | 89.5 | 93.7
bird | 80.8 | 87.7 | 88.2 | 91.5 | 96.2 | 96.8
boat | 85.3 | 80.9 | 84.1 | 88.1 | 91.7 | 94.0
bottle | 60.8 | 57.3 | 60.3 | 62.1 | 72.5 | 77.7
bus | 89.9 | 85.0 | 89.0 | 88.3 | 91.1 | 95.3
car | 86.8 | 81.6 | 84.4 | 81.9 | 87.2 | 92.4
cat | 89.3 | 89.4 | 90.7 | 94.8 | 97.1 | 98.2
chair | 75.4 | 66.9 | 72.1 | 70.3 | 73.0 | 86.1
cow | 77.8 | 73.8 | 86.8 | 80.2 | 89.5 | 91.3
table | 75.1 | 62.0 | 69.0 | 76.2 | 75.1 | 83.5
dog | 83.0 | 89.5 | 92.1 | 92.9 | 96.3 | 97.3
horse | 87.5 | 83.2 | 93.4 | 90.3 | 93.0 | 96.8
motor | 90.1 | 87.6 | 88.6 | 89.3 | 90.5 | 96.3
person | 95.0 | 95.8 | 96.1 | 95.2 | 94.8 | 95.8
plant | 57.8 | 61.4 | 64.3 | 57.4 | 66.5 | 72.2
sheep | 79.2 | 79.0 | 86.6 | 83.6 | 90.3 | 91.5
sofa | 73.4 | 54.3 | 62.3 | 66.4 | 65.8 | 81.1
train | 94.5 | 88.0 | 91.1 | 93.5 | 95.6 | 97.6
tv | 80.7 | 78.3 | 79.8 | 81.9 | 82.0 | 90.0
MAP | 82.2 | 78.7 | 82.8 | 83.2 | 86.8 | 91.4
[1] S. Yan, J. Dong, Q. Chen, Z. Song, Y. Pan, W. Xia, H. Zhongyang, Y. Hua, and S. Shen. Generalized hierarchical matching for subcategory aware
[2] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. CVPR, 2014.
[3] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014.
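The MAP row is the plain average of the 20 per-class AP values. As a quick check, the sketch below recomputes it for the NUS-PSL[1] and HCP-NIN columns and lists the classes where HCP-NIN scores lower. The numbers are copied from the table; the variable and function names are illustrative.

```python
# Per-class AP (%) copied from the NUS-PSL[1] and HCP-NIN columns of the table.
CATEGORIES = ["plane", "bicycle", "bird", "boat", "bottle", "bus", "car",
              "cat", "chair", "cow", "table", "dog", "horse", "motor",
              "person", "plant", "sheep", "sofa", "train", "tv"]
nus_psl = [97.3, 84.2, 80.8, 85.3, 60.8, 89.9, 86.8, 89.3, 75.4, 77.8,
           75.1, 83.0, 87.5, 90.1, 95.0, 57.8, 79.2, 73.4, 94.5, 80.7]
hcp_nin = [98.4, 89.5, 96.2, 91.7, 72.5, 91.1, 87.2, 97.1, 73.0, 89.5,
           75.1, 96.3, 93.0, 90.5, 94.8, 66.5, 90.3, 65.8, 95.6, 82.0]

def mean_ap(per_class_ap):
    """mAP as the unweighted mean of per-class average precision."""
    return sum(per_class_ap) / len(per_class_ap)

map_nus_psl = round(mean_ap(nus_psl), 1)
map_hcp_nin = round(mean_ap(hcp_nin), 1)

# Classes where HCP-NIN is strictly below NUS-PSL.
worse = [name for name, a, b in zip(CATEGORIES, nus_psl, hcp_nin) if b < a]
print(map_nus_psl, map_hcp_nin, worse)
```

The classes in `worse` are the ones where combining the two systems, as in the last column, has the most room to help.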
From 81.7% | < 90.3%
Category | NUS-PSL[1] | PRE-1000C[2] | Chatfield [3] | HCP-1000C | PRE-1512[2] | HCP-2000C | HCP-2000C+[1]
plane | 97.3 | 93.5 | 96.8 | 98.4 | 94.6 | 98.5 | 99.5
bicycle | 84.2 | 78.4 | 82.5 | 89.5 | 82.9 | 91.4 | 94.1
bird | 80.8 | 87.7 | 91.5 | 96.2 | 88.2 | 96.2 | 96.9
boat | 85.3 | 80.9 | 88.1 | 91.7 | 84.1 | 93.2 | 94.7
bottle | 60.8 | 57.3 | 62.1 | 72.5 | 60.3 | 72.5 | 77.6
bus | 89.9 | 85.0 | 88.3 | 91.1 | 89.0 | 92.6 | 95.8
car | 86.8 | 81.6 | 81.9 | 87.2 | 84.4 | 88.9 | 93.3
cat | 89.3 | 89.4 | 94.8 | 97.1 | 90.7 | 97.4 | 98.3
chair | 75.4 | 66.9 | 70.3 | 73.0 | 72.1 | 77.0 | 87.2
cow | 77.8 | 73.8 | 80.2 | 89.5 | 86.8 | 95.9 | 96.0
table | 75.1 | 62.0 | 76.2 | 75.1 | 69.0 | 79.3 | 84.3
dog | 83.0 | 89.5 | 92.9 | 96.3 | 92.1 | 96.8 | 97.6
horse | 87.5 | 83.2 | 90.3 | 93.0 | 93.4 | 97.5 | 98.5
motor | 90.1 | 87.6 | 89.3 | 90.5 | 88.6 | 92.9 | 96.8
person | 95.0 | 95.8 | 95.2 | 94.8 | 96.1 | 95.4 | 96.1
plant | 57.8 | 61.4 | 57.4 | 66.5 | 64.3 | 67.8 | 73.4
sheep | 79.2 | 79.0 | 83.6 | 90.3 | 86.6 | 94.7 | 94.7
sofa | 73.4 | 54.3 | 66.4 | 65.8 | 62.3 | 70.0 | 83.2
train | 94.5 | 88.0 | 93.5 | 95.6 | 91.1 | 96.8 | 98.2
tv | 80.7 | 78.3 | 81.9 | 82.0 | 79.8 | 83.0 | 89.6
MAP | 82.2 | 78.7 | 83.2 | 86.8 | 82.8 | 88.9 | 92.3
From 84.2% | 90.3%
[Junshi HUANG]
[Screenshot: Deep Search query form with query_image.jpg selected]
Deep Search is an attribute-aware, fashion-related search engine based on a tree-structured neural network.
The Framework of Deep Search
Input Image → Cropped Image → High-level Feature → Retrieval Result
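The last step of the framework, going from a high-level feature to a retrieval result, is commonly done by nearest-neighbour search over gallery features. The sketch below ranks gallery items by cosine similarity. The talk does not say which metric Deep Search uses, so the metric, the three-dimensional features and the item names are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    ranked = sorted(gallery.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical high-level features for three catalogue items.
gallery = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 0.2, 0.9],
    "red_skirt": [0.8, 0.3, 0.1],
}
query_feature = [1.0, 0.2, 0.0]   # feature of the cropped query image (assumed)

results = retrieve(query_feature, gallery)
print(results)
```

For a large catalogue the exhaustive sort would normally be replaced by an approximate nearest-neighbour index.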
Top Bottom Query
LFW
Second best in the world (vs. 99.40%)
Better than Facebook (97.35%) and Face++ (97.27%)
徐娇, Jiao XU
[Figure: human parsing comparison; panels: Test, Paperdoll, ATR (noSPR), ATR]
Cloud‐computation large‐scale computing
Wearable sensors Movable sensors
Data storage Analysis and service recommendation User smart-phone Cloud
Commonality: these require new learning strategies rather than one-stage passive learning: self-learning, never-ending learning, and learning with noisy labels
Subspace Learning Sparsity/Low‐rank Deep Learning Smart Services/Devices
(Never-ending Learning) Past (reviewing the classics) Present (grounded in the present) Future (a brief look at the future)
Email: eleyans@nus.edu.sg