VCI2R at the NTCIR-13 Lifelog-2 LSAT Task
Presented by: Qianli Xu Co-authors: Jie Lin, Ana del Molino, Qianli Xu, Fen Fang, V. Subbaraju, Joo-Hwee Lim, Liyuan Li, V. Chandrasekhar Organization: Institute for Infocomm Research, A*STAR, Singapore
Institute for Infocomm Research (I2R), A*STAR, Singapore
– Visual Computing
– Human Language Tech
– Data Analytics
– Neural Biomedical Tech
– etc.
– Video/image analytics & search
– Augmented visual intelligence
– Visual inspection
Website: www.a-star.edu.sg/i2r/
[Figure: system overview with online and offline stages. Components: query topic (user-given); object classifier and places classifier (CNN); object detector (Faster RCNN); NTCIR-13 classifier; time tag, location tag, # people; lifelog images; training images; relevant concepts; feature weights w1 to w7; temporal smoothing.]
Image + Metadata Query Topics Semantic Gap
CNN predictions relevant to query topics?
contribute the most?
coherence, remove outliers
location (GPS) and Time
“Castle @ Night” “Working in a coffee shop” “Gardening in my home”
del Molino, et al., 2017, VC-I2R at ImageCLEF2017: Ensemble of deep learned features for lifelog video summarization. CLEF Working Notes, CEUR.
– Object: ResNet152 – ImageNet1K
– Place: ResNet152 – Places365
– Faster R-CNN – MSCOCO (80)
– VGG-16 – ImageNet1K
  – Replace the last layer (1K neurons) with 634 neurons
  – Sigmoid as the activation function
– Sighthound (https://www.sighthound.com)
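A sigmoid output layer scores each of the 634 concepts independently, so several concepts can be active for the same image. The pure-Python sketch below illustrates only that output stage; the logits are made-up numbers and `sigmoid_outputs` is a hypothetical helper, not the authors' code.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, written to stay numerically stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_outputs(logits):
    """Map raw last-layer scores to independent per-concept activations."""
    return [sigmoid(v) for v in logits]

# Toy logits for three concepts (illustrative values only).
activations = sigmoid_outputs([2.0, 0.0, -2.0])
```

Unlike a softmax, these activations are not forced to sum to 1, which suits images that show several concepts at once.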
Relevant and to-avoid concepts per task (Objects: Relevant, Avoid; Places: Relevant, Avoid; MSCOCO: Relevant)
Task 1: computer group meeting group meeting etc. keyboard
Task 2: television food glass computer group meeting living room television room etc. conference room lecture room etc. tv remote etc.
Task 3: computer group meeting coffee shop living room etc. conference room etc. laptop keyboard
Task 4: computer pencil notebook living room hotel room etc. conference room etc. laptop book etc.
Task 5: food glass drum white goods menu food court restaurant etc. sandwich etc.
– CRF for feature weighting that accommodates individual differences
– Relevance mapping for each topic
$$E_\theta(s) = \lambda \underbrace{\sum_{i} \phi_u(s_i)}_{\text{unary}} + \underbrace{\sum_{ij} \phi_p(s_i, s_j)}_{\text{pairwise}},$$
the unary potentials enforce the selection of static
[Figure: feature weighting diagram. Feature sources: ImageNet1K, Places365, MSCOCO, NTCIR, Time, # People, Location tag. Other elements: training images; feature weights w1 to w12; relevant concepts.]
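One way to read the feature weights is as coefficients of a weighted sum over per-feature relevance scores. The sketch below assumes exactly that. The linear form, the name `fuse_scores`, and all numbers are illustrative assumptions, since the slides do not spell out the scoring function.

```python
def fuse_scores(feature_scores, weights):
    """Weighted sum of per-feature relevance scores for one image (assumed form)."""
    if len(feature_scores) != len(weights):
        raise ValueError("one weight per feature score is required")
    return sum(w * s for w, s in zip(weights, feature_scores))

# Toy example: three feature types with made-up scores and weights.
scores = [0.9, 0.2, 0.5]
weights = [0.5, 0.3, 0.2]
relevance = fuse_scores(scores, weights)
```

Under this reading, adapting the weights per user or per event only changes the `weights` vector, not the feature extractors.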
[Figure: mAP (axis 0.2 to 1) per query topic for User 1 and User 2. Topic labels: Eat Lunch Gardening Castle at Night Coffee Sunset Graveyard Lecturing Shopping Working Late On Computer Cooking Flying Juice Photo of Sea Beers in Bar Greek Amphit TV Recording Work w Coffee Painting Walls Eating Pasta Exercises Mountain Hiking Turtles.]
[Figure: mAP for Fixed, Adaptive (User), and Adaptive (User + Event) feature weights, for User 1 and User 2. Bar values: 0.502, 0.528, 0.654, 0.748, 0.761, 0.826.]
[Figure: mAP (axis 0.4 to 0.9) for User 1 and User 2 with all features and with one feature type removed: All, − NTCIR-13, − ImageNet1K, − Places365, − MSCOCO, − Location, − Time, − #People.]
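The results here are reported as mAP. As a reminder of what that number measures, the sketch below computes average precision for one ranked list and averages it over topics. It normalizes by the number of relevant items found in the list, which is an assumption; the official evaluation tool may handle cut-offs and missing relevant items differently. The toy rankings are made up.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; ranked_relevance[i] is True if rank i+1 is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Mean of the per-topic AP values."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Toy rankings for two topics (illustrative only).
toy_map = mean_average_precision([[True, False, True], [False, True]])
```

Feature importance can then be read as the drop in this value when one feature type is left out of the retrieval run.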
Feature importance
– Decrease in performance when we remove one type of feature: the larger the decrease, the more important the feature.
Effect of temporal smoothing
– Whether temporal smoothing is performed or not
Effect of threshold for relevant concept searching
– Semantic concepts whose activation level is above the threshold are considered relevant to the query topic
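The threshold rule for relevant concept searching can be sketched as a simple filter. How the activation of a concept for a query topic is obtained is not stated here, so the dictionary below is a made-up input and `relevant_concepts` is a hypothetical name.

```python
def relevant_concepts(activations, threshold):
    """Return the concepts whose activation is strictly above the threshold."""
    return sorted(name for name, value in activations.items() if value > threshold)

# Toy activations for one query topic (illustrative values only).
toy_activations = {"coffee shop": 0.82, "laptop": 0.64, "living room": 0.10}
selected = relevant_concepts(toy_activations, threshold=0.5)
```

A lower threshold admits more concepts per topic, and a higher one keeps only the strongest matches, which is the trade-off this experiment varies.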
[Figure: mAP with no smoothing versus temporal smoothing. Bar values: 0.528, 0.543, 0.761, 0.789.]
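The slides do not say which smoothing operator is used. As one plausible illustration of enforcing temporal coherence and removing outliers, the sketch below applies a sliding-window median to per-image relevance scores in time order. The median filter, the window size, and the scores are all assumptions.

```python
from statistics import median

def temporal_smooth(scores, window=3):
    """Sliding-window median over time-ordered scores; window must be odd."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)                # window is truncated at the edges
        hi = min(len(scores), i + half + 1)
        smoothed.append(median(scores[lo:hi]))
    return smoothed

# Toy sequence: one isolated low score between high-scoring neighbours.
raw = [0.9, 0.8, 0.1, 0.9, 0.85]
smoothed = temporal_smooth(raw)
```

In the toy sequence the isolated low score at the third position is pulled up to the level of its neighbours, which is the outlier-removal effect a smoothing step aims for.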
Effective Lifelog Image Retrieval
– High Quality Data
– Good Semantic Features
– Reasonable Ground Truth
– Intelligence in Interpretation Topics
– Intelligence in Model Fine-tuning
Email: qxu@i2r.a-star.edu.sg