
CNN Based Object Detection in Large Video Images. WangTao, wtao@qiyi.com, IQIYI ltd., 2016.4



  1. CNN Based Object Detection in Large Video Images. WangTao, wtao@qiyi.com, IQIYI ltd., 2016.4

  2. Outline • Introduction • Background • Challenge • Our approach • System framework • Object detection • Scene recognition • Body segmentation • Same style matching • Experiments • Conclusion

  3. Background • Image retrieval • Video advertising Video out applications

  4. Challenge • Real video data vs. image datasets - Cluttered background - Multiple objects - Small objects - Varying pose/position - Partial occlusion

  5. Our task
     • Problems:
       - Content-based object retrieval in large video images
       - High accuracy for same-style matching
       - High speed on a large video database
     • Solution:
       - Accurate object detection + scene classification
       - Discriminative DNN features and PCA/LDA transformation
       - Speed-up by parallel indexing and hierarchical filtering

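  6. System framework
     [Figure: block diagram with an indexing path and a query path]
     • Indexing path: Video → key frame → object detection → body segmentation → CNN feature → indexing database, with scene classification on the key frames
     • Query path: Query image → Faster-RCNN / query rect → body segmentation → CNN feature → match → distance sort → result, with scene classification on the query image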

  7. Object detection (I)
     • Object detection by Faster-RCNN
       - Faster-RCNN: region proposals + object scores [Ren, Shaoqing, et al., NIPS 2015]
       - Trained on MS COCO (300k images) + video images (10k images)
     • More general; suited to images with multiple objects

  8. Multi-class object detection, including:
     • Clothes (skirt, jacket, trousers)
     • Bags (handbag, backpack, draw-bar box)
     • Electronics (mobile, laptop, TV, keyboard, mouse, microwave oven, oven, refrigerator)
     • Glasses, necklace, hat
     • Shoes
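A detector that emits region proposals with object scores, as on slide 7, usually returns several overlapping boxes per object. The sketch below shows the standard greedy non-maximum suppression step used with such detectors. It is illustrative only: the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions, not values taken from the slides.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping handbag boxes collapse to the higher-scoring one, while a distant box for another object is kept.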

  9. Object detection (II)
     • Object detection by CNN regression
     • Input: an image; output: the coordinates of the object rectangle [Erhan, Dumitru, et al., CVPR 2014]
     • Efficient for single-object images that Faster-RCNN fails to recognize

  10. Body segmentation
     • Constrained by human body parts
     • CNN-based body segmentation [Jonathan Long, CVPR 2015]
     • Bounding box, body mask, body parsing
     [Figure: original image; segmentation image]
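The slide lists a bounding box alongside the body mask. One simple way the two can relate is that the box is the tightest rectangle around the mask's foreground pixels. The sketch below assumes a binary mask stored as a list of rows; the function name and the representation are hypothetical, and the slides do not say how the box is actually derived.

```python
from typing import List, Optional, Tuple


def mask_to_bbox(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    width = len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))
```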

  11. Scene classification
     • CNN-based scene classification [Bolei Zhou, NIPS 2014]
     • Pipeline: Video → key frame → "Is scene?" (yes/no) → CNN-based scene classification → multi-frame fusion → tags
     • Scene classification: Precision 65.8%, Recall 74%
     • Threshold@0.7: Precision 83.8%, Recall 56.7%
     [Figure: non-scene images; scene images of kitchen, office, living room, and bedroom]
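The multi-frame fusion and the 0.7 threshold on this slide can be illustrated with a minimal sketch. It assumes that fusion means averaging per-class probabilities over a video's key frames, which the slides do not confirm; the function names and the dictionary format are hypothetical. Raising the threshold rejects more low-confidence tags, which matches the direction of the reported trade-off (higher precision, lower recall).

```python
from typing import Dict, List, Optional


def fuse_frames(frame_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities over key frames (assumed fusion rule).

    A class missing from a frame counts as probability 0 for that frame.
    """
    totals: Dict[str, float] = {}
    for probs in frame_probs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    n = len(frame_probs)
    return {label: total / n for label, total in totals.items()} if n else {}


def tag_scene(frame_probs: List[Dict[str, float]], threshold: float = 0.7) -> Optional[str]:
    """Return the top fused class if its score reaches the threshold, else None."""
    fused = fuse_frames(frame_probs)
    if not fused:
        return None
    label = max(fused, key=fused.get)
    return label if fused[label] >= threshold else None
```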

  12. Scene classes
       0 kitchen             15 indoor_ice_skating_rink   30 music_studio
       1 dining              16 baseball                  31 music_store
       2 bakery              17 football                  32 sandbeach
       3 ice_cream_parlor    18 basketball_court          33 hairsalon
       4 bathroom            19 swimming_pool             34 bar
       5 washing_room        20 track                     35 pagoda
       6 bedroom             21 bowling_alley             36 bamboo_forest
       7 living_room         22 billiards                 37 mountain
       8 office              23 tennis                    38 coast
       9 children_room       24 volleyball                39 creek
      10 nursery             25 gymnasium                 40 waterfall
      11 toyshop             26 pleasure_ground           41 grass
      12 shoe_shop           27 hospital_room             42 other
      13 jewelry_shop        28 dentists
      14 outdoor_ice_world   29 drugstore

  13. Same style matching
     • SIFT feature matching
       - Normalization of SIFT
       - Dimension: 128 dim x 400 pts
       - MAP: 22%
     • CNN feature of ImageNet 1k classifier
       - Model: VGG19
       - Layer: fc7
       - Dimension: 4096 → 600
       - MAP: 28%
     • CNN feature of same-style classifier
       - Model: VGG19
       - Layer: fc7
       - Dimension: 4096 → 600
       - MAP: 34%
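The MAP figures above compare retrieval quality across features. As a reminder of what the metric measures, here is a minimal sketch of the usual mean average precision computation over ranked result lists. The exact evaluation protocol behind the slide's numbers is not given; this version assumes every relevant item appears somewhere in the ranked list, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k taken at each relevant rank k."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[bool]]) -> float:
    """MAP: average of per-query AP values."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0
```

A query whose same-style items rank first and third scores (1/1 + 2/3) / 2, so pushing matches toward the top of the list raises MAP.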

  14. Multi-feature fusion
     • Same-class matching classifier on ImageNet 21k classes of 15M images
     • Same-style matching classifier trained on 1239 queries of 1M images

     CNN models                    Feature dim   MAP
     Inception_bn1k                1024          24%
     Inception_21k                 1024          34%
     Vgg19_caffe                   4096          34%
     Inception_21k + vgg19_caffe   5120          43%

     • Speed
       - Nvidia K40 GPU, 10x faster than CPU i7
       - Faster RCNN speed: 200 ms/frame, image size 1920x1080
       - Vgg19 feature speed: 60 ms/frame, image size 256x256
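The fused feature dimension in the table (1024 + 4096 = 5120) is consistent with simple concatenation of the two descriptors. The sketch below shows one common way to do that, L2-normalizing each descriptor first so that neither dominates the distance. The normalization step and the function names are assumptions; the slides only report the resulting dimension and MAP.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_features(*features: Sequence[float]) -> List[float]:
    """Concatenate L2-normalized descriptors into one fused descriptor."""
    fused: List[float] = []
    for f in features:
        fused.extend(l2_normalize(f))
    return fused
```

With a 1024-dim and a 4096-dim input, `fuse_features` returns a 5120-dim vector.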

  15. Experiments
     • MAP on 3M test images, trained on 1M images
       Columns: Vgg19 model, Full image, Object rectangle, PCA+LDA, Inception-21k, MAP
       × × × √ √   27.8%
       × × × √ √   34.2%
       × × √ √ √   37.3%
       × × √ √ √   43.1%
       × √ √ √ √   46.1%
     • Speed-up
       - Parallel FLANN tree indexing
       - Hierarchical filtering by object classes, 10x faster
       - Query speed: 1 s/image on 5000 teleplays with 2M images
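The idea behind hierarchical filtering by object class can be sketched as follows: features are bucketed by detected class at indexing time, and a query is compared only against its own class's bucket before the distance sort. This is a minimal illustration, not the system described on the slides: a brute-force scan stands in for the parallel FLANN tree index, and the class `ClassFilteredIndex`, its methods, and the squared Euclidean distance are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ClassFilteredIndex:
    """Toy index: one bucket of (item_id, feature) pairs per object class."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def add(self, item_id: str, object_class: str, feature: Sequence[float]) -> None:
        self._buckets[object_class].append((item_id, list(feature)))

    def query(self, object_class: str, feature: Sequence[float], top_k: int = 5) -> List[str]:
        # Filter step: only items of the same detected class are candidates.
        candidates = self._buckets.get(object_class, [])
        # Distance sort step: brute force here; a tree index would replace this scan.
        scored = sorted((squared_distance(feature, f), item_id) for item_id, f in candidates)
        return [item_id for _, item_id in scored[:top_k]]
```

Because a bag query never touches shoe or clothes features, the candidate set shrinks roughly in proportion to the number of classes, which is the kind of saving the slide attributes to class-based filtering.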

  16. Query system GUI

  17. Query examples on image dataset

  18. Query examples on video dataset

  19. Conclusion
     • The bounding box is important for recognizing objects
     • Fusing same-style matching features with same-class matching features gives higher accuracy
     • PCA and LDA further improve accuracy and speed
     • GPU is faster for CNN feature extraction
     • Queries are sped up by parallel indexing and hierarchical filtering

  20. References
     • Erhan, Dumitru, et al. "Scalable object detection using deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
     • Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
     • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
     • Arandjelović, Relja, and Andrew Zisserman. "Three things everyone should know to improve object retrieval." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2012.
     • Jonathan Long, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR, 2015. arXiv:1411.4038.
     • S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. "Conditional random fields as recurrent neural networks." ICCV, 2015.
     • Li Shen, Zhouchen Lin, and Qingming Huang. "Learning deep convolutional neural networks for places2 scene recognition." CoRR, 2015.
     • Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning deep features for scene recognition using Places database." NIPS, 2014.
     • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR, 2015.
     • Ruobing Wu, Baoyuan Wang, Wenping Wang, and Yizhou Yu. "Harvesting discriminative meta objects with deep CNN features for scene classification." ICCV, 2015.
     • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. "Rethinking the Inception architecture for computer vision." arXiv:1512.00567, 2015.
