SLIDE 1 Dress Like a Star: Retrieving Fashion Products from Videos
Noa Garcia & George Vogiatzis
Computer Vision in Fashion Workshop
SLIDE 2
Fashion in Videos
Movies TV shows Online
SLIDE 3
Fashion in Videos
Sex and the City
SLIDE 4
Fashion in Videos
The Devil Wears Prada
SLIDE 5
Fashion in Videos
The Great Gatsby
SLIDE 6
Fashion in Videos
Make fashion products in videos more accessible to users.
SLIDE 7
Fashion in Videos
SLIDE 8 Constraints
Camera viewpoint cannot be moved to have a better view of the fashion object.
SLIDE 9 Constraints
The creation of bounding boxes around the object of interest may distract users from the video.
SLIDE 10 Constraints
Fashion items are often small and partially blurred.
SLIDE 11
Our Proposal
Instead of object recognition...
SLIDE 12
Our Proposal
Instead of object recognition... frame retrieval
SLIDE 13
Related Work
Clothing Retrieval: Attribute classification [1], Domain adaptation [2]
Scene Retrieval: Image retrieval in videos [3], Temporal tracking [4], Scene descriptors [5, 6]
Our Approach: binary temporal tracking + fast indexing.
SLIDE 14
Challenges
Average movie duration: 120 minutes
Standard FPS rate: 24 fps
Average frames per movie: 172,800 frames
With only 5 or 6 movies: more than a million frames!
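The figures above follow from simple arithmetic, sketched here in Python. The function name `frames_per_movie` is illustrative and does not come from the slides. At these rates, the one-million mark is passed with the sixth movie.

```python
def frames_per_movie(minutes: int, fps: int) -> int:
    """Number of frames in a movie of the given duration and frame rate."""
    return minutes * 60 * fps

per_movie = frames_per_movie(minutes=120, fps=24)
totals = {n: n * per_movie for n in (5, 6)}

print(per_movie)  # 172800
print(totals)     # {5: 864000, 6: 1036800}
```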
SLIDE 15
Our System
Three main modules: product indexing, training phase, query phase.
SLIDE 16
Our System
Three main modules: product indexing, training phase, query phase.
SLIDE 17
Our System: Product indexing
Fashion items and frames are linked in a database.
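One way to picture such a database is a mapping from frame ranges to the products visible in them. The sketch below is only an illustration under that assumption: the class `ProductIndex`, its methods, and the product names are hypothetical, and the ranges are assumed not to overlap.

```python
from bisect import bisect_right

class ProductIndex:
    """Maps non-overlapping frame ranges to the fashion products shown in them."""

    def __init__(self):
        self._starts = []   # first frame of each range, kept sorted
        self._entries = []  # (first_frame, last_frame, products), same order

    def add(self, first_frame, last_frame, products):
        pos = bisect_right(self._starts, first_frame)
        self._starts.insert(pos, first_frame)
        self._entries.insert(pos, (first_frame, last_frame, list(products)))

    def products_at(self, frame):
        """Return the products linked to a frame, or an empty list."""
        pos = bisect_right(self._starts, frame) - 1
        if pos >= 0:
            first, last, products = self._entries[pos]
            if first <= frame <= last:
                return products
        return []

index = ProductIndex()
index.add(250, 400, ["leather jacket", "sunglasses"])  # hypothetical products
index.add(0, 99, ["red dress"])

print(index.products_at(300))  # ['leather jacket', 'sunglasses']
print(index.products_at(150))  # []
```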
SLIDE 18
Our System
Three main modules: product indexing, training phase, query phase.
SLIDE 19 Our System: Training phase
BRIEF features are more constant over time than SIFT or CNN features.
(Figure panels: BRIEF, SIFT)
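To make the idea of a binary feature concrete, here is a simplified BRIEF-style descriptor in pure Python: each bit records one intensity comparison between two pixels of a patch. This is a sketch, not the authors' implementation. Real BRIEF smooths the patch first, and the helper names (`make_tests`, `brief_like`, `hamming`) are made up for illustration. It does not reproduce the comparison with SIFT or CNN features; it only shows that a uniform brightness change between two frames leaves every bit unchanged.

```python
import random

def make_tests(patch_size, n_bits, seed=0):
    """Pick n_bits random pixel pairs inside a patch_size x patch_size patch."""
    rng = random.Random(seed)
    def point():
        return rng.randrange(patch_size), rng.randrange(patch_size)
    return [(point(), point()) for _ in range(n_bits)]

def brief_like(patch, tests):
    """Bit i is 1 if the first pixel of pair i is darker than the second."""
    bits = 0
    for (r1, c1), (r2, c2) in tests:
        bits = (bits << 1) | int(patch[r1][c1] < patch[r2][c2])
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

tests = make_tests(patch_size=8, n_bits=64)
frame_t = [[r * 8 + c * 5 for c in range(8)] for r in range(8)]
frame_t1 = [[v + 10 for v in row] for row in frame_t]      # same patch, slightly brighter
frame_inv = [[255 - v for v in row] for row in frame_t]    # inverted patch

print(hamming(brief_like(frame_t, tests), brief_like(frame_t1, tests)))  # 0
print(hamming(brief_like(frame_t, tests), brief_like(frame_inv, tests)))
```

Because descriptors are compared with a bitwise XOR and a bit count, matching features between consecutive frames stays cheap.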
SLIDE 20 Our System: Training phase
Similar frames are grouped into shots.
(Figure: frames grouped into shot 1, shot 2, shot 3)
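A minimal sketch of shot grouping, under a simplifying assumption: each frame is summarized by a single binary signature, and a new shot starts when consecutive signatures differ by more than a threshold. The slides describe tracking binary features over time, so the per-frame signature, the function `group_into_shots`, and the threshold value are stand-ins for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two binary signatures."""
    return bin(a ^ b).count("1")

def group_into_shots(signatures, max_distance):
    """Split a sequence of per-frame signatures into shots (lists of frame indices).

    A frame joins the current shot if it differs from the previous frame
    by at most max_distance bits; otherwise it starts a new shot.
    """
    shots = []
    for idx, sig in enumerate(signatures):
        if shots and hamming(signatures[idx - 1], sig) <= max_distance:
            shots[-1].append(idx)
        else:
            shots.append([idx])
    return shots

signatures = [0b11110000, 0b11110001, 0b11100001,
              0b00001111, 0b00001110,
              0b10101010]
print(group_into_shots(signatures, max_distance=2))
# [[0, 1, 2], [3, 4], [5]]
```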
SLIDE 21
Our System: Training phase
SLIDE 22
Our System
Three main modules: product indexing, training phase, query phase.
SLIDE 23
Our System: Query phase
SLIDE 24
Our System: Query phase
SLIDE 25
Our System: Query phase
SLIDE 26
Our System: Query phase
SLIDE 27
Our System: Query phase
SLIDE 28
Our System: Query phase
Use the most similar frame to find the fashion products in the indexed product database.
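The query step can be sketched as follows, with several assumptions stated up front: each query descriptor votes for the frame of its nearest key feature, nearest neighbors are found by exhaustive Hamming search for brevity (a faster index would replace it in practice), and the function `retrieve_frame`, the toy descriptors, and the product names are all hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def retrieve_frame(query_descriptors, key_features):
    """Return the frame id that collects the most nearest-neighbor votes.

    key_features is a list of (descriptor, frame_id) pairs.
    Returns None if there are no query descriptors.
    """
    votes = Counter()
    for q in query_descriptors:
        _, frame_id = min(key_features, key=lambda kf: hamming(q, kf[0]))
        votes[frame_id] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy key features from two frames, and a toy product table.
key_features = [(0b11110000, 10), (0b11001100, 10),
                (0b00001111, 42), (0b00110011, 42)]
products_by_frame = {10: ["hat"], 42: ["dress", "handbag"]}

query = [0b00001110, 0b00110111, 0b11110001]
frame = retrieve_frame(query, key_features)
print(frame, products_by_frame.get(frame, []))  # 42 ['dress', 'handbag']
```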
SLIDE 29
Experiments - Dataset
A webcam captures video playback. The frame number is used as ground truth. The retrieved frame should be visually similar to the annotated ground-truth frame.
SLIDE 30 Experiments - Retrieval Performance
Results using a single movie (duration: 1 h 49 min).
BF: Brute Force; KT: Kd-Tree; KF: Key Frame
Our method greatly reduces memory requirements.
SLIDE 31
Experiments - Scalability
40 movies, 80 hours, 7 million frames
Movies: The Great Gatsby; Casablanca; Match Point; Maleficent; Magnolia; Big Fish; The Help; Ant-Man; Her; Absolutely Anything; The Social Network; Captain Phillips; 12 Years a Slave; American Hustle; Out of Africa; Groundhog Day; A Single Man; Seven Pounds; The Wolf of Wall Street; The Devil Wears Prada; Puss in Boots; Despicable Me; Family United; Marshland; The Body; Harry Potter and the Deathly Hallows; The Hobbit: The Desolation of Smaug; Rise of the Planet of the Apes; Pirates of the Caribbean; Neon Genesis Evangelion; 300: Rise of an Empire; Grave of the Fireflies; Witching and Bitching; 2 Francs, 40 Pesetas; Lee Daniels’; The Spanish Affair 2; The Last Circus; The Physician; El Niño
SLIDE 32
Experiments - Scalability
Results using 40 movies.
Data reduction: from 3,040M features to 58M key features.
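For scale, the reduction stated above works out as follows. This is plain arithmetic on the two numbers from the slide; the variable names are illustrative.

```python
total_features = 3_040_000_000  # 3,040M features
key_features = 58_000_000       # 58M key features

reduction = total_features / key_features
kept = key_features / total_features

print(f"{reduction:.1f}x fewer features, {kept:.1%} kept")
# 52.4x fewer features, 1.9% kept
```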
SLIDE 33
Conclusions
A system for video clothing retrieval.
It helps users find items shown in videos.
It is based on frame retrieval and fast indexing.
It scales well as the collection grows.
SLIDE 34 Thank You!
Noa Garcia, Aston University
Contact: garciadn@aston.ac.uk
GitHub: noagarcia/dresstar
SLIDE 35 References
[1] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR, 2016.
[2] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR, 2012.
[3] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[4] A. Anjulan and N. Canagarajah. Object based video retrieval with local region tracking. Signal Processing: Image Communication, 22(7), 2007.
[5] C.-Z. Zhu and S. Satoh. Large vocabulary quantization for searching instances from videos. In ACM ICMR, 2012.
[6] A. Araujo and B. Girod. Large-scale video retrieval using image queries. IEEE Transactions on Circuits and Systems for Video Technology, 2017.