Dress Like a Star: Retrieving Fashion Products from Videos



slide-1
SLIDE 1

Dress Like a Star: Retrieving Fashion Products from Videos

Noa Garcia & George Vogiatzis

Computer Vision in Fashion Workshop

slide-2
SLIDE 2

Fashion in Videos

Movies, TV shows, online

slide-3
SLIDE 3

Fashion in Videos

Sex and the City

slide-4
SLIDE 4

Fashion in Videos

The Devil Wears Prada

slide-5
SLIDE 5

Fashion in Videos

The Great Gatsby

slide-6
SLIDE 6

Fashion in Videos

Make fashion products in videos more accessible to users.

slide-7
SLIDE 7

Fashion in Videos

slide-8
SLIDE 8

Constraints

  • 1. Camera view

Camera viewpoint cannot be moved to have a better view of the fashion object.

slide-9
SLIDE 9

Constraints

  • 2. User interaction

The creation of bounding boxes around the object of interest may distract users from the video.

slide-10
SLIDE 10

Constraints

  • 3. Small objects

Small, partially occluded and blurred.

slide-11
SLIDE 11

Our Proposal

Instead of object recognition...

slide-12
SLIDE 12

Our Proposal

Instead of object recognition... frame retrieval

slide-13
SLIDE 13

Related Work

Clothing Retrieval: attribute classification [1], domain adaptation [2].

Scene Retrieval: image retrieval in videos [3], temporal tracking [4], scene descriptors [5, 6].

Our Approach: binary temporal tracking + fast indexing.

slide-14
SLIDE 14

Challenges

Average movie duration: 120 minutes
Standard FPS rate: 24 fps
Average frames per movie: 172,800 frames

With only 5 or 6 movies: more than a million frames!
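
The arithmetic behind these figures can be checked with a short sketch. The constant and function names are illustrative; the 120-minute duration and 24 fps rate are the averages quoted on the slide, not measured values.

```python
# Frame-count arithmetic from the slide (averages, not measurements).
MINUTES_PER_MOVIE = 120
FPS = 24

# 120 min * 60 s/min * 24 frames/s
frames_per_movie = MINUTES_PER_MOVIE * 60 * FPS


def movies_to_reach(target_frames: int) -> int:
    """Smallest number of average-length movies holding at least target_frames frames."""
    return -(-target_frames // frames_per_movie)  # ceiling division


if __name__ == "__main__":
    print(f"frames per movie: {frames_per_movie:,}")
    print(f"movies needed for one million frames: {movies_to_reach(1_000_000)}")
```

With these averages, five movies stay just under a million frames and the sixth crosses it.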

slide-15
SLIDE 15

Our System

Three main modules: product indexing, training phase, query phase.

slide-16
SLIDE 16

Our System

Three main modules: product indexing, training phase, query phase.

slide-17
SLIDE 17

Our System: Product indexing

Fashion items and frames are related in a database.
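
One way to picture such a database is a lookup table from frames to the products visible in them. The sketch below is only an illustration under that assumption: the `ProductIndex` class, its method names, and the movie and product identifiers are hypothetical and are not taken from the authors' system.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ProductIndex:
    """Hypothetical in-memory index relating (movie, frame) pairs to product ids."""

    def __init__(self) -> None:
        self._by_frame: Dict[Tuple[str, int], Set[str]] = defaultdict(set)

    def add(self, movie_id: str, first_frame: int, last_frame: int, product_id: str) -> None:
        """Register a product as visible in every frame of an inclusive range."""
        for frame in range(first_frame, last_frame + 1):
            self._by_frame[(movie_id, frame)].add(product_id)

    def lookup(self, movie_id: str, frame: int) -> List[str]:
        """Return the products annotated for one frame, sorted for stable output."""
        return sorted(self._by_frame.get((movie_id, frame), ()))


if __name__ == "__main__":
    index = ProductIndex()
    index.add("movie-a", 100, 250, "dress-001")  # made-up identifiers
    index.add("movie-a", 200, 300, "hat-002")
    print(index.lookup("movie-a", 220))
```

Storing one entry per frame keeps the lookup trivial; a real collection of millions of frames would more likely store ranges or shot identifiers instead.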

slide-18
SLIDE 18

Our System

Three main modules: product indexing, training phase, query phase.

slide-19
SLIDE 19

Our System: Training phase

BRIEF features are more constant over time than SIFT or CNN.

[Figure: BRIEF vs. SIFT]
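
BRIEF produces binary descriptors, which are usually compared with the Hamming distance (the number of differing bits). The sketch below shows that comparison on descriptors stored as `bytes`; the `is_stable` helper and its `max_bits` threshold are hypothetical and only illustrate what "constant over time" could mean between two consecutive frames.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def is_stable(previous: bytes, current: bytes, max_bits: int = 20) -> bool:
    """Hypothetical test: a feature counts as unchanged if few bits flipped.

    The default threshold is an assumption chosen for illustration only.
    """
    return hamming_distance(previous, current) <= max_bits


if __name__ == "__main__":
    frame_t = bytes(32)                       # a 256-bit descriptor, all zeros
    frame_t1 = bytes([0b00000111]) + bytes(31)  # same descriptor with 3 bits flipped
    print(hamming_distance(frame_t, frame_t1))
    print(is_stable(frame_t, frame_t1))
```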

slide-20
SLIDE 20

Our System: Training phase

Similar frames are grouped into shots.

[Figure: frames grouped into shot 1, shot 2 and shot 3]
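
The slides do not state how frame similarity is measured, so the following is only a minimal sketch of the grouping idea: consecutive frames stay in the same shot while a caller-supplied distance stays under a threshold. The function name, the `distance` callback, and the threshold are all assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_into_shots(
    frames: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float,
) -> List[List[int]]:
    """Split frame indices into shots of consecutive, mutually similar frames.

    A new shot starts whenever a frame differs from its predecessor by more
    than `threshold` under the supplied `distance` function.
    """
    shots: List[List[int]] = []
    for i, frame in enumerate(frames):
        if i > 0 and distance(frames[i - 1], frame) <= threshold:
            shots[-1].append(i)   # similar to the previous frame: same shot
        else:
            shots.append([i])     # first frame or a cut: open a new shot
    return shots


if __name__ == "__main__":
    # Toy stand-in: each "frame" is a single number and distance is |a - b|.
    toy_frames = [0, 1, 2, 50, 51, 100]
    print(group_into_shots(toy_frames, lambda a, b: abs(a - b), threshold=5))
```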

slide-21
SLIDE 21

Our System: Training phase

slide-22
SLIDE 22

Our System

Three main modules: product indexing, training phase, query phase.

slide-23
SLIDE 23

Our System: Query phase

slide-24
SLIDE 24

Our System: Query phase

slide-25
SLIDE 25

Our System: Query phase

slide-26
SLIDE 26

Our System: Query phase

slide-27
SLIDE 27

Our System: Query phase

slide-28
SLIDE 28

Our System: Query phase

Use the most similar frame to find the fashion products in the indexed product database.
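
As a rough illustration of this step, the sketch below lets each query descriptor vote for the frame of its nearest indexed descriptor and then looks up the winning frame's products. It uses a brute-force search as a stand-in, not the indexing described in the talk; all function names, the `(descriptor, frame_id)` layout, and the product identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def most_similar_frame(query: Sequence[bytes], indexed: Sequence[Tuple[bytes, int]]) -> int:
    """Return the frame id that receives the most nearest-neighbour votes."""
    if not query or not indexed:
        raise ValueError("query and indexed features must be non-empty")
    votes: Counter = Counter()
    for descriptor in query:
        # Brute-force nearest neighbour over all indexed (descriptor, frame_id) pairs.
        _, frame_id = min(indexed, key=lambda item: hamming(descriptor, item[0]))
        votes[frame_id] += 1
    return votes.most_common(1)[0][0]


def products_for_query(
    query: Sequence[bytes],
    indexed: Sequence[Tuple[bytes, int]],
    products_by_frame: Dict[int, List[str]],
) -> List[str]:
    """Look up the products annotated for the most similar frame."""
    return products_by_frame.get(most_similar_frame(query, indexed), [])


if __name__ == "__main__":
    indexed = [(b"\x00\x00", 10), (b"\xff\xff", 20), (b"\x0f\x0f", 30)]
    query = [b"\xff\xfe", b"\xfe\xff", b"\x00\x01"]
    print(products_for_query(query, indexed, {20: ["red-dress"]}))
```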

slide-29
SLIDE 29

Experiments - Dataset

A webcam captures the video playback. The frame number is used as ground truth. The retrieved frame should be visually similar to the annotated ground truth.

slide-30
SLIDE 30

Experiments - Retrieval Performance

Results using a single movie, 1h 49min duration

BF: Brute Force; KT: Kd-Tree; KF: Key Frame

Large reduction in memory requirements with our method.

slide-31
SLIDE 31

Experiments - Scalability

40 movies, 80 hours, 7 million frames

Movies: The Great Gatsby; Casablanca; Match Point; Maleficent; Magnolia; Big Fish; The Help; Ant-Man; Her; Absolutely Anything; The Social Network; Captain Phillips; 12 Years a Slave; American Hustle; Out of Africa; Groundhog Day; A Single Man; Seven Pounds; The Wolf of Wall Street; The Devil Wears Prada; Puss in Boots; Despicable Me; Family United; Marshland; The Body; Harry Potter and the Deathly Hallows; The Hobbit: The Desolation of Smaug; Rise of the Planet of the Apes; Pirates of the Caribbean; Neon Genesis Evangelion; 300: Rise of an Empire; Grave of the Fireflies; Witching and Bitching; 2 Francs, 40 Pesetas; Lee Daniels’ The Spanish Affair 2; The Last Circus; The Physician; El Niño

slide-32
SLIDE 32

Experiments - Scalability

Results using 40 movies.

Data reduction: from 3,040M features to 58M key features.
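
To put the reduction in perspective, the two counts from the slide can be turned into a ratio. This is plain arithmetic on the quoted figures; the variable names are illustrative.

```python
# Feature counts quoted on the slide, in millions.
TOTAL_FEATURES_M = 3040
KEY_FEATURES_M = 58

reduction_factor = TOTAL_FEATURES_M / KEY_FEATURES_M   # how many times fewer features
fraction_kept = KEY_FEATURES_M / TOTAL_FEATURES_M      # share of features retained

if __name__ == "__main__":
    print(f"reduction factor: {reduction_factor:.1f}x")
    print(f"features kept: {fraction_kept:.1%}")
```

The key features are roughly 52 times fewer than the original features, which is about 1.9% of the original count.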

slide-33
SLIDE 33

Conclusions

A system to perform video clothing retrieval. It helps users find items shown in videos. It is based on frame retrieval and fast indexing. It scales well as the collection grows.

slide-34
SLIDE 34

Thank you!

Noa Garcia, Aston University
Contact: garciadn@aston.ac.uk
GitHub: noagarcia/dresstar

Computer Vision in Fashion Workshop

slide-35
SLIDE 35

References

[1] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR, 2016.

[2] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In CVPR, 2012.

[3] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.

[4] A. Anjulan and N. Canagarajah. Object based video retrieval with local region tracking. Signal Processing: Image Communication, 22(7), 2007.

[5] C.-Z. Zhu and S. Satoh. Large vocabulary quantization for searching instances from videos. In ACM ICMR, 2012.

[6] A. Araujo and B. Girod. Large-scale video retrieval using image queries. IEEE Transactions on Circuits and Systems for Video Technology, 2017.