Learning visual styles
Kristen Grauman, Department of Computer Science, University of Texas at Austin


SLIDE 1

Learning visual styles

Kristen Grauman Department of Computer Science University of Texas at Austin

SLIDE 2

Visual recognition + fashion

Recognizing instances Recognizing categories

Kristen Grauman, UT Austin


SLIDE 4

Visual recognition + fashion

But fashion also introduces new challenges for high-level vision:

  • Subtle distinctions
  • Personalization and taste
  • Composition and compatibility

Requires computational models for style


SLIDE 5

Visual recognition + fashion

Many applications for learning to model style


SLIDE 6

This talk

  • Subtle visual attributes
  • Style discovery and forecasting
  • Creating capsule wardrobes
SLIDE 7

Relative attributes

  • High-level semantic properties shared by objects
  • Human-understandable and machine-detectable

(Figure: face pairs ordered from “Smiling” to “Not Smiling”, with a query pair to compare)

[Oliva et al. 2001, Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]

Parikh & Grauman, ICCV 2011; Singh & Lee, ECCV 2016

SLIDE 8

Relative attributes

  • Learn a ranking function per attribute

(Figure: face pairs ordered by predicted “smiling” strength)

Parikh & Grauman, ICCV 2011; Singh & Lee, ECCV 2016
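The ranking function per attribute can be learned from ordered pairs via the classic RankSVM reduction: fit a linear classifier on feature differences, then score individual images with the learned weights. A minimal numpy sketch on synthetic data (the dimensions, optimizer, and data are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): a hidden direction w_true defines each image's
# attribute strength in a 5-D feature space.
w_true = rng.normal(size=5)
X = rng.normal(size=(300, 5))
strength = X @ w_true

# Pairwise supervision: for a pair (i, j), y = +1 if image i shows more of
# the attribute than image j, else -1.
idx = rng.integers(0, 300, size=(500, 2))
pairs = [(i, j) for i, j in idx if strength[i] != strength[j]]
diffs = np.array([X[i] - X[j] for i, j in pairs])
labels = np.array([1.0 if strength[i] > strength[j] else -1.0 for i, j in pairs])

# RankSVM-style learning: hinge loss on feature differences, plain
# subgradient descent; the learned w then ranks any image by w @ x.
w = np.zeros(5)
lr, lam = 0.05, 1e-3
for _ in range(200):
    margins = labels * (diffs @ w)
    viol = margins < 1
    grad = lam * w
    if viol.any():
        grad = grad - (labels[viol, None] * diffs[viol]).mean(axis=0)
    w -= lr * grad

# Agreement of the learned ranker with the true ordering on the pairs.
agree = np.mean(np.sign(diffs @ w) == labels)
```

The same pairwise machinery underlies the deep rankers cited on the slide; only the feature extractor changes.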

SLIDE 9

Relative attributes

Now we can compare images by the “strength” of each attribute (e.g., bright, smiling, natural).

[Parikh & Grauman, ICCV 2011]

SLIDE 10

WhittleSearch: Relative attribute feedback

Whittle away irrelevant images via precise semantic feedback.

Query: “white high-heeled shoes”
Initial top search results → Feedback: “shinier than these”, “less formal than these” → Refined top search results

[Kovashka, Parikh, and Grauman, CVPR 2012; IJCV 2015]
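The whittling step itself is simple: each relative-attribute feedback statement prunes candidates by comparing precomputed attribute scores against a reference image. A hedged sketch (attribute names, scores, and indices are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: precomputed relative-attribute scores for 100 database
# images over two attributes (names and values are invented).
scores = {"shiny": rng.uniform(size=100), "formal": rng.uniform(size=100)}
candidates = np.arange(100)

def whittle(candidates, attr, ref_idx, direction):
    """Keep candidates whose `attr` score is above (+1) or below (-1) the
    reference image's, mirroring "more/less <attr> than this" feedback."""
    keep = direction * (scores[attr][candidates] - scores[attr][ref_idx]) > 0
    return candidates[keep]

# Feedback round: "shinier than image 3", then "less formal than image 7".
candidates = whittle(candidates, "shiny", 3, +1)
candidates = whittle(candidates, "formal", 7, -1)
# Remaining candidates satisfy all feedback and would be re-ranked for display.
```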

SLIDE 11

Challenge: fine-grained comparisons

Coarse vs. fine-grained comparisons: which is more sporty?

Sparsity of supervision problem:

  • Label availability: lots of possible pairs.
  • Image availability: subtleties are hard to curate.

SLIDE 12

Idea: Semantic jitter

Overcome sparsity of available fine-grained image pairs with attribute-conditioned image generation

Yu & Grauman, ICCV 2017

Images generated by Yan et al. 2016 Attribute2Image CVAE approach


SLIDE 14

Our idea: Semantic jitter

Overcome sparsity of available fine-grained image pairs with attribute-conditioned image generation.

  • Status quo: low-level jitter (crops, flips, and similar perturbations)
  • Our idea: semantic jitter along attributes such as sporty, open, comfort

Yu & Grauman, ICCV 2017
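The augmentation recipe can be sketched as follows, with a stand-in linear "generator" in place of the Attribute2Image CVAE the slide cites; everything here (dimensions, jitter magnitudes) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for an attribute-conditioned image generator: a fixed linear
# "decoder" from 3 attributes to 8-D "image" features, plus noise.
D = rng.normal(size=(3, 8))

def attr2image(attrs):
    return attrs @ D + 0.01 * rng.normal(size=8)

def semantic_jitter_pairs(base_attrs, attr_idx, delta, n):
    """Generate n ordered synthetic pairs that differ only slightly (delta)
    in one attribute -- dense fine-grained supervision that is hard to
    curate from real photos. The second image has more of the attribute."""
    pairs = []
    for _ in range(n):
        a = base_attrs + 0.1 * rng.normal(size=3)   # vary the other attributes
        b = a.copy()
        b[attr_idx] += delta
        pairs.append((attr2image(a), attr2image(b)))
    return pairs

# Augment scarce real pairs with dense synthetic ones before ranker training.
synthetic = semantic_jitter_pairs(np.zeros(3), attr_idx=1, delta=0.2, n=50)
```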

SLIDE 15

Semantic jitter for attribute learning

Train rankers with both real and synthetic image pairs; test on real fine-grained pairs (faces, shoes).

(Chart: attribute accuracy, roughly 80-95%, comparing training on real pairs alone vs. real + synthetic pairs)

Ranking functions trained with deep spatial transformer ranking networks [Singh & Lee 2016] or local RankSVM [Yu & Grauman 2014].

Yu & Grauman, ICCV 2017

SLIDE 16

Semantic jitter for attribute learning

  • State-of-the-art fine-grained comparisons (e.g., the “open” attribute on the UT Zappos-50K dataset)
  • All models trained on 64x64 images

Baselines: [Singh 2016], [Yu 2014], [Parikh 2011]

Yu & Grauman, ICCV 2017

SLIDE 17

Challenge: Which attributes matter?

SLIDE 18

Idea: Prominent relative attributes

Chen & Grauman, CVPR 2018

Infer which comparisons are perceptually salient

SLIDE 19

Approach: What causes prominence?

  • Large difference in attribute strength
  • Unusual and uncommon attribute occurrences
  • Absence of other noticeable differences

(Examples: visible forehead, colorful, dark hair)

In general: interactions among all the relative attributes in an image pair cause prominent differences.

Prominent differences: Chen & Grauman, CVPR 2018

SLIDE 20

Approach: Predicting prominent differences

Relative attribute rankers → symmetric pairwise encoding (input) → prominence multiclass classifier

Chen & Grauman, CVPR 2018
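One way to realize this pipeline: score both images with every attribute ranker, build a swap-invariant pair encoding, and feed it to a multiclass classifier over attributes. The specific encoding below (absolute gaps plus sums) is an assumption for illustration, not necessarily the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(3)
n_attrs, dim = 6, 10

# Stand-ins for learned relative-attribute rankers: one linear scorer per
# attribute (illustrative weights, not trained models).
rankers = rng.normal(size=(n_attrs, dim))

def symmetric_encoding(x_i, x_j):
    """Per-attribute absolute score gaps plus score sums, so the feature
    vector is invariant to swapping the two images in the pair."""
    s_i, s_j = rankers @ x_i, rankers @ x_j
    return np.concatenate([np.abs(s_i - s_j), s_i + s_j])

x, y = rng.normal(size=dim), rng.normal(size=dim)
enc = symmetric_encoding(x, y)
# This 2*n_attrs vector would feed a multiclass classifier trained on human
# labels of which attribute difference is prominent.
```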

SLIDE 21

Results: Prominent differences

(Charts: accuracy vs. number of top prominent differences taken as ground truth, for Ranking SVM and deep CNN variants)

Chen & Grauman, CVPR 2018

SLIDE 22

(Top 3 prominent differences for each pair)

Results: Prominent differences

Chen & Grauman, CVPR 2018

SLIDE 23

Prominent differences: impact on visual search

(WhittleSearch example revisited: query “white high-heeled shoes”, feedback such as “shinier than these” and “less formal than these”, refined top results)

Leverage prominence to better focus search results: faster retrieval of the user's target image without any additional user feedback.

Chen & Grauman, CVPR 2018

SLIDE 24

This talk

  • Subtle visual attributes
  • Style discovery and forecasting
  • Creating capsule wardrobes
SLIDE 25

From items to styles


SLIDE 26

From items to styles

Requires a representation of visual style. Challenges:

  • Same “look” manifests in different garments
  • Emerges organically and evolves over time
  • Soft boundaries

Plain CNN image similarity? Manually defined style labels? Neither directly captures stylistic similarity.


SLIDE 27

Detect localized attributes

  • Material, cut, pattern: fine-tune a ResNet50 classifier
  • Color, clothing article: segmentation with DeepLab-DenseCRF

(Example outputs: blazer-color-blue, pants-color-red)

SLIDE 28

Figure credit: Chris Bail

Topic models, e.g., Latent Dirichlet Allocation (LDA)

Topic models: Inspiration from text

SLIDE 29

Idea: Discovering visual styles

Unsupervised learning of a style-coherent embedding with a polylingual topic model

Mimno et al. "Polylingual topic models." EMNLP 2009.

Hsiao & Grauman, ICCV 2017

An outfit is a mixture of (latent) styles. A style is a distribution over attributes.
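This generative story is exactly LDA's: outfits are documents, detected attributes are words, styles are topics. A self-contained collapsed-Gibbs sketch on toy data (the talk's polylingual variant additionally ties topics across per-region vocabularies; this single-vocabulary version is a simplification, and the vocabulary and planted styles are invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy attribute vocabulary; each outfit "document" is a bag of attributes.
vocab = ["floral", "pastel", "chiffon", "leather", "stud", "black",
         "denim", "plaid", "boot", "maxi"]
# Two planted styles with disjoint attribute sets.
styles_true = [[0, 1, 2, 9], [3, 4, 5, 8]]
docs = [list(rng.choice(styles_true[rng.integers(2)], size=8)) for _ in range(100)]

K, V, alpha, beta = 2, len(vocab), 0.5, 0.1
z = [[rng.integers(K) for _ in d] for d in docs]
ndk = np.zeros((len(docs), K)); nkv = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        k = z[d][n]; ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

# Collapsed Gibbs sampling: resample each token's style assignment from its
# conditional given all other assignments.
for _ in range(50):
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkv[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][n] = k
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

# Each recovered "style" is a distribution over attributes.
top = [[vocab[v] for v in np.argsort(-nkv[k])[:4]] for k in range(K)]
```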

SLIDE 30

Styles we automatically discover in the Amazon dataset [McAuley et al. 2015]

Example discovered styles (dresses)

SLIDE 31

Example discovered styles (dresses)

Styles we automatically discover in the Amazon dataset [McAuley et al. 2015]

SLIDE 32

Styles we automatically discover in the HipsterWars dataset [Kiapour et al. ECCV 2014]

Example discovered styles (full outfit)

SLIDE 33

Style discovery accuracy

How well do our discovered styles align with human-perceived styles?

(For the Attributes and PolyLDA entries, results are reported with predicted attributes first and ground-truth attributes second.)

SLIDE 34

Discovered latent styles (topics) Image embedding

Style-coherent embedding

SLIDE 35

Discovered latent styles (topics) Image embedding

Style-coherent embedding. Leverage this embedding for: 1) style browsing, 2) style mixing, 3) style summarization, 4) style forecasting.

SLIDE 36

Style browsing results

Maintain style coherence while also permitting diversity

(Figure: query image with neighbors similar in CNN space vs. similar in style space (ours))

SLIDE 37

Style browsing results

HipsterWars dataset

[Kiapour ECCV 2014]

DeepFashion dataset

[Liu CVPR 2016]

Maintain style coherence while also permitting diversity

SLIDE 38

Bohemian Hipster

Our embedding naturally facilitates browsing for mixes of user-selected styles

Mixing styles

Hsiao & Grauman, ICCV 2017


SLIDE 40

Style summarization

Given a gallery of photos, summarize it by its dominant styles.

Hsiao & Grauman, ICCV 2017

SLIDE 41

Style forecasting

  • 1. Visual style discovery
  • 2. Construct style temporal trajectory
  • 3. Forecast future trend
  • 4. Style description via signature attributes

Al-Halah et al., ICCV 2017

Can we predict the future popularity of styles?
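As a baseline for such forecasts, classical exponential smoothing of a style's popularity trajectory already gives a usable predictor. The numbers below are synthetic, and the talk's forecaster is more elaborate than this sketch:

```python
import numpy as np

# Illustrative monthly popularity trajectory for one discovered style
# (synthetic numbers, not the Amazon data).
traj = np.array([0.10, 0.12, 0.11, 0.14, 0.16, 0.15, 0.18, 0.21, 0.20, 0.24])

def exp_smooth_forecast(series, alpha=0.6, horizon=3):
    """Simple exponential smoothing: level <- alpha*y + (1-alpha)*level;
    the forecast for every future step is the final level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(horizon, level)

fc = exp_smooth_forecast(traj)   # flat forecast near the recent level
```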

SLIDE 42

Amazon dataset

[McAuley et al. SIGIR 2015]

  • Dresses, Tops & Tees, and Shirts, spanning over 6 years
  • 80,000 items and 210,000 transactions
SLIDE 43

Visual trend forecasting

We predict the future popularity of each style

Amazon dataset [McAuley et al. SIGIR 2015] Al-Halah et al., ICCV 2017

SLIDE 44

Lifecycle of a visual style

(Trajectory categories: out of fashion, classic, in fashion, trending, unpopular, re-emerging)

Al-Halah et al., ICCV 2017

SLIDE 45

Interpretable forecasts

What kind of fabric, texture, color will be popular next year?

SLIDE 46

This talk

  • Subtle visual attributes
  • Style discovery and forecasting
  • Creating capsule wardrobes
SLIDE 47

Capsule pieces

Outfit #1 Outfit #2 Outfit #3 Outfit #4 Outfit #5

Creating a “capsule” wardrobe

Goal: Select a minimal set of pieces that mix and match well to create many viable outfits

SLIDE 48

Capsule pieces

Outfit #1 Outfit #2

Incompatible outfits!

Creating a “capsule” wardrobe

SLIDE 49

Outfit #1 Outfit #2 Outfit #3

All too similar…

Capsule pieces

Creating a “capsule” wardrobe

SLIDE 50

All compatible and diverse.

Outfit #1 Outfit #2 Outfit #3 Outfit #4

Capsule pieces

Creating a “capsule” wardrobe

SLIDE 51

Q1: How to learn visual compatibility?

Supervised options:

  • Co-purchase data [McAuley 2015, Veit 2015, He 2016]
  • Manual curation [Li 2017, Song 2017, Han 2017]

Or: unlabeled, in-the-wild photos?

SLIDE 52

Style model → Visual compatibility

Gauge mutual compatibility of garments via likelihood under topic model

Hsiao & Grauman, CVPR 2018

Recall: an outfit is a mixture of (latent) styles. A style is a distribution over attributes.
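Scoring compatibility as likelihood under the topic model can be sketched with a Monte-Carlo estimate of the marginal probability of an outfit's attribute set; the topic matrix below is a made-up stand-in for a fitted model:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed fitted style topics (styles x attributes); illustrative numbers:
# style A favors attributes 0-2, style B favors attributes 3-5.
phi = np.array([
    [0.30, 0.30, 0.30, 0.02, 0.02, 0.06],
    [0.02, 0.02, 0.06, 0.30, 0.30, 0.30],
])
alpha = 0.5

def outfit_loglik(attr_ids, n_samples=2000):
    """Monte-Carlo estimate of the outfit's marginal log-likelihood under
    the topic model: draw style mixtures theta ~ Dirichlet(alpha), average
    the product of per-attribute probabilities."""
    thetas = rng.dirichlet([alpha] * phi.shape[0], size=n_samples)
    word_probs = thetas @ phi[:, attr_ids]          # (n_samples, n_attrs)
    return float(np.log(word_probs.prod(axis=1).mean()))

coherent = outfit_loglik([0, 1, 2])   # all style-A attributes
clashing = outfit_loglik([0, 3, 4])   # attributes from clashing styles
```

Coherent outfits, whose attributes are explained by a single style, score higher than clashing ones, which is exactly the signal the slide uses for compatibility.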

SLIDE 53
Visual compatibility results

  • BiLSTM [Han et al. 2017]: unsupervised sequential model trained on Polyvore sets.
  • Monomer [He et al. 2016]: supervised embedding trained on Amazon product co-purchase info.

Encouraging results for learning compatibility from unlabeled, full-body images.

SLIDE 54

Most compatible

Visual compatibility results

Hsiao & Grauman, CVPR 2018

SLIDE 55

Least compatible

Visual compatibility results

Hsiao & Grauman, CVPR 2018

SLIDE 56

Q2: How to optimize a capsule?

Capsule pieces

Outfit #1 Outfit #2 Outfit #3 Outfit #4 Outfit #5

set of garments = argmax compatibility + versatility

Pose as subset selection problem

Hsiao & Grauman, CVPR 2018

SLIDE 57

Capsule via subset selection

Pose as a subset selection problem: set of garments y = argmax (compatibility + versatility) over the optimal set of composed outfits.

(Diagram: capsule pieces A0, A1, A2 composed into Outfits #1-#4)

Hsiao & Grauman, CVPR 2018

SLIDE 58

Capsule via subset selection

Compatibility scored by topic model likelihood; this term is modular: each outfit contributes its score independently.

(Diagram: capsule pieces composed into outfits with scores c(o1)…c(o4))

SLIDE 59

Capsule via subset selection

Versatility scored by style coverage: each outfit covers latent styles z1, z2, z3 (e.g., work, evening, shopping).

(Diagram: capsule pieces, outfits mapped to the styles they cover)
SLIDE 60

Capsule via subset selection

Versatility is submodular: covering styles z1 (work), z2 (evening), z3 (shopping) yields diminishing returns as selected outfits overlap in the styles they cover.

(Diagram: outfits annotated with which styles they cover)

SLIDE 61

Capsule via subset selection

Objective y over the optimal set of outfits: compatibility scored by topic model likelihood (modular) + versatility scored by style coverage (submodular).

We devise an EM-like solution for which we can show (sub)modularity holds. But each addition is a garment!
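The modular-plus-submodular structure admits a simple greedy sketch: repeatedly add the garment with the largest marginal gain in compatibility plus style coverage. This is a simplification of the talk's iterative, outfit-level procedure, and all scores below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic inputs: 12 candidate garments, each with a (modular)
# compatibility score and a distribution over 3 styles it can cover.
n, K = 12, 3
compat = rng.uniform(0.1, 1.0, size=n)
style_cov = rng.dirichlet(np.ones(K), size=n)

def objective(S):
    """Modular compatibility sum + submodular versatility: per style, the
    noisy-OR probability that at least one selected garment covers it."""
    if not S:
        return 0.0
    cover = 1.0 - np.prod(1.0 - style_cov[S], axis=0)
    return compat[S].sum() + cover.sum()

# Greedy selection: a (1 - 1/e)-approximation for monotone submodular
# objectives, run here at the garment level for simplicity.
S, budget = [], 4
for _ in range(budget):
    gain, best = max(
        (objective(S + [i]) - objective(S), i) for i in range(n) if i not in S
    )
    S.append(best)
```

Because the compatibility term is nonnegative and the coverage term is monotone submodular, each greedy addition strictly increases the objective, mirroring the diminishing-returns argument on the slide.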

SLIDE 62

Quantifying capsule error

Distance from “ground truth” manually curated capsules from Polyvore.com.

(Chart: error, roughly 0.5-2.5, for Cluster Centers, MMR, and Ours)

Hsiao & Grauman, CVPR 2018

SLIDE 63

Human subject study

14 subjects, female, ages 20s-60s. The iterative method is preferred 59% of the time vs. naïve greedy.

Hsiao & Grauman, CVPR 2018

SLIDE 64

Personalized capsule

Example personalized capsule

Discover user’s style preferences from album

Hsiao & Grauman, CVPR 2018


SLIDE 66

Summary

  • Visual style introduces new problems for computer vision beyond traditional recognition
  • New ideas and methods for:
    – Subtle visual comparisons
    – Style discovery and forecasting
    – Capsule wardrobe creation

With: Aron Yu, Kimberly Hsiao, Ziad Al-Halah, Steven Chen

SLIDE 67

Papers

  • Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images. W-L. Hsiao and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, Oct 2017.
  • Creating Capsule Wardrobes from Fashion Images. W-L. Hsiao and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, June 2018.
  • Compare and Contrast: Learning Prominent Visual Differences. S. Chen and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, June 2018.
  • Fashion Forward: Forecasting Visual Style in Fashion. Z. Al-Halah, R. Stiefelhagen, and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, Oct 2017.
  • Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images. A. Yu and K. Grauman. In Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy, Oct 2017.

Code and data: http://www.cs.utexas.edu/~grauman/research/pubs.html