Neural Outfit Recommendation DAPA Workshop @ WSDM 2019 Maarten de - - PowerPoint PPT Presentation




SLIDE 1

Neural Outfit Recommendation

DAPA Workshop @ WSDM 2019

Maarten de Rijke
University of Amsterdam, derijke@uva.nl

February 15, 2019

SLIDE 2

Based on joint work with Jun Ma, Pengjie Ren, Yujie Lin, Zhaochun Ren, and Zhumin Chen

SLIDE 3

  • Background
  • Outfit recommendation
  • Fashion recommendation machine
  • Some results
  • Conclusion

SLIDE 4

Neural IR

Big uptake and injection of energy in the field

  • Learning to match
  • Learning to rank
  • Content understanding – text, image, video, ...
  • Behavior understanding
  • ...

SLIDE 5

The need to take stock, repeatedly

Quickly building up a rich body of knowledge

  • Li and Xu (2013) – Semantic matching in search
  • Onal et al. (2018) – Neural information retrieval: At the end of the early years
  • Mitra and Craswell (2019) – An introduction to neural information retrieval
  • Li et al. (20XX) – ...

SLIDE 6

Rough edges

Lin (2018) – The Neural Hype and Comparisons Against Weak Baselines

  • Everyone is trying to win
  • “demonstrating that a new method beats previous methods on a given task or benchmark”

  • Often, our baselines are weak



SLIDE 8

Rough edges

How to improve ourselves

  • Compare apples to apples
  • Work on insights – reasons for success, reasons for failure
  • Use reference baselines
  • Share everything
  • Use reference implementations
  • Engage with product owners for additional eyes and checks
  • Win in different ways – task, constraints, metrics, ...



SLIDE 10

  • Background
  • Outfit recommendation
  • Fashion recommendation machine
  • Some results
  • Conclusion

SLIDE 11

Outfit recommendation

A different task, with a twist

Fashion recommendation – increased attention

Outfit recommendation – given a top (i.e., upper garment), recommend a list of bottoms (e.g., trousers or skirts) from a large collection that best match the top, and vice versa

  • Allow users to provide some descriptions as conditions that the recommended items should accord with as much as possible



SLIDE 13

Unpacking the task

Two main challenges

  • visual understanding – aims to extract effective visual features
  • visual matching – aims to model a human notion of compatibility to compute a match between fashion items

Typically, visual understanding and matching are conducted based on recommendation loss alone

  • Supervision signal is just whether two given items are matched or not; no supervision is available to directly connect the visual signals of the fashion items
  • Can we come up with a sense of aesthetics?

SLIDE 14

  • Background
  • Outfit recommendation
  • Fashion recommendation machine
  • Some results
  • Conclusion

SLIDE 15

Fashion recommendation machine

Lin et al. (2019) – Improving Outfit Recommendation with Co-supervision of Fashion Generation

  1. Neural co-supervision learning framework, FARM, for outfit recommendation that simultaneously yields recommendation and generation
  2. Layer-to-layer matching mechanism as a bridge between generation and recommendation – improves recommendation by leveraging generation features

SLIDE 16

FARM architecture

SLIDE 17

FARM architecture

For the fashion generator

  • Use CNN as top encoder to extract visual features from top image I_t
  • Learn semantic representation for bag-of-words vector d of bottom description
  • Use variational transformer to learn mapping from bottom distribution to Gaussian distribution based on visual features of I_t and semantic representation of d
  • Sample a random vector from Gaussian distribution and input it to a DCNN (as bottom generator) to generate bottom image I_g that matches I_t and d
  • Explicitly forces top encoder to encode more aesthetic matching information into visual features
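
The Gaussian sampling step in the bullets above is commonly implemented with the reparameterization trick, which keeps the draw differentiable with respect to the distribution parameters. The slide does not say how FARM parameterizes the Gaussian, so the following is only a minimal sketch under assumptions: a diagonal Gaussian given by a mean and a log-variance, with the hypothetical names `mu`, `log_var`, and `sample_latent`.

```python
import math
import random


def sample_latent(mu, log_var, rng=None):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) per dimension.

    Assumption (not from the slides): a diagonal Gaussian parameterized
    by a mean vector and a log-variance vector of equal length.
    """
    if rng is None:
        rng = random.Random()
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)  # standard deviation from log-variance
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z.append(m + sigma * eps)
    return z


# Hypothetical 4-dimensional latent; in the pipeline described above, z
# would be the input to the DCNN bottom generator.
z = sample_latent([0.0, 1.0, -1.0, 0.5], [0.0, 0.0, -2.0, -2.0],
                  rng=random.Random(0))
```

Because the randomness lives entirely in `eps`, gradients of a downstream loss can flow into `mu` and `log_var` when the same computation is written in an autodiff framework.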

SLIDE 18

FARM architecture

For the fashion recommender

  • Also employs CNN as bottom encoder to extract visual features from candidate bottom image I_b
  • Evaluate matching score between I_b and (I_t, d) pair from three angles
    1. Visual matching between I_b and I_t
    2. Description matching between I_b and d
    3. Layer-to-layer matching between I_b and I_g, which leverages generation information to improve recommendation
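
One way to picture the three-angle score is as a sum of three similarity terms. The slides do not specify how FARM computes or combines them, so this sketch makes loud assumptions: cosine similarity for every term, an unweighted sum, and a layer-to-layer term that averages per-layer similarities between features of the candidate bottom and of the generated bottom. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_score(bottom_feat, top_feat, desc_feat,
                   bottom_layers, generated_layers):
    """Assumed combination: visual + description + layer-to-layer terms."""
    visual = cosine(bottom_feat, top_feat)        # candidate bottom vs top
    description = cosine(bottom_feat, desc_feat)  # candidate bottom vs description
    per_layer = [cosine(b, g) for b, g in zip(bottom_layers, generated_layers)]
    layer_to_layer = sum(per_layer) / len(per_layer) if per_layer else 0.0
    return visual + description + layer_to_layer


# Toy 2-dimensional features, invented purely for illustration.
score = matching_score(
    bottom_feat=[1.0, 0.0],
    top_feat=[1.0, 0.0],
    desc_feat=[0.0, 1.0],
    bottom_layers=[[1.0, 0.0], [0.0, 1.0]],
    generated_layers=[[1.0, 0.0], [0.0, 1.0]],
)
```

Ranking a collection of candidate bottoms then amounts to sorting them by this score in descending order.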

SLIDE 19

FARM architecture

FARM jointly trains the fashion generator and fashion recommender

Three types of loss

  1. Generation loss (visual + textual)
  2. Loss based on ELBO
  3. Recommendation loss (like BPR)
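
Two of these loss types have well-known textbook forms that can be sketched briefly. The block below shows the standard BPR pairwise loss and the closed-form KL divergence to a standard normal prior, which is the usual regularization term inside an ELBO. Whether FARM uses exactly these forms is an assumption, not something the slides state, and the function names are hypothetical.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Standard BPR pairwise loss: -log(sigmoid(pos_score - neg_score)).

    Computed as softplus(neg_score - pos_score) for numerical stability.
    """
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    Assumption: a standard normal prior, as in a vanilla variational
    autoencoder; the slides only say "loss based on ELBO".
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# A matched bottom scored above a mismatched one yields a small loss;
# the reverse ordering yields a large one.
good_order = bpr_loss(pos_score=2.0, neg_score=0.0)
bad_order = bpr_loss(pos_score=0.0, neg_score=2.0)
```

In joint training, terms like these would be added to the generation loss, typically with weights that are tuned per dataset.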

SLIDE 20

  • Background
  • Outfit recommendation
  • Fashion recommendation machine
  • Some results
  • Conclusion

SLIDE 21

A sample of results

FashionVC and ExpFashion datasets sampled from Polyvore online community

4-tuples (top, top description, bottom, bottom description)
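
The 4-tuple layout can be pictured as a small record type, together with a bag-of-words count over a description. This is a hypothetical sketch: the field names, the file paths, the whitespace tokenization, and the example strings are all invented for illustration and do not reflect the actual dataset files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutfitExample:
    """One (top, top description, bottom, bottom description) 4-tuple."""
    top_image: str           # hypothetical: path to the top image
    top_description: str
    bottom_image: str        # hypothetical: path to the bottom image
    bottom_description: str


def bag_of_words(description):
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    counts = {}
    for token in description.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


# Invented example record.
example = OutfitExample(
    top_image="tops/0001.jpg",
    top_description="white cotton shirt",
    bottom_image="bottoms/0042.jpg",
    bottom_description="Blue denim skinny jeans blue wash",
)
bow = bag_of_words(example.bottom_description)
```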

SLIDE 22

Bake-off

SLIDE 23

Co-supervision learning

SLIDE 24

Layer-to-layer

SLIDE 25

Some samples: Real vs generated

SLIDE 26

Some samples: Recommendations

SLIDE 27

Some samples: Real vs generated

SLIDE 28

  • Background
  • Outfit recommendation
  • Fashion recommendation machine
  • Some results
  • Conclusion

SLIDE 29

What have we done?

Outfit recommendation

  • Visual understanding
  • Visual matching

Proposed a co-supervision learning framework, FARM

  • For visual understanding, FARM captures more aesthetic characteristics with supervision of generation learning
  • For visual matching, FARM incorporates layer-to-layer matching mechanism to evaluate matching score of candidate and generated items at different neural layers

SLIDE 30

What should we do next?

  • How effective are generated images at explaining the recommendations?
  • Does improving the quality of generated images improve the recommendations?
  • How to recommend complete outfits?

SLIDE 31

Playing the winning game

How to improve ourselves

  • Compare apples to apples
  • Work on insights – reasons for success, reasons for failure
  • Use reference baselines
  • Share everything
  • Use reference implementations
  • Engage with product owners for additional eyes and checks
  • Win in different ways – task, constraints, metrics, ...

SLIDE 32

References

  • H. Li and J. Xu. Semantic matching in search. Foundations and Trends in Information Retrieval, 7(5):343–469, 2013.
  • J. Lin. The neural hype and comparisons against weak baselines. SIGIR Forum, 52(2):40–51, 2018.
  • Y. Lin, P. Ren, Z. Chen, Z. Ren, J. Ma, and M. de Rijke. Improving outfit recommendation with co-supervision of fashion generation. In The Web Conference 2019, May 2019.
  • B. Mitra and N. Craswell. An introduction to neural information retrieval. Foundations and Trends in Information Retrieval, 13(1), January 2019.
  • K. D. Onal, Y. Zhang, I. S. Altingovde, M. M. Rahman, P. Karagoz, A. Braylan, B. Dang, H.-L. Chang, H. Kim, Q. McNamara, A. Angert, E. Banner, V. Khetan, T. McDonnell, A. T. Nguyen, D. Xu, B. C. Wallace, M. de Rijke, and M. Lease. Neural information retrieval: At the end of the early years. Information Retrieval Journal, 21(2–3):111–182, June 2018.

SLIDE 33

Acknowledgments

All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their employers and/or sponsors.