Using Deep Learning to rank and tag millions of hotel images

SLIDE 1

Using Deep Learning to rank and tag millions of hotel images

15/11/2018 - PyParis 2018

Christopher Lennan (Senior Data Scientist) @chris_lennan
Tanuj Jain (Data Scientist) @tjainn

#idealoTech

SLIDE 2

Agenda

1. idealo.de
2. Business Motivation
3. Models and Training
4. Image Tagging
5. Image Aesthetics
6. Summary

SLIDE 3

Some Key Facts

  • More than 18 years of experience
  • 700 “idealos” from 40 nations
  • Active in 6 different countries (DE, AT, ES, IT, FR, UK)
  • 16 million users/month
  • 50.000 shops
  • Over 330 million offers for 2 million products
  • TÜV-certified comparison portal
  • Germany's 4th largest eCommerce website

SLIDE 4

Motivation

SLIDE 5

idealo hotel price comparison

hotel.idealo.de

  • 2.306.658 accommodations
  • 308.519.299 images
  • ~ 133 images per accommodation

SLIDE 6

Importance of Photography for Hotels

  • “... after price, photography is the most important factor for travelers and prospects scanning OTA sites ...”
  • “... Photography plays a role of 60% in the decision to book with a particular hotel ...”
  • “... study published today by TripAdvisor, it would seem like photos have the greatest impact driving engagement from travelers researching on hotel and B&B pages ...”

SLIDE 7
SLIDE 8
SLIDE 9

[Image-only slides]

SLIDE 10

[Figure: image positions 1 to 13]

SLIDE 11

Image Aesthetics

Current image placement

Position: 19
Position: 1

SLIDE 12

Image Aesthetics

Current image placement

Position: 17
Position: 3

SLIDE 13

Beautiful images should appear earlier in the gallery

SLIDE 14

[Figure: image positions 1 to 13]

SLIDE 15

[Figure: image positions 1 to 13]

SLIDE 16

Ensure different areas get depicted

SLIDE 17

[Figure: image positions 1 to 8, with images tagged Bedroom, Bathroom, Restaurant, Facade, Fitness Studio, Kitchen]

SLIDE 18

Understanding Image Content

Two part problem

1. Tag the image with the hotel property area
2. Predict aesthetic quality
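The slides do not say how the two model outputs are combined into a gallery order. One possible sketch, purely as an assumption: rank images by aesthetic score within each tag, then interleave the tags so that different areas show up early. The function name `order_gallery` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def order_gallery(images):
    """Order (image_id, tag, aesthetic_score) tuples for display.

    Hypothetical rule: round k shows the k-th best image of every tag,
    and within a round the higher-scored image comes first.
    """
    by_tag = defaultdict(list)
    for image_id, tag, score in images:
        by_tag[tag].append((score, image_id))
    # Best image first within each tag.
    for ranked in by_tag.values():
        ranked.sort(reverse=True)

    ordered = []
    for round_ in zip_longest(*by_tag.values()):
        picks = sorted((p for p in round_ if p is not None), reverse=True)
        ordered.extend(image_id for _, image_id in picks)
    return ordered


# Made-up example data.
gallery = [
    ("a", "Bedroom", 6.1),
    ("b", "Bedroom", 7.4),
    ("c", "Bathroom", 5.0),
    ("d", "Facade", 6.8),
    ("e", "Bedroom", 3.2),
]
print(order_gallery(gallery))
```

With this rule the best bedroom, facade and bathroom images come first, and the remaining bedroom images follow.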

SLIDE 19

Models & Training

SLIDE 20

Transfer Learning

  • Use a pre-trained CNN that was trained on millions of images (e.g. MobileNet or VGG16)
  • Replace the top layers so that the output fits the classification task
  • Train existing and new layer weights

SLIDE 21

Transfer Learning

CNN architecture (VGG16)

SLIDE 22

Training regime

1. Only train the newly added dense layers with a high learning rate
2. Then train all layers with a low learning rate

Goal: Do not disturb the pre-trained convolutional weights too much
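The two phases can be written down as a small, framework-independent schedule. This is a minimal sketch: the layer names and the learning rates (`1e-3`, `1e-5`) are illustrative assumptions, not values from the talk. In a framework such as Keras, each phase would correspond to setting the `trainable` flag of the layers and recompiling the model with the phase's learning rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    trainable_layers: tuple
    learning_rate: float


def two_phase_schedule(base_layers, head_layers, head_lr=1e-3, full_lr=1e-5):
    """Return the two fine-tuning phases of the training regime.

    Phase 1 trains only the new head; phase 2 trains everything,
    but with a lower learning rate to protect the pre-trained weights.
    """
    if full_lr >= head_lr:
        raise ValueError("phase 2 should use a lower learning rate than phase 1")
    return [
        Phase("head only", tuple(head_layers), head_lr),
        Phase("all layers", tuple(base_layers) + tuple(head_layers), full_lr),
    ]


# Hypothetical layer names, for illustration only.
schedule = two_phase_schedule(
    base_layers=["conv1", "conv2", "conv3"],
    head_layers=["dense", "softmax"],
)
for phase in schedule:
    print(phase.name, phase.learning_rate, phase.trainable_layers)
```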

SLIDE 23

Training regime

SLIDE 24

Loss functions

Cross-entropy loss (CEL)

  • CEL is generally used for “one-class” ground truth classifications (e.g. image tagging)
  • CEL ignores inter-class relationships between score buckets

source: https://ssq.github.io/2017/02/06/Udacity%20MLND%20Notebook/
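A small sketch of why CEL ignores the ordering of score buckets: with a one-hot ground truth, only the probability assigned to the true bucket enters the loss, so a near miss and a far miss cost the same. The example distributions are made up.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a true distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot ground truth: the true score bucket is index 2.
y_true = [0.0, 0.0, 1.0, 0.0, 0.0]
near_miss = [0.0, 0.5, 0.5, 0.0, 0.0]  # remaining mass on the adjacent bucket
far_miss = [0.5, 0.0, 0.5, 0.0, 0.0]   # remaining mass two buckets away

print(cross_entropy(y_true, near_miss))
print(cross_entropy(y_true, far_miss))
```

Both calls print the same value, `log(2)`, even though the second prediction is further from the truth on the score scale.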

SLIDE 25

Loss functions

Earth Mover’s Distance (EMD)

  • For ordered classes, classification settings can outperform regressions
  • Training on datasets with intrinsic ordering can benefit from an EMD loss objective
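The slide does not give the formula. As an assumption, the sketch below uses the normalized EMD between cumulative distributions with exponent r = 2, which is the form commonly used for NIMA-style models. It shows that, unlike cross-entropy, the loss grows when probability mass sits further from the true bucket.

```python
from itertools import accumulate


def emd(p_true, p_pred, r=2):
    """Normalized Earth Mover's Distance between two discrete distributions
    over ordered buckets, computed on their cumulative sums."""
    cdf_true = list(accumulate(p_true))
    cdf_pred = list(accumulate(p_pred))
    total = sum(abs(a - b) ** r for a, b in zip(cdf_true, cdf_pred))
    return (total / len(p_true)) ** (1 / r)


# Made-up example: the true score bucket is index 2.
y_true = [0.0, 0.0, 1.0, 0.0, 0.0]
near_miss = [0.0, 0.5, 0.5, 0.0, 0.0]  # remaining mass on the adjacent bucket
far_miss = [0.5, 0.0, 0.5, 0.0, 0.0]   # remaining mass two buckets away

print(emd(y_true, near_miss))
print(emd(y_true, far_miss))
```

The far miss gets the larger loss, so the ordering of the buckets is taken into account.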

SLIDE 26

GPU training workflow

[Workflow diagram spanning Local and AWS, in three stages: Setup, Train, Evaluate]

  • Setup: build Docker image from Dockerfile and push it to ECR; Custom AMI with datasets and nvidia-docker
  • Train: launch EC2 GPU instance with Docker Machine; run train script; pull image and copy existing model; launch training container with nvidia-docker; store train outputs (S3)
  • Evaluate: launch EC2 GPU instance with Docker Machine; run evaluation script (SSH, Jupyter notebook); pull image and copy existing model; launch evaluation container with nvidia-docker

SLIDE 27

Image Tagging

SLIDE 28

Tagging Problem

  • Given an image, tag it as belonging to a single class
  • Multiclass classification model with classes:

○ Bedroom
○ Bathroom
○ Foyer
○ Restaurant
○ Swimming Pool
○ Kitchen
○ View of Exterior (Facade)
○ Reception
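A minimal sketch of the single-class tagging step, assuming the model ends in a softmax over the eight classes listed on the slide. The logits are made up, and `tag_image` is a hypothetical helper name.

```python
import math

CLASSES = [
    "Bedroom", "Bathroom", "Foyer", "Restaurant",
    "Swimming Pool", "Kitchen", "View of Exterior (Facade)", "Reception",
]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tag_image(logits):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Made-up raw model outputs for one image.
logits = [0.2, 3.1, 0.0, -1.0, 1.5, 0.3, -0.5, 0.9]
tag, confidence = tag_image(logits)
print(tag, round(confidence, 3))
```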

SLIDE 29

Multiple Datasets

We will go over them one by one and look at:

  • Dataset properties
  • Results
  • Issues

SLIDE 30

Wellness Dataset

  • idealo in-house pre-labelled images
  • Mostly pictures of 2- or 3-star properties

SLIDE 31

Wellness Dataset

  • Balanced: Equal sample count in all categories for all sets

SLIDE 32

Wellness Dataset: Metrics

Top-1 accuracy: 86%
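For reference, top-1 accuracy is the share of images whose highest-probability class equals the true class. A small sketch with made-up predictions:

```python
def top1_accuracy(true_labels, predicted_probs):
    """Fraction of samples whose arg-max prediction equals the true label."""
    if len(true_labels) != len(predicted_probs):
        raise ValueError("labels and predictions must have the same length")
    correct = 0
    for label, probs in zip(true_labels, predicted_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label
    return correct / len(true_labels)


# Made-up example with three classes and four images.
true_labels = [0, 1, 2, 1]
predicted_probs = [
    [0.7, 0.2, 0.1],  # predicts 0, correct
    [0.1, 0.8, 0.1],  # predicts 1, correct
    [0.3, 0.4, 0.3],  # predicts 1, wrong
    [0.2, 0.5, 0.3],  # predicts 1, correct
]
print(top1_accuracy(true_labels, predicted_probs))  # 0.75
```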

SLIDE 33

Wellness Dataset: Wrong Predictions

True class of these images: BATHROOM, predicted as: RECEPTION

Rectangular structure = Reception with high probability → BIAS!

SLIDE 34

Wellness Dataset: Wrong Predictions

True class of these images: BATHROOM

Wrong true label of images → NOISE in the dataset!

SLIDE 35

Correcting Bias

  • Augmentation operations, same for every class:

○ Random cropping
○ Rotation
○ Horizontal flipping

  • Data enrichment:

○ External data from Google Images
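A dependency-free sketch of the three augmentation operations on an image stored as a list of pixel rows. It rests on simplifying assumptions: rotation is restricted to 90 degrees (the slide does not state the angles used), and each optional operation is applied with probability 0.5.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate clockwise by 90 degrees (stand-in for arbitrary-angle rotation)."""
    return [list(col) for col in zip(*image[::-1])]


def augment(image, crop_h, crop_w, rng):
    """Apply the same pipeline to any image, regardless of its class."""
    out = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = rotate_90(out)
    return out


rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
print(augment(image, 3, 3, rng))
```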

SLIDE 36

Augmented Wellness + Google Dataset: Metrics

Top-1 accuracy: 88%

SLIDE 37

Gotta Clean!

SLIDE 38

Cleaning Dataset

  • Hand-cleaned each category:

○ Deleted pictures that do not belong in their category
○ Removed duplicates (the presence of duplicates can give us wrong metrics)
○ Added more images from external sources for classes with a small number of images left after cleaning
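The talk describes cleaning by hand. As an illustration only, exact duplicates could also be flagged automatically by hashing the file contents. Duplicates matter for metrics because the same image can end up in both the training and the test set and inflate accuracy. The sketch below only catches byte-identical files, not resized or re-encoded copies, and the file names are made up.

```python
import hashlib


def deduplicate(images):
    """Split images into unique files and exact duplicates.

    images: dict mapping file name -> raw file bytes.
    Returns (unique_names, [(duplicate_name, original_name), ...]).
    """
    seen = {}
    unique, duplicates = [], []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
            unique.append(name)
    return unique, duplicates


# Made-up file contents.
images = {
    "bath_1.jpg": b"\x01\x02\x03",
    "bath_2.jpg": b"\x04\x05",
    "bath_1_copy.jpg": b"\x01\x02\x03",
}
unique, duplicates = deduplicate(images)
print(unique)
print(duplicates)
```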

SLIDE 39

Cleaned Data: Metrics

Top-1 accuracy: 91%

SLIDE 40

Cleaned Dataset: Results

  • Bathroom vs. Reception confusion has almost vanished!
  • View_of_exterior vs. Pool confusion has been reduced
  • Foyer performance:

○ Most misclassified Foyer images are assigned to Reception
○ This is a hard problem for humans as well!

SLIDE 41

Foyer or Reception?

SLIDE 42

Learnings so far

  • The model can only be as good as the data (cleaning)
  • Foyer is a hard category to predict

SLIDE 43

Understanding Model Decisions

SLIDE 44

Understanding Decisions: Class Activation Maps

  • Use the penultimate Global Average Pooling (GAP) layer to get the class activation map
  • Highlights the discriminative regions that lead to a classification
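A minimal sketch of the CAM computation, assuming the standard formulation: the map for a class is the sum of the last convolutional feature maps, each weighted by the weight that connects its GAP output to that class. The feature maps and weights below are tiny made-up values.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class.

    class_weights holds the K dense-layer weights that connect the
    GAP outputs to the class of interest.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two made-up 2x2 feature maps.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # fires in the top-left corner
    [[0.0, 0.0], [0.0, 2.0]],  # fires in the bottom-right corner
]
bathroom_weights = [0.5, 1.5]  # hypothetical weights for one class
cam = class_activation_map(feature_maps, bathroom_weights)
print(cam)
```

In practice the map is upsampled to the input resolution and overlaid on the image as a heatmap.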

SLIDE 45-48

Insights With CAM

Swimming Pool misclassified as Bathroom

[Figures: input images and their CAMs]

The model uses rails to misidentify Pool as Bathroom.

SLIDE 49-52

Insights With CAM

Bathroom correct classification

[Figures: input images and their CAMs]

The model uses faucets to correctly identify Bathroom.

SLIDE 53

Learnings so far

  • Attribution techniques like CAM lend interpretability
  • CAM can drive data collection in specific directions

SLIDE 54

Tagging Next Steps

1. Add still more data
   a. Explore manual tagging options for training (Example: Amazon Mechanical Turk)
2. Add more classes
   a. Fitness Studio
   b. Conference Room
   c. Other

SLIDE 55

Image Aesthetics

SLIDE 56

Ground Truth Labels

For the NIMA model we need the “true” probability distribution over all classes for each image:

  • AVA dataset: we have frequencies over all classes for each image → normalize frequencies to get the “true” probability distribution

(6.151 / 1.334)
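The normalization step can be sketched as follows, assuming ten score classes (1 to 10) as in AVA. The vote counts are made up, and `mean_score` is a hypothetical helper that summarizes a distribution as a single score.

```python
def normalize_votes(vote_counts):
    """Turn per-class vote counts into a probability distribution."""
    total = sum(vote_counts)
    if total == 0:
        raise ValueError("image has no votes")
    return [count / total for count in vote_counts]


def mean_score(distribution):
    """Expected score, with classes numbered from 1."""
    return sum(score * p for score, p in enumerate(distribution, start=1))


# Made-up vote counts for scores 1..10 of one image.
votes = [0, 1, 2, 5, 10, 12, 6, 3, 1, 0]
distribution = normalize_votes(votes)
print(distribution)
print(mean_score(distribution))
```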

SLIDE 57

Iterations

We have gone through two iterations of the aesthetic model:

  • First iteration - Train on AVA Dataset
  • Second iteration - Fine-tune first iteration model on in-house labelled data

SLIDE 58

Results - first iteration

Aesthetic model - MobileNet

  • Linear correlation coefficient (LCC): 0.5987
  • Spearman's rank correlation coefficient (SRCC): 0.6072
  • Earth Mover's Distance: 0.2018
  • Accuracy (threshold at 5): 0.74
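A sketch of how the reported metrics could be computed from true and predicted mean scores. It makes two assumptions: images count as "high quality" when their mean score is strictly above 5, and ties in the Spearman ranks get average ranks. The slides do not spell out either detail, and the sample scores are made up.

```python
import math


def pearson(xs, ys):
    """Linear correlation coefficient (LCC)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def threshold_accuracy(true_means, pred_means, threshold=5.0):
    """Share of images placed on the same side of the threshold."""
    hits = sum((t > threshold) == (p > threshold)
               for t, p in zip(true_means, pred_means))
    return hits / len(true_means)


# Made-up mean scores for five images.
true_means = [3.0, 4.5, 5.5, 6.0, 7.2]
pred_means = [3.5, 5.2, 5.0, 6.5, 6.9]
print(pearson(true_means, pred_means))
print(spearman(true_means, pred_means))
print(threshold_accuracy(true_means, pred_means))
```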

SLIDE 59-63

Examples - first iteration

Aesthetic model

[Image-only slides]

SLIDE 64

Results - second iteration

  • We built a simple labeling application
  • http://image-aesthetic-labelling-app-nima.apps.eu.idealo.com/
  • ~ 12 people from idealo Reise and Data Science labeled 1000 hotel images for aesthetics
  • We fine-tuned the aesthetic model with 800 training images
  • Built an aesthetic test dataset with 200 images

SLIDE 65

Results - second iteration

Aesthetic model - MobileNet

  • Linear correlation coefficient (LCC): 0.7986
  • Spearman's rank correlation coefficient (SRCC): 0.7743
  • Earth Mover's Distance: 0.1236
  • Accuracy (threshold at 5): 0.85

SLIDE 66-70

Examples - second iteration

Aesthetic model

[Image-only slides]

SLIDE 71

Production

Aesthetic model

SLIDE 72

Production

Aesthetic model

  • To date we have scored ~280 million images
  • Distribution of scores (sample of 1 million scores): [figure not recoverable]

SLIDE 73

Production - Low Scores

Aesthetic model

SLIDE 74

Production - Medium Scores

Aesthetic model

SLIDE 75

Production - High Scores

Aesthetic model

SLIDE 76

Understanding Model Decisions

SLIDE 77

Convolutional Filter Visualisations

Layer 23

[Figures: MobileNet original vs. MobileNet Aesthetic]

SLIDE 78

Convolutional Filter Visualisations

Layer 51

[Figures: MobileNet original vs. MobileNet Aesthetic]

SLIDE 79

Convolutional Filter Visualisations

Layer 79

[Figures: MobileNet original vs. MobileNet Aesthetic]

SLIDE 80

Aesthetic Learnings

  • Hotel-specific labeled data is key: the aesthetic model improved markedly from 800 additional training samples
  • NIMA only requires a few samples to achieve good results (EMD loss)
  • Labeled hotel images are also important for the test set (model evaluation)
  • Training on GPU significantly reduced training time (~30-fold)

SLIDE 81

Aesthetics Next Steps

  • Continue labeling images for the aesthetic classifier
  • Introduce new desirable biases in labeling (e.g. low technical quality == low aesthetics)
  • Improve prediction speed of models (e.g. lighter CNN architectures)

SLIDE 82

Summary

SLIDE 83

Summary

  • Transfer learning allowed us to train image tagging and aesthetic classifiers with a few thousand domain-specific samples
  • Showed the importance of having noise-free data for quality predictions
  • Use of attribution & visualization techniques helps to understand model decisions and improve them

SLIDE 84

Check us out! #idealoTech

https://github.com/idealo
https://medium.com/idealo-tech-blog

SLIDE 85

We’re hiring!

Data Engineers, DevOps Engineers across different teams
Check out our job postings: jobs.idealo.de

SLIDE 86

Tanuj Jain
tanuj.jain@idealo.de
@tjainn

Christopher Lennan
christopher.lennan@idealo.de
@chris_lennan

SLIDE 87

THE END