Learning Representations for Automatic Colorization
Gustav Larsson, Michael Maire, Greg Shakhnarovich (TTI Chicago / University of Chicago), ECCV 2016


slide-1
SLIDE 1

Learning Representations for Automatic Colorization

Gustav Larsson, Michael Maire, Greg Shakhnarovich

TTI Chicago / University of Chicago ECCV 2016

slide-2
SLIDE 2

Colorization Let us first define “colorization”

slide-3
SLIDE 3

Colorization Definition 1: The inverse of desaturation. Original

slide-4
SLIDE 4

Colorization Definition 1: The inverse of desaturation. Original Desaturate

Grayscale

slide-5
SLIDE 5

Colorization Definition 1: The inverse of desaturation.

Grayscale

slide-6
SLIDE 6

Colorization Definition 1: The inverse of desaturation. Original Colorize

Grayscale

slide-7
SLIDE 7

Colorization Definition 1: The inverse of desaturation. (Underconstrained!) Original Colorize

Grayscale
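Desaturation itself is a simple many-to-one map, which is what makes inverting it underconstrained. As an illustration (not the talk's own code), a standard luma conversion with Rec. 601 weights:

```python
import numpy as np

def desaturate(rgb):
    # Rec. 601 luma weights: a weighted sum of R, G, B.
    # Many different colors map to the same gray value, so the
    # inverse map (colorization) is underconstrained.
    return rgb @ np.array([0.299, 0.587, 0.114])
```

For example, a pure-green pixel and a gray pixel of value 0.587 desaturate to the same intensity, so no colorizer can recover the original color from the gray value alone.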

slide-8
SLIDE 8

Colorization Definition 2: An inverse of desaturation, that...

Grayscale

slide-9
SLIDE 9

Colorization Definition 2: An inverse of desaturation, that...

Our Method

Colorize

Grayscale

... is plausible and pleasing to a human observer.

slide-10
SLIDE 10

Colorization Definition 2: An inverse of desaturation, that...

Our Method

Colorize

Grayscale

... is plausible and pleasing to a human observer.

  • Def. 1: Training + Quantitative Evaluation
  • Def. 2: Qualitative Evaluation
slide-11
SLIDE 11

Manual colorization I thought I would give it a quick try...

slide-12
SLIDE 12

Manual colorization

Grass texture

Low-level features

slide-13
SLIDE 13

Manual colorization

Grass texture Tree

Mid-level features

slide-14
SLIDE 14

Manual colorization

Grass texture Tree Landscape scene

High-level features

slide-15
SLIDE 15

Manual colorization Grass is green

slide-16
SLIDE 16

Manual colorization Sky is blue

slide-17
SLIDE 17

Manual colorization Mountains are... brown?

slide-18
SLIDE 18

Manual colorization Manual (≈ 15 s)

slide-19
SLIDE 19

Manual colorization Manual (≈ 15 s) Manual (≈ 3 min)

slide-20
SLIDE 20

Manual colorization Manual (≈ 15 s) Manual (≈ 3 min) Automatic (< 1 s)

Our Method

slide-21
SLIDE 21

Motivation

  • 1. Colorize old B&W photographs
slide-22
SLIDE 22

Motivation

  • 1. Colorize old B&W photographs
  • 2. Proxy for visual understanding
  • Learning representations useful for other tasks
slide-23
SLIDE 23

Related work Scribble-based methods

Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output

Transfer-based methods

Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output

Prediction-based methods

Deshpande et al. (2015); Cheng et al. (2015); Iizuka et al. (2016); Zhang et al. (2016); Larsson et al. (2016) Input Output

slide-24
SLIDE 24

Related work Scribble-based methods

Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output

Transfer-based methods

Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output

Prediction-based methods

Deshpande et al. (2015); Cheng et al. (2015) ← ICCV; Iizuka et al. (2016); Zhang et al. (2016); Larsson et al. (2016) Input Output

slide-25
SLIDE 25

Related work Scribble-based methods

Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output

Transfer-based methods

Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output

Prediction-based methods

Deshpande et al. (2015); Cheng et al. (2015); Iizuka et al. (2016) ← SIGGRAPH; Zhang et al. (2016); Larsson et al. (2016) Input Output

slide-26
SLIDE 26

Related work Scribble-based methods

Levin et al. (2004); Huang et al. (2005); Qu et al. (2006); Luan et al. (2007) Input Output

Transfer-based methods

Welsh et al. (2002); Irony et al. (2005); Charpiat et al. (2008); Morimoto et al. (2009); Chia et al. (2011) Reference Input Output

Prediction-based methods

Deshpande et al. (2015); Cheng et al. (2015); Iizuka et al. (2016); Zhang et al. (2016); Larsson et al. (2016) ← ECCV Input Output

slide-27
SLIDE 27

Design principles

[Diagram: grayscale image input → predicted distribution p]

slide-28
SLIDE 28

Design principles

  • Semantic knowledge

[Diagram: grayscale image input → predicted distribution p]

slide-29
SLIDE 29

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → predicted distribution p]

slide-30
SLIDE 30

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → predicted distribution p]

slide-31
SLIDE 31

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features → Zoom-out/Hypercolumn

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → hypercolumn → predicted distribution p]

slide-32
SLIDE 32

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features → Zoom-out/Hypercolumn
  • Colorization not unique

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → hypercolumn → predicted distribution p]

slide-33
SLIDE 33

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features → Zoom-out/Hypercolumn
  • Colorization not unique → Predict histograms

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → hypercolumn h → fc1 → hue and chroma histograms p]

slide-34
SLIDE 34

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features → Zoom-out/Hypercolumn
  • Colorization not unique → Predict histograms

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → hypercolumn h → fc1 → hue and chroma histograms p, decoded via expectation / median]
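The expectation and median decoders applied to the predicted hue/chroma histograms can each be sketched in a line or two. A toy NumPy version (the bin centers and example histogram are hypothetical, not values from the paper):

```python
import numpy as np

def hist_expectation(p, bin_centers):
    # Point estimate: expected value of the predicted histogram.
    return float(np.dot(p, bin_centers))

def hist_median(p, bin_centers):
    # Point estimate: first bin where the cumulative mass reaches 0.5.
    cdf = np.cumsum(p)
    return float(bin_centers[np.searchsorted(cdf, 0.5)])

# Hypothetical symmetric 5-bin histogram over [0, 1].
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
centers = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
```

For this symmetric example both decoders return the central bin value, 0.5; on multimodal histograms they can differ, which is why having the full distribution is more flexible than regressing a single value.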

slide-35
SLIDE 35

Design principles

  • Semantic knowledge → Leverage ImageNet-based classifier
  • Low-level/high-level features → Zoom-out/Hypercolumn
  • Colorization not unique → Predict histograms

[Diagram: grayscale image → VGG-16-Gray (conv1_1 … conv5_3, conv6 (fc6), conv7 (fc7)) → hypercolumn h → fc1 → hue and chroma histograms p; combined with the input lightness to produce the output color image, compared against ground truth]

slide-36
SLIDE 36

Histogram prediction The histogram representation is rich and flexible:


slide-42
SLIDE 42

Training

  • Start with an ImageNet pretrained network
slide-43
SLIDE 43

Training

  • Start with an ImageNet pretrained network
  • Adapt to grayscale input
slide-44
SLIDE 44

Training

  • Start with an ImageNet pretrained network
  • Adapt to grayscale input
  • Fine-tune for colorization with log-loss on ImageNet without labels
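Adapting a pretrained RGB network to grayscale input can be done by collapsing the first convolution's input channels. A sketch of one common scheme (the talk's VGG-16-Gray may differ in detail; the weight shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical pretrained first-layer weights: (out_channels, 3, kh, kw), RGB order.
w_rgb = rng.standard_normal((64, 3, 3, 3))

# Summing the three input-channel filters yields a single-channel filter that
# produces exactly the same response the RGB filter would on an image whose
# R, G, and B channels all equal the grayscale input.
w_gray = w_rgb.sum(axis=1, keepdims=True)  # shape (64, 1, 3, 3)
```

The rest of the network is unchanged, so all the ImageNet-learned features carry over before fine-tuning begins.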

slide-45
SLIDE 45

Sparse Training Trained as a fully convolutional network with:

slide-46
SLIDE 46

Sparse Training Trained as a fully convolutional network with: Dense hypercolumns

  • Low-level layers are upsampled
  • ✗ High memory footprint
slide-47
SLIDE 47

Sparse Training Trained as a fully convolutional network with: Dense hypercolumns

  • Low-level layers are upsampled
  • ✗ High memory footprint

Sparse hypercolumns

  • Direct bilinear sampling
  • ✓ Low memory footprint
slide-48
SLIDE 48

Sparse Training Trained as a fully convolutional network with: Dense hypercolumns

  • Low-level layers are upsampled
  • ✗ High memory footprint

Sparse hypercolumns

  • Direct bilinear sampling
  • ✓ Low memory footprint

Source code available for Caffe and TensorFlow
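The sparse readout amounts to bilinear sampling of each layer's feature map at a handful of image locations instead of upsampling whole layers. An illustrative NumPy sketch (function names are ours, not from the released Caffe/TensorFlow code, and the coordinate scaling ignores half-pixel offsets for simplicity):

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    # Sample a (C, H, W) feature map at a continuous location (y, x)
    # by interpolating its four surrounding grid values.
    C, H, W = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * fmap[:, y0, x0]
            + (1 - dy) * dx * fmap[:, y0, x1]
            + dy * (1 - dx) * fmap[:, y1, x0]
            + dy * dx * fmap[:, y1, x1])

def sparse_hypercolumn(feature_maps, y, x, image_hw):
    # Concatenate one bilinear sample per layer at a single image location,
    # avoiding the memory cost of densely upsampling every layer.
    H, W = image_hw
    cols = []
    for fmap in feature_maps:
        _, h, w = fmap.shape
        cols.append(bilinear_sample(fmap, y * h / H, x * w / W))
    return np.concatenate(cols)
```

Training on a sparse set of pixels per image keeps the memory footprint low while still giving unbiased gradients for the per-pixel loss.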

slide-49
SLIDE 49

Comparison: Previous work

Significant improvement over the state of the art:

[Plot: frequency vs. PSNR (10–35 dB), Cheng et al. vs. our method]

  • vs. Cheng et al. (2015)

[Plot: % of pixels vs. RMSE (αβ, 0–1): no colorization, Welsh et al., Deshpande et al., ours, Deshpande et al. (GTH), ours (GTH)]

  • vs. Deshpande et al. (2015)
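For reference, the PSNR metric used in these comparisons is computed from mean squared error. A minimal sketch (assuming 8-bit intensities with peak value 255; not the evaluation code from any of the compared papers):

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    # Peak signal-to-noise ratio in decibels; higher is better.
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```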
slide-50
SLIDE 50

Comparison: Concurrent work

Model         MSE     PSNR
Zhang et al.  270.17  21.58
Baig et al.   194.12  23.72
Ours          154.69  24.80

Source: Baig and Torresani (2016) [Arxiv]

slide-51
SLIDE 51

Comparison: Concurrent work

Model         MSE     PSNR
Zhang et al.  270.17  21.58
Baig et al.   194.12  23.72
Ours          154.69  24.80

Source: Baig and Torresani (2016) [Arxiv]

Model                 AuC CMF (%)        VGG Top-1         Turk "Labeled Real" (%)
                      non-rebal  rebal   Class. Acc. (%)   mean    std
Ground Truth          100.00     100.00  68.32             50.00   –
Zhang et al.          91.57      65.12   56.56             25.16   2.26
Zhang et al. (rebal)  89.50      67.29   56.01             32.25   2.41
Ours                  91.70      65.93   59.36             27.24   2.31

Source: Zhang et al. (2016) [ECCV]

slide-52
SLIDE 52
slide-53
SLIDE 53
slide-54
SLIDE 54

Examples

Input Our Method Ground-truth Input Our Method Ground-truth

slide-55
SLIDE 55

Examples B&W photographs

slide-56
SLIDE 56

Examples Failure modes

slide-57
SLIDE 57

Self-supervision (ongoing work)

  • 1. Train colorization from scratch
slide-58
SLIDE 58

Self-supervision (ongoing work)

  • 1. Train colorization from scratch

Initialization       RMSE   PSNR
ImageNet Classifier  0.299  24.45
Random               0.311  24.25

How much does ImageNet pretraining help colorization?

slide-59
SLIDE 59

Self-supervision (ongoing work)

  • 1. Train colorization from scratch

Initialization       RMSE   PSNR
ImageNet Classifier  0.299  24.45
Random               0.311  24.25

How much does ImageNet pretraining help colorization?

  • 2. Use network for other tasks, such as semantic segmentation:
slide-60
SLIDE 60

Self-supervision (ongoing work)

  • 1. Train colorization from scratch

Initialization       RMSE   PSNR
ImageNet Classifier  0.299  24.45
Random               0.311  24.25

How much does ImageNet pretraining help colorization?

  • 2. Use network for other tasks, such as semantic segmentation:

Initialization       X_ImageNet  Y_ImageNet  mIU (%)
ImageNet Classifier  ✓           ✓           64.0
Random                                       32.5

Pascal VOC 2012 segmentation val

slide-61
SLIDE 61

Self-supervision (ongoing work)

  • 1. Train colorization from scratch

Initialization       RMSE   PSNR
ImageNet Classifier  0.299  24.45
Random               0.311  24.25

How much does ImageNet pretraining help colorization?

  • 2. Use network for other tasks, such as semantic segmentation:

Initialization       X_ImageNet  Y_ImageNet  mIU (%)
ImageNet Classifier  ✓           ✓           64.0
ImageNet Colorizer   ✓                       50.2
Random                                       32.5

Pascal VOC 2012 segmentation val
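The mIU numbers here are mean intersection-over-union. A minimal sketch of the metric on toy label arrays (not the VOC evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # Mean intersection-over-union across classes; classes absent from
    # both prediction and ground truth are skipped.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```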

slide-62
SLIDE 62

Summary

  • Fully automatic colorization with state-of-the-art results
  • Efficient training via sparse sampling of hypercolumns
  • Promising proxy task for visual representation learning

See you at poster O-3A-04 upstairs!

Source code and demo available online: colorize.ttic.edu (GitHub: gustavla/autocolorize)

pip install autocolorize
autocolorize grayscale.png -o color.png


slide-65
SLIDE 65

References

Baig, M. H. and Torresani, L. (2016). Colorization for image compression. arXiv preprint arXiv:1606.06314.
Charpiat, G., Hofmann, M., and Schölkopf, B. (2008). Automatic image colorization via multimodal predictions. In ECCV.
Cheng, Z., Yang, Q., and Sheng, B. (2015). Deep colorization. In ICCV.
Chia, A. Y.-S., Zhuo, S., Gupta, R. K., Tai, Y.-W., Cho, S.-Y., Tan, P., and Lin, S. (2011). Semantic colorization with internet images. ACM Transactions on Graphics (TOG), 30(6).
Deshpande, A., Rock, J., and Forsyth, D. (2015). Learning large-scale automatic image colorization. In ICCV.
Huang, Y.-C., Tung, Y.-S., Chen, J.-C., Wang, S.-W., and Wu, J.-L. (2005). An adaptive edge detection based colorization algorithm and its applications. In ACM International Conference on Multimedia.
Iizuka, S., Simo-Serra, E., and Ishikawa, H. (2016). Let there be Color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics (Proc. of SIGGRAPH 2016), 35(4).
Irony, R., Cohen-Or, D., and Lischinski, D. (2005). Colorization by example. In Eurographics Symposium on Rendering.
Larsson, G., Maire, M., and Shakhnarovich, G. (2016). Learning representations for automatic colorization. In ECCV.
Levin, A., Lischinski, D., and Weiss, Y. (2004). Colorization using optimization. ACM Transactions on Graphics (TOG), 23(3).
Luan, Q., Wen, F., Cohen-Or, D., Liang, L., Xu, Y.-Q., and Shum, H.-Y. (2007). Natural image colorization. In Eurographics Conference on Rendering Techniques.
Morimoto, Y., Taguchi, Y., and Naemura, T. (2009). Automatic colorization of grayscale images using multiple images on the web. In SIGGRAPH Posters.
Qu, Y., Wong, T.-T., and Heng, P.-A. (2006). Manga colorization. ACM Transactions on Graphics (TOG), 25(3).
Welsh, T., Ashikhmin, M., and Mueller, K. (2002). Transferring color to greyscale images. ACM Transactions on Graphics (TOG), 21(3).
Zhang, R., Isola, P., and Efros, A. A. (2016). Colorful image colorization. In ECCV.