SLIDE 1

We need a better perceptual similarity metric

Lubomir Bourdev
WaveOne, Inc.

CVPR Workshop and Challenge on Learned Compression
June 18th, 2018

SLIDE 2

Challenges in benchmarking compression

  • Measurement of perceptual similarity
  • Consideration of computational efficiency
  • Choice of color space
  • Aggregating results from multiple images
  • Ranking of R-D curves
  • Dataset bias
  • Many more!

SLIDE 6

Why is perceptual similarity critical now?

  • Perceptual similarity is not a new problem:

Mannos and Sakrison, 1974; Girod, 1993; Teo & Heeger, 1994; Eskicioglu and Fisher, 1995; Eckert and Bradley, 1998; Janssen, 2001; Wang, 2001; Wang and Bovik, 2002; Wang et al., 2002; Pappas & Safranek, 2000; Wang et al., 2003; Sheikh et al., 2005; Wang and Bovik, 2009; Wang et al., 2009; and many more.

  • Today we have new, much more powerful tools
  • Deep nets can exploit any weaknesses in the metrics
  • Nets get penalized if they do better than the metric
SLIDE 9

How do we measure perceptual quality?

  • Idea 1: Stick to traditional metrics:
  • MSE, PSNR
  • SSIM, MS-SSIM [Wang et al. 2003]

  • Simple, intuitive way to benchmark performance
  • However, they are far from ideal
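For concreteness, MSE and PSNR are trivial to compute; the following is a minimal pure-Python sketch (the `psnr` helper and the flat pixel-list representation are mine, not from the talk):

```python
import math

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel lists.

    PSNR is just log-scaled MSE: higher is better, identical images give
    infinity, and a maximally wrong 8-bit reconstruction gives 0 dB.
    """
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr([0, 0, 0], [255, 255, 255]))  # 0.0 (worst case)
print(psnr([10, 20, 30], [10, 20, 30]))  # inf (perfect reconstruction)
```

SSIM and MS-SSIM instead compare local means, variances, and covariances, which is exactly why the two metrics can diverge so badly on the isocontour examples that follow.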
SLIDE 10

Min PSNR on MS-SSIM isocontour

MS-SSIM: 0.99, PSNR: 11.6 dB (vs. target image)

SLIDE 11

Min PSNR on MS-SSIM isocontour

MS-SSIM: 0.997, PSNR: 14.4 dB (vs. target image)

SLIDE 12

Min MS-SSIM on PSNR isocontour

MS-SSIM: 0.15, PSNR: 30 dB (vs. target image)

SLIDE 14

Min MS-SSIM on PSNR isocontour

MS-SSIM: 0.90, PSNR: 40 dB (vs. target image)

Idea 2: Maybe we should maximize both?
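The dissociation is easy to reproduce in code: a constant brightness shift preserves structure, so a structural metric barely moves while PSNR drops sharply. Below is a toy sketch using a single global SSIM window (no Gaussian windowing or multi-scale pyramid, so it is a crude stand-in for MS-SSIM) with the standard constants c1 = (0.01·255)² and c2 = (0.03·255)²; the function names and example data are mine:

```python
import math

def psnr(x, y, max_val=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def global_ssim(x, y, max_val=255.0):
    # Single-window SSIM over the whole image: compare means (luminance)
    # and variances/covariance (contrast and structure).
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den

ramp = list(range(0, 200))          # a smooth gradient "image"
shifted = [p + 20 for p in ramp]    # same structure, +20 brightness

print(round(psnr(ramp, shifted), 1))         # ~22.1 dB: poor by PSNR
print(round(global_ssim(ramp, shifted), 3))  # ~0.98: near-perfect by SSIM
```

The shift leaves covariance equal to variance, so the structure term is exactly 1 and only the luminance term dips slightly, while PSNR sees a uniform error of 20 at every pixel.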

SLIDE 19

Is maximizing PSNR + MS-SSIM the right solution?

Two reconstructions at ~200 bytes:
  • Generic WaveOne (no GAN): MS-SSIM 0.93, PSNR 25.9
  • Domain-aware adversarial model: MS-SSIM 0.89, PSNR 23.0

Idea 3: Maybe we should use GANs?

SLIDE 22

GANs are very promising

  • Reconstructions are visually appealing (sometimes!)
  • Generic and intuitive objective:
  • Similarity is a function of how hard it is for an expert to distinguish the two images

  • Unfortunately, the loss is different for every network and evolves over time

SLIDE 25

What makes people prefer the right image?

(Image crops: “Looks like leaves” vs. “Looks like grass”)

Idea 4: Maybe we should use semantics?

SLIDE 27

Losses based on semantics

  • Intermediate layers of pre-trained classifiers capture semantics [Zeiler & Fergus 2013]
  • Significantly better correlation with mean opinion scores (MOS) than traditional metrics [Zhang et al., CVPR 2018]

  • However, arbitrary and over-complete:
  • Millions of parameters
  • Trained on an unrelated task
  • Which nets? Which layers? How to combine them?
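The recipe in [Zhang et al., CVPR 2018] is roughly: unit-normalize activations channel-wise at several layers, take squared differences, and average with per-layer weights. A schematic pure-Python sketch of that shape, with plain lists standing in for real VGG/AlexNet activations (all names and the dummy features here are mine):

```python
import math

def unit_normalize(v):
    # Channel-wise unit normalization of one feature vector.
    n = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / n for c in v]

def feature_distance(feats_x, feats_y, layer_weights):
    """Weighted distance between two stacks of per-layer feature vectors.

    feats_x / feats_y: list of layers; each layer is a list of feature
    vectors (one per spatial position). Real metrics use activations of a
    pre-trained net; here they are just plain lists.
    """
    total = 0.0
    for lx, ly, w in zip(feats_x, feats_y, layer_weights):
        layer_sum = 0.0
        for vx, vy in zip(lx, ly):
            ux, uy = unit_normalize(vx), unit_normalize(vy)
            layer_sum += sum((a - b) ** 2 for a, b in zip(ux, uy))
        total += w * layer_sum / len(lx)
    return total

# Identical feature stacks are at distance zero.
f = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5]]]
print(feature_distance(f, f, [0.5, 0.5]))  # 0.0
```

The slide's objections show up directly in this sketch: the choice of layers, weights, and backbone network is entirely arbitrary.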
SLIDE 30

Idea 5: Attention-driven metrics

(Heatmaps: where people look vs. where the bandwidth goes)

  • All existing metrics treat every pixel equally
  • Clearly suboptimal
  • But defining importance is another open problem
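Given an importance map, the extension itself is mechanical, e.g. a weighted MSE inside PSNR; the hard part the slide points at is producing the weights. A hypothetical sketch (`weighted_psnr` and the toy saliency maps are mine):

```python
import math

def weighted_psnr(x, y, weights, max_val=255.0):
    """PSNR with a per-pixel importance map (e.g. from a saliency model).

    Errors on high-weight pixels dominate; near-zero-weight pixels are
    almost ignored. The saliency map itself is an input here - producing
    one is the open problem.
    """
    num = sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    wmse = num / sum(weights)
    return float("inf") if wmse == 0 else 10.0 * math.log10(max_val ** 2 / wmse)

x = [100, 100, 100, 100]
y = [100, 100, 100, 180]         # one corrupted pixel
ignored = [1.0, 1.0, 1.0, 0.1]   # viewers barely look at pixel 3
salient = [0.1, 0.1, 0.1, 1.0]   # viewers stare straight at it

# The same error is judged far more harshly when it lands on a salient pixel.
print(weighted_psnr(x, y, ignored) > weighted_psnr(x, y, salient))  # True
```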
SLIDE 33

Idea 6: Task-driven metrics

  • A/B-test compression variants, with the measure chosen by the product feature:
  • Goal: social sharing → Measure: user engagement
  • Goal: ML on the cloud → Measure: performance on the ML task

  • Solves the “right” problem
  • However: not accessible, not repeatable, not back-propagatable

SLIDE 36

Idea 7: When all else fails, ask the experts

  • Humans are the gold standard for perceptual fidelity

  • Challenges:
  • Hard to construct objective tests
  • Can’t back-propagate through humans
  • Expensive to evaluate (both time and money)
  • Non-repeatable

“On a scale from 0 to 1, how different are these two pixels?
 Only another 999,999 comparisons to go!”
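In practice, human studies avoid absolute ratings like the one in the joke above and instead use two-alternative forced choice (2AFC), aggregating the pairwise preferences into per-codec scores, for example with a Bradley-Terry model. A minimal sketch of the classic iterative (MM) fit; the data and names are illustrative:

```python
def bradley_terry(wins, n_items, iters=200):
    """Fit Bradley-Terry strengths from pairwise 2AFC counts.

    wins[i][j] = number of times raters preferred codec i over codec j.
    Returns strengths normalized to sum to 1 (higher = more preferred),
    via the classic minorize-maximize update
    p_i <- W_i / sum_j N_ij / (p_i + p_j).
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            w_i = sum(wins[i][j] for j in range(n_items) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [v / s for v in new_p]
    return p

# Codec 0 beats codec 1 in 9 of 10 trials; codec 1 beats codec 2 in 8 of 10.
wins = [[0, 9, 10],
        [1, 0, 8],
        [0, 2, 0]]
scores = bradley_terry(wins, 3)
print(scores[0] > scores[1] > scores[2])  # True
```

This addresses the "hard to construct objective tests" bullet, but not the others: the fitted scores are still expensive to collect, non-repeatable, and give no gradient to train against.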

SLIDE 39

Conclusion

  • The impossible wishlist for an ideal quality metric:
  • Simple and intuitive
  • Repeatable
  • Back-propagatable
  • Content-aware
  • Efficient
  • Importance-driven
  • Task-aware
  • Improving quality metrics is critical in the neural net age

The wrong metrics lead to good solutions to the wrong problem!

SLIDE 40

http://wave.one

Thanks to my team!

The WaveOne team, compressed to 0.01 BPP,
 using a GAN specializing in frontal faces