HDR imaging using Deep Learning
Mukul Khanna, IIT Gandhinagar
(PowerPoint PPT presentation)


SLIDE 1

HDR imaging

using Deep Learning

Mukul Khanna, IIT Gandhinagar

SLIDE 2

HDR

SLIDE 3

High Dynamic Range

SLIDE 4

Dynamic Range

SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8

SLIDE 9–12

Introduction

  • Common digital cameras cannot capture the wide range of light intensity levels in a natural scene.

  • This can lead to a loss of pixel information in under-exposed and over-exposed regions of an image, resulting in a low dynamic range (LDR) image.

SLIDE 13

Courtesy: OpenHDR (viewer.openhdr.org)

SLIDE 14

Introduction

  • To recover the lost information and represent the wide range of illuminance in an image, High Dynamic Range (HDR) images need to be generated.

SLIDE 15

HDR IMAGE ENCODING

SLIDE 16–20

HDR image encoding

  • Commonly, the images that we see on our phones and computers are 8-bit (per channel) encoded RGB images.

  • Each pixel's value is stored using a 24-bit representation, 8 bits for each channel (R, G, B). Each channel of a pixel has a range of 0–255 intensity values.

  • The problem with this encoding is that it cannot contain the large dynamic range of natural scenes. It only allows a range of 0–255 (integers only) for the intensity range, which is not sufficient.

  • To solve this problem, HDR images are encoded using 32-bit floating-point numbers for each channel. This captures the wide, uncapped range of HDR images.

  • There are various formats for writing HDR images, the most common being .hdr and .exr.
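The difference between the two encodings can be sketched in a few lines of NumPy (the radiance values below are illustrative, not from the slides):

```python
import numpy as np

# 8-bit LDR: only 256 integer levels per channel, hard-capped at 255.
ldr = np.array([0, 128, 255], dtype=np.uint8)

# 32-bit float HDR: fractional, uncapped radiance values are representable.
hdr = np.array([0.001, 1.0, 4200.5], dtype=np.float32)

# Squeezing HDR into 8 bits clips everything above the displayable range,
# which is exactly the information loss described above.
quantised = (np.clip(hdr, 0.0, 1.0) * 255).astype(np.uint8)
```

Here both 1.0 and 4200.5 collapse to the same 8-bit value, while the float encoding keeps them distinct.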

SLIDE 21

DISPLAYING HDR IMAGES

SLIDE 22

Displaying HDR images

  • Most off-the-shelf display devices are incapable of delivering the wide, uncapped range of HDR images.

  • They expect the input source to be in the three-channel 24-bit (3×8) RGB format.

  • For this reason, the wide dynamic range needs to be toned down to fit in the 0–255 range of the RGB format.

SLIDE 23

Tone-mapping

  • Tone mapping addresses the problem of strong contrast reduction from the scene radiance to the displayable range while preserving the image details and color appearance important to appreciate the original scene content.
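As an illustration (not the operator used later in this talk), the global Reinhard curve is one common tone-mapping function; the gamma value below is a typical display assumption:

```python
import numpy as np

def reinhard_tonemap(hdr, gamma=2.2):
    """Map radiance in [0, inf) into [0, 1); highlights are compressed most."""
    ldr = hdr / (1.0 + hdr)       # strong contrast reduction for bright pixels
    return ldr ** (1.0 / gamma)   # gamma-encode for an 8-bit display

radiance = np.array([0.01, 1.0, 100.0])
display = reinhard_tonemap(radiance)
```

Note that the mapping is monotonic: relative ordering of intensities (and so image detail) is preserved even though the absolute range is compressed.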

SLIDE 24

HDR IMAGE GENERATION

SLIDE 25

APPROACHES

  • Non-learning based
  • Learning based
SLIDE 26

Non learning based approach

SLIDE 27–29

Non learning based approach

  • Conventionally, HDR images are developed by merging images captured at different exposures.

  • These images are merged using traditional image processing algorithms and are saved as a single HDR image, in a way that the best portions of each image make it to the final image.
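A minimal sketch of such a merge, assuming a linear camera response and a simple "hat" weighting so that well-exposed pixels dominate (the function name and weighting are illustrative, not the exact algorithm on the slides):

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge linear LDR frames (values in [0, 1]) into one HDR radiance map."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # hat weight: trusts mid-tones most
        num += w * (img / t)               # per-frame radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)    # weighted average of the estimates

# A static pixel of true radiance 0.2: a 1 s exposure records 0.2 and a
# 2 s exposure records 0.4, so both frames vote for the same radiance.
merged = merge_exposures([np.array([0.2]), np.array([0.4])], [1.0, 2.0])
```

For a static scene both frames agree on the radiance, so the merge recovers it exactly; motion between frames breaks this agreement, which is the caveat discussed next.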

SLIDE 30–35

Caveats

  • Conventional approaches aren't robust enough when it comes to dynamic scenes with motion between the bracketed frames.

  • They rely on optical flow to account for motion between frames.

  • But optical flow is not accurate.

  • This can result in ghosting artifacts in the final image.
SLIDE 36

Learning based approach

SLIDE 37–39

Learning based approach

  • Learning based approaches harness the capabilities of deep neural network architectures as function approximators to learn LDR to HDR representations.

  • Such networks can do better due to:
      ○ improved learning-based flow mechanisms
      ○ hallucinating HDR content in saturated regions when LDR input is limited
      ○ an optimised, quick, low-memory alternative
SLIDE 40–42

Learning based approach

  • Learning based approaches can be broken down into two types:
      ○ Single LDR input
      ○ Multiple LDR inputs
SLIDE 43–48

Learning based - multiple LDR inputs

  • Multiple exposure input
  • More dynamic range is provided to the network
  • An explicit mechanism is required for motion compensation
  • Better results
  • But the input is a constraint
SLIDE 49

Single LDR input approaches

SLIDE 50

Learning based - single LDR input

  • More challenging scenario
  • Limited dynamic range information in the input
  • More important for real-life situations
  • Heavily relies on the ability of deep CNNs to hallucinate content in saturated image regions

SLIDE 51

Related work

SLIDE 52

HDRCNN

  • G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, "HDR image reconstruction from a single exposure using deep CNNs," ACM Transactions on Graphics (TOG), vol. 36, no. 6, p. 178, 2017.

SLIDE 54

Deep reverse tone mapping

  • Y. Endo, Y. Kanamori, and J. Mitani, "Deep reverse tone mapping," ACM Trans. Graph., vol. 36, no. 6, Article 177, 2017.
SLIDE 55

ExpandNet

  • D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content," in Computer Graphics Forum, vol. 37, pp. 37–49, Wiley Online Library, 2018.

SLIDE 56

Caveats

  • Not end-to-end trainable

and/or

  • Only overexposed regions are recovered

and/or

  • High network parameter count
SLIDE 57

Our approach

SLIDE 58–59

Feedback networks

  • Feedback systems are adopted to influence the input based on the generated output.

  • Initial low-level features are guided by the high-level features using a hidden state of a Recurrent Neural Network over n iterations.
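The idea can be sketched as a toy recurrent update (the sizes and random weights below are arbitrary placeholders, not the actual architecture): the hidden state carries high-level information from one pass back to guide the low-level features of the next.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.standard_normal((8, 8)) * 0.1   # processes the (fixed) input features
W_fb = rng.standard_normal((8, 8)) * 0.1   # processes the fed-back hidden state

x = rng.standard_normal(8)   # low-level features of the LDR input
h = np.zeros(8)              # hidden state: empty before the first iteration

estimates = []
for _ in range(4):                     # n = 4 feedback iterations
    h = np.tanh(W_in @ x + W_fb @ h)   # high-level output guides the next pass
    estimates.append(h.copy())         # each iteration emits a refined estimate
```

The input x never changes; only the fed-back state does, so successive estimates differ and (in a trained network) improve over the iterations.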

SLIDE 60–63

Feedback networks
(image-only slides)

SLIDE 64

Model architecture

SLIDE 65

Feedback block

SLIDE 66

Dilated Dense Block (DDB)

SLIDE 67

Loss function

(diagram: generated HDR and ground-truth HDR feeding the loss)

SLIDE 68–71

Loss function

  • Loss calculated directly on HDR images is misrepresented due to the dominance of high intensity values in images with a wide dynamic range.

  • Therefore, we tonemap the generated and the ground-truth HDR images to compress the wide intensity range before calculating the loss.

  • We use the μ-law for tonemapping.

  • L1 loss and Perceptual loss (λ = 0.1)
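The μ-law compression and the tonemapped L1 term can be written as follows; μ = 5000 is a commonly used value and an assumption here, and the perceptual term is omitted for brevity:

```python
import numpy as np

MU = 5000.0  # mu-law compression constant; a common choice, assumed here

def mu_law(hdr):
    """mu-law tonemapping: T(H) = log(1 + mu*H) / log(1 + mu), for H in [0, 1]."""
    return np.log1p(MU * hdr) / np.log1p(MU)

def tonemapped_l1(pred_hdr, gt_hdr):
    """L1 loss on tonemapped images, so bright pixels do not dominate the loss."""
    return np.abs(mu_law(pred_hdr) - mu_law(gt_hdr)).mean()
```

The curve maps 0 to 0 and 1 to 1 but expands the dark end of the range, which is exactly the compensation the bullet above calls for.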
SLIDE 72

Experiments

SLIDE 73

Datasets

The performance of the network was evaluated on two datasets:

SLIDE 74–75

  • CityScene dataset
      ○ 128 × 64 size
      ○ Training set: 39,460 LDR-HDR image pairs
      ○ Testing set: 1,672 pairs

  • Curated dataset
      ○ 256 × 256 size
      ○ Training set: 11,262 LDR-HDR image pairs
      ○ Testing set: 500 image pairs (512 × 512)

SLIDE 76

Evaluation metrics

  • PSNR score (dB) - Peak Signal-to-Noise Ratio
  • SSIM score - Structural Similarity Index
  • HDR-VDP-2 Q-score
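PSNR, the first metric, has a closed form that is easy to state in code (SSIM and HDR-VDP-2 need their reference implementations):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
score = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
```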
SLIDE 77

Feedback mechanism analysis

SLIDE 78

Results

SLIDE 79–82, 85: Qualitative evaluation (panels: LDR, generated, ground truth)

SLIDE 83, 86: Qualitative comparisons (panels: LDR, FHDR, ground truth, DRTMO)

SLIDE 84, 87: Qualitative comparisons (panels: LDR, FHDR, ground truth, HDRCNN)

SLIDE 88

Quantitative evaluation

SLIDE 89

HDR VIDEO

SLIDE 90

HDR VIDEO GENERATION

SLIDE 91–95

HDR Video generation

  • LDR video -> HDR video
      ○ Single exposure LDR to HDR
      ○ Multiple exposure LDR sequences to HDR

  • Temporal coherency is crucial, because neural networks are vulnerable to producing highly varied outputs for minutely different inputs.

  • RNNs and LSTMs can propagate temporal information across sequences.

  • Adversarial training using temporal discriminators.
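One simple way to quantify temporal coherency (a naive sketch, without the motion compensation a real system would need) is an L1 penalty between consecutive generated frames:

```python
import numpy as np

def temporal_l1(frames):
    """Mean L1 difference between consecutive frames of a generated sequence.

    Zero for a perfectly static sequence; large when the network's output
    flickers between nearly identical inputs.
    """
    return float(np.mean([np.abs(b - a).mean()
                          for a, b in zip(frames, frames[1:])]))
```

A penalty like this can be added to the per-frame reconstruction loss, or enforced implicitly by a temporal discriminator as in the adversarial option above.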
SLIDE 96

Conclusion

SLIDE 97–98

Conclusion

  • HDR content is important.
  • Deep learning helps: it outperforms traditional approaches, again.
SLIDE 99

Thank you

Mukul Khanna

mukul18khanna@gmail.com