SLIDE 1

NVGAZE: ANATOMY-AWARE AUGMENTATION FOR LOW-LATENCY, NEAR-EYE GAZE ESTIMATION

Michael Stengel, Alexander Majercik

SLIDE 2

AGENDA

Part I (Michael) 25 min

  • Eye tracking for near-eye displays
  • Synthetic dataset generation
  • Network training and results

Part II (Alexander) 15 min

  • Fast Network Inference using cuDNN
  • Deep Learning Best Practices

SLIDE 3

NVGAZE TEAM

Michael Stengel (New Experiences Group)
Alexander Majercik (New Experiences Group)
Joohwan Kim (New Experiences Group)
Shalini De Mello (Perception & Learning)
Morgan McGuire (New Experiences Group)
David Luebke (VP of Graphics Research)
Samuli Laine (New Experiences Group)

SLIDE 4

EYE TRACKING FOR NEAR-EYE DISPLAYS

Michael Stengel

SLIDE 5

EYE TRACKING IN VR/AR

Applications: avatars, foveated rendering, dynamic streaming, attention studies, computational displays, perception, user state evaluation, health care, gaze interaction, periphery.

[Image credits: Eisko.com, arpost.co, Vedamurthy et al., Sitzmann et al., Patney et al., Sun et al., eyegaze.com, Padmanaban et al.]

SLIDE 6

SUBTLE GAZE GUIDANCE
Enlarging virtual spaces through redirected walking

[Sun et al., SIGGRAPH '18]

SLIDE 7

FOVEATED RENDERING
Accelerating real-time computer graphics

SLIDE 8

ACCOMMODATION SIMULATION
Enhancing depth perception

SLIDE 9

GAZE-AS-INPUT

SLIDE 10

LABELED REALITY

SLIDE 11

EYE TRACKING IN VR/AR

  • How do video-based eye tracking systems work?

WORKING PRINCIPLE

Pipeline: eye capture (an eye camera behind the display lens images the face/eye) → pupil localization (x, y) → domain mapping using calibration parameters → 3D gaze vector or 2D point of regard.
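To make the classic pipeline concrete, here is a minimal numpy sketch of dark-pupil localization followed by a calibration mapping. This illustrates the general principle only, not the NVGaze implementation; the intensity threshold and the affine calibration matrix are assumed values for illustration.

```python
import numpy as np

def locate_pupil(eye_image: np.ndarray, threshold: int = 40) -> tuple[float, float]:
    """Estimate the pupil center as the centroid of dark pixels.

    eye_image: 2D uint8 array from the IR eye camera.
    threshold: illustrative cutoff; real trackers adapt it per frame.
    """
    ys, xs = np.nonzero(eye_image < threshold)  # dark-pupil pixels
    if len(xs) == 0:
        raise ValueError("no pupil candidate pixels found")
    return float(xs.mean()), float(ys.mean())

def map_to_screen(pupil_xy: tuple[float, float], calib: np.ndarray) -> np.ndarray:
    """Map a pupil center to a 2D point of regard.

    calib: 2x3 affine matrix assumed here for simplicity; polynomial
    mappings are common in practice (see the calibration slide later).
    """
    x, y = pupil_xy
    return calib @ np.array([x, y, 1.0])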

SLIDE 12

ON-AXIS VS OFF-AXIS GAZE TRACKING

[Comparison images: off-axis camera view vs. on-axis camera view]

SLIDE 13

ON-AXIS GAZE TRACKING

Eye tracking prototype for Virtual Reality headsets

Modded GearVR with integrated gaze tracking. Components for on-axis eye tracking integration: eye tracking cameras, dichroic mirrors, infrared illumination, VR glasses frame.

SLIDE 14

ON-AXIS GAZE TRACKING

Eye tracking prototype for VR headsets

SLIDE 15

ON-AXIS EYE TRACKING CAMERA VIEW

SLIDE 16

OFF-AXIS GAZE TRACKING

Eye tracking prototype for VR headsets

[Schematic: eye, camera, display, lens]

SLIDE 17

OFF-AXIS GAZE TRACKING

Eye tracking prototype for VR headsets

SLIDE 18

EYE TRACKING IN VR/AR

CHALLENGES FOR MOBILE VIDEO-BASED EYE TRACKERS

  • Changing illumination conditions (over-exposure and hard shadows)
  • Occlusions from eyelashes, skin, blinks, and glasses frames
  • Varying eye appearance: flesh tones, mascara and other make-up
  • Reflections
  • Camera view and noise (blur, defocus, motion)
  • Drifting calibration (single-camera case) due to HMD or glasses motion
  • End-to-end latency
  • Capturing training data is expensive

→ Reaching low latency AND high robustness is hard!

SLIDE 19

PROJECT GOALS

  • Deep learning based gaze estimation
  • Higher robustness than previous methods
  • Target accuracy is <2 degrees of angular error (over the full field of view!)
  • Fast inference within a few milliseconds, even on a mobile GPU
  • Compatibility with any captured input (on-axis, off-axis, near-eye, remote, etc.; dark-pupil tracking only, glint-free tracking)
  • Explore usage of synthetic data
  • Can we learn to increase calibration robustness?

SLIDE 20

RELATED RESEARCH

  • PupilNet [Fuhl et al., 2017]
      • 2-pass CNN-based method performing pupil localization, running in 8 ms (CPU)
      • 1st pass on a low-res image (96x72 pixels)
      • 2nd pass on the full-res image (VGA resolution)
      • Trained on 135k manually labeled real images
      • Higher robustness than previous 'hand-crafted' pupil detectors
  • Domain Randomization [Tremblay et al., NVIDIA, 2018]
      • Image and label generator for an automotive setting
      • Randomized objects force the network to learn the essential structure of cars independent of view and lighting condition

SLIDE 21

NVGAZE SYNTHETIC EYES DATASET

SLIDE 22

GENERATING TRAINING DATA

1: Eye Model

We adopted the eye model from Wood et al. 2015 * and modified it to more accurately represent human eyes.

* Wood, E., Baltrušaitis, T., Zhang, X., Sugano, Y., Robinson, P., & Bulling, A. “Rendering of eyes for eye-shape registration and gaze estimation”, ICCV 2015.

[Eye model diagram: the visual axis is offset from the optical axis by about 5 degrees]

SLIDE 23

GENERATING TRAINING DATA

2: Pupil Center Shift

The pupil center is offset from the iris center, and it moves as the pupil changes size. Average displacements:

  • 8 mm pupil: 0.1 mm nasal and 0.07 mm up
  • 6 mm pupil: 0.15 mm nasal and 0.08 mm up
  • 4 mm pupil: 0.2 mm nasal and 0.09 mm up

This is known to cause gaze tracking errors of up to 5 degrees in pupil-glint tracking methods.
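For modeling purposes, the displacements above can be interpolated over pupil diameter. A minimal sketch (my illustration, not code from the talk; linear interpolation between the three measured averages is an assumption):

```python
import numpy as np

# Pupil diameter (mm) -> average pupil-center displacement (mm), from the slide.
DIAMETERS = np.array([4.0, 6.0, 8.0])
NASAL_SHIFT = np.array([0.20, 0.15, 0.10])
UP_SHIFT = np.array([0.09, 0.08, 0.07])

def pupil_center_shift(diameter_mm: float) -> tuple[float, float]:
    """Interpolate the (nasal, up) displacement for a given pupil diameter."""
    nasal = np.interp(diameter_mm, DIAMETERS, NASAL_SHIFT)
    up = np.interp(diameter_mm, DIAMETERS, UP_SHIFT)
    return float(nasal), float(up)

print(pupil_center_shift(5.0))  # -> (0.175, 0.085)
```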

SLIDE 24

GENERATING TRAINING DATA

2: Scanned Faces

SLIDE 25

GENERATING TRAINING DATA

2: Combining Eye and Head Models

  • 10 scanned faces with photorealistic eyes, using the eye model adapted from Wood et al. 2015
  • Physical material properties for cornea, sclera and skin under infrared lighting conditions

SLIDE 26

GENERATING TRAINING DATA

2: Synthetic Model

SLIDE 27

GENERATING TRAINING DATA

3: Dataset

  • 4M synthetic HD eye images of animated eyes (400K images per subject), generated with Blender on a multi-GPU cluster
  • Rendered with Cycles, a physically accurate path tracer
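As a rough illustration of how such a render job might be scripted with Blender's Python API (a sketch under assumptions: the object name 'Eye', sample count, resolution, and gaze sweep are hypothetical; the talk does not show its actual pipeline):

```python
# Run inside Blender: blender --background eye_scene.blend --python render_eyes.py
import bpy
import math

scene = bpy.context.scene
scene.render.engine = 'CYCLES'      # physically based path tracer
scene.cycles.samples = 256          # illustrative sample count
scene.render.resolution_x = 1280    # "HD" resolution, assumed
scene.render.resolution_y = 720

eye = bpy.data.objects['Eye']       # hypothetical object name

# Sweep gaze angles and render one labeled frame per pose.
for i, yaw_deg in enumerate(range(-40, 41, 10)):
    eye.rotation_euler[2] = math.radians(yaw_deg)
    scene.render.filepath = f"//renders/eye_{i:04d}_yaw{yaw_deg}.png"
    bpy.ops.render.render(write_still=True)
```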

SLIDE 28

GENERATING TRAINING DATA

3: Dataset

SLIDE 29

ANATOMY-AWARE AUGMENTATION

SLIDE 30

GENERATING TRAINING DATA

4: Region Labels

  • Region maps are generated from renders with self-illuminating materials.
  • The refractive effect of the air-cornea interface is accounted for.
  • Synthetic ground truth is available even when regions are occluded by skin (during blinks).

[Region map legend: pupil, iris, sclera, skin, sclera occluded by skin, glint]
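A region map of this kind is easy to consume downstream. For instance, a minimal numpy sketch (the integer label IDs are assumptions, not the dataset's actual encoding) that turns an indexed region map into per-region masks and a ground-truth pupil center:

```python
import numpy as np

# Assumed label IDs for the indexed region map; the real dataset's
# encoding may differ (e.g., color-coded regions).
REGIONS = {"pupil": 0, "iris": 1, "sclera": 2, "skin": 3,
           "sclera_occluded": 4, "glint": 5}

def region_masks(label_map: np.ndarray) -> dict[str, np.ndarray]:
    """Split an integer region map (H x W) into per-region boolean masks."""
    return {name: label_map == idx for name, idx in REGIONS.items()}

def pupil_center(label_map: np.ndarray) -> tuple[float, float]:
    """Pupil centroid from the region map, usable as a ground-truth label."""
    ys, xs = np.nonzero(label_map == REGIONS["pupil"])
    return float(xs.mean()), float(ys.mean())
```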

SLIDE 31

ANATOMY-AWARE AUGMENTATION

[Images: original synthetic image vs. augmented synthetic image, with samples of real images for comparison]

Region-wise augmentations:

  • Contrast scaling
  • Blur
  • Intensity offset

Global augmentations:

  • Contrast scaling
  • Gaussian noise

A sketch of these operations follows below.
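Here is a rough numpy sketch of region-wise plus global augmentation, assuming per-region boolean masks as built above; the parameter ranges are illustrative assumptions, not the paper's values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng()

def augment(image: np.ndarray, masks: dict[str, np.ndarray]) -> np.ndarray:
    """Anatomy-aware augmentation sketch: per-region contrast/blur/offset,
    then global contrast scaling and Gaussian noise."""
    img = image.astype(np.float32)
    for mask in masks.values():
        region = img * rng.uniform(0.8, 1.2) + rng.uniform(-10, 10)
        # Blur the adjusted frame, then keep the blurred values in this region.
        blurred = gaussian_filter(region, sigma=rng.uniform(0.0, 1.5))
        img = np.where(mask, blurred, img)
    img *= rng.uniform(0.9, 1.1)                 # global contrast scaling
    img += rng.normal(0.0, 2.0, size=img.shape)  # global Gaussian noise
    return np.clip(img, 0, 255).astype(np.uint8)
```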

SLIDE 32

NVGAZE NETWORK

SLIDE 33

NVGAZE INFERENCE OVERVIEW

IR camera → input image → convolutional network → gaze vector

SLIDE 34

NETWORK ARCHITECTURE

[Architecture diagram: Conv1 through Conv6 feeding a fully connected layer that outputs the gaze estimate (x, y)]

Layer    Resolution    Num. channels
Input    255 x 191     1
Conv1    127 x 95      16
Conv2    63 x 47       24
Conv3    31 x 23       36
Conv4    15 x 11       54
Conv5    7 x 5         81
Conv6    3 x 2         122

Fully convolutional design: in the reference network, each layer has a stride of 2, no padding, and a 3x3 convolution kernel. The 640x480 camera image is downscaled to the 255x191 network input.
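The table pins the architecture down exactly (3x3 kernels with stride 2 and no padding reproduce each listed resolution), so a faithful PyTorch sketch is straightforward. This is my reconstruction from the table, not the authors' released code:

```python
import torch
import torch.nn as nn

class NvGazeNet(nn.Module):
    """Reconstruction of the reference design from the slide's table:
    six 3x3 / stride-2 / no-padding convolutions, then a fully
    connected layer regressing the gaze (x, y)."""
    def __init__(self):
        super().__init__()
        channels = [1, 16, 24, 36, 54, 81, 122]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2), nn.ReLU()]
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(122 * 3 * 2, 2)  # Conv6 output is 122 channels x 3 x 2

    def forward(self, x):
        x = self.features(x)          # (N, 1, 191, 255) -> (N, 122, 2, 3)
        return self.fc(x.flatten(1))  # -> (N, 2) gaze estimate

net = NvGazeNet()
out = net(torch.randn(1, 1, 191, 255))  # height 191, width 255
print(out.shape)                        # torch.Size([1, 2])
```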

SLIDE 35

NETWORK COMPLEXITY ANALYSIS

SLIDE 36

TRAINING AND VALIDATION

  • Trained on 10 synthetic subjects + 3 real subjects. No fine-tuning.
  • Ramp-up and ramp-down for 50 epochs at the beginning and end.
  • Adam optimizer with MSE loss

Loss function: mean squared error between the predicted and ground-truth gaze.
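A compact training-loop sketch consistent with those bullets. Assuming the ramps refer to the learning rate (the slide does not say explicitly), they are implemented here as linear warm-up/cool-down; the total epoch count, base learning rate, and dummy data are illustrative assumptions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, RAMP = 500, 50   # total epochs assumed; 50-epoch ramps per the slide
BASE_LR = 1e-3           # illustrative

def lr_scale(epoch: int) -> float:
    """Linear ramp-up for the first 50 epochs, ramp-down for the last 50."""
    up = min(1.0, (epoch + 1) / RAMP)
    down = min(1.0, (EPOCHS - epoch) / RAMP)
    return min(up, down)

model = NvGazeNet()  # from the architecture sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
loss_fn = torch.nn.MSELoss()

# Dummy data standing in for the synthetic/real training set.
train_loader = DataLoader(TensorDataset(torch.randn(64, 1, 191, 255),
                                        torch.randn(64, 2)), batch_size=16)

for epoch in range(EPOCHS):
    for images, gaze in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), gaze)
        loss.backward()
        optimizer.step()
    scheduler.step()
```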

SLIDE 37

NEURAL NETWORK PERFORMANCE

Gaze Estimation

Accuracy / near-eye display:
  • 2.1 degrees of error on average across real subjects
  • Error is almost evenly distributed across the entire tested visual field
  • 1.7 degrees best-case accuracy when trained for a single subject

Accuracy / remote gaze tracking:
  • 8.4 degrees average accuracy (same accuracy as the state of the art by Park et al., 2018), but 100x faster

Latency for gaze estimation:
  • <1 millisecond for inference and data transfer between CPU and GPU
  • cuDNN implementation running on Titan V or Jetson TX2
  • Bottleneck is camera transfer @ 120 Hz

SLIDE 38

PUPIL LOCALIZATION

SLIDE 39

NEURAL NETWORK PERFORMANCE

Pupil Location Estimation

SLIDE 40

SLIDE 41

NEURAL NETWORK PERFORMANCE

Pupil Location Estimation

Our network is more accurate and more robust, and requires less memory, than competing methods.

SLIDE 42

OPTIMIZING FOR FAST INFERENCE

Alexander Majercik

SLIDE 43

PROJECT GOALS

  • Deep learning based gaze estimation
  • Higher robustness than previous methods
  • Target accuracy is <2 degrees of angular error
  • Fast inference within a few milliseconds, even on a mobile GPU
  • Compatibility with any captured input (on-axis, off-axis, near-eye, remote, etc.; dark-pupil tracking only, glint-free tracking)
  • Explore usage of synthetic data (large dataset, >1,000,000 images)
  • Can we learn to increase calibration robustness?

SLIDE 45

NETWORK LATENCY REQUIREMENTS

Applications with differing latency budgets: avatars, foveated rendering, dynamic streaming, attention studies, computational displays, perception, user state evaluation, health care, gaze interaction, periphery.

[Image credits: Eisko.com, arpost.co, Vedamurthy et al., Sitzmann et al., Patney et al., Sun et al., eyegaze.com]

SLIDE 46

NETWORK LATENCY REQUIREMENTS

Esports research at NVIDIA: 60 ms to get it right.

[Figure: latency budgets for gaze-contingent rendering and human perception, human perception studies, and esports]

SLIDE 47

NETWORK LATENCY REQUIREMENTS

BOTTOM LINE: the network should run in ~1 ms!

Esports research at NVIDIA: 60 ms to get it right.

[Figure: latency budgets for gaze-contingent rendering and human perception, human perception studies, and esports]

SLIDE 48

Fast inference is also a training problem.

SLIDE 49

NETWORK DESIGN FOR FAST INFERENCE

  • 7-layer stacked convolutional network
  • Input: 293x293 eye image; output: pupil position in image space

[Architecture diagram: per-layer feature counts 24, 52, 80, 124, 256, 512, 36]

SLIDE 50

NETWORK DESIGN FOR FAST INFERENCE

Key Design Decisions

SLIDE 51

NETWORK DESIGN FOR FAST INFERENCE

Key Design Decisions

  • Convolutions and FC layers only

SLIDE 52

NETWORK DESIGN FOR FAST INFERENCE

Key Design Decisions

  • Convolutions and FC layers only
  • No max pooling

SLIDE 53

NETWORK DESIGN FOR FAST INFERENCE

Key Design Decisions

  • Convolutions and FC layers only
  • No max pooling
  • ReLU activation

SLIDE 54

NETWORK DESIGN FOR FAST INFERENCE

Key Design Decisions

  • Convolutions and FC layers only
  • No max pooling
  • ReLU activation
  • Data-directed approach

SLIDE 55

NETWORK DESIGN FOR FAST INFERENCE

Data-directed approach

SLIDE 56

Better Training -> Simpler Network -> Run Faster

SLIDE 57

SLIDE 58

[Pipeline diagram: OpenGL and cuDNN stages across CPU and GPU]

SLIDE 59

[Pipeline diagrams compared: OpenGL and cuDNN stages across CPU and GPU, before and after optimization]

SLIDE 60

FAST INFERENCE WITH NVIDIA CUDNN

Optimizing the pipeline

  • GPU Programming Best Practices

SLIDE 61

FAST INFERENCE WITH NVIDIA CUDNN

Optimizing the pipeline

  • GPU Programming Best Practices:
      • Minimize CPU-GPU copies

SLIDE 62

FAST INFERENCE WITH NVIDIA CUDNN

Optimizing the pipeline

  • GPU Programming Best Practices:
      • Minimize CPU-GPU copies
      • Minimize kernel launches (pack work into your kernels efficiently)

SLIDE 63

FAST INFERENCE WITH NVIDIA CUDNN

Optimizing the pipeline

  • GPU Programming Best Practices:
      • Minimize CPU-GPU copies
      • Minimize kernel launches (pack work into your kernels efficiently)
  • To do both… combine the eye images into a single pass! (see the sketch below)
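To illustrate the idea at a high level (a PyTorch sketch of the general technique, not the talk's cuDNN C++ implementation): running both eye images through the network as one batched call replaces two uploads and two sets of kernel launches with one of each.

```python
import torch

model = NvGazeNet().cuda().eval()  # architecture sketch from earlier; assumes a CUDA device

left = torch.randn(1, 1, 191, 255)
right = torch.randn(1, 1, 191, 255)

with torch.no_grad():
    # Two separate passes: two host-to-device copies, two sets of launches.
    gaze_l = model(left.cuda())
    gaze_r = model(right.cuda())

    # Merged pass: concatenate the eye images and upload/infer once.
    both = torch.cat([left, right], dim=0).cuda()  # batch of 2
    gaze_both = model(both)                        # (2, 2): one row per eye
```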

SLIDE 64

FAST INFERENCE WITH NVIDIA CUDNN

Merging the input images

[Diagram: convolution kernel sliding across the merged input]


SLIDE 69

[Pipeline diagram: OpenGL and cuDNN stages across CPU and GPU, with the merged-input pass]

SLIDE 70

FAST INFERENCE WITH NVIDIA CUDNN

Results

Method                                      Time (ms)
Single image (Python-based DL framework)    -
Single image (cuDNN)                        -
Concatenated input (cuDNN)                  -

SLIDE 71

FAST INFERENCE WITH NVIDIA CUDNN

Results

Method                                      Time (ms)
Single image (Python-based DL framework)    ~6
Single image (cuDNN)                        -
Concatenated input (cuDNN)                  -

SLIDE 72

FAST INFERENCE WITH NVIDIA CUDNN

Results

Method                                      Time (ms)
Single image (Python-based DL framework)    ~6
Single image (cuDNN)                        0.748
Concatenated input (cuDNN)                  -

SLIDE 73

FAST INFERENCE WITH NVIDIA CUDNN

Results

Method                                      Time (ms)
Single image (Python-based DL framework)    ~6
Single image (cuDNN)                        0.748
Concatenated input (cuDNN)                  1.022

SLIDE 74

SUMMARY

  • Network Latency Requirements
      • Foveated rendering, human perception, esports
      • The network has to execute in ~1 ms!
  • Network Design for Fast Inference (during training!)
      • Simple network (stacked convolutions, no max pooling, ReLU)
      • Complexity is in the data!
  • Fast Inference Using NVIDIA cuDNN
      • Follow GPU best practices to optimize the pipeline around your well-designed network

SLIDE 75

Try the NvGaze Demo:

VR Theater, SJCC Expo Hall 3, Concourse Level
Tuesday: 12:00pm - 7:00pm
Wednesday: 12:00pm - 7:00pm
Thursday: 11:00am - 2:00pm

SLIDE 76

REFERENCES

  • NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation [Kim '19]
  • Adaptive Image-Space Sampling for Gaze-Contingent Real-time Rendering [Stengel '16]
  • Perception-driven Accelerated Rendering [Weier '17]
  • Visualization and Analysis of Head Movement and Gaze Data for Immersive Video in Head-mounted Displays [Loewe '15]
  • Subtle gaze guidance for immersive environments [Grogorick '17]
  • Towards virtual reality infinite walking: dynamic saccadic redirection [Sun '18]

SLIDE 77

Q&A

Michael Stengel (New Experiences Group) - mstengel@nvidia.com
Alexander Majercik (New Experiences Group) - amajercik@nvidia.com

Try out our demo in the Exhibitor Hall!
Dataset and model available at sites.google.com/nvidia.com/nvgaze

SLIDE 78

EYE TRACKING IN VR/AR

Applications: avatars, foveated rendering, dynamic streaming, attention studies, computational displays, perception, user state evaluation, health care, gaze interaction, periphery.

[Image credits: Eisko.com, arpost.co, Vedamurthy et al., Sitzmann et al., Patney et al., Sun et al., eyegaze.com, Padmanaban et al.]

SLIDE 79

ON-AXIS GAZE TRACKING GLASSES

Eye tracking prototype for Augmented Reality glasses

Gaze tracking glasses with vertical/horizontal waveguides: vertical beam splitter, horizontal beam splitter, infrared illumination units.

SLIDE 80

OFF-AXIS GAZE TRACKING

3D Reconstruction Result

SLIDE 81

GAZE CALIBRATION

  • Sparse pattern sampling (e.g. ring pattern), averaged over time

Calibration Method A - Using a calibration network layer

  • Calibration sets the layer weights
  • 3D gaze direction is directly estimated by network inference

Calibration Method B - Mapping the 2D pupil center to a 2D screen position

  • Calibration estimates polynomial mapping functions FL and FR
  • Localized pupil centers (from network inference) are mapped using FL and FR
  • The 3D gaze vector is derived from the binocular 2D screen positions

[Image: ring target pattern]
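Method B's polynomial mapping can be fitted with ordinary least squares. A minimal sketch for one eye (my illustration of the general technique; the second-order polynomial basis is an assumption, as the slide does not specify the polynomial degree):

```python
import numpy as np

def poly_features(p: np.ndarray) -> np.ndarray:
    """Second-order polynomial basis of pupil coordinates (x, y)."""
    x, y = p[:, 0], p[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

def fit_mapping(pupil_xy: np.ndarray, screen_xy: np.ndarray) -> np.ndarray:
    """Least-squares fit of a mapping F from pupil centers to screen positions.

    pupil_xy:  (N, 2) pupil centers collected while the user fixates targets.
    screen_xy: (N, 2) known 2D target positions (e.g., the ring pattern).
    Returns a (6, 2) coefficient matrix.
    """
    A = poly_features(pupil_xy)
    coeffs, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)
    return coeffs

def apply_mapping(coeffs: np.ndarray, pupil_xy: np.ndarray) -> np.ndarray:
    """Map pupil centers to 2D screen positions using the fitted coefficients."""
    return poly_features(pupil_xy) @ coeffs
```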

SLIDE 82

FOVEATED RENDERING
Accelerating real-time computer graphics

[Chart: retinal cone distribution, from Goldstein 2007]

SLIDE 83

FOVEAL REGION

SLIDE 84

APPLICATION EXAMPLE: FOVEATED RENDERING

SLIDE 85

SLIDE 86

ATTENTION ANALYSIS
Generating 3D saliency information

[Loewe and Stengel et al., ETVIS '15]