

SLIDE 1

@affectiva

The Future of the In-Car Experience

Abdelrahman Mahmoud Product Manager Ashutosh Sanan Computer Vision Scientist

SLIDE 2

Affectiva Emotion AI

[Word cloud of industries: Market Research, Focus Groups, Customer Analytics, Interviewing, Recruiting, Mood Tracking, Mental Health, Healthcare, Telehealth / Telemedicine, Drug Efficacy, Health & Wellness, Banking, Legal, Fraud Detection, Security, Surveillance, Retail, Automotive, Social Robots / Social Robotics, Connected Devices / IoT, Virtual Assistants, Gaming, Live Streaming, Web Conferencing, Content Management (video / audio), Video & Photo Organization, Education, Online Education, MOOCs, Real-Time Student Feedback, Academic Research]

Emotion recognition from face and voice powers several industries.

In-market products since 2011

  • 1/3 of the Fortune Global 100, 1,400 brands
  • OEMs and Tier 1 suppliers

Built using real-world data

  • 6.5M face videos from 87 countries
  • 42,000 miles of driving data collected quarterly

Recognized market / AI leader

  • Spun out of the MIT Media Lab
  • Selected for Startup Autobahn and the Partnership on AI


SLIDE 3

Affectiva Automotive AI

The problem vs. the Affectiva solution:

Driver Safety

  • Problem: transitions in control in semi-autonomous vehicles (e.g. the L3 handoff problem); current solutions based on steering-wheel sensors are irrelevant in autonomous driving.
  • Solution: a next-generation AI-based system to monitor and manage driver capability for safe engagement.

Occupant Experience

  • Problem: delivering a differentiated and monetizable in-cab experience (e.g. the L4 luxury car challenge).
  • Solution: the first in-market solution for understanding occupant state and mood to enhance the overall in-cab experience.

SLIDE 4

External Context

  • Weather, traffic, signs, pedestrians

Personal Context

  • Identity, likes/dislikes & preferences, occupant state history, calendar

In-Cab Context

  • Occupant relationships, infotainment content, inanimate objects, cabin environment

Emotion AI

  • Facial expressions, tone of voice, body posture

People Analytics

  • Anger, surprise, distraction, drowsiness, intoxication, cognitive load, enjoyment, attention, excitement, stress, discomfort, displeasure

People Analytics is context-aware, with Emotion AI as the foundational technology.

Personalization

  • Individually customized baseline
  • Adaptive environment
  • Personalization across vehicles

Safety

  • Next-generation driver monitoring
  • Smart handoff
  • Proactive intervention

Monetization

  • Differentiation among brands
  • Premium content delivery
  • Purchase recommendations

SLIDE 5

Affectiva approach to addressing Emotion AI complexities

Data

Our robust and scalable data strategy enables us to acquire large and diverse data sets and to annotate them using both manual and automated approaches.

Algorithms

Using a variety of deep learning, computer vision and speech processing approaches, we have developed algorithms to model complex and nuanced emotion and cognitive states.

Team

Our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision, and speech processing.

Infrastructure

Deep learning infrastructure allows for rapid experimentation and tuning of models, as well as large-scale data processing and model evaluation.

SLIDE 6

World’s largest emotion data repository

87 countries, 6.5M faces analyzed, 3.8B facial frames. Includes people emoting on device and while driving.

Top countries for emotion data:

  • India: 1,363K
  • USA: 1,166K
  • China: 562K
  • Indonesia: 325K
  • United Kingdom: 265K
  • Brazil: 194K
  • Thailand: 184K
  • Philippines: 159K
  • Mexico: 150K
  • Germany: 148K
  • Vietnam: 148K
  • Japan: 61K

SLIDE 7

Data Strategy

To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that sense emotions and gather people analytics in real-world conditions.

Foundational proprietary data will drive value to accelerate data partner ecosystem

Spontaneous occupant data

  • Use Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data, developing metrics that are robust to real-world conditions.

Data partnerships

  • Acquire 3rd-party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies).

Simulated data

  • Collect challenging data in a safe lab-simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication), using multi-spectral data and transfer learning.

Auto Data Corpus


SLIDE 8

SLIDE 9

SLIDE 10

Algorithms

SLIDE 11

Deep learning advancements driving the automotive roadmap

The current SDK consists of deep learning networks for three tasks:

  • Face detection: given an image, detect faces
  • Landmark localization: given an image plus a bounding box, detect and track facial landmarks
  • Facial analysis: detect facial expressions, emotions, and attributes

[Pipeline diagram: image → face detection (region proposal network + bounding boxes) → per-face landmark localization (regression + confidence) → facial analysis (multi-task CNN/RNN with shared convolutional layers), outputting emotions, temporal expressions, and attributes]

SLIDE 12

Task: Facial Action/Emotion Recognition

  • Given a face, classify the corresponding visual expression/emotion occurrence.
  • Many expressions: facial muscles generate hundreds of facial expressions/emotions.
  • Multi-attribute classification.
  • Fast enough to run on mobile/embedded devices.

Examples: joy, yawn, eyebrow raise.
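Multi-attribute classification differs from ordinary multi-class classification: expressions are not mutually exclusive (a joyful face can also have a raised brow), so each attribute gets its own sigmoid output rather than sharing a softmax. A minimal sketch, with made-up weights rather than a trained model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify_attributes(features, weights, threshold=0.5):
    """One independent sigmoid score per attribute; several can be active."""
    scores = {name: sigmoid(sum(w * f for w, f in zip(ws, features)))
              for name, ws in weights.items()}
    active = [name for name, s in scores.items() if s >= threshold]
    return scores, active

# Illustrative per-attribute weights over a 2-d feature vector.
weights = {"joy": [2.0, -1.0], "yawn": [-1.5, 0.5], "brow_raise": [1.0, 1.5]}
scores, active = classify_attributes([1.0, 0.8], weights)
# Here both "joy" and "brow_raise" cross the threshold at once.
```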

SLIDE 13

Is a single image always enough?


SLIDE 14

Information in Time

Emotional state is a continuously evolving process over time. Adding temporal information makes it easier to detect highly subtle changes in facial state.

How to use temporal information:

  • Apply post-processing over the static classifier output, using previous predictions and images.
  • Use recurrent architectures.

[Chart: intensity of expression over time]
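The post-processing option can be as simple as smoothing the static classifier's per-frame scores, so a single noisy frame cannot flip the predicted state. A minimal sketch using an exponential moving average (the frame scores are made-up example values):

```python
def smooth_scores(frame_scores, alpha=0.3):
    """EMA over per-frame probabilities; alpha weights the newest frame."""
    smoothed, state = [], None
    for s in frame_scores:
        state = s if state is None else alpha * s + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

raw = [0.1, 0.15, 0.9, 0.2, 0.18]  # a one-frame spike at index 2
out = smooth_scores(raw)           # the spike is damped well below 0.5
```

Recurrent architectures generalize this idea: instead of a fixed smoothing rule, the network learns what to carry forward from previous frames.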

SLIDE 15

Spatio-Temporal Action Recognition

[Diagram: a temporal sequence of frames passes through a CNN for spatial feature extraction; an LSTM learns the temporal structure and produces frame-level classifications]

Yawn recognition using CNN + LSTM.
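The CNN + LSTM structure can be sketched in plain Python so the per-frame data flow is explicit: a (stubbed) spatial feature extractor runs on each frame, a recurrent cell carries state across frames, and each step yields a frame-level score. The feature extractor and the 1-unit LSTM cell below are toys with made-up constant weights, not trained networks:

```python
import math

def cnn_features(frame):
    """Stub spatial feature extractor: frame (list of pixels) -> 2 features."""
    return [sum(frame) / len(frame), max(frame) - min(frame)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w=0.5, u=0.3):
    """A toy 1-unit LSTM cell, written out gate by gate."""
    i = sigmoid(w * sum(x) + u * h)          # input gate
    f = sigmoid(-w * sum(x) + u * h + 1.0)   # forget gate (bias keeps memory)
    o = sigmoid(w * sum(x) + u * h)          # output gate
    g = math.tanh(w * sum(x) + u * h)        # candidate cell state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def score_sequence(frames):
    """CNN per frame, LSTM across frames, classification at every step."""
    h = c = 0.0
    scores = []
    for frame in frames:
        h, c = lstm_step(cnn_features(frame), h, c)
        scores.append(sigmoid(2.0 * h))      # frame-level classification
    return scores

scores = score_sequence([[0.1, 0.2, 0.1], [0.8, 0.9, 0.7], [0.9, 1.0, 0.8]])
```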

SLIDE 16

Training Challenges & Inferences

SLIDE 17

Data challenges

RNNs expect a continuous temporal sequence during training, but facial frames can be missing due to:

  • Bad lighting
  • Face out of view
  • Face not visible

Possible solutions:

  • Use shorter, fixed-length continuous sequences with no missing data.
  • Copy the last state of the sequence (repeat the last tracked frame).
  • Mask the missing frames.

There are also missing human annotations: some facial frames are not labeled by humans.

[Diagram: missing frames in a sequence]
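The masking option amounts to a mask-aware recurrent update: when a frame is marked missing, the hidden state passes through unchanged instead of the model consuming a copied, stale frame. A sketch with a toy update rule standing in for a real LSTM/GRU cell (values are made up):

```python
def step(x, h):
    """Toy recurrent update (stand-in for an LSTM/GRU cell)."""
    return 0.5 * h + 0.5 * x

def run_masked(sequence, h0=0.0):
    """sequence: list of (value, mask); mask=0 marks an untracked frame."""
    h = h0
    states = []
    for x, mask in sequence:
        h = step(x, h) if mask else h  # masked frame: state passes through
        states.append(h)
    return states

seq = [(0.4, 1), (0.6, 1), (0.0, 0), (0.0, 0), (0.8, 1)]  # two missing frames
states = run_masked(seq)  # state is frozen across the two masked frames
```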

SLIDE 18

Masking vs Copying last state

Results indicate that masking works better than copying the last state

[Chart: ROC-AUC and validation accuracy (0.94–0.97 range), masking vs. using the last state]

SLIDE 19

How to train a spatio-temporal model?

Two approaches to train our model:

  • Train both convolutional and recurrent filters jointly.
  • Transfer learning using previously learned convolutional filters.

[Diagram: inputs A and B pass through frozen feature extractors; filters learned for expressions transfer to yawn recognition]
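The transfer-learning option can be sketched as follows: the feature extractor is frozen (its weights are read but never updated) and only the small head on top is trained. The linear "extractor", the dataset, and all weights below are made-up illustrative values, not Affectiva's models:

```python
def extract(x, frozen_w):
    """Frozen feature extractor: weights are used, never updated."""
    return sum(w * xi for w, xi in zip(frozen_w, x))

def train_head(data, frozen_w, lr=0.1, epochs=200):
    """Train only the scalar head by SGD on squared error."""
    head_w = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract(x, frozen_w)
            head_w -= lr * (head_w * f - y) * f  # gradient step on head only
    return head_w

frozen_w = [0.5, -0.25]                          # "pretrained" filters, fixed
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -0.5)]  # tiny made-up dataset
head_w = train_head(data, frozen_w)              # converges near head_w = 2.0
```

Because only the head is updated, the expensive extractor needs no backward pass, which is what makes this approach attractive for training efficiency and for reusing filters across tasks.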

SLIDE 20

Transfer learning for runtime performance

[Chart: ROC-AUC and validation accuracy (0.961–0.97 range), fixed weights vs. fully trainable]

Intelligent filter reuse:

  • Increased runtime performance, enough to run on mobile.
  • Minimal benefit from tuning filters from scratch.
  • A large real-world dataset behind the pretrained filters.

Transfer learning is used to help with runtime performance.

SLIDE 21

Does temporal info always help?

[Charts: ROC-AUC performance, temporal vs. static, for yawn (0.93–0.97 range), smile (0.93–0.962 range), and outer brow raiser AU02 (0.86–0.89 range)]
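The metric used throughout these comparisons, ROC-AUC, has a simple rank interpretation: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. For small score lists it can be computed directly from that definition (the labels and scores below are made-up example values):

```python
def roc_auc(labels, scores):
    """ROC-AUC as the pairwise win rate of positives over negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 positive/negative pairs ranked correctly
```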

SLIDE 22

Models in Action

SLIDE 23

Key Takeaways

  • Not all metrics benefit from adding complex temporal information.
  • Using all the data (complete and partial sequences) definitely helps the model.
  • Masking works better with partial sequences than copying last frames.
  • Intelligent filter reuse makes it possible to deploy these models on mobile with real-time performance.

SLIDE 24

What's next?

[Demo screenshot: live occupant metrics such as anger, joy, smile, expressiveness, fatigue, eye closure, concentration, and fear]

  • Analyze the effects of differences in frame rate at deployment vs. training.
  • Use facial markers to create a drowsiness intensity metric.

SLIDE 25

Q&A



 Learn more at affectiva.com