@affectiva
The Future of the In-Car Experience
Abdelrahman Mahmoud Product Manager Ashutosh Sanan Computer Vision Scientist
Mood Tracking
Social Robots
Drug Efficacy
Banking
Content Management (video / audio)
Focus Groups
Customer Analytics
Education
Surveillance
Telehealth
Academic Research
Connected devices / IoT
Health & Wellness
Social Robotics
MOOCs
Recruiting
Mental Health
Web Conferencing
Healthcare
Real-time Student Feedback
Video & Photo Organization
Retail
Virtual Assistants
Online Education
Gaming
Live Streaming
Telemedicine
Security
In-market products since 2011
Emotion recognition from face and voice powers several industries
Built using real-world data
Recognized Market / AI Leader
Driver Safety
The Problem: Transitions in control in semi-autonomous vehicles (e.g. the L3 handoff problem); current solutions based on steering-wheel sensors are irrelevant in autonomous driving.
Affectiva Solution: Next-generation AI-based system to monitor and manage driver capability for safe engagement.

Occupant Experience
The Problem: Delivering a differentiated and monetizable in-cab experience (e.g. the L4 luxury car challenge).
Affectiva Solution: First in-market solution for understanding occupant state and mood to enhance the overall in-cab experience.
External Context
Weather, Traffic, Signs, Pedestrians

Personal Context
Identity, Likes/dislikes & preferences, Occupant state history, Calendar

In-Cab Context
Occupant relationships, Infotainment content, Inanimate objects, Cabin environment

Emotion AI
Facial expressions, Tone of voice, Body posture

People Analytics
Anger, Surprise, Distraction, Drowsiness, Intoxication, Cognitive Load, Enjoyment, Attention, Excitement, Stress, Discomfort, Displeasure

People Analytics is context-aware, with Emotion AI as the foundational technology.
Personalization
Individually customized baseline, Adaptive environment, Personalization across vehicles

Safety
Next-generation driver monitoring, Smart handoff, Proactive intervention

Monetization
Differentiation among brands, Premium content delivery, Purchase recommendations
Data
Our robust and scalable data strategy enables us to acquire large and diverse data sets and annotate them using both manual and automated approaches.
Algorithms
Using a variety of deep learning, computer vision and speech processing approaches, we have developed algorithms to model complex and nuanced emotion and cognitive states.
Team
Our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision, and speech processing.
Infrastructure
Deep learning infrastructure allows for rapid experimentation and tuning, large-scale data processing, and model evaluation.
87 countries, 6.5M faces analyzed, 3.8B facial frames
Includes people emoting on device and while driving

Top Countries for Emotion Data
USA 1,166K
MEXICO 150K
BRAZIL 194K
GERMANY 148K
UNITED KINGDOM 265K
CHINA 562K
JAPAN 61K
VIETNAM 148K
PHILIPPINES 159K
INDONESIA 325K
THAILAND 184K
INDIA 1,363K
To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that can sense emotions and gather people analytics in real world conditions.
Foundational proprietary data will drive value and accelerate the data-partner ecosystem
Spontaneous
Using Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data to develop metrics that are robust to real-world conditions
Data partnerships
Acquire 3rd party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies)
Simulated data
Collect challenging data in a safe lab simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication) via multi-spectral imaging and transfer learning.
Auto Data Corpus
Deep learning advancements driving the automotive roadmap
The current SDK consists of deep learning networks that perform:

Face detection: a Region Proposal Network takes the image and outputs face bounding boxes
Landmark localization: regression produces landmark estimates, landmark refinement, and a confidence score for each face image
Facial analysis: a multi-task CNN/RNN with shared convolutional layers outputs emotions, temporal expressions, and attributes per face
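The three-stage flow above can be sketched as plain function composition. All function names, box coordinates, and scores below are illustrative stand-ins, not the actual SDK API:

```python
# Hypothetical stand-ins for the three networks in the pipeline; the real
# models are deep networks, these functions only illustrate the data flow.
def detect_faces(image):
    # RPN-style detector: returns one bounding box per face (x, y, w, h)
    return [(10, 10, 64, 64)]

def localize_landmarks(image, box):
    # regression head: landmark points plus a confidence score
    x, y, w, h = box
    landmarks = [(x + w // 3, y + h // 3), (x + 2 * w // 3, y + h // 3)]
    return landmarks, 0.97

def analyze_face(image, box, landmarks):
    # multi-task head: emotions, expressions, attributes from one shared trunk
    return {"emotions": {"joy": 0.8}, "expressions": {"smile": 0.9},
            "attributes": {"glasses": 0.1}}

def analyze_frame(image):
    results = []
    for box in detect_faces(image):
        landmarks, conf = localize_landmarks(image, box)
        if conf > 0.5:                      # drop low-confidence detections
            results.append(analyze_face(image, box, landmarks))
    return results

print(analyze_frame(image=None))
```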
Facial muscles generate hundreds of facial expressions/emotions, and the models detect each corresponding visual expression/emotion occurrence, running on embedded devices.

Examples: Joy, Yawn, Eye Brow Raise
@affectiva
Emotional state is a continuously evolving process over time. Adding temporal information makes it easier to detect highly subtle changes in facial state. How do we utilize temporal information?

[Chart: intensity of expression over time]
Yawn recognition using CNN + LSTM: each frame in a temporal sequence passes through a CNN for spatial feature extraction, and an LSTM learns the temporal structure to produce frame-level classifications (per-frame scores, e.g. 0.5, ..., 0.8).
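A rough sketch of the CNN + LSTM idea, with tiny hand-rolled weights standing in for trained CNN features and LSTM parameters (nothing here reflects the actual production models):

```python
import math, random

def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def add(a, b): return [u + v for u, v in zip(a, b)]
def had(a, b): return [u * v for u, v in zip(a, b)]
def sigmoid(z): return [1.0 / (1.0 + math.exp(-v)) for v in z]
def tanh(z): return [math.tanh(v) for v in z]

class LSTMCell:
    def __init__(self, d_in, d_hid, seed=0):
        rng = random.Random(seed)
        def mat(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        # one (input-weight, hidden-weight, bias) triple per gate: i, f, g, o
        self.W = {g: (mat(d_hid, d_in), mat(d_hid, d_hid), [0.0] * d_hid)
                  for g in "ifgo"}

    def step(self, x, h, c):
        pre = {g: add(add(matvec(Wx, x), matvec(Wh, h)), b)
               for g, (Wx, Wh, b) in self.W.items()}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = tanh(pre["g"])
        c = add(had(f, c), had(i, g))      # new cell state
        h = had(o, tanh(c))                # new hidden state
        return h, c

# "CNN features": stand-in per-frame feature vectors (in the real pipeline
# these come from a convolutional network applied to each face crop)
frames = [[0.1, 0.9, 0.2, 0.0], [0.3, 0.8, 0.1, 0.1], [0.9, 0.2, 0.0, 0.4]]
cell = LSTMCell(d_in=4, d_hid=3)
h, c = [0.0] * 3, [0.0] * 3
for x in frames:
    h, c = cell.step(x, h, c)
# frame-level yawn score: sigmoid readout over the final hidden state
score = 1.0 / (1.0 + math.exp(-sum(h)))
print(round(score, 3))
```

With trained weights, the per-frame readout would be thresholded to decide whether a yawn is occurring at each timestep.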
While training, RNNs expect a continuous temporal sequence, but real-world sequences have gaps: missing facial frames (frames where no face was detected) and missing human annotations (facial frames not labeled by humans). Possible solutions: copy the last state forward, or mask the missing frames.
Results indicate that masking works better than copying the last state.

[Chart: ROC-AUC and validation accuracy, using last state vs. masking (axis 0.94-0.97)]
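The two strategies can be contrasted in a few lines. The scalar recurrent update and the numbers below are a toy illustration, not the production model:

```python
import math

def rnn_step(x, h, w_x=0.7, w_h=0.5):
    # toy scalar recurrent update standing in for the full LSTM step
    return math.tanh(w_x * x + w_h * h)

def run_masked(seq):
    """Masking: skip missing frames, carrying the hidden state through."""
    h = 0.0
    for x in seq:
        if x is not None:          # mask = 0 -> state passes through unchanged
            h = rnn_step(x, h)
    return h

def run_copy_last(seq):
    """Alternative: substitute the last observed frame for missing ones."""
    h, last = 0.0, 0.0
    for x in seq:
        if x is not None:
            last = x
        h = rnn_step(last, h)      # missing frames still drive a state update
    return h

seq = [0.9, None, None, 0.8, 0.7]   # face not detected in frames 2-3
print(run_masked(seq), run_copy_last(seq))
```

Masking leaves the state untouched during gaps, so the model is never trained on fabricated inputs; copying injects stale observations, which drags the state.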
Yawn Transfer: expression inputs (Input A, Input B) pass through frozen feature extractors.

Two approaches to train our model:
- Train the convolutional and recurrent filters jointly.
- Freeze the previously learned convolutional filters and train only the recurrent layers.
[Chart: ROC-AUC and validation accuracy, fixed weights vs. fully trainable (axis 0.961-0.97)]
Intelligent Filter Reuse: shared convolutional layers feed the emotions, temporal expressions, and attributes outputs. Reusing pretrained filters, rather than training from scratch, lets the models run on mobile; transfer learning helps with runtime performance.
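The "frozen feature extractor" approach amounts to excluding the pretrained parameters from the update step. A minimal sketch with made-up parameter names and pretend gradients:

```python
# Toy sketch: during the update step, gradients are applied only to the new
# (recurrent/head) parameters, while pretrained convolutional filters stay
# fixed. Parameter names and values here are purely illustrative.
params = {
    "conv.w1": 0.42,   # pretrained spatial filters (frozen)
    "conv.w2": -0.17,
    "lstm.w":  0.05,   # new temporal layer (trainable)
    "head.w":  0.01,   # new classification head (trainable)
}
frozen = {name for name in params if name.startswith("conv.")}

def sgd_step(params, grads, lr=0.1):
    for name, g in grads.items():
        if name not in frozen:             # skip frozen parameters
            params[name] -= lr * g
    return params

grads = {name: 1.0 for name in params}     # pretend gradients from backprop
sgd_step(params, grads)
print(params["conv.w1"], params["lstm.w"]) # conv unchanged, lstm updated
```

Freezing also cuts training cost, since no gradients need to flow through the shared convolutional trunk.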
[Chart: Yawn ROC-AUC performance, temporal vs. static (axis 0.93-0.97)]
[Chart: Smile ROC-AUC performance, temporal vs. static (axis 0.93-0.962)]
[Chart: Outer Brow Raiser (AU02) ROC-AUC performance, temporal vs. static (axis 0.86-0.89)]
Real-time performance
[Demo screenshots with live metrics: Anger 100.0, Joy 0.00, Smile 0.00, Expressiveness 92.00 | Fatigue 98.00, Eye Closure 100.0, Smile 0.00, Expressiveness 57.00 | Concentration 85.00, Fear 78.00, Joy 0.00, Expressiveness 68.00]
Deployment challenges include handling the difference in frame rate at deployment vs. training, and using the temporal outputs to create a drowsiness intensity metric.
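One way to handle the frame-rate mismatch is to resample the per-frame signal to the rate the temporal models were trained at. A minimal linear-interpolation sketch; the rates and the `eye_closure` values are illustrative:

```python
# Resample a per-frame signal from the deployment frame rate to the
# training frame rate via linear interpolation.
def resample(values, src_fps, dst_fps):
    duration = (len(values) - 1) / src_fps          # seconds covered
    n_out = int(duration * dst_fps) + 1
    out = []
    for k in range(n_out):
        t = k / dst_fps * src_fps                   # position in source frames
        i = min(int(t), len(values) - 2)
        frac = t - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out

eye_closure = [0.0, 0.1, 0.6, 0.9, 0.8]             # captured at 10 fps
print(resample(eye_closure, src_fps=10, dst_fps=14))
```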