6.808 Mobile and Sensor Computing (aka IoT Systems), Lecture 14: Split Computing / Continuous Object Recognition



SLIDE 1

6.808 Mobile and Sensor Computing (aka IoT Systems)

Lecture 14: Split Computing / Continuous Object Recognition

SLIDE 2

Logistics & Norm Setting

  • What to do now?
  • Turn on your video (if your connection allows it)
  • Mute your mic (unless you are the active speaker)
  • Open the “Participant” list
  • Make sure your full name is shown
  • If you have a question:
  • Use the chat feature to either write the question or to indicate your interest in asking the question
  • James will be monitoring the chat
  • Unmute -> ask question -> mute again
  • Same procedure for answering questions
  • We will post this online
SLIDE 3

Continuous, Real-Time Object Recognition on Mobile Devices

Glimpse

Tiffany Chen, Lenin Ravindranath, Shuo Deng, Victor Bahl, Hari Balakrishnan

SLIDE 4

Continuous, Real-Time Recognition Apps

  • Apps that continuously locate and label objects in a video stream.

SLIDE 5

Continuous, Real-Time Recognition Apps

  • Face recognition
  • Augmented reality shopping
  • Augmented reality tourist app
  • Driver assistance

SLOW

SLIDE 6

Earlier Designs: Picture-Based Object Recognition
SLIDE 8

Earlier Designs: Picture-Based Object Recognition

(Example overlay: “calories: 180”)

SLIDE 9

Video-Based Object Recognition

SLIDE 10

Video-Based Object Recognition

(Example overlays: “Top seller”, “Buy 1 get 1 free”)

SLIDE 11

Glimpse

  • Continuous, real-time object recognition on mobile devices in a video stream

SLIDE 12

Glimpse

  • Continuous, real-time object recognition on mobile devices in a video stream
  • Continuously identify and locate objects in each frame

[Figure: successive video frames with faces labeled “Alice” and “Bob”]

SLIDE 13

Object Recognition Pipeline

SLIDE 14

Object Recognition Pipeline

Detection → Feature Extraction → Classification

SLIDE 18

Object Recognition Pipeline

Detection → Feature Extraction → Classification

Stop Sign

SLIDE 19

Before Convolutional Neural Network

Detection → Feature Extraction → Classification

SLIDE 20

Feature Extraction

Feature engineering

Before Convolutional Neural Network

SLIDE 22

Feature Extraction
Feature engineering

Before Convolutional Neural Network

SLIDE 23

Convolutional Neural Network

Feature Extraction

Feature learning

Berkeley Caffe: http://caffe.berkeleyvision.org/

SLIDE 24

Object Recognition Pipeline

Detection → Feature Extraction → Classification

Stop Sign

SLIDE 25

Object Recognition Pipeline

Detection → Feature Extraction → Classification

  • Computationally expensive and memory-intensive
  • Server is 700x faster than Google Glass
  • Scalability
  • We need to offload the recognition pipeline to servers

Stop Sign

SLIDE 26

Client-Server Architecture

[Diagram: the client camera sends each frame over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]
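To make the latency problem concrete, here is a minimal Python simulation of the naive offload loop (my sketch, not code from the paper): the client sends every frame, but each server result returns a fixed number of frames later, so what the client can display always describes an older frame.

```python
from collections import deque

def naive_offload(frames, recognize, delay_frames):
    """Simulate a client that offloads every frame to a server whose
    result arrives `delay_frames` frames later. Returns, for each frame,
    the most recent result the client can actually display at that time."""
    in_flight = deque()          # (arrival_frame_index, result)
    displayed = []
    last_result = None
    for i, frame in enumerate(frames):
        in_flight.append((i + delay_frames, recognize(frame)))
        # Deliver any server responses that have arrived by now.
        while in_flight and in_flight[0][0] <= i:
            last_result = in_flight.popleft()[1]
        displayed.append(last_result)
    return displayed

# Toy "recognition": the label is just the frame content, uppercased.
frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
shown = naive_offload(frames, lambda f: f.upper(), delay_frames=2)
print(shown)  # [None, None, 'F0', 'F1', 'F2', 'F3']
```

With a two-frame delay, every displayed label trails the live video by the round-trip time; this is the gap the active cache is designed to close.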

SLIDE 27

End-to-End Latency Lowers Accuracy

[Figure: “Expected” vs. “In reality…”]

SLIDE 28

Client-Server Architecture

[Diagram: the client camera sends each frame over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]

Challenges

  • 1. End-to-end latency lowers object recognition accuracy
SLIDE 29

Client-Server Architecture

[Diagram: the client camera sends each frame over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]

Challenges

  • 1. End-to-end latency lowers object recognition accuracy
  • 2. Bandwidth and battery efficiency
SLIDE 30

Glimpse Architecture

[Diagram: the client camera feeds an Active Cache; frames go over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]

  • 1. Active Cache combats e2e latency and regains accuracy
SLIDE 31

Glimpse Architecture

[Diagram: the client camera feeds an Active Cache; trigger frames go over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]

  • 1. Active Cache combats e2e latency and regains accuracy
  • 2. Trigger Frame reduces bandwidth usage
SLIDE 33

End-to-End Latency Lowers Accuracy

Is it possible to combat latency and regain accuracy?

SLIDE 34

Relocate Moving Object with Tracking

  • Object tracking on the client to re-locate the object

Frame 0 → Frame 12 (delay = 360 ms)

SLIDE 35

Relocate Moving Object with Tracking

  • Object tracking on the client to re-locate the object

Frame 0 → Frame 12 (delay = 360 ms)

Fast

SLIDE 36

Relocate Moving Object with Tracking

  • Object tracking on the client to re-locate the object
  • Fails to work when object displacement is large

SLIDE 37

Relocate Moving Object with Tracking

  • Object tracking on the client to re-locate the object
  • Fails to work when object displacement is large

Frame 0 → Frame 30 (delay = 1 sec)
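A toy version of client-side tracking illustrates the failure mode (my 1-D sketch, not Glimpse's tracker, which tracks feature points): find the object in the next frame by minimizing sum-of-absolute-differences within a small search window around its old position. When the object moves farther than the search radius, the match lands in the wrong place.

```python
def track_1d(prev_frame, next_frame, pos, size, radius):
    """Find a patch of `size` samples (starting at `pos` in prev_frame)
    in next_frame by minimizing sum-of-absolute-differences, searching
    only within `radius` of the old position."""
    patch = prev_frame[pos:pos + size]
    best_pos, best_cost = pos, float("inf")
    lo = max(0, pos - radius)
    hi = min(len(next_frame) - size, pos + radius)
    for p in range(lo, hi + 1):
        cost = sum(abs(a - b) for a, b in zip(patch, next_frame[p:p + size]))
        if cost < best_cost:
            best_pos, best_cost = p, cost
    return best_pos

frame0 = [0]*10 + [9, 9, 9] + [0]*17   # object at position 10
small  = [0]*12 + [9, 9, 9] + [0]*15   # moved by 2: inside the radius
large  = [0]*25 + [9, 9, 9] + [0]*2    # moved by 15: outside the radius

print(track_1d(frame0, small, pos=10, size=3, radius=5))   # 12: tracked correctly
print(track_1d(frame0, large, pos=10, size=3, radius=5))   # 5: lost (object is actually at 25)
```

The second call shows the large-displacement failure: the object has left the search window, so every candidate looks equally bad and the tracker returns a wrong position.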

SLIDE 38

Regain Accuracy with Active Cache

  • Cache and run tracking through the cached frames
SLIDE 39

Regain Accuracy with Active Cache

  • Cache and run tracking through the cached frames

[Diagram: Frame 0 is in flight to the server (network delay = 1 sec) while new frames accumulate in the Active Cache]

SLIDE 40

Regain Accuracy with Active Cache

  • Cache and run tracking through the cached frames

[Diagram: the server’s result for Frame 0 (“Alice”) returns after the 1 sec network delay]

SLIDE 41

Regain Accuracy with Active Cache

  • Cache and run tracking through the cached frames

[Diagram: the “Alice” result for Frame 0 arrives at Frame 30 (network delay = 1 sec); the client runs tracking from Frame 0 through the cached frames up to Frame 30]
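The catch-up step can be sketched as follows (a simplified Python model; the stand-in tracker just accumulates per-frame displacement, whereas Glimpse's real tracker operates on image feature points):

```python
def catch_up(stale_box, cached_displacements):
    """When a stale server result arrives (a bounding box for an old
    frame), roll it forward by applying the motion observed in each
    cached frame since then. Boxes are (x, y, w, h); displacements (dx, dy)."""
    x, y, w, h = stale_box
    for dx, dy in cached_displacements:
        x, y = x + dx, y + dy
    return (x, y, w, h)

# Server answered for frame 0; we are now 4 frames later, with one
# estimated motion vector per cached frame.
box_frame0 = (100, 50, 40, 40)
motions = [(3, 0), (3, 1), (2, 1), (4, 0)]
print(catch_up(box_frame0, motions))  # (112, 52, 40, 40)
```

The label itself is reused unchanged; only the bounding box is moved to where tracking says the object is now.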

SLIDE 42

Regain Accuracy with Active Cache

  • Cache and run tracking through the cached frames

Tracking through all cached frames takes too long!
SLIDE 44

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

SLIDE 45

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 1. How many frames to select?
  • 2. Which frames to select?
SLIDE 46

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 1. How many frames to select?

‧ s_selected: active cache processing time vs. tracking accuracy

SLIDE 47

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 1. How many frames to select?

‧ s_selected: active cache processing time vs. tracking accuracy

What is the maximum number of frames that can be tracked?

e = execution time for processing any frame in the active cache
N frames per second => have 1/N seconds before the next frame
=> can process s_selected = (1/N)/e frames

SLIDE 48

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 1. How many frames to select?

‧ s_selected: active cache processing time vs. tracking accuracy

What is the maximum number of frames that can be tracked? What if I’m okay with increasing the latency a bit?

e = execution time for processing any frame in the active cache
N frames per second => have 1/N seconds before the next frame
If I’m fine with a lag of t frames => can process s_selected = (t/N)/e frames
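Plugging numbers into s_selected = (t/N)/e makes the budget concrete (a small Python helper; the 11 ms per-frame tracking cost is an assumed figure for illustration, not from the paper):

```python
def frames_trackable(fps, per_frame_cost_s, lag_frames=1):
    """How many cached frames can be tracked without falling more than
    `lag_frames` frames behind a stream at `fps` frames per second, when
    tracking one cached frame costs `per_frame_cost_s` seconds.
    Implements s_selected = (t/N) / e from the slide."""
    return int((lag_frames / fps) / per_frame_cost_s)

# At 30 fps with an assumed 11 ms tracking cost per cached frame:
print(frames_trackable(30, 0.011))               # 3 frames before the next frame arrives
print(frames_trackable(30, 0.011, lag_frames=3)) # 9 frames if a 3-frame lag is acceptable
```

Tolerating a little extra lag (larger t) buys proportionally more cached frames to track through, which is the latency/accuracy knob the slide describes.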

SLIDE 49

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 2. Given s_selected, which frames to select?

‧Temporal redundancy between frames

SLIDE 50

Adaptive Frame Selection

Given n_cached frames, select s_selected frames so that we can catch up without sacrificing tracking performance

  • 2. Given s_selected, which frames to select?

‧ Temporal redundancy between frames
‧ Use frame differencing to quantify movement and select frames that capture as much movement as possible
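One way to realize this selection (an illustrative Python sketch; the slide names the frame-differencing idea, but the specific equal-motion-quantile scheme here is my assumption): score each cached frame by how much it differs from its predecessor, then pick frames at evenly spaced fractions of the cumulative motion, so fast-moving stretches contribute more selected frames.

```python
def select_frames(frames, s):
    """Pick up to s frame indices that split the total frame-difference
    'motion' into roughly equal chunks (more picks where motion is high).
    Frames are flattened grayscale pixel lists."""
    motion = [sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
              for i in range(1, len(frames))]
    total = sum(motion) or 1            # avoid division by zero on static scenes
    picked, acc, k = [], 0, 1
    for i, m in enumerate(motion, start=1):
        acc += m
        if acc >= k * total / s:        # crossed the next motion quantile
            picked.append(i)
            k += 1
    return picked

# Five tiny frames; all the movement happens between frames 1 and 3.
frames = [[0, 0, 0], [0, 0, 0], [9, 9, 0], [9, 9, 9], [9, 9, 9]]
print(select_frames(frames, s=2))  # [2, 3]: both picks land where the motion happened
```

Static stretches are skipped entirely, which is exactly the temporal redundancy the slide wants to exploit.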

SLIDE 51

Active Cache Short Question

  • Does Glimpse reduce the end-to-end latency of object recognition?

SLIDE 52

Active Cache Short Question

  • Does active cache reduce the end-to-end latency of object recognition?
  • No. It hides the latency: tracking stale server results forward through the cached frames makes the display look current, even though recognition itself is still delayed.

SLIDE 53

Glimpse Architecture

[Diagram: the client camera feeds an Active Cache; trigger frames go over the network to the server pipeline (Detection → Feature Extraction → Classification); labels and bounding boxes return to the client display]

  • 1. Active Cache combats e2e latency and regains accuracy
  • 2. Trigger Frame reduces bandwidth usage
SLIDE 54

Reduce Bandwidth Usage with Trigger Frames

‧Strategically send certain trigger frames to the server

SLIDE 55

Reduce Bandwidth Usage with Trigger Frames

‧ Strategically send certain trigger frames to the server

1. Measuring scene changes from the previously processed frame

SLIDE 56

Reduce Bandwidth Usage with Trigger Frames

‧ Strategically send certain trigger frames to the server

1. Measuring scene changes from the previously processed frame
2. Detecting tracking failure
  • Feature points deviate when the size, angle, or appearance of the object changes
  • Compute the standard deviation of the distances moved by all tracked points between two frames; a large spread signals tracking failure

SLIDE 57

Reduce Bandwidth Usage with Trigger Frames

‧ Strategically send certain trigger frames to the server

1. Measuring scene changes from the previously processed frame
2. Detecting tracking failure

‧ Limiting the number of frames in-flight
  • 1 frame in-flight strikes the best balance between bandwidth and accuracy
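The three conditions combine into a simple gate (a hedged Python sketch; the threshold values are placeholders I chose for illustration, not values from the paper):

```python
from statistics import pstdev

def is_trigger(scene_change, point_distances, frames_in_flight,
               scene_thresh=0.2, spread_thresh=5.0, max_in_flight=1):
    """Decide whether the current frame should be sent to the server.
    scene_change: fraction of the frame that changed since the last
    processed frame. point_distances: distance each tracked feature
    point moved between the last two frames. frames_in_flight: frames
    already sent and still awaiting a server response."""
    if frames_in_flight >= max_in_flight:
        return False                  # cap bandwidth: at most one frame in flight
    if scene_change > scene_thresh:
        return True                   # the scene itself changed
    # A large spread in point motion means points are deviating,
    # i.e. tracking is failing and the server should re-detect.
    if len(point_distances) >= 2 and pstdev(point_distances) > spread_thresh:
        return True
    return False

print(is_trigger(0.05, [2.0, 2.1, 1.9], frames_in_flight=0))  # False: stable scene, tracking fine
print(is_trigger(0.05, [1.0, 14.0, 2.0], frames_in_flight=0)) # True: tracked points deviating
print(is_trigger(0.9,  [2.0, 2.1, 1.9], frames_in_flight=1))  # False: a frame is already in flight
```

Note the in-flight cap is checked first: even a big scene change waits until the outstanding frame's response returns, which is how the one-frame-in-flight rule bounds bandwidth.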

SLIDE 58

Evaluation

  • Object recognition pipelines
  • 1. Face recognition
  • 2. Road sign recognition
SLIDE 59

Evaluation

  • Object recognition pipelines
  • 1. Face recognition
  • 2. Road sign recognition
  • Datasets
  • 1. Face Dataset:
  • 26 videos recorded with a smartphone
  • 30 minutes, 54K frames, and 36K faces
  • Scenarios: shopping with friends and waiting at a subway station
  • 2. Road Sign Dataset:
  • 4 walking videos from YouTube, recorded with Google Glass
  • 35 minutes, 63K frames, and 5K road signs
SLIDE 60

Evaluation

  • Evaluation Metrics
  • Intersection over union (IOU) to measure recognition accuracy: IOU = area(Oi ∩ Gi) / area(Oi ∪ Gi)

Oi: bounding box of the detected object i; Gi: bounding box of object i’s ground truth

  • Correct if IOU > 50% and the label matches the ground truth

[Figure: example boxes with IOU = 50%]
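IOU for axis-aligned boxes is straightforward to compute (a standard formulation in Python; representing boxes as (x, y, w, h) tuples is my choice for the sketch):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (zero-sized if the boxes do not overlap).
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

detected     = (0, 0, 2, 2)
ground_truth = (1, 1, 2, 2)
score = iou(detected, ground_truth)
print(score)       # 1/7 ≈ 0.143
print(score > 0.5) # False: this detection would not count as correct
```

Under the slide's rule, a detection counts as correct only when this score exceeds 0.5 and the predicted label also matches the ground truth.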

SLIDE 61

Evaluation

  • Evaluation Metrics
  • Precision = (# of objects correctly labeled and located) / (total # of objects detected)
  • Recall = (# of objects correctly labeled and located) / (total # of objects in the ground truth)

SLIDE 63

Evaluation

  • Evaluation Metrics
  • Precision = (# of objects correctly labeled and located) / (total # of objects detected)
  • Recall = (# of objects correctly labeled and located) / (total # of objects in the ground truth)

Example: # faces in the ground truth: 4; # faces detected: 3; # faces correctly labeled and detected: 2
Precision: 2/3, Recall: 2/4
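The worked example maps directly to code (a small Python sketch of the two metrics):

```python
from fractions import Fraction

def precision_recall(correct, detected, ground_truth):
    """Precision and recall from the three counts used on the slide."""
    return (Fraction(correct, detected),      # of what we detected, how much was right
            Fraction(correct, ground_truth))  # of what was there, how much we found

p, r = precision_recall(correct=2, detected=3, ground_truth=4)
print(p, r)  # 2/3 1/2
```

Using exact fractions keeps the example's 2/3 and 2/4 (= 1/2) visible without floating-point noise.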

SLIDE 64

Evaluation

  • Network conditions
  • Wi-Fi, Verizon’s LTE, and AT&T’s LTE networks

SLIDE 65

Results Outline

  • 1. Face recognition
  • 2. Road sign recognition
  • 3. Face recognition with hardware-assisted face detection

SLIDE 66

Active Cache Achieves High Accuracy

‧ Face dataset ‧ Wi-Fi (end-to-end delay: 430 ms)
SLIDE 68

Trigger Frame Reduces Bandwidth Usage without Sacrificing Accuracy

‧ Face dataset ‧ Wi-Fi (end-to-end delay: 430 ms)
SLIDE 70

Trigger Frame Consistently Reduces Bandwidth Usage

‧Face Dataset (Wi-Fi)

SLIDE 71

Glimpse Achieves Higher Accuracy and Lower Bandwidth Usage

‧Road sign dataset ‧ Wi-Fi (End-to-end delay: 520 ms)

SLIDE 72

Glimpse Achieves Higher Accuracy and Lower Bandwidth Usage

‧ Road sign dataset ‧ Wi-Fi (end-to-end delay: 520 ms)

Lower than precision, why?

SLIDE 73

Hardware-Assisted Object Detection

  • Mobile devices are now equipped with object detection hardware
  • Is Glimpse still helpful?
SLIDE 74

Glimpse Improves Accuracy even with Detection Hardware on Devices

‧Face dataset (Wi-Fi) ‧Face detection in hardware

Why less than before?

SLIDE 75

Glimpse

  • Glimpse enables continuous, real-time object recognition on mobile devices
  • Glimpse achieves high recognition accuracy by maintaining an active cache of frames on the client
  • Glimpse reduces bandwidth consumption by strategically sending only certain trigger frames

SLIDE 76
  • CoreML (Apple)
  • MediaPipe (cross-platform)
SLIDE 78

Live Streaming is Gaining Popularity

SLIDE 79

Search Capability is Very Limited

Location Tags/Username

SLIDE 80

Search Capability is Very Limited

Location Tags/Username

Example tags: “it ended ☺☺☺”, “Hola buenas noches” (“Hello, good night”), “Back at it again ☹☹☹”

SLIDE 81

Search Capability is Very Limited

Location Tags/Username

I want to search based on the contents!

Example tags: “it ended ☺☺☺”, “Hola buenas noches” (“Hello, good night”), “Back at it again ☹☹☹”

SLIDE 82

  • Why now?
  • 1. Cameras everywhere: dashcams, GoPro, phones
  • 2. Advances in computer vision: CNNs
  • 3. Faster compute: GPUs
SLIDE 83

System Design

Object Recognition Pipeline → Online Indexer → Ranker

SLIDE 84

Challenges

  • 1. Scalability
  • more than 1M incoming streams (30 fps)
  • 2. Liveliness and Relevance
  • contents keep changing – perhaps an incremental online indexer
  • the returned videos must match what users want

Object Recognition Pipeline → Online Indexer → Ranker

SLIDE 85

Unleash the Power of Cameras – What are the possible apps?

  • Indoor localization
  • Pothole detection
  • Activity recognition
  • Map-matching
  • Agriculture IoT
  • Smart city camera networks
  • Beyond cameras?
  • RF reconstruction
SLIDE 86

Concerns

SLIDE 87

Announcements

  • All project equipment ordered & shipped
  • If you don’t receive (some of your) items by Friday, ping us

  • Lab 4 due today (April 8)
  • Midterm next Mon (April 13)
  • Open book and notes
  • During lecture time
  • More midterm instructions coming this week