Improving Computer Vision for Camera Traps (CompSust Open Graduate Seminar, April 3rd, 2020) - PowerPoint PPT Presentation

SLIDE 1

Improving Computer Vision for Camera Traps

Sara Beery

CompSust Open Graduate Seminar April 3rd, 2020

Leveraging Practitioner Insight to Build Solutions for Real-World Challenges

SLIDE 2

Big goal: monitoring biodiversity, globally and in real time.

SLIDE 3

Big goal: monitoring biodiversity, globally and in real time.

How can we contribute?

SLIDE 4

Camera traps

SLIDE 5

Camera traps

  • 1,000s of organizations
  • 10,000s of projects
  • 1,000,000s of camera traps
  • 100,000,000s of images

*estimates by Eric Fegraus, Conservation International

SLIDE 6

Camera traps

  • 1,000s of organizations
  • 10,000s of projects
  • 1,000,000s of camera traps
  • 100,000,000s of images

*estimates by Eric Fegraus, Conservation International

For example, the Idaho Department of Fish and Game alone has five years of unprocessed, unlabeled data: roughly 5 million images.

SLIDE 7

Camera trap data is challenging

SLIDE 8

All these images have an animal in them

SLIDE 9

State-of-the-art (SOA) models don’t generalize

[Figure: classification error vs. number of training examples, log-log axes, comparing cis (seen) and trans (unseen) camera locations]

Recognition in Terra Incognita, Beery et al., ECCV 2018
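The cis/trans evaluation behind this plot can be sketched as partitioning test images by whether their camera location appeared in training; "cis" locations were seen during training, "trans" locations were not. This is a toy illustration with dict-based records and hypothetical field names, not the actual dataset format:

```python
def split_cis_trans(images, train_locations):
    """Partition images into cis (camera locations seen in training)
    and trans (camera locations never seen in training)."""
    cis, trans = [], []
    for img in images:
        (cis if img["location"] in train_locations else trans).append(img)
    return cis, trans

# Toy example: three cameras, two of them used for training.
images = [
    {"id": 1, "location": "cam_A"},
    {"id": 2, "location": "cam_B"},
    {"id": 3, "location": "cam_C"},
    {"id": 4, "location": "cam_A"},
]
cis, trans = split_cis_trans(images, train_locations={"cam_A", "cam_B"})
```

Evaluating separately on the two partitions is what exposes the generalization gap: error on trans locations stays far above error on cis locations.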

SLIDE 10

Microsoft AI for Earth

MegaDetector

Efficient Pipeline for Automating Species ID in New Camera Trap Projects, Beery et al., BiodiversityNext 2019
https://github.com/microsoft/CameraTraps/blob/master/megadetector.md

Class-agnostic detectors generalize best
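A hedged sketch of the two-stage idea: a class-agnostic detector (such as the MegaDetector) proposes animal boxes that generalize across locations, and a lightweight per-project classifier then assigns species to each box. The `fake_detect` and `fake_classify` stubs below are hypothetical stand-ins for illustration, not the real MegaDetector API:

```python
def run_pipeline(images, detect, classify, conf_threshold=0.8):
    """Two-stage pipeline: a class-agnostic detector proposes animal
    boxes, then a per-project classifier assigns species to each box."""
    results = []
    for img in images:
        for box, score in detect(img):
            if score < conf_threshold:
                continue  # drop low-confidence detections
            species = classify(img, box)
            results.append({"image": img["id"], "box": box, "species": species})
    return results

# Hypothetical stubs standing in for the detector and classifier.
def fake_detect(img):
    return [((0, 0, 10, 10), 0.95), ((5, 5, 8, 8), 0.3)]

def fake_classify(img, box):
    return "impala"

out = run_pipeline([{"id": "img1"}], fake_detect, fake_classify)
```

The design point: only the small species classifier needs retraining for a new project, while the expensive detector is reused everywhere.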

SLIDE 11

SLIDE 12

Rare classes are hard

[Figure: classification error vs. number of training examples, log-log axes, comparing cis (seen) and trans (unseen) camera locations]

Recognition in Terra Incognita, Beery et al., ECCV 2018

SLIDE 13

SLIDE 14

SLIDE 15

Camera traps are static, and objects of interest are habitual

SLIDE 16

Synthetic data improves rare-class performance

Synthetic Examples Improve Generalization for Rare Classes, Beery et al., WACV 2020
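One way to picture the approach: top up each rare class with synthetic examples until it reaches a minimum per-class count. This is a toy sketch; the threshold and the `make_synthetic` stub are illustrative assumptions, not the paper's actual data-generation method:

```python
from collections import Counter

def augment_rare_classes(samples, make_synthetic, min_count=5):
    """Top up classes with fewer than min_count real examples using
    synthetically generated ones."""
    counts = Counter(label for _, label in samples)
    augmented = list(samples)
    for label, n in counts.items():
        for _ in range(max(0, min_count - n)):
            augmented.append((make_synthetic(label), label))
    return augmented

# Toy data: "zebra" is common, "aardwolf" is rare.
data = [("img%d" % i, "zebra") for i in range(5)] + [("img5", "aardwolf")]
augmented = augment_rare_classes(data, make_synthetic=lambda lbl: "synthetic_" + lbl)
```

Common classes are left untouched; only classes below the threshold receive synthetic examples.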

SLIDE 17

Camera traps are static, and objects of interest are habitual

SLIDE 18

Human labeling method

SLIDE 19

Human labeling method

SLIDE 20

Human labeling method

SLIDE 21

Human labeling method

SLIDE 22

Human labeling method

SLIDE 23

Human labeling method

Impala!

SLIDE 24

Human practitioners use this information; can we build a machine learning model that does the same?

Camera traps are static, and objects of interest are habitual

Context R-CNN: Long Term Context for Per-Camera Object Detection, Beery et al., CVPR 2020

SLIDE 25

1. Improve per-location object classification

These are probably the same species, and if we’re confident about one, that should help us classify the other.

Camera traps are static, and objects of interest are habitual

SLIDE 26

These rocks have not moved in a month; they’re probably not animals.

Camera traps are static, and objects of interest are habitual

1. Improve per-location object classification
2. Ignore salient false positives

SLIDE 27

Contextual memory strategy

  • Extract features offline
  • Reduce feature size
  • Curate features
  • Maintain spatiotemporal information

Context R-CNN: Long Term Context for Per-Camera Object Detection, Beery et al., CVPR 2020
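The steps above can be sketched as a small per-camera memory bank of feature entries with timestamps. The field layout and the keep-every-other-entry curation rule below are illustrative assumptions, not the paper's exact curation scheme:

```python
class MemoryBank:
    """Per-camera store of (feature, timestamp) entries, kept small by
    subsampling whenever it exceeds a fixed budget."""

    def __init__(self, max_entries=4):
        self.max_entries = max_entries
        self.entries = []  # list of (feature_vector, timestamp)

    def add(self, feature, timestamp):
        self.entries.append((feature, timestamp))
        if len(self.entries) > self.max_entries:
            # Curate: keep every other entry to stay under budget
            # while preserving some temporal spread.
            self.entries = self.entries[::2]

    def features(self):
        return [f for f, _ in self.entries]

# Simulate six frames arriving at one camera.
bank = MemoryBank(max_entries=4)
for t in range(6):
    bank.add([float(t)], timestamp=t)
```

Because features are extracted offline and stored compactly, the detector can later look back over weeks or months of context at a single camera without re-running the backbone.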

SLIDE 28

Use attention to incorporate context

Context R-CNN: Long Term Context for Per-Camera Object Detection, Beery et al., CVPR 2020
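The mechanism can be illustrated with plain scaled dot-product attention between the current frame's feature (the query) and the stored memory features (keys and values). This single-query, pure-Python sketch is a simplification for exposition, not the Context R-CNN implementation:

```python
import math

def attend(query, memory):
    """Scaled dot-product attention: weight each memory feature by its
    relevance to the query, then return the weighted sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, m)) / math.sqrt(d) for m in memory]
    # Softmax over relevance scores (subtract max for numerical stability).
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of memory features.
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(d)]

# The first memory entry matches the query; it should dominate the output.
query = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0]]
out = attend(query, memory)
```

The softmax weights are what make the incorporation relevance-based: memory entries similar to the current detection contribute more to the contextualized feature.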

SLIDE 29

Context is incorporated based on relevance

Context R-CNN: Long Term Context for Per-Camera Object Detection, Beery et al., CVPR 2020

SLIDE 30

Related Work: long-term temporal context in video

Wu et al., Long-Term Feature Banks for Detailed Video Understanding
Deng et al., Object Guided External Memory Network for Video Object Detection
Shvets et al., Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection
Wu et al., Sequence Level Semantics Aggregation for Video Object Detection

SLIDE 31

Datasets

  • Snapshot Serengeti (SS): 225 cameras, 3.4M images, 48 classes, Eastern African game preserve
  • Caltech Camera Traps (CCT): 140 cameras, 243K images, 18 classes, American Southwestern urban wildlife
  • CityCam (CC): 17 cameras, 60K images, 10 vehicle classes, traffic cameras from NYC

Context R-CNN: Long Term Context for Per-Camera Object Detection, Beery et al., CVPR 2020

SLIDE 32

Results

SS: Snapshot Serengeti; CCT: Caltech Camera Traps; CC: CityCam

SLIDE 33

Improves predominantly on challenging cases

SLIDE 34

Attention is temporally adaptive to relevance

SLIDE 35

Snapshot Serengeti mAP improves for all classes

SLIDE 36

Background classes are learned without supervision

SLIDE 37

Static passive monitoring sensors

  • Sparse, irregular frame rate
  • Power, computational, and memory constraints
  • Much of the data is “empty”

SLIDE 38

Big goal: monitoring biodiversity, globally and in real time.

How can we contribute?

SLIDE 39

Current Biodiversity AI Competitions

GeoLifeCLEF 2020: 2M species observations + remote sensing (RS) + land cover (LC) + environmental covariates
https://www.imageclef.org/GeoLifeCLEF2020

iWildCam 2020: global camera traps (WCS) + remote sensing
https://www.kaggle.com/c/iwildcam-2020-fgvc7

SLIDE 40

Acknowledgements

AI for Earth

Caltech Vision Lab