SLIDE 1

Semantic Localization of Indoor Places

Lukas Kuster

SLIDE 2
  • GPS for localization

2

Motivation

[7]

SLIDE 3
  • Indoor navigation

3

Motivation

[8]

SLIDE 4
  • Crowd sensing

4

Motivation

[9]

SLIDE 5
  • Targeted Advertisement

5

Motivation

[10]

SLIDE 6
  • Tourist guidance

6

Motivation

[12]

SLIDE 7
  • GPS
  • WiFi
  • Images
  • Sound
  • Mobility

7

Semantic Localization

SLIDE 8
  • GPS
  • WiFi
  • Images
  • Sound
  • Mobility

8

Semantic Localization

  • Works for unseen places
  • Outdoor and indoor
  • Rich in information
  • User’s point of view
  • No special hardware
SLIDE 9
  • Motivation
  • Image Indoor Scene Recognition
  • Recognizing Indoor Scenes – 2009
  • Unsupervised Discovery of Mid-Level Discriminative Patches – 2012
  • Blocks that Shout – 2013
  • Semantic Localization in full Systems
  • Conclusions

9

Overview

SLIDE 10
  • Goals:
  • Assign a scene category to an input image

10

Scene classification in computer vision

Scene classifier: input image → scene category (e.g. library, classroom)

SLIDE 11
  • Outdoor scenes
  • Global properties
  • Geometric
  • Indoor scenes
  • Local properties
  • Semantic meaningful objects
  • Arrangement of Objects

11

Challenges in scene recognition

SLIDE 12

12

Scene Classification

Timeline:

  1. 2009 – Recognizing Indoor Scenes (Quattoni et al.)
  2. 2012 – Unsupervised Discovery of Mid-Level Discriminative Patches (Singh et al.)
  3. 2013 – Blocks that Shout: Distinctive Parts for Scene Classification (Juneja et al.)

SLIDE 13
  • Two different image feature descriptors
  • Global information – GIST descriptor
  • Local information – SIFT descriptors
  • MIT Scene 67 dataset

13

Recognizing Indoor Scenes - Quattoni et al. (2009)

SLIDE 14

14

Recognizing Indoor Scenes - Quattoni et al. (2009)

Random Prototypes

SLIDE 15
  • Manual and automatic segmentation into ROI

15

Recognizing Indoor Scenes - Quattoni et al. (2009)

Random Prototypes Segmentation

SLIDE 16
  • Manual and automatic segmentation into ROI
  • 2x2 Histogram of Visual Words

16

Recognizing Indoor Scenes - Quattoni et al. (2009)

Random Prototypes Segmentation ROI descriptors
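The 2x2 histogram of visual words above can be sketched in a few lines of numpy. This assumes keypoints have already been quantized to visual-word IDs; `spatial_bow_histogram` and its toy inputs are illustrative, not the paper's implementation:

```python
import numpy as np

def spatial_bow_histogram(points, words, width, height, vocab_size):
    """2x2 spatial histogram of visual words: one bag-of-words
    histogram per image quadrant, concatenated (4 * vocab_size dims)."""
    hist = np.zeros((2, 2, vocab_size))
    for (x, y), w in zip(points, words):
        col = int(x >= width / 2)   # left / right half
        row = int(y >= height / 2)  # top / bottom half
        hist[row, col, w] += 1
    flat = hist.reshape(-1)
    return flat / max(flat.sum(), 1)  # L1-normalize

# Hypothetical keypoints in a 100x100 ROI, vocabulary of 3 words
pts = [(10, 10), (80, 10), (10, 90), (80, 90)]
ws = [0, 1, 2, 0]
h = spatial_bow_histogram(pts, ws, 100, 100, 3)  # 12-dim descriptor
```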

SLIDE 17
  • Manual and automatic segmentation into ROI
  • 2x2 Histogram of Visual Words
  • Optimize parameters on test set

17

Recognizing Indoor Scenes - Quattoni et al. (2009)

Random Prototypes Segmentation Learning

  

f_k(x) = β_kG · g(x) + Σ_{j=1}^{m_k} β_kj · h_kj(x)

ROI descriptors

SLIDE 18
  • Manual and automatic segmentation into ROI
  • 2x2 Histogram of Visual Words
  • Optimize parameters on test set

18

Recognizing Indoor Scenes - Quattoni et al. (2009)

Random Prototypes Segmentation Learning

  

f_k(x) = β_kG · g(x) + Σ_{j=1}^{m_k} β_kj · h_kj(x)

ROI descriptors

(β: prototype weights · h_kj(x): local features · g(x): global feature)
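With the terms labeled above, the score for one scene category is a weighted sum of a global term and local ROI-detector responses. A toy numeric sketch (the β values here are made up for illustration, not learned as in the paper):

```python
import numpy as np

# Hypothetical weights for one scene category k
beta_kG = 0.5                        # weight on the global Gist feature
beta_k = np.array([0.8, -0.2, 0.4])  # one weight per ROI prototype

def score(g_x, h_x):
    """f_k(x) = beta_kG * g(x) + sum_j beta_kj * h_kj(x):
    a global term plus weighted local ROI-detector responses."""
    return beta_kG * g_x + float(beta_k @ h_x)

s = score(1.0, np.array([1.0, 0.0, 0.5]))  # 0.5 + 0.8 + 0.2 = 1.5
```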

SLIDE 19

MIT Scene 67 dataset

19

  • 15620 labeled images
  • 67 indoor scene categories
SLIDE 20

Confusion matrix (rows: actual, columns: predicted):

                   Cat 1    Cat 2    Cat 3    Cat 4    Cat 5
Category 1        90.12%    0.00%    9.88%    0.00%    0.00%
Category 2         0.00%  100.00%    0.00%    0.00%    0.00%
Category 3         0.00%    0.00%   92.66%    0.00%    7.34%
Category 4        37.20%    0.00%   10.34%   52.46%    0.00%
Category 5         0.00%    0.00%   12.69%    0.00%   87.31%

20

Test Setup – Quattoni et al. (2009)

  • 67 * 80 images for training
  • 67 * 20 images for testing
  • Performance metric: standard average multiclass prediction accuracy
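Standard average multiclass accuracy is the mean of the per-class diagonal of the row-normalized confusion matrix, so every category counts equally regardless of its size. A minimal sketch, using the numbers from the confusion-matrix slide:

```python
import numpy as np

def mean_class_accuracy(confusion):
    """Average multiclass accuracy: mean of per-class (diagonal)
    recall over the row-normalized confusion matrix."""
    conf = np.asarray(confusion, dtype=float)
    per_class = conf.diagonal() / conf.sum(axis=1)
    return per_class.mean()

# Row-normalized confusion matrix from the previous slide (percentages)
conf = [
    [90.12,   0.00,  9.88,  0.00,  0.00],
    [ 0.00, 100.00,  0.00,  0.00,  0.00],
    [ 0.00,   0.00, 92.66,  0.00,  7.34],
    [37.20,   0.00, 10.34, 52.46,  0.00],
    [ 0.00,   0.00, 12.69,  0.00, 87.31],
]
acc = mean_class_accuracy(conf)  # ~0.845
```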

SLIDE 21

21

Results – Quattoni et al. (2009)

SLIDE 22

Evaluation – Quattoni et al. (2009)

22

  • Segmentation Methods:
  • Segmentation: automatic
  • Annotation: manual
  • Features:
  • Only ROI
  • ROI + Gist
SLIDE 23

✓ Indoor scene classification
✓ Local and global features
✗ Low accuracy (26%)
✗ Manual annotation

23

Conclusion – Quattoni et al. (2009)

SLIDE 24

24

Scene Classification

Timeline:

  1. 2009 – Recognizing Indoor Scenes (Quattoni et al.)
  2. 2012 – Unsupervised Discovery of Mid-Level Discriminative Patches (Singh et al.)
  3. 2013 – Blocks that Shout: Distinctive Parts for Scene Classification (Juneja et al.)

SLIDE 25

Unsupervised Discovery of Mid-Level Discriminative Patches – Singh et al. (2012)

25

  • Mid-Level patches
  • Representative: frequent occurrence in the visual world
  • Discriminative: different enough from the rest of the world
SLIDE 26

26

Singh et al. (2012)

Random discovery set

SLIDE 27

27

Singh et al. (2012)

Random patches Random discovery set

SLIDE 28
  • Cluster patches in HOG space

28

Singh et al. (2012)

Kmeans clustering Random patches Random discovery set

SLIDE 29
  • Cluster patches in HOG space
  • Train detector for each cluster

29

Singh et al. (2012)

SVM train Kmeans clustering Random patches Random discovery set

SLIDE 30
  • Cluster patches in HOG space
  • Train detector for each cluster
  • Use detector on validation set
  • Get top 5 matches for new cluster
  • Kill clusters that have fewer than 2 matches

30

Singh et al. (2012)

Detect new patches SVM train Kmeans clustering Random patches Random discovery set
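The discovery loop on these slides (random patches → clustering in HOG space → per-cluster detector → top-5 re-detection on a validation set → pruning) can be outlined as follows. Everything here is synthetic, and a nearest-centroid score stands in for the paper's per-cluster linear SVM:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    """Minimal k-means (stand-in for the clustering in HOG space)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

# Toy "HOG descriptors" of random patches from the discovery set
train_patches = rng.normal(size=(200, 16))
val_patches = rng.normal(size=(200, 16))

centers = kmeans(train_patches, k=8)
clusters = []
for c in centers:
    # "Detector" = distance to the cluster center (SVM stand-in);
    # take the top-5 validation patches as the cluster's next members.
    d = ((val_patches - c) ** 2).sum(1)
    top5 = np.argsort(d)[:5]
    # Prune clusters whose detector fires confidently fewer than 2 times
    if (d[top5] < np.median(d)).sum() >= 2:
        clusters.append(val_patches[top5])
```

In the paper this detect-and-retrain step is iterated, with train and validation sets swapped each round to avoid overfitting the detectors.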

SLIDE 31
  • Purity
  • Same visual concept
  • Sum of top r detection scores
  • Discriminativeness
  • Detected rarely in natural world
31

Ranking Detectors – Singh et al. (2012)

discriminativeness = (# detections in training set) / (# detections in training set ∪ natural world)
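The ranking ratio on this slide as a one-line sketch (the counts are illustrative):

```python
def discriminativeness(fires_in_training_set, fires_in_natural_world):
    """Share of a detector's firings that land on the target training
    set rather than the large 'natural world' negative set."""
    total = fires_in_training_set + fires_in_natural_world
    return fires_in_training_set / total

r = discriminativeness(8, 2)  # -> 0.8
```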

SLIDE 32
  • Detect patches at different scales and different spatial pyramid levels

  • Train classifier with SVM

32

Image descriptor – Singh et al. (2012)

Object Bank Image representation – Li, L-J et al. (2010)

SLIDE 33
  • Detect patches at different scales and different spatial pyramid levels

  • Train classifier with SVM

33

Image descriptor – Singh et al. (2012)

SVM Object Bank Image representation – Li, L-J et al. (2010)

SLIDE 34

34

Top Ranked patches – Singh et al. (2012)

  • MIT 67 Benchmark
SLIDE 35

Evaluation – Singh et al. (2012)

35

Accuracy:

  Spatial Pyramid HOG           29.8
  Spatial Pyramid SIFT (SP)     34.4
  ROI-GIST (Quattoni et al.)    26.5
  Object Bank                   37.6
  Patches                       38.1

SLIDE 36

Evaluation – Singh et al. (2012)

36

Accuracy:

  Spatial Pyramid HOG           29.8
  Spatial Pyramid SIFT (SP)     34.4
  ROI-GIST (Quattoni et al.)    26.5
  Object Bank                   37.6
  Patches                       38.1

Combination approaches:

  GIST+SP+DPM                   43.1
  Patches+GIST+SP+DPM           49.4

SLIDE 37

37

Conclusion

Quattoni et al. (2009):
  ✓ Indoor scene classification
  ✓ Local and global features
  ✗ Low accuracy (26%)
  ✗ Manual annotation

Singh et al. (2012):
  ✓ Low supervision
  ✓ Better accuracy
  ✗ Low accuracy (49%)
  ✗ Inefficient

SLIDE 38

38

Scene Classification

Timeline:

  1. 2009 – Recognizing Indoor Scenes (Quattoni et al.)
  2. 2012 – Unsupervised Discovery of Mid-Level Discriminative Patches (Singh et al.)
  3. 2013 – Blocks that Shout: Distinctive Parts for Scene Classification (Juneja et al.)

SLIDE 39
  • More efficient
  • Distinctive patches

39

Blocks that Shout: Distinctive Parts for Scene Classification – Juneja et al. (2013)

SLIDE 40

40

Blocks that Shout – Juneja et al. (2013)

Initial training set

Seeding

SLIDE 41
  • Automatic segmentation into superpixels

41

Blocks that Shout – Juneja et al. (2013)

Superpixels Initial training set

Seeding

SLIDE 42
  • Automatic segmentation into superpixels
  • Seed blocks:
  • Intermediate-sized superpixels
  • Image variation

42

Blocks that Shout – Juneja et al. (2013)

Initial training set Superpixels Seed Blocks

Seeding
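The automatic superpixel step can be approximated in plain numpy: SLIC-style superpixels are essentially k-means over (position, intensity) features, which keeps segments compact and homogeneous. `slic_superpixels` is a hypothetical minimal stand-in, not the paper's segmenter:

```python
import numpy as np

def slic_superpixels(img, grid=4, iters=5):
    """SLIC-style superpixels: k-means over (row, col, intensity)
    features so segments stay compact and color-homogeneous."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalized position + intensity as the clustering feature
    feats = np.stack([ys / h, xs / w, img], axis=-1).reshape(-1, 3)
    # Initialize cluster centers on a regular grid, as SLIC does
    cy = np.linspace(0, h - 1, grid).astype(int)
    cx = np.linspace(0, w - 1, grid).astype(int)
    centers = np.array([[y / h, x / w, img[y, x]] for y in cy for x in cx])
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(0)
    return labels.reshape(h, w)

img = np.zeros((32, 32)); img[:, 16:] = 1.0  # toy two-tone image
seg = slic_superpixels(img)                  # 32x32 label map
```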

SLIDE 43

Seeding Expansion

43

Blocks that Shout – Juneja et al. (2013)

Seed Block HOG descriptor

  • 8x8 HOG cells of 8x8 pixels
SLIDE 44

44

Seed Block HOG descriptor

  • 8x8 HOG cells of 8x8 pixels
  • Detect similar blocks

Exemplar SVM

Blocks that Shout – Juneja et al. (2013)

Seeding Expansion

SLIDE 45

45

Seed Block HOG descriptor

  • 8x8 HOG cells of 8x8 pixels
  • Detect similar blocks
  • 5 iterations for final part detector

Exemplar SVM

seed round1 round2 round3 round4 round5

Blocks that Shout – Juneja et al. (2013)

Seeding Expansion

SLIDE 46

46

  • Select most distinctive part detectors
  • Entropy:

 

H(Y, r) = − Σ_{y=1}^{N} p(y, r) log₂ p(y, r)

Blocks that Shout – Juneja et al. (2013)

Seeding Expansion Selection
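The entropy criterion can be sketched directly; `class_entropy` is a hypothetical helper over a detector's per-class firing counts:

```python
import numpy as np

def class_entropy(counts):
    """H(Y, r) = -sum_y p(y, r) log2 p(y, r): a detector that fires
    on only one scene class has low entropy, i.e. it is distinctive."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log 0 := 0
    return float(0.0 - (p * np.log2(p)).sum())

class_entropy([10, 0, 0, 0])  # fires on one class only -> 0.0 (keep)
class_entropy([1, 1, 1, 1])   # spread over 4 classes -> 2.0 (discard)
```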

SLIDE 47
  • Detect patches at different scales and different spatial pyramid levels

  • Train classifier with SVM

47

Image descriptor – Blocks that Shout (2013)

SVM Object Bank Image representation – Li, L-J et al. (2010)

SLIDE 48

48

Blocks that Shout – Juneja et al. (2013)

Results

SLIDE 49

49

Blocks that Shout – Juneja et al. (2013)

Accuracy:

  ROI-GIST (Quattoni et al.)    26.5
  Object Bank                   37.6
  Patches (Singh et al.)        38.1
  BoP                           46.1

Evaluation

SLIDE 50

50

Blocks that Shout – Juneja et al. (2013)

Accuracy:

  ROI-GIST (Quattoni et al.)    26.5
  Object Bank                   37.6
  Patches (Singh et al.)        38.1
  BoP                           46.1

Combination approaches:

  Patches+GIST+SP+DPM (Singh et al.)   49.4
  IFV + BoP                            63.1

Evaluation

SLIDE 51

51

Conclusion

Quattoni et al. (2009):
  ✓ Indoor scene classification
  ✓ Local and global features
  ✗ Low accuracy (26%)
  ✗ Manual annotation

Singh et al. (2012):
  ✓ Low supervision
  ✓ Better accuracy
  ✗ Low accuracy (49%)
  ✗ Inefficient

Juneja et al. (2013):
  ✓ Low supervision
  ✓ More efficient
  ✓ Distinctive parts
  ✓ Even better accuracy
  ✗ Low accuracy (63%)

SLIDE 52
  • Motivation
  • Image Indoor Scene Recognition
  • Recognizing Indoor Scenes – 2009
  • Unsupervised Discovery of Mid-Level Discriminative Patches – 2012
  • Blocks that Shout – 2013
  • Semantic Localization in full Systems
  • Conclusions

52

Overview

SLIDE 53

53

Systems Overview

CrowdSense@Place

  • 2012
  • Crowd sensing
  • Link visits with place categories
  • Share output with location-sensitive applications

SLIDE 54

54

Systems Overview

CrowdSense@Place

  • 2012
  • Crowd sensing
  • Link visits with place categories
  • Share output with location-sensitive applications

Place Naming System

  • 2013
  • Crowd sensing

Output:

  • Functional name (e.g. food place)
  • Business name (e.g. Starbucks)
  • Personal name (e.g. my home)

SLIDE 55

CrowdSense@Place

  • 2012
  • Crowd sensing
  • Link visits with place categories
  • Share output with location-sensitive applications

55

Systems Overview

Place Naming System

  • 2013
  • Crowd sensing

Output:

  • Functional name (e.g. food place)
  • Business name (e.g. Starbucks)
  • Personal name (e.g. my home)

CheckInside

  • 2014
  • Location-based social network
  • Improved venue list in check-ins

SLIDE 56
  • Mobility:
  • GPS
  • WiFi
  • Trajectory

56

Sensor Data

SLIDE 57
  • Mobility:
  • GPS
  • WiFi
  • Trajectory
  • Visual Classifiers:
  • Text Recognition
  • Indoor Scene Classification
  • Object Recognition

57

Sensor Data

SLIDE 58
  • Mobility:
  • GPS
  • WiFi
  • Trajectory
  • Visual Classifiers:
  • Text Recognition
  • Indoor Scene Classification
  • Object Recognition
  • Sound Classifiers:
  • Speech Recognition
  • Sound Classification

58

Sensor Data

SLIDE 59

CrowdSense@Place

  • 1241 places
  • 6 categories
  • Accuracy: ~ 40% - 95%
  • Overall: ~ 69%

59

Evaluation

SLIDE 60

CrowdSense@Place

  • 1241 places
  • 6 categories
  • Accuracy: ~ 40% - 95%
  • Overall: ~ 69%

60

Evaluation

Place Naming System

  • 3800 places
  • 9 categories
  • Functional name: ~ 20% - 90%
  • Business name:

SLIDE 61

CrowdSense@Place

  • 1241 places
  • 6 categories
  • Accuracy: ~ 40% - 95%
  • Overall: ~ 69%

61

Evaluation

Place Naming System

  • 3800 places
  • 9 categories
  • Functional name: ~ 20% - 90%
  • Business name:

CheckInside

  • 711 stores
  • 99% in top 5

SLIDE 62
  • Good for functional naming

62

Visual Scene Recognition Evaluation

CrowdSense@Place accuracy:

SLIDE 63
  • Good for functional naming
  • Intermediate performance gain for business naming

63

Visual Scene Recognition Evaluation

CrowdSense@Place accuracy:

Business naming accuracy:

SLIDE 64
  • Crowd sensing improves semantic localization
  • Relatively low accuracy
  • User interaction still needed
  • Visual scene recognition:
  • Fast progress
  • State-of-the-art methods could improve these systems

64

Conclusion

SLIDE 65

(1) Quattoni, A.; Torralba, A., "Recognizing Indoor Scenes," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
(2) Singh, S.; Gupta, A.; Efros, A. A., "Unsupervised Discovery of Mid-Level Discriminative Patches," European Conference on Computer Vision (ECCV), 2012.
(3) Juneja, M.; Vedaldi, A.; Jawahar, C. V.; Zisserman, A., "Blocks That Shout: Distinctive Parts for Scene Classification," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
(4) Chon, Y.; Lane, N. D.; Li, F.; Cha, H.; Zhao, F., "Automatically Characterizing Places with Opportunistic Crowdsensing Using Smartphones," ACM Conference on Ubiquitous Computing (UbiComp), 2012.
(5) Chon, Y.; Kim, Y.; Cha, H., "Autonomous Place Naming System Using Opportunistic Crowdsensing and Knowledge from Crowdsourcing," International Conference on Information Processing in Sensor Networks (IPSN), 2013.
(6) Elhamshary, M.; Youssef, M., "CheckInside: A Fine-Grained Indoor Location-Based Social Network," ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), 2014.

65

References

SLIDE 66

(7) http://www.extremetech.com/extreme/126843-think-gps-is-cool-ips-will-blow-your-mind/2
(8) http://free-wifi-service.com/
(9) http://denimandsteel.com/talks/polyglot/
(10) http://www.elatewiki.org/images/Special.jpeg
(11) Li, L.-J.; Su, H.; Xing, E.; Fei-Fei, L., "Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification," Conference on Neural Information Processing Systems (NIPS), 2010.
(12) https://www.flip4new.de/blog/nokia-lumia-920-review-was-kann-das-windows-phone/

66

References