SLIDE 1

Contextual Media Retrieval Using Natural Language Queries

IMPRS-CS PhD Application Talk Sreyasi Nag Chowdhury

Master’s Thesis Supervisors:

  • Dr. Mario Fritz
  • Dr. Andreas Bulling

Adviser: Mateusz Malinowski, M.Sc.

SLIDE 2

Outline

  • Motivation and Overview
  • Contextual Media Retrieval System
  • Results and Conclusion

Sreyasi Nag Chowdhury | Contextual Media Retrieval Using Natural Language Queries 23-02-2015

SLIDE 3

Motivation

  • “Collective Memory” of media content
  • Spatio-temporal exploration of media on wearable devices

SLIDE 4

System Overview

Demonstration

SLIDE 5

System Overview

Demonstration: Spatial Exploration

SLIDE 6

System Overview

Demonstration: Temporal Exploration

SLIDE 7

System Overview

  • Natural language voice queries
  • Images and videos
  • Dynamic-egocentric environment

SLIDES 8-17

Related Work

Spatio-temporal Media Retrieval
  • Existing functions: browsing media collections in a static, allocentric setting; click-based GUI
  • Our contribution: browsing media collections in a dynamic, egocentric setting; hands-free GUI

Natural Language Query Processing
  • Existing functions: question answering w.r.t. a static world; returns textual information
  • Our contribution: question answering w.r.t. a dynamic world; returns media files

Media Retrieval Using Natural Language Queries
  • Existing functions: retrieves media based on scene contents; uses short structured phrases as queries; does not take the user's context into account
  • Our contribution: retrieves media based on geographic location; uses rich, complete natural language sentences as queries; takes the user's context into account

SLIDE 18

Outline

  • Motivation and Overview
  • Contextual Media Retrieval System
  • Results and Conclusion

SLIDES 19-29

SLIDES 30-36

Question Answering

Q&A model of Percy Liang (“Learning Dependency-based Compositional Semantics”, Liang et al.):

  • Constraint satisfaction problem
  • Unambiguous query
  • Single correct answer
  • Static world
  • Textual answer

SLIDES 37-40

Question Answering

Our Q&A model, e.g. “What is there in front of MPI-INF?”:

  • Ambiguous query
  • Subjective; multiple correct answers
  • Dynamic world
  • Media as answer

SLIDES 41-47

Dynamic-Egocentric Extension

World (w)

  • Static World (w_s), e.g.
      cafe('mensa',49.2560,7.0454). building('mpi_inf',49.2578,7.0460).
  • Dynamic World (w_d):
      • User Metadata (w_du), e.g.
          person(49.2578,7.0460,'n'). day(20150220).
      • Collective Memory (w_dm), e.g.
          image('img_20141111_165828',20141111,49.2566,7.0442,'november').
          video('vid_20141121_120149',20141121,49.2569,7.0456,'november').

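A minimal Python sketch of how the parts of the world above could be combined to answer temporal, egocentric questions such as “What does this place look like in November?” or “What happened here one day ago?”. It assumes the facts are held as plain records rather than Prolog clauses. The `near` test with its hypothetical 0.0025-degree box stands in for whatever notion of “here” the system actually used, and reading the third argument of person/3 as a heading is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Person:
    lat: float
    lon: float
    heading: str  # 'n' in the fact above; treating it as a heading is an assumption


@dataclass(frozen=True)
class Media:
    name: str
    day: int  # yyyymmdd, as in the Collective Memory facts
    lat: float
    lon: float
    month: str


# Facts copied from the slide: user metadata and Collective Memory.
PERSON = Person(49.2578, 7.0460, "n")
TODAY = 20150220
MEMORY = [
    Media("img_20141111_165828", 20141111, 49.2566, 7.0442, "november"),
    Media("vid_20141121_120149", 20141121, 49.2569, 7.0456, "november"),
]


def near(m: Media, p: Person, radius_deg: float = 0.0025) -> bool:
    """Hypothetical 'here' test: a box of +/- radius_deg around the user."""
    return abs(m.lat - p.lat) <= radius_deg and abs(m.lon - p.lon) <= radius_deg


def media_in_month(month: str) -> list[str]:
    """'What does this place look like in <month>?'"""
    return [m.name for m in MEMORY if m.month == month and near(m, PERSON)]


def media_days_ago(n: int) -> list[str]:
    """'What happened here n days ago?', relative to day(TODAY)."""
    today = date(TODAY // 10000, TODAY // 100 % 100, TODAY % 100)
    target = int((today - timedelta(days=n)).strftime("%Y%m%d"))
    return [m.name for m in MEMORY if m.day == target and near(m, PERSON)]
```

With only the two Collective Memory facts above, `media_in_month("november")` returns both files, whereas `media_days_ago(1)` (19 February 2015) returns nothing.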
SLIDES 48-54

Qualitative Results

SLIDE 55

Outline

  • Motivation and Overview
  • Contextual Media Retrieval System
  • Results and Conclusion

SLIDES 56-60

Evaluation

Agreement and disagreement between users*

  • Total agreement: 26.67%
  • Majority agreement: ~40%
  • Disagreement: ~25% (future scope for personalization)

* Model tested on 500 test queries

SLIDES 61-62

Evaluation

Study of human reference frame resolution

  • Future scope for using other knowledge bases

SLIDES 63-64

Summary

We have:

  • Instantiated a “Collective Memory” of media content
  • Developed a novel architecture for media retrieval with natural language voice queries in a dynamic setting: Xplore-M-Ego
  • Integrated ‘egocentrism’ into media retrieval

Thank You

SLIDE 65

References

  • Photo Tourism: Exploring Photo Collections in 3D. N. Snavely, S. M. Seitz, R. Szeliski
  • Video Collections in Panoramic Contexts. J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, C. Theobalt
  • Videoscapes: Exploring Sparse, Unstructured Video Collections. J. Tompkin, K. I. Kim, J. Kautz, C. Theobalt
  • PhotoScope: Visualizing Spatiotemporal Coverage of Photos for Construction Management. F. Wu, M. Tory

SLIDE 66

References

  • Learning Dependency-Based Compositional Semantics. P. Liang, M. I. Jordan, D. Klein
  • A Multi-World Approach to Question Answering about Real-World Scenes Based on Uncertain Input. M. Malinowski, M. Fritz
  • Image Retrieval with Structured Object Queries Using Latent Ranking SVM. T. Lan, W. Yang, Y. Wang, G. Mori
  • Interpretation of Spatial Language in a Map Navigation Task. M. Levit, D. Roy

SLIDE 67

Extra Material

SLIDE 68

Contribution

  • Instantiation of a “Collective Memory” of media files
  • Extension of question answering to a dynamic setting
  • Extension of spatio-temporal exploration of media to a dynamic setting
  • Incorporation of ‘egocentrism’ into media retrieval
  • Use of natural language voice queries for media retrieval

SLIDE 69

System Overview

Modules of Xplore-M-Ego

  • Google Glass: user interface
  • Pre-processing: modification of the query; mapping of the dynamic environment to a static environment
  • Semantic Parser + Denotation: semantic parsing and prediction of the answer
  • Collective Memory: store of media files

SLIDE 70

Related Work: Spatio-temporal Media Retrieval

  • Photo tourism: exploring photo collections in 3D (N. Snavely, S. M. Seitz, and R. Szeliski): exploration of popular world sites by browsing through images
  • Video collections in panoramic contexts (J. Tompkin, F. Pece, R. Shah, S. Izadi, J. Kautz, and C. Theobalt): spatio-temporal exploration of videos embedded in a panoramic context

SLIDE 71

Related Work: Natural Language Question Answering

  • Learning dependency-based compositional semantics (P. Liang, M. I. Jordan, and D. Klein): training of a semantic parser with question-answer pairs; single static world approach
  • A multi-world approach to question answering about real-world scenes based on uncertain input (M. Malinowski and M. Fritz): question answering based on real-world indoor images; static multi-world approach

SLIDE 72

Related Work: Media Retrieval with Natural Language Queries

  • Towards surveillance video search by natural language query (S. Tellex and D. Roy): retrieval of video frames from surveillance videos using the spatial relations “across” and “along”
  • Image retrieval with structured object queries using latent ranking SVM (T. Lan, W. Yang, Y. Wang, and G. Mori): retrieval of images based on scene contents, using short structured phrases as queries

SLIDE 73

Data Collection

10-02-2015

1. Map information: OpenStreetMap

Contains:

  • Type of the entity
  • GPS coordinates
  • Name
  • Address
slide-74
SLIDE 74

Data Collection

74 10-02-2015

  • 2. Collection of media files : Collective Memory

** Media files were captured with smart phones

SLIDE 75

Data Collection

3. Training and test data

Synthetically generated data:

  • (“What is there in front of MPI-INF?”, answer(A, (frontOf(A, 'mpi inf'))))
  • (“What is there behind MPI-INF?”, answer(A, (behind(A, 'mpi inf'))))
  • (“What is there on the right of MPI-INF?”, answer(A, (rightOf(A, 'mpi inf'))))
  • (“What is there on the left of MPI-INF?”, answer(A, (leftOf(A, 'mpi inf'))))

Real-world data:

  • (“What is there on the left of MPI-INF?”, 'img_20141102_123406')
  • (“What is on the left of MPI-INF?”, 'img_20141113_160930')
  • (“What is to the left of MPI-INF?”, 'img_20141109_134914')
  • (“What is on the left side of MPI-INF?”, 'img_20141115_100705')

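Because the synthetic pairs follow a fixed pattern, they can be produced from templates. The Python sketch below is a hypothetical generator, not the one used in the thesis: the phrasings in `TEMPLATES` are illustrative, while the predicate names and the `answer(A, (...))` shape follow the synthetic pairs listed above.

```python
from itertools import product

# Hypothetical phrasing templates, keyed by the spatial predicate they express.
TEMPLATES = {
    "frontOf": ["What is there in front of {e}?"],
    "behind": ["What is there behind {e}?"],
    "rightOf": ["What is there on the right of {e}?", "What is to the right of {e}?"],
    "leftOf": ["What is there on the left of {e}?", "What is to the left of {e}?"],
}


def logical_form(pred: str, entity_atom: str) -> str:
    """Build the logical form in the answer(A, (pred(A, 'entity'))) shape."""
    return f"answer(A, ({pred}(A, '{entity_atom}')))"


def generate(entities: dict[str, str]) -> list[tuple[str, str]]:
    """entities maps a surface name (e.g. 'MPI-INF') to its atom (e.g. 'mpi inf')."""
    pairs = []
    for (surface, atom_), (pred, phrasings) in product(entities.items(), TEMPLATES.items()):
        for phrasing in phrasings:
            pairs.append((phrasing.format(e=surface), logical_form(pred, atom_)))
    return pairs
```

The real-world questions vary more freely (“What is on the left side of MPI-INF?”), which a fixed template set does not cover.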
SLIDE 76

Data Collection

SLIDE 77

Semantic Parser

Dependency-based Compositional Semantics (DCS) by Percy Liang

  • A DCS tree defines relations between predicates
  • Denotations are the solutions that satisfy those relations
  • city, major, loc and CA are predicates

SLIDE 78

Semantic Parser

World (w):

state('california','ca','sacramento',23.67e+6,158.0e+3,31,'los angeles','san diego','san francisco','san jose').
city('alabama','al','birmingham',284413).
river('arkansas',2333,['colorado','kansas','oklahoma','arkansas']).
mountain('alaska','ak','mckinley',6194).
road('86',['massachusetts','connecticut']).
country('usa',307890000,9826675).

Example questions:

  • “What is the highest point in Florida?”
  • “Which State has the shortest river?”
  • “What is the capital of Maine?”
  • “What are the populations of states through which the Mississippi river run?”
  • “Name all the lakes of US?”

SLIDE 79

Semantic Parser

Learning in DCS

** Slide courtesy: Percy Liang

SLIDE 80

Semantic Parser

  • Induction of logical forms
  • Logical forms (DCS trees) are induced as latent variables according to a probability distribution parametrized by θ
  • The answer y is evaluated with respect to the world w

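In Liang et al.'s formulation, θ weights a log-linear distribution over candidate DCS trees z for a question x, p(z | x; θ) ∝ exp(φ(x, z)·θ), and training maximises the total probability of the trees whose denotation on w equals y. The Python sketch below computes that marginal log-likelihood and its gradient for a hand-made candidate list; the feature vectors and denotations are made up, and a real parser would enumerate candidates with a beam rather than take them as input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def marginal_log_likelihood(theta, candidates, y):
    """candidates: list of (features phi(x, z), denotation of z on w).

    Returns log sum_{z : [[z]]_w == y} p(z | x; theta) and its gradient in theta.
    """
    probs = softmax([dot(theta, phi) for phi, _ in candidates])
    correct = [i for i, (_, den) in enumerate(candidates) if den == y]
    if not correct:
        # No candidate tree yields the true answer: no learning signal at all.
        return float("-inf"), [0.0] * len(theta)
    mass = sum(probs[i] for i in correct)
    dim = len(theta)
    # Gradient = E_{p(z | x, [[z]]_w = y)}[phi] - E_{p(z | x)}[phi]
    post = [sum(probs[i] * candidates[i][0][k] for i in correct) / mass for k in range(dim)]
    prior = [sum(p * phi[k] for p, (phi, _) in zip(probs, candidates)) for k in range(dim)]
    return math.log(mass), [a - b for a, b in zip(post, prior)]
```

If no candidate's denotation matches y, the sketch returns no gradient at all. That is the situation reported for the real-world data, where denotations never matched the true answers.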
SLIDE 81

Semantic Parser

  • Induction of logical forms

Requirements: a set of rules/predicates:

city(cityid(City,St)) :- state(State,St,_,_,_,_,City,_,_,_).
loc(cityid(City,St),stateid(State)) :- state(State,St,_,_,_,_,City,_,_,_).
river(riverid(R)) :- river(R,_,_).
loc(cityid(City,St),stateid(State)) :- city(State,St,City,_).
traverse(riverid(R),stateid(S)) :- river(R,_,States), member(S,States).
area(stateid(X),squared mile(Area)) :- state(X,_,_,_,Area,_,_,_,_,_).
population(countryid(X),Pop) :- country(X,Pop,_).
major(X) :- city(X), population(X,moreThan(150000)).

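A toy Python rendering of how the rules above compose into a denotation, assuming each relation is held as a set of tuples. The question “major city in Alabama” has the same city/major/loc shape as the DCS tree on slide 77; its denotation is the set of city ids satisfying every child constraint, computed here by plain set intersection, which simplifies DCS's join operation. The first city fact comes from the world listing; the second is a hypothetical town added so the `major` filter has something to reject.

```python
# city(State, St, City, Population) facts.
CITY_FACTS = [
    ("alabama", "al", "birmingham", 284413),  # from the world listing
    ("alabama", "al", "exampleville", 9000),  # hypothetical
]


def city():
    """city(cityid(City, St)): every city id."""
    return {(c, st) for _, st, c, _ in CITY_FACTS}


def loc():
    """loc(cityid(City, St), stateid(State))."""
    return {((c, st), state) for state, st, c, _ in CITY_FACTS}


def population():
    return {(c, st): pop for _, st, c, pop in CITY_FACTS}


def major():
    """major(X) :- city(X), population(X, moreThan(150000))."""
    pop = population()
    return {x for x in city() if pop[x] > 150000}


def denotation_major_city_in(state):
    """Tree city -> {major, loc -> state}: city ids satisfying all constraints."""
    in_state = {x for x, s in loc() if s == state}
    return city() & major() & in_state
```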
SLIDE 82

Semantic Parser

  • Induction of logical forms

Requirements: a set of lexical triggers (L):

<(function words; predicate)>
(most, size). (total, sum). (called, nameObj).

<([POS tags]; [predicates])>
(WRB; loc)
([NN;NNS]; [city,state,country,lake,mountain,river,place])
([NN;NNS]; [person,capital,population])
([NN;NNS;JJ]; [len,negLen,size,negSize,elevation])
([NN;NNS;JJ]; [negElevation,density,negDensity,area,negArea])
(JJ; major)

Augmented lexicon (L+):
(long, len). (large, size). (small, negSize). (high, elevation).

SLIDE 83

Media Retrieval from Denotations

World (w):

image('img_20141111_165828',20141111,49.2566,7.0442,'november').
video('vid_20141121_120149',20141121,49.2569,7.0456,'november').
cafe('mensa',49.2560,7.0454).
building('mpi_inf',49.2578,7.0460).
bank('postbank',49.2556,7.0449).

Example questions:

  • “What is there on the right of MPI-INF?”
  • “What is there in front of postbank?”
  • “What is there on the left of Mensa?”
  • “What is there near Science Park?”
  • “What happened here one day ago?”
  • “What does this place look like in December?”

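Once the parser returns a denotation (a set of map entities), it still has to be turned into media files. The slides later report that exact GPS matching failed and coarse rounding was used instead; the Python sketch below illustrates a different rule, nearest-neighbour matching: each entity gets the closest Collective Memory item by great-circle distance, subject to a hypothetical 150 m cut-off. Coordinates are copied from the world listing above; the function names are hypothetical.

```python
import math

# (lat, lon) per map entity and per media file, from the world listing.
ENTITIES = {"mensa": (49.2560, 7.0454), "mpi_inf": (49.2578, 7.0460), "postbank": (49.2556, 7.0449)}
MEDIA = {"img_20141111_165828": (49.2566, 7.0442), "vid_20141121_120149": (49.2569, 7.0456)}


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def retrieve(denotation, max_m=150.0):
    """Map each entity in the denotation to its nearest media file,
    or to None if nothing lies within max_m metres."""
    result = {}
    for name in denotation:
        pos = ENTITIES[name]
        best = min(MEDIA, key=lambda m: haversine_m(pos, MEDIA[m]))
        result[name] = best if haversine_m(pos, MEDIA[best]) <= max_m else None
    return result
```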
SLIDE 84

Dynamic-Egocentric Extension

Lexical triggers:

Basic lexicon L:
([WP,WDT], [image,video]). (NN, [atm,building,cafe,highway,parking,research_institution,restaurant,shop,sport,tourism,university]). (JJS, [nearest]). ([NN,NNS,VB], [view]). (VBD, [view]).
Prediction accuracy: 17.9%

Augmented lexicon L+:
(front, frontOf). (behind, behind). (right, rightOf). (left, leftOf).
Prediction accuracy: 47%

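A hypothetical Python rendering of the two lexicons above. POS-tag triggers from L propose predicates by tag, while the spatial entries of L+ are keyed on the word itself, so “left” triggers leftOf whether the tagger labels it VBN, NN or RB (the tag variation shown on slide 88). The lookup function is illustrative, not the parser's actual triggering code.

```python
# Basic lexicon L: POS tag -> candidate predicates (from the slide).
POS_TRIGGERS = {
    "WP": ["image", "video"],
    "WDT": ["image", "video"],
    "NN": ["atm", "building", "cafe", "highway", "parking", "research_institution",
           "restaurant", "shop", "sport", "tourism", "university", "view"],
    "NNS": ["view"],
    "VB": ["view"],
    "VBD": ["view"],
    "JJS": ["nearest"],
}

# Augmented lexicon L+: word -> predicate, independent of the POS tag.
WORD_TRIGGERS = {"front": "frontOf", "behind": "behind", "right": "rightOf", "left": "leftOf"}


def triggered_predicates(tagged):
    """tagged: list of (word, POS) pairs; returns (word, candidate predicates) per token."""
    out = []
    for word, tag in tagged:
        preds = list(POS_TRIGGERS.get(tag, []))
        if word.lower() in WORD_TRIGGERS:
            preds.append(WORD_TRIGGERS[word.lower()])
        out.append((word, preds))
    return out
```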
SLIDE 85

Dynamic-Egocentric Extension

SLIDE 86

Dynamic-Egocentric Extension

SLIDE 87

POS tags from Penn Treebank

  • WRB: Wh-adverb
  • NN: Noun, singular or mass
  • NNS: Noun, plural
  • JJ: Adjective
  • JJS: Adjective, superlative
  • WP: Wh-pronoun
  • WDT: Wh-determiner
  • VB: Verb, base form
  • VBD: Verb, past tense

SLIDE 88

Reason behind hard-coding spatial relations

  • What is there left/VBN of MPI?
  • What is there on the left/NN of MPI?
  • What is there in front/NN of MPI?
  • What is there behind/IN MPI?
  • What is there right/RB of MPI?
  • What is there on the right/NN of MPI?

The tagger assigns the same spatial word different POS tags depending on the phrasing, so POS-based lexical triggers cannot reliably map it to a spatial predicate.

SLIDE 89

Predicates used in Xplore-M-Ego

SLIDE 90

Results and Evaluation

  • Synthetically generated question-answer pairs used for training and testing
  • Maximum prediction accuracy: 47%

SLIDE 91

Results and Evaluation

Performance measures:

  • $q_m$ = number of queries with media retrievals
  • $q_r$ = number of queries with relevant retrievals among the $q_m$ queries
  • $q_t$ = number of queries with textual retrievals or with no retrievals
  • $\text{average precision} = \dfrac{q_r}{q_m}$
  • $\text{average recall} = \dfrac{q_r}{q_m + q_t}$

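The two averages above reduce to simple ratios. A minimal Python sketch, with a zero-denominator guard added as an assumption; the counts in the usage note are hypothetical, not results from the thesis.

```python
def average_precision(q_r: int, q_m: int) -> float:
    """q_r / q_m: share of media-returning queries whose retrievals were relevant."""
    return q_r / q_m if q_m else 0.0


def average_recall(q_r: int, q_m: int, q_t: int) -> float:
    """q_r / (q_m + q_t): queries that returned text or nothing (q_t) count as misses."""
    total = q_m + q_t
    return q_r / total if total else 0.0
```

For example, hypothetical counts q_m = 40, q_r = 10 and q_t = 10 give a precision of 10/40 = 0.25 and a recall of 10/50 = 0.2: queries that return text or nothing lower recall but not precision.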
SLIDE 92

Results and Evaluation

  • “Human-in-the-loop” training of the model
  • Five different models were trained
  • Training accuracies ranged from 42.6% to 48.8%
  • The best model by training accuracy was used for further evaluation

SLIDE 93

Results and Evaluation

  • “Human-in-the-loop” training of the model:
      • A method in which human users train the semantic parser through relevance feedback
      • “Correct”/“Wrong” decisions are made solely on the basis of the predicted answers
      • The models are trained with real questions from human users

SLIDE 94

Results and Evaluation

  • “Human-in-the-loop” training of the model
  • Automatic training of the semantic parser with the real-world data was not possible because:
      • GPS coordinates of media files showing a particular entity do not match those of the map data
      • Humans are inconsistent with regard to reference frames
      • Question-answer pairs did not follow any pattern
      • Denotations (often more than one answer) never matched the true answers, so the EM-like algorithm failed to learn

SLIDE 95

Results and Evaluation

Human evaluation of the model trained with real-world data

  • RealModel: model trained with real-world data
  • Relevance feedback collected from five users
  • Overall percentage of relevant retrievals: 26.67%

SLIDE 96

Results and Evaluation

  • Recall of SynthModel: 15.88%
  • Recall of RealModel: 26.67%

SLIDE 97

Evaluation

Human evaluation of temporal and contextual Q&A

  • Five hypothetical locations and viewing directions were provided to users
  • Relevance feedback was collected for retrievals following a canonical reference frame and a user-centric reference frame

SLIDE 98

Evaluation

Human evaluation of temporal and contextual Q&A

  • Canonical and user-centric reference frames, e.g. for a user heading East:
      Original: What is there in front of MPI-INF?
      Altered: What is there on the right of MPI-INF?

Spatial relations: “front of”, “behind”, “left of”, “right of”

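The alteration above can be expressed as a rotation. The Python sketch assumes the canonical frame is that of an observer facing north, which reproduces the slide's example (heading East turns “in front of” into “on the right of”); the heading codes 'n', 'e', 's', 'w' mirror the 'n' in the person/3 fact but are otherwise an assumption, as is the function name.

```python
# Relations in clockwise order, 90 degrees apart, relative to "front".
RELATIONS = ["frontOf", "rightOf", "behind", "leftOf"]
HEADINGS = {"n": 0, "e": 90, "s": 180, "w": 270}  # compass bearing in degrees


def to_canonical(relation: str, heading: str) -> str:
    """Rewrite a user-centric relation into a canonical frame that faces north."""
    bearing = (RELATIONS.index(relation) * 90 + HEADINGS[heading]) % 360
    return RELATIONS[bearing // 90]
```

For a user heading East, `to_canonical("frontOf", "e")` gives rightOf and `to_canonical("leftOf", "e")` gives frontOf.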
SLIDE 99

Discussion

Problem with matching GPS coordinates

Query: “What is in front of MPI-INF?”

Figure: map around MPI-INF and iCoffee comparing the ground-truth media for “front of MPI-INF” with the retrieved media.

SLIDE 100

Discussion

Challenges:
  • Converting a dynamic world to a static world
  • Integrating ‘egocentrism’
  • Handling temporal queries
  • Collection of data
  • Increasing the coverage of the static database

Limitations:
  • Spatial and temporal references not identified
  • Words tagged with incorrect POS tags
  • Arguments not identified from sentences
  • Scalability
  • Reference resolution is not handled

SLIDE 101

Discussion

Accuracy of Performance

  • Matching the exact GPS coordinates for retrievals proved to be a failure
  • This was handled by rough localization: rounding the GPS coordinates to 6 significant digits (49.2578401 -> 49.2578)

Failure case:

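A minimal Python sketch of the rough localization above: coordinates are rounded to 6 significant digits and media are grouped by the rounded pair. The `bucket_media` and `lookup` helpers are hypothetical; only the rounding rule comes from the slide.

```python
from collections import defaultdict


def round_sig(x: float, digits: int = 6) -> float:
    """Round x to `digits` significant digits, e.g. 49.2578401 -> 49.2578."""
    return float(f"{x:.{digits}g}")


def bucket_media(media):
    """Group (name, lat, lon) records by rounded coordinates."""
    buckets = defaultdict(list)
    for name, lat, lon in media:
        buckets[(round_sig(lat), round_sig(lon))].append(name)
    return buckets


def lookup(buckets, lat, lon):
    """Return the media whose rounded coordinates equal those of (lat, lon)."""
    return buckets.get((round_sig(lat), round_sig(lon)), [])
```

Because the rounding counts significant digits rather than decimal places, a latitude near 49 keeps four decimals (a step of roughly 11 m) while a longitude near 7 keeps five (under 1 m at this latitude), so the cells are far from square. Points just either side of a rounding boundary also land in different cells however close they are.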
SLIDE 102

Discussion

Future Work

  • Integration of image processing and computer vision methods for scene understanding (similar to Malinowski et al.)
  • Development of a better semantic parser in light of the limitations discussed
  • Development of more robust location sensors in devices used for capturing media
  • Generation of a consensus about reference frames for applications involving spatial relations

SLIDE 103

Summary of Quantitative Results