University of Toronto
Translating Video Content to Natural Language Descriptions
Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, Bernt Schiele

Outline
Brief overview
The 3 main questions of video description
Video sample from TACoS
Video's corresponding data
1) A language model:
2) A translation model:
   target languages
3) A decoder:
   probabilities
$(x_i, y_i^{1}, \dots, y_i^{l_i}, \dots, y_i^{L_i}, z_i)$, where $L_i$ is the number of SR
$(x_i^{l_i}, y_i^{l_i})$
$(y_i^{1}, z_i), \dots, (y_i^{L_i}, z_i)$.
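The pairing above, where one sentence is repeated once per SR annotation, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_parallel_corpus`, the tuple encoding of an SR, and the toy annotations are all hypothetical and not taken from the slides.

```python
from typing import List, Tuple

# One semantic representation (SR), encoded here as a tuple of labels.
# This encoding is an assumption made for the example.
SR = Tuple[str, ...]


def build_parallel_corpus(
    annotations: List[Tuple[List[SR], str]]
) -> List[Tuple[SR, str]]:
    """Pair every SR annotation y_i^l with its sentence z_i.

    A sentence with L_i SR annotations contributes L_i (SR, sentence) pairs.
    """
    pairs = []
    for sr_list, sentence in annotations:
        for sr in sr_list:
            pairs.append((sr, sentence))
    return pairs


# Hypothetical toy annotations: (list of SRs, sentence).
toy_annotations = [
    ([("TAKE-OUT", "KNIFE", "DRAWER"), ("CUT", "KNIFE", "CARROT")], "He cuts a carrot"),
    ([("WASH", "CARROT")], "He washes the carrot"),
]

corpus = build_parallel_corpus(toy_annotations)
for sr, sentence in corpus:
    print(sr, "->", sentence)
```

The first toy sentence carries two SR annotations, so it appears twice in the resulting corpus, once per annotation.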
$\langle w_j^{u}, x_i \rangle$
$w_j^{u}$: vector of the size of the video representation $x_i$
$w_{j,k}$
$x_i^{l_i}$ and SR labels $y_i^{l_i} = \langle n_1, n_2, \dots, n_N \rangle$ using loopy belief propagation (LBP)
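To make the roles of the unary and pairwise terms concrete, here is a minimal sketch that scores a labeling as the sum of per-node dot products with the video representation plus per-edge pairwise weights. Several things are assumptions for illustration: the score is maximized (the sign convention is not recoverable from the slide), every node has its own weight matrix with one row per state, and the weights are made-up numbers. Exhaustive search stands in for loopy belief propagation here, which is only feasible because the toy model is tiny.

```python
import itertools

import numpy as np


def labeling_score(labels, x, unary_w, pair_w):
    """Sum of unary terms <w_j[state], x> and pairwise terms w_{j,k}[state_j, state_k]."""
    score = sum(float(unary_w[j][s] @ x) for j, s in enumerate(labels))
    score += sum(float(w[labels[j], labels[k]]) for (j, k), w in pair_w.items())
    return score


def brute_force_map(x, unary_w, pair_w):
    """Return the highest-scoring labeling by trying every combination.

    This replaces LBP for illustration only; it is exponential in the
    number of nodes.
    """
    state_counts = [w.shape[0] for w in unary_w]
    candidates = itertools.product(*(range(c) for c in state_counts))
    return max(candidates, key=lambda labels: labeling_score(labels, x, unary_w, pair_w))


# Hypothetical toy model: 2 nodes, 2 states each, 3-dimensional video feature.
x = np.array([1.0, 0.0, 2.0])
unary_w = [
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),  # node 0: one row per state
    np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.0]]),  # node 1: one row per state
]
pair_w = {(0, 1): np.array([[0.0, 0.0], [3.0, 0.0]])}  # compatibility of (node 0, node 1)

best = brute_force_map(x, unary_w, pair_w)
best_score = labeling_score(best, x, unary_w, pair_w)
print(best, best_score)
```

In this toy setup the pairwise weight pulls node 1 toward a state that its unary term alone would not choose, which is the kind of interaction the pairwise terms are meant to capture.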
Find the verbalization of a label ni, e.g., HOB -> stove.
  SMT analogue: translate a word from LS to LT.
Determine the ordering of the concepts.
  SMT analogue: find the alignment between two languages.
Not all semantic concepts are necessarily verbalized in D, e.g., KNIFE is not verbalized in "He cuts a carrot".
  SMT analogue: certain words in LS are not represented in LT, e.g., articles.
Not all verbalized concepts are necessarily semantically represented, e.g., CUT, CARROT -> "He cuts the carrots".
  SMT analogue: certain words in LT are not represented in LS.
A language model of D is used to achieve a grammatically correct and fluent target sentence.
  SMT analogue: a language model of LT is used to achieve a grammatically correct and fluent target sentence.
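As a rough illustration of why a language model favours fluent output, the sketch below trains a bigram model with add-one smoothing on three toy sentences and compares a well-ordered sentence with a scrambled one. The training sentences, the function names, and the choice of a bigram model with add-one smoothing are all assumptions for the example; the slides do not say which language model is used.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count history tokens and bigrams, with <s> and </s> as boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        words = s.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])  # counts of tokens used as history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    unigrams, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


# Toy training data; the third sentence is made up for the example.
lm = train_bigram_lm(["he cuts a carrot", "he cuts the carrots", "he washes a carrot"])

fluent = log_prob("he cuts a carrot", lm)
scrambled = log_prob("carrot a cuts he", lm)
print(fluent, scrambled)
```

Both candidates contain the same words, so only the word order separates them; the model assigns the scrambled version a much lower score because none of its bigrams were seen in training.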
the reference sentence(s)
Information from Frank Rudzicz's slides for the NLC course.
judgements
locations described.
1 Computed only on a 272-sentence subset where the corpus contains more than a single reference sentence for the same video. This reduces the number of references by one, which leads to a lower BLEU score.
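The effect of dropping a reference can be seen in the clipped (modified) n-gram precision that BLEU is built on. The sketch below computes only that precision term, not the full BLEU score with brevity penalty and geometric averaging over n-gram orders. The candidate and reference sentences are invented for the example and do not come from the corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in the single reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical candidate and references.
candidate = "the person cuts the carrot"
references = ["the person slices a carrot", "he cuts the carrot"]

p_two_refs = modified_precision(candidate, references, 1)
p_one_ref = modified_precision(candidate, references[:1], 1)
print(p_two_refs, p_one_ref)
```

With both references the word "cuts" is matched; with only the first reference it is not, so the unigram precision drops. Fewer references give a candidate fewer chances to match, which is why scores computed against a reduced reference set tend to be lower.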