Visual Search Engine for Handwritten and Typeset Math in Lecture Videos and LaTeX Notes
Kenny Davila and Richard Zanibbi August 6, 2018
Center for Unified Biometrics and Sensors
2
Select
3
Select
4
Select
Search
SEARCH RESULTS Found in Lecture Videos
… Related Topics
5
What about other mathematical expressions? Could I write my queries instead?
Yes, using
7
[Figure: example searches across Lecture Notes and Lecture Video (Whiteboard) content]
8
Applied to indexing and retrieval of formulae from lecture materials
Based on matching symbol pairs from Line of Sight (LOS) graphs
Domain knowledge is given by the recognition module
Source code released: https://cs.rit.edu/~dprl/Software.html
9
Related fields:
[1] J. Sivic & A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in ICCV 2003
[2] S. Sudholt & G. A. Fink, “PHOCNet: A deep convolutional neural network for word spotting in handwritten documents,” in ICFHR 2016
[3] R. Zanibbi & D. Blostein, “Recognition and retrieval of mathematical expressions,” IJDAR, vol. 15, no. 4, 2012
[4] K. Davila & R. Zanibbi, “Layout and semantics: Combining representations for mathematical formula search,” SIGIR, 2017
10
Indexing Pipeline Navigation Pipeline Retrieval Pipeline
11
Input Lecture Notes
Binary Images
Output Math Expressions
12
[1] Davila, K., Zanibbi, R. Whiteboard Content Summarization via Spatio-Temporal Conflict Minimization in Lecture Videos. ICDAR 2017
Preprocessing: Lecture Video Summarization [1]
Input: Lecture Video (MTS/MP4)
Steps: Content Extraction, Temporal Segmentation, Spatio-temporal Analysis
Output: Whiteboard Contents Keyframes (Binary Images), Temporal Index
13
14
Raw Data → Pre-processing → Binary Images
Temporal Index (Videos Only)
AccessMath Lecture Video Summarization [1]
15
Raw Data → Pre-processing → Binary Images → LOS Graph Construction → Spatial Index Construction → Spatial Index
Temporal Index (Videos Only)
Tangent-V AccessMath Lecture Video Summarization [1]
16
Uses Connected Components (CC) as nodes
Two nodes are connected if
17
True Node Labels/Relationships are unknown
ω ∈ Ω
≥ 80%, k ≤ 10
18
(0.707, 0.707, 0.000) (1.000, 0.000, 0.000) (-0.707, -0.707, 0.000)
(0.146, -0.146, 0.978)
19
Tuples generated from Ω₁ × Ω₂: (ω₁, ω₂, p₁, p₂, c, sp)
Ω - Top k-labels per node
pₓ - p(ωₓ | sₓ)
c - 3D unit vector from s₁ to s₂
sp - Size ratio between s₁ and s₂
Example: S₁ = x, S₂ = 8, Ω₁ = (x, 0.8), (X, 0.2), Ω₂ = (8, 0.6), (&, 0.3), c = (0.71, −0.71, 0.00), sp = 1.26
Inverted Index for Symbol Pairs
Entries: pairs of symbol labels (ω₁, ω₂)
Posting lists: pair locations in images with (ID, p₁, p₂, c, sp) and (c₁, c₂)
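As a rough illustration of the index structure on this slide, the Python sketch below expands one symbol pair into a tuple for every combination of candidate labels and stores the tuples in a dictionary keyed by label pair. The function names, the `image_id` field, and the sample values are assumptions made for illustration; this is not the released Tangent-V code.

```python
from collections import defaultdict
from itertools import product


def pair_tuples(labels_1, labels_2, unit_vector, size_ratio):
    """Yield one tuple per combination of candidate labels of the two symbols.

    labels_1 / labels_2 are lists of (label, confidence) for each symbol.
    """
    for (w1, p1), (w2, p2) in product(labels_1, labels_2):
        yield (w1, w2, p1, p2, unit_vector, size_ratio)


def add_pair(index, image_id, labels_1, labels_2, unit_vector, size_ratio):
    """Add every label combination of one symbol pair to the inverted index."""
    for w1, w2, p1, p2, vec, ratio in pair_tuples(
        labels_1, labels_2, unit_vector, size_ratio
    ):
        # Entry: the pair of labels. Posting: where the pair occurs and its features.
        index[(w1, w2)].append((image_id, p1, p2, vec, ratio))


# Illustrative values only (hypothetical image ID and features).
index = defaultdict(list)
add_pair(
    index,
    "img-0",
    [("x", 0.8), ("X", 0.2)],
    [("8", 0.6), ("&", 0.3)],
    (0.707, -0.707, 0.0),
    1.26,
)
```

With two candidate labels per symbol, a single pair produces four index entries, so ambiguous recognition results stay searchable under each plausible label.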
Indexing of Videos/Notes
[System diagram: Data, Indexing Pipeline, Spatial Index, Temporal Index, Retrieval Pipeline, Navigation Pipeline]
20
Query Image → Pre-processing → Query Graph → Layer 1: Initial Lookup (Spatial Index) → Layer 2: Structural Alignment → Search Results
21
Query symbol pairs are used to find matches in their corresponding entries of the inverted index
A match between index symbol pair P_c = (c₁, c₂) and query pair P_q = (q₁, q₂) is accepted as valid if and only if:
1 - They are spatially consistent: c ⋅ q ≥ cos 45°
2 - Optionally, they have consistent size ratios (not too small/large)
Matching pair scores are then aggregated by unique graph pair IDs
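The lookup step above can be sketched in Python as follows. The cos 45° threshold comes from the slide; the index layout, the size-ratio rule (`max_factor`), and the pair score (product of the two label confidences) are assumptions chosen only to make the example concrete.

```python
import math
from collections import defaultdict

COS_45 = math.cos(math.radians(45.0))  # spatial-consistency threshold from the slide


def spatially_consistent(index_vec, query_vec):
    """Accept when the dot product of the two unit vectors is at least cos 45 degrees."""
    return sum(a * b for a, b in zip(index_vec, query_vec)) >= COS_45


def size_consistent(index_ratio, query_ratio, max_factor=2.0):
    """Hypothetical size test: the two size ratios differ by at most max_factor."""
    factor = index_ratio / query_ratio
    return 1.0 / max_factor <= factor <= max_factor


def lookup(index, query_pairs, check_size=False):
    """Sum the scores of accepted pair matches per candidate graph ID.

    With a single query graph, the graph-pair ID reduces to the candidate graph ID.
    """
    scores = defaultdict(float)
    for labels, q_vec, q_ratio in query_pairs:
        for graph_id, p1, p2, vec, ratio in index.get(labels, []):
            if not spatially_consistent(vec, q_vec):
                continue
            if check_size and not size_consistent(ratio, q_ratio):
                continue
            scores[graph_id] += p1 * p2  # assumed pair score
    return dict(scores)


# Illustrative data: two candidate graphs contain the pair ("x", "+").
example_index = {
    ("x", "+"): [
        ("g1", 0.9, 0.8, (1.0, 0.0, 0.0), 1.0),
        ("g2", 0.9, 0.8, (0.0, 1.0, 0.0), 1.0),
    ]
}
example_query = [(("x", "+"), (0.8, 0.6, 0.0), 1.2)]
example_scores = lookup(example_index, example_query)
```

In this example only `g1` survives, because the `g2` pair points in a direction more than 45° away from the query pair.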
22
23
Matching Pairs Matching Subgraphs
24
Greedy Match Growing
Matching Pairs Matching Subgraphs
Example (Query: X + Y): Match 1 (Score = 0.7) + Match 2 (Score = 0.5) = New Match (Score = 1.2)
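The slide only shows two partial matches merging into one with summed scores. One way this could work is sketched below in Python, assuming a match is a mapping from query nodes to candidate nodes, that two matches may merge only when they share a query node and their mappings agree, and that scores add. These rules and all names are assumptions, not the authors' exact procedure.

```python
def compatible(a, b):
    """True when two query->candidate mappings agree and stay one-to-one."""
    for q_node, c_node in b.items():
        if q_node in a and a[q_node] != c_node:
            return False
    merged = {**a, **b}
    return len(set(merged.values())) == len(merged)


def grow(pair_matches):
    """Greedily merge matches, best score first.

    pair_matches: list of (mapping, score), mapping = {query_node: candidate_node}.
    Returns a list of grown (mapping, score) tuples.
    """
    remaining = sorted(pair_matches, key=lambda m: m[1], reverse=True)
    grown = []
    while remaining:
        mapping, score = remaining.pop(0)
        mapping = dict(mapping)
        changed = True
        while changed:
            changed = False
            for other in list(remaining):
                o_map, o_score = other
                # Merge only matches that overlap on a query node and agree.
                if mapping.keys() & o_map.keys() and compatible(mapping, o_map):
                    mapping.update(o_map)
                    score += o_score
                    remaining.remove(other)
                    changed = True
        grown.append((mapping, score))
    return grown


# Mirrors the slide: "X +" (0.7) and "+ Y" (0.5) share the "+" symbol.
grown_example = grow([
    ({"X": "c1", "+": "c2"}, 0.7),
    ({"+": "c2", "Y": "c3"}, 0.5),
])
```

Here the two pair matches merge into a single three-symbol match whose score is the sum of both.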
25
Greedy Match Growing, Greedy Match Connection
Matching Pairs Matching Subgraphs
Example: Match 1 (Score = 0.5) + Match 2 (Score = 0.4) = New Match (Score = 0.9)
26
Greedy Match Growing, Greedy Match Connection, Incompatible Match Removal
Matching Pairs Matching Subgraphs
Example: Match 1 (Score = 5.0) is accepted; Match 2 (Score = 0.5) is removed
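A minimal Python sketch of the removal step follows. It assumes that two matches are incompatible when they claim the same candidate symbols, and that the higher-scoring match wins. The slide does not define incompatibility, so that rule and the names used here are assumptions.

```python
def remove_incompatible(matches):
    """Keep the best-scoring matches and drop any that overlap an accepted one.

    matches: list of (candidate_nodes, score), candidate_nodes being a set.
    """
    accepted = []
    used = set()
    for nodes, score in sorted(matches, key=lambda m: m[1], reverse=True):
        if used.isdisjoint(nodes):
            accepted.append((nodes, score))
            used |= nodes
    return accepted


# Mirrors the slide: the 5.0 match is accepted, the overlapping 0.5 match is removed.
kept = remove_incompatible([
    ({"c3", "c4"}, 0.5),
    ({"c1", "c2", "c3"}, 5.0),
])
```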
27
Greedy Match Growing, Greedy Match Connection, Incompatible Match Removal, Match Grouping
Matching Pairs Matching Subgraphs
Example: results for the query in Lecture 01 – KF #5 and Lecture 01 – KF #6 are the same match
We introduce two scoring schemes: α and h
28
Item: α(M) and h(M)
Description: α(M) is a weighted edge recall; h(M) is the harmonic mean of weighted edge recall and node recall
Edge weighting: pair-wise symbol alignments and scaled cosine similarity; scaled cosine similarity
Node weighting
Based on
Execution Times: α(M) is faster, h(M) is slower
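The two scores can be illustrated with a short Python sketch. It assumes each matched edge or node carries a weight in [0, 1] and that recall is the sum of those weights divided by the number of query edges or nodes. The weighting functions themselves are not reproduced, and the function names are hypothetical.

```python
def weighted_recall(weights, total):
    """Sum of match weights divided by the number of query items."""
    return sum(weights) / total if total else 0.0


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative values (0 when both are 0)."""
    return 2.0 * a * b / (a + b) if (a + b) > 0 else 0.0


def alpha_score(edge_weights, n_query_edges):
    """Weighted edge recall of a match."""
    return weighted_recall(edge_weights, n_query_edges)


def h_score(edge_weights, n_query_edges, node_weights, n_query_nodes):
    """Harmonic mean of weighted edge recall and node recall."""
    edge_recall = weighted_recall(edge_weights, n_query_edges)
    node_recall = weighted_recall(node_weights, n_query_nodes)
    return harmonic_mean(edge_recall, node_recall)


# Illustrative match: 2 of 3 query edges matched (weights 1.0 and 0.5),
# and 2 of 3 query nodes matched with full weight.
alpha_example = alpha_score([1.0, 0.5], 3)
h_example = h_score([1.0, 0.5], 3, [1.0, 1.0], 3)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a match needs both good edge coverage and good node coverage to score well under h.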
[1] R. Zanibbi, K. Davila, A. Kane, & F. Tompa, “Multi-stage math formula search: Using appearance-based similarity metrics at scale,” SIGIR, 2016
Retrieval System
[System diagram: Data, Indexing Pipeline, Spatial Index, Temporal Index, Query, Retrieval Pipeline, Search Results, Navigation Pipeline]
29
Video Navigation
[System diagram: Data, Indexing Pipeline, Spatial Index, Temporal Index, Query, Retrieval Pipeline, Search Results, Navigation Pipeline]
30
Check our demo at: https://youtu.be/gn24qo1MLN0
31
AccessMath Dataset
A total of 20 evaluation queries were chosen with rejection sampling
A total of 4 combinations of Query-vs-Index modalities
For a given query, the target is to find a math expression that contains the whole query graph
32
Two metrics are considered
RR = 1/r if 1 ≤ r ≤ 10, 0 otherwise
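The formula on this slide reads as a reciprocal rank with a cutoff at rank 10. A small Python sketch under that reading is shown below; the function names and the use of `None` for a target that was never retrieved are assumptions.

```python
def reciprocal_rank(rank, cutoff=10):
    """Return 1/rank when the target is within the cutoff, otherwise 0."""
    if rank is not None and 1 <= rank <= cutoff:
        return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranks, cutoff=10):
    """Average the reciprocal rank over all queries."""
    if not ranks:
        return 0.0
    return sum(reciprocal_rank(r, cutoff) for r in ranks) / len(ranks)


# Illustrative ranks for four queries: found at 1 and 2, not found, found at 11.
mrr_example = mean_reciprocal_rank([1, 2, None, 11])
```

A target found at rank 11 contributes the same as one never found, so the measure rewards only results a user would see near the top of the list.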
33
Columns α, α∧, α∧s: Weighted Edge Recall α. Columns h, h∧, h∧s: Harmonic Mean h.
Query       Index       α     α∧    α∧s   h     h∧    h∧s
LaTeX       LaTeX       1.00  1.00  1.00  1.00  1.00  1.00
LaTeX       Whiteboard  0.95  1.00  1.00  1.00  1.00  1.00
Whiteboard  Whiteboard  0.95  0.95  0.90  0.95  1.00  0.95
Whiteboard  LaTeX       0.80  0.85  0.85  0.90  0.90  0.90
34
Columns α, α∧, α∧s: Weighted Edge Recall α. Columns h, h∧, h∧s: Harmonic Mean h.
Query       Index       α     α∧    α∧s   h     h∧    h∧s
LaTeX       LaTeX       0.98  1.00  1.00  0.98  1.00  1.00
LaTeX       Whiteboard  0.93  1.00  1.00  1.00  1.00  1.00
Whiteboard  Whiteboard  0.66  0.69  0.71  0.89  0.84  0.86
Whiteboard  LaTeX       0.63  0.71  0.74  0.74  0.78  0.84
35
Tangent-V is effective for search between Typeset and Handwriting
Tangent-V can also be used to create navigational tools
New symbol recognizers can be used for indexing of new domains
Future work:
36
This material is based upon work supported by the National Science Foundation (USA) under Grants No. IIS-1016815 and HCC-1218801. We also thank Anurag Agarwal for helping in the creation of the lecture videos used to evaluate our system.
Source code: www.cs.rit.edu/~dprl/Software.html
37