Visual Search Engine for Handwritten and Typeset Math in Lecture - - PowerPoint PPT Presentation

SLIDE 1

Visual Search Engine for Handwritten and Typeset Math in Lecture Videos and LaTeX Notes

Kenny Davila and Richard Zanibbi August 6, 2018

Center for Unified Biometrics and Sensors

SLIDE 2

Select

SLIDE 3

Select

SLIDE 4

Select

Search

SLIDE 5

SEARCH RESULTS Found in Lecture Videos

  • 1. Linear Algebra – Lecture 06
  • 2. Linear Algebra – Lecture 08
  • 3. Linear Algebra – Lecture 10

… Related Topics

  • 1. Systems of Equations
  • 2. Matrix Reduction
  • 3. Linear Algebra

SLIDE 6

What about other mathematical expressions? Could I write my queries instead of using images?

SLIDE 7

What about other mathematical expressions? Could I write my queries instead of using images?

Yes, using

SLIDE 8

Potential Search Modes

[Figure: diagram of potential search modes linking lecture notes, lecture video, and whiteboard content]

SLIDE 9

Tangent-V Visual Search Engine

  • Applied to indexing and retrieval of formulae from lecture materials
  • Based on matching symbol pairs from Line of Sight (LOS) graphs
  • Domain knowledge is given by a recognition module (currently: mathematical symbol recognition)

Source code released: https://cs.rit.edu/~dprl/Software.html

SLIDE 10

Related Work

Related fields:

  • Content-Based Image Retrieval [1]
  • Word Spotting [2]
  • Mathematical Information Retrieval [3]
  • Formula Representation: Semantic vs Appearance
  • Retrieval Modality: Symbol vs Image-based
  • Tangent-V generalizes the Tangent-S formula retrieval model [4]

[1] J. Sivic & A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in ICCV 2003
[2] S. Sudholt & G. A. Fink, “PHOCNet: A deep convolutional neural network for word spotting in handwritten documents,” in ICFHR 2016
[3] R. Zanibbi & D. Blostein, “Recognition and retrieval of mathematical expressions,” IJDAR, vol. 15, no. 4, 2012
[4] K. Davila & R. Zanibbi, “Layout and semantics: Combining representations for mathematical formula search,” in SIGIR 2017

SLIDE 11

Tangent-V Overview

[Figure: Tangent-V architecture composed of an Indexing Pipeline, a Navigation Pipeline, and a Retrieval Pipeline]

SLIDE 12

Supplementary Lecture Notes (LaTeX)

Input: Lecture Notes → Binary Images → Output: Math Expressions

SLIDE 13

Preprocessing: Lecture Video Summarization [1]

Input: Lecture Video (MTS/MP4) → Content Extraction → Temporal Segmentation → Spatio-temporal Analysis → Output: Whiteboard Contents Keyframes (Binary Images) and Temporal Index

[1] K. Davila & R. Zanibbi, “Whiteboard content summarization via spatio-temporal conflict minimization in lecture videos,” in ICDAR 2017

SLIDE 14

Lecture Video Navigation from Keyframes

SLIDE 15

Indexing Pipeline (Overview)

Raw Data → Pre-processing → Binary Images
Pre-processing also produces a Temporal Index (videos only)

Pre-processing: AccessMath Lecture Video Summarization [1]

[1] K. Davila & R. Zanibbi, “Whiteboard content summarization via spatio-temporal conflict minimization in lecture videos,” in ICDAR 2017

SLIDE 16

Indexing Pipeline (Overview)

Raw Data → Pre-processing → Binary Images → LOS Graph Construction → Spatial Index Construction → Spatial Index
Pre-processing also produces a Temporal Index (videos only)

Pre-processing: AccessMath Lecture Video Summarization [1]; LOS graph and spatial index construction: Tangent-V

[1] K. Davila & R. Zanibbi, “Whiteboard content summarization via spatio-temporal conflict minimization in lecture videos,” in ICDAR 2017

SLIDE 17

Line of Sight (LOS) Graphs

Uses connected components (CCs) as nodes. Two nodes are connected if:

  • One can see the other (the line of sight between them is not blocked by another CC)
  • For whiteboard content, they are within a maximum distance of 2 times the median CC size
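The connectivity rule above can be approximated in a few lines of Python. This is a simplified, hypothetical sketch, not Tangent-V's LOS implementation: each CC is reduced to its axis-aligned bounding box, two CCs "see" each other when the segment between their box centers crosses no other CC's box, CC size is assumed to be the larger box side, and distance is measured between centers. `Box`, `segment_hits_box`, and `los_edges` are illustrative names.

```python
from dataclasses import dataclass
from math import hypot
from statistics import median


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (simplified CC model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def size(self):
        # Assumption: CC size is the larger side of its bounding box.
        return max(self.x1 - self.x0, self.y1 - self.y0)


def segment_hits_box(p, q, box):
    """Liang-Barsky clipping: True if segment p->q touches the box."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    t0, t1 = 0.0, 1.0
    for pk, qk in ((-dx, px - box.x0), (dx, box.x1 - px),
                   (-dy, py - box.y0), (dy, box.y1 - py)):
        if pk == 0:
            if qk < 0:
                return False  # parallel to this side and outside the box
        else:
            t = qk / pk
            if pk < 0:
                t0 = max(t0, t)  # segment enters the slab
            else:
                t1 = min(t1, t)  # segment leaves the slab
            if t0 > t1:
                return False
    return True


def los_edges(boxes, max_factor=None):
    """Undirected edges (i, j) between mutually visible CCs.

    If max_factor is given (e.g. 2.0 for whiteboard content), pairs whose
    center distance exceeds max_factor * median CC size are skipped.
    """
    if not boxes:
        return []
    limit = (max_factor * median(b.size for b in boxes)
             if max_factor is not None else None)
    edges = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i].center, boxes[j].center
            if limit is not None and hypot(b[0] - a[0], b[1] - a[1]) > limit:
                continue  # too far apart
            blocked = any(segment_hits_box(a, b, boxes[k])
                          for k in range(len(boxes)) if k not in (i, j))
            if not blocked:
                edges.append((i, j))
    return edges
```

For three equally sized symbols in a row, the outer two are hidden from each other by the middle one, so `los_edges` returns only `[(0, 1), (1, 2)]`.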

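SLIDE 18

Line of Sight (LOS) Graphs

True node labels and relationships are unknown

  • After symbol recognition, each node has its top k labels with probabilities; the kept labels $\Omega$ of node $s_i$ must cover at least 80% of the probability mass, with at most 10 labels per node:

$$\sum_{\omega \in \Omega} p(\omega \mid s_i) \ge 80\%, \qquad k \le 10$$

  • Edges have 3D unit vectors indicating direction

[Figure: example LOS graphs for handwritten "2x", "2x", and "x2", with edges labeled by 3D unit direction vectors such as (0.707, 0.707, 0.000), (1.000, 0.000, 0.000), (-0.707, -0.707, 0.000), and (0.146, -0.146, 0.978)]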
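One way to truncate a node's recognizer output to satisfy the 80% mass and k ≤ 10 constraints is sketched below, under an assumption the slide does not state: labels are taken in order of decreasing probability and the shortest prefix reaching 80% is kept, capped at 10 labels. `select_labels` and its dict input format are hypothetical.

```python
def select_labels(probs, min_mass=0.8, max_k=10):
    """Return [(label, p), ...] sorted by p, stopping once the kept
    probability mass reaches min_mass or max_k labels are kept."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for label, p in ranked[:max_k]:
        kept.append((label, p))
        mass += p
        if mass >= min_mass:
            break  # enough probability mass covered
    return kept


print(select_labels({"8": 0.6, "&": 0.3, "B": 0.1}))  # [('8', 0.6), ('&', 0.3)]
```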

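SLIDE 19

Spatial Indexing using Symbol Pairs

Top-k labels per node: $\Omega$

Tuples generated for each pair of connected nodes $(s_1, s_2)$: $\Omega_1 \times \Omega_2$, each of the form $(\omega_1, \omega_2, p_1, p_2, \mathbf{c}, sp)$, where:

  • $p_i = p(\omega_i \mid s_i)$
  • $\mathbf{c}$: 3D unit vector from $s_1$ to $s_2$
  • $sp$: size ratio between $s_1$ and $s_2$

Example: $s_1 = x$, $s_2 = 8$, $\Omega_1 = \{(x, 0.8), (X, 0.2)\}$, $\Omega_2 = \{(8, 0.6), (\&, 0.3)\}$, $\mathbf{c} = (0.71, -0.71, 0.00)$, $sp = 1.26$

Inverted index for symbol pairs:

  • Entries: pairs of symbol labels $(\omega_1, \omega_2)$
  • Posting lists: pair locations in images, with $(ID, p_1, p_2, \mathbf{c}, sp, c_1, c_2)$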
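The tuple generation and inverted index can be illustrated with plain Python containers. This is a minimal sketch, not the released Tangent-V code: `pair_tuples`, `build_index`, and the edge record layout are hypothetical, the size ratio is passed in precomputed, and integer node IDs stand in for the pair locations in the image.

```python
from collections import defaultdict
from itertools import product


def pair_tuples(labels1, labels2, direction, size_ratio):
    """All (w1, w2, p1, p2, direction, size_ratio) tuples for one edge,
    i.e. one tuple per label combination in Omega_1 x Omega_2."""
    return [(w1, w2, p1, p2, direction, size_ratio)
            for (w1, p1), (w2, p2) in product(labels1, labels2)]


def build_index(edges):
    """Map each label pair (w1, w2) to its posting list.

    Each edge record is (image_id, node1, node2, labels1, labels2,
    direction, size_ratio); postings are
    (image_id, p1, p2, direction, size_ratio, node1, node2).
    """
    index = defaultdict(list)
    for image_id, n1, n2, labels1, labels2, direction, size_ratio in edges:
        for w1, w2, p1, p2, d, sr in pair_tuples(labels1, labels2,
                                                 direction, size_ratio):
            index[(w1, w2)].append((image_id, p1, p2, d, sr, n1, n2))
    return index


# Values from the slide's example: node 0 looks like x/X, node 1 like 8/&
edge = ("img1", 0, 1, [("x", 0.8), ("X", 0.2)], [("8", 0.6), ("&", 0.3)],
        (0.71, -0.71, 0.00), 1.26)
index = build_index([edge])
print(sorted(index))  # [('X', '&'), ('X', '8'), ('x', '&'), ('x', '8')]
```

Because both nodes keep two candidate labels, the single edge yields 2 × 2 = 4 index entries; a query written with either reading of an ambiguous symbol can still find it.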

SLIDE 20

Tangent-V Overview

[Figure: Tangent-V overview highlighting the indexing of videos/notes data: the Indexing Pipeline builds the Spatial Index and the Temporal Index; the Navigation and Retrieval Pipelines are also shown]

SLIDE 21

Tangent-V Retrieval Model

Query Image → Pre-processing → Query Graph → Layer 1: Initial Lookup (Spatial Index) → Layer 2: Structural Alignment → Search Results

SLIDE 22

Layer 1: Initial Lookup

Query symbol pairs are used to find matches in their corresponding entries of the inverted index. A match between index symbol pair $P_c = (c_1, c_2)$ and query pair $P_q = (q_1, q_2)$ is accepted as valid if and only if:

1 - They are spatially consistent, i.e., the 3D unit direction vectors $\mathbf{c}$ and $\mathbf{q}$ of the two pairs differ by at most 45°:

$$\mathbf{c} \cdot \mathbf{q} \ge \cos 45^\circ$$

2 - Optionally, they have consistent size ratios (neither too small nor too large)

Matching pair scores are then aggregated by unique graph pair IDs
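Layer 1 can be sketched as a lookup over an inverted index whose postings are `(image_id, p1, p2, direction, size_ratio, node1, node2)`. Several details are assumptions not stated on the slide: each accepted match contributes `p1 * p2` to its image's score, the optional size check accepts index/query size ratios within a multiplicative `size_tolerance`, and scores are aggregated per image ID. `initial_lookup` is a hypothetical name.

```python
import math
from collections import defaultdict

COS_45 = math.cos(math.radians(45))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def initial_lookup(index, query_pairs, size_tolerance=None):
    """Match query pairs against the inverted index; sum scores per image.

    query_pairs: list of (w1, w2, q, q_size_ratio), q a 3D unit vector.
    """
    scores = defaultdict(float)
    for w1, w2, q, q_sr in query_pairs:
        for image_id, p1, p2, c, c_sr, _n1, _n2 in index.get((w1, w2), []):
            if dot(c, q) < COS_45:
                continue  # spatially inconsistent: directions differ by > 45 degrees
            if size_tolerance is not None:
                ratio = c_sr / q_sr
                if not (1 / size_tolerance <= ratio <= size_tolerance):
                    continue  # size ratios too different
            scores[image_id] += p1 * p2  # assumed per-match score
    return dict(scores)
```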

SLIDE 23

Layer 2: Structural Alignment

Matching Pairs → Matching Subgraphs

SLIDE 24

Layer 2: Structural Alignment

Matching Pairs → Matching Subgraphs

Greedy Match Growing

Query: X + Y
Match 1 (Score = 0.7) + Match 2 (Score = 0.5) = New Match (Score = 1.2)
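Greedy match growing can be sketched by treating each match as a mapping from query nodes to index nodes within one index image, plus a score. Assumptions: two matches grow together only when they share a query node and their mappings agree and stay one-to-one, and the grown score is the sum of both scores, mirroring the 0.7 + 0.5 = 1.2 example. `compatible` and `grow_matches` are hypothetical helpers.

```python
def compatible(m1, m2):
    """True if two query->index node mappings agree and stay one-to-one."""
    for q, c in m1.items():
        if q in m2 and m2[q] != c:
            return False  # same query node mapped to different index nodes
    owner = {c: q for q, c in m1.items()}
    for q, c in m2.items():
        if c in owner and owner[c] != q:
            return False  # same index node claimed by different query nodes
    return True


def grow_matches(matches):
    """Greedily merge overlapping, compatible matches (mapping, score).

    Matches are visited from highest to lowest score; each is merged into
    the first grown match that shares a query node and is compatible,
    otherwise it starts a new grown match. Merged scores are summed.
    """
    grown = []
    for mapping, score in sorted(matches, key=lambda m: m[1], reverse=True):
        for i, (g_map, g_score) in enumerate(grown):
            if set(mapping) & set(g_map) and compatible(g_map, mapping):
                grown[i] = ({**g_map, **mapping}, g_score + score)
                break
        else:
            grown.append((dict(mapping), score))
    return grown


m1 = ({"X": "c1", "+": "c2"}, 0.7)  # query "X + Y"; c1..c3 are index nodes
m2 = ({"+": "c2", "Y": "c3"}, 0.5)
print(grow_matches([m1, m2]))  # one match covering X, +, Y with score 1.2
```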

SLIDE 25

Layer 2: Structural Alignment

Matching Pairs → Matching Subgraphs

Greedy Match Growing → Greedy Match Connection

[Figure: handwritten expressions "X + Y" and "X + 1 =": Match 1 (Score = 0.5) + Match 2 (Score = 0.4) = New Match (Score = 0.9)]

SLIDE 26

Layer 2: Structural Alignment

Matching Pairs → Matching Subgraphs

Greedy Match Growing → Greedy Match Connection → Incompatible Match Removal

[Figure: a query and two overlapping candidate matches on expressions containing "X + X + 1" and "2": Match 1 (Score = 5.0) is accepted; Match 2 (Score = 0.5) is removed]
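Incompatible match removal can be sketched greedily: accept matches in decreasing score order and drop any match that claims an index node already used by an accepted match. Using shared index nodes as the incompatibility criterion is an assumption; `remove_incompatible` is a hypothetical helper operating on the same (mapping, score) representation as a match-growing step.

```python
def remove_incompatible(matches):
    """Keep matches (mapping, score) greedily by descending score,
    dropping any match that reuses an index node already claimed."""
    accepted, claimed = [], set()
    for mapping, score in sorted(matches, key=lambda m: m[1], reverse=True):
        nodes = set(mapping.values())
        if nodes & claimed:
            continue  # overlaps a higher-scoring accepted match
        accepted.append((mapping, score))
        claimed |= nodes
    return accepted


match1 = ({"X": "a", "+": "b"}, 5.0)
match2 = ({"X": "b"}, 0.5)  # reuses index node "b"
print(remove_incompatible([match1, match2]))  # only match1 survives
```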

SLIDE 27

Layer 2: Structural Alignment

Matching Pairs → Matching Subgraphs

Greedy Match Growing → Greedy Match Connection → Incompatible Match Removal → Match Grouping

[Figure: a query matched in Lecture 01, keyframe (KF) #5 and in Lecture 01, KF #6; both are the same match and are grouped together]

SLIDE 28

Match Scoring and Ranking

We introduce two scoring schemes: α and h

  • α: weighted edge recall. Edge weighting uses pair-wise symbol alignments and scaled cosine similarity. Faster execution.
  • h: harmonic mean of weighted edge recall and node recall. Edge weighting uses scaled cosine similarity; node weighting uses individual symbol alignments. Based on Maximum Subtree Similarity (MSS) [1]. Slower execution.

[1] R. Zanibbi, K. Davila, A. Kane, & F. Tompa, “Multi-stage math formula search: Using appearance-based similarity metrics at scale,” in SIGIR 2016
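The two scores can be sketched as recall-style ratios. This assumes the per-edge and per-node weights in [0, 1] have already been computed (on the slide, from symbol alignments and scaled cosine similarity, which are not reproduced here); normalizing by the number of query edges and nodes, and the function names, are hypothetical.

```python
def weighted_edge_recall(edge_weights, n_query_edges):
    """alpha: summed weights of matched query edges over all query edges."""
    return sum(edge_weights) / n_query_edges if n_query_edges else 0.0


def node_recall(node_weights, n_query_nodes):
    """Summed weights of matched query nodes over all query nodes."""
    return sum(node_weights) / n_query_nodes if n_query_nodes else 0.0


def harmonic_score(edge_weights, n_query_edges, node_weights, n_query_nodes):
    """h: harmonic mean of weighted edge recall and node recall."""
    a = weighted_edge_recall(edge_weights, n_query_edges)
    n = node_recall(node_weights, n_query_nodes)
    return 2 * a * n / (a + n) if a + n > 0 else 0.0


# Query with 2 edges and 3 nodes; one edge and one node only partially matched.
print(weighted_edge_recall([1.0, 0.5], 2))  # 0.75
print(round(harmonic_score([1.0, 0.5], 2, [1.0, 1.0, 0.5], 3), 4))  # 0.7895
```

The extra node term is one reason h costs more to compute than α: it needs per-symbol alignment weights in addition to the edge weights.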

SLIDE 29

Tangent-V Overview

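[Figure: Tangent-V overview highlighting the retrieval system: a query is processed by the Retrieval Pipeline against the Spatial Index to produce search results; the Indexing and Navigation Pipelines and the Temporal Index are also shown]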

SLIDE 30

Tangent-V Overview

[Figure: Tangent-V overview highlighting video navigation from search results, with the Indexing, Navigation, and Retrieval Pipelines and the Spatial and Temporal Indices]

SLIDE 31

Lecture Video Navigation from Search Results

Check our demo at: https://youtu.be/gn24qo1MLN0

SLIDE 32

Experiments

AccessMath Dataset

  • 13 Lecture videos with supplementary notes

A total of 20 evaluation queries were chosen with rejection sampling

A total of 4 combinations of query-vs-index modalities:

  • Handwritten expressions
  • Typeset expressions

For a given query, the target is to find a math expression that contains the whole query graph

  • the query is the same expression
  • the query is a sub-expression

SLIDE 33

Evaluation Metrics

Two metrics are considered

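  • Recall @ 10: target found at rank ≤ 10
  • MRR @ 10: mean of the Reciprocal Rank (RR), with

$$RR = \begin{cases} \dfrac{1}{r} & \text{if } 1 \le r \le 10 \\ 0 & \text{otherwise} \end{cases}$$

where $r$ is the rank of the target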
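Both metrics follow directly from the rank of each query's target. A small sketch, where a rank of `None` (an assumed convention) means the target was not returned at all:

```python
def reciprocal_rank(rank, cutoff=10):
    """RR = 1/r if 1 <= r <= cutoff, else 0; rank None = target not found."""
    if rank is not None and 1 <= rank <= cutoff:
        return 1.0 / rank
    return 0.0


def recall_at(ranks, cutoff=10):
    """Fraction of queries whose target appears at rank <= cutoff."""
    return sum(1 for r in ranks if r is not None and 1 <= r <= cutoff) / len(ranks)


def mrr_at(ranks, cutoff=10):
    """Mean reciprocal rank with the cutoff applied."""
    return sum(reciprocal_rank(r, cutoff) for r in ranks) / len(ranks)


ranks = [1, 2, None, 12]  # hypothetical target ranks for four queries
print(recall_at(ranks), mrr_at(ranks))  # 0.5 0.375
```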

SLIDE 34

Results: Recall @ 10

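Columns: weighted edge recall ($\alpha$, $\hat{\alpha}$, $\hat{\alpha}_s$) | harmonic mean ($h$, $\hat{h}$, $\hat{h}_s$). Rows: query/index modality.

  • LaTeX: 1.00, 1.00, 1.00 | 1.00, 1.00, 1.00
  • Whiteboard: 0.95, 1.00, 1.00 | 1.00, 1.00, 1.00
  • Whiteboard: 0.95, 0.95, 0.90 | 0.95, 1.00, 0.95
  • Whiteboard / LaTeX: 0.80, 0.85, 0.85 | 0.90, 0.90, 0.90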

SLIDE 35

Results: MRR @ 10

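Columns: weighted edge recall ($\alpha$, $\hat{\alpha}$, $\hat{\alpha}_s$) | harmonic mean ($h$, $\hat{h}$, $\hat{h}_s$). Rows: query/index modality.

  • LaTeX: 0.98, 1.00, 1.00 | 0.98, 1.00, 1.00
  • Whiteboard: 0.93, 1.00, 1.00 | 1.00, 1.00, 1.00
  • Whiteboard: 0.66, 0.69, 0.71 | 0.89, 0.84, 0.86
  • Whiteboard / LaTeX: 0.63, 0.71, 0.74 | 0.74, 0.78, 0.84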

SLIDE 36

Conclusions

Tangent-V is effective for search between Typeset and Handwriting

  • Multiple labels help find targets when recognition accuracy is low

Tangent-V can also be used to create navigational tools

New symbol recognizers can be used to index new domains

  • Code is released for others to try on new domains (http://cs.rit.edu/~dprl/Software.html)

Future work:

  • Test unsupervised symbol classification
  • Explore Vector formats
  • Speed-up search

SLIDE 37

Thank You!

This material is based upon work supported by the National Science Foundation (USA) under Grants No. IIS-1016815 and HCC-1218801. We also thank Anurag Agarwal for helping in the creation of the lecture videos used to evaluate our system. Source code: www.cs.rit.edu/~dprl/Software.html
