Logical Structure Analysis of Scientific Publications in Mathematics
Valery Solovyev, Nikita Zhiltsov
Kazan (Volga Region) Federal University, Russia
Overview
◮ LOD Cloud has been growing at 200-300% per year since 2007
◮ Prevalent domains: government (43%),
◮ However, it lacks data sets related to
∗ C. Bizer et al. State of the Web of Data.
1 Background 2 Proposed Semantic Model 3 Analysis Methods 4 Experiments and Evaluation 5 Prototype
◮ Well-structured documents
◮ The presence of mathematical formulae
◮ Peculiar vocabulary (“mathematical
◮ Specification of the document logical structure
◮ Methods for extracting structural elements
◮ A large corpus of semantically annotated papers
◮ Semantic search of mathematical papers
◮ LaTeX
◮ Sections:
◮ LaTeX
◮ Three ontologies:
◮ Languages for formalized mathematics
◮ Semiformal math languages
◮ Presentation/authoring formats
A
◮ arXMLiv format
◮ Present work
2 Proposed Semantic Model
◮ It is an ontology that captures the structural layout
◮ The segment represents the finest level of
◮ Select most frequent segments from sample
◮ Consider synonyms as one concept (e.g. conjecture
◮ Select basic semantic relations between segments
◮ Integration with SALT Document Ontology classes:
3 Analysis Methods
◮ The ontology specifies a controlled vocabulary to
◮ T
1 Elicit a LaTeX
2 Associate it with a string that may be
3 Filter out standard formatting environments (e.g.
4 Compute string similarity between a string and
5 Check if the found most similar concept is
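The five steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the concept labels, the stop list of formatting environments, the Dice coefficient over character bigrams, and the 0.6 threshold are all assumptions made for the example, and the environment name itself stands in for the associated string.

```python
import re

# Hypothetical concept labels; the full ontology vocabulary is not given on the slide.
CONCEPT_LABELS = ["theorem", "lemma", "proof", "corollary", "definition",
                  "proposition", "remark", "example", "claim", "conjecture"]

# Hypothetical stop list of standard formatting environments (step 3).
FORMATTING_ENVS = {"center", "itemize", "enumerate", "figure", "table", "tabular"}


def environments(tex):
    """Step 1: collect the names of all LaTeX environments opened in the source."""
    return re.findall(r"\\begin\{([A-Za-z*]+)\}", tex)


def qgrams(text, q=2):
    """Return the set of padded character q-grams of a lowercased string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def similarity(a, b, q=2):
    """Dice coefficient over q-gram sets, a value in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def recognize(env_name, threshold=0.6):
    """Steps 2-5: map an environment name to a concept label, or None."""
    if env_name.lower() in FORMATTING_ENVS:
        return None  # step 3: skip formatting environments
    # Step 4: find the most similar concept label.
    best = max(CONCEPT_LABELS, key=lambda label: similarity(env_name, label))
    # Step 5: accept the match only if it is similar enough (assumed threshold).
    return best if similarity(env_name, best) >= threshold else None


if __name__ == "__main__":
    source = r"\begin{theorems} ... \end{theorems} \begin{center} ... \end{center}"
    for name in environments(source):
        print(name, "->", recognize(name))
```

A set-based Dice coefficient is only one of several q-gram measures; any normalized string similarity could be substituted in `similarity` without changing the rest of the sketch.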
◮ “By applying Lemma 1, we obtain ...” (dependsOn)
◮ “Theorem 2 provides an explicit algorithm ...”
1 Given a segment S; split its text into sentences,
2 Referential sentences are ones that contain the \ref
3 For each sentence:
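The first two steps above (sentence splitting and detection of referential sentences) might look like the sketch below. The sentence splitter is a deliberately naive assumption (split after `.`, `!` or `?` when followed by whitespace and a capital letter); real mathematical text with inline formulae would need a more robust tokenizer. The function and pattern names are hypothetical.

```python
import re

# A \ref{label} command; \eqref and similar commands are intentionally not matched.
REF_PATTERN = re.compile(r"\\ref\{([^}]+)\}")

# Naive sentence boundary: end punctuation, whitespace, then a capital letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def referential_sentences(segment_text):
    """Return (sentence, referenced labels) pairs for sentences containing \\ref."""
    result = []
    for sentence in SENTENCE_BOUNDARY.split(segment_text.strip()):
        labels = REF_PATTERN.findall(sentence)
        if labels:
            result.append((sentence, labels))
    return result


if __name__ == "__main__":
    text = (r"By applying Lemma~\ref{lem:1}, we obtain the bound. "
            r"This is trivial. "
            r"Theorem~\ref{thm:2} provides an explicit algorithm.")
    for sentence, labels in referential_sentences(text):
        print(labels, "<-", sentence)
```

Each returned pair gives a candidate relation between the current segment and the segment carrying the referenced label; how features are then derived per sentence is not recoverable from the slide and is left out here.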
◮ Train a learning model using these features and a
◮ Apply the model to classify new induced relations
1 Seek a segment of one of these types
2 Find its predecessor segments
3 Filter out segments of inappropriate types
4 Return the closest predecessor
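The baseline above amounts to a backward scan over the segments in document order. The sketch below illustrates it under assumptions: the `Segment` class and the `ALLOWED_TARGETS` table (which source types are linked to which target types) are hypothetical, since the slide does not list the types involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    id: str
    type: str


# Hypothetical table: for each source segment type, the target types it may attach to.
ALLOWED_TARGETS = {
    "proof": {"theorem", "lemma", "proposition", "corollary", "claim"},
    "corollary": {"theorem", "lemma", "proposition"},
    "example": {"definition", "theorem", "lemma", "proposition"},
}


def closest_predecessor(segments: List[Segment], index: int) -> Optional[Segment]:
    """Return the nearest preceding segment of an appropriate type, or None."""
    allowed = ALLOWED_TARGETS.get(segments[index].type)
    if allowed is None:
        return None  # step 1: the segment is not of a type we link
    # Steps 2-4: walk the predecessors from nearest to farthest,
    # skipping inappropriate types, and return the first acceptable one.
    for candidate in reversed(segments[:index]):
        if candidate.type in allowed:
            return candidate
    return None


if __name__ == "__main__":
    doc = [Segment("s1", "definition"), Segment("s2", "theorem"),
           Segment("s3", "remark"), Segment("s4", "proof"),
           Segment("s5", "corollary")]
    print(closest_predecessor(doc, 3))  # the proof attaches to theorem s2
```

Because only segments inside the same document are scanned, a corollary or example that refers to a result from another paper cannot be linked correctly by this kind of baseline.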
27 / 44
4 Experiments and Evaluation
◮ 1355 papers of the “Izvestiya Vysshikh Uchebnykh
◮ A sample of 1031 papers from arXiv.org
◮ LaTeX
◮ GATE framework
◮ Weka
◮ Jena
◮ Evaluation on the arXiv sample only
◮ A q-gram string matching algorithm was used
◮ The threshold value was optimized w.r.t. F1-score
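Optimizing a similarity threshold with respect to F1 can be done with a simple sweep over candidate values. The sketch below assumes each evaluated item is a triple of (similarity score, best-matching label, gold label), with `None` as the gold label when the item has no segment type; the data in the usage example are toy values for illustration only, not results from the evaluation.

```python
def f1_at(samples, threshold):
    """F1 of the predictions obtained by accepting matches scoring >= threshold."""
    tp = fp = fn = 0
    for score, best_label, gold in samples:
        predicted = best_label if score >= threshold else None
        if predicted is not None and predicted == gold:
            tp += 1
        else:
            if predicted is not None:
                fp += 1  # a label was emitted, but it is wrong
            if gold is not None:
                fn += 1  # a true label was missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(samples, candidates):
    """Return (threshold, F1) for the candidate threshold with the highest F1."""
    return max(((t, f1_at(samples, t)) for t in candidates), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [(0.95, "theorem", "theorem"), (0.80, "lemma", "lemma"),
           (0.55, "proof", "proof"), (0.50, "claim", None),
           (0.30, "example", None)]
    print(best_threshold(toy, [0.2, 0.4, 0.6, 0.9]))
```

A low threshold accepts spurious matches and hurts precision, while a high one rejects true matches and hurts recall; the sweep picks the balance point on the labeled sample.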
◮ Evaluation on both entire collections (“Izvestiya”
◮ Equations are the most ubiquitous segments (52% and
◮ The ontology covers types of 91.9% and 91.6% of
[Figure: bar chart of the percentage of segment occurrences (axis 0% to 30%) by segment type (Theorem, Proof, Lemma, Remark, Corollary, Definition, Proposition, Example, Claim, Conjecture) for the Izvestiya and arXiv collections]
◮ A paper contains 51.4 (Izvestiya) and 53.9 (arXiv)
◮ 243 referential sentences were randomly selected
◮ 95% were true navigational relations
◮ A decision tree learner (C4.5) was trained
◮ The results were obtained by 10-fold cross-validation
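For readers unfamiliar with the protocol, k-fold cross-validation can be sketched as below. This is a dependency-free illustration under assumptions: a majority-class classifier stands in for the C4.5 learner, and the function names, the fixed random seed, and the toy labels are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(features, labels):
    """Stand-in learner (NOT C4.5): always predicts the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority


def cross_validate(features, labels, train, k=10, seed=0):
    """Average accuracy when each fold is held out once and the rest is used for training."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(1 for i in held_out if model(features[i]) == labels[i])
    return correct / len(labels)


if __name__ == "__main__":
    # Toy data for illustration only.
    xs = list(range(40))
    ys = ["navigational"] * 30 + ["other"] * 10
    print(cross_validate(xs, ys, train_majority))
```

Every instance is used for testing exactly once, so the reported score does not depend on a single arbitrary train/test split.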
◮ Evaluation on the arXiv sample only
◮ 10% of the documents which contain certain
◮ For each such segment, the corresponding relations
◮ Known issues: imported corollaries and examples
◮ The ontology covers the largest part of the logical
◮ The task of segment type recognition has been
◮ The method for recognizing navigational relations
◮ The baseline method for recognizing restricted
5 Prototype
◮ demonstrates our ongoing research on
◮ incorporates the logical structure analysis
◮ is integrated with the arXiv API
◮ enables enhanced search for arXiv papers
◮ publishes the semantic index as Linked
◮ The proposed approach aims to analyze the
◮ Our ontology provides a controlled
◮ The methods elicit document segments in
◮ The extracted semantic graph can be used