Accessibility Issues in Digital Mathematical Libraries

  1. Accessibility Issues in Digital Mathematical Libraries
     Petr Sojka, Michal Růžička, Maroš Kucbel, and Martin Jarmar
     Masaryk University, Faculty of Informatics, Brno, Czech Republic
     <sojka@fi.muni.cz>, {mruzicka, kocka, 172981}@mail.muni.cz
     Universal Learning Design, Brno, 13th February 2013

  2. Outline
     1 Introduction
     2 PDF Processing
     3 MathML Processing
     4 Summary

  3. Introduction
     • Digital mathematics libraries are on the rise, for example:
        • the European Digital Mathematics Library (EuDML, <https://eudml.org/>);
        • the Czech Digital Mathematics Library (DML-CZ, <http://dml.cz/>).
     • They serve not only metadata but also full texts with mathematical formulae, as:
        • PDF;
        • MathML;
        • *TeX.

  4. PDF, TeX/LaTeX, MathML
     • Thanks to pdfTeX, PDF is the de facto standard output format of modern TeX distributions.
     • LaTeX mathematical notation is well known and effective.
     • It is used not only in LaTeX documents but also in a variety of other projects, such as Wikipedia.
     • LaTeX source code is usually a good choice for the plain-text representation of mathematical expressions.
     • MathML is often used as a machine- and human-readable language for describing mathematical notation.
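To make the contrast between the two notations concrete, the following Python sketch (standard library only) builds presentation MathML for the term $\frac{1}{2}\pi(x^{1/2})$ from the example formula used later in these slides. The element structure (mfrac, msup, an mrow for the exponent) is one reasonable encoding chosen for illustration, and the helper names `el` and `half_pi_of_sqrt_x` are hypothetical.

```python
from typing import Optional
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def el(tag: str, *children: ET.Element, text: Optional[str] = None) -> ET.Element:
    """Create an element with optional text content and child elements."""
    node = ET.Element(tag)
    if text is not None:
        node.text = text
    node.extend(children)
    return node


def half_pi_of_sqrt_x() -> ET.Element:
    r"""Presentation MathML for the LaTeX input \frac{1}{2}\pi(x^{1/2})."""
    exponent = el("mrow", el("mn", text="1"), el("mo", text="/"), el("mn", text="2"))
    math = el(
        "math",
        el("mfrac", el("mn", text="1"), el("mn", text="2")),  # \frac{1}{2}
        el("mi", text="\u03c0"),  # \pi, Greek small letter pi
        el("mo", text="("),
        el("msup", el("mi", text="x"), exponent),  # x^{1/2}
        el("mo", text=")"),
    )
    math.set("xmlns", MATHML_NS)
    return math
```

Serializing the result with `ET.tostring(half_pi_of_sqrt_x(), encoding="unicode")` gives markup in which the base x and its exponent sit explicitly inside msup, a structure that the LaTeX string expresses only through syntax a tool has to parse first.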

  5. PDF Processing
     1 Introduction
     2 PDF Processing
        PDF Processing
        PDF Enhancement
     3 MathML Processing
     4 Summary

  6. MaxTract
     • A command-line tool that reads a PDF and returns various types of enriched output:
        • LaTeX for use with Tralics;
        • LaTeX for a layered PDF with LaTeX and text layers;
        • LaTeX for an annotated PDF with LaTeX annotations;
        • a simple text file;
        • a text file with math in LaTeX.
     • Under development by the Scientific Document Analysis Group at the School of Computer Science, University of Birmingham, UK.
     • Homepage: <http://www.cs.bham.ac.uk/research/groupings/reasoning/sdag/maxtract.php>

  7. MaxTract (cont.)
     • For successful analysis, the PDF file must use Type 1 fonts exclusively, with embedded encodings.
     • MaxTract is written in OCaml and uses pdftk to decompress PDF files.
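MaxTract's font requirement can be checked roughly before a file is handed to it. The Python sketch below is a heuristic for illustration, not part of MaxTract: it assumes the PDF has already been decompressed (for example with `pdftk in.pdf output out.pdf uncompress`) so that font dictionaries appear as plain text, and it only inspects each font's /Subtype, not whether an encoding is embedded. The function names are hypothetical.

```python
import re
from collections import Counter

# Font dictionary subtypes defined by the PDF specification. The trailing \b
# keeps font-file stream subtypes such as /Type1C from matching /Type1.
FONT_SUBTYPE = re.compile(
    rb"/Subtype\s*/(Type0|Type1|MMType1|Type3|TrueType|CIDFontType0|CIDFontType2)\b"
)


def font_subtypes(pdf_bytes: bytes) -> Counter:
    """Count font dictionary subtypes found in an uncompressed PDF."""
    return Counter(m.group(1).decode("ascii") for m in FONT_SUBTYPE.finditer(pdf_bytes))


def uses_only_type1(pdf_bytes: bytes) -> bool:
    """Heuristic: True if at least one font is found and every font is /Type1."""
    kinds = font_subtypes(pdf_bytes)
    return bool(kinds) and set(kinds) == {"Type1"}
```

Typical use would be `uses_only_type1(open("out.pdf", "rb").read())`. A file whose fonts sit in compressed object streams can yield no matches at all, which the function reports as False rather than as a pass.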

  8. InftyReader OCR
     • Old documents are often available in paper form only.
     • They must be scanned and processed by Optical Character Recognition (OCR) software.
     • InftyReader OCR software has the unique feature of recognizing mathematical expressions in scanned documents.
     • InftyReader is part of the InftyProject (<http://www.inftyproject.org/>), under development by Masakazu Suzuki's research and development group in Japan.

  9. InftyReader OCR (cont.)
     • InftyReader reads and writes various formats:
        input: TIFF, BMP, GIF, PNG, PDF;
        output: LaTeX, XHTML+MathML, various XML formats.
     • The quality and resolution of the scans are crucial.

  10. CopyMath
      • The ActualText property of the PDF language is used to mark the region of a mathematical expression inside the PDF document.
      • We want the package to be as user-friendly as possible: users should not be forced to modify their mathematical expressions in any way, and \usepackage{copymath} should cater for all their needs.
      • The implementation is not easy.
      • It requires nonstandard modifications of the LaTeX mathematical environments.
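The property list that marks a formula can be derived mechanically from its LaTeX source. The Python sketch below only illustrates that encoding step and is not CopyMath's code, which works inside TeX: pure-ASCII source is hex-encoded byte for byte, as in the CopyMath example PDF code later in these slides, and anything else falls back to UTF-16BE with the FE FF byte-order mark, which PDF text strings permit. The function name `actual_text_span` is hypothetical.

```python
from __future__ import annotations


def actual_text_span(latex_source: str) -> tuple[str, str]:
    """Return (begin, end) PDF operators attaching latex_source as ActualText.

    Hypothetical helper for illustration; the begin operator mirrors the
    /Span << /ActualText <...> >> BDC sequence of the CopyMath example.
    """
    try:
        # ASCII is a safe subset of PDFDocEncoding, so the bytes are used as-is.
        raw = latex_source.encode("ascii")
    except UnicodeEncodeError:
        # Otherwise use UTF-16BE prefixed with the byte-order mark FE FF.
        raw = b"\xfe\xff" + latex_source.encode("utf-16-be")
    begin = f"/Span << /ActualText <{raw.hex().upper()}> >> BDC"
    end = "EMC"  # closes the marked-content sequence opened by BDC
    return begin, end
```

For instance, `actual_text_span(r"$\Pi (x) = ")[0]` starts with the same bytes `245C506920287829203D20` as the ActualText value in the CopyMath-enabled example. In a real document the two operators have to be written around the typeset formula, which is what the \pdfliteral calls in the implementation notes are for.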

  11. Standard PDF document
      LaTeX source code:
      Text $\Pi(x) = \pi(x) + \frac{1}{2}\pi(x^{1/2}) + \frac{1}{3}\pi(x^{1/3}) + \cdots$ text.

  12. Standard PDF document
      PDF code:
      BT /F16 9.9626 Tf 148.712 707.125 Td [(T)83(ext)]TJ
      /F17 9.9626 Tf 23.247 0 Td [(\005\050)]TJ
      /F20 9.9626 Tf 11.346 0 Td [(x)]TJ
      /F17 9.9626 Tf 5.694 0 Td [(\051)-278(=)]TJ
      /F20 9.9626 Tf 17.158 0 Td [(\031)]TJ
      /F17 9.9626 Tf 6.036 0 Td [(\050)]TJ
      /F20 9.9626 Tf 3.875 0 Td [(x)]TJ
      /F17 9.9626 Tf 5.694 0 Td [(\051)-222(+)]TJ
      /F18 6.9738 Tf 17.247 3.923 Td [(1)]TJ ET
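Why copy and paste goes wrong becomes clearer once one reads what the text operators actually contain: one-byte glyph codes in the current font, not characters. The Python sketch below extracts (font, bytes) pairs from a content-stream excerpt like the one above. It is a simplified tokenizer under stated assumptions (literal strings without unescaped nested parentheses, no hex strings, only Tf and TJ operators), and the function names are hypothetical. Reading code 0x05 as Π and 0x19 as π further assumes that F17 and F20 are Computer Modern fonts in the OT1 and OML encodings, which the slide does not state.

```python
from __future__ import annotations

import re

# Tf sets the current font; TJ shows an array of strings and kerning numbers.
TOKEN = re.compile(r"/(\w+)\s+[-\d.]+\s+Tf|\[(.*?)\]\s*TJ", re.S)
# A literal string (...) whose parentheses, if any, are backslash-escaped.
LITERAL = re.compile(r"\(((?:\\.|[^\\()])*)\)", re.S)
ESCAPES = {"n": 0x0A, "r": 0x0D, "t": 0x09, "b": 0x08, "f": 0x0C}


def decode_literal(body: str) -> bytes:
    """Decode the escapes of a PDF literal string (octal \\ddd, \\n, \\( ...)."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            out.append(ord(ch) & 0xFF)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in "01234567":
            j = i + 1  # up to three octal digits
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            out.append(int(body[i + 1:j], 8) & 0xFF)
            i = j
        elif nxt == "\n":
            i += 2  # backslash-newline is a line continuation
        else:
            out.append(ESCAPES.get(nxt, ord(nxt) & 0xFF))
            i += 2
    return bytes(out)


def shown_glyph_codes(stream: str) -> list[tuple[str, bytes]]:
    """Return (font resource name, glyph codes) for every string shown by TJ."""
    font = ""
    shown = []
    for m in TOKEN.finditer(stream):
        if m.group(1) is not None:
            font = m.group(1)
        else:
            for lit in LITERAL.finditer(m.group(2)):
                shown.append((font, decode_literal(lit.group(1))))
    return shown
```

On the excerpt, Π arrives as code 0x05 in /F17 and π as code 0x19 in /F20. A PDF reader has to map such codes back to Unicode (typically through a ToUnicode CMap), and the copy-and-paste result that follows is consistent with that mapping failing.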

  13. Standard PDF document
      Text obtained using the Copy & Paste function of a PDF reader:
      Text Π( 𝑦 ) = 𝜌 ( 𝑦 ) + 1 2 𝜌 ( 𝑦 1/2) + 1 3 𝜌 ( 𝑦 1/3) + · · · text.

  14. CopyMath-enabled PDF document
      LaTeX source code:
      Text $\Pi(x) = \pi(x) + \frac{1}{2}\pi(x^{1/2}) + \frac{1}{3}\pi(x^{1/3}) + \cdots$ text.

  15. CopyMath-enabled PDF document
      PDF code:
      BT /F16 9.9626 Tf 148.712 707.125 Td [(T)83(ext)]TJ ET
      1 0 0 1 171.959 707.125 cm
      /Span << /ActualText<245C506920287829203D205C706920287829202B205C66726163207B317D7B32 7D5C70692028785E7B312F327D29202B205C66726163207B317D7B337D5C70692028785E7B31 2F337D29202B205C63646F74732024> >> BDC
      1 0 0 1 -171.959 -707.125 cm
      BT /F17 9.9626 Tf 171.959 707.125 Td [(\005\050)]TJ
      /F20 9.9626 Tf 11.346 0 Td [(x)]TJ
      /F17 9.9626 Tf 5.694 0 Td [(\051)-278(=)]TJ
      /F20 9.9626 Tf 17.158 0 Td [(\031)]TJ
      /F17 9.9626 Tf 6.036 0 Td [(\050)]TJ
      /F20 9.9626 Tf 3.875 0 Td [(x)]TJ
      /F17 9.9626 Tf 5.694 0 Td [(\051)-222(+)]TJ
      /F18 6.9738 Tf 17.247 3.923 Td [(1)]TJ ET
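The ActualText value above is an ordinary PDF hex string, so the LaTeX source can be recovered from it without any TeX machinery. This Python sketch decodes every /ActualText hex string in a content-stream excerpt; it assumes hex (not literal) strings, as in the example, and approximates PDFDocEncoding with Latin-1, which agrees with it on ASCII. The function name `actual_text_strings` is hypothetical.

```python
from __future__ import annotations

import re

# Hex-string value of an /ActualText entry; whitespace inside <...> is ignored.
ACTUAL_TEXT = re.compile(r"/ActualText\s*<([0-9A-Fa-f\s]*)>")


def actual_text_strings(stream: str) -> list[str]:
    """Decode every hex-encoded /ActualText value in a content-stream excerpt."""
    found = []
    for m in ACTUAL_TEXT.finditer(stream):
        hex_digits = "".join(m.group(1).split())
        if len(hex_digits) % 2:
            hex_digits += "0"  # PDF treats a missing final digit as 0
        raw = bytes.fromhex(hex_digits)
        if raw.startswith(b"\xfe\xff"):
            found.append(raw[2:].decode("utf-16-be"))
        else:
            # Latin-1 approximates PDFDocEncoding; both agree on ASCII.
            found.append(raw.decode("latin-1"))
    return found
```

Applied to the /Span line of the excerpt, it returns the same string that a PDF reader places on the clipboard for the CopyMath-enabled document: `$\Pi (x) = \pi (x) + \frac {1}{2}\pi (x^{1/2}) + \frac {1}{3}\pi (x^{1/3}) + \cdots $`.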

  16. CopyMath-enabled PDF document
      Text obtained using the Copy & Paste function of a PDF reader:
      Text $\Pi (x) = \pi (x) + \frac {1}{2}\pi (x^{1/2}) + \frac {1}{3}\pi (x^{1/3}) + \cdots $ text.

  17. CopyMath Implementation
      • We need to add \pdfliteral at the beginning and end of every mathematical environment.
      • The dollar sign ($) is made active and redefined.
      • It is necessary to keep track of nested mathematical environments.
      • Simple redefinition of AMS-LaTeX mathematical environments is not possible.
      • Still experimental.
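The bookkeeping for nested math can be modelled outside TeX. In the hypothetical Python sketch below (not CopyMath's TeX code; the class and method names are invented for illustration), every entry into math mode increments a depth counter, and a BDC/EMC pair is emitted only for the outermost formula. One situation where this matters is math inside text inside a formula: an inner span would otherwise open a second ActualText region inside the first.

```python
from __future__ import annotations


class MathSpanTracker:
    """Hypothetical model of the bookkeeping needed for nested math mode.

    Only the outermost formula gets a marked-content span; inner formulas
    are already covered by the ActualText of the enclosing one.
    """

    def __init__(self) -> None:
        self.depth = 0
        self.operators: list[str] = []

    def enter_math(self, begin_operator: str) -> None:
        """Call where math mode starts; begin_operator is a '... BDC' string."""
        if self.depth == 0:
            self.operators.append(begin_operator)
        self.depth += 1

    def leave_math(self) -> None:
        """Call where math mode ends; closes the span at the outermost level."""
        if self.depth == 0:
            raise ValueError("leave_math() without a matching enter_math()")
        self.depth -= 1
        if self.depth == 0:
            self.operators.append("EMC")
```

A counter rather than a boolean flag is the simplest choice here, because math can nest more than one level deep and the span must close only when the last level is left.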

  18. MathML Processing
      1 Introduction
      2 PDF Processing
      3 MathML Processing
         Making Maths Accessible
         MathML Processing
      4 Summary
