SLIDE 1

It Takes a Village: Co-developing VedaWeb, a Digital Research Platform for Old Indo-Aryan Texts

Börge Kiss (IDH), Daniel Kölligan (HVS), Francisco Mondaca (CCeH), Claes Neuefeind (IDH), Uta Reinöhl (ASW), Patrick Sahle (CCeH)

05.03.2019

SLIDE 2

Research Goals

Traditional research with large corpora

  • concordances / word indexes, lexica: make usage patterns and frequencies visible
  • determination of meanings, functions, syntactic patterns based on researchers' individual assessments and their "reading experience"
  • problems: rather intuitive, subjective; the more texts, the more intractable
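The concordance and word index mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VedaWeb code: the function names `word_index` and `kwic` and the toy token list are hypothetical, and the text is assumed to be tokenized already.

```python
from collections import Counter

def word_index(tokens):
    """Map each word-form to the (0-based) positions where it occurs."""
    index = {}
    for pos, form in enumerate(tokens):
        index.setdefault(form, []).append(pos)
    return index

def kwic(tokens, form, width=2):
    """Keyword-in-context lines with `width` tokens of context on each side."""
    lines = []
    for pos, tok in enumerate(tokens):
        if tok == form:
            left = " ".join(tokens[max(0, pos - width):pos])
            right = " ".join(tokens[pos + 1:pos + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical toy token sequence for illustration only (not a real Rigveda passage)
TOY_TOKENS = ["índra", "áśvās", "sóma", "áśvāsas", "índra", "áśvās"]

if __name__ == "__main__":
    print(Counter(TOY_TOKENS).most_common(2))   # frequencies
    print(word_index(TOY_TOKENS)["índra"])      # word index
    for line in kwic(TOY_TOKENS, "áśvāsas", width=1):
        print(line)                             # concordance line
```

Counting makes frequencies visible and the KWIC lines expose usage patterns, but interpreting them still rests on the individual researcher's reading, which is the scaling problem the slide points out.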

SLIDE 3

Research Goals

  • online platform allowing combined searches of (1) lexical, (2) morphological, (3) metrical and (4) syntactic information, e.g.
  • (1): lexical fields: differences between words for x, e.g. 'man/woman' [Kazzazi 2001]; 'light' [Roesler 1997] etc.
  • (2): use/distribution/functional difference of allomorphs: e.g. áśv-a- ‘horse’, nom.pl. áśvās / áśvāsas ‘horses’

  • (http://ifl.phil-fak.uni-koeln.de/36486.html?&L=1)
  • (3): position of forms in verse; word-shapes
  • (4): information structure (topic/focus)
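A combined search over several annotation layers amounts to filtering tokens on several attributes at once. The sketch assumes a hypothetical `Token` record carrying a lemma, two morphological features and a word position standing in for the metrical layer; the attribute names, the toy data and `combined_search` are illustrative assumptions, not the platform's data model, and layer (4) is not modelled. The example query targets the allomorph question in (2).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Token:
    form: str                     # surface word-form
    lemma: str                    # (1) lexical layer
    case: Optional[str] = None    # (2) morphological layer
    number: Optional[str] = None  # (2) morphological layer
    position: int = 0             # (3) hypothetical: word slot within the verse line

def combined_search(tokens, **criteria):
    """Return tokens matching every criterion.

    A criterion is either a value the token attribute must equal,
    or a predicate function applied to that attribute.
    """
    def matches(tok):
        for attr, expected in criteria.items():
            value = getattr(tok, attr)
            ok = expected(value) if callable(expected) else value == expected
            if not ok:
                return False
        return True
    return [tok for tok in tokens if matches(tok)]

# Hypothetical toy annotations, for illustration only
TOY = [
    Token("áśvās", "áśva", "nom", "pl", 0),
    Token("áśvāsas", "áśva", "nom", "pl", 3),
    Token("áśvam", "áśva", "acc", "sg", 1),
    Token("áśvās", "áśva", "nom", "pl", 2),
]

if __name__ == "__main__":
    # (1)+(2): distribution of the nom.pl. allomorphs of one lemma
    nom_pl = combined_search(TOY, lemma="áśva", case="nom", number="pl")
    print(Counter(t.form for t in nom_pl))
    # (1)+(2)+(3): the same forms, restricted to later slots in the line
    late = combined_search(TOY, lemma="áśva", case="nom", position=lambda p: p >= 2)
    print([(t.form, t.position) for t in late])
```

Passing predicates as well as plain values keeps one function usable for both exact matches (lemma, case) and range conditions (position).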
SLIDE 4

Background

Rigveda

  • oldest text of Indo-Aryan, part of the Indo-European language family, ca. 1300 / 1000 BC
  • ca. 160,000 words (in 1028 hymns grouped into 10 books = "mandalas"); cf. Homer's Iliad + Odyssey = ca. 190,000 words
  • hymns to gods (Indra, Soma, Varuna, Mitra, …) recited mostly during the Soma sacrifice (juice of an intoxicating plant)

Further texts to be integrated: Atharvaveda (c. 170,000 words), Yajurveda; Vedic prose: Aitareya Brahmana (c. 100,000 words), Maitrayani Samhita (c. 120,000 words)

SLIDE 5

Background

Data

  • morphology: annotation provided by Prof. G. Dunkel, Prof. P. Widmer et al., University of Zurich
  • metre: Prof. K. Ryan, Harvard University
  • syntax: Prof. H. Hettrich (University of Würzburg), Dr. O. Hellwig (University of Düsseldorf); Dr. U. Reinöhl (University of Cologne/Mainz) using GRAID (Grammatical Relations and Animacy in Discourse)

SLIDE 6

Team

CCeH/DCH

  • Apl. Prof. Dr. Patrick Sahle, P.I.
  • Francisco Mondaca, M.A.; Jonathan Blumtritt, M.A.; Martina Gödel, M.A.

IDH - Spinfo

  • Dr. Claes Neuefeind, P.I.
  • Börge Kiss, M.A.

ASW/HVS

  • PD Dr. Daniel Kölligan, P.I.
  • Dr. Uta Reinöhl, P.I.
  • Jakob Halfmann; Natalie Korobzow; Felix Rau, M.A.

SLIDE 7

Co-operation partners

  • Prof. Dr. Paul Widmer, Universität Zürich
  • Dr. Salvatore Scarlata, Universität Zürich
  • Prof. Dr. Kevin Ryan, Harvard University
  • Dr. Dieter Gunkel, University of Richmond
  • Prof. Dr. Laurent Romary, Inria/HU Berlin, TEI
  • Prof. Dr. Nikolaus P. Himmelmann, Universität zu Köln
SLIDE 8

VedaWeb: A digital platform for working with Old Indic texts

  • make available RV + translations + morphological glossings for view & export
  • connecting all word-forms of the annotated RV with the corresponding lexical entries in Grassmann, Böhtlingk/Roth and Monier-Williams, and vice versa
  • allowing combinatorial searches of lemmas, word-forms, morphological and metrical information via a cascading search index
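Linking word-forms to dictionary entries "and vice versa" can be modelled as two indexes built in a single pass. This sketch assumes each annotated word-form carries a lemma that matches a dictionary headword exactly; in practice headword spellings differ between Grassmann, Böhtlingk/Roth and Monier-Williams, so a normalization step (not shown) would be needed. The function `link_forms_to_entries` and all form and entry IDs are hypothetical.

```python
from collections import defaultdict

def link_forms_to_entries(forms, dictionaries):
    """Build form -> entries and entry -> forms indexes in one pass.

    forms: iterable of (form_id, lemma) pairs
    dictionaries: {dictionary_name: {headword: entry_id}}
    """
    form_to_entries = defaultdict(list)
    entry_to_forms = defaultdict(list)
    for form_id, lemma in forms:
        for dict_name, headwords in dictionaries.items():
            entry_id = headwords.get(lemma)  # assumption: exact headword match
            if entry_id is not None:
                form_to_entries[form_id].append((dict_name, entry_id))
                entry_to_forms[(dict_name, entry_id)].append(form_id)
    return dict(form_to_entries), dict(entry_to_forms)

if __name__ == "__main__":
    # Hypothetical IDs, for illustration only
    forms = [("form-1", "agní"), ("form-2", "índra"), ("form-3", "agní")]
    dictionaries = {
        "Grassmann": {"agní": "gra-agni", "índra": "gra-indra"},
        "Böhtlingk/Roth": {"agní": "pw-agni"},
        "Monier-Williams": {"agní": "mw-agni"},
    }
    f2e, e2f = link_forms_to_entries(forms, dictionaries)
    print(f2e["form-1"])                   # all entries for one word-form
    print(e2f[("Grassmann", "gra-agni")])  # all word-forms for one entry
```

Building both directions at once means a click on a word in the text and a click on a dictionary entry can each be answered by a single lookup.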

SLIDE 9

Current Status

  • revisions & additions of Zurich glossings
  • development of data model and APIs for dictionaries (Francisco Mondaca)

  • development of web application (Börge Kiss)
  • integration of further resources
SLIDE 10

Morphological Glossings (Zurich)

SLIDE 11

Translations: German, English, French, Latin, Russian…

SLIDE 12

Workflow

SLIDE 13

TEI - Modelling

  • an appropriate data model is of central importance for consistency, transfer, persistence and presentation
  • TEI (Text Encoding Initiative) offers the best way for textual data to persist over time, thanks to its active community of scholars and its detailed documentation. It is the de facto standard in Digital Humanities projects.
  • modelling of texts (RV, translations) and dictionaries (Grassmann; Vedic Index of Names and Subjects)
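As an illustration of TEI modelling, the sketch encodes one stanza with `<div>`, `<lg>` (line group), `<l>` (verse line) and `<w>` elements, putting the lemma and a morphological description in the `lemma` and `msd` attributes that TEI allows on `<w>`. The element choice, the `type` values, the `case=...|number=...` feature string and the helper `encode_stanza` are assumptions for illustration, not VedaWeb's actual schema; the glossing of the two sample words is simplified.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei(tag):
    """ElementTree-qualified name for a TEI element."""
    return f"{{{TEI_NS}}}{tag}"

def encode_stanza(hymn_ref, stanza_no, padas):
    """Encode one stanza as TEI.

    padas: list of (label, words), where words is a list of (form, lemma, msd)
    """
    div = ET.Element(tei("div"), {"type": "hymn", "n": hymn_ref})
    lg = ET.SubElement(div, tei("lg"), {"type": "stanza", "n": str(stanza_no)})
    for label, words in padas:
        line = ET.SubElement(lg, tei("l"), {"n": label})
        for form, lemma, msd in words:
            w = ET.SubElement(line, tei("w"), {"lemma": lemma, "msd": msd})
            w.text = form
    return div

if __name__ == "__main__":
    # Opening words of RV 1.1.1a with an assumed, simplified glossing
    stanza = encode_stanza("1.1", 1, [
        ("a", [("agním", "agní", "case=acc|number=sg"),
               ("īḷe", "īḍ", "person=1|number=sg|tense=pres|voice=mid")]),
    ])
    print(ET.tostring(stanza, encoding="unicode"))
```

Keeping text structure (`div`/`lg`/`l`) and word-level annotation (`w`) in one TEI document is one way to serve the goals of consistency, transfer and presentation named on this slide.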

SLIDE 14

Software Architecture

SLIDE 15

VedaWeb App

  • http://vedaweb.uni-koeln.de
SLIDE 16

Cooperation within the project

  • not the traditional "chasm" between IT and humanities people, but rather different ranges of competences and overlapping responsibilities: a "family constellation"
SLIDE 17

Cooperation within the project

  • overlap of competence areas makes project feasible
  • regular communication
  • close feedback loops
  • GitLab, issue tracking system
  • regular team meetings (once a month)
SLIDE 18

simple and challenging issues

  • different expectations of what is easy and difficult to implement, e.g.
  • multiple, combinable full-text search
  • search functions over diversely structured sets of data
  • complex structure of the base text:
  • books, hymns, verses, half-verses
  • different counting systems (by books, by hymns)
  • different text versions (editions; lemmas and annotations; "padapatha")

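The counting-system issue can be made concrete at the hymn level: a book.hymn reference and a running hymn number identify the same hymn. This sketch assumes the running number counts hymns consecutively through books 1 to 10, using the standard per-book hymn counts (191, 43, 62, 58, 87, 75, 104, 103, 114, 191, which sum to the 1028 hymns given for the Rigveda). The names `to_running` and `from_running` are hypothetical; verse and half-verse levels would need per-hymn verse counts and are not covered.

```python
from bisect import bisect_right
from itertools import accumulate

# Hymns per book (mandala) 1..10 in the standard count; they sum to 1028
HYMNS_PER_BOOK = [191, 43, 62, 58, 87, 75, 104, 103, 114, 191]
# OFFSETS[i] = total number of hymns in books 1..i (OFFSETS[0] = 0)
OFFSETS = [0] + list(accumulate(HYMNS_PER_BOOK))

def to_running(book, hymn):
    """Convert a book.hymn reference to a running hymn number (1..1028)."""
    if not 1 <= book <= len(HYMNS_PER_BOOK):
        raise ValueError(f"no book {book}")
    if not 1 <= hymn <= HYMNS_PER_BOOK[book - 1]:
        raise ValueError(f"book {book} has no hymn {hymn}")
    return OFFSETS[book - 1] + hymn

def from_running(n):
    """Convert a running hymn number back to a (book, hymn) pair."""
    if not 1 <= n <= OFFSETS[-1]:
        raise ValueError(f"no hymn number {n}")
    # Count of offsets <= n - 1, i.e. the book that contains hymn n
    book = bisect_right(OFFSETS, n - 1)
    return book, n - OFFSETS[book - 1]
```

For example, `to_running(2, 1)` gives 192 and `from_running(1028)` gives `(10, 191)`. Storing one canonical reference and converting on display avoids keeping parallel numberings in the data.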
SLIDE 19

learning from each other

  • for linguists:
  • insights into opportunities provided by digital research platforms
  • getting to know affordances of data for building an online platform and ensuring data longevity (TEI)
  • for technical researchers:
  • complexity of ancient texts (internal structure, variation, different layers of form and meaning)

  • interests of linguists and other humanities scholars in the data
  • both:
  • make one's terminology explicit and clear
  • make the data consistent
SLIDE 20

improved collaboration

  • general understanding
  • for DH researchers:
  • of the objects studied in various humanities disciplines and the relevant research questions and methods
  • for humanities scholars:
  • of the different fields and methods in DH (e.g. building a web platform vs. data modelling in TEI)

SLIDE 21

Future plans: next version

  • metrical data (D. Gunkel/K. Ryan)
  • audio & video:
  • some recordings of A. Daniélou available
  • complete recording of the RV in Copenhagen (not readily accessible)
  • http://www.kb.dk/en/nb/samling/os/Sydasien/veda.html
  • texts: Atharvaveda, Maitrayani Samhita
  • annotation layers / user accounts: GRAID etc.
  • semantic search … (Semantic Web)
SLIDE 22

C-SALT: Cologne South Asian Languages and Texts http://c-salt.uni-koeln.de/

  • overview of projects and digital resources related to South Asian languages, texts, and culture at the University of Cologne (TEI Sanskrit dictionaries, Pali dictionary…)
  • C-SALT coordinates the activity of these projects and facilitates sustainable development of the diverse resources.
  • further plans:
  • Iranian (Avestan corpus + annotation; digital version of Bartholomae's dictionary; Middle Persian texts)

  • Nuristani (A. Degener [Mainz]: Kalasha-Ala, Prasun)
SLIDE 23

धन्यवाद Thank you!