Research Platform for Old Indo-Aryan Texts Börge Kiss (IDH), Daniel - -

Research Goals

Traditional research with large corpora

concordances / word indexes, lexica: make usage patterns and frequencies visible

determination of meanings, functions, syntactic patterns based on researchers' individual assessments and their "reading experience"

problems: rather intuitive, subjective; the more texts, the more intractable

SLIDE 3

Research Goals

online platform allowing combined searches of (1) lexical, (2) morphological, (3) metrical and (4) syntactic information, e.g.

(1): lexical fields: differences between words for x, e.g. 'man/woman' [Kazzazi 2001]; 'light' [Roesler 1997] etc.
(2): use/distribution/functional difference of allomorphs: e.g. áśv-a- ʻhorseʼ, nom.pl. áśvās / áśvāsas ‘horses’ (http://ifl.phil-fak.uni-koeln.de/36486.html?&L=1)
(3): position of forms in verse; word-shapes
(4): information structure (topic/focus)
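
Goal (2) combined with goal (3) can be illustrated with a minimal Python sketch. It assumes that each token is available as a tuple of surface form, lemma, morphological gloss and position in the verse line; this record layout, the gloss labels and the positions are invented for illustration and do not come from the project data. Only the two nom.pl. forms are taken from the slide.

```python
from collections import Counter

# Hypothetical annotated tokens: (surface form, lemma, gloss, position in verse line).
# The two nom.pl. allomorphs are from the slide; glosses and positions are invented.
TOKENS = [
    ("áśvās", "áśva-", "NOM.PL", "line-internal"),
    ("áśvāsas", "áśva-", "NOM.PL", "line-final"),
    ("áśvās", "áśva-", "NOM.PL", "line-initial"),
    ("áśvāsas", "áśva-", "NOM.PL", "line-final"),
    ("áśvam", "áśva-", "ACC.SG", "line-internal"),
]


def allomorph_distribution(tokens, lemma, gloss):
    """Count surface forms per verse position for one lemma and one gloss."""
    counts = Counter()
    for form, token_lemma, token_gloss, position in tokens:
        if token_lemma == lemma and token_gloss == gloss:
            counts[(form, position)] += 1
    return counts


distribution = allomorph_distribution(TOKENS, "áśva-", "NOM.PL")
for (form, position), count in sorted(distribution.items()):
    print(form, position, count)
```

With real annotations, a table of this kind would show whether the longer and the shorter ending prefer different positions in the verse line, which is the sort of question that is hard to answer from reading experience alone.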

SLIDE 4

Background

Rigveda

oldest text of Indo-Aryan, part of Indo-European language family, ca. 1300 / 1000 BC

ca. 160,000 words (in 1028 hymns grouped into 10 books = "mandalas"); cf. Homer's Iliad + Odyssey = ca. 190,000 words
hymns to gods (Indra, Soma, Varuna, Mitra, …) recited mostly during Soma sacrifice (juice of intoxicating plant)

Further texts to be integrated: Atharvaveda (c. 170,000 words), Yajurveda; Vedic prose: Aitareya Brahmana (c. 100,000 words), Maitrayani Samhita (c. 120,000 words)

SLIDE 5

Background

Data

morphology
annotation provided by Prof. G. Dunkel, Prof. P. Widmer et al., University of Zurich

metre
Prof. K. Ryan, Harvard University
syntax
Prof. H. Hettrich (University of Würzburg), Dr. O. Hellwig (University of Düsseldorf);
Dr. U. Reinöhl (University of Cologne/Mainz) using GRAID (Grammatical Relations and Animacy in Discourse)

SLIDE 6

Team

CCeH/DCH
Apl. Prof. Dr. Patrick Sahle, P.I.
Francisco Mondaca, M.A.
Jonathan Blumtritt, M.A.
Martina Gödel, M.A.

IDH - Spinfo
Dr. Claes Neuefeind, P.I.
Börge Kiss, M.A.

ASW/HVS
PD Dr. Daniel Kölligan, P.I.
Dr. Uta Reinöhl, P.I.
Jakob Halfmann
Natalie Korobzow
Felix Rau, M.A.

SLIDE 7

Co-operation partners

Prof. Dr. Paul Widmer, Universität Zürich
Dr. Salvatore Scarlata, Universität Zürich
Prof. Dr. Kevin Ryan, Harvard University
Dr. Dieter Gunkel, University of Richmond
Prof. Dr. Laurent Romary, Inria/HU Berlin, TEI
Prof. Dr. Nikolaus P. Himmelmann, Universität zu Köln

SLIDE 8

VedaWeb: A digital platform for working with Old Indic texts

make available RV + translations + morphological glossings for view & export

connecting all word-forms of the annotated RV with the corresponding lexical entries in Grassmann, Böhtlingk / Roth, Monier Williams and vice versa

allowing combinatorial searches of lemmas, word-forms, morphological and metrical information via cascading search index
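
As a rough illustration of combinatorial search and of the link from word-forms to dictionary entries, here is a minimal Python sketch. It is not the project's search index: it assumes one small inverted index per annotation value and intersects the hits. The token ids, field names and dictionary entry ids are hypothetical; the three word-forms are from RV 1.1.1.

```python
from collections import defaultdict

# Hypothetical token records; ids and field names are illustrative assumptions.
TOKENS = {
    "t1": {"form": "agním", "lemma": "agní-", "case": "ACC", "number": "SG"},
    "t2": {"form": "yajñásya", "lemma": "yajñá-", "case": "GEN", "number": "SG"},
    "t3": {"form": "devám", "lemma": "devá-", "case": "ACC", "number": "SG"},
}

# Hypothetical mapping from lemma to a dictionary entry id.
DICTIONARY_ENTRIES = {"agní-": "entry-agni", "yajñá-": "entry-yajna", "devá-": "entry-deva"}


def build_index(tokens):
    """Map every (field, value) pair to the set of token ids carrying it."""
    index = defaultdict(set)
    for token_id, annotations in tokens.items():
        for field, value in annotations.items():
            index[(field, value)].add(token_id)
    return index


def search(index, **criteria):
    """Return the sorted token ids that satisfy all criteria at once."""
    result = None
    for field, value in criteria.items():
        matches = index.get((field, value), set())
        result = matches if result is None else result & matches
    return sorted(result or [])


index = build_index(TOKENS)
hits = search(index, case="ACC", number="SG")
entries = [DICTIONARY_ENTRIES[TOKENS[token_id]["lemma"]] for token_id in hits]
print(hits, entries)
```

The reverse direction ("and vice versa") is the same lookup with the lemma as the criterion, e.g. `search(index, lemma="agní-")` lists all attested forms for a dictionary entry.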

SLIDE 9

State of the Art

revisions & additions of Zurich glossings
development of data model and APIs for dictionaries (Francisco Mondaca)
development of web application (Börge Kiss)
integration of further resources

SLIDE 10

Morphological Glossings (Zurich)

SLIDE 11

Translations: German, English, French, Latin, Russian…

SLIDE 12

Workflow

SLIDE 13

TEI - Modelling

An appropriate data model is of central importance for consistency, transfer, persistence and presentation

TEI (Text Encoding Initiative) offers the best way for textual data to persist in time, due to its active community of scholars and detailed documentation. It’s the de facto standard in Digital Humanities projects.

modelling of texts (RV, translations) and dictionaries (Grassmann; Vedic Index of Names and Subjects)
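
To give an impression of what such a model can look like, the following Python sketch builds a small TEI fragment for one verse line with the standard library. The slides do not show the project's actual schema, so the choice of elements (`lg`, `l`, `w`) and attributes (`lemma`, `msd`, `n`) is an assumption based on general TEI practice, and the glosses are illustrative rather than taken from the Zurich annotation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_stanza(location, words):
    """Build <lg> with one <l> whose <w> children carry lemma and gloss."""
    stanza = ET.Element(tei("lg"), {"type": "stanza", "n": location})
    line = ET.SubElement(stanza, tei("l"), {"n": "a"})
    for form, lemma, gloss in words:
        word = ET.SubElement(line, tei("w"), {"lemma": lemma, "msd": gloss})
        word.text = form
    return stanza


# First line of RV 1.1.1; the glosses are illustrative assumptions.
stanza = build_stanza(
    "1.1.1",
    [
        ("agním", "agní-", "ACC.SG.M"),
        ("īḷe", "īḍ-", "1.SG.PRS.MED"),
        ("puróhitam", "puróhita-", "ACC.SG.M"),
    ],
)
xml_text = ET.tostring(stanza, encoding="unicode")
print(xml_text)
```

Keeping form, lemma and gloss on the same element is one way to let the same file serve display, export and search; a real model would also need to cover translations and dictionary entries.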

SLIDE 14

Software Architecture

SLIDE 15

VedaWeb App

http://vedaweb.uni-koeln.de

SLIDE 16

Cooperation within the project

not the traditional "chasm" between IT and humanities people, but rather different ranges of competences and overlapping responsibilities:
"family constellation"

SLIDE 17

Cooperation within the project

overlap of competence areas makes project feasible
regular communication
close feedback loops
gitlab, issue tracking system
regular team meetings (once a month)

SLIDE 18

simple and challenging issues

different expectations of what is easy and difficult to implement, e.g.

multiple, combinable full-text search
search functions over diversely structured sets of data
complex structure of the base text:
books, hymns, verses, half-verses
different counting systems (by books, by hymns)
different text versions (editions; lemmas and annotations; "padapatha")
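
The counting-system issue can be made concrete with a short Python sketch. The slide only names the systems as "by books, by hymns"; the sketch assumes that one system numbers hymns within each book and the other numbers them continuously from 1 to 1028. The per-book hymn totals are the commonly cited ones and add up to the 1028 hymns mentioned earlier; the function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Hymns per book (mandala) 1 to 10; the totals add up to 1028.
HYMNS_PER_BOOK = [191, 43, 62, 58, 87, 75, 104, 103, 114, 191]
# OFFSETS[i] is the number of hymns in all books before book i + 1.
OFFSETS = [0] + list(accumulate(HYMNS_PER_BOOK))


def to_absolute(book, hymn):
    """Convert a (book, hymn) reference to a running hymn number."""
    if not 1 <= book <= len(HYMNS_PER_BOOK):
        raise ValueError(f"no such book: {book}")
    if not 1 <= hymn <= HYMNS_PER_BOOK[book - 1]:
        raise ValueError(f"book {book} has no hymn {hymn}")
    return OFFSETS[book - 1] + hymn


def to_book_hymn(absolute):
    """Convert a running hymn number back to a (book, hymn) reference."""
    if not 1 <= absolute <= OFFSETS[-1]:
        raise ValueError(f"no such hymn: {absolute}")
    book = bisect_left(OFFSETS, absolute)
    return book, absolute - OFFSETS[book - 1]


print(to_absolute(2, 1), to_book_hymn(1028))
```

Storing one canonical reference per hymn and deriving the others on demand is one way to keep search results and citations consistent across editions.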

SLIDE 19

learning from each other

for linguists:
insights into opportunities provided by digital research platforms
getting to know affordances of data for building an online platform and ensuring data longevity (TEI)

for technical researchers:
complexity of ancient texts (internal structure, variation, different layers of form and meaning)

interests of linguists and other humanities scholars in the data
both:
make one's terminology explicit and clear
make the data consistent

SLIDE 20

improved collaboration

general understanding
for DH researchers:
of the objects studied in various humanities disciplines and the relevant research questions and methods

for humanities scholars:
of the different fields and methods in DH (e.g. building a web platform vs data modelling in TEI)

SLIDE 21

Future plans: next version

metrical data (D. Gunkel/K. Ryan)
audio & video:
some recordings of A. Daniélou available
complete recording of RV in Copenhagen - not really available
http://www.kb.dk/en/nb/samling/os/Sydasien/veda.html
texts: Atharvaveda, Maitrayani Samhita
annotation layers / user accounts: GRAID etc.
semantic search … (Semantic Web)

SLIDE 22

C-SALT : Cologne South Asian Languages and Texts http://c-salt.uni-koeln.de/

overview of projects and digital resources related to South Asian languages, texts, and culture at the University of Cologne (TEI Sanskrit dictionaries, Pali dictionary…)

C-SALT coordinates the activity of these projects and facilitates sustainable development of the diverse resources.

further plans:
Iranian (Avestan corpus + annotation; digital version of Bartholomae's dictionary; Middle Persian texts)
Nuristani (A. Degener [Mainz]: Kalasha-Ala, Prasun)

SLIDE 23

It Takes a Village: Co-developing VedaWeb, a Digital Research Platform for Old Indo-Aryan Texts

धन्यवाद Thank you!