Linguistic Research Infrastructure - LiRI
Linguistic Research Infrastructure
Information event October 11, 2019 LiRI team members
Introduction
We are building a new laboratory (a collection of devices and facilities) for linguistics and the language and speech sciences, plus data storage, data processing and data science provided by a group of experts (“LiRI staff”). We aim at
CLARIN, FAIR principles)
excellent digital research infrastructure
Our vision: LiRI as a starting point for mid- and large-scale collaborative national and international third-party funded research projects.
Research Infrastructures 2021-2024” Co-application by two overarching linguistic units (ZüKL and URPP “Language and Space”)
internal applications for each funding year plus SNSF applications for larger devices, R’Equip, by LiRI team)
Plus ‘LIS team’ until July 2020: computational linguistics (CL) specialists plus a software developer to set up the “Linguistic Information System”, in collaboration with the local IT unit S3IT
eye-tracking devices, software, computing power (3d model)
Video cameras, eye-tracking devices, LENA devices, especially for fieldwork
EEG systems (stationary and mobile), ABR, NIRS
sound-proof cabins, articulatograph, measurement devices
User groups:
plus LiRI Team (Volker Dellwo, Martin Meyer, Wolfgang Kesselheim; coordination: Agnes Kolmer).
– Today: Constitution of LiRI SAB:
  – Prof Shanley E.M. Allen, TU Kaiserslautern
  – Prof Lars Borin, University of Gothenburg
  – Prof Anne-Lise Giraud, Université de Genève
  – Prof Stuart Rosen, University College London (UCL)
  – Prof Lukas Rosenthaler, University of Basel
Riedi): Taras Zakharko, Gerold Schneider, Stefan Vrankovic
development) plus a technician, see website
local UZH technology platform
Language Development, Language Processing, Language Production, Language Interaction
[Workflow diagram: data pipeline across CRDN, UZH and FNUNIV, with tasks labelled weekly, monthly and yearly. Steps shown: anonymize; export to R; archive; release; integrate media; media to UZH; process media; create backup; media to CRDN; media to FNUNIV; record video; store media; enter metadata; report recordings; collect metadata; integrate ELAN; check ELAN; integrate media; update database; process metadata; assign TA; translation; transcription; send ELAN corrections; assign glosser; glossing; answer questions; send questions; integrate TBX corrections; check TBX; update/send TBX]
LENA technology is a standard for measuring talk with children. LENA combines a small wearable audio recorder with a speech recognition algorithm that automatically analyzes and segments the audio data into different time frames.
8 x 4K video cameras
8 x 60 Hz eye-tracking (ET) glasses
4 x time-of-flight (ToF) cameras
[Annotated example. Labels: code switch to the standard variety: enacting the authority of the institution; hypothesis about expected events; sensory perception; assessment of perception]
Kesselheim & Hottiger - Visitor experiments
– Hemodynamic approaches (e.g.) are less ideal as they have poor temporal resolution
– Neurophysiological approaches are better able to investigate language functions in the millisecond range
– Innovative source estimation allows the neural sources of cognitive and sensory processes to be estimated
Brain oscillations Acoustic modulation
Homae et al. (2006), Neuroscience; Kovelman et al. (2012), NeuroImage
Understand voice production, acoustics and perception, for example current topics...
  ... voice processing in the brain.
  ... relationships between face and voice (e.g. predicting face from a voice).
  ... individuality in voice and voice recognition.
  ... encoding of emotion/attitude/attractiveness in voice.
Essential - for example - for...
  ... models of human communication with voice (relation to animal communication)
  ... voice synthesis, voice recognition
  ... civil and forensic application in voice analysis
Measurements of articulatory movements
Screening and behavioural testing equipment:
Audio-visual equipment:
Articulatory measurements:
– The LiRI Information System assists clients (linguistic researchers) with defining, setting up and managing the infrastructure and workflows necessary to do successful research.
  – Consulting
  – Managing data storage with backup
  – Workflow management
  – Setting up and managing software tools and services
  – Assistance with practical tasks and (see following example scenarios)
– In addition, LIS experts proactively seek out opportunities to improve the state of the art of technical linguistic infrastructure by engaging in synergetic projects: see our workflow project
[Diagram: tools and storage. Labels: NLP CL; ZüKL; NLP DS; NLP ES; R, Praat, my tools; SSD storage @s3it; local HD]
[Diagram: Data Storage connected to the tools NLP pipes, NoSketch, R, CQP and Praat]
– A linguistics professor has collected 100,000 tweets and newspaper articles in her SNF project. She wants to do linguistic processing and is looking for help.
– LiRI offers, for German, English, French, Italian and Spanish, and later other languages:
  – Tokenisation into words and sentences
  – Recognition of names (person, organisation, location)
  – Part-of-speech tagging
  – Access to a web-based KWIC query interface, with CQP or NoSketch
  – Syntactic parsing
  – Treebank searches
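To make two of the services above concrete, here is a minimal sketch of tokenisation and a KWIC (keyword in context) lookup. It is an illustration only: the function names and regular expressions are assumptions made for this example, not the tools LiRI uses, and real pipelines handle abbreviations, URLs and language-specific rules that this sketch ignores.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive word tokeniser: words (with internal apostrophes or hyphens) or single punctuation marks."""
    return re.findall(r"\w+(?:['’-]\w+)*|[^\w\s]", sentence)

def kwic(tokens, keyword, window=3):
    """Return (left context, hit, right context) for each case-insensitive match."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Hypothetical two-sentence input standing in for a tweet or article.
text = "The corpus is large. We query the corpus with CQP!"
sentences = split_sentences(text)
tokens = [tok for s in sentences for tok in tokenize(s)]
for left, hit, right in kwic(tokens, "corpus", window=2):
    print(f"{left:>20} [{hit}] {right}")
```

A query interface such as CQP or NoSketch offers far richer queries (for example over part-of-speech tags); the sketch only shows the basic idea of aligning each hit with its left and right context.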
– A group of post-docs is investigating statistical models. Some know a bit of R, others do not. They would like to use a collaborative platform.
– LiRI offers to linguistic researchers:
  – Pre-processing
  – Advice on statistics
  – A pre-configured Shiny R environment with our recommended libraries
    – interactively explore statistics and visualisations
    – going beyond the power of individual laptops
  – Seamless access to databases for data that is larger than memory
  – Sessions and snapshots are always available
– As the result of a project, a large corpus and extensive statistical data have been collected. The data is too large for the servers of the institute, and it should also be accessible 10 years from now.
– LiRI offers:
  – Storage and backup, in collaboration with S3IT, UZH’s central IT services and SWISSUbase
  – Advice on which solution fits best
  – Help on
    – which data should be downloadable from the web
    – licensing issues
    – which data should be stored to ensure reproducibility of published results
– A sociolinguist has conducted interviews and needs to transcribe them. He is unsure which level of transcription to use (phonetic, textual) and which tools can help with the task.
– LiRI offers:
  – Advice and practical help with transcription tools like Praat
  – Expertise on using automatic transcription tools
  – Quality control, storing and archiving of data
– A psycholinguist would like to conduct eye-tracking and EEG experiments to test her NLP models and research hypotheses.
– LiRI offers:
  – Use of eye-trackers and EEG equipment
  – Collection and management
  – Advice on statistical and NLP methods
– A historical linguist is given a large collection of scanned historical records and books. He would like to extract their text and trace both linguistic change and glimpses of society in earlier periods.
– LiRI offers:
  – Advanced OCR with Transkribus
  – Mapping of historical spelling variants to present-day varieties, for German and English, later other languages
  – Interfaces for further manual annotation and correction
  – Help in statistically exploring the corpus, with tools from Digital Humanities
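The spelling-variant mapping mentioned above can be pictured as a lookup from historical forms to present-day forms. The sketch below assumes a tiny hand-made lexicon (`VARIANTS`) with a few illustrative German and English entries; it is not the method or the resource LiRI uses, and real normalisation typically also needs rules or statistical models for unseen variants.

```python
# Hypothetical variant lexicon: historical spelling -> present-day spelling.
VARIANTS = {
    "vnd": "und",
    "seyn": "sein",
    "olde": "old",
    "musick": "music",
}

def normalise(tokens, lexicon=VARIANTS):
    """Pair each token with its modern form; unknown tokens are left unchanged."""
    result = []
    for tok in tokens:
        modern = lexicon.get(tok.lower(), tok)
        result.append((tok, modern))
    return result

pairs = normalise(["Olde", "musick", "vnd", "dance"])
print(pairs)
```

Keeping the original token next to the normalised form means that manual correction and searches on either spelling remain possible.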
– A large amount of fieldwork material has to be processed into a corpus by a team of linguists
– Open problems:
  – Work coordination
  – Issue tracking
  – Quality control
  – Reporting and archiving
– Proposed solution: an IT-assisted workflow
[Workflow diagram. Components: versioned database (annotated data); audiovisual data; task tracker; users; managers; archive. Automatic continuous integration: validation, reporting, processing]
– The workflow combines existing tools and services
  – Version control system (git)
  – Issue tracker for managing tasks
  – Contribution management system for controlled contribution
  – Storage system for audiovisual data
– …and custom business logic and reporting components (developed and maintained by LiRI)
  – Validation (e.g. check that metadata are present and correct, check that the corpus is well formed)
  – Identification (assign every new version of a resource a unique descriptor and associate it with the version control)
  – Archiving (prepare/export the data to be archived)
  – Reporting (describe the state of the corpora at a given point in time)
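The validation, identification and reporting components listed above could look roughly like the following sketch. Everything here is an assumption made for illustration: the required metadata fields, the record layout and the choice of a truncated SHA-256 hash as version descriptor are hypothetical and do not describe LiRI's actual implementation.

```python
import hashlib
import json

# Hypothetical metadata schema: fields every recording must provide.
REQUIRED_FIELDS = ("speaker", "language", "recorded_on")

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing metadata field: {field}")
    return problems

def version_id(record):
    """Derive a stable descriptor from the record's content (SHA-256 of canonical JSON)."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def report(records):
    """Summarise the state of a corpus: how many records pass and which ones fail."""
    failures = {}
    for name, rec in records.items():
        problems = validate(rec)
        if problems:
            failures[name] = problems
    return {"total": len(records), "valid": len(records) - len(failures), "failures": failures}

# Two hypothetical recordings; the second lacks a language code.
corpus = {
    "rec001": {"speaker": "S1", "language": "gsw", "recorded_on": "2019-09-25"},
    "rec002": {"speaker": "S2", "language": "", "recorded_on": "2019-09-26"},
}
print(report(corpus))
print(version_id(corpus["rec001"]))
```

In a continuous-integration setup, checks of this kind would run automatically on every change committed to the versioned database, so that problems surface in the issue tracker rather than at archiving time.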
– A sociologist and a linguist have annotated data together. They realise that they will not be able to annotate as much data as they originally planned. They wonder if automatic approaches can help them.
– LiRI offers:
  – A large suite of machine learning tools
  – Reports on how accurately machine learning labels the data
  – Extraction of the most salient characteristics (“features”) that make the model parsimonious and lead to further insights
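A report on how accurately automatic labels match human annotation usually starts from simple agreement measures. The sketch below computes plain accuracy and Cohen's kappa (agreement corrected for chance) on a small invented example; the labels and data are hypothetical, and which measures LiRI reports is not stated in the slides.

```python
from collections import Counter

def accuracy(gold, predicted):
    """Share of items where the automatic label equals the human label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def cohen_kappa(gold, predicted):
    """Agreement corrected for chance: 1.0 is perfect, 0.0 is chance level.

    Undefined (division by zero) when both sequences use one single label.
    """
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    expected = sum(gold_counts[label] * pred_counts[label] for label in gold_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical human (gold) and automatic (pred) labels for six items.
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"kappa    = {cohen_kappa(gold, pred):.3f}")
```

Here four of six labels agree, while chance agreement for two balanced labels is one half, so kappa is noticeably lower than raw accuracy.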
Hauptbibliothek
11 October 2019, Florian Steurer, Data Services, Main Library (HBZ)
– In 2018, the Executive Board of UZH approved a pilot project to test possibilities for implementing a national repository for publishing and storing data
– No solo effort, but a national cooperation with FORS, UNIL and SWITCH
– Discipline-specific approach, starting with the requirements of linguistic research including
Resources and Technology)
– UZH players:
  – Data Services, Main Library UZH (general requirements, metadata, links linguists to SWISSUbase)
  – S3IT (metadata, application development)
[Diagram: Research Data Life Cycle, linking the research infrastructure (LiRI) and the data publication infrastructure (SWISSUbase)]
– Data creation: researchers create data (text, video, audio, EEG); researchers create metadata
– Researchers annotate, analyse, segment, structure data
– Data access/reuse
– Persistent identifier, license agreements, data dissemination, metadata dissemination (CLARIN)
– upload data (metadata, corpus data) via web UI
– metadata harvested by CLARIN VLO (CMDI via OAI-PMH)
– interoperable metadata scheme for linguistic research data (META-SHARE)
Prospect
– ingest snapshots/versions of corpora to SWISSUbase (automated)
– ingest pre-processed metadata schemes from LiRI to SWISSUbase (JSON, XML)
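To illustrate what handing over pre-processed metadata as JSON or XML could involve, the following sketch serialises one metadata record in both formats using only the Python standard library. The record, its field names and the `resource` root element are invented for this example; they do not follow the CMDI or META-SHARE schemas and say nothing about the actual SWISSUbase ingest interface.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical corpus metadata record; field names are illustrative only.
record = {
    "title": "Example fieldwork corpus",
    "language": "gsw",
    "version": "1.0",
    "licence": "CC BY 4.0",
}

def to_json(rec):
    """Serialise the record as pretty-printed JSON with a stable key order."""
    return json.dumps(rec, ensure_ascii=False, sort_keys=True, indent=2)

def to_xml(rec, root_tag="resource"):
    """Serialise the record as flat XML: one child element per field."""
    root = ET.Element(root_tag)
    for key in sorted(rec):
        ET.SubElement(root, key).text = str(rec[key])
    return ET.tostring(root, encoding="unicode")

print(to_json(record))
print(to_xml(record))
```

Generating both formats from one source record keeps the two representations consistent when a new snapshot or version of a corpus is ingested.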
also consult the head of the strategic research infrastructures unit at UZH, Thomas Trüb
regulate use and minimize financial risk
local budget risks, greater sustainability
additional staff / percentages or the respective devices at the respective university/department/chair, even more so as many services and devices are only needed for short periods: e.g. 40 CHF/week for a portable eye-tracking system (purchase price: 30’000 CHF); 37 CHF/hour for LiRI experts (post-docs). These are estimates; we are still working on the price list.
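The estimates above allow a quick comparison between paying user fees and buying a device. The sketch below uses only the three figures from the slide; the six-week study with ten expert hours is a hypothetical example, and running costs of an owned device (maintenance, staff, software) are deliberately left out.

```python
PURCHASE_PRICE_CHF = 30_000   # portable eye-tracking system (estimate from the slide)
RENTAL_CHF_PER_WEEK = 40      # estimated user fee for the device
EXPERT_CHF_PER_HOUR = 37      # estimated rate for LiRI experts (post-docs)

def rental_cost(weeks, expert_hours=0):
    """Total user fee for borrowing the device and booking expert time."""
    return weeks * RENTAL_CHF_PER_WEEK + expert_hours * EXPERT_CHF_PER_HOUR

def break_even_weeks():
    """Weeks of device rental that would add up to the purchase price."""
    return PURCHASE_PRICE_CHF / RENTAL_CHF_PER_WEEK

# Hypothetical project: six weeks of eye tracking plus ten hours of expert support.
print(rental_cost(weeks=6, expert_hours=10))  # 6 * 40 + 10 * 37 = 610 CHF
print(break_even_weeks())                     # 30000 / 40 = 750.0 weeks
```

At these rates the device fee alone would reach the purchase price only after 750 weeks of use, which is the point the slide makes about devices needed for short periods.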
proposals; LiRI will provide support in writing that part of your application
funds, e.g. seed money programs, personal financial equipment etc.
Initiative or else) to help finance smaller research projects