Linguistic Research Infrastructure: Information event, October 11, 2019



slide-1
SLIDE 1

Linguistic Research Infrastructure - LiRI

Linguistic Research Infrastructure

Information event October 11, 2019 LiRI team members


slide-2
SLIDE 2

Linguistic Research Infrastructure - LiRI

Introduction Elisabeth Stark (project leader, member of LiRI board)

slide-3
SLIDE 3

Linguistic Research Infrastructure - LiRI

Overall idea: The LiRI Architecture


slide-4
SLIDE 4

Linguistic Research Infrastructure - LiRI


We are building a new laboratory (a collection of devices and facilities) for linguistics and the language and speech sciences, plus data storage, data processing, and data science support provided by a group of experts (“LiRI staff”). We aim at

  • enabling scientific cooperation by providing access to state-of-the-art research infrastructures
  • providing access to shared research resources
  • providing ample data science support
  • participating in teaching and advising graduate students (MA, PhD level)
  • providing compatibility with the major European and international research infrastructure standards (e.g. CLARIN, FAIR principles)
  • enabling access to both national and international funding sources for research projects that require an excellent digital research infrastructure

Our vision: LiRI as a starting point for mid- and large-scale collaborative national and international third-party-funded research projects.

LiRI mission and strategy

slide-5
SLIDE 5

Linguistic Research Infrastructure - LiRI

Context: What happened until now

  • March 2017: Invitation to submit short proposals (> 5 million CHF) for the “Swiss Roadmap for Research Infrastructures 2021-2024”; co-application by two overarching linguistic units (ZüKL and URPP “Language and Space”)
  • January 2018: Successful evaluation, invitation to submit a long proposal
  • July 2018: “A” evaluation by SNSF (three external experts)
  • October 2018: Decision of the UZH board to establish LiRI with local funding (= continuous internal applications for each funding year, plus SNSF applications by the LiRI team for larger devices via R'Equip)
  • 17 April 2019: Integration in the Swiss Roadmap
  • First large-scale research infrastructure in linguistics in Switzerland

slide-6
SLIDE 6

Linguistic Research Infrastructure - LiRI

Facilities and devices at LiRI

slide-7
SLIDE 7

Linguistic Research Infrastructure - LiRI

LiRI staff

  • (Administration/coordination)
  • System administrator
  • Technician (for lab/devices)
  • Data acquisition expert
  • Data processing expert / software development
  • Data scientists (from 2021 onwards)

Plus the ‘LIS team’ until July 2020: computational linguistics (CL) specialists and a software developer who set up the “Linguistic Information System” (LIS) in collaboration with the local IT unit S3IT

slide-8
SLIDE 8

Linguistic Research Infrastructure - LiRI

Examples of possible projects in LiRI and users of LiRI

  • Interaction in larger groups: eye-tracking devices, software, computing power (3D model)
  • The role of input and intake in language acquisition: video cameras, eye-tracking devices, LENA devices, especially for fieldwork
  • Language and communication skills of elderly people: EEG systems (stationary and mobile), ABR, NIRS
  • Speaker recognition and understanding of speech articulation: sound-proof cabins, articulograph, measurement devices

User groups:

  • researchers at universities (local, national, international);
  • specialized research institutions;
  • partners beyond academia, including industry (Forensisches Institut, speech pathologists, etc.).

slide-9
SLIDE 9

Linguistic Research Infrastructure - LiRI

What has been done so far

  • Constitution of the LiRI Board (Sabine Stoll, Martin Volk, Elisabeth Stark) plus LiRI Team (Volker Dellwo, Martin Meyer, Wolfgang Kesselheim; coordination: Agnes Kolmer)
  • Today: Constitution of the LiRI SAB: Prof Shanley E.M. Allen, TU Kaiserslautern; Prof Lars Borin, University of Gothenburg; Prof Anne-Lise Giraud, Université de Genève; Prof Stuart Rosen, University College London (UCL); Prof Lukas Rosenthaler, University of Basel
  • Launch of a dedicated website, some interviews
  • Submission of proposals (SNSF, local funding schemes) to finance the LiRI devices
  • Hiring of one data scientist, a specialist in data acquisition (in the field): Dagmar Jung
  • Constitution of a team to set up the Linguistic Information System in collaboration with S3IT (head: Marcel Riedi): Taras Zakharko, Gerold Schneider, Stefan Vrankovic
  • Collaboration with the Data Services Team on the interface to SWISSUbase (Andrea Malits, Florian Steurer)
  • Ongoing work on rules of procedure / price list for fees
  • Advertisement of a second job position (focus on data management/processing, databases, software development) plus a technician, see website

slide-10
SLIDE 10

Linguistic Research Infrastructure - LiRI

Next steps

  • Finalize the rules of procedure / price list for fees and implement a charter of access → recognition of LiRI as a local UZH technology platform
  • Hire staff (data processing specialist/software developer and technician)
  • Acquire and install devices and facilities
  • Set up the laboratory
  • Start of LiRI on July 1, 2020.

slide-11
SLIDE 11

Linguistic Research Infrastructure - LiRI


Data Acquisition Units @ LiRI and LiRI research environment

Presenters: Sabine Stoll, Dagmar Jung, Wolfgang Kesselheim,

Martin Meyer, Volker Dellwo

slide-12
SLIDE 12

Linguistic Research Infrastructure - LiRI

Interest groups

  • Neurolinguistics
  • Language Development
  • Phonetics
  • Interaction Studies
  • Digital Linguistics

Research areas: Language Development, Language Processing, Language Production, Language Interaction

slide-13
SLIDE 13

Linguistic Research Infrastructure - LiRI

Why LiRI? Evolving new methods in all disciplines of Language Sciences:

  • Equipment and support
  • Data collection and preparation
  • Enabling data reproducibility

slide-14
SLIDE 14

Linguistic Research Infrastructure - LiRI

LiRI data acquisition unit

What we do

  • Support for linguistic research
  • Evidence-based linguistics and big data
  • Help with laboratory and field set-up scenarios
  • Help with acquiring and processing textual data
  • How to design a corpus: experience based on best practices
  • Ensure sustainability of data

Provide linguistic data acquisition devices and a matching research environment

  • LiRI-LAB infrastructure and portable devices
  • Localized software for data processing and analysis

slide-15
SLIDE 15

Linguistic Research Infrastructure - LiRI

Language Data Scientists and LAB technician

  • Which equipment for the research question?
  • How are the devices employed successfully?
  • Which workflow?
  • How to go about data management, data files and metadata?
  • What about ethics and informed consent?
  • Where should the data be archived, in which format, and what about access rights?

slide-16
SLIDE 16

Linguistic Research Infrastructure - LiRI

Primary data: Workflow management

  • Field: audiovisual data
  • Written: textual data
  • Lab: experimental data

Data annotation -> Corpus compilation

slide-17
SLIDE 17

Linguistic Research Infrastructure - LiRI

How to design a corpus – experience based on Best Practices: Workflow management

Stages: Recording -> Metadata -> Annotation -> Analysis -> Publication (versioning) -> Archiving

[Figure: example workflow distributed over three sites (CRDN, UZH, FNUNIV), with tasks scheduled weekly, monthly, yearly, or on demand. Task labels shown: anonymize; export to R; archive release; integrate media; media to UZH; process media; create backup; media to CRDN; media to FNUNIV; record video; store media; enter metadata; report recordings; collect metadata; integrate ELAN; check ELAN; update database; process metadata; assign TA; translation; transcription; send ELAN corrections; assign glosser; glossing; answer questions; send questions; integrate TBX corrections; check TBX; update/send TBX]
slide-18
SLIDE 18

Linguistic Research Infrastructure - LiRI

Primary data: written/textual

  • Collecting digital text data (e.g. by web crawling)
  • Scanning and OCR of printed texts or handwritten manuscripts
  • Crowdsourcing and Citizen Science
  • Optical Character Recognition (OCR):
      • which equipment and settings to use
      • which software to use for complex page layouts
      • which export formats to choose, depending on the further needs for analysis or editing

slide-19
SLIDE 19

Linguistic Research Infrastructure - LiRI

Available equipment in the field and in the lab

  • LENA speech analysis devices
  • High-resolution video cameras (4K)
  • Eye-tracking (lab and portable)
  • Time-of-Flight cameras
  • High-end acoustic recording equipment
  • Electroencephalogram (EEG)
  • Electromagnetic Articulograph (EMA)
  • Functional Near Infrared Spectroscopy (fNIRS)

slide-20
SLIDE 20

Linguistic Research Infrastructure - LiRI

Language Development – Acquisition devices

How much input?

LENA technology is a standard tool for measuring how much talk children are exposed to. LENA combines a small wearable audio recorder with a speech recognition algorithm that automatically segments and analyzes the audio data in different time frames.

slide-21
SLIDE 21

Linguistic Research Infrastructure - LiRI

LENA


slide-22
SLIDE 22

Linguistic Research Infrastructure - LiRI

LENA in the field


slide-23
SLIDE 23

Linguistic Research Infrastructure - LiRI

LENA: automatic analysis of audio environment


slide-24
SLIDE 24

Linguistic Research Infrastructure - LiRI

Documenting verbal and non-verbal interaction

  • 8 x 4K video cameras
  • 8 x 60 Hz eye-tracking (ET) glasses
  • 4 x Time-of-Flight (ToF) cameras

slide-25
SLIDE 25

Linguistic Research Infrastructure - LiRI

Researching interaction: Audio

[Figure: annotated audio transcript. Annotation labels: code switch to the standard variety (enacting the authority of the institution); hypothesis about expected events; sensory perception; assessment of perception]

slide-26
SLIDE 26

Linguistic Research Infrastructure - LiRI

Researching interaction: + Video


slide-27
SLIDE 27

Linguistic Research Infrastructure - LiRI

Researching interaction: + Eye tracking

Source: Kesselheim & Hottiger, Besucherexperimente (visitor experiments)

slide-28
SLIDE 28

Linguistic Research Infrastructure - LiRI

Researching interaction: + 3-D sensors

slide-29
SLIDE 29

Linguistic Research Infrastructure - LiRI

Brain Imaging Techniques

– Hemodynamic approaches (e.g.) are less ideal as they have poor temporal resolution
– Neurophysiological approaches are better able to investigate language functions in the range of milliseconds
– Innovative source estimation allows an estimation of the neural sources of cognitive and sensory processes

slide-30
SLIDE 30

Linguistic Research Infrastructure - LiRI

Numerous applications and research questions

slide-31
SLIDE 31

Linguistic Research Infrastructure - LiRI

Teaching Old Dogs New Tricks

slide-32
SLIDE 32

Linguistic Research Infrastructure - LiRI

Brain oscillations Acoustic modulation

slide-33
SLIDE 33

Linguistic Research Infrastructure - LiRI

Auditory Brainstem Response System

slide-34
SLIDE 34

Linguistic Research Infrastructure - LiRI

Mobile EEG

slide-35
SLIDE 35

Linguistic Research Infrastructure - LiRI

Functional Near Infrared Spectroscopy

fNIRS in combination provides an ideal ratio of spatial and temporal resolution

Homae et al. (2006), Neuroscience; Kovelman et al. (2012), NeuroImage

slide-36
SLIDE 36

Linguistic Research Infrastructure - LiRI

Understanding ‘VOICE’

Outer appearance | Audio-visual voice signal | Inner appearance

Understand voice production, acoustics and perception. Current topics include, for example:

  • voice processing in the brain
  • relationships between face and voice (e.g. predicting a face from a voice)
  • individuality in voice and voice recognition
  • encoding of emotion/attitude/attractiveness in voice

Essential, for example, for:

  • models of human communication with voice (relation to animal communication)
  • voice synthesis, voice recognition
  • civil and forensic applications of voice analysis

slide-37
SLIDE 37

Linguistic Research Infrastructure - LiRI

Electromagnetic Articulography (EMA)

Measurements of articulatory movements

slide-38
SLIDE 38

Linguistic Research Infrastructure - LiRI

Electroglottography (EGG)/Laryngography with high-speed endoscopy

slide-39
SLIDE 39

Linguistic Research Infrastructure - LiRI

Equipment of the LiRI Lab

Screening and behavioural testing equipment:

  • Two test cubicles for hearing and cognitive screening.

Audio-visual equipment:

  • High-end acoustic recording equipment
  • High-resolution video camera
  • Audio-visual channel separation (speakers can see but not hear each other, or vice versa)

Articulatory measurements:

  • Electromagnetic Articulography (EMA)
  • Electroglottography (EGG) connected with laryngoscopy using a high-frame-rate (4K) camera
slide-40
SLIDE 40

Linguistic Research Infrastructure - LiRI

Staffing: technician

The LiRI Speech & Brain Lab will be organised by a technician (m/f; ~80% FTE) with the following responsibilities:

  • Advise scientists about methodological choices for the different equipment
  • Assist scientists with more complex equipment (e.g. EMA or laryngoscopy)
  • Set up the equipment for the different experimental sessions
  • Take care of the functionality of the equipment (carry out repairs, replace faulty equipment)
  • Organize the timetable of the lab and guarantee smooth and user-friendly bookings

The job description is ready; the position will be advertised at the next occasion.

slide-41
SLIDE 41

Linguistic Research Infrastructure - LiRI

LiRI Information System

With contributions by: Taras Zakharko, Gerold Schneider, Martin Volk, Tanja Samardžić, Stefan Vranković

slide-42
SLIDE 42

Linguistic Research Infrastructure - LiRI

LiRI Information System Overview

– The LiRI Information System (LIS) assists clients (linguistic researchers) with defining, setting up and managing the infrastructure and workflows necessary to do successful research:
  – Consulting
  – Managing data storage with backup
  – Workflow management
  – Setting up and managing software tools and services
  – Assistance with practical tasks (see the following example scenarios)
– In addition, LIS experts proactively seek out opportunities to improve the state of the art of technical linguistic infrastructure by engaging in synergetic projects: see our workflow project

slide-43
SLIDE 43

Linguistic Research Infrastructure - LiRI

Current situation: few hand-crafted isolated installations

[Figure: isolated installations, each with its own tools and storage. Labels: NLP CL, ZüKL, NLP DS, NLP ES; R, Praat, my tools; SSD @s3it, several further SSDs, local HD]

slide-44
SLIDE 44

Linguistic Research Infrastructure - LiRI

Vision: LiRI Information System

[Figure: shared data storage on the S3IT Science Cloud, serving NLP pipes, NoSketch, R, CQP, and Praat]

slide-45
SLIDE 45

Linguistic Research Infrastructure - LiRI

Support scenario I: NLP pipelines

– A linguistics professor has collected 100,000 tweets and newspaper articles in her SNSF project. She wants to do linguistic processing and is looking for help.
– LiRI offers, for German, English, French, Italian, Spanish, and later other languages:
  – Tokenisation into words and sentences
  – Recognition of names (person, organisation, location)
  – Part-of-speech tagging
  – Access to a web-based KWIC query interface, with CQP or NoSketch
  – Syntactic parsing
  – Treebank searches
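
To make two of the steps in this scenario concrete, here is a minimal sketch of tokenisation and a keyword-in-context (KWIC) lookup in plain Python. It is an illustration only: the regular expressions and the function names (`split_sentences`, `tokenize`, `kwic`) are assumptions made for this example and do not represent the LiRI pipeline, CQP, or NoSketch, which handle abbreviations, multi-word units, and large indexed corpora.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive word tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def kwic(tokens, keyword, window=3):
    """Return (left context, match, right context) for each case-insensitive hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical example text, not taken from any LiRI corpus.
text = "The corpus is large. We query the corpus with CQP!"
sentences = split_sentences(text)
tokens = [tok for s in sentences for tok in tokenize(s)]

for left, match, right in kwic(tokens, "corpus", window=2):
    print(f"{left:>12} [{match}] {right}")
```

A production KWIC interface works on a pre-built index rather than a linear scan, which is what makes queries over millions of tokens feasible.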

slide-46
SLIDE 46

Linguistic Research Infrastructure - LiRI

Support scenario II: statistical collaboration, data analysis

– A group of post-docs is investigating statistical models. Some know a bit of R, others do not. They would like to use a collaborative platform.
– LiRI offers to linguistic researchers:
  – Pre-processing
  – Advice on statistics
  – A pre-configured R Shiny environment with our recommended libraries
    – interactively explore statistics and visualisations
    – going beyond the power of individual laptops
  – Seamless access to databases for data that is larger than memory
  – Sessions and snapshots are always available

slide-47
SLIDE 47

Linguistic Research Infrastructure - LiRI

Support scenario III: corpus data management

– As the result of a project, a large corpus and extensive statistical data have been collected. The data is too large for the servers of the institute, and it should still be accessible 10 years from now.
– LiRI offers:
  – Storage and backup, in collaboration with S3IT, UZH’s central IT services and SWISSUbase
  – Advice on which solution fits best
  – Help on
    – which data should be downloadable from the web
    – licensing issues
    – which data should be stored to ensure reproducibility of published results

slide-48
SLIDE 48

Linguistic Research Infrastructure - LiRI

Support scenario IV: audio and video transcription

– A sociolinguist has conducted interviews and needs to transcribe them. He is unsure which level of transcription to use (phonetic, textual) and which tools can help with the task.
– LiRI offers:
  – Advice and practical help with transcription tools like Praat
  – Expertise on using automatic transcription tools
  – Quality control, storing and archiving of data

slide-49
SLIDE 49

Linguistic Research Infrastructure - LiRI

Support scenario V: eye-tracking, EEG

– A psycholinguist would like to conduct eye-tracking and EEG experiments to test her NLP models and research hypotheses.
– LiRI offers:
  – Use of eye-trackers and EEG equipment
  – Collection and management of the voluminous data
  – Advice on statistical and NLP methods

slide-50
SLIDE 50

Linguistic Research Infrastructure - LiRI

Support scenario VI: OCR and diachronic linguistics

– A historical linguist is given a large collection of scanned historical records and books. He would like to convert them to text and study both linguistic change and society in earlier periods.
– LiRI offers:
  – Advanced OCR with Transkribus
  – Mapping of historical spelling variants to present-day varieties, for German and English, later other languages
  – Interfaces for further manual annotation and correction
  – Help in statistically exploring the corpus, with tools from Digital Humanities
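
The mapping of historical spelling variants to present-day forms can be sketched as a lookup with a fuzzy fallback. The following is a toy illustration under stated assumptions: the lexicon, the variant table, and the `normalize` function are hypothetical, and string similarity via Python's standard `difflib` stands in for whatever method LiRI actually uses (the slides do not say).

```python
import difflib

# Hypothetical present-day lexicon and hand-curated variant table (assumptions).
MODERN_LEXICON = ["love", "have", "joy", "music"]
KNOWN_VARIANTS = {"vpon": "upon"}


def normalize(word, lexicon=MODERN_LEXICON, cutoff=0.6):
    """Map a historical spelling to a present-day form.

    Order of preference: curated variant table, exact lexicon match,
    closest lexicon entry above the similarity cutoff. If nothing
    matches, the word is returned unchanged for manual correction.
    """
    lower = word.lower()
    if lower in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[lower]
    if lower in lexicon:
        return lower
    matches = difflib.get_close_matches(lower, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for historical in ["loue", "musick", "vpon", "xyz"]:
    print(historical, "->", normalize(historical))
```

Leaving unmatched words unchanged keeps the automatic step conservative, so that uncertain cases end up in the manual annotation and correction interface rather than being silently altered.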

slide-51
SLIDE 51

Linguistic Research Infrastructure - LiRI

Workflow Project: supporting collaborative annotation workflows

– A large amount of fieldwork material has to be processed into a corpus by a team of linguists
– Open problems:
  – Work coordination
  – Issue tracking
  – Quality control
  – Reporting and archiving
– Proposed solution: an IT-assisted workflow

slide-52
SLIDE 52

Linguistic Research Infrastructure - LiRI

Workflow Project: supporting collaborative annotation workflows

[Figure: workflow architecture. Components: versioned database (annotated data), audiovisual data, task tracker, users, managers, archive. Processes: validation, reporting, automatic continuous integration, processing]

slide-53
SLIDE 53

Linguistic Research Infrastructure - LiRI

Workflow Project: supporting collaborative annotation workflows

– The workflow combines existing tools and services (S3IT)
  – Version control system (git)
  – Issue tracker for managing tasks
  – Contribution management system for controlled contribution
  – Storage system for audiovisual data
– …and custom business logic and reporting components (developed and maintained by LiRI)
  – Validation (e.g. check if metadata are present and correct, check that the corpus is well formed)
  – Identification (assign every new version of a resource a unique descriptor and associate it with the version control)
  – Archiving (prepare/export the data to be archived)
  – Reporting (describe the state of the corpora for a given point in time)
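
The validation and identification components can be illustrated with a short sketch. Everything here is an assumption made for the example: the required metadata fields, the function names, and the choice of a truncated SHA-256 content hash as the "unique descriptor" are hypothetical and are not taken from the LiRI implementation.

```python
import hashlib
import json

# Hypothetical metadata schema; the real required fields are not given in the slides.
REQUIRED_FIELDS = ("speaker", "language", "recording_date")


def validate_metadata(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def version_id(record):
    """Derive a stable identifier from the record's content.

    Keys are sorted before hashing, so the same content always yields
    the same identifier regardless of key order.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


record = {"speaker": "S01", "language": "deu", "recording_date": "2019-10-11"}
print(validate_metadata(record))   # no problems for a complete record
print(version_id(record))          # 12-character hexadecimal identifier
```

In a continuous-integration setting, a check like `validate_metadata` would run on every contribution, and the identifier would be stored alongside the corresponding commit in the version control system.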

slide-54
SLIDE 54

Linguistic Research Infrastructure - LiRI

Support scenario VII: supervised machine learning

– A sociologist and a linguist have annotated data together. They realise that they will not be able to annotate as much data as they originally planned. They wonder if automatic approaches can help them.
– LiRI offers:
  – A large suite of machine learning tools
  – Reports on how accurately machine learning labels the data
  – Extraction of the most salient characteristics (“features”) that make the model parsimonious and lead to further insights
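
A report on how accurately a model labels the data typically compares its output with the human annotation on held-out items. The sketch below computes overall accuracy plus per-label precision and recall in plain Python; the function name `evaluation_report` and the example labels are hypothetical, and the slides do not specify which metrics LiRI reports.

```python
from collections import Counter


def evaluation_report(gold, predicted):
    """Compare model labels with human (gold) labels.

    Returns overall accuracy and, for each label, precision (how many of
    the model's uses of the label were right) and recall (how many of the
    human uses of the label the model found).
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")

    pairs = list(zip(gold, predicted))
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    hit_counts = Counter(g for g, p in pairs if g == p)

    report = {"accuracy": sum(hit_counts.values()) / len(gold), "per_label": {}}
    for label in sorted(set(gold) | set(predicted)):
        precision = hit_counts[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = hit_counts[label] / gold_counts[label] if gold_counts[label] else 0.0
        report["per_label"][label] = {"precision": precision, "recall": recall}
    return report


# Hypothetical annotations for four utterances.
gold = ["formal", "informal", "formal", "formal"]
predicted = ["formal", "informal", "informal", "formal"]
print(evaluation_report(gold, predicted))
```

Per-label figures matter because overall accuracy can look high even when a rare category is labelled poorly.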

slide-55
SLIDE 55

Hauptbibliothek

SWISSUbase and LiRI

Differentiation and Relationship

LiRI Information Event

11 October 2019, Florian Steurer, Data Services, Main Library (HBZ)

slide-56
SLIDE 56

Hauptbibliothek

General Conditions and Goals of SWISSUbase

  • In 2018, the Executive Board of UZH approved a pilot project to test possibilities for implementing a national repository for publishing and storing data
  • No solo effort, but national cooperation with FORS, UNIL and SWITCH
  • Discipline-specific approach, starting with the requirements of linguistic research, including
      a. coordination with LiRI at UZH
      b. interoperability with CLARIN (European Research Infrastructure for Language Resources and Technology)
  • UZH players:
      • Data Services, Main Library UZH (general requirements, metadata, links linguists to SWISSUbase)
      • S3IT (metadata, application development)

slide-57
SLIDE 57

Hauptbibliothek

Differentiation of LiRI and SWISSUbase

Stages of the research data life cycle, divided between the research infrastructure (LiRI) and the data publication infrastructure (SWISSUbase):

  • Data creation: researchers create data (text, video, audio, EEG); researchers create metadata
  • Data processing and analysis: researchers annotate, analyse, segment, and structure data
  • Data preservation: storing minor/major versions, possibly snapshots
  • Data access/reuse: providing a search catalogue, persistent identifiers, license agreements, data dissemination, metadata dissemination (CLARIN)

slide-58
SLIDE 58

Hauptbibliothek

Relationship

  • Upload data (metadata, corpus data) via web UI
  • Metadata harvested by CLARIN VLO (CMDI via OAI-PMH)
  • Interoperable metadata scheme for linguistic research data (META-SHARE)

Prospect

  • Ingest snapshots/versions of corpora to SWISSUbase (automated)
  • Ingest pre-processed metadata schemes from LiRI to SWISSUbase (JSON, XML)
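
Metadata harvesting via OAI-PMH works through plain HTTP requests with a `verb` argument. As a rough sketch, the following builds a `ListRecords` request URL. The base URL is a placeholder and not a SWISSUbase address, and the metadata prefix `cmdi` is an assumption: each repository announces its supported prefixes in response to the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="cmdi", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb', 'metadataPrefix' and the optional 'from' date are arguments
    defined by the OAI-PMH protocol; the default prefix is an assumption.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting: records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint for illustration only.
BASE_URL = "https://repository.example.org/oai"
print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, from_date="2019-10-11"))
```

A harvester such as the CLARIN VLO would fetch this URL, parse the XML response, and follow the resumption token the protocol provides for paging through large result sets.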

slide-59
SLIDE 59

Linguistic Research Infrastructure - LiRI

The LiRI ‘Business Case’ Elisabeth Stark (project leader, member of LiRI board)

slide-60
SLIDE 60

Linguistic Research Infrastructure - LiRI

Why isn’t all that for free? Why should I be interested in paying for LiRI services?

  • The equipment does not belong to a single institute or chair; LiRI is a central service unit / technology platform (please also consult Thomas Trüb, head of the strategic research infrastructures unit at UZH)
  • LiRI is not obliged to fully refinance the investments, but a system of fees is necessary to regulate use and minimize financial risk
  • Note that partial financial autonomy of LiRI is also an advantage for us: less dependency on local budget risks, greater sustainability
  • Fees are considerably cheaper in time, effort and money than individual investments in additional staff (or staff percentages) or in the respective devices at the respective university/department/chair, all the more so as many services and devices are only needed for short periods. Examples: 40 CHF/week for a portable eye-tracking system (purchase price: 30,000 CHF); 37 CHF/hour for LiRI experts (post-docs). These are estimates; we are still working on the price list.
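
The comparison between fees and an individual purchase can be worked through with the estimated figures quoted on this slide. The rates below come from the slide and are explicitly provisional; the function names and the example project (12 weeks, 20 expert hours) are hypothetical.

```python
# Estimated rates from the slide (provisional; the price list is not final).
WEEKLY_FEE_CHF = 40          # portable eye-tracking system, per week
PURCHASE_PRICE_CHF = 30_000  # purchase price of the same system
EXPERT_HOURLY_CHF = 37       # LiRI expert (post-doc), per hour


def break_even_weeks():
    """Weeks of rental after which fees would equal the purchase price."""
    return PURCHASE_PRICE_CHF / WEEKLY_FEE_CHF


def project_cost(weeks, expert_hours):
    """Total fee for renting the eye tracker and booking expert support."""
    return weeks * WEEKLY_FEE_CHF + expert_hours * EXPERT_HOURLY_CHF


print(break_even_weeks())    # 750.0 weeks, i.e. more than 14 years of continuous use
print(project_cost(12, 20))  # 12 * 40 + 20 * 37 = 1220 CHF
```

Even before counting maintenance and staff time, a device would have to be in continuous use for about 750 weeks before renting cost as much as buying it at these rates.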

slide-61
SLIDE 61

Linguistic Research Infrastructure - LiRI

Where do I get the money from?

  • LiRI fees (direct costs) are in accordance with the rates chargeable in SNSF project proposals; LiRI will provide support in writing that part of your application
  • All other funding sources (ERC etc.) also provide for infrastructure costs that you can apply for
  • Smaller studies, e.g. pilot studies to prepare a proposal, can be funded by smaller funds, e.g. seed money programs, personal research funds, etc.
  • Plan: Set up a LiRI Fund (supported by foundations or local initiatives such as the Digital Society Initiative) to help finance smaller research projects