Enhancing language resources with maps

SLIDE 1

Enhancing language resources with maps

Janne Bondi Johannessen, Kristin Hagen, Anders Nøklestad, Joel Priestley

The Text Laboratory, University of Oslo

LREC, Malta, May 19-21, 2010

SLIDE 2

Partners

SLIDE 3

The ScanDiaSyn project

Two goals:

  • Investigate

– systematically map and study the syntactic variation across the Scandinavian dialect continuum

  • Document

– create a database: Nordic Syntactic Judgements Database
– create a corpus: Nordic Dialect Corpus

  • Transcribed and tagged speech material linked with audio and video.

  • Web-based, with a user-friendly interface.
SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7

Interview | Conversation | Questionnaire | Translation

  • One informant interviewed by the research assistant
SLIDE 8

Interview | Conversation | Questionnaire | Translation

  • Two informants from the same measure point speak freely

SLIDE 9

Questionnaire

SLIDE 10

The Nordic Dialect Corpus in numbers, 10 May 2010

                Informants   Places       Words
Denmark                 75       14     229 909
Faroe Islands           19        5      48 427
Iceland                  4        1      10 287
Norway                 301       94   1 200 120
Sweden                 126       40     299 866
Total                  525      154   1 788 609
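
The figures on this slide can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the slide, while the dictionary layout and the helper names `column_totals` and `words_per_informant` are assumptions made for illustration.

```python
# Figures copied from the slide: (informants, places, words) per country.
corpus = {
    "Denmark": (75, 14, 229_909),
    "Faroe Islands": (19, 5, 48_427),
    "Iceland": (4, 1, 10_287),
    "Norway": (301, 94, 1_200_120),
    "Sweden": (126, 40, 299_866),
}


def column_totals(rows):
    """Sum each column (informants, places, words) across all countries."""
    return tuple(sum(col) for col in zip(*rows.values()))


def words_per_informant(rows):
    """Average number of words per informant, rounded, by country."""
    return {country: round(w / i) for country, (i, _places, w) in rows.items()}


totals = column_totals(corpus)
averages = words_per_informant(corpus)
```

The column sums agree with the "Total" row on the slide, and the per-informant averages give a rough sense of how much material each country contributes per speaker.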

SLIDE 11
SLIDE 12
SLIDE 13
SLIDE 14

Search for negation adverbs

SLIDE 15

Results, with phonetic and orthographic script plus Google translation

SLIDE 16

ikkje

SLIDE 17

ikke

SLIDE 18

innte/nte

SLIDE 19
  • More information in map
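
A rough sketch of how search hits for the negation forms could be turned into one map marker per place. The hit records below are invented placeholders ("Place A" and so on), not corpus data, and the function names are hypothetical; the actual corpus interface is not shown in the slides.

```python
from collections import Counter, defaultdict

# Hypothetical search hits: (place, negation form in the transcription).
# Placeholder values for illustration only, not real corpus results.
hits = [
    ("Place A", "ikkje"),
    ("Place A", "ikkje"),
    ("Place A", "ikke"),
    ("Place B", "ikke"),
    ("Place C", "innte"),
    ("Place C", "innte"),
    ("Place C", "nte"),
]


def forms_by_place(hits):
    """Count how often each negation form occurs at each place."""
    counts = defaultdict(Counter)
    for place, form in hits:
        counts[place][form] += 1
    return counts


def dominant_form(counts):
    """Pick the most frequent form per place, e.g. to choose a marker colour."""
    return {place: c.most_common(1)[0][0] for place, c in counts.items()}


markers = dominant_form(forms_by_place(hits))
```

Plotting one marker per place, coloured by its dominant form, is one simple way to get the kind of distribution map the slides show for ikkje, ikke and innte/nte.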

SLIDE 20

Search for non-standard word order (V3)

  • Standard word order: V2

    Hvor bor du?
    Where live you?
    ‘Where do you live?’

  • Dialect word order: V3

    Hvor du bor?
    Where you live?
    ‘Where do you live?’
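
The difference between the two orders can be expressed as a pattern over part-of-speech tags. The sketch below assumes a simplified, hypothetical tag set (`WH` for the question word, `PRON` for the subject pronoun, `VFIN` for the finite verb); the tags and search syntax actually used in the corpus may differ.

```python
# Hypothetical tag set: WH = question word, PRON = subject pronoun,
# VFIN = finite verb. Not the corpus's real tags.
def word_order(tagged):
    """Classify a wh-question as 'V2', 'V3' or 'other' from its first three tags."""
    tags = [tag for _word, tag in tagged[:3]]
    if tags[:2] == ["WH", "VFIN"]:
        return "V2"  # finite verb in second position
    if tags == ["WH", "PRON", "VFIN"]:
        return "V3"  # subject intervenes, verb in third position
    return "other"


standard = [("Hvor", "WH"), ("bor", "VFIN"), ("du", "PRON")]
dialect = [("Hvor", "WH"), ("du", "PRON"), ("bor", "VFIN")]

labels = (word_order(standard), word_order(dialect))
```

Here `word_order(standard)` yields `"V2"` and `word_order(dialect)` yields `"V3"`, matching the two example sentences.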

SLIDE 21

How to search

SLIDE 22

Results

SLIDE 23

V3 dialect word order is spread across all of Norway

SLIDE 24

Database

  • Web-based queries

– Query specific grammatical features by category
– Query specific grammatical features by form
– Gender queries
– Age queries
– Diachronic queries

  • Interactive maps

– Grammatical isoglosses
– The dialects of particular areas or places
– Specific grammatical features
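
To illustrate how gender and age queries might feed an interactive map, here is a minimal sketch. The `Judgement` record, its fields, and the function `acceptance_by_place` are all hypothetical, and the sample records are placeholders; the real database schema is not described in the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One informant's judgement of a grammatical feature (hypothetical schema)."""
    place: str
    gender: str  # "f" or "m"
    age: int
    feature: str
    accepted: bool


def acceptance_by_place(judgements, feature, gender=None, max_age=None):
    """Share of informants per place who accept the feature, after filtering."""
    tally = defaultdict(lambda: [0, 0])  # place -> [accepted, total]
    for j in judgements:
        if j.feature != feature:
            continue
        if gender is not None and j.gender != gender:
            continue
        if max_age is not None and j.age > max_age:
            continue
        tally[j.place][0] += int(j.accepted)
        tally[j.place][1] += 1
    return {place: acc / total for place, (acc, total) in tally.items()}


# Placeholder records for illustration only.
records = [
    Judgement("Place A", "f", 25, "V3", True),
    Judgement("Place A", "m", 70, "V3", False),
    Judgement("Place B", "f", 30, "V3", False),
    Judgement("Place B", "f", 68, "V3", True),
]

everyone = acceptance_by_place(records, "V3")
women = acceptance_by_place(records, "V3", gender="f")
young = acceptance_by_place(records, "V3", max_age=40)
```

Each returned share could then be mapped to a marker colour at the corresponding place, so that a line separating high-acceptance from low-acceptance places approximates an isogloss.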

SLIDE 25
SLIDE 26
SLIDE 27
  • Testing V3 order
SLIDE 28
SLIDE 29

Information on informants

SLIDE 30

Information on informants

SLIDE 31

Conclusion

  • Maps are indispensable for showing geographical variation

  • Maps are valuable not just for structured databases, but also for corpora

  • Generally: any kind of tool that can shed light on the data is good. Case in point: Google Maps and Google Translate.

SLIDE 32

The action menu

SLIDE 33

Count

SLIDE 34

Deleting or selecting individual results

SLIDE 35

Annotating results

SLIDE 36

Downloading files, different formats

SLIDE 37

Future research possibilities

  • The Scandinavian Dialect Corpus and Database
  • Opens up possible research across the whole spectrum of Scandinavian dialects: syntax, morphology, phonology, sociolinguistics, lexicography, discourse analysis