SLIDE 1

Term and Collocation Extraction by means of complex Linguistic Web Services

Ulrich Heid, Fabienne Fritzinger, Erhard Hinrichs, Marie Hinrichs, Thomas Zastrow

Institut für maschinelle Sprachverarbeitung, Universität Stuttgart, and Seminar für Sprachwissenschaft, Universität Tübingen, Germany

Linguistic Resources and Evaluation Conference, 2010: Valletta, Malta

Heid et al. (Stuttgart/Tübingen) D-SPIN Extraction WebServices LREC 2010 1 / 16

SLIDE 9

Overview

  • Objectives and scenarios addressed
  • Data used for experimentation
  • Procedures to extract single word term candidates
  • Procedures to extract collocation candidates
  • Combining the tools for both extraction tasks
  • The extraction as a web service: architecture, technical issues addressed, open questions

  • Conclusion – Future Work

SLIDE 13

Objectives

  • Provision of computational linguistic tools for
      • Term candidate extraction
      • Collocation candidate extraction
      • Extraction of regionalism candidates
  • Tools based on standard corpus processing techniques:
    tagging, parsing, pattern-based extraction, lexicostatistics
  • Tools wrapped and provided as chains of web services:
      • to assess possibilities of creating complex linguistic web services
      • to test the processing of non-trivial amounts of data via web services

SLIDE 16

Scenarios addressed

  • Type I: single word term candidate extraction
      • to find specialized terms of a specific domain of knowledge
      • to find lexical material specific to a given region:
        German of Germany, Austria, Switzerland, South Tyrol
  • Type II: extraction of multiword expressions (MWEs)
      • to find collocations (cf. Weller & Heid, this session)
      • to find multiword terms and phraseology of specialized domains
      • to find collocations typical of a “region” (D, A, CH, ST)

SLIDE 20

Data used in the experiments

Work on German texts

  • General language: newspaper texts
      • Frankfurter Rundschau (1992/1993): 40 M
      • Frankfurter Allgemeine Zeitung (1995 - 1998): 78 M
      • Die Zeit (1999 - 2005): 50 M
      • Stuttgarter Zeitung (1992/1993): 36 M
      • Handelsblatt (1995 - 1998): 50 M
      • total newspapers: ca. 254 M
  • Specialized language (taken from the OPUS website):
      • European Medicines Agency (EMEA), pharmaceuticals tests: 10 M
  • National or regional variants of German:
      • Austria (excerpts from the DeReKo corpus of IDS Mannheim): 180 M
      • Switzerland (ditto: DeReKo): 180 M
      • South Tyrol (Eurac/Athesia publishers): ca. 60 M

SLIDE 29

Procedures for single word term candidate extraction

Based on relative frequency relationships: “weirdness scores” (Ahmad et al. 1992)

  • Intuition: terms from a domain are more frequent in domain-specific texts than elsewhere
  • Calculation: for each noun, verb, adjective from the specialized text:
      • RS: relative frequency in the specialized text:
        number of occurrences / corpus size (by POS) of the specialized text
      • RG: relative frequency of the same item in general language text
        (newspapers taken to be without bias for a given domain)
      • Relationship RS/RG
  • Output:
      1. items occurring only in the specialized text
      2. items more frequent in the specialized text than elsewhere
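
The RS/RG calculation can be illustrated with a minimal Python sketch. It assumes that each corpus is already reduced to a list of lemmas of one part of speech; the function name `weirdness_scores` and the toy word lists are invented for illustration and are not taken from the authors' tools.

```python
from collections import Counter


def weirdness_scores(specialized, general):
    """Return (only_specialized, scored) for lemmas of a single POS.

    specialized, general: lists of lemmas (hypothetical input format).
    """
    spec_counts = Counter(specialized)
    gen_counts = Counter(general)
    spec_size = len(specialized)  # corpus size (by POS), specialized text
    gen_size = len(general)       # corpus size (by POS), general text

    only_specialized = {}  # output 1: absolute frequency, no ratio possible
    scored = {}            # output 2: RS/RG for items with RS > RG
    for lemma, freq in spec_counts.items():
        if gen_counts[lemma] == 0:
            only_specialized[lemma] = freq
            continue
        rs = freq / spec_size
        rg = gen_counts[lemma] / gen_size
        if rs > rg:
            scored[lemma] = rs / rg
    return only_specialized, scored


# Toy data, invented for the example.
spec = ["Filmtablette", "Filmtablette", "Epoetin", "Jahr"]
gen = ["Jahr", "Jahr", "Jahr", "Filmtablette", "Haus", "Haus", "Stadt", "Land"]
only_spec, scored = weirdness_scores(spec, gen)
print(only_spec)  # items found only in the specialized list
print(scored)     # items relatively more frequent in the specialized list
```

Items absent from the general corpus are kept in a separate list with their absolute frequency, because RG would be zero and the ratio undefined; this mirrors the two outputs named on the slide.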

SLIDE 30

Procedures for single word term candidate extraction

Scenario type I: typical results – term candidates from EMEA

Only EMEA (not FR):

  term candidate          f (abs.)
  Durchstechflasche       5638
  Injektionsstelle        3489
  Pharmakokinetik         3426
  Hämoglobinwert          3395
  Fertigspritze           3271
  Ribavirin               3234
  Gebrauchsinformation    2801
  Dosisanpassung          2580
  Epoetin                 2302
  Hydrochlorothiazid      2128

EMEA and FR:

  term candidate          weirdness   f (abs.)
  Filmtablette            25522       6389
  Injektionslösung        19854       4970
  Packungsbeilage         14710       7365
  Niereninsuffizienz      14233       3563
  Verkehrstüchtigkeit     13558       3394
  Leberfunktion           8385        2099
  Hypoglykämie            8353        2091
  Toxizität               7957        1992
  Einnehmen               7035        7045
  Hypotonie               6823        1708

SLIDE 38

Procedures for collocation candidate extraction

Why not use a flat approach – dependency parsing as an alternative

  • English: pattern-based extraction + sorting by association measures (AMs) (Kilgarriff et al. 2004)
      • configurational: subject < verb < object
      • little morphological form variation
  • German: problems in transferring the Sketch Engine approach (Ivanova et al. 2008)
      • three models of word order ⇒ need three sets of patterns
      • constituent order in the topological Mittelfeld: rather free ⇒ need to permute the patterns
      • case syncretism of German: only 22 % of all German NPs in Negra are unambiguous (Evert 2004) ⇒ low precision of flat analysis
  • Alternative: dependency parsing

[Figure: processing pipeline. Corpus I → Parsing → Corpus I (parsed) → Collocation Extraction → Collocations → Calculation of associative strength → Significant collocations]
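
The final pipeline step, the calculation of associative strength, can be sketched as follows. This is only an illustration: it computes the Dice coefficient and the t-score from pair counts, whereas the deck does not say which measure the authors use by default. The function name `association_scores` and the toy pairs are assumptions made for the example.

```python
import math
from collections import Counter


def association_scores(pairs):
    """Return {(base, collocate): (dice, t_score)} for extracted pairs."""
    n = len(pairs)
    pair_freq = Counter(pairs)
    base_freq = Counter(base for base, _ in pairs)
    coll_freq = Counter(coll for _, coll in pairs)

    scores = {}
    for (base, coll), observed in pair_freq.items():
        # Expected co-occurrence count if base and collocate were independent.
        expected = base_freq[base] * coll_freq[coll] / n
        dice = 2 * observed / (base_freq[base] + coll_freq[coll])
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(base, coll)] = (dice, t_score)
    return scores


# Toy pairs, invented for the example (noun, verb).
pairs = (
    [("Einsprache", "erheben")] * 3
    + [("Frage", "erheben")]
    + [("Frage", "stellen")] * 4
)
scores = association_scores(pairs)
ranked = sorted(scores, key=lambda pair: scores[pair][0], reverse=True)
print(ranked)  # pairs ordered by decreasing Dice coefficient
```

Sorting by one of the scores gives the ranked candidate list; a threshold on the score would then select the "significant" collocations.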

SLIDE 39

Procedures for collocation candidate extraction

Sample dependency analysis

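Use of FSPar (Schiehlen 2003)

[Figure: dependency tree. lieferte (TOP) governs Studie (NP:nom) and Ergebnisse (NP:akk); Studie governs die (SPEC) and zweite (ADJ); Ergebnisse governs ähnliche (ADJ); the final "." is a separate TOP]

  Index  Token       POS    Lemma     Morphology      Head  Relation
  0      Die         ART    d         |               2     SPEC
  1      zweite      ADJA   2.        |               2     ADJ
  2      Studie      NN     Studie    Nom:F:Sg        3     NP:1
  3      lieferte    VVFIN  liefern   3:Sg:Past:Ind*  -1    TOP
  4      ähnliche    ADJA   ähnlich   |               5     ADJ
  5      Ergebnisse  NN     Ergebnis  Akk:N:Pl        3     NP:8
  6      .           $.     .         |               -1    TOP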
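
To show how such an analysis supports extraction, here is a minimal sketch that reads verb+object pairs off the sample sentence. The row layout, the function name `verb_object_pairs` and the rule "accusative noun whose head is a full verb" are assumptions made for this example; they do not reproduce FSPar's actual output format or the authors' extraction rules.

```python
# One row per token: (index, token, pos, lemma, morphology, head, relation).
# Indices start at 0 so that the head column points at the governing token.
ROWS = [
    (0, "Die", "ART", "d", "", 2, "SPEC"),
    (1, "zweite", "ADJA", "2.", "", 2, "ADJ"),
    (2, "Studie", "NN", "Studie", "Nom:F:Sg", 3, "NP:1"),
    (3, "lieferte", "VVFIN", "liefern", "3:Sg:Past:Ind*", -1, "TOP"),
    (4, "ähnliche", "ADJA", "ähnlich", "", 5, "ADJ"),
    (5, "Ergebnisse", "NN", "Ergebnis", "Akk:N:Pl", 3, "NP:8"),
    (6, ".", "$.", ".", "", -1, "TOP"),
]


def verb_object_pairs(rows):
    """Return (noun lemma, verb lemma) for accusative nouns headed by a verb."""
    by_index = {row[0]: row for row in rows}
    pairs = []
    for _, _, pos, lemma, morph, head, _ in rows:
        # Assumed rule: a noun marked as accusative counts as an object.
        if pos != "NN" or not morph.startswith("Akk"):
            continue
        head_row = by_index.get(head)  # None for head == -1 (TOP)
        if head_row is not None and head_row[2].startswith("VV"):
            pairs.append((lemma, head_row[3]))
    return pairs


pairs = verb_object_pairs(ROWS)
print(pairs)
```

Because the pair is found through the head link rather than through a word-order pattern, the same rule applies to all three German word order models and to any constituent order in the Mittelfeld.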

SLIDE 40

Procedures for collocation candidate extraction

Scenario type II: typical results – verb+object pairs from Swiss newspapers

  Abklärung     treffen        96
  Abklärung     vornehmen      91
  Anlaß         besuchen       73
  Anlaß         durchführen    199
  Anlaß         organisieren   367
  Beschwerde    gutheißen      88
  Bilanz        deponieren     82
  Busse         aussprechen    72
  Defizit       budgetieren    94
  Einsitz       nehmen         295
  Einsprache    erheben        262
  Entscheid     fällen         79
  Gegensteuer   geben          143
  Gesuch        bewilligen     90

SLIDE 41

Combining the two scenarios

Extraction of specialized collocations

[Figure: workflow diagram with the components Tagging, Parsing, Comparison, Collocation Extraction and Filtering, leading from Corpus I and Corpus II to Relevant Single Words and Relevant Collocations]

Steps:

  1. Find relevant single word terms (e.g. from EMEA or regional texts)
  2. Extract collocation candidates only for these items
  3. Output: candidates:
      • EMEA: domain-specific collocations
      • collocations of regionalisms (e.g. from CH)
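
The filtering step can be sketched in a few lines. The sketch assumes that collocation candidates are (noun, verb) pairs and that the relevant single words come from the single word term extraction; the function name `relevant_collocations` and the sample data are invented for illustration.

```python
def relevant_collocations(collocations, relevant_words):
    """Keep only the candidates whose noun is a relevant single word."""
    relevant = set(relevant_words)
    return [(noun, verb) for noun, verb in collocations if noun in relevant]


# Toy data, invented for the example.
candidates = [
    ("Einsprache", "erheben"),
    ("Frage", "stellen"),
    ("Gesuch", "bewilligen"),
]
regionalisms = ["Einsprache", "Gesuch"]
filtered = relevant_collocations(candidates, regionalisms)
print(filtered)
```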

SLIDE 42

The extraction as a web service

Framework

D-SPIN web service tool chain: WebLicht (Hinrichs et al. 2010)

  • Experiments with chaining of different corpus processing tools
  • Joint effort: Universities of Tübingen, Leipzig, BBAW Berlin and others

[Figure: WebLicht tool chain from Text to Results, with the services Text2Dspin, Tokenizer, Tagger, Lemmatizer, Parser, NER, GermaNet and Frequency Analyzer, hosted in Tübingen, Berlin, Leipzig and Stuttgart]

SLIDE 43

The extraction as a web service

Architecture principles

  • Tool and resource wrappers: tools unchanged with respect to stand-alone version
  • Slim format for data exchange between chained components: D-SPIN Text Corpus Format, TCF (Heid et al. 2010)
  • WebLicht used as:
      • Chaining tool and interface
      • Workflow infrastructure

[Figure: architecture layers. Clients (applications), web service infrastructure (composition web service, transformer), XML wrappers, and the wrapped tools (Tool A, Tool B) and resources]

SLIDE 46

The extraction as a web service

Technical problems to be addressed wrt the extraction scenarios

  • Scenario I: comparison of two corpora
      • Uploading both corpora (e.g. in one “file”)
      • Or: keeping comparison data (e.g. from one journal) as an internal resource
  • Scenario II: parsing of large amounts of data
      • Time-consuming (10 M words on a Linux PC: ca. 30 min)
      • Web service should alert user when processing is done

SLIDE 47

The extraction as a web service

Open problems: parameterizing a complex web service

[Figure: options along the extraction chain (input, parsing, collocation selection, associative strength, sorting, output). Labels shown: separated / all-in-one; Dependency / LFG; ADJ+NN, NN+NNgen, V+Nakk; Chi-squared, T-Score, Dice, Fisher's Test, LogLikelihood; alphabetical / statistical; Basis, Collocator, Significance]

Users may wish to select options

  • Tool-related options: parser, association measures, collocation types ... to be used
    ⇒ Parameters to be given to the individual component tools
  • Output-related options: sorting of collocation candidates, format of the output
    ⇒ Possibly need for extra post-processing components
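
One way to picture the parameterization problem is a single request object whose options are split into per-component parameter sets. The sketch below is hypothetical throughout: the class `ExtractionRequest`, the component names and the default values are invented for illustration and do not describe the WebLicht interface.

```python
from dataclasses import dataclass

# Option values taken from the labels on the slide.
PARSERS = {"dependency", "lfg"}
MEASURES = {"chi-squared", "t-score", "dice", "fisher", "log-likelihood"}
COLLOCATION_TYPES = {"ADJ+NN", "NN+NNgen", "V+Nakk"}
SORTINGS = {"alphabetical", "statistical"}


@dataclass
class ExtractionRequest:
    """User-selected options for one run of the (hypothetical) chain."""
    parser: str = "dependency"
    measure: str = "log-likelihood"
    collocation_types: tuple = ("V+Nakk",)
    sorting: str = "statistical"

    def tool_parameters(self):
        """Validate the options and split them by component tool."""
        if self.parser not in PARSERS:
            raise ValueError(f"unknown parser: {self.parser}")
        if self.measure not in MEASURES:
            raise ValueError(f"unknown measure: {self.measure}")
        unknown = set(self.collocation_types) - COLLOCATION_TYPES
        if unknown:
            raise ValueError(f"unknown collocation types: {sorted(unknown)}")
        if self.sorting not in SORTINGS:
            raise ValueError(f"unknown sorting: {self.sorting}")
        return {
            # Tool-related options go to the individual component tools.
            "parser": {"name": self.parser},
            "extractor": {"types": list(self.collocation_types)},
            "scorer": {"measure": self.measure},
            # Output-related options go to a post-processing component.
            "postprocessor": {"sorting": self.sorting},
        }


params = ExtractionRequest(measure="dice", sorting="alphabetical").tool_parameters()
print(params)
```

Validating all options before the chain starts avoids launching a long parsing job that would fail only at a later component.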

SLIDE 51

Conclusion – Future Work

  • Computational linguistic tools for term and collocation extraction, based on standard corpus processing components
  • Experiments with web service use:
      • works fine (version at IMS Stuttgart)
      • needs to be registered for WebLicht (Hinrichs et al. 2010)
      • open questions wrt parameterization
  • Future work
      • Further development of extraction components (Weller/Heid 2010)
      • Integration of components into specific tool chains, e.g. for provision of raw material to lexicographers
      • Web service parameterization and pertaining user interfaces
