Acquisition of domain-specific multiword expressions in Serbian



SLIDE 1

Acquisition of domain-specific multiword expressions in Serbian

Vesna Pajić, Miloš Pajić

Center for data mining and bioinformatics University of Belgrade – Faculty of Agriculture

svesna@agrif.bg.ac.rs

PARSEME

SLIDE 2

From domain-specific MWEs to the terminology

• Over 70% of terms consist of more than one word (Krieger and Finatto, 2004); these are multiword terms (MWTs).

• Domain-specific MWEs are difficult to detect automatically.

• Morphosyntactic analysis is used to improve MWE extraction.

• MWT extraction requires knowing the syntactic structures that a particular language and domain use to express specific concepts.
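A minimal sketch of what pattern-based extraction over morphosyntactic annotation can look like, using the adjective + noun (A N) pattern. It is an illustration only and not the Unitex workflow used in this research: the `Token` type, the tag names "A" and "N", and the function `adjective_noun_candidates` are hypothetical, and the tokens are assumed to be already tagged and lemmatized.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Token(NamedTuple):
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form (assumed output of a tagger/lemmatizer)
    pos: str    # coarse part-of-speech tag: "A" adjective, "N" noun, ...


def adjective_noun_candidates(tokens: Iterable[Token]) -> Counter:
    """Count adjacent adjective + noun pairs, keyed by their lemmas."""
    counts: Counter = Counter()
    previous = None
    for token in tokens:
        if previous is not None and previous.pos == "A" and token.pos == "N":
            counts[(previous.lemma, token.lemma)] += 1
        previous = token
    return counts


# Two inflected occurrences of the same expression (genitive and nominative).
sample = [
    Token("hidrauličkog", "hidraulički", "A"),
    Token("sistema", "sistem", "N"),
    Token("i", "i", "CONJ"),
    Token("hidraulički", "hidraulički", "A"),
    Token("sistem", "sistem", "N"),
]
candidates = adjective_noun_candidates(sample)
```

Keying the counts by lemma lets differently inflected occurrences of one expression count together, which matters in a highly inflected language such as Serbian.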


SLIDE 3

Research specifics

Objectives:

• Populate the existing MWE dictionary of Serbian

• Start building an agricultural terminology

• Analyse the properties of MWTs from the agricultural domain in Serbian

Tools and Resources:

• Collection of scientific papers in Serbian from the agricultural engineering domain

• Electronic morphological dictionaries for Serbian (simple words, multi-words)

• Unitex 3.0


SLIDE 4

Results

716,000 tokens in total

29,753 expressions of the A N (adjective + noun) structure (~12,000 unique)

34% of the 1,000 most frequent were MWTs from the agricultural engineering domain

angažovana snaga (engaged power), aromatično bilje (aromatic plants), genetički potencijal (genetic potential), hidraulički sistem (hydraulic system)

najveći broj (maximum number), nova tehnologija (new technology), dobijeni rezultat (obtained result)
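The 34% figure can be read as the share of terms among the most frequent candidates. The sketch below shows how such a share could be computed. The function name `share_of_terms`, the toy frequencies, and the term labels are invented for illustration and are not data from this research; only the final line uses the reported figures.

```python
from collections import Counter


def share_of_terms(counts: Counter, terms: set, n: int) -> float:
    """Share of the n most frequent candidates that are marked as terms."""
    top = [candidate for candidate, _ in counts.most_common(n)]
    if not top:
        return 0.0
    return sum(candidate in terms for candidate in top) / len(top)


# Toy frequencies and labels, invented for illustration only.
counts = Counter({
    "hidraulički sistem": 5,
    "najveći broj": 4,
    "nova tehnologija": 3,
    "genetički potencijal": 2,
})
terms = {"hidraulički sistem", "genetički potencijal"}
share = share_of_terms(counts, terms, 3)

# Reported figures: 34% of the 1,000 most frequent candidates.
reported_terms = round(0.34 * 1000)  # 340 candidates judged to be MWTs
```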
