From LEXiTRON to Asian WordNet on Collaborative Development Platform

SLIDE 1
From LEXiTRON to Asian WordNet on Collaborative Development Platform

Virach Sornlertlamvanich

National Electronics and Computer Technology Center (NECTEC), NSTDA, Thailand
virach@tcllab.org

10 Years LEXiTRON, NECTEC-ACE, Bangkok Convention Center, September 24-25, 2008

SLIDE 2

LEXiTRON version 1.1

  • Corpus-based dictionary
  • Dictionary for writing
  • Released in 1995 (B.E. 2538) on CD-ROM for Windows 3.1 Thai Edition
  • Thai: 11,000 words; English: 9,000 words
  • 6 dictionaries in one:
      1) General Thai dictionary
      2) Thai usage dictionary
      3) Synonym and antonym dictionary
      4) Thai-English dictionary
      5) Thai word group dictionary

SLIDE 3

Corpus-based Dictionary and Dictionary for Writing

Word access

  • Synonym
  • Antonym
  • Example sentence (usage)
  • Word group
  • Translation (equivalent)

SLIDE 4

Design of LEXiTRON

  • Built from the dictionary for the machine translation system (30,000 words)
  • Word structure: single word, compound word, prefix, suffix
  • Word information: word, pronunciation, part of speech (14 main, 45 sub), classifier, verb pattern (12 -> 9 VPs), synonym, antonym, example sentence, English translation, semantic group
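
The word information listed on this slide can be pictured as one record per headword. The sketch below is only an illustration: the class name `LexitronEntry`, its field names, and the sample values are assumptions made here, not the actual LEXiTRON schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexitronEntry:
    """Hypothetical record mirroring the word information listed on the slide."""
    word: str                                # headword
    pronunciation: str                       # reading of the headword
    pos: str                                 # part of speech (14 main, 45 sub categories)
    classifier: Optional[str] = None         # classifier, if the entry has one
    verb_pattern: Optional[str] = None       # verb pattern, if the entry has one
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)   # example sentences
    english: List[str] = field(default_factory=list)    # English translations
    semantic_group: Optional[str] = None


# Sample entry; the POS tag "n" is an assumed encoding, and the
# pronunciation is left empty because the slides do not give it.
entry = LexitronEntry(
    word="เป้าหมาย",
    pronunciation="",
    pos="n",
    english=["aim", "target"],
)
print(entry.word, entry.english)
```

Keeping the English translations and synonyms as lists is convenient here because the synset assignment on the following slides works on sets of English equivalents.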

SLIDE 5

Synset Assignment via English Surface

  • Use English equivalents to link the existing dictionary to WordNet
  • POS (n, v, adv, adj), English equivalent, and English equivalent of synonym of the target language are used to pinpoint the appropriate link
  • Number of matched English equivalents in the Synset confirms the appropriate link
  • Experiment on Thai-English, Indonesian-English and Mongolian-English dictionaries
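
A minimal sketch of the matching step described above, assuming WordNet is available as a plain mapping from synset id to a POS tag and a set of lemmas. `TOY_WORDNET`, `build_index`, and `count_matches` are hypothetical names introduced for illustration, the POS tags are assumed, and the verb synset `V0` is invented only to show the POS filter.

```python
from collections import defaultdict

# Toy stand-in for WordNet: synset id -> (POS, member lemmas). Hypothetical data.
TOY_WORDNET = {
    "S0": ("n", {"purpose", "intent", "intention", "aim", "design"}),
    "S1": ("n", {"aim", "object", "objective", "target"}),
    "S2": ("n", {"aim"}),
    "V0": ("v", {"aim", "take aim"}),   # invented verb synset, filtered out below
}


def build_index(wordnet):
    """Map (lemma, POS) to the ids of the synsets that contain the lemma."""
    index = defaultdict(set)
    for synset_id, (pos, lemmas) in wordnet.items():
        for lemma in lemmas:
            index[(lemma, pos)].add(synset_id)
    return index


def count_matches(index, pos, equivalents):
    """Count, per candidate synset, how many English equivalents it contains."""
    counts = defaultdict(int)
    for word in equivalents:
        for synset_id in index.get((word, pos), ()):
            counts[synset_id] += 1
    return dict(counts)


index = build_index(TOY_WORDNET)
matches = count_matches(index, "n", ["aim", "target"])
print(sorted(matches.items()))
```

A synset matched by more equivalents is a stronger candidate for the link; here `S1` is matched by both `aim` and `target`, while the verb synset is never considered for a noun entry.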

SLIDE 6

Asian WordNet Development

[Diagram. Components: X-English dictionaries (Thai-English, Indonesian-English, ...), WN, merged-WN, AWN, GWN, KUI. Functions: Addition, Discussion, Lookup, Correction, Translation, Voting. Applications: Dictionary, Ontology, CL-Search, MT, Summarization, IE/IR, ...]

SLIDE 7

Synset Assignment (CS=4)

Accept the Synset that includes more than one English Equivalent, with confidence score of 4.

Example:
  L0: เป้าหมาย
  E0: aim
  E1: target
  S0: purpose, intent, intention, aim, design
  S1: aim, object, objective, target
  S2: aim

S1 contains both aim and target, so it is the accepted Synset.

SLIDE 8

Synset Assignment (CS=3)

Accept the Synset that includes more than one English Equivalent from the synonym of the target language, with confidence score of 3.

Example:
  L0: จ้อง
  L1: เพ่งมอง (synonym of L0)
  E0: stare
  E1: gaze
  S0: stare
  S1: gaze, stare

S1 contains both stare and gaze, so it is the accepted Synset.

SLIDE 9

Synset Assignment (CS=2)

Accept the only Synset that includes the English Equivalent, with confidence score of 2.

Example:
  L0: สูติแพทย์
  E0: obstetrician
  S0: obstetrician, accoucheur

SLIDE 10

Synset Assignment (CS=1)

Accept more than one Synset when each includes one of the English Equivalents, with confidence score of 1.

Example:
  L0: ช่อง
  E0: hole
  E1: canal
  S0: hole, hollow
  S1: hole, trap, cakehole, maw, yap, gap
  S2: canal, duct, epithelial duct, channel
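
The four confidence levels can be sketched as one function. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say how the levels are combined, so the sketch assumes they are tried in order from CS=4 down to CS=1, and the name `assign_synsets` and the plain set-based representation of synsets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def assign_synsets(
    equivalents: Set[str],
    synonym_equivalents: Set[str],
    synsets: Dict[str, Set[str]],
) -> Tuple[int, List[str]]:
    """Return (confidence score, accepted synset ids); (0, []) if nothing matches."""
    # CS=4: a synset contains more than one of the entry's own equivalents.
    accepted = [s for s, lemmas in synsets.items() if len(lemmas & equivalents) > 1]
    if accepted:
        return 4, accepted

    # CS=3: more than one match once the equivalents of synonyms are pooled in.
    pooled = equivalents | synonym_equivalents
    accepted = [s for s, lemmas in synsets.items() if len(lemmas & pooled) > 1]
    if accepted:
        return 3, accepted

    # Remaining cases: synsets that contain at least one of the entry's equivalents.
    candidates = [s for s, lemmas in synsets.items() if lemmas & equivalents]
    if len(candidates) == 1:
        return 2, candidates   # CS=2: the only synset with the equivalent
    if candidates:
        return 1, candidates   # CS=1: every synset matching some equivalent
    return 0, []


# The four examples from the slides.
cs4 = assign_synsets({"aim", "target"}, set(), {
    "S0": {"purpose", "intent", "intention", "aim", "design"},
    "S1": {"aim", "object", "objective", "target"},
    "S2": {"aim"},
})
cs3 = assign_synsets({"stare"}, {"gaze"}, {"S0": {"stare"}, "S1": {"gaze", "stare"}})
cs2 = assign_synsets({"obstetrician"}, set(), {"S0": {"obstetrician", "accoucheur"}})
cs1 = assign_synsets({"hole", "canal"}, set(), {
    "S0": {"hole", "hollow"},
    "S1": {"hole", "trap", "cakehole", "maw", "yap", "gap"},
    "S2": {"canal", "duct", "epithelial duct", "channel"},
})
print(cs4, cs3, cs2, cs1)
```

Trying the stricter rules first means a synset supported by several equivalents is never downgraded to a weaker score, which matches the ordering of the slides but remains an assumption of this sketch.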

SLIDE 11

Quantitative Evaluation for T-E

             WordNet (synset)           T-E Dict (entry)
             total      assigned        total     assigned
Noun         145,103    18,353 (13%)    43,072    11,867 (28%)
Verb          24,884     1,333 (5%)     17,669     2,298 (13%)
Adjective     31,302     4,034 (13%)    18,448     3,722 (20%)
Adverb         5,721       737 (13%)     3,008     1,519 (51%)
total        207,010    24,457 (12%)    82,197    19,406 (24%)
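
The coverage figures can be recomputed from the counts on this slide. The sketch below copies the (total, assigned) pairs from the table; the names `WORDNET`, `TE_DICT`, and `coverage` are introduced here for illustration. It prints one decimal place so the results can be compared with the rounded percentages on the slide.

```python
# (total, assigned) pairs copied from the slide's table.
WORDNET = {"Noun": (145103, 18353), "Verb": (24884, 1333),
           "Adjective": (31302, 4034), "Adverb": (5721, 737)}
TE_DICT = {"Noun": (43072, 11867), "Verb": (17669, 2298),
           "Adjective": (18448, 3722), "Adverb": (3008, 1519)}


def coverage(table):
    """Return per-POS coverage in percent plus overall (total, assigned, percent)."""
    rows = {pos: 100.0 * assigned / total for pos, (total, assigned) in table.items()}
    total = sum(t for t, _ in table.values())
    assigned = sum(a for _, a in table.values())
    return rows, (total, assigned, 100.0 * assigned / total)


for name, table in (("WordNet", WORDNET), ("T-E Dict", TE_DICT)):
    rows, (total, assigned, pct) = coverage(table)
    for pos, value in rows.items():
        print(f"{name:8} {pos:9} {value:5.1f}%")
    print(f"{name:8} {'total':9} {pct:5.1f}%  ({assigned:,} of {total:,})")
```

The per-POS counts sum to the totals given on the slide for both WordNet synsets and T-E dictionary entries.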

SLIDE 12

Qualitative Evaluation for T-E

            CS=4         CS=3          CS=2         CS=1         total
Noun        5 (71.4%)    306 (63.9%)   34 (53.1%)   55 (20.2%)   400 (48.7%)
Verb                     23 (52.3%)    6 (8.0%)     4 (13.8%)    33 (22.3%)
Adjective                2 (8.0%)                                2 (3.4%)
Adverb      7 (100%)     4 (100%)      4 (100%)     1 (100%)     16 (100%)
total       12 (80.0%)   335 (60.7%)   44 (30.8%)   60 (18%)     451 (43.2%)

SLIDE 13

Improvement by Consulting Dictionaries from Multiple Sources

MMT T-E Dictionary

            CS=4         CS=3          CS=2         CS=1         total
Total       12 (80.0%)   335 (60.7%)   44 (30.8%)   60 (18%)     451 (43.2%)

MMT and LEXiTRON T-E Dictionary

            CS=4         CS=3          CS=2         CS=1         total
Total       14 (93.3%)   337 (61.1%)   72 (50.3%)   93 (27.8%)   516 (49.4%)
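
The size of the improvement can be read off by subtracting the two "Total" rows. The sketch below does only that arithmetic on the figures from this slide; the names `MMT_ONLY`, `MMT_AND_LEXITRON`, and `gains` are introduced here for illustration.

```python
# (count, percent) per confidence score, copied from the two "Total" rows.
MMT_ONLY = {"CS=4": (12, 80.0), "CS=3": (335, 60.7), "CS=2": (44, 30.8),
            "CS=1": (60, 18.0), "total": (451, 43.2)}
MMT_AND_LEXITRON = {"CS=4": (14, 93.3), "CS=3": (337, 61.1), "CS=2": (72, 50.3),
                    "CS=1": (93, 27.8), "total": (516, 49.4)}


def gains(before, after):
    """Return {column: (count difference, percentage-point difference)}."""
    return {
        col: (after[col][0] - before[col][0],
              round(after[col][1] - before[col][1], 1))
        for col in before
    }


for col, (d_count, d_points) in gains(MMT_ONLY, MMT_AND_LEXITRON).items():
    print(f"{col:5}  +{d_count:3d}  +{d_points:4.1f} points")
```

On these figures the largest changes are at CS=2 and CS=1, and the overall total rises from 451 to 516.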

SLIDE 14

Participation

SLIDE 15

Lookup

SLIDE 16

English‐English

SLIDE 17

Thai‐English

SLIDE 18

Thai‐Indonesian

SLIDE 19

Future Work

  • Asian WordNet Community
  • Language resource conversion and alignment
  • Language technology sharing
  • Collaborative development platform
  • AsianWordnet (www.tcllab.org/kui -> www.asianwordnet.org)
