From LEXiTRON to Asian WordNet on Collaborative Development Platform

  1. From LEXiTRON to Asian WordNet on Collaborative Development Platform. Virach Sornlertlamvanich, National Electronics and Computer Technology Center (NECTEC), NSTDA, Thailand, virach@tcllab.org. 10 Years LEXiTRON, NECTEC-ACE, Bangkok Convention Center, September 24-25, 2008

  2. LEXiTRON version 1.1 • Corpus-based dictionary • Dictionary for writing • Released in 2538 B.E. (1995) • CD-ROM for Windows 3.1 Thai Edition • Thai: 11,000 words; English: 9,000 words • 6 dictionaries in one: 1) general Thai dictionary, 2) Thai usage dictionary, 3) Thai synonym and antonym dictionary, 4) Thai-English dictionary, 5) Thai word-group dictionary

  3. Corpus-based Dictionary and Dictionary for Writing • Word access by: synonym, antonym, example sentence (usage), word group, translation (equivalent)

  4. Design of LEXiTRON • Built from a 30,000-word dictionary for a machine translation system • Word structure: simple word, compound word, prefix, suffix • Word information: word, pronunciation, part of speech (14 main and 45 sub-categories), classifier, verb pattern (12 -> 9 VPs), synonym, antonym, example sentence, English translation, semantic group
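To make the entry layout on slide 4 concrete, here is a minimal Python sketch of one LEXiTRON-style record. The class `LexEntry`, its field names, the allowed `structure` values, and the helper `entries_by_english` are all hypothetical: the slide lists the kinds of information stored, not a schema. The two sample entries reuse headwords and English equivalents from the synset-assignment examples later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexEntry:
    """Hypothetical record holding the word information listed for LEXiTRON."""
    word: str
    english: List[str] = field(default_factory=list)  # English translations
    pronunciation: Optional[str] = None
    pos: Optional[str] = None        # part of speech (14 main / 45 sub-categories)
    structure: str = "simple"        # assumed values: simple, compound, prefix, suffix
    classifier: Optional[str] = None
    verb_pattern: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    semantic_group: Optional[str] = None


def entries_by_english(entries: List[LexEntry], english_word: str) -> List[str]:
    """Reverse lookup: headwords whose English translations include english_word."""
    return [e.word for e in entries if english_word in e.english]


lexicon = [
    LexEntry(word="เป้าหมาย", english=["aim", "target"]),
    LexEntry(word="สูติแพทย์", english=["obstetrician"]),
]
print(entries_by_english(lexicon, "target"))  # ['เป้าหมาย']
```

Access by synonym, antonym or word group (slide 3) would follow the same pattern over the corresponding fields.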

  5. Synset Assignment via English Surface • Use English equivalents to link the existing dictionary to WordNet • POS (n, v, adv, adj), the English equivalent, and the English equivalents of the entry's synonyms in the target language are used to pinpoint the appropriate link • The number of matched English equivalents in a Synset confirms the appropriate link • Experiments on Thai-English, Indonesian-English and Mongolian-English dictionaries
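The linking idea on slide 5 can be sketched as a POS-filtered count of matching English lemmas per synset. This is a minimal illustration, assuming a toy in-memory WordNet fragment (`TOY_WORDNET`, with made-up IDs) instead of the real WordNet database; the noun synsets reuse the example from slide 7, and `V0` is an illustrative verb synset. The function name `rank_synsets` is hypothetical.

```python
from collections import Counter

# Hypothetical toy WordNet fragment: synset id -> (POS, set of English lemmas).
TOY_WORDNET = {
    "S0": ("n", {"purpose", "intent", "intention", "aim", "design"}),
    "S1": ("n", {"aim", "object", "objective", "target"}),
    "S2": ("n", {"aim"}),
    "V0": ("v", {"aim", "take aim", "direct", "train"}),
}


def rank_synsets(pos, equivalents, wordnet):
    """Count, per synset of the given POS, how many English equivalents it contains."""
    wanted = set(equivalents)
    counts = Counter()
    for synset_id, (synset_pos, lemmas) in wordnet.items():
        if synset_pos != pos:
            continue  # POS must agree before a link is considered
        matched = len(lemmas & wanted)
        if matched:
            counts[synset_id] = matched
    return counts.most_common()  # highest match count first; ties keep insertion order


print(rank_synsets("n", ["aim", "target"], TOY_WORDNET))
# [('S1', 2), ('S0', 1), ('S2', 1)]
```

The verb synset `V0` also contains "aim" but is skipped because the entry is a noun, which is how the POS restricts the candidate links.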

  6. Asian WordNet Development (diagram). Resources: X-English dictionaries (e.g., Thai-English, Indonesian-English), WN, merged-WN, AWN, GWN. Development activities: lookup, addition, discussion, correction and voting (KUI). Applications: X-English dictionary, ontology, X-English translation, CL-Search, MT, summarization, IE/IR.

  7. Synset Assignment (CS=4) • Accept the Synset that includes more than one English equivalent, with confidence score of 4. Example: L0: เป้าหมาย ("goal"); E0: aim, E1: target; S0: purpose, intent, intention, aim, design; S1: aim, object, objective, target; S2: aim. Only S1 contains both aim and target, so S1 is assigned with CS=4.
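A minimal sketch of the CS=4 rule, assuming each synset is represented as a plain set of English lemmas. The synset labels and members are copied from the slide's example; the function name `assign_cs4` is hypothetical, and POS filtering is left out for brevity.

```python
# Toy synsets copied from the slide's CS=4 example (labels S0-S2, not real WordNet IDs).
EXAMPLE_SYNSETS = {
    "S0": {"purpose", "intent", "intention", "aim", "design"},
    "S1": {"aim", "object", "objective", "target"},
    "S2": {"aim"},
}


def assign_cs4(equivalents, synsets):
    """Return synsets containing more than one of the entry's English equivalents."""
    eqs = set(equivalents)
    return sorted(sid for sid, lemmas in synsets.items() if len(lemmas & eqs) > 1)


# L0 = เป้าหมาย, E0 = aim, E1 = target
print(assign_cs4(["aim", "target"], EXAMPLE_SYNSETS))  # ['S1']
```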

  8. Synset Assignment (CS=3) • Accept the Synset that includes more than one English equivalent, counting the English equivalents of the entry's synonym in the target language, with confidence score of 3. Example: L0: จ้อง ("stare"); synonym L1: เพ่งมอง ("gaze"); E0: stare (from L0), E1: gaze (from L1); S0: stare; S1: gaze, stare. Only S1 contains both equivalents, so S1 is assigned with CS=3.
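The CS=3 rule can be sketched the same way. One interpretation is assumed here, based on the slide's example: a synset qualifies when it holds more than one English equivalent, with at least one coming from the entry itself and at least one from its target-language synonym. `assign_cs3` is a hypothetical name.

```python
# Toy synsets copied from the slide's CS=3 example.
EXAMPLE_SYNSETS = {
    "S0": {"stare"},
    "S1": {"gaze", "stare"},
}


def assign_cs3(entry_equivalents, synonym_equivalents, synsets):
    """Sketch: keep synsets matching >1 equivalents drawn from both the entry and its synonym."""
    own, syn = set(entry_equivalents), set(synonym_equivalents)
    result = []
    for sid, lemmas in synsets.items():
        if lemmas & own and lemmas & syn and len(lemmas & (own | syn)) > 1:
            result.append(sid)
    return sorted(result)


# L0 = จ้อง gives E0 = stare; its synonym L1 = เพ่งมอง gives E1 = gaze
print(assign_cs3(["stare"], ["gaze"], EXAMPLE_SYNSETS))  # ['S1']
```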

  9. Synset Assignment (CS=2) • Accept the only Synset that includes the English equivalent, with confidence score of 2. Example: L0: สูติแพทย์ ("obstetrician"); E0: obstetrician; S0: obstetrician, accoucheur.
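For CS=2 the test is uniqueness: the English equivalent must appear in exactly one synset. In this sketch `S0` is the slide's example and `S9` is a hypothetical unrelated synset added only for contrast; `assign_cs2` is also a hypothetical name.

```python
# "S0" is the slide's example; "S9" is a hypothetical synset added for contrast.
EXAMPLE_SYNSETS = {
    "S0": {"obstetrician", "accoucheur"},
    "S9": {"midwife", "accoucheuse"},
}


def assign_cs2(equivalent, synsets):
    """Accept a synset only if it is the sole synset containing the equivalent."""
    hits = [sid for sid, lemmas in synsets.items() if equivalent in lemmas]
    return hits if len(hits) == 1 else []


print(assign_cs2("obstetrician", EXAMPLE_SYNSETS))  # ['S0']
```

If the same word appeared in two synsets, the function would return an empty list, leaving the entry to a weaker rule.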

  10. Synset Assignment (CS=1) • Accept each of the multiple Synsets that include one of the English equivalents, with confidence score of 1. Example: L0: ช่อง ("opening; channel"); E0: hole, E1: canal; S0: hole, hollow; S1: hole, trap, cakehole, maw, yap, gap; S2: canal, duct, epithelial duct, channel. S0, S1 and S2 are all assigned with CS=1.
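The CS=1 rule accepts every one of several candidate synsets. The sketch below uses the slide's ช่อง example; the function name `assign_cs1` is hypothetical.

```python
# Toy synsets copied from the slide's CS=1 example.
EXAMPLE_SYNSETS = {
    "S0": {"hole", "hollow"},
    "S1": {"hole", "trap", "cakehole", "maw", "yap", "gap"},
    "S2": {"canal", "duct", "epithelial duct", "channel"},
}


def assign_cs1(equivalents, synsets):
    """Accept every synset containing any equivalent, provided more than one is found."""
    eqs = set(equivalents)
    hits = sorted(sid for sid, lemmas in synsets.items() if lemmas & eqs)
    return hits if len(hits) > 1 else []


# L0 = ช่อง, E0 = hole, E1 = canal
print(assign_cs1(["hole", "canal"], EXAMPLE_SYNSETS))  # ['S0', 'S1', 'S2']
```

Applying the four rules in descending order of CS, so that an entry reaches CS=1 only when no stronger rule fires, is an assumption of these sketches and is not stated on the slides.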

  11. Quantitative Evaluation for Thai-English (T-E)

      | POS | WordNet synsets: total | WordNet synsets: assigned | T-E Dict entries: total | T-E Dict entries: assigned |
      |---|---|---|---|---|
      | Noun | 145,103 | 18,353 (13%) | 43,072 | 11,867 (28%) |
      | Verb | 24,884 | 1,333 (5%) | 17,669 | 2,298 (13%) |
      | Adjective | 31,302 | 4,034 (13%) | 18,448 | 3,722 (20%) |
      | Adverb | 5,721 | 737 (13%) | 3,008 | 1,519 (51%) |
      | Total | 207,010 | 24,457 (12%) | 82,197 | 19,406 (24%) |
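The percentages in the slide-11 table are coverage ratios (assigned / total). This short check uses only the counts from that table to reproduce the total row and the per-POS percentages; the names `COUNTS` and `coverage` are illustrative.

```python
# Counts from the slide-11 table:
# (WordNet synsets total, synsets assigned, T-E entries total, entries assigned)
COUNTS = {
    "Noun": (145_103, 18_353, 43_072, 11_867),
    "Verb": (24_884, 1_333, 17_669, 2_298),
    "Adjective": (31_302, 4_034, 18_448, 3_722),
    "Adverb": (5_721, 737, 3_008, 1_519),
}


def coverage(part, whole):
    """Percentage of `whole` covered by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Column sums should reproduce the slide's "Total" row.
totals = [sum(row[i] for row in COUNTS.values()) for i in range(4)]
print(totals)  # [207010, 24457, 82197, 19406]

for pos, (wn_total, wn_assigned, te_total, te_assigned) in COUNTS.items():
    print(pos, coverage(wn_assigned, wn_total), coverage(te_assigned, te_total))
print("Total", coverage(totals[1], totals[0]), coverage(totals[3], totals[2]))
```

At one decimal place the overall coverage is 11.8% of WordNet synsets and 23.6% of T-E entries; the adverb T-E coverage comes out at 50.5%, shown as 51% on the slide.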

  12. Qualitative Evaluation for Thai-English (T-E)

      | POS | CS=4 | CS=3 | CS=2 | CS=1 | Total |
      |---|---|---|---|---|---|
      | Noun | 5 (71.4%) | 306 (63.9%) | 34 (53.1%) | 55 (20.2%) | 400 (48.7%) |
      | Verb | - | 23 (52.3%) | 6 (8.0%) | 4 (13.8%) | 33 (22.3%) |
      | Adjective | - | 2 (8.0%) | - | - | 2 (3.4%) |
      | Adverb | 7 (100%) | 4 (100%) | 4 (100%) | 1 (100%) | 16 (100%) |
      | Total | 12 (80.0%) | 335 (60.7%) | 44 (30.8%) | 60 (18%) | 451 (43.2%) |

  13. Improvement by Consulting Dictionaries from Multiple Sources

      | Dictionary | CS=4 | CS=3 | CS=2 | CS=1 | Total |
      |---|---|---|---|---|---|
      | MMT T-E Dictionary | 12 (80.0%) | 335 (60.7%) | 44 (30.8%) | 60 (18%) | 451 (43.2%) |
      | MMT and LEXiTRON T-E Dictionary | 14 (93.3%) | 337 (61.1%) | 72 (50.3%) | 93 (27.8%) | 516 (49.4%) |
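The gain from adding LEXiTRON can be read off the slide-13 table as differences in counts and in percentage points. This small sketch copies the table's values in; the names `MMT_ONLY`, `MMT_PLUS_LEXITRON` and `improvement` are illustrative.

```python
# (count, percentage) pairs from the slide-13 table, columns CS=4, CS=3, CS=2, CS=1, total.
LABELS = ["CS=4", "CS=3", "CS=2", "CS=1", "total"]
MMT_ONLY = [(12, 80.0), (335, 60.7), (44, 30.8), (60, 18.0), (451, 43.2)]
MMT_PLUS_LEXITRON = [(14, 93.3), (337, 61.1), (72, 50.3), (93, 27.8), (516, 49.4)]


def improvement(before, after):
    """Per-column gain as (label, count difference, percentage-point difference)."""
    return [
        (label, a_count - b_count, round(a_pct - b_pct, 1))
        for label, (b_count, b_pct), (a_count, a_pct) in zip(LABELS, before, after)
    ]


for row in improvement(MMT_ONLY, MMT_PLUS_LEXITRON):
    print(row)
```

The total column gains 65 assignments and 6.2 percentage points; the largest relative change is at CS=2 (+28 assignments, +19.5 points).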

  14. Participation

  15. Lookup

  16. English-English

  17. Thai-English

  18. Thai-Indonesian

  19. Future Work • Asian WordNet Community • Language resource conversion and alignment • Language technology sharing • Collaborative development platform: AsianWordnet (www.tcllab.org/kui -> www.asianwordnet.org)
