Documents Transliterated Queries Transliterated Documents Native - PowerPoint PPT Presentation


  1. Native-script documents; transliterated queries; transliterated documents; native-script queries

  2. 5 teams, 25 runs

  3. FIRE 2014 Shared Task on Transliterated Search Overview

  4. Contributors: Yesha Shah, Swati Jhawar, Ria Gupta (DA-IICT); Kalika Bali (MSR India); all participants; task coordinators

  5. Teams, runs, and systems:
     • BIT (2 runs): Word bigram query in both scripts using Google Transliterate; query expansion using pseudo-relevance feedback.
     • BITS-Lipyantaran (2 runs): Back-transliterated the queries and documents to Devanagari using the Google Transliteration engine; removed vowels as part of the normalisation step and indexed character n-grams as tokens.
     • DCU (2 runs): Dictionary of cross-script equivalents built from the corpus documents that contained the song in both scripts; transliteration engine for OOV terms; edit-distance-based term matching; word bigram index.
     • IIITH (1 run): Roman as the operating script; normalisation rules such as replacing repeated occurrences of the same character with a single occurrence; edit-distance-based matching.
     Total: 7 runs
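The normalisation and matching steps mentioned on this slide (collapsing repeated characters, character n-grams, edit-distance matching) can be sketched roughly as follows. This is only an illustration, not any team's actual system: the function names, the n-gram size of 3, and the distance threshold of 1 are all assumptions.

```python
import re


def collapse_repeats(word: str) -> str:
    """Replace any run of the same character with a single occurrence."""
    return re.sub(r"(.)\1+", r"\1", word)


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of a word (n is an assumed default)."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def terms_match(query_term: str, doc_term: str, max_dist: int = 1) -> bool:
    """Hypothetical matcher: normalise both terms, then compare by edit distance."""
    q = collapse_repeats(query_term.lower())
    d = collapse_repeats(doc_term.lower())
    return edit_distance(q, d) <= max_dist
```

Under these assumptions, `terms_match("taara", "tara")` holds because both spellings normalise to the same string. Note that collapsing repeats also shortens legitimate double letters, so a real system would need to weigh that trade-off.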

  6. [Chart: scores (axis 0 to 0.9) for BITS-L, IIITH, DCU, and BIT on NDCG5, NDCG10, MAP, and CSR@10]

  7. Type, example queries (scores in parentheses), and nDCG@5:
     • join/split (nDCG@5: 0.286): koi ek taara ek taara (0.14); tune diithi iye anguthi (0.15)
     • Other Roman (nDCG@5: 0.617): bin tere (0.67); din dhal jaye (0.69)
     • Devanagari (nDCG@5: 0.722): तेरे मेरे बिच में कैसा है ये बंधन (0.91); तेरे मेरे सपने अब एक रंग (0.93)
     2013 best: 0.805
     2014 best: 0.757
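For readers unfamiliar with the metric, nDCG@k can be sketched as below. This is a minimal version under stated assumptions: it uses the graded relevance itself as the gain with a log2 rank discount, and it derives the ideal ranking only from the judgments of the returned list. The slides do not state the exact formulation used by the track, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists documents in descending order of relevance scores 1.0, and a query with no relevant documents in the list scores 0.0.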

  8. Total runs received: 54. Runs accepted: 39. Rejected runs: Transliterate-Kgp (3×3 = 9); 1 more team, 6 more runs. 13

  9. Training Data:
     • FIRE 2013 (query): 100
     • Facebook forum: 700 (no transliteration)
     • #tokens: 20.6k (364)
     Test Data:
     • FIRE 2013 (query): 100
     • Facebook forum: 639 (no transliteration)
     • #tokens: 17.3k (397)
     Sub-track Winner:
     • JU-NLP-Lab
     Contributors:
     • Amitava Das
     [Chart: LA, TF, and ETPM scores (axis 0 to 1) for BMS-Brainz, IIITH, IITP-TS, ISI, and JU-NLP]

  10. Training Data:
     • FIRE 2013 (query): 150
     • #tokens: 937 (890)
     Test Data:
     • FIRE 2013 (query): 150
     • #tokens: 1078 (1064)
     FIRE 2013 best: 0.976
     Sub-track Winner:
     • DA-IR
     Contributors:
     • DA-IICT
     [Chart: LA, TF, and ETPM scores (axis 0 to 1) for BMS-Brainz, DA-IR, and IIITH]

  11. Training Data:
     • FIRE 2013 (query): 500
     • Facebook forum: 700 (no transliteration)
     • Facebook forum: 30
     • #tokens: 27.6k (2420)
     Test Data:
     • FIRE 2013 (query): 500
     • Facebook forum: 708 (no transliteration)
     • Facebook forum: 63
     • #tokens: 32.1k (2512)
     Sub-track Winner:
     • IITP-TS
     Contributors:
     • Amitava Das
     • MSR India
     [Chart: LA and TF scores (axis 0 to 1)]

  12. Training Data: None
     Test Data:
     • Blogs: 119
     • #tokens: 1271 (815)
     Sub-track Winner:
     • BMS-Brainz
     Contributors:
     • Dr. Shambhavi B. R. (BMS)
     • Dr. B. M. Sagar (RVCE)
     • Sandesh (BMS)
     • Shweta Kulkarni (BMS)
     • Abhishek J. (BMS)
     [Chart: LA, TF, and ETPM scores (axis 0 to 1) for BMS-Brainz, I1, and IIITH]

  13. Training Data:
     • Blogs: 150
     • #tokens: 1914 (0)
     Test Data:
     • Blogs: 120
     • #tokens: 1473 (885)
     Sub-track Winner:
     • IIITH
     Contributors:
     • Rekha Vaidyanathan (NIT Bhopal, TCS)
     [Chart: LA, TF, and ETPM scores (axis 0 to 1) for BMS-Brainz and IIITH]

  14. Training Data: None
     Test Data:
     • Blogs: 49
     • #tokens: 974 (0)
     Sub-track Winner:
     • IIITH
     Contributors:
     • Dr. Dinesh Jayagopi (IIIT Bangalore)
     • Arun Prasad (IIITB)
     • Kumaresh Krishnan (IIITB)
     • P. S. Srinivasan (IIITB)
     [Chart: LA and TF scores (axis 0 to 1) for BMS-Brainz and IIITH]

  15. Training Data:
     • FIRE 2013 (query): 500
     • Facebook forum: 700 (no transliteration)
     • Facebook forum: 30
     • #tokens: 27.6k (2420)
     Test Data:
     • FIRE 2013 (query): 500
     • Facebook forum: 708 (no transliteration)
     • Facebook forum: 63
     • #tokens: 32.1k (2512)
     Sub-track Winner:
     • IITP-TS
     Contributors:
     • Amitava Das
     • MSR India
     [Chart: LA and TF scores (axis 0 to 1)]
