SLIDE 1
FIRE 2014 Shared Task on Transliterated Search: Overview
SLIDE 2
SLIDE 3
SLIDE 4
[Diagram labels: Transliterated Queries, Native script Queries, Transliterated Documents, Native script Documents]
SLIDE 5
SLIDE 6
SLIDE 7
5 teams, 25 runs
SLIDE 8
FIRE 2014 Shared Task on Transliterated Search: Overview
SLIDE 9
Contributors:
Yesha Shah, Swati Jhawar, Ria Gupta (DA-IICT), Kalika Bali (MSR India)
All participants, task coordinators
SLIDE 10
Team (runs): system
- BIT (2): Word-bigram queries in both scripts, using Google Transliterate; query expansion with pseudo-relevance feedback.
- BITS-Lipyantaran (2): Back-transliterated the queries and documents to Devanagari with the Google transliteration engine; removed vowels as a normalisation step and indexed character n-grams as tokens.
- DCU (2): Dictionary of cross-script equivalents built from the corpus documents that contained the song in both scripts; transliteration engine for OOV terms; edit-distance-based term matching; word-bigram index.
- IIITH (1): Roman as the operating script; normalisation rules such as replacing a repetition of the same character with a single occurrence; edit-distance-based matching.
Total: 7
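The normalisation and matching steps listed above can be sketched roughly as follows. This is an illustrative sketch only, not any team's actual system: the function names, the lowercasing step, and the `max_distance=1` threshold are assumptions made for the example.

```python
import re


def collapse_repeats(word):
    """Lowercase the word and replace each run of a repeated character
    with a single occurrence (hypothetical helper, e.g. 'taara' -> 'tara')."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, usable as index tokens."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edit_distance(a, b):
    """Levenshtein distance computed with the standard row-by-row
    dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def terms_match(query_term, doc_term, max_distance=1):
    """Assumed matching rule: normalise both terms, then accept them
    if their edit distance is at most max_distance."""
    return edit_distance(collapse_repeats(query_term),
                         collapse_repeats(doc_term)) <= max_distance
```

With this rule, spelling variants such as "taara" and "tara" normalise to the same string and therefore match, while unrelated terms stay well above the distance threshold.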
SLIDE 11
[Chart: NDCG@5, NDCG@10, MAP and CSR@10 per team (BITS-L, IIITH, DCU, BIT); score axis 0.1 to 0.9]
SLIDE 12
Type: examples; nDCG@5
- join/split: koi ek taara ek taara (0.14), tune diithi iye anguthi (0.15); 0.286
- Other Roman: bin tere (0.67), din dhal jaye (0.69); 0.617
- Devanagari: तेरे मेरे बिच मेः कैसा है ये बंधन (0.91), तेरे मेरे सपने अब एक रंग (0.93); 0.722
2013 best: 0.805
2014 best: 0.757
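For reference, nDCG@k can be computed along the following lines. This is a minimal sketch assuming graded relevance labels, linear gain, and a log2 rank discount; the exact variant used by the task's evaluation scripts is not stated on the slide. It also assumes the ideal ranking is built from the same list of labels, whereas a full evaluation would use every judged document for the query.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results.
    relevances[i] is the graded relevance of the result at rank i + 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering.
    Returns 0.0 when no result in the list is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that is already in descending relevance order scores 1.0; placing a relevant document lower in the list reduces the score through the rank discount.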
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
13
Total runs received: 54
Runs accepted: 39
Rejected runs:
- Transliterate-Kgp (3 x 3 = 9)
- 1 more team, 6 more runs
SLIDE 19
SLIDE 20
SLIDE 21
[Chart: LA, TF and ETPM scores per team (BMS-Brainz, IIITH, IITP-TS, ISI, JU-NLP); score axis 0.1 to 1]
Training Data:
- FIRE 2013 (query): 100
- Facebook forum: 700 (no transliteration)
- #tokens: 20.6k (364)
Test Data:
- FIRE 2013 (query): 100
- Facebook forum: 639 (no transliteration)
- #tokens: 17.3k (397)
Sub-track Winner:
- JU-NLP-Lab
Contributors:
- Amitava Das
SLIDE 22
[Chart: LA, TF and ETPM scores per team (BMS-Brainz, DA-IR, IIITH); score axis 0.1 to 1]
Training Data:
- FIRE 2013 (query): 150
- #tokens: 937 (890)
Test Data:
- FIRE 2013 (query): 150
- #tokens: 1078 (1064)
Sub-track Winner:
- DA-IR
Contributors:
- DA-IICT
FIRE 2013 best: 0.976
SLIDE 23
Training Data:
- FIRE 2013 (query): 500
- Facebook forum: 700 (no transliteration)
- Facebook forum: 30
- #tokens: 27.6k (2420)
Test Data:
- FIRE 2013 (query): 500
- Facebook forum: 708 (no transliteration)
- Facebook forum: 63
- #tokens: 32.1k (2512)
Sub-track Winner:
- IITP-TS
Contributors:
- Amitava Das
- MSR India
[Chart: LA and TF scores; score axis 0.1 to 1]
SLIDE 24
[Chart: LA, TF and ETPM scores per team (BMS-Brainz, I1, IIITH); score axis 0.1 to 1]
Training Data:
- None
Test Data:
- Blogs: 119
- #tokens: 1271 (815)
Sub-track Winner:
- BMS-Brainz
Contributors:
- Dr. Shambhavi B. R. (BMS)
- Dr. B. M. Sagar (RVCE)
- Sandesh (BMS)
- Shweta Kulkarni (BMS)
- Abhishek J. (BMS)
SLIDE 25
Training Data:
- Blogs: 150
- #tokens: 1914 (0)
Test Data:
- Blogs: 120
- #tokens: 1473 (885)
Sub-track Winner:
- IIITH
Contributors:
- Rekha Vaidyanathan (NIT Bhopal, TCS)
[Chart: LA, TF and ETPM scores per team (BMS-Brainz, IIITH); score axis 0.1 to 1]
SLIDE 26
Training Data:
- None
Test Data:
- Blogs: 49
- #tokens: 974 (0)
Sub-track Winner:
- IIITH
Contributors:
- Dr. Dinesh Jayagopi (IIIT Bangalore)
- Arun Prasad (IIITB)
- Kumaresh Krishnan (IIITB)
- P. S. Srinivasan (IIITB)
[Chart: LA and TF scores per team (BMS-Brainz, IIITH); score axis 0.1 to 1]
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32
SLIDE 33