Documents Transliterated Queries Transliterated Documents Native - - PowerPoint PPT Presentation


Native script Documents Transliterated Queries Transliterated Documents Native script Queries 5 teams, 25 runs FIRE 2014 Shared Task on Transliterated Search Overview Contributors: Yesha Shah, Swati Jhawar, Ria Gupta (DA-IICT), Kalika


SLIDE 1
SLIDE 2
SLIDE 3
SLIDE 4

Transliterated Queries Transliterated Documents Native script Documents Native script Queries

SLIDE 5
SLIDE 6
SLIDE 7

5 teams, 25 runs

SLIDE 8

FIRE 2014 Shared Task on Transliterated Search: Overview

SLIDE 9

Contributors:

Yesha Shah, Swati Jhawar, Ria Gupta (DA-IICT), Kalika Bali (MSR India)
All participants, Task coordinators

SLIDE 10

Team | Runs | System
BIT | 2 | Word bigram query in both scripts using Google Transliterate; query expansion using pseudo-relevance feedback.
BITS-Lipyantaran | 2 | Back-transliterated the queries and documents to Devanagari using the Google Transliteration engine; removed vowels as a normalisation step and indexed character n-grams as tokens.
DCU | 2 | Dictionary of cross-script equivalents built from the corpus documents that contained the song in both scripts; transliteration engine for OOV terms; edit-distance-based term matching; word bigram index.
IIITH | 1 | Roman as the operating script; normalisation rules, e.g. repetition of the same character replaced by a single occurrence; edit-distance-based matching.
Total | 7 |
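The normalisation and matching steps named in the table (collapsing repeated characters, removing vowels, character n-grams, edit-distance matching) can be illustrated with a minimal Python sketch. It is not any team's actual system: the function names, the Roman vowel set, the n-gram size of 3 and the distance threshold of 1 are all assumptions made for illustration. In particular, BITS-Lipyantaran removed vowels in Devanagari, whereas this sketch works on Roman text only.

```python
import re

# Assumption: Roman-script vowels only; the slides do not give the vowel set used.
ROMAN_VOWELS = set("aeiou")


def collapse_repeats(word: str) -> str:
    """Replace every run of the same character with a single occurrence."""
    return re.sub(r"(.)\1+", r"\1", word)


def strip_vowels(word: str) -> str:
    """Drop Roman vowels, keeping the consonant skeleton of the word."""
    return "".join(ch for ch in word if ch not in ROMAN_VOWELS)


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams; short words are kept whole."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(query_term: str, doc_term: str, max_dist: int = 1) -> bool:
    """Hypothetical matcher: normalise both terms, then compare by edit distance."""
    q = collapse_repeats(query_term.lower())
    d = collapse_repeats(doc_term.lower())
    return edit_distance(q, d) <= max_dist


print(collapse_repeats("taara"))      # tara
print(char_ngrams("bandhan"))         # ['ban', 'and', 'ndh', 'dha', 'han']
print(fuzzy_match("taara", "tara"))   # True
```

Collapsing repeats also merges legitimate double letters, so a real system would have to weigh the recall it gains against the distinctions it loses.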

SLIDE 11

[Chart: NDCG5, NDCG10, MAP and CSR@10 for teams BITS-L, IIITH, DCU and BIT; axis 0.1 to 0.9]

SLIDE 12

Type | Example | nDCG@5
join/split | koi ek taara ek taara (0.14); tune diithi iye anguthi (0.15) | 0.286
Other Roman | bin tere (0.67); din dhal jaye (0.69) | 0.617
Devanagari | तेरे मेरे बिच में कैसा है ये बंधन (0.91); तेरे मेरे सपने अब एक रंग (0.93) | 0.722

2013 best: 0.805
2014 best: 0.757
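For readers unfamiliar with the metric in the table, the following is a minimal sketch of nDCG@k in Python. The slides do not say which gain or discount variant the task used, so the linear gain, the log2 rank discount and the function names `dcg_at_k` and `ndcg_at_k` are assumptions.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k, judged_gains=None):
    """nDCG@k: DCG of the ranking divided by the DCG of the ideal ordering.

    `gains` holds the relevance grades of the returned documents in rank order.
    `judged_gains` (optional) holds the grades of all judged documents for the
    query; if omitted, the ideal ranking is built from `gains` alone.
    """
    pool = judged_gains if judged_gains is not None else gains
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# A ranking that is already in ideal order scores 1.0.
print(ndcg_at_k([3, 2, 1, 0, 0], k=5))
```

Averaging `ndcg_at_k(..., k=5)` over all queries of a given type would yield a per-type figure of the kind shown in the last column, assuming the same gain scheme.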

SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18

13

Total runs received: 54
Runs accepted: 39
Rejected runs: Transliterate-Kgp (3 × 3 = 9); 1 more team, 6 more runs.

SLIDE 19
SLIDE 20
SLIDE 21

[Chart: teams BMS-Brainz, IIITH, IITP-TS, ISI, JU-NLP; labels LA, TF, ETPM, Score; axis 0.1 to 1]

Training Data:

  • FIRE 2013 (query): 100
  • Facebook forum: 700 (no transliteration)

  • #tokens: 20.6k (364)

Test Data:

  • FIRE 2013 (query): 100
  • Facebook forum: 639 (no transliteration)

  • #tokens: 17.3k (397)

Sub-track Winner:

  • JU-NLP-Lab

Contributors:

  • Amitava Das

SLIDE 22

[Chart: teams BMS-Brainz, DA-IR, IIITH; labels LA, TF, ETPM, Score; axis 0.1 to 1]

Training Data:

  • FIRE 2013 (query): 150
  • #tokens: 937 (890)

Test Data:

  • FIRE 2013 (query): 150
  • #tokens: 1078 (1064)

Sub-track Winner:

  • DA-IR

Contributors:

  • DA-IICT

FIRE 2013 best: 0.976

SLIDE 23

Training Data:

  • FIRE 2013 (query): 500
  • Facebook forum: 700 (no transliteration)

  • Facebook forum: 30
  • #tokens: 27.6k (2420)

Test Data:

  • FIRE 2013 (query): 500
  • Facebook forum: 708 (no transliteration)

  • Facebook forum: 63
  • #tokens: 32.1k (2512)

Sub-track Winner:

  • IITP-TS

Contributors:

  • Amitava Das
  • MSR India

[Chart: labels LA, TF, Score; axis 0.1 to 1]

SLIDE 24

[Chart: teams BMS-Brainz, I1, IIITH; labels LA, TF, ETPM, Score; axis 0.1 to 1]

Training Data:

  • None

Test Data:

  • Blogs: 119
  • #tokens: 1271 (815)

Sub-track Winner:

  • BMS-Brainz

Contributors:

  • Dr. Shambhavi B. R. (BMS)
  • Dr. B. M. Sagar (RVCE)
  • Sandesh (BMS)
  • Shweta Kulkarni (BMS)
  • Abhishek J. (BMS)

SLIDE 25

Training Data:

  • Blogs: 150
  • #tokens: 1914 (0)

Test Data:

  • Blogs: 120
  • #tokens: 1473 (885)

Sub-track Winner:

  • IIITH

Contributors:

  • Rekha Vaidyanathan (NIT Bhopal, TCS)

[Chart: teams BMS-Brainz, IIITH; labels LA, TF, ETPM, Score; axis 0.1 to 1]

SLIDE 26

Training Data:

  • None

Test Data:

  • Blogs: 49
  • #tokens: 974 (0)

Sub-track Winner:

  • IIITH

Contributors:

  • Dr. Dinesh Jayagopi (IIIT Bangalore)
  • Arun Prasad (IIITB)
  • Kumaresh Krishnan (IIITB)
  • P. S. Srinivasan (IIITB)

[Chart: teams BMS-Brainz, IIITH; labels LA, TF, Score; axis 0.1 to 1]

SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30

Training Data:

  • FIRE 2013 (query): 500
  • Facebook forum: 700 (no transliteration)

  • Facebook forum: 30
  • #tokens: 27.6k (2420)

Test Data:

  • FIRE 2013 (query): 500
  • Facebook forum: 708 (no transliteration)

  • Facebook forum: 63
  • #tokens: 32.1k (2512)

Sub-track Winner:

  • IITP-TS

Contributors:

  • Amitava Das
  • MSR India

[Chart: labels LA, TF, Score; axis 0.1 to 1]

SLIDE 31
SLIDE 32
SLIDE 33