ROMIP: one step forward, one step aside


SLIDE 1

ROMIP: one step forward, one step aside

http://romip.ru/en/

Pavel Braslavski, Ilia Chetviorkin, Maxim Gubin, Natalia Lukashevich, Igor Nekrestyanov, Marina Nekrestyanova, Natalia Vassileva

CLEF 2012

SLIDE 2

ROMIP at a glance

  • TREC-like Russian initiative
  • Started in 2002
  • Several freely available text and image collections
  • 10-15 participating teams each year
  • Remote participation + live meeting
  • Popular testbed for IR research in Russia
  • Related activities: RuSSIR

20.09.2012 ROMIP 2

SLIDE 3

ROMIP 2004

SLIDE 4

Largest text collections

Collection   Documents   Size (compressed)   Topics    Evaluated within ad-hoc search track
Legal        ~300,000    2 GB                14,794    220
By.Web       1,524,676   8 GB                ~60,000   1,500+
KM.RU        3,010,455   13 GB               ~60,000   ~250

SLIDE 5

(Retired) text document tracks

  • Ad-hoc text retrieval
  • Text categorization
  • Snippet generation
  • QA and fact extraction
  • News clustering
  • Search by sample document

SLIDE 6

Image collections

  • Photo collection: 20,000 images from Flickr
  • Dups collection: 15 hrs of video, 37,800 frames
  • Panoramic series: 55,000 images (data recycled from Internet Math 2011)

SLIDE 7

Image tracks

  • Content-based image retrieval
  • Near-duplicate detection
  • Image annotation
  • Finding panoramic series

SLIDE 8

ROMIP by 2011

  • Low participation from academia
  • Fatigue of classical IR tasks
      • Available relevance tables: no need to participate
      • Overfitting on available datasets
      • Hard to model realistic settings and data
      • Well-studied tasks: new results are hard to expect
  • Limited resources
  • ML challenges (e.g. www.kaggle.com)

SLIDE 9

ROMIP light 2011

  • Sentiment analysis
  • Search by query image (low participation)
  • Schedule shifted to fall

SLIDE 10

ROMIP timeline

[Chart: systems applied, systems participated, and number of tracks per year, 2003-2011]

SLIDE 11

Sentiment analysis (SA)

  • Three domains: movies, books, and digital cameras
  • ‘Transfer learning’ (data from different sources)
  • Classification into 2, 3, and 5 classes
  • 23 teams registered
  • 12 submitted results
  • 6 reports published
  • 2-class: 105 runs, 3-class: 81 runs, 5-class: 30 runs

SLIDE 12

SA: data

Training set

  • 15,000+ movie reviews (10-point scale)
  • 24,000+ book reviews (10-point scale)
  • 10,000+ camera reviews (5-point scale)

Test set

  • Blog posts collected via blog search with subsequent filtering
  • 275 posts on movies
  • 329 posts on books
  • 270 posts on digital cameras
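The training reviews carry 10-point and 5-point ratings, while the track classifies into 2, 3, or 5 classes. The slides do not say how ratings were turned into class labels, so the sketch below assumes simple equal-width bins purely for illustration. The function name `rating_to_class` and the binning rule are hypothetical, not ROMIP's actual procedure.

```python
def rating_to_class(rating: int, scale_max: int, n_classes: int) -> int:
    """Map a rating in 1..scale_max to a class index in 0..n_classes-1.

    ASSUMPTION: equal-width bins over the rating scale; the real
    ROMIP mapping is not given in the slides.
    """
    if not 1 <= rating <= scale_max:
        raise ValueError(f"rating {rating} outside 1..{scale_max}")
    if not 1 <= n_classes <= scale_max:
        raise ValueError("n_classes must be between 1 and scale_max")
    # Integer arithmetic keeps the bin boundaries exact.
    return (rating - 1) * n_classes // scale_max


if __name__ == "__main__":
    # Collapse a 10-point movie rating into the three task settings.
    for n in (2, 3, 5):
        print(n, [rating_to_class(r, 10, n) for r in range(1, 11)])
```

With this rule a 10-point scale splits into 1-5 / 6-10 for two classes and 1-4 / 5-7 / 8-10 for three; other splits (for example, a narrower neutral class) are equally plausible.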

SLIDE 13

Plans

  • New edition of SA track
      • Finer granularity (sentence)
      • Opinions for a given entity
  • Re-launch of image tracks (in cooperation with Graphicon conference)
  • Machine translation track

SLIDE 14

MT evaluation track (2012)

  • Strong industrial players
  • 1M parallel sentences (Ru-En) to release
  • Collaboration with TAUS Labs
  • Metrics
      • BLEU
      • Human assessment
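For readers unfamiliar with BLEU, here is a minimal sketch of the metric for one candidate translation against one reference: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration only, not the track's scoring script; the function names, whitespace tokenization, single reference, and lack of smoothing are all assumptions. BLEU is normally computed over a whole test corpus rather than per sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing.

    ASSUMPTION: inputs are already tokenized lists of strings.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(bleu(ref, ref))
    print(bleu("the cat sat on".split(), ref))
```

Because unsmoothed BLEU drops to zero whenever any n-gram order has no match, short sentences often score 0; that is one reason to pair it with human assessment, as the slide does.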

SLIDE 15

Thank you! Questions?

Pavel Braslavski pb@kontur.ru