What CLIR researchers assume (PowerPoint presentation)

slide-1
SLIDE 1

iCLEF 2009 overview

tags: image_search, multilinguality, interactivity, log_analysis, web2.0

Julio Gonzalo, Víctor Peinado, Paul Clough & Jussi Karlgren

CLEF 2009, Corfu

slide-2
SLIDE 2

What CLIR researchers assume

User needs information. Machine searches. User is happy (or not).

slide-3
SLIDE 3

But finding is a matter of two

User: smart, slow. Machine: fast, stupid.

Room for collaboration!

slide-4
SLIDE 4

“Users screw things up”

  • Can’t be reset
  • Differences between systems disappear
  • Differences between interactive systems too!
  • Who needs QA systems having a search engine and a user?

slide-5
SLIDE 5

But CLIR is different

slide-6
SLIDE 6

Help!

slide-7
SLIDE 7

iCLEF methodology: hypothesis-driven

  • Hypothesis
  • Reference & contrastive systems, topics, users
  • Latin-square pairing between system/topic/user

Features:

  • Hypothesis-based (vs. operational)
  • Controlled (vs. ecological)
  • Deductive (vs. inductive)
  • Sound
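As an illustration of the latin-square pairing named on this slide, here is a minimal sketch in Python. It is not the actual iCLEF experiment matrix: the user names, topic blocks and the cyclic construction are assumptions chosen only to show the idea that every user meets every system once, and every topic block is searched with every system equally often.

```python
from typing import Dict, List, Sequence, Tuple


def latin_square(n: int) -> List[List[int]]:
    """Return an n x n cyclic Latin square: cell (r, c) holds (r + c) % n."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def assign_systems(
    users: Sequence[str],
    topic_blocks: Sequence[str],
    systems: Sequence[str],
) -> Dict[Tuple[str, str], str]:
    """Pair each (user, topic block) with a system via a cyclic Latin square.

    Assumes one topic block per system; users beyond the first n reuse
    the rows of the square cyclically.
    """
    n = len(systems)
    if len(topic_blocks) != n:
        raise ValueError("need exactly one topic block per system")
    square = latin_square(n)
    plan: Dict[Tuple[str, str], str] = {}
    for u_idx, user in enumerate(users):
        row = square[u_idx % n]
        for t_idx, block in enumerate(topic_blocks):
            plan[(user, block)] = systems[row[t_idx]]
    return plan


# Hypothetical setup: two systems, two topic blocks, four users.
users = ["u1", "u2", "u3", "u4"]
topic_blocks = ["block_A", "block_B"]
systems = ["reference", "contrastive"]

plan = assign_systems(users, topic_blocks, systems)
for (user, block), system in sorted(plan.items()):
    print(f"{user} searches {block} with the {system} system")
```

Rotating the rows this way keeps system effects from being confounded with a particular user or topic block, which is what makes the design controlled rather than ecological.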

slide-8
SLIDE 8

iCLEF 2001-2005: tasks

On newswire

  • Cross-Language Document Selection
  • Cross-Language query formulation and refinement
  • Cross-Language Question Answering

On image archives

  • Cross-Language Image search

slide-9
SLIDE 9

Practical outcome!

slide-10
SLIDE 10

iCLEF 2001-2005: problems

  • Unrealistic search scenario; opportunistic user sample
  • Experimental design not cost-effective
  • Only one aspect of CLIR at a time
  • High cost of recruiting, training, observing users

slide-11
SLIDE 11

Pick a document for “saffron”

slide-12
SLIDE 12

Pick an illustration for “saffron”

slide-13
SLIDE 13

Flickr

slide-14
SLIDE 14

iCLEF 2006

Topics

  • Ad hoc: find as many photographs of (different) European parliaments as possible.
  • Creative: find five illustrations for this article about saffron in Italy.
  • Visual: What is the name of the beach where this crab is lying?

Methodology

  • Participants must propose their own methodology and experiment design.

slide-15
SLIDE 15

Explored issues

User’s behaviour

  • How do users deal with native/passive/unknown languages?
  • Do they actually use CLIR facilities when available?

User’s perceptions

  • Satisfaction (all tasks)
  • Completeness (creative, ad-hoc)
  • Quality (creative)

Search effectiveness

  • How many facets were retrieved (creative, ad-hoc)
  • Was the image found? (visual)

slide-16
SLIDE 16

iCLEF 2008/2009

  • Produce reusable dataset: search log
  • Much larger set of users: online game
  • Search log analysis task
slide-17
SLIDE 17

iCLEF 2008/2009: Log Analysis

Online game: see this image? Find it! (in any of six languages)

Game interface features multilingual search assistance

Users register with a language profile

Dataset: rich search log

  • All search interactions
  • Explicit success/failure
  • Post-search questionnaires

Queries

  • Easy to find with the appropriate tags (typically 3 tags)
  • Hint mechanism (first target language, then tags)
slide-18
SLIDE 18

Simultaneous search in six languages

slide-19
SLIDE 19

Boolean search with translations

slide-20
SLIDE 20

Relevance feedback

slide-21
SLIDE 21

Assisted query translation

slide-22
SLIDE 22

User profiles

slide-23
SLIDE 23

User rank (Hall of Fame)

slide-24
SLIDE 24

Group rank

slide-25
SLIDE 25

Hint mechanism

slide-26
SLIDE 26

Language skills bias in 2008

[Charts: native languages (DE, EN, ES, FR, IT, NL, other); English language skills (native, active, passive, unknown)]

slide-27
SLIDE 27

Language skills bias in 2008

Target language was for the user…

[Pie chart with categories active / passive / unknown; values shown: 31%, 55%, 14%]

slide-28
SLIDE 28

Selection of topics (images)

  • No English annotations (new for 2009)
  • Not buried in search results
  • Visual cues
  • No named entities

slide-29
SLIDE 29

Harvested logs

2008

  • 312 users / 41 teams
  • 5101 complete search sessions
  • Linguistics students, photography fans, IR researchers from industry and academia, monitored groups, other.

2009

  • 130 users / 18 teams
  • 2410 complete search sessions
  • CS & linguistics students, photography fans, IR researchers from industry and academia, monitored groups, other.
slide-30
SLIDE 30

Language skills bias in 2009

Target language was for the user…

[Pie chart: unknown 99%; active and passive account for the remaining 0% and 1%]

slide-31
SLIDE 31

Log statistics

slide-32
SLIDE 32

Distribution of users

slide-33
SLIDE 33

Language skills

[Charts: interface language; native languages]

slide-34
SLIDE 34

Language skills (II)

[Charts: English; Spanish]

slide-35
SLIDE 35

Language skills (III)

[Charts: German; Dutch]

slide-36
SLIDE 36

Language skills (and IV)

[Charts: French; Italian]

slide-37
SLIDE 37

Participants (I): log analysis

University of Alicante

  • Goal: correlation between lexical ambiguity in queries and search success
  • Methodology: analysis of full search log

UAIC

  • Goal: correlations between several search parameters and search success
  • Methodology: own set of users, search log analysis

UNED

  • Goal: correlation between search strategies and search success
  • Methodology: analysis of full search log

SICS

  • Goal: study confidence and satisfaction from search logs
  • Methodology: analysis of full search log

slide-38
SLIDE 38

Participants (II): other strategies

Manchester Metropolitan University

  • Goal: focus on users’ trust and confidence to reveal their perceptions of the task.
  • Methodology: own set of users, own set of queries, training, observational study, retrospective thinking aloud, questionnaires.

University of North Texas

  • Goal: understanding challenges when searching images that have multilingual annotations.
  • Methodology: own set of users, training, questionnaires, interviews, observational analysis.

slide-39
SLIDE 39

Discussion

2008+2009 logs = “iCLEF legacy”

  • 442 users with heterogeneous language skills
  • 7511 search sessions with questionnaires

iCLEF has been a success in terms of providing insights into interactive CLIR, and a failure in terms of gaining followers?

slide-40
SLIDE 40

So long!

slide-41
SLIDE 41

And now… the iCLEF Bender Awards