Cross-Language Explicit Semantic Analysis: Nedim Lipka, Maik Anderka (presentation slides)

SLIDE 1

Cross-Language Explicit Semantic Analysis

Nedim Lipka, Maik Anderka, Benno Stein
Bauhaus University Weimar

www.webis.de

1 Lipka@CLEF [∧] 01.10.09

SLIDE 2

Outline

❑ Retrieval Models
❑ The CL-ESA Retrieval Model
❑ CL-ESA at TEL@CLEF 2009
❑ Formalization of CL-ESA

SLIDE 3

Retrieval Models

[Figure: components of a retrieval model R. Labels: information need; human query formulation; query q ∈ Q; query representation; real-world document d ∈ D; computer-based document generation; document representation; αR; computer-based relevance judgment ρR(q, d); underlying theories (conceptual document models, linguistics, computer linguistics).]

SLIDE 4

The CL-ESA Retrieval Model

Explicit Semantic Analysis, ESA

[Gabrilovich/Markovitch 2007]

SLIDE 5

The CL-ESA Retrieval Model

Explicit Semantic Analysis

[Figure: the document collection D, with each document shown as a vector of weights.]

SLIDE 6

The CL-ESA Retrieval Model

Explicit Semantic Analysis

[Figure: the document collection D and the index collection DI (e.g. Wikipedia), with each document shown as a vector of weights.]

SLIDE 7

The CL-ESA Retrieval Model

Explicit Semantic Analysis

[Figure: the document collection D and the index collection DI (e.g. Wikipedia), with each document shown as a vector of weights. Comparing a document with the documents of DI via ϕ yields its representation in the concept space.]

SLIDE 8

The CL-ESA Retrieval Model

Explicit Semantic Analysis

[Figure: the document collection D and the index collection DI (e.g. Wikipedia), with each document shown as a vector of weights. Comparing a document with the documents of DI via ϕ yields its representation in the concept space; ϕESA applies ϕ to two such representations.]

Similarity analysis in a collection-relative concept space.

Ranking: d∗ = argmax_{d∈D} ϕESA(q, d), where ϕESA(q, d) := ϕ(q|DI, d|DI)
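
The ranking rule above can be illustrated with a minimal sketch. It assumes that ϕ is the cosine similarity and that queries, documents, and index documents are plain term-weight vectors over a shared vocabulary; the slides fix neither choice. The function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def to_concept_space(vec, index_collection):
    """x|DI: one coordinate per index document (assumed here: its cosine to vec)."""
    return [cosine(vec, index_doc) for index_doc in index_collection]

def phi_esa(q, d, index_collection):
    """phi_ESA(q, d) := phi(q|DI, d|DI)."""
    return cosine(to_concept_space(q, index_collection),
                  to_concept_space(d, index_collection))

def rank(q, docs, index_collection):
    """Index of d* = argmax over d in D of phi_ESA(q, d)."""
    return max(range(len(docs)),
               key=lambda i: phi_esa(q, docs[i], index_collection))

# Toy data over a vocabulary of four terms (hypothetical values).
INDEX = [
    [1.0, 1.0, 0.0, 0.0],  # index document using terms 0 and 1
    [0.0, 0.0, 1.0, 1.0],  # index document using terms 2 and 3
]
DOCS = [
    [0.0, 1.0, 0.0, 0.0],  # document using term 1 only
    [0.0, 0.0, 0.0, 1.0],  # document using term 3 only
]
QUERY = [1.0, 0.0, 0.0, 0.0]  # query using term 0 only

best = rank(QUERY, DOCS, INDEX)
print(best)  # index of the top-ranked document
```

In this toy setting the query and the first document share no term, so their plain cosine similarity is zero, yet both are close to the same index document and therefore end up close in the concept space.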

SLIDE 9

The CL-ESA Retrieval Model

Cross-Language Explicit Semantic Analysis

[Figure: a document of the English collection D1 is compared via ϕ with the English index collection DI1, and a document of the German collection D2 is compared via ϕ with the German index collection DI2. DI1 is aligned with DI2, so the two resulting concept vectors are comparable; similarity analysis in concept space yields ϕCL-ESA.]

SLIDE 10

CL-ESA at TEL@CLEF 2009

Setting

Index collection:

❑ Wikipedia snapshot March 2009
❑ 169,000 articles per language
❑ 3 index collections

❑ Query representation: title + description
❑ Document representation: title + subject + alternative

SLIDE 11

CL-ESA at TEL@CLEF 2009

Difficulties at TEL@CLEF:

❑ Selecting the correct index collection (language detection needed).
❑ The correct index collection is not always available.
❑ The fields title, subject, and alternative do not always share the same language.

SLIDE 12

CL-ESA at TEL@CLEF 2009

SLIDE 13

Formalization of CL-ESA

SLIDE 14

Formalization of CL-ESA

ESA

[Figure: the documents D are compared via ϕ with the index collection DI, which gives their concept-space representation D|DI.]

SLIDE 15

Formalization of CL-ESA

ESA

[Figure: the documents D are compared via ϕ with the index collection DI, which gives their concept-space representation D|DI.]

In matrix form:

\[
A_{D|D_I} = A_{D_I}^{T} \cdot A_{D}
\]

Dimensions: (|DI| × |D|) = (|DI| × |V|) · (|V| × |D|)

❑ A_D: terms × documents
❑ A_{D_I}^T: index documents × terms
❑ A_{D|D_I}: concept coordinates × documents
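
The matrix form can be checked on a small example. The sketch below uses plain Python lists and hypothetical toy weights; the helper names are not from the slides. It only illustrates how the dimensions |DI| × |V| and |V| × |D| combine to |DI| × |D|.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Toy sizes: |V| = 3 terms, |DI| = 2 index documents, |D| = 4 documents.
A_DI = [        # |V| x |DI|: rows are terms, columns are index documents
    [1, 0],
    [1, 1],
    [0, 2],
]
A_D = [         # |V| x |D|: rows are terms, columns are documents
    [1, 0, 0, 2],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# A_{D|DI} = A_DI^T . A_D, a |DI| x |D| matrix of concept coordinates.
A_D_given_DI = matmul(transpose(A_DI), A_D)
for row in A_D_given_DI:
    print(row)
```

Each column of the result holds the concept coordinates of one document, that is, one value per index document.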

SLIDE 16

Formalization of CL-ESA

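CL-ESA

\begin{align*}
\varphi_{\text{CL-ESA}}(q, d) &= \varphi(q_{|D_{I1}},\, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
&= \varphi(A_{D_{I1}}^{T} \cdot q,\; A_{D_{I2}}^{T} \cdot d) \\
&= \mathit{nf}\; (A_{D_{I1}}^{T} \cdot q)^{T} \cdot A_{D_{I2}}^{T} \cdot d \\
&= \mathit{nf}\; q^{T} \cdot A_{D_{I1}} \cdot A_{D_{I2}}^{T} \cdot d
\end{align*}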

SLIDE 17

Formalization of CL-ESA

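CL-ESA

\begin{align*}
\varphi_{\text{CL-ESA}}(q, d) &= \varphi(q_{|D_{I1}},\, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
&= \varphi(A_{D_{I1}}^{T} \cdot q,\; A_{D_{I2}}^{T} \cdot d) \\
&= \mathit{nf}\; (A_{D_{I1}}^{T} \cdot q)^{T} \cdot A_{D_{I2}}^{T} \cdot d \\
&= \mathit{nf}\; q^{T} \cdot \underbrace{A_{D_{I1}} \cdot A_{D_{I2}}^{T}}_{\sim\ \text{cross-language term co-occurrence}} \cdot\, d \\
&= \mathit{nf}\; q^{T} \cdot G_{L_1,L_2} \cdot d
\end{align*}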

SLIDE 18

Formalization of CL-ESA

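CL-ESA

\begin{align*}
\varphi_{\text{CL-ESA}}(q, d) &= \varphi(q_{|D_{I1}},\, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
&= \varphi(A_{D_{I1}}^{T} \cdot q,\; A_{D_{I2}}^{T} \cdot d) \\
&= \mathit{nf}\; (A_{D_{I1}}^{T} \cdot q)^{T} \cdot A_{D_{I2}}^{T} \cdot d \\
&= \mathit{nf}\; q^{T} \cdot \underbrace{A_{D_{I1}} \cdot A_{D_{I2}}^{T}}_{\sim\ \text{cross-language term co-occurrence}} \cdot\, d \\
&= \mathit{nf}\; \underbrace{q^{T} \cdot G_{L_1,L_2}}_{\text{query translation}} \cdot\, d
\end{align*}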
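
The last steps of the derivation can be checked numerically. The sketch below leaves out the nf term from the slides and compares only the scalar products, so it is an illustration under that assumption. The matrices, vectors, and helper names are hypothetical toy values: A_DI1 is |V1| × |DI|, A_DI2 is |V2| × |DI|, and the columns of both refer to the same aligned index documents.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))

A_DI1 = [[1, 0], [0, 1], [1, 1]]  # language L1: 3 terms x 2 aligned index documents
A_DI2 = [[2, 0], [0, 1]]          # language L2: 2 terms x 2 aligned index documents
q = [1, 0, 1]                     # query as a term vector in L1
d = [1, 2]                        # document as a term vector in L2

# Route 1: map q and d into the shared concept space, then compare.
via_concepts = dot(matvec(transpose(A_DI1), q), matvec(transpose(A_DI2), d))

# Route 2: build G = A_DI1 . A_DI2^T (|V1| x |V2|) and compute q^T . G . d.
G = matmul(A_DI1, transpose(A_DI2))
translated_q = matvec(transpose(G), q)  # q^T . G, a vector over the L2 vocabulary
via_G = dot(translated_q, d)

print(via_concepts, via_G)
```

Both routes give the same value, which is why q^T · G can be read as a translation of the query into the vocabulary of the document language.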

SLIDE 19

Outlook

1. Consideration of more index collections
2. Better language detection
3. Detailed analysis of document fields
