
Cross-Language Explicit Semantic Analysis Nedim Lipka Maik Anderka - PowerPoint PPT Presentation

Cross-Language Explicit Semantic Analysis. Nedim Lipka, Maik Anderka, Benno Stein, Bauhaus University Weimar, www.webis.de. Lipka@CLEF, 01.10.09. Outline: Retrieval Models; The CL-ESA Retrieval Model; CL-ESA at TEL@CLEF 2009.


  1. Cross-Language Explicit Semantic Analysis. Nedim Lipka, Maik Anderka, Benno Stein. Bauhaus University Weimar, www.webis.de. Lipka@CLEF, 01.10.09

  2. Outline
     ❑ Retrieval Models
     ❑ The CL-ESA Retrieval Model
     ❑ CL-ESA at TEL@CLEF 2009
     ❑ Formalization of CL-ESA

  3. Retrieval Models. [Diagram of a retrieval model R. Query side: information need, human query formulation, query $q \in Q$, query representation $\mathbf{q} \in \mathbf{Q}$. Document side: real-world document $d \in D$, computer-based document generation $\alpha_R$, document representation $\mathbf{d} \in \mathbf{D}_R$. Both representations meet in the computer-based relevance judgment $\rho_R(\mathbf{q}, \mathbf{d})$. Underlying theories: conceptual document models, linguistics, computer linguistics.]

  4. The CL-ESA Retrieval Model: Explicit Semantic Analysis, ESA [Gabrilovich/Markovitch 2007]

  5. The CL-ESA Retrieval Model: Explicit Semantic Analysis. [Figure: the documents of the document collection D shown as weight vectors.]

  6. The CL-ESA Retrieval Model: Explicit Semantic Analysis. [Figure: the document collection D next to an index collection $D_I$ (e.g. Wikipedia), both shown as weight vectors.]

  7. The CL-ESA Retrieval Model: Explicit Semantic Analysis. [Figure: a document of the collection D is compared via $\varphi$ with the documents of the index collection $D_I$ (e.g. Wikipedia); the resulting similarity values form its vector in the concept space.]

  8. The CL-ESA Retrieval Model: Explicit Semantic Analysis. [Figure: two documents of the collection D are each mapped via $\varphi$ against the index collection $D_I$ (e.g. Wikipedia); their concept-space vectors are compared with $\varphi_{\mathrm{ESA}}$.] Similarity analysis in a collection-relative concept space. Ranking: $d^* = \operatorname{argmax}_{d \in D} \varphi_{\mathrm{ESA}}(q, d)$, where $\varphi_{\mathrm{ESA}}(q, d) := \varphi(q_{|D_I}, d_{|D_I})$.
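
The ranking rule on slide 8 can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which similarity function $\varphi$ is used, so cosine similarity is assumed here, the term weights are made up, and the helper names (`to_concept_space`, `esa_rank`) are hypothetical. A real system would use a numerical library and weighted term vectors.

```python
import math

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity (assumed choice for phi); 0.0 if a vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def to_concept_space(x, index_docs):
    """Represent term vector x by its inner product with every index document."""
    return [dot(x, c) for c in index_docs]

def esa_rank(query, docs, index_docs):
    """Return (position of the best-scoring document, list of all scores)."""
    q_c = to_concept_space(query, index_docs)
    scores = [cosine(q_c, to_concept_space(d, index_docs)) for d in docs]
    best = max(range(len(docs)), key=scores.__getitem__)
    return best, scores

# Toy data over a 4-term vocabulary (all weights are invented).
index_docs = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]  # index collection D_I
docs = [[1, 1, 0, 0], [0, 0, 0, 1]]                       # collection D
query = [1, 0, 0, 0]                                      # query q

best, scores = esa_rank(query, docs, index_docs)
print(best, [round(s, 3) for s in scores])
```

The query and the first document share weight on the first index document, so the first document is ranked on top; the second document has no concept in common with the query.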

  9. The CL-ESA Retrieval Model: Cross-Language Explicit Semantic Analysis. [Figure: documents of the German collection $D_2$ are mapped via $\varphi$ against the German index collection $D_{I2}$, and documents of the English collection $D_1$ via $\varphi$ against the English index collection $D_{I1}$; the resulting concept-space vectors are compared with $\varphi_{\mathrm{CL\text{-}ESA}}$.] Similarity analysis in concept space, with $D_{I1}$ aligned with $D_{I2}$.
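
A minimal sketch of the cross-language step on slide 9, assuming cosine similarity for $\varphi$ and assuming that "aligned" means the i-th document of each index collection describes the same concept. The vocabularies, weights, and the function name `cl_esa` are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity (assumed choice for phi); 0.0 if a vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def cl_esa(query, doc, index_l1, index_l2):
    """Compare a query in language 1 with a document in language 2.

    index_l1[i] and index_l2[i] must describe the same concept (alignment).
    """
    if len(index_l1) != len(index_l2):
        raise ValueError("index collections are not aligned")
    q_c = [dot(query, c) for c in index_l1]  # query in concept space
    d_c = [dot(doc, c) for c in index_l2]    # document in concept space
    return cosine(q_c, d_c)

# Hypothetical vocabularies: English (dog, cat, car), German (hund, katze, auto, wagen).
index_en = [[1, 1, 0], [0, 0, 1]]        # concepts "pets", "vehicles"
index_de = [[1, 1, 0, 0], [0, 0, 1, 1]]  # the same two concepts in German

query_en = [1, 0, 0]        # "dog"
doc_pets_de = [0, 1, 0, 0]  # "katze"
doc_cars_de = [0, 0, 1, 0]  # "auto"

score_pets = cl_esa(query_en, doc_pets_de, index_en, index_de)
score_cars = cl_esa(query_en, doc_cars_de, index_en, index_de)
print(score_pets, score_cars)
```

The query and the documents never share a vocabulary; they only become comparable because both are expressed over the same aligned concepts.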

  10. CL-ESA at TEL@CLEF 2009: Setting
      Index collection:
      ❑ Wikipedia snapshot March 2009
      ❑ 169,000 articles per language
      ❑ 3 index collections
      ❑ Query representation: title + description
      ❑ Document representation: title + subject + alternative

  11. CL-ESA at TEL@CLEF 2009: Setting
      Index collection:
      ❑ Wikipedia snapshot March 2009
      ❑ 169,000 articles per language
      ❑ 3 index collections
      ❑ Query representation: title + description
      ❑ Document representation: title + subject + alternative
      Difficulties at TEL@CLEF:
      ❑ Selecting the correct index collection (language detection needed).
      ❑ The correct index collection is not always available.
      ❑ The fields title, subject, and alternative do not always share the same language.
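
The first two difficulties can be illustrated with a small sketch. The slides do not describe how language detection was done; the stopword-counting heuristic below is a stand-in chosen for brevity, and the stopword lists, language codes, and collection names are all hypothetical.

```python
# Tiny, hand-picked stopword lists (illustrative only, far from complete).
STOPWORDS = {
    "en": {"the", "and", "of", "in", "to", "is"},
    "de": {"der", "die", "das", "und", "ist", "von"},
    "fr": {"le", "la", "les", "et", "est", "de"},
}

def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = text.lower().split()
    counts = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

def select_index_collection(text, available):
    """Pick the index collection for the detected language.

    Returns None when no language is detected or no matching collection exists.
    """
    lang = guess_language(text)
    return available.get(lang) if lang is not None else None

# Hypothetical setup in which only an English index collection is available.
available = {"en": "wikipedia_en"}
print(select_index_collection("the history of the city", available))
print(select_index_collection("die Geschichte der Stadt und das Land", available))
```

Because the fields of one record may be written in different languages, such a check would have to run per field rather than per record.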

  12. CL-ESA at TEL@CLEF 2009

  13. Formalization of CL-ESA

  14. Formalization of CL-ESA: ESA. [Figure: the documents of D are compared via $\varphi$ with the documents of the index collection $D_I$, giving the concept-space representation $D_{|D_I}$.]

  15. Formalization of CL-ESA: ESA. [Figure: the documents of D are compared via $\varphi$ with the documents of the index collection $D_I$, giving the concept-space representation $D_{|D_I}$.]
      $$A_{D|D_I} = A_{D_I}^T \cdot A_D$$
      Dimensions: $(|D_I| \times |D|) = (|D_I| \times |V|) \cdot (|V| \times |D|)$.
      Rows and columns: (concept coordinates × documents) = (index documents × terms) · (terms × documents).
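
The matrix product on slide 15 and its dimensions can be checked on a toy example. The matrices below are invented; `transpose`, `matmul`, and `shape` are small hypothetical helpers written with the standard library only.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def shape(m):
    """Return (number of rows, number of columns)."""
    return (len(m), len(m[0]))

# |V| = 4 terms, |D_I| = 3 index documents, |D| = 2 documents.
A_DI = [[1, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 0, 1]]  # terms x index documents
A_D = [[1, 0],
       [1, 0],
       [0, 0],
       [0, 1]]      # terms x documents

# Concept-space representation: (|D_I| x |V|) . (|V| x |D|) = |D_I| x |D|
A_D_given_DI = matmul(transpose(A_DI), A_D)
print(shape(A_D_given_DI), A_D_given_DI)
```

Each column of the result is one document of D expressed by its coordinates over the index documents.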

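  16. Formalization of CL-ESA: CL-ESA
      $$\begin{aligned}
      \varphi_{\mathrm{CL\text{-}ESA}}(q, d) &= \varphi(q_{|D_{I1}}, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
      &= \varphi(A_{D_{I1}}^T \cdot q,\; A_{D_{I2}}^T \cdot d) \\
      &= \mathrm{nf}\, (A_{D_{I1}}^T \cdot q)^T \cdot A_{D_{I2}}^T \cdot d \\
      &= \mathrm{nf}\, q^T \cdot A_{D_{I1}} \cdot A_{D_{I2}}^T \cdot d
      \end{aligned}$$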

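  17. Formalization of CL-ESA: CL-ESA
      $$\begin{aligned}
      \varphi_{\mathrm{CL\text{-}ESA}}(q, d) &= \varphi(q_{|D_{I1}}, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
      &= \varphi(A_{D_{I1}}^T \cdot q,\; A_{D_{I2}}^T \cdot d) \\
      &= \mathrm{nf}\, (A_{D_{I1}}^T \cdot q)^T \cdot A_{D_{I2}}^T \cdot d \\
      &= \mathrm{nf}\, q^T \cdot \underbrace{A_{D_{I1}} \cdot A_{D_{I2}}^T}_{\sim\ \text{cross-language term co-occurrence}} \cdot d \\
      &= \mathrm{nf}\, q^T \cdot G_{L_1, L_2} \cdot d
      \end{aligned}$$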

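  18. Formalization of CL-ESA: CL-ESA
      $$\begin{aligned}
      \varphi_{\mathrm{CL\text{-}ESA}}(q, d) &= \varphi(q_{|D_{I1}}, d_{|D_{I2}}), \quad \text{with } D_{I1}, D_{I2} \text{ aligned} \\
      &= \varphi(A_{D_{I1}}^T \cdot q,\; A_{D_{I2}}^T \cdot d) \\
      &= \mathrm{nf}\, (A_{D_{I1}}^T \cdot q)^T \cdot A_{D_{I2}}^T \cdot d \\
      &= \mathrm{nf}\, q^T \cdot \underbrace{A_{D_{I1}} \cdot A_{D_{I2}}^T}_{\sim\ \text{cross-language term co-occurrence}} \cdot d \\
      &= \mathrm{nf}\, \underbrace{q^T \cdot G_{L_1, L_2}}_{\text{query translation}} \cdot d
      \end{aligned}$$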
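
The last two steps of the derivation can be verified numerically. The sketch below uses invented integer weights, leaves out the factor nf (taken here to be a normalization factor, which the slides do not define), and uses hypothetical helper names. It checks that the inner product in concept space equals $q^T \cdot G_{L_1, L_2} \cdot d$ with $G_{L_1, L_2} = A_{D_{I1}} \cdot A_{D_{I2}}^T$.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

# Term x index-document matrices for two aligned index collections.
A_en = [[1, 0], [1, 0], [0, 2]]          # 3 English terms x 2 concepts
A_de = [[1, 0], [2, 0], [0, 1], [0, 1]]  # 4 German terms x 2 concepts

q = [1, 0, 0]     # English query
d = [0, 1, 0, 0]  # German document

# Left-hand side: (A_en^T q)^T (A_de^T d), the comparison in concept space.
lhs = dot(matvec(transpose(A_en), q), matvec(transpose(A_de), d))

# Right-hand side: q^T G d with G = A_en A_de^T (English terms x German terms).
G = matmul(A_en, transpose(A_de))
translated_query = matvec(transpose(G), q)  # q^T G, a vector over German terms
rhs = dot(translated_query, d)

print(lhs, rhs, translated_query)
```

Read this way, $q^T \cdot G_{L_1, L_2}$ turns the English query into a weighted vector over German terms, which matches the "query translation" label on the slide.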

  19. Outlook
      1. Consideration of more index collections
      2. Better language detection
      3. Detailed analysis of document fields

